Estated Blog

Estated’s Data Pipeline and Data Accuracy Measures

For datasets on the scale of Estated’s property database, several very difficult challenges must be addressed to keep the data as accurate as possible. Paramount among them are data freshness, normalization, and the trustworthiness of data sources. If getting the data you need seems like far too daunting a task for your organization to worry about, you are right. Don’t worry, however: Estated has a team of skilled engineers, data wranglers, and domain experts working hard to build cutting-edge solutions to the problems involved in producing accurate and comprehensive data on such a massive scale.

First and foremost, for data to be accurate, it must be as fresh as possible.


Property data is not static. The house you paid $200,000 for in the 1990s is likely worth far more today. Beyond general inflation and changes in the market, it may have been renovated, or new rooms and other valuable additions may have been built. In this context, freshness means how recently the data were collected, so it is important to get the most up-to-date data available to make the best business and personal decisions. Estated’s database combines thousands of data sources, ranging from County Assessor offices to internally produced, proprietary data, and everything in between. Tracking down thousands of data sources even once, to ensure comprehensiveness, is hard enough; ensuring freshness also requires determining when each of those sources releases its data. Of course, these independent sources all follow their own release schedules. As you may have guessed, this is far from a simple task. Don’t fret, however: our data collection system is automated at every possible level to keep the data current. In addition to our proprietary automated collection tools, our team of clerical workers diligently collects data manually wherever necessary. Thus, when using Estated, you can be confident that you are getting the freshest and most comprehensive property data available.
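To make the scheduling problem concrete, here is a minimal Python sketch of one way a collector could decide which sources are due for a re-fetch. It assumes each source publishes on a fixed interval, which many real agencies do not; the `Source` class, its fields, and the county names are hypothetical and do not describe Estated’s actual tooling.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class Source:
    name: str
    release_interval_days: int  # assumed: how often this source publishes new data
    last_collected: date        # when this source was last pulled


def next_expected_release(source: Source) -> date:
    """Estimate when the source should next publish, from its assumed interval."""
    return source.last_collected + timedelta(days=source.release_interval_days)


def sources_due(sources: list[Source], today: date) -> list[str]:
    """Return the names of sources whose next release should already be out,
    most overdue first, so a collector can re-fetch them."""
    due = [s for s in sources if today >= next_expected_release(s)]
    due.sort(key=next_expected_release)
    return [s.name for s in due]


# Hypothetical sources with very different release schedules.
sources = [
    Source("County A assessor roll", 365, date(2023, 1, 15)),
    Source("County B recorder feed", 7, date(2024, 3, 1)),
    Source("County C assessor roll", 90, date(2024, 2, 20)),
]
print(sources_due(sources, today=date(2024, 3, 10)))
# prints ['County A assessor roll', 'County B recorder feed']
```

County C is skipped because its next expected release (2024-05-20) has not arrived yet. A production system would also need to detect releases that come early, late, or not at all, which is where manual collection fills the gaps.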

Another key issue in data accuracy is normalization across independent datasets.

Maddeningly, data in any particular domain can arrive in a seemingly infinite sea of formats. In property data, chief among these are the formats of addresses and of the names of people and organizations. For instance, the same address may look very different depending on whether you are talking to UPS or to your County Assessor. So how can we use different data sources at all? Estated’s engineering team has spent years developing proprietary tools that address these problems for you. These tools handle casing, component ordering, canonicalization, and a wide array of other normalization concerns. The result is a comprehensive set of cleanly formatted data points for each property, resolved from a huge number of sources and ready for your consumption.
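Estated’s normalization tools are proprietary, so the following is only a minimal Python sketch of the three concerns just named: casing, component ordering, and canonicalization. The lookup tables, the organization markers, and both function names are hypothetical and cover only a handful of cases.

```python
import re

# Hypothetical, deliberately tiny lookup tables; real address standards
# (such as USPS Publication 28) define far larger ones.
SUFFIXES = {"STREET": "ST", "AVENUE": "AVE", "AV": "AVE", "ROAD": "RD", "DRIVE": "DR"}
DIRECTIONALS = {"NORTH": "N", "SOUTH": "S", "EAST": "E", "WEST": "W"}
ORG_MARKERS = {"LLC", "INC", "CORP", "TRUST"}


def normalize_address(raw: str) -> str:
    """Uppercase, drop periods and commas, and canonicalize common tokens."""
    tokens = re.sub(r"[.,]", " ", raw.upper()).split()
    return " ".join(SUFFIXES.get(t, DIRECTIONALS.get(t, t)) for t in tokens)


def normalize_owner_name(raw: str) -> str:
    """Uppercase, collapse whitespace, and reorder 'LAST, FIRST' into
    'FIRST LAST' for people; organization names keep their order."""
    name = " ".join(raw.upper().split())
    if ORG_MARKERS & set(re.findall(r"[A-Z]+", name)):
        return name
    if "," in name:
        last, _, rest = name.partition(",")
        name = f"{rest.strip()} {last.strip()}"
    return name


# The same property as two hypothetical sources might format it.
print(normalize_address("123 N. Main St."))        # 123 N MAIN ST
print(normalize_address("123 North Main Street"))  # 123 N MAIN ST
print(normalize_owner_name("Smith,  John A"))      # JOHN A SMITH
print(normalize_owner_name("Acme Holdings, LLC"))  # ACME HOLDINGS, LLC
```

Once both sources reduce to the same canonical string, their records can be matched to the same property. Token tables like these are ambiguous in practice (for example, "ST" can also mean "Saint"), which is one reason real normalizers are far more involved than this sketch.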

How would you feel if you bought a house and your County Assessor then told you that the property you had just invested in had fewer bedrooms than the realtor originally said? Trust is never easy when it comes to business and personal decisions, and we at Estated get that. It may come as a surprise, but failures at the clerical and bureaucratic levels of government organizations are common. These failures produce errors both within and across data sources and can result in deeply confusing conflicts. Consequently, a complex technical system must be in place to keep our data at the highest possible quality. That is why we personally vet all our data sources and apply specialized rules to each one when merging data from every available source before serving it to you. The result is an authoritative record for each property, produced only after applying several complex rules coded into Estated’s proprietary software. Our team of domain experts and engineers spends its days ensuring that you can focus your energy on the most important parts of your data-driven decisions instead of worrying about potential clerical errors and mind-numbing bureaucracy.
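The merge step can be illustrated with a minimal Python sketch, assuming one common approach: per-source, per-field trust ranks, with ties broken by collection date. Estated’s actual rules are proprietary and not described in this post; the source names, the `TRUST` table, and `merge_records` are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Any


@dataclass
class SourceRecord:
    source: str
    collected: date
    fields: dict[str, Any]


# Hypothetical per-source, per-field trust ranks (higher wins). Ranking per
# field lets one source win on parcel facts while another wins on sales.
TRUST = {
    "county_assessor": {"bedrooms": 2, "sale_price": 1},
    "county_recorder": {"sale_price": 3},
    "listing_feed": {"bedrooms": 1},
}


def merge_records(records: list[SourceRecord]) -> dict[str, Any]:
    """Build one record per property: for each field keep the value from the
    most trusted source, breaking ties with the most recently collected one."""
    best: dict[str, tuple[tuple[int, date], Any]] = {}
    for rec in records:
        for field, value in rec.fields.items():
            if value is None:
                continue  # a missing value never overwrites a real one
            rank = (TRUST.get(rec.source, {}).get(field, 0), rec.collected)
            if field not in best or rank > best[field][0]:
                best[field] = (rank, value)
    return {field: value for field, (_, value) in best.items()}


records = [
    SourceRecord("county_assessor", date(2024, 1, 5), {"bedrooms": 3, "sale_price": 250000}),
    SourceRecord("listing_feed", date(2024, 3, 1), {"bedrooms": 4, "sqft": 1800}),
    SourceRecord("county_recorder", date(2023, 11, 20), {"sale_price": 255000, "bedrooms": None}),
]
print(merge_records(records))
# prints {'bedrooms': 3, 'sale_price': 255000, 'sqft': 1800}
```

In this toy data the assessor’s bedroom count beats the newer listing feed, while the recorder’s sale price beats the assessor’s, even though the recorder record is the oldest. Fields that only one source reports, such as `sqft`, pass through unchanged.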

Whether you are making personal or business decisions, there is no question that when property is involved, the stakes are very high. Consequently, property data accuracy is of the utmost importance. When leveraging Estated’s technologies and domain experts, you can be certain that you have the upper hand without having to spend a lifetime navigating the tremendously complicated landscape of property data accuracy.