
More Accurate Queries and Analytics: Easily Standardize Inconsistent Data via API

by Interzoid Team

Posted on October 23rd, 2019

Standardization of Inconsistent Data

Data is inherently inconsistent, at least with respect to how it is stored electronically. For example:

- "GE", "G.E.", "Gen Electric", and "General Electric Corp" are all variations of the same company name.

- "SF", "S.F.", "San Fran", and "San Francisco" are all variations of the same city name.

- "NY", "n.y.", "New York", and "Nueva York" are variations as well, including a non-English form of the name.

- "California", "Cal", "CA", "Calif", and "Calif." are all versions of the western state.

- "Great Britain", "Britain", "United Kingdom", and "UK" all represent the same nation.

These inconsistencies can make data very difficult to analyze. To illustrate using the examples above, the query "SELECT * FROM customers WHERE country = 'Britain'" could leave out a great number of customers who actually are in Britain but whose records use a different spelling or representation of the country name in the country field. Worse yet, such inconsistency can cause significantly incomplete or inaccurate reports, which may then be used as the basis for important strategic decisions.
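To make the effect concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table layout, customer names, and rows are made up for illustration and do not come from any real dataset.

```python
import sqlite3

# Hypothetical sample data: four customers, all located in Britain,
# each stored with a different representation of the country name.
ROWS = [
    ("Acme Ltd", "Great Britain"),
    ("Bolt plc", "Britain"),
    ("Crest Ltd", "United Kingdom"),
    ("Dune plc", "UK"),
]


def count_matches(country: str) -> int:
    """Count customers whose country field equals `country` exactly."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE customers (name TEXT, country TEXT)")
        conn.executemany("INSERT INTO customers VALUES (?, ?)", ROWS)
        (count,) = conn.execute(
            "SELECT COUNT(*) FROM customers WHERE country = ?", (country,)
        ).fetchone()
        return count
    finally:
        conn.close()


if __name__ == "__main__":
    found = count_matches("Britain")
    print(f"{found} of {len(ROWS)} British customers found")
```

An exact-match filter only sees the rows that happen to use the same spelling as the query, so most of the customers in this sample are silently left out.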

This is not only a challenge when querying or reporting within a single database. Inconsistent data is also difficult to overcome when merging datasets, because the shared columns used as the basis of the merge will not match when the same value is represented differently in each dataset.
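The merge problem can be sketched the same way. The snippet below joins two small, hypothetical datasets on a shared "city" column using plain Python; the records and the `inner_join` helper are illustrative assumptions, not part of any particular tool.

```python
# Hypothetical records from two datasets that share a "city" column.
orders = [
    {"order_id": 1, "city": "San Francisco"},
    {"order_id": 2, "city": "SF"},
    {"order_id": 3, "city": "San Fran"},
]
offices = [{"city": "San Francisco", "office": "HQ"}]


def inner_join(left, right, key):
    """Join two lists of dicts on exact equality of `key`."""
    index = {}
    for right_row in right:
        index.setdefault(right_row[key], []).append(right_row)
    return [
        {**left_row, **right_row}
        for left_row in left
        for right_row in index.get(left_row[key], [])
    ]


joined = inner_join(orders, offices, "city")
print(f"{len(joined)} of {len(orders)} orders matched an office")
```

Only the order that spells the city exactly as the office dataset does survives the join; the other two are dropped even though they refer to the same city.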

The problem is clear; the question is how to remedy it without manually inspecting and editing large volumes of data, which would be time-consuming and costly.

So what can be done to improve the situation?

Fortunately, standardization libraries have been built that address the issue for specific types of content, such as geographic data. These libraries serve as the basis of callable standardization APIs that respond with a standardized version of the input data. Existing databases can be standardized one record at a time in a batch-like process: for every record, call the appropriate API for each column to be standardized.
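The batch process described above might look roughly like the following sketch. The `standardize_country` function is a local stand-in for a remote standardization API call (a small lookup table, purely for illustration), and `standardize_records` is a hypothetical helper name.

```python
from typing import Callable, Dict, Iterable, List

# Stand-in for a remote standardization API. A real integration would send
# `value` to the service and return its response; this table is illustrative.
_CANONICAL_COUNTRY = {
    "great britain": "United Kingdom",
    "britain": "United Kingdom",
    "united kingdom": "United Kingdom",
    "uk": "United Kingdom",
}


def standardize_country(value: str) -> str:
    """Return the canonical name if known, otherwise the input unchanged."""
    return _CANONICAL_COUNTRY.get(value.strip().lower(), value)


def standardize_records(
    records: Iterable[Dict[str, str]],
    standardizers: Dict[str, Callable[[str], str]],
) -> List[Dict[str, str]]:
    """Process records one at a time, standardizing each configured column."""
    cleaned = []
    for record in records:
        row = dict(record)  # copy so the input record is left untouched
        for column, standardize in standardizers.items():
            if column in row:
                row[column] = standardize(row[column])
        cleaned.append(row)
    return cleaned


if __name__ == "__main__":
    sample = [{"name": "Acme Ltd", "country": "UK"}]
    print(standardize_records(sample, {"country": standardize_country}))
```

Passing the standardizers as a column-to-function mapping keeps the loop independent of any one service, so a city or state standardizer could be added without changing the batch logic.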

However, standardizing data at the point of collection, such as when customer information is entered into a Web form, can quickly and inexpensively resolve these challenges up front, so that no back-end processing is needed, at least for the purpose of standardizing inconsistent data.
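As a rough illustration of standardizing at the point of collection, the sketch below cleans a submitted form before it is stored. The `handle_signup` handler, the in-memory `store` list, and the lookup table standing in for a state standardization API are all assumptions made for this example.

```python
from typing import Dict, List

# Stand-in for a state name standardization API (illustrative values only).
_STATE_CODES = {
    "california": "CA",
    "cal": "CA",
    "ca": "CA",
    "calif": "CA",
    "calif.": "CA",
}


def standardize_state(value: str) -> str:
    """Return the canonical code if known, otherwise the trimmed input."""
    trimmed = value.strip()
    return _STATE_CODES.get(trimmed.lower(), trimmed)


def handle_signup(form: Dict[str, str], store: List[Dict[str, str]]) -> Dict[str, str]:
    """Standardize the state field, then save the record to `store`."""
    record = dict(form)
    record["state"] = standardize_state(record.get("state", ""))
    store.append(record)
    return record


if __name__ == "__main__":
    database: List[Dict[str, str]] = []
    handle_signup({"name": "Ann", "state": "Calif."}, database)
    print(database)
```

Because the value is cleaned before it is saved, every stored record already uses one representation and no later batch pass is needed for that field.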

Of course, you can use drop-down lists (in the case of states or countries) and other input controls to limit some data inconsistency. This approach is useful, but it only works when data is collected through an interactive Web form. It is typically not available when data is read through an API or when multiple datasets are combined.

Here are some standardization APIs to try for specific types of data content:

City Name Standardization

Country Name Standardization

State Name Standardization
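Calling a standardization API over HTTP generally follows the pattern sketched below. Everything service-specific here is a placeholder: the base URL, the `key` and `value` parameter names, and the `standardized` response field are assumptions for illustration, so substitute the values given in the documentation of the API you are using.

```python
import json
from typing import Callable, Optional
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder endpoint; not a real service.
BASE_URL = "https://api.example.com/standardize"


def build_url(kind: str, value: str, api_key: str) -> str:
    """Build the request URL for one value (parameter names are assumed)."""
    query = urlencode({"key": api_key, "value": value})
    return f"{BASE_URL}/{kind}?{query}"


def parse_response(body: str) -> str:
    """Extract the standardized value from a JSON body (field name assumed)."""
    return json.loads(body)["standardized"]


def standardize(
    kind: str,
    value: str,
    api_key: str,
    fetch: Optional[Callable[[str], str]] = None,
) -> str:
    """Request a standardized value; `fetch` can be swapped in for testing."""
    url = build_url(kind, value, api_key)
    if fetch is None:
        with urlopen(url, timeout=10) as response:
            body = response.read().decode("utf-8")
    else:
        body = fetch(url)
    return parse_response(body)
```

The optional `fetch` argument lets the request logic be exercised with a canned response instead of a live network call.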

If you need to standardize data content types other than those above, such as company names, product information, or telephone numbers, reach out to us and we can analyze a sample of the data to see how we can help.


All content (c) 2018-2023 Interzoid Incorporated. Questions? Contact

201 Spear Street, Suite 1100, San Francisco, CA 94105-6164
