Posted on August 27th, 2019
Unfortunately, many organizations wait until they recognize they have data quality issues in their existing customer data assets before they begin thinking about how to address the challenge. This realization is usually triggered because a significant problem has been experienced or uncovered as part of a business process, when viewing internal reports, or worst case, upon hearing from an unhappy customer. Further analysis typically demonstrates the discovered issue to be the tip of the iceberg of a long list of customer data problems. Could scenarios like these be prevented? If so, could APIs be the foundation of a solution?
By the time scenarios like these are discovered, an organization's data quality issues can be deep, far-reaching, and expensive to remedy. They usually require extensive and disruptive data re-engineering, database processing, and customized data scrubbing processes to resolve. However, a more proactive approach that uses APIs and their real-time nature can address these potential issues up front at significantly less cost, eliminating large, costly, back-end data cleanup projects. Improving the value of customer data assets is a use case that really showcases the value of API-based solutions.
Before we discuss solution approaches, let's discuss the various data accuracy challenges most organizations likely face at least to some degree.
First, there is the data inconsistency problem. There are many ways to represent a piece of information in a database, such as the name of a company. "Wal-mart", "Walmart", "WALMART CORP.", "Walmart Corporation", and even misspellings such as "Wallmart" might appear in a database, all permutations of data that represent the same business entity. This of course can cause significant problems in generating business intelligence reports driven by company name, as one example. Per-customer analysis, searching for organization matches in a database, and matching across internal data sources are other situations where data inconsistency can be problematic and costly to resolve.
Matching existing data of several different content types can also be a challenge. For example, should a record of "John Snow" from "Northern Wall Industries" be inserted as a new customer into a customer data store when a "Jon Snow" from "North Wall Inc." is already a customer? Having the same customer in a database multiple times can cause another long list of problems such as incomplete customer views, a lack of understanding of a true customer base, and the potential embarrassment of treating someone like a prospect when they are already a customer.
Another data quality scenario is data verification. For example, verifying whether an address physically exists according to USPS or other national mailing authority sources can be quite useful. If invalid data is only discovered in a downstream business process, such as a returned mailer indicating that a given customer address is incorrect or non-existent, it might prove difficult to contact the customer to resolve the issue.
These are just a few examples of data quality issues that might occur within customer data. Resolving them has traditionally required costly, back-office data quality projects. However, an upfront, API-driven approach can showcase the power of APIs and their flexibility when incorporated as part of a solution.
So how do APIs help? Many of these data quality issues can be addressed at the point of data collection, including when a customer or prospect is filling out an online form, when data is being obtained over the phone, or in many other ways that customer data is collected, simply by making use of an integrated API. Introducing an API data quality check *before* data is ever entered into a customer database prevents potential problems at the source, so these data quality issues don't expensively propagate themselves throughout the customer data lifecycle.
For example, a data standardization API can first algorithmically remove generic corporate terms such as "inc", "corp", "the", and other noise words from a company name field, then compare what remains to a master source of company name standards. It can then converge all permutations of a company name into a single standardized entry, usually by incorporating an extensive external database of company name standards. A new instance of collected company name data can go through this process via the API and be adjusted accordingly before it ever enters the customer database, substantially reducing the incidence of inconsistent company data. This solves many of the problems associated with organization name inconsistency early in the customer data lifecycle.
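As a rough illustration of the standardization step, here is a minimal Python sketch. The noise-word list, the `STANDARD_NAMES` dictionary, and the function names are all hypothetical stand-ins; a real standardization API would sit in front of a much larger, externally maintained standards database.

```python
import re

# Hypothetical noise words; a real service would use a far longer list.
NOISE_WORDS = {"inc", "corp", "corporation", "co", "llc", "ltd", "the"}

# Hypothetical standards dictionary: normalized key -> standardized name.
STANDARD_NAMES = {
    "walmart": "Walmart",
    "wallmart": "Walmart",  # known misspelling mapped to the standard entry
}


def normalize_key(raw_name: str) -> str:
    """Lowercase, drop hyphens and punctuation, and remove noise words."""
    lowered = raw_name.lower().replace("-", "")
    tokens = re.findall(r"[a-z0-9]+", lowered)
    kept = [token for token in tokens if token not in NOISE_WORDS]
    return " ".join(kept)


def standardize_company(raw_name: str) -> str:
    """Return the standardized name, or the trimmed input if none is known."""
    key = normalize_key(raw_name)
    return STANDARD_NAMES.get(key, raw_name.strip())


if __name__ == "__main__":
    for name in ["Wal-mart", "WALMART CORP.", "Walmart Corporation", "Wallmart"]:
        print(name, "->", standardize_company(name))
```

Calling something like `standardize_company` at the point of data entry means every variant above is stored under one name, while an unrecognized company name passes through unchanged rather than being guessed at.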
Another use case is to leverage a matching API to identify pairs or sets of data that exceed similarity thresholds and are therefore likely the same entity. An individual's name, for example, can be sent to an API that heuristically generates a similarity key. A similarity key is created by mapping known name variations (Bob, Robert, Rob, etc.) to a common form and algorithmically eliminating noise. If all of the current customer records also have a similarity key generated, these keys can be used to search for possible matches rather than comparing the actual data, casting a much wider net when searching for matching name records.
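The similarity-key idea can be sketched in a few lines of Python. This is only a simplified illustration: the `NAME_VARIANTS` table, the noise list, and the key format are assumptions made for the example, and a production matching API would rely on far richer heuristics and reference data.

```python
import re
from typing import Dict, List

# Hypothetical table of known name variations mapped to one canonical form.
NAME_VARIANTS = {"bob": "robert", "rob": "robert", "robert": "robert"}

# Hypothetical noise tokens (titles and suffixes) to drop before keying.
NAME_NOISE = {"mr", "mrs", "ms", "dr", "jr", "sr"}


def similarity_key(full_name: str) -> str:
    """Build a key that is identical for names likely to be the same person."""
    tokens = re.findall(r"[a-z]+", full_name.lower())
    tokens = [token for token in tokens if token not in NAME_NOISE]
    canonical = [NAME_VARIANTS.get(token, token) for token in tokens]
    return "|".join(canonical)


def build_key_index(records: Dict[int, str]) -> Dict[str, List[int]]:
    """Index existing customer record ids by their similarity key."""
    index: Dict[str, List[int]] = {}
    for record_id, name in records.items():
        index.setdefault(similarity_key(name), []).append(record_id)
    return index


def find_possible_matches(new_name: str, index: Dict[str, List[int]]) -> List[int]:
    """Return ids of existing records that share the new name's key."""
    return index.get(similarity_key(new_name), [])


if __name__ == "__main__":
    customers = {1: "Robert Smith", 2: "Jane Doe"}
    index = build_key_index(customers)
    print(find_possible_matches("Mr. Bob Smith", index))
```

Because the lookup compares keys rather than raw strings, "Mr. Bob Smith" surfaces the existing "Robert Smith" record as a candidate for review before a duplicate is created.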
Other examples include a data verification API that compares mailing addresses to a USPS source to ensure each address physically exists. Email addresses can be checked using other purpose-built APIs that perform actions such as email syntax checking and active mail server validation to ensure email addresses are valid and to prevent bounces. Again, a simple integration that sends a physical address or email address to a validation API can resolve problem data before it becomes part of customer data assets that must be dealt with later. In addition, the reference data sources that serve as the basis of the validation can be swapped out for newer, more accurate versions without a single change to the API or its integration, another significant benefit.
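A hedged sketch of what such a validation check might look like in Python follows. The email pattern is a deliberately simple syntax check, not a full RFC-compliant parser, and it does not contact any mail server. The `address_exists` callable is a hypothetical stand-in for a lookup against a postal reference source; passing it in as a parameter mirrors the point that reference data can be swapped without touching the calling code.

```python
import re
from typing import Callable, Dict, List

# Simplified syntax check: local part, one "@", and a dotted domain.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+")


def is_plausible_email(address: str) -> bool:
    """Return True if the address passes the basic syntax check."""
    return EMAIL_PATTERN.fullmatch(address.strip()) is not None


def validate_contact(
    record: Dict[str, str],
    address_exists: Callable[[str], bool],
) -> List[str]:
    """Return the names of fields that failed validation (empty if clean)."""
    problems: List[str] = []
    if not is_plausible_email(record.get("email", "")):
        problems.append("email")
    if not address_exists(record.get("address", "")):
        problems.append("address")
    return problems


if __name__ == "__main__":
    # Hypothetical in-memory reference data standing in for a postal source.
    known_addresses = {"1 main st, springfield"}

    def lookup(address: str) -> bool:
        return address.strip().lower() in known_addresses

    record = {"email": "jane.doe@example", "address": "1 Main St, Springfield"}
    print(validate_contact(record, lookup))
```

A form handler could call `validate_contact` before saving and prompt the user to correct any flagged field while they are still present to fix it.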
The cost savings of standardizing, matching, and validating data in real-time with an integrated API can be very substantial as opposed to waiting until downstream to solve these problems with exhaustive batch data processing exercises, especially if the data is being used as a part of a data lake, data warehouse, or other business intelligence activity as well, where the same bad data can reappear after every data load.
Clearly, when it comes to keeping customer data assets accurate, complete, and valuable, an API approach, with all the associated benefits of API use, proves to be an ideal weapon in the solution.