
"Elementizing" a Data Column for Better Data Consistency

by Interzoid Team


Posted on March 7th, 2023


Element Analysis of a Data Column

"Elementizing" a data column, sometimes also called "tokenizing", is an important part of data quality work and of increasing the value of key data assets. It identifies, at the element level rather than the record level, the different elements and permutations of elements that exist in a data column.

Identifying these elements within a particular dataset goes a long way toward standardizing a given type of data content, which in turn yields better data analysis results and better outcomes for other data-driven processes, including Data Science, Artificial Intelligence, Machine Learning, Analytics, CRM, Marketing, and more.

For example, here is a list of elements with their corresponding counts for a full name field after performing an element analysis:

[Image: Database data analysis results with counts]
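
As a rough local illustration of what an element analysis produces, the Python sketch below splits each value in a column on whitespace, folds case and trailing punctuation, and counts how often each element occurs. It is not Interzoid's implementation; the `elementize` helper, its normalization rules, and the sample names are all assumptions made up for this example.

```python
from collections import Counter


def elementize(values):
    """Count how often each whitespace-separated element appears in a column.

    Hypothetical helper for illustration only; a real element analysis
    may split and normalize values differently.
    """
    counts = Counter()
    for value in values:
        for token in value.split():
            # Fold case and strip surrounding punctuation so that
            # "Dr." and "DR" are counted as the same element.
            element = token.strip(".,;:").upper()
            if element:
                counts[element] += 1
    return counts


# Made-up sample data standing in for a full name column.
full_names = [
    "Dr. John Smith",
    "Mr John Smith Jr.",
    "DR Jane Doe",
    "Jane Doe, PhD",
]

# Print each element with its count, most frequent first.
for element, count in elementize(full_names).most_common():
    print(f"{element}\t{count}")
```

Even on four rows, the counts surface elements such as titles and suffixes that appear in some records but not others, which is the kind of inconsistency a standardization pass would then address.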

This type of analysis can be performed on any dataset column using Interzoid's browser-based Cloud Data Connect application. It works with CSV files, TSV files, and SQL database tables in the Cloud, including AWS RDS, Snowflake, Azure SQL, Google Cloud SQL, and various other Postgres and MySQL databases. You can see the interactive capability here.

You can also perform this type of analysis with the same product from the command line, either as an API call or with Curl. This allows the analysis to run on a scheduled basis, perhaps as part of a batch script, or within ETL/ELT data pipelines. Here is an example you can try right now by pasting the URL below into a Web browser address bar. If using Curl, be sure to remove the "html=true" parameter:


    https://connect.interzoid.com/run?function=elementize&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/companies.csv&table=CSV&column=1&html=true
                

And from the command line using Curl (also spelled cURL):

Linux & Mac

    curl 'https://connect.interzoid.com/run?function=elementize&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/companies.csv&table=CSV&column=1'
                
Windows

    curl "https://connect.interzoid.com/run?function=elementize&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/companies.csv&table=CSV&column=1"
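
For scheduled jobs or ETL/ELT steps written in Python rather than shell, the same request can be assembled with the standard library. This is a minimal sketch: the parameter names and endpoint are taken from the Curl examples above, while the helper names `build_elementize_url` and `run_elementize`, the timeout, and the assumption that the response body is UTF-8 text are ours. It also assumes the service accepts a percent-encoded `connection` value, as is standard for query strings.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Endpoint taken from the Curl examples above.
BASE_URL = "https://connect.interzoid.com/run"


def build_elementize_url(api_key, connection, column, source="CSV", table="CSV"):
    """Build the request URL for an element analysis (hypothetical helper)."""
    params = {
        "function": "elementize",
        "apikey": api_key,
        "source": source,
        "connection": connection,
        "table": table,
        "column": column,
    }
    # urlencode percent-encodes the connection URL so it is safe
    # to embed inside the query string.
    return f"{BASE_URL}?{urlencode(params)}"


def run_elementize(url, timeout=60):
    """Send the request and return the raw response text.

    Assumes a UTF-8 text response; the actual format is not shown here.
    """
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


url = build_elementize_url(
    api_key="use-your-own-api-key-here",
    connection="https://dl.interzoid.com/csv/companies.csv",
    column=1,
)
print(url)
# Uncomment to call the service once a real API key is in place:
# print(run_elementize(url))
```

Keeping URL construction separate from the network call makes the request easy to log and check before it is wired into a scheduler or pipeline.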
                

For more information about parameters and expanded options for performing element analysis of a data column as part of any workflow, check out the Element Analysis Workflow Guide.
