Posted on March 7th, 2023
"Elementizing" a data column, also known sometimes as "tokenizing", is an important component of data quality and improving the value of important data assets. It identifies at the element level, rather than the record level, various permutations of elements that exist in a data column.
Identifying these elements with particular datasets can go a long way in helping to standardize a given type of data content, which in turn provides better data analysis results and better outcomes for other data-driven processes, including Data Science, Artificial Intelligence, Machine Learning, Analytics, CRM, Marketing, and more.
For example, performing an element analysis on a full name field produces a list of the elements found in that field along with their corresponding counts.
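To make the idea concrete, here is a minimal Python sketch of element-level counting. It is only an illustration of the concept, not Interzoid's actual algorithm or output: the `elementize` function, the whitespace-based splitting rule, the punctuation stripping, and the sample names are all assumptions made for this example.

```python
from collections import Counter


def elementize(values):
    """Split each column value on whitespace and count every element."""
    counts = Counter()
    for value in values:
        for token in value.split():
            # Strip trailing/leading periods and commas and upper-case the
            # token so that "Mr." and "MR" are counted as the same element.
            element = token.strip(".,").upper()
            if element:
                counts[element] += 1
    return counts


# Hypothetical sample values, not taken from any real dataset.
full_names = [
    "Mr. John Smith",
    "Dr. Jane Smith Jr.",
    "John A. Doe",
    "MR James Doe",
]

element_counts = elementize(full_names)

# Print elements from most to least frequent.
for element, count in element_counts.most_common():
    print(f"{element}: {count}")
```

A report like this makes it easy to spot prefixes, suffixes, and initials that appear in several spellings, which is the kind of inconsistency that standardization aims to remove.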
This type of analysis can be performed on any dataset column using Interzoid and our browser-based Cloud Data Connect application. It works with CSV files, TSV files, and SQL database tables in the Cloud, including AWS RDS, Snowflake, Azure SQL, Google Cloud SQL, and various other forms of Postgres and MySQL databases. You can see the interactive capability here.
You can also run this type of analysis with the same product as an API call, either from a Web browser or from the command line with Curl. This allows the analysis to run on a scheduled basis, perhaps as part of a batch script, or within ETL/ELT data pipelines. Here is an example you can try right now by pasting the query string below into a Web browser address bar. If using Curl, be sure to remove the "html=true" parameter:
https://connect.interzoid.com/run?function=elementize&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/companies.csv&table=CSV&column=1&html=true
And from the command line using Curl (also spelled cURL):
Linux & Mac
curl 'https://connect.interzoid.com/run?function=elementize&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/companies.csv&table=CSV&column=1'
Windows
curl "https://connect.interzoid.com/run?function=elementize&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/companies.csv&table=CSV&column=1"
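For use inside a batch script or an ETL/ELT pipeline, the same request can be made from Python with only the standard library. This is a sketch under stated assumptions: the helper names `build_elementize_url` and `run_elementize` are hypothetical, the parameters are the ones shown in the Curl examples above, and the response is assumed to be UTF-8 text, since its exact format is not described here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "https://connect.interzoid.com/run"


def build_elementize_url(api_key, connection, column, source="CSV", table="CSV"):
    """Build the query string used in the Curl examples (hypothetical helper)."""
    params = {
        "function": "elementize",
        "apikey": api_key,
        "source": source,
        "connection": connection,
        "table": table,
        "column": column,
    }
    # safe=":/" leaves the connection URL unescaped, matching the Curl examples.
    return f"{BASE_URL}?{urlencode(params, safe=':/')}"


def run_elementize(api_key, connection, column, timeout=60):
    """Call the service and return the raw response body.

    Assumption: the body is UTF-8 text; its structure is not parsed here.
    """
    url = build_elementize_url(api_key, connection, column)
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling `run_elementize("use-your-own-api-key-here", "https://dl.interzoid.com/csv/companies.csv", 1)` with a valid API key in place of the placeholder sends the same request as the Curl commands. Keeping URL construction in its own function makes it easy to log or test the request without touching the network.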
For more information about parameters and expanded options for performing element analysis of a data column as part of any workflow, check out the Element Analysis Workflow Guide.
All content (c) 2018-2023 Interzoid Incorporated. Questions? Contact support@interzoid.com