Code Example: Matching Individual Names using Python
Matching individual person names reliably is hard. You have abbreviations, nicknames, reordered components, punctuation, transliteration issues, and inconsistent spacing — all of which quickly break simple string comparisons. The Interzoid Individual Name Matching API solves this by generating an AI-powered similarity key for each full name, so different variations that represent the same person map to the same key.
In this walkthrough, we’ll look at the Python examples in the Interzoid Platform GitHub repository:
github.com/interzoid/interzoid-platform / individual-name-matching / python-examples
We’ll see how to:
- Call the getfullnamematch API from Python
- Generate similarity keys for one or more names
- Use those keys to group and match names across datasets
- Understand why this AI-powered approach is superior to fuzzy matching libraries and Levenshtein-distance style algorithms
Prerequisites
- Python 3.x
- An Interzoid API key: www.interzoid.com/register-api-account
- Basic familiarity with HTTP/JSON in Python
How the Individual Name Matching API Works
The core API used by the Python examples is:
https://api.interzoid.com/getfullnamematch
For each full name you send, the API returns a SimKey:
{
    "SimKey": "N1Ai4RfV0SRJf2dJwDO0Cvzh4xCgQG",
    "Code": "Success",
    "Credits": "5828243"
}
Different text variations of the same individual name (for example "James Kelly", "Jim Kelley", "Mr. Jim H. Kellie") will produce the same SimKey, which is what makes matching and deduplication straightforward.
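As a minimal sketch of handling this response in Python, the snippet below parses the JSON body and returns the SimKey only when Code is "Success". The helper name extract_simkey is hypothetical, and treating any other Code value as a failure is an assumption; the article only shows the success case.

```python
import json


def extract_simkey(body: str) -> str:
    """Return the SimKey from a getfullnamematch JSON response body.

    Assumption: any Code other than "Success" is treated as a failure.
    """
    data = json.loads(body)
    if data.get("Code") != "Success":
        raise ValueError(f"API call did not succeed: {data.get('Code')!r}")
    return data["SimKey"]


# Sample response copied from the example response shown above.
sample = (
    '{"SimKey": "N1Ai4RfV0SRJf2dJwDO0Cvzh4xCgQG", '
    '"Code": "Success", "Credits": "5828243"}'
)
print(extract_simkey(sample))
```

Checking Code before reading SimKey keeps a failed call from silently producing an empty key that later looks like a match.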
Basic Python Example: Single Name → Similarity Key
The simplest example in the Python examples directory shows how to call the API once and print the similarity key:
import urllib.request
import json
import urllib.parse

API_KEY = 'YOUR-API-KEY-HERE'
fullname = 'James Johnston'

# Build the request URL, percent-encoding both query parameters.
url = (
    'https://api.interzoid.com/getfullnamematch'
    + '?license=' + urllib.parse.quote(API_KEY)
    + '&fullname=' + urllib.parse.quote(fullname)
)

# Call the API and decode the JSON response body.
with urllib.request.urlopen(url) as response:
    data = json.loads(response.read().decode('utf-8'))

simkey = data.get('SimKey')
code = data.get('Code')
credits = data.get('Credits')

print("Full name:", fullname)
print("Similarity key:", simkey)
print("Code:", code)
print("Remaining credits:", credits)
This pattern is deliberately dependency-free and uses only Python’s standard library (urllib and json), making it easy to run anywhere.
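If you expect to call the API from several places, one option is to wrap the same request in a small function. The sketch below is an illustration rather than code from the repository: build_url and get_simkey are hypothetical helper names, the 10-second timeout is an arbitrary choice, and raising on any Code other than "Success" is an assumption. The urlopen parameter is injectable so the function can be tested without network access.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://api.interzoid.com/getfullnamematch"


def build_url(api_key: str, fullname: str) -> str:
    """Build the request URL, percent-encoding both query parameters."""
    query = urllib.parse.urlencode(
        {"license": api_key, "fullname": fullname},
        quote_via=urllib.parse.quote,  # encode spaces as %20, like quote()
    )
    return f"{API_URL}?{query}"


def get_simkey(api_key: str, fullname: str, urlopen=urllib.request.urlopen) -> str:
    """Fetch the similarity key for one full name.

    Assumption: any Code other than "Success" is treated as an error.
    """
    with urlopen(build_url(api_key, fullname), timeout=10) as response:
        data = json.loads(response.read().decode("utf-8"))
    if data.get("Code") != "Success":
        raise RuntimeError(f"Lookup failed for {fullname!r}: {data.get('Code')!r}")
    return data["SimKey"]
```

With a valid key, a call such as get_simkey(API_KEY, "James Johnston") would return the SimKey string for that name.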
Extending to File-Based Matching
The Python examples in the repository are designed to be adapted into batch workflows. A common pattern is:
- Read a CSV or TSV file with a column containing full names
- Call getfullnamematch for each row
- Append the SimKey as a new column
- Use the similarity keys to group or cluster equivalent names
At a high level, a file-processing script will:
# Pseudocode for a file-based Python workflow
for row in csv_reader:
    fullname = row['full_name']
    simkey = call_getfullnamematch(API_KEY, fullname)
    row['simkey'] = simkey
    output_writer.writerow(row)
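A runnable version of that loop might look like the following sketch, which uses the standard csv module. The function name add_simkeys, the full_name column name, and the in-memory cache are assumptions for illustration. The lookup argument stands for whatever function you use to call getfullnamematch; the demonstration passes a stand-in that does not contact the API.

```python
import csv
import io


def add_simkeys(src, dst, lookup, name_column="full_name"):
    """Copy CSV rows from src to dst, appending a 'simkey' column.

    `lookup` is any callable that maps a full name to its similarity key.
    """
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=list(reader.fieldnames) + ["simkey"])
    writer.writeheader()
    cache = {}  # skip repeat lookups for identical name strings
    for row in reader:
        name = row[name_column]
        if name not in cache:
            cache[name] = lookup(name)
        row["simkey"] = cache[name]
        writer.writerow(row)


# Demonstration with in-memory files and a stand-in lookup (not the real API).
source = io.StringIO("id,full_name\n1,James Kelly\n2,Jim Kelley\n")
output = io.StringIO()
add_simkeys(source, output, lookup=lambda name: "KEY-" + name.upper())
print(output.getvalue())
```

Caching by exact name string is a small optimization that avoids spending API credits on rows that repeat the same text.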
Once you have a similarity key column, matching is just a grouping operation:
- Group by simkey to find clusters of equivalent names
- Identify potential duplicates where multiple records share the same key
- Drive downstream merge or review workflows off of those clusters
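The grouping step can be sketched with a dictionary keyed on the similarity key. The function name cluster_by_simkey and the record layout are assumptions, and the simkey values in the sample data are placeholders rather than real API output.

```python
from collections import defaultdict


def cluster_by_simkey(rows):
    """Group records by 'simkey' and keep only keys shared by 2+ records."""
    groups = defaultdict(list)
    for row in rows:
        groups[row["simkey"]].append(row)
    return {key: members for key, members in groups.items() if len(members) > 1}


# Illustrative records; the simkey values are placeholders, not real keys.
records = [
    {"id": 1, "full_name": "James Kelly", "simkey": "K1"},
    {"id": 2, "full_name": "Jim Kelley", "simkey": "K1"},
    {"id": 3, "full_name": "Maria Lopez", "simkey": "K2"},
]
duplicates = cluster_by_simkey(records)
for key, members in duplicates.items():
    print(key, [m["full_name"] for m in members])
```

Each returned cluster is a candidate duplicate set that a merge or review step can consume.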
Why AI-Powered Similarity Keys Beat Fuzzy Matching and Levenshtein
Generic string comparison approaches — such as Levenshtein distance or common fuzzy matching libraries — treat text as opaque strings. They measure character-level edits, but they do not understand:
- Nicknames vs. formal names (e.g., “Bob” vs. “Robert”)
- Title and honorific noise (“Mr.”, “Dr.”, “Ms.”, etc.)
- Transposed name components (“Smith, John” vs. “John Smith”)
- Cross-language or cultural variations
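To make the nickname problem concrete, the sketch below implements a plain Levenshtein distance and applies it to two pairs of names chosen for illustration. It is a generic textbook implementation, not part of the Interzoid examples.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


pairs = [
    ("Bob Smith", "Robert Smith"),  # nickname vs. formal name
    ("Jon Smith", "Jan Smith"),     # one letter apart, plausibly different people
]
for left, right in pairs:
    print(left, "|", right, "->", levenshtein(left, right))
```

The nickname pair comes out at a distance of 4 and the one-letter pair at 1, so a pure edit-distance threshold would rank the likely mismatch as the closer pair.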
Interzoid’s Individual Name Matching is explicitly AI-powered and built on specialized algorithms, knowledge bases, and ML-driven models tuned to the domain of personal names. Instead of computing a generic edit distance, it:
- Understands name structure and ordering
- Accounts for nicknames and common variants
- Normalizes punctuation, casing, and spacing
- Leverages an ever-growing knowledge base of real-world name variations
In practice, this makes it much more accurate and robust than raw fuzzy matching or Levenshtein-based techniques, especially on noisy, real-world datasets where nicknames, misspellings, and cultural variations are common.
Comparing Two Names Directly from Python
There are two common approaches for comparing two names with Python:
- Generate a similarity key for each name using the HTTP API and consider them a match if the keys are equal.
- Use the Python package for the Name Match Scoring API to directly obtain a 0–100 score indicating how likely the two names represent the same individual.
A simple pattern using similarity keys might look like this:
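# get_simkey is assumed to wrap the getfullnamematch HTTP call shown earlier
# and to return the SimKey string for a given name.
name_a = "James Kelly"
name_b = "Jim Kelley"

simkey_a = get_simkey(API_KEY, name_a)
simkey_b = get_simkey(API_KEY, name_b)

if simkey_a == simkey_b:
    print("Likely the same individual")
else:
    print("Different individuals")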
For scoring-based workflows (threshold logic, ranking candidates, etc.), you can use the separate Python package that calls the name match scoring API and returns a numeric score for a pair of names, then apply your own thresholds and business rules.
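The threshold step could be sketched as below. It assumes you already have a 0–100 score from the scoring API; the function name classify_match and the cutoff values 85 and 60 are placeholders for illustration, not recommendations from Interzoid.

```python
def classify_match(score: int, match_at: int = 85, review_at: int = 60) -> str:
    """Map a 0-100 name match score to an action.

    The cutoffs are placeholder values; tune them against your own data.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= match_at:
        return "match"
    if score >= review_at:
        return "review"
    return "no-match"


for score in (95, 70, 20):
    print(score, classify_match(score))
```

A middle "review" band is one common way to route borderline pairs to a person instead of forcing a yes-or-no decision.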
Running the Python Examples
- Clone the repository:
  git clone https://github.com/interzoid/interzoid-platform.git
  cd interzoid-platform/individual-name-matching/python-examples
- Open the Python example file(s) and replace YOUR-API-KEY-HERE with your actual API key.
- Run the script:
  python individual-name-matching-example.py
  (or whatever filename you choose when adapting the examples).
- Inspect the printed SimKey values, and adapt the example for file-based processing if you want to generate match reports or cluster records.
The Python examples in the Interzoid Platform repository provide a concise, practical starting point for integrating AI-powered individual name matching into your systems. By generating similarity keys or using name match scores, you get a far more accurate, context-aware signal than what is possible with generic fuzzy matching libraries or Levenshtein-distance alone.
Clone the repo, plug in your API key, and start using similarity keys to match, deduplicate, and cluster individual names in your data pipelines with significantly higher quality and less custom matching logic.