Generating a Company Name Match Report in Node.js
Duplicate and inconsistent company names cause significant downstream issues in CRM systems, sales operations, analytics pipelines, account hierarchies, identity graphs, and more. Slight variations like "IBM", "I.B.M.", "International Bus Machines", and "Intl. Business Machine Co" all refer to the same organization, yet traditional string comparison or fuzzy similarity algorithms often fail to recognize them as equivalent.
This blog entry demonstrates how a simple Node.js script can generate a high-quality match report by assigning a similarity key to each company name using:
- Interzoid’s AI- and ML-powered company matching algorithms
- Normalization and domain-specific knowledge bases
- Advanced semantic processing far beyond Levenshtein or fuzzy matching libraries
The result is a fast, accurate clustering of duplicate or equivalent company names—critical for deduplication, account unification, data cleansing, and boosting ROI across operational and analytical workflows.
/company-name-matching/node-examples/generate-match-report.js
Requirements:
- Node.js 14+
- An Interzoid API key (see https://www.interzoid.com/manage-api-account)
- A text file of company names (one per line)
How This Works: AI-Powered Similarity Keys
Each company name is sent to the getcompanymatchadvanced API.
Instead of returning a fuzzy score or an edit-distance calculation, the API returns a Similarity Key (SimKey)—a canonical representation of the normalized company name, generated using:
- AI/ML linguistic models
- Domain-trained normalization logic
- Extensive knowledge bases of company naming variations
Name variations that the API resolves to the same company share the same SimKey, so matching reduces to an exact comparison of keys. Fuzzy matching and Levenshtein distance, by contrast, operate on raw character sequences with no knowledge of semantics or organization naming conventions, which makes them less reliable for this task.
Variations correctly clustered include:
- GE, Gen Electric, General Electric Co.
- Bank of America, BOA, Bnk of America Corp.
- Google, Google LLC, Google Incorporated
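For reference, a single request to the endpoint described above can be sketched as follows. The endpoint, the parameter names (license, company, algorithm), and the SimKey response field are taken from the script in this post; buildMatchUrl and extractSimKey are hypothetical helper names used only for illustration, and no other response fields are assumed.

```javascript
// Sketch: build the request URL for one company name.
// buildMatchUrl is a hypothetical helper, not part of any Interzoid SDK.
const ENDPOINT = "https://api.interzoid.com/getcompanymatchadvanced";

function buildMatchUrl(apiKey, companyName, algorithm = "model-v4-wide") {
  // Encode every value so characters such as "&" or spaces cannot
  // break the query string
  const query = [
    `license=${encodeURIComponent(apiKey)}`,
    `company=${encodeURIComponent(companyName)}`,
    `algorithm=${encodeURIComponent(algorithm)}`,
  ].join("&");
  return `${ENDPOINT}?${query}`;
}

// Extract the similarity key from a response body. Returns "" when the
// body is not valid JSON or has no string SimKey field.
function extractSimKey(body) {
  try {
    const json = JSON.parse(body);
    return typeof json.SimKey === "string" ? json.SimKey : "";
  } catch {
    return "";
  }
}

console.log(buildMatchUrl("YOUR_API_KEY_HERE", "AT&T Inc."));
```

Encoding the company name matters for names such as "AT&T Inc.", where an unescaped "&" would otherwise be read as a parameter separator.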
The logic is identical to Interzoid’s other match-report examples for:
- Individual names (full-name matching)
- Street addresses (address normalization + matching)
All follow the same match-report pattern: generate similarity keys → sort → cluster → output.
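The cluster step of that pattern can be sketched on its own, without any API calls. This is a minimal illustration that groups records with a Map rather than the sort-and-walk approach used in the script; the function name clusterBySimKey and the keys in the sample data are made up for the example.

```javascript
// Sketch: group records by similarity key and keep groups of two or more.
// The records and keys below are invented for illustration only.
function clusterBySimKey(records) {
  const groups = new Map();
  for (const rec of records) {
    if (!groups.has(rec.simKey)) groups.set(rec.simKey, []);
    groups.get(rec.simKey).push(rec.input);
  }
  // Keep only keys shared by at least two inputs, ordered by key
  return [...groups.entries()]
    .filter(([, inputs]) => inputs.length >= 2)
    .sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
}

const sample = [
  { input: "Google LLC", simKey: "KEY-B" },
  { input: "GE", simKey: "KEY-A" },
  { input: "Google", simKey: "KEY-B" },
  { input: "Acme Widgets", simKey: "KEY-C" },
  { input: "General Electric Co.", simKey: "KEY-A" },
];

for (const [key, inputs] of clusterBySimKey(sample)) {
  console.log(key, inputs.join(" | "));
}
// KEY-A GE | General Electric Co.
// KEY-B Google LLC | Google
```

Both approaches yield the same clusters. Grouping with a Map avoids the sort when ordering does not matter, while sorting first keeps the report in a stable key order.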
The Node.js Script
Below is the full Node.js example script from the Interzoid Platform repository. It reads a file of company names, computes similarity keys, sorts them, and prints clusters of duplicates.
// generate-match-report.js
const fs = require("fs");
const https = require("https");
// Replace this with your API key from https://www.interzoid.com/manage-api-account
const API_KEY = "YOUR_API_KEY_HERE";
// Input file containing one company name per line
const INPUT_FILE_NAME = "sample-input-file.txt";
/**
 * Calls Interzoid's getcompanymatchadvanced API for a single company name
 * and returns a Promise resolving to the similarity key (SimKey) as a string.
 * Returns an empty string on error or if SimKey is missing.
 */
function callCompanyMatchAPI(companyName) {
  return new Promise((resolve) => {
    // URL-encode the company name to safely embed it in the query string
    const companyParam = encodeURIComponent(companyName);
    const apiURL =
      "https://api.interzoid.com/getcompanymatchadvanced" +
      `?license=${API_KEY}` +
      `&company=${companyParam}` +
      "&algorithm=model-v4-wide";
    https
      .get(apiURL, (res) => {
        let data = "";
        res.on("data", (chunk) => {
          data += chunk;
        });
        res.on("end", () => {
          try {
            const json = JSON.parse(data);
            const simKey = json.SimKey || "";
            resolve(simKey);
          } catch (err) {
            console.error(
              `Error parsing JSON for "${companyName}": ${err.message}`
            );
            console.error("Raw response:", data);
            resolve("");
          }
        });
      })
      .on("error", (err) => {
        console.error(`Error calling API for "${companyName}": ${err.message}`);
        resolve("");
      });
  });
}
async function main() {
  // Each record will hold the original input and its similarity key
  const records = [];
  // Read the input file contents
  let fileContents;
  try {
    fileContents = fs.readFileSync(INPUT_FILE_NAME, "utf8");
  } catch (err) {
    console.error("Error reading input file:", err.message);
    return;
  }
  // Split into lines and process each non-empty line
  const lines = fileContents.split(/\r?\n/);
  for (const line of lines) {
    const company = line.trim();
    // Skip blank lines
    if (!company) continue;
    const simKey = await callCompanyMatchAPI(company);
    // Skip if no SimKey returned
    if (!simKey) continue;
    records.push({ input: company, simKey });
  }
  if (records.length === 0) {
    console.log("No records with similarity keys found.");
    return;
  }
  //--------------------------------------------------------------------
  // Sort records strictly by simKey only so that all matching keys
  // are adjacent in the array. This makes it easy to find clusters.
  //--------------------------------------------------------------------
  records.sort((a, b) => a.simKey.localeCompare(b.simKey));
  //--------------------------------------------------------------------
  // Walk the sorted list and build clusters of records that share
  // the same simKey. Only print clusters of size >= 2.
  //--------------------------------------------------------------------
  let currentKey = null;
  let cluster = [];
  function printCluster(c) {
    if (c.length < 2) return; // Only print clusters with 2 or more
    for (const r of c) {
      console.log(`${r.input},${r.simKey}`);
    }
    console.log(); // blank line between clusters
  }
  for (const rec of records) {
    if (rec.simKey !== currentKey) {
      if (cluster.length > 0) printCluster(cluster);
      currentKey = rec.simKey;
      cluster = [rec];
    } else {
      cluster.push(rec);
    }
  }
  // Final cluster
  if (cluster.length > 0) printCluster(cluster);
}

main().catch((err) => {
  console.error("Unexpected error:", err);
});
Interpreting the Output
The output is a set of clusters:
GE,7p3fj92x8as2
Gen Electric,7p3fj92x8as2
General Electric Co,7p3fj92x8as2

IBM,08xs81snnq2l
International Bus Machines,08xs81snnq2l
Each group represents names that refer to the same organization. The script only outputs clusters with at least two entries, making it ideal for deduplication and review workflows.
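One practical caveat if the report is later loaded as CSV: the script writes each row as the name, a comma, and the key, so a company name that itself contains a comma (for example "Acme, Inc.") would split into extra columns. A minimal sketch of RFC 4180-style quoting is shown below; toCsvField and toCsvRow are hypothetical helpers, not part of the original script, and the key in the example is a placeholder.

```javascript
// Sketch: quote a CSV field when it contains a comma, double quote,
// or line break; embedded double quotes are doubled.
function toCsvField(value) {
  const text = String(value);
  if (!/[",\r\n]/.test(text)) return text;
  return `"${text.replace(/"/g, '""')}"`;
}

// Join already-escaped fields into one CSV row
function toCsvRow(fields) {
  return fields.map(toCsvField).join(",");
}

console.log(toCsvRow(["Acme, Inc.", "EXAMPLEKEY"]));
// prints: "Acme, Inc.",EXAMPLEKEY
```

In the script, this would amount to printing toCsvRow([r.input, r.simKey]) instead of the template string inside printCluster.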
Why This Beats Fuzzy Matching / Levenshtein Distance
Classic fuzzy matching approaches struggle with:
- Acronyms vs full names (GE vs General Electric)
- Semantic equivalence (“IBM” vs “International Business Machines”)
- Corporate suffix noise (Inc, LLC, Ltd., Corp.)
- Missing or extra tokens
- Cross-language variations
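The acronym case in the list above is easy to see with a small experiment. The sketch below is a standard Levenshtein implementation written for this illustration (it is not part of the Interzoid script or API). It shows that an acronym and its expansion are far apart in edit distance, while two different words that differ by one letter are very close.

```javascript
// Classic Levenshtein edit distance (insert, delete, substitute; cost 1 each),
// computed with a rolling row of the dynamic-programming table.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

console.log(levenshtein("GE", "General Electric")); // 14
console.log(levenshtein("Google", "Goggle")); // 1
```

A distance threshold loose enough to accept the first pair would also accept many unrelated names, and one tight enough to reject the second pair's neighbors would miss the acronym entirely.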
Interzoid’s matching engine is AI- and ML-powered, built on:
- Normalization corpuses
- Specialized lexical knowledge bases
- Semantic models trained on organizational naming behavior
- Advanced string feature extraction
It is designed to interpret what an organization name refers to, not just its surface-level characters, so names with little character overlap (such as an acronym and its expansion) can still be matched. Raw string distance cannot make that connection.
With a short Node.js script, you can generate match reports that identify duplicate or variant company names, a task that character-based fuzzy matching libraries often handle poorly as data volumes and naming variety grow. The same Interzoid-powered workflow is available for individual name matching and street address matching, giving you a unified way to clean and deduplicate all major entity types.
Try the example script, use the sample file in the GitHub repo, and integrate similarity-key-based clustering into your operational and analytics pipelines to dramatically improve data quality and ROI.