Private Equity
Salesforce
DataGroomr
Deduplication of 1.2M Accounts in Salesforce
Context
To build a global database of potential acquisition targets, the client, a private equity firm, developed a high-volume “account sourcing engine” in Salesforce. Sourcing and enrichment tools such as SourceScrub, Grata, 6sense, RocketReach, and Apollo were integrated to ensure no relevant company was missed.
This approach, however, led to data overload. As many as 20 duplicate records of the same company appeared, caused by slight differences in naming or domains and by missing account hierarchies. Subsidiaries were often logged as standalone parent accounts.
The impact was significant: conflicts between reps, double outreach, conflicting intel, and wasted resources. The duplicates also blocked opportunities to automate various business processes. As trust in the CRM eroded, the system was no longer reliable.

Impact
The full solution was implemented and deployed to production in under two months, with automated logic for both retroactive deduplication and proactive duplicate prevention.
The result was a clean, trusted CRM foundation that improved rep confidence, reduced redundant outreach, and restored the reliability of downstream reporting and automation.
1.2M account records scanned.
40,000 duplicate accounts identified and merged.
0 ownership conflicts across merged records.
The goal was to merge duplicate records without losing valuable or current account data, while also resolving ownership conflicts between duplicate accounts assigned to different Account Executives (AEs). The deduplication process followed four key steps:
1. Key Generation
A primary key (PK) was generated for each account from its normalized website URL, so that duplicate accounts sharing the same domain could be matched reliably.
2. Identification & Blocking
Proactive logic was added to flag duplicates during record creation (before save), while still allowing distinct subsidiaries and similar entities to be created separately.
3. Scoring
A custom scoring algorithm evaluated record quality by weighing field completeness, data freshness, and recent AE activity, determining a “winning” record in each cluster.
4. Merging
Winning records were enriched with the most accurate values from the group, preserving ownership and consolidating key account details.
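The case study does not publish its implementation. The following is only a minimal Python sketch of how the four steps could fit together, not the client's or DataGroomr's actual logic. Everything in it is an assumption: the `Account` fields, the function names (`match_key`, `should_block`, `quality_score`, `merge_cluster`, `dedupe`), the URL-normalization rules, the hypothetical `is_subsidiary` flag that lets subsidiaries through, and the score weights. Inside Salesforce, the before-save check would typically live in a before-insert trigger or a duplicate rule. Python is used here only for readability.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from datetime import date
from typing import Optional
from urllib.parse import urlsplit


@dataclass
class Account:
    """Hypothetical, simplified account record (real Salesforce fields differ)."""
    id: str
    name: str
    website: str
    owner: str  # assigned Account Executive
    last_modified: date
    last_activity: Optional[date] = None  # most recent AE activity, if any
    fields: dict[str, Optional[str]] = field(default_factory=dict)


def match_key(website: str) -> str:
    """Step 1: derive a matching key from a normalized website URL.
    Assumed rules: case-insensitive; ignore scheme, port, path and a leading 'www.'."""
    url = website.strip().lower()
    if "//" not in url:
        url = "//" + url  # so urlsplit parses a bare domain as the host
    host = (urlsplit(url).hostname or "").rstrip(".")
    return host[4:] if host.startswith("www.") else host


def should_block(new: Account, existing_keys: set[str], is_subsidiary: bool = False) -> bool:
    """Step 2: before-save check. Flags a new record whose key already exists,
    unless it is explicitly marked as a subsidiary (hypothetical rule)."""
    key = match_key(new.website)
    return bool(key) and key in existing_keys and not is_subsidiary


def quality_score(acct: Account, today: date) -> float:
    """Step 3: weighted score of completeness, freshness and recent AE activity.
    The weights and time windows are illustrative assumptions."""
    filled = sum(1 for v in acct.fields.values() if v)
    completeness = filled / len(acct.fields) if acct.fields else 0.0
    freshness = max(0.0, 1 - (today - acct.last_modified).days / 365)
    activity = 0.0
    if acct.last_activity is not None:
        activity = max(0.0, 1 - (today - acct.last_activity).days / 90)
    return 0.5 * completeness + 0.2 * freshness + 0.3 * activity


def merge_cluster(cluster: list[Account], today: date) -> Account:
    """Step 4: keep the highest-scoring record and fill its empty fields from the
    other duplicates, taking values from the most recently modified record first."""
    winner = max(cluster, key=lambda acct: quality_score(acct, today))
    others = sorted((acct for acct in cluster if acct is not winner),
                    key=lambda acct: acct.last_modified, reverse=True)
    merged = dict(winner.fields)
    for acct in others:
        for name, value in acct.fields.items():
            if value and not merged.get(name):
                merged[name] = value
    # The winner's owner is kept, so the merged account has a single AE.
    return Account(winner.id, winner.name, winner.website, winner.owner,
                   winner.last_modified, winner.last_activity, merged)


def dedupe(accounts: list[Account], today: date) -> list[Account]:
    """Group records by matching key and merge every cluster that has duplicates."""
    clusters: dict[str, list[Account]] = defaultdict(list)
    for acct in accounts:
        # Records without a usable website are never clustered together.
        clusters[match_key(acct.website) or f"id:{acct.id}"].append(acct)
    return [merge_cluster(c, today) if len(c) > 1 else c[0] for c in clusters.values()]


# Usage: two spellings of the same company plus one unrelated account.
today = date(2024, 6, 1)
a = Account("001A", "Acme Inc", "https://www.acme.com/about", "ae_1",
            date(2024, 5, 1), date(2024, 5, 20), {"industry": "Software", "hq": None})
b = Account("001B", "ACME", "acme.com", "ae_2",
            date(2023, 1, 1), None, {"industry": None, "hq": "Boston"})
c = Account("001C", "Beta LLC", "http://beta.io", "ae_3", date(2024, 1, 1))
result = dedupe([a, b, c], today)
print([(r.id, r.owner, r.fields) for r in result])
```

Keying on the domain rather than the company name sidesteps naming variants such as "Acme Inc" vs. "ACME". In the example, 001A wins its cluster because it was modified recently and has recent AE activity. It keeps its owner and inherits the missing `hq` value from 001B.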