A history of data de-duplication
 
 


by johnt 9. August 2011 11:08

In the 1970s Littlewoods wanted to ensure they delivered only one catalogue to each of their million-plus customer addresses. This was done by printing out all the names and addresses of their shoppers on continuous stationery, then physically going through the list to look for duplicates. Some addresses had as many as six catalogues.

This took months to complete (I know, I was one of the checkers), but it saved thousands of pounds. Back in those days you only had two campaigns a year, one in spring and another in the autumn, so there was time to do this sort of thing.

Today, many commercial CRM and proprietary customer database systems include de-duplication tools, but most of these are very limited and will only detect exact duplicates. Data8’s system employs a number of powerful fuzzy matching techniques to quickly identify duplicates where the name or address is misspelt. Back in the 1970s, the checkers of the data were the fuzzy logic, looking for similar names and addresses.
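
To give a feel for what fuzzy matching means in practice, here is a minimal sketch in Python. It is not Data8's method, which this post does not describe; it simply uses the standard library's difflib to score how alike two name-and-address strings are. The function names, the sample customers and the 0.85 threshold are all assumptions made up for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalise(record: str) -> str:
    """Lower-case the record, turn punctuation into spaces, collapse whitespace."""
    kept = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in record.lower())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Return a score between 0.0 (nothing in common) and 1.0 (identical)."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def find_duplicates(records, threshold=0.85):
    """Return index pairs of records that look alike.

    The threshold is an illustrative guess, not a recommended setting.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(records), 2):
        if similarity(a, b) >= threshold:
            pairs.append((i, j))
    return pairs


# Hypothetical sample data, invented for this example.
customers = [
    "Mr John Smith, 12 High Street, Liverpool",
    "Mr Jon Smith, 12 High St, Liverpool",
    "Mrs Ann Jones, 4 Mill Lane, Chester",
]

print(find_duplicates(customers))  # the two Smith records are paired: [(0, 1)]
```

An exact-match check would treat the two Smith records as different customers, whereas the similarity score pairs them despite the misspelt name and the abbreviated street. Comparing every record against every other one, as this sketch does, would be far too slow for a million records, so a real system would presumably narrow the candidates first, for example by only comparing records that share a postcode.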

Today it would take a few hours to de-duplicate a million records. Back then you could only do the job for a couple of hours before your eyes started going fuzzy.

Tags: General
