How to Achieve a Single Customer View
Clients want to obtain a Single Customer View of their contact database/CRM system to let them understand the types of individuals/businesses that they are dealing with. Contact data is held on a variety of databases, the quality of the data entered will vary and as we have seen from experience, customers may appear in more than one database leading to multiple communications and wasted money; or alternatively they may have moved address, again wasting communications.
Not only is there a cost for holding all this incorrect data it also leads to a distorted picture of the members on the database e.g. if demographic profiling is used the results will show too many customers and possibly skew the profiles.
2.0 The Business Case Companies need to know how many customers/supporters/members they have, they want to cleanse the data to save money on any marketing campaigns. Importantly they also need to be assured that they are not in breach of the Data Protection Act for, ‘Personal data shall be accurate and, where necessary, kept up to date’. The cost of handling returns and customer complaints can also be reduced.
3.0 The Problem Statement
Companies have many databases containing details of customers/supporters/members. The chances are that people could appear in more than one database and in some cases all of them. The task is to merge the databases to present a unified view. Once the initial merge is completed then it is important to implement a process to maintain the database by regularly cleansing the data.
Step 1 Agreeing Rules
A set of rules and processes need to be agreed on how to consolidate certain records and identify others for further examination. This is to ensure that important records are kept and only to archive duplicate or inaccurate data.
Once the specification are agreed and signed off, the rules and techniques detailed below can be applied to create a programme to speed up the identification and merging of duplicate records.
Step2 Data cleansing and validation rules.
It is only possible to move onto de-duplication once the data cleansing is completed, prior to this the addresses would be in a non standard format and it would be difficult to find any matches at all. Standardising the data maximises the chances of finding a match.
How to Obtain a Single Customer View Page 2
The cleansing procedures:
2.1 PAF Cleansing
This matches addresses on the databases against the Postcode Address File (PAF) from the Royal Mail, automatically correcting errors where possible.
2.2 Telephone Validation
This initial validation checks if a number is callable based on factors such as, number of digits and dialing codes etc. If required there are more detailed validation methods to confirm that there is a live line.
2.3 Email Validation
Initially this will be validated at server level. This means that data8 checked that the mail server for the domain is alive. Again there are more detailed checks that can be applied.
The levels of validation are chosen dependent on what the client wishes to achieve and the rules agreed.
De-duplication Once the data is cleansed it is possible to look for duplicates, in order to identify duplicates certain rules and assumptions needed to be made, based on this, bespoke algorithms are designed to produce the required results.
Candidate Retrieval This algorithm works by considering each record in the database and finding any suitable candidates to be considered as duplicates with the currently considered record. The candidate retrieval process can be:
a) Records with the same Name and Address (where populated)
b) Records with the same Name and Date of Birth (where D of B populated)
c) Records with same Name and Email (where email populated)
The process then becomes recursive and each new candidate gets considered to build a full possible chain of records that are similar and may be considered as duplicates.
How to Obtain a Single Customer View Page 3
Fuzzy Matching & Formatting The quality of data capture across the databases will vary, therefore it is necessary to develop a custom set of fuzzy logic matching. See table below.
Just using exact matching would allow duplicates on the database, small differences in the matching routine can be allowed. For example is John Smith the same as J Smith, if they are, the records should be merged. Exact matching would identify them as two different records, doubling the marketing communications that John Smith would receive.
Identification of Duplicates data8 then create a candidate list of potentially similar records, and develops a set of rules which are used to break the list into three categories
SAME - Records identified as the same and therefore merged MANUAL - Records identified as possibly the same but a significant difference indicates a manual check must be performed. UNCHANGED - The remainder rejected from the duplication process and returned unchanged.
The rules with which data8 allocated candidates vary here is an example of a recent de-duplication exercise:
The following details must match all records in this chain:
Name must be the same (i.e. the name has to be populated and match at initial level) Non conflicting Address (Allow blanks) Non conflicting Date of Birth (Allow blanks) Over18 flag must be the same (Allow blanks)
In the following situations, all records that meet the below criteria were put into the manual investigation pot:
Name must be the same Addresses will conflict (Not blank)
Field Allowable Difference
DOB: Differs by one corresponding digit
1st of the month date to date with same month and year
US date format to UK. i.e. MM/DD/YYYY to DD/MM/YYYY
Address: Data8 PAF matching
Email: Differs by one character
Telephone: Data8 Telephone Formatting
How to Obtain a Single Customer View Page 4
Non conflicting Date of Birth (Allow blanks)
Name must be the same
Non conflicting address (Allow blanks) Conflicting Date of Birth (Not blank)
Any records that are not matched in any of the above criteria were put into the ‘Unchanged’ pot and returned without merging.
Once candidates are identified as members of the ‘SAME’ duplicate set, then rules to merge fields have to be defined. What is illustrated is part of the merge priority of the fields.
Taking the merging of the ‘Firstname’ field as an example.
This example assumes that the surname and address appears seven times in the databases, however the first name associated with surname and address varies and in some cases it is blank. The choice therefore is to select from John, Jon or Jonathon, blank values are ignored. The value to take would be Jon using the ‘Winner Takes All’ rule, it appears twice while John and Jonathon appear only once.
Output Files Two output files are generated to show merged record details and a further file where there were possible duplicates requiring manual investigation. After a manual intervention any duplicates are removed and the files merged.
1. The first output file contained the resultant merged and unchanged records based on the previous rules. A column will also show any related records which contributed to the merge process.
2. Another output file is created which shows all records merged and any associated records.
‘Firstname’ John BLANK Jon BLANK Jonathon Jon BLANK
How to Obtain a Single Customer View Page 5
3. A third output file shows records which were identified for manual investigation. This includes a field showing all related records .
After checking the files requiring a manual check the data can be then all brought together as one database.
Goneaways, Movers and Deceased
Once all the data is one database it is then time to identify those people who have gone away and where possible find an alternative address i.e. movers. Also it is important to identify those who have died to avoid the embarrassment of sending out a marketing communication.
Maintaining the data
In the longer term the data should be cleansed very couple of months, the costs will be minimal as most of it is already correct. Alternatively by using web api it is possible to validate the contact data, in real time, as it flows into the company, confirming address, telephone number and email as it is input on a website or any business application including POS.