Deduplication

Identify duplicate records within the data.

Description

This service checks the data you provide for duplicates, based on a matching name and address. When processing residential data, you can set how closely the names must match.

This service is charged on a per-record basis.

This service requires the following data to be available before it can be used:

Address

This service can also make use of the following optional data to improve its effectiveness:

Person Name
Company Name

Upload Options

There are no options that can be configured for this service at the upload stage.

Download Options

The following options can be configured for this service at the download stage.

Suppression Type

Set this to configure what should happen when a match is made by this service:

  • Append Flags adds an extra column to your output file which is populated when a match is made and blank otherwise.
  • Suppress Records removes any matching records from your output file.
Match Level

Set this to indicate how closely names need to match for a record to be considered a match by this service.

  • Surname allows two completely different forenames to match, as long as the surnames have a fuzzy match.
  • Initial requires a fuzzy match on both the surname and forename, allowing single initials to match. For example, "R Smith" would be allowed to match "Robert Smith", but "Richard Smith" and "Robert Smith" would not be considered a match.
  • Forename requires a fuzzy match on both the surname and forename, and single initials are not allowed to match.
  • Personal intelligently picks the highest match level available based on your data. If a record does not contain any forename information then a Surname match will be allowed, but if forename data is present then at least an Initial level match will be required.
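
The four levels can be read as rules for how forenames are compared once the surnames agree. The sketch below is an illustration only, not Data8's implementation. The function names and the `surname`/`forename` keys are hypothetical, the service's fuzzy matching is replaced by a plain case-insensitive equality check, and the Personal level is assumed to fall back to a Surname match when either record lacks a forename.

```python
def same(a: str, b: str) -> bool:
    """Stand-in for the service's fuzzy match: case-insensitive equality (assumption)."""
    return a.strip().lower() == b.strip().lower()


def is_initial(name: str) -> bool:
    """True when the forename is a single letter, e.g. "R" or "R."."""
    return len(name.strip().rstrip(".")) == 1


def forenames_match(a: str, b: str, allow_initials: bool) -> bool:
    """Compare two forenames, optionally letting a single initial match a full name."""
    a, b = a.strip(), b.strip()
    # Two identical values compare equal here, including two identical initials;
    # the documentation does not say how the service treats that case.
    if same(a, b):
        return True
    if allow_initials and (is_initial(a) or is_initial(b)):
        return a[:1].lower() == b[:1].lower()
    return False


def names_match(rec_a: dict, rec_b: dict, level: str) -> bool:
    """Apply one of the four match levels to two records (hypothetical helper)."""
    if not same(rec_a["surname"], rec_b["surname"]):
        return False
    fa, fb = rec_a.get("forename", ""), rec_b.get("forename", "")
    if level == "Surname":
        return True
    if level == "Initial":
        return forenames_match(fa, fb, allow_initials=True)
    if level == "Forename":
        return forenames_match(fa, fb, allow_initials=False)
    if level == "Personal":
        # Assumption: no forename on either side means a Surname match is enough;
        # otherwise an Initial-level match is required.
        if not fa.strip() or not fb.strip():
            return True
        return forenames_match(fa, fb, allow_initials=True)
    raise ValueError(f"unknown match level: {level}")
```

With these rules, `names_match({"surname": "Smith", "forename": "R"}, {"surname": "Smith", "forename": "Robert"}, "Initial")` returns `True`, while the same pair compared at the `"Forename"` level returns `False`.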

Output Columns

This service adds the following columns to your output data.

Unique ID

The unique ID that Data8 assigns to this record for use in deduplication.

Deduplicate Flag

Contains "Dup" if the record is a duplicate at the selected match level, and is blank if the record is unique.

Duplicate IDs

A semicolon-delimited list of the Unique IDs that this record matches at the selected match level.
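
Because this column packs several IDs into one cell, it usually needs splitting before use. A minimal sketch, assuming the IDs are handled as strings and that stray spaces or a trailing semicolon may appear (neither is confirmed by the documentation); `parse_duplicate_ids` is a hypothetical helper name.

```python
def parse_duplicate_ids(value: str) -> list[str]:
    """Split a semicolon-delimited Duplicate IDs cell into a list of ID strings."""
    # Empty fragments (blank cell, trailing semicolon) are dropped.
    return [part.strip() for part in value.split(";") if part.strip()]
```

A blank cell, as expected for a unique record, yields an empty list.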

Keep Flag

A binary flag indicating whether this record should be kept in the file or removed as a superfluous duplicate at the selected match level: "1" marks a record to keep, and "0" marks a record to discard.
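
If the output was downloaded with flags appended, the Keep Flag can be used to drop the superfluous duplicates afterwards. The sketch below assumes a CSV file whose headers are exactly the column names listed above; the sample rows are made up for illustration, and `keep_only` is a hypothetical helper.

```python
import csv
import io


def keep_only(rows):
    """Yield only the rows whose Keep Flag is "1"."""
    for row in rows:
        if row["Keep Flag"].strip() == "1":
            yield row


# In-memory sample standing in for a downloaded file (made-up data):
# records 1 and 2 duplicate each other, record 3 is unique.
sample = io.StringIO(
    "Unique ID,Deduplicate Flag,Duplicate IDs,Keep Flag\n"
    "1,Dup,2,1\n"
    "2,Dup,1,0\n"
    "3,,,1\n"
)

kept = list(keep_only(csv.DictReader(sample)))
print([row["Unique ID"] for row in kept])  # ['1', '3']
```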