Cleansing your data can be a time-consuming process. Extracting your data from a database to send to
a bureau, either by post or through a website, reviewing a data quality audit, purchasing any required
data and re-importing it into your database all take valuable time.
With Data8's unique Integr8 Batch solution, you can fully automate the entire process. A simple bespoke
application running on your system can submit your data to us for cleansing automatically, retrieve a
data quality audit, and make some automatic decisions as to what cleansed data should be purchased. The
cleansed data is then downloaded and re-imported into your database, all without any human interaction.
By using this unique system, you can save valuable time and money and avoid the possibility
of human error in your data cleansing processes. As it can be fully integrated into your systems, you
can ensure data is cleansed automatically on a regular basis, keeping your data right up to date.
The Batch Data Cleansing service from Data8 is accessed through SOAP web services,
making it simple to use from any major programming language.
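As an illustration of what calling a SOAP web service involves, the following minimal sketch builds a SOAP 1.1 request envelope using only the Python standard library. The service namespace, the operation name and the parameter names used here are hypothetical placeholders, not taken from the Data8 documentation; the real values come from the WSDL of the service being called. In practice a SOAP client library would normally generate this envelope for you.

```python
import xml.etree.ElementTree as ET

# Standard SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"
# Hypothetical placeholder: take the real namespace from the service's WSDL.
SERVICE_NS = "urn:example:integr8-batch"


def build_soap_envelope(operation, params):
    """Serialise a SOAP 1.1 request envelope for a single operation call.

    `operation` and the keys of `params` are assumed names for illustration.
    """
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    # Each parameter becomes a child element of the operation element.
    for name, value in params.items():
        child = ET.SubElement(call, f"{{{SERVICE_NS}}}{name}")
        child.text = value
    return ET.tostring(envelope, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    # Hypothetical operation and parameter names.
    request = build_soap_envelope("GetJobStatus", {"jobName": "nightly-cleanse"})
    print(request.decode("utf-8"))
```

The resulting bytes would be sent as the body of an HTTP POST to the service endpoint, with the content type and SOAPAction header that the WSDL specifies.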
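The service is split into parts which can be combined as needed depending on the requirements of
the calling application. The main parts of the service are:
- FileFormatStore, which stores descriptions of the format of the data you upload.
- WorkflowStore, which stores descriptions of the data cleansing tasks to perform on your data.
- UploadManager, which receives your uploaded data and assigns it a unique identifier.
- JobManager, which submits uploaded data for processing and reports on the progress of each job.
- ResultsManager, which provides statistics, quotes, cleansed data and column descriptions for a job.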
Typical Usage
The typical method of using the Integr8 Batch services is as follows:
- Define a file format using the FileFormatStore service. This describes the format of the data you will be uploading.
The file format only needs to be set up once and can be reused for many jobs.
- Define a workflow using the WorkflowStore service. This describes the data cleansing tasks that will be performed
on your data. The workflow only needs to be set up once and can be reused for many jobs.
- Upload your data using the UploadManager service. Your uploaded data is assigned a unique identifier which your
application must retain in order to submit the data for processing.
- Submit your data for processing using the JobManager service, supplying the name of the file format and workflow
you configured earlier and the unique identifier assigned to your uploaded data. Each job must be given a unique
name by which you can refer to it later.
- Wait for your job to finish processing by polling the JobManager service intermittently.
- Retrieve the statistics describing the quality of your data from the ResultsManager service. These statistics can
be presented to the user or can be used to make an automated decision on whether any cleansed data needs to be
purchased.
- Retrieve a quote for all the available data cleansing services from the ResultsManager service. Services can be
removed from the quote if they are not needed, or options can be changed to affect how the output from each service
is produced. After making any changes to the quote, an updated quote must be retrieved from the ResultsManager
service for any changes to be reflected in the price.
- Request the data included in your quote from the ResultsManager service, supplying the name of your original job and
your quote to identify the data to purchase. A purchase order number generated by your system must also be included.
- Wait for the cleansed data to become available by polling the ResultsManager service intermittently.
- Download the cleansed data from the ResultsManager service and save it to your system. The data will be provided in
the same format as your original data, but may have some additional columns added to the end of each row.
- Retrieve a description of the data found in each column from the ResultsManager service.
- Import the data from your locally saved cleansed file back into your database, using the column descriptions to
identify which columns data should be taken from during the import.
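The per-job part of the steps above (upload, submit, poll, review, purchase, poll, download) can be sketched as follows. This is a minimal outline under stated assumptions, not the Integr8 Batch API: the `client` object and every method called on it (`upload`, `submit_job`, `job_finished`, `get_statistics`, `get_quote`, `request_data`, `data_ready`, `download`, `get_column_descriptions`) are hypothetical wrappers that your application would implement around the real SOAP operations. The file format and workflow are assumed to have been set up already, and the polling interval and timeout are arbitrary example values.

```python
import time


def poll_until(check, interval_s=30.0, timeout_s=3600.0, sleep=time.sleep):
    """Call check() repeatedly until it returns True, or raise on timeout."""
    waited = 0.0
    while not check():
        if waited >= timeout_s:
            raise TimeoutError("gave up waiting for the service")
        sleep(interval_s)
        waited += interval_s


def run_batch_job(client, data, job_name, file_format, workflow,
                  po_number, should_purchase, sleep=time.sleep):
    """Run one cleansing job end to end.

    `client` is a hypothetical wrapper around the SOAP services.
    `should_purchase(stats, quote)` is your own decision rule.
    Returns (cleansed_data, column_descriptions), or None if nothing was bought.
    """
    # Upload the data and keep the identifier assigned to it.
    upload_id = client.upload(data)

    # Submit the job under a unique name, then wait for it to finish.
    client.submit_job(job_name, file_format, workflow, upload_id)
    poll_until(lambda: client.job_finished(job_name), sleep=sleep)

    # Review the data quality statistics and the quote.
    stats = client.get_statistics(job_name)
    quote = client.get_quote(job_name)
    if not should_purchase(stats, quote):
        return None

    # Request the quoted data, quoting a purchase order number.
    client.request_data(job_name, quote, po_number)
    poll_until(lambda: client.data_ready(job_name), sleep=sleep)

    # Download the cleansed data and the description of each column.
    cleansed = client.download(job_name)
    columns = client.get_column_descriptions(job_name)
    return cleansed, columns
```

Passing the decision rule in as `should_purchase` keeps the purchasing policy separate from the mechanics of calling the services, and the injectable `sleep` makes the polling loop easy to exercise without real delays.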
To purchase the cleansed results of any Integr8 Batch data cleansing job, your application must
supply a purchase order number. At the point of purchase, your account must have a sufficient credit
balance remaining. By supplying a purchase order number, your application enters into a binding contract
for the quoted services on your behalf.
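Because a purchase is binding, an unattended application may want a guard in front of it. The sketch below shows one possible approach: a purchase order number derived from the job name and date, and an approval rule that allows a purchase only when the quoted price is covered by the credit balance and falls within a spending limit set by you. The function names, the purchase order number format and the idea of a spending limit are assumptions made for illustration; how your application learns the quoted price and its credit balance is not covered here.

```python
from datetime import date
from decimal import Decimal


def make_po_number(job_name, on, prefix="PO"):
    """Build a purchase order number traceable to one job and one day.

    The format is an arbitrary example; use whatever your own system issues.
    """
    return f"{prefix}-{on:%Y%m%d}-{job_name}"


def approve_purchase(quote_price, credit_balance, spend_limit):
    """Allow an unattended purchase only if it is affordable and within policy."""
    return quote_price <= credit_balance and quote_price <= spend_limit


if __name__ == "__main__":
    po = make_po_number("nightly-cleanse", date.today())
    ok = approve_purchase(Decimal("42.50"), Decimal("100.00"), Decimal("50.00"))
    print(po, "approved" if ok else "needs manual review")
```

Using `Decimal` rather than floating point avoids rounding surprises when comparing monetary amounts.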