© 2009 KBACE Technologies, Inc.
De-duplicate Dirty Data Now and Forever Using Oracle Data Quality Management
Rita Beck, Senior Principal Consultant
March 13th, 2009
Agenda
•Data Quality Management (DQM) Basics
•DQM Tools
•Smart Search
•Batch Duplicate Identification
•Conclusion
Data Quality Management (DQM) Basics
Why Use Data Quality Management?
•Inconsistent Information
•Inaccurate Financial Reporting
•Customer Dissatisfaction
•Inefficient Sales and Marketing
Inaccurate Financial Reporting
•Scenario 1
•Average Sales Volume = $100,000
Company abc: $100,000
ABC Company: $100,000
Company Abc: $100,000
Corp. ABC: $100,000
Company abC: $100,000
Comp. aBc: $100,000
Inaccurate Financial Reporting
•Scenario 2
•Average Sales Volume = $600,000
Company ABC: $600,000
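Worked out, assuming the six records in Scenario 1 all belong to the same customer ABC and that the report averages sales volume per customer record:

```latex
\begin{aligned}
\text{Scenario 1: } \bar{v} &= \frac{6 \times \$100{,}000}{6 \text{ records}} = \$100{,}000\\
\text{Scenario 2: } \bar{v} &= \frac{\$600{,}000}{1 \text{ record}} = \$600{,}000
\end{aligned}
```

The total is $600,000 in both scenarios, but the duplicates make this customer look six times smaller than it really is.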
Duplicate Customers
Company a.b.c.
ABC Company
Company Abc
Corp. ABC
Company abC
Comp. aBc
What is Data Quality Management?
•Prevents Future Duplicates from Entering the System
  • Manually or via Import
•Identifies Existing Duplicates
How Does DQM Work?
• Transforms and Standardizes TCA Registry data
• Copies standardized data into separate staged schema tables
• Performs user-defined searches
  • Within the TCA Registry
  • Between the TCA Registry and other sets of data
• Determines potential duplicate records
DQM Tools
Data Quality Management Tools
•Word Replacements
•Entities
•Attributes
•Transformations
•Match Rules
Word Replacements
•Provide Standardization
Robbie → Robert
Bob → Robert
Rob → Robert
Bobby → Robert
Roberto → Robert
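A minimal sketch of how a word replacement list standardizes values before matching. It assumes a simple in-memory dictionary; the names `NICKNAMES` and `replace_words` are hypothetical and this is not how DQM stores or applies its lists internally.

```python
# Hypothetical word replacement list: each variant maps to its standard form.
NICKNAMES = {
    "ROBBIE": "ROBERT",
    "BOB": "ROBERT",
    "ROB": "ROBERT",
    "BOBBY": "ROBERT",
    "ROBERTO": "ROBERT",
}


def replace_words(value: str, replacements: dict[str, str]) -> str:
    """Upper-case the value, then swap each whitespace-separated word for its standard form."""
    words = value.upper().split()
    # Words with no entry in the list pass through unchanged.
    return " ".join(replacements.get(word, word) for word in words)


for name in ["Robbie", "Bob", "Rob", "Bobby", "Roberto", "Robert"]:
    print(replace_words(name, NICKNAMES))  # prints ROBERT six times
```

Upper-casing before the lookup keeps the list case-insensitive, so one entry covers "Bob", "BOB" and "bob".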
Word Replacements
Entities
•Party
•Address
•Contact
•Contact Point
Attributes
•Derived from columns within the TCA Registry tables
•Attributes make up an Entity
•Used for matching purposes between an Input Record and the TCA Registry data
Attributes
Transformations
Example (each transformation applied to the previous result), input: D’ Angello
• Capitalize all letters: D’ ANGELLO
• Remove non-alphanumeric characters: D ANGELLO
• Reduce all instances of white space to a single white space: D ANGELLO
• Remove double letters: D ANGELO
• Remove vowels except initial vowels: D ANGL
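The chain above can be sketched as a pipeline of small string functions. This is a minimal illustration under stated assumptions: "remove non-alphanumeric characters" deletes the character outright, "initial vowels" means the first letter of each word, and only ASCII letters survive. The function names are hypothetical, not DQM's built-in transformation functions.

```python
import re

VOWELS = "AEIOU"


def capitalize(s: str) -> str:
    return s.upper()


def remove_non_alphanumeric(s: str) -> str:
    # Delete anything that is not an ASCII letter, digit or whitespace (e.g. the ’ in D’ Angello).
    return re.sub(r"[^0-9A-Za-z\s]", "", s)


def collapse_whitespace(s: str) -> str:
    # Reduce every run of white space to a single space and trim the ends.
    return re.sub(r"\s+", " ", s).strip()


def remove_double_letters(s: str) -> str:
    # Collapse runs of the same letter: ANGELLO -> ANGELO.
    return re.sub(r"([A-Z])\1+", r"\1", s)


def remove_non_initial_vowels(s: str) -> str:
    # Per word: keep the first character, drop vowels from the rest.
    return " ".join(
        w[:1] + "".join(c for c in w[1:] if c not in VOWELS) for w in s.split(" ")
    )


PIPELINE = [
    capitalize,
    remove_non_alphanumeric,
    collapse_whitespace,
    remove_double_letters,
    remove_non_initial_vowels,
]


def transform(value: str) -> list[str]:
    """Apply each transformation to the previous result and record every intermediate value."""
    steps = []
    for fn in PIPELINE:
        value = fn(value)
        steps.append(value)
    return steps


for step in transform("D’ Angello"):
    print(step)  # D’ ANGELLO, D ANGELLO, D ANGELLO, D ANGELO, D ANGL
```

Because later steps discard more detail, the final value matches more loosely than the first: "D’ Angello" and "D'Angelo" would both reduce to keys that differ only in the space.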
Transformations
Match Rule Purposes
• Search
  • Used for Search User Interfaces
• Expanded Duplicate Identification
  • Used for Identifying and Preventing Duplicates
• Bulk Duplicate Identification
  • Used for Identifying Duplicates
Score-Based Matching
•Acquisition
  • Provides an initial set of potential matches
•Scoring
  • Assigns scores to further filter matches
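A simplified way to write down the two phases. The notation is illustrative, not taken from DQM: q is the input record, R the registry records, kappa an acquisition key, A the match rule attributes, t_a and w_a the transformation and score attached to attribute a, and theta the match threshold.

```latex
\begin{aligned}
C(q) &= \{\, r \in R : \kappa(q) = \kappa(r) \,\} && \text{(acquisition)}\\
S(q, r) &= \sum_{a \in A} w_a \, \mathbf{1}\!\left[ t_a(q) = t_a(r) \right], \quad r \in C(q) && \text{(scoring)}\\
\text{match}(q, r) &\iff S(q, r) \ge \theta
\end{aligned}
```

Acquisition keeps the expensive scoring step small: only the candidates in C(q) are scored, not every record in the registry.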
Example Match Rule
Smart Search
What is Smart Search?
•Used to identify records within the TCA tables that are potential duplicates of user-entered data
•Match Rule Purpose = Search
Smart Search Process
Step 1
TCA Registry → DQM Staging Program (Transformations Applied) → TCA Staged Schema
Smart Search Process
Step 2
User Input → Transformations Applied (Smart Search Match Rule) → Standardized User Input
Smart Search Process
Step 3
Standardized User Input compared against TCA Staged Schema
→ Acquisition and Scoring (Smart Search Match Rule: Match Criteria and Thresholds Applied)
→ Duplicates Between User Input and TCA Registry
Smart Search Example: Existing TCA Records
• TCA Registry Record #1
  KBACE
  6 Trafalgar Square
  Nashua, NH 03063
• TCA Registry Record #2
  KBACE Technologies, Incorporated
  Six Trafalgar Sq.
  Neshua, NH 03063
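The three Smart Search steps, applied to the two registry records above, can be sketched as follows. Everything beyond the record text is an assumption for illustration: the record IDs, the word replacements (an empty replacement drops the word as noise), the attribute scores 30/30/20/20, the threshold of 60, the first-word acquisition key and exact-equality scoring. None of this is DQM's shipped configuration.

```python
import re

# Hypothetical word replacements; an empty string drops the word as noise.
WORD_REPLACEMENTS = {"SIX": "6", "SQ": "SQUARE", "INCORPORATED": "", "INC": ""}


def standardize(text: str) -> str:
    """Capitalize, delete non-alphanumeric characters, then apply word replacements."""
    cleaned = re.sub(r"[^0-9A-Z\s]", "", text.upper())
    words = (WORD_REPLACEMENTS.get(w, w) for w in cleaned.split())
    return " ".join(w for w in words if w)


# The two TCA Registry records from the example (IDs are hypothetical).
REGISTRY = {
    1: {"name": "KBACE", "address": "6 Trafalgar Square", "city": "Nashua", "postal": "03063"},
    2: {"name": "KBACE Technologies, Incorporated", "address": "Six Trafalgar Sq.",
        "city": "Neshua", "postal": "03063"},
}

# Step 1: stage standardized copies, leaving the registry rows untouched.
STAGED = {pid: {k: standardize(v) for k, v in rec.items()} for pid, rec in REGISTRY.items()}

# Hypothetical match rule: per-attribute scores and a match threshold.
WEIGHTS = {"name": 30, "address": 30, "city": 20, "postal": 20}
THRESHOLD = 60


def smart_search(user_input: dict[str, str]) -> list[tuple[int, int]]:
    # Step 2: standardize the user input with the same routine used for staging.
    query = {k: standardize(v) for k, v in user_input.items()}
    # Step 3a, acquisition: cheap filter on the first word of the standardized name.
    key = query["name"].split()[0]
    candidates = [pid for pid, rec in STAGED.items() if rec["name"].split()[0] == key]
    # Step 3b, scoring: award an attribute's score when the standardized values agree.
    results = []
    for pid in candidates:
        score = sum(w for attr, w in WEIGHTS.items() if query[attr] == STAGED[pid][attr])
        if score >= THRESHOLD:
            results.append((pid, score))
    return sorted(results, key=lambda r: r[1], reverse=True)


print(smart_search({"name": "KBACE Technologies", "address": "6 Trafalgar Square",
                    "city": "Nashua", "postal": "03063"}))  # [(2, 80), (1, 70)]
```

Record #2 ranks first despite the misspelled city, because its name and address agree once "Incorporated", "Six" and "Sq." are standardized. Adding the vowel-removal transformation to the city attribute would also make NASHUA and NESHUA agree, since both reduce to NSH.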
DQM Searching
DQM Search Results
Non-DQM Search
DQM Smart Search
Smart Search: Entering a New Record
Batch Duplicate Identification
What is Batch Duplicate Identification?
•Used to identify duplicate parties that already exist in the TCA Registry
•Match Rule Purposes
  • Bulk Duplicate Identification
  • Expanded Duplicate Identification
Batch Duplicate Identification Process
Step 1
TCA Registry → DQM Staging Program (Transformations Applied) → TCA Staged Schema
Batch Duplicate Identification Process
Step 2
TCA Staged Schema compared against TCA Staged Schema (Self Join)
→ Acquisition and Scoring (Bulk or Expanded Duplicate Identification Match Rule: Match Criteria and Thresholds Applied)
→ Duplicates Within TCA Registry
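A minimal sketch of the self join, using the six "Company ABC" variants from the Duplicate Customers slide as the registry. It assumes hypothetical party IDs 1 to 6, two hypothetical word replacements (CORP and COMP to COMPANY), an extra word-sorting step so that "ABC Company" and "Company ABC" agree (not one of the transformations listed earlier), and exact equality of the standardized name in place of a real acquisition and scoring pass. Union-find gathers the matching pairs into duplicate sets.

```python
import re
from itertools import combinations

# Hypothetical word replacements for company-name variants.
WORD_REPLACEMENTS = {"CORP": "COMPANY", "COMP": "COMPANY"}


def standardize(name: str) -> str:
    """Capitalize, delete non-alphanumerics, apply word replacements, sort the words."""
    cleaned = re.sub(r"[^0-9A-Z\s]", "", name.upper())
    words = sorted(WORD_REPLACEMENTS.get(w, w) for w in cleaned.split())
    return " ".join(words)


# The duplicate customers from the earlier slide, keyed by hypothetical party IDs.
REGISTRY = {1: "Company a.b.c.", 2: "ABC Company", 3: "Company Abc",
            4: "Corp. ABC", 5: "Company abC", 6: "Comp. aBc"}

# Step 1: stage standardized copies of the registry.
STAGED = {pid: standardize(name) for pid, name in REGISTRY.items()}


def find_duplicate_sets(staged: dict[int, str]) -> list[set[int]]:
    parent = {pid: pid for pid in staged}

    def find(pid: int) -> int:
        # Follow parent links to the set's root, halving the path as we go.
        while parent[pid] != pid:
            parent[pid] = parent[parent[pid]]
            pid = parent[pid]
        return pid

    # Step 2 (self join): compare every unordered pair of staged rows exactly once.
    for a, b in combinations(staged, 2):
        if staged[a] == staged[b]:
            parent[find(a)] = find(b)

    groups: dict[int, set[int]] = {}
    for pid in staged:
        groups.setdefault(find(pid), set()).add(pid)
    # Only sets with more than one member are duplicate sets.
    return [g for g in groups.values() if len(g) > 1]


print(find_duplicate_sets(STAGED))  # [{1, 2, 3, 4, 5, 6}]
```

Union-find makes the grouping transitive, which matters once matching is score-based: if A matches B and B matches C, all three land in one duplicate set. Comparing every pair grows quadratically with the registry size, which is why the acquisition phase narrows the candidates before scoring.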
Define Duplicate Identification Batch
Duplicate Identification Batch Results
Duplicate Identification Batch Details
Duplicate Identification Batch Results
Products Using DQM Functionality
1. Marketing Online (AMS)
2. Receivables (AR)
3. Sales (ASN)
4. TeleSales (AST)
5. Customers Online (OCO)
6. Inventory (INV)
7. Lease Management (OKL)
8. Partner Management (PV)
9. Sales for Communications (XNC)
10. Healthcare Transaction Base (HTB)
11. CRM Foundation (JTF)
Let Data Quality Management Work for You!
•Enhance Search Results
•Prevent Future Duplication
•Identify and Merge Existing Duplicates
QUESTIONS AND ANSWERS
For Additional Information
•For the recording and presentation, please visit: http://kbace.com/Services/Webinars.aspx
•Contact Rita Beck at [email protected]