De-duplicate Dirty Data Now and Forever Using Oracle Data Quality Management
Rita Beck, Senior Principal Consultant
March 13th, 2009
© 2009 KBACE Technologies, Inc.


Agenda

• Data Quality Management (DQM) Basics
• DQM Tools
• Smart Search
• Batch Duplicate Identification
• Conclusion


Data Quality Management (DQM) Basics


Why Use Data Quality Management?

•Inconsistent Information

•Inaccurate Financial Reporting

•Customer Dissatisfaction

•Inefficient Sales and Marketing


Inaccurate Financial Reporting

• Scenario 1: the same customer stored as six separate records
• Average Sales Volume = $100,000

Company abc   $100,000
ABC Company   $100,000
Company Abc   $100,000
Corp. ABC     $100,000
Company abC   $100,000
Comp. aBc     $100,000


Inaccurate Financial Reporting

• Scenario 2: the same customer stored as a single record
• Average Sales Volume = $600,000

Company ABC   $600,000
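The arithmetic behind the two scenarios can be sketched in a few lines of Python. This is a minimal illustration, not DQM logic: the grouping key (capitalize, strip punctuation, a hypothetical CORP/COMP → COMPANY replacement, ignore word order) is an assumption chosen so that the six spellings collapse into one customer.

```python
import re
from statistics import mean

# Scenario 1 data: six customer rows that all describe the same company.
rows = [
    ("Company abc", 100_000),
    ("ABC Company", 100_000),
    ("Company Abc", 100_000),
    ("Corp. ABC", 100_000),
    ("Company abC", 100_000),
    ("Comp. aBc", 100_000),
]

# Hypothetical word replacements for legal-form variants (illustration only).
REPLACEMENTS = {"CORP": "COMPANY", "COMP": "COMPANY"}


def customer_key(name: str) -> str:
    """Capitalize, drop punctuation, apply replacements, ignore word order."""
    words = re.sub(r"[^A-Z0-9\s]", "", name.upper()).split()
    words = [REPLACEMENTS.get(w, w) for w in words]
    return " ".join(sorted(words))


def average_sales_per_customer(records, key=lambda name: name):
    """Total the sales per customer key, then average across customers."""
    totals = {}
    for name, amount in records:
        k = key(name)
        totals[k] = totals.get(k, 0) + amount
    return mean(totals.values())


# Each spelling is treated as its own customer.
scenario_1 = average_sales_per_customer(rows)
# Spellings are grouped under one standardized key.
scenario_2 = average_sales_per_customer(rows, key=customer_key)
print(scenario_1, scenario_2)
```

With every spelling counted as a separate customer the average is $100,000; once the rows share one key the single customer's volume is $600,000.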


Duplicate Customers

Company a.b.c.

ABC Company

Company Abc

Corp. ABC

Company abC

Comp. aBc


What is Data Quality Management?

• Prevents Future Duplicates from Entering the System
   • Manually or via Import

•Identifies Existing Duplicates


How Does DQM Work?

• Transforms and standardizes TCA (Trading Community Architecture) Registry data

• Copies the standardized data into separate staged schema tables

• Performs user-defined searches
   • Within the TCA Registry
   • Between the TCA Registry and other sets of data

• Determines potential duplicate records


DQM Tools


Data Quality Management Tools

• Word Replacements
• Entities
• Attributes
• Transformations
• Match Rules


Word Replacements

• Provide Standardization

Robbie → Robert
Bob → Robert
Rob → Robert
Bobby → Robert
Roberto → Robert
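A word-replacement list is essentially a lookup table from variant to standard form. The sketch below uses the Robert mappings above; the capitalization and the whole-word matching rule are assumptions for illustration, not a description of how DQM stores or applies its lists.

```python
# Hypothetical word-replacement list: each key is a variant, each value is the
# standard form substituted before matching (mappings taken from the slide).
FIRST_NAME_REPLACEMENTS = {
    "ROBBIE": "ROBERT",
    "BOB": "ROBERT",
    "ROB": "ROBERT",
    "BOBBY": "ROBERT",
    "ROBERTO": "ROBERT",
}


def apply_word_replacements(text: str, replacements: dict[str, str]) -> str:
    """Capitalize, then replace whole words only, so 'ROB' inside 'ROBINSON' is untouched."""
    return " ".join(replacements.get(word, word) for word in text.upper().split())


for name in ["Robbie Smith", "Bob Smith", "Bobby Smith", "Roberto Smith", "Robinson Smith"]:
    print(name, "->", apply_word_replacements(name, FIRST_NAME_REPLACEMENTS))
```

Replacing whole words rather than substrings is the design choice that keeps surnames such as Robinson intact.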


Entities

•Party

•Address

•Contact

•Contact Point


Attributes

•Derived from columns within the TCA Registry tables

•Attributes make up an Entity

•Used for matching purposes between an Input Record and the TCA Registry data
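The relationship between entities and attributes can be sketched as a small data structure. The attribute names and the TCA column each one is derived from are illustrative assumptions, not the seeded DQM configuration; the comparison helper is hypothetical and only shows attributes being compared between an input record and registry data.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Attribute:
    name: str           # attribute name a match rule would refer to (illustrative)
    source_column: str  # TCA Registry column it is derived from (illustrative)


@dataclass
class Entity:
    name: str
    attributes: list[Attribute] = field(default_factory=list)


# Illustrative only: attribute names and column choices are assumptions.
ENTITIES = [
    Entity("Party", [Attribute("PARTY_NAME", "HZ_PARTIES.PARTY_NAME")]),
    Entity("Address", [
        Attribute("ADDRESS", "HZ_LOCATIONS.ADDRESS1"),
        Attribute("CITY", "HZ_LOCATIONS.CITY"),
        Attribute("POSTAL_CODE", "HZ_LOCATIONS.POSTAL_CODE"),
    ]),
    Entity("Contact", [Attribute("CONTACT_NAME", "HZ_PARTIES.PARTY_NAME")]),
    Entity("Contact Point", [
        Attribute("EMAIL_ADDRESS", "HZ_CONTACT_POINTS.EMAIL_ADDRESS"),
        Attribute("PHONE_NUMBER", "HZ_CONTACT_POINTS.PHONE_NUMBER"),
    ]),
]


def matching_attributes(entity: Entity, input_record: dict, registry_record: dict) -> list[str]:
    """Names of the entity's attributes whose non-empty values agree in both records."""
    return [
        a.name
        for a in entity.attributes
        if input_record.get(a.name) and input_record.get(a.name) == registry_record.get(a.name)
    ]


address = ENTITIES[1]
print(matching_attributes(
    address,
    {"ADDRESS": "6 TRAFALGAR SQUARE", "CITY": "NASHUA", "POSTAL_CODE": "03063"},
    {"ADDRESS": "6 TRAFALGAR SQUARE", "CITY": "NESHUA", "POSTAL_CODE": "03063"},
))
```

A raw comparison misses the city here because of the Nashua/Neshua misspelling, which is the gap that transformations are meant to close.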


Transformations

• Capitalize all letters

• Remove non-alphanumeric characters

• Reduce all instances of white space to a single white space

• Remove double letters

• Remove vowels except initial vowels

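D’ Angello (original value)
D’ ANGELLO (capitalize all letters)
D ANGELLO (remove non-alphanumeric characters)
D ANGELLO (reduce white space)
D ANGELO (remove double letters)
D ANGL (remove vowels except initial vowels)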
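The five transformations above can be sketched as a chain of small string functions applied in the order listed. This is an approximation written for illustration, not Oracle's implementation; treating only A, E, I, O and U as vowels is an assumption.

```python
import re

VOWELS = set("AEIOU")  # assumption: Y is not treated as a vowel


def capitalize_all(s: str) -> str:
    return s.upper()


def remove_non_alphanumeric(s: str) -> str:
    # Keep letters, digits and white space; drop everything else (’ . , - ...).
    return re.sub(r"[^0-9A-Za-z\s]", "", s)


def reduce_white_space(s: str) -> str:
    return re.sub(r"\s+", " ", s).strip()


def remove_double_letters(s: str) -> str:
    # Collapse any run of the same letter, e.g. "LL" -> "L".
    return re.sub(r"([A-Z])\1+", r"\1", s)


def remove_non_initial_vowels(s: str) -> str:
    # Keep the first character of each word; drop vowels everywhere else.
    return " ".join(w[0] + "".join(c for c in w[1:] if c not in VOWELS) for w in s.split())


STEPS = [capitalize_all, remove_non_alphanumeric, reduce_white_space,
         remove_double_letters, remove_non_initial_vowels]


def trace(value: str) -> list[str]:
    """Apply each step in order and return the original plus every intermediate result."""
    results = [value]
    for step in STEPS:
        results.append(step(results[-1]))
    return results


for line in trace("D’ Angello"):
    print(line)
```

The printed trace reproduces the progression shown on the slide, ending in D ANGL.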


Match Rule Purposes

• Search
   • Used for Search User Interfaces

• Expanded Duplicate Identification
   • Used for Identifying and Preventing Duplicates

• Bulk Duplicate Identification
   • Used for Identifying Duplicates


Score-Based Matching

• Acquisition
   • Provides an initial set of potential matches

• Scoring
   • Assigns scores to the acquired records to further filter the matches
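The two phases can be sketched as a cheap candidate filter followed by weighted scoring against a threshold. The acquisition attribute, the weights and the threshold below are hypothetical values chosen for illustration; they are not DQM defaults.

```python
Record = dict[str, str]


def acquire(query: Record, staged: list[Record], acquisition_attrs: list[str]) -> list[Record]:
    """Acquisition: cheaply pull candidates sharing at least one acquisition attribute."""
    return [
        r for r in staged
        if any(query.get(a) and query.get(a) == r.get(a) for a in acquisition_attrs)
    ]


def score(query: Record, candidate: Record, weights: dict[str, int]) -> int:
    """Scoring: add up the weight of every attribute whose values agree."""
    return sum(w for a, w in weights.items() if query.get(a) and query.get(a) == candidate.get(a))


def match(query, staged, acquisition_attrs, weights, threshold):
    scored = [(c["id"], score(query, c, weights)) for c in acquire(query, staged, acquisition_attrs)]
    # Keep only candidates at or above the threshold, best first.
    return sorted([s for s in scored if s[1] >= threshold], key=lambda s: s[1], reverse=True)


staged = [
    {"id": "1", "name": "ABC COMPANY", "city": "NASHUA", "postal_code": "03063"},
    {"id": "2", "name": "ABC COMPANY", "city": "BOSTON", "postal_code": "02101"},
    {"id": "3", "name": "XYZ CORP", "city": "NASHUA", "postal_code": "03063"},
]
query = {"name": "ABC COMPANY", "city": "NASHUA", "postal_code": "03063"}
print(match(query, staged, ["name"], {"name": 50, "city": 20, "postal_code": 30}, threshold=70))
```

Record 3 shares the city and postal code but is never scored, because acquisition only looks at the name; record 2 is acquired but its score of 50 falls below the threshold of 70, leaving record 1 as the only match.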


Example Match Rule


Smart Search


What is Smart Search?

•Used to identify records within the TCA tables that are potential duplicates of user-entered data

•Match Rule Purpose = Search


Smart Search Process

Step 1

TCA Registry → DQM Staging Program (Transformations Applied) → TCA Staged Schema


Smart Search Process

Step 2

User Input → Transformations Applied (Smart Search Match Rule) → Standardized User Input


Smart Search Process

Step 3

Standardized User Input + TCA Staged Schema → Acquisition and Scoring (Smart Search Match Rule: match criteria and thresholds applied) → Duplicates Between User Input and TCA Registry


Smart Search Example: Existing TCA Records

• TCA Registry Record #1

KBACE
6 Trafalgar Square
Nashua, NH 03063

• TCA Registry Record #2

KBACE Technologies, Incorporated
Six Trafalgar Sq.
Neshua, NH 03063
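The three Smart Search steps can be sketched against these two records. Everything specific here is hypothetical: the word replacements (SIX → 6, SQ → SQUARE), the noise words dropped from names, the attribute weights and the threshold of 70. For brevity every staged record is scored directly; a real match rule would acquire candidates first. The point is that once both sides are standardized the same way, Nashua and Neshua, 6 and Six, and Square and Sq. all meet.

```python
import re

# Hypothetical standardization rules and scoring settings for this example only.
ADDRESS_REPLACEMENTS = {"SIX": "6", "SQ": "SQUARE"}
NAME_NOISE_WORDS = {"TECHNOLOGIES", "INCORPORATED", "INC"}
VOWELS = set("AEIOU")
WEIGHTS = {"name": 40, "address": 30, "city": 10, "postal_code": 20}
THRESHOLD = 70


def words(text: str) -> list[str]:
    """Capitalize, drop non-alphanumerics and split on white space."""
    return re.sub(r"[^A-Z0-9\s]", "", text.upper()).split()


def std_name(text: str) -> str:
    return " ".join(w for w in words(text) if w not in NAME_NOISE_WORDS)


def std_address(text: str) -> str:
    return " ".join(ADDRESS_REPLACEMENTS.get(w, w) for w in words(text))


def std_city(text: str) -> str:
    # Remove double letters, then every vowel except a word's first letter.
    out = []
    for w in words(text):
        w = re.sub(r"(.)\1+", r"\1", w)
        out.append(w[0] + "".join(c for c in w[1:] if c not in VOWELS))
    return " ".join(out)


def stage(record: dict) -> dict:
    """Standardized copy of a record (used for the registry and the user input)."""
    return {
        "name": std_name(record["name"]),
        "address": std_address(record["address"]),
        "city": std_city(record["city"]),
        "postal_code": record["postal_code"].strip(),
    }


def smart_search(user_input: dict, staged: dict) -> list[tuple[int, int]]:
    query = stage(user_input)  # Step 2: standardize the user input
    hits = []
    for party_id, rec in staged.items():  # Step 3: score and apply the threshold
        score = sum(w for attr, w in WEIGHTS.items() if query[attr] and query[attr] == rec[attr])
        if score >= THRESHOLD:
            hits.append((party_id, score))
    return hits


registry = {
    1: {"name": "KBACE", "address": "6 Trafalgar Square", "city": "Nashua", "postal_code": "03063"},
    2: {"name": "KBACE Technologies, Incorporated", "address": "Six Trafalgar Sq.",
        "city": "Neshua", "postal_code": "03063"},
}
staged = {pid: stage(rec) for pid, rec in registry.items()}  # Step 1: staging

print(staged[1] == staged[2])
print(smart_search({"name": "K-Bace Inc.", "address": "6 Trafalgar Sq",
                    "city": "Nashua", "postal_code": "03063"}, staged))
```

Both registry records stage to the same standardized values, so a new entry such as "K-Bace Inc." surfaces both of them as potential duplicates.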


DQM Searching


DQM Search Results


Non-DQM Search


DQM Smart Search


Smart Search: Entering New Record


Batch Duplicate Identification


What is Batch Duplicate Identification?

•Used to identify duplicate parties that already exist in the TCA Registry

•Match Rule Purposes
   • Bulk Duplicate Identification
   • Expanded Duplicate Identification


Batch Duplicate Identification Process

Step 1

TCA Registry → DQM Staging Program (Transformations Applied) → TCA Staged Schema


Batch Duplicate Identification Process

Step 2

TCA Staged Schema + TCA Staged Schema (Self Join) → Acquisition and Scoring (Bulk or Expanded Duplicate Identification Match Rule: match criteria and thresholds applied) → Duplicates Within TCA Registry
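The self join in Step 2 can be sketched as comparing every staged record with every other one and grouping the matching pairs into sets. The staged values, weights and threshold are hypothetical, and grouping pairs with a union-find is an illustrative choice, not a claim about how DQM builds its results. The naive pairwise loop is quadratic; in practice acquisition would prune the candidate pairs first.

```python
from itertools import combinations

# Already-standardized registry records; all values are illustrative.
staged = {
    101: {"name": "ABC COMPANY", "city": "NSH", "postal_code": "03063"},
    102: {"name": "ABC COMPANY", "city": "NSH", "postal_code": "03063"},
    103: {"name": "ABC COMPANY", "city": "BSTN", "postal_code": "02101"},
    104: {"name": "XYZ CORP", "city": "NSH", "postal_code": "03063"},
}
WEIGHTS = {"name": 60, "city": 15, "postal_code": 25}  # hypothetical
THRESHOLD = 80  # hypothetical


def score(a: dict, b: dict) -> int:
    """Sum the weights of the attributes whose non-empty values agree."""
    return sum(w for attr, w in WEIGHTS.items() if a[attr] and a[attr] == b[attr])


def find_duplicate_sets(records: dict) -> list[list[int]]:
    """Self join: score every pair, then merge matching pairs into duplicate sets."""
    parent = {pid: pid for pid in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (id_a, a), (id_b, b) in combinations(records.items(), 2):
        if score(a, b) >= THRESHOLD:
            parent[find(id_b)] = find(id_a)

    groups = {}
    for pid in records:
        groups.setdefault(find(pid), []).append(pid)
    return [sorted(g) for g in groups.values() if len(g) > 1]


print(find_duplicate_sets(staged))
```

Only records 101 and 102 agree on every attribute; 103 matches on name alone (60) and 104 on city and postal code alone (40), so neither reaches the threshold.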


Define Duplicate Identification Batch


Duplicate Identification Batch Results


Duplicate Identification Batch Details


Duplicate Identification Batch Results


Products Using DQM Functionality

1. Marketing Online (AMS)
2. Receivables (AR)
3. Sales (ASN)
4. TeleSales (AST)
5. Customers Online (OCO)
6. Inventory (INV)
7. Lease Management (OKL)
8. Partner Management (PV)
9. Sales for Communications (XNC)
10. Healthcare Transaction Base (HTB)
11. CRM Foundation (JTF)


Let Data Quality Management Work for You!

•Enhance Search Results

•Prevent Future Duplication

•Identify and Merge Existing Duplicates


Questions and Answers


For Additional Information

•For the recording and presentation, please visit: http://kbace.com/Services/Webinars.aspx

•Contact Rita Beck at [email protected]