91
© 2007 Knowledge Integrity, inc. www.knowledge-integrity.com (301)754-6350 1 Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc. www.knowledge-integrity.com

Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

Embed Size (px)

Citation preview

Page 1: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

1

Data Profiling,Data Governance, and

Data Quality forMaster Data Management

David LoshinKnowledge Integrity, Inc.

www.knowledge-integrity.com

Page 2: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

2

AgendaWe will explore how data profiling, data quality, and data governance will support the emergence of the core master data program:

Master data management basicsMaster object identification and modelingData consolidation and integrationOperational master service needsData governance brings it all under controlReview of Data Quality technology for MDM

Page 3: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

3

What is “Master Data”?Core business objects used in the different applications across the organization, along with their associated metadata, attributes, definitions, roles, connections, and taxonomies, e.g.:

CustomersSuppliersPartsProductsLocationsContact mechanisms

“David Loshin purchased seat 15B on US Airways flight 238 from Baltimore (BWI) to San Francisco (SFO) on July 20, 2006.”

Master Data Object Value

Customer David Loshin

Product Seat 15B

Flight 238

Location BWI

Location SFO

Page 4: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

4

What is Master Data Management?

Master Data Management (MDM) incorporates the business applications, information management methods, and data management tools to implement the policies, procedures, infrastructure that support the capture, integration, and subsequent shared use of accurate, timely, consistent and complete master data.

Page 5: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

5

Challenges to MDM Success

•Uncoordinated Enterprise•Data Ownership•Performance Management

Organizational

•Data Quality•Data Standards•Data Governance

Operational

•MDM Architecture•Transition Plan•System Performance

Technical

Understanding and addressing these issues providesgrounding for the core MDM operational components

Page 6: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

6

The MDM Component Layer Model

Architecture

Identification

Management

Governance

Integration

BusinessProcess

Management

Master Data Model MDM System Architecture

MDM Service Layer Architecture

DataStandards

MetadataManagement

DataQuality

DataStewardship

Administration/Configuration

Hierarchy Management Identity Management

Migration Plan

Record Linkage Merging and Consolidation

Identity Search and Resolution

MDM Component Service Layer

Application Integration and Synchronization Service Layer

MDM Business Component Layer

Business Rules

Business Process Integration

Page 7: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

7

ArchitectureMaster data modelMDM system frameworkService layer architecture

ArchitectureMaster Data Model MDM System Architecture

MDM Service Layer Architecture

Page 8: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

8

Central Master/Coexistence

Page 9: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

9

Registry

Page 10: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

10

Transaction Hub

Page 11: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

11

MDM Architectural StylesDifferent underlying architectural approaches influence ways of integrating data management services:

Data assessment and master object identificationUnderlying master data modelData profilingParsing, Standardization, NormalizationCleansingEnhancement

Two high-level approaches:RepositoryRegistry

Page 12: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

12

Repository – Initial Data Consolidation1. Data sources are staged and cleansed2. Cleansed data is brought into a master

staging area for consolidation3. Records are checked for previous registration

with the unique identification services4. Survivorship rules are applied5. For new records, a unique identifier is created6. The resulting record is enhanced7. The enhanced record is stored in the master

repository8. The unique identification service is notified

MasterRepository

UID1

IntakeSourcedata Staging Cleanse EnhanceSurvivorship

UniqueIdentification

Service

3

2

1

4

5

6

8

MasterStaging

IntakeSourcedata Staging Cleanse

IntakeSourcedata Staging Cleanse 7

Page 13: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

13

Registry – Initial Data Consolidation1. Source data is cleansed and enhanced2. The unique identification service is consulted with the

identifying information to determine if the entity has already been registered, and if not, to generate a unique identifier for registration

3. The records are managed within their respective data locations, and

4. The records are tagged with the returned unique identifier5. The mapping from the registry to the data locations is

registered with the associated identifying information

IntakeSource 2data Staging Cleanse

1

IntakeSource 3data Staging Cleanse

IntakeSource 1data Staging Cleanse

Enhance

Enhance

Enhance

UniqueIdentification

Service

UID1

UID1

UID1

UID12

3

4

5

Page 14: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

14

Master Data ModelLimited universe of common master objects

Party, customer, product, part, supplier, claim, instrument

Universal models may be suitable as starting pointsChallenges:

Resolution of metadata in a consistent mannerCreating a model that accommodates all applications properly

CUST

First VARCHAR(15)Middle VARCHAR(15)Last VARCHAR(21)Address1 VARCHAR(45)Address2 VARCHAR(45)City VARCHAR(30)State CHAR(2)ZIP CHAR(9)

Nightingale-Patterson

CUSTOMER

FirstName VARCHAR(14)MiddleName VARCHAR(14)LastName VARCHAR(20)TelNum NUMERIC(10)

Nightingale-Patterso

Last VARCHAR(21)

LastName VARCHAR(20)

Page 15: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

15

Master Object Identification and ModelingDocument business processesSegregate services from dependent dataIdentify target data areasInventory and identify candidate master entitiesSource data analysisAssembling the Target Model

Page 16: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

16

Business Process ModelingA “business process” is a coordinated set of activities intended to achieve a desired goal or produce a desired output productModels are designed to capture both the high level and detail of the business process

activityinputs

controls

actors

output

Page 17: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

17

Sharing Enterprise Data ObjectsInteractions between activities depend on shared data:

activity

activity

activity

Instance values of common data types representing business facts communicate input and control during the business process

Page 18: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

18

Segregating Master Data ServicesEvaluating the catalog of data elements through which business processes are coordinated yields a candidate list of master data objectsAssembling that catalog enables a review of the business application services

activity

Identify application services at the business level as they access master data objects

Page 19: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

19

Finding Master DataWhat data elements constitute our “master data”?How do we locate and isolate master data objects that exist within the enterprise?How do we assess the variances between the different representations in order to consolidate instances into a single view?

Page 20: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

20

Centralizing Semantics – Basic QuestionsMaster data will be distributed and replicated across the application environmentsThe initial goal is to consolidate the replicated copies into a single repositoryBefore materializing a single master record for any entity, one must be able to:

Discover which data resources may contain entity informationUnderstand which attributes carry identifying informationExtract identifying information from the data resourceTransform the identifying information into a standardized or canonical formEstablish similarity to other standardized records

Page 21: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

21

CUSTOMER

FirstName VARCHAR(14)MiddleName VARCHAR(14)LastName VARCHAR(30)TelNum NUMERIC(10)

CUST

First VARCHAR(15)Middle VARCHAR(15)Last VARCHAR(40)Address1 VARCHAR(45)Address2 VARCHAR(45)City VARCHAR(30)State CHAR(2)ZIP CHAR(9)

Structures and Semantics Vary Across the Enterprise

Page 22: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

22

CUSTOMER

FirstName VARCHAR(14)MiddleName VARCHAR(14)LastName VARCHAR(30)TelNum NUMERIC(10)

CUST

First VARCHAR(15)Middle VARCHAR(15)Last VARCHAR(40)Address1 VARCHAR(45)Address2 VARCHAR(45)City VARCHAR(30)State CHAR(2)ZIP CHAR(9)

CUSTOMER

CUST

Identifying Information

Contact Information

“A customer is an individual who has

purchased one of our products”

“A customer is an individual to whom we have delivered one of

our products”

Syntax Structure Semantics

Page 23: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

23

Master Object ResolutionResolution of candidate master data types requires a compete view of what composes the information architectureThis entails cataloging data sets, their attributes, data domains, definitions, contexts, and semanticsThis view must facilitate the resolution of:

Format at the element level,Structure at the instance level, andSemantics across all levels

This introduces three challenges:Collecting and analyzing master metadataResolving similarity in structureUnderstanding and unifying master data semantics

Page 24: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

24

Collecting and Analyzing Master MetadataMetadata sources:

Data dictionariesE/R modelsCOBOL copybooksSubject matter experts

Document all data element characteristics within a metadata repository using a standard representationThe consolidated metadata repository will enumerate data characteristics in a standardized wayThe standard representation enables statistical analyses such as:

Assessing frequency of occurrence of namesComparing the lengths of name fieldsDiscovering dependencies between attribute names and assigned types

Page 25: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

25

Resolving Similarity in StructureDifferent underlying master meta-models are likely to share many commonalitiesStructures will probably reflect similar collections of attributes and relations

Example: Many customer data sets will contain customer names along with variant contact mechanisms such as addresses or telephone numbers

The data analyst can review existing models to identify similar object structuresTwo “hints” in structural similarity:

Overlapping structures may reflect data sets carrying identifying information with variant sets of attributesDerived structures potentially reflect an embedding of core master data attributes within a specialized version of the same object

Page 26: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

26

Unifying Master Data SemanticsWhat is the qualitative difference between pure syntactic/structural metadata and its underlying meaning?Review semantic consistency in data element naming and data types, sizes, and structuresNext, document business meanings:

What are the data element definitions?Are there authoritative sources for these definitions?Do similar objects have different business meanings?

Resolution of variance will help in standardizing both the representations and meanings for identified master data object types

Page 27: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

27

Proposing the Target ModelEvaluate the catalog of identified data elements

Seek out the frequently created, referenced, modified, retired

Assess object organizational structureEvaluate conceptual structures as they map to business process useExample: locations are composed of street, city, state, ZIP code

Identify and resolve anomalies across data element sizes, types, formatsPropose an object modelValidate the object model within the information frameworkValidate the object model within the application framework

Page 28: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

28

Data Profiling for Master Data AnalysisCharacteristics of data element metadata is critical for analysisCombination of artifact review and empirical analysisData profiling can provide “ground-truth” evidence of consistency with metadata

Page 29: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

29

The Real Data Quality ProblemThere are no objective measures of data qualityThe only industry metrics are based on name deduplication or address standardizationData cleansing does not address long term quality improvementThe answer: Use data profiling to analyze the current state of data and correlate anomalies to data consumer expectations

Page 30: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

30

The Assessment ProcessData set selection, profiling and profile reviewImpact AnalysisRule Definition, Review, and RefinementObtain Objective Measurements

Page 31: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

31

What Can Go Wrong with Data?Data entry errorsData conversion errorsUnexpected valuesMismatched syntax, formats and structuresInconsistent valuesMissing values

Page 32: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

32

Reasons for Data DegradationLimited tracking of documentation with systemExploitation of “shortcuts” for implementationMultiple front end interfaces to same back endLimited system functionality driving “creativity”Changes in use and perception of data

Page 33: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

33

Forensics and ArcheologyUsing the proper analysis, profiling reveals errors in the data, as well as insight into their originNaïve analysts use profiling to find and correct erroneous dataSavvy analysts use profiling to find and correct erroneous processes

Page 34: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

34

Hidden SecretsDefault null valuesProgramming errorsData element domainsFormat ConstraintsAbstract Data TypesAugmented DomainsOverloaded attribute useOrphansUnused attributes

Page 35: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

35

Default Null Values

Social Security Numbers ‘000000000’, ‘999999999’

Names‘?’, ‘??’, ‘???’, ‘????’, ‘?????’,‘N/A’, ‘NA’, ‘NONE’, ‘None’, ‘UNKNOWN’, ‘Unknown’, ‘n/a’, ‘na’, ‘none’, ‘unknown’

Phone Numbers‘No phone number provided’000-000-0000999-999-9999

Page 36: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

36

Programming ErrorsIn one application, we had been assured that all records were evaluated prior to insertion into the database to ensure uniquenessProfiling results showed that numerous duplicates existedAfter bringing this to the attention of the programmers, they acknowledged that a programming error existed and was to be fixed in the next release

Page 37: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

37

Data Element DomainsAn evaluation of a “Sex” column revealed three distinct values: “M”, “F”, and “U”

Page 38: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

38

Format ConstraintsPattern analysis of a telephone number field showed that there were telephone numbers that contained both numerals and alphabetic letters:

1-800-MATTRES

Page 39: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

39

Abstract Data TypesIn legacy systems, pattern analysis can reveal embedded use of abstract data types such as dates, SSNs, telephone numbers, etc.

Page 40: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

40

Augmented Domains A set of manufactured values are added to a known domain to account for alternate use of the attribute

Additional “local codes” added to FIPS county codes to account for more than one location per county

Page 41: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

41

Overloaded Attribute UseIn one table a column named FOREIGN_COUNTRY contained telephone numbers and email addresses

Page 42: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

42

OrphansIn a review of a legacy foreign key relationship for case management, it was determined that a relatively large percentage of records in one table did not have a corresponding record in the other table

Page 43: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

43

Unused AttributesIn one data quality assessment, we found one set of columns that were null a significant part of the time, and another that always had the same valueThese results indicate that the attribute is not actually being used

Page 44: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

44

More Hidden SecretsMetadata

Data typesAttribute sizesRangesValue set cardinality

StructureEmbedded tablesRelational structureRelationship cardinalityKey relationshipsMaster reference data

Business RulesDomain constraintsDependency constraintsDerivation rulesCompleteness rulesConsistency rules

Page 45: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

45

Summary of Data ProfilingProfiling used to:

Provide empirical analysis of data value setsAssess existing metadata against published descriptionsAssess quality of each data setIdentify outliers for further reviewCharacterize feasibility of consolidation

Page 46: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

46

IdentificationEvery object subject to “mastering” is managed using a unique representation within the master repositoryAny time data intended to refer to that object is seen by an application, its unique representation must be found, verified, and presented back to the application by the MDM platform

IdentificationRecord Linkage Merging and Consolidation

Identity Search and Resolution

Page 47: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

47

IdentificationUnique IdentificationParsing and StandardizationNormalizationRecord LinkageSurvival Strategies

Page 48: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

48

Example – Unique Identification

David H LoshinH David LoshinDavid Howard LoshinDavid LoshinHoward Loshin

David LosrinDavid Lotion

David Lasrin

David LoskinDavid Lashin

David Laskin

David Loshing

Mr. LoshinJill and David LoshinD LoshinLoshin DavidLoshin, Howard

The Loshin FamilyHD Loshin

Mr. David Loshin

Enterprise Knowledge Management : The Data Quality Approach (The Morgan Kaufmann Series in Data Management Systems) (Paperback) by David Loshin

Business Intelligence: The Savvy Manager's Guide (The Savvy Manager's Guides) (Paperback -June 2003) by David Loshin

The Geometrical Optics Workbook(Paperback - June 1991) by David S Loshin

?

Howard David Loshin

Page 49: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

49

Identity Search and Resolution

David Howard Loshin

Loshin, Howard

Howard David LoshinHoward David LoshinHoward David Loshin

Objective: Provide the services that will seek the matching record in the master index that represents the “queried” object

Page 50: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

50

Record Linkage – Parsing and StandardizationParsing

Identifying and tagging pieces of each data value within a semantic context

StandardizationCorrecting terms based on defined rulesAssembling components into recognized patterns

NormalizationAdjust data elements into a common representation

TransformationRule-based modifications into target canonical representationsTransformation into target format

Loshin, HowardD LoshinHowardDFirst HowardMiddle DLast Loshin

Page 51: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

51

Record Linkage – Similarity ScoringRecords are deemed to be the same when their identifying information matches within a reasonable degree of similaritySpecific bits and pieces of identifying information are necessary to enable distinction between unique objectsIdentifying values are weighted as to their contribution to the similarity scoreThe analysts must perform a continual assessment of similarity criteria to inoculate against false positives

Page 52: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

52

Merging and ConsolidationStatic merging and consolidation takes existing data sets, seeks to segment records into equivalence classes representing the same entity, and merges data from matched records into a single master record

Depends on automationThresholds for matching set ahead of timeRecords between thresholds pulled for administrative review

Inline merging and consolidation matches new instances against existing master records and updates the master records with validated current changes

Integrates rules with probabilistic algorithms

Page 53: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

53

Merging and Consolidation

Knowledge Integrity, Inc. 301-754-6350

Loshin 301-754-6350DavidHoward

Knowledge Integrity Incorporated 301 754-6350

LotionDavid 1163 Kersey Rd

David Loshin

Knowledge Integrity

301-754-6350

Page 54: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

54

SurvivorshipRules govern the determination of surviving data element values

Quality of data sourceCurrency of data sourcePriority

Loshin 301-754-6350DavidHoward

LotionDavid 1163 Kersey Rd

Loshin 301-754-6350DavidHoward 1163 Kersey Rd

Page 55: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

55

Identification – SummaryValues are parsed into recognizable componentsComponents are standardizedIdentifying information is collected for searchMatching is facilitated via similarity scoringMatched records are consolidated during integration

Page 56: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

56

Operational Master Data ServicesMDM operations require ongoing support for a number of data services:

Inline classification and matchingData cleansingData enhancementUnique identification

Page 57: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

57

Inline ClassificationMaster objects are grouped by their characteristics into hierarchiesEach presented object is subjected to a classification service to assist in narrowing scope for matching

¼” metal l thr philClassification

Service

“screw”

Page 58: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

58

Data Standardization/CleansingKnowledge base is used to transform into a standardized format

¼” metal l thr phil

” in

thr threaded

phil Philips

l left

hx Hexr right

flt Flat

wd wood

thr

phil

l

in

threaded

Philips

left

1/4metal

CleansingService

Page 59: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

59

Inline MatchingObjects are submitted to the matching service to determine if an instance already exists in the master repositoryRelies on traditional parsing/standardization/linkage

¼” metal l thr philMatchingService

“screw”

MasterRepositoryscrews

parts

Operationalprocess

Page 60: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

60

Unique IdentificationFor unmatched records:

Identifying information is extractedNew unique identifier is created by UID serviceRecord is stored to master repositoryMaster index is updatedUpdates are communicated

Page 61: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

61

Data EnhancementReference data sets are consulted inline for value-added data appendExample:

Customer data is enhanced with geo-demographicsVendor data is enhanced with reliability scoreProduct data is enhanced with tolerance metrics

Page 62: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

62

GovernanceData StandardsData Quality ManagementData Stewardship

GovernanceDataStandards

MetadataManagement

DataQuality

DataStewardship

Page 63: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

63

Aligning Data Objectives and Business Strategy

Clarify and understand the existing “Information Architecture”Create an inventory of data assets

Applications, data assets, documentation, metadata, usageInventory of data elements and “owning” application

Finance

Sales

Marketing

HumanResources

Legal

Compliance

CustomerService

Page 64: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

64

Standardizing Data Sharing

Customers

Partners

LoBs

In any enterprise, we can be confident that information exchanges are understood the same way through the definition and use of data standards

Standard definitions, processes, exchanges

Page 65: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

65

Standards: Challenges for Critical Data ElementsAbsence of clarity

…makes it difficult to determine semantics

Ambiguity in definition…introduces conflict into the process

Lack of Precision…leads to inconsistency in representation and reporting

Variant source systems and frameworks…encourage “turf-oriented” biases

Flexibility of data motion mechanisms…leads to multitude of approaches for data movement

Page 66: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

66

Data Quality Management GoalsEvaluate business impact of poor data quality and develop ROI models for Data Quality activitiesDocument the information architecture showing data models, metadata, information usage, and information flow throughout enterpriseIdentify, document, and validate Data Quality expectationsEducate your staff in ways to integrate Data Quality as an integral component of system development lifecycleGovernance framework for Data Quality event tracking and ongoing Data Quality measurement, monitoring, and reporting of compliance with customer expectationsConsolidate current and planned Data Quality guidelines, policies, and activities

Page 67: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

67

Engineering Data Quality into the System

Analyze/profiledata

Assessdata qualitydimensions

Application

IMS

Flat File

RDBMS

VSAM

Createmonitoring

system

Recommenddata

transformations

Generate data quality reports

Send data quality reportsto data owners

Improved enterprisedata quality

Data quality,Validity, &

Transformationrules

Page 68: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

68

Operationalizing Data GovernanceActualization of data governance activities enables:

The identification of explicit and hidden risks associated with data expectationsThe actualization of the implementation of business policyOversight of the definition of critical data elementsMonitoring and auditing information quality rule complianceManaging enterprise data ownership and stewardshipCoordination and oversight of enterprise data quality

In general, data governance provides management oversight for organizational observance of different kinds of information policies

Page 69: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

69

Roles and Responsibilities

Executive Sponsorship

Data Governance Oversight

Data Steering Committee

Provide senior management support at the C-level, warrants the enterprise adoption of measurably high quality data, and negotiates quality SLAs with external data suppliers.

LOB Data GovernanceLOB Data GovernanceLOB Data GovernanceLOB Data Governance

Strategic committee composed of business clients to oversee the governance program, ensure that governance priorities are set and abided by, delineates data accountability.

Tactical team tasked with ensuring that data activities have defined metrics and acceptance thresholds for quality meeting business client expectations, manages governance across lines of business, sets priorities for LOBs and communicates opportunities to the Governance Oversight committee.

Data governance structure at the line of business level, defines data quality criteria for LOB applications, delineates stewardship roles, reports activities and issues to Data Coordination Council

Page 70: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

70

Stewardship: Manual InterventionIssues with addressing data quality events:

Immediate remediation of flawed data – does this imply data correction?Not all data flaws can be captured via automated processes –this implies manual reviews

Accuracy may only be measured by comparing values directlyCarefully integrate manual intervention when necessary in a controlled manner

Page 71: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

71

Summary - GovernanceOne of critical success factors for MDM deploymentBenefits all stakeholdersEstablishes link between business objectives and information quality

Page 72: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

72

MDM Functional Services

Browsing &Editing

PropagationSynchronization

RulesManager

WorkflowManager

Create

AccessControl

Read Modify Retire

AuthenticationSecurityManagement

Data Quality MetadataManagementIntegration

Audit

Modeling IdentityManagementMatch/Merge Hierarchy

Management

Page 73: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

73

Example: Generic Master Data Services

“Object Locate”

Master Index

“Object Factory”

Page 74: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

74

Data Quality Technologies for MDMData parsing and standardizationRecord Linkage/MatchingData scrubbing/cleansingData enhancementData profilingData auditing/monitoring

Page 75: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

75

Data Parsing and StandardizationDefined patterns fed into “rules engine” used for:

Distinguishing between valid and invalid stringsTriggering actions when invalid strings are recognized

Page 76: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

76

Parsing and Standardization Example

(999)999-9999

999-999-9999999.999.99991-(999)999-9999

1-999-999-99991 999.999.9999

1 (999) 999-9999

Area CodeExchangeLine

3017546350

(301) 754-6350(999) 999-9999

Page 77: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

77

Parsing and Standardization - ApproachSimplest approach: standard regular expression or context-free parsing with actions attached to success statesMore complex:

Standard parsing with variant lookaheadContext-sensitive parsingTable-driven string/pattern matching

Page 78: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

78

Data Scrubbing/CleansingIdentify and correct flawed data

Data imputationAddress correctionElimination of extraneous dataDuplicate eliminationPattern-based transformations

Complements (and relies on) parsing and standardization

Page 79: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

79

Cleansing Example

IBMInternational Bus Mach

Int’l Bus MachIntl Business MachinesIBM Inc.

International Business Machines

Lookup Table

Page 80: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

80

Matching/Record LinkageIdentity recognition and harmonizationApproaches used to evaluate “similarity” of recordsUse in:

Duplicate analysis and eliminationMerge/PurgeHouseholdingData EnhancementData CleansingCustomer Data Integration

Page 81: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

81

Matching/Record Linkage - Example

Knowledge Integrity, Inc. 301-754-6350

Loshin 301-754-6350DavidHoward

Knowledge Integrity Incorporated 301 754-6350

LotionDavid 1163 Kersey Rd

David Loshin

Knowledge Integrity

301-754-6350

Page 82: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

82

Matching/Record LinkagePairwise comparisons reliant on similarity scoring (inefficient)More efficient algorithms block records into candidate sets, then do pairwise comparisons

Statistical (Fellegi & Sunter, Jaro, Winkler)Artificial Intelligence (e.g., Borthwick’s MEDD)

Approximate string matchingPhonetic compression (Soundex, NYSIIS)N-gramsWinkler (probability-based)

Page 83: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

83

Data EnhancementData improvement process that relies on record linkageValue-added improvement from third-party data sets:

Address correctionGeo-Demographic/Psychographic importsList append

Typically partnered with data providers

Page 84: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

84

Subject Number Percent

OCCUPANCY STATUS

Total housing units 1,704 100.0

Occupied housing units 1,681 98.7

Vacant housing units 23 1.3

Tenure

Occupied housing units 1,681 100.0

Owner-occupied housing units 1,566 93.2

Renter-occupied housing units 115 6.8

Subject

SCHOOL ENROLLMENTPopulation 3 years and over enrolled in school 1,571 100

Nursery school, preschool 142 9Kindergarten 102 6.5Elementary school (grades 1-8) 662 42.1High school (grades 9-12) 335 21.3College or graduate school 330 21

EDUCATIONAL ATTAINMENTPopulation 25 years and over 3,438 100

Less than 9th grade 122 3.59th to 12th grade, no diploma 249 7.2High school graduate (includes equivalency) 996 29Some college, no degree 674 19.6Associate degree 180 5.2Bachelor's degree 676 19.7Graduate or professional degree 541 15.7

Percent high school graduate or higher 89.2 (X)Percent bachelor's degree or higher 35.4 (X)

Number Percent

Page 85: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

85

Data ProfilingEmpirical analysis of “ground truth”Couples:

Statistical analysisFunctional dependency analysisAssociation rule analysis

The statistical analysis part is a no-brainerThe other stuff is hard

Page 86: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

86

Data Profiling – ApproachColumn Profiling/Frequency Analysis (Easy)

Done in place using database capability (e.g., SQL), or using hash tables

Cross-Table/Redundancy (Harder)May be done using queries, or using set intersection algorithms

Cross-Column/Dependency (Hardest)Uses fast iterative set partitioning algorithmsA priori, TANE, FD_MINE

Page 87: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

87

Data Auditing/MonitoringProactive assessment of compliance with defined rulesProvide reporting or scorecardingUseful for

“Quality” processes“Data debugging”Business impact assessment

Page 88: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

88

IssuesData quality is seen as a technical problem requiring a technical solutionDisconnect between information value and achieving business objectives leads to ignoring DQ until it is too lateData is often corrected, instead of flawed processes

Page 89: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

89

Integration with Master DataMaster reference data managementSemantic metadata integrated with information qualityData quality activity trackingService-oriented processingCorrelating business impacts to information valueComplex rule frameworks and systemsBusiness rule definition and management

Page 90: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

90

Tools Summary1st generation tools focus on “correction”2nd generation tools look at analysis and discoveryGrowing MDM needs:

Standardized rulesBusiness impact correlationPerformance metricsSemantic metadataGeneric metadata-based descriptive capabilityHistorical auditing and tracking

Page 91: Data Profiling, Data Governance, and Data Quality for ... .pdf · Data Profiling, Data Governance, and Data Quality for Master Data Management David Loshin Knowledge Integrity, Inc

© 2007 Knowledge Integrity, inc.www.knowledge-integrity.com

(301)754-6350

91

Questions?If you have questions, comments, or suggestions, please contact meDavid [email protected]