Information Quality in Practice. Informed Decisions Group. Copyright 2005.
Information Quality in Practice: The Good, the Bad and the Ugly
Leon Schwartz, Ph.D.
Informed Decisions Group
November 16, 2005

What Me Worry?
Business Champions for TDQM Programs are scarce,
because Data Quality is difficult to define &
measure, even though Poor Data Quality
costs Billions of dollars.

Information Quality in Practice
Prologue: Poor Data Costs $Billions
The Good: You Can Clean It Up
The Bad: The Cost of Avoidance
The Ugly: The Pogo Effect
Epilogue: What is Data Quality, anyway?

Poor Data Quality Costs $Billions
Data quality problems cost U.S. businesses $611 billion a year.
40% of firms have suffered losses.
2% of customer records are obsolete in one month.
Customer duplication rates range 5 to 20%.
The Web is increasing data entry errors.
Source: The Data Warehousing Institute Study, 2002

Effects of Bad Customer Data
Low credibility among customers & suppliers
Poor decision making
Lost customers/clients
Unnecessary printing & postage
Poor customer service
Lost business opportunities
Inefficient utilization of staff

Data Affects Your Success
[Figure: relative influence of Process, Algorithm, DATA, People and Politics on an OR/MS project]

Room for Improvement
Only 11% have implemented a DQ program*
– 48% have no plan for a program
26% purchased a data quality tool*
– 52% have no plans
Still very far from 6 Sigma!
Easy to improve Quality, if…
*Source: The Data Warehousing Institute Study, 2002

…You Can Answer the Following
Information must be Useful
How good is good enough?
How often is often enough?
How much is it worth?

Information Quality in Practice
Prologue: Poor Data Costs $Billions
The Good: You Can Clean It Up
The Bad: The Cost of Avoidance
The Ugly: The Pogo Effect
Epilogue: What is Data Quality, anyway?

Data Quality Starts with Access
Data does not exist anywhere
Exists, but you can’t find it
You found it, but you can’t get to it
You can get to it, but you don’t have authority to use it
You can use it, but it is a total MESS
“I never realized HOW BAD!”
Data Warehouse NIRVANA! It’s dirty, but useful.

Data Quality & the Data Warehouse
Integrating data can improve Quality, if you…
Quality Control the Match
Measure & Improve Integrity
Flag “out of range” Values
Manually examine BIG “leftovers”
Audit a random sample of Customers
“I never realized HOW BAD our data is!”

Matching Improves Quality
Name
Address
Phone
Rules
– Group ID, Account ID, Account ID, Duns
– Operations: Cleanse, Transform, Consolidate
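The cleanse, transform and consolidate operations named on this slide can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the field names, the abbreviation table, the sample records, and the exact-key grouping are hypothetical and are not the matching rules used in the project described.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table for the transform step (an assumption).
ABBREVIATIONS = {"STREET": "ST", "AVENUE": "AVE", "CORPORATION": "CORP"}

def cleanse(text):
    """Uppercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^A-Za-z0-9 ]", " ", text.upper())
    return " ".join(text.split())

def transform(text):
    """Map common long forms to a standard abbreviation."""
    return " ".join(ABBREVIATIONS.get(word, word) for word in text.split())

def match_key(record):
    """Build a comparison key from name, address and phone."""
    name = transform(cleanse(record["name"]))
    address = transform(cleanse(record["address"]))
    phone = re.sub(r"\D", "", record["phone"])  # keep digits only
    return (name, address, phone)

def consolidate(records):
    """Group account IDs that share the same match key."""
    groups = defaultdict(list)
    for record in records:
        groups[match_key(record)].append(record["account_id"])
    return dict(groups)

# Made-up records, not data from the presentation.
records = [
    {"account_id": "A1", "name": "Acme Corporation",
     "address": "12 Main Street", "phone": "(312) 555-0100"},
    {"account_id": "A2", "name": "ACME CORP.",
     "address": "12 Main St", "phone": "312-555-0100"},
    {"account_id": "A3", "name": "Baker Inc",
     "address": "9 Oak Avenue", "phone": "212 555 0199"},
]
groups = consolidate(records)
```

Exact-key grouping only catches duplicates that become identical after cleansing. Real matching tools usually add fuzzy comparison, which is why the match itself needs quality control.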

Establish Q. A. Procedures
Use a common sample
Establish replicable process
Document carefully
Realize the subjectivity
Train the Vendor
Audit the Vendor

Quality Control Your Match
Match Quality Statistics
Data matched | Stats published | By who | GROUPS/UNIQUES | "Good" | "Marginal" | "Bad" | Sample size
Jun-99 | Dec-99 | PB | GROUPS | 96.8% | 1.5% | 1.7% | 411
Jun-99 | Dec-99 | PB | UNIQUES | 96.3% | 1.6% | 2.1% | 816
Mar-00 | May-00 | Vendor | GROUPS | 98.0% | 1.0% | 1.0% | 300
Mar-00 | May-00 | Vendor | UNIQUES | ** | | |
Jun-00 | Nov-00 | Vendor | GROUPS | 98.0% | 2.0% | 0.0% | 300
Jun-00 | Nov-00 | Vendor | UNIQUES | ** | | |
Jan-01 | Mar-01 | Vendor | GROUPS | 99.0% | 0.5% | 0.5% | 700
Jan-01 | Mar-01 | Vendor | UNIQUES | 99.5% | 0.3% | 0.2% | 4900
Jan-01 | Mar-01 | PB | GROUPS | 97.3% | 0.6% | 2.1% | 700
Jan-01 | Mar-01 | PB | UNIQUES | 99.2% | 0.2% | 0.6% | 482
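Because these percentages come from audit samples of a few hundred records, it can help to attach a rough margin of error before comparing one audit with another. The sketch below uses the normal approximation to a binomial proportion at a 95% level. That method is an assumption of this example, not something the slides say was used, and it becomes unreliable as the rate approaches 100%.

```python
import math

def margin_of_error(rate, sample_size, z=1.96):
    """Half-width of an approximate 95% interval for a sample proportion."""
    return z * math.sqrt(rate * (1.0 - rate) / sample_size)

# "Good" rate for GROUPS in the Jun-99 PB audit: 96.8% of 411 sampled.
moe_pb_1999 = margin_of_error(0.968, 411)

# "Good" rate for GROUPS in the Jan-01 vendor audit: 99.0% of 700 sampled.
moe_vendor_2001 = margin_of_error(0.990, 700)

print(f"Jun-99 PB GROUPS:     96.8% +/- {moe_pb_1999 * 100:.1f} points")
print(f"Jan-01 Vendor GROUPS: 99.0% +/- {moe_vendor_2001 * 100:.1f} points")
```

Under these assumptions the 1999 figure carries an uncertainty of roughly 1.7 percentage points, so small differences between audits should be read with care.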

Document Integrity Rules
Integrity Rules for IMT Database
Version Changes
V1.2 6b. added
V1.3 2a. updated
V1.4 2a, 4c, 5d updated; 7i, 7j deleted.
V1.5 Descriptive headings added, 5b updated.
1. Each Duns group should have a primary Duns account.
Every distinct duns:groupid has one record for which duns:groupid = duns.accountid.
2. & 3. Duns groups and establishments should be consistent.
2a. Each active Duns-linked establishment should have a primary Duns account.
Currently (3/6/96),
Groupbu | Means: At Dun & Bradstreet | On our Duns table | We call it
DB | Current | Exists | Duns
DO | No longer exists | Retained | Duns Obsolete
DM | No longer exists | Missing | Duns Missing
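Rule 1 above is concrete enough to check mechanically. The following is a minimal sketch, assuming the Duns table can be read as rows with `groupid` and `accountid` fields. The row layout, function name and sample values are hypothetical and are not the IMT database schema.

```python
from collections import Counter

def groups_without_single_primary(duns_rows):
    """Return group IDs that do not have exactly one primary record.

    A record is treated as primary when groupid == accountid,
    following the wording of Rule 1.
    """
    primaries = Counter()
    groups = set()
    for row in duns_rows:
        groups.add(row["groupid"])
        if row["groupid"] == row["accountid"]:
            primaries[row["groupid"]] += 1
    return sorted(g for g in groups if primaries[g] != 1)

# Made-up rows: group G1 has a primary account, group G2 does not.
duns_rows = [
    {"groupid": "G1", "accountid": "G1"},
    {"groupid": "G1", "accountid": "A2"},
    {"groupid": "G2", "accountid": "A3"},
    {"groupid": "G2", "accountid": "A4"},
]
violations = groups_without_single_primary(duns_rows)
print(violations)
```

Counting the violating groups each quarter gives the kind of error count that can be tracked over time.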

Measure & Reduce Violations
Integrity Rules: Number of Errors
Rule | Shorthand description | 1996q2 | 1996q4 | Change | Summary: how caused | Comment
1 | d groupid/accountid | 0 | 0 | 0 | |
2a. | e:d groupid/accountid | 10,628 | 11,037 | | |
 | plus DM accounts | 551 | 0 | | |
 | is total | 11,179 | 11,037 | -142 | Result of 1996q2. | Duns removed data from their database. Data dropped 96q2.
2b. | e:d groupid/groupid | 16,008 | 16,008 | 0 | Result of 1996q2. | Problem in 1996 q2 Duns update.
3 | d groupid/accountid | 0 | 0 | 0 | |
4a. | Definition: starduns | NA | NA | NA | |
4b. | d starduns groupid | 16,008 | 16,008 | 0 | Result of 1996q2. | Problem in 1996 q2 Duns update.
4c. | e starduns data | 82 | 126 | 44 | Duns input changes. | Okay. Can naturally change as Duns data changes.
5a. | a:e | 93,604 | 40,302 | -53,302 | Process on 1996q2 data. | Process recorded meters immediately removed as dups, not estabs.
5b. | p:a | 3,474 | 4,106 | 632 | Rejected addresses (cum). | Bad input data. Address rejected by match vendor.
5c. | natlaccct:a | 1,582 | 0 | -1,582 | Process. | Completely fixed with new update.
5d. | lease:a | 0 | 4,670 | 4,670 | Rejected addresses (cum). | Process did not fully adjust for new Colonial Pacific data.
5e. | mgmtsvs:a | 17 | 23 | 6 | Rejected addresses (cum). |
5f. | contact:a | 33,663 | 9,112 | -24,551 | Process. | Mal-adjustment of tables.
5g. | custsummary:a | 2,563 | 0 | -2,563 | Process. | Corrected.
6a. | e:a | 0 | 0 | 0 | Process. |
6b. | e:a prime | 147,370 | 1 | -147,369 | Process. |
7a. | e:a not null | 1 | 4,566 | 4,565 | Process on new data sources. | Data was dropped in 1996q2, causing no integrity error then.
7b. | e:a null | 0 | 0 | 0 | Process. |
7c. | CP,FX,ML,PM:a not null | 0 | 15,752 | 15,752 | Process on changed data. | Data was dropped in 1996q2, causing no integrity error then.
7d. | CP,FX,ML,PM:a null | UNK | UNK | | Process. |
7e. | PC,CL:a not null | UNK | 0 | | Process on new data source. |
7f. | PC,CL:a null | UNK | 1,123 | | Process on new data source. | Process did not fully adjust for new Colonial Pacific data.
7g. | MG:a not null | 0 | 0 | 0 | Process on new data source. |
7h. | MG:a null | 0 | 483 | 483 | Process on new data source. | Process did not account for new MG updating.
24 | Total So Far | | | -203,357 | |
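The Change column is simply the 1996q4 count minus the 1996q2 count, left blank where either quarter is NA or UNK. A minimal sketch of that bookkeeping follows, using a handful of rows copied from the table. The helper names and the dictionary layout are assumptions for illustration.

```python
def parse_count(text):
    """Return an int, or None for NA, UNK or blank cells."""
    text = text.replace(",", "").strip()
    if text in {"", "NA", "UNK"}:
        return None
    return int(text)

def change(q2_text, q4_text):
    """Quarter-over-quarter change, or None if either count is unknown."""
    before, after = parse_count(q2_text), parse_count(q4_text)
    if before is None or after is None:
        return None
    return after - before

# A few rows from the table above: rule -> (1996q2, 1996q4).
rows = {
    "2a. total": ("11,179", "11,037"),
    "5a.": ("93,604", "40,302"),
    "6b.": ("147,370", "1"),
    "7d.": ("UNK", "UNK"),
    "7f.": ("UNK", "1,123"),
}
changes = {rule: change(q2, q4) for rule, (q2, q4) in rows.items()}

for rule, delta in changes.items():
    print(rule, "unknown" if delta is None else f"{delta:+,}")
```

Keeping unknown counts as `None` rather than zero avoids reporting an improvement that was never measured.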

Flag “out of range” Values
Leases -- Percent Change
Table | BU | CO | Total | Current | History | Active | Inactive
lease97q1 | CL | L | 0.0% | 0.0% | NA | NA | 0.0%
lease97q1 | CL | L0 | 1.8% | 0.0% | NA | -3.0% | 4.3%
lease97q1 | CL | L1 | 4.8% | 3.3% | 38.0% | 1.8% | 60.4%
lease97q1 | CL | L2 | 0.1% | 0.1% | NA | 0.1% | 0.0%
lease97q1 | CL | L3 | 97.3% | 97.4% | 72.7% | 101.5% | 25.9%
lease97q1 | CL | L4 | 1.2% | 1.0% | 80.0% | -0.3% | 21.5%
lease97q1 | CL | L5 | 1.3% | 1.1% | 21.6% | -1.8% | 14.8%
lease97q1 | CL | L6 | 2.5% | 0.3% | 52.9% | -20.0% | 32.0%
lease97q1 | CL | L7 | 1.7% | 0.6% | 60.8% | -7.5% | 34.7%
lease97q1 | CL | L8 | 1.8% | 0.8% | 39.4% | -5.5% | 19.9%
lease97q1 | CL | L9 | 2.9% | 0.9% | 76.3% | -8.4% | 84.5%
lease97q1 | PC | 10 | 6.8% | 6.8% | NA | 3.5% | 17.8%
lease97q1 | PC | 15 | 7.3% | 7.3% | NA | 3.5% | 16.2%
lease97q1 | PC | 20 | 2.6% | 2.6% | NA | -2.5% | 13.9%
lease97q1 | PC | 30 | 23.2% | 13.5% | NA | -58.8% | 106.9%
lease97q1 | PC | 32 | NA | NA | NA | NA | NA
lease97q1 | PC | 33 | NA | NA | NA | NA | NA
lease97q1 | PC | 34 | NA | NA | NA | NA | NA
lease97q1 | PC | 35 | 2.6% | 2.6% | NA | -96.5% | 299.4%
lease97q1 | PC | 40 | 0.9% | 0.9% | NA | 0.3% | 1.8%
lease97q1 | PC | 50 | 14.7% | 0.2% | NA | -72.4% | 83.0%
lease97q1 | PC | 55 | 0.0% | 0.0% | NA | -100.0% | 255.6%
lease97q1 | PC | 60 | 10.0% | 10.0% | NA | 1.1% | 50.2%
lease97q1 | PC | 65 | 5.7% | 5.6% | NA | -3.7% | 50.6%
lease97q1 | PC | 70 | 0.0% | 0.0% | NA | -100.0% | 1400.0%
lease97q1 | PC | 72 | 0.0% | 0.0% | NA | -100.0% | NA
Total | CL | Total | 6.3% | 5.0% | 44.5% | 1.1% | 37.7%
Total | PC | Total | 7.0% | 6.8% | NA | 2.3% | 21.3%
Total | Total | Total | 6.9% | 6.7% | 77.9% | 2.2% | 22.1%
Looking at counts saves the day
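A percent-change report like this one can be produced and screened automatically. The sketch below flags any category whose count moved by more than a threshold between two loads, and keeps the raw counts next to the percentage so that a huge swing on a tiny base is easy to recognize. The 50% threshold and the counts are invented for illustration. Only the idea of flagging out-of-range changes comes from the slide.

```python
def percent_change(previous, current):
    """Percent change from previous to current, or None if previous is 0."""
    if previous == 0:
        return None
    return (current - previous) / previous * 100.0

def flag_out_of_range(counts, threshold_pct=50.0):
    """Return (code, previous, current, change) for suspicious categories."""
    flagged = []
    for code, (previous, current) in counts.items():
        change = percent_change(previous, current)
        # An undefined change (new category) is also worth a look.
        if change is None or abs(change) > threshold_pct:
            flagged.append((code, previous, current, change))
    return flagged

# Hypothetical record counts per code: (previous load, current load).
counts = {
    "L3": (1000, 1973),   # nearly doubled on a large base
    "L4": (500, 506),     # ordinary drift
    "70": (1, 15),        # 1400% change, but only 14 records
    "72": (0, 4),         # no previous records at all
}
flagged = flag_out_of_range(counts)
for code, previous, current, change in flagged:
    label = "NA" if change is None else f"{change:.1f}%"
    print(code, previous, current, label)
```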

Manually Examine BIG “Leftovers”
Products identified by simple "ACE" as belonging to ABC Investment Corp:
Measure | ACTIVE total | ACTIVE incorrect | ACTIVE %incorrect | ACTIVE corrected total | ALL total | ALL incorrect | ALL %incorrect
Establishments | 84 | 22 | 26.19% | 62 | 97 | 23 | 23.71%
Accounts | 107 | 24 | 22.43% | 83 | 114 | 24 | 21.05%
Products | 531 | 67 | 12.62% | 464 | 792 | 88 | 11.11%
Product $ | $1,658,729 | $65,670 | 3.96% | $1,593,059 | $2,098,254 | $95,052 | 4.53%
Products caught by simple "ACE" as bogus (NID="FDL", wrong Duns Ult):
Measure | ACTIVE PRODUCTS | ALL PRODUCTS
Establishments | 33 | 37
Accounts | 37 | 38
Products | 190 | 301
Product $ | $355,207 | $535,612
Products found by simple "ACE" that were missed by National Account ID (NID):
Measure | ACTIVE PRODUCTS | ALL PRODUCTS
Establishments | 11 | 13
Accounts | 16 | 19
Products | 35 | 54
Product $ | $78,373 | $118,407
Products found by simple "ACE" that were missed by Duns ultimate and/or match:
Measure | ACTIVE PRODUCTS | *Kentucky "Mail Factory" | ALL PRODUCTS
Establishments | 27 | 1 | 31
Accounts | 33 | 3 | 36
Products | 281 | 167 | 420
Product $ | $1,135,701 | $942,829 | $1,347,495
Pareto
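The "Pareto" callout can be made explicit with a little arithmetic on the last table: what share of the active items missed by Duns ultimate and/or match comes from the single Kentucky "Mail Factory" establishment? The figures below are copied from the table. The variable and function names are only illustrative.

```python
# Active products missed by Duns ultimate and/or match: (all, Kentucky).
missed = {
    "establishments": (27, 1),
    "accounts": (33, 3),
    "products": (281, 167),
    "product_dollars": (1_135_701, 942_829),
}

def share_pct(total, part):
    """Part as a percentage of total, rounded to one decimal place."""
    return round(part / total * 100.0, 1)

shares = {name: share_pct(total, part) for name, (total, part) in missed.items()}

for name, pct in shares.items():
    print(f"{name}: {pct}% from the Kentucky establishment")
```

One establishment out of 27 accounts for most of the missed product dollars, which is why examining the biggest leftovers by hand pays off.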

Ensuring Data Quality
Focus on the PROCESS (TQM)
Define Quality Metrics (KPIs)
Use Data Cleansing Tools
– NCOA
– Type “data cleansing” in Google for list
Document everything
Audit regularly
Test, test, test
Who is using? How?
Begins and ends with the CUSTOMER

Information Quality in Practice
Prologue: Poor Data Costs $Billions
The Good: You Can Clean It Up
The Bad: The Cost of Avoidance
The Ugly: The Pogo Effect
Epilogue: What is Data Quality, anyway?

Who’s Cleaning Up?
Data Quality Software Vendors
– IBM (acquired Ascential, which acquired Vality)
– SAS (acquired DataFlux)
– Harte-Hanks (acquired Trillium)
– Firstlogic, Unitech, Innovative Systems
– Similarity Systems (acquired Evoke Software)
Address Matching & Cleansing Vendors
– Pitney Bowes acquired Group 1 (4/05) and Firstlogic (???)
– Plus 100s of service bureaus
Specialty houses
– E.g., Comanage for telecom companies
….and the data is still dirty.

Information Requirements are Relative
Strategic objectives or goals
Who are the clients (THEY)
What THEY need
When they need it
Where they need it
How they need it

Data Quality Programs are Rare
Scope the Effort
– Information Inventory
– “As-is” processes
– Information Priorities
Data Discovery
– Data Description
– Simple Data Checks
– Data Mining
Categorize Data Defects, Develop DQ rules
– Integrity, retention, refresh, reliability
– Classify defects & causes
Define DQ Program, Launch & Track
– Metrics, KPIs

Dealing with DENIAL is Daunting
Expose shoddy business processes
Change business practices
Agree on common definitions, rules, roles
Train employees
Tackle political/cultural issues

Information Quality in Practice
Prologue: Poor Data Costs $Billions
The Good: You Can Clean It Up
The Bad: The Cost of Avoidance
The Ugly: The Pogo Effect
Epilogue: What is Data Quality, anyway?

Sources of Errors
Technical
– Careless calculations
– Poor programming
Process
– Human error
– Negligence
– Intent (policy)
Political

Fix the Basics: Customer Master
KPI
– Cleanse 6.9 million root records
– Eliminate duplicate customer records (est x%)
– Eliminate inactive customer records (est x%)
– Reduce business processes creating incorrect Customer Information
– Populate and interface SAP Customer Master
Target – level 1
– Customer Master live by Dec. 31, 2002
Target – level 3
– Customer Master live by 1Q 03
Actual / Forecast
To-Be Business processes complete
Data quality activities
– 1.6 MM obsolete identified and purged
– 2.3 MM duplicates identified, 325K identified for elimination (Customers confirmed need for 1.95 MM duplicates, based upon current capabilities)
– Cleansed 3.6 M U.S. records (via Finalist, Customer Contact)
– D&B DUNS linkage in process. Identified 577 K duplicates, 2.9 M unique DUNS customers
Analyzed and improving processes which create bad data
– Identified and documented sources of create / update / delete to legacy customer records.
– Removed change authorization from 2,940 employees, primarily Sales, Service, Product Supply, and PBCC New Business Operations
– Identified and corrected 4 significant (and numerous minor) legacy systems problems creating incorrect and/or duplicate customer information
Conversion to SAP environment
– Production environment complete
– 34 interface and conversion development activities
– Customer Master Live (Converted from IMS to SAP) on track for December 6
User Training
– User and Power user training developed
– Power User Boot Camp training completed November 22
– End user training (1,300 users) scheduled for January

Avoiding Errors
Technical
– Error Trapping
– CMM program
Process
– Edit checks
– Training
– Streamlining
Political
– Culture change
“This customer already is in our database.”

Unreliable Cancellation Data Creates a “Lose-Lose-Lose”
I. Suspect cancellations identified
• Audit reports sent to field
• VP, Sales fired
II. Customer Retention
• Executive focus
• The Pogo Effect
III. Fix the Basics
• “Software enhancement”
IV. Order to Cash
• “All fixed for 2005”

Taking I.Q. to the Next Level
Merge/Purge/Address Hygiene no longer good enough
Move from Repair to Correct to Prevent
Organizational Change, Compromise and Accountability impact program budget
How to JUSTIFY $$ when I.Q. is so fuzzy??

Information Quality in Practice
Prologue: Poor Data Costs $Billions
The Good: You Can Clean It Up
The Bad: The Cost of Avoidance
The Ugly: The Pogo Effect
Epilogue: What is Data Quality, anyway?

It’s All About Perception
We’ve had this problem for 20 years.
We know we had this problem for 10 years
Every organization has the problem
We know it will cost to improve it
How much of an improvement can I buy?
What is the ROI?
Can I believe what you tell me?

Wang & Strong ID 179 I.Q. Attributes
Ponniah defines 17
Redman defines 27
Marakas defines 11

Where to Start?
Too many definitions: no clarity
Need to focus!
Most include ACCURACY as one dimension
Even Accuracy is a fuzzy concept
– What are “errors”?
– What are “true” values? “false”? “suspect”?
Can we even measure accuracy “accurately”?

Even the Lexicon of Terms is Fuzzy
Quality >> Accuracy >> Error-free
Direct observation of “errors”
– Subjective
– Unreliable
– Impractical even with moderate size data sets
– High cost
Automated error reports
– Who creates the rules?
– Needs to be audited
– Misses subtleties
– Lower cost
A Major Research Challenge

Find the Errors
You be the JUDGE
Custname | Street | City | State | Zip | Phone
Alec Gomez and Sons | 1 Hyde Park Village | Chicago | Ilinois | 56750 | (312) 299-3111
Bill Able | 191 York Ave, Apt. 19K | New York | New York | 10028 | (212) 333-6666
Kyle Costner | 1993 Michigan Avenue | Chicago | Illinois | 56723 | (212) 423-4441
Joe Diehr | 110 W.90th Street | Chicago | Illinois | 56750 | (312) 299-3333
Sandra Cimino Interiors | 99 Sunset Bld | Solana Beach | California | 92119 | (710) 000-1212
Randy Shay | 00 Bay Shore Drive | San Francisco | California | 92013 | (410) 345-7890
Center Street Catering | Yonkers | New York | New York | 10123 | (914) 449-1919
Gene Mastow and Partners | 155 W. 80 Street | New York | New York | 10028 | (646) 484-4482
George Jenkins, Inc. | 1442 Columbus Avenue | Nee York | New York | 10023 | (212) 422-4102
Blaire Wallace | 60 Cerntal Avenue | Solana Beach | California | 92119 | (710) 000-1414
Ron Johnson Tourss | 000 Marine Drive | Chicago | Illinois | 56700 | (312) 222-9999
Jane Smith | 113 Creative Place | San Francisco | California | 92001 | (410) 355-5555
Richard Green, LLP | 112 W. 87th Street | Chicago | Illinois | 56750 | (312) 111-0000
Cresent Designs | 2 Execution | Suffering | New 7ork | 99999 | (045) 369-6690
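An automated error report for a table like this one might apply a few field-level rules. The sketch below is illustrative only: the rules (a short list of known state names, approximate two-digit ZIP prefixes per state, and the North American numbering convention that area codes and exchanges do not begin with 0 or 1) are assumptions chosen for this example, not the rules used in the study.

```python
import re

# Approximate two-digit ZIP prefixes per state (an assumption; a production
# rule set would use a postal reference file instead).
ZIP_PREFIXES = {
    "Illinois": range(60, 63),
    "New York": range(10, 15),
    "California": range(90, 97),
}
PHONE_PATTERN = re.compile(r"\((\d{3})\) (\d{3})-(\d{4})")

def check_record(state, zip_code, phone):
    """Return a list of rule violations for one customer record."""
    problems = []
    if state not in ZIP_PREFIXES:
        problems.append("unknown state")
    elif int(zip_code[:2]) not in ZIP_PREFIXES[state]:
        problems.append("zip does not fit state")
    match = PHONE_PATTERN.fullmatch(phone)
    if match is None:
        problems.append("malformed phone")
    else:
        area, exchange, _line = match.groups()
        if area[0] in "01":
            problems.append("invalid area code")
        if exchange[0] in "01":
            problems.append("invalid exchange")
    return problems

# Three rows taken from the table above.
print(check_record("Ilinois", "56750", "(312) 299-3111"))
print(check_record("Illinois", "56723", "(212) 423-4441"))
print(check_record("New 7ork", "99999", "(045) 369-6690"))
```

Note what these rules miss: the Kyle Costner row is flagged for its ZIP code, but nothing notices that a 212 area code sits oddly with a Chicago address, and no rule looks at misspelled streets or city names. Someone still has to decide which rules exist and audit what they report.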

The Impact of Context is Clear
[Figure: Transfer Function - Graduates only. Perceived accuracy vs. error count. Linear fit: y = -1.0063x + 22.718, R² = 0.2803]
[Figure: Transfer Function - Undergrads. Perceived accuracy vs. error count. Linear fit: y = -0.2397x + 18.216, R² = 0.03]
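To read the two fitted lines side by side, one can evaluate them at the ends of the error range used in the study (0 and 15 errors). The coefficients below are the ones printed on the charts. The dictionary layout and function name are just a convenient sketch.

```python
# Linear fits from the charts: group -> (slope, intercept).
FITS = {
    "graduates": (-1.0063, 22.718),
    "undergrads": (-0.2397, 18.216),
}

def predicted_accuracy(group, error_count):
    """Perceived accuracy (on the 1-30 rating scale) predicted by the line."""
    slope, intercept = FITS[group]
    return slope * error_count + intercept

for group in FITS:
    low = predicted_accuracy(group, 0)
    high = predicted_accuracy(group, 15)
    print(f"{group}: {low:.1f} at 0 errors, {high:.1f} at 15 errors")
```

The graduate line drops about one rating point per added error, while the undergraduate line is nearly flat. With R² = 0.03, error count accounts for only about 3% of the variance in the undergraduates' ratings, so their line says little either way.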

What about Cognition?
[Figure: Transfer Function - Business Professionals only. Perceived accuracy vs. error count. Linear fit: y = -1.2199x + 24.444, R² = 0.4011. Cubic fit: y = -0.0017x³ + 0.0855x² - 2.1068x + 25.893, R² = 0.4120]

The 3 C’s
[Diagram labels: Cognition, Context, Content; Preference, Perception, Performance; Analytical Aptitude, Functional Experience, SME]

The Data Quality Perception Research Website
http://www.xkimo.com/dqpresearch/
Leon Schwartz
www.informeddecisionsgroup.com
Thank you for your time

Omit the Analysts
[Figure: Transfer Function - Management only. Perceived accuracy vs. error count. Linear fit: y = -0.8056x + 19.809, R² = 0.1732]

Research Design
Samples created with 0-15 errors (17% max)
Samples randomly presented (see website)
Practice session (6 samples)
Respondents asked to rate 16 samples on 1-30 scale (modified Magnitude Estimation)
Double anchors used
63 students (grad & undergrad) attempted

The Simple Task
Please examine the data/report above, and estimate the accuracy of the information by placing your cursor and clicking on the line below:
Left anchor: Error Prone (Too many mistakes to be useful), Low Accuracy
Right anchor: Error Free (No discernable mistakes), High Accuracy
Anchor Study Fiasco!

The Perceptual Transfer Function
[Figure: perceived accuracy (subjective) plotted against number of errors or error rate (objective), both axes scaled 0 to 1, with labels 1, 2 and 3]

All Graduate Students
[Figure: perceived accuracy (0 to 30) vs. error count (0 to 15)]

Business Professionals
[Figure: perceived accuracy (0 to 30) vs. error count (0 to 15)]