Copyright Kenoconnordata.com 2012

Do you know what's in the data you're consuming

Do You Know What's in the Data You're Consuming?

Ken O’Connor – Kenoconnordata.com, 6th Nov 2012

As food consumers, we are provided with facts about the food we’re consuming – it’s the law

Ingredients – the basic facts

Allergy Information – Can mean life or death to some

Nutrition Information – Enables us to make “informed choices” about the food we buy

We don’t all use the food facts given to us, but those who choose or need to control their diet are in a position to do so

We know where food such as beef comes from…

Traceability – Hugely important in restoring confidence in beef following the Mad Cow disease (BSE) crisis

We know that our food has not been tampered with, since it left its “trusted source”

Tamperproof lids and seals – Introduced after the Tylenol poisonings that killed 7 people in Chicago in 1982

Best Before / Use by date

What do you know about the data you depend on?

• Data consumers are seldom provided with facts about the data feeding their critical business processes

• Most data consumers assume the data input to their business processes is “right”, or “OK”.

• They often assume it is the job of the IT function to ensure the data is “right”.

• Almost all data consumers are also data producers – unaware of their role in the data supply chain

In order to trust data; in order to confidently base business decisions on data, I believe…

As data consumers, you and I have the right to expect facts about the data provided to us. We should:

• Know what’s in the data we’re consuming

• Know where it comes from

• Know the quality controls applied to it

What basic facts do you need to know about the data you consume?

Let’s look at a profile of “Customer Date of birth” as an example…

Data Content Facts
Total number of customer records: 2,500,000
Data field name: Date of Birth

Age ranges - based on date of birth

Age Range             Count   Percentage
0-19                200,000        8.00%
20-59             1,800,000       72.00%
60-99               310,000       12.40%
100-119              44,000        1.76%
120-169              20,500        0.82%
170+                    500        0.02%
No Date of birth    125,000        5.00%
Total             2,500,000      100.00%

Could Marketing use this data to target 20 to 59 year olds?

Could this data be used to calculate pension annuities?

Do you spot anything unusual about these dates of birth?

Data that may be fit for one purpose may not be fit for a different purpose

Armed with basic facts – the data consumer can make an informed choice
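A profile like the one above can be produced with a few lines of code. The following is a minimal sketch in Python, not the tool used for the slide. The function names, the as-of date and the extra "Future date" label are illustrative assumptions; the age bands are the ones shown in the profile.

```python
from datetime import date

# Age bands taken from the slide; both bounds are inclusive.
AGE_BANDS = [(0, 19), (20, 59), (60, 99), (100, 119), (120, 169)]


def age_on(dob, as_of):
    """Age in whole years on the as_of date."""
    had_birthday = (as_of.month, as_of.day) >= (dob.month, dob.day)
    return as_of.year - dob.year - (0 if had_birthday else 1)


def band_label(dob, as_of):
    """Map one date of birth (or None) to a label in the profile."""
    if dob is None:
        return "No Date of birth"
    age = age_on(dob, as_of)
    for low, high in AGE_BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    # "Future date" is a hypothetical extra label, not on the slide.
    return "170+" if age >= 170 else "Future date"


def profile_dates_of_birth(dobs, as_of):
    """Return {label: (count, percentage)} for a list of dates of birth."""
    counts = {}
    for dob in dobs:
        label = band_label(dob, as_of)
        counts[label] = counts.get(label, 0) + 1
    total = len(dobs)
    return {label: (n, round(100.0 * n / total, 2)) for label, n in counts.items()}
```

For example, `profile_dates_of_birth(customer_dobs, date(2012, 11, 6))` would return the count and percentage per age band for a hypothetical list `customer_dobs`, where missing dates of birth are represented as `None`.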

Data profiling helps – but does it provide the facts we need?

1. Accuracy? No – we cannot tell if the dates of birth are accurate

2. Completeness? Yes – 95% complete

3. Validity? Perhaps valid dates – but could a customer be 170+ years old?

4. Timeliness? No – No indication of the currency of the data

5. Consistency? No – No indication
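The completeness figure, and the share of records that raise the validity question, can be derived directly from the profile counts. A minimal sketch in Python follows. The counts are copied from the Date of Birth profile; the function names and the choice to treat ages of 120 or more as implausible are assumptions made for illustration.

```python
# Counts copied from the "Date of Birth" profile.
PROFILE = {
    "0-19": 200_000,
    "20-59": 1_800_000,
    "60-99": 310_000,
    "100-119": 44_000,
    "120-169": 20_500,
    "170+": 500,
    "No Date of birth": 125_000,
}

# Assumption: ages of 120 or more are treated as implausible.
IMPLAUSIBLE_BANDS = ("120-169", "170+")


def completeness_pct(profile):
    """Percentage of records that have a date of birth at all."""
    total = sum(profile.values())
    populated = total - profile["No Date of birth"]
    return round(100.0 * populated / total, 2)


def implausible_pct(profile):
    """Percentage of records whose date of birth implies an implausible age."""
    total = sum(profile.values())
    implausible = sum(profile[band] for band in IMPLAUSIBLE_BANDS)
    return round(100.0 * implausible / total, 2)
```

Neither figure says anything about accuracy, timeliness or consistency; those need facts that a single-field profile does not contain.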

Data content facts add a “smell” to data defects

One thing worse than a square peg not fitting in a round hole… a square peg that does fit in a round hole. It fits, but it’s not “fit for purpose”.

Data defects are not like software bugs – they seldom cause a system to fail.

Data defects are more like natural gas:
• Colourless
• Odourless
• Potentially deadly

Where does your data come from?

Nicola Askham wrote an excellent blog post recently about “The data faeries” – does this sound familiar to you?

• Team A: Our data is loaded up by IT
• IT: No we don't touch that data, it's a manual data load by Team B
• Team B: We just send the spreadsheet to Team A - we're sure that they load the data
• Team A: No we really don't load up that data…

• Most people don’t know where their data comes from
• They assume it is always there, and is “OK”
• Too few are aware of their role in the data supply chain

The FSA expects you to know where your data comes from… “data provenance”

http://www.dmsg.bcs.org/web/images/stories/2012-03-29-dean-buckner.pdf

The FSA expects you to understand how your data is transformed…

http://www.dmsg.bcs.org/web/images/stories/2012-03-29-dean-buckner.pdf

But don’t sweat the small stuff – the FSA advice is to focus on your most critical data

Where does your data come from?

Data Provenance / Traceability / Lineage – The “bucket brigade” model

Imagine if someone in the “bucket brigade” chain:
• Thought the water was for him and drank it
• Used the water on his garden
• Turned off the tap
• Started the fire deliberately…

The brigade is useless if the bucket is empty when it reaches the fire

Turn your data supply chain into a “bucket brigade”

Everyone must understand:
• Why the data is ultimately required
• The importance of their role & their dependence on others
• Where they get their data from and who they provide it to
• What the data should contain and what it does contain
• If the data is not right – they should raise a data defect!

Where to start…

Learn from the Chilean mine rescue…

Trace a single critical data element end to end through your data supply chain – this will highlight challenges to overcome:

• How do we assign data ownership?
• How do we agree data definitions?
• How do we specify business rules?
• How do we measure data quality?
• How do we govern the above?

You know what’s in the data and where it comes from… now what do you do with it?

http://www.clusterseven.com/external-research/2010/7/20/spreadsheets-and-solvency-ii-financial-services-authority-uk.html

Apply appropriate controls to your spreadsheets

All industries have critical data…

• Health
• Pharmaceutical
• Banking
• Insurance
• Aviation
• …

Data consumers in all industries need to know:
• What’s in the data they’re consuming
• Where it comes from
• What quality controls have been applied to it

The food industry reacted to crises…

• Tylenol poisonings
• Mad Cow disease (BSE) crisis

Regulators are reacting to the 2008 financial crisis…

They increasingly expect evidence that:
- You can trust your data
- They can trust your data

Solvency II, Basel III, Dodd-Frank, UCITS, MiFID II, CRD IV...

A perfect storm - a Frankenstorm of regulation…
- all expecting evidence of data provenance
- all expecting evidence of a DQ management process

JFK quoted George Bernard Shaw…

Other people, he said, "see things and . . . say 'Why?' . . . But I dream things that never were -- and I say: 'Why not?'"

John F Kennedy – Address before the Irish Parliament, June 28th 1963

http://www.jfklibrary.org/Research/Ready-Reference/JFK-Speeches/Address-Before-the-Irish-Parliament-June-28-1963.aspx

I dream of a time…
- When all critical data is accompanied by facts about that data (Data Quality Information / provenance).
- When we will look back on the days when data consumers had few facts about the data they were consuming – and regulators tolerated it.

and I say: 'Why not now?'

Visit www.clearinformation.org - a good place to start

Your new approach to data…

When you return to your office, I would like you to start asserting your rights

• Ask for facts about the data provided to you
• Provide facts about the data you provide to others

The norm must become “Here is the data, and here are the facts (Data Quality Information / provenance) about the data”
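One way to picture this norm is a hand-off where the data never travels without its facts. The sketch below is an illustration in Python only. The `DataFacts` structure, the `describe` and `deliver` functions, and the choice of facts (source, record count, completeness) are all hypothetical; real Data Quality Information would carry whatever facts the consumer needs.

```python
from dataclasses import dataclass, asdict


@dataclass
class DataFacts:
    """Hypothetical minimal set of facts that accompany a data delivery."""
    source: str              # where the data came from (provenance)
    field_name: str          # the critical data element being described
    record_count: int
    populated_count: int
    completeness_pct: float


def describe(records, field_name, source):
    """Derive basic content facts for one field of a list of dict records."""
    populated = sum(1 for r in records if r.get(field_name) not in (None, ""))
    total = len(records)
    pct = round(100.0 * populated / total, 2) if total else 0.0
    return DataFacts(source, field_name, total, populated, pct)


def deliver(records, field_name, source):
    """Hand over the data together with the facts about it."""
    facts = describe(records, field_name, source)
    return {"data": records, "facts": asdict(facts)}
```

A consumer receiving the result of `deliver(...)` can read the `facts` entry before deciding whether the data is fit for their purpose.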

Ken O’Connor

Email: [email protected]: Kenoconnordata
LinkedIn: ie.linkedin.com/in/kenoconnor00

Ken O'Connor is an independent data consultant with over 30 years of hands-on experience in the field. Ken specialises in helping organisations meet the data quality management challenges presented by data-intensive programmes such as data conversions, data migrations and data population, and by regulatory programmes such as Solvency II, Basel II / III, Single Customer View and Anti-Money Laundering. Ken provides practical data quality and data governance advice at his popular blog: http://kenoconnordata.com