Data in databases “It’s not what you think” Clare Somerville Trish O’Kane

Page 1: Clare Somerville Trish O’Kane Data in Databases

Data in databases

“It’s not what you think”

Clare Somerville, Trish O’Kane


Our point

Long-term preservation of data requires understanding how data is created and managed

We have to work out:
◦ What data the business needs to keep
◦ What records the business needs to create and keep

And… how
◦ What data must be unchanged
◦ What we mean by usable and retrievable


We will cover

The problem, as we see it
What is a record, and what are its attributes?
What is a database, and how are databases built and maintained?
How can we use data sets to create records?
What is a data warehouse, and how are data warehouses built and maintained?
How can we ensure that useful data sets are available over time?


Agenda

The problem

Definitions

Delivering data & records from data
◦ Data warehousing
◦ Data “lifecycle” management

Conclusion


The problem

Databases have replaced many semi-structured records
◦ Register of Births, Deaths and Marriages (and Divorces!)
◦ EQC claims data

But we want some of that information available long term in a usable format

Records managers are unfamiliar with the world of structured data
◦ Disposal outcome in a draft disposal authority: “When database decommissioned, transfer to Archives NZ”
◦ Transfer what?

(Diagram: Source solution)


Who wants what?

What have we got?
◦ Data in databases

What do we want?
◦ Records

When do we want them?
◦ Now, and for the long term

But… what is a record in the context of data?
◦ The individual data item?
◦ A whole dataset?

(Diagram labels: Slice and dice here; Maintain metadata here; Business users?; Broader audience)


What have we got

1. Customers
◦ Customers for data
◦ Customers for records

2. Information assets
◦ Records
◦ Transactional data in databases
◦ Datasets
◦ Data marts and data warehouses

3. What do we have to do?
◦ Principles from data warehousing
◦ Data life cycle management


Definitions

Records, metadata, data, source systems, database, data warehouse


Records

Recordkeeping definition (Public Records Act 2005)
A record or class of records in any form in whole or part, created or received by a public office in the conduct of its affairs

In structured world
A record is a line of data in a table in a database


Attributes of a record

Recordkeeping perspective
◦ Documents the carrying out of the organisation’s business objectives, core business functions, services and deliverables, and/or
◦ Provides evidence of compliance with any current jurisdictional standards, and/or
◦ Documents the value of the resources of the organisation and how risks to the business are managed, and/or
◦ Supports the long-term viability of the organisation

Data management perspective
Field types
◦ Numeric
◦ Character
◦ Date/time
Composite, derived
Values


Data and metadata
Documents and metadata

“Essentially there is a different relationship between data and its metadata than documents and their metadata”


Is it data or is it metadata?

It depends, doesn’t it?
It’s about the level at which it is used/applied
E.g. Date created

Customer ID  Date created  Customer name  Customer Type
123          2008-10-20    Bloggs, Joe    Retailer
124          2008-10-23    Mouse, Minnie  Distributor
125          2008-10-26    Max, Metadata  Direct

(Callout: date created)
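
The point about "level" can be sketched in a few lines of Python. The customer rows are the ones from the slide; the `TableInfo` wrapper, the table name and the table-level creation date are made up for illustration. A creation date attached to a customer row is data; a creation date attached to the table as a whole is metadata.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass
class CustomerRow:
    customer_id: int
    date_created: date   # data: describes one customer
    customer_name: str
    customer_type: str


@dataclass
class TableInfo:
    table_name: str      # hypothetical name
    date_created: date   # metadata: describes the table itself (hypothetical value)
    rows: List[CustomerRow]


customer_table = TableInfo(
    table_name="customer",
    date_created=date(2008, 10, 1),
    rows=[
        CustomerRow(123, date(2008, 10, 20), "Bloggs, Joe", "Retailer"),
        CustomerRow(124, date(2008, 10, 23), "Mouse, Minnie", "Distributor"),
        CustomerRow(125, date(2008, 10, 26), "Max, Metadata", "Direct"),
    ],
)

for row in customer_table.rows:
    print(row.customer_id, "created", row.date_created, "(data)")
print(customer_table.table_name, "created", customer_table.date_created, "(metadata)")
```

Both fields have the same name and type; only the thing they describe differs.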


Metadata in the data warehouse

Business metadata
◦ Link between database and users – road map for access
◦ Business users
◦ Analysts
◦ Less technical

Technical metadata
◦ What data, from where, how, when etc
◦ Developers
◦ Technical users
◦ Maintenance and growth
◦ On-going development


Metadata in the data warehouse

Business metadata
◦ Structure of data
◦ Table names
◦ Attribute names
◦ Location
◦ Access
◦ Reliability
◦ Summarisations
◦ Business rules

Technical metadata
◦ Table names
◦ Keys
◦ Indexes
◦ Program names
◦ Job dependencies
◦ Transformation
◦ Execution time
◦ Audit, security controls


Metadata

Data Metadata

10 bytes 1 byte


Metadata

Data Metadata

Heaps!


Data – comma delimited

0349,000,A," ","CHANGE ADD ON MED CERT "," "," "," ","","S","GASUP","",00000,71909,00000,0,71909,10393470,00000.00,00000.00,00000.00,00000.00,00000.00,000000,71937,72266,0,139,600,4,72266,471,360480713,000000000,1,00090.00,00037.00,000031543560",00000.00,00000.00,+000000.00,0000000,0000,000,00,000000000,00000,00000,000000.00,000000.00,000000000,009,72266,00000,72268,16414213,000000001,000000000,244,0114340511,04,01,+000000.00,+000000.00,00000,000000,+000000.00,610,0,00146.13,000000.00,000,000,610,0,290763901,290763901,000000000,000000000,0000699873780174,000,D,"C","N","N","Y","Y","Y","N","N","3349533755","Y","T REED","Y","DSWSINVE106 ","BELOQ","Y","NAWEK","TANIA","REED","C","N",02651,009,0000,72273,72268,16405202,0114340511,03,72245,0000,003,011434,0000002288550174,000,A,"C","N","N","Y","Y","Y","N","N","3349533755","Y","T REED","Y","DSWSINVE106 ","BELOQ","Y","NAWEK","TANIA","REED","C","N",02651,009,0000,72273,72268,16405202,0114340511,04,72245,0000,003,011434,0000002288550161,000,D,"A",126,72263,00000.00,600,5,360480713,0000072827280161,000,A,"A",126,72263,00000.00,600,5,360480713,0000072827280057,000,D," "," "," "," "," ","A","","","AHMEV","VOKOG",000000003,0814409,2500,001,25,00,00000.00,000000,00,132,00000,0,+00063.00,72266,14133031,00000,00000.00,2,+00063.00,01,00000.00,000000,2,0,0,00000.00,607,1,471,362400470,000000000,0004094132990057,000,A," ","MANOP "," "," "," ","A","","","AHMEV","VOKOG",000000003,0814409,2500,001,25,00,00000.00,000000,00,132,72269,0,+00063.00,72266,14133031,00000,00000.00,2,+00063.00,01,00000.00,000000,2,0,0,00000.00,607,1,471,362400470,000000000,0004094132990270,000,A," "," ","","N","N","G",128,72266,72268,16414261,01,00000.000,00000.000,0,139,00000.00,00000.00,00000.00,00000.00,00000.00,00000.00,00000.00,00000.00,00000.00,600,5,471,000,000,000,000,000,000,000,000,0001,360480713,00000.00,0006025374450062,000,A,"YYYYYYYYYY ","AUTH01532600063000000000131197N014101 0000000","VA","SATRA","DSWSAUCK119 "," 
","MANOP","",003,132,72268,16414266,0000000000,0,607,362400470,000000000,000084800530


Data – in a table
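
To illustrate the step from comma-delimited values to a table, here is a minimal Python sketch using the standard `csv` module. It does not attempt to decode the real extract above, whose layout is unknown; the sample rows reuse the customer example from earlier in the deck, and the column names are assumptions.

```python
import csv
import io

# Hypothetical comma-delimited extract with no header line.
raw = (
    '123,2008-10-20,"Bloggs, Joe",Retailer\n'
    '124,2008-10-23,"Mouse, Minnie",Distributor\n'
)

# Without metadata: only positions and values.
rows = list(csv.reader(io.StringIO(raw)))
print(rows[0])

# With metadata (column names held separately from the data),
# each value becomes interpretable.
columns = ["customer_id", "date_created", "customer_name", "customer_type"]
records = list(csv.DictReader(io.StringIO(raw), fieldnames=columns))
for record in records:
    print(record["customer_id"], record["customer_name"], record["customer_type"])
```

The data is identical in both cases; what changes is whether the description of the columns travels with it.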


Database


3 layers

Database

• User interface
• Rules and algorithms
• Data


Source solution database

Application layer
◦ Adds, overwrites, deletes data
◦ Runs rules and processes
◦ Provides views, creates reports
◦ Turns data into information

Data layer
◦ Data in tables
◦ Acted on by application layer


Can data fit the PRA definition?

• We are “format neutral” in the management of records, so…
• Data can be records!
– Births Deaths and Marriages Register
– EQC claims data
• Test questions
– If we exclude data what have we lost?
– What is the impact of losing data?
  • On the business
  • For the future


Application layer

Data layer

The Solution System is not a recordkeeping system because it…

• Holds transactional data, not evidence of transactions in context (records)
• Isn’t tamper proof
– Difficult to know exactly what the application layer is doing
– Different tables and rows may be managed differently
– Hard to roll back to a point in time
• Must overwrite ‘redundant’ data to run efficiently
– Compromise of history vs speed
– Business use is the priority
• The data layer is not usable without the application layer

Source solution is not a recordkeeping system


Inside a database

Here today - gone tomorrow

Transaction metadata
◦ Example: An activity about a customer is a record

Is there a Unique ID
◦ For the transaction?
◦ For the customer?

Where and when are/were components located?
◦ Multiple data tables in one database
◦ Multiple data tables across multiple databases

Table names and column names
◦ Standard names for elements across tables


Source / business databases

Data stored in tables
Normalised structure
Lots of data
Large number of users
Lots of very quick transactions
Varying history retained
Mostly data is overwritten
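
The difference between overwriting data and retaining history can be sketched with Python's built-in `sqlite3` module. This is an illustration only, not how any particular source system works: the table names, the `valid_from` column and the change dates are all hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Operational table: one row per customer, overwritten in place.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, customer_type TEXT)")
# History table: one row per change, never overwritten.
conn.execute(
    "CREATE TABLE customer_history (id INTEGER, customer_type TEXT, valid_from TEXT)"
)


def overwrite(customer_id, customer_type):
    """Typical source-system behaviour: the previous value is lost."""
    conn.execute(
        "INSERT OR REPLACE INTO customer VALUES (?, ?)", (customer_id, customer_type)
    )


def append_history(customer_id, customer_type, valid_from):
    """History-keeping behaviour: every change is kept with its start date."""
    conn.execute(
        "INSERT INTO customer_history VALUES (?, ?, ?)",
        (customer_id, customer_type, valid_from),
    )


def type_as_at(customer_id, as_at):
    """Return the customer type that applied on a given ISO date, or None."""
    row = conn.execute(
        "SELECT customer_type FROM customer_history "
        "WHERE id = ? AND valid_from <= ? ORDER BY valid_from DESC LIMIT 1",
        (customer_id, as_at),
    ).fetchone()
    return row[0] if row else None


# Hypothetical change dates for customer 123.
for when, ctype in [("2008-10-20", "Retailer"), ("2009-03-01", "Distributor")]:
    overwrite(123, ctype)
    append_history(123, ctype, when)

current = conn.execute(
    "SELECT customer_type FROM customer WHERE id = 123"
).fetchone()[0]
print("current:", current)
print("as at 2008-12-31:", type_as_at(123, "2008-12-31"))
```

The overwritten table can only answer "what is the value now?"; the history table can also answer "what was the value then?", which is the question a record usually has to answer.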


Data warehouse


Data warehouse

Storing and accessing large amounts of data

Central repository for all or significant parts of the data that an enterprise’s various business systems collect


Data warehouse

Corporate needs

Centrally owned

Corporate effort

Transaction level data

Historical data

Lots of data!

Multiple source systems

Designed for reporting and analysis

Large queries

Multiple table joins

Unpredictable use

Pressure on resources


What is the simplest/most robust approach to deliver data and records from databases?


Elegant solutions needed


1 Create policy to document:
◦ What authoritative records must be retained and what metadata must be retained
◦ What formats are acceptable
◦ Which (if any) records and metadata are considered transient artefacts, and why (e.g. format shifting duplicates, quality checking etc)

Get approval for destruction of transient artefacts as part of the normal functioning of the systems that dispose of them


Approach: create and export records from solution system

1. Identify which data tables/records are needed and can be produced

2. Map identified records to disposal authorities
◦ Which records must be kept beyond system decommission
◦ Identify the business need for retention

3. Use the application layer to create and export those records in a suitable format

4. Store in recordkeeping system e.g. data warehouse or EDRMS

5. Retain records needed for the business post-decommission
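
Steps 3 and 4 can be sketched as an export package in which the records travel together with the metadata needed to interpret and verify them. This is a minimal Python sketch under stated assumptions: the rows are taken to have already been retrieved through the application layer, and the function names, metadata fields and all values (system name, timestamp, disposal class) are hypothetical placeholders.

```python
import hashlib
import json


def _canonical(records):
    """Serialise records in a stable form so the checksum is repeatable."""
    return json.dumps(records, sort_keys=True, ensure_ascii=False).encode("utf-8")


def export_records(rows, columns, source_system, exported_at, disposal_class):
    """Bundle rows with descriptive metadata and a checksum (hypothetical layout)."""
    records = [dict(zip(columns, row)) for row in rows]
    return {
        "metadata": {
            "source_system": source_system,
            "exported_at": exported_at,
            "disposal_class": disposal_class,
            "columns": list(columns),
            "record_count": len(records),
            "sha256": hashlib.sha256(_canonical(records)).hexdigest(),
        },
        "records": records,
    }


def verify(package):
    """Check that the records still match the checksum taken at export time."""
    expected = package["metadata"]["sha256"]
    return hashlib.sha256(_canonical(package["records"])).hexdigest() == expected


package = export_records(
    rows=[(123, "2008-10-20", "Bloggs, Joe", "Retailer")],
    columns=["customer_id", "date_created", "customer_name", "customer_type"],
    source_system="customer mgmt system",   # placeholder
    exported_at="2014-01-01T00:00:00Z",     # placeholder
    disposal_class="example-class",         # placeholder
)
stored = json.dumps(package)                # what would go to the recordkeeping system
print(verify(json.loads(stored)))
```

Keeping the column names, source and checksum inside the same package is one simple way to keep metadata associated with the records after the source system has gone.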


2 Persistently associate metadata

Appropriate metadata associated and retained with authoritative records
◦ Identify data linkages between systems
◦ Retain those linkages or
◦ Consolidate metadata and associated record objects into one system, and ensure they are persistently associated

Ensure migrated data/metadata/objects retain their context (e.g. date created, author etc)


Case mgmt system

EDRMS

Customer mgmt system

Future state BAU transfers to recordkeeping systems

Create key records and send to EDRMS

Structured data to data warehouse


Data warehouses as an example of good practice


Managing data


Data feeds - principles

Direct data feeds from source systems
Not changed in any way
No intervening processes
All changes to the data
Fully auditable
Reconcile to source system
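
"Reconcile to source system" can be illustrated with a simple check that the rows loaded into the warehouse match the rows sent by the source. This is a minimal Python sketch, not a description of any specific warehouse tool: the function names are hypothetical, and it compares a row count plus an order-independent checksum.

```python
import hashlib


def fingerprint(rows):
    """Return (row count, checksum) for a list of rows, independent of row order."""
    digests = sorted(
        hashlib.sha256(repr(tuple(row)).encode("utf-8")).hexdigest() for row in rows
    )
    combined = hashlib.sha256("".join(digests).encode("ascii")).hexdigest()
    return len(digests), combined


def reconcile(source_rows, loaded_rows):
    """True when the load contains exactly the rows the source supplied."""
    return fingerprint(source_rows) == fingerprint(loaded_rows)


source = [(123, "Retailer"), (124, "Distributor"), (125, "Direct")]
loaded = [(125, "Direct"), (123, "Retailer"), (124, "Distributor")]
print(reconcile(source, loaded))       # same rows, different order
print(reconcile(source, loaded[:2]))   # a row went missing in the feed
```

A check like this is only evidence that nothing changed between extract and load; the audit trail of when each feed ran and what it reconciled to still has to be kept separately.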


For example: one table…

Before:
◦ 29 months data
◦ 162 tapes
◦ 400 million records
◦ 88 GB

After:
◦ 29 months data
◦ 4 physical files
◦ 27 million records
◦ 6 GB

(Diagram: Month 1 is compared with Month 2 to give Differences 1, Month 2 with Month 3 to give Differences 2, and so on to Month n; the differences are combined into a consolidated file)
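
The compare-and-consolidate idea in the diagram can be sketched as follows. This is an assumption about the general technique (store the first month, then only what changed), not the authors' actual process; the function names, the month labels and the tiny sample snapshots are hypothetical.

```python
def diff_snapshots(previous, current):
    """Compare two snapshots (dicts keyed by record ID) and list the changes."""
    changes = []
    for key, row in current.items():
        if key not in previous:
            changes.append(("insert", key, row))
        elif previous[key] != row:
            changes.append(("update", key, row))
    for key in previous:
        if key not in current:
            changes.append(("delete", key, None))
    return changes


def consolidate(snapshots):
    """Turn a list of (month, snapshot) pairs into one list of dated changes."""
    consolidated = []
    previous = {}
    for month, snapshot in snapshots:
        for action, key, row in diff_snapshots(previous, snapshot):
            consolidated.append((month, action, key, row))
        previous = snapshot
    return consolidated


def rebuild(consolidated, as_at_month):
    """Recreate the snapshot for a given month from the consolidated changes."""
    state = {}
    for month, action, key, row in consolidated:
        if month > as_at_month:
            break
        if action == "delete":
            state.pop(key, None)
        else:
            state[key] = row
    return state


snapshots = [
    ("2008-10", {1: ("Retailer",), 2: ("Distributor",)}),
    ("2008-11", {1: ("Retailer",), 2: ("Direct",), 3: ("Retailer",)}),
    ("2008-12", {1: ("Retailer",), 3: ("Retailer",)}),
]
consolidated = consolidate(snapshots)
print(len(consolidated), "change rows instead of",
      sum(len(s) for _, s in snapshots), "snapshot rows")
print(rebuild(consolidated, "2008-11"))
```

Because most rows do not change from month to month, the consolidated file is much smaller than the full snapshots, yet any month can still be rebuilt from it.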


Subsets

Frequently used data
At a point in time
Smaller, quicker
Easier to use
Daily, weekly, monthly


Summary layer

Analysts access the summary layer
Smaller, easier
Data Marts

Summary data
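
As an illustration of what a summary layer holds, the sketch below rolls transaction-level rows up to one row per month and customer type. The field names (`date`, `customer_type`, `amount`) and the sample transactions are hypothetical; a real data mart would normally be built with the warehouse's own tooling rather than hand-written Python.

```python
from collections import defaultdict


def summarise(transactions):
    """Aggregate transactions to (month, customer type) with a count and total."""
    summary = defaultdict(lambda: {"count": 0, "total": 0})
    for txn in transactions:
        key = (txn["date"][:7], txn["customer_type"])   # "YYYY-MM" from an ISO date
        summary[key]["count"] += 1
        summary[key]["total"] += txn["amount"]
    return dict(summary)


transactions = [
    {"date": "2008-10-20", "customer_type": "Retailer", "amount": 100},
    {"date": "2008-10-23", "customer_type": "Retailer", "amount": 50},
    {"date": "2008-10-26", "customer_type": "Direct", "amount": 25},
    {"date": "2008-11-02", "customer_type": "Retailer", "amount": 10},
]
for key, values in sorted(summarise(transactions).items()):
    print(key, values)
```

The summary is smaller and easier to query, but it is derived data: the transaction-level detail it was built from still has to be kept if the detail is the record.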


Benefits of data warehouse

One version of the truth

Tuned environment

Can do more – freedom to explore

Full history – track everything

Updated daily

Multiple sources of data

Quick and easy to access

Stored online

Accessible


Data management

Data does not manage itself!
Difficult, unruly
Standards, processes
Roles and responsibilities
Data warehouse team
Skills
◦ Data warehousing, Data management, Software, Hardware, Metadata, Architecture, Analysis, Performance, tuning
Coordination, communication, marketing


Best practice

Data warehousing has been around for years
Proven architectures, technologies, methodologies
Good infrastructure

… but will it last?


Challenges – big data

33% - data growth contributes to performance issues “most of the time”

Managing storage may cost 3-10 times cost of procurement

Average company keeps 20-40 duplicates of its data


Helping IT and the business to collaborate in managing data

It’s not just about BI

Business and IT must work together


Data “lifecycle” management


Old EDRMS

New EDRMS

Old case mgmt system

New case mgmt system

Data warehouse

Decommission = risk

Partial exports


Data lifecycle management

Data lifecycle management (DLM)
Managing the flow of data, information and associated metadata through information systems and repositories, from creation and storage through to when it can be discarded.

Recognises that the importance and business value of data does not rely on its age, or how often it is used.


Why DLM

Data and information have value for
◦ strategic and operational business needs
◦ managing risk
◦ meeting legislative obligations

Value of information decays over time
Some information can be archived, some discarded
Occasionally, sometimes unexpectedly, older data may need to be accessed again, quickly, completely and accurately


DLM Components

Maintain
◦ Organise
◦ Describe
◦ Manage

Retain or Dispose
◦ Archive
◦ Transfer
◦ Destroy

Use
◦ Access
◦ Share
◦ Find

Create or Modify
◦ Standards
◦ Formats
◦ Retrieval

Property

Customer

Tenancy

Requires:
◦ Core process artefacts
◦ Connected systems
◦ Automated capture

Requires:
◦ Risk identification
◦ Lifecycle policies
◦ Metadata schema
◦ Business classification linked to business process

Requires:
◦ Single source of truth
◦ Disposal Authorities
◦ Disposal Planning
◦ Tiered Storage

Requires:
◦ Disposal Authorities
◦ Business requirements
◦ Disposal planning
◦ Tiered Storage

Includes data validation


Conclusion


Create and maintain

Principle 1: Recordkeeping Must be Planned and Implemented
1. Responsibility assigned CEO down
2. Policy
3. Procedures
4. Responsibilities defined, resourced
5. Recordkeeping programme & monitoring


Principle 2: Full & accurate records of business activity must be made

Requirement    Database    Data Warehouse

1. Functions and business activities identified and documented

2. Records of business decisions and transactions must be created

3. All records of business activity captured routinely into an organisation-wide recordkeeping framework

4. Training provided


Principle 3: records must provide authoritative and reliable evidence of business activity

Requirement    Database    Data Warehouse

10. Authentic: accurately documented creation, receipt, & transmission
11. Reliability & integrity, maintained unaltered
12. Useable, retrievable, accessible
13. Complete, with content & contextual information
14. Comprehensive, provide authoritative evidence of all business activities


Principle 4: records must be managed systematically

Requirement    Database    Data Warehouse

15. Identified & captured in recordkeeping framework
16. Organised according to a business classification scheme
17. Reliably maintained over time in recordkeeping framework
18. Useable, accessible & retrievable for the entire period of their retention
19. Contextual and structural integrity maintained over time
20. Retention & disposal actions systematic


RK capability of system(s)

A system that holds authoritative records
◦ Must be capable of recordkeeping, or
◦ Made capable, or
◦ Must transfer records to a recordkeeping system

Who makes that decision?
◦ Should be business owner
◦ (with advice from IT)

Data warehouses show us
◦ what can be done
◦ how to do it


Developing an Enterprise Information Management Framework

STRUCTURED AND UNSTRUCTURED INFORMATION

GOVERNANCE

INFORMATION ASSET ARCHITECTURE

METADATA MANAGEMENT

SECURITY AND CONTROL

INFORMATION CULTURE

INFORMATION STEWARDSHIP

BUSINESS INTELLIGENCE AND DATA WAREHOUSING

REFERENCE AND MASTER DATA MANAGEMENT

Authority, management, monitoring and performance of information management functions

A blueprint for the semantic and physical integration of enterprise information assets, technology and the business

The connecting foundation for EIM, used to describe, organise, integrate, share, and govern enterprise information assets

Develop:
- Metadata Schema
- Controlled Vocabulary
- Thesauri
- Business Function Classification
Utilise system generated metadata

Map across metadata schemas
Establish monitoring and maintenance processes
Implement metadata management tools

Establish principles
Define:
- Policies
- Standards
- Business Rules

Develop a strategy and roadmap
Establish structures and arrangements
Define roles and processes arrangements

Assess current and desired maturity
Determine metrics and measuring
Establish monitoring processes

Document legislative framework
Understand compliance
Determine and optimise business benefits
Manage information risk

Organise information for:
- Navigation and retrieval
- Discovery
- Content types and categorisation

Model key information flows
Establish IS design principles and standards
Develop an inventory of information, systems and processes

Develop a recordkeeping strategy and roadmap
Enable compliant retention and disposal in systems
Support access to legacy information
Plan for any content migration

Develop an information lifecycle strategy and roadmap
Enable integration and interoperability
Plan and manage:
- Repositories
- Storage
- Format

Policies, rules and tools that ensure the proper control, protection and privacy of information

Manage access control
Manage classified information
Ensure regulatory compliance
Establish monitoring and metrics

Identify:
- Authoritative information
- High-value information
- Critical information
Plan for disaster recovery

Establish security policies and rules
Model information security and scenarios
Build security into system metadata

Store and transform
Integrate and deliver
Perform analytics and reporting
Support decision making

Capture, store and re-use core business entities
Consolidate and match data
Manage and control data quality
Distribute core data appropriately

The behaviours, values and norms of the enterprise within the context of information use

Oversight of the content, description, quality, and accuracy of enterprise information throughout its lifecycle

Manage and sustain change
Provide information leadership
Embed EIM in performance management
Deliver training and ongoing support
Develop toolkits and reference material

Define responsibility, roles and accountability
Establish stewardship processes
Establish monitoring and maintenance

Social

Documents

IT/OT Transactional Data

Search

Emails

Images

Audio

Text

Mobile

Movies


Future state of data

Accurate, relevant, timely delivery of data and information
◦ Trustworthy information
◦ Where it is needed
◦ Formats most appropriate to business need and future

Information found quickly, whether it’s old or new

Clear guidelines for systems and processes
◦ Keep what’s needed for only as long as it’s needed
◦ In the right format

Data has recognisable value and appropriate levels of management
◦ Business need: we know what’s important, and when it’s important
◦ Risk: we’re clear about what to manage, and how
◦ Regulatory framework: we meet legislative obligations


Our point

Long-term preservation of data requires understanding how data is created and managed

We have to work out:
◦ What data the business needs to keep
◦ What records the business needs to create and keep

And… how
◦ What data must be unchanged
◦ What we mean by usable and retrievable
