Be Certain, Be Trillium Certain
The Changing Data Quality & Data Governance Landscape: a survival guide for data governance & data quality professionals
Trillium Software webinar – Wednesday 12 December
Nigel Turner, VP Information Management Strategy
The traditional DQ & Data Governance Landscape?
2 © Copyright 2012, Trillium Software, Inc. All rights reserved.
The future DQ & Data Governance Landscape?
The changing landscape:
potential disruptive eruptions
BIG DATA
CLOUD COMPUTING
DATA VIRTUALIZATION
Disruptive eruption 1 –
Big Data
Big Data – what is it?
• Set of new concepts, practices & technologies to manage & exploit digital data
• Can be defined as:
– “Data that exceeds the processing capability of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architecture” (Source: Edd Dumbill – O’Reilly Community)
• Its key premise is that all data has potential value if it can be collected, analysed and used to generate actionable insight
The characteristics of Big Data - the 3Vs
Volume
• Reflects exponential growth of data – predicted 40-60% per annum
• Today 2.5 quintillion bytes of data are created every day
• 90% of all digital data was created in the last two years
Variety
• Data generated is more varied and complex than before:
– Text, Audio, Images, Machine Generated etc.
• Much of this data is semi-structured or unstructured
• Traditional IT techniques are ill equipped to process & analyse it
Velocity
• Data is often generated in real time
• Analysis and response need to be rapid, often also real time
• Traditional BI / DW environments are becoming obsolescent – new approaches are needed
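To put the quoted 40-60% annual growth in perspective, the compounding can be worked through directly. The sketch below is illustrative only: it assumes a constant annual rate, and the function names are made up for this example.

```python
import math


def growth_factor(annual_rate: float, years: int) -> float:
    """Multiple by which data volume grows after `years` at a constant annual rate."""
    return (1.0 + annual_rate) ** years


def doubling_time_years(annual_rate: float) -> float:
    """Years needed for data volume to double at a constant annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


if __name__ == "__main__":
    # The 40%, 50% and 60% rates span the range quoted on the slide.
    for rate in (0.40, 0.50, 0.60):
        print(
            f"{rate:.0%} per annum: x{growth_factor(rate, 5):.1f} in 5 years, "
            f"doubles every {doubling_time_years(rate):.1f} years"
        )
```

At these rates, volume multiplies roughly 5.4 to 10.5 times over five years, so any DQ process sized for today's volumes would need to scale accordingly.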
What’s different about Big Data?
• New technologies which enable distributed & highly scalable MPP (Massively Parallel Processing), e.g.
– Apache Hadoop
– MapReduce
– NoSQL databases
• Strong emphasis on analytical approaches
– Emergence of “data science”
– Predictive Analytics
– Data Mining
• The “democratisation” of data
– Data made available to all (cf Cloud Computing)
– Business and not IT led BI
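For readers unfamiliar with MapReduce, the idea can be sketched in a few lines. This is a single-process illustration of the map, shuffle and reduce steps using a word count; it is not the Hadoop API, and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(record: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in record.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group all emitted values by key, as the framework would between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(records: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on a small in-memory data set."""
    mapped = (pair for record in records for pair in map_phase(record))
    return dict(reduce_phase(key, values) for key, values in shuffle(mapped).items())
```

In a real cluster the map and reduce calls run in parallel on many machines, which is where the scalability comes from; the logic per record stays this simple.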
Where does Big Data come from?
SOCIAL MEDIA & SOCIAL NETWORKS
MACHINE GENERATED
WIDELY KNOWN SOURCES
Big Data – Foundations of Success
• Identifying the right data to solve the business problem or opportunity
• The ability to integrate & match varied data from multiple data sources
– structured, semi-structured, unstructured
• Building the right IT infrastructure to support Big Data applications
• Having the right capabilities & skills to exploit the data
Big Data – Barriers & Pitfalls
• The sheer volume of data – what’s worth using?
• Data extraction challenges
• The ability to match data from disparate sources / formats / media
• The time taken to integrate new data sources
• The risks of mismatching and incorrect identification of individuals
– Legal & regulatory pitfalls
• Security concerns – corporate & individual
• Lack of skills & expertise
Big Data – the data integration challenge
[Diagram: EXTERNAL DATA SOURCES (SOCIAL MEDIA, SENSORS, CS DATA, MOBILES) and INTERNAL DATA SOURCES (CRM, BILLING, OPS, SALES, PRODS) feeding ANALYTICS PLATFORMS 1, 2, 3 … n to produce ACTIONABLE INSIGHT & KNOWLEDGE]
Big Data – DQ as the key enabler
[Diagram: EXTERNAL DATA SOURCES (SOCIAL MEDIA, SENSORS, CS DATA, MOBILES) and INTERNAL DATA SOURCES (CRM, BILLING, OPS, SALES, PRODS) passing through a DATA QUALITY PLATFORM (PROFILE, PARSE, STANDARDISE, MATCH, ENRICH) before reaching ANALYTICS PLATFORMS 1, 2, 3 … n to produce ACTIONABLE INSIGHT & KNOWLEDGE]
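The five stages named in the diagram (profile, parse, standardise, match, enrich) can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not how any particular DQ product works: the field names, the lookup tables and the rule of matching on email address are all invented for the example.

```python
from collections import Counter

# Hypothetical lookup tables, used only for this illustration.
COUNTRY_CODES = {"uk": "GB", "united kingdom": "GB", "gb": "GB", "usa": "US", "us": "US"}
REGION_BY_COUNTRY = {"GB": "EMEA", "US": "AMER"}


def profile(records):
    """Count missing values per field so problems are visible before any fixing."""
    missing = Counter()
    for rec in records:
        for field, value in rec.items():
            if value is None or not str(value).strip():
                missing[field] += 1
    return dict(missing)


def parse(rec):
    """Split a free-text 'name' into first and last name components."""
    parts = (rec.get("name") or "").split()
    out = dict(rec)
    out["first_name"] = parts[0] if parts else ""
    out["last_name"] = parts[-1] if len(parts) > 1 else ""
    return out


def standardise(rec):
    """Apply consistent casing and map country names to two-letter codes."""
    out = dict(rec)
    out["first_name"] = out["first_name"].title()
    out["last_name"] = out["last_name"].title()
    out["email"] = (out.get("email") or "").strip().lower()
    country = (out.get("country") or "").strip().lower()
    out["country"] = COUNTRY_CODES.get(country, country.upper())
    return out


def match(records):
    """Merge records sharing a non-empty email; keep the first value seen per field."""
    merged = {}
    for index, rec in enumerate(records):
        key = rec["email"] or f"_unmatched_{index}"
        target = merged.setdefault(key, {})
        for field, value in rec.items():
            if value and not target.get(field):
                target[field] = value
    return list(merged.values())


def enrich(rec):
    """Add a sales region derived from the standardised country code."""
    out = dict(rec)
    out["region"] = REGION_BY_COUNTRY.get(out.get("country", ""), "UNKNOWN")
    return out


def run_pipeline(records):
    """Profile the raw input, then parse, standardise, match and enrich it."""
    report = profile(records)
    cleaned = [standardise(parse(rec)) for rec in records]
    return report, [enrich(rec) for rec in match(cleaned)]
```

Matching on a single exact key is the simplest possible rule; real matching across disparate sources typically needs fuzzy comparison on several fields, which is where the mismatching risks listed earlier arise.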
Big Data – the DG & DQ impact
• Big Data will depend on data quality to reap its claimed benefits – the GIGO truism
• The democratization of data will expose poor DQ
• The need for Data Governance increases as data becomes more accessible
• Data skills will become more valued for ‘data science’
• Big Data will increase the 3Vs of data
• Control of data becomes more difficult – scope and variety of use increases
• Data standards & business rules become more complex
• Potential legal & regulatory minefield
Disruptive eruption 2 –
Cloud Computing
Cloud Computing – Alternative Definitions
• “Cloud computing is the delivery of computing as a service rather than a product, whereby shared resources, software, and information are provided to computers and other devices as a metered service over a network (typically the Internet).” (Wikipedia)
• “Marketing term for the technologies that provide computation, software, data access, and storage services that do not require end-user knowledge of the physical location or configuration of the system that delivers the services.” (Trillium Software)
Cloud Computing – the Wikipedia view
Cloud Computing – Key Elements
• Provision of services via the Internet / network
• Virtual not physical allocation of resources
• Multi-tenanted hosting
• Pay as you use - not outright purchase (cf utilities)
• Cloud is a disruptive technology as it provides a clear alternative model to outright purchase of hardware, platforms & applications
Types of clouds & services
• Public/private/hybrid options
– Public – via the internet
– Private – via an intranet
– Hybrid – combination
• Cloud services
– Infrastructure as a service (IaaS)
– Platform as a service (PaaS)
– Software as a service (SaaS)
– et al (XaaS)
Cloud Computing: potential benefits (1)
• Speed to deploy new applications & services
• Greater standardisation
• Scalability & elasticity
• Lower initial implementation costs – CAPEX to OPEX
• Better cost control and lower internal IT costs (e.g. help desks)
Cloud Computing: potential benefits (2)
• Benefits to SMEs who cannot afford to purchase
• Try before you buy options – benefits both customers & suppliers
• Self-service and self-configuration of services
• Better and faster user adoption
• Potentially improved performance
• Automatic data backups
Cloud Computing – barriers & risks
DATA SECURITY & PRIVACY CONCERNS
COMMERCIAL & OPERATIONAL FACTORS
APPLICATION & DATA INTEGRATION CHALLENGES
LEGAL & REGULATORY RESTRICTIONS
Cloud – the role of DQ & DG
Preparing data for migration
• Scoping and scaling data to be migrated
• Evaluating its suitability for integration with other data sources
• Undertaking source data rationalization & cleanse
Migrating to the cloud environment
• Profiling data in advance of data migration
• Enhancing data in preparation for migration
• Maintaining DQ during ETL processes
Managing data in the cloud
• Enforcing business rules to be applied in the Cloud environment
• Auditing data to ensure security, adherence and quality
• Supporting data governance activities
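Enforcing business rules and auditing data, whether before migration or once the data is in the cloud, comes down to measuring how many records pass each rule and comparing that against an agreed threshold. The following is a minimal sketch of that idea; the three rules, the field names and the 95% default threshold are assumptions made for illustration, not rules taken from the webinar.

```python
from typing import Callable, Dict, List

Record = dict
Rule = Callable[[Record], bool]

# Hypothetical business rules; real ones would be agreed through data governance.
RULES: Dict[str, Rule] = {
    "customer_id present": lambda r: bool(str(r.get("customer_id") or "").strip()),
    "email contains @": lambda r: "@" in (r.get("email") or ""),
    "balance is non-negative": lambda r: (
        isinstance(r.get("balance"), (int, float)) and r["balance"] >= 0
    ),
}


def audit(records: List[Record], rules: Dict[str, Rule] = RULES) -> Dict[str, float]:
    """Return the fraction of records that pass each rule."""
    total = len(records)
    if total == 0:
        return {name: 1.0 for name in rules}
    return {
        name: sum(1 for rec in records if rule(rec)) / total
        for name, rule in rules.items()
    }


def failing_rules(pass_rates: Dict[str, float], threshold: float = 0.95) -> List[str]:
    """Return the names of rules whose pass rate is below the agreed threshold."""
    return [name for name, rate in pass_rates.items() if rate < threshold]
```

The same audit could be run on the source before migration and on the cloud copy afterwards, so that any drop in pass rate introduced by the ETL step is visible.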
Cloud Computing – the DG / DQ impact
• DQ / DG will be key to Cloud migration success – before, during and after migration
• Internal and external data integration will become key
• Could improve DQ as fewer devices will hold data
• DQ host and application companies may offer DQaaS
• Cloud will require an enhanced focus on data governance – within and outside the enterprise
• Organisations may lose physical control of data
• DQ SLAs will be needed with data hosts / suppliers
• Legal & regulatory compliance becomes a major challenge
Disruptive eruption 3 –
Data Virtualization
Data virtualization – a simple view
Data Virtualization – a less simple view
Data virtualization – the essentials
• Data is held in a variety of internal and external sources (e.g. DBMS, DW, Excel etc.)
• A middleware layer sits above the data sources
• Creates a virtual view at run time and creates temporary tables in a dedicated server
• Processes, assembles and presents the data to the application layer / device
• Benefits claimed:
– Hides complexity from users
– Flexibility
– Speed - as data can be cached in memory
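The essentials above can be illustrated with a toy middleware class. This is a sketch only: it assumes one source is a relational database (an in-memory SQLite database stands in for it) and the other is a CSV extract (standing in for a spreadsheet), and the class, table and column names are all hypothetical.

```python
import csv
import io
import sqlite3


class VirtualLayer:
    """Toy middleware that joins two sources at query time and caches the result."""

    def __init__(self, crm_db: sqlite3.Connection, billing_csv: str):
        self.crm_db = crm_db            # relational source (assumed schema: customers)
        self.billing_csv = billing_csv  # flat-file source (assumed columns below)
        self._cache = None              # last assembled view, held in memory

    def customer_billing_view(self, use_cache: bool = True):
        """Assemble total billing per customer from both sources at run time."""
        if use_cache and self._cache is not None:
            return self._cache
        # A scratch in-memory database stands in for the "dedicated server".
        scratch = sqlite3.connect(":memory:")
        scratch.execute("CREATE TEMP TABLE customers (id TEXT, name TEXT)")
        scratch.execute("CREATE TEMP TABLE invoices (customer_id TEXT, amount REAL)")
        scratch.executemany(
            "INSERT INTO customers VALUES (?, ?)",
            self.crm_db.execute("SELECT id, name FROM customers"),
        )
        reader = csv.DictReader(io.StringIO(self.billing_csv))
        scratch.executemany(
            "INSERT INTO invoices VALUES (?, ?)",
            ((row["customer_id"], float(row["amount"])) for row in reader),
        )
        rows = scratch.execute(
            "SELECT c.name, SUM(i.amount) FROM customers c "
            "JOIN invoices i ON i.customer_id = c.id "
            "GROUP BY c.id, c.name ORDER BY c.name"
        ).fetchall()
        scratch.close()  # the temporary tables disappear with the connection
        self._cache = rows
        return rows


def build_demo_layer() -> VirtualLayer:
    """Create two small made-up sources and wrap them in the virtual layer."""
    crm = sqlite3.connect(":memory:")
    crm.execute("CREATE TABLE customers (id TEXT, name TEXT)")
    crm.executemany(
        "INSERT INTO customers VALUES (?, ?)", [("C1", "Acme"), ("C2", "Globex")]
    )
    billing = "customer_id,amount\nC1,100.0\nC1,50.5\nC2,20.0\n"
    return VirtualLayer(crm, billing)
```

Note that the join only works because both sources use the same customer identifiers. If one source held `c1` and the other `C1`, the view would silently drop rows, which is why source data quality and standardisation matter so much for run time integration.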
Data virtualization – the DG / DQ impact
• Will put the focus on DQ & data standardisation as a key enabler to DV interoperability
• To work, DV will require the deployment of both real time and batch DQ capability
• Will require a Shared Business Vocabulary (SBV) for a shared data model and data standards across an organisation
• Need for better DQ in source systems to enable run time integration
• Data is physically held in a wide variety of sources, which makes coherent Data Governance more difficult
• Data at source will be used for multiple applications, so common business rules are harder to agree
• Run time integration requires real time DQ – many organisations do not have this capability
The potential eruptions…
DATA VIRTUALIZATION
BIG DATA
CLOUD COMPUTING
So what’s the impact of all this on DQ / DG practitioners?
New Data Quality & Data Governance challenges
What do we need to do?
Changing DQ and DG roles & skills
New DQ & Data Governance challenges
THE TRADITIONAL LANDSCAPE
• PREDOMINANTLY BATCH DQ
• CUSTOMER ORGANISATION FOCUS
• PROCEDURAL
• FOCUS MAINLY WITHIN THE ENTERPRISE
THE CHANGING LANDSCAPE
• PREDOMINANTLY REAL TIME DQ
• SUPPLIER ORGANISATION FOCUS
• COMMERCIAL
• GROWING FOCUS OUTSIDE THE ENTERPRISE
Changing DQ and DG roles
• DQ and Data Governance roles will become more ‘beyond organisation’ facing – into hosting companies, data & application suppliers etc.
• Many data management and DQ specialists will work with or evolve into data scientists
• DQ and DG people will need to enhance their understanding of global legal and regulatory environments
• Commercial and negotiation skills will become more important
What action should we take?
• Identify and get involved in any current or planned Big Data, Cloud or Data Virtualization initiatives within our organisations
• Ensure that the DQ and DG implications & imperatives of these initiatives are understood
• Participate in any due diligence of potential third party vendors & providers
• Plan for the new DQ and DG challenges that these trends will pose
The changing landscape
• Better DQ needs to be achieved in an environment where data will continue to increase by 50% per annum
• The claimed benefits of Big Data, Cloud & Data Virtualisation cannot be achieved without renewed emphasis on data quality management & data governance
• Data governance becomes increasingly challenging & extends within and outside the enterprise
• DQ services will increasingly be offered as DQaaS by vendors and data hosts, and more DQ / DG roles may be outsourced
• As DQ practitioners we need to understand, educate and get involved with those in our organisations who are creating the new landscape
A final thought…
“It’s not the will to win but the will to prepare to win that makes the difference”
Bear Bryant – US Football Coach, 1913 – 1983
Questions
Contact: [email protected]
www.trilliumsoftware.com