Presented by: Piyush Malik
IBM
October 07, 2014
Governing the 4 V’s of Data
Principles and Best Practices in the era of Big Data
Overview
As data-intensive decision making is increasingly adopted by businesses, governments, and other agencies around the world, most organizations encountering very large volumes and varieties of data are still contemplating and assessing their readiness to embrace Big Data. While these organizations devise ways to deal with the challenges it brings, the impact and importance of Big Data for information quality and governance programs should not be underestimated. "Data in doubt", that is, the uncertainty or veracity of data, is one of the characteristics that describe Big Data.
Through real-life case studies of implementations across retail, finance and other industries, this session explores the issues and challenges involved in managing Big Data as it is combined with traditional enterprise data, highlighting the principles and best practices for effective Big Data governance. This session will:
• Prepare the audience to deal with increasingly uncertain data and still make good decisions
• Illustrate how data science and data management professionals complement each other in organizations
• Draw upon implementation experiences of early adopters of Big Data technologies across multiple industries
• Showcase tips and best practices
Overview
1. Data, Data Everywhere…
2. Classifying Big Data with the 4 V’s
3. Veracity: the trustworthiness dimension
4. Big Data opportunities and governance challenges
5. A framework and approach for Big Data governance
6. Call to action
Data, Data Everywhere…
• 1 in 2 business leaders do not have access to the data they need
• 83% of CIOs cited business intelligence (BI) and analytics as part of their visionary plan
• Top performers are 5.4X more likely to use business analytics
• 80% of the world’s data today is unstructured
• 90% of the world’s data was created in the last two years
• 20% of available data can be processed by traditional systems
Source: GigaOM, Software Group, IBM Institute for Business Value
So, How Are We Doing?
As dataset size increases, so do anomalies.
• 1 in 3 make decisions on untrusted information
• 1 in 2 don’t have the information they need
• 60% have more data than they can use
• 80% of the time spent per big data project goes to finding, preparing, understanding and defending information, due to lack of context
[Chart: Global Data Volume in Exabytes vs. Aggregate Uncertainty (%), 2005 to 2015. Multiple sources: IDC, Cisco]
Veracity of Data is Key
By 2015, 80% of all available data will be uncertain.
• Data quality solutions exist for enterprise data like customer, product, and address data, but this is only a fraction of the total enterprise data.
• By 2015 the number of networked devices will be double the entire global population. All sensor data has uncertainty.
• The total number of social media accounts exceeds the entire global population. This data is highly uncertain in both its expression and content.
Big Data Enriches the Information Management Ecosystem
Emerging Big Data analytics technologies are converging to create opportunities:
• Audit of MapReduce jobs and tasks: who ran what, where, and when?
• Managing a governance initiative
• OLTP optimization (SAP, checkout, and more)
• Master data enrichment via life events, hobbies, roles, and more
• Establishing information as a service
• Active archive cost optimization
Big Data Opportunities Create Governance Challenges
Consumer behaviors are rapidly evolving:
• Social shopping is becoming the norm
• Advent of social-enabled commerce
• Rise of public datasets: data becoming publicly available (weather, satellite, maps, parking, crime...), apps, and data democratization
• Privacy concerns
Social data needs to marry corporate data:
• Personalization of marketing
• From product centricity to customer centricity: differentiated service based on a 360° view (and more)
• "Silos" of data will not work
• Compliance mandates and security of data at scale and speed
Technical Challenges in Governing Big Data
• Extremely high data volumes (e.g., machine-generated data): size and frequency
• No defined data formats
• Unstructured data, including free-form text, images and log data
• Unknown data patterns or data relationships
• Multiple data types / formats
• Loading into SQL / relational data stores is too time-consuming
• Historical value is limited until a pattern is discovered
• Frequency can be up to real time, with high potential for data "spikes"
• The Big Data ecosystem, vendors and products are evolving rapidly
Reaping the full benefits of Big Data requires investments beyond technology
• Data Scientists
• Chief Data Officers
• Streaming Analytics
• BI Tools
Organizations need to invest in people, processes and technology for optimum results.
Meeting Business Objectives in a Big Data Environment
• Context: A business framework (policies) for determining how and where to use big data.
• Agility: Flexibility to establish and maintain context independent of the volume, variety and velocity of data.
• Security: Protection of data privacy and access; compliance with data security and other regulatory requirements.
Context Requires Governance; Agility Requires a Unique Big Data Approach to Governance
• Traditional approach: Govern data to the highest standard. Store it, then use it for multiple purposes. (Repository → Govern to Perfection → Use Data)
• Big Data approach: Understand data and usage. Govern to the appropriate level. Use it, and iterate. (Data → Explore / Understand → Govern Appropriately → Use)
How does an organization achieve agility in creating and continually evolving a safe and secure context in big data environments?

Our holistic approach to Big Data Governance
Agile Coordination and Alignment of Business Objectives with Information Requirements
1. Define Business Problem
2. Obtain Executive Sponsorship
3. Align Teams
4. Understand Data Risk and Value
5. Implement Analytical / Operational Project(s)
6. Measure Results
Cycle phases: Plan, Act, Assess
Data scenarios: Find, Prepare, Defend, Secure and Comply
Big Data Governance: How It Works
1. Define Business Problem: Begin by defining the business problem to solve with big data.
2. Obtain Executive Sponsorship: Obtain an executive sponsor to finalize priorities and goals.
3. Align Teams: Update governance roles to account for big data.
4. Understand Data Risk and Value: Categorize data to understand risk exposure.
5. Implement Analytical / Operational Project(s): Implement planned projects with governed data search, preparation, defense and security.
6. Measure Results: Assess governance results and adjust.
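Step 4, categorizing data to understand risk exposure, can be illustrated with a minimal Python sketch that scans sample values per column and assigns a risk tier. The regex rules, tier numbers and function names below are hypothetical choices made for illustration; they are not part of any IBM product or of the framework above, and a real classifier would need far richer detection and business context.

```python
import re
from typing import Dict, List, Optional

# Hypothetical detection rules: regexes for a few common sensitive value types.
# Order matters: the first type that matches enough values wins.
PATTERNS = {
    "email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "card_number": re.compile(r"\d{13,19}"),
    "phone": re.compile(r"\+?\d[\d\s-]{7,14}\d"),
}

# Hypothetical risk tiers (higher means more exposure if the data leaks).
RISK_TIER = {"card_number": 3, "email": 2, "phone": 2, None: 0}


def detect_type(values: List[str], threshold: float = 0.8) -> Optional[str]:
    """Return the first sensitive type fully matched by at least `threshold` of non-empty values."""
    sample = [v.strip() for v in values if v and v.strip()]
    if not sample:
        return None
    for name, pattern in PATTERNS.items():
        hits = sum(1 for v in sample if pattern.fullmatch(v))
        if hits / len(sample) >= threshold:
            return name
    return None


def categorize(columns: Dict[str, List[str]]) -> Dict[str, dict]:
    """Tag each column with its detected sensitive type and a risk tier."""
    report = {}
    for column, values in columns.items():
        kind = detect_type(values)
        report[column] = {"type": kind, "risk": RISK_TIER[kind]}
    return report


# Usage: sample values from three columns of a hypothetical customer feed.
print(categorize({
    "contact": ["a@x.com", "b@y.org", ""],
    "cc": ["4111111111111111", "5500000000000004"],
    "city": ["Paris", "Lyon"],
}))
```

Working on a sample of values rather than on column names is a deliberate choice here: in big data sources, names are often missing or misleading, so the content itself is the more reliable signal of risk.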
Key Data Scenarios for Big Data Governance
• Find: Establish context to find, visualize, and understand data for improved decision making
• Prepare: Understand context to extract, cleanse, integrate and monitor data properly, to increase integrity and trustworthiness for subsequent usage
• Defend: Build confidence in information by making it defensible against challenges
• Secure and Comply: Protection of data privacy and access; compliance with data security and other regulatory requirements
Find
Establish context to find, visualize, and understand data for improved decision making.
The Cost is High: 80% of data scientists’ time on big data projects is spent finding and preparing data.
Capabilities to Consider:
• Connectivity to sources
• Real-time queries (SQL, etc.)
• Enterprise search
• Automated data discovery
• Data profiling
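As an illustration of the data profiling capability, here is a minimal Python sketch that computes a few per-column statistics: null rate, distinct count, dominant inferred type and value lengths. The function names and the choice of statistics are assumptions made for this example; dedicated profiling tools compute many more measures and work at far larger scale.

```python
from collections import Counter
from typing import Dict, List, Optional


def infer_type(value: str) -> str:
    """Very rough type inference for a single raw string value."""
    try:
        int(value)
        return "int"
    except ValueError:
        pass
    try:
        float(value)
        return "float"
    except ValueError:
        return "string"


def profile_column(values: List[Optional[str]]) -> Dict[str, object]:
    """Compute simple profile statistics for one column of raw values."""
    total = len(values)
    present = [v for v in values if v not in (None, "")]
    types = Counter(infer_type(v) for v in present)
    return {
        "count": total,
        "null_rate": (total - len(present)) / total if total else 0.0,
        "distinct": len(set(present)),
        "dominant_type": types.most_common(1)[0][0] if types else None,
        "min_len": min((len(v) for v in present), default=0),
        "max_len": max((len(v) for v in present), default=0),
    }


# Usage: a column that is mostly integers, with gaps and one stray code.
print(profile_column(["42", "17", None, "", "42", "x1"]))
```

A profile like this is what flags the stray `x1` value: the column is dominated by integers, so any non-integer entry is a candidate anomaly worth investigating before the data is used.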
Prepare
Understand context to extract, cleanse, integrate and monitor data properly, to increase integrity and trustworthiness for subsequent usage.
The Risk is Real
Capabilities to Consider:
• Highly scalable data integration
• Define terms and policies
• Data cleansing
• Quality dashboarding
• Rich annotation
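Data cleansing and quality dashboarding can be sketched together: cleanse each record, then report the share of records that pass each data quality rule. The field names (`name`, `email`, `country`), the cleansing steps and the rules below are all hypothetical, chosen only to show the pattern in a minimal Python form.

```python
import re
from typing import Callable, Dict, List


def cleanse(record: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical cleansing: collapse whitespace, lower-case emails, upper-case country codes."""
    cleaned = {k: " ".join(v.split()) for k, v in record.items()}
    if "email" in cleaned:
        cleaned["email"] = cleaned["email"].lower()
    if "country" in cleaned:
        cleaned["country"] = cleaned["country"].upper()
    return cleaned


# Hypothetical data quality rules: each returns True when a record passes.
RULES: Dict[str, Callable[[Dict[str, str]], bool]] = {
    "email_valid": lambda r: bool(re.fullmatch(r"[^@\s]+@[^@\s]+\.[a-z]{2,}", r.get("email", ""))),
    "country_iso2": lambda r: bool(re.fullmatch(r"[A-Z]{2}", r.get("country", ""))),
    "name_present": lambda r: bool(r.get("name")),
}


def quality_scores(records: List[Dict[str, str]]) -> Dict[str, float]:
    """Percentage of cleansed records passing each rule: a dashboard-style summary."""
    cleaned = [cleanse(r) for r in records]
    if not cleaned:
        return {name: 100.0 for name in RULES}
    return {
        name: 100.0 * sum(rule(r) for r in cleaned) / len(cleaned)
        for name, rule in RULES.items()
    }


# Usage: one record that cleanses into a valid one, and one that fails every rule.
records = [
    {"name": " Jane  Doe ", "email": "Jane@Example.COM ", "country": "fr"},
    {"name": "", "email": "not-an-email", "country": "France"},
]
print(quality_scores(records))
```

Scoring after cleansing, rather than before, shows how much quality remains a problem once the cheap fixes are applied; tracking these percentages over time is the essence of a quality dashboard.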
Defend
Build confidence in information by making it defensible against challenges.
The Need is Present: 1 in 3 make decisions on untrusted information.
Capabilities to Consider:
• Maintain data lineage
• Data quality dashboarding
• Master data management
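To show what maintaining data lineage involves at its simplest, the Python sketch below records, for each output dataset, the job and input datasets that produced it, then walks the graph to answer "where did this come from?". The class, method and dataset names are hypothetical; production lineage tools also capture column-level mappings, timestamps and versions.

```python
from collections import defaultdict
from typing import Dict, List, Set


class LineageGraph:
    """Minimal lineage store: which inputs and which job produced each dataset."""

    def __init__(self) -> None:
        self.parents: Dict[str, Set[str]] = defaultdict(set)
        self.jobs: Dict[str, str] = {}  # keeps only the most recent job per output

    def record(self, job: str, inputs: List[str], output: str) -> None:
        self.parents[output].update(inputs)
        self.jobs[output] = job

    def upstream(self, dataset: str) -> Set[str]:
        """All datasets the given dataset was derived from, directly or transitively."""
        seen: Set[str] = set()
        stack = list(self.parents.get(dataset, ()))
        while stack:
            node = stack.pop()
            if node not in seen:
                seen.add(node)
                stack.extend(self.parents.get(node, ()))
        return seen

    def sources(self, dataset: str) -> Set[str]:
        """Upstream datasets that have no recorded parents (the original sources)."""
        return {d for d in self.upstream(dataset) if not self.parents.get(d)}


# Usage: raw web logs are staged, then joined with master customer data.
g = LineageGraph()
g.record("ingest_clickstream", ["raw/weblogs"], "staging/clicks")
g.record("join_customers", ["staging/clicks", "mdm/customers"], "mart/customer_activity")
print(sorted(g.sources("mart/customer_activity")))
```

Being able to trace a report back to its original sources is what makes the information defensible: when a figure is challenged, the lineage shows which inputs and jobs would need to be checked.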
Secure and Comply
Protection of data privacy and access; compliance with data security and other regulatory requirements.
The Threat is Severe: $200 million just to replace cards!
Capabilities to Consider:
• Secure data at rest and in motion
• Data masking
• Governed data retention
• Test data management
• Governance reporting
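Data masking can take several forms. This minimal Python sketch shows two common ones: redacting all but the last four digits of a card number, and pseudonymizing an email address with a keyed hash (HMAC-SHA-256) so that masked values stay consistent across datasets, which is useful for test data management. The hard-coded key and the function names are placeholders for illustration only, not a recommended production setup.

```python
import hashlib
import hmac
import re

# Placeholder key for illustration only; a real deployment would load it from a secrets store.
SECRET_KEY = b"example-only-key"


def mask_card(number: str) -> str:
    """Redact all but the last four digits of a card number (non-digits are dropped)."""
    digits = re.sub(r"\D", "", number)
    if len(digits) <= 4:
        return "*" * len(digits)
    return "*" * (len(digits) - 4) + digits[-4:]


def pseudonymize_email(email: str) -> str:
    """Replace the local part with a keyed hash, keeping the domain for analytics."""
    local, _, domain = email.partition("@")
    token = hmac.new(SECRET_KEY, local.lower().encode(), hashlib.sha256).hexdigest()[:12]
    return f"{token}@{domain}"


# Usage
print(mask_card("4111 1111 1111 1111"))
print(pseudonymize_email("Jane.Doe@example.com"))
```

Because the hash is keyed and deterministic, the same customer maps to the same token in every masked dataset, preserving joins; keeping the key secret is what stops someone from recomputing tokens for guessed email addresses.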
© 2014 IBM Corporation
Four Typical Organizational Models for Governing Big Data
• Business unit led: BUs make Big Data decisions with limited coordination.
• Business unit led with central support: BUs make their own decisions, with collaboration on selected initiatives.
• Center of Excellence: an independent centre; units pursue initiatives under the CoE’s guidance.
• Fully centralized: the corporate centre takes direct responsibility for identifying and prioritizing initiatives.
Source: http://www.bain.com/publications/articles/big_data_the_organizational_challenge.aspx
• 4 out of 5 organizations rated their decision making as 7 or higher on a scale of 1 to 10.
• 3X: organizations are improving at 3 times the rate of competitors.
• 77% of organizations show high or very high levels of trust.
Source: The Big Data Imperative: Why Information Governance Must Be Addressed Now, Aberdeen Group, Dec 2012
IBM Big Data Governance Offers a Proven Approach
All Hadoop vendors talk about their big "Data Lake".
[Graphic: Clean Hadoop Lake vs. Hadoop Data Swamp]
IBM Big Data Governance, including quality, security, and data lineage, transforms your Hadoop Data Swamp into a consumable Big Data Lake.
Big Data Governance Strategy
[Diagram: Analytic Applications (BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics) on top of a Big Data Platform (Visualization & Discovery, Application Development, Systems Management, Accelerators, Hadoop System, Stream Computing, Data Warehouse, Information Integration & Governance)]
1. Move the analytics closer to the data.
2. Take a platform approach to integrating Big Data in the IT environment.
3. Not all Big Data workloads are suitable for production SLAs.
4. Keep exploratory environments on commodity infrastructure, on Hadoop/NoSQL stores, for complementary analytics.
5. Define and extend a data governance program to handle structured as well as multi-structured data.
6. Leverage the IBM IGC Maturity Model to conduct a Big Data Governance maturity assessment.
Big Data Governance Framework
Start with the goals and business outcomes in mind, seek active stakeholder engagement, and ask questions along all dimensions of the Information Governance Maturity Model:
• Do we fully recognize the responsibilities associated with handling big data?
• How does big data change the traditional concept of information as a corporate asset?
• What are the emerging requirements around privacy?
• Are the data stewards savvy or trained enough to handle profiling and anomalous pattern detection with Big Data?
• How do all these big data technologies relate to our architecture and current IT infrastructure?
http://ibmdatamag.com/2012/04/big-data-governance-a-framework-to-assess-maturity/
Airbus Reduced Call Resolution Time from 50 min. to 15 min. = $36M in Savings
Need
• Streamline silos of locked data into a common, integrated knowledge domain
• Triple-digit data sources, over 100,000 users (internal and external), spanning divisions, countries, and more
• Airbus World (portal), Rise (best practices), People (HR, tribal knowledge), Supply (supplier portal)
Benefits
• Slashing ‘wait on gate’ times for problem resolution
• Single point of access to ALL the data with seamless security; the #2 hit app in all of Airbus
• Greater visibility into the supply chain for partners, suppliers, and employees
• Airbus support teams can ‘parent’ more planes with the same amount of resources
Leverage IBM’s vast IP & thought leadership assets
Understanding how to create value from data has been the focus of IBM’s analytics and governance studies for many years:
• 2009, "The intelligent enterprise" and "Breaking away with BAO": defining analytics as a strategic asset
• 2010, "Analytics: The new path to value": operationalizing analytics in sophisticated organizations
• 2011, "Analytics: The widening divide": mastering analytic competencies
• 2012, "Analytics: The real world use of big data": fundamentals of big data
• 2013, "Analytics: A blueprint for value": extracting value from data and analytics
http://www-935.ibm.com/services/us/gbs/thoughtleadership/bao.html
Summary
• Big Data brings new opportunities as well as challenges.
• Successfully integrating and governing the 4 V’s of big data in the organization requires a proven method and framework expertise.
• You are not alone, but agility is important.
• Help is available.
THINK
Piyush Malik
Twitter @pmalik1
Please complete the evaluation form
© 2014 IDQS. All rights reserved.