Governing the 4 V’s of Data: Principles and Best Practices in the era of Big Data

Presented by: Piyush Malik, IBM

October 07, 2014

Overview

As data-intensive decision making is increasingly adopted by businesses, governments, and other agencies around the world, most organizations encountering very large volumes and varieties of data are still contemplating and assessing their readiness to embrace Big Data. While these organizations devise various ways to deal with the challenges it brings, the impact and importance of Big Data to information quality and governance programs should not be underestimated. Veracity, or “data in doubt”, captures the uncertainty of data and is one of the characteristics used to describe Big Data.

Through real-life case studies of implementations across retail, finance and other industries, this session explores the issues and challenges involved in managing Big Data as it is combined with traditional enterprise data, highlighting the principles and best practices for effective Big Data governance. This session will:

• Prepare the audience to deal with increasingly uncertain data and still make good decisions
• Illustrate how data science and data management professionals complement each other in organizations
• Draw upon the implementation experiences of early adopters of Big Data technologies across multiple industries
• Showcase tips and best practices


Overview

1. Data, Data Everywhere…
2. Classifying Big Data with the 4 V’s
3. Veracity: the trustworthiness dimension
4. Big Data opportunities and governance challenges
5. A framework and approach for Big Data governance
6. Call to action


Data, Data Everywhere…

• 1 in 2 business leaders do not have access to the data they need
• 83% of CIOs cited business intelligence (BI) and analytics as part of their visionary plan
• Top performers are 5.4X more likely to use business analytics
• 80% of the world’s data today is unstructured
• 90% of the world’s data was created in the last two years
• 20% of available data can be processed by traditional systems

Source: GigaOM, Software Group, IBM Institute for Business Value


As dataset size increases, so do anomalies
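The observation that anomalies grow with dataset size can be illustrated with a minimal Python sketch. It assumes normally distributed values and a simple 3-sigma z-score rule, both chosen only for illustration: even when the anomaly rate stays roughly constant, the absolute number of flagged records rises with volume, so the review burden scales with the data.

```python
import random
import statistics


def count_anomalies(values, z_threshold=3.0):
    """Count values more than z_threshold population standard deviations from the mean."""
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return 0  # all values identical: nothing stands out
    return sum(1 for v in values if abs(v - mean) > z_threshold * stdev)


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so the sketch is reproducible
    for n in (1_000, 10_000, 100_000):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        flagged = count_anomalies(sample)
        print(f"n={n:>7,}  flagged={flagged:>4}  rate={flagged / n:.2%}")
```

For a normal distribution roughly 0.27% of values lie beyond three standard deviations, so the printed rate should stay near that figure while the flagged count grows with each tenfold increase in sample size.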


So, How Are We Doing?

• 1 in 3 make decisions on untrusted information
• 1 in 2 don’t have the information they need
• 60% have more data than they can use
• 80% of the time on each big data project is spent finding, preparing, understanding and defending information, due to lack of context


Veracity of Data is Key

[Figure: Global data volume in exabytes and aggregate uncertainty (%), 2005 to 2015. Sources: IDC, Cisco]

By 2015, 80% of all available data will be uncertain.

• Data quality solutions exist for enterprise data such as customer, product, and address data, but this is only a fraction of total enterprise data.
• By 2015 the number of networked devices will be double the entire global population. All sensor data has uncertainty.
• The total number of social media accounts exceeds the entire global population. This data is highly uncertain in both its expression and content.


Big Data Enriches the Information Management Ecosystem

• Audit: Who ran what, where, and when? (MapReduce jobs and tasks)
• Managing a governance initiative
• OLTP optimization (SAP, checkout, +++)
• Master data enrichment via life events, hobbies, roles, +++
• Establishing information as a service
• Active archive
• Cost optimization
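The audit question on this slide (who ran what, where, and when) amounts to keeping one structured record per job run and being able to query it. The Python sketch below shows that idea; the AuditEvent fields, the who_ran helper, and the user, job and cluster names are hypothetical and do not reflect the schema of any particular Hadoop audit tool.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class AuditEvent:
    """One audit record for a job run (hypothetical schema)."""
    user: str             # who ran it
    job: str              # what was run
    cluster: str          # where it ran
    started_at: datetime  # when it started


def who_ran(events, job):
    """Return (user, cluster, start time) for every run of `job`, oldest first."""
    runs = sorted((e for e in events if e.job == job), key=lambda e: e.started_at)
    return [(e.user, e.cluster, e.started_at) for e in runs]


events = [
    AuditEvent("etl_svc", "daily_sales_rollup", "hadoop-prod", datetime(2014, 10, 7, 2, 0)),
    AuditEvent("bob", "churn_model_train", "hadoop-dev", datetime(2014, 10, 6, 9, 30)),
    AuditEvent("alice", "daily_sales_rollup", "hadoop-prod", datetime(2014, 10, 6, 2, 0)),
]

for user, cluster, when in who_ran(events, "daily_sales_rollup"):
    print(f"{when:%Y-%m-%d %H:%M}  {user} on {cluster}")
```

Making the record immutable (frozen=True) reflects the usual expectation that audit entries are appended, never edited.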


Emerging Big Data analytics-related technologies are converging to create opportunities


Big Data Opportunities Create Governance Challenges

Consumer behaviors rapidly evolving
• Social shopping becoming the norm
• Advent of social-enabled commerce
• Rise of public datasets: data becoming publicly available (weather, satellite, maps, parking, crime, etc.), apps, and data democratization
• Privacy concerns

Social data needs to marry corporate data
• Personalization of marketing
• Shift from product centricity to customer centricity

Differentiated service based on 360°++
• “Silos” of data will not work
• Compliance mandates and security of data at scale and speed


Technical Challenges in Governing Big Data

• Extremely high data volumes (e.g., machine-generated data): size and frequency
• No defined data formats
• Unstructured data, including free-form text, images and log data
• Unknown data patterns or data relationships
• Multiple data types and formats
• Loading into SQL/relational data stores is too time-consuming
• Historical value is limited until a pattern is discovered
• Frequency can be up to real time, with high potential for data “spikes”
• The Big Data ecosystem, vendors and products are evolving rapidly
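Several of these challenges (no defined formats, multiple formats, slow up-front loading into relational stores) are often addressed by interpreting structure when data is read rather than when it is loaded, an approach known as schema-on-read. A minimal Python sketch, assuming just three illustrative input shapes: JSON objects, space-separated key=value log lines, and free text.

```python
import json


def parse_record(raw):
    """Interpret one raw record at read time (schema-on-read).

    Tries a JSON object first, then space-separated key=value pairs,
    and otherwise keeps the record as free text. These three shapes
    are illustrative assumptions, not an exhaustive format list.
    """
    raw = raw.strip()
    try:
        value = json.loads(raw)
        if isinstance(value, dict):
            return value
    except json.JSONDecodeError:
        pass  # not JSON; try the next shape
    pairs = [token.split("=", 1) for token in raw.split()]
    if pairs and all(len(pair) == 2 for pair in pairs):
        return dict(pairs)
    return {"text": raw}


records = [
    '{"device": "sensor-17", "temp_c": 21.4}',
    "device=sensor-18 temp_c=22.0 status=ok",
    "Customer called about a late delivery",
]
for record in records:
    print(parse_record(record))
```

Values from key=value lines stay as strings here; assigning types is left as a later step, which is where governance rules about meaning and quality would apply.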


Reaping the full benefits of Big Data requires investment beyond technology

• Data scientists
• Chief data officers
• Streaming analytics
• BI tools

Organizations need to invest in people, processes and technology for optimum results.


Meeting Business Objectives in a Big Data Environment

• Context: A business framework (policies) for determining how and where to use big data.
• Agility: Flexibility to establish and maintain context independent of the volume, variety and velocity of data.
• Security: Protection of data privacy and access; compliance with data security and other regulatory requirements.


Context Requires Governance; Agility Requires a Unique Big Data Approach to Governance

Traditional approach: Govern data to the highest standard. Store it, then use it for multiple purposes.
Repository → Govern to Perfection → Use Data

Big Data approach: Understand data and usage. Govern to the appropriate level. Use it, and iterate.
Data → Explore / Understand → Govern Appropriately → Use

How does an organization achieve agility in creating and continually evolving a safe and secure context in big data environments?


Our Holistic Approach to Big Data Governance

Agile coordination and alignment of business objectives with information requirements

1. Define Business Problem
2. Obtain Executive Sponsorship
3. Align Teams
4. Understand Data Risk and Value
5. Implement Analytical / Operational Project(s)
6. Measure Results

Cycle phases: Assess, Plan, Act
Data scenarios: Find, Prepare, Defend, Secure and Comply


Big Data Governance: How It Works

1. Define Business Problem: Begin by defining the business problem to solve with big data.
2. Obtain Executive Sponsorship: Obtain an executive sponsor to finalize priorities and goals.
3. Align Teams: Update governance roles to account for big data.
4. Understand Data Risk and Value: Categorize data to understand risk exposure.
5. Implement Analytical / Operational Project(s): Implement planned projects with governed data search, preparation, defense and security.
6. Measure Results: Assess governance results and adjust.
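Step 4 (categorize data to understand risk exposure) can be sketched as rule-based classification of a dataset's columns into sensitivity tiers. The tier names and name-matching patterns below are hypothetical; a real classification effort would also inspect the values themselves, not only column names.

```python
import re

# Hypothetical sensitivity rules, checked in order from most to least sensitive.
RULES = [
    ("restricted", re.compile(r"card|ssn|passport|password", re.IGNORECASE)),
    ("confidential", re.compile(r"email|phone|address|birth", re.IGNORECASE)),
    ("internal", re.compile(r"order|sku|store|price", re.IGNORECASE)),
]


def categorize(column_name):
    """Return the first matching sensitivity tier, or 'public' if no rule matches."""
    for tier, pattern in RULES:
        if pattern.search(column_name):
            return tier
    return "public"


def risk_exposure(columns):
    """Group a dataset's columns by tier to summarize its risk exposure."""
    summary = {}
    for column in columns:
        summary.setdefault(categorize(column), []).append(column)
    return summary


print(risk_exposure(["customer_email", "card_number", "sku", "weather_temp"]))
```

Checking the most sensitive tier first means a column that matches several rules is treated conservatively.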


Key Data Scenarios for Big Data Governance

• Find: Establish context to find, visualize, and understand data for improved decision making.
• Prepare: Understand context to extract, cleanse, integrate and monitor data properly, to increase integrity and trustworthiness for subsequent usage.
• Defend: Build confidence in information by making it defensible against challenges.
• Secure and Comply: Protection of data privacy and access; compliance with data security and other regulatory requirements.


Find

Establish context to find, visualize, and understand data for improved decision making.

The cost is high: 80% of data scientists’ time on big data projects is spent finding and preparing data.

Capabilities to consider:
• Connectivity to sources
• Real-time queries (SQL, etc.)
• Enterprise search
• Automated data discovery
• Data profiling
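Data profiling, one of the Find capabilities listed above, usually starts with simple per-column statistics. The Python sketch below computes a few of them (nulls, distinct values, a crude numeric-versus-text type guess, most common value); the choice of metrics and the rule that treats empty strings as nulls are assumptions for illustration.

```python
from collections import Counter


def inferred_type(value):
    """Crude type guess: 'numeric' if the value parses as a float, else 'text'."""
    try:
        float(value)
        return "numeric"
    except (TypeError, ValueError):
        return "text"


def profile_column(values):
    """Profile one column: row count, nulls, distinct values, type mix, most common value."""
    non_null = [v for v in values if v is not None and v != ""]
    top = Counter(non_null).most_common(1)
    return {
        "rows": len(values),
        "nulls": len(values) - len(non_null),
        "distinct": len(set(non_null)),
        "types": dict(Counter(inferred_type(v) for v in non_null)),
        "most_common": top[0] if top else None,
    }


print(profile_column(["10", "12", "", "12", None, "n/a"]))
# → {'rows': 6, 'nulls': 2, 'distinct': 3, 'types': {'numeric': 3, 'text': 1}, 'most_common': ('12', 2)}
```

A mostly numeric column with a stray text value such as "n/a" is exactly the kind of signal profiling is meant to surface before the data is used.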


Prepare

Understand context to extract, cleanse, integrate and monitor data properly, to increase integrity and trustworthiness for subsequent usage.

The risk is real.

Capabilities to consider:
• Highly scalable data integration
• Define terms and policies
• Data cleansing
• Quality dashboarding
• Rich annotation
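Data cleansing and quality dashboarding can be illustrated together: standardize records, remove duplicates, then compute the percentages a dashboard might track. This Python sketch assumes a two-field customer record, email as the de-duplication key, and a deliberately loose email pattern; all three are illustrative choices rather than recommended rules.

```python
import re

# Deliberately loose validity rule, for illustration only.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def cleanse(record):
    """Standardize one record: collapse whitespace, title-case names, lower-case emails."""
    return {
        "name": " ".join(record.get("name", "").split()).title(),
        "email": record.get("email", "").strip().lower(),
    }


def deduplicate(records, key="email"):
    """Keep one record per key value (last one seen wins, in first-seen order)."""
    return list({r[key]: r for r in records}.values())


def quality_metrics(records):
    """Completeness and validity percentages of the kind a quality dashboard might show."""
    if not records:
        return {"completeness_pct": 0.0, "validity_pct": 0.0}
    total = len(records)
    complete = sum(1 for r in records if r["name"] and r["email"])
    valid = sum(1 for r in records if EMAIL_PATTERN.match(r["email"]))
    return {"completeness_pct": 100 * complete / total, "validity_pct": 100 * valid / total}


raw = [
    {"name": "  jane   DOE ", "email": " Jane.Doe@Example.com "},
    {"name": "JANE DOE", "email": "jane.doe@example.com"},
    {"name": "", "email": "not-an-email"},
]
cleaned = deduplicate([cleanse(r) for r in raw])
print(cleaned)
print(quality_metrics(cleaned))  # → {'completeness_pct': 50.0, 'validity_pct': 50.0}
```

Standardizing before de-duplicating matters: the first two raw records only collapse into one after whitespace and case have been normalized.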


Defend

Build confidence in information by making it defensible against challenges.

The need is present: 1 in 3 make decisions on untrusted information.

Capabilities to consider:
• Maintain data lineage
• Data quality dashboarding
• Master data management
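Maintaining data lineage means recording, for each dataset, which datasets it was derived from, so that any figure can be traced back to its sources when it is challenged. A minimal Python sketch, assuming lineage is kept as a simple in-memory mapping with hypothetical dataset names:

```python
from collections import deque

# Hypothetical lineage graph: each dataset maps to the datasets it was derived from.
LINEAGE = {
    "exec_dashboard": ["sales_mart"],
    "sales_mart": ["orders_clean", "customer_master"],
    "orders_clean": ["orders_raw"],
    "customer_master": ["crm_export", "web_signups"],
}


def upstream(dataset, lineage):
    """Return every source a dataset depends on, nearest first (breadth-first walk)."""
    seen, order = set(), []
    queue = deque(lineage.get(dataset, []))
    while queue:
        source = queue.popleft()
        if source in seen:
            continue  # shared sources are reported once
        seen.add(source)
        order.append(source)
        queue.extend(lineage.get(source, []))
    return order


print(upstream("exec_dashboard", LINEAGE))
```

Answering "where did this dashboard number come from?" then becomes a graph walk; the same mapping inverted gives impact analysis, i.e. which downstream reports a bad source would affect.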


Secure and Comply

Protection of data privacy and access; compliance with data security and other regulatory requirements.

The threat is severe: $200 million just to replace cards!

Capabilities to consider:
• Secure data at rest and in motion
• Data masking
• Governed data retention
• Test data management
• Governance reporting
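Two common data masking techniques, partial redaction and pseudonymization, can be sketched in a few lines of Python. The helper names and the salt value are hypothetical; a salted SHA-256 digest is one simple way to produce consistent tokens for test data, not a substitute for a managed masking tool or proper key management.

```python
import hashlib


def mask_card(card_number):
    """Redact a card number, leaving only the last four digits visible."""
    digits = [c for c in card_number if c.isdigit()]
    visible = digits[-4:] if len(digits) > 4 else []  # very short inputs are fully masked
    return "*" * (len(digits) - len(visible)) + "".join(visible)


def pseudonymize(value, salt):
    """Replace a value with a salted SHA-256 token: same input, same token, original hidden."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()[:16]


print(mask_card("4111 1111 1111 1234"))  # → ************1234
print(pseudonymize("jane.doe@example.com", salt="test-env-salt"))
```

Because the token is deterministic for a given salt, masked test datasets can still be joined on the pseudonymized column without exposing the original values.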


© 2014 IBM Corporation


Four Typical Organizational Models for Governing Big Data

• Business unit led: BUs make Big Data decisions with limited coordination.
• Business unit led with central support: BUs make their own decisions; collaboration on selected initiatives.
• Center of Excellence: An independent centre; units pursue initiatives under the CoE’s guidance.
• Fully centralized: The corporate centre takes direct responsibility for identifying and prioritizing initiatives.

Source: http://www.bain.com/publications/articles/big_data_the_organizational_challenge.aspx


IBM Big Data Governance Offers a Proven Approach

• 4 out of 5 organizations rated their decision making as 7 or higher on a scale of 1 to 10.
• 3X: Organizations are improving at 3 times the rate of competitors.
• 77% of organizations show high or very high levels of trust.

Source: The Big Data Imperative: Why Information Governance Must Be Addressed Now, Aberdeen Group, Dec 2012


All Hadoop Vendors Talk About Their Big “Data Lake”

[Figure: Clean Hadoop Lake vs. Hadoop Data Swamp]

IBM Big Data Governance, including quality, security, and data lineage, transforms your Hadoop data swamp into a consumable Big Data lake.


Big Data Governance Strategy

[Figure: Big Data platform architecture. Analytic Applications (BI / Reporting, Exploration / Visualization, Functional App, Industry App, Predictive Analytics, Content Analytics) on top of the Big Data Platform (Systems Management, Application Development, Visualization & Discovery, Accelerators, Information Integration & Governance, Hadoop System, Stream Computing, Data Warehouse)]

1. Move the analytics closer to the data.
2. Take a platform approach to integrating Big Data in the IT environment.
3. Not all Big Data workloads are suitable for production SLAs.
4. Keep exploratory environments on commodity infrastructure, on Hadoop/NoSQL stores, for complementary analytics.
5. Define and extend a data governance program to handle structured as well as multi-structured data.
6. Leverage the IBM IGC Maturity Model to conduct a Big Data governance maturity assessment.


Big Data Governance Framework

Start with the goals and business outcomes in mind, seek active stakeholder engagement, and ask questions along all dimensions of the Information Governance Maturity Model:
• Do we fully recognize the responsibilities associated with handling big data?
• How does big data change the traditional concept of information as a corporate asset?
• What are the emerging requirements around privacy?
• Are the data stewards savvy enough, or trained, to handle profiling and anomalous pattern detection with Big Data?
• How do all these big data technologies relate to our architecture and current IT infrastructure?

http://ibmdatamag.com/2012/04/big-data-governance-a-framework-to-assess-maturity/


Airbus Reduced Call Resolution Time from 50 min. to 15 min. = $36M in Savings

Need
• Streamline silos of locked data into a common, integrated knowledge domain
• A triple-digit number of data sources and over 100,000 users (internal and external), spanning divisions, countries, +++
• Airbus World (portal), Rise (best practices), People (HR, tribal knowledge), Supply (supplier portal)

Benefits
• Slashing ‘wait on gate’ times for problem resolution
• Single point of access to ALL the data with seamless security; the #2 hit app in all of Airbus
• Greater visibility into the supply chain for partners, suppliers, and employees
• Airbus support teams can ‘parent’ more planes with the same amount of resources
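For scale, the headline figures imply roughly a 70% cut in resolution time per call. This is simple arithmetic on the quoted numbers; the $36M savings figure is taken from the case study as stated.

```latex
\frac{50\ \text{min} - 15\ \text{min}}{50\ \text{min}} = \frac{35}{50} = 0.70 = 70\%
```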


Leverage IBM’s vast IP and thought leadership assets

Understanding how to create value from data has been the focus of IBM’s analytics and governance studies for many years:
• 2009: The intelligent enterprise and Breaking away with BAO (defining analytics as a strategic asset)
• 2010: Analytics: The new path to value (operationalizing analytics in sophisticated organizations)
• 2011: Analytics: The widening divide (mastering analytic competencies)
• 2012: Analytics: The real-world use of big data (fundamentals of big data)
• 2013: Analytics: A blueprint for value (extracting value from data and analytics)

http://www-935.ibm.com/services/us/gbs/thoughtleadership/bao.html


Summary

• Big Data brings new opportunities as well as challenges.
• Successfully integrating and governing the 4 V’s of big data in the organization needs a proven method and framework expertise.
• You are not alone, but agility is important.
• Help is available.


THINK


Piyush Malik

[email protected]

Twitter @pmalik1

Please complete the evaluation form.

© 2014 IDQS. All rights reserved.
