IBM Big Data Governance

© 2014 IBM Corporation

Why do you need to govern big data?



What you’ll learn…

The opportunity

Big data governance:

− Requirements

− How it works

− Capabilities

A holistic approach

Next steps


What Is Big Data?

Immense volume, variety and velocity of data, in context, beyond what was previously possible.

The opportunity to derive new insights is challenged by questionable veracity.

Volume
• 12 terabytes of Tweets created daily: analyze product sentiment
• 350 billion meter readings per annum: predict power consumption

Velocity
• 5 million trade events per second: identify potential fraud
• 500 million call detail records per day: prevent customer churn

Variety
• 80% of data growth is images, video, documents: improve customer satisfaction
• 100’s of video feeds from surveillance cameras: monitor events of interest

Veracity
• Can I trust what I am seeing?


What Can You Do With Big Data?

Utilities • Weather analysis • Smart grid management

Retail • 360° view of the customer • Real-time promotions

Law Enforcement • Multimodal surveillance • Cyber security detection

Transportation • Logistics optimization • Traffic congestion

Financial Services • Fraud detection • 360° view of the customer

Information Technology • System log analysis • Cybersecurity

Health & Life Sciences • Epidemic early warning • ICU monitoring

Telecommunications • Geomapping/marketing • Network monitoring


So, How Are We Doing?

• 1 in 3 make decisions on untrusted information

• 1 in 2 don’t have the information they need

• 60% have more data than they can use

• 80% of the time spent per big data project goes to finding, preparing, understanding and defending information due to lack of context


66% of Americans in a recent survey don’t want personalized online advertising.

When you tell them what information you collect and store in order to do it, that figure increases to 86%.


Context, Agility and Security are Essential Requirements to Meet Business Objectives in a Big Data Environment

Essential Requirements

• Context: Flexibility to establish and maintain context independent of the volume, variety and velocity of data.

• Agility: A business framework (policies) for determining how and where to use big data.

• Security: Protection of data privacy and access; compliance with data security and other regulatory requirements.


Context Requires Governance; Agility Requires a Unique Big Data Approach to Governance

Traditional approach: Govern data to the highest standard. Store it, then use it for multiple purposes.

Data → Govern to Perfection → Repository → Use

Big data approach: Understand data and usage. Govern to the appropriate level. Use it, and iterate.

Data → Explore/Understand → Govern Appropriately → Use

How does an organization achieve agility in creating and continually evolving a safe and secure context in big data environments?


Big Data Governance is a Holistic Approach

PLAN

1. Define Business Problem: Begin by defining the business problem to solve with big data.

2. Obtain Executive Sponsorship: Obtain an executive sponsor to finalize priorities and goals.

3. Align Teams: Update governance roles to account for big data.

4. Understand Data Risk and Value: Categorize data to understand risk exposure.

ACT

5. Implement Analytical / Operational Project(s): Implement planned projects with governed data search, preparation, defense and security (Find, Prepare, Defend, Secure and Comply).

ASSESS

6. Measure Results: Assess governance results and adjust.
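To make "categorize data to understand risk exposure" concrete, here is a minimal sketch that tags column names with a risk level using pattern rules. The rule set, category names and function name are hypothetical illustrations, not an IBM product feature; a real classifier would also inspect the data values, not just the names.

```python
import re
from typing import Dict, List

# Hypothetical classification rules (an assumption for illustration):
# each risk level maps to regular expressions matched against column names.
RISK_RULES: Dict[str, List[str]] = {
    "high": [r"ssn", r"card_?number", r"passport"],
    "medium": [r"email", r"phone", r"birth"],
}


def categorize_columns(columns: List[str]) -> Dict[str, str]:
    """Assign each column the highest risk level whose pattern matches its name."""
    result: Dict[str, str] = {}
    for col in columns:
        name = col.lower()
        category = "low"  # default when no rule matches
        for level in ("high", "medium"):  # check the highest risk level first
            if any(re.search(pattern, name) for pattern in RISK_RULES[level]):
                category = level
                break
        result[col] = category
    return result
```

For example, `categorize_columns(["customer_ssn", "Email", "order_total"])` returns `{"customer_ssn": "high", "Email": "medium", "order_total": "low"}`. Checking the highest level first means a column that matches several rules is reported at its worst-case exposure.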


Key Data Scenarios for Big Data Governance

• Find: Establish context to find, visualize, and understand data for improved decision making.

• Prepare: Understand context to extract, cleanse, integrate and monitor data properly, to increase integrity and trustworthiness for subsequent usage.

• Defend: Build confidence in information by making it defensible against challenges.

• Secure and Comply: Protection of data privacy and access; compliance with data security and other regulatory requirements.

These scenarios apply to both analytical use and operational use.


Key Data Scenarios for Big Data Governance

Find: Establish context to find, visualize, and understand data for improved decision making

The Cost is High

80% of data scientists’ time on big data projects is spent finding and preparing data.

Capabilities to Consider

• Connectivity to sources

• Real-time queries (SQL, etc.)

• Enterprise search

• Automated data discovery

• Data profiling
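Data profiling is the Find capability that most directly establishes context: it summarizes what a data set actually contains before anyone relies on it. The sketch below is a minimal, hypothetical illustration over in-memory rows (a list of dicts); the function name and the chosen statistics are assumptions, and profiling on Hadoop would run the same idea as a distributed job.

```python
from collections import Counter
from typing import Any, Dict, List


def profile(records: List[Dict[str, Any]]) -> Dict[str, Dict[str, Any]]:
    """Compute simple per-column statistics over a list of row dicts.

    A key missing from a row is counted as a null, just like an explicit None.
    Values must be hashable so they can be counted.
    """
    columns = sorted({key for row in records for key in row})
    report: Dict[str, Dict[str, Any]] = {}
    for col in columns:
        values = [row.get(col) for row in records]
        present = [v for v in values if v is not None]
        counts = Counter(present)
        report[col] = {
            "rows": len(values),
            "nulls": len(values) - len(present),
            "distinct": len(counts),
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return report
```

Run on four rows where `country` is "US" twice, `None` once and absent once, the `country` entry reports 4 rows, 2 nulls, 1 distinct value and "US" as the most common value, which is the kind of signal that tells an analyst a field is only half populated.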


Key Data Scenarios for Big Data Governance

Prepare: Understand context to extract, cleanse, integrate and monitor data properly to increase integrity and trustworthiness for subsequent usage

The Risk is Real

Capabilities to Consider

• Highly scalable data integration

• Define terms and policies

• Data cleansing

• Quality dashboarding

• Rich annotation
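Cleansing and quality dashboarding work together: a cleansing step standardizes values, and a quality rule measures how many values now conform. The sketch below assumes a hypothetical e-mail field and a deliberately simple shape check; the pattern and function names are illustrative assumptions, not a production validation rule.

```python
import re
from typing import Callable, List

# Hypothetical quality rule: a deliberately simple "something@domain.tld" shape check.
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")


def cleanse_email(value: str) -> str:
    """Trim surrounding whitespace and lower-case an e-mail address."""
    return value.strip().lower()


def pass_rate(values: List[str], rule: Callable[[str], bool]) -> float:
    """Fraction of values satisfying a rule: the kind of figure a quality dashboard tracks."""
    if not values:
        return 1.0  # an empty set has no violations
    return sum(1 for v in values if rule(v)) / len(values)
```

With the raw inputs `"  Alice@Example.com"`, `"bob@example"` and `"carol@example.org "`, the rule passes none of them, because of the stray whitespace. After `cleanse_email`, two of the three pass; the remaining failure (`bob@example` has no top-level domain) is a genuine data problem that cleansing alone cannot fix.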


Key Data Scenarios for Big Data Governance

Defend: Build confidence in information by making it defensible against challenges

The Risk is Real

1 in 3 make decisions on untrusted information.

Capabilities to Consider

• Maintain data lineage

• Data quality dashboarding

• Master data management
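Data lineage is what lets you defend a number when it is challenged: you can trace a report back to every source it was derived from. A minimal sketch, assuming lineage is recorded as a hypothetical mapping from each data set to its direct inputs (the data set names are invented for illustration):

```python
from typing import Dict, List, Set

# Hypothetical lineage graph: each data set maps to the data sets it was derived from.
LINEAGE: Dict[str, List[str]] = {
    "churn_report": ["customer_features"],
    "customer_features": ["crm_extract", "call_detail_records"],
    "crm_extract": [],
    "call_detail_records": [],
}


def upstream(dataset: str, graph: Dict[str, List[str]]) -> Set[str]:
    """Return every data set that `dataset` transitively depends on.

    Uses an explicit stack (depth-first traversal) and a `seen` set,
    so shared inputs are visited once and cycles cannot loop forever.
    """
    seen: Set[str] = set()
    stack = list(graph.get(dataset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen
```

Here `upstream("churn_report", LINEAGE)` yields `customer_features`, `crm_extract` and `call_detail_records`, so a question about the churn report can be answered by checking the quality of those three inputs.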


Key Data Scenarios for Big Data Governance

Secure and Comply: Protection of data privacy and access; compliance with data security and other regulatory requirements

The Risk is Severe

$200 million just to replace cards!

Capabilities to Consider

• Secure data at rest and in motion

• Data masking

• Governed data retention

• Test data management

• Governance reporting
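Data masking and test data management often go together: sensitive fields are hidden, yet masked values still need to join consistently across tables. The sketch below shows two common techniques under stated assumptions: partial masking of a card number, and keyed pseudonymization with HMAC-SHA256 from Python's standard library. The function names, the 16-character token length and the demo key are hypothetical choices for illustration.

```python
import hashlib
import hmac


def mask_card(number: str) -> str:
    """Replace all but the last four digits with '*', dropping separators."""
    digits = [c for c in number if c.isdigit()]
    return "*" * (len(digits) - 4) + "".join(digits[-4:])


def pseudonymize(value: str, key: bytes) -> str:
    """Deterministic keyed token: the same input and key always give the same token,
    so masked test data still joins consistently across tables.
    In practice the key would come from a secrets store, never from source code."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]
```

For example, `mask_card("4111 1111 1111 1234")` returns `"************1234"`. Using a keyed HMAC instead of a plain hash is a deliberate design choice: without the key, an attacker cannot simply hash a list of known values to reverse the tokens.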


IBM Big Data Governance Offers a Golden Opportunity

• 4 out of 5 organizations rated their decision making as 7 or higher on a scale of 1 to 10

• 3X: organizations are improving at 3 times the rate of competitors

• 77% of organizations show high or very high levels of trust

Source: The Big Data Imperative: Why Information Governance Must Be Addressed Now, Aberdeen Group, Dec 2012


All Hadoop Vendors Talk About Their Big “Data Lake”. ONLY IBM Delivers Consumable Big Data From The Swamp.

Hadoop Data Swamp → Clean Hadoop Lake

IBM Big Data Governance, including quality, security, and data lineage, transforms your Hadoop Data Swamp into a consumable Big Data Lake.


A Complete Big Data Solution Is More Than Just An Engine

Vendors compared: IBM, Teradata, Pivotal, INFA, Cloudera, Horton

• Hadoop Distribution: Horton

• Hadoop Available via Appliance: ORCL & HP, Teradata

• Hadoop SQL Engine: Postgre

• Streaming Data: Flume/Storm

• Data Exploration Tools

• Enterprise Reporting

• Data Provisioning Tools: IBM, INFA, Scripting, Talend

• Security Monitoring: Protegrity

• ELT, ETL & Replication: IBM, INFA, Talend

• Metadata & Lineage: Revelytix

• Profile & Cleanse (native): IBM, INFA, Talend

• Hadoop Matching (native): IBM, INFA

• Reference Data Mgmt.

• Data Masking on Hadoop: IBM, INFA

• Archiving on Hadoop


Sui Southern Gas Company Mitigates Business Risk Through Insights Into Supply and Demand

IBM Software: Information Management

Chemicals & Petroleum, Energy & Utilities

The transformation: Deployed an analytics solution that overlays digital maps with real-time operational and financial data, enabling SSGC to analyze data in a real-world context.

• Reduces reporting time from 2 to 3 days down to minutes

• Enables timely analytics combining real-time operational and geographic data from over 5,000 sources

• Provides a single source of information that is reliable and gives better clarity into the supply chain

“The IBM analytics solution greatly improves our ability to define and monitor business KPIs, and it brings much greater transparency to reporting. We now have a single version of the truth and a single comprehensive report for each topic.”

Irfan Zafar, Chief Technology Innovation Officer and Senior General Manager of Customer Services, Sui Southern Gas Company Limited

Learn more: https://ibm.biz/bigdatagovernance


Legal Disclaimer

• © IBM Corporation 2014. All Rights Reserved.

• The information contained in this publication is provided for informational purposes only. While efforts were made to verify the completeness and accuracy of the information contained in this publication, it is provided AS IS without warranty of any kind, express or implied. In addition, this information is based on IBM’s current product plans and strategy, which are subject to change by IBM without notice. IBM shall not be responsible for any damages arising out of the use of, or otherwise related to, this publication or any other materials. Nothing contained in this publication is intended to, nor shall have the effect of, creating any warranties or representations from IBM or its suppliers or licensors, or altering the terms and conditions of the applicable license agreement governing the use of IBM software.

• References in this presentation to IBM products, programs, or services do not imply that they will be available in all countries in which IBM operates. Product release dates and/or capabilities referenced in this presentation may change at any time at IBM’s sole discretion based on market opportunities or other factors, and are not intended to be a commitment to future product or feature availability in any way. Nothing contained in these materials is intended to, nor shall have the effect of, stating or implying that any activities undertaken by you will result in any specific sales, revenue growth or other results.

• Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon many factors, including considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve results similar to those stated here.

• All customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.

• Other company, product, or service names may be trademarks or service marks of others.