BIG DATA SHINING THE LIGHT ON ENTERPRISE DARK DATA (EDD) APRIL 17, 2013

Big Data – Shining the Light on Enterprise Dark Data

DESCRIPTION

Content stored for a business purpose often lacks the structure or metadata required to determine its original purpose. With Hitachi Data Discovery Suite and Hitachi Content Platform, businesses can uncover dark data that could be leveraged for better business insight and uncover compliance issues that could otherwise create business risk. View this session and learn: What is enterprise dark data? How can enterprise dark data impact business decisions? How can you augment your underutilized data and deliver more value? How can you decrease the headache and challenges created by dark data? For more information, please visit: http://www.hds.com/products/file-and-content/

Page 1: Big Data – Shining the Light on Enterprise Dark Data

BIG DATA – SHINING THE LIGHT ON ENTERPRISE DARK DATA (EDD)

APRIL 17, 2013

Page 2: Big Data – Shining the Light on Enterprise Dark Data

Content stored for a business purpose often lacks the structure or metadata required to determine its original purpose. With Hitachi Data Discovery Suite and Hitachi Content Platform, businesses can uncover dark data that could be leveraged for better business insight and uncover compliance issues that could otherwise create business risk.

Attend this session and learn:

• What is enterprise dark data?

• How can enterprise dark data impact business decisions?

• How can you augment your underutilized data and deliver more value?

• How can you decrease the headache and challenges created by dark data?

BIG DATA – SHINING THE LIGHT ON ENTERPRISE DARK DATA

WEBTECH EDUCATIONAL SERIES

Page 3: Big Data – Shining the Light on Enterprise Dark Data

SPEAKERS

Jeff Lundberg, senior product marketing manager, Hitachi Content Platform

Marcelline Sanders, senior product manager, Hitachi Data Discovery Suite

Eamon O’Neill, senior product manager, Hitachi Content Platform

Page 4: Big Data – Shining the Light on Enterprise Dark Data

WHAT IS ENTERPRISE DARK DATA?

Dark data is

‒ Old files

‒ Data that you kept just in case

‒ Content on devices and clouds outside of IT control

It's created almost everywhere and stored anywhere

Organizations hoard this unanalyzed information because its value is unknown and storage is “cheap”

It may be worthless, invaluable or somewhere in between

‒ It’s clogging up production systems

‒ It’s all being treated the same despite widely varying value to the organization

Page 5: Big Data – Shining the Light on Enterprise Dark Data

INFORMATION IS CREATED IN SILOS

[Figure: departmental silos (Operations, Distribution, Marketing, Call Center, Manufacturing, R&D, IT, Stores and Sales), each holding its own email and PDF content]

Page 6: Big Data – Shining the Light on Enterprise Dark Data

UNSTRUCTURED DATA IS A MESS

Page 7: Big Data – Shining the Light on Enterprise Dark Data

OLD WAYS OF INFORMATION GATHERING

Page 8: Big Data – Shining the Light on Enterprise Dark Data

HOW TO GAIN INSIGHT ACROSS THE BUSINESS?

Legal Counsel, CEO, CIO

‒ What’s the next big opportunity for the company?

‒ Is the business at risk due to dark data?

‒ How do I understand my enterprise dark data?

CMO

‒ How can we influence market sentiment for our brand?

Page 9: Big Data – Shining the Light on Enterprise Dark Data

COLLECT AND ORGANIZE YOUR DATA

Corporate Compliance

Operational Intelligence

New Insight

Page 10: Big Data – Shining the Light on Enterprise Dark Data

HOW IT WORKS IN THE REAL WORLD

Page 11: Big Data – Shining the Light on Enterprise Dark Data

HEALTHCARE, LIFE SCIENCES

THE KNOWLEDGE OF ALL FOR THE TREATMENT OF ONE

RESEARCH, EVALUATION, TREATMENT, CLINICAL TRIALS

= The next cure

= Better patient care

Page 12: Big Data – Shining the Light on Enterprise Dark Data

HEALTHCARE EXAMPLE

KLINIKUM WELS

Primary Site: 8 HCP nodes, 2 HDDS nodes (full content and metadata search), USP-V

Secondary Site: 4 HCP nodes, 1 HDDS node, USP-V

Replication between the two sites

Health Portal

Ingest and consolidate data from 37 departments, 26 specialties

Metadata-based repository

Metadata Robot (CDA, PDF and XML): Adds metadata and custom metadata to create context (information and intelligence)

The environment

‒ Consolidate content from 37 departments

‒ 30-year compliant preservation

‒ Aggregation, search and metadata mining

How they use big data

‒ Intelligent data management

‒ Improve patient care, research and education capabilities

‒ Trend analysis

‒ Reduce cost and complexity of backups

‒ Make data independent of applications

Page 13: Big Data – Shining the Light on Enterprise Dark Data

FINANCIAL SERVICES

PROACTIVELY SEARCH FOR REGULATORY ISSUES

Sources: Bloomberg messages, email, call recordings, database records, XML

= Smart intelligence from enterprise dark data

= Protect business from risk

Page 14: Big Data – Shining the Light on Enterprise Dark Data

FINANCIAL SERVICES − REGULATORY

[Figure: XML, audio, records and Bloomberg messages are ingested and tagged with custom metadata. Tags shown: Google $600.00; 11 AM PST; Apple 523.00; Trader – Sam Malone; Bloomberg 11 AM; JP Morgan 3rd Party; 11:20 AM PST; Equity E NPV 11 Billion; Nov 15, 2012 (on each of the four items)]

Add Custom Metadata

HDDS: Index and Search

Search “Nov 15, 2012” and “Sam Malone” and “I have a deal for you”

Legal Hold (shown on each of the four items)
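The search-and-hold flow on this slide can be sketched in a few lines of Python. This is a minimal illustration only, not the HDDS API: the `Item` record, its fields, the `search_all` helper and the sample items are all hypothetical, with values loosely based on the labels in the figure.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """One stored object: source type, custom metadata and indexed text (all hypothetical)."""
    source: str
    metadata: dict
    text: str
    legal_hold: bool = False


def search_all(items, terms):
    """Return items where every term appears in the metadata values or the text (AND semantics)."""
    hits = []
    for item in items:
        haystack = " ".join(list(item.metadata.values()) + [item.text]).lower()
        if all(term.lower() in haystack for term in terms):
            hits.append(item)
    return hits


# Sample items; the contents are invented for illustration.
items = [
    Item("bloomberg message", {"date": "Nov 15, 2012", "trader": "Sam Malone"},
         "I have a deal for you"),
    Item("audio transcript", {"date": "Nov 15, 2012", "trader": "Sam Malone"},
         "Call me back after lunch"),
    Item("email", {"date": "Nov 14, 2012", "trader": "Sam Malone"},
         "I have a deal for you"),
]

hits = search_all(items, ["Nov 15, 2012", "Sam Malone", "I have a deal for you"])
for item in hits:
    item.legal_hold = True  # place a hold on every match

print([(i.source, i.legal_hold) for i in items])
```

Only the item that matches all three terms is placed on hold; the other two are left untouched because each one misses a term.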

Page 15: Big Data – Shining the Light on Enterprise Dark Data

INSURANCE

MOVING BEYOND I.T.-CENTRIC VALUE TO BUSINESS VALUE

ACCIDENT, CLAIM, INVESTIGATION, PAYOUT

= Competitive differentiation

= Increased customer loyalty

Page 16: Big Data – Shining the Light on Enterprise Dark Data

INSURANCE EXAMPLE

ENTERPRISE CONTENT LIFECYCLE MANAGEMENT AND DISCOVERY

Unified Search (HDDS)

Virtualized Content

Content Creation

Unified Management

Mobile

Remote/Branch Office

On-Site

<claim id=1203 date=20110925> <policy id=101> <party id=1 type=car plate=509445>

<claim id=1203 date=20110925> <policy id=101> <estimate id=2344 estimator=124 date=20110930>

<claim id=1203 date=20110930> <policy id=101> <invoice id=72273881 vendor=2833>

Search across all content, independent of applications and of the physical location of the data

Cloud Storage

Page 17: Big Data – Shining the Light on Enterprise Dark Data

INDEX AND SEARCH

DISCOVER, CONNECT, FILTER, ASSESS, ACT

Page 18: Big Data – Shining the Light on Enterprise Dark Data

DISCOVER

GAIN INSIGHT BY CONNECTING TO YOUR DATA

SEARCH

ANALYZE INSIGHT

Page 19: Big Data – Shining the Light on Enterprise Dark Data

MANY DATA SOURCES

5/25/12 Retrieve Well Production Data

https://www.dmr.nd.gov/oilgas/basic/getwellprod.asp?filenumber=19119

Related Links

Get Well Production History Data

Enter File Number: 20178

Get Monthly Production Data

NDIC File No: 19119 API No: 33-105-01865-00-00 CTB No: 119119

Well Type: OG Well Status: A Status Date: 11/5/2010 Wellbore type: Horizontal

Location: NENW 26-155-101 Footages: 320 FNL 2529 FWL Latitude: 48.225686 Longitude:-103.636598

Current Operator: BRIGHAM OIL & GAS, L.P.

Current Well Name: HEEN 26-35 1-H

Elevation(s): 2073 KB 2053 GR 2053 GL Total Depth: 20400 Field: TODD

Spud Date(s): 7/27/2010

Casing String(s): 9.625" 2160' 7" 10896'

Completion Data Pool: BAKKEN Perfs: 10896-20400 Comp: 11/5/2010 Status: AL Date: 2/10/2011 Spacing:2SEC

Cumulative Production Data Pool: BAKKEN Cum Oil: 162510 Cum MCF Gas: 141410 Cum Water: 150629

Production Test Data IP Test Date: 11/8/2010 Pool: BAKKEN IP Oil: 3425 IP MCF: 2194 IP Water: 6265

Monthly Production Data

Pool    Date     Days  BBLS Oil  Runs   BBLS Water  MCF Prod  MCF Sold  Vent/Flare
BAKKEN  3-2012   31    5301      5217   4079        4301      3667      634
BAKKEN  2-2012   29    5050      4971   3756        2723      1185      1538
BAKKEN  1-2012   31    5624      5786   4239        2846      1705      1141
BAKKEN  12-2011  31    5708      5407   4272        4033      3134      899
BAKKEN  11-2011  30    6112      6228   4536        4647      4368      279
BAKKEN  10-2011  31    6227      7526   4857        4903      4303      600
BAKKEN  9-2011   30    6516      5544   4866        5418      5113      305
BAKKEN  8-2011   31    7430      7276   7724        5996      2532      3464
BAKKEN  7-2011   31    8085      7866   5699        7500      7499      1
BAKKEN  6-2011   30    8438      8682   5501        6481      1816      4665
BAKKEN  5-2011   28    6221      6526   6709        4456      0         4456
BAKKEN  4-2011   30    8201      7379   8189        5943      0         5943
BAKKEN  3-2011   31    11263     11928  9963        8345      0         8345
BAKKEN  2-2011   23    10035     10365  7819        9841      0         9841

Structured: Presentation of RDBMS Data

Unstructured: Well File, PDF of Scanned Documents, Seismic, etc.

Page 20: Big Data – Shining the Light on Enterprise Dark Data

SCALE-OUT INDEXING OF INFORMATION

Index Metadata and Full Content in Complex Formats and Multiple Languages

Process Petabytes of Data

Security Protection!

Page 21: Big Data – Shining the Light on Enterprise Dark Data

DISCOVER, CONNECT, AND ASSESS INFORMATION

Hitachi Data Discovery Suite (HDDS)

‒ Scales using latest open source technologies: Hadoop, HDFS, ZooKeeper

‒ 1,000 objects per second per server/node (NFS metadata indexing)

‒ Parallel processing

Structured queries against unstructured information

Rich API

Results for further analysis
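To put the throughput figure of 1,000 objects per second per node in perspective, here is a rough back-of-the-envelope estimate in Python. It assumes that throughput scales linearly with the number of nodes, which the slide implies with "parallel processing" but does not state. The one-billion-object corpus is an arbitrary example size, not a figure from the presentation.

```python
def indexing_hours(object_count, nodes, objects_per_second_per_node=1_000):
    """Estimated hours to index object_count objects, assuming linear scaling across nodes."""
    total_rate = nodes * objects_per_second_per_node
    return object_count / total_rate / 3600


# One billion objects is an arbitrary example size, not a figure from the slides.
for nodes in (1, 4, 16):
    print(nodes, "node(s):", round(indexing_hours(1_000_000_000, nodes), 1), "hours")
```

Under these assumptions a single node would need roughly 278 hours (about 11.6 days) for a billion objects, and 16 nodes would need a little over 17 hours.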

Page 22: Big Data – Shining the Light on Enterprise Dark Data

BREAK DOWN SILOS

SOPHISTICATED INSIGHT ACROSS DISPARATE INFORMATION TYPES

Identify Trends and Insights With a Single View Across Previously Siloed Data

Net-New Revenue Opportunity, Innovation or Competitive Differentiation

[Figure: ANALYTICS over Healthcare, Insurance and Manufacturing data (Structured/Unstructured) on a SINGLE VIRTUALIZATION PLATFORM (Block, Object, File)]

Page 23: Big Data – Shining the Light on Enterprise Dark Data

BRING STRUCTURE TO UNSTRUCTURED DATA

Page 24: Big Data – Shining the Light on Enterprise Dark Data

USE METADATA TO ORGANIZE AND QUERY

[Figure: Block, File and Object data described by METADATA and accessed through QUERIES]

Page 25: Big Data – Shining the Light on Enterprise Dark Data

BIG METADATA

PREPARE DATA FOR ANALYTICS

Block File Object

ANALYTICS

Page 26: Big Data – Shining the Light on Enterprise Dark Data

OBJECT STORAGE

FOR STORING, CONTROLLING, TAGGING, ANALYZING, ENRICHING, AND SHARING ENTERPRISE DARK DATA

Page 27: Big Data – Shining the Light on Enterprise Dark Data

STORE EDD − VOLUME AND VELOCITY

80 Nodes

40 Petabytes of Storage

64 Billion User Objects

Volume: Grow from 4TB to 40PB, by adding storage

Velocity: Rapid read-write of data. Increase bandwidth by adding nodes

Scale-Out Architecture

With compression and deduplication, store big data efficiently in Hitachi Content Platform (HCP), inside the enterprise or in cloud-hosted HCP
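The scale figures on this slide can be related to one another with simple arithmetic. The sketch below assumes decimal units (1 PB = 10^15 bytes), which the slide does not specify. It also treats the three maxima as if they were reached at the same time, so the per-node and per-object averages are illustrative only.

```python
TB = 10**12  # decimal units are an assumption; the slide does not say which it uses
PB = 10**15

nodes = 80
capacity_bytes = 40 * PB
user_objects = 64 * 10**9

growth_factor = capacity_bytes // (4 * TB)        # growth from 4TB to 40PB
capacity_per_node_pb = capacity_bytes / nodes / PB
objects_per_node = user_objects // nodes
avg_object_bytes = capacity_bytes / user_objects  # only meaningful if both limits are hit at once

print(growth_factor, capacity_per_node_pb, objects_per_node, avg_object_bytes)
```

Read this way, the quoted range is a 10,000-fold growth path, and a fully populated system would average half a petabyte and 800 million objects per node.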

Page 28: Big Data – Shining the Light on Enterprise Dark Data

STORE EDD − VARIETY

HCP DESIGNED TO STORE A WIDE VARIETY OF UNSTRUCTURED DATA

10,000 namespace divisions within the reservoir

Different data management policies for each kind of data – retention, compliance, etc.

Metadata schema adapted for various content types

Content types shown: Microsoft® SharePoint®, Microsoft Exchange, X-rays, legal contracts, instant messages, surveillance, call recordings

Page 29: Big Data – Shining the Light on Enterprise Dark Data

CONTROL EDD – BACKUP-FREE

Use of proven RAID-6 protection

2 copies of all metadata

Customer configurable redundant local object copies (2, 3, or 4)

Content validation via hashes and automatic object repair

Replication – offsite copies with automated repair from replica

Object versioning – protection from accidental deletes and changes

Active data protection built into the object store

Equals unparalleled data protection and reduced backup burden
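The combination of hash-based content validation and redundant local copies can be illustrated with a short Python sketch. It is a simplified model, not HCP's implementation: the `digest` and `validate_and_repair` helpers are hypothetical, copies are held in a plain list, and SHA-256 is chosen only as an example algorithm.

```python
import hashlib


def digest(data: bytes, algorithm: str = "sha256") -> str:
    """Hex digest of data using any algorithm that hashlib provides."""
    return hashlib.new(algorithm, data).hexdigest()


def validate_and_repair(copies, expected_digest, algorithm="sha256"):
    """Replace corrupted copies with a verified good copy; return how many were repaired."""
    good = next((c for c in copies if digest(c, algorithm) == expected_digest), None)
    if good is None:
        raise ValueError("no intact copy left; restore from a replica instead")
    repaired = 0
    for i, copy in enumerate(copies):
        if digest(copy, algorithm) != expected_digest:
            copies[i] = good
            repaired += 1
    return repaired


original = b"claim 1203: scanned estimate"
stored_digest = digest(original)  # recorded when the object is ingested
copies = [original, b"claim 1203: sc\x00nned estimate", original]  # second copy is corrupted

print(validate_and_repair(copies, stored_digest))
print(all(c == original for c in copies))
```

When no local copy passes validation, the sketch raises an error; in the scheme described on the slide, that is the point at which repair from an offsite replica would take over.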

Page 30: Big Data – Shining the Light on Enterprise Dark Data

CONTROL EDD – PRESERVE AND SECURE

Authentication: Policy-based object management guarantees archived data is authentic, available and secure

‒ Guards against corruption or tampering

‒ Selectable hash algorithms include SHA-1, SHA-256/384/512, MD5 and RIPEMD-160

Retention: Prevents deletion before the retention period expires

‒ Strict “compliance” or more liberal “enterprise” mode

‒ Retention classes, date in object, or deferred options. Privileged delete, retention hold

Protection: Self-configuring and self-healing with automated policy enforcement, failover and ongoing integrity checks

‒ Ensures the specified number of replica copies is maintained to tolerate simultaneous points of failure, depending on the value of the data

Encryption of data at rest: Protects content if media is stolen, using patented Secret Sharing technology

‒ Transparently encrypts all content, metadata and search indexes

‒ Implements a distributed key management solution

Replication: Bidirectional, inbound star and chain topologies

‒ Transparent object-level restore, repair and read recovery from replica

Shredding: Ensures no trace of a file is recoverable from disk after deletion; U.S. DoD 5220.22-M spec.
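The retention options on this slide can be read as a small decision rule. The Python sketch below is one plausible interpretation, not HCP's documented behavior. It assumes that a retention hold blocks every delete and that a privileged delete is honored only in "enterprise" mode. The `can_delete` function and the example dates are hypothetical.

```python
from datetime import date


def can_delete(today, retain_until, mode="compliance", on_hold=False, privileged=False):
    """Decide whether a delete request is allowed under a simplified retention policy."""
    if on_hold:
        return False  # assumption: a retention hold blocks every delete
    if today >= retain_until:
        return True   # retention period has expired
    # Still under retention: assume only enterprise mode honors a privileged delete.
    return mode == "enterprise" and privileged


retain_until = date(2036, 5, 21)  # arbitrary example date

print(can_delete(date(2013, 4, 17), retain_until))
print(can_delete(date(2013, 4, 17), retain_until, mode="enterprise", privileged=True))
print(can_delete(date(2036, 5, 22), retain_until, on_hold=True))
```

The three calls cover an ordinary delete inside the retention period, a privileged delete in enterprise mode, and a delete attempt on an expired object that is still on hold.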

Page 31: Big Data – Shining the Light on Enterprise Dark Data

TAG EDD – CUSTOM METADATA

<claim id=1203 date=20110925>

<policy id=101>

<party id=1 type=car plate=509445>

<claim id=1203 date=20110925>

<policy id=101>

<estimate id=2344 estimator=124 date=20110930>

<policy id=101>

<object type=car plate=454756>

<customer id=2355>

<tow plate=454756>

Object Consists of Files (JPG, PDF, etc.) Plus Appended Tags
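The tags on this slide are shorthand rather than well-formed XML. As a rough sketch of how such custom metadata might be built and then used to group files, the Python below writes the same values as well-formed XML with the standard library. Nesting `policy` and the detail element under `claim` is an assumption, and the file names and helper functions are hypothetical.

```python
import xml.etree.ElementTree as ET


def make_annotation(claim_id, claim_date, policy_id, **detail):
    """Build a well-formed custom-metadata document for one stored file."""
    claim = ET.Element("claim", id=str(claim_id), date=str(claim_date))
    ET.SubElement(claim, "policy", id=str(policy_id))
    for tag, attrs in detail.items():
        ET.SubElement(claim, tag, {k: str(v) for k, v in attrs.items()})
    return ET.tostring(claim, encoding="unicode")


def files_for_claim(objects, claim_id):
    """Return the names of all files whose annotation carries the given claim id."""
    return [name for name, xml_text in objects.items()
            if ET.fromstring(xml_text).get("id") == str(claim_id)]


# File names are hypothetical; the tag values are taken from the slide.
objects = {
    "accident-photo.jpg": make_annotation(
        1203, 20110925, 101, party={"id": 1, "type": "car", "plate": 509445}),
    "repair-estimate.pdf": make_annotation(
        1203, 20110925, 101, estimate={"id": 2344, "estimator": 124, "date": 20110930}),
}

print(objects["accident-photo.jpg"])
print(files_for_claim(objects, 1203))
```

Because both files carry the same claim id in their metadata, a single lookup returns the photo and the estimate together, regardless of file type.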

Page 32: Big Data – Shining the Light on Enterprise Dark Data

ANALYZE EDD

Built-in metadata search index

Object query API enables web dashboards

Relational queries link together many kinds of unstructured objects and connect those to structured data

Metadata policy engine – automated management actions on search results

‒ Put HOLD on all files related to lawsuit

‒ Retrieve all scanned-doctor-notes related to tibia-fracture-xray-images and related insurance-claim-records in SQL DBs
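The second example query, which links doctor's notes to X-ray images and insurance claim records, is essentially a relational join over object metadata. The sketch below shows the idea with Python's built-in `sqlite3` module. It is not the HCP object query API: the two tables, their columns and all rows are hypothetical, and a shared `patient_id` is assumed to be the linking key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE object_metadata (object_name TEXT, kind TEXT, patient_id INTEGER, diagnosis TEXT);
    CREATE TABLE insurance_claims (claim_id INTEGER, patient_id INTEGER, status TEXT);

    INSERT INTO object_metadata VALUES
        ('xray-0001.dcm',  'xray-image',          7, 'tibia-fracture'),
        ('notes-0001.pdf', 'scanned-doctor-note', 7, NULL),
        ('notes-0002.pdf', 'scanned-doctor-note', 9, NULL);
    INSERT INTO insurance_claims VALUES (5001, 7, 'open'), (5002, 9, 'paid');
""")

# Start from tibia-fracture X-rays, then pull in the same patient's notes and claims.
rows = conn.execute("""
    SELECT note.object_name, claim.claim_id
    FROM object_metadata AS xray
    JOIN object_metadata AS note
      ON note.patient_id = xray.patient_id AND note.kind = 'scanned-doctor-note'
    JOIN insurance_claims AS claim
      ON claim.patient_id = xray.patient_id
    WHERE xray.kind = 'xray-image' AND xray.diagnosis = 'tibia-fracture'
""").fetchall()

print(rows)
```

The metadata table stands in for the index over unstructured objects, and the claims table stands in for a separate structured database; the join only works because both carry the same key.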

Page 33: Big Data – Shining the Light on Enterprise Dark Data

EDD LIFECYCLE

ENRICH EDD

[Figure: EDD lifecycle with the stages Capture, Store and Control, Analyze, and Enrich]

HCP makes existing data more useful. Outcome of analysis leads to more tags for the content.

Continuously append custom metadata

Over time, what you learn about EDD becomes more important than the data itself

Page 34: Big Data – Shining the Light on Enterprise Dark Data

SHARE EDD – MANY ACCESS METHODS

‒ Linux/Unix Filers (NFS)

‒ Document Management (WebDAV)

‒ Microsoft® Windows® (CIFS)

‒ Amazon S3 (Compatible RESTful HTTP(S))

‒ Email Journaling (SMTP)

https://marketing.xenos. /browser/contract.pdf

Page 35: Big Data – Shining the Light on Enterprise Dark Data

ADDITIONAL RESOURCES

For more information about the technologies behind enterprise dark data, please refer to the following links

Hitachi Data Discovery Suite

http://www.hds.com/products/file-and-content/data-discovery-suite.html?WT.ac=us_mg_pro_dds

Hitachi Content Platform

http://www.hds.com/products/file-and-content/content-platform/?WT.ac=us_mg_pro_hcp

General EDD questions − Laura Chu-Vial, [email protected]

Page 36: Big Data – Shining the Light on Enterprise Dark Data

SUMMARY

Currently, Dark Data is a burden:

‒ It's created almost everywhere and stored anywhere

‒ Organizations hoard this data because its value is unknown and storage is ‘cheap’

‒ It’s all being treated the same despite widely varying value to the organization

‒ Provides low value outside of legal and compliance

Put your data to work for you:

‒ Identify dark data and assess its value with index and search

‒ Collect, store and organize data in an object store

‒ Analyze your dark data’s content and metadata

‒ Enrich and share insight to drive new innovation

Page 37: Big Data – Shining the Light on Enterprise Dark Data

QUESTIONS AND DISCUSSION

Page 38: Big Data – Shining the Light on Enterprise Dark Data

UPCOMING WEBTECHS

HDS Big Data Roadmap, May 1, 9 a.m. PT, noon ET

Hitachi’s Cloud Strategy, Enabling Technologies, and Solutions, May 21, 9 a.m. PT, noon ET

Environmental Pressures Driving an Evolution in File Storage, May 23, 9 a.m. PT, noon ET

HDS Hadoop Reference Architecture, June 5, 9 a.m. PT, noon ET

Check www.hds.com/webtech for:

Links to the recording, the presentation and Q&A (available next week)

Schedule and registration for upcoming WebTech sessions

Page 39: Big Data – Shining the Light on Enterprise Dark Data

THANK YOU