BIG DATA – SHINING THE LIGHT ON ENTERPRISE DARK DATA (EDD)
APRIL 17, 2013
Content stored for a business purpose often lacks the structure or metadata needed to
determine its original purpose. With Hitachi Data Discovery Suite and Hitachi Content
Platform, businesses can uncover dark data that could be leveraged for better business
insight and surface compliance issues before they become business risks.
Attend this session and learn:
• What is enterprise dark data?
• How can enterprise dark data impact business decisions?
• How can you augment your underutilized data and deliver more value?
• How can you reduce the headaches and challenges created by dark data?
WEBTECH EDUCATIONAL SERIES
SPEAKERS
Jeff Lundberg, senior product marketing manager, Hitachi Content Platform
Marcelline Sanders, senior product manager, Hitachi Data Discovery Suite
Eamon O’Neill, senior product manager, Hitachi Content Platform
WHAT IS ENTERPRISE DARK DATA?
Dark data is
‒ Old files
‒ Data that you kept just in case
‒ Content on devices and clouds outside of IT control
It's created almost everywhere and stored anywhere
Organizations hoard this unanalyzed information because
its value is unknown and storage is “cheap”
It may be worthless, invaluable or somewhere in between
‒ It’s clogging up production systems
‒ It’s all being treated the same despite widely varying value to the organization
INFORMATION IS CREATED IN SILOS
‒ Operations
‒ Distribution
‒ Marketing
‒ Call Center
‒ Manufacturing
‒ R&D
‒ IT
‒ Stores and Sales
UNSTRUCTURED DATA IS A MESS
OLD WAYS OF INFORMATION GATHERING
HOW TO GAIN INSIGHT ACROSS THE BUSINESS?
‒ CEO: What’s the next big opportunity for the company?
‒ Legal Counsel: Is the business at risk due to dark data?
‒ CIO: How do I understand my enterprise dark data?
‒ CMO: How can we influence market sentiment for our brand?
COLLECT AND ORGANIZE YOUR DATA
Corporate Compliance
Operational Intelligence
New Insight
HOW IT WORKS IN THE REAL WORLD
HEALTHCARE, LIFE SCIENCES
THE KNOWLEDGE OF ALL FOR THE TREATMENT OF ONE
RESEARCH, EVALUATION, TREATMENT, CLINICAL TRIALS
= The next cure
= Better patient care
HEALTHCARE EXAMPLE
KLINIKUM WELS
Primary site
‒ 8 HCP nodes
‒ 2 HDDS nodes (full content and metadata search)
‒ USP-V
Secondary site
‒ 4 HCP nodes
‒ 1 HDDS node
‒ USP-V
Replication between the primary and secondary sites
Health Portal: ingest and consolidate data from 37 departments, 26 specialties
Metadata-based repository
Metadata Robot (CDA, PDF and XML): adds metadata and custom metadata to create context (information and intelligence)
The environment
‒ Consolidate content from 37 departments
‒ 30-year compliant preservation
‒ Aggregation, search and metadata mining
How they use big data
‒ Intelligent data management
‒ Improve patient care, research and education capabilities
‒ Trend analysis
‒ Reduce cost and complexity of backups
‒ Make data independent of applications
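The slide does not describe how the Metadata Robot works internally. As a minimal sketch of the general idea, assuming a simplified XML report with hypothetical `department`, `specialty` and `documentType` elements (not the real CDA schema or the Klinikum Wels setup), a robot can read selected fields from an ingested document and emit them as a separate custom-metadata annotation:

```python
import xml.etree.ElementTree as ET

# Hypothetical ingested document. Real CDA files use the HL7 v3 namespace and a
# much richer structure; the element names here are illustrative only.
SAMPLE_REPORT = """<report>
  <department>Radiology</department>
  <specialty>Orthopedics</specialty>
  <documentType>discharge-letter</documentType>
  <body>Patient discharged after treatment.</body>
</report>"""

# Fields this hypothetical robot copies into custom metadata.
FIELDS = ("department", "specialty", "documentType")


def extract_custom_metadata(document_xml: str) -> str:
    """Read selected fields from a document and return them as annotation XML."""
    source = ET.fromstring(document_xml)
    annotation = ET.Element("metadata")
    for name in FIELDS:
        value = source.findtext(name)
        if value is not None:  # skip fields this document does not carry
            ET.SubElement(annotation, name).text = value.strip()
    return ET.tostring(annotation, encoding="unicode")


print(extract_custom_metadata(SAMPLE_REPORT))
```

The annotation is kept apart from the document body, in line with the "file plus appended tags" model described later in the deck, so the original content stays unchanged.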
FINANCIAL SERVICES
PROACTIVELY SEARCH FOR REGULATORY ISSUES
Sources: Bloomberg messages, email, call recordings, database records, XML
= Smart intelligence from enterprise dark data
= Protect business from risk
FINANCIAL SERVICES − REGULATORY
Sources: XML, audio, records, Bloomberg messages
Add custom metadata; example tags: “Google $600.00, 11 AM PST”, “Apple 523.00”, “Trader – Sam Malone”, “Bloomberg, 11 AM”, “JP Morgan 3rd Party, 11:20 AM PST”, “Equity ENPV 11 Billion”; all items dated Nov 15, 2012
Index and search with HDDS: search “Nov 15, 2012” and “Sam Malone” and “I have a deal for you”
Legal hold applied to each of the four items
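A minimal sketch of the search-and-hold step, assuming each captured item (call transcript, email, Bloomberg message) is represented as its extracted text plus a dict of custom metadata. The AND-of-phrases match, the `Item` class and the `legal_hold` flag are hypothetical; this is not the HDDS query syntax or API. Whether related items that do not themselves contain every phrase are also held (as the four holds on the slide suggest) is a policy choice this sketch leaves out.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Item:
    source: str                                    # e.g. "audio", "email", "bloomberg"
    text: str                                      # extracted content (transcript, body, ...)
    metadata: dict = field(default_factory=dict)   # custom metadata tags
    legal_hold: bool = False


def matches_all(item: Item, phrases: list[str]) -> bool:
    """True if every phrase occurs in the item's content or metadata values."""
    haystack = " ".join([item.text, *map(str, item.metadata.values())]).lower()
    return all(p.lower() in haystack for p in phrases)


def search_and_hold(items: list[Item], phrases: list[str]) -> list[Item]:
    """Return the items matching all phrases, placing each one on legal hold."""
    hits = [item for item in items if matches_all(item, phrases)]
    for hit in hits:
        hit.legal_hold = True
    return hits


items = [
    Item("audio", "I have a deal for you", {"trader": "Sam Malone", "date": "Nov 15, 2012"}),
    Item("bloomberg", "Apple 523.00", {"trader": "Sam Malone", "date": "Nov 15, 2012"}),
    Item("email", "I have a deal for you", {"trader": "Other Trader", "date": "Nov 14, 2012"}),
]
held = search_and_hold(items, ["Nov 15, 2012", "Sam Malone", "I have a deal for you"])
print([item.source for item in held])  # ['audio']
```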
INSURANCE
MOVING BEYOND I.T.-CENTRIC VALUE TO BUSINESS VALUE
ACCIDENT, CLAIM, INVESTIGATION, PAYOUT
= Competitive differentiation
= Increased customer loyalty
INSURANCE EXAMPLE
ENTERPRISE CONTENT LIFECYCLE MANAGEMENT AND DISCOVERY
Unified Search (HDDS)
Unified Management
Virtualized Content
Content Creation: Mobile, Remote/Branch Office, On-Site
Cloud Storage
Custom metadata appended to related claim content:
<claim id=1203 date=20110925> <policy id=101> <party id=1 type=car plate=509445>
<claim id=1203 date=20110925> <policy id=101> <estimate id=2344 estimator=124 date=20110930>
<claim id=1203 date=20110930> <policy id=101> <invoice id=72273881 vendor=2833>
Search across all content, independent of applications and the physical location of data
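The claim tags above are what tie a photo, an estimate and an invoice to one claim, regardless of which application or location produced them. A minimal sketch of that linking, assuming the tags are stored as the plain strings shown on the slide (unquoted attribute values, so not well-formed XML) and using hypothetical object names:

```python
from __future__ import annotations

import re
from collections import defaultdict

# Tag strings as written on the slide; the object names are hypothetical.
OBJECTS = {
    "accident-photo.jpg": "<claim id=1203 date=20110925> <policy id=101> <party id=1 type=car plate=509445>",
    "estimate.pdf": "<claim id=1203 date=20110925> <policy id=101> <estimate id=2344 estimator=124 date=20110930>",
    "invoice.pdf": "<claim id=1203 date=20110930> <policy id=101> <invoice id=72273881 vendor=2833>",
}

TAG = re.compile(r"<(\w+)([^>]*)>")   # tag name, then everything up to '>'
ATTR = re.compile(r"(\w+)=(\S+)")     # key=value pairs inside one tag


def parse_tags(text: str) -> dict[str, dict[str, str]]:
    """Map each tag name to its attributes, e.g. {'claim': {'id': '1203', ...}}."""
    return {name: dict(ATTR.findall(attrs)) for name, attrs in TAG.findall(text)}


def group_by_claim(objects: dict[str, str]) -> dict[str, list[str]]:
    """Group object names by the id attribute of their <claim> tag."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, tags in objects.items():
        claim_id = parse_tags(tags).get("claim", {}).get("id")
        if claim_id is not None:
            groups[claim_id].append(name)
    return dict(groups)


print(group_by_claim(OBJECTS))
```

Grouping on the `claim` id rather than on dates matters here: the invoice carries a later claim date (20110930) but still belongs to claim 1203.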
INDEX AND SEARCH DISCOVER, CONNECT, FILTER, ASSESS, ACT
DISCOVER
GAIN INSIGHT BY CONNECTING TO YOUR DATA
SEARCH
ANALYZE INSIGHT
MANY DATA SOURCES
Retrieve Well Production Data (https://www.dmr.nd.gov/oilgas/basic/getwellprod.asp?filenumber=19119, printed 5/25/12)
NDIC File No: 19119 API No: 33-105-01865-00-00 CTB No: 119119
Well Type: OG Well Status: A Status Date: 11/5/2010 Wellbore type: Horizontal
Location: NENW 26-155-101 Footages: 320 FNL 2529 FWL Latitude: 48.225686 Longitude:-103.636598
Current Operator: BRIGHAM OIL & GAS, L.P.
Current Well Name: HEEN 26-35 1-H
Elevation(s): 2073 KB 2053 GR 2053 GL Total Depth: 20400 Field: TODD
Spud Date(s): 7/27/2010
Casing String(s): 9.625" 2160' 7" 10896'
Completion Data Pool: BAKKEN Perfs: 10896-20400 Comp: 11/5/2010 Status: AL Date: 2/10/2011 Spacing:2SEC
Cumulative Production Data Pool: BAKKEN Cum Oil: 162510 Cum MCF Gas: 141410 Cum Water: 150629
Production Test Data IP Test Date: 11/8/2010 Pool: BAKKEN IP Oil: 3425 IP MCF: 2194 IP Water: 6265
Monthly Production Data
Pool | Date | Days | BBLS Oil | Runs | BBLS Water | MCF Prod | MCF Sold | Vent/Flare
BAKKEN | 3-2012 | 31 | 5301 | 5217 | 4079 | 4301 | 3667 | 634
BAKKEN | 2-2012 | 29 | 5050 | 4971 | 3756 | 2723 | 1185 | 1538
BAKKEN | 1-2012 | 31 | 5624 | 5786 | 4239 | 2846 | 1705 | 1141
BAKKEN | 12-2011 | 31 | 5708 | 5407 | 4272 | 4033 | 3134 | 899
BAKKEN | 11-2011 | 30 | 6112 | 6228 | 4536 | 4647 | 4368 | 279
BAKKEN | 10-2011 | 31 | 6227 | 7526 | 4857 | 4903 | 4303 | 600
BAKKEN | 9-2011 | 30 | 6516 | 5544 | 4866 | 5418 | 5113 | 305
BAKKEN | 8-2011 | 31 | 7430 | 7276 | 7724 | 5996 | 2532 | 3464
BAKKEN | 7-2011 | 31 | 8085 | 7866 | 5699 | 7500 | 7499 | 1
BAKKEN | 6-2011 | 30 | 8438 | 8682 | 5501 | 6481 | 1816 | 4665
BAKKEN | 5-2011 | 28 | 6221 | 6526 | 6709 | 4456 | 0 | 4456
BAKKEN | 4-2011 | 30 | 8201 | 7379 | 8189 | 5943 | 0 | 5943
BAKKEN | 3-2011 | 31 | 11263 | 11928 | 9963 | 8345 | 0 | 8345
BAKKEN | 2-2011 | 23 | 10035 | 10365 | 7819 | 9841 | 0 | 9841
Structured: Presentation of RDBMS Data
Unstructured: Well File, PDF of Scanned Documents, Seismic, etc.
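Once a printed record like the one above is parsed into fields, it can be queried like structured data. A minimal sketch, assuming the monthly rows are available as whitespace-separated text; the `Month` record and field names are hypothetical. The rule that MCF Prod equals MCF Sold plus Vent/Flare is inferred from the rows themselves (it holds for every month shown), not taken from the source's documentation.

```python
from __future__ import annotations

from dataclasses import dataclass

# Monthly rows copied from the well record above (Pool, Date, Days, BBLS Oil,
# Runs, BBLS Water, MCF Prod, MCF Sold, Vent/Flare).
RAW = """\
BAKKEN 3-2012 31 5301 5217 4079 4301 3667 634
BAKKEN 2-2012 29 5050 4971 3756 2723 1185 1538
BAKKEN 1-2012 31 5624 5786 4239 2846 1705 1141
BAKKEN 12-2011 31 5708 5407 4272 4033 3134 899
BAKKEN 11-2011 30 6112 6228 4536 4647 4368 279
BAKKEN 10-2011 31 6227 7526 4857 4903 4303 600
BAKKEN 9-2011 30 6516 5544 4866 5418 5113 305
BAKKEN 8-2011 31 7430 7276 7724 5996 2532 3464
BAKKEN 7-2011 31 8085 7866 5699 7500 7499 1
BAKKEN 6-2011 30 8438 8682 5501 6481 1816 4665
BAKKEN 5-2011 28 6221 6526 6709 4456 0 4456
BAKKEN 4-2011 30 8201 7379 8189 5943 0 5943
BAKKEN 3-2011 31 11263 11928 9963 8345 0 8345
BAKKEN 2-2011 23 10035 10365 7819 9841 0 9841
"""


@dataclass
class Month:
    pool: str
    date: str
    days: int
    oil_bbls: int
    runs: int
    water_bbls: int
    mcf_prod: int
    mcf_sold: int
    vent_flare: int


def parse(raw: str) -> list[Month]:
    """Turn whitespace-separated rows into typed records."""
    months = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        pool, date, *numbers = line.split()
        months.append(Month(pool, date, *map(int, numbers)))
    return months


months = parse(RAW)
# Consistency rule: gas produced = gas sold + gas vented/flared.
inconsistent = [m.date for m in months if m.mcf_prod != m.mcf_sold + m.vent_flare]
total_oil = sum(m.oil_bbls for m in months)
total_flared = sum(m.vent_flare for m in months)
print(len(months), total_oil, total_flared, inconsistent)  # 14 100211 42111 []
```

The 100,211 barrels cover only the 14 months listed; the record's cumulative figure of 162,510 also includes production before February 2011.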
SCALE-OUT INDEXING OF INFORMATION
Index Metadata and Full Content in Complex Formats and Multiple Languages
Process Petabytes of Data
Security Protection!
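The deck does not say how HDDS partitions its index. As a sketch of one common scale-out technique, a term-partitioned inverted index, assuming a stable hash assigns each term to one of N nodes: adding nodes spreads the terms, and a multi-term query reads each term from its own node and intersects the results. Tokenizing by whitespace is a simplification; real complex formats and multiple languages need proper text extraction and analyzers.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


class PartitionedIndex:
    """Toy inverted index whose terms are hash-partitioned across N nodes."""

    def __init__(self, nodes: int):
        self.shards = [defaultdict(set) for _ in range(nodes)]

    def _shard(self, term: str) -> defaultdict:
        # Stable hash (unlike Python's salted hash()) so a term always maps to the same node.
        digest = hashlib.sha1(term.encode("utf-8")).digest()
        return self.shards[int.from_bytes(digest[:4], "big") % len(self.shards)]

    def add(self, object_id: str, text: str, metadata: dict[str, str]) -> None:
        """Index full content and metadata values in the same term space."""
        tokens = " ".join([text, *metadata.values()]).lower().split()
        for token in tokens:
            self._shard(token)[token].add(object_id)

    def search(self, *terms: str) -> set[str]:
        """Objects containing every term (AND); each term is read from its own shard."""
        postings = [self._shard(t.lower()).get(t.lower(), set()) for t in terms]
        return set.intersection(*postings) if postings else set()


index = PartitionedIndex(nodes=4)
index.add("obj-1", "well file for HEEN 26-35", {"pool": "BAKKEN"})
index.add("obj-2", "seismic survey", {"pool": "BAKKEN"})
print(sorted(index.search("bakken")))     # ['obj-1', 'obj-2']
print(index.search("bakken", "seismic"))  # {'obj-2'}
```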
DISCOVER, CONNECT, AND ASSESS INFORMATION
Hitachi Data Discovery Suite (HDDS)
‒ Scales using the latest open source technologies: Hadoop, HDFS and ZooKeeper
‒ 1,000 objects per second per server/node (NFS metadata indexing)
‒ Parallel processing
Structured queries against
unstructured information
Rich API
Results for further analysis
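The quoted rate of 1,000 objects per second per node (NFS metadata indexing) lets you estimate indexing time as T = N / (r × k) for N objects, rate r and k nodes. A worked sketch, assuming throughput scales linearly with node count and using an illustrative one billion objects (neither figure is a stated benchmark):

```python
def indexing_hours(objects: int, nodes: int, rate_per_node: float = 1_000) -> float:
    """Hours to index `objects`, assuming throughput scales linearly with nodes."""
    seconds = objects / (rate_per_node * nodes)
    return seconds / 3600


# One billion files on 10 vs. 20 indexing nodes.
print(round(indexing_hours(1_000_000_000, nodes=10), 1))  # 27.8
print(round(indexing_hours(1_000_000_000, nodes=20), 1))  # 13.9
```

Real clusters rarely scale perfectly, so treat the result as a lower bound on elapsed time.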
BREAK DOWN SILOS
SOPHISTICATED INSIGHT ACROSS DISPARATE INFORMATION TYPES
Identify Trends and Insights With a Single View Across Previously Siloed Data
Net-New Revenue Opportunity, Innovation or Competitive Differentiation
SINGLE VIRTUALIZATION PLATFORM
Block, Object, File
Structured/Unstructured
Healthcare, Insurance, Manufacturing
ANALYTICS
BRING STRUCTURE TO UNSTRUCTURED DATA
USE METADATA TO ORGANIZE AND QUERY
Block, File, Object
METADATA
QUERIES
BIG METADATA
PREPARE DATA FOR ANALYTICS
Block, File, Object
ANALYTICS
OBJECT STORAGE
FOR STORING, CONTROLLING, TAGGING, ANALYZING, ENRICHING, AND SHARING ENTERPRISE DARK DATA
STORE EDD − VOLUME AND VELOCITY
80 Nodes
40 Petabytes of Storage
64 Billion User Objects
Volume: Grow from 4 TB to 40 PB by adding storage
Velocity: Rapid read-write of data; increase bandwidth by adding nodes
Scale-Out Architecture
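Dividing the maximums above gives a feel for what each node carries. A worked sketch, assuming decimal units (1 PB = 10^15 bytes) and an even spread across nodes; the average object size is a derived figure that only applies if both maximums were reached at once, not a specification:

```python
PB = 10**15  # decimal petabyte (assumption: the slide's units are decimal)

nodes = 80
capacity_bytes = 40 * PB
user_objects = 64 * 10**9

per_node_pb = capacity_bytes / nodes / PB             # capacity behind each node
objects_per_node = user_objects // nodes               # objects tracked per node
avg_object_kb = capacity_bytes / user_objects / 1_000  # only if both maxima coincide

print(per_node_pb, objects_per_node, avg_object_kb)    # 0.5 800000000 625.0
```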
With compression and deduplication, store big data efficiently in Hitachi
Content Platform (HCP), inside the enterprise or in cloud-hosted HCP
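The slide does not describe how HCP implements compression and deduplication. As a generic sketch of the idea, a single-instance store keyed by a content hash keeps identical content once and compresses it with `zlib`; the `DedupStore` class and SHA-256 keying are assumptions for illustration, not HCP's design.

```python
from __future__ import annotations

import hashlib
import zlib


class DedupStore:
    """Single-instance store: identical content is kept once, compressed."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # SHA-256 of content -> compressed bytes
        self.names: dict[str, str] = {}   # object name -> SHA-256 of content

    def put(self, name: str, data: bytes) -> None:
        digest = hashlib.sha256(data).hexdigest()
        # Store the content only the first time this digest is seen. A production
        # system would also compare bytes before treating two objects as identical.
        if digest not in self.blobs:
            self.blobs[digest] = zlib.compress(data)
        self.names[name] = digest

    def get(self, name: str) -> bytes:
        return zlib.decompress(self.blobs[self.names[name]])

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self.blobs.values())


store = DedupStore()
report = b"quarterly report " * 1_000
store.put("finance/q1.txt", report)
store.put("backup/q1-copy.txt", report)  # same content under another name
print(len(store.blobs), store.get("backup/q1-copy.txt") == report)  # 1 True
```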
STORE EDD − VARIETY
10,000 namespaces (divisions within the reservoir), with different data management policies for each kind of
data: retention, compliance, etc.
HCP DESIGNED TO STORE A WIDE VARIETY OF UNSTRUCTURED DATA
Microsoft® SharePoint®
Microsoft Exchange
X-rays
Metadata Schema Adapted for Various Content Types
Legal contracts
Instant messages
Surveillance
Call recordings
CONTROL EDD – BACKUP-FREE
Use of proven RAID-6 protection
2 copies of all metadata
Customer configurable redundant local object copies (2, 3, or 4)
Content validation via hashes and automatic object repair
Replication – offsite copies with automated repair from replica
Object versioning – protection from accidental deletes and changes
Active data protection built into the object store
The result: unparalleled data protection and a reduced backup burden
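The "content validation via hashes and automatic object repair" step can be sketched as a scrub: re-hash each local copy against the hash recorded at ingest and rewrite any copy that no longer matches from an intact one. The `ProtectedObject` class and the choice of SHA-256 are hypothetical; only the 2, 3 or 4 copy counts come from the slide, and replica repair is represented here only by the error raised when no local copy survives.

```python
import hashlib


class ProtectedObject:
    """An object kept as several local copies plus the content hash taken at ingest."""

    def __init__(self, data: bytes, copies: int = 2):
        if copies not in (2, 3, 4):  # the slide's configurable copy counts
            raise ValueError("copies must be 2, 3 or 4")
        self.digest = hashlib.sha256(data).hexdigest()
        self.copies = [bytearray(data) for _ in range(copies)]

    def _valid(self, copy: bytearray) -> bool:
        return hashlib.sha256(copy).hexdigest() == self.digest

    def scrub(self) -> int:
        """Re-hash every copy and rewrite corrupt ones from an intact copy.

        Returns the number of copies repaired. If no local copy is intact, this
        sketch raises; the slide's answer is repair from a replica.
        """
        good = next((c for c in self.copies if self._valid(c)), None)
        if good is None:
            raise RuntimeError("no intact local copy; recover from replica")
        repaired = 0
        for i, copy in enumerate(self.copies):
            if not self._valid(copy):
                self.copies[i] = bytearray(good)
                repaired += 1
        return repaired

    def read(self) -> bytes:
        self.scrub()  # validate before serving, repairing if needed
        return bytes(self.copies[0])


obj = ProtectedObject(b"call recording", copies=3)
obj.copies[1][0] ^= 0xFF  # simulate silent corruption of one copy
print(obj.scrub(), obj.read())  # 1 b'call recording'
```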
Authentication
‒ Policy-based object management guarantees archived data is authentic, available and secure
‒ Guards against corruption or tampering
‒ Selectable hash algorithms include SHA-1, SHA-256, SHA-384, SHA-512, MD5 and RIPEMD-160
Retention
‒ Prevents deletion before the retention period expires
‒ Strict “compliance” mode or more liberal “enterprise” mode
‒ Retention classes, date in object, or deferred options; privileged delete; retention hold
Protection
‒ Self-configuring and self-healing, with automated policy enforcement, failover and ongoing integrity checks
‒ Ensures the specified number of replica copies is maintained to tolerate simultaneous failures, depending on the value of the data
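A minimal sketch of the retention and hold rules, assuming an object carries a retention expiry date and a hold flag, and that its fingerprint uses a selectable `hashlib` algorithm. The `RetainedObject` class is hypothetical, not the HCP API, and compliance versus enterprise mode, privileged delete and retention classes are not modeled.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import date


@dataclass
class RetainedObject:
    data: bytes
    retention_expires: date       # deletion allowed on or after this date
    hold: bool = False            # a hold blocks deletion regardless of date
    hash_algorithm: str = "sha256"

    @property
    def digest(self) -> str:
        # hashlib.new takes names such as "sha1", "sha256", "sha384", "sha512", "md5".
        return hashlib.new(self.hash_algorithm, self.data).hexdigest()

    def can_delete(self, today: date) -> bool:
        """Refuse deletion while on hold or before the retention period expires."""
        return not self.hold and today >= self.retention_expires


record = RetainedObject(b"trade confirmation", retention_expires=date(2036, 5, 21))
print(record.can_delete(date(2013, 4, 17)))  # False: still under retention
print(record.can_delete(date(2036, 5, 21)))  # True: retention has expired
record.hold = True
print(record.can_delete(date(2040, 1, 1)))   # False: the hold overrides expiry
```

RIPEMD-160 availability through `hashlib.new` depends on the OpenSSL build, which is why the sketch defaults to SHA-256.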
CONTROL EDD – PRESERVE AND SECURE
Encryption of data at rest
‒ Protects content if media is stolen, using patented Secret Sharing technology
‒ Transparently encrypts all content, metadata and search indexes
‒ Implements a distributed key management solution
Replication
‒ Bidirectional, inbound star and chain topologies
‒ Transparent object-level restore, repair and read recovery from replica
Shredding
‒ Ensures no trace of a file is recoverable from disk after deletion; U.S. DoD 5220.22-M specification
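The slide names "patented Secret Sharing technology" and distributed key management but not the scheme. To illustrate the concept only, here is textbook Shamir (k, n) secret sharing over a prime field: an encryption key is split into n shares (say, one per node) so that any k shares reconstruct it and fewer reveal nothing. This is a swapped-in, well-known technique, not HCP's implementation, and it is not hardened for production key handling.

```python
from __future__ import annotations

import secrets

PRIME = 2**521 - 1  # a Mersenne prime, larger than any 256-bit key


def split(secret: int, k: int, n: int) -> list[tuple[int, int]]:
    """Split `secret` into n shares so that any k of them recover it."""
    if not 0 <= secret < PRIME or not 1 <= k <= n:
        raise ValueError("invalid secret or threshold")
    # Random polynomial of degree k-1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(PRIME) for _ in range(k - 1)]
    shares = []
    for x in range(1, n + 1):
        y = 0
        for c in reversed(coeffs):  # Horner's rule, mod PRIME
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine(shares: list[tuple[int, int]]) -> int:
    """Recover the secret by Lagrange interpolation at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % PRIME
                den = den * (xi - xj) % PRIME
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret


key = secrets.randbits(256)            # e.g. a data-encryption key
shares = split(key, k=3, n=5)          # one share per node, say
assert combine(shares[:3]) == key      # any three shares suffice
assert combine([shares[0], shares[2], shares[4]]) == key
```

`pow(den, -1, PRIME)` (modular inverse) requires Python 3.8 or later.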
TAG EDD – CUSTOM METADATA
<claim id=1203 date=20110925> <policy id=101> <party id=1 type=car plate=509445>
<claim id=1203 date=20110925> <policy id=101> <estimate id=2344 estimator=124 date=20110930>
<policy id=101> <object type=car plate=454756> <customer id=2355> <tow plate=454756>
Object Consists of Files (JPG, PDF, etc.) Plus Appended Tags
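The tags on the slide are shorthand with unquoted attribute values. A minimal sketch of the "file plus appended tags" model, assuming each tag is serialized as a well-formed XML element with `xml.etree.ElementTree`; the `TaggedObject` class and file name are hypothetical, and how HCP itself stores custom metadata is not shown here.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class TaggedObject:
    """A stored file plus appended custom-metadata tags (hypothetical model)."""
    name: str
    content: bytes
    tags: list[str] = field(default_factory=list)

    def append_tag(self, tag: str, **attrs: str) -> None:
        # Serialize one tag as well-formed XML; ElementTree quotes the values.
        self.tags.append(ET.tostring(ET.Element(tag, attrs), encoding="unicode"))


photo = TaggedObject("tow-photo.jpg", b"...jpeg bytes...")
photo.append_tag("policy", id="101")
photo.append_tag("object", type="car", plate="454756")
photo.append_tag("customer", id="2355")
photo.append_tag("tow", plate="454756")
print(photo.tags[1])  # <object type="car" plate="454756" />
```

Because tags are appended rather than written into the file, later analysis can keep adding tags without touching the original content.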
ANALYZE EDD
Built-in metadata search index
Object query API enables web dashboards
Relational queries link together many kinds of unstructured objects and connect them to structured data
Metadata policy engine – automated management actions on search results
Put HOLD on all files related to a lawsuit
Retrieve all scanned-doctor-notes related to tibia-fracture-xray-images and related insurance-claim-records in SQL DBs
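The second example query can be sketched as a SQL join between a metadata index over unstructured objects and a structured claims table. This uses SQLite from the Python standard library as a stand-in; the schema, file names, and the use of `patient_id` as the linking key are all hypothetical assumptions, not the HCP object query API.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Metadata index over unstructured objects (hypothetical schema).
    CREATE TABLE objects (name TEXT, kind TEXT, patient_id INTEGER);
    -- Structured insurance-claim records, as they might sit in a SQL database.
    CREATE TABLE claims (claim_id INTEGER PRIMARY KEY, patient_id INTEGER, status TEXT);

    INSERT INTO objects VALUES
        ('xray-001.dcm',  'tibia-fracture-xray-image', 7),
        ('notes-001.pdf', 'scanned-doctor-notes',      7),
        ('notes-002.pdf', 'scanned-doctor-notes',      8);
    INSERT INTO claims VALUES (1203, 7, 'open'), (1204, 8, 'closed');
""")

# Doctor notes for patients who also have a tibia-fracture X-ray, joined to
# their insurance claims.
rows = con.execute("""
    SELECT notes.name, claims.claim_id, claims.status
    FROM objects AS notes
    JOIN objects AS xray
      ON xray.patient_id = notes.patient_id
     AND xray.kind = 'tibia-fracture-xray-image'
    JOIN claims ON claims.patient_id = notes.patient_id
    WHERE notes.kind = 'scanned-doctor-notes'
""").fetchall()
print(rows)  # [('notes-001.pdf', 1203, 'open')]
```

Patient 8's notes are excluded because no X-ray object shares that patient ID, which is the "related to" condition in the slide's query.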
EDD LIFECYCLE
ENRICH EDD
Capture, Store and Control, Analyze, Enrich
HCP makes existing data more useful. The outcome of analysis leads to more tags for the content.
Continuously append custom metadata
Over time, what you learn about EDD becomes more important than the data itself
SHARE EDD – MANY ACCESS METHODS
‒ Linux/Unix filers (NFS)
‒ Document management (WebDAV)
‒ Microsoft® Windows® (CIFS)
‒ Amazon S3-compatible RESTful HTTP(S)
‒ Journaling (SMTP)
https://marketing.xenos. /browser/contract.pdf
ADDITIONAL RESOURCES
For more information about the technologies behind
enterprise dark data, please refer to the following links:
Hitachi Data Discovery Suite
http://www.hds.com/products/file-and-content/data-discovery-suite.html?WT.ac=us_mg_pro_dds
Hitachi Content Platform
http://www.hds.com/products/file-and-content/content-platform/?WT.ac=us_mg_pro_hcp
General EDD questions − Laura Chu-Vial,
SUMMARY
Currently, dark data is a burden:
‒ It's created almost everywhere and stored anywhere
‒ Organizations hoard this data because its value is unknown and storage is “cheap”
‒ It’s all being treated the same despite widely varying value to the organization
‒ It provides little value outside of legal and compliance
Put your data to work for you:
‒ Identify dark data and assess its value with index and search
‒ Collect, store and organize data in an object store
‒ Analyze your dark data’s content and metadata
‒ Enrich and share insight to drive new innovation
QUESTIONS AND DISCUSSION
UPCOMING WEBTECHS
HDS Big Data Roadmap, May 1, 9 a.m. PT, noon ET
Hitachi’s Cloud Strategy, Enabling Technologies, and Solutions, May 21, 9 a.m. PT, noon ET
Environmental Pressures Driving an Evolution in File Storage, May 23, 9 a.m. PT, noon ET
HDS Hadoop Reference Architecture, June 5, 9 a.m. PT, noon ET
Check www.hds.com/webtech for:
Links to the recording, the presentation and Q&A (available next week)
Schedule and registration for upcoming WebTech sessions
THANK YOU