Shaun Connolly's presentation at SAS Global Conference
Page 1 Hortonworks © 2014
Distilling Hadoop Patterns of Use
Shaun Connolly, Hortonworks (@shaunconnolly)
March 25, 2014
Our Mission: Enable your Modern Data Architecture by Delivering Enterprise Apache Hadoop
Our Commitment
Open Leadership: Drive innovation in the open exclusively via the Apache community-driven open source process
Enterprise Rigor: Engineer, test and certify Apache Hadoop with the enterprise in mind
Ecosystem Endorsement: Focus on deep integration with existing data center technologies and skills
Headquarters: Palo Alto, CA
Employees: 300+ and growing
Reseller Partners
Data Continues to Grow Sharply
2012: Digital universe = 20 Zettabytes
2020: Digital universe = 40 Zettabytes
1 Zettabyte (ZB) = 1 billion Terabytes (TB)
2014: 31% of enterprises managing more than 1 Petabyte
Social Networks
Machine Generated
Documents, Emails
OLTP, ERP, CRM Systems
Geolocation Data
Sensor Data
Web Logs, Click Streams
85% of growth comes from new types of data, with machine-generated data increasing 15x
Sources: IDC and IDG Enterprise
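The conversion quoted on this slide follows from decimal (SI) prefixes. As a quick check, assuming 1 ZB = 10^21 bytes and 1 TB = 10^12 bytes:

```latex
1\,\text{ZB} = 10^{21}\,\text{B} = 10^{9} \times 10^{12}\,\text{B} = 10^{9}\,\text{TB} = 1\ \text{billion TB}
```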
Cameras and microphones widely deployed
New routes to market via intelligent objects
Content and services via connected products
Everything has a URL
Remote sensing of objects and environment
Augmented reality
Situational decision support
Building and infrastructure management
Over 50% of Internet connections are things:
2011: 15+ billion permanent, 50+ billion intermittent
2020: 30+ billion permanent, >200 billion intermittent
Source: Gartner Keynote at Hadoop Summit 2013
Harnessing Big Data is transformational to business models
It enables the move from post-transaction, reactive analysis of data subsets stored in silos to pre-transaction, interactive insights across all data, impacting both the top and bottom lines
Modern Data Architecture with Hadoop
SOURCES: OLTP, ERP, CRM Systems; Documents, Emails; Web Logs, Click Streams; Social Networks; Machine Generated; Sensor Data; Geolocation Data
DATA SYSTEMS: Repositories (RDBMS, EDW, MPP) and Enterprise Hadoop (Data Access, Data Management, Governance & Integration, Security, Operations)
APPLICATIONS: Statistical Analysis; BI / Reporting, Ad Hoc Analysis; Interactive Web & Mobile Applications; Enterprise Applications
OPERATIONS TOOLS: Provision, Manage & Monitor
DEV & DATA TOOLS: Build & Test
MDA Unlocks New Approach to Insight
Current Approach: SQL, single query engine, repeatable linear process
§ Apply schema on write
§ Dependent on IT
§ Determine list of questions → Design solutions → Collect structured data → Ask questions from list → Detect additional questions
Augment with Hadoop: Enterprise Hadoop, multiple query engines, iterative process (explore, transform, analyze)
§ Apply schema on read
§ Support a range of access patterns to data stored in HDFS: batch, interactive, real-time, streaming
Schema-on-Write vs. Schema-on-Read
Standard Digital Camera (analogous to schema-on-write)
§ Zoom & focus first
§ Capture limited set of pixels
§ Crop around the focused area
Lytro Lightfield Camera (analogous to schema-on-read)
§ Capture entire lightfield
§ Infinite zoom & focus
§ Crop any captured areas
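The schema-on-read idea in the camera analogy can be sketched in a few lines of Python. This is a toy, in-memory illustration rather than Hadoop code: the record layout, the field names and the `read_with_schema` helper are hypothetical. In practice the raw files would sit in HDFS and a query engine would apply the schema when it reads them.

```python
import csv

# Hypothetical raw log records, stored exactly as they arrived (no schema enforced on write).
RAW_LINES = [
    "2014-03-25T10:00:01,alice,/products/drill,200",
    "2014-03-25T10:00:05,bob,/cart,500",
    "2014-03-25T10:00:09,alice,/checkout,200",
]


def read_with_schema(lines, schema):
    """Apply a schema only at read time.

    `schema` maps an output field name to (column index, converter function).
    """
    for row in csv.reader(lines):
        yield {name: convert(row[idx]) for name, (idx, convert) in schema.items()}


# Two different questions impose two different schemas on the same raw data.
PAGE_VIEW_SCHEMA = {"user": (1, str), "page": (2, str)}
STATUS_SCHEMA = {"timestamp": (0, str), "status": (3, int)}

page_views = list(read_with_schema(RAW_LINES, PAGE_VIEW_SCHEMA))
server_errors = [r for r in read_with_schema(RAW_LINES, STATUS_SCHEMA) if r["status"] >= 500]

print(page_views[0])   # {'user': 'alice', 'page': '/products/drill'}
print(server_errors)   # [{'timestamp': '2014-03-25T10:00:05', 'status': 500}]
```

Because nothing was discarded at write time, a later, unanticipated question only needs a new schema mapping, not a reload of the data.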
MDA Uses Commodity Compute + Storage
Hadoop Enables Scalable Compute & Storage at a Compelling Cost Structure
[Chart: Fully loaded cost per raw TB of data (min to max cost), axis from $0 to $180,000, comparing Hadoop, Cloud Storage, NAS, SAN, EDW/MPP and Engineered System]
MDA Optimizes Data Warehouse
Current Reality (EDW workload: Operations 50%, ETL Process 30%, Analytics 20%)
§ EDW at capacity; some usage from low-value workloads
§ Older transformed data archived, unavailable for ongoing exploration
§ Source data often discarded
Augment with Hadoop (EDW workload: Operations 50%, Analytics 50%; Hadoop handles parse, cleanse, apply structure, transform)
§ Free up EDW resources from low-value tasks
§ Keep 100% of source data and historical data for ongoing exploration
§ Mine data for value after loading it, because of schema-on-read
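The "parse, cleanse, apply structure, transform" work that moves from the EDW into Hadoop can be sketched as a small Python pipeline. This is a hypothetical, single-machine illustration: the log format, the function names and the per-user aggregation are assumptions, not part of the original slide. On a real cluster the same stages would run as distributed jobs.

```python
from datetime import datetime

# Hypothetical raw web-log lines as they land in Hadoop, including bad records.
RAW = [
    "2014-03-25 10:00:01 | ALICE | /products/drill | 200",
    "garbled line without separators",
    "2014-03-25 10:00:05 |  bob  | /cart | 500",
    "",
]

# Keep 100% of the source data unchanged for later exploration.
raw_archive = list(RAW)


def parse(line):
    """Split a raw line into fields; return None for malformed lines."""
    fields = [f.strip() for f in line.split("|")]
    return fields if len(fields) == 4 else None


def cleanse(fields):
    """Normalize values, e.g. lower-case user names."""
    ts, user, page, status = fields
    return [ts, user.lower(), page, status]


def apply_structure(fields):
    """Turn cleansed fields into a typed record."""
    ts, user, page, status = fields
    return {
        "ts": datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
        "user": user,
        "page": page,
        "status": int(status),
    }


def transform(records):
    """Aggregate into a compact summary (requests per user) to load into the EDW."""
    summary = {}
    for rec in records:
        summary[rec["user"]] = summary.get(rec["user"], 0) + 1
    return summary


parsed = (parse(line) for line in RAW)
structured = [apply_structure(cleanse(f)) for f in parsed if f is not None]
edw_load = transform(structured)

print(edw_load)          # {'alice': 1, 'bob': 1}
print(len(raw_archive))  # 4
```

Only the compact `edw_load` summary would go to the warehouse; the malformed lines are dropped from the structured output but still survive in `raw_archive`.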
Integrating with Existing Investments
SOURCES: Existing Sources (CRM, ERP, Clickstream, Logs); Emerging Sources (Sensor, Sentiment, Geo, Unstructured)
DATA SYSTEMS: RDBMS, EDW, MPP, HANA
APPLICATIONS: BusinessObjects BI
OPERATIONAL TOOLS
DEV & DATA TOOLS
INFRASTRUCTURE
Powering the Modern Data Architecture
Enables deep insight across a large, broad, diverse set of data at efficient scale
Multi-Use Data Platform: store all data in one place, process in many ways
[Diagram: a cluster of nodes numbered 1 to n]
Batch, Interactive, Real-time, Streaming
Data Lake that contains ALL data: raw sources and any processed data over extended periods of time
YARN: Data Operating System
How Hadoop?
“Hadoop can be used to create a ‘data lake’ – an integrated repository of data from internal and external data sources... Data combined from multiple silos can help your organization find answers to complex questions that no one has previously dared ask or known how to ask.”
-- Forrester
The Common Journey with Hadoop
[Diagram axes: Scale and Scope]
New Analytic Apps: new types of data, LOB-driven
More data and analytic apps
A Modern Data Architecture: RDBMS, MPP and EDW alongside Hadoop (Governance & Integration, Security, Operations, Data Access, Data Management)
Unlock Value in New Types of Data
1. Social: Understand how people are feeling and interacting, right now
2. Clickstream: Capture and analyze website visitors’ data trails and optimize your website
3. Sensor/Machine: Discover patterns in data streaming from remote sensors and machines
4. Geographic: Analyze location-based data to manage operations where they occur
5. Server Logs: Diagnose process failures and prevent security breaches
6. Unstructured (txt, video, pictures, etc.): Understand patterns in files across millions of web pages, emails, and documents
+ Online archive: Data that was once purged or moved to tape can be stored in Hadoop to discover long-term trends and previously hidden value
20 Business Applications of Hadoop
Industry | Use Case | Type of Data
Financial Services | New Account Risk Screens | Text, Server Logs
Financial Services | Trading Risk | Server Logs
Financial Services | Insurance Underwriting | Geographic, Sensor, Text
Telecom | Call Detail Records (CDRs) | Machine, Geographic
Telecom | Infrastructure Investment | Machine, Server Logs
Telecom | Real-time Bandwidth Allocation | Server Logs, Text, Social
Retail | 360° View of the Customer | Clickstream, Text
Retail | Localized, Personalized Promotions | Geographic
Retail | Website Optimization | Clickstream
Manufacturing | Supply Chain and Logistics | Sensor
Manufacturing | Assembly Line Quality Assurance | Sensor
Manufacturing | Crowdsourced Quality Assurance | Social
Healthcare | Use Genomic Data in Medical Trials | Structured
Healthcare | Monitor Patient Vitals in Real-Time | Sensor
Pharmaceuticals | Recruit and Retain Patients for Drug Trials | Social, Clickstream
Pharmaceuticals | Improve Prescription Adherence | Social, Unstructured, Geographic
Oil & Gas | Unify Exploration & Production Data | Sensor, Geographic & Unstructured
Oil & Gas | Monitor Rig Safety in Real-Time | Sensor, Unstructured
Government | ETL Offload in Response to Federal Budgetary Pressures | Structured
Government | Sentiment Analysis for Government Programs | Social
360° Customer View for Home Supply Retailer
Problem: Disjoint customer engagement across all channels
§ Data repositories on website traffic, POS transactions and in-home services exist in separate silos
§ Unable to perform analytics on customer buying behavior across all channels
§ Limited ability for targeted marketing to specific segments
Solution: Unified system of engagement via “golden record”
§ Golden record enables targeted marketing capabilities: customized coupons, promotions and emails
§ Deep visibility into all customers and all market segments
§ Unlocks rich, informed cross-sell & up-sell opportunities
Creating Opportunity
Data: Clickstream, Unstructured, Structured
Retail
Major home improvement retailer
>$74B in revenue
>300K employees
>2,200 stores
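A "golden record" merges each customer's data from every channel into one record. The sketch below is a hypothetical illustration, not the retailer's actual method: the silo extracts, field names and the exact-match rule on a normalized e-mail address are all assumptions.

```python
from collections import defaultdict

# Hypothetical extracts from three silos; field names are illustrative only.
WEB = [{"email": "Pat@Example.com ", "last_page": "/paint"}]
POS = [
    {"email": "pat@example.com", "store": 1042, "spend": 310.50},
    {"email": "lee@example.com", "store": 77, "spend": 45.00},
]
SERVICES = [{"email": "PAT@example.com", "service": "kitchen install"}]


def match_key(record):
    """Hypothetical matching rule: exact match on a normalized e-mail address."""
    return record["email"].strip().lower()


def build_golden_records(**sources):
    """Merge records from every channel into one record per customer."""
    golden = defaultdict(lambda: {"channels": set()})
    for channel, records in sources.items():
        for rec in records:
            merged = golden[match_key(rec)]
            merged["channels"].add(channel)
            for field, value in rec.items():
                if field != "email":
                    merged.setdefault(field, value)  # first value seen wins
    return dict(golden)


golden = build_golden_records(web=WEB, pos=POS, services=SERVICES)

# Example targeting query: customers who engage through more than one channel.
cross_channel = sorted(k for k, v in golden.items() if len(v["channels"]) > 1)
print(cross_channel)  # ['pat@example.com']
```

Production entity resolution usually relies on fuzzier matching and explicit survivorship rules; "first value seen wins" is simply the most basic rule for deciding which source supplies each field.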
Monetize Anonymous & Aggregate Banking Data
Problem: Unable to unlock valuable cross-sell banking data
§ Bank possesses data that indicates larger macro-economic trends, which can be monetized in secondary markets
§ Data sets are isolated in legacy silos controlled by LOBs
§ Regulations and company policies protect customer privacy
§ IT challenged by joining data while guaranteeing anonymity
Solution: Create cross-LOB data lake of de-identified data
§ Mortgage bankers, consumer bankers, credit card group and treasury bankers have access to the same cross-sell data
§ Single point of security & privacy for de-identification, masking, encryption, authentication and access control
§ Interoperability with SAS, Red Hat & Splunk
Creating Opportunity
Data: Structured, Clickstream, Social & Unstructured
Banking
One of the largest US banks
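One common way to let several LOBs join data without seeing customer identities is keyed pseudonymization plus masking. The sketch below assumes that approach; it is not a description of the bank's actual system. The secret key, the record fields and the helper names are all hypothetical. Pseudonymization alone does not guarantee anonymity, because remaining attributes can still re-identify people, so it would normally be combined with the access controls listed above.

```python
import hashlib
import hmac

# Hypothetical secret, held only by the central security service (never by the LOBs).
PSEUDONYM_KEY = b"replace-with-a-managed-secret"


def pseudonymize(customer_id: str) -> str:
    """Keyed hash (HMAC-SHA256): the same ID always maps to the same token,
    so de-identified data sets from different LOBs can still be joined."""
    return hmac.new(PSEUDONYM_KEY, customer_id.encode("utf-8"), hashlib.sha256).hexdigest()


def mask_card(card_number: str) -> str:
    """Mask all but the last four digits of a card number."""
    digits = card_number.replace(" ", "")
    return "*" * (len(digits) - 4) + digits[-4:]


def deidentify(record: dict) -> dict:
    """Return a copy with direct identifiers removed, pseudonymized or masked."""
    out = dict(record)
    out["customer_id"] = pseudonymize(out["customer_id"])
    if "card_number" in out:
        out["card_number"] = mask_card(out["card_number"])
    out.pop("name", None)  # drop direct identifiers outright
    return out


mortgage = deidentify({"customer_id": "C-001", "name": "Pat Doe", "loan_balance": 250000})
cards = deidentify({"customer_id": "C-001", "card_number": "4111 1111 1111 1111", "spend": 1200})

# Both LOB records share one pseudonymous key, so they join without exposing "C-001".
print(mortgage["customer_id"] == cards["customer_id"])  # True
print(cards["card_number"])                              # ************1111
```

A keyed hash is used instead of a plain hash so that someone without the key cannot rebuild the token table by hashing every possible customer ID.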
Optimize High-Tech Manufacturing
Improving Efficiency
Data: Sensor
Problem: Ineffective root cause analysis on product defects
§ 200 million digital storage devices manufactured yearly
§ >10K faulty devices returned by customers every month
§ Limited data available for root cause analysis means that diagnosing problems is highly manual (physical inspections)
§ Subset of sensor data from QA testing retained 3-12 months
Solution: Created sensor data lake for 10x quality improvement
§ Repository holds 24 months of data for each device
§ Manufacturing dashboard allows >1,000 employees to search data, with results returned in less than 1 second
§ Quality improved 10x: rate down to ~1K faulty devices / month
Manufacturing
Digital Storage Devices
>$15B in revenue
>85K employees
Think Pigabyte, Not Petabyte
Enabling Hadoop for the Enterprise Journey
1. Capabilities: Ensure enterprise capabilities are delivered in 100% open source to benefit all
2. Integration: Interoperable with existing data center investments
3. Skills: Leverage your existing skills: development, analytics, operations
Try Hadoop Today… Get Involved
Download the Hortonworks Sandbox
Learn Hadoop
Build Your Analytic App
Try Hadoop 2
San Jose, CA June 3 - 5, 2014
REGISTER NOW
Amsterdam April 2 - 3, 2014
REGISTER NOW
Questions? @shaunconnolly