Create a Smarter Data Lake with HP Haven and Apache Hadoop

Page 1: Create a Smarter Data Lake with HP Haven and Apache Hadoop

© Hortonworks Inc. 2011 – 2014. All Rights Reserved

Q&A box is available for your questions

Webinar will be recorded for future viewing

Thank you for joining!

We’ll get started soon…

Page 2: Create a Smarter Data Lake with HP Haven and Apache Hadoop

Create a Smarter Data Lake with HP Haven and Apache Hadoop

We do Hadoop.

Page 3: Create a Smarter Data Lake with HP Haven and Apache Hadoop

Your speakers…

Ajay Singh, Director of Technical Channels, Hortonworks

Will Gardella, Director of Product Management, Big Data, HP

Page 4: Create a Smarter Data Lake with HP Haven and Apache Hadoop

Traditional systems under pressure

Challenges

• Constrains data to app

• Can’t manage new data

• Costly to Scale

[Chart: Business Value, comparing laggards with industry leaders as data sources move from traditional to new]

Traditional data: ERP, CRM, SCM

New data: Clickstream, Geolocation, Web Data, Internet of Things, Docs, emails, Server logs

Data volume: 2.8 Zettabytes (2012), 40 Zettabytes (2020)
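The two data-volume figures on this slide imply a growth rate that is easy to work out. The sketch below assumes steady year-over-year compounding between 2012 and 2020, which the slide does not state; under that assumption it works out to roughly 39% growth per year.

```python
# Implied compound annual growth between the two figures quoted on the slide.
# Assumption (not from the slide): growth compounds evenly every year.
start_zb = 2.8    # zettabytes in 2012
end_zb = 40.0     # zettabytes in 2020
years = 2020 - 2012

growth_factor = end_zb / start_zb                # overall multiple across the period
annual_rate = growth_factor ** (1 / years) - 1   # per-year rate under even compounding

print(f"Total growth: {growth_factor:.1f}x over {years} years")
print(f"Implied annual growth: {annual_rate:.1%}")
```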

Page 5: Create a Smarter Data Lake with HP Haven and Apache Hadoop

Hadoop emerged as foundation of new data architecture

Apache Hadoop is an open source data platform for managing large volumes of high-velocity, high-variety data

• Built by Yahoo! to be the heartbeat of its ad & search business

• Donated to Apache Software Foundation in 2005 with rapid adoption by large web properties & early adopter enterprises

Hadoop Advantages

• Manages new data paradigm

• Handles data at scale

• Cost effective

• Open source

[Diagram: Application; Batch Processing: MapReduce; Storage: HDFS]
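For readers new to the batch-processing layer named on this slide, here is a minimal single-process sketch of the MapReduce pattern (map, shuffle, reduce) applied to a word count. It only illustrates the programming model; it does not use Hadoop's own APIs, and the function names and sample input are made up for this example.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values emitted for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())

# Illustrative input; on a cluster each record would come from a file block on HDFS.
docs = ["server logs and web data", "web data and clickstream data"]
counts = word_count(docs)
print(counts)
```

On a real cluster the map calls run in parallel on the nodes that hold the data, and the shuffle moves intermediate pairs across the network; the sketch keeps everything in one process so the three phases are easy to see.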

Page 6: Create a Smarter Data Lake with HP Haven and Apache Hadoop

Hadoop for the Enterprise: Implement a Modern Data Architecture with HDP

Customer Momentum

• 230+ customers (as of Q3 2014)

Hortonworks Data Platform

• Completely open multi-tenant platform for any app & any data.

• A centralized architecture of consistent enterprise services for resource management, security, operations, and governance.

Partner for Customer Success

• Open source community leadership focus on enterprise needs

• Unrivaled world class support

• Founded in 2011

• Original 24 architects, developers, operators of Hadoop from Yahoo!

• 600+ Employees

• 800+ Ecosystem Partners

Page 7: Create a Smarter Data Lake with HP Haven and Apache Hadoop

HDP delivers a completely open data platform

Hortonworks Data Platform 2.2

Hortonworks Data Platform provides Hadoop for the Enterprise: a centralized architecture of core enterprise services, for any application and any data.

Completely Open

• HDP incorporates every element required of an enterprise data platform: data storage, data access, governance, security, operations

• All components are developed in open source and then rigorously tested, certified, and delivered as an integrated open source platform that’s easy to consume and use by the enterprise and ecosystem.

[Diagram: Hortonworks Data Platform 2.2 components]

YARN: Data Operating System (Cluster Resource Management)

HDFS (Hadoop Distributed File System)

GOVERNANCE: Apache Falcon, Apache Sqoop, Apache Flume, Apache Kafka

BATCH, INTERACTIVE & REAL-TIME DATA ACCESS: Apache Pig, Apache Hive, Cascading, Apache HBase, Apache Accumulo, Apache Solr, Apache Spark, Apache Storm

SECURITY: Apache Ranger, Apache Knox, Apache Falcon

OPERATIONS: Apache Ambari, Apache ZooKeeper, Apache Oozie

Page 8: Create a Smarter Data Lake with HP Haven and Apache Hadoop

HP & Hortonworks

Page 9: Create a Smarter Data Lake with HP Haven and Apache Hadoop

HP & Hortonworks: An Integrated Part of a Modern Data Architecture

Smart Content Hub

Solution Architecture

Page 10: Create a Smarter Data Lake with HP Haven and Apache Hadoop

© Copyright 2012 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

The Opportunity

Page 11: Create a Smarter Data Lake with HP Haven and Apache Hadoop

Accelerating business outcomes

Data lakes – the new enterprise data hub

Data sources: Social media, IT/OT, Images, Audio, Video, Transactional data, Mobile, Search engine, Email, Texts, Documents

Hadoop Distributed File System (HDFS): self-healing, high bandwidth, cheap clustered storage

Map/Reduce: distributed computing framework

Business outcomes

Page 12: Create a Smarter Data Lake with HP Haven and Apache Hadoop

Analyzing data is a multi-step process

• Data type: Structured tables, Semi-structured, Unstructured, Documents, Images, Audio, Video

• Speed: Batch, Interactive, Real-time

• Process: Acquisition, Preparation, Visualization, Analysis, Presentation, Collaboration

• Skill set: Business users, Programmer, Database expert, Statistician, Mathematician, Subject Matter Expert

• Types: Descriptive, Diagnostic, Predictive, Prescriptive

Requires easy-to-use tools to meet a wide range of skills

Page 13: Create a Smarter Data Lake with HP Haven and Apache Hadoop

Challenge: Barriers between business users and actionable information

[Diagram labels: Business users; Data Scientists; Programmers; Information Requests; Reports; Batch; Data Cleansing; Programming; Statistics; Hadoop]

Page 14: Create a Smarter Data Lake with HP Haven and Apache Hadoop

HP Haven Big Data platform

Core big data business capabilities: Access, Explore, Enrich, Analyze, Predict, Serve, Act, and more...

• Contextual search

• Data exploration

• Image/video analytics

• Geospatial analytics

• SQL on Hadoop

• Accelerated analytics

• Sentiment analysis

• Predictive analytics

On-premise, In the Cloud

Industry-leading breadth & depth of capabilities

Page 15: Create a Smarter Data Lake with HP Haven and Apache Hadoop

HP Haven for Hadoop

A Smarter Data Lake

• Any Source - Build, enrich, and clean up your data lake

• Data Clarity & Mapped Security - Data dictionary and information security within your data lake

• Advanced Analytics - Provide contextual search and text, image, video, speech machine learning

Fast Analytics with HP Vertica on Hadoop

• The fastest and most advanced SQL analytics on Hadoop

• Operationalize, democratize and monetize all your data

• Data tiering - pick the best location and format for your data

Page 16: Create a Smarter Data Lake with HP Haven and Apache Hadoop

HP IDOL: Act on 100% of your information

Analyze, Enrich, Find, Act

Data types (language independent): Transactional data, IOT, Search engine, Images, Social media, Video, Audio, Mobile, Documents, Texts, Email, Customer communications

Sources: Enterprise; External and Cloud

News, Forums, Blogs, …and more

HP Archiving, HP Data Protection, HP Marketing Optimization, …and more

500+ powerful HP IDOL functions:

2D/3D clustering, Acoustic signature, Active matching, Agents, Alerting, Auto language detection, Auto query guidance, Boolean & legacy, Operations, Breaking news clustering, Categorization, Collaboration, Community, Concept highlighting, Concept-query, Summarization, Conceptual retrieval, Context summarization, Cross-modal suggest, Dynamic n-dimensional, Taxonomy generation, Dynamic XML, Consumption, Exact phrase matching, Expertise location, Explicit profiling, Face recognition, Field modulation, Frame analysis, Fuzzy matching, Hot clustering, Hyperlinking, Image analysis, Image association, Implicit profiling, Keyword search, Mail object identification, Melody classification, Melody identification, Metadata recognition, Natural language retrieval, Object identification, Object recognition, Ontology generation, Parametric refinement, Phrase spotting, Proper name identification, Query by example, Real-time aggregation, Routing, Scene detection, Script alignment, Sentiment analysis, Sound matching, Speaker identification, Speaker recognition, Spectrographic analysis, Spell checking, Tag reconciliation, Transcription, Video analysis, Voice printing, Word spotting, Work groups, XML tagging…

Page 17: Create a Smarter Data Lake with HP Haven and Apache Hadoop

© Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

Using HP Haven with Hadoop

Page 18: Create a Smarter Data Lake with HP Haven and Apache Hadoop

A Smarter Data Lake Needs…

• Automatically analyse rich media

• Understand myriad file formats and types

• Break down information silos across enterprise

• Improved, intuitive visibility to contents

HP IDOL Features (integration points with Hadoop)

• Connectors & Policies

• KeyView

• IDOL Server (incl HDFS Sync)

• Image Server & Video Server

Page 19: Create a Smarter Data Lake with HP Haven and Apache Hadoop

Hadoop HDFS Synchronizer

Deep Hadoop integration with MPP M/R architecture and enterprise-class security

• Automate the complete picture

- Extracts the entire content of a given file residing on HDFS

- Processing on HDFS

• Configuration -> Map Reduce

- Synchronized crawlers that translate configurations into Map/Reduce processes

- No advanced programming necessary

• Leverage M/R

- Distributed MPP processing, data locality, minimized network traffic

• Advanced analytics built in

- OCR, entity extraction, logo detection,

IDOL HDFS Sync: prepare data for analysis
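The "Configuration -> Map Reduce" idea above can be pictured with a toy example: a declarative configuration is turned into a mapper and a reducer, so the user never writes the job by hand. This is only a sketch of the general pattern. The configuration keys, the extractor names, and the in-memory stand-in for HDFS files are all hypothetical and are not the HDFS Synchronizer's actual settings or API.

```python
import re
from collections import Counter

# Hypothetical declarative configuration (illustrative keys, not real product settings).
CONFIG = {"extract": "emails", "paths": ["/data/a.txt", "/data/b.txt"]}

# Hypothetical stand-in for file contents stored on HDFS.
FILES = {
    "/data/a.txt": "Contact alice@example.com or bob@example.com",
    "/data/b.txt": "Escalate to alice@example.com",
}

# Named extraction steps that a configuration can refer to.
EXTRACTORS = {
    "emails": lambda text: re.findall(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+", text),
    "words": lambda text: text.lower().split(),
}

def build_job(config):
    """Translate the configuration into a (mapper, reducer) pair."""
    extract = EXTRACTORS[config["extract"]]

    def mapper(path):
        # Runs once per file; on a cluster this would run on the node holding the data.
        return Counter(extract(FILES[path]))

    def reducer(partials):
        # Merge the per-file counts into one overall result.
        total = Counter()
        for partial in partials:
            total.update(partial)
        return total

    return mapper, reducer

mapper, reducer = build_job(CONFIG)
result = reducer(mapper(path) for path in CONFIG["paths"])
print(result.most_common())
```

Switching `"extract"` to `"words"` in the configuration changes what the job computes without touching the mapper or reducer code, which is the point of driving the job from configuration.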

Page 20: Create a Smarter Data Lake with HP Haven and Apache Hadoop

Demo: Smart Content Hub

[Diagram components: Hadoop Cluster; HDFS; HDFS Sync; Hadoop Services; Edge Node; Resource Slots; Compute Nodes; HDFS Connector; IDOL; Enterprise Connectors; IDOL Apps; Enterprise Repositories; Cloud & Web; Business Users]

Page 21: Create a Smarter Data Lake with HP Haven and Apache Hadoop

Hadoop and IDOL in practice

Page 22: Create a Smarter Data Lake with HP Haven and Apache Hadoop

Case study: Hadoop and Big Data in healthcare

New use-cases enabled

• Population and Community Health

- Prediction capabilities (symptoms, ailments, outbreaks, etc.)

- Clear picture of Community Health (attitudinal trends, demographics, geospatial)

- International impact

- Benefit/Reference-based plan design

• Care Management/Care Coordination

- Combine with Claims to fill in gaps (symptomatic, attitudes, education)

- Outcome Success

• Surveillance, Analysis, Product Development Innovation

- Competitive intelligence

- Trends (attitudinal/behavioral, caregiving, device usage, etc.)

- Monetized data insight opportunities

• Consumer Activation/Engagement/Education

- Consumer conversations, trends, blogs

- Interactive/participative approach

- Expand “Circles of Influence”

- Sets Quality Standards for Care/Providers

• Reputation Management/Outreach

- Sentiment management (competitor & brand)

- Outreach to support members, clients, providers

- Voice of the Customer

[Diagram labels]

Data: Claims Data, Treatment/Service Data, Call Center Data, FWA Recovery Data, Provider Information, Social Media

Lines of business: Innovation, Payment integrity, Product dev., Care delivery, Brand, Consumer activation, Providers

Challenges:

• Started with Payment Integrity use case

• Dealing with evolving patterns of FWA

• Multiple payment systems, no single view

• No self-service

• Long turnaround time for BI analysis reports

HP solution:

• IDOL-based solution

• Self-service analysis for business analysts

• Single point of access to multiple systems

• Dynamic rule engine tests against new and historical claims to identify potential recoveries

• Scale out on Hadoop architecture

• New data and use cases being added continually

ROI:

• 24x improvement in analysis turnaround

• Millions of dollars saved in first few weeks
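To make the rule-engine idea in this case study more concrete, here is a small sketch of rules evaluated against new and historical claims together. The claim fields, the two rules, and the threshold are invented for illustration; the slides do not describe the actual rules or data model used in the HP solution.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class Claim:
    # Hypothetical claim fields, chosen only for this example.
    claim_id: str
    provider: str
    patient: str
    service_date: str
    amount: float

@dataclass
class Rule:
    name: str
    check: Callable[[Claim, list], bool]  # (claim, all claims) -> flagged?

def is_duplicate(claim, claims):
    """Same provider, patient, date and amount billed under another claim id."""
    key = (claim.provider, claim.patient, claim.service_date, claim.amount)
    return any(
        other.claim_id != claim.claim_id
        and (other.provider, other.patient, other.service_date, other.amount) == key
        for other in claims
    )

# Rules live in a plain list, so new ones can be added without changing the engine.
RULES = [
    Rule("duplicate_billing", is_duplicate),
    Rule("unusually_high_amount", lambda claim, claims: claim.amount > 10_000),
]

def run_rules(claims, rules):
    """Return {claim_id: [names of rules that flagged it]} for flagged claims only."""
    flagged = {}
    for claim in claims:
        hits = [rule.name for rule in rules if rule.check(claim, claims)]
        if hits:
            flagged[claim.claim_id] = hits
    return flagged

historical = [Claim("C1", "P9", "M1", "2014-03-01", 250.0)]
new = [
    Claim("C2", "P9", "M1", "2014-03-01", 250.0),     # repeats historical claim C1
    Claim("C3", "P4", "M2", "2014-03-02", 18_500.0),  # above the example threshold
    Claim("C4", "P4", "M3", "2014-03-05", 90.0),
]
flagged = run_rules(historical + new, RULES)
print(flagged)
```

Running the rules over historical and new claims together is what lets the duplicate rule catch a new claim that repeats an old one, which a check limited to the incoming batch would miss.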

Page 23: Create a Smarter Data Lake with HP Haven and Apache Hadoop

Using HP IDOL with Hadoop

• Reduce cost, time, and expertise required to gain actionable insight

• Empower business users to interact with Hadoop data

• Real-time and interactive

• IDOL’s advanced analysis of data

• Connects all data-types

• Standardized data model

RETURN ON INFORMATION

Securely perform enterprise-class analysis of Hadoop data

Page 24: Create a Smarter Data Lake with HP Haven and Apache Hadoop

www.hp.com/go/haven

hortonworks.com/partner/hp/

Solution brochure

Technical white paper

HP Vertica SQL on Hadoop

FAQ

Customer analytics use case

Learn more about HP Haven

Page 25: Create a Smarter Data Lake with HP Haven and Apache Hadoop

Next Steps…

Download the Hortonworks Sandbox

Learn Hadoop

Build Your Analytic App

Try Hadoop 2

More about HP & Hortonworks: http://hortonworks.com/partner/HP/

Contact us: [email protected]