Pivotal HAWQ and Hortonworks Data Platform: Modern Data Architecture for IT Transformation


© 2015 Pivotal Software, Inc. All rights reserved.


Agenda

• Hortonworks Data Platform Overview

• Pivotal Big Data Suite Overview

• Pivotal HAWQ

• Demo

• Pivotal HAWQ & HDP Business Value and Use Cases

• Q&A


Your Hosts

Parham Parvizi, Product Manager, Pivotal HAWQ, Pivotal

Parham Parvizi is a Product Manager at Pivotal, where he is responsible for driving the technical product roadmap of the company's flagship SQL on Hadoop product, Pivotal HAWQ.

Shivaji Dutta, Developer Evangelist / Sr. Partner Solutions Engineering, Hortonworks

Shivaji is a Sr. Partner Engineer at Hortonworks. He has over 18 years of software development and consulting experience.

Page 4 © Hortonworks Inc. 2011 – 2015. All Rights Reserved

Hadoop for the Enterprise: Implement a Modern Data Architecture with HDP

Customer Momentum

• 437 customers (as of March 31, 2015)

• 105 customers added in Q1 2015

Hortonworks Data Platform

• Completely open multi-tenant platform for any app and any data

• A centralized architecture of consistent enterprise services for resource management, security, operations, and governance

Partner for Customer Success

• Open source community leadership focused on enterprise needs

• Unrivaled world-class support

• Founded in 2011

• Original 24 architects, developers, and operators of Hadoop from Yahoo!

• 600+ employees

• 1100+ ecosystem partners


HDP Makes Hadoop Enterprise-Ready

Hortonworks Data Platform

Multi-tenant data platform built on a centralized architecture of shared enterprise services

[Diagram: YARN as the data operating system, with shared enterprise services (governance, security, operations, resource management); applications (existing applications, new analytics, partner applications); data access (batch, interactive, real-time); and storage]

Key Benefits

• Consolidates all data sets

• Delivers real-time insights

• Integrates with data center

• Scalable and affordable


SQL Engines on HDP

• Apache Hive + Tez + ORC

• Apache Phoenix

• Spark SQL (Tech Preview)

• HAWQ


Pivotal and Hortonworks

• Joint engineering

– Pivotal HD and HDP based on a common core

– Pivotal HAWQ certified on HDP

• Co-founders of Open Data Platform

PIVOTAL AND HORTONWORKS ARE STRONG DRIVERS OF OPEN SOURCE SOFTWARE

[Diagram labels: ODP core (OpenDataPlatform.org); enterprise hardening; Hortonworks Data Platform (HDP); Pivotal Hadoop Distribution; other apps/tools, e.g. analytics apps and visualization tools]


Pivotal HAWQ on HDP Certified


Pivotal – Business Groups

Big Data Suite: BUSINESS VALUE FROM DATA

Transforming companies into data-driven enterprises with open, agile, cloud-ready end-to-end solutions

Cloud Foundry: PLATFORM AT YOUR SERVICE

Pioneering an open vision for cloud-based, agile application development

Pivotal Labs: A BETTER WAY TO BUILD PRODUCTS

World-class application development services, 'Pivots', & transformative methodologies


HAWQ + HDP

HDP: Open Enterprise Hadoop

Pivotal HAWQ:

• 100% ANSI SQL

• Performance

• Complex Query

• Accessible


HAWQ + HDP

• Discover New Relationships

• Enable Data Science

• Analyze External Sources

• Query All Data Types!

Multi-level Fault Tolerance

Granular Authorization

Resource Pools

High multi-tenancy

100% ANSI SQL Standard

OLAP Extensions

JDBC/ODBC Connectivity

MPP Architecture

Online Expansion

HDFS

Petabyte Scale

Cost Based Optimizer

Dynamic Pipelining

ACID + Transactional

Multi-Language UDF Support

Built-in Data Science Library

Extensible (PXF)

Query External Sources

Hardened, 10+ Years Tested, Production Proven

Accessibility + Usability

HDFS Native File Formats

• Manage Multiple Workloads

• Petabyte Scale Analytics

• Sub-second Performance

• Leverage Existing Skills & Tools

• Easily Integrate with Other Tools

Compression + Partitioning

Core compliance

• Well Integrated with Hortonworks Data Platform


5 Reasons Why Customers Will Prefer HDP + HAWQ

1. Advanced Analytics Performance

• Up to 30x SQL on Hadoop performance advantage

• Faster time to insight

• Massive MPP scalability to petabytes

Benefits: Near real-time latency, complex queries, and advanced analytics at scale

2. 100% ANSI SQL Compliant

• ANSI SQL-92, -99, -2003

• All 99 TPC-DS queries tested, no modifications

• Plus, OLAP extensions

• Complete ACID integrity and reliability

Benefits: 100% SQL compliant; no risk to SQL applications; all native on HDP via HAWQ
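The slides do not list which OLAP extensions are meant, but later slides mention roll-ups, so a ROLLUP-style grouping is a reasonable illustration. The sketch below is plain Python, not HAWQ code: it mimics what `GROUP BY ROLLUP(region, product)` returns, namely one subtotal per prefix of the grouping keys plus a grand total. The function name `rollup` and the sample rows are hypothetical.

```python
from collections import defaultdict


def rollup(rows, keys, measure):
    """Sum `measure` for every prefix of `keys`, like GROUP BY ROLLUP(keys).

    Keys that are rolled up at a given level are reported as None,
    mirroring the NULLs a SQL ROLLUP produces.
    """
    totals = defaultdict(int)
    for row in rows:
        # One grouping level per prefix length: all keys, ..., no keys (grand total).
        for depth in range(len(keys), -1, -1):
            prefix = tuple(row[k] for k in keys[:depth])
            group = prefix + (None,) * (len(keys) - depth)
            totals[group] += row[measure]
    return dict(totals)


# Hypothetical sample data, for illustration only.
sales = [
    {"region": "east", "product": "a", "amount": 10},
    {"region": "east", "product": "b", "amount": 5},
    {"region": "west", "product": "a", "amount": 7},
]

result = rollup(sales, ["region", "product"], "amount")
for group, total in sorted(result.items(), key=lambda item: str(item[0])):
    print(group, total)
```

In an engine that supports the SQL:1999 grouping extensions, the same result would come from a single query rather than application code, which is the practical point of the compliance claim on this slide.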

3. Integrated Machine Learning

• Advanced machine learning for big data

• Local, in-database operation

• Exceptional MPP/parallel performance

• Open source, Postgres-based

Benefits: Advanced, highly scalable machine learning, directly on HDP data

4. Flexible Deployment

• HDP and Pivotal HD, easily managed via Ambari

• On premises, in cloud, or PaaS

• HBase, Avro, Parquet, ORC and more

• Plus, connectors to make HAWQ data available to other SQL query tools

Benefits: Flexibility, Accessibility, Portability

5. Query Optimization Options

• Cost-based query optimization

• Robust query plan optimization

• Complex big data management

Benefits: Optimize performance and costs; maximize HDP cluster resources; offload EDW without compromise

HAWQ over Competition - Impala

HAWQ:

• 100% TPC-DS compatible

• HAWQ completed 58/99 TPC-DS queries 12 hours faster!

• Multi-dimensional queries with subqueries, dynamic partition elimination, large table joins, and roll-ups

• Higher concurrency

Impala:

• Only partial ANSI SQL compatibility: 58/99 TPC-DS queries

• Exposure to application errors due to SQL incompatibilities

• Single-dimension queries

• No nesting, small table joins, no roll-ups

• Limited performance range

• No machine learning!

[Chart: ANSI SQL compliance (0% to 100%) versus query complexity & speed requirements (- to +)]

TPC-DS Results vs. Impala

Subset of TPC-DS Queries: Comparison of HAWQ vs. Impala

• HAWQ faster: 88% of queries (12 hours)

• Impala faster: 12% of queries

[Chart: x-axis TPC-DS queries; y-axis query runtime difference in seconds (+ HAWQ faster / - Impala faster)]
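The percentages on these two slides can be tied together with some quick arithmetic. The sketch below assumes that the "subset of TPC-DS queries" in the chart is the same 58 of 99 queries quoted on the previous slide; the deck does not state this explicitly, so the derived counts are estimates rather than reported results.

```python
# Figures quoted in the slides.
TOTAL_TPCDS_QUERIES = 99
COMPARED_QUERIES = 58       # ASSUMPTION: the chart's subset is these 58 queries
HAWQ_FASTER_SHARE = 0.88    # "HAWQ faster: 88% of queries"

# Share of the full TPC-DS suite covered by the comparison.
coverage = COMPARED_QUERIES / TOTAL_TPCDS_QUERIES

# Estimated query counts behind the 88% / 12% split.
hawq_faster = round(COMPARED_QUERIES * HAWQ_FASTER_SHARE)
impala_faster = COMPARED_QUERIES - hawq_faster

print(f"coverage of TPC-DS: {coverage:.1%}")
print(f"HAWQ faster on ~{hawq_faster} queries, Impala on ~{impala_faster}")
print(f"check: {hawq_faster / COMPARED_QUERIES:.0%} vs {impala_faster / COMPARED_QUERIES:.0%}")
```

Under that assumption the rounded counts reproduce the 88% and 12% shares shown on the slide, which suggests the two slides describe the same comparison.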

HAWQ Integration with Hive & Value-Add

Query all Hive tables via PXF

Easily move between HAWQ and Hive

Value-Add:

– Application sub-second performance and faster time to insight

– Integration with traditional BI reporting tools and complex machine-generated SQL

– Data science driven applications requiring built-in machine learning

– Large queries across multiple datasets to find new relationships and patterns

– Siloed analytical applications with many ad-hoc users and high multi-tenancy

– Complex SQL statements with multi-level selects, partitions and rollups
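To make "query all Hive tables via PXF" more concrete, the sketch below builds the kind of external table definition that exposes a Hive table to HAWQ. The DDL shape (a `pxf://` location with `PROFILE=Hive` and the `pxfwritable_import` formatter) is written from memory of the HAWQ PXF documentation and should be checked against the installed version. The helper name, host, port, and table names are all hypothetical, and the function only formats text; it does not connect to a cluster or escape identifiers.

```python
def hive_external_table_ddl(name, columns, pxf_host, pxf_port, hive_db, hive_table):
    """Return DDL text for a HAWQ external table backed by a Hive table via PXF.

    ASSUMPTION: the LOCATION and FORMAT clauses follow the PXF Hive profile
    syntax as recalled from the HAWQ docs; verify before use.
    """
    column_list = ", ".join(f"{col} {sql_type}" for col, sql_type in columns)
    location = f"pxf://{pxf_host}:{pxf_port}/{hive_db}.{hive_table}?PROFILE=Hive"
    return (
        f"CREATE EXTERNAL TABLE {name} ({column_list})\n"
        f"LOCATION ('{location}')\n"
        "FORMAT 'custom' (formatter='pxfwritable_import');"
    )


# Hypothetical names, for illustration only.
ddl = hive_external_table_ddl(
    name="sales_from_hive",
    columns=[("id", "int"), ("amount", "float8")],
    pxf_host="namenode.example.com",
    pxf_port=51200,
    hive_db="default",
    hive_table="sales",
)
print(ddl)
```

Once such a table exists, ordinary SQL against `sales_from_hive` would read the Hive data in place, which is what allows the same queries and BI tools to move between HAWQ and Hive.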


Benefits Summary

• Collaboration on support: Focus on solution life-cycle

• Full SQL on Hadoop performance leadership: Exceptional performance, applications run without SQL errors, leverage existing SQL skills

• Common ODP core: No vendor lock-in protects investment and grows ecosystem

• BDS + HDP: Apache open source availability of software

Open Enterprise Hadoop Powering Digital Transformation

Working together to digitally transform companies into innovative enterprises
