© Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.

HP Vertica: Solving Facebook's Big Data Challenges

Moustafa Soliman / April 1, 2015

Hello!

Infrastructure and Outsourcing

Applications and Business Services

Analytics & Data Management

HP Financial Services

HP Enterprise Group

HP Enterprise Services

HP Software

HP Printing & Personal Systems

Hewlett-Packard Enterprise

HP Inc.

HP Big Data Platform: HAVEn

HP Big Data platform

Transforming Big Data into business solutions

Data sources: Audio, Social Media, Images, Email, Video, Search engine, Documents, Texts, Transactional Data, Mobile, IT/OT

• Hadoop/HDFS: Catalog massive volumes of distributed data

• Autonomy IDOL: Process and index all information

• Vertica: Analyze at extreme scale in real time

• Enterprise Security: Collect & unify machine data

• nApps: Powering HP Software + your apps

HAVEn

Standard platform with connectors, applications, and engines

The Vertica Technology

HP Vertica: An integrated Big Data ecosystem

HP Vertica – Core Features

• Columnar Storage: Speeds query time by reading only necessary data

• Compression: Lowers costly I/O to boost overall performance

• MPP Scale-Out: Provides high scalability on clusters with no name node or other single point of failure

• Distributed Query: Any node can initiate the queries and use other nodes for work. No single point of failure

• Projections: Combine high availability with special optimizations for query performance
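
To make the columnar storage and compression points concrete, here is a minimal Python sketch. It is not Vertica code: the sample table, the column names, and the helper functions are all hypothetical, and run-length encoding stands in for whichever encodings Vertica actually applies.

```python
from itertools import groupby

# Hypothetical sample table; the column names are illustrative only.
rows = [
    {"user_id": 1, "country": "DE", "clicks": 3},
    {"user_id": 2, "country": "DE", "clicks": 5},
    {"user_id": 3, "country": "US", "clicks": 2},
    {"user_id": 4, "country": "US", "clicks": 7},
    {"user_id": 5, "country": "US", "clicks": 1},
]

def to_columns(rows):
    """Pivot row-oriented records into one list per column."""
    return {name: [row[name] for row in rows] for name in rows[0]}

def rle_encode(values):
    """Run-length encode a sequence as (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(values)]

def rle_decode(pairs):
    """Expand (value, run_length) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]

columns = to_columns(rows)

# A query such as SUM(clicks) touches one column, not every field of every row.
total_clicks = sum(columns["clicks"])
values_read_columnar = len(columns["clicks"])
values_read_row_store = sum(len(row) for row in rows)

# A sorted, low-cardinality column collapses into a few runs.
encoded_country = rle_encode(columns["country"])
```

In this toy table the aggregate reads 5 values from the column layout versus 15 from the row layout, and the sorted country column shrinks from five values to two runs. Both effects reduce I/O, which is the benefit the slide claims.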

[Diagram: three cluster nodes, each with its own CPU, memory, and disk; columns labeled A, B, D, C, E, A]
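
The "any node can initiate the queries" idea can be sketched as a scatter-gather over shared-nothing nodes. This is a toy model under stated assumptions, not Vertica's implementation: the `Node` and `Cluster` classes, the crc32-based segmentation, and the sample rows are all hypothetical.

```python
import zlib
from collections import Counter

class Node:
    """One shared-nothing node holding its own segment of the rows."""

    def __init__(self, name):
        self.name = name
        self.segment = []  # rows stored locally on this node

    def local_count_by(self, column):
        """Partial aggregate computed only over this node's segment."""
        return Counter(row[column] for row in self.segment)

class Cluster:
    def __init__(self, node_names):
        self.nodes = [Node(name) for name in node_names]

    def load(self, rows, key):
        """Hash-segment rows across nodes; crc32 keeps placement deterministic."""
        for row in rows:
            index = zlib.crc32(str(row[key]).encode()) % len(self.nodes)
            self.nodes[index].segment.append(row)

    def count_by(self, column, initiator):
        """Any node may act as initiator: gather partial results and merge.

        In this sketch the initiator only merges; a real system would also
        plan the query and dispatch work to the other nodes.
        """
        assert initiator in self.nodes
        total = Counter()
        for node in self.nodes:
            total += node.local_count_by(column)
        return dict(total)

rows = [
    {"user_id": 1, "country": "DE"},
    {"user_id": 2, "country": "DE"},
    {"user_id": 3, "country": "US"},
    {"user_id": 4, "country": "US"},
    {"user_id": 5, "country": "US"},
]

cluster = Cluster(["node1", "node2", "node3"])
cluster.load(rows, key="user_id")

# The answer does not depend on which node initiates the query.
results = [cluster.count_by("country", initiator=node) for node in cluster.nodes]
```

Because no node plays a special coordinating role, there is no name node to lose. Every entry in `results` is the same regardless of which node initiated the query.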

Vertica and Hadoop: From Co-existence to Convergence

1. Vertica and Hadoop Integration

1. MapReduce & Pig Connector

2. HDFS Flat files

3. HCatalog

[Diagram: Hadoop (HDFS, HCatalog, Pig, MapReduce, HBase) and ANSI SQL; data sources: Billing, Clickstream, Telemetry]

Vertica Map Reduce Connector

Allows flexibility & interoperability

Integrate with Hadoop/MapReduce and Pig

• Vertica-aware extension to Hadoop

• Specialized adapter for distributed streaming between Hadoop and Vertica

Developers need access to a fast DBMS that co-exists with Hadoop rather than being embedded

• Operate on different clusters, generally by different groups of people

• Allows customers to scale computation independent of DBMS

Vertica Connector for HDFS

• External Tables: Query data directly in HDFS

• Connector for HDFS: Load files from HDFS into Vertica via the COPY command

• HDFS Storage Location: Store ROS containers on HDFS

Vertica HCatalog Connector

The Vertica HCatalog Connector lets you access data stored in Apache's Hive data warehouse software the same way you access it within a native Vertica table.

• Always reflects the current state of data stored in Hive.

• The HCatalog Connector uses the parallel nature of both HP Vertica and Hadoop to process Hive data.

• Since Vertica performs the extraction and parsing of data, the HCatalog Connector does not significantly increase the load on your Hadoop cluster.

2. Vertica SQL-on-Hadoop

• Vertica and Hadoop co-exist on shared hardware

• Vertica uses HDFS for its data storage

[Diagram: Hadoop (HDFS, HCatalog, Pig, MapReduce, HBase); data sources: Billing, Clickstream, Telemetry]

HP Vertica: Solving Facebook's Big Data Challenges

Use-case: Facebook

What do they do

Leading social media website focused on connecting the world. Largest database in the world (> 400 PB), driving revenue from data through targeted online marketing via web and mobile.

Use-Case

• Targeted marketing

• Generating new revenue streams from mobile

Business Challenge

To increase revenue from information through a massive volume and variety of queries, and by profiling people to match them with the right advertising campaigns.

Technical Pain

• Lack of query performance

• Restricted data set due to lack of scalability

• Jobs take 1 day in Hadoop

• X could not scale up

• Y too expensive

Business Benefit / Outcomes

• The queries take < 1 minute using HP Vertica.

• The company said growth had been fuelled by advertising income, which leapt 66 per cent year-on-year.

• Facebook was struggling to generate mobile advertising revenue prior to implementing HP Vertica.

Facebook Production Data Flow

Mobile

PC/Laptop

Web Servers

Logs

Hadoop/HDFS

2 huge Hadoop Clusters

• 1.7 ExaBytes

• 15000 nodes

• 40000 nodes

Job Scheduler

Vertica

Logs

15 mins

Hourly

Daily

Legacy

• Schedulers: Data Wormhole and Data Bee (developed at FB)

• 1500B rows/day, 20TB/hour … 500TB/day

• Keep 30-90 days. Currently 30 days

• 2 x 270-node Vertica cluster

Clusters 60 km apart

ETL Dual Load

One (Primary), One (Secondary)

Grow to 500-1000 nodes per cluster in one year

• 600K MR Jobs/day

• 50K Informatica Jobs/day
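
The load figures on this slide can be cross-checked with some quick arithmetic. The sketch below rests on assumptions the slide does not state: decimal units (1 TB = 10^12 bytes), that 500 TB/day is the raw volume loaded into each Vertica cluster, and that compression and replication are ignored. The variable names are illustrative.

```python
# Figures quoted on the slide; decimal units are an assumption.
ROWS_PER_DAY = 1500e9        # "1500B rows/day"
TB_PER_HOUR = 20             # "20TB/hour"
TB_PER_DAY_QUOTED = 500      # "500TB/Day"
NODES_PER_CLUSTER = 270      # "2 x 270-node Vertica cluster"
RETENTION_DAYS = (30, 90)    # "Keep 30-90 days"

# 20 TB/hour sustained for 24 hours, to compare with the quoted daily figure.
tb_per_day_from_hourly = TB_PER_HOUR * 24

# Average raw row size implied by the daily volume and row count.
avg_bytes_per_row = TB_PER_DAY_QUOTED * 1e12 / ROWS_PER_DAY

# Raw data held at each retention setting, in petabytes.
retained_pb = {days: TB_PER_DAY_QUOTED * days / 1000 for days in RETENTION_DAYS}

# Raw daily ingest per node if load is spread evenly across one cluster.
tb_per_node_per_day = TB_PER_DAY_QUOTED / NODES_PER_CLUSTER
```

Under these assumptions, 20 TB/hour gives 480 TB/day, which is consistent with the rounded 500 TB/day. The average row is roughly 333 bytes. A 30-day window holds about 15 PB of raw data, a 90-day window about 45 PB, and each node ingests about 1.85 TB/day before compression.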

Thank you!

Backup

“…for Facebook’s CIO, Tim Campos, to get on stage in Europe and declare that, “A partner like HP Vertica thinks like we do” and is a “key part” of Facebook’s big data capabilities, is one of the best endorsements, err … “likes,” that any modern IT infrastructure vendor could hope for.”

- Dana Gardner, Briefings Direct

Watch the video to see how Vertica empowers Facebook

The Vertica Technology

The 4 Cs

HP Vertica Flex Zone

Challenge: Analyzing semi-structured data is difficult and time consuming

Solution: HP Vertica Flex Zone

Benefits:

• Store and explore semi-structured data cost effectively

• Avoid creating and maintaining time-consuming schemas

• Gain 10X+ performance with one simple step

• http://www.vertica.com/hp-vertica-products/flexzone/
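
The explore-then-optimize idea behind these benefits can be illustrated with a small Python sketch. It does not use or reproduce Flex Zone's actual interface: the sample events and function names are hypothetical. It also assumes that the "one simple step" means promoting a frequently queried key into a regular column, which the slide does not spell out.

```python
import json

# Hypothetical semi-structured events; keys vary from record to record.
raw_lines = [
    '{"user": "a", "event": "click", "device": "mobile"}',
    '{"user": "b", "event": "view"}',
    '{"user": "a", "event": "click", "campaign": 42}',
]

# Step 1 (explore): load without declaring a schema; each row is a key/value map.
flex_rows = [json.loads(line) for line in raw_lines]

def virtual_column(rows, key):
    """Read a key as if it were a column; absent keys yield None."""
    return [row.get(key) for row in rows]

def discovered_keys(rows):
    """List every key seen so far, i.e. the schema inferred from the data."""
    return sorted({key for row in rows for key in row})

# Step 2 (optimize): copy a frequently queried key into a real column so
# later queries skip the per-row map lookup.
materialized = {"event": virtual_column(flex_rows, "event")}

clicks = sum(1 for event in materialized["event"] if event == "click")
```

No schema is declared before loading. The keys are discovered from the data, and only the keys that turn out to matter are given a fixed column.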

[Diagram: HP Vertica Analytics Platform: Flex Zone (Explore) and Enterprise Edition (Optimize), with SQL and Extensible Analytics]