© Copyright 2014 Hewlett-Packard Development Company, L.P. The information contained herein is subject to change without notice.
HP Vertica: Solving Facebook's Big Data Challenges
Moustafa Soliman / April 1, 2015
Hello!
Infrastructure and Outsourcing
Applications and Business Services
Analytics & Data Management
HP Financial Services
HP Enterprise Group
HP Enterprise Services
HP Software
HP Printing & Personal Systems
Hewlett-Packard Enterprise
HP Inc.
HP Big Data platform
Transforming Big Data into business solutions
Data sources: Audio, Social Media, Images, Email, Video, Search engine, Documents, Texts, Transactional Data, Mobile, IT/OT
HAVEn: Standard platform with connectors, applications, and engines
• Hadoop/HDFS: Catalog massive volumes of distributed data
• Autonomy IDOL: Process and index all information
• Vertica: Analyze at extreme scale in real-time
• Enterprise Security: Collect & unify machine data
• nApps: Powering HP Software + your apps
HP Vertica: An integrated Big Data ecosystem
HP Vertica – Core Features
• Columnar Storage: Speeds query time by reading only necessary data
• Compression: Lowers costly I/O to boost overall performance
• MPP Scale-Out: Provides high scalability on clusters with no name node or other single point of failure
• Distributed Query: Any node can initiate the queries and use other nodes for work. No single point of failure
• Projections: Combine high availability with special optimizations for query performance
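The first two features above (columnar storage and compression) can be illustrated with a small sketch. This is a toy model in Python, not Vertica's actual storage format: the sample rows, the function names `to_columns`, `rle_encode`, and `rle_decode`, and the choice of run-length encoding are all assumptions made for illustration.

```python
from itertools import groupby

# Hypothetical sample data, already sorted by country so that equal values are adjacent.
rows = [
    {"user_id": 1, "country": "DE", "clicks": 3},
    {"user_id": 2, "country": "DE", "clicks": 5},
    {"user_id": 3, "country": "US", "clicks": 2},
    {"user_id": 4, "country": "US", "clicks": 7},
    {"user_id": 5, "country": "US", "clicks": 1},
]


def to_columns(records):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def rle_encode(values):
    """Run-length encode a list: consecutive equal values become (value, count)."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def rle_decode(pairs):
    """Expand (value, count) pairs back into the original list."""
    return [value for value, count in pairs for _ in range(count)]


columns = to_columns(rows)

# An aggregate over one column touches only that column's values,
# not the user_id or country data.
total_clicks = sum(columns["clicks"])

# A sorted, low-cardinality column compresses into a few (value, count) pairs.
country_rle = rle_encode(columns["country"])

print(total_clicks)
print(country_rle)
```

The sketch shows why the two features reinforce each other: storing each column contiguously lets a query skip columns it does not need, and sorted values within a column form long runs that a simple encoding can shrink.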
[Diagram: three cluster nodes, each with its own CPU, memory, and disk]
Vertica co-existing to convergence
Vertica - Hadoop
1. Vertica and Hadoop Integration
1. MapReduce & Pig Connector
2. HDFS Flat files
3. HCatalog
[Diagram labels: Hadoop, HDFS, ANSI SQL, HCatalog, Pig, MapReduce, HBase, Billing, Clickstream, Telemetry]
Vertica MapReduce Connector
Allows flexibility & interoperability
Integrate with Hadoop / MapReduce and Pig
• Vertica-aware extension to Hadoop
• Specialized adapter for distributed streaming between Hadoop and Vertica
Developers need access to a fast DBMS that co-exists with Hadoop rather than being embedded
• Operate on different clusters, generally by different groups of people
• Allows customers to scale computation independent of DBMS
Vertica Connector for HDFS
• External Tables: Query/Access HDFS data directly off HDFS
• Connector for HDFS - Load files from HDFS into Vertica via the COPY command
• HDFS Storage Location - Store ROS containers on HDFS
Vertica HCatalog Connector
The Vertica HCatalog Connector lets you access data stored in Apache's Hive data warehouse software the same way you access it within a native Vertica table.
• Always reflects the current state of data stored in Hive.
• The HCatalog Connector uses the parallel nature of both HP Vertica and Hadoop to process Hive data.
• Since Vertica performs the extraction and parsing of data, the HCatalog Connector does not significantly increase the load on your Hadoop cluster.
2. Vertica SQL-on-Hadoop
• Vertica and Hadoop co-exist on shared hardware
• Vertica uses HDFS for its data storage
Use-case: Facebook

What do they do
Leading social media website focused on connecting the world. Largest database in the world (> 400PB), driving revenue through targeted online marketing and from data via web and mobile.

Challenge
Technical Pain
• Lack of query performance
• Restricted data set due to lack of scalability
• Jobs take 1 day in Hadoop
• X could not scale up
• Y too expensive
Use-Case
• Targeted marketing
• Generating new revenue streams from mobile
Business Challenge
• To increase revenue from information, through a massive volume and variety of queries and by profiling people with the right advertising campaigns

Business Benefit / Outcomes
• The queries take < 1 minute using HP Vertica.
• The company said growth had been fuelled by advertising income, which leapt 66 per cent year-on-year.
• Facebook were struggling to generate mobile advertising revenue prior to implementing HP Vertica.
Facebook Production Data Flow
Mobile
PC/Laptop
Web Servers
Logs
Hadoop/HDFS
2 huge Hadoop clusters
• 1.7 exabytes
• 15000 nodes
• 40000 nodes
Job Scheduler
Vertica
Logs
15 mins
Hourly
Daily
Legacy
• Schedulers: Data Wormhole and Data Bee (developed at FB)
• 1500B rows/day, 20TB/hour…..500TB/day
• Keep 30-90 days. Currently 30 days
• 2 x 270-node Vertica clusters
• Distance apart: 60km
• ETL dual load
• One (primary), one (secondary)
• Grow to 500-1000 nodes per cluster in one year
• 600K MR jobs/day
• 50K Informatica jobs/day
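As a rough sanity check, the load figures on this slide can be related to each other with some back-of-the-envelope arithmetic. The sketch below assumes that "1500B" means 1,500 billion rows, that terabytes are decimal (10^12 bytes), and that 500TB/day is the volume actually retained; the slide does not say whether these volumes are raw or compressed, so the derived numbers are only indicative.

```python
# Figures taken from the slide.
ROWS_PER_DAY = 1500 * 10**9      # "1500B rows/day", read as 1,500 billion (assumption)
HOURLY_LOAD_TB = 20              # "20TB/hour"
DAILY_VOLUME_TB = 500            # "500TB/Day"
RETENTION_DAYS = 30              # "Currently 30 days"

BYTES_PER_TB = 10**12            # decimal terabyte (assumption)
SECONDS_PER_DAY = 24 * 60 * 60

# 20 TB/hour sustained for 24 hours, to compare against the quoted 500 TB/day.
daily_from_hourly_tb = HOURLY_LOAD_TB * 24

# Average bytes per row implied by the daily volume and the daily row count.
avg_row_bytes = DAILY_VOLUME_TB * BYTES_PER_TB / ROWS_PER_DAY

# Average ingest rate in rows per second.
rows_per_second = ROWS_PER_DAY / SECONDS_PER_DAY

# Volume held at the current retention window, in petabytes.
retained_pb = DAILY_VOLUME_TB * RETENTION_DAYS / 1000

print(f"20 TB/hour over a day: {daily_from_hourly_tb} TB")
print(f"average row size: {avg_row_bytes:.0f} bytes")
print(f"average ingest rate: {rows_per_second / 1e6:.1f} million rows/s")
print(f"retained at {RETENTION_DAYS} days: {retained_pb:.0f} PB")
```

Under these assumptions the hourly and daily figures are consistent (480 TB versus the quoted 500 TB), each row averages a few hundred bytes, and a 30-day window holds on the order of 15 PB.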
Thank you!
Contact us:
Back up
“…for Facebook’s CIO, Tim Campos, to get on stage in Europe and declare that ‘A partner like HP Vertica thinks like we do’ and is a ‘key part’ of Facebook’s big data capabilities, is one of the best endorsements, err … ‘likes,’ that any modern IT infrastructure vendor could hope for.”
- Dana Gardner, Briefings Direct
Watch the video to see how Vertica empowers Facebook
The Vertica Technology
The 4 Cs
HP Vertica Flex Zone
Challenge: Analyzing semi-structured data is difficult and time consuming
Solution: HP Vertica Flex Zone
Benefits:
• Store and explore semi-structured data cost effectively
• Avoid creating and maintaining time-consuming schemas
• Gain 10X+ performance with one simple step
• http://www.vertica.com/hp-vertica-products/flexzone/
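The idea of exploring semi-structured data without first defining a schema can be sketched in a few lines of Python. This is a conceptual illustration only, not how Flex Zone is implemented: the sample records and the helper names `load_records`, `discovered_keys`, and `select` are hypothetical.

```python
import json

# Hypothetical semi-structured input: JSON lines whose fields vary per record.
raw_lines = [
    '{"user": "alice", "event": "click", "device": "mobile"}',
    '{"user": "bob", "event": "view"}',
    '{"user": "carol", "event": "click", "device": "desktop", "ad_id": 42}',
]


def load_records(lines):
    """Parse each line into a key/value map; no schema is declared up front."""
    return [json.loads(line) for line in lines]


def discovered_keys(records):
    """List every key seen in the data, i.e. the columns available to explore."""
    return sorted({key for record in records for key in record})


def select(records, *keys):
    """Project the requested keys, yielding None where a record lacks one."""
    return [tuple(record.get(key) for key in keys) for record in records]


records = load_records(raw_lines)
keys = discovered_keys(records)
result = select(records, "user", "device")

print(keys)
print(result)
```

Here the set of queryable fields is discovered from the data at read time, and a record that lacks a field simply yields an empty value, which is the kind of exploration the benefits above describe before any columns are optimized.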
[Diagram labels: HP Vertica Analytics Platform; Extensible Analytics; SQL; Flex Zone (Explore); Enterprise Edition (Optimize)]