VMware Big Data Forum
© Copyright 2013 Pivotal. All rights reserved.
A NEW PLATFORM FOR A NEW ERA
Pivotal Confidential–Internal Use Only
Pivotal HD
Pivotal HD Architecture
Apache components:
• HDFS
• HBase
• Pig, Hive, Mahout
• MapReduce
• Sqoop, Flume
• Resource Management & Workflow
• YARN
• ZooKeeper
Pivotal HD Enterprise added value:
• Command Center: configure, deploy, monitor, manage
• Hadoop Virtualization Extensions (HVE)
• Data Loader
• HAWQ – Advanced Database Services: Xtension Framework, Catalog Services, Query Optimizer, Dynamic Pipelining, ANSI SQL + Analytics
Pivotal HD Components
• HDFS – The Hadoop Distributed File System, the storage layer for Hadoop
• MapReduce – Parallel processing framework used for data computation in Hadoop
• Hive – Structured data warehouse for data in HDFS that provides a SQL-like interface to Hadoop
• Pig – High-level procedural language for data pipeline/data flow processing in Hadoop
• HBase – NoSQL, key-value data store on top of HDFS
• Mahout – Library of scalable machine-learning algorithms
• Spring Hadoop – Integrates the Spring framework with Hadoop
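The MapReduce model listed above splits a computation into a map step that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce step that runs once per key. The single-process Python sketch below mimics that flow for a word count. It illustrates the programming model only; it is not Hadoop's Java API and runs everything in one process rather than across cluster nodes.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Shuffle/sort: group all intermediate values by key, sorted by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    # Map every line, group the intermediate pairs, then reduce each group.
    intermediate = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["HDFS stores data", "MapReduce processes data"]))
```

In Hadoop the map and reduce functions run in parallel on many nodes and the framework performs the shuffle over the network; here all three phases are ordinary function calls.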
Pivotal HD Value-Added Components
GPHD Includes…
Pivotal HD Adds the Following to GPHD…
• Installation and Configuration Manager (ICM) – cluster installation, upgrade, and expansion tools.
• GP Command Center – visual interface for cluster health, system metrics, and job monitoring.
• Hadoop Virtualization Extensions (HVE) – enhances Hadoop to support virtual node awareness and enables greater cluster elasticity.
• GP Data Loader – parallel loading infrastructure that supports “line speed” data loading into HDFS.
• Isilon Integration – extensively tested at scale, with guidelines for compute-heavy, storage-heavy, and balanced configurations.
• Advanced Database Services (HAWQ) – high-performance, “True SQL” query interface running within the Hadoop cluster.
• Extensions Framework (GPXF) – support for HAWQ interfaces on external data providers (HBase, Avro, etc.).
• Advanced Analytics Functions (MADlib) – access to parallelized machine-learning and data-mining functions at scale.
Pivotal Core Components & Versions
Component       GPHD 1.2 Core Distribution   Pivotal HD Enterprise
Hadoop          1.0.3                        2.0.2
HBase           0.92.1                       0.94.2
Hive            0.8.1                        0.9.1
Mahout          0.6                          0.8.0
Pig             0.9.2                        0.10.0
ZooKeeper       3.3.5                        3.4.5
Flume           1.2.0                        1.3.1
Sqoop           1.4.1                        1.4.2
Spring Hadoop   (not listed)                 1.0.0
Data Loader
Sources:
• Streams (push and pull), via connectors such as Flume
• Files from HDFS, NFS, HTTP, FTP, or local storage
Data Loader functions:
• Data source registration and data destination registration
• Copy strategy and optimization
• Data copy and data processing
• Job management
• Web GUI and CLI, plus REST APIs
Destination: HDFS
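The slide names a copy strategy and an optimization step but does not say how Data Loader spreads work across its parallel loaders. As one hypothetical illustration of what such a strategy could look like, the Python sketch below assigns source files to parallel copy workers with the greedy longest-processing-time heuristic, so that each worker copies roughly the same number of bytes. `CopyTask`, `Worker`, and `plan_parallel_copy` are invented names, and this is not Data Loader's actual algorithm.

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class CopyTask:
    # Hypothetical unit of work: one source file to copy into HDFS.
    path: str
    size_bytes: int


@dataclass(order=True)
class Worker:
    # Ordered by (load, worker_id) so the heap always yields the least-loaded
    # worker, with ties broken deterministically by id.
    load: int
    worker_id: int
    tasks: list = field(default_factory=list, compare=False)


def plan_parallel_copy(tasks: list[CopyTask], num_workers: int) -> list[Worker]:
    """Greedy longest-first plan: each file goes to the currently least-loaded worker."""
    if num_workers < 1:
        raise ValueError("num_workers must be >= 1")
    heap = [Worker(0, i) for i in range(num_workers)]
    heapq.heapify(heap)
    # Placing the largest files first keeps the final loads close together.
    for task in sorted(tasks, key=lambda t: t.size_bytes, reverse=True):
        worker = heapq.heappop(heap)
        worker.tasks.append(task)
        worker.load += task.size_bytes
        heapq.heappush(heap, worker)
    return sorted(heap, key=lambda w: w.worker_id)
```

For five files of 700, 500, 400, 300, and 100 bytes on two workers, this plan gives each worker 1000 bytes: one copies the 700- and 300-byte files, the other the remaining three. A real loader would also have to weigh network bandwidth and destination layout, which this sketch ignores.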
Command Center: Simple and Complete Cluster Management
• Install and configure Hadoop components and services
• Centralized interface for Pivotal HD cluster monitoring, diagnostics, and management
• Live and historical analysis of Hadoop system metrics
Lifecycle: Configure, Monitor, Manage, Analyze, Deploy
Command Center – Monitor, Manage, and Analyze
• Host-, application-, and job-level performance monitoring across the entire Pivotal HD cluster
• Visualize and analyze live and historical Hadoop cluster information through the Command Center Dashboard
• Quick diagnosis of functional or performance issues
Hadoop Virtualization Extensions (HVE)
• HVE enables Hadoop to support more effective virtual deployments
• This makes it possible to provision and scale the compute and storage processes independently, resulting in:
  • Much better resource utilization
  • Improved resource allocation and consumption
  • Support for multi-tenancy
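Replica placement is one way to make HVE's "virtual node awareness" concrete. In the Apache Hadoop HVE work, the network topology gains a node-group layer between rack and host, so that VMs sharing one physical host belong to the same node group, and HDFS avoids putting two replicas of a block in one node group. The Python sketch below is a simplified model of that rule under those assumptions, not Hadoop's block-placement code; `VmNode` and `choose_replicas` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class VmNode:
    name: str
    rack: str        # e.g. "/rack1"
    node_group: str  # HVE-style layer: VMs on the same physical host share one node group


def choose_replicas(writer: VmNode, candidates: list[VmNode], replication: int = 3) -> list[VmNode]:
    """Simplified, node-group-aware replica placement (illustrative only).

    - Replica 1: the writer's own VM.
    - Replica 2: prefer a VM on a different rack.
    - Replica 3: prefer a VM on replica 2's rack.
    - Always: never place two replicas in the same node group (same physical host).
    """
    chosen = [writer]
    used_groups = {writer.node_group}

    def pick(predicate: Callable[[VmNode], bool]) -> Optional[VmNode]:
        # Take the first candidate whose node group holds no replica yet.
        for node in candidates:
            if node.node_group not in used_groups and predicate(node):
                chosen.append(node)
                used_groups.add(node.node_group)
                return node
        return None

    if replication >= 2:
        second = pick(lambda n: n.rack != writer.rack)
        if second is None:  # no other rack available: fall back to any free node group
            second = pick(lambda n: True)
        if second is not None and replication >= 3:
            if pick(lambda n: n.rack == second.rack) is None:
                pick(lambda n: True)
    # Any further replicas go to any node group that does not hold a replica yet.
    while len(chosen) < replication and pick(lambda n: True) is not None:
        pass
    return chosen
```

With the writer `vm1` on `/rack1/nodegroup1` and candidates `vm2` (`/rack1/nodegroup1`), `vm3` (`/rack1/nodegroup2`), `vm4` and `vm5` (both `/rack2/nodegroup3`), and `vm6` (`/rack2/nodegroup4`), the sketch returns `vm1`, `vm4`, `vm6`. It skips `vm5` because that VM shares a physical host with `vm4`, and a host-unaware policy could have placed two replicas there.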
HAWQ Delivers
SQL compliant
World-class query optimizer
Interactive query
Horizontal scalability
Robust data management
Common Hadoop formats
Deep analytics
Xtension Framework
• An advanced version of GPDB (Greenplum Database) external tables
• Enables combining HAWQ data and Hadoop data in a single query
• Supports connectors for HDFS, HBase, and Hive
• Provides an extensible framework API for developing custom connectors to other data sources
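The slide says only that the framework exposes an extensible API for custom connectors. The Python sketch below is a hypothetical model of that idea. A connector splits an external source into fragments and returns rows for each fragment, and the engine scans those rows and joins them with a local table, which corresponds to combining HAWQ data and Hadoop data in a single query. `ExternalConnector`, `InMemoryRegionConnector`, `scan_external`, and `join_on` are invented for illustration. The real GPXF API is not shown in this deck and may differ.

```python
from abc import ABC, abstractmethod
from typing import Iterator


class ExternalConnector(ABC):
    """Hypothetical connector contract: split a source into fragments, then read rows per fragment."""

    @abstractmethod
    def fragments(self) -> list[str]:
        """Return identifiers for independently readable pieces of the source."""

    @abstractmethod
    def read(self, fragment: str) -> Iterator[dict]:
        """Yield the rows stored in one fragment."""


class InMemoryRegionConnector(ExternalConnector):
    """Toy stand-in for an HBase-like source in which each region is one fragment."""

    def __init__(self, regions: dict[str, list[dict]]):
        self.regions = regions

    def fragments(self) -> list[str]:
        return sorted(self.regions)

    def read(self, fragment: str) -> Iterator[dict]:
        yield from self.regions[fragment]


def scan_external(connector: ExternalConnector) -> Iterator[dict]:
    # A parallel engine could hand different fragments to different segments;
    # this sketch reads them one after another.
    for fragment in connector.fragments():
        yield from connector.read(fragment)


def join_on(local_rows: list[dict], external_rows: Iterator[dict], key: str) -> list[dict]:
    """Hash join: index the local table (keys assumed unique), then probe it with external rows."""
    index = {row[key]: row for row in local_rows}
    return [{**index[row[key]], **row} for row in external_rows if row[key] in index]
```

Splitting connectors into a "list fragments" call and a "read one fragment" call is what would let a parallel database spread an external scan across its workers. The join here stands in for the part of the query that the database engine itself would execute.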