Panasas and Hadoop
High Performance NAS for Hadoop
HPC ADVISORY COUNCIL, STANFORD
FEB 8, 2013
DR. BRENT WELCH, CTO, PANASAS
PANASAS TECHNICAL DIFFERENTIATION
Scalable Performance
• Balanced object-storage building block [8TB SATA, 120GB SSD, 8GB RAM, 1 core, dual GE]
• 40 TB to 8 PB single system supporting 100’s to 1000’s of active clients
Novel Data Integrity Protection
• File system and RAID are integrated
• Highly reliable data w/ novel data protection systems
Maximum Availability
• Built-in distributed system platform manages 100’s of blades
Simple to Deploy and Maintain
• Integrated storage system with appliance model
Application Acceleration
• Customer proven results
Standards Based
• pNFS, OSD
[Photo: ActiveStor 14]
ACTIVESTOR BLADE HARDWARE
[Photo: 4U shelf. Callouts: dual power supplies + battery; dual 10GE uplinks; enterprise SATA + SSD => OSD; scalable metadata]
PANASAS SYSTEM VIEW
Complete “appliance” solution (HW + SW), blade form factor
• DirectorBlade = metadata server
• StorageBlade = OSD
Clustered, fault-tolerant metadata services
Linux kernel module for parallel I/O: DirectFlow or pNFS
Object storage
Snapshots, quota
Global namespace
NFS & CIFS re-export
[Diagram labels: compute nodes (10,000+) running the PanFS client; StorageBlades (1000+) running OSDFS, reached over iSCSI/OSD; DirectorBlades (100+) running SysMgr and PanFS, reached over RPC; NFS/CIFS clients]
PANASAS PARALLEL DATA PATH
Data path bypasses RAID controllers and metadata servers
• Application writes data
• DirectFlow/pNFS client layer generates redundant data for each stripe
• Everything is written directly to storage
• All blades work together on RAID rebuild
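The client-side redundancy step above can be sketched in a few lines. This is only an illustration: the slide does not say which code the DirectFlow/pNFS client uses, so the sketch assumes a single XOR parity unit per stripe (RAID-5 style), and the function names, unit size, and stripe width are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_stripe(data, unit_size, data_units):
    """Split data into fixed-size units and append one XOR parity unit.

    Assumption: one parity unit per stripe; the real client may use a
    different redundancy scheme. Data is zero-padded to fill the stripe.
    """
    if len(data) > unit_size * data_units:
        raise ValueError("data does not fit in one stripe")
    padded = data.ljust(unit_size * data_units, b"\0")
    units = [padded[i * unit_size:(i + 1) * unit_size] for i in range(data_units)]
    return units + [xor_blocks(units)]


def rebuild_unit(stripe, lost_index):
    """Recover one lost unit (data or parity) by XOR-ing the survivors."""
    survivors = [unit for i, unit in enumerate(stripe) if i != lost_index]
    return xor_blocks(survivors)


# The client computes parity itself, then writes every unit directly to storage.
stripe = make_stripe(b"hadoop on panfs", unit_size=4, data_units=4)
recovered = rebuild_unit(stripe, lost_index=2)
print(recovered == stripe[2])
```

Because any unit can be recomputed from the others in its stripe, rebuild work for different stripes can be spread across many blades, which is the shared rebuild responsibility the last bullet refers to.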
[Diagram: clients on an Ethernet network]
PANASAS PARALLEL ADVANTAGE
Scale-out storage system with true parallel architecture
• Scale performance and capacity at the same time
• Rapid recovery from failure – shared RAID responsibility
[Chart: RAID rebuild rate (MB/sec) vs. number of shelves. Series: one volume, 1GB files; one volume, 100MB files; N volumes, 1GB files; N volumes, 100MB files. Callouts: 4 shelves are 4 times faster than 1; 12 shelves rebuild 12 times faster than 1]
[Chart: Shelf scaling, write bandwidth (MB/sec) vs. number of IOR processes. Series: 4 shelves, 16 clients; 2 shelves, 8 clients; 1 shelf, 8 clients. 3.4 testing December 2008, PAS 8 10GE]
SCALABLE BANDWIDTH
[Chart: Shelf scaling, Nov 2012, release 5.0.0. Bandwidth (MB/sec) vs. number of shelves, 80 processes per shelf. Series: write aggregate; read aggregate; write per shelf; read per shelf. Testing Nov 2012, AS-12 & AS-14, Rel 5.0.0. Callout: 8 shelves are 8 times faster than 1]
HIGH PERFORMANCE NAS FOR HADOOP
HADOOP HW ENVIRONMENT
[Diagram: eight nodes, each combining compute and data]
Low cost hardware, run until failure, offline service
Network infrastructure often oversubscribed
HADOOP SW ENVIRONMENT
The Hadoop environment is an open-source Java implementation of a family of
data and compute facilities
• Hadoop job scheduler for Map/Reduce applications
• HDFS file system
• Zookeeper configuration management
• NoSQL key-value stores layered over HDFS
• Query languages
• Many more
LIMITATIONS OF THE ENVIRONMENT
Classic HW config mixes compute and data, with weak network
• Motivates function shipping instead of data shipping
• Even so, local access to data is not always possible
• Triplication is an expensive way to do data protection
• Not easy to share HDFS data with “normal” applications
• Classic model grew up in an environment skewed by Google requirements
• Very different from a classic HPC environment
DEDICATED COMPUTE AND STORAGE
Separating compute and storage demands a high-quality network
Data is shared among different compute clusters
Hardware replacement cycles for compute and storage differ
[Diagram: compute nodes connected over a network to OSD data nodes; labels also show combined compute/data nodes and an NFS4.1 metadata service]
HIGH PERFORMANCE NAS FOR HADOOP
A fast network and a good, scalable parallel file system
• Keep compute and data management separate
• Mixed workflows with different kinds of applications sharing data
Performance intuition
• A local disk goes at 50 to 100 MB/sec (large sequential workloads)
• A good network file system can deliver 500-1000+ MB/sec to one client
• A local SSD can deliver 250 to 2500 MB/sec
• Tuning Map/Reduce is more about partitioning a problem so it fits into
main memory of the nodes
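To make the bandwidth figures above concrete, here is a back-of-the-envelope sketch of how long one client would take to stream 1 TB at each quoted rate. It assumes sustained sequential throughput and decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes); the function name and the 1 TB data set size are illustrative choices, not figures from the slide.

```python
ONE_TB = 10**12  # decimal terabyte, in bytes (assumed data set size)


def scan_seconds(dataset_bytes, mb_per_sec):
    """Seconds to stream dataset_bytes at a sustained rate in MB/sec."""
    return dataset_bytes / (mb_per_sec * 10**6)


# Low and high ends of the ranges quoted in the bullets above.
rates_mb_per_sec = {
    "local disk (low)": 50,
    "local disk (high)": 100,
    "network file system (low)": 500,
    "network file system (high)": 1000,
    "local SSD (low)": 250,
    "local SSD (high)": 2500,
}

for source, rate in rates_mb_per_sec.items():
    minutes = scan_seconds(ONE_TB, rate) / 60
    print(f"{source:28s} {minutes:7.1f} minutes")
```

Under these assumptions a single local disk needs hours for a scan that a fast network file system or SSD finishes in minutes, which is the intuition behind keeping the working set in main memory.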
Management intuition
• Data scattered among compute nodes makes them “heavy”
• Hard to upgrade compute w/out affecting storage
• Serviceability model of many hard drives or an expensive PCIe card in every
compute node is not very good
COMPARING PANFS AND HDFS
Feature | Hadoop | Panasas | Comment
------- | ------ | ------- | -------
Data availability | Triple replication | Object RAID | Panasas at 15% overhead vs. 200%
File system support | Proprietary | POSIX | Panasas files can be shared with other big data workloads
Hardware | Compute and storage scale together | Compute and storage independent | Panasas allows independent scaling of compute and storage
Applications | Single task: Hadoop analytics | Multi-purpose workloads | Panasas designed for many big data workloads
Multi-client write to file | Not allowed (WORM) | Supported (write many) | Panasas big data workloads require concurrent file access by multiple clients
Small file | No | Yes | Panasas well suited to mixed big data workloads
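The "15% overhead vs. 200%" comparison is simple arithmetic, sketched below. Overhead here means redundant capacity as a percentage of usable data; the 15% value is taken from the comparison as given, and the function names and the 100 TB example size are hypothetical.

```python
def replication_overhead_pct(copies):
    """Extra capacity, as a percentage of usable data, for N full copies."""
    return (copies - 1) * 100


def raw_capacity_tb(usable_tb, overhead_pct):
    """Raw capacity needed to hold usable_tb at a given protection overhead."""
    return usable_tb * (100 + overhead_pct) / 100


usable_tb = 100  # assumed data set size, for illustration only

triple = raw_capacity_tb(usable_tb, replication_overhead_pct(3))
object_raid = raw_capacity_tb(usable_tb, 15)  # 15% figure quoted in the comparison

print(f"triple replication: {triple:.0f} TB raw")
print(f"object RAID:        {object_raid:.0f} TB raw")
```

For the same usable capacity, triple replication needs three times the raw disk, while a 15% overhead needs 1.15 times.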
ENTERPRISE HADOOP ENVIRONMENT
Reliable, trusted enterprise storage
• Panasas storage offers enterprise-class features such as snapshots, user
quotas, service, and IT administration
Panasas allows users to scale computing and storage independently
• Features such as load balancing ensure all nodes are equally capable of
participating in data transfers
• Storage can be added to a live system and dynamically integrated into the
available pool
Data management and data retention
• Supports data migration; old data can be moved to archives
• It can integrate with existing data management systems
− Hadoop lacks any built-in data migration other than replicating the entire
data set to another system
Scalable storage performance
• Tightly balanced system that scales performance linearly as more nodes
are added to the system
USING NAS WITH HADOOP
Can run on any distribution and any version (Cloudera,
Hortonworks, Apache)
• No updates required for newer versions of Hadoop
No need for proprietary software implementation
• Simple configuration setup
Can run on HDFS or run directly on PanFS
• Layer HDFS over PanFS
• Configure HDFS pathnames to use /panfs
− URL: hdfs://panfs/system/workspace
• Bypass HDFS entirely
• Configure file:// URLs to use /panfs
− URL: file://panfs/system/workspace
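As a rough sketch of the two options above, the snippet below generates Hadoop-style configuration XML. Everything specific in it is an assumption rather than the white paper's procedure: the property names `fs.defaultFS` and `dfs.datanode.data.dir` are the Hadoop 2 names (Hadoop 1 releases used `fs.default.name` and `dfs.data.dir`), the `hdfs-data` subdirectory is hypothetical, and `file:///` is the standard three-slash form for the local file system, under which a PanFS mount at /panfs appears as an ordinary path.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(properties):
    """Render a dict of settings as a Hadoop <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Option 1: layer HDFS over PanFS by pointing DataNode storage at the mount.
# The directory name below is hypothetical.
hdfs_over_panfs = hadoop_site_xml(
    {"dfs.datanode.data.dir": "/panfs/system/workspace/hdfs-data"}
)

# Option 2: bypass HDFS and use the local file system scheme; jobs then
# read and write paths under /panfs/system/workspace directly.
bypass_hdfs = hadoop_site_xml({"fs.defaultFS": "file:///"})

print(hdfs_over_panfs)
print(bypass_hdfs)
```

The first fragment would belong in hdfs-site.xml and the second in core-site.xml; the white paper and configuration guide remain the reference for the supported settings.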
Details captured in a white paper and configuration guide
• visit www.panasas.com to get a copy of the paper
PERFORMANCE, HDFS OVER PANFS
41% faster than local disk on HDFS (1 copy)
29% faster than local disk on HDFS (2 copy)
[Chart: run time in seconds for TeraGen, TeraSort, and TeraValidate, Local Disk vs. ActiveStor 14T; bar labels 2,302 and 1,638]
Download Panasas whitepaper for detailed setup and results
http://www.panasas.com/sites/default/files/uploads/docs/hadoop_wp_lr_1096.pdf
HDFS configured to store data into PanFS
Equal # of disks
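One way to relate the 41% headline to the chart is sketched below. It assumes, without confirmation from the slide, that the two labels 2,302 and 1,638 are total elapsed seconds for local disk and ActiveStor 14T respectively, and that "faster" means the ratio of run times; the function name is hypothetical.

```python
def percent_faster(baseline_seconds, new_seconds):
    """Speedup of the new run over the baseline, as a percentage of throughput."""
    return (baseline_seconds / new_seconds - 1) * 100


# Assumed reading of the two bar labels on the chart.
local_disk_seconds = 2302
activestor_seconds = 1638

print(f"{percent_faster(local_disk_seconds, activestor_seconds):.1f}% faster")
```

Under that reading the ratio works out to roughly 41%, matching the first headline. The white paper is the authoritative source for which configuration each number belongs to.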
PERFORMANCE, HDFS VS PANFS
HDFS: nodes use local disk
PanFS: nodes use PanFS
HDFS: two-copy replication
PanFS: Object RAID
Generate, Sort, and Validate 1TB of key/values
[Chart: seconds to complete TeraGen, TeraSort, and TeraValidate, HDFS vs. PanFS; lower is better]
SUMMARY
The decisions around the original Hadoop hardware platform
were driven by dedicated, application-specific requirements
• Direct-attach dedicated server cluster works when the data set is small or
when the entire business revolves around Hadoop
Mixed-use environments, typical of the enterprise, require a
system that has flexibility, high reliability, enterprise fault
tolerance, and support for typical disaster recovery strategies
Panasas network-attached storage is a viable option for many
big data workloads, including Hadoop analytics
As networking continues to get faster and cheaper, networked
storage will become an increasingly viable solution for Hadoop
• Large data sets are unwieldy on local disk
• Management headache of the 1990’s in the enterprise again?
Hadoop is first an application; the hardware choice depends on
the business-specific context. Panasas NAS is a viable, high-performance
solution for mixed-use workloads
THANK YOU