10/11/11 © MapR Confidential
MapR, Implications for Integration
CMU – September 2011
Outline
• MapR system overview
• Map-reduce review
• MapR architecture
• Performance results
• Map-reduce on MapR
• Architectural implications
• Search indexing / deployment
• EM algorithm for machine learning
• … and more …
Map-Reduce
[Diagram: the map-reduce data flow. Input splits feed map functions, a shuffle phase groups intermediate values by key, and reduce functions write the output.]
Bottlenecks and Issues
• Read-only files
• Many copies in I/O path
• Shuffle based on HTTP
  • Can't use new technologies
  • Eats file descriptors
• Spills go to local file space
  • Bad for skewed distribution of sizes
MapR Areas of Development
• Map-reduce
• Storage services
• Ecosystem
• HBase
• Management
MapR Improvements
• Faster file system
  • Fewer copies
  • Multiple NICs
  • No file descriptor or page-buf competition
• Faster map-reduce
  • Uses distributed file system
  • Direct RPC to receiver
  • Very wide merges
MapR Innovations
• Volumes
  • Distributed management
  • Data placement
• Read/write random-access file system
  • Allows distributed metadata
  • Improved scaling
  • Enables NFS access
• Application-level NIC bonding
• Transactionally correct snapshots and mirrors
MapR's Containers
• Each container contains
  • Directories & files
  • Data blocks
• Replicated on servers
• No need to manage directly
Files and directories are sharded into blocks, which are placed into containers (mini name nodes) on disks.
Containers are 16-32 GB segments of disk, placed on nodes.
MapR's Containers
• Each container has a replication chain
• Updates are transactional
• Failures are handled by rearranging replication
Container locations and replication
[Diagram: three nodes (N1, N2, N3), each hosting several containers; the CLDB records for each container the nodes that host it, e.g. N1, N2 or N3, N2.]
The container location database (CLDB) keeps track of the nodes hosting each container and of the replication chain order.
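As a sketch of the idea (names and structures are illustrative, not MapR's actual data structures or API), the CLDB can be thought of as a map from container id to its ordered replication chain, with the head of the chain serving I/O:

```python
# Hypothetical CLDB-style lookup table: container id -> ordered
# replication chain (head of chain is the serving replica).
container_locations = {
    1: ["N1", "N2"],
    2: ["N3", "N2"],
    3: ["N1", "N3"],
}

def master_for(container_id):
    """The first node in the chain serves reads and writes."""
    return container_locations[container_id][0]

def on_node_failure(failed):
    """Failures are handled by rearranging replication:
    drop the failed node from every chain it appears in."""
    for cid, chain in container_locations.items():
        container_locations[cid] = [n for n in chain if n != failed]
```

When a node fails, the next replica in each affected chain simply becomes the new head; re-replication to restore the chain length would follow asynchronously.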
MapR Scaling
Containers represent 16-32 GB of data
• Each can hold up to 1 billion files and directories
• 100M containers = ~2 exabytes (a very large cluster)
250 bytes of DRAM to cache a container
• 25 GB to cache all containers for a 2 EB cluster
  • But not necessary; can page to disk
• Typical large 10 PB cluster needs 2 GB
Container reports are 100x-1000x smaller than HDFS block reports
• Serve 100x more data nodes
• Increase container size to 64 GB to serve a 4 EB cluster
• Map-reduce not affected
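The headline arithmetic above can be checked directly (decimal units; a mid-range 20 GB container size is assumed):

```python
# Back-of-the-envelope check of the container scaling numbers.
containers = 100_000_000        # 100M containers
bytes_per_entry = 250           # DRAM to cache one container's metadata
container_size = 20e9           # ~16-32 GB per container; take ~20 GB

cache_bytes = containers * bytes_per_entry   # DRAM to cache them all
total_data = containers * container_size     # total addressable data

print(cache_bytes / 1e9)   # ~25 GB of DRAM
print(total_data / 1e18)   # ~2 exabytes
```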
MapR's Streaming Performance
[Bar charts: read and write throughput in MB per second (0-2250 scale) for the hardware limit, MapR, and Hadoop, on two disk configurations. Higher is better.]
Tests: (i) 16 streams x 120 GB; (ii) 2000 streams x 1 GB
Hardware: 11 x 7200 rpm SATA; 11 x 15K rpm SAS
Terasort on MapR
[Bar charts: elapsed time in minutes for MapR vs. Hadoop on a 1.0 TB sort (0-60 min scale) and a 3.5 TB sort (0-300 min scale). Lower is better.]
10+1 nodes: 8 core, 24 GB DRAM, 11 x 1 TB SATA 7200 rpm
HBase on MapR
[Bar chart: records per second (0-25000 scale) for Zipfian and uniform key distributions, MapR vs. Apache. Higher is better.]
YCSB random read with 1 billion 1K records
10+1 node cluster: 8 core, 24 GB DRAM, 11 x 1 TB 7200 RPM
Small Files (Apache Hadoop, 10 nodes)
[Plot: create rate in files/sec vs. number of files in millions, for out-of-box and tuned configurations.]
Op: create file, write 100 bytes, close
Notes:
- NN not replicated
- NN uses 20G DRAM
- DN uses 2G DRAM
MUCH faster for some operations
[Plot: create rate vs. number of files in millions, on the same 10 nodes.]
What MapR is not
• Volumes != federation
  • MapR supports > 10,000 volumes, all with independent placement and defaults
  • Volumes support snapshots and mirroring
• NFS != FUSE
  • Checksum and compress at gateway
  • IP fail-over
  • Read/write/update semantics at full speed
• MapR != maprfs
New Capabilities
Alternative NFS mounting models
• Export to the world
  • NFS gateway runs on selected gateway hosts
• Local server
  • NFS gateway runs on local host
  • Enables local compression and checksumming
• Export to self
  • NFS gateway runs on all data nodes, mounted from localhost
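As a hedged illustration of how the models differ on the client side (hostnames, paths, and options here are assumptions, not a prescribed invocation; consult the MapR documentation for the exact procedure), each model is just an NFS mount against a different gateway:

```shell
# Export to the world: a remote client mounts a designated gateway host
mount -o hard,nolock gateway1:/mapr /mapr

# Local server / export to self: mount the gateway running on this host
mount -o hard,nolock localhost:/mapr /mapr
```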
Export to the world
[Diagram: an external NFS client mounts the cluster through one of several NFS server gateways.]
Local server
[Diagram: the application and an NFS server run together on the client host, which talks directly to the cluster nodes.]
Universal export to self
[Diagram: a cluster node runs its own NFS server; tasks on the node mount from localhost.]
[Diagram: three identical cluster nodes, each running its own NFS server and its own tasks.]
Nodes are identical
Application architecture
• High-performance map-reduce is nice
• But algorithmic flexibility is even nicer
Sharded text indexing
[Diagram: input documents pass through a map phase that assigns documents to shards, then to reducers that index text to local disk; indexes are copied to clustered index storage and from there to the search engine's local disk.]
Index text to local disk and then copy the index to the distributed file store.
Copy to local disk is typically required before the index can be loaded.
Sharded text indexing
• Mapper assigns document to shard
  • Shard is usually a hash of the document id
• Reducer indexes all documents for a shard
  • Indexes created on local disk
  • On success, copy index to DFS
  • On failure, delete local files
• Must avoid directory collisions
  • Can't use shard id!
• Must manage and reclaim local disk space
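A minimal sketch of the two sides, assuming MD5-based shard assignment and per-attempt temporary directories (both illustrative choices, not a specific indexing API):

```python
import hashlib
import tempfile

NUM_SHARDS = 16  # illustrative shard count

def assign_shard(doc_id):
    """Mapper side: shard = hash of the document id, so assignment
    is deterministic and spreads documents evenly across shards."""
    return int(hashlib.md5(doc_id.encode()).hexdigest(), 16) % NUM_SHARDS

def shard_workdir(shard):
    """Reducer side: the local index directory must be unique per
    task attempt, not just per shard id, so a retried reducer never
    collides with a failed attempt's leftover files."""
    return tempfile.mkdtemp(prefix=f"index-shard-{shard}-")
```

Making the work directory unique per attempt is exactly why the shard id alone can't be used as a directory name: two attempts at the same shard would collide.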
Conventional data flow
[Diagram: the same map/reduce/search-engine pipeline as above.]
Failure of a reducer causes garbage to accumulate on the local disk.
Failure of the search engine requires another download of the index from clustered storage.
Simplified NFS data flows
[Diagram: input documents flow through map and reduce straight into clustered index storage, which the search engine reads directly.]
Index to task work directory via NFS.
Failure of a reducer is cleaned up by the map-reduce framework.
The search engine reads the mirrored index directly.
Simplified NFS data flows
[Diagram: reducer output is mirrored to several search engines.]
Mirroring allows exact placement of index data.
Arbitrary levels of replication are also possible.
How about another one?
K-means
• Classic E-M based algorithm
• Given cluster centroids:
  • Assign each data point to the nearest centroid
  • Accumulate new centroids
  • Rinse, lather, repeat
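The loop body above can be sketched in a few lines of plain Python (illustrative, not tied to any particular framework):

```python
def kmeans_step(points, centroids):
    """One E-M pass: assign each point to its nearest centroid (E),
    then recompute each centroid as the mean of its points (M)."""
    k = len(centroids)
    dim = len(centroids[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    assign = []
    for p in points:
        # E-step: nearest centroid by squared Euclidean distance
        c = min(range(k), key=lambda j: sum((a - b) ** 2
                                            for a, b in zip(p, centroids[j])))
        assign.append(c)
        counts[c] += 1
        for i in range(dim):
            sums[c][i] += p[i]
    # M-step: accumulate new centroids (keep old one if a cluster is empty)
    new = [[s / counts[j] for s in sums[j]] if counts[j] else list(centroids[j])
           for j in range(k)]
    return assign, new
```

"Rinse, lather, repeat" is then just calling `kmeans_step` until the centroids stop moving.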
K-means, the movie
[Diagram: input points are assigned to the nearest centroid; new centroids are aggregated and fed back for the next iteration.]
But …
Parallel Stochastic Gradient Descent
[Diagram: each input split trains a sub-model; the sub-models are averaged into the final model.]
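The averaging pattern can be sketched as follows (the toy SGD update and learning rate are illustrative stand-ins for a real sub-model trainer):

```python
def train_sub_model(split, w0, lr=0.1):
    """Toy SGD on one input split: pull the weights toward each example."""
    w = list(w0)
    for x in split:
        w = [wi + lr * (xi - wi) for wi, xi in zip(w, x)]
    return w

def average_models(sub_models):
    """Final model = element-wise average of the sub-models' parameters."""
    n = len(sub_models)
    return [sum(ws) / n for ws in zip(*sub_models)]
```

Each worker sees only its own split, so the only communication is the final averaging step, which is what makes the scheme map-reduce friendly.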
Variational Dirichlet Assignment
[Diagram: each input split gathers sufficient statistics; these update the model, which feeds back to the workers.]
Old tricks, new dogs
• Mapper
  • Assign point to cluster
  • Emit cluster id, (1, point)
• Combiner and reducer
  • Sum counts, weighted sum of points
  • Emit cluster id, (n, sum/n)
• Output to HDFS
Centroids are read from HDFS to local disk by the distributed cache; mappers read them from local disk via the distributed cache; output is written by map-reduce.
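The emission format matters: because each output is (n, sum/n), partial results compose associatively, so the same function can serve as both combiner and reducer. A plain-Python sketch (illustrative, not a specific framework API):

```python
def nearest(point, centroids):
    """Mapper: index of the nearest centroid (squared distance)."""
    return min(range(len(centroids)),
               key=lambda k: sum((p - c) ** 2
                                 for p, c in zip(point, centroids[k])))

def combine(values):
    """Combiner/reducer: values are (count, mean-point) pairs.
    Sum the counts and the count-weighted sums of points, then
    emit (n, sum/n) so outputs stay composable across stages."""
    n = sum(c for c, _ in values)
    dim = len(values[0][1])
    total = [sum(c * pt[i] for c, pt in values) for i in range(dim)]
    return n, [t / n for t in total]
```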
Old tricks, new dogs
• Mapper
  • Assign point to cluster
  • Emit cluster id, (1, point)
• Combiner and reducer
  • Sum counts, weighted sum of points
  • Emit cluster id, (n, sum/n)
• Output to MapR FS (instead of HDFS)
Centroids are read via NFS; output is written by map-reduce.
Poor man’s Pregel
• Mapper

    while not done:
        read and accumulate input models
        for each input:
            accumulate model
        write model
        synchronize
        reset input format
    emit summary

• Lines in bold can use conventional I/O via NFS
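A runnable toy version of that loop (the model "file" and the synchronize barrier are simulated in-process; names are illustrative):

```python
def poor_mans_pregel(inputs, accumulate, initial):
    """Each superstep reads the model written by the previous one,
    folds every input into it, writes it back, and stops once
    nothing changed."""
    store = initial                  # stands in for the NFS-visible model
    while True:                      # while not done
        model = store                # read previous superstep's model
        changed = False
        for item in inputs:          # for each input: accumulate model
            model, delta = accumulate(model, item)
            changed = changed or delta
        store = model                # write model
        # synchronize: a real job would barrier across tasks here
        if not changed:
            return store             # emit summary

def acc_max(model, item):
    """Example accumulator: propagate the maximum to a fixed point."""
    new = max(model, item)
    return new, new != model
```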
Click modeling architecture
[Diagram: input flows through feature extraction and down-sampling, a data join with side data, and then sequential SGD learning. The first two stages are map-reduce; the learning stage is now reachable via NFS.]
Click modeling architecture
[Diagram: the same pipeline, with map-reduce cooperating with NFS to feed several sequential SGD learners in parallel.]
And another…
??

Hybrid model flow
[Diagram: feature extraction and down-sampling feeds SVD (PageRank, spectral); downstream modeling produces the deployed model. The first two stages are map-reduce.]
Hybrid model flow
[Diagram: the same pipeline, with map-reduce stages distinguished from sequential ones.]
And visualization…
Trivial visualization interface
• Map-reduce output is visible via NFS
• Legacy visualization just works

    $ R
    > x <- read.csv("/mapr/my.cluster/home/ted/data/foo.out")
    > plot(error ~ t, x)
    > q(save='n')
Conclusions
• We used to know all this
  • Tab completion used to work
  • 5 years of work-arounds have clouded our memories
• We just have to remember the future