Cloud Infrastructure: GFS & MapReduce
Andrii Vozniuk
Based on slides by Jeff Dean and Ed Austin
Data Management in the Cloud, EPFL, February 27, 2012




This presentation is based on two papers:
1) S. Ghemawat, H. Gobioff, S. Leung. The Google File System. SOSP, 2003.
2) J. Dean, S. Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI, 2004.



Outline

• Motivation
• Problem Statement
• Storage: Google File System (GFS)
• Processing: MapReduce
• Benchmarks
• Conclusions


Motivation

• Huge amounts of data to store and process
• Example @2004:
– 20+ billion web pages x 20 KB/page = 400+ TB
– Reading from one disk at 30-35 MB/s:
• Four months just to read the web
• 1000 hard drives just to store the web
• Even more complicated if we want to process the data

• Exponential growth: the solution must scale (a quick check of these numbers is sketched below)
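A rough back-of-the-envelope check of the figures above (a sketch; the page count, page size, and disk rate are the slide's numbers, the ~400 GB-per-drive capacity is my assumption for 2004-era disks):

    # Rough check of the numbers above (single sequential disk reader).
    pages = 20e9                                 # 20+ billion web pages
    page_size = 20e3                             # ~20 KB per page
    total_bytes = pages * page_size              # ~4e14 bytes = 400+ TB

    disk_rate = 30e6                             # ~30 MB/s from one disk
    read_days = total_bytes / disk_rate / 86400  # ~154 days, i.e. months of reading
    disks_to_store = total_bytes / 400e9         # ~1000 drives at ~400 GB each (assumption)
    print(round(read_days), round(disks_to_store))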


Motivation

• Buy super fast, ultra reliable hardware?
– Ultra expensive
– Controlled by third party
– Internals can be hidden and proprietary
– Hard to predict scalability
– Fails less often, but still fails!
– No suitable solution on the market


Motivation

• Use commodity hardware? Benefits:

– Commodity machines offer much better perf/$
– Full control over and understanding of internals
– Can be highly optimized for their workloads
– Really smart people can do really smart things

• Not that easy:
– Fault tolerance: something breaks all the time
– Applications development
– Debugging, optimization, locality
– Communication and coordination
– Status reporting, monitoring

• These issues have to be handled anew for every problem you want to solve


Problem Statement

• Develop a scalable distributed file system for large data-intensive applications running on inexpensive commodity hardware

• Develop a tool for processing large data sets in parallel on inexpensive commodity hardware

• Develop both in coordination for optimal performance


Google Cluster Environment

Servers -> Racks -> Clusters -> Datacenters -> Cloud


Google Cluster Environment

• @2009:
– 200+ clusters
– 1000+ machines in many of them
– 4+ PB file systems
– 40 GB/s read/write load
– Frequent HW failures
– 100s to 1000s active jobs (1 to 1000 tasks)

• A cluster is 1000s of machines
– Stuff breaks: with 1000 machines expect ~1 failure per day, with 10,000 expect ~10 per day
– How to store data reliably with high throughput?
– How to make it easy to develop distributed applications?


Google Technology Stack


Google Technology Stack

Focus of this talk: GFS and MapReduce


Google File System*

* The Google File System. S. Ghemawat, H. Gobioff, S. Leung. SOSP, 2003


• Inexpensive commodity hardware
• High throughput favored over low latency
• Large files (multi-GB)
• Multiple clients
• Workload:
– Large streaming reads
– Small random writes
– Concurrent appends to the same file


Architecture

• User-level process running on commodity Linux machines
• Consists of Master Server and Chunk Servers
• Files broken into chunks (typically 64 MB), 3x redundancy (clusters, DCs)
• Data transfers happen directly between clients and Chunk Servers
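As an illustration of this split, here is a minimal sketch of the metadata the master might keep in memory (the names and structure are mine, not actual GFS code):

    from dataclasses import dataclass, field

    CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunks

    @dataclass
    class Chunk:
        handle: int                  # globally unique chunk handle
        replicas: list = field(default_factory=list)  # chunk server addresses (3x)

    class MasterMetadata:
        """Namespace plus file -> chunks -> replica locations, all held in RAM."""
        def __init__(self):
            self.files = {}          # path -> ordered list of Chunk

        def locate(self, path, offset):
            # The client asks the master only for this metadata, then reads or
            # writes the chunk data directly from/to the chunk servers listed.
            return self.files[path][offset // CHUNK_SIZE]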


Master Node

• Centralization for simplicity
• Namespace and metadata management
• Managing chunks:

– Where they are (file -> chunks -> replica locations)
– Where to put new ones
– When to re-replicate (failure, load balancing)
– When and what to delete (garbage collection)

• Fault tolerance:
– Shadow masters
– Monitoring infrastructure outside of GFS
– Periodic snapshots
– Mirrored operations log



Master Node

• Metadata is in memory – it’s fast!
– A 64 MB chunk needs less than 64 B of metadata => 640 TB of data needs less than 640 MB
• Asks Chunk Servers for their chunk locations when:
– the Master starts
– a Chunk Server joins the cluster

• Operation log:
– Used for serialization of concurrent operations
– Replicated
– Respond to the client only after the log is flushed locally and remotely
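The memory estimate above follows from simple arithmetic (a sketch; the <64 B-per-chunk bound is the figure quoted on the slide):

    chunk_size = 64 * 2**20          # 64 MB per chunk
    meta_per_chunk = 64              # < 64 B of master metadata per chunk
    stored = 640 * 2**40             # 640 TB of file data
    chunks = stored // chunk_size    # ~10.5 million chunks
    print(chunks * meta_per_chunk / 2**20)  # ~640 MB of metadata -> fits in RAM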



Chunk Servers

• 64 MB chunks stored as Linux files
– Reduces the size of the master's data structures
– Reduces client-master interaction
– Internal fragmentation => allocate space lazily
– Possible hotspots => re-replicate

• Fault tolerance:
– Heartbeat messages to the master
– Something wrong => the master initiates re-replication



Mutation Order

1. Client asks the Master which Chunk Server holds the lease for the chunk (the primary) and where the replicas are.
2. Master replies with the identity of the primary and the locations of the replicas (cached by the client).
3. Client pushes the data to all replicas (3a, 3b, 3c), in any order.
4. Client sends the write request to the primary.
5. Primary assigns a serial number (#) to the mutation, applies it locally, and forwards the write request to the secondaries.
6. Secondaries apply the mutation in the same order and report that the operation completed.
7. Primary replies to the client: operation completed, or an error report.
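A minimal client-side sketch of that flow, assuming hypothetical lookup_lease / push_data / send_write RPC helpers (these names are illustrative, not the real GFS client API):

    def gfs_write(master, path, chunk_index, data):
        # Steps 1-2: ask the master who holds the lease and where the replicas
        # are (the client caches this answer for later writes to the same chunk).
        primary, replicas = master.lookup_lease(path, chunk_index)

        # Step 3: push the data to every replica; it is only buffered there.
        for server in replicas:          # the primary is one of the replicas
            server.push_data(data)

        # Steps 4-7: the primary assigns a serial number, applies the mutation,
        # forwards the request to the secondaries, and reports success or error.
        return primary.send_write(path, chunk_index)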


Control & Data Flow

• Decouple control flow and data flow
• Control flow:
– Master -> Primary -> Secondaries

• Data flow:
– Carefully picked chain of Chunk Servers
• Forward to the closest first
• Distance estimated based on IP
– Fully utilize outbound bandwidth
– Pipelining to exploit full-duplex links
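A sketch of how such a chain could be ordered, with the IP-based distance heuristic abstracted into a caller-supplied distance function (illustrative, not the actual GFS code):

    def build_forwarding_chain(client, replicas, distance):
        """Greedy chain: each hop forwards to the nearest replica not yet visited."""
        chain, current, remaining = [], client, set(replicas)
        while remaining:
            nearest = min(remaining, key=lambda r: distance(current, r))
            chain.append(nearest)
            remaining.remove(nearest)
            current = nearest        # data is pipelined hop by hop along the chain
        return chain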



Other Important Things

• Snapshot operation – make a copy very fast
• Smart chunk creation policy:
– Prefer Chunk Servers with below-average disk utilization, limit the # of recent creations per server
• Smart re-replication policy:
– Under-replicated chunks first
– Chunks that are blocking a client
– Live files first (rather than deleted ones)

• Rebalance and GC periodically

How to process data stored in GFS?



MapReduce*

• A simple programming model applicable to many large-scale computing problems

• Divide and conquer strategy
• Hide messy details in the MapReduce runtime library:
– Automatic parallelization
– Load balancing
– Network and disk transfer optimizations
– Fault tolerance
– Part of the stack: improvements to the core library benefit all users of the library

*MapReduce: Simplified Data Processing on Large Clusters. J. Dean, S. Ghemawat. OSDI, 2004



Typical problem

• Read a lot of data and break it into parts
• Map: extract something important from each part
• Shuffle and sort
• Reduce: aggregate, summarize, filter, or transform the Map results
• Write the results
• Chain or cascade multiple MapReduce jobs

• Implement Map and Reduce to fit the problem
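Run on a single machine, the whole pattern is only a few lines; a toy, in-memory sketch of the map -> shuffle/sort -> reduce pipeline (it illustrates the model, not the distributed runtime):

    from collections import defaultdict

    def run_mapreduce(inputs, mapper, reducer):
        # Map: turn every input record into intermediate (key, value) pairs.
        intermediate = defaultdict(list)
        for record in inputs:
            for k, v in mapper(record):
                intermediate[k].append(v)
        # Shuffle & sort: group values by key; Reduce: aggregate each group.
        return [(k, reducer(k, intermediate[k])) for k in sorted(intermediate)]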



Nice Example

[example figure]


Other suitable examples

• Distributed grep
• Distributed sort (Hadoop MapReduce won TeraSort)
• Term-vector per host
• Document clustering
• Machine learning
• Web access log stats
• Web link-graph reversal
• Inverted index construction
• Statistical machine translation



Model

• Programmer specifies two primary methods:
– Map(k, v) -> <k’, v’>*
– Reduce(k’, <v’>*) -> <k’, v’>*

• All v’ with the same k’ are reduced together, in order
• Usually also specify how to partition k’:
– Partition(k’, total partitions) -> partition for k’
• Often a simple hash of the key
• Allows reduce operations for different k’ to be parallelized
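The partition function is typically nothing more than a hash of the key; a minimal sketch:

    def default_partition(key, num_reducers):
        # All values for a given key end up at the same reducer, while different
        # keys are spread across reducers and processed in parallel.
        return hash(key) % num_reducers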



Code

Map:
map(String key, String value):
  // key: document name
  // value: document contents
  for each word w in value:
    EmitIntermediate(w, "1");

Reduce:
reduce(String key, Iterator values):
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values:
    result += ParseInt(v);
  Emit(AsString(result));
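For comparison, the same word count expressed against the toy run_mapreduce sketch from the "Typical problem" slide (illustrative only):

    def wc_map(doc):                     # doc = (name, contents)
        for word in doc[1].split():
            yield word, 1

    def wc_reduce(word, counts):
        return sum(counts)

    docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog the")]
    print(run_mapreduce(docs, wc_map, wc_reduce))
    # [('brown', 1), ('dog', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 3)]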


Architecture

• One master, many workers
• Infrastructure manages scheduling and distribution
• User implements Map and Reduce


[architecture figures]

• Combiner = a local Reduce applied to Map output on the same worker


Important Things

• Mappers are scheduled close to the data
• Chunk replication improves locality
• Reducers often run on the same machines as mappers
• Fault tolerance:
– Map worker crash – re-run all of that machine's map tasks
– Reduce worker crash – re-run only the crashed task
– Master crash – re-run the whole job
– Skip bad records
• Fighting ‘stragglers’ by launching backup tasks
• Proven scalability:
– In September 2009 Google ran 3,467,000 MR jobs, averaging 488 machines per job
• Extensively used at Yahoo and Facebook via Hadoop
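A sketch of the backup-task idea mentioned above: near the end of a phase, duplicate the remaining in-progress tasks on idle workers and keep whichever copy finishes first (illustrative scheduler logic with hypothetical task/worker objects, not Google's implementation):

    def maybe_launch_backups(in_progress_tasks, idle_workers, phase_nearly_done):
        """Stop a single slow machine ('straggler') from stalling the whole job."""
        if not phase_nearly_done:
            return
        for task in in_progress_tasks:
            if not idle_workers:
                break
            if not task.has_backup:
                idle_workers.pop().run(task)   # first copy to finish wins;
                task.has_backup = True         # the other result is discarded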



Benchmarks: GFS

[benchmark figures]


Benchmarks: MapReduce

[benchmark figures]


Conclusions

• GFS & MapReduce:
– Google achieved their goals
– A fundamental part of their stack

• Open-source implementations:
– GFS -> Hadoop Distributed File System (HDFS)
– MapReduce -> Hadoop MapReduce

“We believe we get tremendous competitive advantage by essentially building our own infrastructure” -- Eric Schmidt


Thank you for your attention! [email protected]