49
Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Embed Size (px)

Citation preview

Page 1: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Java @ Google

JavaZone 2005

Knut Magne RisvikGoogle Inc.September 14, 2005

Page 2: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Presentation Outline

• Background: Google’s mission and computing platform.

• GFS and MapReduce: Ebony and Ivory of our Infrastructure

• Java for computing: Coupling infrastructure and Java

• Java in Google products: apps and middle-tiers

• The Java expertise at Google: We host Java leadership.

• Giving it back: Google contributions to Java

• Closing notes and Q&A: Swags for good questions.

Page 3: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Google’s Mission

To organize the world’s information

and make it universally

accessible and useful

Page 4: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Explosive Computational Requirements

more queries

better results

more data

Every Google service sees continuing growth in computational needs

• More queries: More users, happier users

• More data: Bigger web, mailbox, blog, etc.

• Better results: Find the right information, and find it faster

Page 5: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

A Simple Challenge For Our Computing Platform

1. Create world’s largest computing infrastructure

2. Make sure we can afford it

Need to drive efficiency of the computing infrastructure to unprecedented levels

Page 6: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Many Interesting Challenges

• Server design and architecture

• Power efficiency

• System software

• Large scale networking

• Performance tuning and optimization

• System management and repairs automation

Page 7: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Design Philosophy

Single-machine performance does not matter• The problems we are interested in are too large for any single system

• Can partition large problems, so throughput beats peak performance

Stuff Breaks• If you have one server, it may stay up three years (1,000 days)

• If you have 1,000 servers, expect to lose one a day

“Ultra-reliable” hardware makes programmers lazy• A reliable platform will still fail – software still needs to be fault-tolerant

• Fault-tolerant software beats fault-tolerant hardware

Page 8: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Why Use Commodity PCs?

• Single high-end 8-way Intel server:– IBM eserver xSeries 440– 8 2-GHz Xeon, 64 GB RAM, 8 TB of disk– $758,000

• Commodity machines: – Rack of 88 machines– 176 2-GHz Xeons, 176 GB RAM, ~7 TB of disk– $278,000

• 1/3X price, 22X CPU, 3X RAM, 1X disk

Sources: racksaver.com, TPC-C performance results, both from late 2002

Page 9: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

When Ultra-reliable Machines Won’t Help…

Page 10: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Take-home lesson: Murphy was right

Page 11: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

google.stanford.edu (circa 1997)

Page 12: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Lego Disk Case

Page 13: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

google.com (1999)

Page 14: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Google Data Center (circa 2000)

Page 15: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

google.com (new data center 2001)

Page 16: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

google.com (3 days later)

Page 17: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

When Servers Sleep… (2004)

Page 18: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Google Web Server

Spell checker

Ad Server

I0 I1 I2 IN

I0 I1 I2 IN

I0 I1 I2 IN

Replic

as …

Index shards

D0 D1 DM

D0 D1 DM

D0 D1 DM

Replic

as …

…Doc shards

query

Misc. servers

Index servers Doc servers

Elapsed time: 0.25s, machines involved: 1000+

Google Query Serving Infrastructure

Page 19: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Reliable Building Blocks

• Need to store data reliably

• Need to run jobs on pools of machines

• Need to make it easy to apply lots of computational resources to problems

In-house solutions:

• Storage: Google File System (GFS)

• Job scheduling: Global Work Queue (GWQ)

• MapReduce: simplified large-scale data processing

Page 20: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Client

Client

Misc. servers

Client

Replic

as

Masters

GFS Master

GFS Master

C0 C1

C2C5

Chunkserver 1

C0

C2

C5

Chunkserver N

C1

C3C5

Chunkserver 2

• Master manages metadata• Data transfers happen directly between clients/chunkservers• Files broken into chunks (typically 64 MB)• Chunks triplicated across three machines for safety

Google File System - GFS

Page 21: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

GoogleFile API access to GFS

• GoogleFile. Public API with two roles:

– Creational class. Static methods to obtain InputStream, OutputStream and GoogleChannel on top of a Google file.

– File manipulation. A subset of the methods provided by the java.io.File class.

• GoogleInputStream. Implements the read method.

• GoogleOutputStream. Extends java.io.OutputStream, write method.

• GoogleChannel. This is a public class. It implements the ByteChannel interface and a subset of the methods in the FileChannel class. This class provides random access.

• GoogleFile.Stats.

• The JNI Layer is implemented by the class FileImpl and set of SWIG JNI wrappers generated during the build process.

Page 22: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

GFS Usage at Google

• 30+ Clusters

• Clusters as large as 2000+ chunkservers

• Petabyte-sized filesystems

• 2000+ MB/s sustained read/write load

• All in the presence of HW failures

• More information can be found:

The Google File System Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung 19th ACM Symposium on Operating Systems Principles

(http://labs.google.com/papers/gfs.html)

Page 23: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

MapReduce: Large Scale Data Processing

• Many tasks: Process lots of data to produce other data

• Want to use hundreds or thousands of CPUs, and it has to be easy

• MapReduce provides, for programs following a particular programming model:

– Automatic parallelization and distribution

– Fault-tolerance

– I/O scheduling

– Status and monitoring

Page 24: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Example: Word Frequencies in Web Pages

A typical exercise for a new engineer in his or her first week

• Have files with one document per record

• Specify a map function that takes a key/value pairkey = document namevalue = document text

• Output of map function is (potentially many) key/value pairs.In our case, output (word, “1”) once per word in the document

“document1”, “to be or not to be”

“to”, “1”“be”, “1”“or”, “1”…

Page 25: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Example continued: word frequencies in web pages

• MapReduce library gathers together all pairs with the same key

• The reduce function combines the values for a keyIn our case, compute the sum

• Output of reduce (usually 0 or 1 value) paired with key and saved

“be”, “2”“not”, “1”“or”, “1”“to”, “2”

key = “or”values = “1”

“1”

key = “be”values = “1”, “1”

“2”

key = “to”values = “1”, “1”

“2”

key = “not”values = “1”

“1”

Page 26: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Example: Pseudo-code

map(String input_key, String input_value): // input_key: document name // input_value: document contents for each word w in input_values: EmitIntermediate(w, "1");

Reduce(String key, Iterator intermediate_values): // key: a word, same for input and output // intermediate_values: a list of counts int result = 0; for each v in intermediate_values: result += ParseInt(v); Emit(AsString(result));

Total 80 lines of code

Page 27: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Typical Google Cluster

• 100s/1000s of 2-CPU x86 machines, 2-4 GB of memory

• Limited bisection bandwidth

• Storage: local IDE disks and Google File System (GFS)

• GFS running on the same machines provides reliable, replicated storage of input and output data

• Job scheduling system: jobs made up of tasks, scheduler assigns tasks to machines

Page 28: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Execution

GFS: Google File System

GFS: Google File System

Map task 1

map

k1:v k2:v

Map task 2

map

k1:v k3:v

Map task 3

map

k1:v k4:v

reduce

k1:v,v,v k3,v

Reduce task 1

reduce

k2:v k4,v

Reduce task 2

Shuffle and Sort

Page 29: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Optimizations

• Shuffle stage is pipelined with mapping

• Many more tasks than machines, for load balancing

• Locality: map tasks scheduled near the data they read

• Backup copies of map & reduce tasks (avoids stragglers)

• Compress intermediate data

• Re-execute tasks on machine failure

Page 30: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

MapReduce status: MR_Indexer-beta6-large-2003_10_28_00_03

Page 31: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

MapReduce status: MR_Indexer-beta6-large-2003_10_28_00_03

Page 32: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

MapReduce status: MR_Indexer-beta6-large-2003_10_28_00_03

Page 33: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

MapReduce status: MR_Indexer-beta6-large-2003_10_28_00_03

Page 34: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

MapReduce status: MR_Indexer-beta6-large-2003_10_28_00_03

Page 35: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

MapReduce status: MR_Indexer-beta6-large-2003_10_28_00_03

Page 36: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

MapReduce status: MR_Indexer-beta6-large-2003_10_28_00_03

Page 37: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

MapReduce status: MR_Indexer-beta6-large-2003_10_28_00_03

Page 38: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

MapReduce status: MR_Indexer-beta6-large-2003_10_28_00_03

Page 39: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

MapReduce status: MR_Indexer-beta6-large-2003_10_28_00_03

Page 40: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

MapReduce status: MR_Indexer-beta6-large-2003_10_28_00_03

Page 41: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Using 1800 machines:

• MR_Grep scanned 1 terabyte in 100 seconds

• MR_Sort sorted 1 terabyte of 100 byte records in 14 minutes

Rewrote Google's production indexing system

• a sequence of 7, 10, 14, 17, 21, 24 MapReductions

• simpler

• more robust

• faster

• more scalable

Results

Page 42: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Usage in March 2005

Number of jobs 72,229

Average completion time 934 secs

Machine days used 358,528 days ≈ 1 millennium

Input data read 12,571 TB

Intermediate data 2,756 TB

Output data written 941 TB

Average worker machines 232

Average worker deaths per job 1.9

Average map tasks per job 3097

Average reduce tasks per job 144

Unique map implementations 309

Unique reduce implementations 235

Unique map/reduce combinations 411

Page 43: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Widely applicable at Google

– Implemented as a C++ library linked to user programs

– Java JNI interface similar to GFS API.

– Can read and write many different data types

Example uses:

web access log statsweb link-graph reversalinverted index constructionstatistical machine translation…

distributed grepdistributed sortterm-vector per hostdocument clusteringmachine learning...

Page 44: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Conclusion

• MapReduce has proven to be a useful abstraction

• Greatly simplifies large-scale computations at Google

• Fun to use: focus on problem, let library deal with messy details

MapReduce: Simplified Data Processing on Large Clusters Jeffrey Dean and Sanjay GhemawatOSDI'04: Sixth Symposium on Operating System Design and Implementation

(Search Google for “MapReduce”)

Page 45: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Java in Google Applications

AdWords FE - Millions of ads - Billions of transactions - Extreme rates

GMail middletier - UI and storage brokerage - Content searching, analysis - Tagging

Page 46: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Java expertise @ Google

• Joshua Bloch - Collections Framework, Java 5.0 language enhancements, java.math, Author of "Effective Java," Coauthor of "Java Puzzlers."

• Neal Gafter - Lead developer of javac, implementor of Java 5.0 language enhancements, "Coauthor of "Java Puzzlers."

• Robert Griesmer - Architect and technical lead of the HotSpot JVM.

• Doug Kramer - Javadoc architect, Java platform documentation lead.

• Tim Lindholm - Original member of the Java project, key contributor to the Java programming language, implementor of the classic JVM, coauthor of "The Java Viutual Machine Specification."

• Michael "madbot" McCloskey - Designer and implementer of java.util.regexp.

• Srdjan Mitrovic - Co-implementor of the HotSpot JVM.

• David Stoutamire - Technical lead for Java performance, designer and implementer of parallel garbage collection.

• Frank Yellin - Original member of the Java project, Co-implementor of classic JVM, KVM and CLDC,  Coauthor of "The Java Viutual Machine specification."

Page 47: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Giving it back – JCP Expert Groups

• Executive Committe for J2SE/J2EE

• JSR 166X: Concurrency Utilities (continuing) http://www.jcp.org/en/jsr/detail?id=166

• JSR 199: Java Compiler API http://www.jcp.org/en/jsr/detail?id=199

• JSR 220: Enterprise JavaBeans 3. http://www.jcp.org/en/jsr/detail?id=220

• JSR 250: Common Annotations for the Java Platform http://www.jcp.org/en/jsr/detail?id=250

• JSR 260: Javadoc Tag Technology Update http://www.jcp.org/en/jsr/detail?id=260

• JSR 269: Pluggable Annotation Processing API http://www.jcp.org/en/jsr/detail?id=269

• JSR 270: J2SE 6.0 ("Mustang") Release Contents http://www.jcp.org/en/jsr/detail?id=270Google representative: gafter

• JSR 273: Design-Time API for JavaBeans JBDT http://www.jcp.org/en/jsr/detail?id=273

• JSR 274: The BeanShell Scripting Language http://www.jcp.org/en/jsr/detail?id=274

• JSR 277: Java Module System http://www.jcp.org/en/jsr/detail?id=277

Page 48: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Closing Notes

• Google = Computing infrastructure

• Java is becoming a first class citizen at Google

• Essential native interfaces being built

• API design extremely important at our scale, the Java expertise is driving general API work

• Google brings high-scale industrial experience into JCP expert groups.

Page 49: Java @ Google JavaZone 2005 Knut Magne Risvik Google Inc. September 14, 2005

Knut Magne RisvikGoogle Inc.September 14, 2005

Q&A