Google Architecture - Breaking it Open

Page 1: Google Architecture - Breaking it Open

Open Talk Series presents

A series of illuminating talks and interactions that open our minds to new ideas and concepts; that make us look for newer or better ways of doing what we did; or point us to exciting things we have never done before. A range of topics on Technology, Business, Fun and Life.

Be part of the learning experience at Aditi.

Join the talks. It's free. Free as in freedom at work, not free beer.

It's not training. It's a mind-opener.

Speak at these events. Or bring an expert/friend to talk.

Mail [email protected] with topic and availability.

Usually at 4.30 PM on Wednesdays.

Learning and Development

Page 2: Google Architecture - Breaking it Open

HOW TO ENJOY A TALK

Switch OFF mobile

Switch ON mind

Sign attendance sheet

Bring coffee & friends

THANK the Talker

SHARE your wisdom

QUESTION notions

SPREAD the good word

Page 3: Google Architecture - Breaking it Open

Aditi Technologies | Partnering Innovation

New Champion

Sahil Sagar

Page 4: Google Architecture - Breaking it Open

Agenda

• We are not talking about the crawler

• No discussion on PageRank… maybe?

Page 5: Google Architecture - Breaking it Open

The art of scale

10-50 users 100-500 users 500-10000

Page 6: Google Architecture - Breaking it Open

Scale ????

Largest Linux Base

800,000 Machines

Page 7: Google Architecture - Breaking it Open

• What gives us this scale?

Good Code?

More servers?

Powerful Servers?

Page 8: Google Architecture - Breaking it Open

• Let's see what gives Google the scale

[Diagram: the Google architecture stack, shown in three bands: the infrastructure, the Secret Sauce, and the apps on top of it. Labels in the diagram:]

• DC, rack, server hardware
• Exterior network
• RHEL 2.6.X PAE
• Interior network (IPv6)
• GFS / GFS II
• BigTable, Mapreduce, Chubby Lock
• GWQ
• Python, Java, C++, Sawzall, other
• Google App Engine
• Google Apps: Search, Index, Crawl, Gmail...

Page 9: Google Architecture - Breaking it Open

Scale in Google

1. The first touch

2. Size does matter

3. The Safe

4. Operating System Implementation

5. Interior Network Architecture

Page 10: Google Architecture - Breaking it Open

The first touch to the services

Page 11: Google Architecture - Breaking it Open

The first touch to the service

[Diagram: the path of a request from the client into a Google cell. Labels in the diagram:]

• Client browser (ports 80/443)
• Perimeter firewall (DMZ)
• GWS (web server farm)
• Squid (reverse proxy)
• NetScaler (HTTP multiplexing)
• Firewall (ports 80/443)
• Cell interior network (GFS II etc.)

Page 12: Google Architecture - Breaking it Open

The touch is not always real

• Uses Squid Reverse Proxy

• Perimeter cache hit rates of 30-60% = huge!

• Hit rate depends on search complexity, user preferences and traffic type

• All image thumbnails are cached, as is much multimedia

• Expensive common queries are cached (common words like 'Obama'), as they require significant back-end processing.
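
To make the hit-rate idea concrete, here is a minimal sketch of a cache sitting in front of a slow backend. It is not how Squid is implemented; `PerimeterCache`, `slow_search`, the LRU eviction policy and the tiny capacity are all assumptions made for illustration.

```python
from collections import OrderedDict


class PerimeterCache:
    """Toy LRU cache standing in for a reverse proxy (hypothetical, not Squid)."""

    def __init__(self, backend, capacity):
        self.backend = backend        # function called only on a cache miss
        self.capacity = capacity      # maximum number of cached queries
        self.entries = OrderedDict()  # query -> result, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, query):
        if query in self.entries:
            self.entries.move_to_end(query)   # mark as most recently used
            self.hits += 1
            return self.entries[query]
        self.misses += 1
        result = self.backend(query)          # the expensive back-end call
        self.entries[query] = result
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return result

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def slow_search(query):
    # Stand-in for the back-end search; assumed to be expensive.
    return f"results for {query!r}"


cache = PerimeterCache(slow_search, capacity=2)
for q in ["obama", "weather", "obama", "obama", "rare query", "obama"]:
    cache.get(q)
print(f"hit rate: {cache.hit_rate():.0%}")
```

In this toy trace the repeated common query is served from the cache three times out of six requests, so half of the requests never reach the backend.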

Page 13: Google Architecture - Breaking it Open

Size does matter

Page 14: Google Architecture - Breaking it Open

Worldwide Data Centres

The last estimate was 36 data centres, 300+ GFS II clusters and upwards of 800K machines.

Page 15: Google Architecture - Breaking it Open

The Modular Data Centre

A standard Google Modular DC (Cell) holds 1160 servers in 30 racks (40U) and draws 250 kW of power.

This is the "atomic" data centre building block of Google.

A data centre would consist of hundreds of modular cells.
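
The cell figures above can be turned into a few rough per-unit numbers. This is back-of-the-envelope arithmetic only: it assumes servers are spread evenly over the racks, that the whole 250 kW goes to the servers (ignoring cooling and networking), and it reuses the 800K machine estimate quoted earlier in the talk.

```python
import math

# Figures quoted on the slides.
SERVERS_PER_CELL = 1160
RACKS_PER_CELL = 30
CELL_POWER_KW = 250
TOTAL_MACHINES = 800_000   # the estimate quoted earlier in the talk

# Derived values (assumptions: even spread, all power goes to servers).
servers_per_rack = SERVERS_PER_CELL / RACKS_PER_CELL
watts_per_server = CELL_POWER_KW * 1000 / SERVERS_PER_CELL
cells_needed = math.ceil(TOTAL_MACHINES / SERVERS_PER_CELL)

print(f"servers per rack: {servers_per_rack:.1f}")
print(f"watts per server: {watts_per_server:.0f}")
print(f"cells for the whole fleet: {cells_needed}")
```

Under those assumptions a rack holds just under 39 servers, each server has a power budget of a little over 200 W, and the whole fleet corresponds to roughly 690 cells.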

Page 16: Google Architecture - Breaking it Open

THE Safe

How is a server stored in the Data Centre?

Page 17: Google Architecture - Breaking it Open

Google Rack (GOOG rack)

EVERYTHING custom!

• Optimized Motherboards

• Have their own HW builds

• Build redundancy on top of failure

• Motherboard directly mounted into Rack

• Servers have no casing - just bare boards

• Assist with heat dispersal issues

Page 18: Google Architecture - Breaking it Open

THE OPERATING SYSTEM

The Core Software on each of those servers

Page 19: Google Architecture - Breaking it Open

OPERATING SYSTEM

- 100% Red Hat Linux based since inception in 1998 (RHEL)
- 2.6.X kernel
- PAE
- Custom glibc, rpc, ipvs...
- Custom FS (GFS II)
- Custom Kerberos
- Custom NFS
- Custom CUPS
- Custom gPXE bootloader
- Custom EVERYTHING...

Kernel/Subsystem Modifications

- tcmalloc: replaces glibc 2.3 malloc; much faster, and works very well with threads
- rpc: the RPC layer has been extensively modified to provide higher performance and lower latency (52%/40%)
- Significantly modified kernel and subsystems, all IPv6 enabled

Page 20: Google Architecture - Breaking it Open

THE Secret Sauce

Page 21: Google Architecture - Breaking it Open

Section II – Google's Major Glue

1. Google File System Architecture – GFS II

2. Google Database - Bigtable

3. Google Computation - Mapreduce

Page 22: Google Architecture - Breaking it Open

GOOGLE FILE SYSTEM

Manages the underlying Data on behalf of the upper layers and ultimately the applications

Page 23: Google Architecture - Breaking it Open

GFS versus NFS

Network File System (NFS)

• Single machine makes part of its file system available to other machines

• Sequential or random access

• PRO: Simplicity, generality, transparency

• CON: Storage capacity and throughput limited by single server

Google File System (GFS)

• Single virtual file system spread over many machines

• Optimized for sequential read and local accesses

• PRO: High throughput, high capacity

• "CON": Specialized for particular types of applications

University of Pennsylvania

Page 24: Google Architecture - Breaking it Open

FILE SYSTEM I – GFS II

• Elegant master failover

• Chunk size is now 1MB

• Only ever lost one 64MB chunk (in GFS I) during its entire production deployment, so it is assumed to be extremely reliable
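
To see what the change in chunk size means in practice, the sketch below counts how many chunks a file splits into, and which chunk a given byte offset falls in, for the two chunk sizes mentioned on the slide. The helper names `chunk_count` and `chunk_index` are made up for this example and are not part of any GFS API.

```python
MB = 1024 * 1024


def chunk_count(file_size_bytes, chunk_size_bytes):
    """Number of fixed-size chunks needed to hold a file (last one may be partial)."""
    return (file_size_bytes + chunk_size_bytes - 1) // chunk_size_bytes


def chunk_index(offset_bytes, chunk_size_bytes):
    """Index of the chunk that contains a given byte offset."""
    return offset_bytes // chunk_size_bytes


one_gb = 1024 * MB
for size_mb in (64, 1):
    size = size_mb * MB
    print(f"{size_mb:>2} MB chunks: a 1 GB file has {chunk_count(one_gb, size)} chunks; "
          f"offset 200 MB is in chunk {chunk_index(200 * MB, size)}")
```

A smaller chunk size means many more chunks per file, so there is more to track per file, but each chunk is a smaller unit to read, replicate or recover.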

Page 25: Google Architecture - Breaking it Open

CAP Theorem (Brewer's theorem)

• Consistency: All nodes see the same data at the same time

• Availability: Node failures do not prevent survivors from continuing to operate

• Partition tolerance: The system continues to operate despite arbitrary message loss

• The theorem: a distributed system can guarantee at most two of these three properties at the same time

Page 26: Google Architecture - Breaking it Open

GOOGLE DATABASE

Accesses the underlying Data on behalf of the upper layers and ultimately the applications

Page 27: Google Architecture - Breaking it Open

Why not commercial DB?

• Scale is too large for most commercial databases

• Cost would be very high

– Building internally means the system can be applied across many projects for low incremental cost

• Low-level storage optimizations help performance significantly

– Much harder to do when running on top of a database layer

"Also fun and challenging to build large-scale systems"

Page 28: Google Architecture - Breaking it Open

BigTable

• A distributed storage system for managing structured data.

• Scalable

– Thousands of servers

– Terabytes of in-memory data

– Petabytes of disk-based data

– Millions of reads/writes per second, efficient scans

• Self-managing

– Servers can be added/removed dynamically

– Servers adjust to load imbalance

• Used for many Google projects

– Web indexing, Personalized Search, Google Earth, Google Analytics, Google Finance, …

Page 29: Google Architecture - Breaking it Open

BigTable

• Physically sorted on row-key – like a row-store

• Column families - like column-stores

• Variable (record-by-record) columns within a column family

• Column-values versioned; stored in reverse chronological order

• Designed to store hyperlink structure of web
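
The data model in the bullets above can be sketched as a small in-memory structure. This is a toy illustration only, not Bigtable's API: the class `TinyTable` and its methods are invented here, and the row keys and column names are sample values chosen to resemble web-page data.

```python
import bisect


class TinyTable:
    """Toy in-memory model: sorted row keys, 'family:qualifier' columns, versioned cells."""

    def __init__(self):
        self.row_keys = []  # kept sorted, like the physical row order
        self.rows = {}      # row key -> {column: [(timestamp, value), ...]}

    def put(self, row_key, column, timestamp, value):
        if row_key not in self.rows:
            bisect.insort(self.row_keys, row_key)
            self.rows[row_key] = {}
        versions = self.rows[row_key].setdefault(column, [])
        versions.append((timestamp, value))
        # Keep versions in reverse chronological order (newest first).
        versions.sort(key=lambda tv: tv[0], reverse=True)

    def get(self, row_key, column):
        """Return the newest value of a cell, or None if it does not exist."""
        versions = self.rows.get(row_key, {}).get(column, [])
        return versions[0][1] if versions else None

    def scan(self, start_key, end_key):
        """Row keys in [start_key, end_key), cheap because rows are sorted."""
        lo = bisect.bisect_left(self.row_keys, start_key)
        hi = bisect.bisect_left(self.row_keys, end_key)
        return self.row_keys[lo:hi]


table = TinyTable()
table.put("com.cnn.www", "contents:", 1, "<html>v1</html>")
table.put("com.cnn.www", "contents:", 2, "<html>v2</html>")
table.put("com.cnn.www", "anchor:cnnsi.com", 1, "CNN")  # only this row has an anchor column
table.put("com.bbc.www", "contents:", 1, "<html>bbc</html>")
table.put("org.example", "contents:", 1, "<html>example</html>")

print(table.get("com.cnn.www", "contents:"))  # newest version of the page
print(table.scan("com.", "com/"))             # all rows whose key starts with "com."
```

Because rows are sorted by key, pages from the same domain sit next to each other, which is what makes range scans efficient; each row may carry a different set of columns within a family.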

Page 30: Google Architecture - Breaking it Open

GOOGLE MAPREDUCE

Computes the underlying Data on behalf of the applications

Page 31: Google Architecture - Breaking it Open

Mapreduce I

Map Reduction can be seen as a way to exploit massive parallelism by breaking a task down into constituent parts and executing them on multiple processors.

The major functions are MAP and REDUCE (with a number of intermediary steps):

• MAP: break the task down into parallel steps

• REDUCE: combine the results into the final output

Shown is a 2-pipeline Map Reduction (there are 24 Map Reductions in the indexing pipeline).

Mappers and reducers usually run on separate processors (with a 90% loss of reducers, the job still completed!).

Page 32: Google Architecture - Breaking it Open

Word-Count using MapReduce

Problem: determine the frequency of each word in a large document collection
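
A minimal single-process sketch of the word-count problem in the MapReduce style follows. It only imitates the shape of the computation (map, group by key, reduce); a real MapReduce run spreads these phases over many machines. The function names and the three sample documents are assumptions for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """MAP: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Intermediate step: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """REDUCE: combine all the counts for one word into a total."""
    return word, sum(counts)


def word_count(documents):
    pairs = (pair for doc in documents for pair in map_phase(doc))
    groups = shuffle(pairs)
    return dict(reduce_phase(word, counts) for word, counts in groups.items())


docs = ["the quick brown fox", "the lazy dog", "The fox"]
counts = word_count(docs)
print(counts)
```

Each call to `map_phase` depends only on its own document, and each call to `reduce_phase` only on its own word, which is what allows both phases to run in parallel.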

Page 33: Google Architecture - Breaking it Open

What runs on top of all this

Page 34: Google Architecture - Breaking it Open

PageRank: Intuition

• Imagine a contest for The Web's Best Page

– Initially, each page has one vote

– Each page votes for all the pages it has a link to

– To ensure fairness, pages voting for more than one page must split their vote equally between them

– Voting proceeds in rounds; in each round, each page has the number of votes it received in the previous round

– In practice, it's a little more complicated - but not much!

[Diagram: a link graph over pages A to J]

Shouldn't E's vote be worth more than F's?

How many levels should we consider?
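
The voting rounds described on this slide can be sketched in a few lines. The three-page graph below is a made-up example, not the A to J graph from the slide (whose links are not recoverable here), and the sketch is the simplified contest only, without the refinements that real PageRank adds.

```python
def vote_round(links, votes):
    """One round: every page splits its current votes equally among the pages it links to."""
    new_votes = {page: 0.0 for page in links}
    for page, targets in links.items():
        if not targets:
            continue  # a page with no links casts no votes in this simple version
        share = votes[page] / len(targets)
        for target in targets:
            new_votes[target] += share
    return new_votes


# Hypothetical link graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
votes = {page: 1.0 for page in links}  # initially, each page has one vote

for _ in range(50):
    votes = vote_round(links, votes)

for page in sorted(votes):
    print(page, round(votes[page], 3))
```

After enough rounds the vote counts settle: C collects votes from both A and B, and passes all of them on to A, so A and C end up ahead of B. This is how a vote from a well-voted page comes to be worth more.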

Page 35: Google Architecture - Breaking it Open

Random Surfer Model

• PageRank has an intuitive basis in random walks on graphs

• Imagine a random surfer, who starts on a random page and, in each step,

– with probability d, clicks on a random link on the page

– with probability 1-d, jumps to a random page (bored?)

• The PageRank of a page can be interpreted as the fraction of steps the surfer spends on the corresponding page
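
The random surfer can be simulated directly. The sketch below is a simple Monte Carlo walk, not the way PageRank is computed at scale; the three-page graph is hypothetical, and d = 0.85 is an assumed value (the slide does not give one).

```python
import random


def random_surfer(links, d=0.85, steps=100_000, seed=0):
    """Estimate PageRank as the fraction of steps a random surfer spends on each page."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    pages = sorted(links)
    visits = {page: 0 for page in pages}
    current = rng.choice(pages)  # start on a random page
    for _ in range(steps):
        visits[current] += 1
        if links[current] and rng.random() < d:
            current = rng.choice(links[current])  # click a random link on the page
        else:
            current = rng.choice(pages)           # bored: jump to a random page
    return {page: visits[page] / steps for page in pages}


# Hypothetical link graph: A links to B and C, B links to C, C links to A.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
ranks = random_surfer(links)
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

In this graph B is reached only through half of A's clicks, so the surfer spends the least time there, while A and C receive most of the visits.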

Page 36: Google Architecture - Breaking it Open

BUILD YOUR OWN GOOGLE

The Basic Open Source Tools

Page 37: Google Architecture - Breaking it Open

The Google Stack (vs Yahoo'ish/Open Source)

Conceptual Overview: Google Architecture vs. Open Source (Yahoo'ish) Architecture

[Diagram: the two stacks side by side. Google's Secret Sauce on one side corresponds to Hadoop on the other.]

• Common to both: DC, exterior network, rack, server hardware, interior network (IPv6)

• Operating system: RHEL 2.6.X PAE (Google) vs CentOS 2.6.X PAE (open source)

• File system: GFS / GFS II vs HDFS (Hadoop)

• Database: BigTable vs HBase (BigTable equivalent)

• Computation: Mapreduce vs Hadoop Framework Mapreduce

• GWQ vs Job Tracker

• Chubby Lock (Google side only)

• Languages: Python, Java, C++, Sawzall, other vs Pig Latin, Python, PHP, Java... anything

• Applications: Google Apps (Search, Index, Crawl, Gmail...) vs client application (other tools such as crawlers and indexers are readily available as open source)

• App Engine: BigTable; Python, Java, C++; Task Queue

Page 38: Google Architecture - Breaking it Open

END

(Thank you)

Page 39: Google Architecture - Breaking it Open

Pre Presentation

The Google Philosophy (according to Ed)

• Jedis build their own lightsabres (the MS Eat your own Dog Food)

• Parallelize Everything

• Distribute Everything (to atomic level if possible)

• Compress Everything (CPU cheaper than bandwidth)

• Secure Everything (you can never be too paranoid)

• Cache (almost) Everything

• Redundantize Everything (in triplicate usually)

• Latency is VERY evil

Page 40: Google Architecture - Breaking it Open

The Anatomy of the Google Architecture "The unofficial Version"

V1.0 November 2009

• Ed Austin

• {ed, edik} @i-dot.com

Special Thanks to ….

Page 41: Google Architecture - Breaking it Open
Page 42: Google Architecture - Breaking it Open

Keep Learning

For suggestions on topics, feedback, etc., contact [email protected]