My mom told me that Git doesn’t scale by Vicent Martí

DESCRIPTION

With over two and a half million repositories, GitHub is the world’s largest source code host. Since day one, we’ve faced a unique engineering problem: making terabytes of Git data always available, either directly or through our website. This talk offers a hopefully insightful view into the internals of Git, the way its original design affects our scalable architecture, and the many things we’ve learnt while solving this fascinating problem.

Saturday, May 11, 13

These are the things you don’t care about

github

Git hosting: No longer a pain in the ass for you. Not for us.

Because, goddamnit, if I ever find the guy who invented this thing I’m going to hang him from a fence by his underwear and.

Let’s host some Git repos!

file.c src

file.h README.md COPYING.md

.git

Bare Repository

HEAD index objects refs

git-daemon
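
The layout on this slide can be sketched as a quick directory check. This is a minimal illustration in Python, not anything shown in the talk: the function name `looks_like_git_dir` is hypothetical, and the check skips `index` on the assumption that a freshly initialised repository may not have one yet.

```python
from pathlib import Path


def looks_like_git_dir(path: Path) -> bool:
    """Return True if `path` holds the HEAD file plus objects/ and refs/.

    Assumption of this sketch: `index` is not required, because it only
    exists after something has been staged.
    """
    return (
        (path / "HEAD").is_file()
        and (path / "objects").is_dir()
        and (path / "refs").is_dir()
    )
```

A hosting service could run a check like this before handing a path to a Git process, although real validation would need to be stricter.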

OK, now about the web...

grit (Ruby-Git interface)

Bare Repository

Bare Repository

Bare Repository

1 VM: grit, rails app, storage

n VMs, storage

n VMs, storage (GFS)

Rails was making us slow.

Literally.

Time to move to Real Hardware

fileservers

frontends

db

?????????

smoke

bert (binary Erlang term)

ernie (not an acronym)

chimney (Redis)

frontend

fileserver

smoke

grit

ernie

grit

Vertical Scaling #realtalk

bottleneck: grit

solution: shell out to git

bottleneck: shell out to git

solution: shell out to git properly
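
The slides do not say what "properly" meant in practice, so the following is only a sketch of one common reading: pass an argument vector instead of a shell string, bound the run time, and treat a non-zero exit as an error. The helper names `build_git_command` and `run_git` and the default timeout are assumptions made for illustration.

```python
import subprocess
from typing import List, Sequence


def build_git_command(repo_path: str, args: Sequence[str]) -> List[str]:
    """Build an argument vector; no shell is involved, so nothing is re-parsed."""
    return ["git", f"--git-dir={repo_path}", *args]


def run_git(repo_path: str, args: Sequence[str], timeout_s: float = 10.0) -> str:
    """Run git as a child process and return its standard output."""
    completed = subprocess.run(
        build_git_command(repo_path, args),
        capture_output=True,
        text=True,
        timeout=timeout_s,  # raises subprocess.TimeoutExpired if git hangs
        check=True,         # raises subprocess.CalledProcessError on failure
    )
    return completed.stdout
```

Each call still pays for a process spawn, which is the cost the following slides go on to complain about.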

GUISE GUISE GUISE

...what?

Why don’t we take the Git binary...

yeah?

and compile it as a library

oh... go on...

and link that into our server

[Figure: ScientificGraph™, Memory Usage over Time]

Well, we didn’t think about freeing memory, but...

THIS IS THE KIND OF PROBLEM WE COULD SOLVE WITH CGI IN 1995

[Figure: ScientificGraph™, Memory Usage over Time]

What do you mean the server died?

die("BUG: non-INDEX attr direction in a bare repo");
die("a bad revision is needed");
die("'%s' is not a valid branch name.", name);
die("Empty patch. Aborted.");
die("unable to read index file");
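
Git's `die()` ends the whole process with exit status 128, which is harmless in a short-lived command but fatal when the same code is linked into a long-running server. The sketch below imitates that in Python: `sys.exit(128)` stands in for `die()`, and the hypothetical helper `run_isolated` shows that a parent only survives such a call when the code runs in a child process.

```python
import subprocess
import sys

# Stand-in for library code that bails out the way die() does:
# print a message, then terminate the whole process with status 128.
CHILD_SNIPPET = (
    'import sys; '
    'sys.stderr.write("fatal: unable to read index file\\n"); '
    'sys.exit(128)'
)


def run_isolated(snippet: str) -> int:
    """Run `snippet` in a child interpreter and return its exit status."""
    completed = subprocess.run(
        [sys.executable, "-c", snippet],
        capture_output=True,
    )
    return completed.returncode
```

Calling `run_isolated(CHILD_SNIPPET)` returns 128 and the caller keeps running; the same exit inside the server's own process would have taken every in-flight request down with it.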

libgit2

the “2” means this one frees memory

NOT ENOUGH ABSTRACT FACTORIES

JGit

the “J” means this one is in Java

...not our thing.

Java: a brief timeline

New companies don’t use Java because it’s not like Unix

1995

New companies use Java because it’s new and shiny

1997

New companies don’t use Java because it’s ooooooold

2005

New companies use the JVM because WEBSCALE

2011

github

If you think you understand the JVM, you are either:

a) Very smart

b) Very wrong

Some people think that github is a Rails shop or even a Ruby shop.

github is a Unix shop and everything else is just a detail.

So, libgit2

Good Heavens, just look at the time.

It’s NoSQL o’clock

NoSQL NoSQL NoSQL NoSQL NoSQL NoSQL NoSQL NoSQL

...do you even mongo?

a brief introduction to the Git data model

file.c src

file.h README.md COPYING.md

tree: src/, README.md, COPYING.md

tree: file.c, file.h

blob

blob

blob

blob
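
Each blob on this slide is addressed by a hash of its contents. As a small illustration, assuming Git's classic SHA-1 object format, the id of a blob is the SHA-1 of a short header (`blob`, a space, the content length in bytes, a NUL byte) followed by the content. The function name `git_blob_id` is made up for this sketch.

```python
import hashlib


def git_blob_id(content: bytes) -> str:
    """Compute the SHA-1 object id Git assigns to a blob with this content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


# The empty blob has a well-known id.
assert git_blob_id(b"") == "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
```

Because the id depends only on the bytes, two files with identical contents share one blob no matter where they sit in the tree.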

commit

parent

tree T

metadata

commit T

commit T

commit T

commit T

commit T

commit T

Behold, a graph.
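
The graph on this slide can be sketched as commits that point at a tree and at zero or more parents. The `Commit` class, the ids and the `reachable` helper below are all hypothetical; they only illustrate that answering a history question means following parent pointers one lookup at a time.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Commit:
    tree: str                       # id of the tree T this commit snapshots
    parents: Tuple[str, ...] = ()   # empty for a root, two or more for a merge
    metadata: str = ""              # author, date, message, ...


def reachable(commits: Dict[str, Commit], start: str) -> List[str]:
    """Return every commit id reachable from `start`, each visited once."""
    seen = set()
    order: List[str] = []
    stack = [start]
    while stack:
        commit_id = stack.pop()
        if commit_id in seen:
            continue
        seen.add(commit_id)
        order.append(commit_id)
        # Each parent lookup is one more hop through the object store.
        stack.extend(commits[commit_id].parents)
    return order


# A tiny example history: c4 merges c2 and c3, which both descend from c1.
EXAMPLE = {
    "c1": Commit(tree="t1"),
    "c2": Commit(tree="t2", parents=("c1",)),
    "c3": Commit(tree="t3", parents=("c1",)),
    "c4": Commit(tree="t4", parents=("c2", "c3")),
}
```

Walking from `c4` touches all four commits, and the number of lookups grows with the length of the history rather than with the size of the answer.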

Well that was easy.

master

Oh god kill me

Little known torture methods:

warning: the rabbit hole is pretty deep

Git doesn’t give a #!%$ about CAP

Number of hops on a complex query: 1,000,000

Required hops for a successful query: 1,000,000

Replica count to ensure 100% availability: a metric shitton
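
The arithmetic behind this slide can be sketched as follows. If every hop must succeed and hops fail independently (an assumption made here, not a claim from the talk), the chance that a whole query succeeds is the per-hop success rate raised to the number of hops. The function name and the sample rate are illustrative only.

```python
def query_success_probability(per_hop_success: float, hops: int) -> float:
    """Probability that all `hops` succeed, assuming independent hops."""
    if not 0.0 <= per_hop_success <= 1.0:
        raise ValueError("per_hop_success must be between 0 and 1")
    if hops < 0:
        raise ValueError("hops must be non-negative")
    return per_hop_success ** hops
```

With a hypothetical per-hop success rate of 0.999999, a query needing 1,000,000 hops succeeds only about 37% of the time, which is why spreading the object graph across many nodes would need so many replicas.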

We could fix it. But we won’t.

libgit2

GitRPC

Less.

GitRPC server (Ruby)

Rugged (Ruby)

libgit2 (C)

chimney (Redis)

frontend

fileserver

smoke

grit

ernie-corn

grit

GitRPC GitRPC

chimney (Redis)

frontend (GitRPC client)

fileserver (GitRPC server)

evolutionary (disappointing?)

We’ve had a lot of hard engineering challenges

We tackled them by:

Using the most reliable tools we know.

Challenging ourselves to build the simplest thing. Not because it’s easy, but because it works.

Innovating where it really matters.

building a revolutionary product, not a revolutionary backend.

Having fun
