37
An Infrastructure in Transition Joe Stump, Lead Architect, Digg

Joe Stump, Lead Architect, Digg

  • Upload
    vandieu

  • View
    255

  • Download
    10

Embed Size (px)

Citation preview

Page 1: Joe Stump, Lead Architect, Digg

An Infrastructure in Transition

Joe Stump, Lead Architect, Digg

Page 2: Joe Stump, Lead Architect, Digg

Introductions

Page 3: Joe Stump, Lead Architect, Digg

✓ 35,000,000 uniques✓ 3,500,000 users✓ 15,000 requests / sec✓Hundreds of servers

Page 4: Joe Stump, Lead Architect, Digg

“Web 2.0 sucks (for scaling).”Joe Stump

Page 5: Joe Stump, Lead Architect, Digg

What’s Scaling?

Page 6: Joe Stump, Lead Architect, Digg

What’s Scaling?

Specialization

Page 7: Joe Stump, Lead Architect, Digg

What’s Scaling?

Severe Hair Loss

Page 8: Joe Stump, Lead Architect, Digg

What’s Performance?

Page 9: Joe Stump, Lead Architect, Digg

What’s Performance?

Who cares?

Page 10: Joe Stump, Lead Architect, Digg

4 Stages of Scaling

Page 11: Joe Stump, Lead Architect, Digg
Page 12: Joe Stump, Lead Architect, Digg
Page 13: Joe Stump, Lead Architect, Digg
Page 14: Joe Stump, Lead Architect, Digg
Page 15: Joe Stump, Lead Architect, Digg

As it stands ...

Page 16: Joe Stump, Lead Architect, Digg

Applications

Netscalers

MogileFS

Rec. Engine

ZOMG ROFLAFK WTFLULZ

Lucene

Page 17: Joe Stump, Lead Architect, Digg

Building Blocks• MogileFS

- 9 nodes- 2.8TB of files

• Gearman- Each application

server- 400,000 jobs / day

• Memcached- 25 nodes- 2GB / node

Page 18: Joe Stump, Lead Architect, Digg

Moving forward ...

Page 19: Joe Stump, Lead Architect, Digg

MogileFS

Rec. EngineLucene

Netscalers

Applications

Services

IDDB

Netscalers

Applications

Services

IDDB

Messaging

Page 20: Joe Stump, Lead Architect, Digg

✓ Elastic horizontal partitions✓Heterogenous partition types✓Muti-homed✓ ID’s live in multiple places✓Partitioned result sets

IDDB

Page 21: Joe Stump, Lead Architect, Digg

IDDB_ID_Intid bigintdate_created timestampstatus tinyintversion bigint

IDDB_ID_Charcharid charname charvalue charintid bigintdate_created timestamp

IDDB_ID_Int_Shardsintid bigintshardid intstatus tinyint

IDDB_Shardsid biginttype charhost charport mediumintuser charpass charstatus tinyint

Page 22: Joe Stump, Lead Architect, Digg

✓Memcached + BDB✓ 28,000+ writes a second✓Persistent key/value storage✓Works with Memcached

clients

MemcacheDB

Page 23: Joe Stump, Lead Architect, Digg

War stories ...

Page 24: Joe Stump, Lead Architect, Digg

✓ 15,000 - 17,000 submissions per day

✓Crawl for images, video embeds, source, other meta data

✓Ran in parallel via Gearman

Digg Images

Page 25: Joe Stump, Lead Architect, Digg

✓ 230,000+ Diggs per day✓Most active Diggers are also

most followed✓ 3,000 writes per second✓Ran in background via

Gearman✓ Eventually consistent

Green Badges

Page 26: Joe Stump, Lead Architect, Digg

user_ip_views

Page 27: Joe Stump, Lead Architect, Digg

✓Switched to explicit caching✓ Intelligently grouped objects

in cache ✓Sorting, limiting, etc. done in

the application layer✓ 200% to 300% gains in

performance

Digg Comments

Page 28: Joe Stump, Lead Architect, Digg

✓ Vertical partitioning✓Migrate in background

processes ✓Use the bots✓Keep track of migration✓Retry failed migrations

automatically

Data Migration

Page 29: Joe Stump, Lead Architect, Digg

Things to ponder ...

Page 30: Joe Stump, Lead Architect, Digg

CAP Theorem

Page 31: Joe Stump, Lead Architect, Digg

Have I ran the numbers?

Page 32: Joe Stump, Lead Architect, Digg

Is MySQL the best solution?

Page 33: Joe Stump, Lead Architect, Digg

Can I do this later?

Page 34: Joe Stump, Lead Architect, Digg

How can I partition this data?

Page 35: Joe Stump, Lead Architect, Digg

How should I cache this data?

Page 36: Joe Stump, Lead Architect, Digg

Questions?!