
Page 1: Flashback: QCon San Francisco 2012

Sergejus Barinovas

Page 2: Flashback: QCon San Francisco 2012

Why San Francisco?

Learn how others operate at scale

Learn what problems others have

Learn whether their solutions apply to us

Learn whether their problems apply to us

Page 3: Flashback: QCon San Francisco 2012

Silicon Valley-based companies:

- Google

- Facebook

- Twitter

- Netflix

- Pinterest

- Quora

- tons of others...

Why San Francisco?

Page 4: Flashback: QCon San Francisco 2012

NoSQL: Past, Present, Future

Eric Brewer – author of the CAP theorem

the CP vs. AP trade-off applies only on time-out (failure)

Page 5: Flashback: QCon San Francisco 2012

Real-time Web with Node.js

Page 6: Flashback: QCon San Francisco 2012

Real-time web

node.js – the de-facto standard for the real-time web

open a connection per user and leave it open

WebSockets are great, but use fallbacks (see the sketch below)

- mobile devices don't support WebSockets

- long polling, infinite frame, etc.

more companies are moving to the SPDY protocol
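To make the fallback idea concrete, here is a minimal Node.js sketch using the socket.io library, which starts with HTTP long polling and upgrades to WebSockets only when the client supports them (the port and event name are arbitrary):

    import { createServer } from "http";
    import { Server } from "socket.io";

    const httpServer = createServer();
    // socket.io negotiates the best transport per client:
    // WebSockets where available, long polling as the fallback
    const io = new Server(httpServer);

    io.on("connection", (socket) => {
      // the connection stays open; push events as they happen
      socket.emit("news", { time: Date.now() });
    });

    httpServer.listen(3000);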

Page 7: Flashback: QCon San Francisco 2012

Quora on mobile

Page 8: Flashback: QCon San Francisco 2012

Quora on mobile

first iPhone app

- a mobile app is like an old-style app shipped on a CD

- hybrid application

- native code for controls and navigation

- HTML for viewing Q&A from the site

- separate mobile-optimized HTML layout of the web page

Page 9: Flashback: QCon San Francisco 2012

Quora on mobile

second, the Android app

- created a clone of the iPhone app – failed!

- a UI that is natural on iPhone is alien on Android

- bought Android devices and learned their philosophy

- used the new Google Android UI design guidelines

- created a new app with a native Android look & feel

- users in India pay per MB, so traffic had to be optimized

- those optimizations were then applied to the iPhone app and the web page

Page 10: Flashback: QCon San Francisco 2012

Quora on mobile

mobile-first experience

- mobile has unique requirements

- if you're good on mobile, you're good anywhere

- don't use the mobile app on tablets; create a separate one or use the web

Page 11: Flashback: QCon San Francisco 2012

Continuous delivery

Page 12: Flashback: QCon San Francisco 2012

Continuous delivery

Jesse Robbins – co-founder of Opscode, the company behind Chef

infrastructure as code

- full-stack automation

- datacenter API (for provisioning VMs, etc.; see the sketch below)

- infrastructure is a product and the app is its customer
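To make the "datacenter API" idea concrete, here is a hypothetical sketch of provisioning a VM through such an API; the endpoint, payload fields, and service URL are invented for illustration, not any specific vendor's API:

    // request a VM from a hypothetical datacenter provisioning service
    interface VmRequest {
      role: string;     // e.g. "web", "api", "db"
      cpus: number;
      memoryGb: number;
    }

    async function provisionVm(req: VmRequest): Promise<string> {
      const res = await fetch("https://dc.example.internal/api/v1/vms", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(req),
      });
      if (!res.ok) throw new Error(`provisioning failed: ${res.status}`);
      const { id } = (await res.json()) as { id: string };
      return id; // hand the new machine to configuration management (e.g. Chef)
    }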

Page 13: Flashback: QCon San Francisco 2012

Continuous delivery

applications as services

- service orientation

- software resiliency

- deep instrumentation

dev / ops as teams

- service owners

- shared metrics / monitoring

- continuous integration / deployment

Page 14: Flashback: QCon San Francisco 2012

Release engineering at Facebook

Page 15: Flashback: QCon San Francisco 2012

Release engineering at Facebook

Chuck Rossi – release engineering manager

deployment process

- teams do not deploy to production by themselves

- IRC is used for communication during deployment

- if a team member is not connected to IRC, the release is skipped

- BitTorrent is used to distribute deployments

- powerful app monitoring and profiling (instrumentation)

Page 16: Flashback: QCon San Francisco 2012

Release engineering at Facebook

deployment process

- ability to release to a subset of servers

- very powerful feature-flag mechanism gating by IP, gender, age, … (see the sketch below)

- karma points for developers, with a down-vote button
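A minimal sketch of what attribute-based gating like this might look like; the flag name and rules below are invented for illustration and are not Facebook's actual Gatekeeper system:

    // the user attributes a gate can inspect
    interface User {
      ip: string;
      gender: string;
      age: number;
    }

    // one rule set per flag: every predicate must pass
    type Rule = (u: User) => boolean;

    const flags: Record<string, Rule[]> = {
      // hypothetical flag: new timeline for adults outside the office network
      new_timeline: [
        (u) => u.age >= 18,
        (u) => !u.ip.startsWith("10."),
      ],
    };

    function isEnabled(flag: string, user: User): boolean {
      const rules = flags[flag];
      return rules !== undefined && rules.every((r) => r(user));
    }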

facebook.com

- continuously deployed internally

- employees always access the latest facebook.com

- easy to report a bug from the internal facebook.com

Page 17: Flashback: QCon San Francisco 2012

Scaling Pinterest

Page 18: Flashback: QCon San Francisco 2012

Scaling Pinterest

everything in the Amazon cloud

before

- had every possible ‘hot’ technology, including MySQL, Cassandra, Mongo, Redis, Memcached, Membase, and ElasticSearch – FAIL

- keep it simple: major re-architecting in late 2011

Page 19: Flashback: QCon San Francisco 2012

Scaling Pinterest

January 2012

- Amazon EC2 + S3 + Akamai, ELB

- 90 Web Engines + 50 API Engines

- 66 sharded MySQL DBs + 66 slave replicas

- 59 Redis

- 51 Memcache

- 1 Redis task queue + 25 task processors

- sharded Solr

- 6 engineers

Page 20: Flashback: QCon San Francisco 2012

Scaling Pinterest

now

- Amazon EC2 + S3 + Akamai, Level3, EdgeCast, ELB

- 180 Web Engines + 240 API Engines

- 80 sharded MySQL DBs + 80 slave replicas

- 110 Redis

- 200 Memcache

- 4 Redis task queues + 80 task processors

- sharded Solr

- 40 engineers

Page 21: Flashback: QCon San Francisco 2012

Scaling Pinterest

schemaless DB design (see the sketch below)

- no foreign keys

- no joins

- denormalized data (id + JSON data)

- users, user_has_boards, boards, board_has_pins, pins

- read slaves

- heavy use of cache for speed & better consistency

thinking of moving to their own datacenter
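A minimal sketch of this id-plus-JSON pattern, assuming shards are picked from the owning user's id; the shard math and table layout are illustrative, not Pinterest's exact scheme:

    const NUM_SHARDS = 80; // matches the sharded MySQL fleet above

    // route every object to a shard derived from its owner's id
    function shardFor(userId: number): number {
      return userId % NUM_SHARDS;
    }

    // a row is just (id, JSON blob): no foreign keys, no joins;
    // relations live in mapping tables like user_has_boards
    interface Row {
      id: number;
      data: Record<string, unknown>;
    }

    // a read picks the shard, then does a single primary-key lookup
    function pinQuery(userId: number, pinId: number): { shard: number; sql: string } {
      return {
        shard: shardFor(userId),
        sql: `SELECT data FROM pins WHERE id = ${pinId}`,
      };
    }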

Page 22: Flashback: QCon San Francisco 2012

Architectural patterns for high availability at Netflix

Page 23: Flashback: QCon San Francisco 2012

Architectural patterns for HA

Adrian Cockcroft – director of architecture at Netflix

architecture

- everything in the Amazon cloud, across 3 availability zones

- Chaos Gorilla, Latency Monkey

- service-based architecture, stateless micro-services

- high attention to service resilience (see the sketch below)

- handle dependent-service unavailability and increased latency

started open-sourcing to improve the quality of the code
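A minimal sketch of that resilience pattern: wrap each call to a dependent service in a latency budget with a fallback, so one slow dependency cannot stall the caller (Netflix packaged this approach in its Hystrix library; the service URL and budget below are illustrative):

    // reject if the dependency takes longer than `ms`
    function withTimeout<T>(p: Promise<T>, ms: number): Promise<T> {
      return Promise.race([
        p,
        new Promise<T>((_, reject) =>
          setTimeout(() => reject(new Error("dependency timed out")), ms)
        ),
      ]);
    }

    // call a dependent service, but degrade instead of failing
    async function getRecommendations(userId: number): Promise<string[]> {
      try {
        const res = await withTimeout(
          fetch(`http://recs.example.internal/users/${userId}`), // hypothetical service
          200 // latency budget in milliseconds
        );
        return (await res.json()) as string[];
      } catch {
        return []; // fallback: an empty list keeps the page rendering
      }
    }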

Page 24: Flashback: QCon San Francisco 2012

Architectural patterns for HA

Cassandra usage

- 2 dedicated Cassandra teams

- over 50 Cassandra clusters, over 500 nodes, over 30 TB of data; the biggest cluster has 72 nodes

- mostly write operations; a Memcache layer is used for reads

- moved to SSDs in Amazon instead of spinning disks plus cache

- for ETL: Cassandra backup files are read using Hadoop

- can scale from zero to 500 instances in 8 minutes

Page 25: Flashback: QCon San Francisco 2012

Twitter timelines at scale

Page 26: Flashback: QCon San Francisco 2012

Timelines at scale

Raffi Krikorian – director of Twitter's platform services

core architecture

- pull (timelines & search) and push (mobile, streams) use-cases

- 300K QPS for timelines

- on write, a fan-out process copies the data for each use-case (see the sketch below)

- timeline cache in Redis

- when you tweet and have 200 followers, there are 200 inserts, one into each follower's timeline
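A minimal sketch of that write-time fan-out into Redis, assuming the ioredis client; the key names and the per-timeline cap are illustrative:

    import Redis from "ioredis";

    const redis = new Redis(); // assumes a local Redis instance

    // on write: copy the tweet id into every follower's timeline list
    async function fanOut(tweetId: string, followerIds: string[]): Promise<void> {
      for (const followerId of followerIds) {
        const key = `timeline:${followerId}`;
        await redis.lpush(key, tweetId); // newest first
        await redis.ltrim(key, 0, 799);  // cap each cached timeline
      }
    }

    // on read: a home timeline is a single cheap range lookup
    async function homeTimeline(userId: string): Promise<string[]> {
      return redis.lrange(`timeline:${userId}`, 0, 49);
    }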

Page 27: Flashback: QCon San Francisco 2012

Timelines at scale

core architecture

- Hadoop for batch compute and recommendations

- code heavily instrumented (load times, latencies, etc.)

- uses Cassandra, but moving off it due to read times

Page 28: Flashback: QCon San Francisco 2012

More info

Slides - http://qconsf.com/sf2012

Videos - http://www.infoq.com/