Cloud Computing, Class 7. Miguel Saez (@masaez), Johnny Halife (@johnnyhalife), Matias Woloski (@woloski). Based on a slide deck from Steve Huffman presented in May 2010.


Page 1:

Cloud Computing, Class 7

Miguel Saez (@masaez)
Johnny Halife (@johnnyhalife)
Matias Woloski (@woloski)

Based on a slide deck from Steve Huffman presented in May 2010

Page 2:

Lessons Learned at Reddit

• Site: Reddit.com
• Goal: understand what it takes to build a web application that serves 270 million page views per month
• http://vimeo.com/10506751
• Key points
  – Open schema
  – Asynchronous processing
  – Stateless
  – Caching

Page 3:

Reddit.com

Page 4:

A brief history of reddit

• Founded in June 2005
• Acquired by Condé Nast in October 2007
• 7.5 million users / month
• 270 million page views / month
• Many mistakes along the way

Page 5:

Lesson 1: Crash!

• …and restart.
• Daemontools (supervise)
• Single greatest improvement to uptime we ever made.
• When in doubt, let it die.
• Don’t forget to read the logs!
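
The crash-and-restart idea can be illustrated with a small supervisor loop. This is a minimal Python sketch, not how daemontools is implemented: the function name `supervise`, the restart-only-on-failure rule, and the `max_restarts` cap are assumptions added so the example terminates.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=3, delay=0.1):
    """Run cmd and restart it each time it exits with a non-zero status.

    Returns the number of restarts performed. max_restarts is a hypothetical
    cap so this sketch terminates; a real supervisor keeps restarting.
    """
    restarts = 0
    while True:
        result = subprocess.run(cmd)
        if result.returncode == 0:
            return restarts              # clean exit: stop supervising
        if restarts >= max_restarts:
            return restarts              # give up after the cap
        restarts += 1
        # Log every crash so there is something to read afterwards.
        print(f"exit status {result.returncode}, restart #{restarts}",
              file=sys.stderr)
        time.sleep(delay)                # brief pause to avoid a tight crash loop


if __name__ == "__main__":
    # A child process that always crashes.
    crashing = [sys.executable, "-c", "import sys; sys.exit(1)"]
    print(supervise(crashing, max_restarts=2))
```

The worker itself contains no recovery logic; it simply dies, and the loop around it brings it back.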

Page 6:

Lesson 2: Separation of services

• Often, going from one machine to two more than doubles performance.
• Group similar processes together.
• Group similar types of data together.
• Better caching.
• Less contention for CPU.
• Avoid threads. Processes are easier to separate later.

Page 7:

Lesson 3: Open Schema

ID     UPS  DOWNS  TITLE                                   URL
12345  120  34     Boffins Create Zombie Dog!              www.someaussiesite.co.au/dog.html
12346  3    24     Check out my new blog!                  noobspamer.blogspot.com
12347  509  167    Pee in a sink if you’ve ever voted up.  self

Page 8:

Lesson 3: Open Schema

In the early days:

• Too much time spent thinking about the database.
• Every feature required a schema update.
• Schema updates became more painful as we grew.
• Maintaining replication was difficult.
• Deployment was complex.

Page 9:

Lesson 3: Open Schema

Thing

ID     UPS  DOWNS  TYPE
12345  120  34     Link
12346  3    24     Link

Data

THING_ID  KEY    VALUE
12345     Title  Boffins Create Zombie Dog!
12345     URL    www.someaussiesite.com.au/zombiedog.html
12346     Title  Pee in a sink if you’ve ever voted up.
12346     URL    self

Page 10:

Lesson 3: Open Schema

With an open schema:

• Faster development
• Easier deployment
• Maintainable database replication
• No joins = easy to distribute
• Must be careful to maintain consistency
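
The Thing/Data layout can be sketched with Python's built-in `sqlite3` module. This is an illustration under assumptions, not reddit's actual code: the helper names `save_thing` and `load_thing`, the id 12399, and the `flair` key are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE thing (id INTEGER PRIMARY KEY, ups INTEGER,
                        downs INTEGER, type TEXT);
    CREATE TABLE data (thing_id INTEGER, key TEXT, value TEXT,
                       PRIMARY KEY (thing_id, key));
""")


def save_thing(conn, thing_id, ups, downs, type_, **attrs):
    """Insert one thing row plus one data row per attribute."""
    with conn:  # one transaction, so thing and data rows stay consistent
        conn.execute("INSERT INTO thing VALUES (?, ?, ?, ?)",
                     (thing_id, ups, downs, type_))
        conn.executemany("INSERT INTO data VALUES (?, ?, ?)",
                         [(thing_id, k, str(v)) for k, v in attrs.items()])


def load_thing(conn, thing_id):
    """Fetch a thing and its attributes with two simple queries, no join."""
    row = conn.execute("SELECT id, ups, downs, type FROM thing WHERE id = ?",
                       (thing_id,)).fetchone()
    if row is None:
        return None
    thing = dict(zip(("id", "ups", "downs", "type"), row))
    for key, value in conn.execute(
            "SELECT key, value FROM data WHERE thing_id = ?", (thing_id,)):
        thing[key] = value
    return thing


save_thing(conn, 12345, 120, 34, "Link",
           title="Boffins Create Zombie Dog!",
           url="www.someaussiesite.com.au/zombiedog.html")
# A new attribute needs no schema change, only a new key (hypothetical data).
save_thing(conn, 12399, 1, 0, "Link", title="Example", flair="hypothetical")
print(load_thing(conn, 12399))
```

The transaction in `save_thing` is one way to address the consistency warning: with no foreign keys or fixed columns, the application has to keep the two tables in step itself.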

Page 11:

Lesson 4: Keep it stateless

• Goal: any app server can handle any request
• App server failure/restart is no big deal
• Scaling is straightforward
• Caching must be independent from a specific app server.

Page 12:

Lesson 5: Memcache everything

• Database data
• Session data
• Rendered pages
• Memoizing internal functions
• Rate-limiting (user actions, crawlers)
• Storing precomputed listings/pages
• Global locking
• Memcachedb for persistence
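
Two of the uses above, memoizing functions and rate-limiting, can be sketched as follows. `LocalCache` is a hypothetical in-process stand-in for a memcached client, and `memoize`, `allow`, and `hot_listing` are illustrative names. A shared cache would need atomic add/increment operations; this single-process sketch does not provide them.

```python
import time


class LocalCache:
    """In-process stand-in for a memcached client (hypothetical)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._store = {}  # key -> (value, expiry timestamp or None)

    def get(self, key):
        item = self._store.get(key)
        if item is None:
            return None
        value, expires = item
        if expires is not None and self._clock() >= expires:
            del self._store[key]  # lazily drop expired entries
            return None
        return value

    def set(self, key, value, ttl=None):
        expires = None if ttl is None else self._clock() + ttl
        self._store[key] = (value, expires)

    def incr(self, key):
        """Increment a live counter, keeping its expiry; None if missing."""
        value = self.get(key)
        if value is None:
            return None
        _, expires = self._store[key]
        self._store[key] = (value + 1, expires)
        return value + 1


def memoize(cache, ttl):
    """Cache results under a key built from the function name and arguments.

    Results equal to None are recomputed every time in this sketch.
    """
    def decorator(fn):
        def wrapper(*args):
            key = f"{fn.__name__}:{args!r}"
            hit = cache.get(key)
            if hit is not None:
                return hit
            result = fn(*args)
            cache.set(key, result, ttl)
            return result
        return wrapper
    return decorator


def allow(cache, user, limit, window):
    """Fixed-window rate limit: at most `limit` actions per `window` seconds."""
    key = f"rate:{user}"
    count = cache.get(key)
    if count is None:
        cache.set(key, 1, ttl=window)  # first action opens a new window
        return True
    if count >= limit:
        return False
    cache.incr(key)
    return True


cache = LocalCache()
calls = []


@memoize(cache, ttl=60)
def hot_listing(name):
    calls.append(name)  # records each real computation
    return [f"{name}-link-{i}" for i in range(3)]


hot_listing("programming")
hot_listing("programming")  # served from the cache, not recomputed
```

Because the cache sits outside the app server, this also fits Lesson 4: any server can answer any request and still see the same counters and memoized results.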

Page 13:

Lesson 6: Store redundant data

• Recipe for slow: keep data normalized until you need it.
• If data has multiple presentations, store it multiple times in multiple formats.
• Disk and memory are less costly than making your users wait.
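
As a small illustration of storing the same data in several presentations, the sketch below keeps one canonical record per link plus two pre-sorted listings that are updated at write time. The names `links`, `listings`, `add_link`, and `page`, and the sample data, are hypothetical.

```python
import bisect

links = {}                          # id -> canonical link record
listings = {"top": [], "new": []}   # redundant copies: sorted (sort_key, id) pairs


def add_link(link_id, title, score, created):
    """Write path: store the record and update every presentation of it."""
    links[link_id] = {"title": title, "score": score, "created": created}
    # Negated keys keep each list in descending order.
    bisect.insort(listings["top"], (-score, link_id))
    bisect.insort(listings["new"], (-created, link_id))


def page(sort, n):
    """Read path: slice a pre-sorted listing; nothing is sorted per request."""
    return [links[link_id]["title"] for _, link_id in listings[sort][:n]]


add_link(1, "a", score=10, created=300)
add_link(2, "b", score=50, created=200)
add_link(3, "c", score=30, created=150)
print(page("top", 2), page("new", 2))
```

Writes do more work and use more memory, but every read is a cheap slice, which is the trade the slide argues for.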

Page 14:

Lesson 7: Work offline

• Do the minimum amount of work to end the request.
• Everything else can be done offline.
• An architecture of queues is simple and easy to scale.
• AMQP/RabbitMQ.
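
The request/worker split can be sketched with Python's standard `queue` and `threading` modules standing in for an AMQP broker such as RabbitMQ. The names `handle_submit`, `worker`, and `thumbnails` are hypothetical, and the thumbnail step is a placeholder string rather than a real fetch.

```python
import queue
import threading

jobs = queue.Queue()   # in-process stand-in for an AMQP queue
thumbnails = {}        # results written by the offline worker


def handle_submit(link_id, url):
    """Request path: enqueue the slow work and return immediately."""
    jobs.put(("thumbnail", link_id, url))
    return {"status": "accepted", "id": link_id}


def worker():
    """Offline path: consume jobs until a None sentinel arrives."""
    while True:
        job = jobs.get()
        try:
            if job is None:
                return
            kind, link_id, url = job
            if kind == "thumbnail":
                # Placeholder for the slow fetch-and-resize step.
                thumbnails[link_id] = f"thumb-for-{url}"
        finally:
            jobs.task_done()


t = threading.Thread(target=worker, daemon=True)
t.start()
response = handle_submit(12345, "www.someaussiesite.com.au/zombiedog.html")
jobs.join()      # demo only: wait until the worker has caught up
jobs.put(None)   # tell the worker to stop
t.join()
print(response, thumbnails)
```

With a real broker the worker would run as a separate process, so more workers can be added per queue without touching the app servers.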

Page 15:

Lesson 7: Work offline

• Precomputing listings
• Fetching thumbnails
• Detecting cheating
• Removing spam
• Computing awards
• Updating the “search” index

Page 16:

Lesson 7: Work offline

[Architecture diagram. Labeled components: Request, App Servers, Cache, Master Databases, Worker Databases, Queue, Precomputer, Thumbnailer, Spam]

Page 17:

Assignment

• Reimplement the ranking feature of “el Prode” using what you learned from this presentation.