“Just” shard it: Logical sharding at Etsy
Maggie Zhou @zmagg
ScaleConf 2015
What’s the infrastructure?
L A M P
2019
http://surge.omniti.com/2011/speakers/ross-snyder
L A M P
yay, databases!
tickets index
shard 1 shard 2 shard 3 shard n
??
master-master replication
tickets index
shard 1 shard 2 shard 3 shard n
ORM
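A minimal sketch of the lookup path this slide implies, assuming the tickets database hands out unique IDs and the index maps each ID to the shard that holds its rows. The class and function names are hypothetical and are not taken from Etsy's ORM.

```python
class TicketServer:
    """Hypothetical stand-in for the tickets database: issues unique IDs."""

    def __init__(self):
        self._last_id = 0

    def next_id(self):
        self._last_id += 1
        return self._last_id


class ShardIndex:
    """Hypothetical stand-in for the index: maps an ID to a shard number."""

    def __init__(self):
        self._shard_for_id = {}

    def assign(self, object_id, shard):
        self._shard_for_id[object_id] = shard

    def lookup(self, object_id):
        return self._shard_for_id[object_id]


def route(index, object_id):
    """What an ORM layer might do: ask the index which shard to query."""
    return f"shard {index.lookup(object_id)}"
```

The point of the design is that application code asks only for an ID; the shard is resolved through the index on every request.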
tickets index
shard 1 shard 2 shard 3 shard n
shard n+1
Capacity planning included setting aside 2 months for load balancing
2 months??
2010’s solution is Not Scaling
[Diagram: shops (shop_1, shop_2, shop_3, shop_4, …) and the index; shop_2 moves from shard N to shard N+1]
…writes locked
1)
2)
3) update index, remove lock
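The row-based migration can be sketched roughly as follows. The slide's labels for steps 1 and 2 are not recoverable here, so the lock and copy steps below are assumptions; only "update index, remove lock" comes from the slide. The function name and data structures are hypothetical.

```python
def migrate_shop(shop_id, index, shards, locked, dest):
    """Move one shop's rows to shard `dest`, one shop at a time."""
    src = index[shop_id]
    locked.add(shop_id)  # assumed step: lock writes for this shop
    try:
        # Assumed step: copy the shop's rows onto the destination shard.
        shards[dest][shop_id] = list(shards[src][shop_id])
        index[shop_id] = dest  # from the slide: update index
    finally:
        locked.discard(shop_id)  # from the slide: remove lock
    # The source rows are not deleted here, so the old copy stays behind
    # on the source shard.
```

Because the shop stays locked for the whole copy, a large shop is blocked from changes for as long as its rows take to move.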
Migrations were
• error prone
• arbitrary
• developers had to be aware
• created orphaned data
• locked shops & users out of changes for up to hours
• slow
Migrations were
Error prone? We can fix the errors! We can make the script more robust!
Arbitrary? We can write tooling that figures out which rows are right to migrate off for optimal balance!
Developers had to be aware? We can write better interfaces!
Orphaned data?
• Deletes are expensive, so we didn’t do them.
• Migrations created orphaned data on old hosts that were still picked up by full table scans (downstream systems: search, analytics).
Migrations were
• error prone
• arbitrary
• developers had to be aware
• created orphaned data
• locked shops & users out of changes for up to hours
• slow
Let’s talk about slowness…
What if we could move more than one row at a time?
<<enter logical sharding>>
Well, okay, why didn’t we do this in the first place?
You have to run your site to learn your data access patterns.
photo from gizmodo http://gizmodo.com/5632095/justin-bieber-has-dedicated-servers-at-twitter
listing from https://www.etsy.com/shop/NausicaaDistribution
listing from https://www.etsy.com/shop/Mugnificentart
Scaling Pinterest, 2012: “to increase capacity, a server is replicated and the new replica is responsible for some DBs”
(slide 32, https://speakerdeck.com/yashh/scaling-pinterest)
Sharding & IDs at Instagram, 2012: “simply by moving a set of logical shards from one database to another”
(http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram)
[Diagram: logical shards grouped onto physical hosts]
db_host1: shard1, shard2, … shard10
db_host2: shard11, shard12, … shard20
db_host3: shard21, shard22, … shard30
…
db_hostN
[Diagram: moving a logical shard between hosts]
db_host1: shard1, shard2, … shard10
db_host2: shard2
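One way to read these diagrams: rows belong to a fixed logical shard, and rebalancing only changes which physical host a logical shard lives on. The sketch below assumes ten logical shards per host, as drawn; the mapping and the function name are hypothetical.

```python
# Hypothetical layout matching the diagram: shard1-10 on db_host1,
# shard11-20 on db_host2, shard21-30 on db_host3.
shard_to_host = {
    f"shard{i}": f"db_host{(i - 1) // 10 + 1}" for i in range(1, 31)
}


def move_logical_shard(mapping, shard, new_host):
    """Return a new mapping with one whole logical shard reassigned.

    Every row in the logical shard moves together, so no per-row index
    entry has to change.
    """
    updated = dict(mapping)
    updated[shard] = new_host
    return updated
```

Moving shard2 to db_host2 is then a single mapping change, with the data copy handled separately (for example by replication, as the Pinterest quote describes).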
How did we move onto this new architecture?
We used the old row-based migration framework, one last time.
We migrated all the sharded data (120 TB) without downtime or developer disruption.
It took us 5 months to move that data using the old way.
Today it would take us a few hours.
What’d we build?
• made tooling logical shard aware
• generated database configs so that we could have two-way O(1) mappings between logical shards and physical hosts.
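A two-way O(1) mapping can be sketched as a pair of dictionaries generated from one source of truth. This is an illustration under assumptions, not Etsy's config format; `build_mappings` and the shard and host names are hypothetical.

```python
from collections import defaultdict


def build_mappings(shard_to_host):
    """Generate both lookup directions from a single shard -> host table."""
    host_to_shards = defaultdict(list)
    for shard, host in shard_to_host.items():
        host_to_shards[host].append(shard)
    # Plain dicts give average O(1) lookups in either direction.
    return dict(shard_to_host), dict(host_to_shards)


forward, reverse = build_mappings(
    {"shard1": "db_host1", "shard2": "db_host1", "shard11": "db_host2"}
)
```

Generating both tables from the same input keeps them from drifting apart, which is presumably why the configs were generated and not hand-edited.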
DNS-related outage…
Nice side effects!
• schema changes (alters) now run in parallel, significantly faster!
• no more orphaned data!
• downstream analytics systems replicate faster at the shard-by-shard level!
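The talk does not say how the parallel alters are driven. As a minimal sketch, assuming each logical shard can be altered independently, a thread pool could fan the same change out across shards. `run_alter` is a hypothetical callable supplied by the caller.

```python
from concurrent.futures import ThreadPoolExecutor


def alter_all_shards(shards, run_alter, max_workers=8):
    """Apply run_alter(shard) to every logical shard concurrently.

    `shards` is a list of shard names; the result maps each shard to
    whatever run_alter returned for it.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_alter, shards))
    return dict(zip(shards, results))
```

`pool.map` preserves input order, so each result lines up with the shard it came from.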
The Future
Do you have any limits to how many shard hosts you can add?
Well, yes… It’s 999… because of technical debt.
Code base riddled with checks like this:
[code snippet on slide]
(We can support ~16x data growth, on current-day hardware)
2019
http://surge.omniti.com/2011/speakers/ross-snyder
Resources
• Using a tickets database for ID generation: http://code.flickr.net/2010/02/08/ticket-servers-distributed-unique-primary-keys-on-the-cheap/
• Using master-master replication: https://codeascraft.com/2012/04/20/two-sides-for-salvation/
• Etsy’s shard architecture, the 2012 edition: http://www.percona.com/live/mysql-conference-2012/sessions/etsy-shard-architecture-starts-s-and-ends-hard
• Scaling Etsy, 2011: http://surge.omniti.com/2011/speakers/ross-snyder
• Instagram’s shard & id architecture: http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram
• Scaling Pinterest: https://speakerdeck.com/yashh/scaling-pinterest
• Morgue, Etsy’s postmortem tool: https://github.com/etsy/morgue
Questions?