Ensuring Consistency in a Replicated World

Josh Snyder, 2014-09-30

what is Yelp?

goals

• we operate in a bunch of markets
• aim to be globally distributed
• our users should never see stale content
• our developers should be able to design an application resilient to replication delay

a sample architecture

our toolset

• a small set of moving parts
• enables us to do more with fewer shards
• masks geographic traffic split from users and developers
• enhanced tolerance to replication delay
• ability to:
  – perform online replication hierarchy changes
  – batch-load data

cookies

• give the client a short-lived “dirty session” cookie
• encode the time of the latest interaction between you and them
• expire or ignore the cookie after replicas have caught up
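
A minimal sketch of that cookie dance, assuming a WSGI-framework-style request/response API; the cookie name, TTL, and clock handling are illustrative, not Yelp's actual values:

    import time

    DIRTY_COOKIE = "dirty_session"   # hypothetical cookie name
    COOKIE_TTL = 30                  # assumed worst-case catch-up window, in seconds

    def mark_dirty(response):
        # on a write: record the time of the user's latest interaction
        response.set_cookie(DIRTY_COOKIE, str(int(time.time())), max_age=COOKIE_TTL)

    def needs_master(request, replicas_caught_up_to):
        # on a read: honor the cookie only while replicas still lag behind
        # the recorded interaction time; otherwise ignore it
        value = request.cookies.get(DIRTY_COOKIE)
        return value is not None and int(value) > replicas_caught_up_to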

request routing

• load balancer:
  – POST?
  – GET? -> cookie?
• routes the request into the appropriate datacenter
• adds headers to requests
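
Read as: POSTs (writes) go to the master's datacenter, and GETs from a user carrying a live dirty session cookie presumably follow their writes there. A sketch, with invented names (dirty_session, X-Routed-Datacenter):

    def route(request, home_dc, master_dc):
        # writes, and reads from recent writers, go to the master's datacenter
        if request.method == "POST" or "dirty_session" in request.cookies:
            dc = master_dc
        else:
            # everyone else reads from their nearest datacenter
            dc = home_dc
        # the load balancer annotates the request before proxying it onward
        request.headers["X-Routed-Datacenter"] = dc
        return dc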

tradeoffs

• users get read-after-write consistency
• routing a user’s request between datacenters increases latency!
• getting it wrong: increased load on the master database

tradeoffs

• we need to be assured that a user’s request falls back to a datacenter that has all of their data

replication delay

• we need a clear picture of it
• never underestimate replication delay; always overestimate

Seconds_Behind_Master

• made of lies (for this purpose)
• underestimates most of the time
• overestimates some of the time

http://bugs.mysql.com/bug.php?id=66921

heartbeats

• insert known data on the master
• wait until you see it on the slave
• time waited is replication delay
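
This is the idea pt-heartbeat automates; a minimal sketch, assuming DB-API connections and a one-row heartbeat table. Note it relies on a single clock, and only to measure an interval:

    import time
    import uuid

    def measure_delay(master, replica, poll=0.1):
        token = uuid.uuid4().hex
        # insert known data on the master
        with master.cursor() as cur:
            cur.execute("REPLACE INTO heartbeat (id, token) VALUES (1, %s)", (token,))
        master.commit()
        started = time.monotonic()
        # wait until you see it on the slave
        while True:
            with replica.cursor() as cur:
                cur.execute("SELECT token FROM heartbeat WHERE id = 1")
                row = cur.fetchone()
            if row and row[0] == token:
                # time waited is the replication delay (plus polling slop)
                return time.monotonic() - started
            time.sleep(poll)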

clocks are evil

clocks are evil (2)

pt-heartbeat

yelp_heartbeat

the secret sauce

what does that get us? (pt 1)

• a Sensu check:
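
The deck doesn't reproduce the check itself. A Sensu check is just a program that prints a status line and exits 0/1/2 for OK/warning/critical, so a sketch might look like this (the thresholds and the read_current_delay hook are hypothetical):

    import sys

    WARN_S, CRIT_S = 10.0, 30.0   # illustrative thresholds, in seconds

    def main():
        # hypothetical hook: however yelp_heartbeat exposes the current delay
        delay = read_current_delay()
        if delay >= CRIT_S:
            print("CRITICAL: replication delay %.1fs" % delay)
            sys.exit(2)
        if delay >= WARN_S:
            print("WARNING: replication delay %.1fs" % delay)
            sys.exit(1)
        print("OK: replication delay %.1fs" % delay)
        sys.exit(0)

    if __name__ == "__main__":
        main()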

why that way?

time is hard

http://bugs.mysql.com/bug.php?id=48326

repl_delay_reporter

• aggregates heartbeat information
• provides it to the webapp
• determines when to expire the dirty session cookie
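
The core aggregation is presumably a minimum across replicas, since the slowest replica bounds what every replica is guaranteed to have; a one-line sketch, with current_heartbeat() as an assumed per-replica accessor:

    def lowest_heartbeat(replicas):
        # the slowest replica determines what is safely readable everywhere
        return min(r.current_heartbeat() for r in replicas)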

operations

• wait for replication:
  – “I inserted some data; when will it be available on all replicas?”
• throttle to replication:
  – “I want to bulk insert data. Will doing so cause too much replication delay?”

wait for replication

• insert some data
• ask the master database “what’s the heartbeat right now?”
• ask the repl_delay_reporter “what’s the lowest heartbeat right now?”
• wait a bit
• loop until the lowest heartbeat exceeds the original master heartbeat
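
A sketch of that loop, treating heartbeats as monotonically increasing values and assuming a reporter client exposing the two queries the deck describes (current_heartbeat and lowest_heartbeat are assumed names):

    import time

    def wait_for_replication(master, reporter, poll=0.5):
        # snapshot the master heartbeat right after our writes
        target = master.current_heartbeat()
        # loop until the slowest replica's heartbeat catches up to the snapshot
        while reporter.lowest_heartbeat() < target:
            time.sleep(poll)   # wait a bit, then re-check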

wait for replication

• determines when to expire the dirty session cookie
• relies on only 1 clock, and only for monotonicity
• used heavily by batches
  – provides read-after-write consistency

throttle to replication

• prevents batches from causing excessive replication delay
• operates before the beginning of each transaction
  – batches ask “is replication delay low enough for me to write right now?”
• batches are required to keep their transactions reasonably sized
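
A sketch of the batch-side throttle, with the same assumed reporter client and an illustrative delay ceiling:

    import time

    MAX_DELAY_S = 5.0   # illustrative ceiling, in seconds

    def throttled_batch(reporter, chunks, write_chunk):
        for chunk in chunks:
            # before each transaction: is replication delay low enough
            # for me to write right now?
            while reporter.current_delay() > MAX_DELAY_S:
                time.sleep(1.0)   # back off and let the replicas catch up
            write_chunk(chunk)   # each transaction kept reasonably sized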

gotchas

• load on masters
• laggards
• over-throttling

what this gets us

• batch data can reside on the same shards that serve OLTP requests
• support for databases with heterogeneous SLAs
• automatic load-shedding when there is a replication issue

what this gets us

• shunting of nearly ALL reading and reporting off of the master
• better mileage out of the Percona Toolkit
• online replication hierarchy changes
