26
Replication Internals Fitting Everything Together

MongoDB 2.8 Replication Internals: Fitting it all together

Embed Size (px)

DESCRIPTION

MongoDB replication internal architecture for 2.8 Abstract: Replication in MongoDB requires deep integration with almost every part of the codebase, and has important hooks in various systems like storage, indexing, command processing and querying. Most of the replication components have seen a major overhaul recently in order to make further improvements. In this talk we will address what those pieces are, how they interact, and interesting choices made during their design. In this talk we get into the interaction of the replication protocols, commands really, writes and write concern enforcement, consensus (elections/ leader/follower/ majority) behaviors, and down into the depths of oplog generation and application on replicas. While a large part of the talk will be a technical overview of the big pieces we will dive into many important areas in order to ensure better understanding. The audience will be able to greatly affect which areas we focus on during the session, so come with ideas and a focus.

Citation preview

Page 1: MongoDB 2.8 Replication Internals: Fitting it all together

Replication

InternalsFitting Everything Together

Page 2: MongoDB 2.8 Replication Internals: Fitting it all together

2.8, Refactored

● Architecture as of 2.8

● Unit testable; more, and faster, cpp tests

● Many changes (heartbeats, locking, future)

● Interop with 2.6

● Larger replica sets

Page 3: MongoDB 2.8 Replication Internals: Fitting it all together

Large Blocks

● Topology Manager (state machine)

● Replication Coordinator (repl facade)

● Applier (replicate/apply oplog)

● Executor (network, heartbeats, serialization)

● Commands (re-config, init, status, etc)

● External (writes, storage, query, commands)

Page 4: MongoDB 2.8 Replication Internals: Fitting it all together

Blocks

Applier

Topology Manager

Replication

Coordinator

Oplog

CFG

CM

Ds

Write

s

Qu

ery

Executo

r

Page 5: MongoDB 2.8 Replication Internals: Fitting it all together

Blocks

Applier

Topology Manager

Replication

Coordinator

Oplog

CFG

CM

Ds

Write

s

Qu

ery

Executo

r

Page 6: MongoDB 2.8 Replication Internals: Fitting it all together

Topology

● Maintains Authoritative Stateo Heartbeat, ping, member state

o Roles and transitions

● Contains Decision Logic

● Unit Testable

● Serial AccessTopology Manager

CFG

Page 7: MongoDB 2.8 Replication Internals: Fitting it all together

Examples

● updateConfig

● prepare*Response for commands

● getSyncSource, *

● setFollowerMode (state)

● processHeartbeat

● prepareHeartbeatResponse

Page 8: MongoDB 2.8 Replication Internals: Fitting it all together

PrepareHeartbeatResponseStatus TopologyCoordinatorImpl::prepareHeartbeatResponse(...) {

// Check error conditions, then set response fields …

response->setElectable(!_getMyUnelectableReason(...));

response->setHbMsg(_getHbmsg(...));

response->setTime(...);

response->setOpTime(lastOpApplied);

if (!_syncSource) {

response->setSyncingTo(_syncSource); }

… topology_coordinator_impl.cpp:628

Page 9: MongoDB 2.8 Replication Internals: Fitting it all together

Failover Scenario

Heart

beats P

S

S

Health Check (rsHB)Active Primary

Page 10: MongoDB 2.8 Replication Internals: Fitting it all together

Failover Scenario

Heart

beats P

S

S

Active PrimaryP

Failed Primary

Page 11: MongoDB 2.8 Replication Internals: Fitting it all together

Failover Scenario

Heart

beats Failed

P

S

Health Check (rsHB)

Page 12: MongoDB 2.8 Replication Internals: Fitting it all together

Blocks

Applier

Topology Manager

Replication

Coordinator

Oplog

CFG

CM

Ds

Write

s

Qu

ery

Executo

r

Page 13: MongoDB 2.8 Replication Internals: Fitting it all together

Replications Coordinator

● Interface to other subsystems

● Uses executor to scheduleo Commands

o Elections, Initiate, Reconfig

o Role/State Changes

● Unit Testableo With help, requires mocking out bridge for

subsystems

Replication

Coordinator

Page 14: MongoDB 2.8 Replication Internals: Fitting it all together

Blocks

ApplierReplication

Coordinator

OplogC

MD

s

Write

s

Qu

ery

Executo

r

Topology ManagerCFG

Page 15: MongoDB 2.8 Replication Internals: Fitting it all together

Examples

● process*Response for commands

● awaitReplication* (for writes or migration)

● isReplEnabled

● canAcceptWrites*

Page 16: MongoDB 2.8 Replication Internals: Fitting it all together

Accepting writesstatic bool checkIsMasterForDatabase(const std::string& db, ...) {

if (!getReplicationCoordinator()->canAcceptWritesForDatabase(db)){

errorDetail->setErrCode(ErrorCodes::NotMaster);

errorDetail->setErrMessage("Not primary while writing to " + ns);

return false;

}

return true;

}

Page 17: MongoDB 2.8 Replication Internals: Fitting it all together

Blocks

Applier

Topology Manager

Replication

Coordinator

Oplog

CFG

CM

Ds

Write

s

Qu

ery

Executo

r

Page 18: MongoDB 2.8 Replication Internals: Fitting it all together

Applier

● Reads from *upstream* oplog

● Applier operations transformations

● Mostly unchanged since 2.4

● Includes UpdatePosition commands

Applier

Page 19: MongoDB 2.8 Replication Internals: Fitting it all together

Read + Apply Decoupled

● Background oplog reader thread (net)

● Pool of oplog applier threads (by collection)

Repl Source

Applier

Pool

Buffer

DB4

DB3

DB1 DB2

Local Oplog

Network

Page 20: MongoDB 2.8 Replication Internals: Fitting it all together

Replication Operations

oplog entry (fields):

o = update, o2 = query

{ "ns" : "test.tags",

"op" : "u", "v" : 2, "ts": ...,

"o2" : { "_id" : 1 },

"o" : { "$set" : { "tags.4" : "e" } } }

Page 21: MongoDB 2.8 Replication Internals: Fitting it all together

Blocks

Applier

Topology Manager

Replication

Coordinator

Oplog

CFG

CM

Ds

Write

s

Qu

ery

Executo

r

Page 22: MongoDB 2.8 Replication Internals: Fitting it all together

Executor

● Serializes access to Topology state

● Serializes global state changes wrt db writes

● Processes network requests in IO pool

● Supports event/signal notification

Page 23: MongoDB 2.8 Replication Internals: Fitting it all together

Write Request

● Sent by user

● Interpreted by command subsystem

● Checked by replication coordinator

● Executed

● Idempotent entry recorded in oplog

● ~ Replicated

● ~ Possibly verified during user write request

Page 24: MongoDB 2.8 Replication Internals: Fitting it all together

Write Request

ApplierReplication

Coordinator

OplogC

MD

s

Write

s

Qu

ery

Executo

r

Topology ManagerCFG

Page 25: MongoDB 2.8 Replication Internals: Fitting it all together

● Topology Manager (state machine)

● Replication Coordinator (repl facade)

● Applier (replicate/apply oplog)

● Executor (network, heartbeats, serialization)

● Commands (re-config, init, status, etc)

● External (writes, storage, query, commands)

Page 26: MongoDB 2.8 Replication Internals: Fitting it all together

Thanks

Questions?