MongoDB Replication

Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn


DESCRIPTION

One of the strongest arguments for using a NoSQL database is its focus on distribution, both for replication and sharding. This talk takes a short look at what replication is, why you should use it, and what is so difficult about it. We then look at MongoDB's implementation in general and finally focus on what can go wrong. In a practical demo you see how to balance performance against data safety and how to use replication in your Java application.


Page 1: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

MongoDB Replication

Page 2: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Philipp Krenn

@xeraa

Page 3: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Motivation

Availability & data safety

Read scalability

Helping backups

Page 4: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Data migration

Delayed members

Oplog tailing (Meteor.js)

https://meteorhacks.com/mongodb-oplog-and-meteor.html

Page 5: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Basics

Page 6: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Terminology

Primary + Secondaries

Master + Slaves: problematic, renamed

Arbiter
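
An arbiter holds no data and only votes, which lets a set with an even number of data-bearing nodes still reach a majority. The arithmetic can be sketched in plain JavaScript; this is not a MongoDB API, and the function names are illustrative assumptions:

```javascript
// Votes needed to elect a primary: a strict majority of all voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Number of voting members that may fail while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

// Two data nodes alone tolerate no failure; adding an arbiter tolerates one.
console.log(faultTolerance(2)); // 0
console.log(faultTolerance(3)); // 1
```

Four voting members tolerate the same single failure as three, which is why an odd number of voters is the usual choice.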

Page 7: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

http://docs.mongodb.org

Page 8: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

http://docs.mongodb.org

Page 9: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

http://docs.mongodb.org

> rs.addArb("arbiter.example.com:3000")

Page 10: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

http://docs.mongodb.org

Page 11: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Limits

50 replica set members

12 before 2.7.8

7 voting members
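
The two limits can be checked against a configuration document of the shape returned by rs.conf(). The following is a minimal sketch in plain JavaScript; checkLimits and the host names are hypothetical, and a missing votes field is assumed to mean one vote:

```javascript
const MAX_MEMBERS = 50; // limit from the slide
const MAX_VOTING = 7;   // limit from the slide

// Return a list of violated limits for a replica set configuration.
function checkLimits(cfg) {
  const voting = cfg.members.filter(
    (m) => (m.votes === undefined ? 1 : m.votes) > 0
  );
  const problems = [];
  if (cfg.members.length > MAX_MEMBERS) {
    problems.push("too many members: " + cfg.members.length);
  }
  if (voting.length > MAX_VOTING) {
    problems.push("too many voting members: " + voting.length);
  }
  return problems;
}

// Nine members are fine as long as at most seven of them vote.
const cfg = { _id: "javantura", members: [] };
for (let i = 0; i < 9; i++) {
  cfg.members.push({ _id: i, host: "node" + i + ".example.com:27017", votes: i < 7 ? 1 : 0 });
}
console.log(checkLimits(cfg)); // []
```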

Page 12: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Example

Page 13: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Single instance

$ mkdir 1
$ mongod --dbpath 1 --port 27001 --logpath log1
$ mongo --port 27001
> db.test.insert({ name: "Philipp", city: "Wien" })
> db.test.find()

Stop instance

Page 14: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Add replication

$ mkdir 2
$ mkdir 3
$ mongod --replSet javantura --dbpath 1 --port 27001 --logpath log1 --oplogSize 20
$ mongod --replSet javantura --dbpath 2 --port 27002 --logpath log2 --oplogSize 20
$ mongod --replSet javantura --dbpath 3 --port 27003 --logpath log3 --oplogSize 20

Page 15: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Connect

$ hostname
$ mongo --port 27001
> db.test.find()

Page 16: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Configure replication

Start on the old instance, otherwise its data is lost

> rs.initiate()
> rs.status()
> rs.add("PK-MBP:27002")
> rs.add("PK-MBP:27003")
> rs.status()
> db.isMaster()
> db.test.find()
> db.test.insert({ name: "Peter", city: "Steyr" })
> db.test.find()

Page 17: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Read from secondaries

$ mongo --port 27002
> db.test.find()
> rs.slaveOk()
> db.test.find()
> db.test.insert({ name: "Dieter", city: "Graz" })

slaveOk only valid for the current connection
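
Because the shell setting lasts only for one connection, applications usually state the same intent as a read preference. One way is the standard connection string options replicaSet and readPreference. The sketch below only builds such a string in plain JavaScript; the helper name, the localhost hosts and the test database are assumptions based on the demo ports:

```javascript
// Build a replica set connection string with a read preference.
// Hosts and database name are assumptions taken from the demo setup.
function replicaSetUri(hosts, database, replicaSet, readPreference) {
  return (
    "mongodb://" + hosts.join(",") + "/" + database +
    "?replicaSet=" + replicaSet +
    "&readPreference=" + readPreference
  );
}

const uri = replicaSetUri(
  ["localhost:27001", "localhost:27002", "localhost:27003"],
  "test",
  "javantura",
  "secondaryPreferred"
);
console.log(uri);
```

Listing all three members lets a driver find the current primary even if the first host is down.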

Page 18: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Failover

Kill primary with [Ctrl]+[C]

Write to new primary

> rs.status()
> db.test.insert({ name: "Dieter", city: "Graz" })
> db.test.find()

Page 19: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Restart old primary

$ mongod --replSet javantura --dbpath 1 --port 27001 --logpath log1 --oplogSize 20
$ mongo --port 27001
> rs.status()
> rs.slaveOk()
> db.test.find()

Page 20: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Inner details

Capped collection in oplog.rs of the local database

> use local
> show collections
me                0.000MB / 0.008MB
oplog.rs          0.000MB / 20.000MB
replset.minvalid  0.000MB / 0.008MB
slaves            0.000MB / 0.008MB
startup_log       0.003MB / 10.000MB
system.indexes    0.001MB / 0.008MB
system.replset    0.000MB / 0.008MB
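
A capped collection has a fixed size and overwrites its oldest entries, so a secondary can only catch up while its last applied operation is still in the oplog of its sync source. The following plain JavaScript sketch models that behaviour; the class is hypothetical, and it caps by entry count although the real oplog is capped by bytes:

```javascript
// Toy model of a capped oplog: fixed capacity, oldest entries are dropped.
class CappedLog {
  constructor(maxEntries) {
    this.maxEntries = maxEntries;
    this.entries = [];
  }

  append(entry) {
    this.entries.push(entry);
    if (this.entries.length > this.maxEntries) {
      this.entries.shift(); // overwrite the oldest entry
    }
  }

  // A secondary can resume only if its last applied op is still present.
  canCatchUp(lastAppliedTs) {
    return this.entries.some((e) => e.ts === lastAppliedTs);
  }
}

const oplog = new CappedLog(3);
for (let ts = 1; ts <= 5; ts++) {
  oplog.append({ ts: ts, op: "i" });
}
console.log(oplog.canCatchUp(3)); // true: entry 3 is still in the log
console.log(oplog.canCatchUp(2)); // false: entry 2 was overwritten
```

A secondary in the second situation is too stale to replay the log and has to resynchronize its data from scratch.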

Page 21: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Inner details

> db.oplog.rs.find()
{
  "h": NumberLong("-265486071808715859"),
  "ns": "test.test",
  "o": {
    "_id": ObjectId("541a8ed285ea5f8ae059d530"),
    "name": "Dieter",
    "city": "Graz"
  },
  "op": "i",
  "ts": Timestamp(1411026642, 1),
  "v": 2
}
...

Page 22: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Election

Page 23: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Heartbeat

2s interval

10s until election

Page 24: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Election rules

1. Priority

2. Optime

3. Connections

Page 25: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Priority

> cfg = rs.conf()
> cfg.members[0].priority = 0
> cfg.members[1].priority = 1
> cfg.members[2].priority = 2
> rs.reconfig(cfg)

Page 26: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Optime

Page 27: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Connections
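
The three rules can be read as an ordering over candidates. The sketch below is a deliberate simplification in plain JavaScript, not MongoDB's actual election code: the field names (up, reachable, optime as a plain number) and the function are assumptions for illustration. It encodes that a member with priority 0 never becomes primary, that higher priority wins, that a more recent optime breaks ties, and that a candidate must reach a majority of the set.

```javascript
// Pick the most suitable candidate among replica set members (simplified).
function pickCandidate(members) {
  const needed = Math.floor(members.length / 2) + 1;
  const eligible = members.filter(
    (m) => m.up && m.priority > 0 && m.reachable >= needed
  );
  // Rule 1: higher priority first. Rule 2: more recent optime first.
  eligible.sort((a, b) => b.priority - a.priority || b.optime - a.optime);
  return eligible.length > 0 ? eligible[0].host : null;
}

// Priorities as configured on the slide; the priority 2 member is down.
const members = [
  { host: "PK-MBP:27001", priority: 0, optime: 105, up: true, reachable: 2 },
  { host: "PK-MBP:27002", priority: 1, optime: 100, up: true, reachable: 2 },
  { host: "PK-MBP:27003", priority: 2, optime: 105, up: false, reachable: 0 },
];
console.log(pickCandidate(members)); // PK-MBP:27002
```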

Page 28: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Election

Candidate node asks for a vote

Others can veto

Page 29: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Election

Each member gives one yes to one node within 30s

Majority yes elects a new primary
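
The outcome of a vote as described on the last two slides can be sketched as a small tally. This is plain JavaScript under simplifying assumptions: the response strings and the function name are made up for illustration, and the candidate's own yes is counted as one of the responses.

```javascript
// Decide an election from the collected responses (simplified).
// responses: array of "yes", "no" or "veto", including the candidate's own yes.
function electionResult(responses, votingMembers) {
  if (responses.includes("veto")) {
    return "vetoed"; // a single veto blocks the candidate
  }
  const yes = responses.filter((r) => r === "yes").length;
  return yes > votingMembers / 2 ? "elected" : "no majority";
}

console.log(electionResult(["yes", "yes"], 3));  // elected
console.log(electionResult(["yes", "veto"], 3)); // vetoed
console.log(electionResult(["yes"], 3));         // no majority
```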

Page 30: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn
Page 31: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Issues

Page 32: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

CAP

Select Availability or Consistency

Partition-tolerance is a prerequisite for distributed systems

"The network is reliable": http://aphyr.com/posts/288-the-network-is-reliable

Page 33: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Rollback

Old primary rolls back unreplicated changes once it rejoins the replica set
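
Which changes count as unreplicated can be illustrated by comparing two oplogs. The plain JavaScript sketch below is a simplification, not MongoDB's rollback code: it identifies entries by the h field seen in the oplog output and uses a hypothetical function name. Everything the old primary wrote after the last entry that also exists on the new primary is rolled back.

```javascript
// Return the old primary's entries that never reached the new primary.
function entriesToRollBack(oldPrimaryOplog, newPrimaryOplog) {
  const replicated = new Set(newPrimaryOplog.map((e) => e.h));
  // Walk backwards to the most recent entry both nodes have in common.
  let i = oldPrimaryOplog.length - 1;
  while (i >= 0 && !replicated.has(oldPrimaryOplog[i].h)) {
    i--;
  }
  return oldPrimaryOplog.slice(i + 1);
}

const oldPrimary = [{ h: 1 }, { h: 2 }, { h: 3 }, { h: 4 }];
const newPrimary = [{ h: 1 }, { h: 2 }, { h: 9 }];
console.log(entriesToRollBack(oldPrimary, newPrimary)); // entries 3 and 4
```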

Page 34: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Rollback file

rollback/ in data folder

File name: <database>.<collection>.<timestamp>.bson

Page 35: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Election time

At times 5 to 7 minutes

http://www.tokutek.com/2014/07/explaining-ark-part-2-how-elections-and-failover-currently-work/

Page 36: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Missing synchronization during election

Old primary sends last changes to a single node

If not new primary: rollback

Page 37: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Remember

Replication is asynchronous

Page 38: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Multiple primaries

Unlikely but possible

Bugs: https://jira.mongodb.org/browse/SERVER-9765

Test script with no replies: https://groups.google.com/forum/#!topic/mongodb-dev/-mH6BOYyzeI

Page 39: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Kyle Kingsbury @aphyr: Call Me Maybe

http://aphyr.com/tags/jepsen

PostgreSQL, Redis, MongoDB, Riak, Zookeeper, RabbitMQ, etcd + Consul, ElasticSearch

Page 40: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

http://aphyr.com/posts/284-call-me-maybe-mongodb

05/2013 version 2.4

Up to 42% data lost

Data written to old primary: rollback

Page 41: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn
Page 42: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

WriteConcern

Configure durability vs performance

https://github.com/mongodb/mongo-java-driver/blob/master/src/main/com/mongodb/WriteConcern.java

Page 43: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

WriteConcern.UNACKNOWLEDGED

w=0, j=0

Fire and forget

Default until 11/2012

Page 44: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

WriteConcern.ACKNOWLEDGED

w=1, j=0

Current default

Operation completed successfully in memory

Page 45: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

WriteConcern.JOURNALED

w=1, j=1

Operation written to the journal file

Since 1.8, single server durability

Page 46: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

WriteConcern.FSYNCED

w=1, fsync=true

Operation written to disk

Page 47: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

WriteConcern.REPLICA_ACKNOWLEDGED

w=2, j=0

Acknowledged by primary and at least one secondary

w is the number of servers that must acknowledge the write

Page 48: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

WriteConcern.MAJORITY

w=majority, j=0

Acknowledgement by the majority of nodes

wtimeout recommended

Page 49: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

WriteConcern.MAJORITY

Nearly no data lost, but high overhead
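
The settings from the preceding slides can be collected in one table. This plain JavaScript sketch mirrors the w, j and fsync values shown above as write concern documents; it is not the Java driver's class, and requiredAcks is a hypothetical helper that shows how many servers must confirm a write.

```javascript
// Write concern settings as listed on the slides.
const WRITE_CONCERNS = {
  UNACKNOWLEDGED: { w: 0, j: false },
  ACKNOWLEDGED: { w: 1, j: false },
  JOURNALED: { w: 1, j: true },
  FSYNCED: { w: 1, fsync: true },
  REPLICA_ACKNOWLEDGED: { w: 2, j: false },
  MAJORITY: { w: "majority", j: false },
};

// Number of servers that must acknowledge a write before it counts as done.
function requiredAcks(writeConcern, votingMembers) {
  return writeConcern.w === "majority"
    ? Math.floor(votingMembers / 2) + 1
    : writeConcern.w;
}

// In the three-member demo set, MAJORITY waits for two servers.
console.log(requiredAcks(WRITE_CONCERNS.MAJORITY, 3));       // 2
console.log(requiredAcks(WRITE_CONCERNS.UNACKNOWLEDGED, 3)); // 0

// In the mongo shell (2.6 or later) a comparable per-operation setting is:
//   db.test.insert(doc, { writeConcern: { w: "majority", wtimeout: 5000 } })
// The 5000 ms timeout is an arbitrary example value.
```

Without wtimeout, a write that waits for a majority can block for as long as too few members are reachable, which is why the timeout is recommended.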

Page 50: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Write concern performance

https://blog.serverdensity.com/mongodb-on-google-compute-engine-tips-and-benchmarks/

3 x 1,000 inserts on GCE

Local 10GB system disk

Dedicated 200GB disk

Dedicated 200GB for data and journal

Page 51: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

n1-standard-2

Page 52: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

n1-highmem-8

Page 53: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Thanks! Questions?

Now, later today, or @xeraa

Page 54: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Backup Slides

Page 55: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Oplog

Page 56: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Replication via logs

MongoDB: Operations log (Oplog)

MySQL: Binary log (Binlog)

Page 57: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Naive approach: Transmit original query

Statement-Based Replication (SBR)

DELETE FROM test.table WHERE quantity > 20 LIMIT 1

db.collection.remove({ quantity: { $gt: 20 } }, true) // justOne: true

Page 58: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Unambiguous representation

Row-Based Replication (RBR): Oplog
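
Why the original statement is ambiguous, and an oplog entry is not, can be shown with two in-memory copies of the same data. This is a plain JavaScript sketch with made-up documents and helper names: replaying "remove one document with quantity > 20" depends on each node's document order, while an oplog style delete names the affected _id.

```javascript
// Statement-based: remove the first document that matches the predicate.
function removeOneMatching(collection, predicate) {
  const i = collection.findIndex(predicate);
  return i === -1 ? null : collection.splice(i, 1)[0];
}

// Row-based: apply a delete entry that identifies the document by _id.
function applyDelete(collection, entry) {
  const i = collection.findIndex((d) => d._id === entry.o._id);
  if (i !== -1) collection.splice(i, 1);
}

const ids = (collection) => collection.map((d) => d._id).sort();

// Same documents, different physical order on primary and secondary.
const makePrimary = () => [{ _id: 1, quantity: 30 }, { _id: 2, quantity: 25 }];
const makeSecondary = () => [{ _id: 2, quantity: 25 }, { _id: 1, quantity: 30 }];

// Replaying the statement removes different documents: the nodes diverge.
const p1 = makePrimary();
const s1 = makeSecondary();
removeOneMatching(p1, (d) => d.quantity > 20);
removeOneMatching(s1, (d) => d.quantity > 20);
console.log(ids(p1), ids(s1)); // [ 2 ] [ 1 ]

// Replaying the oplog entry removes the same document everywhere.
const p2 = makePrimary();
const s2 = makeSecondary();
const removed = removeOneMatching(p2, (d) => d.quantity > 20);
applyDelete(s2, { op: "d", ns: "test.collection", o: { _id: removed._id } });
console.log(ids(p2), ids(s2)); // [ 2 ] [ 2 ]
```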

Page 59: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

MongoDB

Asynchronous replication

Secondaries can get the Oplog from...

their primary

another secondary with more recent data

Page 60: Javantura v2 - Replication with MongoDB - what could go wrong... - Philipp Krenn

Oplog size

32bit: 48MB

64bit OS X: 183MB

64bit *nix, Windows: 1GB to 50GB (5% free disk)
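
The rule for 64bit *nix and Windows can be written as a clamp. This is a minimal sketch in plain JavaScript with a hypothetical function name, using only the numbers from the slide: 5% of free disk space, at least 1GB and at most 50GB.

```javascript
const GB = 1024 ** 3;

// Default oplog size in bytes for a given amount of free disk space.
function defaultOplogBytes(freeDiskBytes) {
  const fivePercent = freeDiskBytes / 20;
  return Math.min(50 * GB, Math.max(1 * GB, fivePercent));
}

console.log(defaultOplogBytes(200 * GB) / GB);  // 10
console.log(defaultOplogBytes(10 * GB) / GB);   // 1  (lower bound)
console.log(defaultOplogBytes(2000 * GB) / GB); // 50 (upper bound)
```

The demo overrides this default with --oplogSize 20, which sets the size to 20MB.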