HUJAK: Hrvatska udruga Java korisnika (Croatian Java User Association)
One of the strongest arguments for using a NoSQL database is its focus on distribution, both for replication and sharding. This talk takes a short look at what replication is, why you should use it, and what is so difficult about it. We then look at MongoDB's implementation in general and finally focus on what can go wrong. A practical demo shows how to find the right balance between performance and data safety and how to use it in your Java application.
MongoDB Replication
Motivation
Availability & data safety
Read scalability
Helping backups
Data migration
Delayed members
Oplog tailing (Meteor.js)
https://meteorhacks.com/mongodb-oplog-and-meteor.html
Basics
Terminology
Primary + Secondaries
Master + Slaves: problematic terminology, renamed
Arbiter
http://docs.mongodb.org
Limits
50 replica set members
12 before 2.7.8
7 voting members
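The limits above can be checked against a configuration before calling rs.reconfig(). The following is a minimal sketch, assuming a config object shaped like the output of rs.conf() (a members array whose entries may carry a votes field that defaults to 1). The function name checkReplicaSetLimits is hypothetical and not part of MongoDB.

```javascript
// Limits as stated on the slide (50 members, 7 voting members).
const MAX_MEMBERS = 50;
const MAX_VOTING_MEMBERS = 7;

// Hypothetical helper: returns a list of human-readable problems,
// or an empty list if the config respects both limits.
function checkReplicaSetLimits(config) {
  const problems = [];
  // Assumption: a member without an explicit "votes" field has one vote.
  const voting = config.members.filter(
    (m) => (m.votes === undefined ? 1 : m.votes) > 0
  );
  if (config.members.length > MAX_MEMBERS) {
    problems.push(`too many members: ${config.members.length} > ${MAX_MEMBERS}`);
  }
  if (voting.length > MAX_VOTING_MEMBERS) {
    problems.push(`too many voting members: ${voting.length} > ${MAX_VOTING_MEMBERS}`);
  }
  return problems;
}

// Example: eight members are fine as long as at most seven of them vote.
const eightMembers = Array.from({ length: 8 }, (_, i) => ({ _id: i }));
eightMembers[7].votes = 0;
console.log(checkReplicaSetLimits({ members: eightMembers }));
```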
Example
Single instance
$ mkdir 1
$ mongod --dbpath 1 --port 27001 --logpath log1
$ mongo --port 27001
> db.test.insert({ name: "Philipp", city: "Wien" })
> db.test.find()
Stop instance
Add replication
$ mkdir 2
$ mkdir 3
$ mongod --replSet javantura --dbpath 1 --port 27001 --logpath log1 --oplogSize 20
$ mongod --replSet javantura --dbpath 2 --port 27002 --logpath log2 --oplogSize 20
$ mongod --replSet javantura --dbpath 3 --port 27003 --logpath log3 --oplogSize 20
Connect
$ hostname
$ mongo --port 27001
> db.test.find()
Configure replication
Start on the old instance, otherwise the data is lost
rs.initiate()
rs.status()
rs.add("PK-MBP:27002")
rs.add("PK-MBP:27003")
rs.status()
db.isMaster()
db.test.find()
db.test.insert({ name: "Peter", city: "Steyr" })
db.test.find()
Read from secondaries
$ mongo --port 27002
> db.test.find()
> rs.slaveOk()
> db.test.find()
> db.test.insert({ name: "Dieter", city: "Graz" })
slaveOk is only valid for the current connection
Failover
Kill primary with [Ctrl]+[C]
Write to new primary
> rs.status()
> db.test.insert({ name: "Dieter", city: "Graz" })
> db.test.find()
Restart old primary
$ mongod --replSet javantura --dbpath 1 --port 27001 --logpath log1 --oplogSize 20
$ mongo --port 27001
> rs.status()
> rs.slaveOk()
> db.test.find()
Inner details
Capped collection in oplog.rs of the local database
> use local
> show collections
me 0.000MB / 0.008MB
oplog.rs 0.000MB / 20.000MB
replset.minvalid 0.000MB / 0.008MB
slaves 0.000MB / 0.008MB
startup_log 0.003MB / 10.000MB
system.indexes 0.001MB / 0.008MB
system.replset 0.000MB / 0.008MB
Inner details
> db.oplog.rs.find()
{
  "h": NumberLong("-265486071808715859"),
  "ns": "test.test",
  "o": { "_id": ObjectId("541a8ed285ea5f8ae059d530"), "name": "Dieter", "city": "Graz" },
  "op": "i",
  "ts": Timestamp(1411026642, 1),
  "v": 2
}
...
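To make the fields of such an oplog entry easier to read, here is a small sketch in plain JavaScript that decodes the namespace, the operation code, and the timestamp. The helper describeOplogEntry is hypothetical, and the ts value is modeled as a plain object with seconds and increment fields as a stand-in for the shell's Timestamp type.

```javascript
// Operation codes used in the "op" field of oplog entries.
const OP_NAMES = { i: "insert", u: "update", d: "delete", c: "command", n: "no-op" };

// Hypothetical helper: turns a raw oplog entry into a readable summary.
function describeOplogEntry(entry) {
  // "ns" is "<database>.<collection>"; only the first dot separates the two.
  const dot = entry.ns.indexOf(".");
  return {
    operation: OP_NAMES[entry.op] || "unknown",
    database: entry.ns.slice(0, dot),
    collection: entry.ns.slice(dot + 1),
    // The first component of the timestamp is seconds since the Unix epoch.
    time: new Date(entry.ts.seconds * 1000).toISOString(),
  };
}

// The entry from the slide, with Timestamp(1411026642, 1) modeled as a plain object.
const sampleEntry = {
  ns: "test.test",
  op: "i",
  ts: { seconds: 1411026642, increment: 1 },
  o: { name: "Dieter", city: "Graz" },
};
console.log(describeOplogEntry(sampleEntry));
```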
Election
Heartbeat
2s interval
10s until election
Election rules
1. Priority
2. Optime
3. Connections
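One way to read these three rules is as a filter followed by a ranking. The sketch below is a deliberately simplified model, not MongoDB's actual election algorithm: it assumes each member is described by hypothetical fields (host, priority, optime as a number, and reachableVoters counting the voting members it can connect to, including itself). Members with priority 0 or without a connection to a majority are excluded; the rest are ranked by priority, then by optime.

```javascript
// Simplified model of the slide's rules; all field names are assumptions.
function preferredCandidate(members, votingMemberCount) {
  const majority = Math.floor(votingMemberCount / 2) + 1;
  const eligible = members.filter(
    // Rule 3 (connections): must reach a majority. Priority 0 can never be primary.
    (m) => m.priority > 0 && m.reachableVoters >= majority
  );
  // Rule 1 (priority) first, rule 2 (most recent optime) as the tie-breaker.
  eligible.sort((a, b) => b.priority - a.priority || b.optime - a.optime);
  return eligible.length > 0 ? eligible[0].host : null;
}

// Example: the highest-priority node is partitioned away, so it is not eligible.
const candidates = [
  { host: "a:27001", priority: 2, optime: 100, reachableVoters: 1 },
  { host: "b:27002", priority: 1, optime: 120, reachableVoters: 2 },
  { host: "c:27003", priority: 1, optime: 110, reachableVoters: 2 },
];
console.log(preferredCandidate(candidates, 3));
```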
Priority
cfg = rs.conf()
cfg.members[0].priority = 0
cfg.members[1].priority = 1
cfg.members[2].priority = 2
rs.reconfig(cfg)
Optime
Connections
Election
Candidate node asks for a vote
Others can veto
Election
One yes vote for one node within 30s
Majority yes elects a new primary
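The outcome of a voting round as described above can be sketched as a single predicate. This is a simplified model under stated assumptions: the function name electionSucceeds is hypothetical, the candidate's own vote is counted in yesVotes, and any single veto cancels the round.

```javascript
// Simplified model: a candidate wins if nobody vetoes and a strict
// majority of all voting members (not just the reachable ones) says yes.
function electionSucceeds(votingMembers, yesVotes, vetoes) {
  if (vetoes > 0) return false;
  return yesVotes > Math.floor(votingMembers / 2);
}

// In a three-node set, two yes votes are enough; one is not.
console.log(electionSucceeds(3, 2, 0)); // true
console.log(electionSucceeds(3, 1, 0)); // false
```

Note that with an even number of voting members, exactly half is not a majority, which is why an arbiter is usually added to reach an odd count.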
Issues
CAP
Select Availability or Consistency
Partition-tolerance is a prerequisite for distributed systems
"The network is reliable": http://aphyr.com/posts/288-the-network-is-reliable
Rollback
Old primary rolls back unreplicated changes once it rejoins the replica set
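Which changes count as "unreplicated" can be illustrated by comparing the two oplogs. The sketch below is a simplified model, not the server's implementation: each oplog is assumed to be an array of opaque entry identifiers in write order, and entriesToRollBack is a hypothetical helper. Everything the old primary wrote after the last common entry is rolled back.

```javascript
// Simplified model: oplogs are arrays of entry ids in the order they were written.
function entriesToRollBack(oldPrimaryOplog, newPrimaryOplog) {
  // Walk forward while both histories agree to find the common point.
  let common = 0;
  while (
    common < oldPrimaryOplog.length &&
    common < newPrimaryOplog.length &&
    oldPrimaryOplog[common] === newPrimaryOplog[common]
  ) {
    common++;
  }
  // Anything the old primary has beyond the common point never replicated.
  return oldPrimaryOplog.slice(common);
}

// The old primary accepted "c" and "d" before it went down; the new primary
// never saw them and has since accepted "e".
console.log(entriesToRollBack(["a", "b", "c", "d"], ["a", "b", "e"]));
```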
Rollback file
rollback/ in data folder
File name: <database>.<collection>.<timestamp>.bson
Election time
At times 5 to 7 minutes
http://www.tokutek.com/2014/07/explaining-ark-part-2-how-elections-and-failover-currently-work/
Missing synchronization during election
Old primary sends last changes to a single node
If not new primary: rollback
Remember
Replication is asynchronous
Multiple primaries
Unlikely but possible
Bugs: https://jira.mongodb.org/browse/SERVER-9765
Test script with no replies: https://groups.google.com/forum/#!topic/mongodb-dev/-mH6BOYyzeI
Kyle Kingsbury @aphyr: Call Me Maybe
http://aphyr.com/tags/jepsen
PostgreSQL, Redis, MongoDB, Riak, Zookeeper, RabbitMQ, etcd + Consul, ElasticSearch
http://aphyr.com/posts/284-call-me-maybe-mongodb
05/2013 version 2.4
Up to 42% data lost
Data written to old primary: rollback
WriteConcern
Configure durability vs performance
https://github.com/mongodb/mongo-java-driver/blob/master/src/main/com/mongodb/WriteConcern.java
WriteConcern.UNACKNOWLEDGED
w=0, j=0
Fire and forget
Default until 11/2012
WriteConcern.ACKNOWLEDGED
w=1, j=0
Current default
Operation completed successfully in memory
WriteConcern.JOURNALED
w=1, j=1
Operation written to the journal file
Since 1.8, single server durability
WriteConcern.FSYNCED
w=1, fsync=true
Operation written to disk
WriteConcern.REPLICA_ACKNOWLEDGED
w=2, j=0
Acknowledged by primary and at least one secondary
w is the number of servers that must acknowledge the write
WriteConcern.MAJORITY
w=majority, j=0
Acknowledgement by the majority of nodes
wtimeout recommended
Nearly no data lost, but high overhead
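The levels listed above can be summarized as a lookup table. The following sketch is plain JavaScript rather than the Java driver API: the names mirror the WriteConcern constants from the slides, the w, j, and fsync values are the ones stated there (with j=0/j=1 written as booleans), and writeConcernFor is a hypothetical helper that adds a wtimeout only where the write waits for secondaries.

```javascript
// Values as listed on the slides; j=0 / j=1 are written as false / true.
const WRITE_CONCERNS = {
  UNACKNOWLEDGED: { w: 0, j: false },
  ACKNOWLEDGED: { w: 1, j: false },
  JOURNALED: { w: 1, j: true },
  FSYNCED: { w: 1, fsync: true },
  REPLICA_ACKNOWLEDGED: { w: 2, j: false },
  MAJORITY: { w: "majority", j: false },
};

// Hypothetical helper: builds a write concern document and attaches wtimeout
// only when the write has to wait for other members.
function writeConcernFor(name, wtimeoutMs) {
  const base = WRITE_CONCERNS[name];
  if (!base) throw new Error(`Unknown write concern: ${name}`);
  const waitsForSecondaries = base.w === "majority" || base.w > 1;
  return waitsForSecondaries && wtimeoutMs !== undefined
    ? { ...base, wtimeout: wtimeoutMs }
    : { ...base };
}

console.log(writeConcernFor("MAJORITY", 5000));
```

In a mongo shell of version 2.6 or later, a document like this can be passed as the writeConcern option, for example db.test.insert(doc, { writeConcern: { w: "majority", wtimeout: 5000 } }). Without wtimeout, a majority write can block indefinitely if too few members are reachable.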
Write concern performance
https://blog.serverdensity.com/mongodb-on-google-compute-engine-tips-and-benchmarks/
3 x 1,000 inserts on GCE
Local 10GB system disk
Dedicated 200GB disk
Dedicated 200GB for data and journal
n1-standard-2
n1-highmem-8
Thanks! Questions?
Now, later today, or @xeraa
Backup Slides
Oplog
Replication via logs
MongoDB: Operations log (Oplog)
MySQL: Binary log (Binlog)
Naive approach: Transmit original query
Statement Based Replication (SBR)
DELETE FROM test.table WHERE quantity > 20 LIMIT 1
db.collection.remove({ quantity: { $gt: 20 }}, true) // justOne: true
Unambiguous representation
Row-Based Replication (RBR): Oplog
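Why the statement above is ambiguous, and why an oplog entry is not, can be shown with two in-memory "replicas" that hold the same documents in a different physical order. This is an illustrative sketch in plain JavaScript; applyStatement and applyOplogEntry are hypothetical helpers, and arrays stand in for collections.

```javascript
// Statement-based: "remove one document with quantity > 20".
// Which document that is depends on each replica's local order.
function applyStatement(replica) {
  const index = replica.findIndex((doc) => doc.quantity > 20);
  return replica.filter((_, i) => i !== index);
}

// Row-based (oplog style): the entry names the exact document by _id.
function applyOplogEntry(replica, entry) {
  return replica.filter((doc) => doc._id !== entry.o._id);
}

const primaryDocs = [{ _id: 1, quantity: 30 }, { _id: 2, quantity: 40 }];
const secondaryDocs = [{ _id: 2, quantity: 40 }, { _id: 1, quantity: 30 }];

// Replaying the statement removes different documents: the replicas diverge.
console.log(applyStatement(primaryDocs), applyStatement(secondaryDocs));

// Replaying the delete entry removes _id 1 everywhere: the replicas agree.
const deleteEntry = { op: "d", ns: "test.table", o: { _id: 1 } };
console.log(applyOplogEntry(primaryDocs, deleteEntry), applyOplogEntry(secondaryDocs, deleteEntry));
```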
MongoDB
Asynchronous replication
Secondaries can get the Oplog from...
their primary
another secondary with more recent data
Oplog size
32bit: 48MB
64bit OS X: 183MB
64bit *nix, Windows: 1GB to 50GB (5% free disk)
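The 64bit *nix and Windows rule above amounts to clamping 5% of the free disk space between 1GB and 50GB. A minimal sketch of that arithmetic follows; defaultOplogSizeBytes is a hypothetical helper, and 1GB is assumed to mean 1024^3 bytes.

```javascript
const GB = 1024 ** 3;

// 5% of free disk space, but never less than 1GB or more than 50GB.
function defaultOplogSizeBytes(freeDiskBytes) {
  // Dividing by 20 avoids the rounding noise of multiplying by 0.05.
  const fivePercent = Math.floor(freeDiskBytes / 20);
  return Math.min(Math.max(fivePercent, 1 * GB), 50 * GB);
}

console.log(defaultOplogSizeBytes(100 * GB) / GB); // 5
console.log(defaultOplogSizeBytes(10 * GB) / GB); // 1
console.log(defaultOplogSizeBytes(2000 * GB) / GB); // 50
```

The demo earlier overrides this default with --oplogSize 20, which sets the size to 20MB.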