MongoDB for Time Series Data Part 3: Sharding


Jake Angerman
Sr. Solutions Architect, MongoDB

#MongoDBWorld

Sharding Time Series Data

Let's Pretend We Are DevOps

[Meme: "What my friends think I do / What society thinks I do / What my Mom thinks I do / What my boss thinks I do / What I think I do / What I really do" panels for DevOps]

Sharding Overview

[Diagram: Application → Driver → Query Routers (mongos) → Shards 1…N, each shard a replica set with one primary and two secondaries]

Why do we need to shard?

• Reaching a limit on some resource:
– RAM (working set)
– Disk space
– Disk IO
– Client network latency on writes (tag aware sharding)
– CPU

Do we need to shard right now?

• Two schools of thought:
1. Shard at the outset to avoid technical debt later
2. Shard later to avoid complexity and overhead today

• Either way, shard before you need to!
– 256 GB data size threshold published in the documentation
– Chunk migrations can cause memory contention and disk IO

[Diagram: free RAM vs. working set over time – things seemed fine… then I waited too long to shard]

collection stats

> db.mdbw.stats()
{
    "ns" : "test.mdbw",
    "count" : 16000,             // one hour's worth of documents
    "size" : 65280000,           // size of user data, padding included
    "avgObjSize" : 4080,
    "storageSize" : 93356032,    // size of data extents, unused space included
    "numExtents" : 11,
    "nindexes" : 1,
    "lastExtentSize" : 31354880,
    "paddingFactor" : 1,
    "systemFlags" : 1,
    "userFlags" : 1,
    "totalIndexSize" : 801248,
    "indexSizes" : { "_id_" : 801248 },
    "ok" : 1
}

Storage model spreadsheet

sensors: 16,000
years to keep data: 6
docs per day: 384,000
docs per year: 140,160,000
docs total across all years: 840,960,000
indexes per day: 801,248 bytes
storage per hour: 63 MB
storage per day: 1.5 GB
storage per year: 539 GB
storage across all years: 3,235 GB
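The spreadsheet figures can be reproduced with quick shell arithmetic. This is just a sketch using the "size" and "totalIndexSize" values from db.mdbw.stats() above; it is not an official sizing tool.

> // storage model arithmetic (one summary document per sensor per hour)
> var sensors = 16000, years = 6;
> var docsPerDay  = sensors * 24;                        // 384,000
> var docsPerYear = docsPerDay * 365;                    // 140,160,000
> var docsTotal   = docsPerYear * years;                 // 840,960,000
> var mbPerHour   = (65280000 + 801248) / (1024 * 1024); // data + index for one hour ≈ 63 MB
> var gbPerYear   = mbPerHour * 24 * 365 / 1024;         // ≈ 539 GB
> var gbTotal     = gbPerYear * years;                   // ≈ 3,235 GB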

Why we need to shard now

539 GB in year one alone

[Chart: total storage (GB) by year, growing from ~539 GB in year 1 to ~3,235 GB in year 6]

16,000 sensors today… 47,000 tomorrow?

What will our sharded cluster look like?

• We need to model the application to answer this question

• Model should include:
– application write patterns (sensors)
– application read patterns (clients)
– analytic read patterns
– data storage requirements

• Two main collections:
– summary data (fast query times)
– historical data (analysis of environmental conditions)

Option 1: Everything in one sharded cluster

[Diagram: one sharded cluster – Shards 1…N, each a replica set with one primary and two secondaries; Shard 1 is the primary shard]

• Issue: prevent analytics jobs from affecting application performance

• Summary data is small (16,000 * N bytes) and accessed frequently

Option 2: Distinct replica set for summaries

[Diagram: sharded cluster (Shards 1…N) for historical data, plus a separate replica set for summary data]

• Pros: operational separation between business functions

• Cons: application must write to two different databases
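To make the "write to two different databases" cost concrete, here is a minimal shell sketch of the dual write path. The hostnames, database and collection names, and the document itself are placeholders for illustration, not taken from the deck.

> // sketch of the two-database write path (names and document are placeholders)
> var summaryDB = (new Mongo("summary-rs.example.net:27017")).getDB("test");
> var clusterDB = (new Mongo("mongos.example.net:27017")).getDB("test");
> var doc = { _id : "900006:140312", speeds : [ 52 ], times : [ 237 ] };
> clusterDB.mdbw.insert(doc);                        // historical data → sharded cluster
> summaryDB.summary.update({ _id : 900006 },
                           { $set : { update : new Date() } },
                           { upsert : true });       // summary data → separate replica set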

Application read patterns

• Web browsers, mobile phones, and in-car navigation devices

• Working set should be kept in RAM

• 5M subscribers * 1% active * 50 sensors/query * 1 device query/min = 41,667 reads/sec

• 41,667 reads/sec * 4080 bytes = 162 MB/sec

– and that's without any protocol overhead

• Gigabit Ethernet is ≈ 118 MB/sec

[Diagram: replica set (primary and two secondaries) serving reads over a 1 Gbps link]
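The arithmetic behind those bullets, as a throwaway shell calculation (a sketch; MB here means 2^20 bytes):

> // read throughput estimate
> var readsPerSec = 5000000 * 0.01 * 50 / 60;         // ≈ 41,667 reads/sec
> var mbPerSec = readsPerSec * 4080 / (1024 * 1024);  // ≈ 162 MB/sec, before protocol overhead
> print(Math.round(readsPerSec) + " reads/sec, " + Math.round(mbPerSec) + " MB/sec");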

Application read patterns (continued)

• Options:
– provision more bandwidth ($$$)
– tune application read pattern
– add a caching layer
– secondary reads from the replica set

[Diagram: replica set with the primary and both secondaries each serving reads over a 1 Gbps link]

Secondary Reads from the Replica Set

• Stale data is OK in this use case

• Caution: a read preference of secondary could be disastrous in a three-member replica set if a secondary fails!

• App servers with mixed read preferences of primary and secondary are operationally cumbersome

• Use the nearest read preference to access all nodes

[Diagram: replica set (primary and two secondaries), each node serving reads over its own 1 Gbps link]

db.collection.find().readPref("nearest")

Replica Set Tags

• App servers in different data centers use replica set tags plus the nearest read preference

• db.collection.find().readPref("nearest", [ { "datacenter": "east" } ])

[Diagram: "east" data center replica set – one primary, two secondaries]

> rs.conf()
{
    "_id" : "rs0",
    "version" : 2,
    "members" : [
        {
            "_id" : 0,
            "host" : "node0.example.net:27017",
            "tags" : { "datacenter" : "east" }
        },
        {
            "_id" : 1,
            "host" : "node1.example.net:27017",
            "tags" : { "datacenter" : "east" }
        },
        {
            "_id" : 2,
            "host" : "node2.example.net:27017",
            "tags" : { "datacenter" : "east" }
        }
    ]
}

Replica Set Tags

• Enables geographic distribution

[Diagram: replica set members distributed across east, central, and west data centers – one primary, two secondaries]

Replica Set Tags

• Enables geographic distribution

• Allows scaling within each data center

[Diagram: east, central, and west data centers, each holding multiple replica set members – one primary overall plus secondaries in every data center]

Analytic read patterns

• How does an analyst look at the data on the sharded cluster?

• 1 Year of data = 539 GB

[Chart: number of machines needed to hold one year of data (539 GB) in RAM vs. server RAM size (32, 64, 128, 192, 256 GB)]
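The chart boils down to a simple ceiling division; a quick shell calculation (mine, not from the deck) reproduces it:

> // machines needed to keep one year (≈539 GB) of data in RAM, per server RAM size
> [ 32, 64, 128, 192, 256 ].forEach(function(ramGB) {
      print(ramGB + " GB/server -> " + Math.ceil(539 / ramGB) + " machines");
  });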

Application write patterns

• 16,000 sensors every minute = 267 writes/sec

• Could we handle 16,000 writes in one second?

– 16,000 writes * 4080 bytes = 62 MB

• Load test the app!
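As a starting point, a crude shell-only load test might look like the sketch below. The document shape and _id format are stripped-down placeholders; a real test should use the actual driver, document sizes, and concurrency.

> // crude load-test sketch: write one burst of 16,000 check-ins and time it
> var start = new Date();
> for (var i = 0; i < 16000; i++) {
      db.mdbw.insert({ _id : i + ":140312", speeds : [ 52 ], times : [ 237 ] });  // placeholder doc
  }
> print("elapsed ms: " + (new Date() - start));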

Modeling the Application - summary

• We modeled:
– application write patterns (sensors)
– application read patterns (clients)
– analytic read patterns
– data storage requirements
– the network, a little bit

Shard Key

Shard Key characteristics

• A good shard key has:
– sufficient cardinality
– distributed writes
– targeted reads ("query isolation")

• The shard key should be in every query if possible
– scatter-gather otherwise

• Choosing a good shard key is important!
– affects performance and scalability
– changing it later is expensive

Hashed shard key

• Pros:
– evenly distributed writes

• Cons:
– random data (and index) updates can be IO intensive
– range-based queries turn into scatter-gather

[Diagram: mongos spreading hashed writes evenly across Shards 1…N]
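For reference, declaring a hashed shard key in the shell could look like the sketch below. The "test.mdbw" namespace is borrowed from the stats slide earlier; this is an illustration, not the deck's recommendation.

> // hashed shard key: writes spread evenly, but range queries scatter-gather
> sh.enableSharding("test")
> sh.shardCollection("test.mdbw", { _id : "hashed" })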

Low cardinality shard key

• Induces "jumbo chunks"

• Examples: sensor ID

[Diagram: mongos routing writes across Shards 1…N; one unsplittable chunk [ a, b ) grows into a jumbo chunk]

Ascending shard key

• Monotonically increasing shard key values cause "hot spots" on inserts

• Examples: timestamps, _id

[Diagram: mongos sending all new inserts to the highest chunk [ ISODate(…), $maxKey ) on a single shard]

Choosing a shard key for time series data

• Consider a compound shard key: {arbitrary value, incrementing value}

• Best of both worlds – local hot spotting, targeted reads

[Diagram: mongos distributing chunks across Shards 1…N; each shard holds contiguous per-value ranges, e.g. [ {V1, ISODate(A)}, {V1, ISODate(B)} ), [ {V1, ISODate(B)}, {V1, ISODate(C)} ), …]

What is our shard key?

• Let's choose: {linkID, date}
– example: { linkID: 9000006, date: 140312 }
– example: { _id: "900006:140312" }
– this application's _id is in this form already, yay!
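A minimal sketch of enabling this shard key in the shell, assuming the "test.mdbw" namespace from the stats slide; since _id already encodes {linkID, date}, range-sharding on _id gives the compound {arbitrary, incrementing} behavior described above.

> // range-shard the historical collection on _id ("linkID:date")
> sh.enableSharding("test")
> sh.shardCollection("test.mdbw", { _id : 1 })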

Summary

• Model the read/write patterns and storage

• Choose an appropriate shard key

• DevOps influenced the application:
– write recent summary data to a separate database
– replica set tags for the summary database
– avoid synchronous sensor check-ins
– consider changing client polling frequency
– consider throttling REST API access to app servers

Which DevOps person are you?

Jake Angerman
Sr. Solutions Architect, MongoDB

#MongoDBWorld

Thank You

Sharding Experimentation

$ mongo --nodb
> // spin up a single-shard test cluster with a 1 MB chunk size (ShardingTest is a shell test helper)
> cluster = new ShardingTest({"shards": 1, "chunksize": 1})

$ mongo --nodb
> // now connect to mongos on 30999
> db = (new Mongo("localhost:30999")).getDB("test")

I decided to shard from the outset

• Sensor summary documents can all fit in RAM

– 16,000 sensors * N bytes

• Velocity of sensor events is only 267 writes/sec

• Volume of sensor events is what dictates sharding

{ _id : <linkID>,
  update : ISODate("2013-10-10T23:06:37.000Z"),
  last10 : {
    avgSpeed : <int>,
    avgTime : <int>
  },
  lastHour : {
    avgSpeed : <int>,
    avgTime : <int>
  },
  speeds : [ 52, 49, 45, 51, ... ],
  times : [ 237, 224, 246, 233, ... ],
  pavement : "Wet Spots",
  status : "Wet Conditions",
  weather : "Light Rain"
}
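For completeness, a sketch of the kind of summary lookup the clients issue against documents like the one above; the collection name ("summary") and the linkID values are assumptions for the illustration, not from the deck.

> // fetch the current summary for one road segment (names and IDs are illustrative)
> db.summary.findOne({ _id : 9000006 })
> // a client device can pull its ~50 sensors in a single round trip
> db.summary.find({ _id : { $in : [ 9000006, 9000007, 9000008 /* … ~50 linkIDs … */ ] } })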


Configuration Options

> conf = {
    _id : "mySet",
    members : [
      {_id : 0, host : "A", priority : 3},                       // preferred primary
      {_id : 1, host : "B", priority : 2},
      {_id : 2, host : "C"},
      {_id : 3, host : "D", hidden : true},                      // hidden from clients (e.g. for analytics)
      {_id : 4, host : "E", hidden : true, slaveDelay : 3600}    // delayed by one hour
    ]
  }
> rs.initiate(conf)



Tag Aware Sharding

• Control where data is written to, and read from

• Each member can have one or more tags:
– tags: {dc: "ny"}
– tags: {dc: "ny", subnet: "192.168", rack: "row3rk7"}

• Replica set defines rules for write concerns (see the sketch after this list)

• Rules can change without changing app code
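A sketch of such a write-concern rule, assuming the members carry distinct dc tag values (e.g. {dc: "ny"}, {dc: "sf"}); the mode name "multiDC" is made up for the illustration, and the document is a placeholder.

> // custom write-concern rule built from member tags:
> // "multiDC" requires acknowledgement from members in 2 different data centers
> var conf = rs.conf();
> conf.settings = { getLastErrorModes : { multiDC : { dc : 2 } } };  // for brevity, replaces any existing settings
> rs.reconfig(conf);
> // the rule can later be tightened or relaxed without touching application code
> db.mdbw.insert({ _id : "900006:140312", speeds : [ 52 ] });
> db.runCommand({ getLastError : 1, w : "multiDC" });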

Recommended