Percona Live 2016

Kimberly Wilkins

Updated Sharding Guidelines in MongoDB 3.x with Storage Engine Considerations

Principal Engineer - Databases, Rackspace/ObjectRocket

www.linkedin.com/in/wilkinskimberly, @dba_denizen, [email protected]


My Background

•  18+ years working on various database platforms
•  Mainly Oracle (databases, RAC, Enterprise Manager, GoldenGate replication, Data Guard, Database Vault, Exadata)
•  MongoDB, NoSQL, and Big Data infrastructure and technologies at ObjectRocket (OR)
•  Industries: early online auto auctions, gaming, social media
•  Specialties: re-architecting enterprise database environments, infrastructure, implementations, RAC, replication, system kernels, database storage
•  Re-engineered the database infrastructure for SWTOR (Star Wars: The Old Republic MMO game)

Overview - Sharding

•  What is Sharding?
•  Why do it?
•  When to Shard? When not to Shard?
•  Sharding Process
•  Selecting Shard Keys
•  Specific Tips and Examples, Managing Shards and Scaling
•  Radical ideas and Storage Engine Considerations

Sharding – What is it?

• Sharding = horizontal scaling, partitioning
• Scale out: add physical or virtual hosts
• Add supporting network and app layers

(Diagram: redundancy; flexible, scalable architectures; add resources on the fly; HA, DR, fault tolerant; clusters; many different sources and types)

Commodity Hardware vs Big Iron

• Multiple smaller hosts, or virtuals/containerization
• Larger single servers with massive CPU, RAM, SANs

Out (Horizontal) vs Up (Vertical)

• Multiple smaller hosts, or virtuals/containerization
• Larger single servers with massive CPU, RAM, SANs
• Out NOT up: more, smaller hosts rather than fewer, BIGGER ones

Why Do it?

• Scalability, Performance, High Availability, Redundancy

High Availability Matters; Redundancy Matters.

Without a way to post, creep, and comment on things within the world's most popular social network, many turned to Twitter with their updates. PANIC!!

“#facebookdown day 1, minute 3: we still have electricity, poppa has been hoarding antibiotics and microbrews. We're gonna ride this out.”

There are people on the streets hurling printed-out pics of their kids at strangers bellowing “Like them. Like them.” #facebookdown

Why Do it? Big Data Requirements

Now Internet of All…

IoT, IoE - Internet of Things, Internet of Everything $$

Why Do it? … Why NoSQL?

• Faster, more flexible development: 24%
• Lower software, hardware, and deployment $$: 21%
• Performance: faster writes, faster reads
• Developers: "schemaless", cool toys
• Far more developers than DBAs
• Variety of NoSQL technologies

RDBMS | NoSQL
records | documents
tables | collections, buckets, tables
rows | fields
set data types | flexible data types
rigid schemas, structured data | unstructured & structured data
primary keys | document or ObjectIds
normalized | de-normalized
referential integrity | duplicated data is OK
joins | index intersections, partials

Page 15: Percona Live 2016 · Percona Live 2016 Kimberly Wilkins Updated Sharding Guidelines in MongoDB 3.x with Storage Engine Considerations Principal Engineer - Databases ... world's most

When To Shard?

• Need Better Performance
• Need Additional Write Scopes
• App development today => think ahead, expect growth
• EARLY: Shard BEFORE You Run Out of Resources
• Have Different Use Cases
• Best Tool for the Job, aka Polyglot Persistence

Why MongoDB Specifically?

How to Shard?

• Architectural Overview
• General Process and Steps for Sharding
• Shard Key Selection
• Details, Examples, and Tips
• Managing Shards and Replication
• Radical ideas and Storage Engine Considerations

Basic MongoDB Architecture: Single Replica Set

(Diagram: one Primary and two Secondaries, with heartbeats between the members)

MongoDB 3.2 Sharded Cluster

(Diagram)
• Client drivers connect to the MongoS tier (routers): four mongos instances
• MongoD tier: Shard 1, Shard 2, and Shard 3, each a replica set with one Primary and two Secondaries
• Config servers (metadata): Config 1, Config 2, Config 3, deployed as a replica set in 3.2
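In this architecture the mongos tier decides which shards a query must reach, using the chunk metadata held by the config servers. The minimal sketch below illustrates that decision under stated assumptions: a hypothetical in-memory chunk map of [min, max) ranges on a single-field shard key (userId here is made up), with MinKey/MaxKey modeled as -Infinity/Infinity. It is not MongoDB's routing code, only the idea behind targeted queries versus scatter-gathers.

```javascript
// Hypothetical chunk map: each chunk owns the range [min, max) of the shard key.
// MinKey/MaxKey are modeled as -Infinity/Infinity in this sketch.
const chunks = [
  { min: -Infinity, max: 1000, shard: "shard1" },
  { min: 1000, max: 2000, shard: "shard2" },
  { min: 2000, max: Infinity, shard: "shard3" },
];

// Returns the shards a query must be sent to.
function targetShards(chunkMap, shardKeyField, query) {
  const allShards = [...new Set(chunkMap.map((c) => c.shard))];
  // No shard key in the predicate: the router has to scatter-gather to every shard.
  if (!(shardKeyField in query)) {
    return allShards;
  }
  const value = query[shardKeyField];
  // Equality on the shard key: exactly one chunk owns the value, so one shard.
  // Anything other than a plain comparable value falls back to all shards here.
  const owner = chunkMap.find((c) => value >= c.min && value < c.max);
  return owner ? [owner.shard] : allShards;
}

console.log(targetShards(chunks, "userId", { userId: 1500 })); // targeted: one shard
console.log(targetShards(chunks, "userId", { email: "a@example.com" })); // scatter-gather
```

This is why the shard key should appear in your most frequent queries: without it, every query fans out to all shards.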

Shard Key Considerations

•  Good key, good performance. Bad key, bad performance.
•  NO PERFECT shard key: trade-offs (users/social apps)
•  Shard key must be in all docs and is immutable **
•  Shard key should be used in queries: know your query patterns
•  Easily divisible for balanced chunks: increase cardinality
•  Consider compound keys to better limit the return set
•  Shard early, shard often: it is impactful, so don't wait

ASK <Additional> QUESTIONS!!!

•  What does your app do? How does it work?
•  More read heavy? More write heavy? Balanced 50/50?
•  Is 1 activity more important than others? Ex.: we write a lot, but we make our $ by people querying
•  Expected growth patterns: per week? per month? per year?
•  Busy times of day? week? month? year?
•  Bulk loads/deletes? Ever? When?
•  Current pain or performance problem areas?

How to Shard – General Steps

•  Run the profiler at level ALL (db.setProfilingLevel(2)) for a REPRESENTATIVE time period
•  Type of queries, # per namespace
•  Patterns and predicates via aggregations
•  Check for nulls: NO nulls allowed in shard keys
•  Consider compound keys to limit the return set
•  Check cardinality on secondaries (less hurtful!!)
•  NO PERFECT shard key: trade-offs (users/social apps)
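The null, cardinality, and frequency checks above can be sketched as one small function over a sample of documents, for example a batch exported from a secondary. The function name evaluateShardKey and the sample documents are hypothetical; on a real cluster the equivalent work is done with aggregations like the ones later in this deck.

```javascript
// Evaluate a candidate shard key field over a sample of documents.
// Reports missing/null values (flagged by the checklist above), cardinality
// (distinct values), and the share of documents held by the most frequent value.
function evaluateShardKey(docs, field) {
  const counts = new Map();
  let missingOrNull = 0;
  for (const doc of docs) {
    const v = doc[field];
    if (v === undefined || v === null) {
      missingOrNull++;
      continue;
    }
    const k = JSON.stringify(v); // compare object/compound values by content
    counts.set(k, (counts.get(k) || 0) + 1);
  }
  let maxCount = 0;
  for (const c of counts.values()) maxCount = Math.max(maxCount, c);
  const present = docs.length - missingOrNull;
  return {
    field,
    missingOrNull,
    cardinality: counts.size,
    topValueShare: present === 0 ? 0 : maxCount / present,
  };
}

// Hypothetical sample documents.
const sample = [
  { _id: 1, ContainerId: "a", region: "us" },
  { _id: 2, ContainerId: "b", region: "us" },
  { _id: 3, ContainerId: "c", region: "us" },
  { _id: 4, ContainerId: "a", region: null },
];
console.log(evaluateShardKey(sample, "ContainerId"));
console.log(evaluateShardKey(sample, "region"));
```

A good candidate shows no nulls, high cardinality, and a small topValueShare; "region" here fails on all three.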

How to Shard – Specific Tasks

•  Perform profiling and query pattern analysis
•  Select the BEST option for the shard key
•  Create the required shard key index
•  Disable the balancer
•  Enable sharding at the DB level
•  Shard the collection / add shards
•  Re-enable the balancer
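The task list maps onto a fixed sequence of mongo shell commands. The sketch below only builds the command strings in that order so they can be reviewed before being pasted into a mongos shell; it does not connect to anything. The helper buildShardingPlan is hypothetical, while db.getSiblingDB(), getCollection(), createIndex(), sh.stopBalancer(), sh.enableSharding(), sh.shardCollection(), and sh.startBalancer() are standard mongo shell helpers.

```javascript
// Build the ordered mongo shell commands for sharding one collection.
// Nothing is executed here; the output is meant for review and copy/paste.
function buildShardingPlan(dbName, collName, shardKey) {
  const ns = `${dbName}.${collName}`;
  const key = JSON.stringify(shardKey);
  return [
    // 1. Create the required shard key index (background build).
    `db.getSiblingDB(${JSON.stringify(dbName)}).getCollection(${JSON.stringify(collName)}).createIndex(${key}, {background: true})`,
    // 2. Disable the balancer.
    "sh.stopBalancer()",
    // 3. Enable sharding at the DB level.
    `sh.enableSharding(${JSON.stringify(dbName)})`,
    // 4. Shard the collection.
    `sh.shardCollection(${JSON.stringify(ns)}, ${key})`,
    // 5. Re-enable the balancer.
    "sh.startBalancer()",
  ];
}

buildShardingPlan("users", "users", { _id: "hashed" }).forEach((cmd) => console.log(cmd));
```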

Sample Shard Key Evaluation Queries/Aggs

•  **Run queries and aggregations against unused secondaries**

SECONDARY> rs.slaveOk()
SECONDARY> db.events.new.aggregate([{$project:{"ContainerId":1}},{$group:{_id:"$ContainerId"}},{$group:{_id:1,count:{$sum:1}}}],{allowDiskUse:true})
{ "_id" : 1, "count" : 3303464 }
** Note the good cardinality here **

•  Top 20 most frequent ContainerId values:

SECONDARY> db.events.news.aggregate([{$project:{"ContainerId":1}},{$group:{_id:"$ContainerId",number:{$sum:1}}},{$sort:{number:-1}},{$limit:20}],{allowDiskUse:true})
{ "_id" : "pnx-xxxxxxxx.003", "number" : 46889 }
{ "_id" : "jhx-xxxxxxxx.002", "number" : 23644 }
{ "_id" : "3tq9-xxxxxxxx.001", "number" : 17769 }
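For readers newer to the aggregation framework, the two pipelines above compute (1) the number of distinct ContainerId values, via a $group on the field followed by a $group that counts the groups, and (2) the most frequent values with their document counts. The sketch below reproduces the same logic over an in-memory array of scalar values; the sample documents are hypothetical stand-ins for the events collection.

```javascript
// $group puts documents with a missing field under null, so normalize undefined to null.
const groupKey = (doc, field) => (doc[field] === undefined ? null : doc[field]);

// Same result as [{$project:{f:1}}, {$group:{_id:"$f"}}, {$group:{_id:1, count:{$sum:1}}}].
function distinctCount(docs, field) {
  return new Set(docs.map((d) => groupKey(d, field))).size;
}

// Same result as [{$project:{f:1}}, {$group:{_id:"$f", number:{$sum:1}}},
//                 {$sort:{number:-1}}, {$limit:n}] (order among ties aside).
function topValues(docs, field, n) {
  const counts = new Map();
  for (const d of docs) {
    const k = groupKey(d, field);
    counts.set(k, (counts.get(k) || 0) + 1);
  }
  return [...counts.entries()]
    .map(([value, number]) => ({ _id: value, number }))
    .sort((a, b) => b.number - a.number)
    .slice(0, n);
}

// Hypothetical sample standing in for the events collection.
const events = [
  { ContainerId: "pnx-1" }, { ContainerId: "pnx-1" }, { ContainerId: "pnx-1" },
  { ContainerId: "jhx-2" }, { ContainerId: "jhx-2" },
  { ContainerId: "3tq9-3" },
];
console.log(distinctCount(events, "ContainerId"));
console.log(topValues(events, "ContainerId", 2));
```

High distinct counts combined with no single value dominating the top of the list is what "good cardinality" means in practice.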

Another Shard Key Aggregation example

•  Run an aggregation to find the most common reference id (rid) values, sorted to give you the top 5:

SECONDARY> db.items.aggregate([{$project:{rid:1}},{$group:{_id:"$rid",count:{$sum:1}}},{$sort:{count:-1}},{$limit:5}])

Another Shard Key Aggregation example (cont’d)

{ "_id" : ObjectId("f58400d9a5e140d83af22035"), "count" : 1719248 }
{ "_id" : ObjectId("f1430058d66d4d2861c1f435"), "count" : 1618900 }
{ "_id" : ObjectId("eb80103780289205d2ed1645"), "count" : 1205436 }
{ "_id" : ObjectId("ee220058d66d4d2853495435"), "count" : 1194683 }
{ "_id" : ObjectId("cd0c103780289205fe7bb845"), "count" : 1158741 }
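A single shard key value can never be split across chunks, so a value this frequent is a jumbo-chunk risk if rid alone were the shard key. The sketch below estimates how much data the top rid value would hold, assuming a hypothetical average document size of 200 bytes (take the real avgObjSize from db.items.stats()) and the 64 MB default chunk size of MongoDB 3.x. The helper name jumboRisk is hypothetical.

```javascript
const DEFAULT_CHUNK_SIZE_BYTES = 64 * 1024 * 1024; // MongoDB 3.x default chunk size (64 MB)

// Estimate the bytes held by one shard key value and compare them to the chunk size.
function jumboRisk(docCount, avgObjSizeBytes, chunkSizeBytes = DEFAULT_CHUNK_SIZE_BYTES) {
  const bytes = docCount * avgObjSizeBytes;
  return {
    bytes,
    chunksWorth: bytes / chunkSizeBytes, // how many full chunks this one value would fill
    exceedsChunk: bytes > chunkSizeBytes, // true => an unsplittable (jumbo) chunk
  };
}

// Top rid count from the output above; 200 bytes/doc is an ASSUMED average size.
console.log(jumboRisk(1719248, 200));
```

When one value outgrows a chunk, a compound key (for example rid plus a higher-cardinality field) gives the balancer something to split on.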

Actual Sample Sharding Commands

•  Shard key selection analysis and considerations
•  Create the required index:

use users
db.users.createIndex( { "_id" : "hashed" }, { background : true } );

•  Enable sharding at the db level:

use admin
db.runCommand( { enablesharding : "users" } );

•  Shard the collection:

db.adminCommand( { shardCollection : "users.users", key : { "_id" : "hashed" } } );

Pre-Sharding for Very Active, Larger Collections

•  Connect to a 'non-real' mongo shell (mongo --nodb)
•  Use JavaScript to create JavaScript for the desired weeks
•  Start a screen or tmux session and name it
•  Connect via mongos to your real desired instance, using the admin db
•  Use the generated scripts/commands to enable sharding at the db level, then create the collections with the desired # of pre-allocated initial chunks

Snippet - javascript to create javascript

[host1] > mongo --nodb
// Print the enablesharding and shardCollection commands for weeks 16 and 17
count = 16;
while (count <= 17) {
  print("db.runCommand( { enablesharding : 'clicks-2016-" + count + "' } ) ;");
  print("db.adminCommand({shardCollection:'clicks-2016-" + count + ".clicks-2016-" + count + "', key:{'_id' : 'hashed'}, numInitialChunks : 2000});");
  count++;
}
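The same generator can be parameterized, and numInitialChunks can be chosen rather than hard-coded. The sketch below is a hypothetical sizing heuristic, not the deck's method: it picks enough initial chunks for the expected weekly data at an assumed target chunk size, then rounds up to a multiple of the shard count so every shard starts with the same number (2000 chunks over 4 shards is 500 each, as in the sh.status() output that follows). It also enforces the documented shardCollection rule that numInitialChunks must be less than 8192 per shard; numInitialChunks only applies to an empty collection with a hashed shard key. pickInitialChunks and weeklyCommands are hypothetical names.

```javascript
// shardCollection rule: numInitialChunks must be less than 8192 per shard.
const MAX_INITIAL_CHUNKS_PER_SHARD = 8192;

// Hypothetical heuristic: enough chunks for the expected data, divisible by shard count.
function pickInitialChunks(expectedBytes, targetChunkBytes, shardCount) {
  const needed = Math.max(1, Math.ceil(expectedBytes / targetChunkBytes));
  const perShard = Math.ceil(needed / shardCount); // round up so shards get equal counts
  if (perShard >= MAX_INITIAL_CHUNKS_PER_SHARD) {
    throw new Error("numInitialChunks would exceed the per-shard limit");
  }
  return perShard * shardCount;
}

// Emit the commands for every week in [firstWeek, lastWeek], one db per week.
function weeklyCommands(year, firstWeek, lastWeek, numInitialChunks) {
  const cmds = [];
  for (let week = firstWeek; week <= lastWeek; week++) {
    const name = `clicks-${year}-${week}`;
    cmds.push(`db.runCommand({ enablesharding: '${name}' });`);
    cmds.push(`db.adminCommand({ shardCollection: '${name}.${name}', key: { '_id': 'hashed' }, numInitialChunks: ${numInitialChunks} });`);
  }
  return cmds;
}

// ASSUMED inputs: ~60 GiB per week, 32 MiB target chunks, 4 shards.
const n = pickInitialChunks(60 * 1024 ** 3, 32 * 1024 ** 2, 4);
weeklyCommands(2016, 16, 17, n).forEach((c) => console.log(c));
```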

Running Snippet to actually pre-create chunks

mongos> sh.getBalancerState()
false
mongos> sh.isBalancerRunning()
false
mongos> sh.stopBalancer()

db.runCommand( { enablesharding : 'clicks-2016-16' } ) ;
db.adminCommand({shardCollection:'clicks-2016-16.clicks-2016-16', key:{'_id' : 'hashed'}, numInitialChunks : 2000});
……
db.runCommand( { enablesharding : 'clicks-2016-17' } ) ;
db.adminCommand({shardCollection:'clicks-2016-17.clicks-2016-17', key:{'_id' : 'hashed'}, numInitialChunks : 2000});

Confirm pre-created chunks and balance

mongos> sh.status()

{ "_id" : "clicks-2016-15", "partitioned" : true, "primary" : "133b848dc15d027a626100a490de2430" }
    clicks-2016-15.clicks-2016-15
        shard key: { "_id" : "hashed" }
        chunks:
            133b848dc15d027a626100a490de2430  1030
            14c15aaec95f9b40f00007e6336d3e08  638
            << 2 lines removed; counts there are still growing with natural splits >>
{ "_id" : "clicks-2016-16", "partitioned" : true, "primary" : "133b848dc15d027a626100a490de2430" }
    clicks-2016-16.clicks-2016-16      << just checking that the correct weeks were created >>
        shard key: { "_id" : "hashed" }
        chunks:
            133b848dc15d027a626100a490de2430  500
            14c15aaec95f9b40f00007e6336d3e08  500
            fa2b52148f1bfe3621b50ca9e3b3e5e2  500
            fbdf42a1e05260c8a58057ed7e2eb77b  500
        too many chunks to print, use verbose if you want to force print
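To judge whether a distribution like this is "balanced" from the balancer's point of view, compare the gap between the most- and least-loaded shards with the migration threshold. The thresholds encoded below are the ones documented for the MongoDB 3.x balancer (2 for fewer than 20 chunks, 4 for 20 to 79, 8 for 80 or more); the helper names are hypothetical and the chunk counts come from the clicks-2016-16 output above.

```javascript
// Migration thresholds documented for the MongoDB 3.x balancer, keyed on total chunks.
function migrationThreshold(totalChunks) {
  if (totalChunks < 20) return 2;
  if (totalChunks < 80) return 4;
  return 8;
}

// chunkCounts: { shardName: numberOfChunks }, as read from sh.status().
function balanceReport(chunkCounts) {
  const counts = Object.values(chunkCounts);
  const total = counts.reduce((a, b) => a + b, 0);
  const spread = Math.max(...counts) - Math.min(...counts);
  const threshold = migrationThreshold(total);
  // The balancer starts migrating once the spread reaches the threshold.
  return { total, spread, threshold, balancerWouldMigrate: spread >= threshold };
}

// clicks-2016-16 right after pre-splitting: 2000 chunks, 500 per shard.
console.log(balanceReport({
  "133b848dc15d027a626100a490de2430": 500,
  "14c15aaec95f9b40f00007e6336d3e08": 500,
  "fa2b52148f1bfe3621b50ca9e3b3e5e2": 500,
  "fbdf42a1e05260c8a58057ed7e2eb77b": 500,
}));
```

Pre-splitting with numInitialChunks starts the collection at a spread of 0, so the balancer has nothing to move while the weekly collection fills.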

3 Very Different Use Cases for Sharding

•  1st case: large # of small shards
   – MANY smaller shards, as they need additional write scopes
•  2nd case: medium # of medium-sized shards
   – Larger, still need write scopes, but without users spread so far across all of the shards when reading
•  3rd case: smaller # of larger shards
   – Need additional resources for a higher number of connections and a higher number of queries
•  IN ALL 3 cases, they are sharded on the write-friendly "_id" : "hashed"

BY: large # of small shards
•  App: mobile analytics and marketing app
•  Shard key: "_id" : "hashed"
•  Data: 256 million smaller user docs of ~2143 bytes; smaller user updates and campaigns
•  Shards: 45 shards @ 20G plan size
•  Load: 100-160 queries per second; 20-40 updates per second
•  Connections: ~1200 per shard × 45 shards, so ~54,000 connections
•  Why: need more, smaller shards for a lot more write scopes
•  Balance: well balanced chunks and disks

DR: medium # of medium shards
•  App: social media app holding connective user data
•  Shard key: "_id" : "hashed"
•  Data: ~82 million bigger user docs of ~26036 bytes
•  Shards: 22 shards @ 100G plan size
•  Load: 100-125 queries per second; 85-110 updates per second
•  Connections: ~4000 per shard × 22 shards, so ~88,000 connections
•  Why: need more write scopes, but not the associated spread-out scatter-gathers, so not as many shards
•  Balance: well balanced chunks and disks AFTER initially taking a while to get balanced

BS: small # of large shards
•  App: mobile game marketing and monetization customer
•  Shard key: "_id" : "hashed"
•  Data: ~10 billion smaller device docs of ~252 bytes; lots and lots of devices (mobile phones)
•  Shards: 7 shards @ 500G plan size
•  Load: 400-2000 queries per second (have seen up to 300,000 QPS); 20-40 updates per second; 10-20 inserts per second
•  Connections: ~5700 per shard × 7 shards, so ~40,000 connections
•  Why: need the additional resources of larger shards due to the higher number of queries and connections and the smaller size of objects
•  Balance: not balanced naturally; must manipulate via numInitialChunks at the point of new db and sharded collection creation
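The document counts, average document sizes, and shard counts above can be cross-checked with simple arithmetic: raw data per shard ≈ documents × average document size ÷ shards. The sketch ignores indexes, compression, and free space, so treat it as a rough sanity check against the plan sizes rather than a capacity model; the helper name is hypothetical.

```javascript
const GB = 1e9; // decimal gigabytes keep the arithmetic easy to follow

// Raw (uncompressed, index-free) data per shard in GB.
function rawDataPerShardGB(docCount, avgDocBytes, shardCount) {
  return (docCount * avgDocBytes) / shardCount / GB;
}

// Figures taken from the three use cases above.
const cases = [
  { name: "BY", docs: 256e6, avgBytes: 2143, shards: 45, planGB: 20 },
  { name: "DR", docs: 82e6, avgBytes: 26036, shards: 22, planGB: 100 },
  { name: "BS", docs: 10e9, avgBytes: 252, shards: 7, planGB: 500 },
];

for (const c of cases) {
  const perShard = rawDataPerShardGB(c.docs, c.avgBytes, c.shards);
  console.log(`${c.name}: ~${perShard.toFixed(1)} GB raw per shard vs ${c.planGB}G plan`);
}
```

This works out to roughly 12 GB, 97 GB, and 360 GB of raw data per shard, each within its plan size, with DR running closest to its limit.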

Bad Shard Keys…. $#@#$

Bad Shard Keys… Bad Performance

•  Hot spotting for writes
•  Hot spotting for reads
•  Disk imbalance
•  Jumbo chunks
•  Slow queries
•  Slow performance
•  Slow apps
•  Angry customers
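Write hot spotting is easiest to see with a monotonically increasing key such as an ObjectId-style _id. With ranged sharding, every new insert falls into the chunk that ends at MaxKey, so one shard takes all the writes; a hashed key scatters consecutive values, which is why all three use cases above shard on "_id" : "hashed". The sketch uses the MurmurHash3 32-bit finalizer as a stand-in hash (MongoDB's hashed index uses its own 64-bit hash) and hypothetical, equal-width chunk boundaries over four shards.

```javascript
// MurmurHash3 32-bit finalizer: a stand-in for MongoDB's hashed-index function.
function mix32(x) {
  let h = x >>> 0;
  h ^= h >>> 16;
  h = Math.imul(h, 0x85ebca6b);
  h ^= h >>> 13;
  h = Math.imul(h, 0xc2b2ae35);
  h ^= h >>> 16;
  return h >>> 0;
}

// Ranged chunks on the raw key: [MinKey,250) [250,500) [500,750) [750,MaxKey).
function rangedShard(key) {
  const bounds = [250, 500, 750];
  let shard = 0;
  while (shard < bounds.length && key >= bounds[shard]) shard++;
  return shard;
}

// Hashed chunks: four equal-width ranges over the 32-bit hash space.
function hashedShard(key) {
  return Math.floor(mix32(key) / 2 ** 30);
}

// Count how many keys each shard receives.
function distribute(keys, shardFn, shardCount = 4) {
  const perShard = new Array(shardCount).fill(0);
  for (const k of keys) perShard[shardFn(k)]++;
  return perShard;
}

// 1,000 new inserts with increasing keys, all beyond the existing data.
const newKeys = Array.from({ length: 1000 }, (_, i) => 1000 + i);
console.log("ranged:", distribute(newKeys, rangedShard)); // every insert lands on the last shard
console.log("hashed:", distribute(newKeys, hashedShard)); // inserts spread across the shards
```

The trade-off is on the read side: range queries on a hashed key can no longer be targeted and become scatter-gathers.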

Bad Shard Key… What to Do?

Fix It!!!
•  Dump & restore
•  Drains

Bad Shard Key… Fixing

•  Dump and Restore
   – Dump the collection; drop the collection; recreate the collection
   – Re-shard the collection; restore the collection
•  Drain Shard
   – Estimate moveChunk time:
     db.getSiblingDB("config").changelog.find({"what" : "moveChunk.commit"},{time:1,_id:0}).sort({time:-1})
   – Run a JS script to generate moveChunk commands
   – Stop the balancer; run the moveChunk script; run the removeShard command twice; restart the balancer
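The changelog query above returns the commit times of recent migrations. A rough drain estimate, assuming those migrations ran back to back, is the average gap between consecutive commits multiplied by the number of chunks still on the shard being drained. The helper name and timestamps below are hypothetical; in practice the chunk count would come from config.chunks, for example db.getSiblingDB("config").chunks.count({shard: "<shardName>"}).

```javascript
// times: Date values from config.changelog "moveChunk.commit" entries (any order).
function estimateDrainMinutes(times, chunksToMove) {
  if (times.length < 2) throw new Error("need at least two moveChunk.commit entries");
  const sorted = times.map((t) => t.getTime()).sort((a, b) => a - b);
  // Average gap between consecutive commits approximates the time per migration,
  // assuming the balancer was migrating continuously during that window.
  const avgGapMs = (sorted[sorted.length - 1] - sorted[0]) / (sorted.length - 1);
  return (avgGapMs * chunksToMove) / 60000;
}

// Hypothetical commit times, returned newest-first as by .sort({time:-1}).
const sampleTimes = [
  new Date("2016-04-18T10:01:30Z"),
  new Date("2016-04-18T10:01:00Z"),
  new Date("2016-04-18T10:00:30Z"),
  new Date("2016-04-18T10:00:00Z"),
];
console.log(estimateDrainMinutes(sampleTimes, 1200)); // minutes to move 1200 chunks
```

Gaps include any idle time between balancer rounds, so a busy cluster may drain faster or slower than this estimate.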

Bad Shard Key… Fixing … script examples
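The "generate moveChunk commands" step can be sketched along these lines: take the chunks owned by the draining shard and emit one moveChunk command document per chunk, cycling through the remaining shards. This is a sketch under assumptions: the chunk documents mirror the ns/min/max/shard fields of config.chunks, the shard names and boundary values are hypothetical, and bounds is used instead of find because chunks of a hashed shard key are addressed by their hashed boundaries. In a mongo shell, each document would be run with db.adminCommand(cmd) while the balancer is stopped.

```javascript
// Build moveChunk command documents that drain one shard, round-robin across targets.
// chunks: documents shaped like config.chunks entries ({ ns, min, max, shard }).
function buildDrainCommands(chunks, ns, drainingShard, targetShards) {
  if (targetShards.length === 0) throw new Error("need at least one target shard");
  const cmds = [];
  let i = 0;
  for (const chunk of chunks) {
    if (chunk.ns !== ns || chunk.shard !== drainingShard) continue;
    cmds.push({
      moveChunk: ns,
      bounds: [chunk.min, chunk.max], // bounds (not find) so hashed shard keys work
      to: targetShards[i % targetShards.length],
    });
    i++;
  }
  return cmds;
}

// Hypothetical chunk metadata for a hashed _id key.
const chunkDocs = [
  { ns: "users.users", min: { _id: -300 }, max: { _id: -100 }, shard: "rs3" },
  { ns: "users.users", min: { _id: -100 }, max: { _id: 100 }, shard: "rs1" },
  { ns: "users.users", min: { _id: 100 }, max: { _id: 300 }, shard: "rs3" },
  { ns: "users.users", min: { _id: 300 }, max: { _id: 500 }, shard: "rs3" },
];

buildDrainCommands(chunkDocs, "users.users", "rs3", ["rs1", "rs2"])
  .forEach((c) => console.log(JSON.stringify(c)));
```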

Larger Replica Set vs. Sharding

Replica Set | Sharding
Want simplification | Expertise for sharding
Lots of reads; don't want scatter-gathers | Lots of writes/updates; want to go directly to exact shards
Lots of data, lower activity | Lots of data, lots of activity
Need more 'normal' resources, just disks | Need more of all resources: disks, RAM, CPU, write scopes
Application knowledge | Application knowledge
Religious war: do not engage | Religious war: do not engage

Storage Engine Considerations

•  Workloads
•  Document sizes: now and future
•  Collection sizes: now and future
•  Hardware

WiredTiger vs. MMAPv1

WiredTiger | MMAPv1
Frequent writes, inserts, appends | Still better for heavy read loads
Compression; defragmentation | No compression; fragmentation
Intent-level locking (document level) | Collection-level locking
Mass bulk loads, small docs | Updates in place, esp. docs that grow
Complete write and replace | Updates existing docs, which grow and move
Cache eviction settings and issues; cache settings, threads | Will use all memory allocated (memory-mapped files)
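Because WiredTiger's internal cache is sized from host RAM, the cache-settings row matters when choosing between many small shards and a few large ones. The sketch encodes the default cache size as documented for two 3.x releases (3.0: the larger of 50% of RAM or 1 GB; 3.2: the larger of 60% of RAM minus 1 GB, or 1 GB); verify against the documentation for your exact version, and note that storage.wiredTiger.engineConfig.cacheSizeGB overrides the default. The host sizes in the example are hypothetical.

```javascript
// Default WiredTiger internal cache size (GB) for one mongod, per the 3.0 and 3.2 docs.
function defaultWiredTigerCacheGB(ramGB, version) {
  if (version === "3.0") return Math.max(0.5 * ramGB, 1);
  if (version === "3.2") return Math.max(0.6 * ramGB - 1, 1);
  throw new Error("only 3.0 and 3.2 defaults are encoded in this sketch");
}

// Same 96 GB of RAM spent as 6 x 16 GB hosts or 1 x 96 GB host (hypothetical sizes).
const smallHostsTotal = 6 * defaultWiredTigerCacheGB(16, "3.2");
const largeHostTotal = defaultWiredTigerCacheGB(96, "3.2");
console.log({ smallHostsTotal, largeHostTotal });
```

Under the 3.2 formula, the fixed 1 GB deduction is paid once per mongod, so six small hosts end up with about 51.6 GB of total cache versus about 56.6 GB on one large host with the same RAM.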

Questions?

Contacts

[email protected] @dba_denizen

Slideshare: http://www.slideshare.net/kiwilkins/

ObjectRocket by Rackspace