How Sitecore depends on MongoDB for scalability and performance, and what it can teach you

Antonios Giannopoulos

Database Administrator – ObjectRocket

Grant Killian

Sitecore Architect – Rackspace

Percona Live 2017

Agenda

We are going to discuss:

Key terms

- Introduction to Sitecore

- Introduction to MongoDB

Best Practices for MongoDB with Sitecore

Scaling Sitecore

Benchmarks

Who We Are

Antonios Giannopoulos

Database Administrator w/ ObjectRocket

Grant Killian

Sitecore Architect w/ Rackspace

Sitecore MVP

Sitecore Architecture

Minimum necessary to understand this talk

Gartner Magic Quadrant for WCM (Web Content Management), Sept 2016

Sitecore is a framework for building websites...

Sitecore ♥ MongoDB because . . .

● Unstructured document model is a better fit for Sitecore analytics vs traditional database rows

● ∞ scalability

● Introduces key flexibility to the system

○ HTTP Session state

○ Optional repository for other Sitecore modules

○ 100% replacement for SQL Server (experimental)

■ $$$

MongoDB replica-set

A group of mongod processes that maintain the same dataset

Replica sets provide:

- Redundancy

- High availability

- Scaling

MongoDB replica-set

Consists of at least 3 nodes

- Up to 50 nodes in 3.0 and higher

- 12 on previous versions

A replica-set node may be either:

- Primary

- Secondary

- Arbiter

MongoDB replica-set

Asynchronous replication

- Delay between PRI and SECs

- SECs pull and apply operations

Automatic failover

- If a PRI fails a SEC takes its place

MongoDB replica-set

Best Practices

- Odd number of members

- Use same server specs

- Reliable network connections

- Adjust the oplog accordingly

MongoDB Sharded Clusters

Consists of:

Mongos

- It’s a statement (query) router

- Connection interface for the driver - makes sharding transparent

Config Servers: Hold cluster metadata - location of the data

Shards: Contain a subset of the sharded data


MongoDB Sharded Clusters

Best Practices

- Deploy shards as replica-sets

- Reliable network connections

- But most important… pick a shard key

Undoing a shard key choice might require downtime

MongoDB Sharded Clusters

What makes a good shard key:

- High cardinality

- No null values

- Immutable field(s)

- No monotonically increasing fields

- Even read/write distribution

- Even data distribution

- Read targeting/locality

Most important: choose a shard key according to your application requirements
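
To illustrate why a monotonically increasing field makes a poor ranged shard key, here is a minimal Python sketch. It is a toy model, not MongoDB's actual chunk routing: the four shards, the fixed range boundaries, and the MD5-based stand-in hash are all assumptions made for the illustration.

```python
import hashlib
from bisect import bisect_right

NUM_SHARDS = 4  # hypothetical cluster size

# Hypothetical range boundaries, chosen when existing keys spanned 0..999.
# Shard 0 holds keys < 250, shard 1 holds 250..499, and so on;
# the last shard owns everything from 750 upward.
RANGE_BOUNDARIES = [250, 500, 750]


def range_shard(key: int) -> int:
    """Route a key to a shard by range, as a ranged shard key would."""
    return bisect_right(RANGE_BOUNDARIES, key)


def hashed_shard(key: int) -> int:
    """Route a key by hash. MD5 is only a stand-in for the real hash."""
    digest = hashlib.md5(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def distribution(keys, route):
    """Count how many keys land on each shard."""
    counts = [0] * NUM_SHARDS
    for key in keys:
        counts[route(key)] += 1
    return counts


# New inserts with ever-increasing keys, e.g. a counter or a timestamp.
new_keys = range(1000, 1100)
range_counts = distribution(new_keys, range_shard)
hashed_counts = distribution(new_keys, hashed_shard)
print("ranged:", range_counts)
print("hashed:", hashed_counts)
```

In this model every new key routed by range lands on the last shard, while the hashed routing spreads the same inserts across shards. That is the write hotspot the "no monotonically increasing fields" rule is meant to avoid.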

MongoDB Storage Engines

MongoDB version 3.0 and higher supports:

- MMAPv1

- WiredTiger

- RocksDB (Percona Server)

- In Memory (Percona Server)

- Fractal Tree (Percona Server)

Sitecore MongoDB Databases

1. Analytics - customer visit metrics (IP address, browser, pages…)

2. Tracking_contact - contact processing

3. Tracking_history - history worker queue for full rebuilds

4. Tracking_live - task queue for real-time processing

5. Private_session - “classic” HTTP session state

6. Shared_session - meta HTTP session state for contacts (engagement state for lifetime of interactions…)

For example . . .

Graphic courtesy of http://www.techphoria414.com

Scaling Sitecore – Separate Workloads

Move each Sitecore database to a separate instance

Sitecore uses a different connection string per database:

connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_" />

connectionString="mongodb://_mongo_server_02_:_port_number_/_analytics_database_name_" />

Instances can be optimized according to their workload

Scaling Sitecore – Polyglot

Use a different storage engine per database:

- Different instances

- Sharded clusters, different storage engines per shard

Percona In-memory storage engine is a good fit for _sessions

- Based on the in-memory storage engine used in MongoDB Enterprise Edition

- _sessions data are not persistent

Scaling Sitecore - Sharding

What to shard:

- Large collections for capacity

- Busy collections for load distribution

How to pick a shard key:

- Collect a representative statement sample and identify statement patterns

- Pick a shard key that scales the workload/statements

- Meet sharding constraints

Scaling Sitecore - Sharding

From Sitecore documentation: “Sitecore calculates diskspace sizing projections using 5KB per interaction and 2.5KB per identified contact and these two items make up 80% of the diskspace”

Sharding interaction and contact for capacity.
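
The sizing rule quoted above can be turned into a rough estimate. The following Python sketch uses only the three figures from the quote; the workload numbers (10 million interactions, 1 million identified contacts) and the function name are hypothetical and chosen purely for illustration.

```python
KB_PER_INTERACTION = 5.0   # from the quoted Sitecore guidance
KB_PER_CONTACT = 2.5       # per identified contact, same source
SHARE_OF_TOTAL = 0.80      # these two items make up 80% of the disk space


def estimate_disk_kb(interactions: int, contacts: int) -> float:
    """Estimate total disk space in KB from interaction and contact counts."""
    core_kb = interactions * KB_PER_INTERACTION + contacts * KB_PER_CONTACT
    # Interactions and contacts are 80% of the total, so scale up.
    return core_kb / SHARE_OF_TOTAL


# Hypothetical workload, not a figure from the talk.
total_kb = estimate_disk_kb(interactions=10_000_000, contacts=1_000_000)
print(f"approx. {total_kb / 1024 / 1024:.1f} GB")
```

An estimate like this only indicates whether the two collections are large enough to justify sharding for capacity; it says nothing about index size or storage engine compression.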

Scaling Sitecore - Sharding

Collection Interaction

Receives: Inserts, Queries and Updates

Read/Write Ratio: 60-40

Updates are using the _id

Queries are using:

"_id, ContactId": 80%

"ContactId, _t": 5%

"ContactId, ContactVisitIndex": 15%

Scaling Sitecore - Sharding

Collection Interaction

Recommended shard key is _id:1 or _id:hashed

- Scales the vast majority of statements

- But… a few scatter-gather queries (around 20%)

{ContactId:1} is also decent, but:

- Updates on sharded collections MUST use the shard key (or {multi:true}); _id is an exception to that rule

- _id is generated by the application, not the driver

- Potential for jumbo chunks
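
The "around 20%" scatter-gather figure follows from the query patterns listed for the Interaction collection. Here is a minimal Python sketch of that arithmetic. It assumes a simplified rule (a query can be routed to a single shard only if its filter contains every shard key field); the function name is hypothetical.

```python
# Query patterns and their share of the query workload, from the slide above.
QUERY_PATTERNS = {
    ("_id", "ContactId"): 80,
    ("ContactId", "_t"): 5,
    ("ContactId", "ContactVisitIndex"): 15,
}


def targeted_share(shard_key_fields, patterns=QUERY_PATTERNS):
    """Percentage of queries whose filter includes every shard key field."""
    key = set(shard_key_fields)
    return sum(pct for fields, pct in patterns.items() if key <= set(fields))


for candidate in (("_id",), ("ContactId",)):
    targeted = targeted_share(candidate)
    print(candidate, "targeted:", targeted, "scatter-gather:", 100 - targeted)
```

On queries alone {ContactId:1} looks better, because every pattern includes ContactId. The updates, however, filter on _id, which is the trade-off listed above.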

Scaling Sitecore - Sharding

Collection Interaction

Choose your shard key according to your engine

- MMAP: _id:1 or _id:hashed

- WiredTiger: _id:1 or _id:hashed or ContactId:1

Sitecore may optimize sharding by including ContactId on the updates

Scaling Sitecore - Sharding

Collection Contacts

Receives: Inserts, Queries and Updates

Read/Write Ratio: 80-20

Updates are using the _id

Queries are using the _id (with additional fields)

Recommended shard key is _id:1 or _id:hashed

Scaling Sitecore - Sharding

Collection Devices

Recommended shard key is _id:1 or _id:hashed

Collection ClassificationsMap

Recommended shard key is _id:1 or _id:hashed

Collection KeyBehaviorCache

Recommended shard key is _id:1 or _id:hashed

Scaling Sitecore - Sharding

Collection GeoIps

Recommended shard key is _id:1 or _id:hashed

Collection OperationStatuses

Recommended shard key is _id:1 or _id:hashed

Collection ReferringSites

Recommended shard key is _id:1 or _id:hashed

Scaling Sitecore - Sharding

{_id:1} vs {_id:hashed}

Client-generated _id values are monotonically increasing, so “hashed” is added for randomness

The Sitecore _id is a .NET UUID (Universally Unique Identifier) stored as a BinData datatype

Example: "_id" : BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==")

Scaling Sitecore - Sharding

{_id:1} vs {_id:hashed}

You may use the uuidhelpers.js utility to convert _id to UUID

Download from: https://github.com/mongodb/mongo-csharp-driver/blob/master/uuidhelpers.js

>doc = db.test.findOne()

{ "_id" : BinData(3,"1eDJ1NXU8EeiD5a6WJtxbA==") }

>doc._id.toCSUUID()

CSUUID("d4c9e0d5-d4d5-47f0-a20f-96ba589b716c")
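
The same conversion can be reproduced outside the mongo shell. This is a minimal Python sketch using only the standard library; it assumes the BinData subtype 3 payload was written with the .NET byte order, as in the example above. The function name is hypothetical.

```python
import base64
import uuid


def bindata3_to_csuuid(b64_payload: str) -> uuid.UUID:
    """Decode a base64 BinData subtype 3 payload using .NET byte order."""
    raw = base64.b64decode(b64_payload)
    # .NET stores the first three UUID fields little-endian, which is
    # exactly what the standard library's bytes_le constructor expects.
    return uuid.UUID(bytes_le=raw)


converted = bindata3_to_csuuid("1eDJ1NXU8EeiD5a6WJtxbA==")
print(converted)  # d4c9e0d5-d4d5-47f0-a20f-96ba589b716c
```

The result matches the CSUUID value printed by uuidhelpers.js for the same document.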

Scaling Sitecore - Sharding

Use {_id:"hashed"} when you have an empty collection

Using numInitialChunks allows you to pre-split and distribute empty chunks.

- Avoid chunk splits

- Avoid chunk moves

db.adminCommand( { shardCollection: <collection>, key: {_id:"hashed"}, numInitialChunks: <number> } )

<number> must be less than 8192 per shard.

Scaling Sitecore - Sharding

Use {_id:"hashed"} when you have an empty collection

Define numInitialChunks:

Size = Collection size (in MB) / 32

Count = Number of documents / 125000

Limit = Number of shards * 8192

numInitialChunks = Min(Max(Size, Count), Limit)
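
As a worked example of the formula, here is a minimal Python sketch. The divisors and the per-shard limit come from the slide. Rounding up to whole chunks is an assumption, since the slide does not say how fractions are handled, and the collection sizes used below are hypothetical.

```python
import math

MB_PER_CHUNK = 32            # divisor for Size, from the slide
DOCS_PER_CHUNK = 125_000     # divisor for Count, from the slide
MAX_CHUNKS_PER_SHARD = 8192  # per-shard limit, from the slide


def num_initial_chunks(collection_mb: float, documents: int, shards: int) -> int:
    """numInitialChunks = Min(Max(Size, Count), Limit)."""
    size = math.ceil(collection_mb / MB_PER_CHUNK)  # rounding up is an assumption
    count = math.ceil(documents / DOCS_PER_CHUNK)
    limit = shards * MAX_CHUNKS_PER_SHARD
    return min(max(size, count), limit)


# Hypothetical 100 GB collection, 50 million documents, 3 shards.
print(num_initial_chunks(102_400, 50_000_000, 3))    # 3200

# A much larger collection on 2 shards hits the limit instead.
print(num_initial_chunks(1_000_000, 50_000_000, 2))  # 16384
```

In the first case the size term dominates (3200 chunks against 400 from the document count). In the second, the limit of 2 * 8192 caps the result.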

Scaling Sitecore - Sharding

Move Primary

Move each Sitecore database to a different shard:

(analytics, tracking_live …)

db.runCommand( { movePrimary: <databaseName>, to: <newPrimaryShard> } )

Requires downtime for live databases

Scaling Sitecore – Secondary Reads

You can configure Secondary Reads from the driver (secondary or secondaryPreferred)

connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreference=secondary" />

In 3.4 maxStalenessSeconds was introduced to control stale reads

Specifies, in seconds, how stale a secondary can be before the client stops using it for read operations
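
To show how the two options combine, here is a minimal Python sketch that assembles such a connection string. readPreference and maxStalenessSeconds are standard MongoDB URI options. The helper function, host name, and database name are hypothetical. The checks reflect two driver rules: maxStalenessSeconds must be at least 90, and it cannot be combined with primary reads.

```python
from urllib.parse import urlencode

MIN_MAX_STALENESS_SECONDS = 90  # smallest value MongoDB accepts


def build_mongo_uri(host, port, database, read_preference="secondary",
                    max_staleness_seconds=None):
    """Build a MongoDB URI with read preference options (hypothetical helper)."""
    options = {"readPreference": read_preference}
    if max_staleness_seconds is not None:
        if read_preference == "primary":
            raise ValueError("maxStalenessSeconds does not apply to primary reads")
        if max_staleness_seconds < MIN_MAX_STALENESS_SECONDS:
            raise ValueError("maxStalenessSeconds must be at least 90")
        options["maxStalenessSeconds"] = max_staleness_seconds
    return f"mongodb://{host}:{port}/{database}?{urlencode(options)}"


# Hypothetical host and database names.
uri = build_mongo_uri("mongo01.example.com", 27017, "sitecore_session",
                      read_preference="secondaryPreferred",
                      max_staleness_seconds=120)
print(uri)
```

When the string is placed in an XML configuration file, the "&" separator between options needs to be written as "&amp;".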

Scaling Sitecore – Secondary Reads

Use ReplicaSet Tags to target reads:

- Direct reads to specific replica set nodes

- Reduces availability

conf = rs.conf();

conf.members[0].tags = {"db": "analytics"}

rs.reconfig(conf)

Set readPreferenceTags on the connection string:

connectionString="mongodb://_mongo_server_01_:_port_number_/_session_database_name_?readPreferenceTags=db:analytics" />

Order matters when setting multiple tags
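
Why order matters can be shown with a small model of tag-set matching. This Python sketch is a simplification, not driver code: it assumes the driver walks the tag sets in order and uses the first one that matches at least one member, and it ignores member state and latency. The host names and the "session" tag value are hypothetical.

```python
def select_by_tag_sets(members, tag_sets):
    """Return the members matching the first tag set that matches anyone."""
    for tag_set in tag_sets:
        matches = [
            m for m in members
            if all(m["tags"].get(k) == v for k, v in tag_set.items())
        ]
        if matches:
            return matches
    return []


# Hypothetical replica set members and tags.
members = [
    {"host": "node1", "tags": {"db": "analytics"}},
    {"host": "node2", "tags": {"db": "session"}},
    {"host": "node3", "tags": {}},
]

first = select_by_tag_sets(members, [{"db": "analytics"}, {"db": "session"}])
second = select_by_tag_sets(members, [{"db": "session"}, {"db": "analytics"}])
print([m["host"] for m in first])   # ['node1']
print([m["host"] for m in second])  # ['node2']
```

The same two tag sets select different nodes depending on which is listed first. An empty tag set at the end of the list matches any member, so it can act as a fallback and soften the availability cost of tag targeting.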

Scaling Sitecore – Multi Region

Challenges:

- Direct reads to the closest node

- Direct writes to the closest node

- Single database entity for reporting

- Minimum complexity

Scaling Sitecore – Multi Region

Replica Set:

- Target reads using the nearest read preference

- Target reads using region based tags

- Writes must go to the Primary

- Requires at least one secondary per region

Scaling Sitecore – Multi Region

Sharded cluster:

- Target reads using the nearest read preference

- Target reads using region based tags

- Requires at least one secondary per region

- Writes must go to the Primaries

- Tags or Zones are based on shard key ranges

- Add location to shard key as prefix – change the source code

Scaling Sitecore – Multi Region

Mongo to Mongo connector:

- Creates a pipeline from one MongoDB cluster to another MongoDB cluster

- Reads and replicates oplog operations

- Easy deployment

mongo-connector -m <name:port> -t <name:port> -d <database>
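
The idea of reading oplog operations and replaying them on a target can be sketched in a few lines of Python. This is not mongo-connector's implementation: the in-memory dictionary stands in for the target cluster, only insert ("i") and delete ("d") entries are handled, and the sample entries are hypothetical. The "op", "ns" and "o" fields follow the oplog document layout.

```python
def apply_oplog_entry(target, entry):
    """Apply one insert or delete oplog entry to an in-memory target."""
    collection = target.setdefault(entry["ns"], {})
    doc = entry["o"]
    if entry["op"] == "i":
        # Documents are keyed by _id, so replaying an insert is harmless.
        collection[doc["_id"]] = doc
    elif entry["op"] == "d":
        collection.pop(doc["_id"], None)
    else:
        raise NotImplementedError(f"op {entry['op']!r} is not covered by this sketch")


# Hypothetical oplog entries read from the source cluster.
oplog = [
    {"op": "i", "ns": "foo.foo", "o": {"_id": 1, "a": 1}},
    {"op": "i", "ns": "foo.foo", "o": {"_id": 1, "a": 1}},  # replayed duplicate
    {"op": "i", "ns": "foo.foo", "o": {"_id": 2, "a": 5}},
    {"op": "d", "ns": "foo.foo", "o": {"_id": 2}},
]

target = {}
for entry in oplog:
    apply_oplog_entry(target, entry)
print(target)  # {'foo.foo': {1: {'_id': 1, 'a': 1}}}
```

Because every oplog insert carries its _id, applying the same entry twice leaves the target unchanged, which is what makes a replay-based pipeline safe to restart.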

Scaling Sitecore – Connector

db.foo.insert({a:1})

db.foo.insert({_id:1, a:1})

Oplog entry:

{ "ts" : Timestamp(), "h" : NumberLong(), "v" : 2, "op" : "i", "ns" : "foo.foo", "o" : { "_id" : 1, "a" : 1 } }

Scaling Sitecore – Multi Region

Mongo to Mongo Connector


Benchmarks

Benchmark 1: Single/Replica set MMAP vs Single shard/Replica set WiredTiger (3.2.8)

Results: WiredTiger is 9.5% faster

Benchmark 2: Sharded cluster MMAP vs Sharded cluster WiredTiger (Analytics sharded on {_id:1})

Results: WiredTiger is 9.4% faster

So what?

- Evaluate your MongoDB architecture to determine if it would benefit from scaling

- If scaling is in order, consider this talk as a reference

- Recognize how MongoDB’s versatility makes it relevant to a wide variety of applications

What's next?

- Test MongoRocks (Percona Server) against Sitecore

- Test In-Memory (Percona Server) for sessions or cache(s)

- Expand sharding recommendations on add-ons

- Evaluate other Sitecore modules for suitability with MongoDB

- Re-invent our benchmarks

We’re Hiring! Looking to join a dynamic & innovative team?

Justine is here at Percona Live 2017,

Reach out directly to our Recruiter at justine.marmolejo@rackspace.com

Questions?

Thank you!!!

antonios.giannopoulos@rackspace.co.uk

@iamantonios


grant.killian@rackspace.com

@sitecoreagent