High Performance NoSQL with MongoDB
Michael Kennedy | @mkennedy | michaelckennedy.net
History of NoSQL
June 11th, 2009, San Francisco, USA
Johan Oskarsson (from http://last.fm/) organized a meetup to discuss advances in data storage which were all using distributed databases leveraging clusters. He asked the group for a short term they could use as a hashtag. [1]
Eric Evans (not of DDD fame) proposed #NoSQL and it stuck.
Michael's NoSQL Definition
Database systems which are cluster-friendly and which trade inter-entity relationships for both simplicity and performance.
Four types of "NoSQL" DBs
• Key-Value Stores
– Amazon DynamoDB
– Redis
• Column-Oriented Databases
– HBase
– Cassandra
– Google BigQuery
• Graph Databases
– Neo4j
– OrientDB
• Document Databases
– MongoDB
– CouchDB
– DocumentDB (on Azure)
Key-value data storage
Column Oriented DBs
Graph DBs
Document DBs
Not so different
How much do you need perf?
Image credit: nerovivo
Relational 3NF models are complex
Document DBs for simplicity
Document db style
Single server performance
Single biggest performance problem (and fix)?
Incorrect indexes (too few or too many)
Adding indexes
• Be data-driven: profile and then add indexes
• Indexes matter even more than they do in an RDBMS
Demo time
Step 1: Enable profiling
Step 2: Run common queries
Step 3: Analyze system.profile
Step 4: Add indexes for slow queries
Step 5: GOTO 1
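The demo steps can be sketched as mongo shell calls wrapped in two small helpers. This is only a sketch under assumptions: the helper names `enableProfiling` and `slowestOperations` are hypothetical, the 20 ms threshold is an arbitrary example value, and `db` is the database handle the mongo shell provides.

```javascript
// Step 1: record every operation slower than slowMs in system.profile.
// Profiling level 1 means "log slow operations only".
function enableProfiling(db, slowMs) {
  return db.setProfilingLevel(1, slowMs);
}

// Step 3: read back the slowest recorded operations, worst first.
// Each returned document describes one operation, including how long it took (millis).
function slowestOperations(db, slowMs, limit) {
  return db.system.profile
    .find({ millis: { $gt: slowMs } })
    .sort({ millis: -1 })
    .limit(limit)
    .toArray();
}
```

In the shell, one pass of the loop might look like `enableProfiling(db, 20)`, then running the application's usual queries (step 2), then `slowestOperations(db, 20, 5)`. For step 4, an index matching a slow query shape is added with `createIndex`, for example `db.data.createIndex({ st: 1, ts: 1 })` for queries that filter on `st` and `ts`; that particular index is an assumption for illustration, so pick the fields the profile output actually points to.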
Scaling out
Image credit: johnantoni
Image credit: Torkild Retvedt
Scaling out
• Scale-out is the great promise of NoSQL
• MongoDB has two modes of scale out
– Sharding
– Replication
Real-world statistics from one company
120,000 DB operations / second
2 GB of app-to-db I/O / second
Replication vs. scalability
• Sharding is the primary way to improve single query speed
• Replication is not the primary way to scale
– you may get better read performance, but not much better write performance; it helps mainly when the workload is very read-heavy
Replication: Servers 1 to 5 each hold a full copy of the data (A-B-C-D-E)
Sharding: each server holds one part (Server 1: A, Server 2: B, Server 3: C, Server 4: D, Server 5: E)
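The difference between the two layouts can be made concrete with a toy model. This is an illustrative sketch, not how MongoDB routes operations internally: the function names are hypothetical, and the five servers and slices A to E are taken from the diagram.

```javascript
// Five servers and five slices of data (A..E), as in the diagram.
const servers = ["Server 1", "Server 2", "Server 3", "Server 4", "Server 5"];
const slices = ["A", "B", "C", "D", "E"];

// Replication: every server stores every slice.
function replicatedLayout() {
  return Object.fromEntries(servers.map((s) => [s, [...slices]]));
}

// Sharding: each server stores exactly one slice.
function shardedLayout() {
  return Object.fromEntries(servers.map((s, i) => [s, [slices[i]]]));
}

// Which servers have to apply a write that touches the given slice?
function serversForWrite(layout, slice) {
  return Object.keys(layout).filter((s) => layout[s].includes(slice));
}

console.log(serversForWrite(replicatedLayout(), "C").length); // 5
console.log(serversForWrite(shardedLayout(), "C"));           // [ 'Server 3' ]
```

In the replicated layout every write lands on all five servers, so adding copies does not add write capacity. In the sharded layout a write touches one server, so writes to different slices can proceed in parallel.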
Sharding
...
Scaling via Sharding – an example
Weather data from the entire 20th century in MongoDB
Case study by MongoDB Inc:
http://www.mongodb.com/presentations/weather-century-part-2-high-performance
Data size and quantity
• 2.5 billion data points
• 4 terabytes (1.6 KB per document)
Sample record (JSON)
{
  "st" : "u725053",
  "ts" : ISODate("2013-06-03T22:51:00Z"),
  "airTemperature" : {
    "value" : 21.1,
    "quality" : "5"
  },
  "atmosphericPressure" : {
    "value" : 1009.7,
    "quality" : "5"
  }
}
Sample record in C#
using System;

class WeatherRecord
{
    public string st { get; set; }
    public DateTime ts { get; set; }
    public Temp airTemperature { get; set; }
    public Pressure atmosphericPressure { get; set; }
}

class Temp
{
    // double rather than int: the sample record stores 21.1
    public double value { get; set; }
    public string quality { get; set; }
}

class Pressure
{
    // double rather than int: the sample record stores 1009.7
    public double value { get; set; }
    public string quality { get; set; }
}
Scale Up
A single server with a really big disk
Application: c3.8xlarge
mongod: i2.8xlarge, 251 GB RAM, 6 TB SSD
Scale out configuration
A really big cluster where everything is in RAM
Application / mongos: c3.8xlarge
mongod: 100 x r3.2xlarge, each with 61 GB RAM and 100 GB disk
Can scale even more
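The sizing figures on these slides can be checked with a few lines of arithmetic. This is a rough sketch that assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and ignores index size and storage overhead; the variable and function names are made up for the example.

```javascript
// Data set size from the slide figures.
const documents = 2.5e9;        // 2.5 billion data points
const bytesPerDocument = 1.6e3; // 1.6 KB per document
const dataTB = (documents * bytesPerDocument) / 1e12;

// Total RAM in each configuration, in TB.
const clusterRamTB = (100 * 61) / 1000; // 100 x r3.2xlarge with 61 GB each
const singleRamTB = 251 / 1000;         // one i2.8xlarge with 251 GB

// True when the whole data set could be held in memory.
function fitsInRam(ramTB) {
  return ramTB >= dataTB;
}

console.log(dataTB);                  // 4
console.log(fitsInRam(clusterRamTB)); // true
console.log(fitsInRam(singleRamTB));  // false
```

The cluster has about 6.1 TB of RAM for 4 TB of data, which is what makes "everything is in RAM" possible. The single server holds only a small fraction of the data in memory and must serve the rest from SSD.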
Cost per year in AWS?
$60,000 / yr (single server)
$700,000 / yr (cluster)
Performance: single time and place
db.data.find({ "st" : "u747940", "ts" : ISODate("1969-07-16T12:00:00Z") })
[Chart: query latency in ms (avg, 95th, 99th percentile), single server vs. cluster with 10 mongos; axis from 0 to 2 ms]
max. throughput: 40,000/s (single server), 610,000/s (cluster)
Performance: 1 year's weather
targeted query
db.data.find({ "st" : "u747940",
               "ts" : { "$gte" : ISODate("1989-01-01"),
                        "$lt" : ISODate("1990-01-01") } })
[Chart: query latency in ms (avg, 95th, 99th percentile), single server vs. cluster with 10 mongos; axis from 0 to 5000 ms]
max. throughput: 20/s (single server), 430/s (cluster)
Analytics
db.data.aggregate([
  { "$match" : { "airTemperature.quality" : { "$in" : [ "1", "5" ] } } },
  { "$group" : { "_id" : null,
                 "maxTemp" : { "$max" : "$airTemperature.value" } } }
])
61.8 °C = 143 °F
Cluster: 2 min
Single server: 4 h 45 min
142x faster
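The speedups quoted in the case study follow directly from the slide figures. A small sketch of the arithmetic, with a hypothetical helper name `ratio`:

```javascript
// How many times larger one figure is than another.
function ratio(larger, smaller) {
  return larger / smaller;
}

// Analytics run: 4 h 45 min on the single server vs. 2 min on the cluster.
const singleServerMinutes = 4 * 60 + 45; // 285
const clusterMinutes = 2;
const analyticsSpeedup = ratio(singleServerMinutes, clusterMinutes);

// Maximum throughput, cluster vs. single server.
const pointQueryGain = ratio(610000, 40000); // single time and place
const rangeQueryGain = ratio(430, 20);       // one year of weather

// Ratio between the two yearly AWS cost figures.
const costRatio = ratio(700000, 60000);

console.log(analyticsSpeedup);     // 142.5
console.log(pointQueryGain);       // 15.25
console.log(rangeQueryGain);       // 21.5
console.log(costRatio.toFixed(1)); // 11.7
```

The slide's "142x faster" is 142.5 rounded down. The throughput gains of roughly 15x and 21x come at about 11.7 times the yearly cost.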
Get the code and data
https://github.com/mikeckennedy/sdd2016
Want to go deeper?
talkpython.fm
training.talkpython.fm
michaelckennedy.net
mikeckennedy@gmail.com
@mkennedy