Tales from the Field, or
True Stories (Anonymized), or
Don’t Solve The Wrong Problem
Richard Kreuter
Director of Consulting Engineering
MongoDB, Inc.
These stories are (mostly) true.
Only the names have been changed to protect the (mostly) innocent.
Story #1: Roberta the Retailer
Roberta the Retailer had an ecommerce site selling diverse goods in 20+ countries.
Product Catalog (before)
{
_id: 375,
en_US : { name : ..., description : ..., <etc...> },
en_GB : { name : ..., description : ..., <etc...> },
fr_FR : { name : ..., description : ..., <etc...> },
de_DE : ...,
de_CH : ...,
<... and so on for other locales... >
}
What’s good about this solution?
• Each document contains all the data about the product, across all possible locales.
• It is the most efficient way to retrieve all the translations (English, French, German, etc.) of a single product’s information in one query.
But that’s not how product data
is used (except by translation staff, maybe).
Dominant Catalog Queries
db.catalog.find( { _id : 375 } , { en_US : true } );
db.catalog.find( { _id : 375 } , { fr_FR : true } );
db.catalog.find( { _id : 375 } , { de_DE : true } );
... and so forth for other locales ...
The Product Catalog's data
model didn't fit the way the data
are used.
Consequences for the catalog
• The catalog documents contained 20x more data than any common use case required.
• MongoDB lets you request just a subset of a document's contents (via a projection, the 2nd argument to find())...
• …but the whole document typically still gets loaded into RAM in order to serve the query.
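To make the 20x overhead concrete, here is a minimal sketch in plain JavaScript (it does not talk to MongoDB and does not model its storage engine or cache). It builds a hypothetical catalog document with 20 placeholder locales named `loc_0` … `loc_19`, applies an inclusion projection the way the second argument to find() would, and compares the serialized size of the whole document with the size of what the query returns. JSON.stringify length is only a rough stand-in for BSON size, and the helper names `makeCatalogDoc` and `project` are hypothetical.

```javascript
// Hypothetical helpers for illustration only; sizes are approximated with
// JSON.stringify rather than real BSON encoding.

// Build a document shaped like the "before" model: one sub-document per locale.
function makeCatalogDoc(id, locales) {
  const doc = { _id: id };
  for (const loc of locales) {
    doc[loc] = {
      name: `Product ${id} (${loc})`,
      description: "x".repeat(500), // placeholder text, same length per locale
    };
  }
  return doc;
}

// Mimic an inclusion projection such as { en_US : true }: keep _id plus the listed fields.
function project(doc, fields) {
  const out = { _id: doc._id };
  for (const f of fields) {
    if (f in doc) out[f] = doc[f];
  }
  return out;
}

const locales = Array.from({ length: 20 }, (_, i) => `loc_${i}`);
const doc = makeCatalogDoc(375, locales);

const loadedBytes = JSON.stringify(doc).length; // roughly what must be brought into memory
const returnedBytes = JSON.stringify(project(doc, ["loc_0"])).length; // what the client receives

console.log(`loaded ~${loadedBytes} B, returned ~${returnedBytes} B, ` +
            `ratio ~${(loadedBytes / returnedBytes).toFixed(1)}x`);
```

The projection shrinks the network payload, but the ratio stays close to the number of locales because the unrequested locales still sit in the same document.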
Why is that an issue?
{ _id: 709, en_US : { name : ..., description : ..., <etc...> }, en_GB : { name : ..., description : ..., <etc...> }, fr_FR : { name : ..., description : ..., <etc...> }, de_DE : ..., de_CH : ..., <... and so on for other locales... > }
{ _id: 42, en_US : { name : ..., description : ..., <etc...> }, en_GB : { name : ..., description : ..., <etc...> }, fr_FR : { name : ..., description : ..., <etc...> }, de_DE : ..., de_CH : ..., <... and so on for other locales... > }
{ _id: 3600, en_US : { name : ..., description : ..., <etc...> }, en_GB : { name : ..., description : ..., <etc...> }, fr_FR : { name : ..., description : ..., <etc...> }, de_DE : ..., de_CH : ..., <... and so on for other locales... > }
Only the locale a query asks for is being used; the other locales take up memory but aren't in demand.
So what's the right approach for
the problem?
Design for your use case
99.99% of queries want the product data for exactly one locale at a time.
Product Catalog (after)
{ _id: "375-en_US",
name : ..., description : ..., <etc...> }
{ _id: "375-en_GB",
name : ..., description : ..., <etc...> }
{ _id: "375-fr_FR",
name : ..., description : ..., <etc...> }
... and so on for other locales ...
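The reshaping step can be sketched in plain JavaScript over in-memory objects, so it runs without a database. It assumes the "before" documents look like the earlier example (an `_id` plus one sub-document per locale) and produces the "after" shape with the same `<productId>-<locale>` string `_id` convention. The function name `splitByLocale` and the sample field values are hypothetical; a real migration would stream documents from one collection and insert the results into another, which is not shown.

```javascript
// Hypothetical migration helper: turn one multi-locale product document
// into one document per locale, keyed "<productId>-<locale>".
function splitByLocale(doc) {
  const out = [];
  for (const [key, value] of Object.entries(doc)) {
    if (key === "_id") continue; // every other top-level field is treated as a locale
    out.push({ _id: `${doc._id}-${key}`, ...value });
  }
  return out;
}

const before = {
  _id: 375,
  en_US: { name: "Widget (en)", description: "A useful widget" },
  fr_FR: { name: "Widget (fr)", description: "Un widget utile" },
};

const after = splitByLocale(before);
console.log(after.map((d) => d._id)); // [ '375-en_US', '375-fr_FR' ]
```

With this shape, each locale-specific read becomes a single-document lookup such as `db.catalog.find( { _id : "375-fr_FR" } )`, and no other locale's data has to be loaded to serve it.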
Consequences of the redesign
• Queries incurred minimal memory overhead.
• 20x as many distinct products fit in RAM at once.
• Disk I/O utilization dropped.
• UI latency decreased.
• Profit (well, we hope)
Story #2: Sal the Securities Trader
Sal had some software for analyzing the day's trades.
Sal's Shard Key (before)
sh.shardCollection ( "mydb.trades" ,
{ "analytics_serverid" : 1 } );
Why did Sal pick this approach?
What's good about this architecture?
… 60 more servers …
Why the shard key was an issue
None of Sal's clients ever cared which server analyzed the data.
All queries became scatter/gather
Adding shards didn't help query performance
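Why every query fanned out can be shown with a toy model in plain JavaScript; it does not call MongoDB and does not reproduce how mongos routes. It assumes, hypothetically, that every analytics server handles trades for every security and that `analytics_serverid` values are range-partitioned evenly across shards; 62 servers is an illustrative figure based on the "60 more servers" slide. Under those assumptions, a single security's trades land on every shard, so a query by security touches all of them however many shards there are. Separately, because such a query never mentions the shard key, mongos has no way to narrow it down and must broadcast it anyway. The names `makeTrades`, `shardFor` and `shardsHoldingSecurity` are hypothetical.

```javascript
// Toy model, not MongoDB internals.

// Every server analyzes trades for every security (assumption).
function makeTrades(nServers, nSecurities) {
  const trades = [];
  let ts = 0;
  for (let sid = 0; sid < nSecurities; sid++) {
    for (let server = 0; server < nServers; server++) {
      trades.push({ analytics_serverid: server, sid, ts: ts++ });
    }
  }
  return trades;
}

// Map a shard key value (the server id) to a shard by even range partitioning.
function shardFor(serverId, nServers, nShards) {
  return Math.floor((serverId * nShards) / nServers);
}

// Count the shards that hold at least one trade for the given security.
function shardsHoldingSecurity(trades, sid, nServers, nShards) {
  const shards = new Set();
  for (const t of trades) {
    if (t.sid === sid) shards.add(shardFor(t.analytics_serverid, nServers, nShards));
  }
  return shards.size;
}

const nServers = 62; // illustrative only
const trades = makeTrades(nServers, 100);
for (const nShards of [2, 4, 8]) {
  const touched = shardsHoldingSecurity(trades, 3, nServers, nShards);
  console.log(`${nShards} shards -> a query for sid 3 touches ${touched}`);
}
```

In this model, doubling the shard count doubles the number of shards each query must contact instead of reducing the work per query.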
And there were subtler issues
• MongoDB's sharding automatically rebalances data as you add shards.
• But a low-cardinality shard key (one with few distinct values) inhibits balancing.
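The balancing problem follows from a simple bound: with range-based sharding, a chunk cannot be split inside a single shard key value, so the number of distinct key values caps how many chunks the balancer can ever move around. The plain-JavaScript sketch below counts distinct key values for the two candidate keys on a small synthetic data set; the data (62 servers, 100 securities, 50 time steps) and the helper name `maxChunks` are hypothetical.

```javascript
// Hypothetical helper: the number of distinct shard key values is an upper
// bound on the number of chunks, since a chunk cannot be split within one value.
function maxChunks(docs, keyOf) {
  return new Set(docs.map(keyOf)).size;
}

// Illustrative data: 100 securities over 50 time steps, spread over 62 servers.
const docs = [];
for (let ts = 0; ts < 50; ts++) {
  for (let sid = 0; sid < 100; sid++) {
    docs.push({ analytics_serverid: (ts * 100 + sid) % 62, sid, ts });
  }
}

const byServer = maxChunks(docs, (d) => String(d.analytics_serverid));
const bySidTs = maxChunks(docs, (d) => `${d.sid}|${d.ts}`);
console.log({ byServer, bySidTs }); // { byServer: 62, bySidTs: 5000 }
```

However much data accumulates, the server-id key can never be divided into more chunks than there are servers, and all of one server's data must stay together; the compound key's bound keeps growing with the data.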
What would have been a better shard key?
Very nearly anything.
Really.
(Sal picked a local pessimum in the option space.)
What did we propose?
• The common query patterns were based on security_id and time.
• So the compound key { sid : 1, ts : 1 } made a good shard key.
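A toy model of range-based sharding on `{ sid : 1, ts : 1 }` shows why the compound key helps: documents for one security sit next to each other in key order, so a query on a security (optionally with a time range) maps to one or two contiguous key ranges, while a query on time alone still spreads across the cluster. This is plain JavaScript, not MongoDB code: it assumes equal-sized contiguous ranges per shard and uses synthetic data (100 securities, 50 time steps, 8 shards), and the names `compareKey`, `placeOnShards` and `shardsTouched` are hypothetical.

```javascript
// Toy model of range-based sharding on { sid: 1, ts: 1 } (not MongoDB code).

// Order documents the way a { sid: 1, ts: 1 } key would.
function compareKey(a, b) {
  return a.sid - b.sid || a.ts - b.ts;
}

// Cut the key-ordered documents into nShards contiguous, equal-sized ranges.
function placeOnShards(docs, nShards) {
  const sorted = [...docs].sort(compareKey);
  const perShard = Math.ceil(sorted.length / nShards);
  return sorted.map((d, i) => ({ ...d, shard: Math.floor(i / perShard) }));
}

// Count the shards holding at least one document that matches the predicate.
function shardsTouched(placed, predicate) {
  return new Set(placed.filter(predicate).map((d) => d.shard)).size;
}

// Illustrative data: 100 securities, 50 time steps each.
const docs = [];
for (let ts = 0; ts < 50; ts++) {
  for (let sid = 0; sid < 100; sid++) docs.push({ sid, ts });
}
const placed = placeOnShards(docs, 8);

// In the mongo shell, the proposed key would be declared with:
//   sh.shardCollection( "mydb.trades" , { sid : 1, ts : 1 } );
console.log(shardsTouched(placed, (d) => d.sid === 3 && d.ts >= 10 && d.ts < 20)); // 1
console.log(shardsTouched(placed, (d) => d.ts === 10)); // 8
```

The contrast reflects the general rule that queries can be targeted when they constrain a prefix of the shard key; a query on `ts` alone skips the leading field and ends up scatter/gather again.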
The outcome: success!
• Read throughput increased 500%.
• Balancing worked as expected.
Story #3: Bill the Bulk Updater
Bill built a system that tracked status information for entities in his business domain.
State changes happened in batches; sometimes 10% of the entities were updated, sometimes 100%.
Bill's architecture
Application / mongos / mongod
What happened when it went into production?
Bill's system was a success!
The number of business entities grew by a factor of 5.
Bill's eventual architecture
Application / mongos
…16 more shards…
mongod
Bill's cluster scaled linearly!
(Bill's TCO scaled linearly, too.)
… and the usage was going to grow…
What problem did Bill overlook?
Horizontal Scaling = Linear Scaling
What we recommended
Scale up the random IOPS!
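A back-of-the-envelope sketch of why vertical scaling won here: if each entity update costs roughly one random I/O and a batch must finish within a fixed window, the node count needed is the required random IOPS divided by the IOPS one node can deliver. Every number below (entity count, one-hour window, 200 random IOPS per spinning-disk node, 20,000 per SSD node, one I/O per update) is a hypothetical assumption chosen for illustration, not a figure from Bill's system, and `nodesNeeded` is a hypothetical helper.

```javascript
// Back-of-the-envelope model; all constants are hypothetical assumptions.

// Nodes needed so that the cluster's random IOPS can finish the batch in time.
function nodesNeeded(entities, fractionUpdated, windowSeconds, iopsPerNode) {
  const ios = entities * fractionUpdated; // assumption: one random I/O per update
  const requiredIops = ios / windowSeconds;
  return Math.ceil(requiredIops / iopsPerNode);
}

const entities = 50000000; // hypothetical entity count
const windowSeconds = 3600; // hypothetical: a full batch must finish within an hour
const HDD_IOPS = 200;       // assumed random IOPS per node on spinning disks
const SSD_IOPS = 20000;     // assumed random IOPS per node on SSD

// A 100% update batch, before and after 5x growth in entities.
for (const growth of [1, 5]) {
  const n = entities * growth;
  console.log(`${growth}x entities: HDD nodes = ${nodesNeeded(n, 1.0, windowSeconds, HDD_IOPS)}, ` +
              `SSD nodes = ${nodesNeeded(n, 1.0, windowSeconds, SSD_IOPS)}`);
}
```

Under these assumptions the spinning-disk cluster grows in lockstep with the data (and so does its cost), while raising per-node random IOPS by two orders of magnitude shrinks the problem back to a handful of machines.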
Bill's final architecture
Application / mongos / mongod (SSD)
So what can we say?
• Even smart people sometimes solve the wrong problems:
– Roberta's product catalog was optimized for usage that didn't exist
– Sal optimized for data ingestion, not for queries
– Bill scaled horizontally when he needed to scale vertically
• Often, MongoDB's staff can tell you in advance when you're going in the wrong direction…
• … and what you ought to do to get where you need to go.
Thank you