Tales from the Field, or True Stories (Anonymized), or Don’t Solve The Wrong Problem Richard Kreuter Director of Consulting Engineering MongoDB, Inc.

Page 1: Tales from the Field

Tales from the Field, or

True Stories (Anonymized), or

Don’t Solve The Wrong Problem

Richard Kreuter

Director of Consulting Engineering

MongoDB, Inc.

Page 2: Tales from the Field

These stories are (mostly) true. Only the names have been changed, to protect the (mostly) innocent.

Page 3: Tales from the Field

Story #1: Roberta the Retailer

Roberta the Retailer had an ecommerce site, selling diverse goods in 20+ countries.

Page 4: Tales from the Field

Product Catalog (before)

{
  _id: 375,
  en_US : { name : ..., description : ..., <etc...> },
  en_GB : { name : ..., description : ..., <etc...> },
  fr_FR : { name : ..., description : ..., <etc...> },
  de_DE : ...,
  de_CH : ...,
  <... and so on for other locales... >
}

Page 5: Tales from the Field

What’s good about this solution?

• Each document contains all the data about the product, across all possible locales.
• It is the most efficient way to retrieve every translation (English, French, German, etc.) of a single product’s information in one query.

Page 6: Tales from the Field

But that’s not how product data is used (except by translation staff, maybe).

Page 7: Tales from the Field

Dominant Catalog Queries

db.catalog.find( { _id : 375 } , { en_US : true } );
db.catalog.find( { _id : 375 } , { fr_FR : true } );
db.catalog.find( { _id : 375 } , { de_DE : true } );
... and so forth for other locales ...

Page 8: Tales from the Field

The Product Catalog's data model didn't fit the way the data are used.

Page 9: Tales from the Field

Consequences for the catalog

• The catalog documents contained 20x more data than any common use case demands.
• MongoDB lets you request just a subset of a document's contents (via the 2nd argument to find())...
• …but typically the whole document will get loaded into RAM in order to serve the query.
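To make the "20x" point concrete, here is a rough sketch in plain JavaScript (runnable under Node.js). It is only an illustration: JSON text length stands in for BSON size, and the helper names, locale names and field contents are all made up for the example, not taken from Roberta's catalog.

```javascript
// Rough proxy only (assumption): JSON text length stands in for BSON size.
function approxBytes(value) {
  return Buffer.byteLength(JSON.stringify(value), "utf8");
}

// Fraction of a document's bytes that a single-locale query actually needs.
function usefulFraction(doc, locale) {
  return approxBytes(doc[locale]) / approxBytes(doc);
}

// Hypothetical product with 20 equally sized locale entries.
const doc = { _id: 375 };
for (let i = 0; i < 20; i++) {
  doc[`locale_${i}`] = { name: "x".repeat(20), description: "y".repeat(200) };
}

// Prints a value close to 1/20: most of the document is dead weight
// for a query that only wants one locale.
console.log(usefulFraction(doc, "locale_0").toFixed(3));
```

A projection trims what is sent back to the client, but in this model the remaining 19 locales would still occupy memory on the server whenever the document is touched.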

Page 10: Tales from the Field

Why is that an issue?

{
  _id: 709,
  en_US : { name : ..., description : ..., <etc...> },
  en_GB : { name : ..., description : ..., <etc...> },
  fr_FR : { name : ..., description : ..., <etc...> },
  de_DE : ...,
  de_CH : ...,
  <... and so on for other locales... >
}

{
  _id: 42,
  en_US : { name : ..., description : ..., <etc...> },
  en_GB : { name : ..., description : ..., <etc...> },
  fr_FR : { name : ..., description : ..., <etc...> },
  de_DE : ...,
  de_CH : ...,
  <... and so on for other locales... >
}

{
  _id: 3600,
  en_US : { name : ..., description : ..., <etc...> },
  en_GB : { name : ..., description : ..., <etc...> },
  fr_FR : { name : ..., description : ..., <etc...> },
  de_DE : ...,
  de_CH : ...,
  <... and so on for other locales... >
}

Data in RED are being used. Data in BLUE take up memory but aren't in demand.

Page 11: Tales from the Field

So what's the right approach for the problem?

Page 12: Tales from the Field

Design for your use case

99.99% of queries want the product data for exactly one locale at a time.

Page 13: Tales from the Field

Product Catalog (after)

{ _id: "375-en_US", name : ..., description : ..., <etc...> }
{ _id: "375-en_GB", name : ..., description : ..., <etc...> }
{ _id: "375-fr_FR", name : ..., description : ..., <etc...> }
... and so on for other locales ...
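One way to picture the move from the "before" layout to the "after" layout is a small transformation that splits each all-locales document into one document per locale. This is a minimal sketch, not the migration that was actually run: `splitByLocale` is a hypothetical helper, and the field values are placeholders because the slides elide the real ones.

```javascript
// Hypothetical helper: turn one all-locales catalog document into one
// document per locale, keyed "<productId>-<locale>" as in the redesign.
function splitByLocale(product) {
  const { _id, ...locales } = product;
  return Object.entries(locales).map(([locale, fields]) => ({
    _id: `${_id}-${locale}`,
    ...fields,
  }));
}

// Placeholder values for illustration only.
const before = {
  _id: 375,
  en_US: { name: "Kettle", description: "Electric kettle" },
  fr_FR: { name: "Bouilloire", description: "Bouilloire électrique" },
};

const after = splitByLocale(before);
console.log(after.map((d) => d._id)); // [ '375-en_US', '375-fr_FR' ]
```

With documents shaped like this, the dominant query would presumably become a plain lookup by `_id`, for example `db.catalog.find( { _id : "375-en_US" } )`, and only that locale's data needs to be in memory.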

Page 14: Tales from the Field

Consequences of the redesign

• Queries induced minimal memory overhead.
• 20x as many distinct products fit in RAM at once.
• Disk I/O utilization reduced.
• UI latency decreased.
• Profit (well, we hope).

Page 15: Tales from the Field

Story #2: Sal the Securities Trader

Sal had some software for analyzing the day's trades.

Page 16: Tales from the Field

Sal's Shard Key (before)

sh.shardCollection( "mydb.trades", { "analytics_serverid" : 1 } );

Page 17: Tales from the Field

Why did Sal pick this approach?

Page 18: Tales from the Field

What's good about this architecture?

[Architecture diagram: … 60 more servers …]

Page 19: Tales from the Field

Why the shard key was an issue

None of Sal's clients ever cared what server analyzed the data.

Page 20: Tales from the Field

All queries became scatter/gather

Page 21: Tales from the Field

Adding shards didn't help query

Page 22: Tales from the Field

And there were subtler issues

• MongoDB's sharding will automatically rebalance data as you add shards.
• But a low cardinality (few distinct values) shard key will inhibit balancing.
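The cardinality problem can be sketched with a toy model in plain JavaScript. This is not MongoDB's balancer: it only encodes the fact that documents sharing a shard key value cannot be separated, so each distinct value is an indivisible unit. The trade documents, the count of 4 analytics servers, and the 16 shards are hypothetical numbers chosen for the illustration.

```javascript
// Toy model (assumption): each distinct shard key value is one indivisible
// unit, and units are spread round-robin over the available shards.
function shardsHoldingData(docs, shardKeyField, shardCount) {
  const distinctValues = [...new Set(docs.map((d) => d[shardKeyField]))];
  const used = new Set();
  distinctValues.forEach((_, i) => used.add(i % shardCount));
  return used.size;
}

// Hypothetical trades: 10,000 documents, only 4 distinct analytics servers,
// but 500 distinct securities.
const trades = Array.from({ length: 10000 }, (_, i) => ({
  analytics_serverid: i % 4,
  sid: i % 500,
  ts: i,
}));

console.log(shardsHoldingData(trades, "analytics_serverid", 16)); // 4
console.log(shardsHoldingData(trades, "sid", 16)); // 16
```

In this model, adding shards beyond the number of distinct key values buys nothing: the extra shards simply stay empty.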

Page 23: Tales from the Field

What would have been a better shard key?

Very nearly anything. Really.

(Sal picked a local pessimum in the option space.)

Page 24: Tales from the Field

What did we propose?

• The common query patterns were based on security_id and time.
• The compound key { sid : 1, ts : 1 } made a good shard key.
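Why the compound key helps queries can be sketched with a deliberately simplified routing rule: a query can be sent to specific shards only if it constrains the leading field of the shard key, and otherwise it has to go to every shard. The `routing` function and the query values below are hypothetical, and real routing considers more than this one check.

```javascript
// Simplified rule (assumption): a query is "targeted" if it constrains the
// first field of the shard key; otherwise it is scatter/gather.
function routing(shardKey, filter) {
  const [leadingField] = Object.keys(shardKey);
  return leadingField in filter ? "targeted" : "scatter/gather";
}

// A typical query by security and time range (values are made up).
const query = { sid: 1234, ts: { $gte: 0 } };

console.log(routing({ analytics_serverid: 1 }, query)); // scatter/gather
console.log(routing({ sid: 1, ts: 1 }, query)); // targeted
```

Under the original key, no client query mentions analytics_serverid, so every query fans out to all shards. Under { sid : 1, ts : 1 }, the common queries carry the leading field and can be routed to the shards that hold the matching range.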

Page 25: Tales from the Field

The outcome: success!

• Read throughput increased 500%.
• Balancing worked as expected.

Page 26: Tales from the Field

Story #3: Bill the Bulk Updater

Bill built a system that tracked status information for entities in his business domain.

State changes happen in batches; sometimes 10% of entities get updated, sometimes 100% get updated.

Page 27: Tales from the Field

Bill's architecture

[Diagram: Application / mongos, mongod]

Page 28: Tales from the Field

What happened when it went into production?

Bill's system was a success!

The number of business entities grew by a factor of 5.

Page 29: Tales from the Field

Bill's eventual architecture

[Diagram: Application / mongos, mongod, … 16 more shards …]

Page 30: Tales from the Field

Bill's cluster scaled linearly!

(Bill's TCO scaled linearly, too.)

Page 31: Tales from the Field

… and the usage was going to grow…

Page 32: Tales from the Field

What problem did Bill overlook?

Horizontal Scaling = Linear Scaling

Page 33: Tales from the Field

What we recommended

Scale up the random IOPS!

Page 34: Tales from the Field

Bill's final architecture

[Diagram: Application / mongos, mongod, SSD]

Page 35: Tales from the Field

So what can we say?

• Even smart people sometimes solve the wrong problems:
  – Roberta's product catalog was optimized for non-existent usage
  – Sal solved for data ingestion, not query
  – Bill went horizontal when vertical was needed
• Often, MongoDB's staff can tell you in advance when you're going in the wrong direction…
• … and what you ought to do to get where you need to go.

Page 36: Tales from the Field

Thank you