MongoDB Indexing: The Details

MongoDB Indexing and Query Optimizer

Details

Aaron Staple

MongoSV

December 3, 2010

What will we cover?

• Many details of how indexing and the query optimizer work

• A full understanding of these details is not required to use mongo, but this knowledge can be helpful when making optimizations.

• We’ll discuss functionality of Mongo 1.8 (for our purposes pretty similar to 1.6 and almost identical to 1.7 edge).

• Much of the material will be presented through examples.

• Diagrams are to aid understanding – some details will be left out.

What will we cover?

• Basic index bounds

• Compound key index bounds

• Or queries

• Automatic index selection

How will we cover it?

• We’re going to try to cover this material interactively. Please volunteer your thoughts on what mongo should do in given scenarios when I ask.

• Pertinent questions are welcome, but please hold off-topic or specialized questions until the end so we don’t lose momentum.

Btree (just a conceptual diagram)

{_id:4,x:6}

Basic Index Bounds

Find One Document

• db.c.find( {x:6} ).limit( 1 )

• Index {x:1}

Find One Document

1 2 3 4 5 6 7 8 9

{_id:4,x:6}

Find One Document

>db.c.find( {x:6} ).limit( 1 ).explain()

"cursor" : "BtreeCursor x_1",

"nscanned" : 1,

"nscannedObjects" : 1,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

Find One Document

1 2 3 4 5 6 6 6 9

{_id:4,x:6}

Now we have duplicate x values.

Equality Match

• db.c.find( {x:6} )

• Index {x:1}

Equality Match

1 2 3 4 5 6 6 6

{_id:4,x:6} {_id:5,x:6}

{_id:1,x:6}

Equality Match

>db.c.find( {x:6} ).explain()

"nscanned" : 3,

"n" : 3,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

Full Document Matcher

• db.c.find( {x:6,y:1} )

• Index {x:1}

1 2 3 4 5 6 6 6

{y:4,x:6} {y:5,x:6}

{y:1,x:6}

Full Document Matcher

>db.c.find( {x:6,y:1} ).explain()

"nscanned" : 3,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

Documents for all matching keys are scanned, but only one document matched on the non-index keys.
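The gap between nscanned and n can be sketched with a toy model in plain JavaScript (runnable in Node.js, not against a server). This is only an illustration: the sorted array stands in for the btree, the names `buildIndex` and `scan` are hypothetical, and the y values of the documents with x below 6 are made up.

```javascript
// Toy model: an index on one field is a sorted array of {key, doc} entries.
function buildIndex(docs, field) {
  return docs
    .map(function (doc) { return { key: doc[field], doc: doc }; })
    .sort(function (a, b) { return a.key - b.key; });
}

// Walk the keys inside [lo, hi] and apply the remaining criteria to each document.
// A real btree would seek to lo instead of walking from the start.
function scan(index, lo, hi, matcher) {
  let nscanned = 0;
  const results = [];
  for (const entry of index) {
    if (entry.key < lo) continue;
    if (entry.key > hi) break;
    nscanned += 1;                        // every key in bounds counts as scanned
    if (matcher(entry.doc)) results.push(entry.doc);
  }
  return { nscanned: nscanned, n: results.length, results: results };
}

// y values for x < 6 are assumptions; the three x:6 documents follow the slide.
const docs = [
  { x: 1, y: 0 }, { x: 2, y: 0 }, { x: 3, y: 0 }, { x: 4, y: 0 }, { x: 5, y: 0 },
  { x: 6, y: 4 }, { x: 6, y: 5 }, { x: 6, y: 1 }
];
const xIndex = buildIndex(docs, 'x');
const fullMatch = scan(xIndex, 6, 6, function (doc) { return doc.y === 1; });
console.log(fullMatch.nscanned, fullMatch.n); // prints "3 1"
```

The index narrows the scan to the three x:6 keys; the y criterion can only be checked by looking at each of those documents.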

Range Match

• db.c.find( {x:{$gte:4,$lte:7}} )

• Index {x:1}

Range Match

1 2 3 4 5 6 7 9

4 <= ? <= 7

Range Match

>db.c.find( {x:{$gte:4,$lte:7}} ).explain()

"nscanned" : 4,

"n" : 4,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

Exclusive Range Match

• db.c.find( {x:{$gt:4,$lt:7}} )

• Index {x:1}

1 2 3 4 5 6 7 9

4 < ? < 7

Exclusive Range Match

>db.c.find( {x:{$gt:4,$lt:7}} ).explain()

"nscanned" : 2,

"n" : 2,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

Explain doesn’t indicate that the range is exclusive. But index keys matching the range bounds are not scanned because the bounds are exclusive.

Multikeys

• db.c.find( {x:{$gt:7}} )

• Index {x:1}

Multikeys

1 2 3 4 5 6 7 9

{_id:4,x:[8,9]}

Multikeys

>db.c.find( {x:{$gt:7}} ).explain()

"nscanned" : 2,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : true,

"indexBounds" : {

"x" : [

1.7976931348623157e+308

All keys in the valid range are scanned, but the matcher rejects duplicate documents, making n == 1.
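A minimal sketch of the multikey case in plain JavaScript, assuming one index entry per array element and de-duplication by document id. The helper names and the documents other than {_id:4,x:[8,9]} are hypothetical.

```javascript
// One index entry per array element; scalar values produce a single entry.
function buildMultikeyIndex(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const values = Array.isArray(doc[field]) ? doc[field] : [doc[field]];
    for (const value of values) entries.push({ key: value, id: doc._id });
  }
  return entries.sort(function (a, b) { return a.key - b.key; });
}

// Scan all keys greater than bound, returning each document id at most once.
function scanGreaterThan(index, bound) {
  let nscanned = 0;
  const seen = new Set();
  for (const entry of index) {
    if (entry.key <= bound) continue;
    nscanned += 1;       // both keys of the same document are scanned
    seen.add(entry.id);  // but the document is only counted once
  }
  return { nscanned: nscanned, n: seen.size, ids: Array.from(seen) };
}

// Documents 1 to 3 are made up; document 4 follows the slide.
const multikeyDocs = [
  { _id: 1, x: 1 }, { _id: 2, x: 2 }, { _id: 3, x: 3 }, { _id: 4, x: [8, 9] }
];
const multikeyResult = scanGreaterThan(buildMultikeyIndex(multikeyDocs, 'x'), 7);
console.log(multikeyResult.nscanned, multikeyResult.n); // prints "2 1"
```

Because the de-duplication only needs the document id stored with each key, it does not require loading the document itself.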

Multikeys

Range Types

• Explicit inequality

• db.c.find( {x:{$gt:4,$lt:7}} )

• db.c.find( {x:{$gt:4}} )

• db.c.find( {x:{$ne:4}} )

• Regular expression prefix

• db.c.find( {x:/^a/} )

• Data type

• db.c.find( {x:/a/} )

Range Types

db.c.find( {x:{$gt:4,$lt:7}} )

"indexBounds" : {

"x" : [

Range Types

db.c.find( {x:{$gt:4}} )

"indexBounds" : {

"x" : [

1.7976931348623157e+308

Range Types

db.c.find( {x:{$ne:4}} )

"indexBounds" : {

"x" : [

"$minElement" : 1

"$maxElement" : 1

Range Types

db.c.find( {x:/^a/} )

"indexBounds" : {"x" : [

["a","b"

/^a/,/^a/

Range Types

db.c.find( {x:/a/} )

"indexBounds" : {

"x" : [

Set Match

• db.c.find( {x:{$in:[3,6]}} )

• Index {x:1}

Set Match

1 2 3 4 5 6 7 9

Set Match

>db.c.find( {x:{$in:[3,6]}} ).explain()

"cursor" : "BtreeCursor x_1 multi",

"nscanned" : 3,

"n" : 2,

"millis" : 8,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

Why is nscanned 3? This is an algorithmic detail we’ll discuss more later, but when there are disjoint ranges for a key, nscanned may be higher than the number of matching keys.

Set Match

All Match

• db.c.find( {x:{$all:[3,6]}} )

• Index {x:1}

All Match

1 2 3 4 5 6 7 9

{_id:4,x:[3,6]}

All Match

>db.c.find( {x:{$all:[3,6]}} ).explain()

"nscanned" : 1,

"n" : 1,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

The first entry in the $all match array is always used for index bounds. Note this may not be the least numerous indexed value in the $all array.

Limit

• db.c.find( {x:{$lt:6},y:3} ).limit( 3 )

• Index {x:1}

1 2 3 4 5 6 7 9

y:3 y:1 y:3 y:3 y:3

Limit

>db.c.find( {x:{$lt:6},y:3} ).limit( 3 ).explain()

"nscanned" : 4,

"n" : 3,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

-1.7976931348623157e+308,

Scan until three matches are found, then stop.

Skip

• db.c.find( {x:{$lt:6},y:3} ).skip( 3 )

• Index {x:1}

1 2 3 4 5 6 7 9

y:3 y:1 y:3 y:3 y:3

Skip

>db.c.find( {x:{$lt:6},y:3} ).skip( 3 ).explain()

"nscanned" : 5,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

-1.7976931348623157e+308,

All skipped documents are scanned.
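The limit and skip numbers above can be reproduced with a small plain-JavaScript sketch. It assumes the y values shown in the diagram belong to x = 1 through 5; the y values for the remaining keys and the function name `scanBelow` are hypothetical.

```javascript
// Documents in x-index order. y for x = 1..5 follows the diagram;
// y for x >= 6 is made up, since those keys fall outside the bounds anyway.
const byX = [
  { x: 1, y: 3 }, { x: 2, y: 1 }, { x: 3, y: 3 }, { x: 4, y: 3 }, { x: 5, y: 3 },
  { x: 6, y: 0 }, { x: 7, y: 0 }, { x: 9, y: 0 }
];

function scanBelow(sortedDocs, upper, matcher, skip, limit) {
  let nscanned = 0;
  let toSkip = skip;
  const results = [];
  for (const doc of sortedDocs) {
    if (doc.x >= upper) break;            // left the index bounds
    if (results.length >= limit) break;   // limit satisfied, stop early
    nscanned += 1;
    if (!matcher(doc)) continue;
    if (toSkip > 0) { toSkip -= 1; continue; } // skipped documents are still scanned
    results.push(doc);
  }
  return { nscanned: nscanned, n: results.length };
}

const isY3 = function (doc) { return doc.y === 3; };
const limited = scanBelow(byX, 6, isY3, 0, 3);
const skipped = scanBelow(byX, 6, isY3, 3, Infinity);
console.log(limited); // { nscanned: 4, n: 3 }
console.log(skipped); // { nscanned: 5, n: 1 }
```

A limit lets the scan stop early, while a skip saves nothing: the skipped matches still have to be found before they can be discarded.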

Sort

• db.c.find( {x:{$lt:6},y:3} ).sort( {x:1} )

• Index {x:1}

1 2 3 4 5 6 7 9

y:3 y:1 y:3 y:3 y:3

Sort

>db.c.find( {x:{$lt:6},y:3} ).sort( {x:1} ).explain()

"nscanned" : 5,

"n" : 4,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

-1.7976931348623157e+308,

• db.c.find( {x:{$lt:6},y:3} ).sort( {y:1} )

• Index {x:1}

1 2 3 4 5 6 7 9

y:3 y:1 y:3 y:3 y:3

Sort

>db.c.find( {x:{$lt:6},y:3} ).sort( {y:1} ).explain()

"nscanned" : 5,

"n" : 4,

"scanAndOrder" : true,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

-1.7976931348623157e+308,

Results are sorted on the fly to match requested order. The scanAndOrder field is only printed when its value is true.

Sort and scanAndOrder

• With “scanAndOrder” sort, all documents must be touched even if there is a limit spec.

• With scanAndOrder, sorting is performed in memory and the memory footprint is constrained by the limit spec if present.
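As a rough sketch of the two points above, the following plain JavaScript touches every document but never keeps more than `limit` of them (plus the one being inserted) in memory. The bounded insertion buffer is an illustrative choice, not necessarily the data structure mongo uses, and `scanAndOrder` here is a hypothetical function name.

```javascript
// Sort on the fly while scanning, keeping only the best `limit` documents.
function scanAndOrder(docs, compare, limit) {
  const buffer = [];   // always sorted, never longer than limit after each step
  let touched = 0;
  for (const doc of docs) {
    touched += 1;      // every document is visited, limit or not
    let i = buffer.length;
    while (i > 0 && compare(doc, buffer[i - 1]) < 0) i -= 1;
    buffer.splice(i, 0, doc);                 // insert in sorted position
    if (buffer.length > limit) buffer.pop();  // drop the worst entry
  }
  return { touched: touched, results: buffer };
}

const unsorted = [
  { x: 1, y: 3 }, { x: 2, y: 1 }, { x: 3, y: 3 }, { x: 4, y: 3 }, { x: 5, y: 3 }
];
const top2 = scanAndOrder(unsorted, function (a, b) { return a.y - b.y; }, 2);
console.log(top2.touched, top2.results.length); // prints "5 2"
```

With an index that already provides the requested order, the scan could instead stop after `limit` matches.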

Count

• db.c.count( {x:{$gte:4,$lte:7}} )

• Index {x:1}

1 2 3 4 5 6 7 9

4 <= ? <= 7

We’re just counting keys here, not loading the full documents.

• With some operators the full document must be checked. Some of these cases:

• $all

• $size

• array match

• Negation - $ne, $nin, $not, etc.

• With current semantics, all multikey elements must match negation constraints

• Multikey de-duplication works without loading full document

Covered Indexes

• db.c.find( {x:6}, {x:1,_id:0} )

• Index {x:1}

_id would be returned by default, but it isn’t in the index, so we need to exclude it to return only indexed fields.

Covered Indexes

1 2 3 4 5 6 7 9

{_id:4,x:6}

Covered Indexes

>db.c.find( {x:6}, {x:1,_id:0} ).explain()

"nscanned" : 1,

"n" : 1,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"indexOnly" : true,

"indexBounds" : {

"x" : [

Covered Indexes

"indexOnly" : true,

Covered Indexes

1 2 3 4 5 6 7 9

{_id:4,x:[6,7]}

Covered Indexes

>db.c.find( {x:6}, {x:1,_id:0} ).explain()

"nscanned" : 1,

"n" : 1,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

"indexOnly" : false,

Currently we set isMultiKey to true the first time we save a doc where the field is a multikey array. But when all multikey docs are removed we don’t reset isMultiKey. This can be improved.

Update

• db.c.update( {x:{$gte:4,$lte:7}}, {$set:{x:2}} )

• Index {x:1}

Update

1 2 3 4 5 6 7 9

4 <= ? <= 7

{_id:4,x:4}

Update

{_id:4,x:2}

Update

• We track the set of documents that have been updated in the course of the current operation so they are only updated once.

Compound Key Index Bounds

Two Equality Bounds

• db.c.find( {x:5,y:'c'} )

• Index {x:1,y:1}

Two Equality Bounds

Two Equality Bounds

>db.c.find( {x:5,y:'c'} ).explain()

"cursor" : "BtreeCursor x_1_y_1",

"nscanned" : 1,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

"y" : [

Equality and Set

• db.c.find( {x:5,y:{$in:['c','f']}} )

• Index {x:1,y:1}

Equality and Set

Equality and Set

>db.c.find( {x:5,y:{$in:['c','f']}} ).explain()

"cursor" : "BtreeCursor x_1_y_1 multi",

"nscanned" : 3,

"n" : 2,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

"y" : [

Equality and Range

• db.c.find( {x:5,y:{$gte:'d'}} )

• Index {x:1,y:1}

Equality and Range

(5, 'd') <= ? <= (5, max string)

Equality and Range

>db.c.find( {x:5,y:{$gte:'d'}} ).explain()

"nscanned" : 2,

"n" : 2,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

"y" : [

Two Set Bounds

• db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} )

• Index {x:1,y:1}

Two Set Bounds

Two Set Bounds

>db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} ).explain()

"nscanned" : 5,

"n" : 3,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

"y" : [

Set and Range

• db.c.find( {x:{$in:[5,9]},y:{$lte:'d'}} )

• Index {x:1,y:1}

Set and Range

[Diagram: for each x in {5, 9}, the scanned range is (x, min string) <= ? <= (x, 'd')]

Set and Range

>db.c.find( {x:{$in:[5,9]},y:{$lte:'d'}} ).explain()

"nscanned" : 5,

"n" : 3,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

"y" : [

Range and Equality

• db.c.find( {x:{$gte:4},y:'c'} )

• Index {x:1,y:1}

Range and Equality

4 <= ? and c

Range and Equality

>db.c.find( {x:{$gte:4},y:'c'} ).explain()

"nscanned" : 7,

"n" : 2,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

1.7976931348623157e+308

"y" : [

High nscanned because every distinct value of x must be checked.

Range and Equality

Every distinct value of x must be checked.
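One way to picture why every distinct x must be visited is the sketch below, in plain JavaScript. It models the compound index {x:1,y:1} as a sorted array of keys and hops from one x value to the next with a binary-search seek. The sample keys, the `MAX_STRING` sentinel, and the exact skipping strategy are assumptions of this sketch; it is not meant to reproduce the nscanned figures on the slide.

```javascript
// Compare compound keys by x first, then by y.
function compareKeys(a, b) {
  if (a.x !== b.x) return a.x < b.x ? -1 : 1;
  if (a.y !== b.y) return a.y < b.y ? -1 : 1;
  return 0;
}

// Position of the first key >= target (binary search over the sorted keys).
function seek(keys, target) {
  let lo = 0;
  let hi = keys.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (compareKeys(keys[mid], target) < 0) lo = mid + 1;
    else hi = mid;
  }
  return lo;
}

// Stand-in for "max string"; assumes no real y value is this large.
const MAX_STRING = '\uffff';

// Find keys with x >= minX and y === y, visiting each distinct x in range.
function rangeThenEquality(keys, minX, y) {
  const matches = [];
  const xValuesVisited = new Set();
  let i = seek(keys, { x: minX, y: y });
  while (i < keys.length) {
    const key = keys[i];
    xValuesVisited.add(key.x);
    if (key.y === y) {
      matches.push(key);
      i += 1;
    } else if (key.y < y) {
      i = seek(keys, { x: key.x, y: y });           // jump forward to (x, y)
    } else {
      i = seek(keys, { x: key.x, y: MAX_STRING });  // jump to the next x value
    }
  }
  return { matches: matches, xValuesVisited: xValuesVisited.size };
}

// Made-up keys, already in index order.
const compoundKeys = [
  { x: 3, y: 'a' }, { x: 4, y: 'b' }, { x: 4, y: 'c' }, { x: 5, y: 'a' },
  { x: 5, y: 'd' }, { x: 6, y: 'c' }, { x: 7, y: 'e' }
];
const rangeEq = rangeThenEquality(compoundKeys, 4, 'c');
console.log(rangeEq.matches.length, rangeEq.xValuesVisited); // prints "2 4"
```

Only two keys match, yet four distinct x values have to be visited, which is why a leading range field tends to cost more than a leading equality field.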

Range and Set

• db.c.find( {x:{$gte:4},y:{$in:['c','a']}} )

• Index {x:1,y:1}

Range and Set

4 <= ? and c, a

Range and Set

>db.c.find( {x:{$gte:4},y:{$in:['c','a']}} ).explain()

"nscanned" : 7,

"n" : 3,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

1.7976931348623157e+308

"y" : [

Range and Set

Every distinct value of x must be checked for y values 'a' and 'c'.

Two Ranges (2D Box)

• db.c.find( {x:{$gte:3,$lte:7},y:{$gte:'c',$lte:'f'}} )

• Index {x:1,y:1}

Two Ranges (2D Box)

{x:{$gte:3,$lte:7},

y:{$gte:'c',$lte:'f'}}

Two Ranges (2D Box)

3 <= ? <= 7

'c' <= ? <= 'f'

Two Ranges (2D Box)

>db.c.find( {x:{$gte:3,$lte:7},y:{$gte:'c',$lte:'f'}} ).explain()

"nscanned" : 6,

"n" : 4,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

"y" : [

Two Ranges (2D Box)

3 <= ? <= 7

For every distinct value of x in this range, scan for every value of y in this range.

Disjoint $or Criteria

• db.c.find( {$or:[{x:5},{y:'d'}]} )

• Indexes {x:1}, {y:1}

Disjoint $or Criteria

>db.c.find( {$or:[{x:5},{y:'d'}]} ).explain()

"clauses" : [

"nscanned" : 2,

"n" : 2,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

"cursor" : "BtreeCursor y_1",

"nscanned" : 2,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"y" : [

"nscanned" : 4,

"n" : 3,

"millis" : 1

Disjoint $or Criteria

{

"nscanned" : 2,

"n" : 2,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

Disjoint $or Criteria

{

"cursor" : "BtreeCursor y_1",

"nscanned" : 2,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"y" : [

Only return one document matching this clause.

"nscanned" : 4,

"n" : 3,

"millis" : 1

We have already scanned the x index for x:5. So this document was returned already. We don’t return it again.

Unindexed $or Clause

• db.c.find( {$or:[{x:5},{y:'d'}]} )

• Index {x:1} (no index on y)

Unindexed $or Clause

>db.c.find( {$or:[{x:5},{y:'d'}]} ).explain()

"cursor" : "BasicCursor",

"nscanned" : 9,

"n" : 3,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

Since y is not indexed, we must do a full collection scan to match y:'d'. Since a full scan is required, we don’t use the index on x to match x:5.

Eliminated $or Clause

• db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:5}]} )

• Index {x:1}

1 2 3 4 5 6 7 8 9

2 < ? < 6

1 2 3 4 5 6 7 8 9

>db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:5}]} ).explain()

"nscanned" : 3,

"n" : 3,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

The index range of the second clause is included in the index range of the first clause, so we use the first index range only.

Eliminated $or Clause with Differing Unindexed Criteria

• db.c.find( {$or:[{x:{$gt:2,$lt:6},y:'c'},{x:5,y:'d'}]} )

• Index {x:1}

2 < ? < 6 and c

5 and d

>db.c.find( {$or:[{x:{$gt:2,$lt:6},y:'c'},{x:5,y:'d'}]} ).explain()

"nscanned" : 4,

"n" : 2,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

2 < ? < 6 and c, d

The index range for the first clause contains the index range for the second clause, so all matching is done using the index range for the first clause.

Overlapping $or Clauses

• db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:{$gt:4,$lt:7}}]} )

• Index {x:1,y:1}

1 2 3 4 5 6 7 8 9

2 < ? < 6

1 2 3 4 5 6 7 8 9

4 < ? < 7

Overlapping $or Clauses

>db.d.find( {$or:[{x:{$gt:2,$lt:6}},{x:{$gt:4,$lt:7}}]} ).explain()

"clauses" : [

"nscanned" : 3,

"n" : 3,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

"nscanned" : 1,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

"nscanned" : 4,

"n" : 4,

"millis" : 1

"nscanned" : 3,

"n" : 3,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

"nscanned" : 1,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

The index range scanned for the previous clause is removed.

1 2 3 4 5 6 7 8 9

2 < ? < 6

1 2 3 4 5 6 7 8 9

6 <= ? < 7
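The range removal shown in the diagram can be written as a one-dimensional interval subtraction. The sketch below is plain JavaScript with a hypothetical interval representation ({lo, loInc, hi, hiInc}); it illustrates the idea and is not the server's implementation.

```javascript
// An interval is {lo, loInc, hi, hiInc}; the Inc flags mark inclusive ends.
function nonEmpty(iv) {
  return iv.lo < iv.hi || (iv.lo === iv.hi && iv.loInc && iv.hiInc);
}

// The tighter (smaller) of two upper bounds.
function tighterUpper(hi1, inc1, hi2, inc2) {
  if (hi1 !== hi2) return hi1 < hi2 ? { v: hi1, inc: inc1 } : { v: hi2, inc: inc2 };
  return { v: hi1, inc: inc1 && inc2 };
}

// The tighter (larger) of two lower bounds.
function tighterLower(lo1, inc1, lo2, inc2) {
  if (lo1 !== lo2) return lo1 > lo2 ? { v: lo1, inc: inc1 } : { v: lo2, inc: inc2 };
  return { v: lo1, inc: inc1 && inc2 };
}

// Parts of cur that were not already covered by prev (zero, one or two pieces).
function subtract(cur, prev) {
  // Piece of cur below prev: its upper end is prev's lower end, with inclusivity flipped.
  const up = tighterUpper(cur.hi, cur.hiInc, prev.lo, !prev.loInc);
  const left = { lo: cur.lo, loInc: cur.loInc, hi: up.v, hiInc: up.inc };
  // Piece of cur above prev: its lower end is prev's upper end, with inclusivity flipped.
  const low = tighterLower(cur.lo, cur.loInc, prev.hi, !prev.hiInc);
  const right = { lo: low.v, loInc: low.inc, hi: cur.hi, hiInc: cur.hiInc };
  return [left, right].filter(nonEmpty);
}

const clause1 = { lo: 2, loInc: false, hi: 6, hiInc: false };  // 2 < x < 6
const clause2 = { lo: 4, loInc: false, hi: 7, hiInc: false };  // 4 < x < 7
const remainder = subtract(clause2, clause1);
console.log(remainder); // [ { lo: 6, loInc: true, hi: 7, hiInc: false } ], i.e. 6 <= x < 7

// A clause fully inside an earlier one (x:5 inside 2 < x < 6) is eliminated.
const eliminated = subtract({ lo: 5, loInc: true, hi: 5, hiInc: true }, clause1);
console.log(eliminated.length); // 0
```

The same function covers the eliminated-clause case from earlier: when nothing remains after the subtraction, the clause needs no scan of its own.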

2D Overlapping $or Clauses

• db.c.find( {$or:[{x:{$gt:2,$lt:6},y:{$gt:'b',$lt:'f'}},{x:{$gt:4,$lt:7},y:{$gt:'b',$lt:'e'}}]} )

• Index {x:1,y:1}

Clause 2

Clause 1

2D Overlapping $or Clauses

>db.c.find( {$or:[{x:{$gt:2,$lt:6},y:{$gt:'b',$lt:'f'}},{x:{$gt:4,$lt:7},y:{$gt:'b',$lt:'e'}}]} ).explain()

"clauses" : [

"nscanned" : 4,

"n" : 3,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

"y" : [

"nscanned" : 0,

"n" : 0,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

"y" : [

"nscanned" : 4,

"n" : 3,

"millis" : 1

"nscanned" : 4,

"n" : 3,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

"y" : [

"nscanned" : 0,

"n" : 0,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"indexBounds" : {

"x" : [

"y" : [

The index range scanned for the previous clause is removed.

Clause 2

We only have to scan the remainder

Clause 1

• Rule of thumb for n dimensions: We subtract earlier clause boxes from the current box when the result is one or more boxes.

$or TODO

• Use indexes on $or fields to satisfy a sort specification (SERVER-1205)

• Use full query optimizer to select $or clause indexes in getMore (SERVER-1215)

• Improve index range elimination (handling some cases where remainder is not a box)

Automatic Index Selection

(Query Optimizer)

Optimal Index

• find( {x:5} )

– Index {x:1}

– Index {x:1,y:1}

• find( {x:5} ).sort( {y:1} )

– Index {x:1,y:1}

• find( {} ).sort( {x:1} )

– Index {x:1}

• find( {x:{$gt:1,$lt:7}} ).sort( {x:1} )

– Index {x:1}

Optimal Index

• Rule of Thumb

– No scanAndOrder

– All fields with index-useful constraints are indexed

– If there is a range or sort, it is the last field of the index used to resolve the query

• If multiple optimal indexes exist, one is chosen arbitrarily.

Optimal Index

• These same criteria are useful when you are designing your indexes.

Multiple Candidate Indexes

• find( {x:4,y:'a'} )

– Index {x:1} or {y:1}?

• find( {x:4} ).sort( {y:1} )

– Index {x:1} or {y:1}?

– Note: {x:1,y:1} is optimal

• find( {x:{$gt:2,$lt:7},y:{$gt:'a',$lt:'f'}} )

– Index {x:1,y:1} or {y:1,x:1}?

• The only index selection criterion is nscanned

• find( {x:4,y:'a'} )

– Index {x:1} or {y:1} ?

– If fewer documents match {y:'a'} than {x:4}, then nscanned for {y:1} will be lower, so we pick {y:1}

• find( {x:{$gt:2,$lt:7},y:{$gt:'b',$lt:'f'}} )

– Index {x:1,y:1} or {y:1,x:1} ?

– If there are fewer distinct values of 2 < x < 7 than distinct values of 'b' < y < 'f', then {x:1,y:1} is chosen (rule of thumb)

• The only index selection criterion is nscanned

• Pretty good, but doesn’t cover every case, e.g.

– Cost of scanAndOrder vs ordered index

– Cost of loading full document vs just index key

– Cost of scanning adjacent btree keys vs non-adjacent keys/documents

Competing Indexes

• At most one query plan per index

• Run in interleaved fashion

• Plans kept in a priority queue ordered by nscanned. We always continue progress on plan with lowest nscanned.

Competing Indexes

• Run until one plan returns all results or enough results to satisfy the initial query request (based on soft limit spec / data size requirement for initial query).

• We only allow plans to compete in initial query. In getMore, we continue reading from the index cursor established by the initial query.
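The interleaving described above might look roughly like this in plain JavaScript. A linear search for the lowest nscanned stands in for the priority queue, and the plan objects, helper names, and sample documents are all hypothetical.

```javascript
// A plan walks the documents inside its index bounds and filters with a matcher.
function makePlan(name, docsInBounds, matcher) {
  return { name: name, docs: docsInBounds, matcher: matcher, pos: 0, nscanned: 0, results: [] };
}

// Advance a plan by one index key.
function step(plan) {
  const doc = plan.docs[plan.pos];
  plan.pos += 1;
  plan.nscanned += 1;
  if (plan.matcher(doc)) plan.results.push(doc);
}

// A plan is done when it ran out of keys or has enough results for the first batch.
function isDone(plan, enough) {
  return plan.pos >= plan.docs.length || plan.results.length >= enough;
}

// Always advance the plan with the lowest nscanned until one plan is done.
function race(plans, enough) {
  for (;;) {
    const finished = plans.find(function (p) { return isDone(p, enough); });
    if (finished) return finished;
    let next = plans[0];
    for (const p of plans) if (p.nscanned < next.nscanned) next = p;
    step(next);
  }
}

// find( {x:4,y:'a'} ): five made-up documents have x:4, only two have y:'a'.
const x4 = ['a', 'b', 'c', 'd', 'e'].map(function (y, i) { return { _id: i + 1, x: 4, y: y }; });
const yA = [x4[0], { _id: 6, x: 9, y: 'a' }];
const winner = race([
  makePlan('x_1', x4, function (doc) { return doc.y === 'a'; }),
  makePlan('y_1', yA, function (doc) { return doc.x === 4; })
], Infinity);
console.log(winner.name, winner.nscanned); // prints "y_1 2"
```

The plan over {y:1} exhausts its bounds after two keys, so it wins before the {x:1} plan has scanned all five of its keys.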

“Learning” a Query Plan

• When an index is chosen for a query the query’s “pattern” and nscanned are recorded

– find( {x:3,y:'c'} )

• {Pattern: {x:'equality', y:'equality'}, Index: {x:1}, nscanned: 50}

– find( {x:{$gt:5},y:{$lt:'z'}} )

• {Pattern: {x:'gt bound', y:'lt bound'}, Index: {y:1}, nscanned: 500}

“Learning” a Query Plan

• When a new query matches the same pattern, the same query plan is used

– find( {x:5,y:'z'} )

• Use index {x:1}

– find( {x:{$gt:20},y:{$lt:'b'}} )

• Use index {y:1}
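A minimal sketch of pattern-based plan recording in plain JavaScript. The pattern labels 'equality', 'gt bound' and 'lt bound' come from the slides; the labels 'gt and lt bound' and 'other', the JSON cache key, and the function names are assumptions made for illustration.

```javascript
// Reduce a query to its "pattern": the kind of constraint on each field,
// ignoring the actual values.
function queryPattern(query) {
  const pattern = {};
  Object.keys(query).sort().forEach(function (field) {
    const cond = query[field];
    if (cond === null || typeof cond !== 'object') {
      pattern[field] = 'equality';
      return;
    }
    const hasGt = '$gt' in cond || '$gte' in cond;
    const hasLt = '$lt' in cond || '$lte' in cond;
    if (hasGt && hasLt) pattern[field] = 'gt and lt bound';
    else if (hasGt) pattern[field] = 'gt bound';
    else if (hasLt) pattern[field] = 'lt bound';
    else pattern[field] = 'other';
  });
  return JSON.stringify(pattern);
}

const learnedPlans = new Map();

// Remember which index won for this query's pattern, and how much it scanned.
function recordPlan(query, index, nscanned) {
  learnedPlans.set(queryPattern(query), { index: index, nscanned: nscanned });
}

// Look up a previously learned plan for a query with the same pattern.
function lookupPlan(query) {
  return learnedPlans.get(queryPattern(query));
}

recordPlan({ x: 3, y: 'c' }, 'x_1', 50);
recordPlan({ x: { $gt: 5 }, y: { $lt: 'z' } }, 'y_1', 500);

console.log(lookupPlan({ x: 5, y: 'z' }).index);                        // prints "x_1"
console.log(lookupPlan({ x: { $gt: 20 }, y: { $lt: 'b' } }).index);     // prints "y_1"
```

Because only the shape of the constraints is recorded, a new query with different values but the same shape reuses the learned index.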

“Un-Learning” a Query Plan

• 100 writes to the collection

• Indexes added / removed

Bad Plan Insurance

• If nscanned for a new query using a recorded plan is much worse than the recorded nscanned for an earlier query with the same pattern, we start interleaving other plans with the current plan.

• Currently “much worse” means 10x
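The two safeguards, forgetting plans after 100 writes and re-racing when a recorded plan turns out roughly 10x worse, can be sketched in plain JavaScript as below. The thresholds come from the slides; the state object, the function names, and the exact comparison operators (for example, whether the 100th write or the 101st triggers the reset) are assumptions.

```javascript
const WRITES_BEFORE_RESET = 100;  // "100 writes to the collection"
const MUCH_WORSE_FACTOR = 10;     // "much worse" currently means 10x

// Per-collection state: learned plans plus a write counter.
function makePlanState() {
  return { cache: new Map(), writes: 0 };
}

// Each write counts toward un-learning; at the threshold all plans are dropped.
function noteWrite(state) {
  state.writes += 1;
  if (state.writes >= WRITES_BEFORE_RESET) {
    state.cache.clear();
    state.writes = 0;
  }
}

// Adding or removing an index also invalidates what was learned.
function noteIndexChange(state) {
  state.cache.clear();
  state.writes = 0;
}

// Bad plan insurance: start interleaving other plans once the current
// query has scanned far more than the recorded query did.
function shouldInterleaveOtherPlans(recordedNscanned, currentNscanned) {
  return currentNscanned > recordedNscanned * MUCH_WORSE_FACTOR;
}

const planState = makePlanState();
planState.cache.set('{"x":"equality"}', { index: 'x_1', nscanned: 50 });
for (let i = 0; i < 99; i += 1) noteWrite(planState);
const sizeAfter99 = planState.cache.size;
noteWrite(planState);
const sizeAfter100 = planState.cache.size;
console.log(sizeAfter99, sizeAfter100);           // prints "1 0"
console.log(shouldInterleaveOtherPlans(50, 501)); // prints "true"
```

Together these keep a learned plan from outliving the data distribution it was learned on.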

Query Planner

• Ad hoc heuristics in some cases

• Seem to work decently in practice

Feedback

• Large and small scale optimizer features are generally prioritized based on user input.

• Please use jira to request new features and vote on existing feature requests.

Thanks!

Feature Requests

jira.mongodb.org

Support

groups.google.com/group/mongodb-user

Next up:

Sharding Details with Eliot
