MongoDB Indexing: The Details

Aaron Staple's presentation from MongoSV

MongoDB Indexing and Query Optimizer Details

Aaron Staple

MongoSV

December 3, 2010

What will we cover?

• Many details of how indexing and the query optimizer work

• A full understanding of these details is not required to use mongo, but this knowledge can be helpful when making optimizations.

• We’ll discuss functionality of Mongo 1.8 (for our purposes pretty similar to 1.6 and almost identical to 1.7 edge).

• Much of the material will be presented through examples.

• Diagrams are to aid understanding – some details will be left out.

What will we cover?

• Basic index bounds

• Compound key index bounds

• Or queries

• Automatic index selection

How will we cover it?

• We’re going to try to cover this material interactively. Please volunteer your thoughts on what mongo should do in given scenarios when I ask.

• Pertinent questions are welcome, but please hold off-topic or specialized questions until the end so we don’t lose momentum.

Btree (just a conceptual diagram)

[Diagram: btree over index keys 1 through 9; key 6 points to document {_id:4,x:6}]

Basic Index Bounds

Find One Document

• db.c.find( {x:6} ).limit( 1 )

• Index {x:1}

Find One Document

1 2 3 4 5 6 7 8 9

6 ?

{_id:4,x:6}

Find One Document

>db.c.find( {x:6} ).limit( 1 ).explain()

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 1,

"nscannedObjects" : 1,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

6,

6

]

]

}

}

Find One Document

"indexBounds" : {

"x" : [

[

6,

6

]

]

}

Find One Document

"nscanned" : 1,

"nscannedObjects" : 1,

"n" : 1,

Find One Document

[Diagram: btree over index keys 1 through 9; the lookup "6 ?" resolves to key 6, which points to document {_id:4,x:6}]

Find One Document

1 2 3 4 5 6 6 6 9

6 ?

{_id:4,x:6}

Now we have duplicate x values.

Find One Document

[Diagram: btree over index keys 1 2 3 4 5 6 6 6 9; the lookup "6 ?" points to document {_id:4,x:6}]

Equality Match

• db.c.find( {x:6} )

• Index {x:1}

Equality Match

1 2 3 4 5 6 6 6 9

6 ?

{_id:4,x:6} {_id:5,x:6} {_id:1,x:6}

Equality Match

>db.c.find( {x:6} ).explain()

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 3,

"nscannedObjects" : 3,

"n" : 3,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

6,

6

]

]

}

}

Equality Match

"indexBounds" : {

"x" : [

[

6,

6

]

]

}

Equality Match

"nscanned" : 3,

"nscannedObjects" : 3,

"n" : 3,

Equality Match

[Diagram: btree over index keys 1 2 3 4 5 6 6 6 9; lookup "6 ?"]

Full Document Matcher

• db.c.find( {x:6,y:1} )

• Index {x:1}

Full Document Matcher

1 2 3 4 5 6 6 6 9

6 ?

{y:4,x:6} {y:5,x:6} {y:1,x:6}

Full Document Matcher

>db.c.find( {x:6,y:1} ).explain()

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 3,

"nscannedObjects" : 3,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

6,

6

]

]

}

}

Full Document Matcher

"indexBounds" : {

"x" : [

[

6,

6

]

]

}

Full Document Matcher

"nscanned" : 3,

"nscannedObjects" : 3,

"n" : 1,

Documents for all matching keys are scanned, but only one document matches on the non-indexed keys.
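The relationship between nscanned and n on this slide can be mimicked with a toy model. This is a minimal sketch, not MongoDB code: the index is assumed to be a sorted array, and `scanWithMatcher` and `xIndex` are hypothetical names.

```javascript
// Toy model (not MongoDB code): an index on x as a sorted list of { key, doc } entries.
const collection = [
  { x: 1 }, { x: 2 }, { x: 3 }, { x: 4 }, { x: 5 },
  { x: 6, y: 4 }, { x: 6, y: 5 }, { x: 6, y: 1 }, { x: 9 },
];
const xIndex = collection
  .map((doc) => ({ key: doc.x, doc }))
  .sort((a, b) => a.key - b.key);

// Walk the keys inside [lo, hi]; every key visited loads its document,
// and the matcher then checks the complete query against that document.
function scanWithMatcher(index, lo, hi, matcher) {
  let nscanned = 0;
  let nscannedObjects = 0;
  const results = [];
  for (const { key, doc } of index) {
    if (key < lo) continue; // a real btree would seek straight to lo
    if (key > hi) break;
    nscanned += 1;
    nscannedObjects += 1;
    if (matcher(doc)) results.push(doc);
  }
  return { nscanned, nscannedObjects, n: results.length };
}

// Same shape as find( {x:6,y:1} ) with only {x:1} indexed.
const matcherResult = scanWithMatcher(xIndex, 6, 6, (d) => d.x === 6 && d.y === 1);
console.log(matcherResult);
```

All three keys equal to 6 are visited, but the matcher keeps only the document with y equal to 1.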

Range Match

• db.c.find( {x:{$gte:4,$lte:7}} )

• Index {x:1}

Range Match

1 2 3 4 5 6 7 8 9

4 <= ? <= 7

Range Match

>db.c.find( {x:{$gte:4,$lte:7}} ).explain()

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 4,

"nscannedObjects" : 4,

"n" : 4,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

4,

7

]

]

}

}

Range Match

"indexBounds" : {

"x" : [

[

4,

7

]

]

Range Match

"nscanned" : 4,

"nscannedObjects" : 4,

"n" : 4,

Range Match

[Diagram: btree over index keys 1 through 9]

Exclusive Range Match

• db.c.find( {x:{$gt:4,$lt:7}} )

• Index {x:1}

Exclusive Range Match

1 2 3 4 5 6 7 8 9

4 < ? < 7

Exclusive Range Match

>db.c.find( {x:{$gt:4,$lt:7}} ).explain()

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 2,

"nscannedObjects" : 2,

"n" : 2,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

4,

7

]

]

}

}

Exclusive Range Match

"indexBounds" : {

"x" : [

[

4,

7

]

]

}

Explain doesn’t indicate that the range is exclusive.

Exclusive Range Match

"nscanned" : 2,

"nscannedObjects" : 2,

"n" : 2,

But index keys matching the range bounds are not scanned because the bounds are exclusive.

Exclusive Range Match

[Diagram: btree over index keys 1 through 9]

Multikeys

• db.c.find( {x:{$gt:7}} )

• Index {x:1}

Multikeys

1 2 3 4 5 6 7 8 9

? > 7

{_id:4,x:[8,9]}

Multikeys

>db.c.find( {x:{$gt:7}} ).explain()

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 2,

"nscannedObjects" : 2,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : true,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

7,

1.7976931348623157e+308

]

]

}

}

Multikeys

"indexBounds" : {

"x" : [

[

7,

1.7976931348623157e+308

]

]

}

Multikeys

"nscanned" : 2,

"nscannedObjects" : 2,

"n" : 1,

All keys in the valid range are scanned, but the matcher rejects duplicate documents, making n == 1.
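The multikey behavior can be sketched with a toy index that holds one entry per array element. This is an illustration under assumptions: `buildMultikeyIndex` and `findGreaterThan` are hypothetical helpers, and de-duplication by `_id` stands in for whatever the server does internally.

```javascript
// Toy model: an array field contributes one index entry per element.
const multikeyDocs = [
  { _id: 1, x: 1 },
  { _id: 2, x: 7 },
  { _id: 4, x: [8, 9] },
];

function buildMultikeyIndex(docs) {
  const entries = [];
  for (const doc of docs) {
    const keys = Array.isArray(doc.x) ? doc.x : [doc.x];
    for (const key of keys) entries.push({ key, doc });
  }
  return entries.sort((a, b) => a.key - b.key);
}

// Scan keys greater than `bound`; a document reached through a second key is not returned again.
function findGreaterThan(index, bound) {
  const seen = new Set();
  let nscanned = 0;
  const results = [];
  for (const { key, doc } of index) {
    if (key <= bound) continue;
    nscanned += 1;
    if (seen.has(doc._id)) continue; // duplicate document, rejected
    seen.add(doc._id);
    results.push(doc);
  }
  return { nscanned, n: results.length };
}

const multikeyResult = findGreaterThan(buildMultikeyIndex(multikeyDocs), 7);
console.log(multikeyResult);
```

Keys 8 and 9 are both scanned, and both lead to the same document, so it is returned once.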

Multikeys

[Diagram: btree over index keys 1 through 9]

Range Types

• Explicit inequality

– db.c.find( {x:{$gt:4,$lt:7}} )

– db.c.find( {x:{$gt:4}} )

– db.c.find( {x:{$ne:4}} )

• Regular expression prefix

– db.c.find( {x:/^a/} )

• Data type

– db.c.find( {x:/a/} )

Range Types

db.c.find( {x:{$gt:4,$lt:7}} )

"indexBounds" : {

"x" : [

[

4,

7

]

]

}

Range Types

db.c.find( {x:{$gt:4}} )

"indexBounds" : {

"x" : [

[

4,

1.7976931348623157e+308

]

]

}

Range Types

db.c.find( {x:{$ne:4}} )

"indexBounds" : {

"x" : [

[

{

"$minElement" : 1

},

4

],

[

4,

{

"$maxElement" : 1

}

]

]

}

Range Types

db.c.find( {x:/^a/} )

"indexBounds" : { "x" : [ [ "a", "b" ], [ /^a/, /^a/ ] ] }

Range Types

db.c.find( {x:/a/} )

"indexBounds" : {

"x" : [

[

"",

{

}

],

[

/a/,

/a/

]

]

}

Set Match

• db.c.find( {x:{$in:[3,6]}} )

• Index {x:1}

Set Match

1 2 3 4 5 6 7 8 9

3 , 6

Set Match

>db.c.find( {x:{$in:[3,6]}} ).explain()

{

"cursor" : "BtreeCursor x_1 multi",

"nscanned" : 3,

"nscannedObjects" : 2,

"n" : 2,

"millis" : 8,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

3,

3

],

[

6,

6

]

]

}

}

Set Match

"indexBounds" : {

"x" : [

[

3,

3

],

[

6,

6

]

]

}

Set Match

"nscanned" : 3,

"nscannedObjects" : 2,

"n" : 2,

Why is nscanned 3? This is an algorithmic detail we’ll discuss more later, but when there are disjoint ranges for a key, nscanned may be higher than the number of matching keys.

Set Match

[Diagram: btree over index keys 1 through 9]

All Match

• db.c.find( {x:{$all:[3,6]}} )

• Index {x:1}

All Match

1 2 3 4 5 6 7 8 9

3 ?

{_id:4,x:[3,6]}

All Match

>db.c.find( {x:{$all:[3,6]}} ).explain()

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 1,

"nscannedObjects" : 1,

"n" : 1,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : true,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

3,

3

]

]

}

}

All Match

"indexBounds" : {

"x" : [

[

3,

3

]

]

}

The first entry in the $all match array is always used for index bounds. Note this may not be the least numerous indexed value in the $all array.

All Match

"nscanned" : 1,

"nscannedObjects" : 1,

"n" : 1,

All Match

[Diagram: btree over index keys 1 through 9]

Limit

• db.c.find( {x:{$lt:6},y:3} ).limit( 3 )

• Index {x:1}

Limit

1 2 3 4 5 6 7 8 9

? < 6

y:3 y:1 y:3 y:3 y:3

Limit

>db.c.find( {x:{$lt:6},y:3} ).limit( 3 ).explain()

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 4,

"nscannedObjects" : 4,

"n" : 3,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : true,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

-1.7976931348623157e+308,

6

]

]

}

}

Limit

"indexBounds" : {

"x" : [

[

-1.7976931348623157e+308,

6

]

]

}

Limit

"nscanned" : 4,

"nscannedObjects" : 4,

"n" : 3,

Scan until three matches are found, then stop.

Skip

• db.c.find( {x:{$lt:6},y:3} ).skip( 3 )

• Index {x:1}

Skip

1 2 3 4 5 6 7 8 9

? < 6

y:3 y:1 y:3 y:3 y:3

Skip

>db.c.find( {x:{$lt:6},y:3} ).skip( 3 ).explain()

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 5,

"nscannedObjects" : 5,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : true,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

-1.7976931348623157e+308,

6

]

]

}

}

Skip

"indexBounds" : {

"x" : [

[

-1.7976931348623157e+308,

6

]

]

}

Skip

"nscanned" : 5,

"nscannedObjects" : 5,

"n" : 1,

All skipped documents are scanned.

Sort

• db.c.find( {x:{$lt:6}} ).sort( {x:1} )

• Index {x:1}

Sort

1 2 3 4 5 6 7 8 9

? < 6

y:3 y:1 y:3 y:3 y:3

Sort

>db.c.find( {x:{$lt:6},y:3} ).sort( {x:1} ).explain()

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 5,

"nscannedObjects" : 5,

"n" : 4,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : true,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

-1.7976931348623157e+308,

6

]

]

}

}

Sort

"cursor" : "BtreeCursor x_1",

Sort

• db.c.find( {x:{$lt:6}} ).sort( {y:1} )

• Index {x:1}

Sort

1 2 3 4 5 6 7 8 9

? < 6

y:3 y:1 y:3 y:3 y:3

Sort

>db.c.find( {x:{$lt:6},y:3} ).sort( {y:1} ).explain()

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 5,

"nscannedObjects" : 5,

"n" : 4,

"scanAndOrder" : true,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : true,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

-1.7976931348623157e+308,

6

]

]

}

}

Sort

"cursor" : "BtreeCursor x_1",

"nscanned" : 5,

"nscannedObjects" : 5,

"n" : 4,

"scanAndOrder" : true,

Results are sorted on the fly to match the requested order. The scanAndOrder field is only printed when its value is true.

Sort and scanAndOrder

• With “scanAndOrder” sort, all documents must be touched even if there is a limit spec.

• With scanAndOrder, sorting is performed in memory and the memory footprint is constrained by the limit spec if present.
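One way to picture both points is an in-memory sort that keeps at most `limit` documents while still touching every scanned document. This is a minimal sketch under assumptions: `scanAndOrder` here is a hypothetical helper, and the insertion strategy is chosen for brevity, not taken from the server.

```javascript
// Toy scanAndOrder: every document is touched, but only `limit` of them stay buffered.
function scanAndOrder(docs, sortField, limit) {
  let touched = 0;
  const best = []; // kept in ascending order, never longer than `limit`
  for (const doc of docs) {
    touched += 1;
    // Find the insert position that keeps `best` sorted (ties go after earlier docs).
    let i = best.length;
    while (i > 0 && best[i - 1][sortField] > doc[sortField]) i -= 1;
    best.splice(i, 0, doc);
    if (best.length > limit) best.pop(); // drop the largest, bounding memory by the limit
  }
  return { touched, results: best };
}

// Documents in x-index order, as on the Sort slides.
const scannedDocs = [
  { x: 1, y: 3 }, { x: 2, y: 1 }, { x: 3, y: 3 }, { x: 4, y: 3 }, { x: 5, y: 3 },
];
const ordered = scanAndOrder(scannedDocs, "y", 2);
console.log(ordered.touched, ordered.results);
```

Even with a limit of 2, all five documents are touched, because the smallest y could be anywhere in x order.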

Count

• db.c.count( {x:{$gte:4,$lte:7}} )

• Index {x:1}

Count

1 2 3 4 5 6 7 8 9

4 <= ? <= 7

Count

[Diagram: btree over index keys 1 through 9]

We’re just counting keys here, not loading the full documents.

Count

• With some operators the full document must be checked. Some of these cases:

– $all

– $size

– array match

– Negation: $ne, $nin, $not, etc. With current semantics, all multikey elements must match negation constraints.

• Multikey de-duplication works without loading the full document.

Covered Indexes

• db.c.find( {x:6}, {x:1,_id:0} )

• Index {x:1}

_id would be returned by default, but it isn’t in the index, so we need to exclude it to return only indexed fields.

Covered Indexes

1 2 3 4 5 6 7 8 9

6 ?

{_id:4,x:6}

Covered Indexes

>db.c.find( {x:6}, {x:1,_id:0} ).explain()

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 1,

"nscannedObjects" : 1,

"n" : 1,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : true,

"indexBounds" : {

"x" : [

[

6,

6

]

]

}

}

Covered Indexes

"isMultiKey" : false,

"indexOnly" : true,

Covered Indexes

1 2 3 4 5 6 7 8 9

6 ?

{_id:4,x:[6,7]}

Covered Indexes

>db.c.find( {x:6}, {x:1,_id:0} ).explain()

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 1,

"nscannedObjects" : 1,

"n" : 1,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : true,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

6,

6

]

]

}

}

Covered Indexes

"isMultiKey" : true,

"indexOnly" : false,

Currently we set isMultiKey to true the first time we save a doc where the field is a multikey array. But when all multikey docs are removed we don’t reset isMultiKey. This can be improved.

Update

• db.c.update( {x:{$gte:4,$lte:7}}, {$set:{x:2}} )

• Index {x:1}

Update

1 2 3 4 5 6 7 8 9

4 <= ? <= 7

{_id:4,x:4}

Update

[Diagram sequence: btree over index keys 1 through 9 with document {_id:4,x:4}; after the update the document is {_id:4,x:2} and its index key has moved from 4 to 2]

Update

• We track the set of documents that have been updated in the course of the current operation so they are only updated once.
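Why tracking matters is easiest to see with an update that moves a key forward into the part of the range not yet scanned. The sketch below is a toy model under assumptions: the update on the slide sets x to 2, while the increment used here is a hypothetical update picked to show the effect, and `updateRange` is a made-up helper, not server code.

```javascript
// Toy cursor: visit integer keys lo..hi in order and update whatever is stored under each key.
function updateRange(docs, lo, hi, applyUpdate, trackUpdated) {
  const updatedIds = new Set();
  let applied = 0;
  for (let key = lo; key <= hi; key += 1) {
    // Documents currently under this key; earlier writes may have moved some here.
    const atKey = docs.filter((d) => d.x === key);
    for (const doc of atKey) {
      if (trackUpdated && updatedIds.has(doc._id)) continue; // already updated in this operation
      applyUpdate(doc);
      updatedIds.add(doc._id);
      applied += 1;
    }
  }
  return applied;
}

const makeDocs = () => [4, 5, 6, 7].map((x, i) => ({ _id: i, x }));
const bump = (doc) => { doc.x += 2; }; // hypothetical update that moves keys forward

const withTracking = updateRange(makeDocs(), 4, 7, bump, true);
const withoutTracking = updateRange(makeDocs(), 4, 7, bump, false);
console.log(withTracking, withoutTracking);
```

With tracking, each of the four documents is updated once. Without it, the documents that started at 4 and 5 are met again at 6 and 7 and updated a second time.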

Compound Key Index Bounds

Two Equality Bounds

• db.c.find( {x:5,y:'c'} )

• Index {x:1,y:1}

Two Equality Bounds

1b 3d 4g 5c 5d 5f 6c 7a 9b

5c ?

Two Equality Bounds

>db.c.find( {x:5,y:'c'} ).explain()

{

"cursor" : "BtreeCursor x_1_y_1",

"nscanned" : 1,

"nscannedObjects" : 1,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

5,

5

]

],

"y" : [

[

"c",

"c"

]

]

}

}

Two Equality Bounds

"indexBounds" : {

"x" : [

[

5,

5

]

],

"y" : [

[

"c",

"c"

]

]

}

}

Two Equality Bounds

"nscanned" : 1,

"nscannedObjects" : 1,

"n" : 1,

Two Equality Bounds

[Diagram: btree over compound index keys 1b 3d 4g 5c 5d 5f 6c 7a 9b; lookup "5c ?"]

Equality and Set

• db.c.find( {x:5,y:{$in:['c','f']}} )

• Index {x:1,y:1}

Equality and Set

1b 3d 4g 5c 5d 5f 6c 7a 9b

5c , 5f ?

Equality and Set

>db.c.find( {x:5,y:{$in:['c','f']}} ).explain()

{

"cursor" : "BtreeCursor x_1_y_1 multi",

"nscanned" : 3,

"nscannedObjects" : 2,

"n" : 2,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

5,

5

]

],

"y" : [

[

"c",

"c"

],

[

"f",

"f"

]

]

}

}

Equality and Set

"indexBounds" : {

"x" : [

[

5,

5

]

],

"y" : [

[

"c",

"c"

],

[

"f",

"f"

]

]

}

Equality and Set

"nscanned" : 3,

"nscannedObjects" : 2,

"n" : 2,

Equality and Set

[Diagram: btree over compound index keys 1b 3d 4g 5c 5d 5f 6c 7a 9b]

Equality and Range

• db.c.find( {x:5,y:{$gte:'d'}} )

• Index {x:1,y:1}

Equality and Range

1b 3d 4g 5c 5d 5f 6c 7a 9b

5d <= ? <= 5 max string

Equality and Range

>db.c.find( {x:5,y:{$gte:'d'}} ).explain()

{

"cursor" : "BtreeCursor x_1_y_1",

"nscanned" : 2,

"nscannedObjects" : 2,

"n" : 2,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

5,

5

]

],

"y" : [

[

"d",

{

}

]

]

}

}

Equality and Range

"indexBounds" : {

"x" : [

[

5,

5

]

],

"y" : [

[

"d",

{

}

]

]

}

Equality and Range

"nscanned" : 2,

"nscannedObjects" : 2,

"n" : 2,

Equality and Range

[Diagram: btree over compound index keys 1b 3d 4g 5c 5d 5f 6c 7a 9b]

Two Set Bounds

• db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} )

• Index {x:1,y:1}

Two Set Bounds

1b 3d 4g 5c 5d 5f 6c 7a 9f

5c , 5f , 9c , 9f ?

Two Set Bounds

>db.c.find( {x:{$in:[5,9]},y:{$in:['c','f']}} ).explain()

{

"cursor" : "BtreeCursor x_1_y_1 multi",

"nscanned" : 5,

"nscannedObjects" : 3,

"n" : 3,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

5,

5

],

[

9,

9

]

],

"y" : [

[

"c",

"c"

],

[

"f",

"f"

]

]

}

}

Two Set Bounds

"indexBounds" : {

"x" : [

[

5,

5

],

[

9,

9

]

],

"y" : [

[

"c",

"c"

],

[

"f",

"f"

]

]

}

Two Set Bounds

"nscanned" : 5,

"nscannedObjects" : 3,

"n" : 3,

Two Set Bounds

[Diagram: btree over compound index keys 1b 3d 4g 5c 5d 5f 6c 7a 9f]

Set and Range

• db.c.find( {x:{$in:[5,9]},y:{$lte:'d'}} )

• Index {x:1,y:1}

Set and Range

1b 3d 4g 5c 5d 5f 6c 9a 9f

5 min string <= ? <= 5d , 9 min string <= ? <= 9d

Set and Range

>db.c.find( {x:{$in:[5,9]},y:{$lte:'d'}} ).explain()

{

"cursor" : "BtreeCursor x_1_y_1 multi",

"nscanned" : 5,

"nscannedObjects" : 3,

"n" : 3,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

5,

5

],

[

9,

9

]

],

"y" : [

[

"",

"d"

]

]

}

}

Set and Range

"x" : [

[

5,

5

],

[

9,

9

]

],

"y" : [

[

"",

"d"

]

]

}

Set and Range

"nscanned" : 5,

"nscannedObjects" : 3,

"n" : 3,

Range and Equality

• db.c.find( {x:{$gte:4},y:'c'} )

• Index {x:1,y:1}

Range and Equality

1b 3d 4g 5c 5d 6a 7e 8c 9f

? >= 4 and c

Range and Equality

>db.c.find( {x:{$gte:4},y:'c'} ).explain()

{

"cursor" : "BtreeCursor x_1_y_1",

"nscanned" : 7,

"nscannedObjects" : 2,

"n" : 2,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

4,

1.7976931348623157e+308

]

],

"y" : [

[

"c",

"c"

]

]

}

}

Range and Equality

"indexBounds" : {

"x" : [

[

4,

1.7976931348623157e+308

]

],

"y" : [

[

"c",

"c"

]

]

}

Range and Equality

"nscanned" : 7,

"nscannedObjects" : 2,

"n" : 2,

High nscanned because every distinct value of x must be checked.

Range and Equality

[Diagram: btree over compound index keys 1b 3d 4g 5c 5d 6a 7e 8c 9f]

Every distinct value of x must be checked.

Range and Set

• db.c.find( {x:{$gte:4},y:{$in:['c','a']}} )

• Index {x:1,y:1}

Range and Set

1b 3d 4g 5c 5d 6a 7e 8c 9f

? >= 4 and c , a

Range and Set

>db.c.find( {x:{$gte:4},y:{$in:['c','a']}} ).explain()

{

"cursor" : "BtreeCursor x_1_y_1 multi",

"nscanned" : 7,

"nscannedObjects" : 3,

"n" : 3,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

4,

1.7976931348623157e+308

]

],

"y" : [

[

"a",

"a"

],

[

"c",

"c"

]

]

}

}

Range and Set

"indexBounds" : {

"x" : [

[

4,

1.7976931348623157e+308

]

],

"y" : [

[

"a",

"a"

],

[

"c",

"c"

]

]

}

Range and Set

"nscanned" : 7,

"nscannedObjects" : 3,

"n" : 3,

Range and Set

[Diagram: btree over compound index keys 1b 3d 4g 5c 5d 6a 7e 8c 9f]

Every distinct value of x must be checked for y values 'a' and 'c'.

Two Ranges (2D Box)

• db.c.find( {x:{$gte:3,$lte:7},y:{$gte:'c',$lte:'f'}} )

• Index {x:1,y:1}

Two Ranges (2D Box)

[Diagram: box in the x-y plane spanning x from 3 to 7 and y from 'c' to 'f', labeled {x:{$gte:3,$lte:7},y:{$gte:'c',$lte:'f'}}]

Two Ranges (2D Box)

1b 3d 4g 5c 5d 6a 7e 7g 9f

3 <= ? <= 7 & c <= ? <= f

Two Ranges (2D Box)

>db.c.find( {x:{$gte:3,$lte:7},y:{$gte:'c',$lte:'f'}} ).explain()

{

"cursor" : "BtreeCursor x_1_y_1",

"nscanned" : 6,

"nscannedObjects" : 4,

"n" : 4,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

3,

7

]

],

"y" : [

[

"c",

"f"

]

]

}

}

Two Ranges (2D Box)

"indexBounds" : {

"x" : [

[

3,

7

]

],

"y" : [

[

"c",

"f"

]

]

}

Two Ranges (2D Box)

"nscanned" : 6,

"nscannedObjects" : 4,

"n" : 4,

Two Ranges (2D Box)

[Diagram: btree over compound index keys 1b 3d 4g 5c 5d 6a 7e 7g 9f]

Two Ranges (2D Box)

3 <= ? <= 7: for every distinct value of x in this range

c <= ? <= f: scan for every value of y in this range
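The nested structure described here can be written as two loops over a toy compound index. This is only a sketch: `boxScan` is a hypothetical helper, the keys are the ones from the slide, and it does not model how the real cursor skips between x values (the explain output reports nscanned 6, which this toy does not try to reproduce).

```javascript
// Compound index keys from the slide, as { x, y } pairs in index order.
const compoundIndex = ["1b", "3d", "4g", "5c", "5d", "6a", "7e", "7g", "9f"]
  .map((k) => ({ x: Number(k[0]), y: k[1] }));

function boxScan(index, xLo, xHi, yLo, yHi) {
  const matches = [];
  // Outer loop: every distinct value of x inside the x range.
  const distinctX = [...new Set(index.map((e) => e.x))]
    .filter((x) => x >= xLo && x <= xHi);
  for (const x of distinctX) {
    // Inner loop: the keys for this x whose y falls inside the y range.
    for (const e of index) {
      if (e.x === x && e.y >= yLo && e.y <= yHi) matches.push(e);
    }
  }
  return matches;
}

const boxMatches = boxScan(compoundIndex, 3, 7, "c", "f");
console.log(boxMatches.map((e) => `${e.x}${e.y}`).join(" "));
```

The four keys found agree with n : 4 in the explain output above.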

$or

Disjoint $or Criteria

• db.c.find( {$or:[{x:5},{y:'d'}]} )

• Indexes {x:1}, {y:1}

Disjoint $or Criteria

Documents: 1b 3d 4g 5c 5d 6a 7e 7g 9f

Index {x:1}: 5 ?

Index {y:1}: d ?

Disjoint $or Criteria

>db.c.find( {$or:[{x:5},{y:'d'}]} ).explain()

{

"clauses" : [

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 2,

"nscannedObjects" : 2,

"n" : 2,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

5,

5

]

]

}

},

{

"cursor" : "BtreeCursor y_1",

"nscanned" : 2,

"nscannedObjects" : 2,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"y" : [

[

"d",

"d"

]

]

}

}

],

"nscanned" : 4,

"nscannedObjects" : 4,

"n" : 3,

"millis" : 1

}

Disjoint $or Criteria

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 2,

"nscannedObjects" : 2,

"n" : 2,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

5,

5

]

]

}

},

Disjoint $or Criteria

{

"cursor" : "BtreeCursor y_1",

"nscanned" : 2,

"nscannedObjects" : 2,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"y" : [

[

"d",

"d"

]

]

}

}

Only return one document matching this clause.

Disjoint $or Criteria

"nscanned" : 4,

"nscannedObjects" : 4,

"n" : 3,

"millis" : 1

Disjoint $or Criteria

[Diagram: x index scanned for "5 ?" over documents 1b 3d 4g 5c 5d 6a 7e 7g 9f]

Disjoint $or Criteria

[Diagram: y index scanned for "d ?" over documents 1b 3d 4g 5c 5d 6a 7e 7g 9f]

We have already scanned the x index for x:5, so this document was returned already. We don’t return it again.
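The clause-by-clause behavior can be imitated with a small model. This is a sketch under assumptions: `runOr` is a hypothetical helper, each clause is a single equality test, and "already returned" is decided by re-checking earlier clauses against the document, which may differ from what the server actually does.

```javascript
// Documents from the slide: the digit is x, the letter is y.
const orDocs = ["1b", "3d", "4g", "5c", "5d", "6a", "7e", "7g", "9f"]
  .map((k, i) => ({ _id: i, x: Number(k[0]), y: k[1] }));

function runOr(docs, clauses) {
  const results = [];
  const perClause = [];
  clauses.forEach((clause, i) => {
    let nscanned = 0;
    let n = 0;
    for (const doc of docs) {
      if (doc[clause.field] !== clause.value) continue; // stands in for an index lookup
      nscanned += 1;
      // Skip documents that an earlier clause has already produced.
      const alreadyReturned = clauses
        .slice(0, i)
        .some((c) => doc[c.field] === c.value);
      if (alreadyReturned) continue;
      n += 1;
      results.push(doc);
    }
    perClause.push({ nscanned, n });
  });
  return { perClause, n: results.length };
}

const orResult = runOr(orDocs, [
  { field: "x", value: 5 },
  { field: "y", value: "d" },
]);
console.log(JSON.stringify(orResult));
```

The second clause scans 3d and 5d but returns only 3d, which lines up with the per-clause counts in the explain output.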

Unindexed $or Clause

• db.c.find( {$or:[{x:5},{y:'d'}]} )

• Index {x:1} (no index on y)

Unindexed $or Clause

>db.c.find( {$or:[{x:5},{y:'d'}]} ).explain()

{

"cursor" : "BasicCursor",

"nscanned" : 9,

"nscannedObjects" : 9,

"n" : 3,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

}

}

Since y is not indexed, we must do a full collection scan to match y:'d'. Since a full scan is required, we don’t use the index on x to match x:5.

Eliminated $or Clause

• db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:5}]} )

• Index {x:1}

Eliminated $or Clause

1 2 3 4 5 6 7 8 9

2 < ? < 6

1 2 3 4 5 6 7 8 9

5 ?

Eliminated $or Clause

>db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:5}]} ).explain()

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 3,

"nscannedObjects" : 3,

"n" : 3,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

2,

6

]

]

}

}

The index range of the second clause is included in the index range of the first clause, so we use the first index range only.

Eliminated $or Clause with Differing Unindexed Criteria

• db.c.find( {$or:[{x:{$gt:2,$lt:6},y:'c'},{x:5,y:'d'}]} )

• Index {x:1}

Eliminated $or Clause with Differing Unindexed Criteria

1b 3d 4g 5c 5d 6a 7e 7g 9f

2 < ? < 6 and c

5 and d

Eliminated $or Clause with Differing Unindexed Criteria

>db.c.find( {$or:[{x:{$gt:2,$lt:6},y:'c'},{x:5,y:'d'}]} ).explain()

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 4,

"nscannedObjects" : 4,

"n" : 2,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

2,

6

]

]

}

}

Eliminated $or Clause with Differing Unindexed Criteria

1b 3d 4g 5c 5d 6a 7e 7g 9f

2 < ? < 6 and c , d

The index range for the first clause contains the index range for the second clause, so all matching is done using the index range for the first clause.

Overlapping $or Clauses

• db.c.find( {$or:[{x:{$gt:2,$lt:6}},{x:{$gt:4,$lt:7}}]} )

• Index {x:1}

Overlapping $or Clauses

1 2 3 4 5 6 7 8 9

2 < ? < 6

1 2 3 4 5 6 7 8 9

4 < ? < 7

Overlapping $or Clauses

>db.d.find( {$or:[{x:{$gt:2,$lt:6}},{x:{$gt:4,$lt:7}}]} ).explain()

{

"clauses" : [

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 3,

"nscannedObjects" : 3,

"n" : 3,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

2,

6

]

]

}

},

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 1,

"nscannedObjects" : 1,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

6,

7

]

]

}

}

],

"nscanned" : 4,

"nscannedObjects" : 4,

"n" : 4,

"millis" : 1

}

>

Overlapping $or Clauses

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 3,

"nscannedObjects" : 3,

"n" : 3,

"millis" : 0,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

2,

6

]

]

}

},

Overlapping $or Clauses

{

"cursor" : "BtreeCursor x_1",

"nscanned" : 1,

"nscannedObjects" : 1,

"n" : 1,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

6,

7

]

]

}

}

The index range scanned for the previous clause is removed.

Overlapping $or Clauses

1 2 3 4 5 6 7 8 9

2 < ? < 6

1 2 3 4 5 6 7 8 9

6 <= ? < 7
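The one-dimensional case amounts to subtracting an earlier interval from the current one. Below is a minimal sketch, assuming a hypothetical interval shape `{ lo, loInclusive, hi, hiInclusive }` and handling only the overlap shape shown on this slide (the earlier range starts before the current one and ends inside it).

```javascript
// Remove the part of `current` that an earlier clause has already scanned.
function subtractEarlier(current, earlier) {
  const startsBefore = earlier.lo < current.lo;
  const endsInside = earlier.hi > current.lo && earlier.hi < current.hi;
  if (startsBefore && endsInside) {
    return {
      lo: earlier.hi,
      loInclusive: !earlier.hiInclusive, // an exclusive end leaves its endpoint unscanned
      hi: current.hi,
      hiInclusive: current.hiInclusive,
    };
  }
  return current; // other overlap shapes are not handled in this sketch
}

const clause1 = { lo: 2, loInclusive: false, hi: 6, hiInclusive: false }; // 2 < x < 6
const clause2 = { lo: 4, loInclusive: false, hi: 7, hiInclusive: false }; // 4 < x < 7
const remainder = subtractEarlier(clause2, clause1);
console.log(remainder);
```

The remainder is 6 <= x < 7: the first clause excluded 6, so the second clause still has to scan it.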

2D Overlapping $or Clauses

• db.c.find( {$or:[{x:{$gt:2,$lt:6},y:{$gt:'b',$lt:'f'}},{x:{$gt:4,$lt:7},y:{$gt:'b',$lt:'e'}}]} )

• Index {x:1,y:1}

2D Overlapping $or Clauses

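[Diagram: two overlapping boxes in the x-y plane; Clause 1 spans x from 2 to 6 and y from 'b' to 'f', Clause 2 spans x from 4 to 7 and y from 'b' to 'e']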

2D Overlapping $or Clauses

>db.c.find( {$or:[{x:{$gt:2,$lt:6},y:{$gt:'b',$lt:'f'}},{x:{$gt:4,$lt:7},y:{$gt:'b',$lt:'e'}}]} ).explain()

{

"clauses" : [

{

"cursor" : "BtreeCursor x_1_y_1",

"nscanned" : 4,

"nscannedObjects" : 3,

"n" : 3,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

2,

6

]

],

"y" : [

[

"b",

"f"

]

]

}

},

{

"cursor" : "BtreeCursor x_1_y_1",

"nscanned" : 0,

"nscannedObjects" : 0,

"n" : 0,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

6,

7

]

],

"y" : [

[

"b",

"e"

]

]

}

}

],

"nscanned" : 4,

"nscannedObjects" : 3,

"n" : 3,

"millis" : 1

2D Overlapping $or Clauses

{

"cursor" : "BtreeCursor x_1_y_1",

"nscanned" : 4,

"nscannedObjects" : 3,

"n" : 3,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

2,

6

]

],

"y" : [

[

"b",

"f"

]

]

}

2D Overlapping $or Clauses

{

"cursor" : "BtreeCursor x_1_y_1",

"nscanned" : 0,

"nscannedObjects" : 0,

"n" : 0,

"millis" : 1,

"nYields" : 0,

"nChunkSkips" : 0,

"isMultiKey" : false,

"indexOnly" : false,

"indexBounds" : {

"x" : [

[

6,

7

]

],

"y" : [

[

"b",

"e"

]

]

}

}

],

The index range scanned for the previous clause is removed.

2D Overlapping $or Clauses

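[Diagram: the same two boxes in the x-y plane; Clause 1 spans x from 2 to 6 and y from 'b' to 'f', Clause 2 spans x from 4 to 7 and y from 'b' to 'e']

We only have to scan the remainder of Clause 2 here.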

Overlapping $or Clauses

• Rule of thumb for n dimensions: We subtract earlier clause boxes from current box when the result is a/some box(es).

[Diagram: two examples marked ✓ in which subtracting box 1 from box 2 leaves a box]

[Diagram: one example marked ✗ in which subtracting box 1 from box 2 does not leave a box]

$or TODO

• Use indexes on $or fields to satisfy a sort specification SERVER-1205

• Use full query optimizer to select $or clause indexes in getMore SERVER-1215

• Improve index range elimination (handling some cases where remainder is not a box)

Automatic Index Selection

(Query Optimizer)

Optimal Index

• find( {x:5} )

– Index {x:1}

– Index {x:1,y:1}

• find( {x:5} ).sort( {y:1} )

– Index {x:1,y:1}

• find( {} ).sort( {x:1} )

– Index {x:1}

• find( {x:{$gt:1,$lt:7}} ).sort( {x:1} )

– Index {x:1}

Optimal Index

• Rule of Thumb

– No scanAndOrder

– All fields with index useful constraints are indexed

– If there is a range or sort it is the last field of the index used to resolve the query

• If multiple optimal indexes exist, one is chosen arbitrarily.

Optimal Index

• These same criteria are useful when you are designing your indexes.

Multiple Candidate Indexes

• find( {x:4,y:'a'} )

– Index {x:1} or {y:1}?

• find( {x:4} ).sort( {y:1} )

– Index {x:1} or {y:1}?

– Note: {x:1,y:1} is optimal

• find( {x:{$gt:2,$lt:7},y:{$gt:'a',$lt:'f'}} )

– Index {x:1,y:1} or {y:1,x:1}?

Multiple Candidate Indexes

• The only index selection criterion is nscanned

• find( {x:4,y:'a'} )

– Index {x:1} or {y:1}?

– If fewer documents match {y:'a'} than {x:4}, then nscanned for {y:1} will be lower, so we pick {y:1}.

• find( {x:{$gt:2,$lt:7},y:{$gt:'b',$lt:'f'}} )

– Index {x:1,y:1} or {y:1,x:1}?

– If there are fewer distinct values of x in 2 < x < 7 than distinct values of y in 'b' < y < 'f', then {x:1,y:1} is chosen (rule of thumb).

Multiple Candidate Indexes

• The only index selection criterion is nscanned

• Pretty good, but doesn’t cover every case, e.g.

– Cost of scanAndOrder vs ordered index

– Cost of loading full document vs just index key

– Cost of scanning adjacent btree keys vs non adjacent keys/documents

Competing Indexes

• At most one query plan per index

• Run in interleaved fashion

• Plans kept in a priority queue ordered by nscanned. We always continue progress on plan with lowest nscanned.

Competing Indexes

• Run until one plan returns all results or enough results to satisfy the initial query request (based on soft limit spec / data size requirement for initial query).

• We only allow plans to compete in initial query. In getMore, we continue reading from the index cursor established by the initial query.
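The interleaving can be sketched as a loop that always advances whichever plan has the lowest nscanned. This is a toy model under assumptions: `makePlan`, `step`, and `race` are hypothetical names, a linear search stands in for the priority queue, and each plan is just a list of candidate documents its index would visit.

```javascript
function makePlan(name, candidates, matcher) {
  return { name, candidates, pos: 0, nscanned: 0, results: [], matcher };
}

// Advance a plan by one index key.
function step(plan) {
  const doc = plan.candidates[plan.pos];
  plan.pos += 1;
  plan.nscanned += 1;
  if (plan.matcher(doc)) plan.results.push(doc);
}

// Run plans interleaved until one is exhausted or has `wanted` results.
function race(plans, wanted) {
  for (;;) {
    // Stand-in for the priority queue: pick the plan with the lowest nscanned.
    const plan = plans.reduce((a, b) => (b.nscanned < a.nscanned ? b : a));
    if (plan.pos < plan.candidates.length) step(plan);
    if (plan.pos >= plan.candidates.length || plan.results.length >= wanted) {
      return plan;
    }
  }
}

// find( {x:4,y:'a'} ) with candidate indexes {x:1} and {y:1}.
const raceDocs = [
  { x: 4, y: "a" }, { x: 4, y: "b" }, { x: 4, y: "c" },
  { x: 4, y: "d" }, { x: 4, y: "e" }, { x: 7, y: "a" },
];
const query = (d) => d.x === 4 && d.y === "a";
const planX = makePlan("x_1", raceDocs.filter((d) => d.x === 4), query);
const planY = makePlan("y_1", raceDocs.filter((d) => d.y === "a"), query);
const winner = race([planX, planY], 101); // 101 is an arbitrary result target for this demo
console.log(winner.name, winner.nscanned);
```

Fewer documents have y equal to 'a' than x equal to 4, so the plan on {y:1} finishes first after scanning two keys.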

“Learning” a Query Plan

• When an index is chosen for a query the query’s “pattern” and nscanned are recorded

– find( {x:3,y:'c'} )

• {Pattern: {x:'equality', y:'equality'}, Index: {x:1}, nscanned: 50}

– find( {x:{$gt:5},y:{$lt:'z'}} )

• {Pattern: {x:'gt bound', y:'lt bound'}, Index: {y:1}, nscanned: 500}

“Learning” a Query Plan

• When a new query matches the same pattern, the same query plan is used

– find( {x:5,y:'z'} )

• Use index {x:1}

– find( {x:{$gt:20},y:{$lt:'b'}} )

• Use index {y:1}

“Un-Learning” a Query Plan

• 100 writes to the collection

• Indexes added / removed

Bad Plan Insurance

• If nscanned for a new query using a recorded plan is much worse than the recorded nscanned for an earlier query with the same pattern, we start interleaving other plans with the current plan.

• Currently “much worse” means 10x
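Learning, un-learning, and bad plan insurance can be put together in one small sketch. Everything below is illustrative: `queryPattern` and `PlanCache` are hypothetical names, the label "gt and lt bound" is invented for the two-sided case, and clearing every recorded plan after 100 writes is an assumption about how un-learning is applied.

```javascript
// Reduce a query to its "pattern": which kind of constraint each field has.
function queryPattern(query) {
  const pattern = {};
  for (const field of Object.keys(query).sort()) {
    const cond = query[field];
    if (cond === null || typeof cond !== "object") {
      pattern[field] = "equality";
      continue;
    }
    const ops = Object.keys(cond);
    const lower = ops.some((o) => o === "$gt" || o === "$gte");
    const upper = ops.some((o) => o === "$lt" || o === "$lte");
    pattern[field] = lower && upper ? "gt and lt bound"
      : lower ? "gt bound"
      : upper ? "lt bound"
      : "other";
  }
  return JSON.stringify(pattern);
}

class PlanCache {
  constructor() {
    this.plans = new Map();
    this.writes = 0;
  }
  record(query, index, nscanned) {
    this.plans.set(queryPattern(query), { index, nscanned });
  }
  lookup(query) {
    return this.plans.get(queryPattern(query));
  }
  // Un-learning: forget recorded plans after 100 writes to the collection.
  noteWrite() {
    this.writes += 1;
    if (this.writes >= 100) {
      this.plans.clear();
      this.writes = 0;
    }
  }
  // Bad plan insurance: "much worse" means more than 10x the recorded nscanned.
  isMuchWorse(query, nscanned) {
    const plan = this.lookup(query);
    return plan !== undefined && nscanned > 10 * plan.nscanned;
  }
}

const cache = new PlanCache();
cache.record({ x: 3, y: "c" }, "x_1", 50);
const reused = cache.lookup({ x: 5, y: "z" }); // same pattern, different values
const insurance = cache.isMuchWorse({ x: 5, y: "z" }, 501);
for (let i = 0; i < 100; i += 1) cache.noteWrite();
const afterWrites = cache.lookup({ x: 5, y: "z" });
console.log(reused, insurance, afterWrites);
```

The second query reuses the plan recorded for the first because only the values differ, not the pattern.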

Query Planner

• Ad hoc heuristics in some cases

• Seem to work decently in practice

Feedback

• Large and small scale optimizer features are generally prioritized based on user input.

• Please use jira to request new features and vote on existing feature requests.

Thanks!

Feature Requests

jira.mongodb.org

Support

groups.google.com/group/mongodb-user

Next up:

Sharding Details with Eliot