Ghislain Fourny
Big Data Fall 2019
12. Document stores
pinkyone / 123RF Stock Photo
Ilya Akinshin / 123RF Stock Photo
Relational Model: Tables
A B C D
a 1 alpha foo
a 2 alpha bar
a 3 beta foo
"Everything is a table"
Relational integrity
Atomic integrity
Relational Model: Schemas
A B C D
string integer char(3) date
Atomic types assigned to each column
Relational integrity
Atomic integrity
Domain integrity
Relational Syntax: CSV
ID,Last name,First name,Theory
1,Einstein,Albert,"General, Special Relativity"
2,Gödel,Kurt,"""Incompleteness"" Theorem"
Physical view
Syntax
ID Last name First name Theory
1 Einstein Albert General, Special Relativity
2 Gödel Kurt "Incompleteness" Theorem
Logical view
Data Model
Relational Language: SQL
SELECT century
FROM persons
GROUP BY century
HAVING COUNT(*) > 2
name middle_initial last_name century captain
varchar(30) char(1) text integer boolean
James T Kirk 23 TRUE
Beverly C Crusher 24 FALSE
Jean-Luc NULL Picard 24 TRUE
Kathryn NULL Janeway 24 TRUE
persons
century
integer
24
The stack
Storage
Encoding
Syntax
Data models
Validation
Processing
Indexing
Data stores
User interfaces
Querying
We already rebuilt this stack with tables
HBase
SQL
CSV
DataFrames
HDFS
UTF-8
Spark
Now, can we rebuild this all with XML/JSON?
XML, JSON
Trees
HDFS
UTF-8
Spark
XML Schema
?
?
?
Flat trees
{
"foo": 1,
"bar": "foo",
"foobar" : true,
"a" : "bar",
"b" : 3.14
}
foo bar foobar a b
1 foo true bar 3.14
Flat trees
<row>
<foo>1</foo>
<bar>foo</bar>
<foobar>true</foobar>
<a>foo</a>
<b>3.14</b>
</row>
foo bar foobar a b
1 foo true foo 3.14
Collections of flat trees
<row>
<foo>1</foo>
<bar>foo</bar>
<foobar>true</foobar>
<a>foo</a>
<b>3.14</b>
</row>
foo bar foobar a b
1 foo true foo 3.14
2 bar false bar 4.2
<row>
<foo>2</foo>
<bar>bar</bar>
<foobar>false</foobar>
<a>bar</a>
<b>4.2</b>
</row>
Schemas: from SQL to NoSQL
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
<xs:element name="row">
<xs:complexType>
<xs:sequence>
<xs:element name="foo" type="xs:integer"/>
<xs:element name="bar" type="xs:string"/>
<xs:element name="foobar" type="xs:boolean"/>
<xs:element name="a" type="xs:string"/>
<xs:element name="b" type="xs:decimal"/>
</xs:sequence>
</xs:complexType>
</xs:element>
</xs:schema>
foo bar foobar a b
integer string boolean string decimal
NoSQL
But with JSON and XML, we can have
Nestedness
Heterogeneity
and give up:
Atomic integrity (First normal form)
Relational integrity
Domain integrity
Nested arrays
{
  "category": 1,
  "job": "mathematician",
  "name": [
    { "last": "Ramanujan", "first": "Srinivasa" },
    { "last": "Gödel", "first": "Kurt" }
  ]
}
{
  "category": 2,
  "job": "physicist",
  "name": [
    { "last": "Einstein", "first": "Albert" }
  ]
}
...
category job
1 mathematician
2 physicist
category name.last name.first
1 Ramanujan Srinivasa
1 Gödel Kurt
2 Einstein Albert
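The second table can be produced mechanically by "unnesting" the name array: one output row per array element, with the parent's category repeated. A minimal JavaScript sketch (JavaScript because the MongoDB shell used later in this lecture is JavaScript); `unnest` and its parameters are hypothetical names, not a library API:

```javascript
// Flatten one level of nesting: emit one row per element of the array stored
// under `arrayField`, repeating the listed top-level fields on every row and
// prefixing the element's own fields with "arrayField.".
function unnest(docs, keepFields, arrayField) {
  const rows = [];
  for (const doc of docs) {
    // A missing array is treated like an empty one: no rows for that document.
    for (const item of doc[arrayField] ?? []) {
      const row = {};
      for (const field of keepFields) row[field] = doc[field];
      for (const [key, value] of Object.entries(item)) {
        row[`${arrayField}.${key}`] = value;
      }
      rows.push(row);
    }
  }
  return rows;
}

const docs = [
  { category: 1, job: "mathematician",
    name: [ { last: "Ramanujan", first: "Srinivasa" },
            { last: "Gödel", first: "Kurt" } ] },
  { category: 2, job: "physicist",
    name: [ { last: "Einstein", first: "Albert" } ] },
];

console.table(unnest(docs, ["category"], "name"));
```

As in an inner join, a document whose array is empty or missing produces no row at all; keeping it would require emitting a row padded with NULLs instead.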
Heterogeneity
{
  "id": 1,
  "profession": "physicist",
  "last name": "Einstein"
}
{
"id": 2,
"profession": "engineer"
}
{
"id": 3,
"first name": "Kurt"
}
id | profession | last name | first name
1 | physicist | Einstein | NULL
2 | engineer | NULL | NULL
3 | NULL | NULL | Kurt
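Filling the gaps with NULL, as in the table above, only requires the union of all keys. A minimal sketch; `toTable` is a hypothetical helper, and JavaScript `null` stands in for SQL NULL:

```javascript
// Turn heterogeneous flat objects into a table: the columns are the union of
// all keys (in first-seen order), and absent fields become null.
function toTable(objects) {
  const columns = [];
  for (const obj of objects) {
    for (const key of Object.keys(obj)) {
      if (!columns.includes(key)) columns.push(key);
    }
  }
  const rows = objects.map(obj => columns.map(c => (c in obj ? obj[c] : null)));
  return { columns, rows };
}

const objects = [
  { "id": 1, "profession": "physicist", "last name": "Einstein" },
  { "id": 2, "profession": "engineer" },
  { "id": 3, "first name": "Kurt" },
];

const { columns, rows } = toTable(objects);
console.log(columns.join(" | "));
for (const row of rows) console.log(row.map(v => v ?? "NULL").join(" | "));
```

Two caveats of this approach: the column list is only known after every object has been read, and an explicit `null` in the data becomes indistinguishable from a missing field.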
Documents
{
"foo": 1,
"bar": "foo",
"name": [ {
"last": "Einstein",
"first": "Albert"
},
{
"last": "Gödel",
"first": "Kurt"
} ]
}
<row>
<foo>1</foo>
<bar>foo</bar>
<names>
<name>
<last>Einstein</last>
<first>Albert</first>
</name>
<name>
<last>Gödel</last>
<first>Kurt</first>
</name>
</names>
</row>
XML
JSON
Collection of trees
{
"foo": 1,
"bar": [ "foo", "bar" ],
"foobar" : true,
"a" : { "foo" : null, "b" : [ 3, 2 ] },
"b" : 3.14
}
{
"foo": 1,
"bar": "foo"
}
{
"foo": 2,
"bar": [ "foo", "foobar" ],
"foobar" : false,
"a" : { "foo" : "foo", "b" : [ 3, 2 ] },
"b" : 3.1415
}
Typically small documents
Typically large collections (up to thousands, millions, billions of objects)
NoSQL: validation after the data was populated
<row>
<foo>1</foo>
<bar>foo</bar>
<foobar>true</foobar>
<a>foo</a>
<b>3.14</b>
</row>
<row>
<foo>a</foo>
<bar>bar</bar>
<foobar>3</foobar>
<a>null</a>
<b>foo</b>
</row>
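Validating rows that are already stored, against the schema from the "Schemas" slide, can be sketched as follows. The regular expressions are simplified approximations of the XML Schema lexical rules for these types (whitespace handling and the element order imposed by xs:sequence are ignored), each row maps element names to their text content, and `validate` is a hypothetical helper:

```javascript
// Simplified lexical checks for the XML Schema built-in types used in the schema.
const lexical = {
  "xs:integer": /^[+-]?[0-9]+$/,
  "xs:decimal": /^[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)$/,
  "xs:boolean": /^(true|false|1|0)$/,
  "xs:string": /^[\s\S]*$/,
};

const schema = { foo: "xs:integer", bar: "xs:string", foobar: "xs:boolean",
                 a: "xs:string", b: "xs:decimal" };

// Check every stored row and report all violations, instead of rejecting
// data at insertion time.
function validate(rows, schema) {
  const errors = [];
  rows.forEach((row, i) => {
    for (const [field, type] of Object.entries(schema)) {
      if (!(field in row)) {
        errors.push(`row ${i}: missing ${field}`);
      } else if (!lexical[type].test(row[field])) {
        errors.push(`row ${i}: ${field}="${row[field]}" is not ${type}`);
      }
    }
  });
  return errors;
}

const stored = [
  { foo: "1", bar: "foo", foobar: "true", a: "foo", b: "3.14" },
  { foo: "a", bar: "bar", foobar: "3", a: "null", b: "foo" },
];
console.log(validate(stored, schema));
```

Note that `<a>null</a>` passes: in XML, "null" is just a four-character string.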
ASCII
ASCII Code Chart, scanned from the material delivered with TermiNet 300 impact type printer with Keyboard, February 1972, General Electric Data communication Product Dept., Waynesboro VA. http://archive.computerhistory.org/resources/text/GE/GE.TermiNet300.1971.102646207.pdf
Read: selecting all documents
db.scientists.find({})
db.scientists.find()
SELECT * FROM scientists
SQL CheatSheet
{ "First" : "Albert", "Last" : "Einstein", "Theory": "Relativity" }
{ "First" : "Isaac", "Last" : "Newton", "Theory": "Gravitation" }
{ "First" : "Kurt", "Last" : "Gödel", "Theory": "Relativity" }
{ "First" : "Hermann", "Last" : "Minkowski", "Theory": "Relativity" }
Read
db.scientists.find(
{ "Theory" : "Relativity" }
)
SELECT *
FROM scientists
WHERE Theory = 'Relativity'
SQL CheatSheet
{ "First" : "Albert", "Last" : "Einstein", "Theory": "Relativity" }
{ "First" : "Isaac", "Last" : "Newton", "Theory": "Gravitation" }
{ "First" : "Kurt", "Last" : "Gödel", "Theory": "Incompleteness" }
{ "First" : "Hermann", "Last" : "Minkowski", "Theory": "Relativity" }
Read: projection
db.scientists.find(
{ "Theory" : "Relativity" },
{ "First" : 1, "Last" : 1 }
)
First argument: WHERE; second argument: SELECT
SELECT First, Last
FROM scientists
WHERE Theory = 'Relativity'
SQL CheatSheet
{ "First" : "Albert", "Last" : "Einstein", "Theory": "Relativity" }
{ "First" : "Isaac", "Last" : "Newton", "Theory": "Gravitation" }
{ "First" : "Kurt", "Last" : "Gödel", "Theory": "Incompleteness" }
{ "First" : "Hermann", "Last" : "Minkowski", "Theory": "Relativity" }
Result:
{ "First" : "Albert", "Last" : "Einstein" }
{ "First" : "Hermann", "Last" : "Minkowski" }
Read: AND
db.scientists.find(
{
"Theory" : "Relativity",
"Last" : "Einstein"
}
)
SELECT *
FROM scientists
WHERE Theory = 'Relativity'
AND Last = 'Einstein'
SQL CheatSheet
{ "First" : "Albert", "Last" : "Einstein", "Theory": "Relativity" }
{ "First" : "Isaac", "Last" : "Newton", "Theory": "Gravitation" }
{ "First" : "Kurt", "Last" : "Gödel", "Theory": "Relativity" }
{ "First" : "Hermann", "Last" : "Minkowski", "Theory": "Relativity" }
Read: OR
db.scientists.find({
  "$or" : [ { "Last" : "Newton" }, { "Last" : "Einstein" } ]
})
SELECT *
FROM scientists
WHERE Last = 'Newton'
OR Last = 'Einstein'
SQL CheatSheet
{ "First" : "Albert", "Last" : "Einstein", "Theory": "Relativity" }
{ "First" : "Isaac", "Last" : "Newton", "Theory": "Gravitation" }
{ "First" : "Kurt", "Last" : "Gödel", "Theory": "Relativity" }
{ "First" : "Hermann", "Last" : "Minkowski", "Theory": "Relativity" }
Read: Comparison
db.scientists.find(
  { "Publications" : { $gte : 100 } }
)
SELECT *
FROM scientists
WHERE Publications >= 100
SQL CheatSheet
{ "First" : "Albert", "Last" : "Einstein", "Publications": 500 }
{ "First" : "Isaac", "Last" : "Newton", "Publications": 30 }
{ "First" : "Kurt", "Last" : "Gödel", "Publications": 400 }
{ "First" : "Hermann", "Last" : "Minkowski", "Publications": 50 }
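The correspondence between the filter argument and WHERE, and between the projection argument and SELECT, can be made concrete with a toy in-memory evaluator. This is a sketch under simplifying assumptions (flat documents; only equality, `$or` and `$gte`), not how MongoDB executes queries; `matches` and `find` are hypothetical helpers, and the data merges the Read slides' documents with the Publications counts from the comparison slide:

```javascript
// Toy evaluator for a small subset of MongoDB's find() on flat documents:
//   { field: value }            equality; several fields are ANDed
//   { field: { $gte: bound } }  comparison, only against values of the same type
//   { $or: [ filter, ... ] }     matches if any sub-filter matches
// A projection { field: 1, ... } keeps only the listed fields.
function matches(doc, filter) {
  return Object.entries(filter).every(([key, cond]) => {
    if (key === "$or") return cond.some(sub => matches(doc, sub));
    const value = doc[key];
    if (cond !== null && typeof cond === "object" && "$gte" in cond) {
      return typeof value === typeof cond.$gte && value >= cond.$gte;
    }
    return value === cond;
  });
}

function find(collection, filter = {}, projection = null) {
  const hits = collection.filter(doc => matches(doc, filter));
  if (projection === null) return hits;
  const keep = Object.keys(projection).filter(field => projection[field]);
  return hits.map(doc =>
    Object.fromEntries(keep.filter(f => f in doc).map(f => [f, doc[f]])));
}

const scientists = [
  { First: "Albert", Last: "Einstein", Theory: "Relativity", Publications: 500 },
  { First: "Isaac", Last: "Newton", Theory: "Gravitation", Publications: 30 },
  { First: "Kurt", Last: "Gödel", Theory: "Incompleteness", Publications: 400 },
  { First: "Hermann", Last: "Minkowski", Theory: "Relativity", Publications: 50 },
];

console.log(find(scientists, { Theory: "Relativity" }, { First: 1, Last: 1 }));
console.log(find(scientists, { Theory: "Relativity", Last: "Einstein" }));
console.log(find(scientists, { $or: [ { Last: "Newton" }, { Last: "Einstein" } ] }));
console.log(find(scientists, { Publications: { $gte: 100 } }));
```

One difference from the real shell: MongoDB also returns the `_id` field in a projection unless it is excluded explicitly with `"_id" : 0`.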
Heterogeneity
db.scientists.find(
{ "Theory" : "Relativity" }
)
SELECT *
FROM scientists
WHERE Theory = 'Relativity'
SQL CheatSheet
{ "First" : "Albert", "Last" : "Einstein", "Theory": "Relativity" }
{ "First" : "Isaac", "Last" : "Newton", "Theory": false }
{ "First" : "Kurt", "Last" : "Gödel", "Theory": "Incompleteness" }
{ "First" : "Hermann", "Last" : "Minkowski", "Theory": "Relativity" }
{ "First" : "Niels", "Last" : "Bohr" }
Other type
Missing field
Heterogeneity
db.scientists.find(
{ "Theory" : null }
)
SELECT *
FROM scientists
WHERE Theory IS NULL
SQL CheatSheet
{ "First" : "Albert", "Last" : "Einstein", "Theory": "Relativity" }
{ "First" : "Isaac", "Last" : "Newton", "Theory": "Gravitation" }
{ "First" : "Kurt", "Last" : "Gödel", "Theory": "Incompleteness" }
{ "First" : "Hermann", "Last" : "Minkowski", "Theory": "Relativity" }
{ "First" : "Niels", "Last" : "Bohr" }
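The two heterogeneity cases behave differently. A toy sketch of MongoDB's rule for a single top-level field (`fieldEquals` is a hypothetical helper): equality with a string never matches a value of another type, while `null` matches both an explicit null and a missing field. The data follows the "Other type / Missing field" slide:

```javascript
// { field: null } matches documents where the field is null OR absent;
// any other value only matches an identical value of the same type.
function fieldEquals(doc, field, value) {
  if (value === null) return !(field in doc) || doc[field] === null;
  return doc[field] === value;
}

const scientists = [
  { First: "Albert", Last: "Einstein", Theory: "Relativity" },
  { First: "Isaac", Last: "Newton", Theory: false },
  { First: "Kurt", Last: "Gödel", Theory: "Incompleteness" },
  { First: "Hermann", Last: "Minkowski", Theory: "Relativity" },
  { First: "Niels", Last: "Bohr" },
];

const lastNames = pred => scientists.filter(pred).map(d => d.Last);
console.log(lastNames(d => fieldEquals(d, "Theory", "Relativity"))); // [ 'Einstein', 'Minkowski' ]
console.log(lastNames(d => fieldEquals(d, "Theory", null)));         // [ 'Bohr' ]
```

To select only the documents where the field is absent, MongoDB provides `{ "Theory" : { "$exists" : false } }`.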
Read: nestedness (objects)
db.scientists.find({
"Name.First" : "Albert"
})
{
"Name" : {"First" : "Albert",
"Last" : "Einstein"
},
"Theories": [ "Relativity" ]}
{
"Name" : {"First" : "Albert",
"Last" : "Zweistein"
},
"Theories": [ "Unification" ]}
{
"Name" : {"First" : "Kurt",
"Last" : "Gödel"
},
"Theories": [ "Incompleteness" ]}
Read: possible confusion
db.scientists.find({
"Name" : { "First" : "Albert" }
})
{
"Name" : {"First" : "Albert",
"Last" : "Einstein"
},
"Theories": [ "Relativity" ]}
{
"Name" : {
"First" : "Albert"
},
"Theories": [ "Unification" ]
}
{
"Name" : {"First" : "Kurt",
"Last" : "Gödel"
},
"Theories": [ "Incompleteness" ]}
Read: nestedness (arrays)
db.scientists.find({
"Theories" : "Special relativity"
})
{
"Name" : {
"First" : "Albert",
"Last" : "Einstein"
},
"Theories": [
"Special relativity",
"General relativity"
]
}
{
"Name" : {
"First" : "Kurt",
"Last" : "Gödel"
},
"Theories": [ "Incompleteness" ]
}
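The three nested-data rules above (dot notation, exact match on an embedded object, matching inside arrays) can be sketched as a toy evaluator. It is a simplification under stated assumptions: values are compared through `JSON.stringify`, which, like MongoDB, makes embedded-object equality sensitive to field order, and its dot notation does not descend into arrays of objects as MongoDB's does. The helper names are hypothetical, and the data combines documents from the last three slides:

```javascript
// Follow a dotted path such as "Name.First" through embedded objects.
function getPath(doc, path) {
  return path.split(".").reduce(
    (value, key) => (value === null || value === undefined ? undefined : value[key]),
    doc);
}

// Structural, field-order-sensitive equality.
function sameValue(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

// A non-array condition on an array field matches if any element matches;
// otherwise the whole value must be equal.
function fieldMatches(doc, path, cond) {
  const value = getPath(doc, path);
  if (Array.isArray(value) && !Array.isArray(cond)) {
    return value.some(element => sameValue(element, cond));
  }
  return sameValue(value, cond);
}

const scientists = [
  { Name: { First: "Albert", Last: "Einstein" },
    Theories: [ "Special relativity", "General relativity" ] },
  { Name: { First: "Albert" }, Theories: [ "Unification" ] },
  { Name: { First: "Kurt", Last: "Gödel" }, Theories: [ "Incompleteness" ] },
];

// Positions of the matching documents.
const positions = (path, cond) =>
  scientists.flatMap((doc, i) => (fieldMatches(doc, path, cond) ? [i] : []));

console.log(positions("Name.First", "Albert"));          // [ 0, 1 ]
console.log(positions("Name", { First: "Albert" }));      // [ 1 ]
console.log(positions("Theories", "Special relativity")); // [ 0 ]
```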
Read: other operators
db.scientists.find({
"University" : {
$in : [ "ETH Zurich", "EPFL" ]
}
})
{
"Name" : {"First" : "Albert",
"Last" : "Einstein"
},
"University" : "ETH Zurich"}
{
"Name" : {"First" : "Kurt",
"Last" : "Gödel"
},
"University" : "Uni Wien"}
Read: other operators
db.scientists.find({
"University" : {
$nin : [ "ETH Zurich", "EPFL" ]
}
})
{
"Name" : {"First" : "Albert",
"Last" : "Einstein"
},
"University" : "ETH Zurich"}
{
"Name" : {"First" : "Kurt",
"Last" : "Gödel"
},
"University" : "Uni Wien"}
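A toy sketch of the two operators on the documents above, plus one hypothetical document without a University field (added for illustration) to show the less obvious case: `$nin` also selects documents in which the field is missing. The helper names are illustrative; arrays and `null` inside the list are not handled:

```javascript
// $in matches when the field exists and its value is in the list;
// $nin is the exact complement, so it also matches documents without the field.
const inList = (doc, field, list) => field in doc && list.includes(doc[field]);
const notInList = (doc, field, list) => !inList(doc, field, list);

const scientists = [
  { Name: { First: "Albert", Last: "Einstein" }, University: "ETH Zurich" },
  { Name: { First: "Kurt", Last: "Gödel" }, University: "Uni Wien" },
  // Hypothetical extra document with no University field.
  { Name: { First: "Niels", Last: "Bohr" } },
];

const list = [ "ETH Zurich", "EPFL" ];
const lastNames = pred => scientists.filter(pred).map(doc => doc.Name.Last);
console.log(lastNames(doc => inList(doc, "University", list)));    // [ 'Einstein' ]
console.log(lastNames(doc => notInList(doc, "University", list))); // [ 'Gödel', 'Bohr' ]
```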
Sort
db.scientists.find({
  "University" : { $in : [ "ETH Zurich", "EPFL" ] }
}).sort({ "Founded" : -1 })
Limit and offset
db.scientists.find({
  "University" : { $in : [ "ETH Zurich", "EPFL" ] }
}).sort({ "Founded" : -1 }).skip(30).limit(10)
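What the server computes for this chained call can be emulated on an array: filter, sort, then skip and limit, which is the order MongoDB applies them in regardless of how the cursor methods are chained. The 50 documents and their Founded years are made up for the example, and `page` is a hypothetical helper:

```javascript
// Emulates find(filter).sort({ [key]: direction }).skip(skip).limit(limit).
function page(docs, filter, key, direction, skip, limit) {
  return docs
    .filter(filter)                                // the query filter
    .sort((a, b) => direction * (a[key] - b[key])) // numeric sort; -1 = descending
    .slice(skip, skip + limit);                    // skip, then limit
}

// Hypothetical data: 50 documents founded 1900..1949, alternating universities.
const docs = Array.from({ length: 50 }, (_, i) => ({
  University: i % 2 === 0 ? "ETH Zurich" : "EPFL",
  Founded: 1900 + i,
}));

const fourthPage = page(docs, d => ["ETH Zurich", "EPFL"].includes(d.University),
                        "Founded", -1, 30, 10);
console.log(fourthPage.map(d => d.Founded)); // 1919 down to 1910
```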
Aggregation and pipelines
db.scientists.aggregate([
  { $match : { "Century" : 20 } },
  { $group : { "_id" : "$year", "Count" : { "$sum" : 1 } } },
  { $sort : { "Count" : -1 } },
  { $limit : 5 }
])
Pipeline
Like MapReduce and Spark!
(Figure labels: stages, transformation, action, creation)
But we'll see a much easier way next week.
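The pipeline reads as a composition of array transformations, much like chained Spark transformations. A minimal sketch of that idea; the stage helpers are hypothetical, cover only the stage forms used on the slide, and the documents (with their "year" values) are invented:

```javascript
// Each stage maps an array of documents to a new array; the pipeline is their
// left-to-right composition, mirroring $match, $group (with $sum: 1), $sort, $limit.
const match = predicate => docs => docs.filter(predicate);
const groupCount = keyField => docs => {
  const counts = new Map();
  for (const doc of docs) {
    counts.set(doc[keyField], (counts.get(doc[keyField]) ?? 0) + 1);
  }
  return [...counts].map(([key, count]) => ({ _id: key, Count: count }));
};
const sortBy = (field, direction) => docs =>
  [...docs].sort((a, b) => direction * (a[field] - b[field]));
const limit = n => docs => docs.slice(0, n);
const pipeline = (...stages) => docs =>
  stages.reduce((current, stage) => stage(current), docs);

// Hypothetical documents.
const docs = [
  { Century: 20, year: 1905 }, { Century: 20, year: 1905 }, { Century: 20, year: 1905 },
  { Century: 20, year: 1915 }, { Century: 20, year: 1915 },
  { Century: 20, year: 1931 },
  { Century: 19, year: 1859 }, { Century: 19, year: 1859 },
  { Century: 19, year: 1859 }, { Century: 19, year: 1859 },
];

const topYears = pipeline(
  match(doc => doc.Century === 20),
  groupCount("year"),
  sortBy("Count", -1),
  limit(5),
);
console.log(topYears(docs));
```

Without the match stage, the four 1859 documents would come out on top, which is why the filter has to come first.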
Replica sets on physical level
Shard 1: replica set (Primary, Secondary, Secondary)
Shard 2: replica set (Primary, Secondary, Secondary)
Shard 3: replica set (Primary, Secondary, Secondary)
Indices
Indices
{ "Name": "Apple", "Color": [ "green", "red" ] }
{ "Name": "Orange", "Color": [ "orange" ] }
{ "Name": "Banana", "Color": [ "yellow" ] }
{ "Name": "Kiwi", "Color": [ "brown", "green" ] }
{ "Name": "Ananas", "Color": [ "yellow" ] }
Big collections
{"Name":"Einstein", "Profession":"Physicist"}
{"Name":"Gödel", "Profession":"Mathematician"}
{"Name":"Ramanujan", "Profession":"Mathematician"}
{"Name":"Pythagoras", "Profession":"Mathematician"}
{"Name":"Turing", "Profession":"Computer Scientist"}
{"Name":"Church", "Profession":"Computer Scientist"}
{"Name":"Nash", "Profession":"Economist"}
{"Name":"Euler", "Profession":"Mathematician"}
{"Name":"Bohm", "Profession":"Physicist"}
{"Name":"Galileo", "Profession":"Astrophysicist"}
{"Name":"Lagrange", "Profession":"Mathematician"}
{"Name":"Gauss", "Profession":"Mathematician"}
{"Name":"Thales", "Profession":"Mathematician"}
...
Billions
of
objects
Point queries
(on the collection above)
find({"Name":"Euler"})
Not highly-filtering queries
(on the collection above)
find({"Profession":"Mathematician"})
Range queries
{"Name":"Einstein", "Year":1879}
{"Name":"Gödel", "Year":1906}
{"Name":"Ramanujan", "Year":1887}
{"Name":"Pythagoras", "Year":-570}
{"Name":"Turing", "Year":1912}
{"Name":"Church", "Year":1903}
{"Name":"Nash", "Year":1928}
{"Name":"Euler", "Year":1707}
{"Name":"Bohm", "Year":1917}
{"Name":"Galileo", "Year":1564}
{"Name":"Lagrange", "Year":1736}
{"Name":"Gauss", "Year":1777}
{"Name":"Thales", "Year":-624}
...
find({"Year":{"$gte":1900}})
Indices
Index on "Color", one entry per array element:
yellow → Banana, Ananas
orange → Orange
red → Apple
green → Apple, Kiwi
brown → Kiwi
{ "Name": "Apple", "Color": [ "green", "red" ] }
{ "Name": "Orange", "Color": [ "orange" ] }
{ "Name": "Banana", "Color": [ "yellow" ] }
{ "Name": "Kiwi", "Color": [ "brown", "green" ] }
{ "Name": "Ananas", "Color": [ "yellow" ] }
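Building such an index amounts to inverting the documents: one entry per (color, document) pair, which MongoDB calls a multikey index when the indexed field holds arrays. A minimal sketch; `buildIndex` and `lookup` are hypothetical helpers, and a JavaScript `Map` stands in for the index structure (whether keys are stored in a hash table or a tree is what the next slides compare):

```javascript
// Each element of the array field gets its own entry pointing back to the document.
function buildIndex(docs, field) {
  const index = new Map();
  docs.forEach((doc, position) => {
    // [].concat wraps a scalar and copies an array, so both cases are handled.
    for (const key of [].concat(doc[field] ?? [])) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(position);
    }
  });
  return index;
}

const fruits = [
  { Name: "Apple", Color: [ "green", "red" ] },
  { Name: "Orange", Color: [ "orange" ] },
  { Name: "Banana", Color: [ "yellow" ] },
  { Name: "Kiwi", Color: [ "brown", "green" ] },
  { Name: "Ananas", Color: [ "yellow" ] },
];

const colorIndex = buildIndex(fruits, "Color");
const lookup = color => (colorIndex.get(color) ?? []).map(i => fruits[i].Name);
console.log(lookup("green"));  // [ 'Apple', 'Kiwi' ]
console.log(lookup("yellow")); // [ 'Banana', 'Ananas' ]
```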
Hash indices (the fastest)
Scientists
{"Name":{"F":"Albert","L":"Einstein"},"Country":"Switzerland","Century":20}
{"Name":"Gödel","Country":"Austria","Century":20}
{"Name":"Ramanujan","Country":"India","Century":19}
{"Name":"Euclid","Country":"Greece","Century":-4}
{"Name":"Pythagoras","Country":"Greece","Century":-6}
{"Name":"Turing","Country":"UK","Century":20}
db.scientists.createIndex({
  "Century" : "hashed"
})
Building the index: each value is stored, with pointers to its records, in the bucket given by its hash.
h(20)=0: 20 → Einstein, Gödel, Turing
h(19)=4: 19 → Ramanujan
h(-4)=3: -4 → Euclid
h(-6)=1: -6 → Pythagoras
Query:
db.scientists.find({"Century":19})
Compute h(19)=4, read bucket 4 only, and follow its pointer to Ramanujan's record.
Limitations of hash indices
No support for range queries
Hash function not perfect in real life
Space requirements for collision avoidance
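A toy version of the hash index, also illustrating the limitations just listed. It is a sketch with assumed choices (5 buckets, hash = key modulo 5, chaining for collisions), so its bucket numbers differ from the slides' h; it shows the textbook structure, not how MongoDB stores hashed indexes internally. `HashIndex` is a hypothetical class:

```javascript
// Array of buckets; each bucket holds [key, positions] entries. Keys whose hash
// collides share a bucket and are told apart by comparing the keys (chaining).
class HashIndex {
  constructor(bucketCount) {
    this.buckets = Array.from({ length: bucketCount }, () => []);
  }

  hash(key) {
    const n = this.buckets.length;
    return ((key % n) + n) % n; // non-negative even for negative keys
  }

  insert(key, position) {
    const bucket = this.buckets[this.hash(key)];
    let entry = bucket.find(([k]) => k === key);
    if (!entry) {
      entry = [key, []];
      bucket.push(entry);
    }
    entry[1].push(position);
  }

  // Point query: compute one bucket number, then scan only that bucket.
  lookup(key) {
    const entry = this.buckets[this.hash(key)].find(([k]) => k === key);
    return entry ? entry[1] : [];
  }

  // Range query: hashing destroys the order, so every bucket must be checked.
  scan(predicate) {
    return this.buckets.flat()
      .filter(([k]) => predicate(k))
      .flatMap(([, positions]) => positions);
  }
}

// Centuries of the six Scientists records, in collection order.
const centuries = [20, 20, 19, -4, -6, 20];
const index = new HashIndex(5);
centuries.forEach((century, position) => index.insert(century, position));

console.log(index.lookup(19));         // [ 2 ]  (Ramanujan)
console.log(index.buckets[4]);         // 19 and -6 collide in bucket 4
console.log(index.scan(k => k >= 19)); // has to visit all buckets
```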
B+-tree example
(Figure: a B+-tree whose keys are English words such as "almost", "carefully", "fair", "Laertes", "possess", "thine".)
Disk block access
All leaves at same depth
All non-leaf nodes have between 3 and 5 children (here d = 2)
General case: #children between d+1 and 2d+1
But it's fine if the root has fewer.
Actual values only at the leaves
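Why tree indices are logarithmic follows from the fan-out rule. A short derivation, assuming (as is usual for B+-trees) that a non-leaf root has at least 2 children; h counts the edges from the root to any leaf, which is the same for all leaves:

```latex
% depth 0: 1 root; depth 1: at least 2 nodes;
% each further level: at least (d+1) times as many nodes as the one above
N_{\text{leaves}} \;\ge\; 2\,(d+1)^{h-1}
\quad\Longrightarrow\quad
h \;\le\; 1 + \log_{d+1}\!\left(\frac{N_{\text{leaves}}}{2}\right)
```

For example, with the slide's d = 2 and an assumed 1,000,000 leaves, h ≤ 1 + log_3(500,000) ≈ 12.9, so at most 12 levels below the root; with an assumed disk-sized fan-out of d = 100, the same number of leaves gives h ≤ 1 + log_101(500,000) ≈ 3.8, so h ≤ 3.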
Tree indices (logarithmic)
Scientists
{"Name":{"F":"Albert","L":"Einstein"},"Country":"Switzerland","Century":20}
{"Name":"Gödel","Country":"Austria","Century":20}
{"Name":"Ramanujan","Country":"India","Century":19}
{"Name":"Euclid","Country":"Greece","Century":-4}
{"Name":"Pythagoras","Country":"Greece","Century":-6}
{"Name":"Turing","Country":"UK","Century":20}
db.scientists.createIndex({
  "Century" : 1
})
2-3 B+-tree
Building the index: the keys are inserted one at a time (20, 19, -4, -6). When a leaf overflows, it is split and a separator key is pushed up into the parent (first 20, later 19). The leaves end up holding -6, -4, 19, 20 in sorted order, each key pointing to its records.
Query:
db.scientists.find({"Century":{"$gte":19}})
Descend from the root to the leaf containing 19, then scan the leaves from left to right: 19 → Ramanujan; 20 → Einstein, Gödel, Turing.
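The range scan can be sketched with the leaf level alone: index entries kept sorted by key, a binary search standing in for the descent from the root, then a left-to-right scan. This is a simplification (no internal nodes, no splits) with hypothetical helper names; its cost is O(log n) to find the starting entry plus one step per matching entry:

```javascript
// (key, position) entries sorted by key: the content of the B+-tree's leaves.
function buildSortedIndex(values) {
  return values
    .map((key, position) => ({ key, position }))
    .sort((a, b) => a.key - b.key || a.position - b.position);
}

// Binary search for the first entry whose key is >= key.
function lowerBound(entries, key) {
  let lo = 0, hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].key < key) lo = mid + 1; else hi = mid;
  }
  return lo;
}

// All positions with min <= key <= max, read in key order.
function rangeQuery(entries, min, max = Infinity) {
  const out = [];
  for (let i = lowerBound(entries, min); i < entries.length && entries[i].key <= max; i++) {
    out.push(entries[i].position);
  }
  return out;
}

const names = ["Einstein", "Gödel", "Ramanujan", "Euclid", "Pythagoras", "Turing"];
const index = buildSortedIndex([20, 20, 19, -4, -6, 20]);
console.log(rangeQuery(index, 19).map(p => names[p]));
```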
Index creation: hash
db.scientists.createIndex({
  "Name.Last" : "hashed"
})
db.scientists.find({
"Name.Last" : "Einstein"
})
Index
Query
Index creation: hash
db.scientists.createIndex({
  "Profession" : "hashed"
})
db.scientists.find({
"Profession" : "Physicist",
"Theories" : "Relativity"
})
Index
Query
Post-filtering
Index creation: compound (only B+-tree!)
db.scientists.createIndex({"Birth" : 1,
"Death" : 1
})
db.scientists.find({"Birth" : 1887,
"Death" : 1946
})
Index
Query
Index creation: compound (only B+-tree!)
db.scientists.createIndex({"Birth" : 1,
"Death" : -1
})
db.scientists.find({"Birth" : 1887,
"Death" : 1946
})
Index
Query
descending
Index creation: range
db.scientists.createIndex({"Birth" : 1
})
db.scientists.find({"Birth" : { "$gte": 1946 }
})
Index
Query
Index creation: range
db.scientists.createIndex({"Birth" : 1
})
db.scientists.find({"Birth" : { "$gte": 1946 },"Death" : 1998
})
Index
Query
Post-filtering
Index creation: compound (only B+-tree!)
db.scientists.createIndex({"Birth" : 1,
"Death" : -1
})
db.scientists.find({"Birth" : 1887
})
Index
Query
Index creation: compound (only B+-tree!)
db.scientists.createIndex({"Birth" : 1,"Death" : -1
})
db.scientists.find({"Birth" : { "$gte" : 1980 }
})
Index
Query
Index creation: compound (only B+-tree!)
db.scientists.createIndex({"Birth" : 1,
"Death" : -1
})
db.scientists.find({"Death" : 1887
})
Index
Query
Post-filtering (why?)
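One way to see why the last query needs post-filtering: a sketch of the compound index as a list sorted by Birth ascending, then Death descending. The (Birth, Death) pairs are illustrative values and the helpers are hypothetical; `read` counts only the sequential part of each lookup, not the logarithmic search:

```javascript
// Compound index { Birth: 1, Death: -1 }: entries sorted by Birth ascending,
// and by Death descending among equal Birth values.
function buildCompoundIndex(docs) {
  return docs
    .map((doc, position) => ({ Birth: doc.Birth, Death: doc.Death, position }))
    .sort((a, b) => (a.Birth - b.Birth) || (b.Death - a.Death));
}

// Condition on the prefix (Birth): binary search to the first entry with that
// Birth, then read the contiguous run of equal Birth values.
function findByBirth(entries, birth) {
  let lo = 0, hi = entries.length;
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (entries[mid].Birth < birth) lo = mid + 1; else hi = mid;
  }
  const hits = [];
  let read = 0;
  for (let i = lo; i < entries.length; i++) {
    read++;
    if (entries[i].Birth !== birth) break; // first entry past the run: stop
    hits.push(entries[i].position);
  }
  return { hits, read };
}

// Condition on Death only: equal Death values are scattered over the whole
// sort order, so every entry has to be read and tested (post-filtering).
function findByDeath(entries, death) {
  const hits = entries.filter(e => e.Death === death).map(e => e.position);
  return { hits, read: entries.length };
}

// Illustrative (Birth, Death) pairs.
const people = [
  { Birth: 1887, Death: 1920 }, { Birth: 1879, Death: 1955 },
  { Birth: 1887, Death: 1961 }, { Birth: 1906, Death: 1978 },
  { Birth: 1912, Death: 1954 }, { Birth: 1903, Death: 1995 },
];
const index = buildCompoundIndex(people);
console.log(findByBirth(index, 1887)); // { hits: [ 2, 0 ], read: 3 }
console.log(findByDeath(index, 1954)); // { hits: [ 4 ], read: 6 }
```

With a condition on the Birth prefix, the index narrows the work to one contiguous run; with Death alone it offers no starting point, so the matching entries cannot be located by a narrow index range.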