CP344 – Databases Open Notes Chapter 10: Document-Based Databases

cs.coloradocollege.edu/~mwhitehead/courses/2015_2016/CP344/Le…


Tech News!

● Human embryos can be grown in lab for longer than 14 days
● Medical error third leading cause of death

Hackers' Tip of the Day: Create indexes in MySQL

CREATE INDEX color_index ON Jeep(color);
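The effect of an index can be checked from Python. This is a minimal sketch, assuming SQLite (through the standard `sqlite3` module) as a stand-in for MySQL so that it runs without a server. The `Jeep` columns other than `color`, the sample rows, and the `query_plan` helper are made up for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for MySQL (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Jeep (id INTEGER PRIMARY KEY, model TEXT, color TEXT)")
conn.executemany(
    "INSERT INTO Jeep (model, color) VALUES (?, ?)",
    [("Wrangler", "red"), ("Cherokee", "blue"), ("Renegade", "red")],
)


def query_plan(connection, sql, params=()):
    """Return the planner's description of how it would run the query."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The last column of each row is a human-readable detail string.
    return " | ".join(row[-1] for row in rows)


lookup = "SELECT * FROM Jeep WHERE color = ?"
plan_before = query_plan(conn, lookup, ("red",))

# Same statement as on the slide.
conn.execute("CREATE INDEX color_index ON Jeep(color)")
plan_after = query_plan(conn, lookup, ("red",))

print("before:", plan_before)
print("after: ", plan_after)

# The query result is the same either way; only the access path changes.
red_models = sorted(row[1] for row in conn.execute(lookup, ("red",)))
print(red_models)
```

The exact plan wording differs between SQLite versions, but after the index exists the plan should mention `color_index` instead of a full table scan.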


Table/Indexing Updates?

Main Weakness of Key/Value Stores

● Data only has a manually defined structure
    ● Redis stores data structures (sets, lists, etc.)
    ● These data structures cannot easily be queried
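The weakness described above can be sketched in a few lines. This assumes a plain Python dict as a stand-in for a key/value store; the keys, the `name|email|role` value layout, and the helper names are hypothetical. Because the store only sees opaque strings, any search by content has to fetch and decode every value on the client side.

```python
# A plain dict stands in for a key/value store (assumption):
# the store only maps keys to opaque strings.
store = {
    "user:1": "Benny|benny@example.com|admin",
    "user:2": "Ada|ada@example.com|member",
    "user:3": "Cleo|cleo@example.com|admin",
}


def parse_user(raw):
    """Decode the application's own 'name|email|role' layout."""
    name, email, role = raw.split("|")
    return {"name": name, "email": email, "role": role}


def find_by_role(kv, role):
    """No query support: read every value and decode it client-side."""
    matches = []
    for key in sorted(kv):
        user = parse_user(kv[key])
        if user["role"] == role:
            matches.append(key)
    return matches


# Lookup by key is direct; lookup by content is a full scan.
print(store["user:2"])
print(find_by_role(store, "admin"))
```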

Document Store

Key    Value
id1    doc1
id2    doc2
id3    doc3
id4    doc4
id5    doc5
id6    doc6

What's a document?

XML Document

● Human and machine readable.
● Built-in schema.
● User-defined tags.
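A small sketch of what user-defined tags look like in practice, using Python's standard `xml.etree.ElementTree` parser. The `<dog>` document and its tag names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# The tag names are chosen by the document's author, not fixed by XML.
xml_text = """
<dog>
  <dogName>Mr. Paws</dogName>
  <breed>Golden-pointer</breed>
  <favPastime>Barking</favPastime>
</dog>
""".strip()

root = ET.fromstring(xml_text)

# Each child element carries its own tag, so the structure travels with the data.
record = {child.tag: child.text for child in root}

print(root.tag)
print(record)
```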

JSON Document

● Human and machine readable.
● Flexible schema.
● Subset of JavaScript.
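A minimal sketch of the flexible schema, using Python's standard `json` module. The two documents are invented for illustration; note that they do not share the same fields, so the reading code has to allow for a missing one.

```python
import json

# Two documents in one list; neither is required to match the other's fields.
docs_text = """
[
  {"dogName": "Mr. Paws", "breed": "Golden-pointer"},
  {"dogName": "Biscuit", "tricks": ["sit", "roll over"]}
]
"""

docs = json.loads(docs_text)

for doc in docs:
    print(sorted(doc.keys()))

# Code reading the documents supplies a default for absent fields.
breeds = [doc.get("breed", "unknown") for doc in docs]
print(breeds)

# Serializing and parsing again gives back the same Python structures.
round_trip = json.loads(json.dumps(docs))
```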

YAML Document

● Human and machine readable.
● Easy to read with whitespace.

RESTful APIs (Representational State Transfer)

Normal HTTP GET request

http://www.dog.com/search?q="dog"

JSON

{
  "dogName": "Mr. Paws",
  "breed": "Golden-pointer",
  "favBed": "Paw Palace",
  "favPastime": "Barking"
}
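The request and response halves of such a call can be sketched with the standard library alone. This is an illustration under assumptions: `build_search_url` is a hypothetical helper, and the response body is a hard-coded copy of the JSON from the slide, so no network access is needed.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_search_url(base, query):
    """Append a URL-encoded q parameter to the base endpoint."""
    return base + "?" + urlencode({"q": query})


url = build_search_url("http://www.dog.com/search", "dog")
print(url)

# Characters such as spaces are escaped so they are safe in a URL.
spaced_url = build_search_url("http://www.dog.com/search", "golden pointer")
print(spaced_url)

# The server side would recover the parameter like this.
params = parse_qs(urlsplit(spaced_url).query)

# Stand-in for the body of the HTTP response (copied from the slide).
response_body = (
    '{"dogName": "Mr. Paws", "breed": "Golden-pointer", '
    '"favBed": "Paw Palace", "favPastime": "Barking"}'
)
dog = json.loads(response_body)
print(dog["dogName"], "likes", dog["favPastime"].lower())
```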

Exercise: Access Freebase

Example request:

https://www.googleapis.com/freebase/v1/search?query=dog

Python pseudocode:

import json
import urllib2

Download text from a query
Parse text using json library
Print out result number 3
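One possible shape for the three pseudocode steps, written for Python 3, where `urllib.request` takes the place of `urllib2`. This is a sketch, not a reference solution: the `result` key and the sample response body are assumptions about the response layout, and the sample is used so the code runs offline.

```python
import json
import urllib.request  # Python 3 counterpart of urllib2


def download_text(url):
    """Fetch a URL with an HTTP GET and return the body as text."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")


def nth_result(text, n):
    """Parse a JSON body and return its n-th result (1-based).

    Assumes the results live in a list under the "result" key.
    """
    data = json.loads(text)
    return data["result"][n - 1]


# Hypothetical response body, so the sketch runs without network access.
sample_text = json.dumps(
    {
        "result": [
            {"name": "Dog"},
            {"name": "Hot dog"},
            {"name": "Dog breed"},
            {"name": "Dog park"},
        ]
    }
)
third = nth_result(sample_text, 3)
print(third)

# Against a live endpoint the same steps would be:
# text = download_text("https://www.googleapis.com/freebase/v1/search?query=dog")
# print(nth_result(text, 3))
```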

BSON Documents

● Binary JSON
    ● Values are stored in binary instead of plain text
    ● Can save space
    ● Faster to read
    ● Not human-readable
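To make the text-versus-binary difference concrete, here is a hand-rolled sketch that encodes a one-field document both ways. The helper `encode_int32_document` is hypothetical and handles only 32-bit integers; it follows the int32 element layout of the BSON specification as I understand it (length prefix, type byte `0x10`, NUL-terminated field name, little-endian value, trailing NUL). A real application would use a BSON library instead.

```python
import json
import struct


def encode_int32_document(fields):
    """Encode a flat dict of str -> int32 in a BSON-style binary layout."""
    body = b""
    for name, value in fields.items():
        # Element: type byte 0x10 (int32), NUL-terminated name,
        # then the value as 4 little-endian bytes.
        body += b"\x10" + name.encode("utf-8") + b"\x00" + struct.pack("<i", value)
    # Document: 4-byte total length (counting itself and the final NUL),
    # the elements, then a terminating NUL byte.
    total = 4 + len(body) + 1
    return struct.pack("<i", total) + body + b"\x00"


doc = {"n": 123456789}
as_text = json.dumps(doc).encode("utf-8")
as_binary = encode_int32_document(doc)

print(len(as_text), as_text)            # readable characters
print(len(as_binary), as_binary.hex())  # raw bytes, shown as hex

# Reading the number back needs no digit parsing: it sits at a known
# offset (4 length bytes + 1 type byte + 2 bytes for "n" and its NUL).
(value,) = struct.unpack_from("<i", as_binary, 7)
print(value)
```

The binary form is shorter here because a nine-digit number fits in four bytes. For small values or short strings the fixed-width fields and length prefixes can make the binary form larger than the text form.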

Document Stores

● Pros
    ● Documents have built-in schema
    ● Fast key/value lookup
    ● Easy to split up across machines
● Cons
    ● Code must keep track of schema of each doc.
    ● No overall database structure
    ● Some queries are hard to write
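Several of these points can be seen in a toy in-memory model. This is a sketch under assumptions, not how any particular product works: the node count, the CRC32-based routing, and the `put`/`get`/`find` helpers are all hypothetical. Lookup by key touches a single node, while a query on an attribute has to visit every node and cope with documents that lack the field.

```python
import zlib

NUM_NODES = 3  # hypothetical cluster size


def node_for(key, num_nodes=NUM_NODES):
    """Pick a node from a stable hash of the key."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


# One dict per "machine".
nodes = [dict() for _ in range(NUM_NODES)]


def put(key, doc):
    """Store the document on the node its key hashes to."""
    nodes[node_for(key)][key] = doc


def get(key):
    """Key lookup: only one node needs to be consulted."""
    return nodes[node_for(key)].get(key)


def find(field, value):
    """Attribute query: ask every node, tolerating docs without the field."""
    hits = []
    for node in nodes:
        for key, doc in node.items():
            if doc.get(field) == value:
                hits.append(key)
    return sorted(hits)


# Documents need not share fields; the calling code tracks what each holds.
put("id1", {"name": "Mr. Paws", "breed": "Golden-pointer"})
put("id2", {"name": "Benny", "password": "Pancake"})
put("id3", {"name": "Biscuit", "breed": "Golden-pointer", "age": 4})

print(get("id2"))
print(find("breed", "Golden-pointer"))
```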

Document Stores

● MongoDB
● CouchDB
● Terrastore

MongoDB Examples

Exercise: Insert Freebase Articles to MongoDB

Pseudocode:

Download JSON from Freebase
Insert each result as a separate MongoDB document
Run an example MongoDB query that searches documents based on one attribute

pymongo example:

import pymongo

# Connect to a MongoDB server on the default local port.
conn = pymongo.MongoClient('localhost', 27017)
db = conn['test_database']
coll = db['test_collection']

# Insert one document and keep the _id that MongoDB assigned to it.
doc = {"Name": "Benny", "Password": "Pancake"}
docID1 = coll.insert_one(doc).inserted_id
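The last two steps of the exercise can be sketched as functions that accept any collection object offering `insert_many` and `find`, both of which pymongo's `Collection` provides. So that the sketch runs without a MongoDB server, it uses `FakeCollection`, a hypothetical in-memory stand-in that supports only simple equality queries; the sample results are invented.

```python
class FakeCollection:
    """In-memory stand-in with two methods that pymongo's Collection also has."""

    def __init__(self):
        self.docs = []

    def insert_many(self, documents):
        # Store a copy of each document.
        self.docs.extend(dict(d) for d in documents)

    def find(self, query):
        # Return documents whose fields equal every key/value in the query.
        return [
            d for d in self.docs
            if all(d.get(k) == v for k, v in query.items())
        ]


def load_results(coll, results):
    """Insert each result as a separate document."""
    coll.insert_many(results)


def search_by_attribute(coll, field, value):
    """Equality query on a single attribute."""
    return list(coll.find({field: value}))


# Invented sample results standing in for downloaded JSON.
results = [
    {"name": "Dog", "kind": "animal"},
    {"name": "Hot dog", "kind": "food"},
    {"name": "Wolf", "kind": "animal"},
]

coll = FakeCollection()
# With a running server, a real collection could be used instead:
# coll = pymongo.MongoClient('localhost', 27017)['test_database']['test_collection']

load_results(coll, results)
animals = search_by_attribute(coll, "kind", "animal")
print([doc["name"] for doc in animals])
```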


Final Project