Apparently NoSQL is all the rage these days, but what does it really mean and what technologies are out there? When to use a non-relational database? How to decide which one to use to achieve world domination? How do I use CouchDB with Ruby on Rails?
Non-Relational Databases
and World Domination
Jason Davies
Thursday, 3 December 2009
Overview
• Relational vs. Non-Relational
• Why Switch?
• Non-Relational Solutions
• Document Databases
• Key Value Stores
• CouchDB Features
Relational Databases
• Relational algebra: union, intersection, difference, Cartesian product
• Easy to perform dynamic queries
• Fixed Schemas
• Normalisation
Non-Relational Databases
• Everything else!
• Myriad of features, including:
• Key-Value stores with external indexers
• Schemaless
• RESTful APIs
CAP Theorem
• Three requirements for applications in a distributed environment:
• Consistency
• Availability
• Partition tolerance
• Pick two
Why Switch?
• Data structure
• Scalability
• The New Cool
Data Structure Symptoms
Sparse Data
• Tables with many columns, only a few being used by any particular row
Attribute Tables
• Each row is (fkey, att_name, att_value)
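A minimal sketch of why attribute tables hint at document-shaped data: grouping (fkey, att_name, att_value) rows by fkey yields one hash per entity, which is roughly what a document database would store directly. The sample rows and the `rows_to_documents` helper are hypothetical, for illustration only.

```ruby
# Hypothetical attribute-table rows: [fkey, att_name, att_value]
ROWS = [
  [1, "name", "Darth Vader"],
  [1, "age", "63"],
  [2, "name", "Chris"],
  [2, "points", "3"]
]

# Fold the rows into one hash ("document") per fkey.
def rows_to_documents(rows)
  rows.each_with_object({}) do |(fkey, att_name, att_value), docs|
    docs[fkey] ||= {}
    docs[fkey][att_name] = att_value
  end
end

puts rows_to_documents(ROWS).inspect
```

Note that every value comes back as a string: an attribute table also loses the column types, which a JSON document would keep.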
Data Dumps
• Given up on using columns for structured data
• Instead simply serialising it (JSON, YAML, XML, etc.) and dumping strings to database
Too Many Joins
• Schemas involving large numbers of many-to-many join tables or tree-like structures
Frequent Schema Changes
• May be fine for small databases
• Can be tedious
• Rebuilding indexes is slow for millions of rows
Scalability
Write Capacity
• If read capacity is the problem, then set up master-slave replication
Too Much Data
• Too much for one server to hold
• Hard to shard the data sensibly
Non-Relational Solutions
Diverse Ecosystem
• Column-oriented databases
• Document-oriented databases
• Key value stores
• Graph-oriented databases
• Distributed databases
• MapReduce
BigTable
• “a sparse, distributed multi-dimensional sorted map”
• Designed to scale into the petabyte range
• HBase (Java, Hadoop)
• Hypertable
• Cassandra (Facebook; BigTable-style data model, distribution based on Amazon’s Dynamo)
Document Databases
• Arbitrary number of “sparse” attributes per document
• Documents often map well to JSON e.g. in CouchDB
• Cons: usually can’t perform joins or transactions spanning multiple documents
Graph Databases
• Good for highly interconnected data
• Focus on the relationships between items
• Optimised for querying transitive relationships i.e. variable length chains of joins
• Neo4J, AllegroGraph, Sesame
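To illustrate what “variable length chains of joins” means, here is a small sketch of a transitive query written as a breadth-first traversal over an adjacency list. The `FOLLOWS` data and the `reachable_from` helper are invented for this example; it is not how Neo4J, AllegroGraph, or Sesame expose queries.

```ruby
require "set"

# Hypothetical "follows" relationships, stored as an adjacency list.
FOLLOWS = {
  "alice" => ["bob"],
  "bob"   => ["chris"],
  "chris" => ["alice", "mary"],
  "mary"  => []
}

# Everything reachable from start through any number of hops.
# In SQL this would need one self-join per hop (or a recursive query).
def reachable_from(graph, start)
  seen = Set.new
  queue = [start]
  until queue.empty?
    node = queue.shift
    graph.fetch(node, []).each do |neighbour|
      next if seen.include?(neighbour) || neighbour == start
      seen << neighbour
      queue << neighbour
    end
  end
  seen.to_a.sort
end

puts reachable_from(FOLLOWS, "alice").inspect
```

The number of hops is not known in advance, which is what makes this awkward to express as a fixed set of joins.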
Distributed K-V Stores
• Giant hash table/dictionary
• Mainly solve data scalability problems
• Transparently partition and replicate data
• Cons:
• eventual consistency or other distributed transaction protocols
• hard to do integrity constraints, hard to catch application bugs
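A rough sketch of “transparently partition and replicate”: hash each key to pick a starting node, then keep copies on the following nodes. The node names, the replica count, and the `nodes_for` helper are assumptions for illustration; real stores typically use more elaborate schemes such as consistent hashing, so that adding a node does not reshuffle most keys.

```ruby
require "digest"

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical servers
REPLICAS = 2                                      # copies kept of each key

# Pick the nodes responsible for a key: hash the key to a starting node,
# then take the following nodes (wrapping around) as replicas.
def nodes_for(key, nodes = NODES, replicas = REPLICAS)
  start = Digest::MD5.hexdigest(key).to_i(16) % nodes.size
  (0...replicas).map { |i| nodes[(start + i) % nodes.size] }
end

%w[user:1 user:2 user:3].each do |key|
  puts "#{key} -> #{nodes_for(key).join(', ')}"
end
```

Because each key lives on several nodes, a write has to reach all of them before they agree, which is where the eventual-consistency trade-off listed under “Cons” comes from.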
Distributed K-V Stores
• Scalaris, Dynomite, Ringo: data consistency
• MemcacheDB, Tokyo Cabinet: low latency
Apache CouchDB
CouchDB and Ruby
require 'couchrest'
# with !, it creates the database if it doesn't already exist
@db = CouchRest.database!("http://127.0.0.1:5984/couchrest-test")
response = @db.save_doc({ :key => 'value', 'another key' => 'another value' })
doc = @db.get(response['id'])
puts doc.inspect
CouchDB and Ruby
@db.bulk_save([
  {"wild" => "and random"},
  {"mild" => "yet local"},
  {"another" => ["set","of","keys"]}
])
# returns ids and revs of the current docs
puts @db.documents.inspect
CouchDB and Ruby
@db.save_doc({
  "_id" => "_design/first",
  :views => {
    :test => {
      :map => "function(doc){for(var w in doc){ if(!w.match(/^_/))emit(w,doc[w])}}"
    }
  }
})
puts @db.view('first/test')['rows'].inspect
CouchDB and Ruby
• Read more about CouchRest on GitHub
• Also check out newcomer RubyAqua
Features
• Schema-Free (JSON)
• Document-Oriented, Not Relational
• Highly Concurrent
• RESTful HTTP API
• JavaScript-Powered Map/Reduce
• N-Master Replication
• Robust Storage
Documents
http://www.flickr.com/photos/stilleben2001/223243329/
Schema-Free (JSON)
{
  "_id": "BCCD12CBB",
  "_rev": "AB764C",
  "type": "person",
  "name": "Darth Vader",
  "age": 63,
  "headware": ["Helmet", "Sombrero"],
  "dark_side": true
}
Document-Oriented, Not Relational
• Documents in the Real World™
• Bills, letters, tax forms…
• Same type != same structure
• Can be out of date (so what?)
• No references
Natural Data Behaviour
Highly Concurrent
• Functional languages highly appropriate for parallelism
• Lightweight “processes” and message-passing; “shared-nothing”
• Easy to create fault-tolerant systems
MVCC
• Multiversion Concurrency Control
• Reads: lock-free; never block
• Potential for massive horizontal scaling
• Writes: all-or-nothing
• Success
• Fail: conflict error, fetch and try again
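The write path above (“conflict error, fetch and try again”) can be sketched with a toy in-memory store. `ToyStore`, `Conflict`, and `update_with_retry` are invented names, and the integer revisions are a simplification; this is an illustration of the idea, not CouchDB's implementation or its revision format.

```ruby
# Raised when a write is based on a stale revision.
class Conflict < StandardError; end

# Toy in-memory store; a stand-in for illustration, not CouchDB itself.
class ToyStore
  def initialize
    @docs = {}
  end

  # Reads never block: they return the latest committed revision.
  def get(id)
    @docs.fetch(id)
  end

  # A write succeeds only if the caller saw the current revision.
  def put(id, doc, rev = nil)
    current = @docs[id]
    raise Conflict, id if current && current[:rev] != rev
    new_rev = current ? current[:rev] + 1 : 1
    @docs[id] = { rev: new_rev, doc: doc }
    new_rev
  end
end

# Fetch, modify, and retry on conflict.
def update_with_retry(store, id, attempts = 3)
  attempts.times do
    current = store.get(id)
    begin
      return store.put(id, yield(current[:doc]), current[:rev])
    rescue Conflict
      next # someone else wrote first: fetch again and retry
    end
  end
  raise Conflict, id
end

store = ToyStore.new
rev = store.put("vader", { "age" => 63 })
store.put("vader", { "age" => 64 }, rev)     # based on the current revision
begin
  store.put("vader", { "age" => 65 }, rev)   # based on a stale revision
rescue Conflict
  puts "conflict, fetching and trying again"
end
puts update_with_retry(store, "vader") { |doc| doc.merge("age" => 65) }
```

No lock is ever taken: a writer that loses the race simply gets an error and decides for itself whether to retry.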
RESTful HTTP API: CRUD
• Create: HTTP PUT /db/mydocid
• Read: HTTP GET /db/mydocid
• Update: HTTP PUT /db/mydocid
• Delete: HTTP DELETE /db/mydocid
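As a sketch of the CRUD mapping using only Ruby's standard library, the helper below builds (but does not send) the matching request for each operation. The `crud_request` helper and the local URL are assumptions for illustration. CouchDB expects updates and deletes to name the revision they are based on; the `If-Match` header is one way to pass it.

```ruby
require "net/http"
require "json"
require "uri"

# Assumed local CouchDB URL and document id, matching /db/mydocid above.
DOC_URI = URI("http://127.0.0.1:5984/db/mydocid")

# Build (but do not send) the request for each CRUD operation.
def crud_request(operation, uri = DOC_URI, body: nil, rev: nil)
  request =
    case operation
    when :create, :update then Net::HTTP::Put.new(uri)
    when :read            then Net::HTTP::Get.new(uri)
    when :delete          then Net::HTTP::Delete.new(uri)
    else raise ArgumentError, "unknown operation: #{operation}"
    end
  if body
    request["Content-Type"] = "application/json"
    request.body = JSON.generate(body)
  end
  # Updates and deletes name the revision they are based on.
  request["If-Match"] = rev if rev
  request
end

[:create, :read, :update, :delete].each do |operation|
  request = crud_request(operation)
  puts "#{operation}: HTTP #{request.method} #{request.path}"
end

# To actually send one against a running CouchDB:
#   Net::HTTP.start(DOC_URI.host, DOC_URI.port) { |http| http.request(request) }
```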
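RESTful HTTP API: Example
require 'couchrest'
require 'json'
require 'net/http'
couch = CouchRest.database!("http://127.0.0.1:5984/tweets")
tweets_url = "http://twitter.com/statuses/user_timeline.json"
# fetch the timeline and parse the JSON array before saving it
tweets = JSON.parse(Net::HTTP.get(URI(tweets_url)))
couch.bulk_save(tweets)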
Cacheability
• Both documents and views return ETags
• Clients send If-None-Match
• CouchDB responds with 304 Not Modified and bypasses potentially expensive lookup
• Can use Varnish/Squid as caching proxy
• Proxy-friendly
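The ETag handshake can be sketched without a server. `handle_get` below is a hypothetical stand-in that derives an ETag from an MD5 digest of the body; that derivation is an assumption made for this example, not a description of how CouchDB computes its ETags.

```ruby
require "digest"

# Toy handler standing in for a server: returns [status, headers, body].
def handle_get(document_json, if_none_match = nil)
  etag = %("#{Digest::MD5.hexdigest(document_json)}")
  if if_none_match == etag
    [304, { "ETag" => etag }, ""]             # client copy is still fresh
  else
    [200, { "ETag" => etag }, document_json]  # send the full document
  end
end

doc = '{"user":"Alice","points":5}'
status, headers, _body = handle_get(doc)
puts status                                   # first request: full response
status, = handle_get(doc, headers["ETag"])
puts status                                   # revalidation: not modified
status, = handle_get('{"user":"Alice","points":6}', headers["ETag"])
puts status                                   # document changed: full response
```

A caching proxy such as Varnish or Squid plays the client role here, revalidating its stored copy with If-None-Match.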
JavaScript-Powered Map/Reduce
• Map functions extract data from your documents
• Reduce functions aggregate intermediate values
• The kicker: Incremental B-tree storage
http://horicky.blogspot.com/2008/10/couchdb-implementation.html
Map/Reduce Views
Docs
{"user": "Chris", "points": 3}
{"user": "Joe", "points": 10}
{"user": "Alice", "points": 5}
{"user": "Mary", "points": 9}
{"user": "Bob", "points": 7}
Map
function(doc) {
  if (doc.user && doc.points) {
    emit(doc.user, doc.points);
  }
}
{"key": "Alice", "value": 5}
{"key": "Bob", "value": 7}
{"key": "Chris", "value": 3}
{"key": "Joe", "value": 10}
{"key": "Mary", "value": 9}
Reduce
function(keys, values, rereduce) {
  return sum(values);
}
Alice … Chris: 15
Everyone: 34
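The same view can be simulated in plain Ruby to check the numbers on the slide. This is only a sketch of the data flow: `map_doc`, `reduce_rows`, `build_view`, and `query` are invented helper names, and real CouchDB views run as JavaScript inside the database and are stored incrementally in a B-tree.

```ruby
DOCS = [
  { "user" => "Chris", "points" => 3 },
  { "user" => "Joe",   "points" => 10 },
  { "user" => "Alice", "points" => 5 },
  { "user" => "Mary",  "points" => 9 },
  { "user" => "Bob",   "points" => 7 }
]

# Map: emit one [key, value] row per matching document.
def map_doc(doc)
  doc["user"] && doc["points"] ? [[doc["user"], doc["points"]]] : []
end

# Reduce: sum the values. Summing partial sums is still a sum, so the
# same function also serves for the rereduce step.
def reduce_rows(_keys, values, _rereduce = false)
  values.sum
end

# Build the view: map every document, then sort the rows by key.
def build_view(docs)
  docs.flat_map { |doc| map_doc(doc) }.sort_by { |key, _| key }
end

# Reduce over an inclusive key range (nil means unbounded).
def query(view, startkey: nil, endkey: nil)
  rows = view.select do |key, _|
    (startkey.nil? || key >= startkey) && (endkey.nil? || key <= endkey)
  end
  reduce_rows(rows.map(&:first), rows.map(&:last))
end

view = build_view(DOCS)
puts query(view, startkey: "Alice", endkey: "Chris")  # Alice … Chris
puts query(view)                                      # everyone
puts reduce_rows(nil, [15, 19], true)                 # rereduce of two partial sums
```

The last line shows why the `rereduce` flag exists: partial results for key ranges can be combined without revisiting the original rows.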
Render Views as HTML
lists/index.js
/drl/_list/sofa/index/recent-posts?descending=true&limit=8
Server-Side JavaScript
• _show for transforming documents
• _list for transforming views
• _update for transforming PUTs/POSTs
• Code-sharing between client and server
• Easy deployment
Replication
• Incremental
• Near-real-time
• Clustered mirrors
• Scheduled
• Ad-hoc
http://www.flickr.com/photos/mcpig/872293700/
“Ground Computing” (@jhuggins)
- local to the user, more like desktop web than like Gears
- local HTTP server
- browser apps
- same application on the client and server or the cloud
http://www.flickr.com/photos/hercwad/2290378571/
Latency Sucks
- speed of light
- drawback to cloud computing
Stuart Langridge - Canonical
Conflicts
Conflict resolution by example
[Diagram sequence: two nodes, A and B, with document revisions shown as symbols]
Robust Storage
Append-Only File Structure
Designed to Crash
Instant-On
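A toy illustration of why an append-only file tolerates crashes: existing data is never overwritten, so a crash can at worst leave a torn record at the end, which is ignored on the next start. `AppendOnlyStore` is an invented class that replays a log of JSON lines; it is not CouchDB's actual B-tree file format.

```ruby
require "json"
require "tmpdir"

# Toy append-only store: every write appends a line; nothing is overwritten.
class AppendOnlyStore
  def initialize(path)
    @path = path
  end

  def put(id, doc)
    File.open(@path, "a") { |f| f.puts(JSON.generate({ "id" => id, "doc" => doc })) }
  end

  # Replay the log; later records win. A torn final record (as left by a
  # crash mid-write) fails to parse and is skipped.
  def load
    return {} unless File.exist?(@path)
    File.foreach(@path).each_with_object({}) do |line, docs|
      begin
        record = JSON.parse(line)
        docs[record["id"]] = record["doc"]
      rescue JSON::ParserError
        next
      end
    end
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "toy.log")
  store = AppendOnlyStore.new(path)
  store.put("a", { "n" => 1 })
  store.put("a", { "n" => 2 })
  File.open(path, "a") { |f| f.write('{"id":"a","doc":{"n"') }  # simulated crash
  puts AppendOnlyStore.new(path).load.inspect
end
```

After the simulated crash, reopening the file still yields the last fully written version of the document; no repair step is needed.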
Robust
- when Britain is burning - Enda Farrell - BBC
Thanks!
www.jasondavies.com
@jasondavies