We’re all familiar with modeling data the relational way. When we move to a document database we need to think about things a little differently. In this talk we’ll look at how best to plan, model and maintain your data using a document database. By diving into real-world case studies of Couchbase users, we’ll look at the three main things you need to know about modeling your data in a document database: document design, key design and querying.
Three things you need to know about data modelling
Matthew Revell | Developer Advocate, Couchbase
This is a talk about answering questions using Couchbase
This is data: Sumerian clay tablet
©2014 Couchbase, Inc. 4
This is data: Jacquard loom
The first commercial database (1960)
EF Codd proposes the relational model (1970)
The three things
Questions, not answers
Soft schema, not no schema
The key is the key
First, some Couchbase basics
Key-value store with:
special support for JSON documents
counter and string data types
storage of binaries up to 20MB
Built-in and transparent memcached-compatible caching layer
Distributed around a cluster of servers
Generate secondary indexes using map/reduce queries
The basics of Couchbase Server
Questions, not answers
Relational databases are optimised for questions
Simple ecommerce example
Document databases are optimised for answers
That order in a heavily denormalised document database
Answer oriented databases
http://martinfowler.com/bliki/AggregateOrientedDatabase.html
Storing together the data that we access together is efficient
SQL queries are slow because disk seeks are slow
Data aggregates are easy to distribute
Why optimise for a certain set of questions?
What questions do we ask?
Choosing the questions
Embed, or refer?
How much should we denormalise?
You could think that denormalisation is a credo of NoSQL.
In the real world, we denormalise all the time in Couchbase.
We have to decide when to embed data (i.e. denormalise) and when to refer to data.
Denormalisation
You should embed data when:
You need speed of access (less of a concern with Couchbase)
Reads outnumber writes
You are comfortable with the slim risk of two denormalised occurrences of the same data losing sync
When to embed
You should refer to data when:
Query flexibility is important
Consistency is a priority
The data has large growth potential
When to refer
Embedded v referred
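As a rough illustration of the trade-off, the sketch below stores the same order twice: once with the customer embedded and once with a reference to a separate customer document. The in-memory map only stands in for a Couchbase bucket, and the class name, keys and field names are all hypothetical rather than taken from the talk. It assumes Java 9 or later for Map.of.

```java
import java.util.HashMap;
import java.util.Map;

// A minimal sketch: a HashMap stands in for a bucket of JSON documents.
// Keys and field names are invented for illustration only.
class EmbedVsRefer {
    static final Map<String, Map<String, Object>> bucket = new HashMap<>();

    static void load() {
        bucket.clear();
        // Customer document, stored once.
        bucket.put("customer::1", Map.of("name", "Matthew Revell", "city", "London"));

        // Embedded: the order carries its own copy of the customer data.
        bucket.put("order::embedded", Map.of(
                "total", 25,
                "customer", Map.of("name", "Matthew Revell", "city", "London")));

        // Referred: the order stores only the customer's key.
        bucket.put("order::referred", Map.of(
                "total", 25,
                "customerKey", "customer::1"));
    }

    // One GET: everything needed is inside the order document.
    @SuppressWarnings("unchecked")
    static String customerNameEmbedded() {
        Map<String, Object> order = bucket.get("order::embedded");
        Map<String, Object> customer = (Map<String, Object>) order.get("customer");
        return (String) customer.get("name");
    }

    // Two GETs: fetch the order, then follow the key to the customer.
    static String customerNameReferred() {
        Map<String, Object> order = bucket.get("order::referred");
        Map<String, Object> customer = bucket.get((String) order.get("customerKey"));
        return (String) customer.get("name");
    }
}
```

The embedded form answers the question in a single read but leaves two copies of the customer data to keep in sync. The referred form costs an extra read and keeps one copy.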
Why is that okay?
I thought we were here to denormalise!
Couchbase is the database you can use without fear!
Denormalisation is useful
But referring to other documents is very cheap in Couchbase
Document GETs take microseconds from the cache
It’s okay to refer
Soft schema, not no schema
Usually, there’s still a schema when we use Couchbase.
The difference is:
Couchbase doesn’t enforce the schema
If schema matters, you can enforce it at the application side
Schema can vary completely from document to document
Migrations are cheap and asynchronous
Impedance mismatch is yesterday’s problem
It’s still okay to store unstructured data
Schema unenforced
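Enforcing the schema on the application side could look something like the sketch below, which checks a document before it is written. The list of required fields and the UserSchema class are assumptions made for illustration; the talk does not prescribe a particular validation approach. It assumes Java 9 or later for List.of.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// A minimal sketch of enforcing a "soft schema" in the application,
// before a document is written. The required fields are hypothetical.
class UserSchema {
    static final List<String> REQUIRED = List.of("name", "email");

    // Returns the names of required fields that are missing or not strings.
    static List<String> violations(Map<String, Object> doc) {
        List<String> problems = new ArrayList<>();
        for (String field : REQUIRED) {
            if (!(doc.get(field) instanceof String)) {
                problems.add(field);
            }
        }
        return problems;
    }

    static boolean isValid(Map<String, Object> doc) {
        return violations(doc).isEmpty();
    }
}
```

Documents that fail the check can be rejected or repaired by the application, while the database itself keeps accepting any shape of document.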
import java.util.List;

// Simple types only
public class User {
    private String name;
    private String email;
    private String streetAddress;
    private String city;
    private String country;
    private String postCode;
    private String telephone;
}

// With a complex type: a list of order IDs
public class User {
    private String name;
    private String email;
    private String streetAddress;
    private String city;
    private String country;
    private String postCode;
    private String telephone;
    private List<Integer> orders;
}
Modelling a user
Relational
Simple types are easy: there’s a column for that
Complex data types are harder: joins, or serialise to text and break normalisation
Modelling a user with Couchbase
Couchbase
Simple types are easy: it’s just JSON
Complex data types are easy: it’s just JSON
{
"name": "Matthew Revell",
"email": "[email protected]",
"address": "11-21 Paul Street",
"city": "London",
"postCode": "EC2A 4JU",
"telephone": "44-20-3837-9130",
"orders": [ 1, 9, 698, 32 ]
}
Schema changes
Update in place
No need to lock a dataset to ALTER
Change each document as needed
{
"name": "Matthew Revell",
"email": "[email protected]",
"address": "11-21 Paul Street",
"city": "London",
"postCode": "EC2A 4JU",
"telephone": "44-20-3837-9130",
"orders": [ 1, 9, 698, 32 ],
"schemaVersion": 1
}
import java.util.List;

public class User {
    private String name;
    private String email;
    private String streetAddress;
    private String city;
    private String country;
    private String postCode;
    private String telephone;
    private List<Integer> orders;
    private List<Integer> productsViewed;
}
Schema changes
Update in place
No need to lock a dataset to ALTER
Change each document as needed
{
"name": "Matthew Revell",
"email": "[email protected]",
"address": "11-21 Paul Street",
"city": "London",
"postCode": "EC2A 4JU",
"telephone": "44-20-3837-9130",
"orders": [ 1, 9, 698, 32 ],
"productsViewed": [8, 33, 99, 100],
"schemaVersion": 2
}
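Changing each document as needed can be done lazily, for instance when a document is next read. The sketch below is one possible way to do that using the schemaVersion field shown above. The upgrade rule (give version 1 documents an empty productsViewed list) and the LazyMigration class are assumptions for illustration, not something the talk specifies.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

// A minimal sketch of upgrading documents lazily, one at a time,
// using the schemaVersion field from the slides. The upgrade rule
// (add an empty productsViewed list) is an assumption for illustration.
class LazyMigration {
    static final int CURRENT_VERSION = 2;

    // Returns an upgraded copy of the document; current documents pass through.
    static Map<String, Object> upgrade(Map<String, Object> doc) {
        Map<String, Object> result = new HashMap<>(doc);
        // Documents written before versioning are treated as version 1.
        int version = (Integer) result.getOrDefault("schemaVersion", 1);
        if (version < CURRENT_VERSION) {
            // Version 2 introduced productsViewed.
            result.putIfAbsent("productsViewed", new ArrayList<Integer>());
            version = CURRENT_VERSION;
        }
        result.put("schemaVersion", version);
        return result;
    }
}
```

Because each document carries its own version, old and new documents can sit side by side and be upgraded whenever the application next touches them.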
The key is the key
Key design is as important as document design.
There are three broad types of key:
Human readable/deterministic: e.g. an email address
Computer generated/random: e.g. UUID
Compound: e.g. UUID with a deterministic portion
Three ways to build a key
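The three kinds of key might be built along these lines. This is only a sketch: the method names are hypothetical, the "u::" prefix is borrowed from the later compound key slides, and lower-casing the email address assumes the application treats addresses as case-insensitive.

```java
import java.util.Locale;
import java.util.UUID;

// A minimal sketch of the three kinds of key. Method names are hypothetical.
class Keys {
    // Human readable/deterministic: derived from data we already know.
    // Assumes the application treats email addresses as case-insensitive.
    static String deterministic(String email) {
        return email.trim().toLowerCase(Locale.ROOT);
    }

    // Computer generated/random: unique, but only findable via a look-up.
    static String random() {
        return UUID.randomUUID().toString();
    }

    // Compound: a generated ID plus a predictable, deterministic portion.
    static String compound(String id, String suffix) {
        return "u::" + id + "::" + suffix;
    }
}
```

A deterministic key can be rebuilt from what the user gives us, a random key cannot, and a compound key lets us reach related documents once we know the ID.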
Human readable/deterministic
{
"name": "Matthew Revell",
"address": "11-21 Paul Street",
"city": "London",
"postCode": "EC2A 4JU",
"telephone": "44-20-3837-9130",
"orders": [ 1, 9, 698, 32 ],
"productsViewed": [8, 33, 99, 100]
}
Key: [email protected]
Random/computer generated
{
"name": "Matthew Revell",
"email": "[email protected]",
"address": "11-21 Paul Street",
"city": "London",
"postCode": "EC2A 4JU",
"telephone": "44-20-3837-9130",
"orders": [ 1, 9, 698, 32 ],
"productsViewed": [8, 33, 99, 100]
}
Key: 1001
Using counters to generate keys
Creating a new user
Application:
user_id = incr("counter_key")
add(user_id, data)
add(email_address, user_id)
Finding user by email address
Application:
key = get("[email protected]")
get(key)
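The incr/add/get flow can be sketched in plain Java as follows. This is an in-memory stand-in, not the Couchbase SDK: the UserStore class, its method names and the starting counter value of 1000 are assumptions chosen so that the first user receives ID 1001, as on the slides.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

// A minimal in-memory sketch of the incr/add/get flow.
// It is not the Couchbase SDK: the map and counter stand in for the bucket.
class UserStore {
    private final Map<String, Object> bucket = new HashMap<>();
    private final AtomicLong counter = new AtomicLong(1000);

    // add(): store only if the key is not already present.
    private boolean add(String key, Object value) {
        return bucket.putIfAbsent(key, value) == null;
    }

    // Creating a new user: take the next counter value, store the user
    // document under it, then store a look-up document keyed by email.
    long createUser(String email, Map<String, Object> data) {
        if (bucket.containsKey(email)) {
            throw new IllegalStateException("email already registered: " + email);
        }
        long userId = counter.incrementAndGet();
        add(String.valueOf(userId), data);
        add(email, userId);
        return userId;
    }

    // Finding a user by email: the first GET returns the ID,
    // the second GET returns the user document.
    @SuppressWarnings("unchecked")
    Map<String, Object> findByEmail(String email) {
        Object userId = bucket.get(email);
        if (userId == null) {
            return null;
        }
        return (Map<String, Object>) bucket.get(String.valueOf(userId));
    }
}
```

Both operations use only key look-ups, so no secondary index is needed to find a user by email address.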
Multiple look-up documents
u::count
1001
u::1001
{ "name": "Matthew Revell",
"facebook_id": 16172910,
"email": "[email protected]",
"password": "ab02d#Jf02K",
"created_at": "5/1/2012 2:30am",
"facebook_access_token": "xox0v2dje20",
"twitter_access_token": "20jffieieaaixixj" }
fb::16172910
1001
nflx::2939202
1001
twtr::2920283830
1001
1001
1001
uname::mrevell
1001
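The look-up documents above are small documents whose whole value is the user ID. A minimal sketch of maintaining and resolving them might look like this. The prefixes (fb, twtr, uname, u) come from the slide, while the LookupDocs class, its methods and the in-memory map are hypothetical stand-ins for real bucket operations.

```java
import java.util.HashMap;
import java.util.Map;

// A minimal sketch of look-up documents: several small documents whose
// value is just the user ID. The map stands in for a bucket.
class LookupDocs {
    private final Map<String, String> bucket = new HashMap<>();

    // Store one look-up document per identity, each pointing at the same user.
    void link(String userId, String prefix, String externalId) {
        bucket.put(prefix + "::" + externalId, userId);
    }

    // Resolve an external identity to the key of the user document,
    // or null when no look-up document exists.
    String userKeyFor(String prefix, String externalId) {
        String userId = bucket.get(prefix + "::" + externalId);
        return userId == null ? null : "u::" + userId;
    }
}
```

Whichever identity the user arrives with, one GET yields the ID and a second GET on the resulting u:: key yields the user document.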
Compound keys
Compound keys are look-up documents with a predictable name.
It’s a continuation of the embedded versus referred data discussion.
Compound keys: example
u::1001
{
"name": "Matthew Revell",
"email": "[email protected]",
"address": "11-21 Paul Street",
"city": "London",
"postCode": "EC2A 4JU",
"telephone": "44-20-3837-9130",
"orders": [ 1, 9, 698, 32 ],
"productsViewed": [8, 33, 99, 100]
}
Compound keys: example
u::1001
{
"name": "Matthew Revell",
"email": "[email protected]",
"address": "11-21 Paul Street",
"city": "London",
"postCode": "EC2A 4JU",
"telephone": "44-20-3837-9130",
"orders": [ 1, 9, 698, 32 ]
}
u::1001::productsviewed
{"productsList": [
8, 33, 99, 100]
}
Compound keys: example
p::8
{
"id": 1,
"name": "T-shirt",
"description": "Red Couchbase shirt",
"quantityInStock": 99,
"image": "tshirt.jpg"
}
Compound keys: example
p::8
{
"id": 1,
"name": "T-shirt",
"description": "Red Couchbase shirt",
"quantityInStock": 99
}
p::8::img
"http://someurl.com/tshirt.jpg"
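Because compound keys are predictable, related documents can be fetched by key alone, without a query. The sketch below walks from a user ID to the image URLs of the products that user has viewed. It is a simplified illustration: the CompoundKeys class is hypothetical, the map stands in for a bucket, and the productsviewed document is stored as a bare list rather than the {"productsList": [...]} wrapper shown above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A minimal sketch of following compound keys. The map stands in for a
// bucket, and document values are simplified for illustration.
class CompoundKeys {
    static final Map<String, Object> bucket = new HashMap<>();

    static void load() {
        bucket.clear();
        bucket.put("u::1001::productsviewed", List.of(8, 33, 99, 100));
        bucket.put("p::8", "T-shirt");
        bucket.put("p::8::img", "http://someurl.com/tshirt.jpg");
    }

    // Image URLs for every product a user has viewed, skipping any
    // product that has no image document.
    @SuppressWarnings("unchecked")
    static List<String> viewedProductImages(String userId) {
        List<String> images = new ArrayList<>();
        Object viewed = bucket.get("u::" + userId + "::productsviewed");
        if (viewed == null) {
            return images;
        }
        for (Integer productId : (List<Integer>) viewed) {
            Object url = bucket.get("p::" + productId + "::img");
            if (url != null) {
                images.add((String) url);
            }
        }
        return images;
    }
}
```

Every step is a GET on a key the application can construct itself, which is the same embedded versus referred trade-off expressed through key names.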
Couchbase views and N1QL are amazing.
You should use them where:
You discover new query patterns.
You have short-lived query types.
You need ad-hoc querying.
However: manual indexes should be your first port of call.
What about automatic indexes?
Questions!