Three things you need to know about data modelling Matthew Revell | Developer Advocate, Couchbase

Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014


DESCRIPTION

We’re all familiar with modeling data the relational way. When we move to a document database we need to think about things a little differently. In this talk we’ll look at how best to plan, model and maintain your data using a document database. By diving into real-world case studies of Couchbase users, we’ll look at the three main things you need to know about modeling your data in a document database: document design, key design and querying.


Page 1: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Three things you need to know about data modelling

Matthew Revell | Developer Advocate, Couchbase

Page 2: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

This is a talk about answering questions using Couchbase

Page 3: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

This is data: Sumerian clay tablet

©2014 Couchbase, Inc. 4

Page 4: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

This is data: Sumerian clay tablet


Page 5: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

This is data: Jacquard loom


Page 6: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

The first commercial database (1960)


Page 7: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

EF Codd proposes the relational model (1970)


Page 8: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

The three things

Questions, not answers

Soft schema, not no schema

The key is the key


Page 9: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

First, some Couchbase basics

Page 10: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

The basics of Couchbase Server

Key-value store with:

  special support for JSON documents

  counter and string data types

  store binaries up to 20MB

Built-in and transparent memcached-compatible caching layer

Distributed around a cluster of servers

Generate secondary indexes using map/reduce queries

Page 11: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Questions, not answers

Page 12: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Relational databases are optimised for questions


Page 13: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Simple ecommerce example


Page 14: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Document databases are optimised for answers


Page 15: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

That order in a heavily denormalised document database


Page 16: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Answer oriented databases


http://martinfowler.com/bliki/AggregateOrientedDatabase.html

Page 17: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Why optimise for a certain set of questions?

Storing together the data that we access together is efficient

SQL queries are slow because disk seeks are slow

Data aggregates are easy to distribute

Page 18: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

What questions do we ask?

Page 19: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Choosing the questions


Page 20: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Embed, or refer?

How much should we denormalise?

Page 21: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Denormalisation

You could think that denormalisation is a credo of NoSQL.

In the real world, we denormalise all the time in Couchbase.

We have to decide when to embed data (i.e. denormalise) and when to refer to data.

Page 22: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

When to embed

You should embed data when:

You need speed of access (less of a concern with Couchbase)

Reads outnumber writes

You are comfortable with the slim risk of two denormalised occurrences of the same data losing sync

Page 23: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

When to refer

You should refer to data when:

Query flexibility is important

Consistency is a priority

The data has large growth potential

Page 24: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Embedded v referred

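
The embed-or-refer trade-off can be sketched in a few lines of Java. This is only an illustration, assuming a plain in-memory map as a stand-in for a bucket; the class name `EmbedOrRefer`, the field names `customer` and `customerKey`, and the city "Manchester" are hypothetical and do not come from the slides or from the Couchbase SDK. It shows an embedded copy going stale after an update, while a reference costs a second lookup but stays consistent.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class EmbedOrRefer {
    // Hypothetical in-memory stand-in for a bucket: key -> JSON-like document.
    static final Map<String, Map<String, Object>> bucket = new HashMap<>();

    // Stores (or overwrites) a customer document under the given key.
    static void putCustomer(String key, String name, String city) {
        Map<String, Object> customer = new HashMap<>();
        customer.put("name", name);
        customer.put("city", city);
        bucket.put(key, customer);
    }

    // Embedded: the order carries its own copy of the customer data.
    static Map<String, Object> embeddedOrder(String customerKey) {
        Map<String, Object> order = new HashMap<>();
        order.put("items", List.of("T-shirt"));
        order.put("customer", new HashMap<>(bucket.get(customerKey)));
        return order;
    }

    // Referred: the order stores only the customer's key.
    static Map<String, Object> referredOrder(String customerKey) {
        Map<String, Object> order = new HashMap<>();
        order.put("items", List.of("T-shirt"));
        order.put("customerKey", customerKey);
        return order;
    }

    // An embedded order answers from its own copy; a referred order needs a second get.
    static String cityFor(Map<String, Object> order) {
        Object embedded = order.get("customer");
        if (embedded instanceof Map) {
            return (String) ((Map<?, ?>) embedded).get("city");
        }
        return (String) bucket.get((String) order.get("customerKey")).get("city");
    }

    public static void main(String[] args) {
        putCustomer("u::1001", "Matthew Revell", "London");
        Map<String, Object> embedded = embeddedOrder("u::1001");
        Map<String, Object> referred = referredOrder("u::1001");
        putCustomer("u::1001", "Matthew Revell", "Manchester"); // the customer moves
        System.out.println(cityFor(embedded)); // answered from the embedded copy
        System.out.println(cityFor(referred)); // answered from the current customer document
    }
}
```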

Page 25: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Why is that okay?

I thought we were here to denormalise!

Page 26: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

It’s okay to refer

Couchbase is the database you can use without fear!

Denormalisation is useful

But referring to other documents is very cheap in Couchbase

Document GETs take microseconds from the cache

Page 27: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Soft schema, not no schema

Page 28: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Schema unenforced

Usually, there’s still a schema when we use Couchbase.

The difference is:

Couchbase doesn’t enforce the schema

If schema matters, you can enforce it on the application side

Schema can vary completely from document to document

Migrations are cheap and asynchronous

Impedance mismatch is yesterday’s problem

It’s still okay to store unstructured data
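
Enforcing a schema on the application side could look something like the sketch below. It is a minimal illustration, not a Couchbase feature: the class `UserSchemaCheck` and the choice of `name` and `email` as required string fields are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class UserSchemaCheck {
    // Fields this hypothetical application insists on; the choice is an assumption.
    static final List<String> REQUIRED_STRINGS = List.of("name", "email");

    // Returns the required fields that are missing or are not strings.
    static List<String> problems(Map<String, ?> doc) {
        List<String> bad = new ArrayList<>();
        for (String field : REQUIRED_STRINGS) {
            if (!(doc.get(field) instanceof String)) {
                bad.add(field);
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of(
                "name", "Matthew Revell",
                "orders", List.of(1, 9, 698, 32));
        // The database would accept this document; the application decides whether to.
        System.out.println(problems(doc));
    }
}
```

Because the check lives in the application, documents that fail it can still be stored, flagged or repaired later, which is what keeps the schema "soft".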

Page 29: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Modelling a user

Relational

Simple types are easy: there’s a column for that

Complex data types are harder: joins, or serialise to text and break normalisation

import java.util.List;

// Simple types only
public class User {
    private String name;
    private String email;
    private String streetAddress;
    private String city;
    private String country;
    private String postCode;
    private String telephone;
}

// With a complex type: the list of orders
public class User {
    private String name;
    private String email;
    private String streetAddress;
    private String city;
    private String country;
    private String postCode;
    private String telephone;
    private List<Integer> orders;
}

Page 30: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Modelling a user with Couchbase

Couchbase

Simple types are easy: it’s just JSON

Complex data types are easy: it’s just JSON

import java.util.List;

public class User {
    private String name;
    private String email;
    private String streetAddress;
    private String city;
    private String country;
    private String postCode;
    private String telephone;
    private List<Integer> orders;
}

{
  "name": "Matthew Revell",
  "email": "[email protected]",
  "address": "11-21 Paul Street",
  "city": "London",
  "postCode": "EC2A 4JU",
  "telephone": "44-20-3837-9130",
  "orders": [ 1, 9, 698, 32 ]
}

Page 31: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Schema changes

Update in place

No need to lock a dataset to ALTER

Change each document as needed

import java.util.List;

public class User {
    private String name;
    private String email;
    private String streetAddress;
    private String city;
    private String country;
    private String postCode;
    private String telephone;
    private List<Integer> orders;
}

{
  "name": "Matthew Revell",
  "email": "[email protected]",
  "address": "11-21 Paul Street",
  "city": "London",
  "postCode": "EC2A 4JU",
  "telephone": "44-20-3837-9130",
  "orders": [ 1, 9, 698, 32 ],
  "schemaVersion": 1
}

Page 32: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Schema changes

Update in place

No need to lock a dataset to ALTER

Change each document as needed

import java.util.List;

public class User {
    private String name;
    private String email;
    private String streetAddress;
    private String city;
    private String country;
    private String postCode;
    private String telephone;
    private List<Integer> orders;
    private List<Integer> productsViewed;
}

{
  "name": "Matthew Revell",
  "email": "[email protected]",
  "address": "11-21 Paul Street",
  "city": "London",
  "postCode": "EC2A 4JU",
  "telephone": "44-20-3837-9130",
  "orders": [ 1, 9, 698, 32 ],
  "productsViewed": [8, 33, 99, 100],
  "schemaVersion": 2
}
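
One way to "change each document as needed" is to upgrade a document when the application next reads it. The sketch below assumes the `schemaVersion` field from the slide, treats a document with no `schemaVersion` as version 1, and starts `productsViewed` as an empty list; those last two choices, and the class name `LazyMigration`, are assumptions for illustration rather than anything the talk prescribes.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Map;

class LazyMigration {
    static final int CURRENT_VERSION = 2;

    // Returns an upgraded copy of the document; the input map is left untouched.
    // Assumption: a document with no schemaVersion field is treated as version 1.
    static Map<String, Object> upgrade(Map<String, Object> doc) {
        Map<String, Object> upgraded = new HashMap<>(doc);
        Object stored = upgraded.get("schemaVersion");
        int version = (stored instanceof Integer) ? (Integer) stored : 1;
        if (version < CURRENT_VERSION) {
            // Version 2 adds productsViewed; start with an empty list.
            upgraded.putIfAbsent("productsViewed", new ArrayList<Integer>());
            upgraded.put("schemaVersion", CURRENT_VERSION);
        }
        return upgraded;
    }

    public static void main(String[] args) {
        Map<String, Object> v1 = new HashMap<>();
        v1.put("name", "Matthew Revell");
        v1.put("schemaVersion", 1);
        System.out.println(upgrade(v1));
    }
}
```

The application would write the upgraded copy back under the same key, so documents migrate gradually as they are touched and no dataset-wide lock is needed.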

Page 33: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

The key is the key

Page 34: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Three ways to build a key

Key design is as important as document design.

There are three broad types of key:

Human readable/deterministic: e.g. an email address

Computer generated/random: e.g. UUID

Compound: e.g. UUID with a deterministic portion

Page 35: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Human readable/deterministic

import java.util.List;

public class User {
    private String name;
    private String email;
    private String streetAddress;
    private String city;
    private String country;
    private String postCode;
    private String telephone;
    private List<Integer> orders;
    private List<Integer> productsViewed;
}

{
  "name": "Matthew Revell",
  "address": "11-21 Paul Street",
  "city": "London",
  "postCode": "EC2A 4JU",
  "telephone": "44-20-3837-9130",
  "orders": [ 1, 9, 698, 32 ],
  "productsViewed": [8, 33, 99, 100]
}

Key: [email protected]

Page 36: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Random/computer generated

{
  "name": "Matthew Revell",
  "email": "[email protected]",
  "address": "11-21 Paul Street",
  "city": "London",
  "postCode": "EC2A 4JU",
  "telephone": "44-20-3837-9130",
  "orders": [ 1, 9, 698, 32 ],
  "productsViewed": [8, 33, 99, 100]
}

Key: 1001

Page 37: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Using counters to generate keys

Creating a new user

Application:

    user_id = incr("counter_key")
    add(user_id, data)
    add(email_address, user_id)

Finding user by email address

Application:

    key = get("[email protected]")
    get(key)
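
The counter pattern above can be tried out against a small in-memory stand-in for a key-value store. This sketch is not the Couchbase SDK: `CounterKeys`, its `incr`, `add` and `get` methods, and the address `user@example.com` are hypothetical, and it assumes that the counter starts at 1 and that `add` refuses to overwrite an existing key.

```java
import java.util.HashMap;
import java.util.Map;

class CounterKeys {
    // In-memory stand-in for a bucket; a real store would make incr and add atomic.
    private final Map<String, Object> store = new HashMap<>();

    // Increments the counter under the given key and returns the new value.
    long incr(String key) {
        long next = ((Long) store.getOrDefault(key, 0L)) + 1;
        store.put(key, next);
        return next;
    }

    // Stores a value only if the key is not already taken.
    void add(String key, Object value) {
        if (store.putIfAbsent(key, value) != null) {
            throw new IllegalStateException("key already exists: " + key);
        }
    }

    Object get(String key) {
        return store.get(key);
    }

    // Creating a new user: one counter increment, one user document, one look-up document.
    String createUser(String emailAddress, Map<String, Object> data) {
        String userId = String.valueOf(incr("counter_key"));
        add(userId, data);
        add(emailAddress, userId);
        return userId;
    }

    // Finding a user by email address: get the look-up document, then the user document.
    Object findByEmail(String emailAddress) {
        Object key = get(emailAddress);
        return key == null ? null : get((String) key);
    }

    public static void main(String[] args) {
        CounterKeys kv = new CounterKeys();
        Map<String, Object> data = new HashMap<>();
        data.put("name", "Matthew Revell");
        String userId = kv.createUser("user@example.com", data);
        System.out.println(userId + " -> " + kv.findByEmail("user@example.com"));
    }
}
```

Both operations are plain key lookups, so finding a user by email address costs two gets and needs no secondary index.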

Page 38: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Multiple look-up documents


u::count

1001

u::1001

{
  "name": "Matthew Revell",
  "facebook_id": 16172910,
  "email": "[email protected]",
  "password": "ab02d#Jf02K",
  "created_at": "5/1/2012 2:30am",
  "facebook_access_token": "xox0v2dje20",
  "twitter_access_token": "20jffieieaaixixj"
}

fb::16172910

1001

nflx::2939202

1001

twtr::2920283830

1001

em::[email protected]

1001

em::[email protected]

1001

uname::mrevell

1001

Page 39: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Compound keys

Compound keys are look-up documents with a predictable name.

It’s a continuation of the embedded versus referred data discussion.

Page 40: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Compound keys: example

u::1001

{

"name": "Matthew Revell",

"email": "[email protected]",

"address": "11-21 Paul Street",

"city": "London",

"postCode": "EC2A 4JU",

"telephone": "44-20-3837-9130",

"orders": [ 1, 9, 698, 32 ],

"productsViewed": [8, 33, 99, 100]

}

Page 41: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Compound keys: example

u::1001

{

"name": "Matthew Revell",

"email": "[email protected]",

"address": "11-21 Paul Street",

"city": "London",

"postCode": "EC2A 4JU",

"telephone": "44-20-3837-9130",

"orders": [ 1, 9, 698, 32 ]

}

u::1001::productsviewed

{"productsList": [8, 33, 99, 100]}

Page 42: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Compound keys: example

u::1001

{

"name": "Matthew Revell",

"email": "[email protected]",

"address": "11-21 Paul Street",

"city": "London",

"postCode": "EC2A 4JU",

"telephone": "44-20-3837-9130",

"orders": [ 1, 9, 698, 32 ]

}

u::1001::productsviewed

{"productsList": [8, 33, 99, 100]}

p::8

{

"id": 1,
"name": "T-shirt",
"description": "Red Couchbase shirt",
"quantityInStock": 99,
"image": "tshirt.jpg"

}

Page 43: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Compound keys: example

u::1001

{

"name": "Matthew Revell",

"email": "[email protected]",

"address": "11-21 Paul Street",

"city": "London",

"postCode": "EC2A 4JU",

"telephone": "44-20-3837-9130",

"orders": [ 1, 9, 698, 32 ]

}

u::1001::productsviewed

{"productsList": [8, 33, 99, 100]}

p::8

{

"id": 1,
"name": "T-shirt",
"description": "Red Couchbase shirt",
"quantityInStock": 99

}

p::8::img

"http://someurl.com/tshirt.jpg"
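
Because compound keys follow a predictable pattern, the application can build them from an id without any look-up. A minimal sketch, assuming the `u::`, `p::`, `::productsviewed` and `::img` naming from the examples above; the class `CompoundKeys` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

class CompoundKeys {
    static String userKey(long userId) {
        return "u::" + userId;
    }

    static String productsViewedKey(long userId) {
        return userKey(userId) + "::productsviewed";
    }

    static String productKey(long productId) {
        return "p::" + productId;
    }

    static String productImageKey(long productId) {
        return productKey(productId) + "::img";
    }

    // Given the productsList stored under u::<id>::productsviewed, derive every image key.
    static List<String> imageKeys(List<Integer> productsList) {
        List<String> keys = new ArrayList<>();
        for (int productId : productsList) {
            keys.add(productImageKey(productId));
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println(productsViewedKey(1001));
        System.out.println(imageKeys(List.of(8, 33, 99, 100)));
    }
}
```

Knowing only the user id, the application can fetch the products-viewed document and then each product image by key alone.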

Page 44: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

What about automatic indexes?

Couchbase views and N1QL are amazing.

You should use them where:

You discover new query patterns.

You have short-lived query types.

You need ad-hoc querying.

However: manual indexes should be your first port of call.

Page 45: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Questions!

Page 46: Three Things to Know about Modeling Data for Document Databases: Couchbase Connect 2014

Thanks!

Matthew Revell

[email protected]

@matthewrevell