Chris Lea - What does NoSQL Mean for You

From FOWA Dublin 2010. Video: http://www.ustream.tv/myvideos/1/6906682

What Does NoSQL Mean for You?

Chris Lea, (mt) Media Temple

FOWA Dublin 2010

For Starters: What does it mean at all?

“NoSQL is a blanket term used to describe structured storage that doesn’t rely on SQL to be accessed in a useful way”.

-- Chris Lea

“NoSQL” DOES NOT mean “SQL is Bad”

MySQL does what I need, why should I care?

“If I’d asked my customers what they wanted, they’d have said a faster horse.” -- Henry Ford

RDBMS
• Designed for generic workloads
• Large (and growing) feature sets

NoSQL
• Designed to solve specific problems
• Trades features for performance

(the NoSQL umbrella)

Key / Value Caches
• Redis
• Memcached

Key / Value Stores
• Tokyo Cabinet
• MemcacheDB
• Project Voldemort
• Cassandra

Tabular
• HBase
• Hypertable

Document
• CouchDB
• MongoDB
• Jackrabbit

Should I Be Thinking about NoSQL?

Do you need transactions?
• No: think about NoSQL.
• Yes: can you sanely do what you need in the application?
  • Yes: think about NoSQL.
  • No: you probably need an RDBMS.

NoSQL Systems Typically Don’t Do Transactions or Joins

• If you really need transactions, stick with an RDBMS
• Not having joins turns out not to be such a big deal
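Without server-side joins, the application does the stitching itself, usually with one extra lookup. A minimal sketch, assuming plain Python lists of dicts stand in for two collections; `users`, `posts`, their fields, and `posts_by_author` are hypothetical names, not any database's API:

```python
# Hypothetical in-memory stand-ins for two collections; no database involved.
users = [
    {"_id": 1, "name": "Lorraine"},
    {"_id": 2, "name": "Chris"},
]
posts = [
    {"_id": 10, "user_id": 1, "title": "First post"},
    {"_id": 11, "user_id": 2, "title": "Talk notes"},
    {"_id": 12, "user_id": 1, "title": "Second post"},
]


def posts_by_author(name):
    """Application-side join: resolve the user, then fetch that user's posts."""
    # Step 1: turn the author's name into an id (one query in a real store).
    user = next((u for u in users if u["name"] == name), None)
    if user is None:
        return []
    # Step 2: fetch posts by that id (a second query), instead of a SQL JOIN.
    return [p for p in posts if p["user_id"] == user["_id"]]


print([p["title"] for p in posts_by_author("Lorraine")])  # ['First post', 'Second post']
```

The price is a second round trip, which is one reason the talk pairs "no joins" with rethinking the data model so that most reads need only one lookup.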

MongoDB is an excellent example

Why MongoDB?

• Comfortable if you are coming from MySQL
• Written in C++, so it runs as native machine code
  • No Erlang / Java / virtual machines
• Tools like mongo (shell), mongodump, mongostat, mongoimport
• Native drivers in the languages you care about
  • No Thrift / REST / code-generation steps

• No complex transactions
  • If you don’t use them, this is a non-issue
• No joins
  • This generally turns out not to be a big deal, because we’re going to rethink our data modeling

Transactions and joins are a huge computational overhead, even if you don’t use them!

Thinking About Your Data (RDBMS)

• Look at the data and determine logical groupings (and hope the structure never changes)
• Make tables based on the groups, linked with ID fields
• Break up data on insert and put it into the appropriate tables
• Use joins on select to reassemble the data
• Create indexes as needed for fast queries

user_t: user_id, user_name
post_t: post_id, user_id, post_title, post_body
comment_t: comment_id, post_id, comment_body

This leads to queries such as:

SELECT post_title, post_body, post_id FROM post_t, user_t WHERE user_t.user_name = 'Lorraine' AND post_t.user_id = user_t.user_id LIMIT 1;

SELECT comment_body FROM comment_t WHERE comment_t.post_id = $post_id;
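To see the break-up-on-insert, join-on-select cycle end to end, here is a sketch that uses Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption; the sample rows are made up). It creates the three tables above, splits one post across them, and runs both queries with `?` placeholders in place of pasting `$post_id` into the SQL string. `ORDER BY comment_id` is added so the comment order is deterministic:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# The three tables from the schema above, linked by ID fields.
cur.executescript("""
CREATE TABLE user_t (user_id INTEGER PRIMARY KEY, user_name TEXT);
CREATE TABLE post_t (post_id INTEGER PRIMARY KEY, user_id INTEGER,
                     post_title TEXT, post_body TEXT);
CREATE TABLE comment_t (comment_id INTEGER PRIMARY KEY, post_id INTEGER,
                        comment_body TEXT);
""")

# Break the data up on insert: one logical post becomes rows in three tables.
cur.execute("INSERT INTO user_t (user_id, user_name) VALUES (1, 'Lorraine')")
cur.execute(
    "INSERT INTO post_t (post_id, user_id, post_title, post_body) VALUES (?, ?, ?, ?)",
    (1, 1, "Who on Earth lets Chris Lea Talk on Stage?", "Seriously. That's just not cool."),
)
cur.executemany(
    "INSERT INTO comment_t (post_id, comment_body) VALUES (?, ?)",
    [(1, "Is he really that bad?"), (1, "Yes, he really is.")],
)

# Query 1: join user_t and post_t to find Lorraine's post.
cur.execute(
    """SELECT post_title, post_body, post_id FROM post_t, user_t
       WHERE user_t.user_name = ? AND post_t.user_id = user_t.user_id
       LIMIT 1""",
    ("Lorraine",),
)
post_title, post_body, post_id = cur.fetchone()

# Query 2: a second round trip to fetch that post's comments.
cur.execute(
    "SELECT comment_body FROM comment_t WHERE comment_t.post_id = ? ORDER BY comment_id",
    (post_id,),
)
comments = [row[0] for row in cur.fetchall()]
print(post_title, comments)
```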

Thinking About Your Data (MongoDB)

• Figure out how you will eventually use the data
• Store it that way
• Create indexes as needed for fast queries

import datetime
from pymongo import MongoClient  # replaces the Connection class from 2010-era PyMongo

client = MongoClient()  # connects to localhost:27017 by default
db = client["blog"]
posts = db["posts"]

post = {"author": "Lorraine",
        "title": "Who on Earth lets Chris Lea Talk on Stage?",
        "post": "Seriously. That's just not cool.",
        "comments": ["Is he really that bad?", "Yes, he really is."],
        "date": datetime.datetime.now(datetime.timezone.utc)}

posts.insert_one(post)  # insert() was removed in PyMongo 4

from pymongo import MongoClient

posts = MongoClient()["blog"]["posts"]

# One query returns the post with its comments already embedded; no join needed.
post = posts.find_one({"author": "Lorraine"})
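The document model folds the three relational tables into the shape the page actually needs. As a rough sketch in plain Python (no MongoDB involved; the row tuples and the `to_document` helper are hypothetical), this is the reassembly that the SQL version performs on every read and the document version performs once, when the post is written:

```python
# Hypothetical rows as they would come back from user_t, post_t and comment_t.
users = [(1, "Lorraine")]                                  # (user_id, user_name)
posts = [(1, 1, "Who on Earth lets Chris Lea Talk on Stage?",
          "Seriously. That's just not cool.")]             # (post_id, user_id, title, body)
comments = [(1, 1, "Is he really that bad?"),
            (2, 1, "Yes, he really is.")]                  # (comment_id, post_id, body)


def to_document(post_id):
    """Reassemble one post and its comments into a single nested dict."""
    names = {user_id: name for user_id, name in users}
    for pid, user_id, title, body in posts:
        if pid == post_id:
            return {
                "author": names[user_id],
                "title": title,
                "post": body,
                # Comments are embedded as a list instead of living in their own table.
                "comments": [text for _, cpid, text in comments if cpid == post_id],
            }
    return None  # no such post


doc = to_document(1)
print(doc["author"], len(doc["comments"]))
```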

Say Goodbye to Schemas

import datetime
from pymongo import MongoClient

posts = MongoClient()["blog"]["posts"]

post = {"author": "Lorraine",
        "title": "Who on Earth lets Chris Lea Talk on Stage?",
        "post": "Seriously. That's just not cool.",
        "comments": ["Is he really that bad?", "Yes, he really is."],
        "tags": ["fowa", "nosql", "nerds"],  # new field: no ALTER TABLE required
        "date": datetime.datetime.now(datetime.timezone.utc)}

posts.insert_one(post)

The only change from the earlier post is the new "tags" field. If you want new fields, just start using them!
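The flip side of having no schema is that application code must cope with documents written before a field existed. A minimal sketch in plain Python, assuming dicts stand in for stored documents (the second post's title and the `all_tags` helper are hypothetical), that treats a missing `tags` field as an empty list:

```python
from collections import Counter

# Two stored posts: one written before "tags" existed, one written after.
documents = [
    {"author": "Lorraine", "title": "Who on Earth lets Chris Lea Talk on Stage?"},
    {"author": "Lorraine", "title": "NoSQL at FOWA",
     "tags": ["fowa", "nosql", "nerds"]},
]


def all_tags(docs):
    """Count tags across documents, treating a missing field as no tags."""
    counts = Counter()
    for doc in docs:
        counts.update(doc.get("tags", []))  # older documents simply lack the key
    return counts


print(all_tags(documents))
```

Defaulting at read time like this lets old and new documents coexist without a migration step.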

Enjoy a Wealth of Query Options

from pymongo import MongoClient

posts = MongoClient()["blog"]["posts"]

# Exact match, first document only
posts.find_one({"author": "Lorraine"})

# Exact match, at most five documents
posts.find({"author": "Lorraine"}).limit(5)

# Prefix match with a regular expression (written /^Lor/ in the mongo shell)
posts.find({"author": {"$regex": "^Lor"}})

# Everything not written by Lorraine: $ne is the not-equal test
# ($not only wraps another operator or a regex, not a plain value)
posts.find({"author": {"$ne": "Lorraine"}})
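To make the semantics of those four query shapes concrete without a running server, here is a toy in-memory matcher. It is not PyMongo: it supports only plain equality, `$regex`, `$ne`, and a limit, ignores MongoDB's array-matching rules, and `matches`/`find` are hypothetical names:

```python
import re

posts = [
    {"author": "Lorraine", "title": "Post 1"},
    {"author": "Lorenzo", "title": "Post 2"},
    {"author": "Chris", "title": "Post 3"},
]


def matches(doc, query):
    """Return True if doc satisfies every field condition in query."""
    for field, cond in query.items():
        value = doc.get(field)  # a missing field reads as None
        if isinstance(cond, dict):
            # Operator conditions: only the two used above are handled.
            unsupported = set(cond) - {"$regex", "$ne"}
            if unsupported:
                raise ValueError(f"unsupported operators: {sorted(unsupported)}")
            if "$regex" in cond and not (isinstance(value, str) and re.search(cond["$regex"], value)):
                return False
            if "$ne" in cond and value == cond["$ne"]:
                return False
        elif value != cond:
            return False  # a plain value means exact equality
    return True


def find(docs, query, limit=0):
    """Return matching docs in order; limit=0 means no limit, as in PyMongo."""
    hits = [d for d in docs if matches(d, query)]
    return hits[:limit] if limit else hits


print([d["title"] for d in find(posts, {"author": {"$regex": "^Lor"}})])  # ['Post 1', 'Post 2']
print([d["title"] for d in find(posts, {"author": {"$ne": "Lorraine"}})])  # ['Post 2', 'Post 3']
```

`find_one` corresponds to `find(..., limit=1)` taking the first hit, and `.limit(5)` to `limit=5`.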

Enjoy a Massive Performance Jump

• Mileage will vary, but a 10x speedup is not uncommon, for both reads and writes
• Writes happen at near-native disk speed
• Logging to MongoDB is perfectly acceptable
• Reads of active data approach Memcached speeds

Ability to write bad queries is enormously reduced!

• No joins means less need for complex indexes
• Chances of index / query mismatches are vastly lower
• Disk I/O is much less complex, and therefore much faster

Caveats for MongoDB

• You really should use 64-bit machines in production
• 32-bit builds are limited to roughly 2 GB of data per server process
• MongoDB is happiest with lots of RAM relative to the active data set
• Under heavy development
  • Features, drivers, and docs are changing rapidly

Questions?
