NoSQL for SQL Professionals Don Pinto Product Manager


Page 1: No sql for sql professionals

NoSQL for SQL Professionals

Don Pinto

Product Manager

Page 2: No sql for sql professionals

Macro Trends Driving NoSQL Technology

More Data + More Users + Interactive Apps → NoSQL

Page 3: No sql for sql professionals

Lacking Solutions, Users Forced to Invent

• Bigtable: November 2006
• Dynamo: October 2007
• Cassandra: August 2008
• Voldemort: February 2009

Very few organizations can build and maintain database software technology. But every organization building interactive web applications needs this technology.

Page 4: No sql for sql professionals

What Is Biggest Data Management Problem Driving Use of NoSQL in Coming Year?

• Lack of flexibility/rigid schemas: 49%
• Inability to scale out data: 35%
• Performance challenges: 29%
• Cost: 16%
• All of these: 12%
• Other: 11%

Source: Couchbase Survey, December 2011, n = 1351.

Page 5: No sql for sql professionals

Relational vs. NoSQL

Page 6: No sql for sql professionals

Key Differences

Page 7: No sql for sql professionals

Relational Technology Scales Up

• Application scales out (web/app server tier): just add more commodity web servers.
• RDBMS scales up (relational database): get a bigger, more complex server.
• Expensive and disruptive sharding; doesn't perform at web scale.

[Charts: system cost and application performance plotted against users for each tier; the relational database charts are annotated "Won't scale beyond this point".]

Page 8: No sql for sql professionals

NoSQL Database Scales Out Like App Tier

• Application scales out (web/app server tier): just add more commodity web servers.
• NoSQL database scales out (Couchbase distributed data store): cost and performance mirror the app tier.
• Scaling out flattens the cost and performance curves.

[Charts: system cost and application performance plotted against users for each tier.]

Page 9: No sql for sql professionals
Page 10: No sql for sql professionals

Relational vs Document Data Model

• Relational data model: highly structured table organization with rigidly defined data formats and record structure.

• Document data model: collection of complex documents with arbitrary, nested data formats and varying "record" format.

Page 11: No sql for sql professionals

RDBMS Example: User Profile

User Info

KEY  First  Last    ZIP_id
1    Dipti  Borkar  2
2    Joe    Smith   2
3    Ali    Dodson  2
4    John   Doe     3

Address Info

ZIP_id  CITY  STATE  ZIP
1       DEN   CO     30303
2       MV    CA     94040
3       CHI   IL     60609
4       NY    NY     10010

To get information about a specific user, you perform a join across the two tables.
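As a minimal sketch of that join, the two tables can be loaded into an in-memory SQLite database and queried from Python. The table and column names (`user_info`, `address_info`, `user_key`) and the helper `get_user_profile` are illustrative choices, not names from the slides.

```python
import sqlite3

# In-memory database holding the two tables from the slide.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE address_info (
        zip_id INTEGER PRIMARY KEY, city TEXT, state TEXT, zip TEXT);
    CREATE TABLE user_info (
        user_key INTEGER PRIMARY KEY, first TEXT, last TEXT,
        zip_id INTEGER REFERENCES address_info(zip_id));
    INSERT INTO address_info VALUES
        (1, 'DEN', 'CO', '30303'), (2, 'MV', 'CA', '94040'),
        (3, 'CHI', 'IL', '60609'), (4, 'NY', 'NY', '10010');
    INSERT INTO user_info VALUES
        (1, 'Dipti', 'Borkar', 2), (2, 'Joe', 'Smith', 2),
        (3, 'Ali', 'Dodson', 2), (4, 'John', 'Doe', 3);
""")


def get_user_profile(conn, user_key):
    """Return (first, last, city, state, zip) for one user via a join."""
    return conn.execute(
        """
        SELECT u.first, u.last, a.city, a.state, a.zip
        FROM user_info AS u
        JOIN address_info AS a ON a.zip_id = u.zip_id
        WHERE u.user_key = ?
        """,
        (user_key,),
    ).fetchone()


profile = get_user_profile(conn, 1)
print(profile)
```

The address columns live in a second table, so even a single-user lookup has to touch both tables.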

Page 12: No sql for sql professionals

Document Example: User Profile

All data in a single document

{
  "ID": 1,
  "FIRST": "Dipti",
  "LAST": "Borkar",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA"
}
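For comparison with the relational lookup, here is a small sketch of the same profile stored and fetched as one document. A plain Python dict stands in for the database, and the key format `user::1` plus the helpers `upsert` and `get` are assumptions made for illustration, not a real client API.

```python
import json

# A dict of JSON strings stands in for a document store (illustration only).
bucket = {}


def upsert(doc_id, doc):
    """Store the whole document under a single key."""
    bucket[doc_id] = json.dumps(doc)


def get(doc_id):
    """Fetch the whole document with one key lookup, no join needed."""
    return json.loads(bucket[doc_id])


upsert("user::1", {
    "ID": 1, "FIRST": "Dipti", "LAST": "Borkar",
    "ZIP": "94040", "CITY": "MV", "STATE": "CA",
})

profile = get("user::1")
print(profile["FIRST"], profile["CITY"], profile["STATE"])
```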

Page 13: No sql for sql professionals

Making a Change Using RDBMS

Adding a country to the data means adding a Country ID column to every table, plus a new Country table.

User Table

User ID  First  Last    Zip    Country ID
1        Dipti  Borkar  94040  001
2        Joe    Smith   94040  001
3        Ali    Dodson  94040  001
4        Sarah  Gorin   NW1    002
5        Bob    Young   30303  001
6        Nancy  Baker   10010  001
7        Ray    Jones   31311  001
8        Lee    Chen    V5V3M  008
• • •
50000    Doug   Moore   04252  001
50001    Mary   White   SW195  002
50002    Lisa   Clark   12425  001

Photo Table

User ID  Photo ID  Comment  Country ID
2        d043      NYC      001
2        b054      Bday     007
5        c036      Miami    001
7        d072      Sunset   133
5002     e086      Spain    133

Status Table

User ID  Status ID  Text     Country ID
1        a42        At conf  134
4        b26        excited  007
5        c32        hockey   008
12       d83        Go A's   001
5000     e34        sailing  005

Affiliations Table

User ID  Affl ID  Affl Name  Country ID
2        a42      Cal        001
4        b96      USC        001
7        c14      UW         001
8        e22      Oxford     002

Country Table

Country ID  Country name
001         USA
002         UK
003         Argentina
004         Australia
005         Aruba
006         Austria
007         Brazil
008         Canada
009         Chile
• • •
130         Portugal
131         Romania
132         Russia
133         Spain
134         Sweden

Page 14: No sql for sql professionals

Making the Same Change With a Document DB

{
  "ID": 1,
  "FIRST": "Don",
  "LAST": "Pinto",
  "ZIP": "94040",
  "CITY": "MV",
  "STATE": "CA",
  "STATUS": {
    "TEXT": "At Conf",
    "GEO_LOC": "134"
  },
  "COUNTRY": "USA"
}

Just add information to a document
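A short sketch of what "just add information" looks like in application code, using only Python's `json` module. The variable names and the second document (`other_doc`) are illustrative; the point is that one document gains fields while others keep their old shape and no schema migration runs.

```python
import json

# The document as it was stored before the change.
stored = json.dumps({
    "ID": 1, "FIRST": "Don", "LAST": "Pinto",
    "ZIP": "94040", "CITY": "MV", "STATE": "CA",
})

# Read, add the new fields, write back. No ALTER TABLE, no new tables.
doc = json.loads(stored)
doc["STATUS"] = {"TEXT": "At Conf", "GEO_LOC": "134"}
doc["COUNTRY"] = "USA"
updated = json.dumps(doc)

# Another document in the same store is untouched and still valid
# (hypothetical second user, for illustration).
other_doc = {"ID": 2, "FIRST": "Joe", "LAST": "Smith", "ZIP": "94040"}

print(updated)
```

The trade-off is that readers of these documents must tolerate missing fields, for example with `doc.get("COUNTRY")`.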

Page 15: No sql for sql professionals

Relational vs Document Performance

Relational: the data for one user is spread across four tables.

User Table

User ID  First  Last    Zip
1        Frank  Weigel  94040
2        Joe    Smith   94040
3        Ali    Dodson  94040
4        Sarah  Gorin   NW1
5        Bob    Young   30303
6        Nancy  Baker   10010
7        Ray    Jones   31311
8        Lee    Chen    V5V3
• • •
5000     Doug   Moore   04252
5001     Mary   White   41694
5002     Lisa   Clark   12425

Photo Table

User ID  Photo ID  Comment
2        d043      NYC
2        b054      Bday
5        c036      Miami
7        d072      Sunset
5002     e086      Spain

Status Table

User ID  Status ID  Text
1        a42        At conf
4        b26        excited
5        c32        hockey
12       d83        Go A's
5000     e34        sailing

Affiliations Table

User ID  Affiliations ID  Affiliations Name
2        a42              Cal
4        b96              USC
7        c14              UW
8        e22              Oxford

Document: each user's record and its related items are stored together in one JSON document.

• User 1 (Frank Weigel, 94040) with status a42 "At conf"
• User 4 (Sarah Gorin, NW1) with status b26 "excited"
• User 5 (Bob Young, 30303) with photo c036 "Miami" and status c32 "hockey"
• User 8 (Lee Chen, V5V3) with affiliation e22 "Oxford"
• User 5002 (Lisa Clark, 12425) with photo e086 "Spain"

Faster response times and higher throughput

Page 16: No sql for sql professionals

Document Databases Easily Accommodate Unstructured Data

Hotels

{
  "ID": 1,
  "NAME": "Fairmont San Francisco",
  "DESCRIPTION": "Historic grandeur…",
  "AVG_REVIEWER_SCORE": "4.3",
  "AMENITY": [
    {"TYPE": "gym", "DESCRIPTION": "fitness center"},
    {"TYPE": "wifi", "DESCRIPTION": "free wifi"}
  ],
  "RATE_TYPE": "nightly",
  "PRICE": "$199",
  "REVIEWS": ["review_1", "review_2"],
  "ATTRACTIONS": "Chinatown"
}

{
  "ID": 2,
  "NAME": "W San Francisco",
  "DESCRIPTION": "Chic, hip accommodations..",
  "AVG_REVIEWER_SCORE": "4.0",
  "AMENITY": [
    {"TYPE": "spa", "DESCRIPTION": "Bliss Spa"},
    {"TYPE": "wifi", "DESCRIPTION": "free wifi"},
    {"TYPE": "dining", "DESCRIPTION": "bar/lounge"}
  ],
  "RATE_TYPE": "nightly",
  "PRICE": "$194",
  "REVIEWS": ["review_1", "review_2"]
}

Page 17: No sql for sql professionals

Document Databases Easily Accommodate Unstructured Data

Hotels

{ "ID": 1, "NAME": "Fairmont San Francisco",…}

Reviews

{
  "REVIEW_ID": 1,
  "REVIEW": "Loved Hotel & Location",
  "WOULD RECOMMEND": "yes",
  "AVG_REVIEWER_SCORE": "5",
  "REVIEW_DATE": "May 29, 2013",
  "USER_PROFILE_ID": "271"
}

{
  "REVIEW_ID": 2,
  "REVIEW": "Nice, but a few kinks",
  "WOULD RECOMMEND": "yes",
  "AVG_REVIEWER_SCORE": "4",
  "REVIEW_DATE": "May 22, 2013",
  "USER_PROFILE_ID": "923"
}

Page 18: No sql for sql professionals

Document Databases Easily Accommodate Unstructured Data

Hotel Descriptions

{ "ID": 1, "NAME": "Fairmont San Francisco",…}

Reviews

{ "REVIEW_ID": 1, "REVIEW": "Loved Hotel…",…}

{ "REVIEW_ID": 2, "REVIEW": "Nice, but …",…}

User Profiles

{
  "USER_ID": 1,
  "DISPLAY_NAME": "Ted's Trip Experience",
  "CITY": "Saratoga",
  "STATE": "California",
  "NUM_OF_REVIEWS": "8"
}

{
  "USER_ID": 2,
  "DISPLAY_NAME": "WhatWhat567",
  "CITY": "Kansas City",
  "STATE": "MO",
  "NUM_OF_REVIEWS": "3"
}

Page 19: No sql for sql professionals

Document Databases Easily Accommodate Unstructured Data

Hotel Descriptions

{ "ID": 1, "NAME": "Fairmont San Francisco",…}

Reviews

{ "REVIEW_ID": 1, "REVIEW": "Loved Hotel…",…}

{ "REVIEW_ID": 2, "REVIEW": "Nice, but …",…}

User Profiles

{ "USER_ID": 1, "DISPLAY": "Ted's Trip…",…}

{ "USER_ID": 2, "DISPLAY": "WhatWhat …",…}

Document IDs associate related objects

• Hotels point to reviews

• Reviews point to users
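The hotel → review → user chain can be followed with plain key lookups. The sketch below uses a dict as a stand-in document store. The key prefixes (`hotel::`, `review::`, `user::`), the display names `reviewer_271` and `reviewer_923`, and the helper `reviews_with_authors` are all hypothetical, chosen only to make the example self-contained.

```python
# One dict keyed by document ID stands in for the document store.
store = {
    "hotel::1": {
        "NAME": "Fairmont San Francisco",
        "REVIEWS": ["review::1", "review::2"],   # hotel points to reviews
    },
    "review::1": {
        "REVIEW": "Loved Hotel & Location",
        "USER_PROFILE_ID": "user::271",          # review points to a user
    },
    "review::2": {
        "REVIEW": "Nice, but a few kinks",
        "USER_PROFILE_ID": "user::923",
    },
    "user::271": {"DISPLAY_NAME": "reviewer_271"},  # hypothetical profile
    "user::923": {"DISPLAY_NAME": "reviewer_923"},  # hypothetical profile
}


def reviews_with_authors(hotel_id):
    """Follow document IDs from a hotel to its reviews and their authors."""
    results = []
    for review_id in store[hotel_id]["REVIEWS"]:
        review = store[review_id]
        author = store[review["USER_PROFILE_ID"]]
        results.append((review["REVIEW"], author["DISPLAY_NAME"]))
    return results


pairs = reviews_with_authors("hotel::1")
for text, author in pairs:
    print(f"{author}: {text}")
```

Each hop is a direct key lookup, so the application does the "join" itself, one document fetch at a time.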

Page 20: No sql for sql professionals

Indexing with Document Databases

Index on AVG_REVIEWER_SCORE

Page 21: No sql for sql professionals

Indexing with Document Databases

Index on AVG_REVIEWER_SCORE

Index: … (4.0, doc_id), (4.0, doc_id), (4.1, doc_id), (4.3, doc_id), (5.0, doc_id) …
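Conceptually, such an index is a list of (field value, document ID) pairs kept sorted by value. The sketch below builds one in plain Python. It is a simplified illustration, not how any particular database stores its indexes, and the document IDs and the third hotel are made up.

```python
docs = {
    "hotel::1": {"NAME": "Fairmont San Francisco", "AVG_REVIEWER_SCORE": "4.3"},
    "hotel::2": {"NAME": "W San Francisco", "AVG_REVIEWER_SCORE": "4.0"},
    # Hypothetical extra documents, for illustration only.
    "hotel::3": {"NAME": "Example Hotel", "AVG_REVIEWER_SCORE": "5.0"},
    "hotel::4": {"NAME": "No Score Yet"},
}


def build_index(docs, field):
    """Return sorted (value, doc_id) pairs for documents that have `field`."""
    entries = [
        (float(doc[field]), doc_id)
        for doc_id, doc in docs.items()
        if field in doc          # documents without the field are skipped
    ]
    return sorted(entries)


index = build_index(docs, "AVG_REVIEWER_SCORE")
print(index)
```

Because documents have no fixed schema, the index simply leaves out documents that lack the field.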

Page 22: No sql for sql professionals

Querying with Document Databases

Query on AVG_REVIEWER_SCORE

Query → Index → Matching Results

Index: … (3.4, doc_id), (3.4, doc_id), (3.5, doc_id), (3.6, doc_id), (3.7, doc_id), (3.8, doc_id), (4.0, doc_id), (4.1, doc_id), (4.3, doc_id), (4.5, doc_id), (4.7, doc_id), (4.9, doc_id), (5.0, doc_id) …

Matching Results: (5.0, doc_id)
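Because index entries are sorted by score, a range query only has to find the two ends of the matching run. A minimal sketch with Python's `bisect` module follows. The scores are the ones shown on the slide, while the document IDs (`doc_01` to `doc_13`) and the function `range_query` are placeholders.

```python
import bisect

scores = [3.4, 3.4, 3.5, 3.6, 3.7, 3.8, 4.0, 4.1, 4.3, 4.5, 4.7, 4.9, 5.0]
# Sorted (score, doc_id) pairs; doc IDs are placeholders.
index = [(s, f"doc_{i:02d}") for i, s in enumerate(scores, start=1)]


def range_query(index, low, high):
    """Return doc IDs whose score is in [low, high], using binary search."""
    keys = [score for score, _ in index]
    start = bisect.bisect_left(keys, low)    # first entry with score >= low
    end = bisect.bisect_right(keys, high)    # one past last entry <= high
    return [doc_id for _, doc_id in index[start:end]]


matches = range_query(index, 4.5, 5.0)
print(matches)
```

The query reads a contiguous slice of the index and then fetches only the matching documents by ID.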

Page 23: No sql for sql professionals

Flavors of NoSQL

Page 24: No sql for sql professionals

NoSQL catalog

• Key-Value: memcached
• Data Structure: redis
• Document: mongoDB, couchbase
• Column: cassandra
• Graph: Neo4j

The slide arranges these products in two tiers: Cache (memory only) and Database (memory/disk).

Page 25: No sql for sql professionals

Couchbase Open Source Project

• Leading NoSQL database project focused on distributed database technology and surrounding ecosystem

• Supports both key-value and document-oriented use cases

• All components are available under the Apache 2.0 Public License

• Obtained as packaged software in both enterprise and community editions.


Page 26: No sql for sql professionals

Couchbase Server

• Easy Scalability: Grow cluster without application changes, without downtime, with a single click

• Consistent High Performance: Consistent sub-millisecond read and write response times with consistent high throughput

• Always On 24x365: No downtime for software upgrades, hardware maintenance, etc.

• Flexible Data Model: JSON document model with no fixed schema.

Page 27: No sql for sql professionals

Couchbase Server Architecture

Data Manager

• Ports: 11210 (Memcapable 2.0), 11211 (Memcapable 1.0), 8092 (Query API)
• Components: Moxi, Memcached, Couchbase EP Engine, storage interface, New Persistence Layer, Query Engine

Cluster Manager (Erlang/OTP)

• Ports: 8091 (HTTP: REST management API / Web UI), 4369 (Erlang port mapper), 21100 - 21199 (Distributed Erlang)
• Components, each labeled on the slide as running on each node or one per cluster: heartbeat, process monitor, global singleton supervisor, configuration manager, rebalance orchestrator, node health monitor, vBucket state and replication manager

Page 28: No sql for sql professionals

Couchbase Server Architecture

Data Manager

• Data access ports: 11210 / 11211
• Object-managed Cache
• Multi-threaded Persistence Engine
• Query API (port 8092, http): Query Engine

Cluster Manager (Erlang/OTP)

• Replication, Rebalance, Shard State Manager
• REST management API / Web UI (port 8091, Admin Console)

Page 29: No sql for sql professionals

Where is NoSQL a good fit?

Page 30: No sql for sql professionals

Market Adoption

Internet Companies

• Social Gaming

• Ad Networks

• Social Networks

• Online Business Services

• E-Commerce

• Online Media

• Content Management

• Cloud Services

Enterprises

• Communications

• Retail

• Financial Services

• Health Care

• Automotive/Airline

• Agriculture

• Consumer Electronics

• Business Systems

Page 31: No sql for sql professionals

Application Characteristics - Data driven

• 3rd party or user defined structure (Twitter feeds)

• Support for unlimited data growth (Viral apps)

• Data with non-homogeneous structure

• Need to change data structure quickly and often

• Variable length documents

• Sparse data records

• Hierarchical data

NoSQL is a good fit

Page 32: No sql for sql professionals

Application Characteristics - Performance driven

• Low latency critical (e.g., 1 millisecond)

• High throughput (e.g., 200,000 ops/sec)

• Large number of users

• Unknown demand with sudden growth of users/data

• Predominantly direct document access

• Read / Mixed / Write heavy workloads

NoSQL is a good fit

Page 33: No sql for sql professionals

Q & A

Page 34: No sql for sql professionals

Thank you!

[email protected]

@nosqldon

www.linkedin.com/in/donpinto/

Page 35: No sql for sql professionals

Extra - Couchbase Operations

Page 36: No sql for sql professionals

Single node - Couchbase Write Operation

[Diagram: the app server writes Doc 1 to a Couchbase Server node. Doc 1 enters the managed cache, then goes to the disk queue (and on to disk) and to the replication queue (and on to another node).]
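The write path in the diagram can be sketched as a small Python class. This is a toy model under stated assumptions: the write is acknowledged once the document is in the managed cache, and persistence and replication happen later from their queues. The class `Node` and its method names are invented for illustration and are not Couchbase APIs.

```python
from collections import deque


class Node:
    """Toy model of one server node: cache, disk queue, replication queue."""

    def __init__(self):
        self.cache = {}                    # managed cache (in memory)
        self.disk = {}                     # stands in for persistent storage
        self.disk_queue = deque()          # keys waiting to be persisted
        self.replication_queue = deque()   # keys waiting to be replicated

    def write(self, key, doc):
        """Put the doc in the cache, queue it for disk and replication."""
        self.cache[key] = doc
        self.disk_queue.append(key)
        self.replication_queue.append(key)
        return "ack"                       # assumed: ack before persistence

    def flush_disk_queue(self):
        """Persist queued documents (would run asynchronously)."""
        while self.disk_queue:
            key = self.disk_queue.popleft()
            self.disk[key] = self.cache[key]

    def drain_replication_queue(self, replica):
        """Send queued documents to another node (would run asynchronously)."""
        while self.replication_queue:
            key = self.replication_queue.popleft()
            replica.cache[key] = self.cache[key]


primary, replica = Node(), Node()
ack = primary.write("Doc 1", {"value": 1})
persisted_before_flush = "Doc 1" in primary.disk   # False: still queued

primary.flush_disk_queue()
primary.drain_replication_queue(replica)
print(ack, persisted_before_flush, primary.disk, replica.cache)
```

An update follows the same path: the new version replaces the old one in the cache and is queued again.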

Page 37: No sql for sql professionals

Single node - Couchbase Update Operation

[Diagram: the app server sends an updated Doc 1' to a Couchbase Server node. Doc 1' replaces Doc 1 in the managed cache, then goes to the disk queue (and on to disk) and to the replication queue (and on to another node).]

Page 38: No sql for sql professionals

Single node - Couchbase Read Operation

[Diagram: the app server issues GET Doc 1 to a Couchbase Server node, and Doc 1 is returned from the managed cache. The node also shows the disk queue, disk, and the replication queue to another node.]

Page 39: No sql for sql professionals

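Single node - Couchbase Cache Miss

[Diagram: the app server issues GET Doc 1 to a Couchbase Server node. The managed cache holds other docs (Doc 2 through Doc 6) but not Doc 1, so Doc 1 is read from disk into the managed cache and returned to the app server. The node also shows the disk queue and the replication queue to another node.]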
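A cache miss can be sketched as read-through caching: look in memory first, fall back to disk, and keep the fetched document in memory. The toy model below uses least-recently-used (LRU) eviction purely as a simple stand-in. It is not a claim about the eviction policy Couchbase uses, and the class and method names are invented for illustration.

```python
from collections import OrderedDict


class Node:
    """Toy node with a bounded in-memory cache in front of a disk store."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.cache = OrderedDict()   # oldest entry first (LRU stand-in)
        self.disk = {}
        self.disk_reads = 0

    def _cache_put(self, key, doc):
        self.cache[key] = doc
        self.cache.move_to_end(key)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)   # evict least recently used

    def write(self, key, doc):
        self.disk[key] = doc         # simplification: persisted immediately
        self._cache_put(key, doc)

    def get(self, key):
        if key in self.cache:        # cache hit: served from memory
            self.cache.move_to_end(key)
            return self.cache[key]
        self.disk_reads += 1         # cache miss: go to disk
        doc = self.disk[key]
        self._cache_put(key, doc)    # keep it in memory for next time
        return doc


node = Node(capacity=5)
for i in range(1, 7):                # Doc 1 .. Doc 6; Doc 1 gets evicted
    node.write(f"Doc {i}", {"n": i})

was_cached = "Doc 1" in node.cache
first = node.get("Doc 1")            # miss, read from disk
second = node.get("Doc 1")           # hit, served from cache
print(was_cached, node.disk_reads, first == second)
```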

Page 40: No sql for sql professionals

Basic Operation

• Docs distributed evenly across servers

• Each server stores both active and replica docs
  - Only one server active at a time

• Client library provides app with simple interface to database

• Cluster map provides map to which server doc is on
  - App never needs to know

• App reads, writes, updates docs

• Multiple app servers can access same document at same time
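One way to picture the cluster map is as a table from hash partitions (vBuckets) to an active server and a replica server, with the client hashing each document key to a partition. The sketch below is a simplified model: the partition count of 1024, the plain `crc32 % n` hash, the round-robin layout, and all function names are assumptions for illustration, not the exact scheme used by Couchbase client libraries.

```python
import zlib
from collections import Counter

NUM_VBUCKETS = 1024   # assumed partition count, for illustration


def build_cluster_map(servers, num_vbuckets=NUM_VBUCKETS):
    """Assign each vBucket an active server and a different replica server."""
    n = len(servers)
    return [
        {"active": servers[vb % n], "replica": servers[(vb + 1) % n]}
        for vb in range(num_vbuckets)
    ]


def vbucket_for(key, num_vbuckets=NUM_VBUCKETS):
    """Hash a document key to a vBucket (simplified hash)."""
    return zlib.crc32(key.encode("utf-8")) % num_vbuckets


def server_for(key, cluster_map):
    """What a client library would do: key -> vBucket -> active server."""
    return cluster_map[vbucket_for(key, len(cluster_map))]["active"]


cluster_map = build_cluster_map(["server1", "server2", "server3"])
active_counts = Counter(entry["active"] for entry in cluster_map)
target = server_for("user::1", cluster_map)
print(active_counts, target)
```

The application only calls something like `server_for` indirectly through the client library, which is why it never needs to know where a document lives.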

[Diagram: a Couchbase Server cluster of three servers, each holding active docs and replica docs, with user-configured replica count = 1. Two app servers, each with a Couchbase client library and cluster map, send READ/WRITE/UPDATE requests to the cluster.]

Page 41: No sql for sql professionals

Add Nodes to Cluster

• Two servers added
  - One-click operation

• Docs automatically rebalanced across cluster
  - Even distribution of docs
  - Minimum doc movement

• Cluster map updated

• App database calls now distributed over larger number of servers
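The two rebalance goals, even distribution and minimum movement, can be illustrated with a simple quota-based reassignment of partitions: every server keeps what it already has up to its fair share, and only the surplus moves. This is an illustrative sketch, not Couchbase's actual rebalance algorithm, and the function name and the figure of 1024 partitions are assumptions.

```python
from collections import Counter


def rebalance(owners, servers):
    """Reassign partitions evenly over `servers`, moving as few as possible.

    owners: list where owners[p] is the server currently holding partition p.
    Returns (new_owners, number_of_partitions_moved).
    """
    base, extra = divmod(len(owners), len(servers))
    quota = {s: base + (1 if i < extra else 0) for i, s in enumerate(servers)}

    load = {s: 0 for s in servers}
    surplus = []                      # partitions that must move
    for part, server in enumerate(owners):
        if server in load and load[server] < quota[server]:
            load[server] += 1         # stays where it is
        else:
            surplus.append(part)

    # Servers below quota receive exactly the surplus partitions.
    receivers = [s for s in servers for _ in range(quota[s] - load[s])]
    new_owners = list(owners)
    for part, server in zip(surplus, receivers):
        new_owners[part] = server
    return new_owners, len(surplus)


old_servers = ["server1", "server2", "server3"]
owners = [old_servers[p % 3] for p in range(1024)]

new_owners, moved = rebalance(owners, old_servers + ["server4", "server5"])
counts = Counter(new_owners)
print(counts, moved)
```

After the reassignment the cluster map is updated, so client requests spread over all five servers.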

[Diagram: the Couchbase Server cluster grows from three servers to five (Server 1 to Server 5), each holding active and replica docs, with user-configured replica count = 1. Two app servers, each with a Couchbase client library and cluster map, send READ/WRITE/UPDATE requests to the cluster.]

Page 42: No sql for sql professionals

Fail Over Node

[Diagram: a Couchbase Server cluster of five servers (Server 1 to Server 5), each holding active and replica docs, with user-configured replica count = 1. Two app servers, each with a Couchbase client library and cluster map, access the cluster. Server 3 fails.]

• App servers accessing docs

• Requests to Server 3 fail

• Cluster detects server failed
  - Promotes replicas of docs to active
  - Updates cluster map

• Requests for docs now go to appropriate server

• Typically rebalance would follow
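Replica promotion on failover can be sketched as a rewrite of the cluster map: wherever the failed server was active, its replica becomes active. The map layout, server names, and the function `fail_over` below are illustrative assumptions, not the actual cluster manager logic.

```python
def fail_over(cluster_map, failed):
    """Return a new cluster map with `failed` removed and replicas promoted."""
    new_map = []
    for entry in cluster_map:
        active, replica = entry["active"], entry["replica"]
        if active == failed:
            # Promote the replica; the partition has no replica until
            # a later rebalance recreates one.
            active, replica = replica, None
        elif replica == failed:
            replica = None
        new_map.append({"active": active, "replica": replica})
    return new_map


servers = ["server1", "server2", "server3", "server4", "server5"]
# Round-robin layout with replica count = 1 (illustrative).
cluster_map = [
    {"active": servers[p % 5], "replica": servers[(p + 1) % 5]}
    for p in range(10)
]

after = fail_over(cluster_map, "server3")
promoted = [
    p for p, (old, new) in enumerate(zip(cluster_map, after))
    if old["active"] == "server3" and new["active"] == old["replica"]
]
print(promoted, after[2])
```

Partitions left with `replica` set to `None` are the reason a rebalance typically follows a failover.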