MongoDB Basic Concepts


Senior Solutions Architect, 10gen

Norberto Leite


Agenda

• Overview

• Replication

• Scalability

• Consistency & Durability

• Flexibility / Developer Experience

But first ...

Happy Hanukkah!!!

Who’s this guy?

Norberto Leite

Senior Solutions Architect

Barcelona

Love MongoDB and others ...

@nleite / norberto@10gen.com

Your Data


Fundamentals

mongoDB

High Performance

Application

mongoDB mongoDB mongoDB

Horizontal Scalability

Fully Consistent

Document Oriented

{ name: 'Norberto Leite', position: 'SA', nick: 'WingMan', based: ['Barcelona', 'London'] }

Replication


Why do we need Replication?

• Failover

• Backups

• Secondary Batch Jobs

• High Availability


Outages

• Planned
– Hardware upgrade
– OS or file-system tuning
– Software upgrade
– Relocation of data to new file-system / storage

• Unplanned
– Human error
– Hardware failure
– Data center / region outage
– Application corruption


Replica Sets

• Data Protection
– Multiple copies of data
– Data spread across data centers, AZs, etc.

• High Availability
– Automated Failover
– Automated Recovery

Asynchronous Replication

[Diagram: the App writes to the Primary and reads from it by default. The Primary replicates asynchronously to two Secondaries, which can optionally serve reads.]

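Failover

[Diagram: same layout as before (App, Primary, two Secondaries, default reads and writes on the Primary, optional reads on the Secondaries), shown at the moment the Primary becomes unavailable.]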

Automatic Failover

[Diagram: Primary Election. One of the Secondaries is elected as the new Primary and takes over writes and default reads; the remaining Secondary still offers optional reads.]

Automatic Recovery

[Diagram: the failed node goes through Recovery and rejoins as a Secondary offering optional reads. The App keeps writing to, and reading by default from, the new Primary.]
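Automatic failover depends on the surviving members being able to elect a new primary, which in MongoDB requires a majority of the voting members. The sketch below is plain JavaScript, not a live replica set. The set name `rs0`, the host names, and the helper `canElectPrimary` are assumptions made for illustration; the config object only follows the general shape that `rs.initiate()` accepts.

```javascript
// Hypothetical three-member replica set config, in the shape rs.initiate() accepts.
// The set name and host names are placeholders, not taken from the slides.
const config = {
  _id: "rs0",
  members: [
    { _id: 0, host: "node1.example.net:27017" },
    { _id: 1, host: "node2.example.net:27017" },
    { _id: 2, host: "node3.example.net:27017" },
  ],
};

// A primary can only be elected while a strict majority of voting members is reachable.
function canElectPrimary(totalMembers, reachableMembers) {
  const majority = Math.floor(totalMembers / 2) + 1;
  return reachableMembers >= majority;
}

const total = config.members.length;
console.log(canElectPrimary(total, 2)); // one node down: an election can proceed
console.log(canElectPrimary(total, 1)); // two nodes down: no primary can be elected
```

This is why three members is the usual minimum for automated failover: a set of two cannot form a majority once one member is lost.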

Sharding


Sharding

• Data Location Transparent to Code

• Data Distribution is Automatic
– as well as re-distribution

• Aggregates System Resources Horizontally

• No CODE Changes!!!

Range Distribution

sh.shardCollection("test.tweets", {_id: 1}, false)

shard01: a-i
shard02: j-m
shard03: n-z

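Chunk Split

[Diagram: shard01 (a-i), shard02 (j-m), shard03 (n-z). The j-m chunk is split into smaller chunks labelled ja-jz and k-m, and k-m is split again into ka-kj and ki-m.]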

Auto Balancing

[Diagram: shard01, shard02, shard03 with chunks a-i, ja-jz, ka-kj, ki-m, n-z. The balancer moves chunks (here ka-kj and ki-m) between shards.]

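Routed Queries

db.tweets.find({_id: 'norberto'})

[Diagram: shard01, shard02, shard03 holding chunks a-i, ja-jz, ka-kj, ki-m, n-z. The query filters on the shard key, so it is routed only to the shard that owns the matching chunk.]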

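Scatter Gather

db.tweets.find({email: 'norberto@10gen'})

[Diagram: shard01, shard02, shard03 holding chunks a-i, ja-jz, ka-kj, ki-m, n-z. The query does not filter on the shard key, so it is sent to every shard and the results are merged.]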
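The difference between a routed query and a scatter-gather query can be sketched with a toy chunk table. This is plain JavaScript that only imitates the routing decision; the `chunks` table and the `targetShards` helper are assumptions made for illustration and use the simple a-i / j-m / n-z ranges, not the split chunks from the later slides.

```javascript
// Toy chunk table keyed on _id, mirroring the a-i / j-m / n-z ranges.
// Each chunk covers [min, max) in string order.
const chunks = [
  { min: "a", max: "j", shard: "shard01" },
  { min: "j", max: "n", shard: "shard02" },
  { min: "n", max: "{", shard: "shard03" }, // "{" is the character right after "z"
];

// Return the shards a query must visit. Only equality on the shard key (_id)
// can be routed here; anything else falls back to scatter-gather.
function targetShards(query) {
  if (typeof query._id === "string") {
    const chunk = chunks.find((c) => query._id >= c.min && query._id < c.max);
    return chunk ? [chunk.shard] : [];
  }
  return [...new Set(chunks.map((c) => c.shard))];
}

console.log(targetShards({ _id: "norberto" }));         // routed to a single shard
console.log(targetShards({ email: "norberto@10gen" })); // sent to every shard
```

Queries that include the shard key touch one shard; queries that do not pay the cost of contacting all of them, which is worth keeping in mind when choosing the key.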

Caching

shard01: a-i, j-r, n-z

300 GB Data

96 GB Mem

3:1 Data/Mem

Horizontal Distribution

300 GB Data spread across three shards:

shard01: a-i, 100 GB, 96 GB Mem, 1:1 Data/Mem

shard02: j-r, 100 GB, 96 GB Mem, 1:1 Data/Mem

shard03: n-z, 100 GB, 96 GB Mem, 1:1 Data/Mem
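The data-to-memory ratios on these slides are simple arithmetic, which can be checked with a short sketch. The helper name `dataToMemRatio` is hypothetical, and the calculation assumes the data is spread evenly across shards.

```javascript
// Data-to-memory ratio per shard when the data set is spread evenly.
// Figures come from the slides: 300 GB of data, 96 GB of RAM per shard.
function dataToMemRatio(totalDataGB, memPerShardGB, shardCount) {
  const dataPerShardGB = totalDataGB / shardCount;
  return dataPerShardGB / memPerShardGB;
}

console.log(dataToMemRatio(300, 96, 1).toFixed(2)); // single shard, roughly 3:1
console.log(dataToMemRatio(300, 96, 3).toFixed(2)); // three shards, roughly 1:1
```

With a ratio close to 1:1, most of each shard's data can sit in RAM, which is the point of the Horizontal Distribution slide.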

Consistency and Durability


Consistency

• Eventual Consistency
– Allows updates when a system has been partitioned
– Resolve conflicts later
– Example: Cassandra, CouchDB

• Immediate Consistency
– Single Master
– Avoids conflicts
– Example: MongoDB


Durability

• For how long is my data available?

• When do I know my data is safe?!

• Where is it safe?

• MongoDB style:
– Fire and Forget
– Get Last Error
– Journal Sync
– Replica Safe


Durability

[Chart: how far a write has travelled before it is acknowledged. Columns: Memory, Journal, Secondary Nodes, Multiple Data Centers. Rows: RDBMS, Async, w=1 (default), j=true, w=majority, w="tag".]
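The durability levels named on these slides correspond roughly to write concern settings. The mapping below is an assumption (the slides list the names but do not spell out the settings), and the helper `withDurability` is hypothetical; the `w` and `j` fields themselves are standard write concern options.

```javascript
// One plausible mapping from the slide's durability levels to write concern
// documents. The mapping is an assumption; the slides only list the names.
const writeConcerns = {
  fireAndForget: { w: 0 },          // do not wait for any acknowledgement
  getLastError: { w: 1 },           // primary has applied the write in memory
  journalSync: { w: 1, j: true },   // primary has written it to the journal
  replicaSafe: { w: "majority" },   // a majority of members have the write
};

// Merge a durability level into per-operation options.
function withDurability(level, options = {}) {
  const wc = writeConcerns[level];
  if (!wc) throw new Error(`unknown durability level: ${level}`);
  return { ...options, writeConcern: { ...wc } };
}

console.log(withDurability("journalSync"));
```

In the mongo shell, an options object of this shape can be passed as the second argument to `db.collection.insertOne()`. Stronger levels trade write latency for safety, so the level is usually chosen per operation instead of globally.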

Flexibility


Data Model

• Why JSON?

– Well understood data format

– Maps simply to objects

– Linking & Embedding to describe relationships

JSON

place1 = {
  name : "10gen HQ",
  address : "578 Broadway 7th Floor",
  city : "New York",
  zip : "10011",
  tags : [ "business", "tech" ]
}

Relational Way

MongoDB Way

embedding

linking


JSON & Scale Out

• Embedding removes the need for:

– Distributed Joins

– Two Phase Commit

• Enables data to be distributed across many nodes without penalty
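The trade-off between embedding and linking can be illustrated with in-memory objects. This is a sketch in plain JavaScript, not database code: the `tips` field, the sample tip, and the helper names are hypothetical additions to the `place1` example.

```javascript
// Embedded: the place and its tips live in one document, so one lookup on one
// node returns everything. The tips field is a hypothetical addition to place1.
const embeddedPlace = {
  name: "10gen HQ",
  city: "New York",
  tips: [{ user: "nleite", text: "Ask for the 7th floor" }],
};

// Linked: tips live in their own collection and point back by place name,
// so the application has to do a second lookup (an application-level join).
const places = [{ name: "10gen HQ", city: "New York" }];
const tips = [{ place: "10gen HQ", user: "nleite", text: "Ask for the 7th floor" }];

function tipsEmbedded(place) {
  return place.tips; // single read
}

function tipsLinked(placeName) {
  const place = places.find((p) => p.name === placeName); // first read
  return place ? tips.filter((t) => t.place === place.name) : []; // second read
}

console.log(tipsEmbedded(embeddedPlace).length, tipsLinked("10gen HQ").length);
```

In a sharded cluster the two reads of the linked version may land on different shards, which is the distributed join that embedding avoids.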
