.NET User Group Bern

Roger Rudin

bbv Software Services AG

roger.rudin@bbv.ch

Agenda

– What is NoSQL

– Understanding the Motivation behind NoSQL

– MongoDB: A Document Oriented Database

– NoSQL Use Cases

What is NoSQL?

NoSQL = Not only SQL

NoSQL Definition http://nosql-database.org/

NoSQL DEFINITION: Next Generation Databases mostly addressing some of the points: being non-relational, distributed, open-source and horizontally scalable. The original intention has been modern web-scale databases. The movement began early 2009 and is growing rapidly. Often more characteristics apply, such as: schema-free, easy replication support, simple API, eventually consistent/BASE (not ACID), a huge amount of data, and more. So the misleading term "nosql" (the community now translates it mostly with "not only sql") should be seen as an alias to something like the definition above.

Who Uses NoSQL?

• Twitter uses FlockDB/MySQL and Cassandra

• Cassandra is an open source project from Facebook

• Digg, Reddit use Cassandra

• bit.ly, foursquare, SourceForge, and the New York Times use MongoDB

• Adobe, Alibaba, and eBay use Hadoop

UNDERSTANDING THE MOTIVATION BEHIND NOSQL

Why SQL sucks..

• O/R mapping (also known as Impedance Mismatch)

• Data-Model changes are hard and expensive

• SQL databases are designed for high throughput, not low latency

• SQL databases do not scale out well

• Microsoft, Oracle, and IBM charge big bucks for databases

– And then you need to hire a database admin

• Take it from the context of Google, Twitter, Facebook and Amazon.

– Your databases are among the biggest in the world and nobody pays you for that feature

– Wasting profit!!!

What has NoSQL done?

• Implemented the most common use cases as a piece of software

• Designed for scalability and performance

Visual Guide To NoSQL http://blog.nahurst.com/visual-guide-to-nosql-systems

NoSQL Data Models

• Key-Value

• Document-Oriented

• Column Oriented/Tabular

MONGODB: A DOCUMENT ORIENTED DATABASE

NoSQL Data Model: Document Oriented

• Data is stored as “documents”

– We are not talking about Word documents

• Comparable to Aggregates in DDD

• It means mostly schema-free structured data

– Can be queried

• Is easily mapped to OO systems (Domain Model, DDD)

• No joins; relations must be implemented in application code

Network Communications

• REST/JSON

• TCP/BSON (ClientDriver)

BSON [bee · sahn], short for Binary JSON, is a binary-encoded serialization of JSON-like documents. Like JSON, BSON supports the embedding of documents and arrays within other documents and arrays. BSON also contains extensions that allow representation of data types that are not part of the JSON spec. For example, BSON has a Date type and a BinData type.
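
To illustrate why the extra types matter, here is a minimal sketch in plain JavaScript (no MongoDB driver involved; the `post` object is a made-up example). It only shows that a JSON round trip turns a Date into a string, which is the kind of type loss a native Date type avoids.

```javascript
// Hypothetical document with a Date field (not taken from the talk).
const post = { title: "Hello", created: new Date(Date.UTC(2011, 0, 15)) };

// JSON has no date type: stringify emits an ISO-8601 string instead,
// and parse does not turn it back into a Date.
const roundTripped = JSON.parse(JSON.stringify(post));

const isDateBefore = post.created instanceof Date;
const typeAfter = typeof roundTripped.created;
console.log(isDateBefore, typeAfter);
```

After the round trip the application has to know which string fields are really dates and convert them by hand.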

Client Drivers (Apache License)

• MongoDB currently has client support for the following programming languages:

• C

• C++

• Erlang

• Haskell

• Java

• JavaScript

• .NET (C#, F#, PowerShell, etc.)

• Perl

• PHP

• Python

• Ruby

• Scala

Collections vs. Capped Collections (Table in SQL)

• Collections

– blog.posts

– blog.comments

– forum.users

– etc.

• Capped collections (ring buffer)

– Logging

– Caching

– Archiving

db.createCollection("log", {capped: true, size: <bytes>, max: <docs>});
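
The ring-buffer behaviour can be pictured with a small in-memory model. This is only a sketch in plain JavaScript, not MongoDB code: the class `CappedList` and its methods are hypothetical, it models only the document-count limit (`max`), and it ignores the byte-size limit.

```javascript
// Minimal in-memory model of a capped collection limited by document count.
// `maxDocs` plays the role of the `max` option; the `size` limit is not modeled.
class CappedList {
  constructor(maxDocs) {
    this.maxDocs = maxDocs;
    this.docs = [];
  }
  insert(doc) {
    this.docs.push(doc);
    // Once the cap is exceeded, the oldest document is dropped.
    if (this.docs.length > this.maxDocs) this.docs.shift();
  }
  find() {
    return this.docs.slice(); // documents in insertion order
  }
}

const log = new CappedList(3);
for (let i = 1; i <= 5; i++) log.insert({ msg: "event " + i });
const remaining = log.find().map(d => d.msg);
console.log(remaining);
```

With a cap of three and five inserts, only the three newest entries remain, which is why the structure suits logging and caching.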

Indexes

• Every field in the document can be indexed

• Simple Indexes:

db.cities.ensureIndex({city: 1});

• Compound indexes:

db.cities.ensureIndex({city: 1, zip: 1});

• Unique indexes:

db.cities.ensureIndex({city: 1, zip: 1}, {unique: true});

• Sort order: 1 = ascending, -1 = descending

Relations

• ObjectId

db.users.insert({name: "Umbert", car_id: ObjectId("<GUID>")});

• DBRef

db.users.insert({name: "Umbert", car: new DBRef("cars", ObjectId("<GUID>"))});

db.users.findOne({name: "Umbert"}).car.fetch().name;
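
Since the database does not join for you, following a plain id reference is a second lookup done by the application. A minimal sketch in plain JavaScript, with arrays standing in for the `users` and `cars` collections; the data and the helper `findCarOf` are made up for illustration.

```javascript
// In-memory stand-ins for two collections (hypothetical data).
const cars = [
  { _id: "c1", name: "Fiat 500" },
  { _id: "c2", name: "Vespa" }
];
const users = [{ name: "Umbert", car_id: "c1" }];

// The application does the "join": find the user, then the referenced car.
function findCarOf(userName) {
  const user = users.find(u => u.name === userName);
  if (!user) return null;
  return cars.find(c => c._id === user.car_id) || null;
}

const car = findCarOf("Umbert");
console.log(car.name);
```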

Queries (1)

Queries (Regular Expressions)

{field: /regular.*expression/i}

// get all cities that start with “atl” and end on “a” (e.g. atlanta)

db.cities.count({city: /^atl.*a$/i});
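
Whether a pattern is anchored makes a difference to what it matches. The following plain JavaScript sketch (the city list is made up) compares an unanchored pattern with one anchored by `^` and `$`.

```javascript
const cities = ["Atlanta", "Atlantic City", "Mazatlan", "Boston"];

// Unanchored: "atl" followed later by an "a", anywhere in the string.
const loose = cities.filter(c => /atl.*a/i.test(c));

// Anchored: the string must start with "atl" and end with "a".
const strict = cities.filter(c => /^atl.*a$/i.test(c));

console.log(loose, strict);
```

Only the anchored form restricts the result to names that both start with "atl" and end in "a".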

Queries (2) : LINQ https://github.com/craiggwilson/fluent-mongo

Equals

x => x.Age == 21 will translate to {"Age": 21}

Greater Than, $gt:

x => x.Age > 18 will translate to {"Age": {$gt: 18}}

Greater Than Or Equal, $gte:

x => x.Age >= 18 will translate to {"Age": {$gte: 18}}

Less Than, $lt:

x => x.Age < 18 will translate to {"Age": {$lt: 18}}

Less Than Or Equal, $lte:

x => x.Age <= 18 will translate to {"Age": {$lte: 18}}

Not Equal, $ne:

x => x.Age != 18 will translate to {"Age": {$ne: 18}}
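
To make the meaning of these query documents concrete, here is a small matcher in plain JavaScript. It is a sketch of the semantics only, not how MongoDB or fluent-mongo evaluates queries; the `operators` table, the `matches` function and the sample data are hypothetical.

```javascript
// Comparison operators as plain predicates.
const operators = {
  $gt: (a, b) => a > b,
  $gte: (a, b) => a >= b,
  $lt: (a, b) => a < b,
  $lte: (a, b) => a <= b,
  $ne: (a, b) => a !== b
};

// Returns true if `doc` satisfies every condition in `query`.
function matches(doc, query) {
  return Object.entries(query).every(([field, cond]) => {
    if (cond !== null && typeof cond === "object") {
      // Operator form, e.g. {Age: {$gte: 18}}
      return Object.entries(cond).every(([op, val]) => operators[op](doc[field], val));
    }
    return doc[field] === cond; // plain equality, e.g. {Age: 21}
  });
}

const people = [
  { Name: "Ann", Age: 21 },
  { Name: "Bob", Age: 17 },
  { Name: "Cy", Age: 18 }
];
const adults = people.filter(p => matches(p, { Age: { $gte: 18 } })).map(p => p.Name);
const exactly21 = people.filter(p => matches(p, { Age: 21 })).map(p => p.Name);
console.log(adults, exactly21);
```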

Atomic Operations (Optimistic Locking)

• Update if current:

– Fetch the object.

– Modify the object locally.

– Send an update request that says "update the object to this new value if it still matches its old value".
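
The "update if current" steps can be sketched against an in-memory store. This is plain JavaScript for illustration only: the `store` map and the `updateIfCurrent` helper are hypothetical, and comparing documents via `JSON.stringify` is a simplification that relies on identical key order.

```javascript
// In-memory stand-in for a collection keyed by _id (hypothetical data).
const store = new Map([["a1", { _id: "a1", sku: "abc", qty: 1 }]]);

// Replace the stored document only if it still equals the value read earlier.
function updateIfCurrent(id, expected, replacement) {
  const current = store.get(id);
  if (JSON.stringify(current) !== JSON.stringify(expected)) return false;
  store.set(id, replacement);
  return true;
}

// Two clients fetch the same object and modify their local copies.
const seenByA = { ...store.get("a1") };
const seenByB = { ...store.get("a1") };

const okA = updateIfCurrent("a1", seenByA, { ...seenByA, qty: 0 }); // first writer wins
const okB = updateIfCurrent("a1", seenByB, { ...seenByB, qty: 0 }); // B's read is stale
console.log(okA, okB);
```

The second client would have to fetch the object again and retry, which is the essence of optimistic locking.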

Atomic Operations: Sample

> t=db.inventory

> s = t.findOne({sku:'abc'})

{"_id" : "49df4d3c9664d32c73ea865a" , "sku" : "abc" , "qty" : 1}

> t.update({sku:"abc",qty:{$gt:0}}, { $inc : { qty : -1 } } ) ;

> db.$cmd.findOne({getlasterror:1})

{"err" : null , "updatedExisting" : true , "n" : 1 , "ok" : 1} // it has worked

> t.update({sku:"abcz",qty:{$gt:0}}, { $inc : { qty : -1 } } ) ;

> db.$cmd.findOne({getlasterror:1})

{"err" : null , "updatedExisting" : false , "n" : 0 , "ok" : 1} // did not work

Atomic Operations: multiple items

db.products.update(
  {cat: "boots", $atomic: 1},
  {$inc: {price: 10.0}},
  false, // no upsert
  true   // update multiple
);

Replica set (1)

• Automatic failover

• Automatic recovery of servers that were offline

• Distribution over more than one data center

• Automatic nomination of a new master server in case of a failure

• Up to 7 servers in one replica set

[Diagram labels: ReplicaSet, RECOVERING]

Replica set (2)

[Diagram labels: PRIMARY DOWN, PRIMARY]

Mongo Sharding

• Partitioning data across multiple physical servers to provide application scale-out

• Can distribute databases, collections or objects in a collection

• Choose how you partition data (shard key)

• Balancing, migrations, management all automatic

• Range based

• Can convert from single master to sharded system with 0 downtime

• Often works in conjunction with object replication (failover)
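
Range-based partitioning can be pictured as a lookup table from shard-key ranges to servers. The sketch below is plain JavaScript with a made-up chunk table and shard names; it illustrates the routing idea only and says nothing about how MongoDB balances or migrates chunks.

```javascript
// Hypothetical chunk table: each chunk owns shard-key values in [min, max).
const chunks = [
  { min: -Infinity, max: "h", shard: "shard0" },
  { min: "h", max: "q", shard: "shard1" },
  { min: "q", max: Infinity, shard: "shard2" }
];

// Find the chunk whose range contains the key and return its shard.
function shardFor(key) {
  const chunk = chunks.find(c =>
    (c.min === -Infinity || key >= c.min) &&
    (c.max === Infinity || key < c.max));
  return chunk.shard;
}

const routed = ["bern", "lucerne", "zurich"].map(shardFor);
console.log(routed);
```

Because neighbouring keys land on the same shard, range queries on the shard key touch few servers, while the choice of key decides how evenly the data spreads.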

Sharding-Cluster

Map Reduce http://www.joelonsoftware.com/items/2006/08/01.html

• It is a two-step calculation where one step is used to simplify the data, and the second step is used to summarize the data
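
The two steps can be tried out with ordinary array functions. This is a minimal sketch in plain JavaScript with made-up data; it shows the idea of map followed by reduce and is not MongoDB's `mapReduce` command.

```javascript
// Hypothetical input documents.
const people = [
  { LastName: "Rudin", Age: 40 },
  { LastName: "Roth", Age: 30 },
  { LastName: "Meier", Age: 20 }
];

// Step 1 (map): simplify each document to a key/value pair.
const mapped = people.map(p => ({ key: p.LastName[0], value: p.Age }));

// Step 2 (reduce): summarize all values that share a key.
const sumByLetter = mapped.reduce((acc, { key, value }) => {
  acc[key] = (acc[key] || 0) + value;
  return acc;
}, {});

console.log(sumByLetter);
```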

Map Reduce Sample

Map Reduce using LINQ https://github.com/craiggwilson/fluent-mongo/wiki/Map-Reduce

• LINQ is by far an easier way to compose map-reduce functions.

// Compose a map reduce to get the sum of everyone's ages.
var sum = collection.AsQueryable().Sum(x => x.Age);

// Compose a map reduce to get the age range of everyone grouped by the first letter of their last name.
var ageRanges = from p in collection.AsQueryable()
                group p by p.LastName[0] into g
                select new
                {
                    FirstLetter = g.Key,
                    AverageAge = g.Average(x => x.Age),
                    MinAge = g.Min(x => x.Age),
                    MaxAge = g.Max(x => x.Age)
                };

Store large Files: GridFS

• The database supports native storage of binary data within BSON objects (limited in size to 4–16 MB, depending on the MongoDB version).

• GridFS is a specification for storing large files in MongoDB

• Comparable to Amazon S3 online storage service when using it in combination with replication and sharding
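
The idea behind storing a large file is to split it into numbered chunks that each fit into one document. The following sketch uses Node.js `Buffer` and is an illustration only: the helpers `toChunks` and `fromChunks` are hypothetical, the chunk size of 4 bytes is chosen purely for the demo, and the field names `files_id`, `n` and `data` follow the GridFS chunk layout.

```javascript
// Split a buffer into numbered chunks; chunkSize is a demo parameter.
function toChunks(fileId, data, chunkSize) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < data.length; offset += chunkSize, n++) {
    chunks.push({ files_id: fileId, n: n, data: data.slice(offset, offset + chunkSize) });
  }
  return chunks;
}

// Reassemble by sorting on the chunk number and concatenating.
function fromChunks(chunks) {
  const sorted = chunks.slice().sort((a, b) => a.n - b.n);
  return Buffer.concat(sorted.map(c => c.data));
}

const file = Buffer.from("0123456789");
const parts = toChunks("f1", file, 4);
const restored = fromChunks(parts.slice().reverse()).toString();
console.log(parts.length, restored);
```

Since every chunk is an ordinary document, the chunks can be replicated and sharded like any other data.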

Performance

On MySQL, SourceForge was reaching its performance limits at its current user load. Using some of the easy scale-out options in MongoDB, they fully replaced MySQL and found that MongoDB could handle the current user load easily. In fact, after some testing, they found their site can now handle 100 times the number of users it currently supports.

It means you can charge a lot less per user of your application and get the same revenue. Think about it.

Performance http://www.michaelckennedy.net/blog/2010/04/29/MongoDBVsSQLServer2008PerformanceShowdown.aspx

• It’s the inserts where the differences are most obvious between MongoDB and SQL Server (about 30x-50x faster than SQL Server)

Administration: MongoVUE (Windows)

Administration: Monitoring

• MongoDB Monitoring Service

NOSQL USE CASES

Use Cases: Well suited

• Archiving and event logging

• Document and Content Management Systems

• E-Commerce

• Gaming: high-performance small reads/writes, geospatial indexes

• High volume problems

• Mobile. Specifically, the server-side infrastructure of mobile systems

• Projects using iterative/agile development methodologies

• Real-time stats/analytics

Use Cases: Less Well Suited

• Systems with a heavy emphasis on complex transactions such as banking systems and accounting (multi-object transactions)

• Traditional non-realtime data warehousing

• Problems requiring SQL

Questions?

roger.rudin@bbv.ch
