.NET User Group Bern
Roger Rudin
bbv Software Services AG
roger.rudin@bbv.ch
Agenda
– What is NoSQL
– Understanding the Motivation behind NoSQL
– MongoDB: A Document Oriented Database
– NoSQL Use Cases
What is NoSQL?
NoSQL = Not only SQL
NoSQL Definition http://nosql-database.org/
NoSQL DEFINITION: Next Generation Databases mostly addressing some of the points: being non-relational,
distributed, open-source and horizontally scalable. The original
intention has been modern web-scale databases. The
movement began early 2009 and is growing rapidly. Often
more characteristics apply, such as: schema-free, easy replication
support, simple API, eventually consistent / BASE (not ACID),
a huge amount of data, and more. So the misleading
term "nosql" (the community now translates it mostly with
"not only sql") should be seen as an alias to something like
the definition above.
Who Uses NoSQL?
• Twitter uses FlockDB/MySQL and Cassandra
• Cassandra is an open source project from Facebook
• Digg and Reddit use Cassandra
• bit.ly, foursquare, SourceForge, and the New York Times use MongoDB
• Adobe, Alibaba, and eBay use Hadoop
UNDERSTANDING THE MOTIVATION BEHIND NOSQL
Why SQL sucks..
• O/R mapping (also known as Impedance Mismatch)
• Data-Model changes are hard and expensive
• SQL databases are designed for high throughput, not low latency
• SQL databases do not scale out well
• Microsoft, Oracle, and IBM charge big bucks for databases
– And then you need to hire a database admin
• Take it from the context of Google, Twitter, Facebook and Amazon.
– Your databases are among the biggest in the world and nobody pays you for that feature
– Wasting profit!!!
What has NoSQL done?
• Implemented the most common use cases
as a piece of software
• Designed for scalability and performance
Visual Guide To NoSQL http://blog.nahurst.com/visual-guide-to-nosql-systems
NoSQL Data Models
• Key-Value
• Document-Oriented
• Column Oriented/Tabular
MONGODB: A DOCUMENT ORIENTED DATABASE
NoSQL Data Model: Document Oriented
• Data is stored as “documents”
– We are not talking about Word documents
– Comparable to Aggregates in DDD
• It means mostly schema-free structured data
– Can be queried
– Is easily mapped to OO systems (Domain Model, DDD)
– No joins; relations have to be implemented in application code
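A minimal sketch of these points in plain JavaScript, with in-memory arrays standing in for collections. The `users` and `posts` data and the `findAuthor` helper are invented for illustration and are not part of MongoDB.

```javascript
// Hypothetical in-memory "collections"; a real application would use a MongoDB driver.
const users = [
  { _id: 1, name: "Umbert" },
  { _id: 2, name: "Ada" },
];

// One document holds the whole aggregate: the post and its embedded comments.
const posts = [
  {
    _id: 100,
    title: "Hello NoSQL",
    authorId: 1,
    tags: ["nosql", "mongodb"],
    comments: [
      { authorId: 2, text: "Nice intro" },
      { authorId: 1, text: "Thanks" },
    ],
  },
];

// There is no server-side join, so the application resolves references itself.
function findAuthor(post) {
  return users.find((u) => u._id === post.authorId);
}

const post = posts[0];
const summary = {
  title: post.title,
  author: findAuthor(post).name,
  commentCount: post.comments.length,
};
console.log(summary);
```

The comments travel with the post in a single read; only the author lookup needs a second step in code.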
Network Communications
• REST/JSON
• TCP/BSON (ClientDriver)
BSON [bee · sahn], short for Binary JSON, is a binary-
encoded serialization of JSON-like documents.
Like JSON, BSON supports the embedding of
documents and arrays within other documents and
arrays. BSON also contains extensions that allow
representation of data types that are not part of the
JSON spec. For example, BSON has a Date type and a
BinData type.
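To see why the extra types matter, the sketch below round-trips a document through plain JSON in JavaScript. It does not use a BSON library; it only shows what JSON alone loses, which is the gap that BSON's Date and BinData types are meant to fill. The sample document is made up.

```javascript
// A document with a date and some binary data.
const doc = {
  title: "log entry",
  created: new Date(0),
  payload: new Uint8Array([1, 2, 3]),
};

// Round-trip through plain JSON text.
const roundTripped = JSON.parse(JSON.stringify(doc));

// The date comes back as a string, not as a Date.
console.log(doc.created instanceof Date, typeof roundTripped.created);

// The binary payload comes back as a plain object, not as a typed array.
console.log(roundTripped.payload instanceof Uint8Array);
```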
Client Drivers (Apache License)
• MongoDB currently has client support for the following programming languages:
• C
• C++
• Erlang
• Haskell
• Java
• Javascript
• .NET (C#, F#, PowerShell, etc.)
• Perl
• PHP
• Python
• Ruby
• Scala
Collections vs. Capped Collections (Table in SQL)
• Collections
– blog.posts
– blog.comments
– forum.users
– etc.
• Capped collections (ring buffer)
– Logging
– Caching
– Archiving
db.createCollection("log", {capped: true, size: <bytes>, max: <docs>});
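The ring-buffer behaviour can be sketched in plain JavaScript. The `CappedCollection` class below is a hypothetical in-memory stand-in that only honours a document limit (like `max`); it ignores the byte limit (`size`) and is not how MongoDB implements capped collections.

```javascript
// Hypothetical in-memory stand-in for a capped collection limited by document count.
class CappedCollection {
  constructor(max) {
    this.max = max;
    this.docs = [];
  }

  insert(doc) {
    this.docs.push(doc);
    // Once full, the oldest document is dropped to make room for the new one.
    if (this.docs.length > this.max) this.docs.shift();
  }

  find() {
    return [...this.docs]; // insertion order is preserved
  }
}

const log = new CappedCollection(3);
for (let i = 1; i <= 5; i++) log.insert({ msg: "event " + i });

const remaining = log.find().map((d) => d.msg);
console.log(remaining); // [ 'event 3', 'event 4', 'event 5' ]
```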
Indexes
• Every field in the document can be indexed
• Simple Indexes:
db.cities.ensureIndex({city: 1});
• Compound indexes:
db.cities.ensureIndex({city: 1, zip: 1});
• Unique indexes:
db.cities.ensureIndex({city: 1, zip: 1}, {unique: true});
• Sort order: 1 = ascending, -1 = descending
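A small sketch of what a key specification such as `{city: 1, zip: -1}` means for ordering, with 1 for ascending and -1 for descending. The `comparatorFor` helper is invented for illustration; it sorts an in-memory array and says nothing about how MongoDB stores its indexes.

```javascript
// Build a comparator from an index-style key specification such as {city: 1, zip: -1}.
function comparatorFor(spec) {
  const keys = Object.entries(spec);
  return (a, b) => {
    for (const [field, direction] of keys) {
      if (a[field] < b[field]) return -direction;
      if (a[field] > b[field]) return direction;
    }
    return 0; // equal on every key
  };
}

const cities = [
  { city: "Bern", zip: 3011 },
  { city: "Atlanta", zip: 30301 },
  { city: "Bern", zip: 3005 },
];

// city ascending, then zip descending within the same city
const ordered = [...cities].sort(comparatorFor({ city: 1, zip: -1 }));
const orderedKeys = ordered.map((c) => c.city + " " + c.zip);
console.log(orderedKeys); // [ 'Atlanta 30301', 'Bern 3011', 'Bern 3005' ]
```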
Relations
• ObjectId
db.users.insert(
{name: "Umbert", car_id: ObjectId("<GUID>")});
• DBRef
db.users.insert(
{name: "Umbert", car: new DBRef("cars", ObjectId("<GUID>"))});
db.users.findOne(
{name: "Umbert"}).car.fetch().name;
Queries (1)
Queries (Regular Expressions)
{field: /regular.*expression/i}
// count all cities that start with "atl" and end with "a" (e.g. atlanta)
db.cities.count({city: /^atl.*a$/i});
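Whether the pattern is anchored changes what matches. The sketch below runs both variants against a made-up list of names in plain JavaScript, which uses the same regular expression literal syntax as the shell.

```javascript
const names = ["Atlanta", "Atlantic City", "Mid-Atlantic Plaza", "Bern"];

// Unanchored: "atl" followed later by "a" may occur anywhere inside the name.
const unanchored = names.filter((n) => /atl.*a/i.test(n));

// Anchored: the name must start with "atl" and end with "a".
const anchored = names.filter((n) => /^atl.*a$/i.test(n));

console.log(unanchored); // [ 'Atlanta', 'Atlantic City', 'Mid-Atlantic Plaza' ]
console.log(anchored);   // [ 'Atlanta' ]
```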
Queries (2) : LINQ https://github.com/craiggwilson/fluent-mongo
Equals
x => x.Age == 21 will translate to {"Age": 21}
Greater Than, $gt:
x => x.Age > 18 will translate to {"Age": {$gt: 18}}
Greater Than Or Equal, $gte:
x => x.Age >= 18 will translate to {"Age": {$gte: 18}}
Less Than, $lt:
x => x.Age < 18 will translate to {"Age": {$lt: 18}}
Less Than Or Equal, $lte:
x => x.Age <= 18 will translate to {"Age": {$lte: 18}}
Not Equal, $ne:
x => x.Age != 18 will translate to {"Age": {$ne: 18}}
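To make the translated query documents concrete, here is a tiny evaluator for the operators listed above, written in plain JavaScript. The `matches` function and the sample data are hypothetical; this is neither fluent-mongo nor the server's query engine, only a sketch of what each operator means.

```javascript
// Hypothetical evaluator for the comparison operators listed above.
const operators = {
  $gt: (value, bound) => value > bound,
  $gte: (value, bound) => value >= bound,
  $lt: (value, bound) => value < bound,
  $lte: (value, bound) => value <= bound,
  $ne: (value, bound) => value !== bound,
};

function matches(doc, query) {
  return Object.entries(query).every(([field, condition]) => {
    const value = doc[field];
    if (condition !== null && typeof condition === "object") {
      // Every operator in the condition must hold, e.g. {$gte: 18, $lt: 65}.
      return Object.entries(condition).every(([op, bound]) => operators[op](value, bound));
    }
    return value === condition; // plain equality, e.g. {"Age": 21}
  });
}

const people = [
  { Name: "Ann", Age: 17 },
  { Name: "Bob", Age: 18 },
  { Name: "Cid", Age: 21 },
];

const namesWhere = (query) => people.filter((p) => matches(p, query)).map((p) => p.Name);

const adults = namesWhere({ Age: { $gte: 18 } });
const exactly21 = namesWhere({ Age: 21 });
const not18 = namesWhere({ Age: { $ne: 18 } });
console.log(adults, exactly21, not18);
```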
Atomic Operations (Optimistic Locking)
• Update if current:
– Fetch the object.
– Modify the object locally.
– Send an update request that says "update the object to this new value if it still matches its old value".
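The three steps can be sketched with an in-memory store in plain JavaScript. The `store` map and the `updateIfCurrent` function are invented for illustration; with MongoDB the comparison would be expressed in the query part of an update.

```javascript
// Hypothetical in-memory store; updateIfCurrent mimics
// "update the object if it still matches its old value".
const store = new Map([["abc", { sku: "abc", qty: 1 }]]);

function updateIfCurrent(key, expected, replacement) {
  const current = store.get(key);
  // Compare the stored value with the snapshot the caller fetched earlier.
  if (JSON.stringify(current) !== JSON.stringify(expected)) return false;
  store.set(key, replacement);
  return true;
}

// 1. Two clients fetch the same object.
const seenByA = { ...store.get("abc") };
const seenByB = { ...store.get("abc") };

// 2. Each modifies a local copy, 3. then tries to write it back.
const aSucceeded = updateIfCurrent("abc", seenByA, { ...seenByA, qty: seenByA.qty - 1 });
const bSucceeded = updateIfCurrent("abc", seenByB, { ...seenByB, qty: seenByB.qty - 1 });

console.log(aSucceeded, bSucceeded, store.get("abc").qty); // true false 0
```

The second writer loses because the stored object no longer matches what it fetched; it would have to fetch again and retry.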
Atomic Operations: Sample
> t=db.inventory
> s = t.findOne({sku:'abc'})
{"_id" : "49df4d3c9664d32c73ea865a" , "sku" : "abc" , "qty" : 1}
> t.update({sku:"abc",qty:{$gt:0}}, { $inc : { qty : -1 } } ) ;
> db.$cmd.findOne({getlasterror:1})
{"err" : null , "updatedExisting" : true , "n" : 1 , "ok" : 1} // it worked
> t.update({sku:"abcz",qty:{$gt:0}}, { $inc : { qty : -1 } } ) ;
> db.$cmd.findOne({getlasterror:1})
{"err" : null , "updatedExisting" : false , "n" : 0 , "ok" : 1} // did not work
Atomic Operations: multiple items
db.products.update(
{cat: "boots", $atomic: 1},
{$inc: {price: 10.0}},
false, //no upsert
true //update multiple
);
Replica set (1)
• Automatic failover
• Automatic recovery of servers that were
offline
• Distribution over more than one
Datacenter
• Automatic nomination of a new Master
Server in case of a failure
• Up to 7 voting servers in one replica set
[Diagram: ReplicaSet with a member in state RECOVERING]
Replica set (2)
[Diagram: member states PRIMARY DOWN and PRIMARY]
Mongo Sharding
• Partitioning data across multiple physical servers to
provide application scale-out
• Can distribute databases, collections or objects in a
collection
• Choose how you partition data (shardkey)
• Balancing, migrations, management all automatic
• Range based
• Can convert from single master to sharded system with
0 downtime
• Often works in conjunction with object replication
(failover)
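Range-based partitioning can be sketched as a lookup from shard key value to shard. The chunk table, the shard names and the `shardFor` function below are hypothetical; in a real cluster the routing and balancing are handled by MongoDB itself.

```javascript
// Hypothetical chunk table: each shard owns a half-open range [min, max) of the shard key.
const chunks = [
  { min: "", max: "h", shard: "shard0" },
  { min: "h", max: "p", shard: "shard1" },
  { min: "p", max: "\uffff", shard: "shard2" },
];

// Find the chunk whose range contains the shard key value.
function shardFor(shardKeyValue) {
  const chunk = chunks.find((c) => shardKeyValue >= c.min && shardKeyValue < c.max);
  return chunk.shard;
}

const placement = ["bern", "houston", "zurich"].map((city) => [city, shardFor(city)]);
console.log(placement);
```

Because neighbouring key values land in the same chunk, range queries on the shard key touch few shards, while a poorly chosen key can send most writes to a single one.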
Sharding-Cluster
Map Reduce http://www.joelonsoftware.com/items/2006/08/01.html
• It is a two step calculation where one
step is used to simplify the data, and the
second step is used to summarize the
data
Map Reduce Sample
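A minimal sketch of the two steps in plain JavaScript, using array methods on made-up data rather than MongoDB's own mapReduce command.

```javascript
const people = [
  { lastName: "Meier", age: 30 },
  { lastName: "Rudin", age: 40 },
  { lastName: "Mueller", age: 50 },
];

// Step 1 (map): simplify each document to a key and the value of interest.
const mapped = people.map((p) => ({ key: p.lastName[0], value: p.age }));

// Step 2 (reduce): summarize all values that share a key.
const ageSumByLetter = mapped.reduce((acc, { key, value }) => {
  acc[key] = (acc[key] || 0) + value;
  return acc;
}, {});

console.log(ageSumByLetter); // { M: 80, R: 40 }
```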
Map Reduce using LINQ https://github.com/craiggwilson/fluent-mongo/wiki/Map-Reduce
• LINQ is by far an easier way to compose map-reduce functions.
// Compose a map reduce to get the sum of everyone's ages.
var sum = collection.AsQueryable().Sum(x => x.Age);

// Compose a map reduce to get the age range of everyone grouped by the first letter of their last name.
var ageRanges =
    from p in collection.AsQueryable()
    group p by p.LastName[0] into g
    select new
    {
        FirstLetter = g.Key,
        AverageAge = g.Average(x => x.Age),
        MinAge = g.Min(x => x.Age),
        MaxAge = g.Max(x => x.Age)
    };
Store large Files: GridFS
• The database supports native storage of binary data within BSON objects (limited in size 4 – 16 MB).
• GridFS is a specification for storing large files in MongoDB
• Comparable to Amazon S3 online storage service when using it in combination with replication and sharding
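GridFS gets around the document size limit by splitting a file into chunks that are stored as separate documents. The sketch below shows that idea in plain JavaScript. The `toChunks` and `fromChunks` helpers are invented, the chunk size is deliberately tiny and is not the GridFS default, and the field names `files_id`, `n` and `data` mirror the GridFS chunk layout as an assumption of this example.

```javascript
// Assumption for the demo only: a tiny chunk size so the split is easy to see.
const CHUNK_SIZE = 4;

// Split a byte array into numbered chunk documents.
function toChunks(fileId, bytes) {
  const chunks = [];
  for (let offset = 0, n = 0; offset < bytes.length; offset += CHUNK_SIZE, n++) {
    chunks.push({ files_id: fileId, n, data: bytes.slice(offset, offset + CHUNK_SIZE) });
  }
  return chunks;
}

// Reassemble the file by concatenating the chunks in order of n.
function fromChunks(chunks) {
  const ordered = [...chunks].sort((a, b) => a.n - b.n);
  return Uint8Array.from(ordered.flatMap((c) => Array.from(c.data)));
}

const file = Uint8Array.from([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
const fileChunks = toChunks("file-1", file);
const restored = fromChunks(fileChunks);
console.log(fileChunks.length, restored.join(","));
```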
Performance
On MySQL, SourceForge was reaching its performance limits at its current user load. Using some of the easy scale-out options in MongoDB, they fully replaced MySQL and found MongoDB could handle the current user load easily. In fact, after some testing, they found their site can now handle 100 times the number of users it currently supports.
It means you can charge a lot less per user of your application and get the same revenue. Think about it.
Performance http://www.michaelckennedy.net/blog/2010/04/29/MongoDBVsSQLServer2008PerformanceShowdown.aspx
• It’s the inserts where the differences are most
obvious between MongoDB and SQL Server
(about 30x-50x faster than SQL Server)
Administration: MongoVUE (Windows)
Administration: Monitoring
• MongoDB Monitoring Service
NOSQL USE CASES
Use Cases: Well suited
• Archiving and event logging
• Document and Content Management Systems
• E-Commerce
• Gaming. High performance small read/writes, geospatial indexes
• High volume problems
• Mobile. Specifically, the server-side infrastructure of mobile systems
• Projects using iterative/agile development methodologies
• Real-time stats/analytics
Use Cases: Less Well Suited
• Systems with a heavy emphasis on
complex transactions such as banking
systems and accounting (multi-object
transactions)
• Traditional Non-Realtime Data
Warehousing
• Problems requiring SQL
Questions?
roger.rudin@bbv.ch