Overview of MongoDB and Other Non-Relational Databases

A general overview of the non-relational database

By Andrew Kandels

My Minnesota PHP Usergroup (mnphp.org) presentation where I give an overview on MongoDB and other non-relational databases and their ability to solve unique, complex problems.

Page 2: Overview of MongoDB and Other Non-Relational Databases

When to Use an RDBMS?

Organized, structured data matched by common characteristics.

• Financial & Medical Records
• Personal Information
• Access Control (Usernames & Passwords)
• Order Processing
• Logistics
• Mailing Lists

… or, any data that works more efficiently when normalized

Page 3: Overview of MongoDB and Other Non-Relational Databases

What Relational Databases are Bad At

• Content Management System (CMS)
• Real-time Analytics
• Caching
• Logging and Archiving Events
• Messaging
• Job Queue
• Social Networking
• Data Mining and Warehousing

Page 4: Overview of MongoDB and Other Non-Relational Databases

When to Consider NoSQL?

• De-normalizing SQL as last resort
• Consistency can be sacrificed for scale
• Dynamic data models
• Tables storing meta-data
• BLOB tables storing serialized data!
• Very high writes, reads, or both
• Don’t have a DBA
• Temporary & volatile data

Caching layers are a band-aid for problems the RDBMS was never meant to handle.

Page 5: Overview of MongoDB and Other Non-Relational Databases

Brewer’s CAP Theorem

Consistency: Service operates fully or not at all. You either clicked “Place Order” or you didn’t.

Availability: Service is always available with no need for scheduled downtime or maintenance windows.

Partition Tolerance: No set of failures less than total network failure is allowed to cause the system to respond incorrectly.

Pick two.

Page 6: Overview of MongoDB and Other Non-Relational Databases

(CA) Consistency, Availability
• Relational Databases

Trouble with partitions & scale. Deal with it through replication.

(CP) Consistency, Partition-Tolerant
• MongoDB
• HBase
• Redis

Trouble with availability while staying consistent.

(AP) Availability, Partition-Tolerant
• CouchDB
• Cassandra
• Riak
• Voldemort

Trouble with consistency while staying available. Deal with it through eventual consistency.

Page 7: Overview of MongoDB and Other Non-Relational Databases

Non-Relational Databases

• Key/Value Stores

• Document Databases

• Graph Databases

• Big Data & Warehousing Databases

Page 8: Overview of MongoDB and Other Non-Relational Databases

Key/Value Store

Memcached
Simple, high-performance distributed memory object caching system.

Pros:
• Caching
• Rate limiting
• Real-time analytics

Cons:
• Serialization
• Replication
• Not fault tolerant

Redis
Advanced key-value store with support for hashes, lists, sets and sorted sets.

Pros:
• Disk-backed, persistent, journaled (fault tolerant)
• Replication out-of-the-box
• VERY fast reads/writes

Cons:
• Complex to query

Page 9: Overview of MongoDB and Other Non-Relational Databases

Key/Value Store

Cassandra
Very scalable, distributed and decentralized data store.

Pros:
• Extremely fast reads and writes (Twitter boasts 100k/second+)
• Massive, engaged open source community (Twitter, Facebook)
• Fault tolerant

Cons:
• Java (see: Riak, an Erlang/C alternative that’s very similar)
• Not production ready

Voldemort
LinkedIn’s distributed persistent caching solution.

Pros:
• Distributed storage
• In-memory with disk-backed persistence and fault tolerance (no single point of failure)
• Very fast reads and writes (10-20k/second)
• Drop-in storage layer (great for unit testing mock objects)
• MVCC
• Native Serialization (hash tables, arrays, etc.)

Page 10: Overview of MongoDB and Other Non-Relational Databases

Document Databases

MongoDB
Scalable, high performance database with familiar RDBMS functionality.

Pros:
• Semi-structured (hash tables, lists, dates, …)
• Full, range and nested Indexes
• Replication and distributed storage
• Query language and Map/Reduce
• GridFS file storage (NFS replacement)
• BSON Serialization
• Capped Collections

Cons:
• Map/Reduce is single process (soon to be resolved)

CouchDB
Portable, fault-tolerant document database.

Pros:
• Bi-directional replication (offline access)
• Some transaction support (ACID)

Cons:
• Complicated to query (Map/Reduce)

Page 11: Overview of MongoDB and Other Non-Relational Databases

Graph Databases

Neo4J
Designed on an object-oriented, flexible network structure rather than with strict and static tables. Ideal for social networking applications.

Pros:
• Read optimized
• Indexing
• Complex relationship tree processing

Page 12: Overview of MongoDB and Other Non-Relational Databases

Big Data & Warehouse Databases

HBase
The Hadoop database. For very large tables (billions of rows times millions of columns) on commodity hardware.

Pros:
• On-demand distributed processing (Map/Reduce)
• ETL optional
• Integrates tightly in Hadoop ecosystem (Pig, Hive, HDFS)

Cons:
• Slow, seconds or minutes (not milliseconds)

InfiniDB
Distributed column-oriented database.

Pros:
• Data warehousing (high speed data loader)
• Very fast queries and joins
• Analytics & Metrics

Cons:
• Slow Updates
• Schema designed up-front (hard to change later)

Page 13: Overview of MongoDB and Other Non-Relational Databases

My Two Cents

• Content Management System (CMS): MongoDB, CouchDB
• Real-time analytics: MongoDB, Cassandra (Rainbird)
• Page/Query Cache: Redis, Voldemort
• Logging and Archiving Events: MongoDB, Cassandra
• Messaging: Redis, Cassandra
• Job Queue: MongoDB (capped collections), Redis
• Social Networking: Neo4J, Cassandra
• Data Mining & Warehousing: HBase, InfiniDB
• Binary Storage (Files, Images, …): MongoDB
• Sessions: Redis, MongoDB

Page 14: Overview of MongoDB and Other Non-Relational Databases

Why Choose MongoDB?

• Semi-structured Data
• Native BSON Serialization
• Full Index Support
• Built-In Replication & Cluster Management
• Distributed Storage (Sharding)
• Easy to Query
• Fast In-Place Updates
• GridFS File Storage
• Capped Collections

MongoDB in many ways “feels” like an RDBMS. It’s easy to learn and quick to implement.

Page 15: Overview of MongoDB and Other Non-Relational Databases

Semi-Structured Data

MongoDB is NOT a key/value store. It stores complex documents built from arrays, hash tables, integers, objects and everything else supported by JSON.
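As a rough illustration of what "semi-structured" means here, the sketch below builds one nested document as a plain Python dictionary and round-trips it through JSON. The field names and values are hypothetical, and the standard `json` module stands in for a real driver, so this shows the shape of the data rather than how MongoDB stores it.

```python
import json
from datetime import datetime, timezone

# Hypothetical user document: every field name and value is illustrative.
user = {
    "username": "jdoe",
    "visits": 42,                                         # integer
    "tags": ["php", "mongodb"],                           # array
    "address": {"city": "Minneapolis", "state": "MN"},    # nested hash table
    "last_login": datetime(2011, 1, 15, tzinfo=timezone.utc),
}


def to_json(document):
    """Serialize a document to JSON text.

    JSON has no date type, so dates are written as ISO 8601 strings here.
    BSON, by contrast, carries a native date type.
    """
    return json.dumps(document, default=lambda value: value.isoformat(), sort_keys=True)


encoded = to_json(user)
decoded = json.loads(encoded)

# Nested values stay addressable after the round trip.
print(decoded["address"]["city"])
print(decoded["tags"])
```

The whole record travels as one document, so no join is needed to fetch the address or the tags along with the user.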

Page 16: Overview of MongoDB and Other Non-Relational Databases

Native BSON Serialization

Chart: 100,000 serialize/de-serialize runs of bson_encode(), json_encode() and serialize() in PHP, with one bar each for BSON, JSON and PHP (vertical axis from 0 to 1.2).

The PHP MongoDB extension serializes the data in C, outside of the runtime, leading to even better results.

Page 17: Overview of MongoDB and Other Non-Relational Databases

Full Index Support

Page 18: Overview of MongoDB and Other Non-Relational Databases

Built-In Replication & Cluster Management

• Data redundancy
• Fault tolerant (automatic failover AND recovery)
• Consistency (wait-for-propagate or write-and-forget)
• Distribute read load
• Simplified maintenance
• Servers in the cluster managed by an elected leader

Page 19: Overview of MongoDB and Other Non-Relational Databases

Easy to Query

Page 20: Overview of MongoDB and Other Non-Relational Databases

Fast In-Place Updates

MongoDB stores documents in padded memory slots. Typical RDBMS updates on VARCHAR columns:

• Mark the row and index as deleted (without freeing the space)
• Append the new updated row
• Append the new index and possibly rebuild the tree

Most updates are small and don’t drastically change the size of the row:

• Last login date
• UUID replace / Password update
• Session cookie
• Counters (failed login attempts, visits)

MongoDB can apply most updates over the existing row, keeping the index and data structure relatively untouched, and do so VERY FAST.
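The idea of a padded slot can be sketched with a toy in-memory store. This is a simplified model under stated assumptions, not MongoDB's storage engine: the `PaddedStore` class, its `padding_factor` parameter and the `moves` counter are all hypothetical names chosen for illustration.

```python
class PaddedStore:
    """Toy record store: each record gets a slot larger than its data."""

    def __init__(self, padding_factor=1.5):
        self.padding_factor = padding_factor  # assumed padding ratio
        self.slots = []     # one fixed-capacity bytearray per slot
        self.lengths = []   # bytes actually used in each slot
        self.index = {}     # key -> slot position
        self.moves = 0      # how many updates had to relocate a record

    def insert(self, key, value):
        capacity = max(1, int(len(value) * self.padding_factor))
        slot = bytearray(capacity)
        slot[:len(value)] = value
        self.slots.append(slot)
        self.lengths.append(len(value))
        self.index[key] = len(self.slots) - 1

    def update(self, key, value):
        """Return True if the update was applied in place."""
        position = self.index[key]
        slot = self.slots[position]
        if len(value) <= len(slot):
            # Fits in the padding: overwrite the bytes, leave the index alone.
            slot[:len(value)] = value
            self.lengths[position] = len(value)
            return True
        # Too big: append a new slot and repoint the index.
        # The old slot stays behind as dead space.
        self.moves += 1
        self.insert(key, value)
        return False

    def get(self, key):
        position = self.index[key]
        return bytes(self.slots[position][:self.lengths[position]])


store = PaddedStore()
store.insert("u1", b"logins=9")                        # 8 bytes in a 12-byte slot
small = store.update("u1", b"logins=10")               # grows by one byte
large = store.update("u1", b"logins=10;bio=" + b"x" * 50)
print(small, large, store.moves)
```

Small updates such as a counter or a login date land in the first branch, which is the cheap path the slide describes. Only the oversized update pays for a move and an index change.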

Page 21: Overview of MongoDB and Other Non-Relational Databases

GridFS File Storage

Efficiently store binary files in MongoDB:

• Videos
• Pictures
• Translations
• Configuration files

Files are split into chunks, each stored as its own document within the 4 or 16 MB document size limit (depending on the MongoDB version), and stored redundantly in your MongoDB network.

• No serialization / fast reads
• Command line and PHP extension access
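The chunking idea can be sketched in a few lines. The keys `files_id`, `n` and `data` follow the field names GridFS uses for its chunk documents; the helper functions, the file id and the tiny chunk size are assumptions made for the example, and nothing here talks to a database.

```python
def split_into_chunks(file_id, data, chunk_size):
    """Split raw bytes into ordered chunk documents."""
    return [
        {"files_id": file_id, "n": n, "data": data[offset:offset + chunk_size]}
        for n, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Rebuild the original bytes, whatever order the chunks arrive in."""
    ordered = sorted(chunks, key=lambda chunk: chunk["n"])
    return b"".join(chunk["data"] for chunk in ordered)


# A 4-byte chunk size keeps the example readable; real chunks are far larger.
chunks = split_into_chunks("f1", b"abcdefghij", 4)
restored = reassemble(list(reversed(chunks)))
print(len(chunks), restored)
```

Because each chunk carries its own sequence number, chunks can be stored and fetched independently and a reader can seek to part of a file without loading all of it.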

Page 22: Overview of MongoDB and Other Non-Relational Databases

Capped Collections

Fixed-size, round-robin tables with extremely fast reads and writes.

Perfect for:
• Logging
• Messaging
• Job Queues
• Caching

Features:
• Automatically “ages out” old data
• Can also query, delete and update out of FIFO order
• FIFO reads/writes are nearly as fast as cat > file; tail -f file
• Tailable cursor stays open and reads rows as they are added
• Persistent, fault-tolerant, distributed
• Atomically pop items off the stack
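The "ages out" behaviour can be imitated in memory with a bounded deque. This is only an analogy: the `CappedLog` class is hypothetical, it caps by entry count for simplicity, and it has none of the persistence or distribution listed above.

```python
from collections import deque


class CappedLog:
    """In-memory stand-in for a fixed-size, round-robin collection."""

    def __init__(self, max_entries):
        # A deque with maxlen silently drops the oldest item when full.
        self._entries = deque(maxlen=max_entries)

    def append(self, entry):
        self._entries.append(entry)

    def pop_oldest(self):
        """Remove and return the oldest surviving entry (FIFO order)."""
        return self._entries.popleft()

    def __iter__(self):
        return iter(self._entries)

    def __len__(self):
        return len(self._entries)


log = CappedLog(max_entries=3)
for event_number in range(5):
    log.append(event_number)

remaining = list(log)       # the two oldest events have aged out
oldest = log.pop_oldest()
print(remaining, oldest)
```

Writes never need a cleanup pass because the structure reclaims the oldest slot itself, which is why this shape suits logs and queues.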

Page 23: Overview of MongoDB and Other Non-Relational Databases

Object Document Mapper

doctrine-project.org/projects/mongodb_odm

The Doctrine MongoDB Object Document Mapper is built for PHP 5.3.2+ and provides transparent persistence for PHP objects.

The PHP MongoDB extension is simple, but the ODM makes things even easier:

• Document generation seamlessly from your class
• Query using your existing class structures
• Very easy migration path from an ORM
• Rapid Application Development