
MongoDB NoSQL database a deep dive -MyWhitePaper


Page 1: MongoDB  NoSQL database a deep dive -MyWhitePaper

Topic: NoSQL Database - MongoDB

Presenter: Rajesh Kumar, Sr. Data Architect, Big Data Analytics & Information Management

Agenda:

• What is NoSQL, why NoSQL

• The different types of NoSQL databases and data model approach

• Detailed overview of one of the most popular NoSQL databases: MongoDB

• Model: document-oriented database

• JSON

• CRUD operations

• Model data in MongoDB

• Data model design considerations

• Indexing

• Sharding

• Replication

• Use cases

• Reference architecture

• Insurance conceptual data model

Page 2: MongoDB  NoSQL database a deep dive -MyWhitePaper

Relational databases have served well, but...

Relational databases have been excellent, but the world of data is changing rapidly. The amount of data created each year is almost doubling, a kind of data explosion. And this data is not simply transactional, structured data. It includes new types of data generated from web logs, documents, clickstreams, devices, sensors, and other IoT sources.

Traditional RDBMS systems were not designed to handle the volume, variety, and velocity of this semi-structured and unstructured data. A traditional RDBMS often cannot provide the scalability, performance, and flexibility needed for modern distributed data storage and processing.

Page 3: MongoDB  NoSQL database a deep dive -MyWhitePaper

MongoDB

A document-based database

Page 4: MongoDB  NoSQL database a deep dive -MyWhitePaper

MongoDB - A NoSQL DB

Page 5: MongoDB  NoSQL database a deep dive -MyWhitePaper

What is NoSQL (Not Only SQL)?

non-relational

distributed

schema-free

flexible

horizontally scalable

open source

simple API

Page 6: MongoDB  NoSQL database a deep dive -MyWhitePaper

Why NoSQL?

Support for distributed platforms in the age of Big Data

Ability to deal effectively with all kinds of data formats: images, docs, streaming, text, web, geospatial, sensor, machine, real-time operational

Scalability and performance (low latency and faster data access)

Rapid scale: scale out as much as the business needs to support more users and growing data

24x7 data availability and global deployment

Data to support next-gen high-performance apps

Real-time reporting and analytics (predictive analytics, machine learning) beyond what data warehouses support

Lowers data management cost

Page 7: MongoDB  NoSQL database a deep dive -MyWhitePaper

Types of NoSQL Databases

Key/value store: Memcached, DynamoDB

Column store: Cassandra, HBase

Document store: MongoDB, CouchDB, DynamoDB

Graph store: Neo4j

Multi-model databases: DynamoDB, CouchDB

MongoDB is a document-oriented database.

Its data structure is composed of key/value pairs in a JSON-like format.

Page 8: MongoDB  NoSQL database a deep dive -MyWhitePaper

What is MongoDB?

An open-source, document-oriented NoSQL database that provides high performance, automatic scaling, and flexible schema design.

Page 9: MongoDB  NoSQL database a deep dive -MyWhitePaper

MongoDB fulfills both traditional and new requirements

Page 10: MongoDB  NoSQL database a deep dive -MyWhitePaper

NoSQL but fully featured

Page 11: MongoDB  NoSQL database a deep dive -MyWhitePaper

A quick recap of MongoDB characteristics

Distributed, document-oriented NoSQL database

MongoDB stores data as JSON documents, represented as BSON

Dynamic and flexible schema

Horizontal scaling, easy to scale

Supports a rich query language

Supports CRUD for read and write operations

Support for text search and geospatial queries

Efficient text and geospatial indexes

Very strong sharding and replication

_id: a special key assigned to each document

_id is unique across a collection

Page 12: MongoDB  NoSQL database a deep dive -MyWhitePaper

A record in MongoDB is a document, which is a data structure composed of field (key) and value pairs. The values of fields may include other nested documents, arrays, and arrays of documents.

Page 13: MongoDB  NoSQL database a deep dive -MyWhitePaper

MongoDB Data Model

MongoDB stores documents in JSON (actually BSON)

JSON is short for JavaScript Object Notation

BSON is a binary serialization of JSON-like objects

A JSON object is a set of key-value pairs ("key" : "value") enclosed in curly braces { }

Document creation is free from schema: no structure, data type, or size needs to be predefined. You can create as many fields as you require dynamically.

Data types supported by JSON/BSON in MongoDB: strings, numbers (integer, long, double), objects, arrays, Boolean (true/false), null, date, timestamp.

Other constructs in MongoDB are databases, collections, documents, and fields

Page 14: MongoDB  NoSQL database a deep dive -MyWhitePaper

MongoDB data model core concepts

Database: In MongoDB, a database is a physical container for collections, each of which holds documents.

Collection: A collection is a container of documents; the documents can have any structure.

Document: A document is a group of fields in key/value pairs, free from a fixed schema, tables, or columns; a document can hold any type of data.

Think of collections and documents as tables and rows in an RDBMS

Hierarchical

A document can reference other documents

A document can contain other embedded documents, arrays, and arrays of documents
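A minimal sketch of the referencing style listed above, written as plain JavaScript objects rather than live shell commands. The collection contents and field names (customers, policies, customerId) are hypothetical, and the lookup is done in memory to stand in for the second query an application would issue.

```javascript
// Hypothetical data: collection and field names are illustrative only.
const customers = [
  { _id: 1, name: "Asha", address: { city: "Pune", zip: "411001" } } // embedded document
];
const policies = [
  { _id: 101, type: "auto", customerId: 1 }, // reference to customers._id
  { _id: 102, type: "home", customerId: 1 }
];

// Resolve a reference by hand, the way an application would with a second query.
function findCustomerForPolicy(policy) {
  return customers.find((c) => c._id === policy.customerId) || null;
}

const owner = findCustomerForPolicy(policies[0]);
console.log(owner.name, owner.address.city);
```

Embedding keeps related data in one read; referencing avoids duplicating the customer in every policy. Which one fits depends on how the application reads the data.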

Page 15: MongoDB  NoSQL database a deep dive -MyWhitePaper

Collection and Document

Page 16: MongoDB  NoSQL database a deep dive -MyWhitePaper

MongoDB Data Model - A Document Store Model

Not PDF, Word, CSV, or HTML: documents are nested structures created using JavaScript Object Notation (JSON). Think of a document as a record. In the example below, let's see what a document looks like in MongoDB.
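As an illustration of such a nested structure, here is a hedged sketch of a single document written as a JavaScript object. Every field name and value is made up for illustration; it is not taken from the original slide.

```javascript
// Hypothetical customer document; every field name here is illustrative.
const customer = {
  _id: 1001,
  name: "Asha Rao",
  active: true,
  address: { city: "Pune", country: "IN" },   // nested document
  phones: ["020-555-0101", "020-555-0102"],   // array
  policies: [                                 // array of documents
    { type: "auto", premium: 420.5 },
    { type: "home", premium: 310 }
  ]
};

// The same document as JSON text, plus a value computed from the nested array.
const asJson = JSON.stringify(customer);
const totalPremium = customer.policies.reduce((sum, p) => sum + p.premium, 0);
console.log(asJson);
console.log(totalPremium);
```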

Page 17: MongoDB  NoSQL database a deep dive -MyWhitePaper

MongoDB document types

Page 18: MongoDB  NoSQL database a deep dive -MyWhitePaper

MongoDB system components

COMPONENTS

mongod - The database process.

mongo - The database shell (interactive JavaScript): the command-line shell for interacting directly with the database.

mongos - Sharding router

UTILITIES

mongostat - Show performance statistics

mongofiles - Utility for putting and getting files from MongoDB GridFS

mongoimport - Import into MongoDB from JSON or CSV

mongoexport - Export a single collection (JSON, CSV)

Page 19: MongoDB  NoSQL database a deep dive -MyWhitePaper

Basic Mongo Shell commands

MongoDB stores documents in collections. If a collection does not exist, MongoDB creates the collection when you first store data for that collection.

Select/create database:

>use customerdb

Show the current database:

>db

List databases:

>show dbs

local 0.78125GB

test 0.23012GB

customerdb

myDB

Create collection:

>db.createCollection("products")

List the collections already created:

>show collections

Page 20: MongoDB  NoSQL database a deep dive -MyWhitePaper

Data Manipulation: Create & Read operations

Page 21: MongoDB  NoSQL database a deep dive -MyWhitePaper

Data manipulation: frequently used methods

The createCollection() Method

>db.createCollection(name, options)

The drop() Method

MongoDB's db.collection.drop() is used to drop a collection from the database.

Rename Collection:

>db.collection.renameCollection("NewColName")

>db.cusstomer.renameCollection("Customer")

The insert() Method

>db.COLLECTION_NAME.insert(document)

Query documents using the find() method:

>db.COLLECTION_NAME.find()

The update() Method

>db.COLLECTION_NAME.update(SELECTION_CRITERIA, UPDATED_DATA)

>db.col.update({"title": "MongoDB"}, {$set: {"title": "MongoDB Definitive Guide"}})

The remove() Method

>db.col.remove({"title": "MongoDB"})

The sort() Method

>db.COLLECTION_NAME.find().sort({KEY:1})

The sort order is given as 1 or -1: 1 sorts in ascending order, -1 in descending order.
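The 1 / -1 convention can be sketched with an in-memory stand-in for find().sort(). The helper sortDocs and the sample books are hypothetical and only mimic the direction semantics for a single key; they are not a MongoDB API.

```javascript
// In-memory stand-in for find().sort({KEY: direction}); not a MongoDB API.
function sortDocs(docs, spec) {
  const [key, direction] = Object.entries(spec)[0]; // single-key sort only
  return [...docs].sort((a, b) => {
    if (a[key] < b[key]) return -1 * direction;
    if (a[key] > b[key]) return 1 * direction;
    return 0;
  });
}

// Hypothetical sample documents.
const books = [
  { title: "B", year: 2013 },
  { title: "A", year: 2010 },
  { title: "C", year: 2016 }
];
console.log(sortDocs(books, { year: 1 }).map((d) => d.year));  // ascending
console.log(sortDocs(books, { year: -1 }).map((d) => d.year)); // descending
```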

Page 22: MongoDB  NoSQL database a deep dive -MyWhitePaper

Basic DB operations in a complex document

Find operation

Querying in embedded object

Comparison operators

Querying in arrays of document

Indexing on embedded document

Indexing on multiple key

Page 23: MongoDB  NoSQL database a deep dive -MyWhitePaper

Model Your Data

Terminology:

Page 24: MongoDB  NoSQL database a deep dive -MyWhitePaper

Example Schema.

Model Data in MongoDB: Model your data the way it is used.

Page 25: MongoDB  NoSQL database a deep dive -MyWhitePaper

Let's model some more data...

Page 26: MongoDB  NoSQL database a deep dive -MyWhitePaper

Some schema design considerations

What is the priority?

High consistency

High read performance

High write performance

ODS application

Real time

How does the application access and manipulate data?

Data access path and types of queries

Read versus write ratio

Analytics (aggregation, video, images, machine, geospatial data)

Page 27: MongoDB  NoSQL database a deep dive -MyWhitePaper

Indexes

Indexes are special data structures that store a subset of your data in an efficient way for easier and faster access to the data.

MongoDB stores indexes in a B-tree format, which allows efficient traversal of the index content.

Proper index selection is important in MongoDB and helps the database run optimally; improper indexing can cause problems with read and write operations and with data distribution across a sharded cluster.

Index types:

_id

Simple

Compound

Multi-key

Full text

Geo-spatial

Hashed

Page 28: MongoDB  NoSQL database a deep dive -MyWhitePaper

Index continued..

The _id index: It is automatically created, immutable, and can't be removed.

This is the same as a primary key in an RDBMS.

The default value is a 12-byte ObjectId:

4-byte timestamp, 3-byte machine id, 2-byte process id, 3-byte counter (newer MongoDB releases replace the machine and process ids with a 5-byte random value)
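Because the first 4 bytes of an ObjectId are a timestamp, the creation time can be recovered from the id itself. The sketch below assumes the id is given as its 24-character hex string and uses no MongoDB driver; the function name objectIdTimestamp and the sample id are hypothetical.

```javascript
// Extract the creation time encoded in the first 4 bytes of an ObjectId.
// Takes the 24-character hex form; no MongoDB driver is required.
function objectIdTimestamp(hex) {
  if (!/^[0-9a-fA-F]{24}$/.test(hex)) {
    throw new Error("expected a 24-character hex string (12 bytes)");
  }
  const seconds = parseInt(hex.slice(0, 8), 16); // 4-byte big-endian seconds since epoch
  return new Date(seconds * 1000);
}

// Hypothetical sample value used only for illustration.
const sampleId = "507f1f77bcf86cd799439011";
console.log(objectIdTimestamp(sampleId).toISOString());
```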

Simple index: A simple index is an index on a single key.

Compound index: A compound index is created over two or more fields in a document.

Multi-key index: A multi-key index is an index created on a field that contains an array.

Full-text search index: This is an index over a text-based field, similar to how Google indexes web pages, e.g. find all tweets from the last 30 days that mention auto insurance, or search for Big Data in a blog post.

Geo-spatial index: This index supports efficient queries on geospatial coordinate data. It is used when you need to query location-based spatial data. It is a valuable feature because location data is among the most useful data being collected today, for example for location-based customer targeting and location-based product analysis, e.g. find all customers that live within 50 miles of NY.

Hashed index: Used mainly in hash-based sharding; it allows a more randomized distribution of data across shards.
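To see why hashing spreads sequential keys across shards, consider this rough sketch. It uses the FNV-1a hash purely for brevity; this is an assumption for illustration and is not the hash function MongoDB uses, and real hashed sharding assigns hash ranges to chunks rather than taking a simple modulus. The names fnv1a and shardFor are hypothetical.

```javascript
// FNV-1a 32-bit hash: a stand-in chosen for brevity, NOT MongoDB's hash function.
// Treats each UTF-16 code unit as one input value, which is fine for ASCII keys.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force unsigned 32-bit
}

// Map a shard-key value to one of `shardCount` shards.
function shardFor(keyValue, shardCount) {
  return fnv1a(String(keyValue)) % shardCount;
}

// Sequential keys (hypothetical customer ids) land on different shards.
const counts = [0, 0, 0, 0];
for (let id = 1; id <= 1000; id++) {
  counts[shardFor(id, 4)] += 1;
}
console.log(counts);
```

With a range-based key, ids 1 to 1000 would all fall into the same range; hashing trades that locality for a more even write distribution.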

Create index syntax (createIndex replaces the older ensureIndex helper):

db.employee.createIndex({"email": 1}, {"unique": true})

db.employee.createIndex({"age": 1}, {"sparse": true})

db.employee.find({age: {$gte: 25}})

Page 29: MongoDB  NoSQL database a deep dive -MyWhitePaper

Index continued..

Index properties:

TTL index: TTL indexes are special indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time.

Sparse index: The sparse property of an index ensures that the index only contains entries for documents that have the indexed field. The index skips documents that do not have the indexed field.

Unique index: Enforces uniqueness of the indexed field's values.
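The sparse property can be illustrated with a small in-memory model. This is only a sketch of the behaviour described above, not MongoDB's index implementation; buildIndex and the sample employees are hypothetical.

```javascript
// Build in-memory index entries for one field; illustrative only, not MongoDB internals.
function buildIndex(docs, field, { sparse = false } = {}) {
  const entries = [];
  for (const doc of docs) {
    const hasField = Object.prototype.hasOwnProperty.call(doc, field);
    if (sparse && !hasField) continue; // sparse: skip documents without the field
    entries.push({ key: hasField ? doc[field] : null, id: doc._id });
  }
  return entries;
}

// Hypothetical sample documents.
const employees = [
  { _id: 1, email: "a@example.com", age: 31 },
  { _id: 2, email: "b@example.com" }, // no age field
  { _id: 3, email: "c@example.com", age: 25 }
];
console.log(buildIndex(employees, "age").length);                   // regular index
console.log(buildIndex(employees, "age", { sparse: true }).length); // sparse index
```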

Text search index:

MongoDB provides text indexes to support text search queries on text content. To perform text search queries, you must have a text index on your collection. A collection can only have one text search index, but that index can cover multiple fields.

Creating a text search index over the "title" and "content" fields:

db.blogpost.createIndex( { title: "text", content: "text" } )

Use the $text query operator to perform text searches on a collection with a text index.

$text performs a logical OR search on all terms in the search string.

For example, we can use the following queries to find the terms MongoDB and BigData in the blogpost collection.

db.blogpost.find( { $text: { $search: "MongoDB" } } )

db.blogpost.find( { $text: { $search: "BigData" } } )
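The OR behaviour of a multi-term search string can be sketched in plain JavaScript. This is a deliberately naive model: the real $text operator also applies stemming, stop-word removal, and relevance scoring. The function textSearch and the sample posts are hypothetical.

```javascript
// Naive OR-style term matching; real $text also applies stemming, stop words, and scoring.
function textSearch(docs, fields, search) {
  const terms = search.toLowerCase().split(/\s+/).filter(Boolean);
  return docs.filter((doc) => {
    const words = fields
      .map((f) => String(doc[f] || "").toLowerCase())
      .join(" ")
      .split(/\W+/);
    return terms.some((t) => words.includes(t)); // match if ANY term appears
  });
}

// Hypothetical sample documents.
const blogposts = [
  { _id: 1, title: "MongoDB basics", content: "Documents and collections" },
  { _id: 2, title: "Scaling", content: "BigData needs sharding" },
  { _id: 3, title: "Cooking", content: "Pasta recipes" }
];
console.log(textSearch(blogposts, ["title", "content"], "MongoDB BigData").map((d) => d._id));
```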

Deleting a text index: To delete an existing text index, first find the name of the index using the following query:

>db.blogpost.getIndexes()

Now you can drop the text index:

>db.blogpost.dropIndex("title_text_content_text")

Page 30: MongoDB  NoSQL database a deep dive -MyWhitePaper

Text indexes to support text search analytics - by example

Page 31: MongoDB  NoSQL database a deep dive -MyWhitePaper

MongoDB Sharding

Sharding is a method for storing data across multiple machines in a clustered computing environment. MongoDB uses sharding to support deployments with very large data sets and high-throughput operations.

Purpose of sharding

When a database system grows very large, the capacity of a single server can be strained by the increased workload and by many concurrent users demanding high throughput. Beyond a certain point you can't keep scaling vertically by adding more CPU, RAM, and storage; vertical scaling has limits.

In contrast, sharding scales horizontally: it divides the data set and distributes the data over multiple shard servers. Each shard works as an independent database, and collectively all the shards make up a single logical database.

Sharding reduces the amount of data that each server needs to store. As data grows you can add more shards to the cluster, and each shard stores less data as the cluster grows.

For example, if a database has a 1 terabyte data set and there are 4 shards, then each shard might hold only 256GB of data. If there are 40 shards, then each shard might hold only about 25GB of data.
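The arithmetic behind that example can be written out as a small helper. It assumes data is balanced perfectly across shards and takes 1 terabyte as 1024GB; gbPerShard is a hypothetical name.

```javascript
// Even-split estimate: assumes data is perfectly balanced across shards.
function gbPerShard(totalGb, shardCount) {
  if (!Number.isInteger(shardCount) || shardCount <= 0) {
    throw new RangeError("shardCount must be a positive integer");
  }
  return totalGb / shardCount;
}

const oneTerabyteInGb = 1024; // assumption: 1 TB = 1024 GB
console.log(gbPerShard(oneTerabyteInGb, 4));
console.log(gbPerShard(oneTerabyteInGb, 40));
```

In practice the split is rarely this even, since it depends on the shard key and on how chunks are balanced.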

Page 32: MongoDB  NoSQL database a deep dive -MyWhitePaper

Shards in MongoDB Architecture

Page 33: MongoDB  NoSQL database a deep dive -MyWhitePaper

Replication

The primary accepts all write operations. The secondaries then replicate the oplog and apply it to their data sets.

Page 34: MongoDB  NoSQL database a deep dive -MyWhitePaper

Replication continued..

Replica set members

A replica set in MongoDB is a group of mongod processes that provide redundancy and high availability. The members of a replica set are:

Primary: It receives all write operations and records each operation in the primary's oplog.

Secondary: Secondary members replicate operations from the primary to maintain an identical copy of the data set, so the system can recover from failure.

Note: The minimum recommended configuration for a replica set is a primary, a secondary, and an arbiter. Most deployments will keep three members that store data: a primary and two secondary members.
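The primary/secondary flow described above can be modelled with a toy in-memory sketch. It ignores elections, arbiters, networking, and failure handling; the classes Primary and Secondary and the key names are hypothetical.

```javascript
// Toy model of oplog-based replication; names and structure are illustrative only.
class Primary {
  constructor() {
    this.data = new Map();
    this.oplog = [];
  }
  write(key, value) {
    this.data.set(key, value);
    this.oplog.push({ op: "set", key, value }); // record the operation in the oplog
  }
}

class Secondary {
  constructor() {
    this.data = new Map();
    this.applied = 0; // number of oplog entries already applied
  }
  // Pull and apply any oplog entries not yet applied.
  syncFrom(primary) {
    while (this.applied < primary.oplog.length) {
      const entry = primary.oplog[this.applied];
      this.data.set(entry.key, entry.value);
      this.applied += 1;
    }
  }
}

const primary = new Primary();
const secondary = new Secondary();
primary.write("policy:1", "auto");
primary.write("policy:2", "home");
secondary.syncFrom(primary);
console.log(secondary.data.get("policy:2"));
```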

Page 35: MongoDB  NoSQL database a deep dive -MyWhitePaper

Use cases - Types of workload suitable for NoSQL

Mobile app development

Internet of Things

Digital advertising

Streaming applications

Web applications

Social applications

Gaming

Content management

Customer personalization

Recommendation engines

360-degree view of customer, business, product

Fraud detection

Real-time analytics

Page 36: MongoDB  NoSQL database a deep dive -MyWhitePaper

MongoDB support for programming languages

Page 37: MongoDB  NoSQL database a deep dive -MyWhitePaper
Page 38: MongoDB  NoSQL database a deep dive -MyWhitePaper

Other cool stuff

Sharding

Aggregation and map/reduce

Storage engine: WiredTiger

Capped collections

GridFS

Text and geospatial indexes

Use of Python and JavaScript scripting languages for complex data handling

Replication

Page 39: MongoDB  NoSQL database a deep dive -MyWhitePaper

That's it. Thank you!

Email me: [email protected]

Follow me on Twitter: @rajesh14k