Performance Optimization Strategies for MongoDB
choosing right database server hardware
schema design (denormalizing schema)
query optimization ($in, $nin)
Indexing
choosing an appropriate shard key in sharded clusters
What are indexes?
Chemist Drawer
Indexing = technique used to make search faster
Computer Science definition
Index = any data structure that improves the performance of lookup.
DB Index Data Structures
Binary Tree
B+ Tree
Balanced Tree
Hashes
Binary Search Tree
Our Favourite Employee Table
Search By Employee Id
select * from employee where employee_id = 3;
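To make the lookup above concrete, here is a minimal sketch of a binary search tree keyed on employee_id. The rows, the insertion order, and the names `Node`, `insert`, and `search` are assumptions made for illustration; a real database index is far more elaborate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    key: int                        # employee_id
    row: dict                       # the full employee row
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def insert(root, key, row):
    """Insert a row into the tree and return the (possibly new) root."""
    if root is None:
        return Node(key, row)
    if key < root.key:
        root.left = insert(root.left, key, row)
    elif key > root.key:
        root.right = insert(root.right, key, row)
    else:
        root.row = row              # same key: overwrite the row
    return root


def search(root, key):
    """Return (row, nodes_visited); row is None when the key is absent."""
    steps = 0
    node = root
    while node is not None:
        steps += 1
        if key == node.key:
            return node.row, steps
        node = node.left if key < node.key else node.right
    return None, steps


# Hypothetical employee rows, inserted in an order that keeps the tree balanced.
root = None
for emp_id, name in [(5, "james"), (3, "anna"), (8, "john"),
                     (1, "li"), (4, "omar"), (7, "sara"), (9, "tom")]:
    root = insert(root, emp_id, {"employee_id": emp_id, "name": name})

row, steps = search(root, 3)        # equivalent of: where employee_id = 3
print(row, steps)
```

With seven rows, the search visits at most three nodes instead of scanning all seven rows.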
B+ Tree
The B-tree is a generalization of a binary search tree in that a node can have more than two children.
Order of a B-tree = maximum number of child nodes per node.
In a binary search tree, the left subtree of a node contains only nodes with keys less than the node's key.
The right subtree of a node contains only nodes with keys greater than the node's key.
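The idea of a node with more than two children can be sketched as follows. This is a simplified, hand-built B-tree of order 3 (search only, no insertion or rebalancing); the class `BTreeNode`, the function `btree_search`, and the key values are assumptions for illustration. A B+ tree differs in that it keeps all records in the leaf nodes, which this sketch does not model.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import List


@dataclass
class BTreeNode:
    keys: List[int]                                            # sorted keys in this node
    children: List["BTreeNode"] = field(default_factory=list)  # empty for a leaf


def btree_search(node, key):
    """Return the number of nodes visited if key is found, else None."""
    visited = 0
    while True:
        visited += 1
        # Position of the first key >= the search key within this node.
        i = bisect_left(node.keys, key)
        if i < len(node.keys) and node.keys[i] == key:
            return visited
        if not node.children:       # reached a leaf without finding the key
            return None
        node = node.children[i]     # descend into the child between the keys


# Order 3: each node has at most 3 children and therefore at most 2 keys.
tree = BTreeNode(
    keys=[4, 8],
    children=[BTreeNode([1, 3]), BTreeNode([5, 7]), BTreeNode([9, 10])],
)

print(btree_search(tree, 7))        # found in the middle leaf
print(btree_search(tree, 6))        # not present
```

Because each node holds several keys, the tree stays shallow, so a lookup touches fewer nodes than in a binary tree with the same number of keys.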
A database index speeds up data retrieval operations, but it comes at a cost:
slower writes and the use of more storage space.
Example: for a 3 GB collection with 1 index, that index uses approximately 500 MB
INDEX CARDINALITY
Cardinality: number of unique values in the column
MONGO DOCUMENT
{
  employee_id: 8,
  Name: "john",
  Salary: 2000
}
{
  employee_id: 5,
  Name: "james",
  Salary: 3000
}
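Cardinality can be estimated by counting the distinct values of a field. The sketch below works on plain Python dictionaries standing in for the documents above; the third document and the helper name `cardinality` are assumptions added for illustration.

```python
documents = [
    {"employee_id": 8, "Name": "john", "Salary": 2000},
    {"employee_id": 5, "Name": "james", "Salary": 3000},
    # Hypothetical extra document, added so that some values repeat.
    {"employee_id": 3, "Name": "john", "Salary": 2000},
]


def cardinality(docs, field):
    """Count the distinct values of a field across the documents that have it."""
    return len({doc[field] for doc in docs if field in doc})


for field in ("employee_id", "Name", "Salary"):
    print(field, cardinality(documents, field))
```

Here employee_id has the highest cardinality (every value is unique), which makes it the most selective field of the three.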
TAKE AWAY...
Index data structure
Index cardinality
Indexing is not the only solution to improve performance
Points to consider while creating index
Keys (columns) frequently involved in search conditions of a query
Indexes can be created on arrays, sub-documents and embedded fields
Use indexes to sort query results
Queries that return a range of values using operators such as $gt, $lt
Negation: inequality queries are inefficient with respect to indexes
High cardinality (firstname). If low cardinality (gender), then indexing is not efficient
Low-selectivity indexes: an index should radically reduce the set of possible documents to select from
Creating multiple indexes in support of a single query: MongoDB will typically use a single index to optimize a query
If you need to specify multiple predicates, you need a compound index
Compound indexes are ordered by field, and the order matters
Indexes have storage requirements and impact insert/update speed to some degree
For queries with the $or operator, each clause of the $or query executes in parallel and can use a different index
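Why field order matters in a compound index can be sketched with a sorted list of tuples standing in for an index on (Name, Salary). This is an illustration only, not how MongoDB stores indexes; the rows and the helpers `find_by_name` and `find_by_salary` are assumptions.

```python
from bisect import bisect_left

# Hypothetical rows: (Name, Salary, employee_id).
rows = [
    ("john", 2000, 8),
    ("james", 3000, 5),
    ("james", 2000, 2),
    ("john", 3500, 7),
    ("mary", 2500, 1),
]

# Stand-in for a compound index on (Name, Salary): entries sorted by Name,
# then by Salary within each Name.
index = sorted(rows)


def find_by_name(index, name):
    """Query on the leading field: binary search finds one contiguous range."""
    lo = bisect_left(index, (name,))
    hi = bisect_left(index, (name, float("inf")))
    return index[lo:hi]


def find_by_salary(index, salary):
    """Query on the second field only: matches are scattered, so scan everything."""
    examined = 0
    matches = []
    for entry in index:
        examined += 1
        if entry[1] == salary:
            matches.append(entry)
    return matches, examined


print(find_by_name(index, "john"))
print(find_by_salary(index, 2000))
```

A query on Name lands on a contiguous slice of the sorted entries, while a query on Salary alone has to examine every entry, which is why the leading field of a compound index should match the most common search condition.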