MongoDB use cases and setup involving Elasticsearch. MongoDB Meetup @hikeapp, Gurgaon. Bharvi Dixit (@d_bharvi). 13th February 2015

MongoDB meetup at Hike



Agenda

• About Me and Orkash.
• Why we chose MongoDB.
• Our use cases and setup of MongoDB.
• Better Than Apple: MongoDB-Elasticsearch.
• Elasticsearch: An Overview.
• The most common issues.
• Mongo University: Learn from the masters.

About Me

• Software Engineer @Orkash.
• Organizer and Speaker @Delhi Elasticsearch Meetup.
• Loves Java, Data, Elasticsearch, MongoDB, Eclipse.
• Interested in all things scale, search, security & DevOps.
• Working with NoSQL databases for more than a year.
• Social Media and News Media Intelligence. (Complex schemas & query designs)

About Orkash

Founded in 2007 by Ashish Sonal. An R&D-driven company that provides a Big Data Automated Intelligence Platform with a focus on the following areas:
– Counter-terrorism, Security intelligence and Risk management.
– Political Consulting and Homeland Security.
– Decision Support Systems.
– Market/Brand intelligence.

We create the FOUR pillars of Automated Intelligence:
– Information Extraction and Monitoring.
– Semantic and Link Analysis.
– Geo-Spatial Analysis.
– Data Mining & Forensics.

Everything starts with a problem..!!

• Data-driven decisions
• Logfiles for scaling up/down
• Warehouse withdrawal triggers orders
• History for fraud detection
• Internet of Things and Smart Cities

... data explosion

Everything starts with a problem..!!

Better decisions == more data
And NoSQL adds more problems

Data

Big Data

BIG DATA

The Big Data problem goes on..
• I need BIG DATA.
• I need to analyze this data.
• I need to enrich this big data & make it even bigger.
• I need fast searching.
• I need real-time analytics.
• Ohh wait.. I need relational queries on this big data to get more insights..

Why we chose MongoDB

• It does the impossible. (Can incorporate any kind of data)
• Document model.
• Distributed computing.
• Awesome sharding and replication.
• Scales big (horizontally) on commodity hardware.
• Powerful analytics with the aggregation framework.
• High persistence and read-write performance.
• Awesome security features.
• OS-managed memory.
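As a rough illustration of the aggregation framework point above, here is a minimal sketch of a pipeline that counts English-language documents per source. The field names (`source`, `lang`) and the sample documents are hypothetical and not taken from the talk. The pipeline itself is plain data, so the sketch also includes a pure-Python equivalent that shows what it would compute without needing a running server.

```python
from collections import Counter

# Hypothetical sample documents; the field names are assumptions for illustration.
articles = [
    {"source": "twitter", "lang": "en"},
    {"source": "twitter", "lang": "en"},
    {"source": "twitter", "lang": "en"},
    {"source": "news", "lang": "en"},
    {"source": "news", "lang": "en"},
    {"source": "news", "lang": "hi"},
    {"source": "blog", "lang": "en"},
]

# Aggregation pipeline: filter, group by source, sort by count descending.
pipeline = [
    {"$match": {"lang": "en"}},
    {"$group": {"_id": "$source", "count": {"$sum": 1}}},
    {"$sort": {"count": -1}},
]


def run_locally(docs):
    """Pure-Python equivalent of the pipeline above, for illustration only."""
    counts = Counter(d["source"] for d in docs if d["lang"] == "en")
    return [{"_id": source, "count": n} for source, n in counts.most_common()]


print(run_locally(articles))

# With pymongo and a running server, the same pipeline would be passed as:
#   collection.aggregate(pipeline)
```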

Our use cases and setup of MongoDB.

• A primary data store for collecting and storing humongous amounts of unstructured/semi-structured text.

• Building GIS applications for government and security agencies using geospatial features.

• Data analytics.
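For the GIS use case above, a geospatial lookup might look roughly like the sketch below. It only builds the document and the `$near` filter as plain Python data. The field name `location`, the sample document, and the coordinates are illustrative assumptions, not details from the talk.

```python
import json

# Hypothetical document. A GeoJSON Point stores [longitude, latitude], in that order.
incident = {
    "title": "sample report",
    "location": {"type": "Point", "coordinates": [77.03, 28.46]},
}


def near_query(longitude, latitude, max_distance_m):
    """Build a $near filter for a field that carries a 2dsphere index."""
    return {
        "location": {
            "$near": {
                "$geometry": {"type": "Point", "coordinates": [longitude, latitude]},
                "$maxDistance": max_distance_m,  # metres
            }
        }
    }


query = near_query(77.03, 28.46, 5000)
print(json.dumps(query, indent=2))

# With pymongo and a running server this would be used roughly as:
#   collection.create_index([("location", "2dsphere")])
#   collection.find(query)
```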

Our use cases and setup of MongoDB.

Our current production setup has 14 nodes:

Node type              # of nodes   Hardware specification (each)
Data nodes             5            20 GB RAM, 8-core CPU
Mongos (VMs)           4            4 GB RAM, 4-core CPU
Arbiter nodes (VMs)    2            1 GB RAM, 1-core CPU
Config servers (VMs)   3            4 GB RAM, 2-core CPU

Better Than Apple: MongoDB-Elasticsearch

• One of the greatest combinations this era has seen.
• Continuous improvements.
• Each fills in the other's missing features.
• Both have very similar concepts and data types.
• Both keep the cloud in mind.
• Driven by an open-source community, knowledge sharing, and high collaboration with users.

Better Than Apple: MongoDB-Elasticsearch

Sources: Twitter

Elasticsearch Overview

What is Elasticsearch:
• "You know, for search."
• Schema-free, REST- and JSON-based distributed full-text search engine and document store.
• Written in Java and built on top of Lucene.
• Highly reliable, scalable, fault tolerant.
• Supports distributed indexing, replication, and load-balanced querying.
• Powerful geo-spatial queries.
• Latest release: 1.4.2

Wait..!! Schema free?? The real gotcha.. Mongo-ES breakup
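The slide does not spell out the gotcha. One common reading is that Elasticsearch fixes a field's type the first time it sees the field, so a later document that MongoDB would happily store with a different type for the same field can be rejected at indexing time. The toy model below is an assumption-laden sketch of that idea, not Elasticsearch code: `ToyIndex` and `MappingConflict` are hypothetical names, and real Elasticsearch also coerces some values, which this sketch ignores.

```python
class MappingConflict(Exception):
    """Raised when a field arrives with a type other than the one first seen."""


def infer_type(value):
    # Rough stand-in for dynamic mapping; bool is checked before int on purpose.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, dict):
        return "object"
    raise TypeError(f"unsupported value: {value!r}")


class ToyIndex:
    """Toy model: the first document to use a field decides that field's type."""

    def __init__(self):
        self.mapping = {}
        self.docs = []

    def index(self, doc):
        for field, value in doc.items():
            seen = infer_type(value)
            expected = self.mapping.setdefault(field, seen)
            if expected != seen:
                raise MappingConflict(f"{field}: mapped as {expected}, got {seen}")
        self.docs.append(doc)


idx = ToyIndex()
idx.index({"user_id": 42})           # user_id is now mapped as long
try:
    idx.index({"user_id": "u-42"})   # fine in MongoDB, rejected by the toy index
except MappingConflict as err:
    print("rejected:", err)
```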

Elasticsearch Overview

What does it add to Lucene:
• REST service: JSON APIs over HTTP.
• High availability & performance: clustering & replication.
• A powerful query DSL.
• Interoperation with non-Java/JVM languages.
• More and more resilience.
• Multitenancy.
• And the best one: it lets you maintain relationships among documents.
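The slide does not say which relationship mechanism is meant. One option available in the Elasticsearch 1.x line is parent/child mapping, sketched below as plain request bodies. The type and field names (`article`, `comment`, `title`, `text`) are hypothetical, and later Elasticsearch versions replaced this syntax, so treat it as a period-specific illustration.

```python
import json

# Index-creation body in 1.x style: "comment" declares "article" as its parent type.
index_body = {
    "mappings": {
        "article": {"properties": {"title": {"type": "string"}}},
        "comment": {
            "_parent": {"type": "article"},
            "properties": {"text": {"type": "string"}},
        },
    }
}


def articles_with_comment_matching(text):
    """Search body: find parent articles that have a child comment matching `text`."""
    return {
        "query": {
            "has_child": {
                "type": "comment",
                "query": {"match": {"text": text}},
            }
        }
    }


print(json.dumps(articles_with_comment_matching("mongodb"), indent=2))
```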

The Elasticsearch Open Source Model

Understanding Elasticsearch Structure in respect to MongoDB

The most common issues..

1. Distributed computing comes with two problems: node failures and network bottlenecks.
   Node failures can be handled by MongoDB very easily, but network bottlenecks/partitions won't let you sleep at night because of replica set failovers and rollbacks.
   Separate networks for read and write.
2. Assuring a business continuity plan.
   mongodump is not fit for backing up large datasets.
3. Data modeling.
4. Keeping a close eye on connections.
5. Importing embedded documents in CSV.
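On the last issue, CSV is flat while MongoDB documents nest. One possible workaround, sketched here under the assumption that the CSV header uses dotted names such as `address.city`, is to rebuild the embedded documents before inserting them. The helper names and the sample data are hypothetical.

```python
import csv
import io


def nest_row(flat_row):
    """Turn {"address.city": "X"} into {"address": {"city": "X"}}.

    Assumes no column is both a scalar and a prefix of another column
    (for example "address" alongside "address.city").
    """
    nested = {}
    for dotted_key, value in flat_row.items():
        parts = dotted_key.split(".")
        target = nested
        for part in parts[:-1]:
            target = target.setdefault(part, {})
        target[parts[-1]] = value
    return nested


def csv_to_documents(csv_text):
    """Parse CSV text with a header line into a list of nested documents."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [nest_row(row) for row in reader]


# Made-up sample data; every CSV value is read as a string.
SAMPLE_CSV = "name,address.city,address.pin\nAcme,Gurgaon,000000\n"

for doc in csv_to_documents(SAMPLE_CSV):
    print(doc)
```

The resulting dictionaries could then be handed to a driver call such as pymongo's `insert_many`; converting the string values to numbers or dates is left out of the sketch.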

Mongo University: Learn from the masters..!!