ElasticSearch as a distributed NoSQL DB


Agenda

1. ElasticSearch architecture overview
2. How data is stored in ElasticSearch
3. Using ElasticSearch to store semi-structured data

● ElasticSearch is a distributed inverted index
● Built on top of Apache Lucene
  ○ Lucene is the most popular Java-based full-text search index implementation
    ■ it is used not only for text

Overview

ElasticSearch cluster

Index request

Search request

Routing

● Any request can be manually routed
  ○ index request
  ○ search request

● Both master and slave replicas can process search requests
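A minimal sketch of how a routing value might select a shard, assuming simple hash-modulo routing. ElasticSearch uses its own hash function internally; `zlib.crc32` is only a stand-in here, and `shard_for` and `NUM_PRIMARY_SHARDS` are hypothetical names chosen for illustration.

```python
import zlib

NUM_PRIMARY_SHARDS = 5  # assumed value; fixed when the index is created


def shard_for(routing: str, num_shards: int = NUM_PRIMARY_SHARDS) -> int:
    """Map a routing value (by default the document id) to a shard number."""
    return zlib.crc32(routing.encode("utf-8")) % num_shards


# Default routing: the document id decides the shard.
print(shard_for("doc-42"))

# Manual routing: every document indexed with routing "user-ivan" lands on
# the same shard, so a search routed by the same value visits only that shard.
print(shard_for("user-ivan") == shard_for("user-ivan"))
```

The same value must be used for indexing and for searching, otherwise the search request is sent to a shard that does not hold the documents.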

Replication

● Indexed documents are replicated to the nodes holding slave replicas of the shard

● Sync replication (all nodes holding the shard copies must acknowledge the request)

● Optional async replication

Indexing

● New documents are not indexed immediately; instead they are stored in memory and indexed in batches
  ○ Queued documents do not appear in search results

● Any change means that the whole document is marked as deleted and reindexed
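The batching behaviour described above can be illustrated with a toy model. This is a sketch under simplifying assumptions, not ElasticSearch's implementation: `ToyShard`, `index`, `refresh` and `search` are hypothetical names, and the batch is applied by an explicit `refresh()` call.

```python
class ToyShard:
    """Toy model of buffered indexing (illustration only)."""

    def __init__(self):
        self.buffer = []      # accepted documents that are not yet searchable
        self.searchable = {}  # doc id -> document visible to search

    def index(self, doc_id, doc):
        # An update is submitted as a new full copy of the document.
        self.buffer.append((doc_id, doc))

    def refresh(self):
        # Apply the whole batch at once; a newer copy replaces the older one.
        for doc_id, doc in self.buffer:
            self.searchable[doc_id] = doc
        self.buffer.clear()

    def search(self, field, value):
        return sorted(i for i, d in self.searchable.items() if d.get(field) == value)


shard = ToyShard()
shard.index("1", {"name": "Ivan", "age": 18})
print(shard.search("name", "Ivan"))  # [] because the document is still queued
shard.refresh()
print(shard.search("name", "Ivan"))  # ['1']
```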

Agenda

1. ElasticSearch architecture overview
2. How data is stored in ElasticSearch
3. Using ElasticSearch to store semi-structured data

Lucene inverted index structure

Lucene index updates

● Index is immutable
  ○ All changes are added to an auxiliary index (segment) in batches
  ○ Search is done simultaneously in all segments of an index
● Segments are eventually merged into larger ones
  ○ Deleted documents are actually removed during the merge process
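The segment life cycle above can be sketched as a toy model. It assumes each batch becomes one immutable segment and that deletions are recorded as tombstones next to the segments; `Segment`, `ToyIndex` and their methods are hypothetical names, not Lucene classes.

```python
class Segment:
    """An immutable batch of documents with its own small inverted index."""

    def __init__(self, docs):
        self.docs = dict(docs)  # doc id -> list of terms
        self.postings = {}      # term -> set of doc ids
        for doc_id, terms in self.docs.items():
            for term in terms:
                self.postings.setdefault(term, set()).add(doc_id)


class ToyIndex:
    def __init__(self):
        self.segments = []
        self.deleted = set()  # tombstones: (segment position, doc id)

    def add_batch(self, docs):
        # Older copies are only marked as deleted; segments are never edited.
        for pos, seg in enumerate(self.segments):
            for doc_id in docs:
                if doc_id in seg.docs:
                    self.deleted.add((pos, doc_id))
        self.segments.append(Segment(docs))

    def search(self, term):
        # Every segment is searched; tombstoned hits are filtered out.
        hits = set()
        for pos, seg in enumerate(self.segments):
            for doc_id in seg.postings.get(term, ()):
                if (pos, doc_id) not in self.deleted:
                    hits.add(doc_id)
        return sorted(hits)

    def merge(self):
        # Copy live documents into one larger segment; deleted ones vanish here.
        live = {}
        for pos, seg in enumerate(self.segments):
            for doc_id, terms in seg.docs.items():
                if (pos, doc_id) not in self.deleted:
                    live[doc_id] = terms
        self.segments = [Segment(live)]
        self.deleted.clear()


idx = ToyIndex()
idx.add_batch({"1": ["blue", "jeans"], "2": ["brown", "jeans"]})
idx.add_batch({"1": ["red", "jeans"]})  # update of document 1
print(idx.search("blue"))               # [] because the old copy is tombstoned
print(idx.search("jeans"))              # ['1', '2']
idx.merge()
print(len(idx.segments))                # 1
```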

Agenda

1. ElasticSearch architecture overview
2. How data is stored in ElasticSearch
3. Using ElasticSearch to store semi-structured data

Why use ElasticSearch for semi-structured data?

● Effective in search by many conditions
  ○ type: jeans AND color: [+blue +brown] AND price: [10 TO 100] AND brand: [+levis +colins]
● Inverted index has column-oriented layout
  ○ less disk IO
  ○ only data required to handle request is processed
  ○ effective compression is possible for the DocId lists

● Document-oriented, no strict schema
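A rough sketch of why multi-condition search maps well onto an inverted index: each condition reads only the posting lists of its own field, and the lists are combined with set operations. The postings below are made-up sample data, the bracketed lists in the query are read as "any of" (an assumption), and the price range is left out for brevity. `any_of`, `all_of` and `delta_encode` are hypothetical helper names.

```python
# Hypothetical postings: (field, term) -> sorted list of DocIds
POSTINGS = {
    ("type", "jeans"): [1, 2, 3, 5, 8],
    ("color", "blue"): [1, 3, 9],
    ("color", "brown"): [5, 7],
    ("brand", "levis"): [1, 5, 6],
    ("brand", "colins"): [3, 4],
}


def any_of(field, terms):
    """OR: union of the posting lists of several terms of one field."""
    ids = set()
    for term in terms:
        ids.update(POSTINGS.get((field, term), []))
    return ids


def all_of(*id_sets):
    """AND: intersection of the per-condition DocId sets."""
    return sorted(set.intersection(*id_sets))


def delta_encode(doc_ids):
    """Store gaps between sorted DocIds; small gaps compress well."""
    return [b - a for a, b in zip([0] + doc_ids, doc_ids)]


hits = all_of(
    any_of("type", ["jeans"]),
    any_of("color", ["blue", "brown"]),
    any_of("brand", ["levis", "colins"]),
)
print(hits)                                      # [1, 3, 5]
print(delta_encode(POSTINGS[("type", "jeans")]))  # [1, 1, 1, 2, 3]
```

Only the five posting lists named in the query are touched; fields that the query does not mention are never read.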

Example document JSON

{
  "name": "Ivan",
  "age": 18,
  "likes": [
    { "title": "The Lord of the Rings", "type": "book" },
    { "title": "The Matrix", "type": "movie" }
  ]
}

ElasticSearch fields

● name
● age
● likes.title
● likes.type
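The field list above can be reproduced with a small flattening routine. This is a sketch of the idea only, assuming dotted paths for nested objects and arrays collapsed into a list of values per field; `flatten` is a hypothetical helper, not an ElasticSearch API.

```python
def flatten(document):
    """Turn a nested JSON-like document into {dotted field name: [values]}."""
    fields = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            # Array items share the same field name; their positions are lost.
            for item in value:
                walk(item, path)
        else:
            fields.setdefault(path, []).append(value)

    walk(document, "")
    return fields


doc = {
    "name": "Ivan",
    "age": 18,
    "likes": [
        {"title": "The Lord of the Rings", "type": "book"},
        {"title": "The Matrix", "type": "movie"},
    ],
}
for field, values in flatten(doc).items():
    print(field, values)
# name ['Ivan']
# age [18]
# likes.title ['The Lord of the Rings', 'The Matrix']
# likes.type ['book', 'movie']
```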

Mapping JSON to index

● Field values of array elements are just a list of terms
  ○ how to search for users who like “The Lord of the Rings” movie?
● Separate document for each array item
  ○ store them on the same shard (data affinity)
● Add type prefix to field names
● Add type prefix to title term value

Using ElasticSearch with BigData storages

● Index in ElasticSearch, data blobs on S3
  ○ user profiles in ElasticSearch
  ○ user wall dumps on S3

● Index in ElasticSearch, data blobs in HBase
  ○ user post summaries in ElasticSearch
  ○ wall post contents in HBase
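The split between a search index and a blob store amounts to a two-step lookup: search the small summaries, then fetch the full content by the same key. The sketch below uses in-memory dictionaries as stand-ins; a real system would call an ElasticSearch client and an S3 or HBase client instead, and all names and sample data here are hypothetical.

```python
# Stand-in for the search index: doc id -> small searchable summary
search_index = {
    "post-1": {"author": "ivan", "summary": "trip to the mountains"},
    "post-2": {"author": "olga", "summary": "new bike"},
    "post-3": {"author": "ivan", "summary": "book review"},
}

# Stand-in for the blob store: the same ids -> full content
blob_store = {
    "post-1": b"Full text of the mountain trip post",
    "post-2": b"Full text of the bike post",
    "post-3": b"Full text of the book review",
}


def find_posts(author):
    """Step 1: search the index for ids. Step 2: fetch blobs by those ids."""
    ids = sorted(i for i, s in search_index.items() if s["author"] == author)
    return [(i, blob_store[i]) for i in ids]


for post_id, content in find_posts("ivan"):
    print(post_id, len(content))
```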

The end
