ElasticSearch as a distributed NoSQL DB

Slides from Moscow BigData/Cassandra September 2013 meetup

Page 1: ElasticSearch as a distributed NoSQL DB

ElasticSearch as a distributed NoSQL DB

Page 2: ElasticSearch as a distributed NoSQL DB

Agenda

1. ElasticSearch architecture overview
2. How data is stored in ElasticSearch
3. Using ElasticSearch to store semi-structured data

Page 3: ElasticSearch as a distributed NoSQL DB

Overview

● ElasticSearch is a distributed inverted index
● Built on top of Apache Lucene
  ○ Lucene is the most popular Java-based full-text search index implementation
    ■ it is used not only for text

Page 4: ElasticSearch as a distributed NoSQL DB

ElasticSearch cluster

Page 5: ElasticSearch as a distributed NoSQL DB

Index request

Page 6: ElasticSearch as a distributed NoSQL DB

Search request

Page 7: ElasticSearch as a distributed NoSQL DB

Routing

● Any request can be manually routed
  ○ index request
  ○ search request
● Both master and slave replicas can process search requests
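Routing can be sketched as hashing a routing value (by default the document ID) and taking it modulo the number of primary shards. The sketch below is illustrative only: `pick_shard` is a hypothetical helper, and `zlib.crc32` is a stand-in hash, not the one ElasticSearch uses internally.

```python
import zlib

def pick_shard(routing_value: str, num_primary_shards: int) -> int:
    """Map a routing value to a shard number (illustrative hash, not ElasticSearch's)."""
    return zlib.crc32(routing_value.encode("utf-8")) % num_primary_shards

# Default behaviour: route by document ID, so documents spread across shards.
doc_ids = ["user-1", "user-2", "user-3", "user-4"]
default_placement = {doc_id: pick_shard(doc_id, 5) for doc_id in doc_ids}

# Manual routing: every document of one user is routed by the user's ID,
# so they all land on one shard and a search can be sent to that shard only.
likes_of_ivan = ["like-17", "like-42", "like-99"]
manual_placement = {like_id: pick_shard("ivan", 5) for like_id in likes_of_ivan}
```

With manual routing, a search that carries the same routing value only needs to visit one shard instead of all of them.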

Page 8: ElasticSearch as a distributed NoSQL DB

Replication

● Indexed documents are replicated to the nodes holding slave replicas of the shard

● Sync replication (all nodes holding the shard copies must acknowledge the request)

● Optional async replication
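The difference between the two replication modes can be illustrated with a toy model. This is a sketch under simplifying assumptions, not ElasticSearch code: `ShardCopy` and `index_document` are hypothetical names, and acknowledgement is reduced to a return value.

```python
from dataclasses import dataclass, field

@dataclass
class ShardCopy:
    name: str
    docs: dict = field(default_factory=dict)

    def apply(self, doc_id, doc):
        self.docs[doc_id] = doc
        return True  # acknowledgement

def index_document(primary, replicas, doc_id, doc, sync=True):
    """Return the copies that acknowledged before the call returned,
    plus the replicas whose write is still pending."""
    acked = [primary.name] if primary.apply(doc_id, doc) else []
    pending = []
    for replica in replicas:
        if sync:
            # Sync mode: wait for every copy to acknowledge.
            if replica.apply(doc_id, doc):
                acked.append(replica.name)
        else:
            # Async mode: the write would be applied in the background.
            pending.append(replica)
    return acked, pending

primary = ShardCopy("master")
replicas = [ShardCopy("slave-1"), ShardCopy("slave-2")]
acked, pending = index_document(primary, replicas, "1", {"name": "Ivan"}, sync=True)
acked_async, pending_async = index_document(primary, replicas, "2", {"name": "Petr"}, sync=False)
```

In the async case the caller gets a response sooner, but a document may be missing from the slave replicas for a short time.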

Page 9: ElasticSearch as a distributed NoSQL DB

Indexing

● New documents are not indexed immediately; instead they are stored in memory and indexed in batches
  ○ Queued documents do not appear in search results
● Any change means that the whole document is marked as deleted and reindexed
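The behaviour described above can be modelled with a small in-memory class. This is a minimal sketch, assuming a single buffer and an explicit `refresh` step; `ToyIndex` and its methods are hypothetical and do not mirror ElasticSearch internals.

```python
class ToyIndex:
    def __init__(self):
        self.buffer = []        # queued documents, not searchable yet
        self.searchable = {}    # internal doc number -> (doc_id, document)
        self.deleted = set()    # internal doc numbers marked as deleted
        self.next_number = 0

    def index(self, doc_id, document):
        self.buffer.append((doc_id, document))

    def refresh(self):
        """Index the whole buffer as one batch."""
        for doc_id, document in self.buffer:
            # An update marks every older version as deleted, then adds a new copy.
            for number, (existing_id, _) in self.searchable.items():
                if existing_id == doc_id:
                    self.deleted.add(number)
            self.searchable[self.next_number] = (doc_id, document)
            self.next_number += 1
        self.buffer.clear()

    def search(self, field_name, value):
        return [doc for number, (_, doc) in self.searchable.items()
                if number not in self.deleted and doc.get(field_name) == value]

idx = ToyIndex()
idx.index("1", {"name": "Ivan", "age": 18})
before_refresh = idx.search("name", "Ivan")   # queued, so not visible yet
idx.refresh()
after_refresh = idx.search("name", "Ivan")
idx.index("1", {"name": "Ivan", "age": 19})   # a change to one field
idx.refresh()
after_update = idx.search("name", "Ivan")
```

After the update, the old version is still stored under its internal number but is hidden by the deletion mark.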

Page 10: ElasticSearch as a distributed NoSQL DB

Agenda

1. ElasticSearch architecture overview
2. How data is stored in ElasticSearch
3. Using ElasticSearch to store semi-structured data

Page 11: ElasticSearch as a distributed NoSQL DB

Lucene inverted index structure
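As a rough illustration of the idea, an inverted index maps each term to the sorted list of IDs of the documents that contain it. The sketch below assumes whitespace tokenization and lowercasing only; `build_inverted_index` is a hypothetical helper, and real Lucene analysis and on-disk structures are far more elaborate.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each term to a sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.lower().split():
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}

documents = {
    1: "The Lord of the Rings",
    2: "The Matrix",
    3: "Lord of War",
}
index = build_inverted_index(documents)
lord_docs = index["lord"]   # IDs of documents containing the term "lord"
```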

Page 12: ElasticSearch as a distributed NoSQL DB

Lucene index updates

● Index is immutable
  ○ All changes are added to an auxiliary index (segment) in batches
  ○ Search is done simultaneously in all segments of an index
● Segments are eventually merged into larger ones
  ○ Deleted documents are actually removed during the merge process
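The segment life cycle can be sketched in a few lines. This is a simplified model, assuming segments are read-only dictionaries and deletions are kept as separate marks; `SegmentedIndex` is a hypothetical class, not Lucene's API.

```python
from types import MappingProxyType

class SegmentedIndex:
    def __init__(self):
        self.segments = []   # read-only mappings: doc_id -> document
        self.deleted = []    # parallel list: deleted doc_ids per segment

    def add_batch(self, docs):
        # Older copies are only marked as deleted; segment data is never edited.
        for segment, marks in zip(self.segments, self.deleted):
            marks.update(doc_id for doc_id in docs if doc_id in segment)
        self.segments.append(MappingProxyType(dict(docs)))
        self.deleted.append(set())

    def delete(self, doc_id):
        for segment, marks in zip(self.segments, self.deleted):
            if doc_id in segment:
                marks.add(doc_id)

    def search(self, predicate):
        # Every segment is searched; marked documents are filtered out.
        return [doc for segment, marks in zip(self.segments, self.deleted)
                for doc_id, doc in segment.items()
                if doc_id not in marks and predicate(doc)]

    def merge(self):
        # Merging writes one larger segment and drops deleted documents for real.
        merged = {doc_id: doc
                  for segment, marks in zip(self.segments, self.deleted)
                  for doc_id, doc in segment.items()
                  if doc_id not in marks}
        self.segments = [MappingProxyType(merged)]
        self.deleted = [set()]

index = SegmentedIndex()
index.add_batch({1: {"name": "Ivan"}, 2: {"name": "Petr"}})
index.add_batch({3: {"name": "Anna"}})
index.delete(2)
visible_before = len(index.search(lambda doc: True))
stored_before = sum(len(segment) for segment in index.segments)
index.merge()
stored_after = sum(len(segment) for segment in index.segments)
names = sorted(doc["name"] for doc in index.search(lambda doc: True))
```

Before the merge the deleted document still occupies space in its segment; only the merge reclaims it.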

Page 13: ElasticSearch as a distributed NoSQL DB

Agenda

1. ElasticSearch architecture overview
2. How data is stored in ElasticSearch
3. Using ElasticSearch to store semi-structured data

Page 14: ElasticSearch as a distributed NoSQL DB

Why use ElasticSearch for semi-structured data?

● Efficient search by many conditions
  ○ type: jeans AND color: [+blue +brown] AND price: [10 TO 100] AND brand: [+levis +colins]
● Inverted index has column-oriented layout
  ○ less disk IO
  ○ only data required to handle request is processed
  ○ effective compression is possible for the DocId lists

● Document-oriented, no strict schema
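A multi-condition query like the one above reduces to set operations on per-field DocId lists, which is why only the fields named in the query need to be read. The sketch below assumes that `[+blue +brown]` means "any of these values"; the product data and the helpers `any_of` and `in_range` are invented for illustration.

```python
from collections import defaultdict

products = {
    1: {"type": "jeans", "color": "blue", "price": 60, "brand": "levis"},
    2: {"type": "jeans", "color": "black", "price": 40, "brand": "colins"},
    3: {"type": "jeans", "color": "brown", "price": 150, "brand": "levis"},
    4: {"type": "shirt", "color": "blue", "price": 30, "brand": "levis"},
    5: {"type": "jeans", "color": "brown", "price": 90, "brand": "colins"},
}

# One DocId set per (field, value) pair: the "column-oriented" view.
postings = defaultdict(set)
for doc_id, doc in products.items():
    for field_name, value in doc.items():
        postings[(field_name, value)].add(doc_id)

def any_of(field_name, values):
    """Union of DocId sets: the field matches at least one of the values."""
    return set().union(*(postings[(field_name, v)] for v in values))

def in_range(field_name, low, high):
    """Union of DocId sets for every indexed value within [low, high]."""
    return set().union(*(ids for (name, value), ids in postings.items()
                         if name == field_name and low <= value <= high))

matches = (postings[("type", "jeans")]
           & any_of("color", ["blue", "brown"])
           & in_range("price", 10, 100)
           & any_of("brand", ["levis", "colins"]))
```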

Page 15: ElasticSearch as a distributed NoSQL DB

Example document JSON

{
  "name": "Ivan",
  "age": 18,
  "likes": [
    { "title": "The Lord of the Rings", "type": "book" },
    { "title": "The Matrix", "type": "movie" }
  ]
}

Page 16: ElasticSearch as a distributed NoSQL DB

ElasticSearch fields

● name
● age
● likes.title
● likes.type
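The field names listed above can be derived by walking the JSON document and joining nested keys with dots. This is a minimal sketch of that flattening, with `flatten` as a hypothetical helper; it is not how ElasticSearch implements its mapping.

```python
from collections import defaultdict

def flatten(value, prefix="", fields=None):
    """Collect leaf values of a JSON-like object under dotted field names."""
    if fields is None:
        fields = defaultdict(list)
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(child, f"{prefix}.{key}" if prefix else key, fields)
    elif isinstance(value, list):
        for item in value:
            flatten(item, prefix, fields)   # array items share one field name
    else:
        fields[prefix].append(value)
    return dict(fields)

document = {
    "name": "Ivan",
    "age": 18,
    "likes": [
        {"title": "The Lord of the Rings", "type": "book"},
        {"title": "The Matrix", "type": "movie"},
    ],
}
fields = flatten(document)
```

Note that the values of `likes.title` and `likes.type` end up in two independent lists, so the link between a title and its type is lost.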

Page 17: ElasticSearch as a distributed NoSQL DB

Mapping JSON to index

● Array element field values are just a list of terms
  ○ how to search for users who like “The Lord of the Rings” movie?
● Separate document for each array item
  ○ store them on the same shard (data affinity)
● Add type prefix to field names
● Add type prefix to title term value
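The problem and two of the workarounds above can be shown on the example user. In this sketch the field name pattern `likes.<type>.title` and the term pattern `<type>:<title>` are assumptions chosen for illustration; the slides do not specify the exact naming.

```python
user = {
    "name": "Ivan",
    "likes": [
        {"title": "The Lord of the Rings", "type": "book"},
        {"title": "The Matrix", "type": "movie"},
    ],
}

# Naive mapping: every array item contributes to the same two term lists.
naive = {
    "likes.title": {like["title"] for like in user["likes"]},
    "likes.type": {like["type"] for like in user["likes"]},
}
# False positive: Ivan likes the book, not the movie, yet both terms are present.
naive_match = ("The Lord of the Rings" in naive["likes.title"]
               and "movie" in naive["likes.type"])

# Workaround: add the type as a prefix to the field name.
by_field = {}
for like in user["likes"]:
    by_field.setdefault(f"likes.{like['type']}.title", set()).add(like["title"])
field_match = "The Lord of the Rings" in by_field.get("likes.movie.title", set())

# Workaround: add the type as a prefix to the title term itself.
by_term = {f"{like['type']}:{like['title']}" for like in user["likes"]}
term_match = "movie:The Lord of the Rings" in by_term
```

Both prefixed variants correctly report that this user does not like the movie, while the naive mapping matches by accident.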

Page 18: ElasticSearch as a distributed NoSQL DB

Using ElasticSearch with BigData storage systems

● Index in ElasticSearch, data blobs on S3
  ○ user profiles in ElasticSearch
  ○ user wall dumps on S3
● Index in ElasticSearch, data blobs in HBase
  ○ user post summaries in ElasticSearch
  ○ wall post contents in HBase
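The pattern in both cases is the same: keep small searchable summaries in the index and store a key that points to the full payload elsewhere. The sketch below uses plain in-memory structures as stand-ins for both systems; `save_post`, `find_posts`, and the key layout `posts/<id>` are hypothetical.

```python
blob_store = {}     # stand-in for S3 or HBase: key -> large payload
search_index = []   # stand-in for ElasticSearch: small searchable summaries

def save_post(post_id, author, summary, content):
    """Write the payload to the blob store and index only a summary and the key."""
    blob_key = f"posts/{post_id}"
    blob_store[blob_key] = content
    search_index.append({"author": author, "summary": summary, "blob_key": blob_key})

def find_posts(author):
    """Search the index first, then fetch only the matching blobs."""
    hits = [entry for entry in search_index if entry["author"] == author]
    return [blob_store[hit["blob_key"]] for hit in hits]

save_post(1, "ivan", "trip report", "Long text about a trip ...")
save_post(2, "anna", "book review", "Long text about a book ...")
ivan_posts = find_posts("ivan")
```

This keeps the index small, since large contents never pass through it.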

Page 19: ElasticSearch as a distributed NoSQL DB

The end