dmitri-babaev
Slides from Moscow BigData/Cassandra September 2013 meetup
ElasticSearch as a distributed NoSQL DB
Agenda
1. ElasticSearch architecture overview
2. How data is stored in ElasticSearch
3. Using ElasticSearch to store semi-structured data
Overview
● ElasticSearch is a distributed inverted index
● Built on top of Apache Lucene
  ○ Lucene is the most popular Java-based full-text search index implementation
    ■ It is used not only for text
ElasticSearch cluster
Index request
Search request
Routing
● Any request can be manually routed
  ○ index request
  ○ search request
● Both master and slave replicas can process search requests
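A minimal sketch of how a routing value could select a shard, assuming a plain hash-modulo scheme. `NUM_SHARDS` and `shard_for` are hypothetical names, and `zlib.crc32` is only a stand-in for whatever hash function ElasticSearch really uses.

```python
import zlib

NUM_SHARDS = 5  # assumed number of shards in the index (hypothetical value)


def shard_for(routing: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a routing value to a shard number with hash-modulo.

    crc32 is an illustrative stand-in, not the hash ElasticSearch uses.
    """
    return zlib.crc32(routing.encode("utf-8")) % num_shards


# Documents indexed with the same routing value land on the same shard,
# so a search routed with that value only has to visit that one shard.
user_shard = shard_for("user-42")
assert shard_for("user-42") == user_shard
```

By default the routing value would be the document id; manual routing replaces it with a value chosen by the application, such as a user id.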
Replication
● Indexed documents are replicated to the nodes holding slave replicas of the shard
● Sync replication (all nodes holding the shard copies must acknowledge the request)
● Optional async replication
Indexing
● New documents are not indexed immediately; instead, they are stored in memory and indexed in batches
  ○ Queued documents do not appear in search results
● Any change means that the whole document is marked as deleted and reindexed
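The batching behaviour can be illustrated with a toy model. This is a sketch under simplifying assumptions, not ElasticSearch code: `BufferedIndex` and its methods are hypothetical names, and plain dictionaries stand in for the real index.

```python
class BufferedIndex:
    """Toy index: documents become searchable only after refresh()."""

    def __init__(self):
        self.buffer = {}      # doc_id -> doc, queued but not yet searchable
        self.searchable = {}  # doc_id -> doc, visible to search

    def index(self, doc_id, doc):
        # Any change replaces the whole document, never a single field.
        self.buffer[doc_id] = doc

    def refresh(self):
        # Batch step: queued documents replace older versions and become visible.
        self.searchable.update(self.buffer)
        self.buffer.clear()

    def search(self, field, value):
        return sorted(
            doc_id
            for doc_id, doc in self.searchable.items()
            if doc.get(field) == value
        )


idx = BufferedIndex()
idx.index("1", {"name": "Ivan", "age": 18})
print(idx.search("name", "Ivan"))  # [] because the document is still queued
idx.refresh()
print(idx.search("name", "Ivan"))  # ['1']
```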
Agenda
1. ElasticSearch architecture overview
2. How data is stored in ElasticSearch
3. Using ElasticSearch to store semi-structured data
Lucene inverted index structure
Lucene index updates
● The index is immutable
  ○ All changes are added to an auxiliary index (segment) in batches
  ○ Search is done simultaneously in all segments of an index
● Segments are eventually merged into larger ones
  ○ Deleted documents are actually removed during the merge process
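A rough model of segment-based updates, assuming a much simplified structure: `Segment` and `SegmentedIndex` are hypothetical names, documents are plain strings, and a set of deletion marks kept next to each segment stands in for however Lucene actually tracks deletions.

```python
class Segment:
    """A batch of documents written once, plus deletion marks kept beside it."""

    def __init__(self, docs):
        self.docs = dict(docs)  # doc_id -> text, never modified after creation
        self.deleted = set()    # doc_ids marked as deleted


class SegmentedIndex:
    def __init__(self):
        self.segments = []

    def add_batch(self, docs):
        # An update marks the old copy as deleted and writes the new copy
        # into a fresh segment; existing segment contents are not rewritten.
        for seg in self.segments:
            seg.deleted.update(doc_id for doc_id in docs if doc_id in seg.docs)
        self.segments.append(Segment(docs))

    def search(self, term):
        # Every segment is searched; documents marked deleted are skipped.
        hits = []
        for seg in self.segments:
            for doc_id, text in seg.docs.items():
                if doc_id not in seg.deleted and term in text.split():
                    hits.append(doc_id)
        return sorted(hits)

    def merge(self):
        # Only live documents are copied, so deleted ones disappear here.
        live = {}
        for seg in self.segments:
            for doc_id, text in seg.docs.items():
                if doc_id not in seg.deleted:
                    live[doc_id] = text
        self.segments = [Segment(live)]


index = SegmentedIndex()
index.add_batch({1: "blue jeans", 2: "brown jeans"})
index.add_batch({2: "brown shoes", 3: "black jeans"})  # document 2 is updated
print(index.search("jeans"))                           # [1, 3]
print(sum(len(s.docs) for s in index.segments))        # 4 stored copies
index.merge()
print(sum(len(s.docs) for s in index.segments))        # 3 after the merge
```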
Agenda
1. ElasticSearch architecture overview
2. How data is stored in ElasticSearch
3. Using ElasticSearch to store semi-structured data
Why use ElasticSearch for semi-structured data?
● Effective for searches with many conditions
  ○ type: jeans AND color: [+blue +brown] AND price: [10 TO 100] AND brand: [+levis +colins]
● Inverted index has a column-oriented layout
  ○ less disk IO
  ○ only the data required to handle the request is processed
  ○ effective compression is possible for the DocId lists
● Document-oriented, no strict schema
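A sketch of why multi-condition search suits an inverted index: each condition resolves to a set of DocIds, and the query is a set intersection. The sample documents and the helpers `any_of` and `in_range` are made up for illustration, and the bracketed lists in the query are assumed to mean "any of these values".

```python
from collections import defaultdict

docs = {
    1: {"type": "jeans", "color": "blue", "brand": "levis", "price": 50},
    2: {"type": "jeans", "color": "black", "brand": "levis", "price": 70},
    3: {"type": "jeans", "color": "brown", "brand": "colins", "price": 150},
    4: {"type": "shirt", "color": "blue", "brand": "colins", "price": 20},
    5: {"type": "jeans", "color": "brown", "brand": "colins", "price": 90},
}

# Inverted index: (field, term) -> set of DocIds.
# Each field is kept apart from the others, hence "column-oriented".
index = defaultdict(set)
for doc_id, doc in docs.items():
    for field, value in doc.items():
        index[(field, value)].add(doc_id)


def any_of(field, values):
    """Union of the DocId sets for every listed term of one field."""
    result = set()
    for value in values:
        result |= index.get((field, value), set())
    return result


def in_range(field, low, high):
    """Union of the DocId sets for all terms of `field` within [low, high]."""
    result = set()
    for (name, value), doc_ids in index.items():
        if name == field and low <= value <= high:
            result |= doc_ids
    return result


# Only the DocId sets of the four fields in the query are touched.
matches = (
    any_of("type", ["jeans"])
    & any_of("color", ["blue", "brown"])
    & in_range("price", 10, 100)
    & any_of("brand", ["levis", "colins"])
)
print(sorted(matches))  # [1, 5]
```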
Example document JSON
{
  "name": "Ivan",
  "age": 18,
  "likes": [
    { "title": "The Lord of the Rings", "type": "book" },
    { "title": "The Matrix", "type": "movie" }
  ]
}
ElasticSearch fields
● name
● age
● likes.title
● likes.type
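How the example document could turn into these fields can be sketched as a recursive flattening: nested keys are joined with dots, and each array element contributes terms to the same field. `flatten` and `to_fields` are hypothetical helpers, not part of any ElasticSearch API.

```python
def flatten(value, prefix=""):
    """Yield (field_name, term) pairs for a JSON-like value."""
    if isinstance(value, dict):
        for key, child in value.items():
            name = f"{prefix}.{key}" if prefix else key
            yield from flatten(child, name)
    elif isinstance(value, list):
        # Array elements share the field name of the array itself.
        for item in value:
            yield from flatten(item, prefix)
    else:
        yield prefix, value


def to_fields(doc):
    """Group the flattened terms by field name."""
    fields = {}
    for name, term in flatten(doc):
        fields.setdefault(name, []).append(term)
    return fields


doc = {
    "name": "Ivan",
    "age": 18,
    "likes": [
        {"title": "The Lord of the Rings", "type": "book"},
        {"title": "The Matrix", "type": "movie"},
    ],
}

print(list(to_fields(doc)))
# ['name', 'age', 'likes.title', 'likes.type']
```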
Mapping JSON to index
● Field values of array elements are just a list of terms
  ○ how to search for users who like “The Lord of the Rings” movie?
● Separate document for each array item
  ○ store them on the same shard (data affinity)
● Add type prefix to field names
● Add type prefix to title term value
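The problem and one of the listed fixes can be shown in a few lines. This is a sketch: the field naming `likes.<type>.title` is an assumed convention for "type prefix in field names", and the match functions are hypothetical.

```python
doc = {
    "name": "Ivan",
    "likes": [
        {"title": "The Lord of the Rings", "type": "book"},
        {"title": "The Matrix", "type": "movie"},
    ],
}

# Flattened mapping: the pairing between a title and its type is lost.
flat = {
    "likes.title": [like["title"] for like in doc["likes"]],
    "likes.type": [like["type"] for like in doc["likes"]],
}


def flat_match(fields, title, kind):
    return title in fields["likes.title"] and kind in fields["likes.type"]


# False positive: Ivan likes the book, not the movie.
print(flat_match(flat, "The Lord of the Rings", "movie"))  # True

# Type prefix in the field name keeps titles of different types apart.
prefixed = {}
for like in doc["likes"]:
    prefixed.setdefault(f"likes.{like['type']}.title", []).append(like["title"])


def prefixed_match(fields, title, kind):
    return title in fields.get(f"likes.{kind}.title", [])


print(prefixed_match(prefixed, "The Lord of the Rings", "movie"))  # False
print(prefixed_match(prefixed, "The Lord of the Rings", "book"))   # True
```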
Using ElasticSearch with BigData storages
● Index in ElasticSearch, data blobs on S3
  ○ user profiles in ElasticSearch
  ○ user wall dumps on S3
● Index in ElasticSearch, data blobs in HBase
  ○ user post summaries in ElasticSearch
  ○ wall post contents in HBase
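The split between a searchable index and a blob store can be sketched with two dictionaries standing in for the real systems. All names here (`search_index`, `blob_store`, `save_post`, `find_posts`, the `posts/<id>` key layout) are hypothetical, and no real ElasticSearch, S3 or HBase client is involved.

```python
search_index = {}  # stand-in for ElasticSearch: post_id -> small searchable summary
blob_store = {}    # stand-in for S3 or HBase: blob_key -> full content


def save_post(post_id, author, summary, content):
    """Store the large content as a blob and index only a summary with a pointer."""
    blob_key = f"posts/{post_id}"
    blob_store[blob_key] = content
    search_index[post_id] = {
        "author": author,
        "summary": summary,
        "blob_key": blob_key,
    }


def find_posts(author):
    """Search the index first, then fetch only the matching blobs."""
    return [
        blob_store[entry["blob_key"]]
        for _, entry in sorted(search_index.items())
        if entry["author"] == author
    ]


save_post(1, "ivan", "trip notes", "Long text of the first wall post")
save_post(2, "olga", "recipe", "Long text of the second wall post")
print(find_posts("ivan"))  # ['Long text of the first wall post']
```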
The end