ELASTICSEARCH

ElasticSearch Basics

Why do we need a search engine?

Work life without Google?

What if Amazon, Flipkart, Facebook, Twitter, GitHub, Stack Overflow, and Zomato had no search capability?

You know, for search …

Lots of DATA around us, but little INFORMATION.

Make our life easier.

Find relevant stuff.

Find it faster.

For research, shopping, entertainment, etc.

Elasticsearch !!!

Near real-time distributed search and analytics engine

Runs on top of Apache Lucene, is written in Java, and exposes a REST API

Search Engine Software

A private search engine service (like Bing or Google) for private, sensitive, or confidential data and documents that you don’t want on the public web

Developed by Shay Banon

ElasticSearch Users

GitHub

Stack Overflow

Wikipedia

The Guardian

SoundCloud

McGraw-Hill

Relational db vs Elasticsearch

ElasticSearch Concepts

Document : A JSON document stored in ES. Like a row in a table in a relational DB.

Id : Uniquely identifies a document.

Field : A key-value pair. Like a column in a relational DB. The value can be:

- A simple value such as a string, integer, or date

- An array or an object

Type : Like a table in a relational DB. Has a list of fields.

ElasticSearch Concepts

Near real time (NRT) : Slight time lag between indexing a document and it becoming searchable.

Shard : Low-level worker unit; a single Lucene instance.

- Primary shard (where a document is physically stored)

- Replica shard (a copy of a primary shard)

Index : Like a database in a relational DB. A logical namespace that maps to one or more primary shards and their replica shards.

Node : A running instance of Elasticsearch.

ElasticSearch Concepts

Cluster : Collection of one or more nodes

- Facilitates indexing

- Search capabilities across nodes

Inside a Node

Inside a cluster

ElasticSearch Getting Started …

Recent version of Java

elasticsearch.org/download

Latest version of any browser

Marvel & Sense

Marvel : monitoring and management tool

Sense : interactive console

Talking to ElasticSearch

RESTful API: JSON over HTTP

A request to Elasticsearch consists of the same parts as any HTTP request:

curl -X<VERB> '<PROTOCOL>://<HOST>/<PATH>' -d '<BODY>'

curl -X<VERB> '<PROTOCOL>://<HOST>/<PATH>?<QUERY_STRING>'
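
The same request parts can be assembled in Python. This is a minimal sketch, not the only way to talk to a node: the helper name `build_request` is hypothetical, the host `localhost:9201` is borrowed from the slides' later examples, and the `_count` path with a `match_all` body is an assumed example. The request is only built here, not sent.

```python
import json
import urllib.request
from urllib.parse import urlencode


def build_request(verb, host, path, query=None, body=None, protocol="http"):
    """Assemble (but do not send) an HTTP request from its parts."""
    url = f"{protocol}://{host}/{path.lstrip('/')}"
    if query:
        url += "?" + urlencode(query)  # <QUERY_STRING>
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data else {}
    return urllib.request.Request(url, data=data, headers=headers, method=verb)


# Assumed example: count all documents in the cluster.
req = build_request(
    "GET", "localhost:9201", "_count",
    query={"pretty": "true"},
    body={"query": {"match_all": {}}},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it, which requires a node listening on that host and port.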

Document Oriented

Stores entire objects or documents. It not only stores them, but also indexes the contents of each document in order to make them searchable.

Elasticsearch uses JavaScript Object Notation, or JSON, as the serialization format for documents.

{
  "email": "[email protected]",
  "first_name": "John",
  "last_name": "Smith",
  "info": {
    "bio": "Eco-warrior and defender of the weak",
    "age": 25,
    "interests": [ "dolphins", "whales" ]
  },
  "join_date": "2014/05/01"
}

Create Index,Insert Data …

Cluster : myelasticsearch

Node : The Dark Knight

Index : megacorp

Type : employee

A request to Elasticsearch consists of the same parts as any HTTP request (using Sense):

<VERB> <PROTOCOL>://<HOST>/<INDEX>/<TYPE>/<ID> followed by <BODY>

<VERB> <PROTOCOL>://<HOST>/<INDEX>/<TYPE>/<ID>?<QUERY_STRING>

Create Index Example :

PUT localhost:9201/megacorp/employee/1
{
  "first_name" : "John",
  "last_name" : "Smith",
  "age" : 25,
  "about" : "I love to go rock climbing",
  "interests": [ "sports", "music" ]
}

What is Inverted Index?

An inverted index maps each unique term to the documents that contain it, which allows very fast full-text search.

Doc_1 : The quick brown fox jumped over the lazy dog

Doc_2 : Quick brown foxes leap over lazy dogs in summer.

Inverted Index (cont.)

Search “quick brown”

Both docs match here.

Doc_1 is more relevant.
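
The idea can be sketched in a few lines of Python. This is a toy illustration, not how Lucene stores or scores anything: the function names are hypothetical, terms are split on whitespace with no normalization, and "relevance" is just the number of query terms a document contains.

```python
from collections import defaultdict

docs = {
    "Doc_1": "The quick brown fox jumped over the lazy dog",
    "Doc_2": "Quick brown foxes leap over lazy dogs in summer.",
}


def build_inverted_index(documents):
    """Map each raw term to the set of documents that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in text.split():
            index[term].add(doc_id)
    return index


def search(index, query):
    """Rank documents by how many of the query terms they contain."""
    hits = defaultdict(int)
    for term in query.split():
        for doc_id in index.get(term, ()):
            hits[doc_id] += 1
    return sorted(hits.items(), key=lambda item: (-item[1], item[0]))


index = build_inverted_index(docs)
print(search(index, "quick brown"))
# [('Doc_1', 2), ('Doc_2', 1)]
```

Doc_2 contains "Quick" with a capital Q, so without normalization only "brown" matches it. That is why Doc_1 ranks higher here.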

Analysis and Analyzers

Character filters : First, the string is passed through any character filters. A character filter could be used to strip out HTML, or to convert & characters to the word and.

Tokenizer : Next, the string is tokenized into individual terms by a tokenizer.

Token filters : Last, each term is passed through any token filters in turn, which can change terms (for example, lowercasing Quick), remove terms (for example, stopwords such as a, and, the), or add terms (for example, synonyms like jump and leap).

Examples: Standard Analyzer, Simple Analyzer, Whitespace Analyzer, Language Analyzers, etc.
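
The three stages can be mimicked in plain Python. This is a rough sketch only: the function names, the regular expressions, and the three-word stopword list are assumptions chosen for illustration, and none of the built-in analyzers is implemented this way.

```python
import re

STOPWORDS = {"a", "and", "the"}


def char_filter(text):
    """Character filter: strip HTML tags and spell out '&' as 'and'."""
    text = re.sub(r"<[^>]+>", " ", text)
    return text.replace("&", " and ")


def tokenizer(text):
    """Tokenizer: split into terms on anything that is not a letter or digit."""
    return re.findall(r"[A-Za-z0-9]+", text)


def token_filters(tokens):
    """Token filters: lowercase each term, then remove stopwords."""
    lowered = (token.lower() for token in tokens)
    return [token for token in lowered if token not in STOPWORDS]


def analyze(text):
    """Run the stages in order: character filters, tokenizer, token filters."""
    return token_filters(tokenizer(char_filter(text)))


print(analyze("<p>The Quick & brown fox</p>"))
# ['quick', 'brown', 'fox']
```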

Inverted Index After Analysis

Doc_1 : The quick brown fox jumped over the lazy dog

Doc_2 : Quick brown foxes leap over lazy dogs in summer.

Quick can be lowercased to become quick.

foxes can be stemmed (reduced to its root form) to become fox. Similarly, dogs could be stemmed to dog.

jumped and leap are synonyms and can be indexed as just the single term jump.
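
To see the effect on the two documents, here is a small sketch that applies these normalizations with hand-written lookup tables. The `STEMS` and `SYNONYMS` tables are hypothetical stand-ins for real stemming and synonym token filters, and they cover only the words on this slide.

```python
import re

# Toy lookup tables; a real analyzer derives these with stemming and synonym filters.
STEMS = {"foxes": "fox", "dogs": "dog", "jumped": "jump"}
SYNONYMS = {"leap": "jump"}


def normalize(text):
    """Lowercase, split into words, then apply the stem and synonym tables."""
    terms = []
    for token in re.findall(r"[a-z]+", text.lower()):
        token = STEMS.get(token, token)
        token = SYNONYMS.get(token, token)
        terms.append(token)
    return terms


doc_1 = set(normalize("The quick brown fox jumped over the lazy dog"))
doc_2 = set(normalize("Quick brown foxes leap over lazy dogs in summer."))

# Terms the two documents now share in the index.
print(sorted(doc_1 & doc_2))
# ['brown', 'dog', 'fox', 'jump', 'lazy', 'over', 'quick']
```

The same normalization has to be applied to the query string, otherwise a search for "Quick" would not find the lowercased term quick.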

Retrieve Using Query String (SearchLite)…

Retrieve Example :

GET localhost:9201/megacorp/employee/1

Query String Search Example :

GET /megacorp/employee/_search?q=last_name:Smith
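
When the URL is built from code rather than typed into Sense, the query string should be URL-encoded. A minimal sketch, assuming the node from the earlier examples at `localhost:9201`:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

base = "http://localhost:9201/megacorp/employee/_search"

# urlencode percent-encodes reserved characters such as ':' in the value.
url = base + "?" + urlencode({"q": "last_name:Smith"})
print(url)

# Decoding the query string gives back the original field:value pair.
print(parse_qs(urlsplit(url).query))
```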

Retrieve Using DSL …

Query DSL Search

Build complicated and robust queries.

The domain-specific language (DSL) is specified using a JSON request body.

Example :

GET /megacorp/employee/_search
{
  "query": {
    "match": {
      "last_name": "Smith"
    }
  }
}
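
Because the DSL is plain JSON, a request body can be built from ordinary dictionaries and serialized. A small sketch; the helper name `match_query` is hypothetical:

```python
import json


def match_query(field, value):
    """Build the request body for a single-field match query."""
    return {"query": {"match": {field: value}}}


body = match_query("last_name", "Smith")
print(json.dumps(body, indent=2))
```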

Querying

The first part is the range filter.

The second part is the match query.
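
The slide's own example did not survive, so the following is only a sketch of what such a two-part search can look like. The field names (`age`, `last_name`), the threshold of 30, and the helper name are assumptions. It uses a `bool` query with a `filter` clause, which is the form later Elasticsearch releases accept; releases from the Marvel and Sense era expressed the same idea with a `filtered` query.

```python
import json


def range_and_match(range_field, bounds, match_field, text):
    """Combine a range filter (yes/no, not scored) with a match query (scored)."""
    return {
        "query": {
            "bool": {
                # First part: the range filter narrows the candidate documents.
                "filter": {"range": {range_field: bounds}},
                # Second part: the match query finds and scores the hits.
                "must": {"match": {match_field: text}},
            }
        }
    }


# Assumed example: employees older than 30 whose last name matches "smith".
body = range_and_match("age", {"gt": 30}, "last_name", "smith")
print(json.dumps(body, indent=2))
```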

Full Text Search

Elasticsearch can search within full-text fields and return the most relevant results first. This concept of relevance is important to Elasticsearch, and is a concept that is completely foreign to traditional relational databases, in which a record either matches or it doesn’t.

Full Text Search Example
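
The difference between "matches or doesn't" and "most relevant first" can be shown with a toy ranking. This is a deliberately simplified sketch: the score is just the number of query terms found in the text, which is not how Elasticsearch computes relevance. Document 1 comes from the earlier example; documents 2 and 3 are hypothetical.

```python
import re

employees = {
    1: "I love to go rock climbing",     # from the earlier example
    2: "I like to collect rock albums",  # hypothetical second document
    3: "I enjoy gardening",              # hypothetical third document
}


def terms(text):
    """Lowercased set of words in the text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def exact_match(query):
    """Relational-style test: the value either equals the query or it does not."""
    return [doc_id for doc_id, about in employees.items() if about == query]


def ranked_match(query):
    """Full-text style: every document sharing a term matches, best match first."""
    wanted = terms(query)
    scored = [(len(wanted & terms(about)), doc_id) for doc_id, about in employees.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ranked if score > 0]


print(exact_match("rock climbing"))   # []
print(ranked_match("rock climbing"))  # [1, 2]
```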

Phrase Search

Matches an exact sequence of words (a phrase).
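
In the query DSL this is done with a `match_phrase` query. The sketch below shows an assumed request body (the `about` field and the phrase are illustrative) next to a toy Python check of what "exact sequence" means; the helper names are hypothetical.

```python
import re

# Assumed request body for a phrase search on the "about" field.
phrase_body = {"query": {"match_phrase": {"about": "rock climbing"}}}


def tokens(text):
    """Lowercased words in their original order."""
    return re.findall(r"[a-z]+", text.lower())


def phrase_match(text, phrase):
    """True if the phrase's terms occur in the text consecutively and in order."""
    words, target = tokens(text), tokens(phrase)
    n = len(target)
    return n > 0 and any(words[i:i + n] == target for i in range(len(words) - n + 1))


print(phrase_match("I love to go rock climbing", "rock climbing"))     # True
print(phrase_match("I like to collect rock albums", "rock climbing"))  # False
```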

Highlight Searches

Highlight snippets of text from each search result.
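
A search request asks for highlighting by adding a `highlight` section next to the query, and matched terms come back wrapped in `<em>` tags by default. The sketch below shows an assumed request body (the `about` field is illustrative) and a toy function that imitates the wrapping; the function is a hypothetical illustration, not the actual highlighter.

```python
import re

# Assumed request body: a phrase query plus highlighting on the same field.
highlight_body = {
    "query": {"match_phrase": {"about": "rock climbing"}},
    "highlight": {"fields": {"about": {}}},
}


def highlight(text, query, pre="<em>", post="</em>"):
    """Wrap every word of the text that also appears in the query."""
    wanted = {term.lower() for term in query.split()}

    def wrap(match):
        word = match.group(0)
        return f"{pre}{word}{post}" if word.lower() in wanted else word

    return re.sub(r"\w+", wrap, text)


print(highlight("I love to go rock climbing", "rock climbing"))
# I love to go <em>rock</em> <em>climbing</em>
```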

QUESTIONS ???

THANK YOU !!!