Real time analytics of big data with Elasticsearch Karel Minařík

Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

DESCRIPTION

A presentation from the New Media Inspiration 2013 conference (http://www.tuesday.cz/akce/new-media-inspiration-2013/) about using Elasticsearch's faceting features for realtime analytics of big data.

Page 1: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

Real time analytics of big data with Elasticsearch

Karel Minařík

Page 2: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

JSON

Facets

Analytics

http://www.youtube.com/watch?v=-GftBySG99Q

Page 3: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

Realtime Analytics With ElasticSearch

http://karmi.cz

http://elasticsearch.com

Page 4: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

Realtime Analytics With ElasticSearch

Using a search engine for analytics?

wat?

Page 5: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

HOW DOES SEARCH WORK?

A collection of documents

file_1.txt: The ruby is a pink to blood-red colored gemstone ...

file_2.txt: Ruby is a dynamic, reflective, general-purpose object-oriented programming language ...

file_3.txt: "Ruby" is a song by English rock band Kaiser Chiefs ...

Page 6: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

HOW DOES SEARCH WORK?

How do you search documents?

File.read('file_1.txt').include?('ruby')
File.read('file_2.txt').include?('ruby')
...

Page 7: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

HOW DOES SEARCH WORK?

The inverted index

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices

TOKENS       POSTINGS
ruby         file_1.txt  file_2.txt  file_3.txt
pink         file_1.txt
gemstone     file_1.txt
dynamic      file_2.txt
reflective   file_2.txt
programming  file_2.txt
song         file_3.txt
english      file_3.txt
rock         file_3.txt
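
As a rough illustration of how such a table could be produced, here is a minimal Ruby sketch that builds an inverted index from the three example documents. The `tokenize` and `build_index` helpers are hypothetical names, and the analysis step (lowercasing and splitting on non-letters) is a simplifying assumption. Unlike the table on the slide, this sketch does not drop common words such as "the" or "is", and it is not how Lucene actually stores its index.

```ruby
# Example documents: the truncated snippets from the "collection of documents" slide.
DOCUMENTS = {
  'file_1.txt' => 'The ruby is a pink to blood-red colored gemstone',
  'file_2.txt' => 'Ruby is a dynamic, reflective, general-purpose object-oriented programming language',
  'file_3.txt' => '"Ruby" is a song by English rock band Kaiser Chiefs'
}

# Naive analysis (an assumption): lowercase the text and keep runs of letters.
def tokenize(text)
  text.downcase.scan(/[a-z]+/)
end

# Map every token to the list of documents (postings) that contain it.
def build_index(documents)
  index = Hash.new { |hash, token| hash[token] = [] }
  documents.each do |name, text|
    tokenize(text).uniq.each { |token| index[token] << name }
  end
  index
end

INDEX = build_index(DOCUMENTS)

puts INDEX['ruby'].inspect  # ["file_1.txt", "file_2.txt", "file_3.txt"]
puts INDEX['song'].inspect  # ["file_3.txt"]
```

Once the index exists, a lookup is a single hash access instead of a scan over every file.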

Page 8: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

HOW DOES SEARCH WORK?

The inverted index

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices

ruby         file_1.txt  file_2.txt  file_3.txt
pink         file_1.txt
gemstone     file_1.txt
dynamic      file_2.txt
reflective   file_2.txt
programming  file_2.txt
song         file_3.txt
english      file_3.txt
rock         file_3.txt

search "ruby"

Page 9: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

HOW DOES SEARCH WORK?

The inverted index

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices

ruby         file_1.txt  file_2.txt  file_3.txt
pink         file_1.txt
gemstone     file_1.txt
dynamic      file_2.txt
reflective   file_2.txt
programming  file_2.txt
song         file_3.txt
english      file_3.txt
rock         file_3.txt

search "song"

Page 10: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

HOW DOES SEARCH WORK?

The inverted index

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices

ruby         file_1.txt  file_2.txt  file_3.txt
pink         file_1.txt
gemstone     file_1.txt
dynamic      file_2.txt
reflective   file_2.txt
programming  file_2.txt
song         file_3.txt
english      file_3.txt
rock         file_3.txt

search "ruby AND song"
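
The three searches above can be read as lookups in the postings table, with AND as an intersection of postings lists. The following Ruby sketch illustrates that idea under simple assumptions: `INVERTED_INDEX` is the table from the slides copied into a hash, and `search_and` is a hypothetical helper, not an Elasticsearch or Lucene API.

```ruby
# The postings table from the slides, as a plain Ruby hash.
INVERTED_INDEX = {
  'ruby'        => %w[file_1.txt file_2.txt file_3.txt],
  'pink'        => %w[file_1.txt],
  'gemstone'    => %w[file_1.txt],
  'dynamic'     => %w[file_2.txt],
  'reflective'  => %w[file_2.txt],
  'programming' => %w[file_2.txt],
  'song'        => %w[file_3.txt],
  'english'     => %w[file_3.txt],
  'rock'        => %w[file_3.txt]
}

# AND query: look up each term and intersect the postings lists.
# Unknown terms contribute an empty list, so the result is empty.
def search_and(index, *terms)
  terms.map { |term| index.fetch(term.downcase, []) }
       .reduce { |left, right| left & right } || []
end

puts search_and(INVERTED_INDEX, 'ruby').inspect          # ["file_1.txt", "file_2.txt", "file_3.txt"]
puts search_and(INVERTED_INDEX, 'song').inspect          # ["file_3.txt"]
puts search_and(INVERTED_INDEX, 'ruby', 'song').inspect  # ["file_3.txt"]
```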

Page 11: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

HOW DOES SEARCH WORK?

The inverted index

http://en.wikipedia.org/wiki/Index_(search_engine)#Inverted_indices

TOKENS       POSTINGS
ruby         file_1.txt  file_2.txt  file_3.txt
pink         file_1.txt
gemstone     file_1.txt
dynamic      file_2.txt
reflective   file_2.txt
programming  file_2.txt
song         file_3.txt
english      file_3.txt
rock         file_3.txt

31

Statistics!
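
One way to read this slide: the index already knows how many documents each token appears in, so counts come almost for free. A small Ruby sketch of that observation follows; `POSTINGS` is the slide's table copied into a hash and `document_frequencies` is a hypothetical helper name.

```ruby
# The postings table from the slide.
POSTINGS = {
  'ruby'        => %w[file_1.txt file_2.txt file_3.txt],
  'pink'        => %w[file_1.txt],
  'gemstone'    => %w[file_1.txt],
  'dynamic'     => %w[file_2.txt],
  'reflective'  => %w[file_2.txt],
  'programming' => %w[file_2.txt],
  'song'        => %w[file_3.txt],
  'english'     => %w[file_3.txt],
  'rock'        => %w[file_3.txt]
}

# Count the documents per token, most frequent first (ties sorted by token).
def document_frequencies(postings)
  postings.map { |token, files| [token, files.size] }
          .sort_by { |token, count| [-count, token] }
end

document_frequencies(POSTINGS).each do |token, count|
  puts "#{token.ljust(12)} #{count}"
end
```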

Page 12: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

http://elasticsearch.org

Page 13: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

Realtime Analytics With ElasticSearch

ElasticSearch is an open source, scalable, distributed, cloud-ready, highly-available full-text search engine and database with powerful aggregation features, communicating by JSON over RESTful HTTP, based on Apache Lucene.

Page 14: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

FACETS

Faceted Navigation

http://blog.linkedin.com/2009/12/14/linkedin-faceted-search/

Query

Facets

Page 15: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

FACETS

Faceted Navigation with Elasticsearch

curl "http://localhost:9200/people/_search?pretty=true" -d '{
  "query" : {
    "match" : { "name" : "John" }
  },
  "filter" : {
    "terms" : { "employer" : ["IBM"] }
  },
  "facets" : {
    "employer" : {
      "terms" : {
        "field" : "employer",
        "size"  : 3
      }
    }
  }
}'

User query

“Checkboxes”

Facets

http://www.elasticsearch.org/guide/reference/api/search/facets/index.html

"facets" : {
  "employer" : {
    "missing" : 0,
    "total" : 10,
    "other" : 3,
    "terms" : [ {
      "term" : "ibm",
      "count" : 3
    }, {
      "term" : "twitter",
      "count" : 2
    }, {
      "term" : "apple",
      "count" : 2
    } ]
  }
}

Response
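
To make the fields of the facet response concrete, here is a hedged Ruby sketch that computes a response of the same shape in memory. The `PEOPLE` records are invented sample data, chosen only so that the counts line up with the response on this slide, and `terms_facet` is a hypothetical helper. Elasticsearch computes facets across shards on indexed terms, which this sketch does not attempt to reproduce.

```ruby
# Invented sample data: ten people matching the query, with their employers.
PEOPLE = %w[IBM IBM IBM Twitter Twitter Apple Apple Google Facebook Microsoft]
         .map { |employer| { 'name' => 'John', 'employer' => employer } }

# Compute a terms-facet-like summary for one field.
#   missing: documents without a value for the field
#   total:   number of values counted
#   other:   values counted but not included in the top `size` terms
def terms_facet(documents, field, size)
  values  = documents.map { |doc| doc[field] }
  present = values.compact.map(&:downcase) # lowercasing mimics the analyzed "ibm"

  counts = Hash.new(0)
  present.each { |value| counts[value] += 1 }
  top = counts.sort_by { |term, count| [-count, term] }.first(size)

  {
    'missing' => values.count(&:nil?),
    'total'   => present.size,
    'other'   => present.size - top.sum { |_, count| count },
    'terms'   => top.map { |term, count| { 'term' => term, 'count' => count } }
  }
end

facet = terms_facet(PEOPLE, 'employer', 3)
puts facet.inspect
```

With this sample data, `total` is 10 and `other` is 3, because three of the ten values fall outside the top three terms. Ties between terms are broken alphabetically here, which is an arbitrary choice for the sketch.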

Page 16: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

FACETS

Visualizing the Facets

http://mbostock.github.com/d3/tutorial/bar-1.html

"facets" : {
  "employer" : {
    "missing" : 0,
    "total" : 10,
    "other" : 3,
    "terms" : [ {
      "term" : "ibm",
      "count" : 3
    }, {
      "term" : "twitter",
      "count" : 2
    }, {
      "term" : "apple",
      "count" : 2
    } ]
  }
}

d3.js ~ A Bar Chart, Part 1

DEMO: http://bl.ocks.org/4571766

Page 17: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

FACETS

Visualizing the Facets

Page 18: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

FACETS

Visualizing the Facets

Page 19: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

FACETS

Visualizing the Facets

http://demo.kibana.org

Page 20: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

Realtime Analytics With ElasticSearch

Important Concepts

‣ No batch orientation
‣ No stats precomputation and caching
‣ No predefined metrics or schemas

‣ Combination of free text search, structured search, and facets
‣ Scripting for performing ad-hoc analytics
‣ Extendable: write your own facet types

Page 21: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

FACETS

Scripting

curl -X DELETE localhost:9200/demo-articles
curl -X POST localhost:9200/demo-articles -d '{"mappings": { "a": { "properties": {"url": {"type": "string", "index": "not_analyzed"}} } } }'

curl -X PUT localhost:9200/demo-articles/a/1 -d '{"title":"...","url":"http://some.blogger.com/2012/09/01/index.html"}'
curl -X PUT localhost:9200/demo-articles/a/2 -d '{"title":"...","url":"http://some.blogger.com/2012/09/11/index.html"}'
curl -X PUT localhost:9200/demo-articles/a/3 -d '{"title":"...","url":"http://some.blogger.com/about.html"}'
# Both GitHub documents use id 5, so the second PUT overwrites the first.
curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"https://github.com/user/A"}'
curl -X PUT localhost:9200/demo-articles/a/5 -d '{"title":"...","url":"http://github.com/user/B"}'
curl -X POST localhost:9200/demo-articles/_refresh

curl -X GET 'localhost:9200/demo-articles/_search/?search_type=count&pretty' -d '{
  "facets": {
    "popular-domains": {
      "terms": {
        "field"  : "url",
        "script" : "term.replace(new RegExp(\"https?://\"), \"\").split(\"/\")[0]",
        "lang"   : "javascript"
      }
    }
  }
}'

Extract and aggregate most popular domains from article URLs

"facets" : {
  "popular-domains" : {
    // ...
    "terms" : [ {
      "term" : "some.blogger.com", "count" : 3
    }, {
      "term" : "github.com", "count" : 1
    } ]
  }
}

Response
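
For readers who want to see what the facet script does to each term, here is a rough Ruby port of the same transformation, applied outside Elasticsearch. `ARTICLE_URLS` and `domain_of` are hypothetical names. The list holds four URLs on the assumption that the two PUT requests to id 5 leave a single stored document, which is consistent with the count of 1 for github.com in the response.

```ruby
# URLs of the documents that end up stored in the demo index
# (assumption: the second PUT to id 5 replaced the first).
ARTICLE_URLS = [
  'http://some.blogger.com/2012/09/01/index.html',
  'http://some.blogger.com/2012/09/11/index.html',
  'http://some.blogger.com/about.html',
  'http://github.com/user/B'
]

# Ruby equivalent of the JavaScript facet script:
#   term.replace(new RegExp("https?://"), "").split("/")[0]
def domain_of(url)
  url.sub(%r{https?://}, '').split('/').first
end

# Count documents per extracted domain, most popular first.
domain_counts = Hash.new(0)
ARTICLE_URLS.each { |url| domain_counts[domain_of(url)] += 1 }

domain_counts.sort_by { |domain, count| [-count, domain] }.each do |domain, count|
  puts "#{domain}: #{count}"
end
```

Running the script per term at query time is what makes this kind of ad-hoc aggregation possible without reindexing, at the cost of extra work on every request.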

Page 22: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

FACETS

Demonstrations

Demo

Page 23: Realtime Analytics With Elasticsearch [New Media Inspiration 2013]

Thanks!