elasticsearch basics workshop, Mathieu Elie at Giroll, Tuesday 17 December 2013

DESCRIPTION

Quick install of elasticsearch: store documents, run searches, set a mapping, and get ready to read the documentation.

Page 1: elasticsearch basics workshop

elasticsearch basics workshop

Mathieu Elie at Giroll

Tuesday 17 December 2013

Page 2: elasticsearch basics workshop

speaker : @mathieuel

• freelance & founder @oneplaylist

• full stack skills

• see what I’ve done at http://www.mathieu-elie.net

Page 3: elasticsearch basics workshop

goal

• go from first steps

• and get over the first frustration

• give you what you need to keep learning by yourself

Page 4: elasticsearch basics workshop

install

• make sure you have a Java runtime

• apt-get install openjdk-6-jre-headless -y

• consider the Oracle JVM

Page 5: elasticsearch basics workshop

unzip and run !

## Get the latest stable archive
wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.7.zip

## Extract the archive
unzip elasticsearch-0.90.7.zip

cd elasticsearch-0.90.7

## run !
# This will run elasticsearch in the foreground.
./bin/elasticsearch -f

Page 6: elasticsearch basics workshop

it's alive !

[2013-12-13 15:45:25,187][INFO ][node ] [Bridge, George Washington] version[0.90.7], pid[37998], build[36897d0/2013-11-13T12:06:54Z]
[2013-12-13 15:45:25,189][INFO ][node ] [Bridge, George Washington] initializing ...
[2013-12-13 15:45:25,202][INFO ][plugins ] [Bridge, George Washington] loaded [], sites []
[2013-12-13 15:45:28,342][INFO ][node ] [Bridge, George Washington] initialized
[2013-12-13 15:45:28,342][INFO ][node ] [Bridge, George Washington] starting ...
[2013-12-13 15:45:28,491][INFO ][transport ] [Bridge, George Washington] bound_address {inet[/0:0:0:0:0:0:0:0:9300]}, publish_address {inet[/192.168.1.12:9300]}
[2013-12-13 15:45:31,545][INFO ][cluster.service ] [Bridge, George Washington] new_master [Bridge, George Washington][pKCdh1b_TP2TlurO1gm4_g][inet[/192.168.1.12:9300]], reason: zen-disco-join (elected_as_master)
[2013-12-13 15:45:31,577][INFO ][discovery ] [Bridge, George Washington] elasticsearch/pKCdh1b_TP2TlurO1gm4_g
[2013-12-13 15:45:31,595][INFO ][http ] [Bridge, George Washington] bound_address {inet[/0:0:0:0:0:0:0:0:9200]}, publish_address {inet[/192.168.1.12:9200]}
[2013-12-13 15:45:31,596][INFO ][node ] [Bridge, George Washington] started
[2013-12-13 15:45:31,629][INFO ][gateway ] [Bridge, George Washington] recovered [0] indices into cluster_state

Page 7: elasticsearch basics workshop

ping es on port 9200

curl http://127.0.0.1:9200
{
  "ok" : true,
  "status" : 200,
  "name" : "Gideon, Gregory",
  "version" : {
    "number" : "0.90.6",
    "build_hash" : "e2a24efdde0cb7cc1b2071ffbbd1fd874a6d8d6b",
    "build_timestamp" : "2013-11-04T13:44:16Z",
    "build_snapshot" : false,
    "lucene_version" : "4.5.1"
  },
  "tagline" : "You Know, for Search"
}

Page 8: elasticsearch basics workshop

Store a Document

curl -XPUT http://localhost:9200/workshop/site/1 -d '{
  "url": "http://www.elasticsearch.org",
  "title": "Open Source Distributed Real Time Search & Analytics",
  "description": "Elasticsearch is a powerful open source search and analytics engine that makes data easy to explore.",
  "tags": ["Open Source", "elasticsearch", "Distributed"]
}'
{"ok":true,"_index":"workshop","_type":"site","_id":"1","_version":1}

Page 9: elasticsearch basics workshop

retrieve the document

curl -XGET http://localhost:9200/workshop/site/1

{
  "_index": "workshop",
  "_type": "site",
  "_id": "1",
  "_version": 2,
  "exists": true,
  "_source": {
    "url": "http://www.elasticsearch.org",
    "title": "Open Source Distributed Real Time Search & Analytics",
    "description": "Elasticsearch is a powerful open source search and analytics engine that makes data easy to explore.",
    "tags": ["Open Source", "elasticsearch", "Distributed"]
  }
}

Page 10: elasticsearch basics workshop

add more documents

curl -XPUT http://localhost:9200/workshop/site/2 -d '{
  "url": "http://www.mathieu-elie.net",
  "title": "Mathieu ELIE Freelance - Full Stack Data Engineer, Data Visualization",
  "description": "Freelance Consultant in Bordeaux, System & Software Architect. Love dataviz, redis, elasticsearch, architecture scalability recipes and playing with data.",
  "tags": ["elasticsearch", "Data Visualization"]
}'

curl -XPUT http://localhost:9200/workshop/site/3 -d '{
  "url": "http://www.giroll.org",
  "title": "Collectif Giroll - Gironde Logiciels Libres",
  "description": "Giroll, collectif basé à Bordeaux, réunis autour des Logiciels et des Cultures libres. Ateliers tous les mardis de 18h30 à 20h30 et organisation d'\''Install Party Linux tous les six",
  "tags": ["Open Source", "Collectif"]
}'

Page 11: elasticsearch basics workshop

now search !

Page 12: elasticsearch basics workshop

curl 'http://localhost:9200/workshop/_search?pretty=true'
{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 3,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "workshop",
      "_type" : "site",
      "_id" : "1",
      "_score" : 1.0,
      "_source" : {
        "url": "http://www.elasticsearch.org",
        "title": "Open Source Distributed Real Time Search & Analytics",
        "description": "Elasticsearch is a powerful open source search and analytics engine that makes data easy to explore.",
        "tags": ["Open Source", "elasticsearch", "Distributed"]
      }
    }, {
      "_index" : "workshop",
      "_type" : "site",
      "_id" : "3",
      "_score" : 1.0,
      "_source" : {
        "url": "http://www.giroll.org",
        "title": "Collectif Giroll - Gironde Logiciels Libres",
        "description": "Giroll, collectif basé à Bordeaux, réunis autour des Logiciels et des Cultures libres. Ateliers tous les mardis de 18h30 à 20h30 et organisation d'Install Party Linux tous les six",
        "tags": ["Open Source", "Collectif"]
      }
    }, {

Page 13: elasticsearch basics workshop

ok great, but now I want to search for text !

Page 14: elasticsearch basics workshop

step 1 : pass query as a request body

curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{
  "query" : {
    "match_all" : { }
  }
}'

Page 17: elasticsearch basics workshop

so let's use the query_string query from the query DSL

curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{
  "query" : {
    "query_string" : { "query" : "elasticsearch" }
  }
}'

Page 18: elasticsearch basics workshop

the result is quite verbose, so let's get only the title and tags fields

curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{
  "fields" : ["title", "tags"],
  "query" : {
    "query_string" : { "query" : "elasticsearch" }
  }
}'

Page 19: elasticsearch basics workshop

{
  "took" : 6,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.081366636,
    "hits" : [ {
      "_index" : "workshop",
      "_type" : "site",
      "_id" : "1",
      "_score" : 0.081366636,
      "fields" : {
        "tags" : [ "Open Source", "elasticsearch", "Distributed" ],
        "title" : "Open Source Distributed Real Time Search & Analytics"
      }
    }, {
      "_index" : "workshop",
      "_type" : "site",
      "_id" : "2",
      "_score" : 0.06780553,
      "fields" : {
        "tags" : [ "elasticsearch", "Data Visualization" ],
        "title" : "Mathieu ELIE Freelance - Full Stack Data Engineer, Data Visualization"
      }
    } ]
  }
}
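
To make the shape of this response concrete, here is a minimal Python sketch that parses a trimmed copy of it and walks hits.hits. The JSON literal is copied from the response above with the shard details left out; the variable names are only illustrative and not part of any elasticsearch client.

```python
import json

# Trimmed copy of the search response shown above.
response_text = """
{
  "took": 6,
  "timed_out": false,
  "hits": {
    "total": 2,
    "max_score": 0.081366636,
    "hits": [
      {"_index": "workshop", "_type": "site", "_id": "1", "_score": 0.081366636,
       "fields": {"tags": ["Open Source", "elasticsearch", "Distributed"],
                  "title": "Open Source Distributed Real Time Search & Analytics"}},
      {"_index": "workshop", "_type": "site", "_id": "2", "_score": 0.06780553,
       "fields": {"tags": ["elasticsearch", "Data Visualization"],
                  "title": "Mathieu ELIE Freelance - Full Stack Data Engineer, Data Visualization"}}
    ]
  }
}
"""

response = json.loads(response_text)

# Each hit carries its metadata (_id, _score) plus the requested "fields".
summaries = [
    (hit["_id"], hit["_score"], hit["fields"]["title"])
    for hit in response["hits"]["hits"]
]

for doc_id, score, title in summaries:
    print(doc_id, score, title)
```

The hits come back sorted by score, which is why the document whose tags and description both mention elasticsearch is listed first.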

Page 21: elasticsearch basics workshop

Facets dsl

curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{
  "fields" : ["title", "tags"],
  "query" : {
    "query_string" : { "query" : "elasticsearch" }
  },
  "facets" : {
    "tags" : { "terms" : {"field" : "tags"} }
  }
}'

Page 22: elasticsearch basics workshop

oh no!!

"facets" : {
  "tags" : {
    "_type" : "terms",
    "missing" : 0,
    "total" : 7,
    "other" : 0,
    "terms" : [
      { "term" : "elasticsearch", "count" : 2 },
      { "term" : "visualization", "count" : 1 },
      { "term" : "source", "count" : 1 },
      { "term" : "open", "count" : 1 },
      { "term" : "distributed", "count" : 1 },
      { "term" : "data", "count" : 1 }
    ]
  }
}

Page 23: elasticsearch basics workshop

• hey ! look at "Open Source" : it has been lowercased and split into multiple tokens !

• this is done by the default mapping and analyzer
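
A rough way to see where these facet counts come from: the Python sketch below imitates the default analysis (lowercase, then split on anything that is not a letter or digit) and counts terms over the tags of the two documents that matched. It is only an approximation of what elasticsearch does, and simple_analyze and facet_counts are hypothetical helper names, not elasticsearch functions.

```python
import re
from collections import Counter

# Tags of the two documents that matched the "elasticsearch" query.
matched_tags = [
    ["Open Source", "elasticsearch", "Distributed"],
    ["elasticsearch", "Data Visualization"],
]

def simple_analyze(text):
    """Approximate the default analyzer: lowercase, split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^0-9a-z]+", text.lower()) if tok]

def facet_counts(docs_tags, analyze):
    """Count, for each term, how many documents contain it."""
    counts = Counter()
    for tags in docs_tags:
        terms = set()
        for tag in tags:
            terms.update(analyze(tag))
        counts.update(terms)
    return counts

counts = facet_counts(matched_tags, simple_analyze)
for term, count in counts.most_common():
    print(term, count)
```

The whole tag "Open Source" never appears as a term here, only "open" and "source", which is the same effect as in the facet result.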

Page 24: elasticsearch basics workshop

curl 'http://localhost:9200/workshop/site/_mapping?pretty=true'
{
  "site" : {
    "properties" : {
      "description" : { "type" : "string" },
      "tags" : { "type" : "string" },
      "title" : { "type" : "string" },
      "url" : { "type" : "string" }
    }
  }
}

Page 26: elasticsearch basics workshop

test the default analyzer

curl -XGET 'localhost:9200/workshop/_analyze?pretty=true' -d 'Open Source'
{
  "tokens" : [ {
    "token" : "open",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "source",
    "start_offset" : 5,
    "end_offset" : 11,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}

Page 28: elasticsearch basics workshop

curl -XGET 'localhost:9200/workshop/_analyze?analyzer=keyword&pretty=true' -d 'Open Source'
{
  "tokens" : [ {
    "token" : "Open Source",
    "start_offset" : 0,
    "end_offset" : 11,
    "type" : "word",
    "position" : 1
  } ]
}

got it ! now how to apply this to our tags field ?

Page 29: elasticsearch basics workshop

curl 'http://localhost:9200/workshop/site/_mapping?pretty=true' -d '{
  "site" : {
    "properties" : {
      "url" : {"type" : "string"},
      "title" : {"type" : "string"},
      "description" : {"type" : "string"},
      "tags" : {"type" : "string", "analyzer": "keyword" }
    }
  }
}'
{
  "error" : "MergeMappingException[Merge failed with failures {[mapper [tags] has different index_analyzer]}]",
  "status" : 400
}

oops ! the analyzer of an already mapped field cannot be changed in place, so we need to drop something..

Page 30: elasticsearch basics workshop

curl -XDELETE 'http://localhost:9200/workshop/'
{"ok":true,"acknowledged":true}

# the index must exist before we can put a mapping..
curl -XPUT 'http://localhost:9200/workshop/'
{"ok":true,"acknowledged":true}

curl 'http://localhost:9200/workshop/site/_mapping?pretty=true' -d '{
  "site" : {
    "properties" : {
      "url" : {"type" : "string"},
      "title" : {"type" : "string"},
      "description" : {"type" : "string"},
      "tags" : {"type" : "string", "analyzer": "keyword" }
    }
  }
}'
{"ok":true,"acknowledged":true}
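
The three calls above (drop the index, recreate it, put the mapping) could also be scripted. Below is a hedged Python sketch using only the standard library. It only builds the requests and does not send them, so it can be read without a running node. BASE_URL and build_reset_requests are assumptions of this sketch, not names from elasticsearch.

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # assumption: local node, as in the slides

# Same mapping as in the curl command above.
MAPPING = {
    "site": {
        "properties": {
            "url": {"type": "string"},
            "title": {"type": "string"},
            "description": {"type": "string"},
            "tags": {"type": "string", "analyzer": "keyword"},
        }
    }
}

def build_reset_requests(index="workshop", doc_type="site"):
    """Build (but do not send) the requests that drop, recreate and map the index."""
    mapping_body = json.dumps(MAPPING).encode("utf-8")
    return [
        urllib.request.Request(f"{BASE_URL}/{index}/", method="DELETE"),
        urllib.request.Request(f"{BASE_URL}/{index}/", method="PUT"),
        # curl with -d and no -X sends a POST, so POST is used here too.
        urllib.request.Request(
            f"{BASE_URL}/{index}/{doc_type}/_mapping",
            data=mapping_body,
            method="POST",
        ),
    ]

requests_to_send = build_reset_requests()
for req in requests_to_send:
    print(req.get_method(), req.full_url)

# To run them against a live node, each request would be passed to
# urllib.request.urlopen(req); that step is left out to avoid side effects.
```

Dropping the index also drops its documents, so they have to be indexed again afterwards.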

Page 31: elasticsearch basics workshop

# test the analysis on the field
curl -XGET 'localhost:9200/workshop/_analyze?pretty=true&field=site.tags' -d 'Open Source'
{
  "tokens" : [ {
    "token" : "Open Source",
    "start_offset" : 0,
    "end_offset" : 11,
    "type" : "word",
    "position" : 1
  } ]
}

# congrats !

Page 32: elasticsearch basics workshop

# let's push the data again
curl -XPUT http://localhost:9200/workshop/site/1 -d '{
  "url": "http://www.elasticsearch.org",
  "title": "Open Source Distributed Real Time Search & Analytics",
  "description": "Elasticsearch is a powerful open source search and analytics engine that makes data easy to explore.",
  "tags": ["Open Source", "elasticsearch", "Distributed"]
}'

curl -XPUT http://localhost:9200/workshop/site/2 -d '{
  "url": "http://www.mathieu-elie.net",
  "title": "Mathieu ELIE Freelance - Full Stack Data Engineer, Data Visualization",
  "description": "Freelance Consultant in Bordeaux, System & Software Architect. Love dataviz, redis, elasticsearch, architecture scalability recipes and playing with data.",
  "tags": ["elasticsearch", "Data Visualization"]
}'

curl -XPUT http://localhost:9200/workshop/site/3 -d '{
  "url": "http://www.giroll.org",
  "title": "Collectif Giroll - Gironde Logiciels Libres",
  "description": "Giroll, collectif basé à Bordeaux, réunis autour des Logiciels et des Cultures libres. Ateliers tous les mardis de 18h30 à 20h30 et organisation d'\''Install Party Linux tous les six",
  "tags": ["Open Source", "Collectif"]
}'

Page 33: elasticsearch basics workshop

# faceting ok ???
curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{
  "fields" : ["title", "tags"],
  "query" : {
    "query_string" : { "query" : "elasticsearch" }
  },
  "facets" : {
    "tags" : { "terms" : {"field" : "tags"} }
  }
}'

Page 34: elasticsearch basics workshop

"facets" : {
  "tags" : {
    "_type" : "terms",
    "missing" : 0,
    "total" : 5,
    "other" : 0,
    "terms" : [
      { "term" : "elasticsearch", "count" : 2 },
      { "term" : "Open Source", "count" : 1 },
      { "term" : "Distributed", "count" : 1 },
      { "term" : "Data Visualization", "count" : 1 }
    ]
  }
}

cool ! our facets now contain whole tags ! great job !!

Page 36: elasticsearch basics workshop

• more efficient than full text search

• cached / indexed

• you can filter using facet items

curl -XGET 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{
  "query" : {
    "match_all" : { }
  },
  "filter" : {
    "term" : { "tags" : "Open Source" }
  }
}'
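
What the term filter does with keyword-analyzed tags can be pictured in plain Python: a document passes only if one of its tags equals the filter value exactly, with no lowercasing or splitting. This is a simplified model of the behaviour, not elasticsearch code. term_filter is a hypothetical helper, and the docs dictionary just restates the tags of the three documents indexed earlier.

```python
# Tags of the three workshop documents, keyed by document id.
docs = {
    "1": ["Open Source", "elasticsearch", "Distributed"],
    "2": ["elasticsearch", "Data Visualization"],
    "3": ["Open Source", "Collectif"],
}

def term_filter(documents, value):
    """Keep the ids of documents whose tags contain exactly `value`."""
    return sorted(doc_id for doc_id, tags in documents.items() if value in tags)

print(term_filter(docs, "Open Source"))   # exact tag value
print(term_filter(docs, "open source"))   # same words, different case
```

Because the comparison is exact, the value passed to the filter should be a term exactly as the facet returned it.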

Page 37: elasticsearch basics workshop

RTFM WAY

• elasticsearch doc is great

• but it is exhaustive

• so at the beginning it's a bit frustrating

Page 38: elasticsearch basics workshop

Think about the JSON hierarchy

curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{
  "fields" : ["title", "tags"],
  "query" : {
    "query_string" : { "query" : "elasticsearch" }
  },
  "facets" : {
    "tags" : { "terms" : {"field" : "tags"} }
  }
}'

Page 40: elasticsearch basics workshop

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl.html

you're using the query DSL

curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{
  "fields" : ["title", "tags"],
  "query" : {
    "query_string" : { "query" : "elasticsearch" }
  },
  "facets" : {
    "tags" : { "terms" : {"field" : "tags"} }
  }
}'

Page 41: elasticsearch basics workshop

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-queries.html

you're using one of the different types of queries

curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{
  "fields" : ["title", "tags"],
  "query" : {
    "query_string" : { "query" : "elasticsearch" }
  },
  "facets" : {
    "tags" : { "terms" : {"field" : "tags"} }
  }
}'

Page 42: elasticsearch basics workshop

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/query-dsl-query-string-query.html

this query is of the query_string type, with its query parameter set to elasticsearch

curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{
  "fields" : ["title", "tags"],
  "query" : {
    "query_string" : { "query" : "elasticsearch" }
  },
  "facets" : {
    "tags" : { "terms" : {"field" : "tags"} }
  }
}'

Page 43: elasticsearch basics workshop

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets.html

we also use faceting

curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{
  "fields" : ["title", "tags"],
  "query" : {
    "query_string" : { "query" : "elasticsearch" }
  },
  "facets" : {
    "tags" : { "terms" : {"field" : "tags"} }
  }
}'

Page 44: elasticsearch basics workshop

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-facets-terms-facet.html

we use a terms facet

curl -XPOST 'http://localhost:9200/workshop/site/_search?pretty=true' -d '{
  "fields" : ["title", "tags"],
  "query" : {
    "query_string" : { "query" : "elasticsearch" }
  },
  "facets" : {
    "tags" : { "terms" : {"field" : "tags"} }
  }
}'

Page 45: elasticsearch basics workshop

RTFM WAY

• common pitfall: the code examples in the doc do not always show the whole query

• so you have to put the code from the doc back at its place in the whole DSL hierarchy

• think in terms of hierarchy and everything becomes clearer
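
One way to practice this is to build the request body piece by piece, putting each fragment from the documentation at its place in the hierarchy. The Python sketch below does that for the request used throughout these slides. The variable names are illustrative only; the JSON it prints is the body of the curl command shown earlier.

```python
import json

# Fragment as it would appear on the query_string documentation page.
query_string_fragment = {"query_string": {"query": "elasticsearch"}}

# Fragment as it would appear on the terms facet documentation page.
terms_facet_fragment = {"terms": {"field": "tags"}}

# The whole request body: each fragment sits under its parent key.
search_body = {
    "fields": ["title", "tags"],
    "query": query_string_fragment,            # query DSL > query_string
    "facets": {"tags": terms_facet_fragment},  # facets > facet name > terms
}

print(json.dumps(search_body, indent=2))
```

Swapping in another query type or another facet then means replacing one fragment while the surrounding hierarchy stays the same.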

Page 46: elasticsearch basics workshop

the end for me...

the beginning for you...

Page 47: elasticsearch basics workshop

questions and more

• twitter @mathieuel

• contact on my freelance website

• http://www.mathieu-elie.net

• thanks to giroll for hosting this workshop !
