Elasticsearch for SQL Users - Percona

Elasticsearch for SQL Users

Philipp Krenn @xeraa

Infrastructure | Developer Advocate

ViennaDB

Papers We Love Vienna

Agenda

Ecosystem

Architecture

Queries

Schema

Conclusion

Ecosystem

You Know, for Search

ELK Stack

Elastic Stack

Architecture

Apache Lucene

Library

Index, store, search

Elasticsearch

Distribution

REST

Query DSL

Index

Type

Document

Cluster

Index

Shard & Replica

Topology

Write

Immutable Segments

Schema-Free Is a Lie!

Dynamic mapping

CREATE TABLE IF NOT EXISTS emails (
  sender VARCHAR(255) NOT NULL,
  recipients TEXT,
  cc TEXT,
  bcc TEXT,
  subject VARCHAR(1024),
  body MEDIUMTEXT,
  datetime DATETIME
);

CREATE INDEX emails_sender ON emails(sender);
CREATE FULLTEXT INDEX emails_subject ON emails(subject);
CREATE FULLTEXT INDEX emails_body ON emails(body);

PUT /communication
{
  "mappings": {
    "email": {
      "properties": {
        "sender": { "type": "keyword" },
        "recipients": { "type": "keyword" },
        "cc": { "type": "keyword" },
        "bcc": { "type": "keyword" },
        "subject": { "type": "text", "analyzer": "english" },
        "body": { "type": "text", "analyzer": "english" }
      }
    }
  }
}

Beware of the Reindex

Queries

Let's Add Some Data

POST /communication/email
{
  "sender": "david@elastic.co",
  "recipients": [ "philipp@elastic.co" ],
  "cc": [],
  "subject": "Elasticsearch is pretty cool",
  "body": "Hey Philipp, this is great stuff. Check it out!"
}

Let's Add Some Data

POST /communication/email
{
  "sender": "philipp@elastic.co",
  "recipients": [ "shay@elastic.co" ],
  "cc": [ "david@elastic.co" ],
  "subject": "Thanks for creating Elasticsearch",
  "body": "David pointed me to your project. It's awesome"
}

Let's Add Some Data

POST /communication/email
{
  "sender": "david@elastic.co",
  "recipients": [ "shay@elastic.co" ],
  "cc": [],
  "subject": "We should hire Philipp",
  "body": "He is really into the project and will do great things."
}

Get All the Data

POST /communication/_search
{
  "query": {
    "match_all": { }
  }
}

Find great

POST /communication/_search
{
  "query": {
    "match": { "body": "great" }
  }
}

text vs keyword

text (default) is analyzed

keyword is not

Analyzed Text

PUT /cities/city/1
{
  "city": "Sandusky",
  "population": 25340
}
PUT /cities/city/2
{
  "city": "New Albany",
  "population": 8829
}
PUT /cities/city/3
{
  "city": "New York",
  "population": 8406000
}

Analyzed Text

Stop words, stemming, synonyms, fuzziness, ...

Analyzed Text

POST /cities/_search
{
  "query": {
    "match": { "city": "New Albany" }
  }
}

Not Analyzed Text

DELETE /cities
PUT /cities
{
  "mappings": {
    "city": {
      "properties": {
        "city": { "type": "keyword" }
      }
    }
  }
}

Not Analyzed Text

POST /cities/_search
{
  "query": {
    "match": { "city": "New Albany" }
  }
}
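Why the same "New Albany" query behaves differently on the two mappings can be sketched in plain Python. This is only an illustration under simplifying assumptions: the hypothetical `analyze` helper merely lowercases and splits into words (real analyzers do more), and scoring is ignored, so the result is just the set of matching document ids.

```python
import re

# The three city documents indexed in the slides, keyed by document id.
CITIES = {1: "Sandusky", 2: "New Albany", 3: "New York"}


def analyze(text):
    """Simplified stand-in for an analyzer: lowercase and split into word tokens."""
    return re.findall(r"\w+", text.lower())


def match_text(query):
    """Analyzed (text) field: a document matches if it shares any token with the query."""
    wanted = set(analyze(query))
    return sorted(i for i, city in CITIES.items() if wanted & set(analyze(city)))


def match_keyword(query):
    """Not analyzed (keyword) field: the stored value must equal the query exactly."""
    return sorted(i for i, city in CITIES.items() if city == query)


print(match_text("New Albany"))     # [2, 3]: "New York" also contains the token "new"
print(match_keyword("New Albany"))  # [2]
```

On the analyzed field both "New Albany" and "New York" come back because they share the token "new"; on the keyword field only the exact value matches.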

Search for great stuff

POST /communication/_search
{
  "query": {
    "match": { "body": "great stuff" }
  }
}

Search for the Phrase great stuff

POST /communication/_search
{
  "query": {
    "match_phrase": { "body": "great stuff" }
  }
}
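The difference between searching for the words "great stuff" and for the phrase "great stuff" can be illustrated with the three email bodies from the slides. This is a rough sketch, not how Elasticsearch is implemented: the hypothetical `tokens` helper only lowercases and splits into words, whereas the `english` analyzer in the mapping also stems and removes stop words.

```python
import re

# Bodies of the three emails indexed earlier, keyed by insertion order.
BODIES = {
    1: "Hey Philipp, this is great stuff. Check it out!",
    2: "David pointed me to your project. It's awesome",
    3: "He is really into the project and will do great things.",
}


def tokens(text):
    """Simplified analysis: lowercase and split into word tokens."""
    return re.findall(r"\w+", text.lower())


def match(query):
    """Like a match query with OR semantics: any query token may match."""
    wanted = set(tokens(query))
    return sorted(i for i, body in BODIES.items() if wanted & set(tokens(body)))


def match_phrase(query):
    """Like a phrase query: all tokens must appear adjacent and in order."""
    phrase = tokens(query)
    n = len(phrase)
    hits = []
    for i, body in BODIES.items():
        words = tokens(body)
        if any(words[k:k + n] == phrase for k in range(len(words) - n + 1)):
            hits.append(i)
    return sorted(hits)


print(match("great stuff"))         # [1, 3]
print(match_phrase("great stuff"))  # [1]
```

The third email matches the plain word search because it contains "great", but it drops out of the phrase search because "great" is followed by "things".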

Multiple Fields

POST /communication/_search
{
  "query": {
    "multi_match": {
      "query": "Philipp",
      "fields": [ "subject", "body" ]
    }
  }
}

Boosting

Positive (>1) or negative (<1)

POST /communication/_search
{
  "query": {
    "multi_match": {
      "query": "Philipp",
      "fields": [ "subject^3", "body" ]
    }
  }
}
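How a boost such as `subject^3` can change the ranking is sketched below. The per-field scores are made up for illustration (they are not real Elasticsearch scores), and the sketch assumes the simplest combination rule, where a document's score is the best of its boosted field scores. In the sample data "Philipp" appears in the body of the first email and in the subject of the third.

```python
# Hypothetical raw per-field scores for the query "Philipp"; invented numbers.
FIELD_SCORES = {
    1: {"subject": 0.0, "body": 1.0},  # "Hey Philipp, ..." in the body
    3: {"subject": 0.5, "body": 0.0},  # "We should hire Philipp" in the subject
}


def score(field_scores, boosts):
    """Multiply each field score by its boost (default 1) and keep the best one."""
    return max(value * boosts.get(field, 1.0) for field, value in field_scores.items())


def rank(boosts):
    """Return document ids ordered by descending score."""
    return sorted(FIELD_SCORES, key=lambda doc: score(FIELD_SCORES[doc], boosts), reverse=True)


print(rank({}))              # [1, 3]: without boosts the body match wins
print(rank({"subject": 3}))  # [3, 1]: subject^3 lifts the subject match to 1.5
```

A boost below 1 works the same way in the other direction: it shrinks a field's contribution instead of amplifying it.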

Fuzziness

POST /communication/_search
{
  "query": {
    "match": {
      "body": {
        "query": "awsome",
        "fuzziness": 1
      }
    }
  }
}
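A fuzziness of 1 means a term may differ from the query by one edit. The sketch below computes the plain Levenshtein distance to show why the misspelled "awsome" still finds "awesome". It is a simplification: it ignores analysis steps such as stemming, and it does not treat a swap of two adjacent characters as a single edit. The function names are chosen for this example only.

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete a character from a
                current[j - 1] + 1,      # insert a character into a
                previous[j - 1] + cost,  # substitute (or keep) a character
            ))
        previous = current
    return previous[-1]


def fuzzy_match(term, word, fuzziness=1):
    """True if word is within the allowed number of edits of term."""
    return edit_distance(term, word) <= fuzziness


print(edit_distance("awsome", "awesome"))  # 1 (one inserted "e")
print(fuzzy_match("awsome", "awesome"))    # True
print(fuzzy_match("awsome", "project"))    # False
```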

Lots More

Boolean (must, must_not, should, minimum_should_match)

Highlighting

Search and scroll

Geo

(Pipelined) Aggregations

Example: Group by day, sum, moving average
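The aggregation example (group by day, sum, moving average) can be sketched in plain Python to show the idea of pipelining: the moving average is computed over the output of the per-day sums, not over the raw documents. The events and the window size of 2 are hypothetical, and the exact window semantics in Elasticsearch may differ from this sketch.

```python
from collections import defaultdict

# Hypothetical (day, value) events; not taken from the slides.
EVENTS = [
    ("2016-01-01", 3), ("2016-01-01", 2),
    ("2016-01-02", 4),
    ("2016-01-03", 1), ("2016-01-03", 5),
    ("2016-01-04", 6),
]


def sum_per_day(events):
    """First stage: bucket events by day and sum the values in each bucket."""
    totals = defaultdict(int)
    for day, value in events:
        totals[day] += value
    return sorted(totals.items())


def moving_average(values, window=2):
    """Second stage: average each value with up to window - 1 preceding values."""
    result = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        result.append(sum(chunk) / len(chunk))
    return result


daily = sum_per_day(EVENTS)
sums = [total for _, total in daily]
print(sums)                  # [5, 4, 6, 6]
print(moving_average(sums))  # [5.0, 4.5, 5.0, 6.0]
```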

Schema

Application-Side Joins

PUT /blog/author/1
{
  "name": "Philipp",
  "bio": "..."
}
PUT /blog/post/1
{
  "author_id": 1,
  "title": "...",
  "body": "..."
}
PUT /blog/post/2
{
  "author_id": 1,
  "title": "...",
  "body": "..."
}

Application-Side Joins

POST /blog/author/_search
{
  "query": {
    "match": { "name": "Philipp" }
  }
}

POST /blog/post/_search
{
  "query": {
    "match": {
      "author_id": <each id from query 1 result>
    }
  }
}
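The two-step lookup behind an application-side join can be sketched with in-memory dictionaries standing in for the `author` and `post` types. The helper names are hypothetical, and a real application would issue the two search requests instead of filtering dictionaries.

```python
# In-memory stand-ins for the documents indexed in the slides.
AUTHORS = {1: {"name": "Philipp", "bio": "..."}}
POSTS = {
    1: {"author_id": 1, "title": "...", "body": "..."},
    2: {"author_id": 1, "title": "...", "body": "..."},
}


def find_author_ids(name):
    """Query 1: find the ids of all authors with the given name."""
    return [author_id for author_id, author in AUTHORS.items() if author["name"] == name]


def find_post_ids(author_ids):
    """Query 2: find all posts whose author_id is one of the ids from query 1."""
    wanted = set(author_ids)
    return [post_id for post_id, post in POSTS.items() if post["author_id"] in wanted]


author_ids = find_author_ids("Philipp")
post_ids = find_post_ids(author_ids)
print(author_ids)  # [1]
print(post_ids)    # [1, 2]
```

The join happens in the application: the second query depends on the result of the first, which costs an extra round trip but keeps the two document types independent.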

Data Denormalization

PUT /blog/post/1
{
  "author_name": "Philipp",
  "title": "...",
  "body": "..."
}
PUT /blog/post/2
{
  "author_name": "Philipp",
  "title": "...",
  "body": "..."
}

Data Denormalization

POST /blog/post/_search
{
  "query": {
    "match": { "author_name": "Philipp" }
  }
}

Nested Objects

PUT /blog/author/1
{
  "name": "Philipp",
  "bio": "...",
  "blog_posts": [
    { "title": "...", "body": "..." },
    { "title": "...", "body": "..." },
    { "title": "...", "body": "..." }
  ]
}

Nested Objects

POST /blog/author/_search
{
  "query": {
    "match": { "name": "Philipp" }
  }
}

Parent-Child Documents

PUT /blog
{
  "mappings": {
    "author": {},
    "post": {
      "_parent": { "type": "author" }
    }
  }
}

Parent-Child Documents

PUT /blog/author/1
{
  "name": "Philipp",
  "bio": "..."
}
PUT /blog/post/1?parent=1
{
  "title": "...",
  "body": "..."
}
PUT /blog/post/2?parent=1
{
  "title": "...",
  "body": "..."
}

Parent-Child Documents

POST /blog/post/_search
{
  "query": {
    "has_parent": {
      "type": "author",
      "query": {
        "match": { "name": "Philipp" }
      }
    }
  }
}

Conclusion

Many Queries

Search, boolean, aggregate, geo, ...

text vs keyword

Schema

Application, denormalization, nesting, parent-child

Getting Data from Your RDBMS

RDBMS trigger

Logstash

Forked writes from your application

Can I use it as a primary datastore?

It depends

https://www.elastic.co/guide/en/elasticsearch/resiliency/current/index.html

How fast is it?

Datastore, logs, metrics, analytics,...

More

https://www.elastic.co/training

Amsterdam: Nov 21-24

Thanks!

Questions?

Philipp Krenn @xeraa

PS: Stickers
