antoni-orfin
USE CASES
1. Intelligent search engines
...learning from users' behaviour
"Search for cats that I would love from a database of 3M photos"
...forgiving spelling mistakes
"Search for Mihael Jakson photos and show Michael Jackson photos"
OLD SCHOOL Searching in MySQL
    SELECT * FROM photos WHERE title LIKE '%cat%';
    SELECT * FROM photos WHERE title LIKE '%cats%';

Id [PK]     title
1           Cute cat and dog
2           Cat plays with a dog
3           Cats playing piano
...         ...
3 000 000   Hidden cat
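A minimal sketch of why LIKE searching falls short. It uses Python's built-in sqlite3 as a stand-in for MySQL (an assumption, so the example runs without a server). Rows 1-3 are the slide's titles; row 4 is a hypothetical title added to show a false positive.

```python
import sqlite3

# Rows 1-3 come from the slide; row 4 is a hypothetical extra title
# added only to show a substring false positive.
PHOTOS = [
    (1, "Cute cat and dog"),
    (2, "Cat plays with a dog"),
    (3, "Cats playing piano"),
    (4, "Vacation at the lake"),
]


def build_demo_db():
    """Create an in-memory photos table (SQLite stands in for MySQL)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE photos (id INTEGER PRIMARY KEY, title TEXT)")
    conn.executemany("INSERT INTO photos (id, title) VALUES (?, ?)", PHOTOS)
    return conn


def like_search(conn, pattern):
    """Return the titles matching a LIKE pattern, ordered by id."""
    rows = conn.execute(
        "SELECT title FROM photos WHERE title LIKE ? ORDER BY id", (pattern,)
    )
    return [title for (title,) in rows]


conn = build_demo_db()
print(like_search(conn, "%cat%"))   # substring match: also hits "Vacation..."
print(like_search(conn, "%cats%"))  # plural pattern misses the "cat" titles
```

The pattern has to be compared against every row, "%cat%" matches "Vacation", and "%cats%" finds only one of the three cat photos.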
SEARCH THEORY Building Inverted Index
Cute cat and dog #1
Cat plays with a dog #2
Cats playing piano #3

Term [PK]   Id
cute        1
cat         1, 2, 3
dog         1, 2
play        2, 3
...         ...
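The index above can be sketched in a few lines of Python. This is an illustration only: the stopword list and the toy stemmer are assumptions chosen so the three slide titles produce the terms shown, not what a real search engine uses.

```python
import re
from collections import defaultdict

# Assumed stopword list, just large enough for the three example titles.
STOPWORDS = {"a", "and", "with"}


def stem(token):
    """Toy stemmer: strip a trailing "ing" or "s" (not a real algorithm)."""
    for suffix in ("ing", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Lowercase, split into words, drop stopwords, stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(t) for t in tokens if t not in STOPWORDS]


def build_inverted_index(documents):
    """Map each term to the sorted list of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


documents = {
    1: "Cute cat and dog",
    2: "Cat plays with a dog",
    3: "Cats playing piano",
}
index = build_inverted_index(documents)
print(index["cat"])   # ids of every document mentioning a cat
print(index["play"])  # "plays" and "playing" share one term
```

Searching for "cat" is now a single lookup by term instead of a scan over all titles.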
SEARCH THEORY Text Analysis
Puppy and kitten with guinea pig
1. Tokenization
[Puppy] [and] [kitten] [with] [guinea] [pig]
2. Filtering tokens
[dog] [cat] [guinea] [pig]
Two separate tokens? :(
ASCII Folding - róża -> roza
Lowercase - Cat -> cat
Synonyms - kitten -> cat, puppy -> dog
Stopwords - common words to remove: and, what, with, or
Stemming - reducing inflected words to their base form: cats -> cat; fishing, fisher, fished -> fish
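The analysis steps above can be chained into one small pipeline. The sketch below is a simplification: the filter order and the suffix-stripping stemmer are assumptions, while the stopwords and synonyms are the ones listed on the slide.

```python
import re
import unicodedata

STOPWORDS = {"and", "what", "with", "or"}       # from the slide
SYNONYMS = {"kitten": "cat", "puppy": "dog"}    # from the slide


def ascii_fold(token):
    """Drop diacritics by decomposing characters and removing combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def stem(token):
    """Toy stemmer (an assumption): strip one common English suffix."""
    for suffix in ("ing", "ed", "er", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze(text):
    """Tokenize, then lowercase, fold, remove stopwords, stem, map synonyms."""
    tokens = re.findall(r"\w+", text)
    result = []
    for token in tokens:
        token = ascii_fold(token.lower())
        if token in STOPWORDS:
            continue
        token = stem(token)
        result.append(SYNONYMS.get(token, token))
    return result


print(analyze("Puppy and kitten with guinea pig"))
print(analyze("róża"))
print(analyze("cats fishing fisher fished"))
```

Note that "guinea pig" still comes out as two separate tokens, which is the problem the slide points at.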
SEARCH THEORY Text Analysis
Lekarz Chorób Wewnętrznych (Polish: "doctor of internal diseases")
stemming -> Lekarz Choroba Wewnętrzny
asciifolding, lowercase -> lekarz choroba wewnetrzny
synonyms -> internista (Polish: "internist")
SOLUTION
Elasticsearch is a flexible and powerful open-source, distributed, real-time search and analytics engine.
ELASTICSEARCH Architecture
Node 1: Shard 1, Shard 2, Replica 3, Replica 4
Node 2: Shard 3, Shard 4, Replica 1, Replica 2
4 shards, 1 replica of each (every replica sits on a different node than its primary shard)
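The "4 shards, 1 replica" layout is set per index. A minimal sketch of the settings body that would be sent when creating the index (for example with PUT /pixers, using the index name from the later slides); the helper names are hypothetical, while `number_of_shards` and `number_of_replicas` are the Elasticsearch setting names.

```python
import json


def index_settings(number_of_shards, number_of_replicas):
    """Build a create-index body with the given shard layout."""
    return {
        "settings": {
            "number_of_shards": number_of_shards,
            "number_of_replicas": number_of_replicas,
        }
    }


def total_shard_copies(body):
    """Primaries plus all their replicas, i.e. shards spread over the cluster."""
    settings = body["settings"]
    return settings["number_of_shards"] * (1 + settings["number_of_replicas"])


body = index_settings(number_of_shards=4, number_of_replicas=1)
print(json.dumps(body, indent=2))
print(total_shard_copies(body))  # 4 primaries + 4 replicas, as in the diagram
```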
ELASTICSEARCH Nomenclature
Elasticsearch   MySQL
Node            Instance
Index           Database
Type            Table
Document        Row
Attribute       Column
ELASTICSEARCH Mapping
    PUT [localhost:9200]/pixers/photos/_mapping
    {
      "photos" : {
        "properties" : {
          "title" : {"type" : "string", "analyzer" : "pl"},
          "categories" : {"type" : "nested", ...}
        }
      }
    }
Types: string, float, double, byte, short, integer, long, date, nested, geo_point, geo_shape, ... etc.
ELASTICSEARCH REST API
localhost:9200/{index}/{type}/{document id}
    PUT [localhost:9200]/pixers/photos/1
    {
      "title" : "Cute cat and dog sitting on books",
      "keywords": ["cat", "dog"]
    }
    GET [localhost:9200]/pixers/photos/1
    DELETE [localhost:9200]/pixers/photos/1
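A sketch of how the same three calls could be prepared from Python with only the standard library. The helper names are hypothetical, and the requests are only built, not sent, since sending them (for example with urllib.request.urlopen) needs a running Elasticsearch node on localhost:9200.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:9200"


def document_url(index, doc_type, doc_id):
    """Follow the {index}/{type}/{document id} URL scheme from the slide."""
    return f"{BASE_URL}/{index}/{doc_type}/{doc_id}"


def build_request(method, url, body=None):
    """Prepare an HTTP request; a dict body is sent as JSON."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(url, data=data, headers=headers, method=method)


url = document_url("pixers", "photos", 1)
put_request = build_request(
    "PUT",
    url,
    {"title": "Cute cat and dog sitting on books", "keywords": ["cat", "dog"]},
)
get_request = build_request("GET", url)
delete_request = build_request("DELETE", url)

for request in (put_request, get_request, delete_request):
    print(request.get_method(), request.full_url)
```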
ELASTICSEARCH REST API
Searching
    GET /pixers/photos/_search
    {
      "query" : {
        "match" : { "title" : "cat" }
      }
    }
Real life query >>
ELASTICSEARCH Searching
Query vs Filter: queries (results are scored by relevance)
Query String - "likes:[10 TO *] AND title:(+cat -dog)"
Match - "funny cat"
Fuzzy - "funy cad"
More Like This
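Fuzzy matching is based on edit distance: a misspelled term still matches an indexed term if only a few character edits separate them. A minimal sketch using plain Levenshtein distance; the `fuzzy_match` helper and its `max_edits` parameter are hypothetical and do not reproduce how Elasticsearch implements or tunes fuzziness.

```python
def levenshtein(a, b):
    """Number of single-character inserts, deletes or substitutions from a to b."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute (or keep)
                )
            )
        previous = current
    return previous[-1]


def fuzzy_match(term, vocabulary, max_edits=1):
    """Return the indexed terms within max_edits of the (possibly misspelled) term."""
    return [word for word in vocabulary if levenshtein(term, word) <= max_edits]


vocabulary = ["funny", "cat", "dog"]
print(fuzzy_match("funy", vocabulary))
print(fuzzy_match("cad", vocabulary))
```

The same idea covers the "Mihael Jakson" use case: each misspelled word is one edit away from the correct one.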
ELASTICSEARCH Searching
Query vs Filter: filters (yes/no matching, no scoring)
Terms - [some, tags]
Range - likes > 10
Geo Distance - Lat=50; Lon=20; Distance=200m
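A sketch of a search body that combines a scored match query with the three filters above. It uses the bool query with a filter clause, which is the form used by newer Elasticsearch versions and may differ from the version the slides were written for. The field names `tags` and `location` are assumptions; `title` and `likes` appear in the slides.

```python
import json


def build_filtered_search(text, tags, min_likes, lat, lon, distance):
    """Match on title (scored), then narrow down with unscored filters."""
    return {
        "query": {
            "bool": {
                "must": {"match": {"title": text}},
                "filter": [
                    {"terms": {"tags": tags}},                # assumed field name
                    {"range": {"likes": {"gt": min_likes}}},  # likes > min_likes
                    {
                        "geo_distance": {
                            "distance": distance,
                            "location": {"lat": lat, "lon": lon},  # assumed field
                        }
                    },
                ],
            }
        }
    }


search_body = build_filtered_search(
    text="funny cat",
    tags=["some", "tags"],
    min_likes=10,
    lat=50,
    lon=20,
    distance="200m",
)
print(json.dumps(search_body, indent=2))
```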
ELASTICSEARCH Analytics
Aggregations
Get likes stats and a histogram of the created_at date, grouped by category.
terms: category
  - stats: likes
  - histogram: created_at
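The nesting described above could be written as the following request body. This is a sketch under assumptions: the aggregation names are made up, a date_histogram is used for the created_at field, and the monthly `calendar_interval` is an arbitrary choice (older Elasticsearch versions name this parameter `interval`).

```python
import json


def build_category_aggregations():
    """Group by category; per group compute likes stats and a date histogram."""
    return {
        "size": 0,  # return only aggregation results, no documents
        "aggs": {
            "by_category": {  # hypothetical aggregation name
                "terms": {"field": "category"},
                "aggs": {
                    "likes_stats": {"stats": {"field": "likes"}},
                    "created_histogram": {
                        "date_histogram": {
                            "field": "created_at",
                            "calendar_interval": "month",  # assumed bucket size
                        }
                    },
                },
            }
        },
    }


aggregation_body = build_category_aggregations()
print(json.dumps(aggregation_body, indent=2))
```

The body would be sent to the same _search endpoint as a regular query.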
Contact me at:
linkedin.com/in/antoniorfin twitter.com/antoniorfin
www.pixersize.com
Thank you! Questions & Answers