espeo-software
RESULT MATCHING
or top solutions for advanced data matching products: from recruitment portals to Netflix
// 2
WHAT WE’RE LOOKING AT...
1. INTRODUCTION
2. ES vs SQL
3. BASIC FILTER
4. ADVANCED SEARCH
5. KEYWORD SEARCH
6. SCORING
7. WEIGHT MANIPULATION/SCRIPTING
8. FULL PARTIAL NO MATCH
9. MATCHING
10. AGGREGATION
11. TO SUM UP...
// 3
ELASTICSEARCH vs SQL
SQL:
- Relational data
- Simple search engines
- Store data
- Popular
ELASTICSEARCH:
- Designed for complicated search engines
- Advanced text search options
// 4
BASIC FILTERS
OR
AND
ONE OF MANY
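The three filter kinds on this slide can be sketched as a single Elasticsearch bool query, built here as a plain Python dict. The field names (city, level, skills) and the values are hypothetical and only serve the illustration; this is one possible mapping of AND / OR / ONE OF MANY onto the query DSL, not the only one.

```python
import json

def build_basic_filter(city, skills, levels):
    """Combine AND, OR and ONE OF MANY conditions in one bool query."""
    return {
        "query": {
            "bool": {
                # AND: every clause listed under "filter" has to match
                "filter": [
                    {"term": {"city": city}},
                    # ONE OF MANY: "terms" matches when the field holds any listed value
                    {"terms": {"level": levels}},
                ],
                # OR: at least one of the "should" clauses has to match
                "should": [{"term": {"skills": skill}} for skill in skills],
                "minimum_should_match": 1,
            }
        }
    }

# Hypothetical recruitment-portal search
basic_query = build_basic_filter("poznan", ["python", "java"], ["junior", "mid"])
print(json.dumps(basic_query, indent=2))
```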
// 5
ADVANCED FILTERS
GEO POINTS LIKE:
“LAT”
“LONG”
“SHAPES”
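A geo point filter could look like the sketch below, which builds a geo_distance query as a Python dict. It assumes an index with a hypothetical field called location mapped as geo_point; the coordinates and radius are made up. Note that the query DSL spells the longitude key "lon".

```python
import json

def build_geo_filter(lat, lon, distance="10km", field="location"):
    """Keep only documents whose geo point lies within `distance` of (lat, lon)."""
    return {
        "query": {
            "bool": {
                "filter": {
                    "geo_distance": {
                        "distance": distance,
                        # hypothetical geo_point field holding each document's position
                        field: {"lat": lat, "lon": lon},
                    }
                }
            }
        }
    }

geo_query = build_geo_filter(52.41, 16.93, distance="25km")
print(json.dumps(geo_query, indent=2))
```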
// 6
KEYWORD SEARCH - EXAMPLE
A simple or advanced search by a string, or part of a string, is very powerful!
TYPES OF SEARCHES
BEST_FIELDS - Finds documents which match any field, but uses the _score from the best field.
MOST_FIELDS - Finds documents which match any field and combines the _score from each field.
CROSS_FIELDS - Treats fields with the same analyzer as though they were one big field. Looks for each word in any field.
PHRASE - Runs a match_phrase query on each field and combines the _score from each field.
PHRASE_PREFIX - Runs a match_phrase_prefix query on each field and combines the _score from each field.
EXAMPLE
{
  "multi_match": {
    "query": "this is a test",
    "fields": ["subject", "message"]
  }
}
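To switch between the search types listed above, the multi_match query accepts a "type" parameter. A small sketch, using the same subject and message fields as the example; the helper name multi_match is our own and not part of any client library.

```python
import json

# The five types described on this slide, in the lowercase form the query DSL expects
MULTI_MATCH_TYPES = ("best_fields", "most_fields", "cross_fields", "phrase", "phrase_prefix")

def multi_match(text, fields, match_type="best_fields"):
    """Build a multi_match query body for one of the types listed above."""
    if match_type not in MULTI_MATCH_TYPES:
        raise ValueError(f"unsupported multi_match type: {match_type}")
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": list(fields),
                "type": match_type,
            }
        }
    }

phrase_query = multi_match("this is a test", ["subject", "message"], "phrase")
print(json.dumps(phrase_query, indent=2))
```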
// 8
SCORING
Each result (each record) in Elasticsearch has a score.
Results are ordered by score, but how can we predict which results will score highest and land at the top of the list?
Relevance scoring comes to the rescue!
// 9
Score relevance is based on:
❏ Term frequency - How often does the term appear in this document?
❏ Inverse document frequency - How often does the term appear in all documents in the collection?
❏ Field-length norm - How long is the field?
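The three factors above can be sketched in a few lines of Python. The formulas follow the classic Lucene TF/IDF similarity as we understand it; newer Elasticsearch versions default to BM25, so treat this as an illustration of the idea rather than the exact score your cluster computes. The sample numbers are invented.

```python
import math

def term_frequency(freq):
    """The more often the term appears in the field, the higher the weight."""
    return math.sqrt(freq)

def inverse_document_frequency(num_docs, doc_freq):
    """The more documents contain the term, the lower the weight."""
    return 1 + math.log(num_docs / (doc_freq + 1))

def field_length_norm(num_terms):
    """The shorter the field, the higher the weight."""
    return 1 / math.sqrt(num_terms)

# Invented example: a term found 4 times in a 16-term field,
# present in 9 of the 1000 documents in the index
tf = term_frequency(4)
idf = inverse_document_frequency(1000, 9)
norm = field_length_norm(16)
print(tf, idf, norm, tf * idf * norm)
```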
// 10
WEIGHT MANIPULATION
We can impact scoring by:
- Using the built-in options
- Setting weights on search criteria
- Scripting
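A hedged sketch of the first two options: a per-field boost (the ^3 suffix) and a function_score wrapper that scales the text score by a numeric field. The likes field is hypothetical, and the boost value 3 is arbitrary. For the scripting option, a script_score function could be used in place of field_value_factor; it is left out here to keep the sketch short.

```python
import json

def boosted_search(text):
    """Weight 'subject' three times as much as 'message', then scale by popularity."""
    return {
        "query": {
            "function_score": {
                "query": {
                    "multi_match": {
                        "query": text,
                        # "^3" multiplies the weight of matches in the subject field
                        "fields": ["subject^3", "message"],
                    }
                },
                # built-in option: use a numeric field (hypothetical 'likes') as a factor
                "field_value_factor": {
                    "field": "likes",
                    "modifier": "log1p",
                    "missing": 1,
                },
                # multiply the text score by the factor computed above
                "boost_mode": "multiply",
            }
        }
    }

weighted_query = boosted_search("this is a test")
print(json.dumps(weighted_query, indent=2))
```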
// 11
// 12
WEIGHT MANIPULATION/SCRIPTING… vector space model?
Vector Space Model
// 13
The first and easiest way to manipulate weights is by using a vector space model. What you saw on the previous slide is a kind of joke: it is what you get when you type “vector space model” into Google and take the results too literally.
By the way, ES sometimes works the same way. You must know what you are doing, otherwise you can end up with a collection of planets instead of an array of numbers and have no idea why ;)
// 14
WEIGHT MANIPULATION/SCRIPTING
Vector Space Model
Let’s switch to a real vector space model. It’s nothing more than a simple vector of numbers, each one expressing how important a term is in the current search.
[1,2,5,22,3,8]
// 15
MATCHING
FULL
PARTIAL
NO MATCH
// 16
[Figure: plot with the axes “happy” and “hippopotamus”]
MATCHING
// 17
Now, imagine we have three documents:
1. I am happy in summer.
2. After Christmas I’m a hippopotamus.
3. The happy hippopotamus helped Harry.
We can create a similar vector for each document, consisting of the weight of each query term—happy and hippopotamus—that appears in the document, and plot these vectors on the same graph.
// 18
The nice thing about vectors is that they can be compared.
By measuring the angle between the query vector and the document vector, it is possible to assign a relevance score to each document.
The angle between document 1 and the query is large, so it is of low relevance. Document 2 is closer to the query, meaning that it is reasonably relevant, and document 3 is a perfect match.
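The comparison can be sketched with cosine similarity, which is 1.0 when two vectors point in the same direction and drops toward 0 as the angle grows. The weights are assumptions made for this illustration (happy = 2, hippopotamus = 5); the slides do not give actual values.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equally long weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a vector without any query term cannot match
    return dot / (norm_a * norm_b)

# Vector layout: [weight of "happy", weight of "hippopotamus"] (assumed weights)
query_vector = [2, 5]
document_vectors = {
    1: [2, 0],  # "I am happy in summer."
    2: [0, 5],  # "After Christmas I'm a hippopotamus."
    3: [2, 5],  # "The happy hippopotamus helped Harry."
}

scores = {doc: cosine_similarity(query_vector, vec) for doc, vec in document_vectors.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)  # [3, 2, 1]
```

With these weights document 3 scores highest, document 2 comes second and document 1 last, matching the description above.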
// 19
AGGREGATION
// 20
Aggregations help build complex summaries & analytics.
Elasticsearch is not only for searching data; it is also a handy way to prepare summaries. The best thing about ES is that it can handle both at once.
Searches solve the problem of finding the best matching documents, but you may have one more crucial question: “What do these documents tell me about my business?” And that’s where aggregations come in.
The two most frequently used kinds of aggregation are buckets and metrics.
// 21
BUCKET AGGREGATION EXAMPLE
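A minimal sketch of a bucket aggregation with a metric nested inside it, built as a Python dict. The index fields (city, salary) are hypothetical: documents are grouped into one bucket per city, and the average salary is computed for each bucket.

```python
import json

def salary_by_city(top_n=5):
    """Bucket documents by city and compute a metric (average salary) per bucket."""
    return {
        "size": 0,  # skip the hits, return only the aggregation results
        "aggs": {
            "by_city": {
                # bucket aggregation: one bucket per distinct value of 'city'
                "terms": {"field": "city", "size": top_n},
                "aggs": {
                    # metric aggregation, calculated inside every bucket
                    "avg_salary": {"avg": {"field": "salary"}}
                },
            }
        },
    }

aggregation_query = salary_by_city(top_n=3)
print(json.dumps(aggregation_query, indent=2))
```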
// 22
TO SUM UP - who uses advanced search technology such as Elasticsearch?
GITHUB
BLIZZARD
NETFLIX
...and we do! In some of our projects. See more for details
Want to see how this can work with your product? Email us!