How to Build the Best Data Matching Product

RESULT MATCHING

or top solutions for advanced data matching products: from recruitment portals to Netflix

// 2

WHAT WE’RE LOOKING AT...

1. INTRODUCTION
2. ES vs SQL
3. BASIC FILTER
4. ADVANCED SEARCH
5. KEYWORD SEARCH
6. SCORING
7. WEIGHT MANIPULATION/SCRIPTING
8. FULL, PARTIAL, NO MATCH
9. MATCHING
10. AGGREGATION
11. TO SUM UP...

ELASTICSEARCH vs SQL

SQL:
- Relational data
- Simple search engines
- Storing data
- Popular

ELASTICSEARCH:
- Designed for complex search engines
- Advanced text search options

BASIC FILTERS

OR

AND

ONE OF MANY
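In Elasticsearch these three basic filters map onto a `bool` query: clauses under `filter` act like AND, `should` clauses with `minimum_should_match` act like OR, and a `terms` query matches a field against one of many values. Below is a minimal sketch that builds such a request body as a Python dict (`json.dumps` turns it into the JSON you would send to `_search`). The fields `experience_years`, `skills`, `city` and `remote` are hypothetical and assumed to be mapped as keyword/numeric/boolean fields.

```python
import json

def basic_filter_query(city, skills, min_years):
    """Combine AND, OR and "one of many" filters in a single bool query.

    The fields experience_years, skills, city and remote are hypothetical.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # AND: every clause in "filter" must match (filters don't affect _score)
                    {"range": {"experience_years": {"gte": min_years}}},
                    # ONE OF MANY: "skills" must contain at least one of the listed values
                    {"terms": {"skills": skills}},
                ],
                # OR: at least one of the "should" clauses must match
                "should": [
                    {"term": {"city": city}},
                    {"term": {"remote": True}},
                ],
                "minimum_should_match": 1,
            }
        }
    }

body = basic_filter_query("Poznan", ["python", "elasticsearch"], 3)
print(json.dumps(body, indent=2))
```

Putting the hard requirements under `filter` keeps them out of scoring, so they only narrow the result set.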

ADVANCED FILTERS

GEO FILTERS, FOR EXAMPLE BY:
- geo points (“lat” and “lon” coordinates)
- geo shapes
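A sketch of both kinds of geo filter as one request body: a `geo_distance` filter around a lat/lon point and a `geo_shape` filter against a polygon. It assumes a hypothetical `location` field mapped as `geo_point` and a hypothetical `area` field mapped as `geo_shape`; the coordinates are arbitrary example values.

```python
import json

def geo_filter_query(lat, lon, radius_km, ring):
    """Filter by distance from a point and by a polygon shape.

    Assumes a hypothetical "location" field mapped as geo_point and a
    hypothetical "area" field mapped as geo_shape.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # Geo point: keep documents within radius_km of (lat, lon)
                    {
                        "geo_distance": {
                            "distance": f"{radius_km}km",
                            "location": {"lat": lat, "lon": lon},
                        }
                    },
                    # Geo shape: keep documents whose area intersects the polygon
                    # (intersects is the default spatial relation)
                    {
                        "geo_shape": {
                            "area": {
                                "shape": {"type": "polygon", "coordinates": [ring]}
                            }
                        }
                    },
                ]
            }
        }
    }

# GeoJSON order is [lon, lat], and the ring must be closed (first point == last)
ring = [[16.8, 52.3], [17.1, 52.3], [17.1, 52.5], [16.8, 52.5], [16.8, 52.3]]
body = geo_filter_query(52.41, 16.93, 10, ring)
print(json.dumps(body, indent=2))
```

Watch the coordinate order: the point uses named `lat`/`lon` keys, while GeoJSON shapes list longitude first.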

KEYWORD SEARCH - EXAMPLE

Simple and advanced search by a string, or by part of a string, is very powerful!

TYPES OF SEARCHES

BEST_FIELDS (default) - Finds documents which match any field, but uses the _score from the best field.

MOST_FIELDS - Finds documents which match any field and combines the _score from each field.

CROSS_FIELDS - Treats fields with the same analyzer as though they were one big field. Looks for each word in any field.

PHRASE - Runs a match_phrase query on each field and uses the _score from the best field.

PHRASE_PREFIX - Runs a match_phrase_prefix query on each field and uses the _score from the best field.

{
  "query": {
    "multi_match": {
      "query": "this is a test",
      "fields": ["subject", "message"]
    }
  }
}
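To switch between the search types listed above, set the `type` parameter of `multi_match`. A small helper sketch, restricted to the five types on this slide (the helper name and validation are our own, not part of any Elasticsearch client):

```python
import json

MULTI_MATCH_TYPES = {"best_fields", "most_fields", "cross_fields", "phrase", "phrase_prefix"}

def keyword_search(text, fields, match_type="best_fields"):
    """Build a multi_match query body using one of the types listed above."""
    if match_type not in MULTI_MATCH_TYPES:
        raise ValueError(f"unknown multi_match type: {match_type}")
    return {
        "query": {
            "multi_match": {
                "query": text,
                "fields": fields,
                "type": match_type,  # best_fields is Elasticsearch's default
            }
        }
    }

# phrase_prefix suits search-as-you-type: "this is a te" can match "this is a test"
body = keyword_search("this is a te", ["subject", "message"], "phrase_prefix")
print(json.dumps(body, indent=2))
```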

SCORING

Each result (each matching document) in Elasticsearch has a relevance score, _score.

By default, results are sorted by score, highest first. But how can we predict which results will score highest and end up at the top?

Score relevance comes to the rescue!
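Two practical tools before the theory: hits come back sorted by `_score` in descending order, and adding `"explain": true` to a search request makes Elasticsearch attach an `_explanation` to each hit showing how its score was computed. A minimal sketch; the hits and their scores are made up for illustration.

```python
import json

def explained_search(text, fields):
    """Search request body that asks Elasticsearch to explain every hit's _score."""
    return {
        "explain": True,  # each hit then carries an "_explanation" of its score
        "query": {"multi_match": {"query": text, "fields": fields}},
    }

def rank(hits):
    """Default ordering of search results: highest _score first."""
    return sorted(hits, key=lambda hit: hit["_score"], reverse=True)

print(json.dumps(explained_search("this is a test", ["subject", "message"]), indent=2))

# Hypothetical hits with made-up scores, for illustration only
hits = [{"_id": "1", "_score": 0.4}, {"_id": "2", "_score": 1.3}, {"_id": "3", "_score": 0.9}]
print([hit["_id"] for hit in rank(hits)])  # ['2', '3', '1']
```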

Score relevance is based on:

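❏ Term frequency - How often does the term appear in this document? The more often, the higher the weight.

❏ Inverse document frequency - How often does the term appear in all documents in the collection? The more often, the lower the weight.

❏ Field-length norm - How long is the field? The shorter the field, the higher the weight.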
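To make the three factors concrete, here is a sketch of the classic Lucene TF/IDF formulas as described in the Elasticsearch guide: tf = √frequency, idf = 1 + ln(numDocs / (docFreq + 1)) and norm = 1 / √numTerms. Since Elasticsearch 5.0 the default similarity is BM25, which combines the same three ingredients differently, so treat these numbers as illustrative. Whitespace splitting stands in for a real analyzer.

```python
import math

def tf(term, tokens):
    """Term frequency: square root of how often the term appears in the field."""
    return math.sqrt(tokens.count(term))

def idf(term, corpus):
    """Inverse document frequency: terms found in fewer documents weigh more."""
    doc_freq = sum(1 for tokens in corpus if term in tokens)
    return 1 + math.log(len(corpus) / (doc_freq + 1))

def field_length_norm(tokens):
    """Field-length norm: a match in a short field counts more than in a long one."""
    return 1 / math.sqrt(len(tokens))

def term_weight(term, tokens, corpus):
    return tf(term, tokens) * idf(term, corpus) * field_length_norm(tokens)

corpus = [
    "i am happy in summer".split(),
    "after christmas i'm a hippopotamus".split(),
    "the happy hippopotamus helped harry".split(),
]
# "harry" occurs in one document, "happy" in two, so "harry" gets the higher idf
for term in ("happy", "harry"):
    print(term, [round(term_weight(term, doc, corpus), 3) for doc in corpus])
```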

WEIGHT MANIPULATION

We can influence scoring by:
- Using the provided built-in options
- Setting weights on search criteria
- Scripting
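A sketch of all three levers in one request body, assuming Elasticsearch 7 or later for the `script_score` query: the `^3` suffix weights the `title` field three times higher, `boost` raises the importance of a single clause, and a Painless script multiplies the relevance score by a popularity factor. The fields `title`, `description`, `featured` and `popularity` are hypothetical, and the script assumes every document has a numeric `popularity` value.

```python
import json

def weighted_search(text):
    """Influence _score with field weights, a clause boost and a script.

    Field names are hypothetical; "popularity" must exist on every document.
    """
    return {
        "query": {
            "script_score": {
                "query": {
                    "bool": {
                        "should": [
                            # Weights on criteria: a match in "title" counts 3x
                            {"multi_match": {"query": text, "fields": ["title^3", "description"]}},
                            # Built-in option: boost one clause relative to the others
                            {"term": {"featured": {"value": True, "boost": 2.0}}},
                        ]
                    }
                },
                # Scripting: scale the relevance score by log(2 + popularity), always > 0
                "script": {"source": "_score * Math.log(2 + doc['popularity'].value)"},
            }
        }
    }

print(json.dumps(weighted_search("python developer"), indent=2))
```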

In-depth view - read more on our blog:

http://espeo.eu/blog/search-tools-elasticsearch/

http://espeo.eu/blog/elasticsearch-advanced-search/

WEIGHT MANIPULATION/SCRIPTING… vector space model?

Vector Space Model

https://nlp.stanford.edu/IR-book/html/htmledition/scoring-term-weighting-and-the-vector-space-model-1.html

The first and easiest way to manipulate weights is with a vector space model. The image on the previous slide was a joke: it is what you get when you type “vector space model” into Google and take the results too literally.

By the way, ES sometimes behaves the same way: you have to know what you are doing, otherwise you can end up with a collection of planets instead of an array of numbers and have no idea why ;)

WEIGHT MANIPULATION/SCRIPTING

Vector Space Model

Now let’s switch to a real vector space model. A vector is nothing more than a one-dimensional array of numbers, where each number is the weight of one term, i.e. how important that term is in the current search. For example:

[1,2,5,22,3,8]

MATCHING

FULL

PARTIAL

NO MATCH
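One simple way to read full, partial and no match: compare the set of query terms with the terms a document contains. This is a plain-Python illustration, not an Elasticsearch API, and the naive lower-case-and-split tokenizer is an assumption standing in for a real analyzer. In Elasticsearch terms, a full match roughly corresponds to a `match` query with `"operator": "and"`, and a partial match to the default `"operator": "or"`.

```python
import string

def tokenize(text):
    """Naive analyzer stand-in: lower-case, strip ASCII punctuation, split on spaces."""
    return set(text.lower().translate(str.maketrans("", "", string.punctuation)).split())

def classify(query, document):
    """Label a document as FULL, PARTIAL or NO MATCH for the given query."""
    query_terms = tokenize(query)
    found = query_terms & tokenize(document)
    if found == query_terms:
        return "FULL"
    if found:
        return "PARTIAL"
    return "NO MATCH"

for doc in ["The happy hippopotamus helped Harry.", "I am happy in summer.", "Harry went home."]:
    print(classify("happy hippopotamus", doc), "|", doc)
```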

MATCHING

Figure: a graph with two axes, happy and hippopotamus, one per query term.

Now, imagine we have three documents:

1. I am happy in summer.

2. After Christmas I’m a hippopotamus.

3. The happy hippopotamus helped Harry.

We can create a similar vector for each document, consisting of the weight of each query term—happy and hippopotamus—that appears in the document, and plot these vectors on the same graph.
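A sketch of building those vectors. The term weights below (happy = 2, hippopotamus = 5, reflecting that hippopotamus is the rarer, more informative term) are assumed values for illustration; in practice they come from the scoring formula (TF/IDF or BM25), not from a hard-coded table.

```python
QUERY_TERMS = ["happy", "hippopotamus"]
# Assumed weights: the rarer term "hippopotamus" gets the higher weight
WEIGHTS = {"happy": 2, "hippopotamus": 5}

def to_vector(text):
    """One dimension per query term: its weight if the text contains it, else 0."""
    words = set(text.lower().replace(".", "").split())
    return [WEIGHTS[term] if term in words else 0 for term in QUERY_TERMS]

query_vector = [WEIGHTS[term] for term in QUERY_TERMS]  # [2, 5]
documents = [
    "I am happy in summer.",
    "After Christmas I'm a hippopotamus.",
    "The happy hippopotamus helped Harry.",
]
for doc in documents:
    print(to_vector(doc), doc)
```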

The nice thing about vectors is that they can be compared.

By measuring the angle between the query vector and the document vector, it is possible to assign a relevance score to each document.

The angle between document 1 and the query is large, so it is of low relevance. Document 2 is closer to the query, meaning that it is reasonably relevant, and document 3 is a perfect match.
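The angle comparison is usually done with cosine similarity: the dot product of two vectors divided by the product of their lengths, which is 1.0 when they point the same way. A sketch using the same assumed weights (happy = 2, hippopotamus = 5); Elasticsearch's real scoring adds further factors on top of this idea, so this only illustrates the geometry.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: 1.0 means the same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# Vectors over (happy, hippopotamus) with assumed weights happy=2, hippopotamus=5
query = [2, 5]
docs = {"doc 1": [2, 0], "doc 2": [0, 5], "doc 3": [2, 5]}

for name, vector in docs.items():
    similarity = cosine_similarity(query, vector)
    # min() guards acos against floating-point results a hair above 1.0
    angle = math.degrees(math.acos(min(1.0, similarity)))
    print(f"{name}: cosine={similarity:.3f}, angle={angle:.1f} degrees")
```

Document 1 ends up with the widest angle and document 3 with an angle of zero, matching the ranking described above.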

AGGREGATION

Aggregations help build complex summaries & analytics.

Elasticsearch is not only for searching data; it is also a handy way to prepare summaries. The best part is that ES can do both at once, in a single request.

Searches solve the problem of finding the best matching documents, but you may have one more crucial question: “What do these documents tell me about my business?” That’s where aggregations come in.

The two most frequently used kinds of aggregation are buckets and metrics. Buckets group documents by some criterion (for example, one bucket per city); metrics compute statistics such as an average or a sum over the documents in each bucket.
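A sketch contrasting the two kinds. `AGG_REQUEST` is an Elasticsearch request body with a `terms` bucket aggregation and a nested `avg` metric; the plain-Python function below it emulates what that request computes on a few in-memory records, so the result shape can be inspected without a cluster. The fields `city` and `salary` and the records are hypothetical.

```python
from collections import defaultdict

# Request body: bucket documents by city, then compute a metric inside each bucket
AGG_REQUEST = {
    "size": 0,  # return only aggregation results, no hits
    "aggs": {
        "by_city": {
            "terms": {"field": "city"},  # bucket aggregation
            "aggs": {"avg_salary": {"avg": {"field": "salary"}}},  # metric aggregation
        }
    },
}

def emulate_terms_with_avg(records, bucket_field, metric_field):
    """Group records into buckets and average a numeric field in each one."""
    buckets = defaultdict(list)
    for record in records:
        buckets[record[bucket_field]].append(record[metric_field])
    # Order buckets by document count, largest first; ties broken by key
    return [
        {"key": key, "doc_count": len(values), "avg": sum(values) / len(values)}
        for key, values in sorted(buckets.items(), key=lambda item: (-len(item[1]), item[0]))
    ]

records = [
    {"city": "Poznan", "salary": 9000},
    {"city": "Poznan", "salary": 11000},
    {"city": "Berlin", "salary": 12000},
]
print(emulate_terms_with_avg(records, "city", "salary"))
```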

BUCKET AGGREGATION EXAMPLE

TO SUM UP - who uses advanced search technology such as Elasticsearch?

GITHUB

BLIZZARD

NETFLIX

...and so do we, in some of our projects. See more details:

http://espeo.eu/cases/recruitment-process-management-system/

http://espeo.eu/cases/platform-creating-cvs-complex-candidate-search-options-user-management-tools/

Want to see how this can work with your product? Email us!

[email protected]
