#LSNA17

Liferay Search: Best Practices to Dramatically Improve Relevance - Liferay Symposium North America 2017, Austin, USA

Page 3

SEO                       | Relevance
Pages                     | Liferay assets
Whole text is indexed     | Key/value docs are indexed
Opaque ranking criteria   | Scored queries, filters, field types
Reverse engineer          | Fine-tune
Third-party algorithms    | A search engine that you control

Page 4

GET /_search?explain
{ "query" : { "term" : { "tag" : "LSNA17" } } }

GET /index/type/0/_explain?q=user_id:2

"value" : 2.7051764,
"description" : "score(doc=0,freq=1.0), product of:",
"details" : [
  {
    "value" : 0.66422296,
    "description" : "queryWeight, product of:",
    "details" : [
      { "value" : 4.0726933, "description" : "idf(docFreq=4, maxDocs=108)" },
      { "value" : 0.16309182, "description" : "queryNorm" }
    ]
  },
  {
    "value" : 4.0726933,
    "description" : "fieldWeight in 0, product of:",
    "details" : [
      {
        "value" : 1.0,
        "description" : "tf(freq=1.0), with freq of:",
        "details" : [ { "value" : 1.0, "description" : "termFreq=1.0" } ]
      },
      { "value" : 4.0726933, "description" : "idf(docFreq=4, maxDocs=108)" },
      { "value" : 1.0, "description" : "fieldNorm(doc=0)"

"failure to match filter: cache(user_id:[2 TO 2])"
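The explain tree above is Lucene's classic TF/IDF similarity (the Elasticsearch default before 5.0 switched to BM25). The minimal sketch below recomputes its numbers, assuming the classic formulas idf = 1 + ln(maxDocs / (docFreq + 1)) and tf = √freq, and copying queryNorm and fieldNorm straight from the output rather than deriving them.

```java
/** Recomputes the classic Lucene TF/IDF explain tree shown above. */
class ClassicTfIdfExplain {

    // Classic Lucene idf: 1 + ln(maxDocs / (docFreq + 1))
    static double idf(long docFreq, long maxDocs) {
        return 1.0 + Math.log(maxDocs / (double) (docFreq + 1));
    }

    // Classic Lucene tf: square root of the raw term frequency
    static double tf(double freq) {
        return Math.sqrt(freq);
    }

    // score = queryWeight * fieldWeight
    //   queryWeight = idf * queryNorm
    //   fieldWeight = tf * idf * fieldNorm
    static double score(double freq, long docFreq, long maxDocs,
                        double queryNorm, double fieldNorm) {
        double idf = idf(docFreq, maxDocs);
        double queryWeight = idf * queryNorm;
        double fieldWeight = tf(freq) * idf * fieldNorm;
        return queryWeight * fieldWeight;
    }

    public static void main(String[] args) {
        // Inputs copied from the explain output: freq=1, docFreq=4, maxDocs=108,
        // queryNorm=0.16309182, fieldNorm=1.0
        System.out.printf("idf   = %.7f%n", idf(4, 108));
        System.out.printf("score = %.7f%n", score(1.0, 4, 108, 0.16309182, 1.0));
    }
}
```

Because the term occurs once and the field norm is 1, the score collapses to idf² × queryNorm, which is why a rare term (docFreq=4 out of 108) dominates the result.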

Page 5

query = apple eclipse zzz yyy xxx qqq kkk ttt rrr

 2.345  doc1: apple banana
 2.345  doc2: eclipse moon sun
16.415  doc3: zzz yyy xxx qqq kkk ttt rrr 111
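The numbers above suggest that every matching term contributes the same 2.345 and the per-term scores are simply added, so doc3, which matches seven nonsense terms, scores 7 × 2.345 = 16.415 and outranks the documents matching the meaningful words. A toy sketch under exactly that assumption (a fixed per-term weight; real scores also vary with tf, idf and field length):

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Adds an assumed constant weight for every query term a document contains. */
class SumOfMatchesDemo {

    static final double TERM_WEIGHT = 2.345; // assumption: every matching term scores the same

    static double score(String query, String doc) {
        Set<String> docTerms = new HashSet<>(Arrays.asList(doc.split("\\s+")));
        double total = 0.0;
        for (String term : query.split("\\s+")) {
            if (docTerms.contains(term)) {
                total += TERM_WEIGHT; // one optional clause matched: add its contribution
            }
        }
        return total;
    }

    public static void main(String[] args) {
        String query = "apple eclipse zzz yyy xxx qqq kkk ttt rrr";
        List<String> docs = List.of(
                "apple banana",
                "eclipse moon sun",
                "zzz yyy xxx qqq kkk ttt rrr 111");
        for (String doc : docs) {
            System.out.printf("%.3f  %s%n", score(query, doc), doc);
        }
    }
}
```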

Page 6

TF/IDF (Term Frequency/Inverse Document Frequency): each factor in question form, and when it increases the score.

• Term frequency: How often does the term appear in the field? Score increases (+) when the term pops up many times in the text.
• Inverse document frequency: How rare is the term in the whole index? Score increases (+) when the term is found in this document and not many others.
• Field-length norm: How short is the field where the term is? Score increases (+) when there isn't much else in the same field (like a title).
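To see each factor push the score in the direction listed above, here is a sketch using the classic Lucene formulas: tf = √freq, idf = 1 + ln(numDocs / (docFreq + 1)), and a length norm of 1 / √(number of terms in the field). Real Lucene stores the norm lossily in a single byte, which this sketch ignores, and the index sizes in main are made-up numbers.

```java
/** Classic Lucene TF/IDF factors, without Lucene's one-byte norm encoding. */
class TfIdfFactors {

    static double tf(int freq) {
        return Math.sqrt(freq); // more occurrences -> higher, with diminishing returns
    }

    static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log(numDocs / (double) (docFreq + 1)); // rarer term -> higher
    }

    static double lengthNorm(int fieldLengthInTerms) {
        return 1.0 / Math.sqrt(fieldLengthInTerms); // shorter field -> higher
    }

    static double fieldWeight(int freq, long docFreq, long numDocs, int fieldLength) {
        return tf(freq) * idf(docFreq, numDocs) * lengthNorm(fieldLength);
    }

    public static void main(String[] args) {
        // Hypothetical index: 1000 docs, 10 of which contain "pigeon"
        double inTitle = fieldWeight(1, 10, 1000, 2);     // a two-word title
        double inContent = fieldWeight(1, 10, 1000, 200); // a 200-word body, one mention
        double repeated = fieldWeight(4, 10, 1000, 200);  // same body, four mentions
        System.out.printf("title=%.3f content=%.3f repeated=%.3f%n",
                inTitle, inContent, repeated);
    }
}
```

Note the square root in tf: four mentions only double the weight, which limits how much keyword stuffing pays off.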

Page 7

{ "bool" : {
    "must" : { "bool" : { "should" : [
        { "match" : { "content_en_US" : { "query" : "pigeon", "type" : "boolean" } } },
        { "match" : { "content_en_US" : { "query" : "pigeon", "type" : "phrase_prefix" } } }
    ] } },
    "should" : { "match" : { "content_en_US" : { "query" : "pigeon", "type" : "phrase", "boost" : 2.0 } } }
} },
{ "bool" : {
    "must" : { "bool" : { "should" : [
        { "match" : { "description_en_US" : { "query" : "pigeon", "type" : "boolean" } } },
        { "match" : { "description_en_US" : { "query" : "pigeon", "type" : "phrase_prefix" } } }
    ] } },
    "should" : { "match" : { "description_en_US" : { "query" : "pigeon", "type" : "phrase", "boost" : 2.0 } } }
} },
{ "bool" : {
    "must" : { "bool" : { "should" : [
        { "match" : { "entryClassPK" : { "query" : "pigeon", "type" : "boolean" } } },
        { "match" : { "entryClassPK" : { "query" : "pigeon", "type" : "phrase_prefix" } } }
    ] } },
    "should" : { "match" : { "entryClassPK" : { "query" : "pigeon", "type" : "phrase", "boost" : 2.0 } } }
} },
{ "bool" : {
    "must" : { "bool" : { "should" : [
        { "match" : { "title_en_US" : { "query" : "pigeon", "type" : "boolean" } } },
        { "match" : { "title_en_US" : { "query" : "pigeon", "type" : "phrase_prefix" } } }
    ] } },
    "should" : { "match" : { "title_en_US" : { "query" : "pigeon", "type" : "phrase", "boost" : 2.0 } } }
} }
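Every field in the generated "pigeon" query gets the same three-part clause: a required match of type boolean or phrase_prefix, plus an optional phrase match boosted 2.0. The sketch below is plain string building written only to make that pattern visible; it is not Liferay's own query-building code.

```java
import java.util.List;
import java.util.StringJoiner;

/** Regenerates the per-field clause pattern of the generated "pigeon" query. */
class PerFieldClauses {

    static String match(String field, String keywords, String type, String boost) {
        String extra = boost == null ? "" : ", \"boost\" : " + boost;
        return "{ \"match\" : { \"" + field + "\" : { \"query\" : \"" + keywords
                + "\", \"type\" : \"" + type + "\"" + extra + " } } }";
    }

    // must: (boolean OR phrase_prefix); should: phrase with boost 2.0
    static String fieldClause(String field, String keywords) {
        return "{ \"bool\" : { \"must\" : { \"bool\" : { \"should\" : [ "
                + match(field, keywords, "boolean", null) + ", "
                + match(field, keywords, "phrase_prefix", null) + " ] } }, \"should\" : "
                + match(field, keywords, "phrase", "2.0") + " } }";
    }

    static String build(List<String> fields, String keywords) {
        StringJoiner clauses = new StringJoiner(",\n");
        for (String field : fields) {
            clauses.add(fieldClause(field, keywords));
        }
        return clauses.toString();
    }

    public static void main(String[] args) {
        System.out.println(build(
                List.of("content_en_US", "description_en_US", "entryClassPK", "title_en_US"),
                "pigeon"));
    }
}
```

Seeing the clauses generated side by side makes it obvious that entryClassPK, a numeric ID, receives the same full-text treatment as title_en_US.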

Page 8

● → FacetedSearcher →

● Indexer

● fields

● score

{ "match" : { "content_en_US" : { "query" : "pigeon", "type" : "phrase_prefix" } } }
{ "match" : { "description_en_US" : { "query" : "pigeon", "type" : "phrase", "boost" : 2.0 } } }
{ "match" : { "entryClassPK" : { "query" : "pigeon", "type" : "boolean" } } }

Page 9

Natural language? string:text
● TF/IDF
● case insensitive
Score!

IDs and serials? string:keyword
● not_analyzed
● case sensitive
● match | no match
No score!

Non-string data? integer, date, geo_point...
● match | no match
No score!

("No score" really means a constant score of 1.)
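A toy illustration of the difference between the two string flavors. The analyzer here (lowercase, then split on anything that is not a letter or digit) is an assumption standing in for Elasticsearch's standard analyzer, which does more:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

/** Toy comparison of text (analyzed) vs keyword (not_analyzed) matching. */
class TextVsKeyword {

    // Toy analyzer: lowercase, then split on anything that is not a letter or digit
    static List<String> analyze(String value) {
        return Arrays.stream(value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // text field: the query is analyzed too, and any shared token counts as a match
    static boolean textMatches(String fieldValue, String query) {
        List<String> fieldTokens = analyze(fieldValue);
        return analyze(query).stream().anyMatch(fieldTokens::contains);
    }

    // keyword field: the whole value is one exact, case-sensitive term
    static boolean keywordMatches(String fieldValue, String query) {
        return fieldValue.equals(query);
    }

    public static void main(String[] args) {
        String serial = "SN-2017-ABC";
        System.out.println(textMatches(serial, "abc"));            // true
        System.out.println(keywordMatches(serial, "abc"));         // false
        System.out.println(keywordMatches(serial, "SN-2017-ABC")); // true
    }
}
```

For a serial number the keyword behavior is usually what you want: "abc" matching "SN-2017-ABC" is noise, not relevance.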

Page 10

// IndexSettingsContributor

typeMappingsHelper.addTypeMappings(indexName, myCustomFieldMappings);

liferay-type-mappings.json
"content": { "index": "analyzed", "type": "string" },
"organizationId": { "index": "not_analyzed", "type": "string" },
"publishDate": { "format": "yyyyMMddHHmmss", "type": "date" }

Page 11

• Analyzed human searches

• query types

• combinations

• best relevance

Favor text fields over keyword fields.

Page 12

"*ubstrin*"

• lowercase

• * → "full scan" ↓↓↓

• don't score
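Why the leading wildcard hurts: the term dictionary is sorted, so a prefix like substr* can seek straight to the matching range, while *ubstrin* has to test every term. A simplified sketch with a TreeSet standing in for the term dictionary (Lucene's real implementation is more sophisticated, but a leading wildcard still cannot use the sorted order to skip terms):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Toy sorted term dictionary: prefix lookup seeks, infix wildcard scans everything. */
class TermDictionaryDemo {

    final NavigableSet<String> terms = new TreeSet<>();
    int termsVisited; // dictionary entries examined by the last lookup

    // Prefix query (substr*): seek to the first term >= prefix, stop at the first miss
    List<String> prefix(String prefix) {
        termsVisited = 0;
        List<String> hits = new ArrayList<>();
        for (String term : terms.tailSet(prefix, true)) {
            termsVisited++;
            if (!term.startsWith(prefix)) {
                break;
            }
            hits.add(term);
        }
        return hits;
    }

    // Infix wildcard (*ubstrin*): ordering does not help, so every term is checked
    List<String> infix(String fragment) {
        termsVisited = 0;
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            termsVisited++;
            if (term.contains(fragment)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        TermDictionaryDemo dict = new TermDictionaryDemo();
        dict.terms.addAll(List.of("apple", "banana", "liferay", "pigeon",
                "substring", "substrings", "zebra"));
        System.out.println(dict.prefix("substr") + " visited " + dict.termsVisited);
        System.out.println(dict.infix("ubstrin") + " visited " + dict.termsVisited);
    }
}
```

With seven terms the gap is small; with millions of terms the infix scan touches all of them on every query.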

Page 13

1. Full-text search

2. Prefix

3. n-grams
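Option 3 trades index size for query speed: an n-gram analyzer indexes every substring of a token between a minimum and a maximum length, so a fragment like ubstrin becomes an ordinary term lookup. A sketch of the n-grams such an analyzer would emit for one token; the lengths 3 and 7 are arbitrary choices here (Elasticsearch's ngram tokenizer exposes them as min_gram and max_gram):

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Generates the n-grams an ngram-analyzed field would index for one token. */
class NGrams {

    static Set<String> ngrams(String token, int minGram, int maxGram) {
        Set<String> grams = new LinkedHashSet<>();
        for (int start = 0; start < token.length(); start++) {
            for (int len = minGram; len <= maxGram && start + len <= token.length(); len++) {
                grams.add(token.substring(start, start + len));
            }
        }
        return grams;
    }

    public static void main(String[] args) {
        Set<String> indexed = ngrams("substring", 3, 7);
        // "ubstrin" is now a plain indexed term: an exact lookup, no wildcard scan
        System.out.println(indexed.contains("ubstrin"));
        System.out.println(indexed.size() + " terms indexed for one 9-letter word");
    }
}
```

The cost shows in the second line of output: one word turns into dozens of indexed terms, which is why n-grams come last in the list.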

Page 14

• Match →

• Prefix →

• Phrase →

Know your field, use the right queries.

Page 15

Write a field-specific query builder:

@Component(service = FieldQueryBuilder.class, immediate = true)
public class MyFieldQueryBuilder implements FieldQueryBuilder {

    public Query build(String field, String keywords) {
        // Fine-tune the right queries for your field
        myBooleanQuery.add(q1, MUST);
        myBooleanQuery.add(q2, SHOULD);
        ...
    }
}
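Because the FieldQueryBuilder fragment depends on Liferay's classes, here is a self-contained sketch of the decision it is meant to encode: pick the query type from what the field holds. The Occur, FieldKind and Clause types and the field catalog are hypothetical stand-ins, not Liferay's API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical stand-in for a field-specific query builder. Occur, FieldKind and
 * Clause are illustrative types, not Liferay's API.
 */
class FieldSpecificBuilder {

    enum Occur { MUST, SHOULD }

    enum FieldKind { TEXT, KEYWORD }

    static final class Clause {
        final Occur occur;
        final String query;

        Clause(Occur occur, String query) {
            this.occur = occur;
            this.query = query;
        }

        @Override
        public String toString() {
            return occur + " " + query;
        }
    }

    // Assumed field catalog; in practice this knowledge comes from your mappings
    static final Map<String, FieldKind> FIELDS = Map.of(
            "title_en_US", FieldKind.TEXT,
            "serialNumber", FieldKind.KEYWORD);

    static List<Clause> build(String field, String keywords) {
        List<Clause> clauses = new ArrayList<>();
        if (FIELDS.getOrDefault(field, FieldKind.TEXT) == FieldKind.KEYWORD) {
            // IDs and serials: a single exact term, match or no match
            clauses.add(new Clause(Occur.MUST, "term(" + field + ":" + keywords + ")"));
        } else {
            // Natural language: require a match, and reward the exact phrase
            clauses.add(new Clause(Occur.MUST, "match(" + field + ":" + keywords + ")"));
            clauses.add(new Clause(Occur.SHOULD, "phrase(" + field + ":" + keywords + ")^2"));
        }
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(build("title_en_US", "homing pigeon"));
        System.out.println(build("serialNumber", "SN-2017-ABC"));
    }
}
```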

Page 16

多言語検索 (multilingual search)

• Map

• suffix →

• "b" "a" "d"

• Stemming, stopwords (https://www.elastic.co/guide/en/elasticsearch/guide/current/using-language-analyzers.html)

Pick the right language analyzer.
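What a language analyzer buys you, in miniature: lowercasing, stopword removal and stemming make "The pigeons" and "pigeon" produce the same terms. The sketch below is a deliberately naive stand-in (a tiny stopword list and crude suffix stripping); Elasticsearch's language analyzers, such as english, use real stemmers, and Japanese needs a dedicated tokenizer such as kuromoji.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Toy English analyzer: lowercase, stopword removal, naive suffix stripping. */
class ToyEnglishAnalyzer {

    static final Set<String> STOPWORDS = Set.of("a", "an", "and", "of", "the", "to");
    static final String[] SUFFIXES = {"ing", "es", "s"}; // checked in this order

    static String stem(String token) {
        for (String suffix : SUFFIXES) {
            // keep at least three characters so short words are not mangled
            if (token.endsWith(suffix) && token.length() - suffix.length() >= 3) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!raw.isEmpty() && !STOPWORDS.contains(raw)) {
                tokens.add(stem(raw));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(analyze("The pigeons and the pigeon"));
        System.out.println(analyze("Searching searches"));
    }
}
```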

Page 17

document.addText("myField_ja_JP", japanese);
document.addText("myField_en_US", english);

Locale defaultLocale = portal.getSiteDefaultLocale(groupId);
document.addText(getLocalizedName("myField", defaultLocale), translation);

addSearchLocalizedTerm(searchQuery, searchContext, "myField");

searchContext.setLocale(themeDisplay.getLocale());

liferay-type-mappings.json
"template_ja": {
    "mapping": { "analyzer": "kuromoji" },
    "match": "\\w+_ja_[A-Z]{2}\\b"
}
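A sketch of how the localized field names line up with the template_ja pattern. localizedName is a hypothetical helper that assumes names of the form myField_ja_JP, as in the addText calls above, and the check assumes the match value is read as a regular expression (in Elasticsearch dynamic templates, match uses simple wildcards unless match_pattern is set to regex):

```java
import java.util.Locale;
import java.util.regex.Pattern;

/** Checks which localized field names the template_ja regex would route to kuromoji. */
class LocalizedFields {

    // Hypothetical helper mirroring the "myField_ja_JP" naming: name + "_" + locale
    static String localizedName(String name, Locale locale) {
        return name + "_" + locale; // Locale.JAPAN.toString() is "ja_JP"
    }

    // Same regex as the template_ja "match" entry (JSON-escaped there as "\\w+_ja_[A-Z]{2}\\b")
    static final Pattern JA_TEMPLATE = Pattern.compile("\\w+_ja_[A-Z]{2}\\b");

    static boolean usesJapaneseAnalyzer(String fieldName) {
        return JA_TEMPLATE.matcher(fieldName).matches();
    }

    public static void main(String[] args) {
        for (Locale locale : new Locale[] {Locale.JAPAN, Locale.US}) {
            String field = localizedName("myField", locale);
            System.out.println(field + " -> kuromoji? " + usesJapaneseAnalyzer(field));
        }
    }
}
```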

Page 18

• description, content

• title, title_en_US

• content

2x matching query clauses = inflated relevance.

Match once and only once.
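A toy demonstration of the inflation: if the same text lives in both myField and myField_en_US and the query has a clause for each, one real match is counted twice. The per-clause score of 1.5 and the substring check are assumptions made only to keep the arithmetic visible:

```java
import java.util.List;
import java.util.Map;

/** Shows how a duplicated field plus a duplicated clause double-counts one match. */
class DoubleMatchDemo {

    // Assumed per-clause score for a single match of the keyword
    static final double CLAUSE_SCORE = 1.5;

    // Adds the clause score for every queried field that contains the keyword
    static double score(Map<String, String> doc, List<String> queriedFields, String keyword) {
        double total = 0.0;
        for (String field : queriedFields) {
            String value = doc.get(field);
            if (value != null && value.contains(keyword)) {
                total += CLAUSE_SCORE;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> fields = List.of("myField", "myField_en_US");
        // Same text, but docA was indexed twice (unlocalized + localized)
        Map<String, String> docA = Map.of("myField", "pigeon", "myField_en_US", "pigeon");
        Map<String, String> docB = Map.of("myField_en_US", "pigeon");
        System.out.println("docA=" + score(docA, fields, "pigeon")
                + " docB=" + score(docB, fields, "pigeon"));
    }
}
```

Both documents say the same thing, yet docA ranks higher purely because of how it was indexed.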

Page 19

If already indexing once...
document.addText(getLocalizedName("myField", languageId), translation);

… no need to index twice...
// DON'T
// document.addText("myField", content);

… match once and only once.
addSearchLocalizedTerm(searchQuery, searchContext, "myField");

// DON'T
// addSearchTerm(searchQuery, searchContext, "myField");

Page 20

• docs

• value

• display

• highlight

Index for rendering, render from doc.

Page 21

analyzed

[30] Liferay
[15] DXP
[15] Symposium

Page 22

not_analyzed

[15] Liferay DXP

[15] Liferay Symposium

Page 23

• Aggregate not_analyzed
  – [15] Liferay DXP
  – [15] Liferay Symposium

• Match analyzed

2 fields, 1 analyzed, 1 not_analyzed.
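The two facet results on the previous pages can be reproduced with a few lines of counting. In this sketch a whitespace split stands in for the analyzer (a real one would also lowercase), and the 15 + 15 documents mirror the slides' example:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Facet counts over analyzed tokens vs. whole not_analyzed values. */
class FacetCounts {

    static Map<String, Integer> countAnalyzed(List<String> values) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : values) {
            for (String token : value.split("\\s+")) {
                counts.merge(token, 1, Integer::sum); // each token becomes its own bucket
            }
        }
        return counts;
    }

    static Map<String, Integer> countNotAnalyzed(List<String> values) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String value : values) {
            counts.merge(value, 1, Integer::sum); // the whole value is one bucket
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        values.addAll(Collections.nCopies(15, "Liferay DXP"));
        values.addAll(Collections.nCopies(15, "Liferay Symposium"));
        System.out.println("analyzed:     " + countAnalyzed(values));
        System.out.println("not_analyzed: " + countNotAnalyzed(values));
    }
}
```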

Page 24

Search on the text field

new MatchQuery("myfield", keywords);

Aggregate on the keyword field

myFacet.setFieldName("myfield.raw");

Page 25

• multifields (https://www.elastic.co/guide/en/elasticsearch/guide/current/aggregations-and-analysis.html)

• Copy Fields (https://wiki.apache.org/solr/SchemaXml#Copy_Fields)

• analyzed

• not_analyzed
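For the search-on-text, aggregate-on-raw setup, the mapping declares one analyzed field with a not_analyzed raw subfield. A sketch that emits such a mapping in the Elasticsearch 2.x string syntax used elsewhere in this deck (on Elasticsearch 5 and later the equivalent is a text field with a keyword subfield); the field name is just the myfield placeholder from the MatchQuery example:

```java
/** Emits an Elasticsearch 2.x-style multi-field mapping: analyzed text plus a ".raw" copy. */
class MultiFieldMapping {

    static String multiField(String field) {
        return "{ \"" + field + "\": {\n"
                + "    \"type\": \"string\",\n"
                + "    \"index\": \"analyzed\",\n" // search this one, e.g. new MatchQuery(field, keywords)
                + "    \"fields\": {\n"
                + "      \"raw\": { \"type\": \"string\", \"index\": \"not_analyzed\" }\n" // aggregate on field + ".raw"
                + "    }\n"
                + "} }";
    }

    public static void main(String[] args) {
        System.out.println(multiField("myfield"));
    }
}
```

Elasticsearch fills both from the same source value, so the document is still indexed once and the application never writes the raw copy itself.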

Page 26

• elasticsearch-head

• Solr Admin

• query string

• explain

Tweak clauses, re-run query, repeat.
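The tweak-and-repeat loop can also be scripted. A sketch using the JDK's java.net.http client (Java 11+) that builds an explain-enabled search request; the endpoint and the index name my-index are placeholders, and the request is only constructed here, not sent:

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Builds (but does not send) an explain-enabled search request for relevance tuning. */
class ExplainRequest {

    // Placeholder endpoint and index name; adjust for your cluster
    static HttpRequest build(String baseUrl, String index, String queryJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_search?explain=true"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(queryJson))
                .build();
    }

    public static void main(String[] args) {
        String query = "{ \"query\" : { \"term\" : { \"tag\" : \"LSNA17\" } } }";
        HttpRequest request = build("http://localhost:9200", "my-index", query);
        System.out.println(request.method() + " " + request.uri());
        // To run it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()),
        // then read each hit's "_explanation", tweak the clauses, and send again.
    }
}
```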

Page 30

Thank you! And lots of relevant content at #LSNA17
