Combining Solr and Elasticsearch to Improve Autosuggestion on Mobile Local Search Toan Vinh Luu, PhD Senior Search Engineer local.ch AG

Source: files.meetup.com/7646592/2016-10-12 Elastic Meetup Local Autosuggest.pdf



Combining Solr and Elasticsearch to Improve Autosuggestion on Mobile Local Search

Toan Vinh Luu, PhD Senior Search Engineer local.ch AG


In this talk

•  Autosuggestion feature

•  Autosuggestion architecture

•  Evaluation


local.ch

•  Local search engine in Switzerland (web, mobile)

•  Each month:
   –  > 4 million unique users
   –  > 8 million queries on mobile (iOS, Android, …)

•  Users search for:
   –  Services (e.g. “restaurant zurich”)
   –  Resident information (e.g. “peter meier”)
   –  Phone numbers (e.g. 0800 86 80 86)
   –  Addresses, points of interest
   –  ...


Why is autosuggestion important?

A user taps the phone 8 times instead of 34 times to get to the result list when searching for “Electric installation Wallisellen”


What should we suggest to the user?


Popular data suggestion


Popular queries suggestion

•  > 2000 queries/month for “cablecom”, which has only 1 entry

•  “mc donalds” has fewer entries than “muller” but is queried > 10x as often


Query history suggestion

•  9% of mobile queries are historical queries.

•  38% of users search with a query they have used in the past


Spellchecker suggestion

•  > 700’000 mistakes per month on mobile (9%)


Detail entry suggestion


Special information suggestion


Autosuggestion Architecture

[Architecture diagram. Labels: Autosuggest API / Search API; SuggestData component; Query history component; Popular query component; Spellchecker component; Index (five boxes); Query log; Popular query processor; Local.ch Database]


How do we process “popular queries”?

•  Popular is not just high frequency!

•  User’s language
   –  4 languages are used in Switzerland. Fail if we suggest “bäckerei” to a French-speaking user

•  Location
   –  Fail if we suggest a hospital in Zurich to a user in Geneva

•  Misspelling
   –  Fail if we suggest both “zürich” and “züruch”

•  Unique users
   –  Fail if we suggest “toan” just because I searched for my name thousands of times

•  Blacklist
   –  Fail if we suggest “f**k”, “pe**is”


Popular query processor

•  Preprocessing the query log:
   –  Text normalization, stopword removal, blacklist filtering, keeping only queries that return results…

•  A query log item in the Elasticsearch index:

    {
        "q": "restaurant",
        "language": "de",
        "lon": 8.50646,
        "lat": 47.4192,
        "datetime": "2014-06-02 11:10:07",
        "user": "eeaad0c09abc41676c1c99530693"
    }
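The preprocessing step could be sketched in Python roughly as follows. This is only an illustration: the stopword and blacklist sets, the `has_results` callback, and the function names are hypothetical, since the slides do not show the actual implementation.

```python
import re
import unicodedata

# Hypothetical word lists; the real stopword list and blacklist are not in the slides.
STOPWORDS = {"in", "der", "die", "das", "le", "la"}
BLACKLIST = {"badword"}


def normalize(text):
    """Apply Unicode NFC normalization, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def preprocess(raw_query, has_results):
    """Return the cleaned query, or None if the log item should be dropped.

    has_results is a caller-supplied function (an assumption here) that
    reports whether the search backend returns any result for a query.
    """
    tokens = [t for t in normalize(raw_query).split(" ") if t and t not in STOPWORDS]
    if not tokens or any(t in BLACKLIST for t in tokens):
        return None
    cleaned = " ".join(tokens)
    return cleaned if has_results(cleaned) else None
```

Only items for which `preprocess` returns a string would then be written to the query log index.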


Find candidate popular queries for each language

    {
        "query": {
            "query_string": {
                "query": "language:%s AND date:[%s TO %s] AND -q.untouched:/[0].*/"
                         % (language, fromDate, toDate)
            }
        },
        "aggs": {
            "q": {
                "terms": {"field": "q.untouched", "size": TOP_POPULAR}
            }
        }
    }
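A terms aggregation returns one bucket per distinct value of the field, each with a `key` and a `doc_count`. The sketch below builds the request body for one language and reads the candidates back from a response. The function names and the sample response are made up for illustration, and setting `"size": 0` to skip the hits is an assumption rather than something the slides show.

```python
def build_candidate_query(language, from_date, to_date, top_popular):
    """Build the request body for one language and date range."""
    query_string = (
        "language:%s AND date:[%s TO %s] AND -q.untouched:/[0].*/"
        % (language, from_date, to_date)
    )
    return {
        "size": 0,  # assumption: only the aggregation is needed, not the hits
        "query": {"query_string": {"query": query_string}},
        "aggs": {"q": {"terms": {"field": "q.untouched", "size": top_popular}}},
    }


def extract_candidates(response):
    """Return (query, frequency) pairs from a terms aggregation response."""
    buckets = response["aggregations"]["q"]["buckets"]
    return [(b["key"], b["doc_count"]) for b in buckets]


# Made-up response fragment, for illustration only.
sample_response = {
    "aggregations": {
        "q": {"buckets": [
            {"key": "restaurant", "doc_count": 1200},
            {"key": "chuv", "doc_count": 950},
        ]}
    }
}
candidates = extract_candidates(sample_response)
```

The regular expression clause `-q.untouched:/[0].*/` excludes queries that start with `0`, which presumably keeps phone numbers out of the candidate list.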


Find number of unique users given a query

    {
        "query": {
            "query_string": {
                "query": "q.untouched:%s AND date:[%s TO %s]"
                         % (query, fromDate, toDate)
            }
        },
        "aggs": {
            "num_users": {
                "cardinality": {"field": "user"}
            }
        }
    }

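The cardinality aggregation returns an approximate distinct count under `aggregations.num_users.value`. The sketch below reads that value and also shows an exact in-memory equivalent, to make clear why unique users differ from raw frequency. The helper names, the threshold of 100, and the sample log are assumptions for illustration.

```python
def unique_users(log_items, query):
    """Exact in-memory equivalent of the cardinality aggregation for one query."""
    return len({item["user"] for item in log_items if item["q"] == query})


def num_users_from_response(response):
    """Read the (approximate) distinct-user count from the aggregation response."""
    return response["aggregations"]["num_users"]["value"]


def is_popular(num_users, min_users=100):
    """Hypothetical rule: keep a query only if enough distinct users issued it."""
    return num_users >= min_users


# Made-up log: one user repeats "toan" 1000 times, two users search "chuv".
log_items = (
    [{"q": "toan", "user": "u1"}] * 1000
    + [{"q": "chuv", "user": "u2"}, {"q": "chuv", "user": "u3"}]
)
```

In this sample, "toan" has a frequency of 1000 but only one unique user, so a user-count threshold would drop it while a frequency threshold would not.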

Bounding box to limit popular queries given location

[Figure: histogram of query frequency (y-axis, 0 to 300) by longitude (x-axis, 5.95 to 10.45), with a “90%” annotation. Popular query: Chuv (Centre Hospitalier Universitaire Vaudois)]


[Figure: Histogram of query “chuv” based on frequency, longitude (5.95 to 10.45) and latitude (45.81 to 47.77)]


[Figure labels (latitude,longitude): 46.5243,6.6397; 46.52,6.63; 46.53,6.64]


Percentiles aggregation to find the min and max values of the querying location

    {
        "query": {
            "match": {"q": {"query": "chuv"}}
        },
        "aggs": {
            "lat_outlier": {
                "percentiles": {"field": "lat", "percents": [5, 95]}
            },
            "lon_outlier": {
                "percentiles": {"field": "lon", "percents": [5, 95]}
            }
        }
    }
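By default a percentiles aggregation returns its results under `values`, keyed by the percentile as a string (`"5.0"`, `"95.0"`). A minimal sketch of turning such a response into bounding-box fields follows. The function name is hypothetical, the output field names are chosen to match the stored popular-query document, and the sample numbers are illustrative.

```python
def bounding_box(response):
    """Turn the 5th/95th percentiles of lat and lon into a bounding box."""
    aggs = response["aggregations"]
    lat = aggs["lat_outlier"]["values"]
    lon = aggs["lon_outlier"]["values"]
    return {
        "min_lat": lat["5.0"],
        "max_lat": lat["95.0"],
        "min_lon": lon["5.0"],
        "max_lon": lon["95.0"],
    }


# Illustrative response fragment, not real output.
sample_response = {
    "aggregations": {
        "lat_outlier": {"values": {"5.0": 46.2245, "95.0": 46.9909}},
        "lon_outlier": {"values": {"5.0": 6.29637, "95.0": 7.3332}},
    }
}
box = bounding_box(sample_response)
```

Cutting at the 5th and 95th percentiles keeps the central 90% of query locations on each axis and discards outliers, such as a few users querying from the other side of the country.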


Popular query stored in Solr index

    {
        "q": "chuv",
        "lang": ["de", "fr", "en"],
        "users": 7435,
        "min_lat": 46.2245,
        "max_lon": 7.3332,
        "max_lat": 46.9909,
        "min_lon": 6.29637,
        "freq": 9524
    }


Solr request to suggest popular query

    q:ch* lang:en users:[100 TO *]
    min_lat:[* TO " + user_lat + "] min_lon:[* TO " + user_lon + "]
    max_lat:[" + user_lat + " TO *] max_lon:[" + user_lon + " TO *]
    &sort=freq desc
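The four range clauses select popular queries whose bounding box contains the user's position. The sketch below builds the request parameters without string concatenation and adds a plain Python check of the same containment logic. Putting the restrictions into `fq` filter queries instead of the main `q` is an assumption (the slide lists all clauses together), the function names are hypothetical, and `prefix` is assumed to be already escaped for Solr.

```python
def build_suggest_params(prefix, lang, user_lat, user_lon, min_users=100):
    """Build Solr request parameters for a popular-query suggestion."""
    return {
        "q": "q:%s*" % prefix,
        "fq": [
            "lang:%s" % lang,
            "users:[%d TO *]" % min_users,
            "min_lat:[* TO %s]" % user_lat,
            "min_lon:[* TO %s]" % user_lon,
            "max_lat:[%s TO *]" % user_lat,
            "max_lon:[%s TO *]" % user_lon,
        ],
        "sort": "freq desc",
    }


def in_bounding_box(doc, user_lat, user_lon):
    """Pure-Python equivalent of the four range clauses, for checking the logic."""
    return (doc["min_lat"] <= user_lat <= doc["max_lat"]
            and doc["min_lon"] <= user_lon <= doc["max_lon"])


# The stored "chuv" document from the earlier slide, reduced to its box fields.
chuv = {"min_lat": 46.2245, "max_lat": 46.9909, "min_lon": 6.29637, "max_lon": 7.3332}
params = build_suggest_params("ch", "en", 46.5243, 6.6397)
```

With these values, a user at 46.5243,6.6397 falls inside the box for "chuv", while a user further east, for example at 47.38,8.54, does not.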


Evaluation

•  Several metrics are used to evaluate the autosuggestion feature
   –  Number of typed characters to get to the result list
      •  Average length of input: 10.0 chars
      •  Average length of clicked suggestion: 15.4 chars
   –  Number of clicks on suggested items
   –  Average rank of clicked item
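These metrics could be computed from a log of suggestion clicks roughly as follows. The event layout (`typed`, `clicked`, `rank`) and the function name are hypothetical, and the sample events are made up.

```python
def evaluate(click_events):
    """Compute the metrics from a list of suggestion-click events.

    Each event is a dict with hypothetical keys: "typed" (the input when the
    user clicked), "clicked" (the suggestion text), "rank" (1-based position).
    """
    n = len(click_events)
    if n == 0:
        return {"clicks": 0, "avg_typed_len": 0.0,
                "avg_clicked_len": 0.0, "avg_rank": 0.0}
    return {
        "clicks": n,
        "avg_typed_len": sum(len(e["typed"]) for e in click_events) / n,
        "avg_clicked_len": sum(len(e["clicked"]) for e in click_events) / n,
        "avg_rank": sum(e["rank"] for e in click_events) / n,
    }


# Made-up events, for illustration only.
events = [
    {"typed": "rest", "clicked": "restaurant", "rank": 1},
    {"typed": "ch", "clicked": "chuv", "rank": 3},
]
metrics = evaluate(events)
```

With the averages reported on the slide, a clicked suggestion saves about 15.4 - 10.0 = 5.4 typed characters.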


Number of clicks on suggested items since new feature release

[Figure: number of clicks on suggested items over time, with the release date marked]


Average rank of clicked item

[Figure: average rank of clicked item over time (y-axis, 0 to 2.5), with the release of the new query suggestion marked]


Conclusion

•  We can combine two search frameworks to bring a better search experience to the user:

   –  Solr is efficient for querying, faceting and caching

   –  Elasticsearch is efficient for big data aggregation and query log storage


Contact information

•  Search team at local.ch – [email protected] – [email protected] –  [email protected]

•  We are hiring a search engineer! – Contact: [email protected]