
Better Search Through Query Understanding


DESCRIPTION

Better Search Through Query Understanding

Presented as a Data Talk at Intuit on April 22, 2014

Search is a fundamental problem of our time: we use search engines daily to satisfy a variety of personal and professional information needs. But search engine development still feels stuck in an information retrieval paradigm that focuses on result ranking. In this talk, I’ll advocate an emphasis on query understanding. I’ll talk about how we implement query understanding at LinkedIn, and I’ll present examples from the broader web. Hopefully you’ll come out with a different perspective on search and share my appreciation for how we can improve search through query understanding.

About the Speaker

Daniel Tunkelang leads LinkedIn's efforts around query understanding. Before that, he led LinkedIn's product data science team. He previously led a local search quality team at Google and was a founding employee of Endeca (acquired by Oracle in 2011). He has written a textbook on faceted search, and is a recognized advocate of human-computer interaction and information retrieval (HCIR). He has a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.


Page 1: Better Search Through Query Understanding

Recruiting Solutions

Daniel Tunkelang
Head, Query Understanding

better search through query understanding

Page 2: Better Search Through Query Understanding

overview

query understanding: what is it?

how we do query understanding at LinkedIn

some other thoughts from search in the wild

what I’m not going to cover:


Page 3: Better Search Through Query Understanding

bird’s-eye view of how a search engine works

user: information need → query → select from results

system: rank using IR model (tf-idf, PageRank)

Page 4: Better Search Through Query Understanding

user: information need → query → select from results

system: rank using IR model (tf-idf, PageRank)

query understanding

Page 5: Better Search Through Query Understanding

search is a communication problem


Page 6: Better Search Through Query Understanding

tag: skill OR title
related skills: search, ranking, …

tag: company
id: 1337
industry: internet

verticals: people, jobs

intent: exploratory

Page 7: Better Search Through Query Understanding

query understanding pipeline

raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations
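
The four stages on this slide can be read as functions applied in sequence to a query object that accumulates annotations. The following is a minimal sketch of that reading, not LinkedIn's implementation: the `StructuredQuery` class, the stage functions, and the toy lookup tables are all hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class StructuredQuery:
    """Query text plus the annotations added by each stage."""
    text: str
    annotations: Dict[str, object] = field(default_factory=dict)


# Hypothetical toy resources; a real system would learn these from data.
SPELL_FIXES = {"enginer": "engineer"}
LEXICON = {"software engineer": "TITLE", "google": "COMPANY"}
SYNONYMS = {"software engineer": ["software developer"]}


def spellcheck(q: StructuredQuery) -> StructuredQuery:
    # Keep the raw text, then replace known misspellings token by token.
    q.annotations["raw"] = q.text
    q.text = " ".join(SPELL_FIXES.get(tok, tok) for tok in q.text.split())
    return q


def tag(q: StructuredQuery) -> StructuredQuery:
    # Naive substring lookup standing in for a real entity tagger.
    q.annotations["tags"] = {p: t for p, t in LEXICON.items() if p in q.text}
    return q


def predict_verticals(q: StructuredQuery) -> StructuredQuery:
    # Toy rule: a TITLE tag suggests both people and jobs results.
    tags = set(q.annotations["tags"].values())
    q.annotations["verticals"] = ["people", "jobs"] if "TITLE" in tags else ["people"]
    return q


def expand(q: StructuredQuery) -> StructuredQuery:
    # Attach synonyms only for phrases that were tagged.
    tagged = q.annotations["tags"]
    q.annotations["expansions"] = {p: SYNONYMS[p] for p in tagged if p in SYNONYMS}
    return q


PIPELINE: List[Callable[[StructuredQuery], StructuredQuery]] = [
    spellcheck, tag, predict_verticals, expand,
]


def understand(raw_query: str) -> StructuredQuery:
    """Run the raw query through every stage in slide order."""
    q = StructuredQuery(text=raw_query.lower().strip())
    for stage in PIPELINE:
        q = stage(q)
    return q
```

Calling `understand("software enginer google")` yields the corrected text together with tag, vertical, and expansion annotations; because each stage only reads and adds annotations, stages can be swapped or tested independently.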

Page 8: Better Search Through Query Understanding

query understanding pipeline

raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations

Page 9: Better Search Through Query Understanding

spelling correction

fix obvious typos

help users spell names

Page 10: Better Search Through Query Understanding

spelling out the details

PEOPLE NAMES, COMPANIES, TITLES, PAST QUERIES

n-grams: marissa => ma ar ri is ss sa

metaphone: mark/marc => MRK

co-occurrence counts: marissa:mayer = 1000

marisa meyer yahoo

marissa | marisa

meyer | mayer

yahoo
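
The slide combines character n-grams (to find similar spellings) with co-occurrence counts (to pick the combination of terms that usually appears together). Below is a small sketch of that idea under stated assumptions: candidates are retrieved by Jaccard similarity over character bigrams, and the best sequence is the one with the highest sum of adjacent co-occurrence counts. Only the `marissa:mayer = 1000` count comes from the slide; the other counts, the threshold, and all function names are hypothetical, and the metaphone signal is omitted.

```python
from itertools import product
from typing import Dict, List, Tuple

DICTIONARY = ["marissa", "marisa", "mayer", "meyer", "yahoo"]

# marissa:mayer = 1000 is from the slide; the other counts are made up.
COOCCURRENCE: Dict[Tuple[str, str], int] = {
    ("marissa", "mayer"): 1000,
    ("mayer", "yahoo"): 500,
    ("marisa", "meyer"): 2,
}


def bigrams(word: str) -> List[str]:
    """Character bigrams, e.g. marissa => ma ar ri is ss sa."""
    return [word[i:i + 2] for i in range(len(word) - 1)]


def jaccard(a: str, b: str) -> float:
    """Overlap of the two words' bigram sets, between 0 and 1."""
    sa, sb = set(bigrams(a)), set(bigrams(b))
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def candidates(token: str, min_sim: float = 0.3) -> List[str]:
    """Dictionary words similar enough to the token; fall back to the token."""
    found = [w for w in DICTIONARY if jaccard(token, w) >= min_sim]
    return found or [token]


def best_correction(query: str) -> str:
    """Pick the candidate sequence with the largest adjacent co-occurrence."""
    options = [candidates(tok) for tok in query.split()]

    def score(seq: Tuple[str, ...]) -> int:
        return sum(COOCCURRENCE.get(pair, 0) for pair in zip(seq, seq[1:]))

    return " ".join(max(product(*options), key=score))
```

With these toy counts, `best_correction("marisa meyer yahoo")` selects `marissa mayer yahoo`. Enumerating every combination is fine for short queries; a longer query would call for dynamic programming over the candidate lattice instead.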

Page 11: Better Search Through Query Understanding


spelling out the details

problem: corpus as well as query logs contain many spelling errors

certain spelling errors are quite frequent

while genuine words (especially names) might be infrequent

Page 12: Better Search Through Query Understanding


spelling out the details

problem: corpus & query logs contain spelling errors

solution: use query chains to infer correct spelling

[product manger] → [product manager] → CLICK

[marissa mayer] → CLICK
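
One way to read the query-chain idea: when a user issues a query, immediately reformulates it, and then clicks a result, the second query is evidence for the correct spelling of the first. The sketch below counts such chains in session logs. The event format (`("query", text)` and `("click", result_id)` tuples) and the function name are assumptions made for this example, not a description of LinkedIn's logs.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[str, str]  # ("query", text) or ("click", result_id)


def mine_rewrites(sessions: List[List[Event]]) -> Counter:
    """Count (query, reformulation) pairs where the reformulation got a click."""
    counts: Counter = Counter()
    for session in sessions:
        # Slide a window of three events: query, query, click.
        for first, second, third in zip(session, session[1:], session[2:]):
            is_chain = (
                first[0] == "query"
                and second[0] == "query"
                and third[0] == "click"
                and first[1] != second[1]
            )
            if is_chain:
                counts[(first[1], second[1])] += 1
    return counts


sessions = [
    [("query", "product manger"), ("query", "product manager"), ("click", "r1")],
    [("query", "marissa mayer"), ("click", "r2")],
    [("query", "product manger"), ("query", "product manager"), ("click", "r7")],
]
rewrites = mine_rewrites(sessions)
```

Here `rewrites` contains a single pair, `("product manger", "product manager")`, seen twice; the directly clicked `marissa mayer` query produces no rewrite. A fuller version would also require the two queries to be similar in spelling, so that topic changes are not mistaken for corrections.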

Page 13: Better Search Through Query Understanding

query understanding pipeline

raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations

Page 14: Better Search Through Query Understanding

query tagging: identifying entities in the query

TITLE CO GEO

TITLE-237: software engineer, software developer, programmer, …

CO-1441: Google Inc.; Industry: Internet

GEO-7583: Country: US; Lat: 42.3482 N; Long: 75.1890 W

(RECOGNIZED TAGS: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL)

Page 15: Better Search Through Query Understanding


query tagging: identifying entities in the query

TITLE CO GEO

MORE PRECISE MATCHING WITH DOCUMENTS

Page 16: Better Search Through Query Understanding


entity-based filtering

BEFORE

Page 17: Better Search Through Query Understanding


entity-based filtering

AFTER

BEFORE

Page 18: Better Search Through Query Understanding


entity-based filtering

BEFORE

Page 19: Better Search Through Query Understanding


entity-based filtering

AFTER

BEFORE

Page 20: Better Search Through Query Understanding


entity-based suggestions

Page 21: Better Search Through Query Understanding


entity-based suggestions

Page 22: Better Search Through Query Understanding

query tagging: sequential model

TRAINING

EMISSION PROBABILITIES (learned from user profiles)

TRANSITION PROBABILITIES (learned from query logs)

Page 23: Better Search Through Query Understanding


query tagging: sequential model

INFERENCE

given a query, find the most likely sequence of tags
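
The slides describe the tagger only as a sequential model with emission and transition probabilities. If it is treated as a hidden Markov model, finding the most likely tag sequence is the classic Viterbi algorithm. The sketch below makes that assumption explicit; every probability and token in it is invented for illustration, and unseen events get a small floor probability instead of proper smoothing.

```python
import math
from typing import Dict, List

Probs = Dict[str, Dict[str, float]]


def viterbi(tokens: List[str], states: List[str], start_p: Dict[str, float],
            trans_p: Probs, emit_p: Probs, floor: float = 1e-6) -> List[str]:
    """Return the most likely tag sequence for the tokens (log-space Viterbi)."""
    if not tokens:
        return []

    def lp(p: float) -> float:
        return math.log(max(p, floor))  # floor avoids log(0) for unseen events

    # best[s] = log-probability of the best path so far that ends in state s
    best = {s: lp(start_p.get(s, 0.0)) + lp(emit_p[s].get(tokens[0], 0.0))
            for s in states}
    backpointers: List[Dict[str, str]] = []
    for tok in tokens[1:]:
        scores, pointers = {}, {}
        for s in states:
            prev = max(states, key=lambda p: best[p] + lp(trans_p[p].get(s, 0.0)))
            scores[s] = (best[prev] + lp(trans_p[prev].get(s, 0.0))
                         + lp(emit_p[s].get(tok, 0.0)))
            pointers[s] = prev
        best = scores
        backpointers.append(pointers)

    # Walk the backpointers from the best final state to recover the path.
    path = [max(states, key=lambda s: best[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


# Toy parameters (hypothetical, not learned from any real data).
STATES = ["TITLE", "COMPANY", "GEO"]
START = {"TITLE": 0.6, "COMPANY": 0.3, "GEO": 0.1}
TRANS = {
    "TITLE": {"TITLE": 0.5, "COMPANY": 0.3, "GEO": 0.2},
    "COMPANY": {"TITLE": 0.2, "COMPANY": 0.4, "GEO": 0.4},
    "GEO": {"TITLE": 0.3, "COMPANY": 0.3, "GEO": 0.4},
}
EMIT = {
    "TITLE": {"software": 0.4, "engineer": 0.5},
    "COMPANY": {"google": 0.9, "software": 0.05},
    "GEO": {"boston": 0.8},
}
tags = viterbi(["software", "engineer", "google", "boston"],
               STATES, START, TRANS, EMIT)
```

With these toy numbers, `tags` comes out as `["TITLE", "TITLE", "COMPANY", "GEO"]`, matching the TITLE CO GEO pattern shown on the tagging slides.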

Page 24: Better Search Through Query Understanding

query understanding pipeline

raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations

Page 25: Better Search Through Query Understanding


vertical intent prediction: distribution

JOBS

PEOPLE

COMPANIES

(probability distribution over verticals)
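
The slide does not say how the distribution is estimated. One simple possibility, shown here purely as an assumption, is to normalize the number of past clicks a query received in each vertical, with additive smoothing so that unseen verticals keep a small probability. The function name and the smoothing parameter `alpha` are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

VERTICALS = ("jobs", "people", "companies")


def vertical_distribution(clicked_verticals: Iterable[str],
                          alpha: float = 1.0) -> Dict[str, float]:
    """Smoothed share of clicks per vertical for one query."""
    counts = Counter(clicked_verticals)
    total = sum(counts[v] for v in VERTICALS) + alpha * len(VERTICALS)
    return {v: (counts[v] + alpha) / total for v in VERTICALS}


# Hypothetical click history for a single query.
history = ["jobs"] * 7 + ["people"] * 2 + ["companies"]
dist = vertical_distribution(history)
```

The resulting values sum to 1 and rank `jobs` highest for this toy history; a query with no click history falls back to a uniform distribution, which is a reasonable default when intent is unknown.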

Page 26: Better Search Through Query Understanding


vertical intent prediction: relevance

[company]

[employees]

[jobs]

[name search]

Page 27: Better Search Through Query Understanding

query understanding pipeline

raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations

Page 28: Better Search Through Query Understanding


query expansion: name synonyms

Page 29: Better Search Through Query Understanding


query expansion: job title synonyms

Page 30: Better Search Through Query Understanding

query expansion: signals

trained using query chains:

[jon] → [jonathan] → CLICK

[programmer] → [developer] → CLICK

symmetric but not transitive!

[francis] ⇔ [frank]

[franklin] ⇔ [frank]

[francis] ≠ [franklin]

context based!

[software engineer] → [software developer] → CLICK

[software engineer] => [software developer]

[civil engineer] ≠ [civil developer]
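
The two constraints on this slide suggest a particular data structure. Storing synonyms as unordered pairs gives symmetry, looking up only direct partners (no transitive closure) keeps francis and franklin apart, and keying on the whole phrase rather than the single word "engineer" keeps the rewrite context based. The sketch below illustrates this under those assumptions; the pair set and function name are hypothetical, and the title pair is treated as symmetric for simplicity.

```python
from typing import FrozenSet, Set

# Unordered pairs: membership is symmetric by construction.
SYNONYM_PAIRS: Set[FrozenSet[str]] = {
    frozenset({"jon", "jonathan"}),
    frozenset({"francis", "frank"}),
    frozenset({"franklin", "frank"}),
    frozenset({"programmer", "developer"}),
    # Keyed on the full phrase, so "civil engineer" is unaffected.
    frozenset({"software engineer", "software developer"}),
}


def synonyms(term: str) -> Set[str]:
    """Direct synonyms only; deliberately no transitive closure."""
    found: Set[str] = set()
    for pair in SYNONYM_PAIRS:
        if term in pair:
            found |= pair - {term}
    return found
```

For example, `synonyms("frank")` returns both `francis` and `franklin`, while `synonyms("francis")` returns only `frank`, and `synonyms("civil engineer")` returns an empty set.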

Page 31: Better Search Through Query Understanding

query understanding pipeline

raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations

Page 32: Better Search Through Query Understanding


what else can we learn from search in the wild?

Page 33: Better Search Through Query Understanding


don’t guess when it’s better to ask

vs.

Page 34: Better Search Through Query Understanding


clarify then refine

computers books

Page 35: Better Search Through Query Understanding


give users transparency, guidance, and control

Page 36: Better Search Through Query Understanding


think beyond individual search queries

Gene Golovchinsky, FXPAL

Page 37: Better Search Through Query Understanding


know when you don’t know

Claudia Hauff, Query Difficulty for Digital Libraries [2009]

Page 38: Better Search Through Query Understanding


Daniel
[email protected]
linkedin.com/in/dtunkelang