DESCRIPTION
Better Search Through Query Understanding

Presented as a Data Talk at Intuit on April 22, 2014

Search is a fundamental problem of our time: we use search engines daily to satisfy a variety of personal and professional information needs. But search engine development still feels stuck in an information retrieval paradigm that focuses on result ranking. In this talk, I’ll advocate an emphasis on query understanding. I’ll talk about how we implement query understanding at LinkedIn, and I’ll present examples from the broader web. Hopefully you’ll come out with a different perspective on search and share my appreciation for how we can improve search through query understanding.

About the Speaker

Daniel Tunkelang leads LinkedIn's efforts around query understanding. Before that, he led LinkedIn's product data science team. He previously led a local search quality team at Google and was a founding employee of Endeca (acquired by Oracle in 2011). He has written a textbook on faceted search, and is a recognized advocate of human-computer interaction and information retrieval (HCIR). He has a PhD in Computer Science from CMU, as well as BS and MS degrees from MIT.
Daniel Tunkelang
Head, Query Understanding
better search through query understanding
overview
query understanding: what is it?
how we do query understanding at LinkedIn
some other thoughts from search in the wild
what I’m not going to cover:
2
user: information need → query → select from results
system: rank using IR model (tf-idf, PageRank)
bird’s-eye view of how a search engine works
3
user: information need → query → select from results
system: rank using IR model (tf-idf, PageRank)
query understanding
4
search is a communication problem
5
6
tag: skill OR title
related skills: search, ranking, …
tag: company
id: 1337
industry: internet
verticals: people, jobs
intent: exploratory
7
query understanding pipeline
raw query → spellcheck → query tagging → vertical intent prediction → query expansion → structured query + annotations
8
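The pipeline above can be read as a chain of functions, each taking the query and attaching annotations. The sketch below is only an illustration of that shape: the `StructuredQuery` class, the stage function names, and every lookup table and probability in it are hypothetical stand-ins, not LinkedIn's implementation.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class StructuredQuery:
    """Query text plus whatever annotations the stages attach (hypothetical shape)."""
    text: str
    annotations: Dict[str, Any] = field(default_factory=dict)


def spellcheck(q: StructuredQuery) -> StructuredQuery:
    fixes = {"enginer": "engineer"}  # toy correction table (assumption)
    q.annotations["raw"] = q.text
    q.text = " ".join(fixes.get(tok, tok) for tok in q.text.lower().split())
    return q


def tag_query(q: StructuredQuery) -> StructuredQuery:
    known = {"software engineer": "TITLE"}  # toy entity dictionary (assumption)
    q.annotations["tags"] = {p: t for p, t in known.items() if p in q.text}
    return q


def predict_verticals(q: StructuredQuery) -> StructuredQuery:
    # Made-up probabilities: a title in the query hints at job or people search.
    has_title = "TITLE" in q.annotations["tags"].values()
    q.annotations["verticals"] = (
        {"jobs": 0.6, "people": 0.4} if has_title else {"people": 1.0}
    )
    return q


def expand(q: StructuredQuery) -> StructuredQuery:
    synonyms = {"software engineer": ["software developer"]}  # toy synonym table
    q.annotations["expansions"] = [
        s for phrase in q.annotations["tags"] for s in synonyms.get(phrase, [])
    ]
    return q


STAGES: List[Callable[[StructuredQuery], StructuredQuery]] = [
    spellcheck, tag_query, predict_verticals, expand,
]


def run_pipeline(raw: str) -> StructuredQuery:
    """Run the raw query through each stage in order."""
    q = StructuredQuery(text=raw)
    for stage in STAGES:
        q = stage(q)
    return q
```

Calling `run_pipeline("software enginer")` yields the corrected text together with tag, vertical, and expansion annotations. Keeping the stages as plain functions makes it easy to reorder or replace any one of them.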
fix obvious typos
help users spell names
spelling correction
10
spelling out the details
PEOPLE NAMES
COMPANIES
TITLES
PAST QUERIES
n-grams: marissa => ma ar ri is ss sa
metaphone: mark/marc => MRK
co-occurrence counts: marissa:mayer = 1000
query: marisa meyer yahoo
candidates per token: [marissa, marisa] [meyer, mayer] [yahoo]
11
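As a rough illustration of the n-gram signal on the slide above, the sketch below indexes a vocabulary by character bigrams and retrieves correction candidates by bigram overlap. The function names, the Jaccard scoring, and the `min_overlap` threshold are assumptions made for this example; the metaphone and co-occurrence signals are not covered.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def bigrams(word: str) -> Set[str]:
    """Character bigrams, e.g. marissa -> {ma, ar, ri, is, ss, sa}."""
    return {word[i:i + 2] for i in range(len(word) - 1)}


def build_index(vocabulary: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each bigram to the vocabulary words that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for word in vocabulary:
        for gram in bigrams(word):
            index[gram].add(word)
    return index


def candidates(token: str, index: Dict[str, Set[str]],
               min_overlap: float = 0.5) -> List[Tuple[str, float]]:
    """Return vocabulary words whose bigram Jaccard overlap with token is high."""
    grams = bigrams(token)
    seen: Set[str] = set()
    for gram in grams:
        seen |= index.get(gram, set())
    scored = []
    for word in seen:
        other = bigrams(word)
        jaccard = len(grams & other) / len(grams | other)
        if jaccard >= min_overlap:
            scored.append((word, jaccard))
    # Best overlap first; ties broken alphabetically for stable output.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


index = build_index(["marissa", "mayer", "meyer", "yahoo"])
print(candidates("marisa", index))
```

In a fuller system the vocabulary would come from the sources listed on the slide (people names, companies, titles, past queries), and the candidates would then be ranked with further signals.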
spelling out the details
problem: corpus as well as query logs contain many spelling errors
certain spelling errors are quite frequent
while genuine words (especially names) might be infrequent
12
spelling out the details
problem: corpus & query logs contain spelling errors
solution: use query chains to infer correct spelling
[product manger] → [product manager] → CLICK
[marissa mayer] → CLICK
13
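One way to read the query-chain idea above: if a user issues a query without clicking, then issues a near-identical query and clicks, the second query is probably the intended spelling. The sketch below counts such pairs. The session format, the function names, and the edit-distance threshold are all assumptions for illustration.

```python
from collections import Counter
from typing import List, Tuple

Event = Tuple[str, bool]  # (query text, whether a result was clicked)


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def mine_corrections(sessions: List[List[Event]],
                     max_distance: int = 2) -> Counter:
    """Count (unclicked query, clicked rewrite) pairs that are close in spelling."""
    counts: Counter = Counter()
    for session in sessions:
        for (q1, clicked1), (q2, clicked2) in zip(session, session[1:]):
            if (not clicked1 and clicked2 and q1 != q2
                    and edit_distance(q1, q2) <= max_distance):
                counts[(q1, q2)] += 1
    return counts


sessions = [
    [("product manger", False), ("product manager", True)],
    [("marissa mayer", True)],
]
print(mine_corrections(sessions).most_common(1))
```

Because the signal comes from what users do after a query rather than from raw term frequency, a frequent misspelling does not automatically outvote a rare but genuine name.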
query tagging: identifying entities in the query
TITLE CO GEO
TITLE-237: software engineer, software developer, programmer, …
CO-1441: Google Inc. (Industry: Internet)
GEO-7583: Country: US, Lat: 42.3482 N, Long: 75.1890 W
(RECOGNIZED TAGS: NAME, TITLE, COMPANY, SCHOOL, GEO, SKILL)
15
query tagging: identifying entities in the query
TITLE CO GEO
MORE PRECISE MATCHING WITH DOCUMENTS
16
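To make "more precise matching" concrete, the sketch below contrasts plain keyword matching with matching on tagged entity ids. The `Profile` class, the field names, the profile names, and the id `CO-9999` are hypothetical; `TITLE-237` and `CO-1441` are taken from the tagging slide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Profile:
    name: str
    text: str                 # free text, used by keyword matching
    entities: Dict[str, str]  # field -> entity id, e.g. {"TITLE": "TITLE-237"}


def keyword_match(profiles: List[Profile], terms: List[str]) -> List[Profile]:
    """Match if every term appears anywhere in the profile text."""
    return [p for p in profiles if all(t in p.text.lower() for t in terms)]


def entity_match(profiles: List[Profile], tagged: Dict[str, str]) -> List[Profile]:
    """Match only if each tagged entity equals the profile's value for that field."""
    return [p for p in profiles
            if all(p.entities.get(f) == eid for f, eid in tagged.items())]


profiles = [
    Profile("profile_a", "software engineer at google",
            {"TITLE": "TITLE-237", "CO": "CO-1441"}),
    Profile("profile_b", "software engineer, consulted for google partners",
            {"TITLE": "TITLE-237", "CO": "CO-9999"}),
]

print([p.name for p in keyword_match(profiles, ["software", "engineer", "google"])])
print([p.name for p in entity_match(profiles, {"TITLE": "TITLE-237", "CO": "CO-1441"})])
```

Keyword matching returns both profiles because "google" appears somewhere in each, while entity matching keeps only the profile whose company field is the tagged company.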
entity-based filtering
BEFORE
17
entity-based filtering
AFTER
BEFORE
18
entity-based suggestions
21
query tagging: sequential model
TRAINING
EMISSION PROBABILITIES (learned from user profiles)
TRANSITION PROBABILITIES (learned from query logs)
23
query tagging: sequential model
INFERENCE
given a query, find the most likely sequence of tags
24
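The slides do not name the model, but emission and transition probabilities plus "most likely sequence of tags" suggest a hidden-Markov-style tagger. The sketch below uses Viterbi decoding under that assumption. Every probability in it is invented for illustration, and tagging per token (rather than per phrase) is a simplification.

```python
import math
from typing import Dict, List


def viterbi(tokens: List[str], tags: List[str],
            start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[str, float]],
            floor: float = 1e-6) -> List[str]:
    """Return the most likely tag sequence for tokens, working in log space."""
    if not tokens:
        return []

    def lp(p: float) -> float:
        # Floor unseen events so log() never receives zero.
        return math.log(max(p, floor))

    best = [{t: lp(start.get(t, 0.0)) + lp(emit[t].get(tokens[0], 0.0))
             for t in tags}]
    back: List[Dict[str, str]] = [{}]
    for tok in tokens[1:]:
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda s: best[-1][s] + lp(trans[s].get(t, 0.0)))
            scores[t] = (best[-1][prev] + lp(trans[prev].get(t, 0.0))
                         + lp(emit[t].get(tok, 0.0)))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    return path[::-1]


TAGS = ["TITLE", "CO", "GEO"]
START = {"TITLE": 0.5, "CO": 0.3, "GEO": 0.2}
TRANS = {
    "TITLE": {"TITLE": 0.5, "CO": 0.3, "GEO": 0.2},
    "CO": {"TITLE": 0.3, "CO": 0.2, "GEO": 0.5},
    "GEO": {"TITLE": 0.3, "CO": 0.3, "GEO": 0.4},
}
EMIT = {
    "TITLE": {"software": 0.4, "engineer": 0.5},
    "CO": {"google": 0.8, "software": 0.05},
    "GEO": {"boston": 0.7},
}

print(viterbi(["software", "engineer", "google", "boston"], TAGS, START, TRANS, EMIT))
```

In the setup the slides describe, the emission table would be estimated from user profiles and the transition table from query logs; here both are hard-coded.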
vertical intent prediction: distribution
JOBS
PEOPLE
COMPANIES
(probability distribution over verticals)
26
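The slide says only that the output is a probability distribution over verticals, not how it is estimated. As one simple possibility, the sketch below turns per-vertical click counts for a query into a smoothed distribution. The use of click counts, the function name, and the smoothing parameter `alpha` are assumptions.

```python
from typing import Dict

VERTICALS = ("JOBS", "PEOPLE", "COMPANIES")


def vertical_distribution(clicks: Dict[str, int],
                          alpha: float = 1.0) -> Dict[str, float]:
    """Additive-smoothed share of clicks per vertical; values sum to 1."""
    total = sum(clicks.get(v, 0) for v in VERTICALS) + alpha * len(VERTICALS)
    return {v: (clicks.get(v, 0) + alpha) / total for v in VERTICALS}


# Hypothetical click counts for one query.
print(vertical_distribution({"JOBS": 7, "PEOPLE": 2}))
```

Smoothing keeps every vertical at a nonzero probability, so a vertical with no observed clicks can still be shown when the evidence is thin.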
vertical intent prediction: relevance
[company]
[employees]
[jobs]
[name search]
27
query expansion: name synonyms
29
query expansion: job title synonyms
30
query expansion: signals
trained using query chains:
[jon] → [jonathan] → CLICK
[programmer] → [developer] → CLICK
[software engineer] → [software developer] → CLICK
symmetric but not transitive!
[francis] ⇔ [frank]
[franklin] ⇔ [frank]
[francis] ≠ [franklin]
context based!
[software engineer] => [software developer]
[civil engineer] ≠ [civil developer]
31
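The two properties above (symmetric but not transitive, and context based) can be captured by storing synonym pairs on whole phrases and never taking a transitive closure. The `SynonymStore` class below is a hypothetical sketch of that idea, not the production design.

```python
from collections import defaultdict
from typing import Dict, List, Set


class SynonymStore:
    """Undirected phrase-level synonym pairs with no transitive closure."""

    def __init__(self) -> None:
        self._pairs: Dict[str, Set[str]] = defaultdict(set)

    def add(self, a: str, b: str) -> None:
        # Symmetric: record the pair in both directions.
        self._pairs[a].add(b)
        self._pairs[b].add(a)

    def expand(self, phrase: str) -> List[str]:
        # Direct neighbours only, so francis never reaches franklin via frank.
        return sorted(self._pairs.get(phrase, set()))


store = SynonymStore()
store.add("francis", "frank")
store.add("franklin", "frank")
store.add("software engineer", "software developer")

print(store.expand("francis"))         # direct synonyms of francis
print(store.expand("civil engineer"))  # no entry for this phrase
```

Keying on the full phrase is what keeps the engineer/developer rewrite from leaking into "civil engineer": the pair exists only in the "software" context where it was observed.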
what else can we learn from search in the wild?
33
don’t guess when it’s better to ask
vs.
34
clarify then refine
computers books
35
give users transparency, guidance, and control
36
think beyond individual search queries
Gene Golovchinsky, FXPAL
37
know when you don’t know
Claudia Hauff, Query Difficulty for Digital Libraries [2009]