Post-Ranking query suggestion by diversifying search
Chao Wang
Mission
Diversify the content of the search results returned by suggested queries while keeping the suggestions relevant.
Random walk: a mathematical formalization of a path that consists of a succession of random steps.
Examples: a stock price, a molecule traveling in a liquid.
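The definition above can be illustrated with a minimal sketch of a one-dimensional random walk. The function name `random_walk_1d` and the unit step size are assumptions made for illustration; they do not come from the paper.

```python
import random


def random_walk_1d(steps, seed=None):
    """Return the positions visited by a 1-D walk that starts at 0.

    Each step moves the walker by +1 or -1 with equal probability
    (an illustrative choice, not taken from the paper).
    """
    rng = random.Random(seed)
    position = 0
    path = [position]
    for _ in range(steps):
        position += rng.choice((-1, 1))
        path.append(position)
    return path


if __name__ == "__main__":
    print(random_walk_1d(10, seed=0))
```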
Suggested queries
After a user submits a query, a set of relevant queries is suggested to the user.
If not satisfied with the results on the page, the user may choose to click on one of the suggested queries.
Research indicates that query suggestion greatly improves user satisfaction.
Existing work and improvement
Existing work focuses on discovering relevant queries from search engine logs (co-clicked URLs and session information).
It does not address the diversification of the query suggestions. When a user clicks on a suggested query, he/she expects to gain additional information.
SERP diversification between two queries is defined as the difference between their top-returned search results.
Example: Delta airline
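One way to make this definition concrete is sketched below. The slides do not give a formula, so the measure used here (one minus the Jaccard overlap of the top-k URLs) and the name `serp_difference` are assumptions chosen for illustration only.

```python
def serp_difference(results_a, results_b, k=10):
    """Difference between two ranked result lists, in [0, 1].

    Assumption: difference = 1 - Jaccard overlap of the top-k URLs.
    0.0 means identical top-k sets, 1.0 means no URL in common.
    """
    top_a = set(results_a[:k])
    top_b = set(results_b[:k])
    union = top_a | top_b
    if not union:
        return 0.0  # two empty result pages are treated as identical
    return 1.0 - len(top_a & top_b) / len(union)


if __name__ == "__main__":
    # Hypothetical result lists for a query and one suggestion.
    query_serp = ["delta.com", "en.wikipedia.org/wiki/Delta_Air_Lines"]
    suggestion_serp = ["delta.com", "example.org/delta-faucets"]
    print(serp_difference(query_serp, suggestion_serp))
```

A suggestion whose result page scores close to 1.0 shows the user mostly new URLs, which is the kind of additional information the slides argue a suggestion should bring.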
Related work
Random walk model: queries and URLs are represented as nodes in a bipartite graph, where each edge connects one query with one URL and indicates a click.
Entropy model: user clicks have different importance. A click on a more specific URL is weighted higher than a click on a general URL.
Rare queries: combine information from clicked URLs and skipped URLs by constructing two bipartite graphs.
Rare queries: apply a random walk model on the query-URL bipartite graph by calculating the query hitting time, which can encourage diversity.
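The random walk on a query-URL click graph can be sketched as follows. This is a simplified illustration, not the method of any cited work: it computes the probability of moving from one query to another in two steps (query to URL, then URL to query), with each step chosen in proportion to click counts. The function name and the toy click log are hypothetical.

```python
from collections import defaultdict


def query_transition_probs(clicks):
    """Two-step random-walk probabilities between queries.

    clicks maps (query, url) -> click count, i.e. the weighted edges
    of the bipartite graph. Each step follows an edge with probability
    proportional to its click count.
    """
    q_to_u = defaultdict(dict)
    u_to_q = defaultdict(dict)
    for (query, url), count in clicks.items():
        q_to_u[query][url] = count
        u_to_q[url][query] = count

    probs = defaultdict(lambda: defaultdict(float))
    for query, urls in q_to_u.items():
        q_total = sum(urls.values())
        for url, c_qu in urls.items():
            u_total = sum(u_to_q[url].values())
            for other, c_uq in u_to_q[url].items():
                # P(query -> url) * P(url -> other)
                probs[query][other] += (c_qu / q_total) * (c_uq / u_total)
    return {q: dict(p) for q, p in probs.items()}


if __name__ == "__main__":
    # Hypothetical click log.
    toy_clicks = {
        ("delta airline", "delta.com"): 3,
        ("delta flights", "delta.com"): 1,
    }
    print(query_transition_probs(toy_clicks))
```

Queries reached with high probability are natural suggestion candidates, because they share clicked URLs with the original query.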
Mission
Rather than focusing on improving the relevance of documents by re-ranking them, we aim to re-rank suggested queries, which helps users refine their intent.
Previous limitation: existing work on diversifying search results focused only on ambiguous queries, i.e. queries that have more than one user intent.
Previous limitation: existing work focuses only on relevance and does not consider diversification.
Generate suggestion candidates
Random walk model: applied to the query-click logs to collect candidates.
User sessions: examine user activity within a certain period of time to extract relevant queries.
Ranking Function
Feature 1
Open Directory Project (ODP): https://www.dmoz.org/
Built using a binary tree
Paper example: (next page)
Features 2, 3, 4
Features 2 and 3 check the similarity between URL strings and between domain names. Value = 1 if the two strings are the same and 0 otherwise.
Feature 4 computes the correlation between two ordered SERP lists. A pair is concordant if both URLs are identical and ranked at the same position.
Similarity calculation: not the main focus of this paper.
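The three features above can be sketched in a few lines. Features 2 and 3 follow the 1/0 rule stated on the slide. For feature 4 the slides only define what "concordant" means, so the score used here (the fraction of concordant positions) is an assumption, as are all function names.

```python
from urllib.parse import urlparse


def same_url(url_a, url_b):
    """Feature 2: 1 if the URL strings are identical, 0 otherwise."""
    return 1 if url_a == url_b else 0


def same_domain(url_a, url_b):
    """Feature 3: 1 if the domain names are identical, 0 otherwise.

    Assumes both URLs include a scheme (e.g. "http://"), otherwise
    urlparse leaves the domain part empty.
    """
    domain_a = urlparse(url_a).netloc.lower()
    domain_b = urlparse(url_b).netloc.lower()
    return 1 if domain_a == domain_b else 0


def concordant_fraction(serp_a, serp_b):
    """Feature 4 (assumed form): share of positions holding the same URL."""
    n = min(len(serp_a), len(serp_b))
    if n == 0:
        return 0.0
    concordant = sum(1 for a, b in zip(serp_a, serp_b) if a == b)
    return concordant / n


if __name__ == "__main__":
    print(same_url("http://a.com/x", "http://a.com/x"))
    print(same_domain("http://a.com/x", "http://a.com/y"))
    print(concordant_fraction(["u1", "u2", "u3"], ["u1", "u3", "u2"]))
```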
Training labels and learning algorithms
Ask people to evaluate the relevance between a query and its suggestions (score between 0 and 3).
Classification: support vector machines classify each instance into one of the four classes, with a detailed ranked score.
The research is based on the LambdaSMART algorithm because of its superior performance.
When the data is very informative, the shrinkage is zero; it moves toward 1 as the data becomes less informative.
Data acquisition
Randomly sampled 13,421 queries between Sep 2010 and Nov 2010. These are queries that trigger at least one related search on the search result page.
Performance for different query types
Average query length: 2.51. Average suggestion length.
Long: length > 4; medium: 2 <= length <= 4; short: length < 2
Navigational queries and informational queries
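The length thresholds above translate directly into a small helper. The slides do not state the unit of length; counting whitespace-separated words is an assumption here (it is consistent with an average length of 2.51), and the function name is hypothetical.

```python
def length_bucket(query):
    """Classify a query as "long", "medium" or "short".

    Assumption: length is the number of whitespace-separated words.
    Thresholds follow the slide: long > 4, medium 2..4, short < 2.
    """
    n_words = len(query.split())
    if n_words > 4:
        return "long"
    if n_words >= 2:
        return "medium"
    return "short"


if __name__ == "__main__":
    for q in ["delta", "delta airline", "cheap delta flights to new york"]:
        print(q, "->", length_bucket(q))
```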
Normalized discounted cumulative gain (NDCG): a measure of ranking quality, used to evaluate the effectiveness of web search engine algorithms. Its value lies between 0 and 1.
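A minimal sketch of NDCG follows. The slides do not say which variant the paper uses; the exponential-gain form below, with gain 2^rel - 1 and discount log2(rank + 1), is one common formulation and is assumed here. The relevance grades are the 0 to 3 labels, listed in the order the suggestions were ranked.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of grades given in ranked order."""
    # enumerate() is 0-based, so rank + 2 gives log2(position + 1)
    # for 1-based positions.
    return sum(
        (2 ** rel - 1) / math.log2(rank + 2)
        for rank, rel in enumerate(relevances)
    )


def ndcg(relevances):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    if ideal == 0:
        return 0.0  # no relevant item at all
    return dcg(relevances) / ideal


if __name__ == "__main__":
    print(ndcg([3, 2, 1, 0]))  # already in ideal order
    print(ndcg([0, 1, 2, 3]))  # worst ordering of the same grades
```

Dividing by the ideal ordering is what bounds the score between 0 and 1, so rankings of different lengths and grade distributions can be compared.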
Conclusion
First gather a set of suggestion candidates, then rank the suggestions based on their diversification scores.
The diversification score is based on these features: ODP category, URL string difference, domain difference.
Important discovery: the similarity between queries and suggested queries indeed drops.
There is a lot of room for improvement; future work will explore more features.