TransRank: A Novel Algorithm for
Transfer of Rank Learning
Depin Chen, Jun Yan, Gang Wang et al.
University of Science and Technology of China (USTC)
Machine Learning Group, MSRA
Page 1, 2008-12-15
Content
• Ranking for IR
• Paper motivation
• The algorithm: TransRank
• Results & future work
Ranking in IR
• Ranking is crucial in information retrieval. It aims
to move good results up and bad results down.
• A well-known example: a web search engine
Learning to rank
• Ranking + Machine learning = Learning to rank
• An early work:
Ranking SVM, “Support Vector Learning for
Ordinal Regression”, Herbrich et al. [ICANN 99].
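Ranking SVM’s core idea, reducing ranking to classification over pairwise differences of document feature vectors, can be sketched as follows (a minimal illustration, not the paper’s implementation; function names are hypothetical):

```python
import numpy as np

def pairwise_transform(X, y):
    """Turn per-query feature vectors X with relevance labels y into
    difference vectors for a pairwise classifier (Ranking SVM style).
    Each pair (i, j) with y[i] > y[j] yields x_i - x_j labeled +1,
    and x_j - x_i labeled -1 to keep the two classes balanced."""
    diffs, labels = [], []
    for i in range(len(y)):
        for j in range(len(y)):
            if y[i] > y[j]:
                diffs.append(X[i] - X[j]); labels.append(+1)
                diffs.append(X[j] - X[i]); labels.append(-1)
    return np.array(diffs), np.array(labels)

# Toy query: 3 documents, 2 features, relevance grades 2 > 1 > 0.
X = np.array([[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]])
y = np.array([2, 1, 0])
D, L = pairwise_transform(X, y)
# 3 ordered pairs, each emitted in both directions -> 6 training points
```

Any linear binary classifier trained on `(D, L)` then yields a weight vector that scores and ranks documents.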
Learning to rank for IR
Existing approaches
• Early ones
Ranking SVM, RankBoost …
• Recently
IRSVM, AdaRank, ListNet ...
• Tie-Yan Liu’s team at MSRA
Content
• Learning to rank in IR
• Paper motivation
• The algorithm: TransRank
• Results & future work
Training data shortage
• Learning to rank relies on a sufficient supply of
labeled training data.
• In real world practice …
Transfer learning
• Transfer learning definition
Transfer knowledge learned from different but
related problems to solve the current problem effectively,
with less training data and less time [Yang, 2008].
– Learning to walk can help one learn to run
– Learning to program in C++ can help one learn to program
in Java
– …
• We follow the spirit of transfer learning in this paper.
Content
• Learning to rank in IR
• Paper motivation
• The algorithm: TransRank
• Results & future work
Problem formulation
• St: training data in target domain
Ss: auxiliary training data from a source domain
• Note that,
• What do we want?
A ranking function for the target domain
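The formulation above can be pictured with a minimal data layout, assuming per-query feature matrices and relevance labels (names, sizes, and shapes below are illustrative only):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Query:
    qid: str
    X: np.ndarray   # (n_docs, n_features) document feature vectors
    y: np.ndarray   # (n_docs,) relevance labels

# S_t: a small labeled set from the target domain (e.g. OHSUMED).
# S_s: a larger auxiliary set from a source domain (e.g. WSJ).
S_t = [Query("t1", np.random.rand(4, 3), np.array([2, 1, 0, 0]))]
S_s = [Query(f"s{i}", np.random.rand(5, 3), np.array([1, 1, 0, 0, 0]))
       for i in range(20)]
```

The goal is a ranking function trained for the target domain that exploits both `S_t` and `S_s`.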
TransRank
• Three steps of TransRank:
Step 1: K-best query selection
• Query’s ranking direction
(figures: query 11 in OHSUMED vs. query 41 in OHSUMED)
• The goal of Step 1: select the queries from the source domain whose ranking directions are most similar to those of the target-domain data.
• These queries are treated as being most like the target-domain training data.
Utility function (1)
• Preprocess Ss:
select the k best queries and discard the rest.
• A “best” query is one whose ranking direction
is confidently similar to those of the queries
in St.
• The utility function combines two parts:
confidence and similarity.
Utility function (2)
• Confidence is measured with a separation value:
the better the different relevance classes are
separated, the more confident the ranking
direction.
Utility function (3)
• Cosine similarity.
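Under one plausible reading of the two parts above, Step 1 could be sketched as below. The slides do not give the exact formulas, so the ranking-direction proxy and the separation value here are assumptions; only the overall structure (utility = confidence combined with cosine similarity, then top-k selection) follows the deck.

```python
import numpy as np

def ranking_direction(X, y):
    """Assumed proxy for a query's ranking direction: the unit vector
    from the mean non-relevant document to the mean relevant one."""
    d = X[y > 0].mean(axis=0) - X[y == 0].mean(axis=0)
    return d / (np.linalg.norm(d) + 1e-12)

def separation(X, y, d):
    """Assumed separation value: the gap between the lowest relevant
    score and the highest non-relevant score along direction d."""
    s = X @ d
    return s[y > 0].min() - s[y == 0].max()

def utility(X_s, y_s, d_target):
    """Combine confidence (separation) with cosine similarity to the
    target-domain ranking direction (unit vectors, so a dot product)."""
    d_s = ranking_direction(X_s, y_s)
    conf = separation(X_s, y_s, d_s)
    sim = float(d_s @ d_target)
    return conf * sim

def k_best(source_queries, d_target, k):
    """Step 1: keep the k source queries with the highest utility.
    Each query is a (X, y) pair of features and relevance labels."""
    scored = sorted(source_queries,
                    key=lambda q: utility(q[0], q[1], d_target),
                    reverse=True)
    return scored[:k]
```

A source query whose direction points the same way as the target direction gets a high positive utility; one pointing the opposite way is penalized and discarded.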
Step 2: Feature augmentation
• Daumé implemented cross-domain classification
in NLP through a method called “feature
augmentation” [ACL 07] .
• For a source-domain document vector (1, 2, 3):
(1, 2, 3) → (1, 2, 3, 1, 2, 3, 0, 0, 0)
• For a target-domain document vector (1, 2, 3):
(1, 2, 3) → (1, 2, 3, 0, 0, 0, 1, 2, 3)
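The augmentation mapping shown above (shared copy, then source-only slot, then target-only slot) can be written directly; a minimal sketch:

```python
import numpy as np

def augment(x, domain):
    """Daumé-style feature augmentation: source vectors map to
    (x, x, 0) and target vectors to (x, 0, x), so shared features
    live in the first block and domain-specific ones in their own."""
    zeros = np.zeros_like(x)
    if domain == "source":
        return np.concatenate([x, x, zeros])
    return np.concatenate([x, zeros, x])

x = np.array([1.0, 2.0, 3.0])
src = augment(x, "source")   # (1, 2, 3, 1, 2, 3, 0, 0, 0)
tgt = augment(x, "target")   # (1, 2, 3, 0, 0, 0, 1, 2, 3)
```

A linear ranker trained on augmented vectors can then learn shared and domain-specific weights for each feature.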
Step 3: Ranking SVM
• Ranking SVM is a state-of-the-art learning-to-rank
algorithm, proposed by Herbrich et al.
[ICANN 99].
Content
• Learning to rank in IR
• Paper motivation
• The heuristic algorithm: TransRank
• Results & future work
Experimental settings
• Datasets: OHSUMED (the LETOR version), WSJ, AP
• Features: the feature set defined in OHSUMED; the same
features are extracted on WSJ and AP
• Evaluation measures: NDCG@n, MAP
• For Ranking SVM, we use SVMlight by Joachims.
• Two groups of experiments:
WSJ → OHSUMED
AP → OHSUMED
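The evaluation measures can be sketched as below, assuming the common (2^rel − 1)/log2(rank + 1) gain and discount for NDCG (tooling such as the LETOR scripts may differ in details):

```python
import numpy as np

def dcg_at_n(rels, n):
    """DCG@n with the (2^rel - 1) / log2(rank + 1) gain form."""
    rels = np.asarray(rels, dtype=float)[:n]
    ranks = np.arange(1, len(rels) + 1)
    return float(np.sum((2.0 ** rels - 1.0) / np.log2(ranks + 1)))

def ndcg_at_n(rels, n):
    """NDCG@n: DCG of the given ranking over DCG of the ideal one."""
    ideal = dcg_at_n(sorted(rels, reverse=True), n)
    return dcg_at_n(rels, n) / ideal if ideal > 0 else 0.0

def average_precision(rels):
    """MAP building block: average precision for one query,
    with binary relevance judgments."""
    rels = np.asarray(rels)
    hits, total = 0, 0.0
    for i, r in enumerate(rels, start=1):
        if r > 0:
            hits += 1
            total += hits / i
    return total / max(hits, 1)
```

MAP is then the mean of `average_precision` over all test queries.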
Compared algorithms
• Baseline: run Ranking SVM on St
• TransRank
• Directly Mix: Step 1 + Step 3
Performance comparison
40% of target labeled data, k=10
source domain: WSJ source domain: AP
[Two bar charts: MAP and NDCG@{1, 3, 5, 10} for Baseline, TransRank, and Directly Mix; scores range roughly 0.37–0.44 (WSJ source) and 0.38–0.45 (AP source)]
Impact of target labeled data
• Target labeled data varied from 5% to 100%, k=10
source domain: WSJ source domain: AP
Impact of k
40% of target labeled data
Future work
• Web-scale experiments, i.e., data from search
engines
• A more integrated algorithm using machine-learning
techniques
• A theoretical study of transfer of rank learning
Q & A
Thanks!