Learn to Weight Terms in Information Retrieval Using Category Information

Rong Jin¹, Joyce Y. Chai¹, Luo Si²

¹ Department of Computer Science and Engineering, Michigan State University
² School of Computer Science, Carnegie Mellon University
International Conference on Machine Learning, 2005
Jin, Chai, and Si Learn to Weight Terms
Outline

1. Overview of Term Weighting Methods in Information Retrieval
   - Term Weighting based on TF.IDF
   - Term Weighting based on Language Models
   - Problems with Existing Term Weighting Methods
2. Learn Term Weights Using Category Information
   - A Framework for Learning Term Weights Using Category Information
   - A Regression Approach
   - A Probabilistic Approach
3. Experiment
   - Experimental Design
   - Baseline Approaches
   - Experimental Results
4. Summary
Term Weighting Methods based on TF.IDF

The most popular methods in information retrieval. They consist of three factors:

- Term frequency (TF): $f(w, d)$, how frequently term $w$ appears in document $d$.
- Inverse document frequency (IDF), how rare term $w$ is in a collection $C$:

  $$idf(w) = \log\frac{N + 0.5}{N(w)}$$

  where $N$ is the total number of documents in collection $C$, and $N(w)$ is the number of documents in $C$ containing word $w$.
- A document normalization factor, e.g. $\|d\|_2$, which reduces the bias toward long documents.
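The TF and IDF factors above can be sketched in a few lines. The toy collection and function names here are illustrative, not from the paper, and the sketch assumes the word occurs somewhere in the collection:

```python
import math

def idf(word, docs):
    """Inverse document frequency as defined above: log((N + 0.5) / N(w))."""
    n_with_word = sum(1 for d in docs if word in d)   # N(w)
    return math.log((len(docs) + 0.5) / n_with_word)

def tfidf(word, doc, docs):
    """TF.IDF weight of `word` in `doc` for the collection `docs`."""
    return doc.count(word) * idf(word, docs)          # f(w, d) * idf(w)

docs = [["cat", "sat"], ["cat", "cat", "dog"], ["dog", "ran"]]
print(tfidf("cat", docs[1], docs))   # = 2 * log(3.5 / 2) ≈ 1.12
```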
Okapi: An Example of TF.IDF Term Weighting

The similarity between query $q$ and document $d$ is:

$$sim(d, q) = \sum_{w \in q} \frac{k\, f(w, q)\, f(w, d)}{f(w, d) + k\left(1 - b + b\,\frac{|d|}{\bar{d}}\right)} \log\frac{N + 0.5}{N(w)}$$

where

- $f(w, q)$: term frequency of $w$ in query $q$
- $f(w, d)$: term frequency of $w$ in document $d$
- $\bar{d}$: the average document length of collection $C$
- $k, b$: weight parameters determined empirically
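A minimal sketch of the Okapi scoring formula above; the default values for `k` and `b` are common BM25-style choices, not values taken from the paper:

```python
import math

def okapi_score(query, doc, docs, k=1.2, b=0.75):
    """Okapi similarity sim(d, q) as in the formula above.
    query/doc are token lists; docs is the collection C."""
    avg_len = sum(len(d) for d in docs) / len(docs)     # average document length
    score = 0.0
    for w in set(query):
        f_wd = doc.count(w)                             # f(w, d)
        if f_wd == 0:
            continue                                    # term contributes nothing
        f_wq = query.count(w)                           # f(w, q)
        n_w = sum(1 for d in docs if w in d)            # N(w)
        norm = f_wd + k * (1 - b + b * len(doc) / avg_len)
        score += (k * f_wq * f_wd / norm) * math.log((len(docs) + 0.5) / n_w)
    return score
```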
Term Weighting Methods based on Language Models

- Assume each document $d$ is generated by a statistical model $\theta_d$.
- Estimate $\theta_d$ by maximizing the likelihood $p(d \mid \theta_d)$.
- A smoothing technique, such as Jelinek-Mercer smoothing or Dirichlet smoothing, is usually used to deal with the sparse-data problem.
An Example of Language Models for Information Retrieval

The unigram language model $p(w \mid d)$ based on Jelinek-Mercer smoothing:

$$p(w \mid d) = (1 - \lambda)\, p(w \mid C) + \lambda\, \frac{f(w, d)}{|d|} = (1 - \lambda)\, p(w \mid C) \left(1 + \frac{\lambda}{1 - \lambda} \cdot \frac{f(w, d)}{|d|\, p(w \mid C)}\right)$$

where $\lambda$ is a smoothing parameter. The similarity of query $q$ to document $d$ is estimated as

$$sim(q, d) \propto p(q \mid d) \propto \prod_{w \in q} [p(w \mid d)]^{f(w, q)}$$
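The smoothed query-likelihood score can be sketched as follows, working in log space. The value of $\lambda$ (`lam`) is an assumption, and the sketch assumes every query word occurs somewhere in the collection:

```python
import math

def jm_score(query, doc, collection, lam=0.5):
    """log p(q|d) under Jelinek-Mercer smoothing, per the formulas above."""
    coll_len = sum(len(d) for d in collection)
    log_p = 0.0
    for w in query:   # repeated query words realize the f(w, q) exponent
        p_wc = sum(d.count(w) for d in collection) / coll_len     # p(w|C)
        p_wd = (1 - lam) * p_wc + lam * doc.count(w) / len(doc)   # smoothed p(w|d)
        log_p += math.log(p_wd)
    return log_p
```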
Problems with Existing Term Weighting Methods

The essential difficulty in determining term weights is the lack of supervision.

Problems with TF.IDF methods:
- Neither TF nor IDF is sufficient to determine whether a word is informative.
- The IDF factor assumes that rare words are informative words; but typos are usually rare and uninformative.

Problems with language modeling approaches:
- They are generative models, and thus insufficient to distinguish informative words from uninformative ones.
Learn Term Weights Using Category Information

- Given: each document is assigned to a set of categories.
- Goal: learn term weights from the assigned categories of documents.
- Main idea:
  - Represent each document by both a bag of words and a set of categories.
  - Compute document similarity based on words: $s_w(d_i, d_j)$.
  - Compute document similarity based on categories: $s_c(d_i, d_j)$.
  - Find term weights such that $s_w(d_i, d_j) \approx s_c(d_i, d_j)$.
A Framework for Learning Term Weights Using Category Information

For each document $d_i$, we have:

- Word-based representation: $\mathbf{w}_i = (w_{i,1}, w_{i,2}, \ldots, w_{i,n})^T$
- Category-based representation: $\mathbf{c}_i = (c_{i,1}, c_{i,2}, \ldots, c_{i,m})^T$

Word-based document similarity:

$$s_w(d_i, d_j; \alpha) = \sum_{k=1}^{n} \alpha_k\, w_{i,k}\, w_{j,k}$$

Category-based document similarity:

$$s_c(d_i, d_j; \beta) = \sum_{k=1}^{m} \beta_k\, c_{i,k}\, c_{j,k}$$
A Framework for Learning Term Weights Using Category Information (Contd)

Find weights $\alpha$ and $\beta$ such that $s_w(d_i, d_j; \alpha) \approx s_c(d_i, d_j; \beta)$ for any two documents $d_i$ and $d_j$:

$$(\alpha^*, \beta^*) = \arg\min_{\alpha, \beta} \sum_{i \neq j} l\big(s_c(d_i, d_j; \beta),\, s_w(d_i, d_j; \alpha)\big)$$

where $l(x, y)$ is a loss function that measures the difference between $x$ and $y$.
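As a concrete instance of this framework, the sketch below uses the squared difference for $l(x, y)$ (the loss the regression approach adopts); the matrix layout and function names are illustrative:

```python
import numpy as np

def s_w(alpha, w_i, w_j):
    """Word-based similarity: sum_k alpha_k w_ik w_jk."""
    return float(np.sum(alpha * w_i * w_j))

def s_c(beta, c_i, c_j):
    """Category-based similarity: sum_k beta_k c_ik c_jk."""
    return float(np.sum(beta * c_i * c_j))

def total_loss(alpha, beta, W, C):
    """Sum of l(s_c, s_w) over pairs i != j, with l(x, y) = (x - y)^2.
    W: docs x terms matrix, C: docs x categories matrix."""
    loss = 0.0
    for i in range(len(W)):
        for j in range(len(W)):
            if i != j:
                loss += (s_c(beta, C[i], C[j]) - s_w(alpha, W[i], W[j])) ** 2
    return loss
```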
A Regression Approach Toward Learning Term Weights

Define the loss function $l(s_c, s_w) = (s_c - s_w)^2$. The objective function $F_{reg}$ then takes the quadratic form

$$F_{reg} = (\beta^T, \alpha^T) \begin{pmatrix} Q_c & -P^T \\ -P & Q_w \end{pmatrix} \begin{pmatrix} \beta \\ \alpha \end{pmatrix}$$

where

$$[Q_w]_{i,j} = (u_i^T u_j)^2, \quad [Q_c]_{i,j} = (v_i^T v_j)^2, \quad [P]_{i,j} = (u_i^T v_j)^2$$

- $u_i$: frequency vector for the $i$-th term
- $v_i$: frequency vector for the $i$-th category
The Regression Approach: Constraints

The trivial solution $\alpha = \beta = 0$ gives $F_{reg} = 0$, so the weights must be constrained.

L2 constraint:

$$\|\alpha\|_2^2 + \|\beta\|_2^2 \leq 1$$

Problem: it allows negative term weights $\alpha_i < 0$; when two documents share word $w_i$, they would then be less likely to be similar.

L1 constraint:

$$\alpha_i \geq 0;\ \beta_j \geq 0;\ \sum_{i=1}^{n} \alpha_i + \sum_{i=1}^{m} \beta_i \leq 1$$
The Regression Approach: Final Form

$$\min_{\alpha, \beta}\ (\beta^T, \alpha^T) \begin{pmatrix} Q_c & -P^T \\ -P & Q_w \end{pmatrix} \begin{pmatrix} \beta \\ \alpha \end{pmatrix}$$
$$\text{s.t.}\quad \alpha \geq 0,\ \beta \geq 0,\quad \|\alpha\|_1 + \|\beta\|_1 \leq 1$$

Solve by quadratic programming techniques.
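A sketch of this constrained minimization, substituting a generic solver for a dedicated QP routine; the function name, the use of `scipy.optimize.minimize`, and the starting point are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def learn_weights(Qc, Qw, P):
    """Minimize beta'Qc beta + alpha'Qw alpha - 2 alpha'P beta subject to
    alpha, beta >= 0 and ||alpha||_1 + ||beta||_1 <= 1.
    Qc: m x m, Qw: n x n, P: n x m (n terms, m categories)."""
    n, m = P.shape

    def objective(x):
        beta, alpha = x[:m], x[m:]
        return beta @ Qc @ beta + alpha @ Qw @ alpha - 2 * alpha @ P @ beta

    cons = [{"type": "ineq", "fun": lambda x: 1.0 - np.sum(x)}]  # L1 budget
    bounds = [(0, None)] * (m + n)                               # nonnegativity
    x0 = np.full(m + n, 1.0 / (m + n))                           # feasible start
    res = minimize(objective, x0, bounds=bounds, constraints=cons)
    return res.x[m:], res.x[:m]   # alpha, beta
```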
A Probabilistic Approach Toward Learning Term Weights

Probability for two documents to be similar based on words:

$$p^w_{i,j} = \frac{1}{1 + \exp\left(-s_w(d_i, d_j; \alpha) + \alpha_0\right)}$$

Probability for two documents to be similar based on categories:

$$p^c_{i,j} = \frac{1}{1 + \exp\left(-s_c(d_i, d_j; \beta) + \beta_0\right)}$$

Loss function: the cross-entropy function

$$l\big(s_c(d_i, d_j; \beta),\, s_w(d_i, d_j; \alpha)\big) = -p^c_{i,j} \log p^w_{i,j} - (1 - p^c_{i,j}) \log(1 - p^w_{i,j})$$
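The cross-entropy loss above can be written directly; the sign convention for the offsets, $p = \sigma(s - \text{offset})$, and the zero defaults are assumptions of this sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def cross_entropy_loss(s_c, s_w, alpha0=0.0, beta0=0.0):
    """Cross-entropy between the category-based probability p^c and the
    word-based probability p^w for one document pair."""
    p_w = sigmoid(s_w - alpha0)
    p_c = sigmoid(s_c - beta0)
    return -p_c * math.log(p_w) - (1 - p_c) * math.log(1 - p_w)
```

The loss is smallest, for a given $s_c$, when the two similarities agree, which is exactly the matching behavior the framework asks for.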
The Probabilistic Approach: Final Form

Objective function $F_{prob}$:

$$F_{prob} = \sum_{i \neq j}^{N} \left[ p^c_{i,j} \log p^w_{i,j} + (1 - p^c_{i,j}) \log(1 - p^w_{i,j}) \right]$$

The final form for the probabilistic approach:

$$\arg\max_{\alpha, \beta, \alpha_0, \beta_0}\ F_{prob} - \lambda_w \sum_{i=1}^{n} \alpha_i - \lambda_c \sum_{i=1}^{m} \beta_i \qquad \text{s.t.}\quad \alpha \geq 0,\ \beta \geq 0$$

where $\lambda_w > 0$ and $\lambda_c > 0$ are regularization parameters.
The Probabilistic Approach: Optimization Strategy

Alternating optimization:

- Learn term weights $\alpha$ with fixed category weights $\beta$:
  - Decouple the correlation among $\alpha$ by bounding the change in the objective:
    $$F_{prob}(\alpha', \beta) - F_{prob}(\alpha, \beta) \geq \sum_{i=1}^{n} g_i(\alpha'_i - \alpha_i)$$
    where $\alpha$ and $\alpha'$ are the term weights of two consecutive iterations.
  - Solve $g'_i(\delta_i) = 0$ and update $\alpha'_i = \alpha_i + \delta_i$.
- Learn category weights $\beta$ with fixed term weights $\alpha$: a similar procedure, optimizing $\beta$ with $\alpha$ fixed.
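One $\alpha$-update of the alternating scheme can be sketched as below. The paper's closed-form bound-optimization step is replaced by a plain projected gradient step for illustration, and the offsets are fixed at zero; the gradient used is $\partial F_{prob}/\partial \alpha_k = \sum_{i \neq j}(p^c_{i,j} - p^w_{i,j})\,w_{i,k}w_{j,k} - \lambda_w$:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def update_term_weights(alpha, beta, W, C, lr=0.1, lam_w=0.01):
    """One 'learn alpha with beta fixed' step (illustrative gradient version).
    W: docs x terms matrix, C: docs x categories matrix."""
    p_w = sigmoid((W * alpha) @ W.T)   # p^w_ij, offsets fixed at 0
    p_c = sigmoid((C * beta) @ C.T)    # p^c_ij
    diff = p_c - p_w                   # dF_prob / ds_w for each pair
    np.fill_diagonal(diff, 0.0)        # the objective sums over i != j only
    grad = np.einsum('ij,ik,jk->k', diff, W, W) - lam_w
    return np.maximum(alpha + lr * grad, 0.0)   # project back onto alpha >= 0
```

The category-weight step is symmetric, with the roles of `W`/`alpha` and `C`/`beta` exchanged.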
Experimental Design

Document collection:
- A document collection from the ad hoc retrieval task of ImageCLEF
- 28,133 documents in total, 933 categories
- Average document length: 50
- Average number of categories per document: 5

Evaluation queries:
- 5 queries from ImageCLEF 2003 for training the regularization parameters
- 25 queries from ImageCLEF 2004 for testing

Evaluation metrics:
- Average precision for top retrieved documents
- Average precision across 11 recall points
- Precision-recall curve
Baseline Approaches

State-of-the-art information retrieval methods:
- The Okapi method (Okapi)
- The language model with JM smoothing (LM)

Inverse category frequency (ICF):

$$icf(w) = \log\frac{m}{m(w)}$$

where $m$ is the total number of categories and $m(w)$ is the number of categories containing word $w$. Replace $idf(w)$ with $icf(w)$ in the Okapi method.
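The ICF factor is straightforward to compute; representing each category by the set of words occurring in it is an assumption of this sketch:

```python
import math

def icf(word, categories):
    """Inverse category frequency: log(m / m(w)).
    `categories` is a list of word sets, one per category."""
    m_w = sum(1 for cat in categories if word in cat)   # m(w)
    return math.log(len(categories) / m_w)
```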
Baseline Approaches (Contd)

Category-based query expansion (CQE):

1. Retrieve the top $k = 100$ documents for query $q$ using Okapi.
2. Expand query $q$ to include category information:
   $$q' = \{f(w_1, q), \ldots, f(w_n, q);\ f(c_1, q), \ldots, f(c_m, q)\}$$
   where $f(c_i, q)$ is the number of top-$k$ documents in category $c_i$.
3. Retrieve documents using the expanded query $q'$:
   $$\log p(q' \mid d) = \lambda\, \frac{\sum_{i=1}^{n} f(w_i, q) \log p(w_i \mid d)}{\sum_{i=1}^{n} f(w_i, q)} + (1 - \lambda)\, \frac{\sum_{i=1}^{m} f(c_i, q) \log p(c_i \mid d)}{\sum_{i=1}^{m} f(c_i, q)}$$
Precision Recall Curves

[Figure: precision-recall curves (recall 0 to 1, precision roughly 0.2 to 0.8) for the Regression, Probabilistic (labeled "Discriminative"), Inverse Category Frequency, Category-based Query Expansion, Okapi, and Language Model methods.]

The probabilistic approach outperforms both the language model and Okapi.
Average Precision

                     Using Category               No Category
                  Reg.   Prob.   ICF    CQE     Okapi    LM
Avg. Prec.        0.45   0.48    0.38   0.42    0.41     0.41
Prec @ 5 doc      0.55   0.56    0.40   0.50    0.47     0.50
Prec @ 10 doc     0.48   0.52    0.40   0.48    0.45     0.48
Prec @ 20 doc     0.46   0.46    0.39   0.42    0.39     0.38
Prec @ 100 doc    0.21   0.21    0.19   0.19    0.20     0.20

- Reg. and Prob. > Okapi and LM: category information is useful.
- ICF and CQE < Okapi and LM: category information must be exploited wisely.
Retrieval Precision for Individual Queries

[Figure: average precision for each of the 25 test queries (precision 0 to 1) for the Language Model, Inverse Category Frequency, Category-based Query Expansion, and Probabilistic approaches.]

- On 16 queries, the probabilistic approach outperforms the language model.
- On 5 queries, the probabilistic approach underperforms the language model.
Summary

- Proposed two algorithms for learning term weights using category information:
  - A regression approach
  - A probabilistic approach
- Empirical studies with the ImageCLEF dataset verify the effectiveness of the proposed algorithms.
- Future work:
  - Improve learning efficiency for large numbers of documents and large vocabularies
  - Extend to image retrieval for annotated images