
Learn to Weight Terms in Information Retrieval Using Category Information

Rong Jin¹, Joyce Y. Chai¹, Luo Si²

¹ Department of Computer Science and Engineering, Michigan State University
² School of Computer Science, Carnegie Mellon University

    International Conference on Machine Learning, 2005


Outline

1 Overview of Term Weighting Methods in Information Retrieval
  Term Weighting based on TF.IDF
  Term Weighting based on Language Models
  Problems with Existing Term Weighting Methods
2 Learn Term Weights Using Category Information
  A Framework for Learning Term Weights Using Category Information
  A Regression Approach
  A Probabilistic Approach
3 Experiment
  Experimental Design
  Baseline Approaches
  Experimental Results
4 Summary


Term Weighting Methods based on TF.IDF

The most popular family of methods in information retrieval, consisting of three factors:

Term frequency (TF): f(w, d)
- How frequently the term w appears in document d

Inverse document frequency (IDF):
- How rare the term w is in a collection C

  idf(w) = \log \frac{N + 0.5}{N(w)}

  N : the total number of documents in collection C
  N(w) : the number of documents in C containing word w

Document normalization factor, e.g. \|d\|_2
- Reduces the bias toward long documents


Okapi: An Example of TF.IDF Term Weighting

The similarity between query q and document d is:

sim(d, q) = \sum_{w \in q} \frac{k \, f(w, q) \, f(w, d)}{f(w, d) + k \left(1 - b + b \, \frac{|d|}{\bar{d}}\right)} \log \frac{N + 0.5}{N(w)}

where

f(w, q) : term frequency of w in query q
f(w, d) : term frequency of w in document d
\bar{d} : the average document length of collection C
k, b : weight parameters determined empirically
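
As a concrete reading of the formula, here is a minimal Python sketch; the function and variable names, and the defaults k = 1.2 and b = 0.75, are illustrative assumptions rather than values given on the slides.

```python
import math
from collections import Counter

def okapi_score(query_terms, doc_terms, doc_freq, num_docs, avg_doc_len,
                k=1.2, b=0.75):
    """Okapi similarity from the slide. doc_freq maps w -> N(w)."""
    f_q = Counter(query_terms)                     # f(w, q)
    f_d = Counter(doc_terms)                       # f(w, d)
    score = 0.0
    for w, fwq in f_q.items():
        fwd = f_d.get(w, 0)
        if fwd == 0 or doc_freq.get(w, 0) == 0:
            continue                               # term contributes nothing
        norm = fwd + k * (1 - b + b * len(doc_terms) / avg_doc_len)
        idf = math.log((num_docs + 0.5) / doc_freq[w])
        score += k * fwq * fwd / norm * idf
    return score
```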


Term Weighting Methods based on Language Models

Assume each document d is generated by a statistical model \theta_d.
Estimate \theta_d by maximizing the likelihood p(d|\theta_d).
Usually a smoothing technique, such as Jelinek-Mercer smoothing or Dirichlet smoothing, is used to deal with the sparse-data problem.


An Example of Language Models for Information Retrieval

The unigram language model p(w|d) based on Jelinek-Mercer smoothing:

p(w|d) = (1 - \lambda) \, p(w|C) + \lambda \, \frac{f(w, d)}{|d|}
       = (1 - \lambda) \, p(w|C) \left(1 + \frac{\lambda}{1 - \lambda} \cdot \frac{f(w, d)}{|d| \, p(w|C)}\right)

where \lambda is a smoothing parameter. The similarity of query q to document d is estimated as

sim(q, d) \propto p(q|d) = \prod_{w \in q} [p(w|d)]^{f(w, q)}
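
A minimal sketch of this query-likelihood scoring, computed in log space; the names and the lam = 0.5 default are illustrative, and p(w|C) is assumed precomputed.

```python
import math
from collections import Counter

def jm_query_likelihood(query_terms, doc_terms, collection_lm, lam=0.5):
    """log p(q|d) under Jelinek-Mercer smoothing.
    collection_lm maps w -> p(w|C)."""
    f_d = Counter(doc_terms)
    doc_len = max(len(doc_terms), 1)               # |d|
    log_p = 0.0
    for w, fwq in Counter(query_terms).items():    # f(w, q)
        p_wc = collection_lm.get(w, 1e-9)          # small floor for unseen w
        p_wd = (1 - lam) * p_wc + lam * f_d.get(w, 0) / doc_len
        log_p += fwq * math.log(p_wd)              # product taken in log space
    return log_p
```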


Problems with Existing Term Weighting Methods

The essential difficulty in determining term weights is the lack of supervision.

Problems with TF.IDF methods:
- Neither IDF nor TF alone is sufficient to determine whether a word is informative.
- The IDF factor assumes that rare words are informative words; but typos are usually rare and uninformative.

Problems with language modeling approaches:
- They are generative models, and are insufficient to distinguish informative words from uninformative ones.


Learn Term Weights Using Category Information

Given: each document is assigned to a set of categories.
Goal: learn term weights from the assigned categories of documents.
Main idea:
- Each document is represented by both a bag of words and a set of categories.
- Compute document similarity based on words: s_w(d_i, d_j).
- Compute document similarity based on categories: s_c(d_i, d_j).
- Find term weights such that s_w(d_i, d_j) \approx s_c(d_i, d_j).


A Framework for Learning Term Weights Using Category Information

For each document d_i, we have

Word-based representation:     w_i = (w_{i,1}, w_{i,2}, \ldots, w_{i,m})^T
Category-based representation: c_i = (c_{i,1}, c_{i,2}, \ldots, c_{i,n})^T

Word-based document similarity:

s_w(d_i, d_j; \alpha) = \sum_{k=1}^{m} \alpha_k \, w_{i,k} \, w_{j,k}

Category-based document similarity:

s_c(d_i, d_j; \beta) = \sum_{k=1}^{n} \beta_k \, c_{i,k} \, c_{j,k}


A Framework for Learning Term Weights Using Category Information (Cont'd)

Find weights \alpha and \beta such that s_w(d_i, d_j; \alpha) \approx s_c(d_i, d_j; \beta) for any two documents d_i and d_j:

(\alpha^*, \beta^*) = \arg\min_{\alpha, \beta} \sum_{i \neq j} l\big(s_c(d_i, d_j; \beta), \, s_w(d_i, d_j; \alpha)\big)

where l(x, y) is a loss function that measures the difference between x and y.
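
A direct numpy transcription of the framework; the O(N²) loop over pairs and all names are illustrative, and the paper folds the same sums into the quadratic form shown on the next slide.

```python
import numpy as np

def s_w(w_i, w_j, alpha):
    """Word-based similarity s_w(d_i, d_j; alpha) = sum_k alpha_k w_ik w_jk."""
    return np.sum(alpha * w_i * w_j)

def s_c(c_i, c_j, beta):
    """Category-based similarity s_c(d_i, d_j; beta) = sum_k beta_k c_ik c_jk."""
    return np.sum(beta * c_i * c_j)

def framework_loss(W, C, alpha, beta, loss=lambda x, y: (x - y) ** 2):
    """Sum of l(s_c, s_w) over all document pairs i != j. W is an (N, m) matrix
    of word representations, C an (N, n) matrix of category representations;
    the squared-error default anticipates the regression approach."""
    total = 0.0
    n_docs = W.shape[0]
    for i in range(n_docs):
        for j in range(n_docs):
            if i != j:
                total += loss(s_c(C[i], C[j], beta), s_w(W[i], W[j], alpha))
    return total
```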


A Regression Approach Toward Learning Term Weights

Define the loss function l(s_c, s_w) = (s_c - s_w)^2.

The objective function F_{reg} can then be written as the quadratic form

F_{reg} = (\beta^T, \alpha^T) \begin{pmatrix} Q_c & -P^T \\ -P & Q_w \end{pmatrix} \begin{pmatrix} \beta \\ \alpha \end{pmatrix}

where

[Q_w]_{i,j} = (u_i^T u_j)^2, \quad [Q_c]_{i,j} = (v_i^T v_j)^2, \quad [P]_{i,j} = (u_i^T v_j)^2

u_i : frequency vector for the i-th term
v_j : frequency vector for the j-th category


The Regression Approach: Constraints

The trivial solution \alpha = \beta = 0 gives F_{reg} = 0, so a normalization constraint is needed.

L2 constraint:

\|\alpha\|_2^2 + \|\beta\|_2^2 \leq 1

Problem: it permits negative term weights \alpha_i < 0, which would mean that when two documents share word w_i, they are less likely to be similar.

L1 constraint:

\alpha_i \geq 0; \quad \beta_j \geq 0; \quad \sum_{i=1}^{m} \alpha_i + \sum_{j=1}^{n} \beta_j \leq 1


The Regression Approach: Final Form

\min_{\alpha, \beta} \; (\beta^T, \alpha^T) \begin{pmatrix} Q_c & -P^T \\ -P & Q_w \end{pmatrix} \begin{pmatrix} \beta \\ \alpha \end{pmatrix}

s.t. \; \alpha \geq 0, \; \beta \geq 0, \; \|\alpha\|_1 + \|\beta\|_1 \leq 1

Solved by standard quadratic programming techniques.
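
Any generic QP solver handles this directly; as a self-contained illustration only, here is a projected-gradient sketch under the L1 constraint. The step size, iteration count, and helper names are arbitrary assumptions, not the paper's solver.

```python
import numpy as np

def project_capped_simplex(x):
    """Euclidean projection onto {x >= 0, sum(x) <= 1}."""
    x = np.maximum(x, 0.0)
    if x.sum() <= 1.0:
        return x
    # Otherwise project onto the probability simplex (sort-based algorithm).
    u = np.sort(x)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(x) + 1) > (css - 1.0))[0][-1]
    theta = (css[rho] - 1.0) / (rho + 1.0)
    return np.maximum(x - theta, 0.0)

def solve_regression_qp(M, dim, steps=2000, lr=1e-3):
    """Projected gradient descent for min z^T M z s.t. z >= 0, ||z||_1 <= 1."""
    z = np.full(dim, 1.0 / dim)            # feasible starting point
    M_sym = 0.5 * (M + M.T)                # symmetrize for a clean gradient
    for _ in range(steps):
        grad = 2.0 * M_sym @ z             # gradient of z^T M z
        z = project_capped_simplex(z - lr * grad)
    return z
```

Here z stacks (\beta, \alpha), M is the block matrix above with dim = n + m, and the solution splits back as \beta = z[:n], \alpha = z[n:].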


A Probabilistic Approach Toward Learning Term Weights

Probability for two documents to be similar based on words:

p^w_{i,j} = \frac{1}{1 + \exp(-s_w(d_i, d_j; \alpha) + \alpha_0)}

Probability for two documents to be similar based on categories:

p^c_{i,j} = \frac{1}{1 + \exp(-s_c(d_i, d_j; \beta) + \beta_0)}

Loss function: the cross entropy

l\big(s_c(d_i, d_j; \beta), \, s_w(d_i, d_j; \alpha)\big) = -p^c_{i,j} \log p^w_{i,j} - (1 - p^c_{i,j}) \log(1 - p^w_{i,j})
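
A minimal sketch of the pairwise loss, assuming the sign conventions reconstructed above; all names are illustrative.

```python
import numpy as np

def sim_prob(s, bias):
    """Logistic map from a similarity score to a similarity probability."""
    return 1.0 / (1.0 + np.exp(-s + bias))

def pairwise_cross_entropy(sw_ij, sc_ij, alpha0, beta0):
    """Cross-entropy loss for one document pair, treating the category-based
    probability as the target distribution."""
    pw = sim_prob(sw_ij, alpha0)   # p^w_{i,j}
    pc = sim_prob(sc_ij, beta0)    # p^c_{i,j}
    return -pc * np.log(pw) - (1.0 - pc) * np.log(1.0 - pw)
```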


The Probabilistic Approach: Final Form

Objective function F_{prob}:

F_{prob} = \sum_{i \neq j}^{N} \left[ p^c_{i,j} \log p^w_{i,j} + (1 - p^c_{i,j}) \log(1 - p^w_{i,j}) \right]

The final form for the probabilistic approach:

\arg\max_{\alpha, \beta, \alpha_0, \beta_0} \; F_{prob} - \lambda_w \sum_{i=1}^{m} \alpha_i - \lambda_c \sum_{j=1}^{n} \beta_j
\quad \text{s.t.} \; \alpha \geq 0, \; \beta \geq 0

where \lambda_w > 0 and \lambda_c > 0 are regularization parameters.


The Probabilistic Approach: Optimization Strategy

Alternating optimization:

Learn the term weights \alpha with the category weights \beta fixed:
- Decouple the correlation among the \alpha_i by lower-bounding the change in the objective,

  F_{prob}(\alpha', \beta) - F_{prob}(\alpha, \beta) \geq \sum_{i=1}^{m} g_i(\alpha'_i - \alpha_i)

  where \alpha and \alpha' are the term weights of two consecutive iterations.
- Solve g'_i(\delta_i) = 0 for each i and update \alpha'_i = \alpha_i + \delta_i.

Learn the category weights \beta with the term weights \alpha fixed:
- A similar procedure, optimizing \beta with \alpha fixed.
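
The slides give only the shape of the decoupled updates, not the g_i themselves, so the sketch below substitutes plain projected gradient steps; only the alternating structure and the nonnegativity projection are taken from the slides, and the gradient callables are assumed supplied by the caller.

```python
import numpy as np

def alternating_ascent(F_grad_alpha, F_grad_beta, alpha, beta,
                       outer=20, inner=100, lr=1e-3):
    """Generic alternating ascent on F_prob: update alpha with beta fixed,
    then beta with alpha fixed, projecting onto alpha, beta >= 0 after
    each step. Plain gradient steps stand in for the paper's decoupled
    bound-optimization updates."""
    for _ in range(outer):
        for _ in range(inner):                       # alpha step, beta fixed
            alpha = np.maximum(alpha + lr * F_grad_alpha(alpha, beta), 0.0)
        for _ in range(inner):                       # beta step, alpha fixed
            beta = np.maximum(beta + lr * F_grad_beta(alpha, beta), 0.0)
    return alpha, beta
```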


Experimental Design

Document collection:
- A document collection from the ad hoc retrieval task of ImageCLEF
- 28,133 documents and 933 categories in total
- Average document length: 50
- Average number of categories per document: 5

Evaluation queries:
- 5 queries from ImageCLEF 2003 for training \lambda_w and \lambda_c
- 25 queries from ImageCLEF 2004 for testing

Evaluation metrics:
- Average precision for the top retrieved documents
- Average precision across 11 recall points
- Precision-recall curves


Baseline Approaches

State-of-the-art information retrieval methods:
- The Okapi method (Okapi)
- The language model with JM smoothing (LM)

Inverse category frequency (ICF):

icf(w) = \log \frac{m}{m(w)}

m : the total number of categories
m(w) : the number of categories containing word w

Replace idf(w) with icf(w) in the Okapi method.
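
A small sketch of icf(w); the category dictionary layout is an assumption, and the result can be dropped in for idf in the okapi_score sketch above.

```python
import math

def icf(word, categories):
    """Inverse category frequency. `categories` maps a category id to the
    set of words occurring in that category."""
    m = len(categories)                                  # total categories
    m_w = sum(1 for words in categories.values() if word in words)
    return math.log(m / m_w) if m_w else 0.0
```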


Baseline Approaches (Cont'd)

Category-based query expansion (CQE):

1 Retrieve the top k = 100 documents for query q using Okapi.
2 Expand query q to include category information:

  q' = \{f(w_1, q), \ldots, f(w_n, q); \; f(c_1, q), \ldots, f(c_m, q)\}

  f(c_i, q) : the number of top-k documents in category c_i

3 Retrieve documents using the expanded query q':

  \log p(q'|d) = \lambda \, \frac{\sum_{i=1}^{n} f(w_i, q) \log p(w_i|d)}{\sum_{i=1}^{n} f(w_i, q)} + (1 - \lambda) \, \frac{\sum_{i=1}^{m} f(c_i, q) \log p(c_i|d)}{\sum_{i=1}^{m} f(c_i, q)}
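
A sketch of step 3; \lambda is left unspecified on the slide, so lam = 0.5 here is an arbitrary choice, and all names are illustrative.

```python
import numpy as np

def cqe_log_likelihood(word_weights, word_log_probs,
                       cat_weights, cat_log_probs, lam=0.5):
    """log p(q'|d) as a lambda-weighted mix of the normalized word and
    category components. Assumes both weight vectors are non-empty with
    positive sums."""
    w = np.asarray(word_weights, dtype=float)      # f(w_i, q)
    c = np.asarray(cat_weights, dtype=float)       # f(c_i, q)
    word_term = np.dot(w, word_log_probs) / w.sum()
    cat_term = np.dot(c, cat_log_probs) / c.sum()
    return lam * word_term + (1.0 - lam) * cat_term
```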


Precision-Recall Curves

[Figure: precision-recall curves over recall 0 to 1 for the Regression approach, the Discriminative (probabilistic) approach, Inverse Category Frequency, Category-based Query Expansion, Okapi, and the Language Model]

The probabilistic approach outperforms both the Language Model and Okapi.


Average Precision

                    Using Category               No Category
                 Reg.   Prob.   ICF    CQE      Okapi   LM
Avg. Prec.       0.45   0.48    0.38   0.42     0.41    0.41
Prec @ 5 docs    0.55   0.56    0.40   0.50     0.47    0.50
Prec @ 10 docs   0.48   0.52    0.40   0.48     0.45    0.48
Prec @ 20 docs   0.46   0.46    0.39   0.42     0.39    0.38
Prec @ 100 docs  0.21   0.21    0.19   0.19     0.20    0.20

Reg. and Prob. > Okapi and LM: category information is useful.
ICF and CQE < Okapi and LM: category information needs to be exploited wisely.


Retrieval Precision for Individual Queries

[Figure: average precision per query (query IDs 1-25) for the Language Model, Inverse Category Frequency, Category-based Query Expansion, and the Probabilistic approach]

On 16 of the 25 queries, the probabilistic approach beats the language model.
On 5 of the 25 queries, the probabilistic approach falls below the language model.


Summary

Proposed two algorithms for learning term weights using category information:
- A regression approach
- A probabilistic approach

Empirical studies with the ImageCLEF dataset verify the effectiveness of the proposed algorithms.

Future work:
- Improve learning efficiency for large numbers of documents and large vocabularies
- Extend to image retrieval for annotated images
