View
5
Download
0
Category
Preview:
Citation preview
Tuning Topical Queries through Context Vocabulary Enrichment:
a Corpus-Based Approach
Carlos M Lorenzetti Ana G Maguitmancml@cs.uns.edu.ar agm@cs.uns.edu.ar
Universidad Nacional del SurAv. L.N. Alem 1253Bahía Blanca - Argentina
Grupo de Investigación en Recuperación de Información y Gestión del Conocimiento
Laboratorio de Investigación y Desarrollo en Inteligencia Artificial
CONICET AGENCIA
Introduction
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 3
Context–Based SearchJava?
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 4
Context–Based SearchJava?
AnimalsAnimals
ComputersComputers
ConsumablesConsumables
EntertainmentEntertainment
GeographyGeography
FloraFlora
ShipsShips
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 5
Context–Based Search
ContextContext
ArticlesArticlesNewspapersNewspapers
OthersOthers
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 6
Context–Based SearchJava?
ContextContext
ArticlesArticlesNewspapersNewspapers
OthersOthersGeographyGeography
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 7
Query tuning
• Step 1: Query• Step 2: Initial set of results• Step 3: Relevance assessment
– SupervisedSupervised feedback– UnsupervisedUnsupervised feedback– SemiSemi--supervisedsupervised feedback
• Step 4: Better representation• Step 5: Revised set of results
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 8
Different Role of Terms
• DescriptorsTerms that appear often in documentsrelated to the given topic
What is this topic about?
• DiscriminatorsTerms that appear only in documentsrelated to the given topic
What terms are useful to seek similar information?
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 9
Different Role of Terms
• DescriptorsTerms that appear often in documentsrelated to the given topic
What is this topic about?
• DiscriminatorsTerms that appear only in documentsrelated to the given topic
What terms are useful to seek similar information?
RecallRecall
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 10
Different Role of Terms
• DescriptorsTerms that appear often in documentsrelated to the given topic
What is this topic about?
• DiscriminatorsTerms that appear only in documentsrelated to the given topic
What terms are useful to seek similar information?
PrecisionPrecision
RecallRecall
Descriptors and DiscriminatorsComputation:an example
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 12
Descriptors and Discriminators
JavaJavaLanguageLanguage
AppletsApplets
CodeCode
TopicTopic: Java Virtual : Java Virtual MachineMachine
NetBeansNetBeansComputersComputers
JVMJVM
RubyRuby ProgrammingProgramming
JDKJDK
VirtualVirtual
MachineMachine
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 13
Descriptors and Discriminators
JavaJavaLanguageLanguage
AppletsApplets
CodeCode
TopicTopic: Java Virtual : Java Virtual MachineMachine
NetBeansNetBeansComputersComputers
JVMJVM
RubyRuby ProgrammingProgramming
JDKJDK
VirtualVirtual
MachineMachineGoodGood descriptorsdescriptors
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 14
Descriptors and Discriminators
JavaJavaLanguageLanguage
AppletsApplets
CodeCode
TopicTopic: Java Virtual : Java Virtual MachineMachine
NetBeansNetBeansComputersComputers
JVMJVM
RubyRuby ProgrammingProgramming
JDKJDK
VirtualVirtual
MachineMachine
GoodGood discriminatorsdiscriminators
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 15
Documents Descriptors and Discriminators
ktd ji ],[HNumberNumber ofof occurrencesoccurrences ofof termterm jjin in documentdocument ii
TopicTopic: Java Virtual : Java Virtual MachineMachine
0330
0120
1004
2004
3003
0220
1120
0110
0236
2552
InitialInitialContextContext
HH
(1)(1) espressotec.comespressotec.com(2)(2) netbeans.orgnetbeans.org(3)(3) sun.comsun.com(4)(4) wikitravel.orgwikitravel.org
0jdk
0jvm
0province
0island
0coffee
3programming
1language
1virtual
2machine
4java
(1)(1) (2)(2) (3)(3) (4)(4)
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 16
Documents Descriptors
TopicTopic: Java Virtual : Java Virtual MachineMachineInitialInitial ContextContext
0jdk
0jvm
0province
0island
0coffee
3programming
1language
1virtual
2machine
4java
1
02]),[(
],[),(n
k
jiki
jitdH
H
DescriptiveDescriptive powerpower ofof a a termterm in a in a documentdocument
0,000
0,000
0,000
0,000
0,000
0,539
0,180
0,180
0,359
0,718
),( 0 jtd
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 17
Documents Discriminators
TopicTopic: Java Virtual : Java Virtual MachineMachineInitialInitial ContextContext
0jdk
0jvm
0province
0island
0coffee
3programming
1language
1virtual
2machine
4java
1
0]),[s(
]),[s(),(m
k
jiki
jidtT
T
H
H
DiscriminatingDiscriminating powerpower ofof a a termterm in a in a documentdocument
0,000
0,000
0,000
0,000
0,000
0,577
0,500
0,577
0,500
0,447
),( 0dti
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 18
Documents comparison criteria
1
0)),(),((),(
n
kkjkiji tdtddd Documents
similarityDocumentsDocuments
similaritysimilarity
KK11
KK33KK22
dd22
dd11
CosineCosine similaritysimilarity
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 19
Topics DescriptorsTopicTopic: Java Virtual : Java Virtual MachineMachine
InitialInitial ContextContext
0jdk
0jvm
0province
0island
0coffee
3programming
1language
1virtual
2machinemachine
4javajava
1
,0
1
,0
2
),(
)),(),((),( m
ikkki
m
ikkjkki
ji
dd
tdddtd
TermTerm descriptivedescriptive powerpower in a in a topictopic ofof a a documentdocument
0,014
0,032
0,040
0,040
0,055
0,064
0,089
0,124
0,1580,158
0,3850,385
),( 0 jtd
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 20
Topics DiscriminatorsTopicTopic: Java Virtual : Java Virtual MachineMachine
InitialInitial ContextContext
0province
0island
0coffee
4java
1language
2machine
3programming
1virtual
0jdkjdk
0jvmjvm
Term Term discriminatingdiscriminating powerpower in a in a topictopic ofof a a documentdocument
1
,0
2 )),(),((),(m
jkkkijkji dtdddt
0,385
0,385
0,385
0,493
0,517
0,524
0,566
0,566
0,8480,848
0,8480,848
),( 0dti
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 21
Proposed Algorithm
ContextContextw1
w2
w3
w4
w5
w6
w7
w8
wm-1
wm wm-2
w9
...
Roulette
query 01
query 02
query 03
query n
result 03
result 01
result 02
result n
w 0,5w 0,25...w 0,1
1
2
m
DESCRIPTORSDESCRIPTORSDESCRIPTORS
w 0,4w 0,37...w 0,01
1
2
m
DISCRIMINATORSDISCRIMINATORSDISCRIMINATORS
1
24
3
Terms
Evaluation
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 23
EvaluationLocal IndexLocal Index• DMOZ – ODP Project
– ~ 500 Topics– 3rd level of the hierarchy– 100 URLs at least– English language– > 350,000 pages
• Initial Context
• vs. Bo1 & Baseline– Novelty-Driven Similarity– Precision– Semantic Precision
TopTop
HomeHome ScienceScience ArtsArts
CookingCooking FamilyFamily
ChildcareChildcare
11stst levellevel
22ndnd levellevel
33rdrd levellevel
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 24
Evaluation – N Similarity
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 20 40 60 80 100 120 140 160 180iteration
nove
lty-d
riven
sim
ilarit
y
Maximum
Average
Minimum
Context update
Top/Computers/Open_Source/Software
Query formulation and retrieval process
[0.5866; 0.6073]0.5970best
[0.0618; 0.0704]0.06611st
95% CIMeanN
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 25
Evaluation – N Similarity
[0.0822; 0.0924]0.087Baseline
[0.5866; 0.6073]0.597Incremental
[0.0710; 0.0803]0.075Bo1-DFR
95% CIMeanN
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 26
Evaluation – Precision
>>
Incremental (66.96%)Bo1-DFR (24.33%)Baseline (8.7%)
CC[0.2461; 0.2863]0.266Baseline
[0.3325; 0.3764]0.354Incremental
[0.2859; 0.3298]0.307Bo1-DFR
95% CIMeanPrecision
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 27
Evaluation – Semantic Precision
>>
Incremental (65.18%)Bo1-DFR (27.90%)Baseline (6.92%)
CC[0.5383; 0.5679]0.553Baseline
[0.6068; 0.6372]0.622Incremental
[0.5750; 0.6066]0.590Bo1-DFR
95% CIMeanPrecisionS
Tuning Topical Queries through Context Vocabulary Enrichment: a Corpus-Based ApproachLorenzetti Carlos M, Maguitman Ana G
1st QSI - OTM – Monterrey, Mexico 28
Conclusions
• We presented an intelligent IR approach for learning context-specific terms.– Take advantage of the user contextuser context.
• We have shown evaluations and the effectivenesseffectiveness of incremental methods.
Future Work– Investigate different parameters.– Develop methods to learn and adjust parameters.– Run additional tests using other IR metrics.
ThankThank youyou!!
CONICET AGENCIA
Laboratorio de Investigación y Desarrollo en Inteligencia Artificial
lidia.cs.uns.edu.ar
Universidad Nacional del SurBahía Blanca
www.uns.edu.ar
Recommended