8/14/2019 Research--Probabilistic Models in Information Retrieval
PROBABILISTIC MODELS IN INFORMATION RETRIEVAL
Norbert Fuhr
Introduction
The intrinsic uncertainty of IR.
Two approaches:
Relevance models
Proof-theoretic model
Relevance models
A user assigns relevance judgments to documents w.r.t. his/her query.
The IR system yields an approximation of the set of relevant documents.
Some models: BIR model, BII model, DIA model, etc.
Relevance models
Binary independence retrieval (BIR) model
A document d_m is composed of a set of terms and represented as a vector.
Assumptions:
Cluster hypothesis: terms are distributed differently within relevant and non-relevant documents.
A query q_k is also a set of terms.
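The BIR ranking can be sketched as below, under stated assumptions. The slide does not give the weighting formula, so this uses the standard BIR log-odds term weight, log(p_i(1 - q_i) / (q_i(1 - p_i))), where p_i and q_i are the probabilities that term i occurs in a relevant and in a non-relevant document. The function names, the term names, and the probability values are all hypothetical and chosen only for illustration.

```python
import math

def bir_term_weight(p_i, q_i):
    """Log-odds weight of one term.

    p_i: P(term present | document relevant)
    q_i: P(term present | document non-relevant)
    """
    return math.log(p_i * (1.0 - q_i) / (q_i * (1.0 - p_i)))

def bir_score(doc_terms, query_terms, p, q):
    """Sum the weights of the query terms that occur in the document."""
    return sum(bir_term_weight(p[t], q[t]) for t in query_terms if t in doc_terms)

# Hypothetical estimates for a two-term query (not taken from the slides).
p = {"t1": 0.8, "t2": 0.6}
q = {"t1": 0.3, "t2": 0.4}
query = {"t1", "t2"}

# Documents as binary vectors over (t1, t2), written here as term sets.
docs = {
    "(1,1)": {"t1", "t2"},
    "(1,0)": {"t1"},
    "(0,1)": {"t2"},
    "(0,0)": set(),
}

bir_ranking = sorted(docs, key=lambda name: bir_score(docs[name], query, p, q),
                     reverse=True)
print(bir_ranking)
```

With these made-up values, t1 separates relevant from non-relevant documents more strongly than t2, so a document containing only t1 ranks above one containing only t2.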
Relevance models
Relevance models
An example
The ranking is (1,1), (1,0), (0,1), (0,0).
The probability ranking principle
Let C be the cost of retrieving a relevant document and C̄ the cost of retrieving a non-relevant document.
Retrieve the document for which the expected cost of retrieval is a minimum.
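A minimal sketch of this decision rule, assuming the expected cost of retrieving a document is the relevant-document cost weighted by its probability of relevance plus the non-relevant-document cost weighted by the complementary probability. The names `c_rel` and `c_nonrel`, their default values, and the probabilities are hypothetical.

```python
def expected_cost(p_rel, c_rel, c_nonrel):
    """Expected cost of retrieving a document with relevance probability p_rel."""
    return c_rel * p_rel + c_nonrel * (1.0 - p_rel)

def rank_by_expected_cost(p_rel_by_doc, c_rel=0.0, c_nonrel=1.0):
    """Order documents by increasing expected retrieval cost."""
    return sorted(
        p_rel_by_doc,
        key=lambda doc: expected_cost(p_rel_by_doc[doc], c_rel, c_nonrel),
    )

# Hypothetical probabilities of relevance for three documents.
p_rel_by_doc = {"d1": 0.2, "d2": 0.9, "d3": 0.5}
prp_ranking = rank_by_expected_cost(p_rel_by_doc)
print(prp_ranking)
```

As long as retrieving a relevant document costs less than retrieving a non-relevant one, minimizing expected cost gives the same order as ranking by decreasing probability of relevance.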
Proof-theoretic model
IR is interpreted as uncertain inference.
A generalization of deductive databases: queries and contents are treated as logical formulas.
The query has to be proved from the formulas.
A document is an answer to a query iff the logical formula is true.
GOAL-CENTRIC TRACEABILITY FOR MANAGING NON-FUNCTIONAL REQUIREMENTS
Jane Cleland-Huang, Raffaella Settimi, Oussama BenKhadra, Eugenia Berezhanskaya, Selvia Christina
Non-Functional Requirements (NFRs) are difficult to trace:
Global impact upon a software system
Extensive network of interdependencies and trade-offs
Goal-centric traceability (GCT) approach:
NFRs are modeled as goals and operationalizations within a Softgoal Interdependency Graph (SIG).
Dynamically establish traces from impacted functional design elements to elements in the SIG.
Softgoal Interdependency Graph
GCT Model
Impact detection in GCT
Documents
Queries
Index terms
The relevance of a document d to a query q is pr(d, q).
UTILIZING SUPPORTING EVIDENCE TO IMPROVE DYNAMIC REQUIREMENTS TRACEABILITY
Jane Cleland-Huang, Raffaella Settimi, Chuan Duan, Xuchang Zou
Introduction
Current work:
Recall level close to 90%
Precision from 10% to 45%
Target:
Maintain a recall level of at least 90%
Precision of at least 20%
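For reference, recall and precision over a set of candidate trace links can be computed as in the sketch below. The function name and the link identifiers are hypothetical, and the numbers are invented only to show a case that would meet the stated targets.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a set of retrieved links.

    precision = correct retrieved links / all retrieved links
    recall    = correct retrieved links / all relevant links
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical data: 10 true links; the tool returns 40 candidates, 9 of them correct.
relevant_links = set(range(10))
retrieved_links = set(range(9)) | set(range(100, 131))

precision, recall = precision_recall(retrieved_links, relevant_links)
print(f"precision={precision:.3f} recall={recall:.3f}")
```

In this invented case the analyst sees nine correct links among forty candidates, which corresponds to 90% recall at 22.5% precision.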
Introduction
Three strategies to improve the performance of dynamic requirements traceability:
Hierarchical modeling
Logical clustering of artifacts
Semi-automated pruning of the probabilistic network
Enhancement strategies
Motivation Example
Hierarchical
R3's label is "De-icing".
Using the hierarchical information in R3 -> R5 describes the de-icing service.
Similarly, C4 describes the truck maintenance service.
The link between C4 and R5 is not correct!
Hierarchical
Solution:
Build a DAG to represent the direct relationships between artifacts.
Results
Clustering
Links tend to occur in clusters:
If q links to d_j, there is a higher probability that q links to d.
If d links to q_i, there is a higher probability that d links to q.
Care about the relationships of sibling artifacts.
Clustering
Solution
Clustering
Evaluation
Graph Pruning Enhancement
Observation:
The word "schedule" is used for both de-icing schedules and truck maintenance schedules.
A query containing "schedule" returns artifacts from both domains, which lowers precision.
Graph Pruning Enhancement
Solution:
Utilize the initial decisions made by the analyst to place constraints and improve precision in problematic areas.
Rules for placing constraints:
1. One or more links between two groups are all rejected by the analyst.
2. The basic retrieval algorithm generated candidate links between the two groups.
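The two rules can be read as a simple check over the analyst's early decisions, sketched below under stated assumptions. The slides do not say what a constraint does once placed, so this sketch assumes it suppresses the remaining candidate links between the two groups. The function name, the data layout, the group labels, and the artifacts R6 and C7 are hypothetical.

```python
from collections import defaultdict

def groups_to_constrain(candidate_links, group_of, decisions):
    """Find group pairs where every analyst-reviewed candidate link was rejected.

    candidate_links: list of (source, target) links from the basic retrieval algorithm
    group_of: dict mapping each artifact to its group label
    decisions: dict mapping reviewed links to "accepted" or "rejected"
    """
    verdicts_by_pair = defaultdict(list)
    for link in candidate_links:          # rule 2: only generated candidate links count
        if link in decisions:             # rule 1: needs one or more reviewed links
            source, target = link
            pair = (group_of[source], group_of[target])
            verdicts_by_pair[pair].append(decisions[link])
    return {pair for pair, verdicts in verdicts_by_pair.items()
            if all(v == "rejected" for v in verdicts)}

# Hypothetical artifacts and groups, loosely following the de-icing example.
group_of = {"R5": "de-icing", "R6": "de-icing",
            "C4": "truck maintenance", "C7": "de-icing"}
candidate_links = [("R5", "C4"), ("R6", "C4"), ("R5", "C7")]
decisions = {("R5", "C4"): "rejected", ("R5", "C7"): "accepted"}

constrained = groups_to_constrain(candidate_links, group_of, decisions)

# Assumed effect of a constraint: drop remaining links between constrained groups.
pruned_links = [(s, t) for s, t in candidate_links
                if (group_of[s], group_of[t]) not in constrained]
print(constrained, pruned_links)
```

Here the single rejection of R5-C4 is enough to constrain the de-icing and truck maintenance groups, so the unreviewed R6-C4 link is dropped as well, while the accepted R5-C7 link keeps its group pair unconstrained.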
Graph Pruning Enhancement
Evaluation