A Probabilistic Model for Fine-Grained Expert Search
Shenghua Bao, Huizhong Duan, Qi Zhou, Miao Xiong, Yunbo Cao, Yong Yu
June 16-18, 2008, Columbus, Ohio
Schedule
1. Introduction
2. Fine-grained Expert Search
3. Experimental Results
4. Conclusion
Introduction -- Expert Search
“Who is an expert on X?”
User Query → Search Engine → Experts
Who are experts on Semantic Web Search Engine?
Introduction -- Pioneering Expert Search Systems
• Log data in software development: Kautz et al., 1996; Mockus and Herbsleb, 2002; McDonald and Ackerman, 1998; etc.
• Email communications: Campbell et al., 2003; Dom et al., 2003; Sihn and Heeren, 2001; etc.
• General documents: Yimam, 1996; Davenport and Prusak, 1998; Steer and Lochbaum, 1988; Mattox et al., 1999; Hertzum and Pejtersen, 2000; Craswell et al., 2001; etc.
Introduction -- Expert Search at TREC
• A new task at TREC 2005, 2006, and 2007 (Craswell et al., 2005; Soboroff et al., 2006; Bailey et al., 2007)
• Many approaches have been proposed:
  - Two generative models (Balog et al., 2006)
  - Prior distribution, relevance feedback (Fang et al., 2006)
  - Hierarchical language model (Petkova and Croft, 2006)
  - Voting and data fusion (Macdonald and Ounis, 2006)
  - …
Introduction -- Coarse-grained Approach
• Expert search is carried out at the granularity of whole documents.
• Further improvements are hard to achieve.
• Different blocks of electronic documents have different functions and qualities, and therefore different impacts on expert search.
Windowed Section Relation
[Figure labels: queried topic; window (relevant); outside the window (irrelevant)]
Examples
Title-Author Relation
Title
Author
Query: Timed Text
Examples
Reference Section Relation
Examples
<H1>
<H2>
Query: W3C Management Team
Examples
Section Title-Body Relation
Fine-grained Evidence
Who are experts on Semantic Web Search Engine?
Fine-grained Expert Search -- Evidence Extraction
• Document-001:
“…a high-level plan of the architecture of the semantic web by Tim Berners-Lee…”
“…later, Berners-Lee describes a semantic web search engine experience…”
<topic, person, relation, document>
E1: <semantic web, Tim Berners-Lee, same-section, document-001>
E2: <semantic web search engine, Berners-Lee, same-section, document-001>
Tim Berners-Lee
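The extraction of quadruples such as E1 and E2 can be illustrated with a minimal Python sketch. It assumes the document is already split into sections and that topics and person names are found by plain case-insensitive substring search; the slides do not say how extraction is actually implemented, and `Evidence` and `extract_same_section` are hypothetical names.

```python
from typing import NamedTuple


class Evidence(NamedTuple):
    """One <topic, person, relation, document> quadruple."""
    topic: str
    person: str
    relation: str
    document: str


def extract_same_section(doc_id, sections, topics, persons):
    """Emit one quadruple per (topic, person) pair co-occurring in a section.

    Assumption: naive substring matching stands in for real topic and
    person matching.
    """
    found = []
    for text in sections:
        lowered = text.lower()
        for topic in topics:
            if topic.lower() not in lowered:
                continue
            for person in persons:
                if person.lower() in lowered:
                    found.append(Evidence(topic, person, "same-section", doc_id))
    return found


sections = [
    "a high-level plan of the architecture of the semantic web by Tim Berners-Lee",
    "later, Berners-Lee describes a semantic web search engine experience",
]
evidence = extract_same_section(
    "document-001",
    sections,
    topics=["semantic web", "semantic web search engine"],
    persons=["Tim Berners-Lee", "Berners-Lee"],
)
for item in evidence:
    print(item)
```

Because the matching is naive, the sketch also emits the shorter overlapping matches (for example "Berners-Lee" inside "Tim Berners-Lee"); a fuller implementation would presumably keep only the longest match per position.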
Fine-grained Expert Search -- Search Model
<topic, person, relation, document> (t, p, r, d)
Expert Candidate (c)
Query (q)
Expert Matching Model
Evidence Matching Model
$$P(c \mid q) = \sum_e P(c, e \mid q) = \sum_e P(c \mid e, q)\, P(e \mid q)$$
$$P(c \mid q) \approx \sum_e P(c \mid e)\, P(e \mid q)$$
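The search model sums, over all evidence, the product of the expert matching score P(c|e) and the evidence matching score P(e|q). A minimal Python sketch of that aggregation follows, assuming both scores have already been computed for each piece of evidence; `rank_candidates` and the numbers are hypothetical and purely illustrative.

```python
from collections import defaultdict


def rank_candidates(evidence_scores):
    """Rank candidates by sum_e P(c|e) * P(e|q).

    evidence_scores: iterable of (P(e|q), {candidate: P(c|e)}) pairs,
    one pair per piece of evidence e.
    """
    totals = defaultdict(float)
    for p_e_given_q, p_c_given_e in evidence_scores:
        for candidate, p_c in p_c_given_e.items():
            totals[candidate] += p_c * p_e_given_q
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Illustrative numbers only, not taken from the experiments.
scores = [
    (0.5, {"candidate-a": 0.9, "candidate-b": 0.1}),
    (0.25, {"candidate-b": 0.8}),
]
ranking = rank_candidates(scores)
print(ranking)
```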
Fine-grained Expert Search -- Expert Matching
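$$P(c \mid e) = P(c \mid p, r, d) = P(c \mid p)\, P(p \mid r, d)$$
$$P(p \mid r, d) = \frac{\mathrm{freq}(p, r, d)}{L(r, d)}$$
$$P_S(p \mid r, d) = \mu\, P(p \mid r, d) + (1 - \mu) \sum_{d' \in D} \frac{P(p \mid r, d')}{|D|}$$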
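The estimate of P(p|r,d) and its smoothing against the rest of the collection can be sketched in Python as follows. The sketch assumes L(r,d) is a normalizing count for relation r in document d and that the interpolation weight defaults to 0.5; neither the exact definition nor a weight value is given on the slide, and the function names are hypothetical.

```python
def relation_probability(freq, length):
    """Maximum-likelihood P(p|r,d) = freq(p,r,d) / L(r,d)."""
    return freq / length if length else 0.0


def smoothed_probability(doc_id, stats, mu=0.5):
    """Interpolate the in-document estimate with the collection average.

    stats maps doc_id -> (freq(p,r,d), L(r,d)) for one person p and one
    relation r. mu=0.5 is an assumed default, not a value from the slides.
    """
    in_doc = relation_probability(*stats[doc_id])
    background = sum(
        relation_probability(freq, length) for freq, length in stats.values()
    ) / len(stats)
    return mu * in_doc + (1 - mu) * background


# Illustrative counts only.
stats = {"d1": (2, 10), "d2": (0, 20), "d3": (1, 5)}
p_d1 = smoothed_probability("d1", stats)
p_d2 = smoothed_probability("d2", stats)
print(p_d1, p_d2)
```

With smoothing, a document in which the person never occurs under the relation (d2 here) still receives a small non-zero probability from the collection average.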
$$P(c \mid p) = P(\mathrm{type}(p, c))$$
Mask               Sample
Full Name          Ritu Raj Tiwari
Email Name         [email protected]
Combined Name      Tiwari, Ritu R;
Abbr. Name         Ritu Raj; Ritu
Short Name         RRT
Alias, new email   [email protected]
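The name-derived masks in the table can be generated mechanically from a full name. Here is a minimal Python sketch, assuming Western name order with the family name last; `name_masks` is a hypothetical helper, and the email and alias masks are left out because they need an external address directory.

```python
def name_masks(full_name):
    """Derive textual masks from a full name of at least two parts.

    Assumption: the last part is the family name, the rest are given names.
    """
    parts = full_name.split()
    first, last = parts[0], parts[-1]
    middle = parts[1:-1]
    given = " ".join([first] + middle)
    # "Tiwari, Ritu R": family name, first name, then middle initials.
    combined = f"{last}, {first}" + "".join(f" {m[0]}" for m in middle)
    # Drop the duplicate that appears when there is no middle name.
    abbreviated = list(dict.fromkeys([given, first]))
    return {
        "full": full_name,
        "combined": combined,
        "abbreviated": abbreviated,
        "short": "".join(p[0] for p in parts).upper(),
    }


masks = name_masks("Ritu Raj Tiwari")
print(masks)
```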
<topic, person, relation, document> (<t, p, r, d> for short)
Fine-grained Expert Search -- Evidence Matching
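$$P(e \mid q) = P(t, p, r, d \mid q) \approx P(t \mid q)\, P(p \mid q)\, P(r \mid q)\, P(d \mid q)$$
$$P(t \mid q) = P(\mathrm{type}(t, q))$$
$$P(p \mid q) = P(p)$$
$$P(r \mid q) = P(\mathrm{type}(r))$$
$$P(d \mid q) = \frac{P(q \mid d)\, P(d)}{P(q)} \propto P(q \mid d)\, P(d)$$
$$P(e \mid q) \propto P(\mathrm{type}(t, q))\, P(p)\, P(\mathrm{type}(r))\, P(q \mid d)\, P(d)$$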
Type        Sample
Query       Semantic Web Search Engine
Phrase      “Semantic Web Search Engine”
Bi-gram     “Semantic Web” “Search Engine”
Proximity   “Semantic … Web Search Engine”
Fuzzy       “Samentic Web Saerch Engine”
Stemmed     “Semantic Web Search Engin”
Relation Type
Same Section
Windowed Section
Reference Section
Title-Author
Section Title-Body
Quality Type
Dynamic Quality
Static Quality
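The evidence matching score is a product of per-factor probabilities: one for the query match type, one for the person, one for the relation type, and two for the document. A minimal Python sketch of that product follows. All weights are hypothetical placeholders (the slides give no values), and reading dynamic quality as P(q|d) and static quality as P(d) is an assumption.

```python
# Hypothetical weights standing in for P(type(t,q)) and P(type(r));
# in practice they would be tuned on the training topics.
MATCH_TYPE_WEIGHT = {
    "phrase": 1.0,
    "bi-gram": 0.6,
    "proximity": 0.5,
    "fuzzy": 0.3,
    "stemmed": 0.3,
}
RELATION_TYPE_WEIGHT = {
    "same-section": 0.4,
    "windowed-section": 0.5,
    "reference-section": 0.5,
    "title-author": 1.0,
    "section-title-body": 0.8,
}


def evidence_score(match_type, relation_type, p_person, p_query_given_doc, p_doc):
    """Unnormalized P(e|q) = P(type(t,q)) P(p) P(type(r)) P(q|d) P(d)."""
    return (
        MATCH_TYPE_WEIGHT[match_type]
        * p_person
        * RELATION_TYPE_WEIGHT[relation_type]
        * p_query_given_doc   # assumed to play the role of dynamic quality
        * p_doc               # assumed to play the role of static quality
    )


strong = evidence_score("phrase", "title-author", 0.5, 0.2, 0.1)
weak = evidence_score("fuzzy", "same-section", 0.5, 0.2, 0.1)
print(strong, weak)
```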
Experimental Results -- W3C Corpus
• 331,307 web pages
• 10 training topics of TREC 2005
• 50 test topics of TREC 2005
• 49 test topics of TREC 2006
Evaluation Metrics
• Mean average precision (MAP)
• R-precision (R-P)
• Top N precision (P@N)
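The three metrics follow their standard definitions and can be computed for a ranked list of candidates as in the Python sketch below. The function names and the toy ranking are illustrative only and are not taken from the TREC evaluation tooling.

```python
def average_precision(ranked, relevant):
    """Mean of the precision values at each rank where a relevant item occurs."""
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def r_precision(ranked, relevant):
    """Precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    return sum(item in relevant for item in ranked[:r]) / r if r else 0.0


def precision_at(ranked, relevant, n):
    """Fraction of the top n results that are relevant (P@N)."""
    return sum(item in relevant for item in ranked[:n]) / n


def mean_average_precision(runs):
    """runs: list of (ranked, relevant) pairs, one per topic."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy example: four ranked candidates, two of which are true experts.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
print(average_precision(ranked, relevant))
print(r_precision(ranked, relevant))
print(precision_at(ranked, relevant, 4))
```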
Experimental Results -- Query Matching
                      TREC 2005                 TREC 2006
                      MAP     R-P     P@10      MAP     R-P     P@10
Baseline              0.1840  0.2136  0.3060    0.3752  0.4585  0.5604
+ Bi-gram             0.1957  0.2438  0.3320    0.4140  0.4910  0.5799
+ Proximity           0.2024  0.2501  0.3360    0.4530  0.5137  0.5922
+ Fuzzy, Stemmed      0.2030  0.2501  0.3360    0.4580  0.5112  0.5901
Improv.               10.33%  17.09%  9.80%     22.07%  11.49%  5.30%
T-test                0.0084                    0.0000
Experimental Results -- Person Matching
                      TREC 2005                 TREC 2006
                      MAP     R-P     P@10      MAP     R-P     P@10
Baseline              0.2030  0.2501  0.3360    0.4580  0.5112  0.5901
+ Combined Name       0.2056  0.2539  0.3463    0.4709  0.5152  0.5931
+ Abbr. Name          0.2106  0.2545  0.3400    0.5010  0.5181  0.6000
+ Short Name          0.2111  0.2578  0.3400    0.5121  0.5192  0.6000
+ Alias, new email    0.2156  0.2591  0.3400    0.5221  0.5212  0.6000
Improv.               6.21%   3.60%   1.19%     14.00%  1.96%   1.68%
T-test                0.0064                    0.0057
Experimental Results -- Multiple Relations
                      TREC 2005                 TREC 2006
                      MAP     R-P     P@10      MAP     R-P     P@10
Baseline              0.2156  0.2591  0.3400    0.5221  0.5212  0.6000
+ Windowed Section    0.2158  0.2633  0.3380    0.5255  0.5311  0.6082
+ Reference Section   0.2160  0.2630  0.3380    0.5272  0.5314  0.6061
+ Title-Author        0.2234  0.2634  0.3580    0.5354  0.5355  0.6245
+ Section Title-Body  0.2586  0.3107  0.3740    0.5657  0.5669  0.6510
Improv.               19.94%  19.91%  10.00%    8.35%   8.77%   8.50%
T-test                0.0013                    0.0043
Experimental Results -- Evidence Quality
                      TREC 2005                 TREC 2006
                      MAP     R-P     P@10      MAP     R-P     P@10
Baseline              0.2586  0.3107  0.3740    0.5657  0.5669  0.6510
+ Static quality      0.2711  0.3188  0.3720    0.5900  0.5813  0.6796
+ Dynamic quality     0.2755  0.3252  0.3880    0.5943  0.5877  0.7061
Improv.               6.13%   4.67%   3.74%     2.86%   3.67%   8.61%
T-test                0.0360                    0.0252
Rank 1 @TREC          0.2749  0.3330  0.4520    0.5947  0.5783  0.7041
Conclusion
Fine-grained expert search
Probabilistic model and its implementation
Evaluation on the TREC data set