A Probabilistic Model for Fine-Grained Expert Search Shenghua Bao, Huizhong Duan, Qi Zhou, Miao Xiong, Yunbo Cao, Yong Yu June 16--18, 2008, Columbus Ohio


A Probabilistic Model for Fine-Grained Expert Search

Shenghua Bao, Huizhong Duan, Qi Zhou, Miao Xiong, Yunbo Cao, Yong Yu

June 16--18, 2008, Columbus Ohio


Schedule

1. Introduction

2. Fine-grained Expert Search

3. Experimental Results

4. Conclusion


Introduction -- Expert Search

“Who is an expert on X?”

[Diagram labels: User, Query, Search Engine, Experts]

Example query: Who are experts on Semantic Web Search Engine?


Introduction -- Pioneering Expert Search Systems

• Log data in software development: Kautz et al., 1996; Mockus and Herbsleb, 2002; McDonald and Ackerman, 1998; etc.

• Email communications: Campbell et al., 2003; Dom et al., 2003; Sihn and Heeren, 2001; etc.

• General documents: Yimam, 1996; Davenport and Prusak, 1998; Steer and Lochbaum, 1988; Mattox et al., 1999; Hertzum and Pejtersen, 2000; Craswell et al., 2001; etc.


Introduction -- Expert Search at TREC

• A new task at TREC 2005, 2006, 2007: Craswell et al., 2005; Soboroff et al., 2006; Bailey et al., 2007

• Many approaches have been proposed:
  • Two generative models, Balog et al., 2006
  • Prior distribution, relevance feedback, Fang et al., 2006
  • Hierarchical language model, Petkova and Croft, 2006
  • Voting and data fusion, Macdonald and Ounis, 2006
  • …


Introduction -- Coarse-grained Approach

• Expert search is carried out at the granularity of whole documents.

• Further improvements are hard to achieve at that granularity.

• Different blocks of an electronic document have different functions and qualities, and therefore different impacts on expert search.


Examples -- Windowed Section Relation

[Figure: example document; labels: window, relevant, irrelevant, queried topic]


Examples -- Title-Author Relation

Query: Timed Text

[Figure: example document; labels: Title, Author]


Examples -- Reference Section Relation


Examples -- Section Title-Body Relation

Query: W3C Management Team

[Figure: example document; labels: <H1>, <H2>]


Schedule

1. Introduction

2. Fine-grained Expert Search

3. Experimental Results

4. Conclusion


Fine-grained Expert Search -- Evidence Extraction

Query: Who are experts on Semantic Web Search Engine?

• Document-001:

“…a high-level plan of the architecture of the semantic web by Tim Berners-Lee…”

“…later, Berners-Lee describes a semantic web search engine experience…”

Fine-grained evidence: <topic, person, relation, document>

E1: <semantic web, Tim Berners-Lee, same-section, document-001>

E2: <semantic web search engine, Berners-Lee, same-section, document-001>

Expert candidate: Tim Berners-Lee
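The quadruple form of evidence can be illustrated with a small Python sketch. It is not the authors' extractor: the `Evidence` class, the `extract_same_section` helper, and the longest-match heuristic for picking the topic and person in each section are assumptions made for illustration, and only the same-section relation is covered.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Evidence:
    """One fine-grained evidence quadruple <topic, person, relation, document>."""
    topic: str
    person: str
    relation: str
    document: str


def longest_match(candidates: Sequence[str], text: str) -> Optional[str]:
    """Return the longest candidate string found in text (case-insensitive)."""
    lowered = text.lower()
    hits = [c for c in candidates if c.lower() in lowered]
    return max(hits, key=len) if hits else None


def extract_same_section(doc_id: str, sections: Sequence[str],
                         topics: Sequence[str],
                         persons: Sequence[str]) -> List[Evidence]:
    """Emit one quadruple per section that mentions both a topic and a person.

    Assumption: the longest matching topic and person win within a section.
    """
    found = []
    for text in sections:
        topic = longest_match(topics, text)
        person = longest_match(persons, text)
        if topic and person:
            found.append(Evidence(topic, person, "same-section", doc_id))
    return found


# The two passages quoted from Document-001.
sections = [
    "a high-level plan of the architecture of the semantic web by Tim Berners-Lee",
    "later, Berners-Lee describes a semantic web search engine experience",
]
evidence = extract_same_section(
    "document-001",
    sections,
    topics=["semantic web", "semantic web search engine"],
    persons=["Tim Berners-Lee", "Berners-Lee"],
)
for item in evidence:
    print(item)
```

On these two passages the sketch yields the same two quadruples as E1 and E2. A real extractor would also need the other relation types and proper name matching.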


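Fine-grained Expert Search -- Search Model

Evidence: <topic, person, relation, document>, written (t, p, r, d)

Expert candidate: c

Query: q

$$P(c \mid q) = \sum_{e} P(c, e \mid q) = \sum_{e} P(c \mid e, q)\, P(e \mid q) = \sum_{e} P(c \mid e)\, P(e \mid q)$$

The last step assumes that the candidate c is conditionally independent of the query q given the evidence e.

Expert matching model: P(c | e)

Evidence matching model: P(e | q)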
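The search model sums, over all evidence e, the product of the expert matching probability P(c | e) and the evidence matching probability P(e | q). A minimal Python sketch of that aggregation follows; the function name `rank_candidates`, the second candidate, and all probability values are made up for illustration and do not come from the slides.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def rank_candidates(
    evidence_scores: Iterable[Tuple[str, float, float]]
) -> List[Tuple[str, float]]:
    """Rank candidates by the sum over evidence of P(c|e) * P(e|q).

    Each input item is (candidate, P(c|e), P(e|q)) for one piece of evidence.
    """
    totals = defaultdict(float)
    for candidate, p_c_given_e, p_e_given_q in evidence_scores:
        totals[candidate] += p_c_given_e * p_e_given_q
    # Highest accumulated score first.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative values only.
scores = [
    ("Tim Berners-Lee", 1.0, 0.25),
    ("Tim Berners-Lee", 0.5, 0.5),
    ("Another Person", 1.0, 0.125),
]
ranking = rank_candidates(scores)
for candidate, score in ranking:
    print(candidate, score)
```

Because the score is a sum, a candidate supported by several weak pieces of evidence can outrank one supported by a single strong piece.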


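Fine-grained Expert Search -- Expert Matching

Evidence: <topic, person, relation, document> (<t, p, r, d> for short)

$$P(c \mid e) = P(c \mid p, r, d) = P(c \mid p)\, P(p \mid r, d)$$

$$P(p \mid r, d) = \frac{freq(p, r, d)}{L(r, d)}$$

$$P_S(p \mid r, d) = \mu\, P(p \mid r, d) + (1 - \mu) \sum_{d' \in D} \frac{P(p \mid r, d')}{|D|}$$

Here $\mu$ stands for the smoothing weight (symbol assumed), with $0 \le \mu \le 1$.

$$P(c \mid p) = P(\mathrm{type}(c, p))$$

Mask                Sample
Full Name           Ritu Raj Tiwari
Email Name          [email protected]
Combined Name       Tiwari, Ritu R;
Abbr. Name          Ritu Raj ; Ritu
Short Name          RRT
Alias, new email    [email protected]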
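The expert matching step estimates P(p | r, d) as a frequency ratio, freq(p, r, d) / L(r, d), and then smooths it with the average of the same estimate over all documents in the collection D. The Python sketch below works through that arithmetic. The slide does not define L(r, d) or name the smoothing weight, so the sketch treats L(r, d) as a caller-supplied normalizing count and calls the weight `mu`; the counts and document ids are invented.

```python
from typing import Dict, Tuple

Counts = Dict[Tuple[str, str, str], int]   # (person, relation, doc) -> freq
Lengths = Dict[Tuple[str, str], int]       # (relation, doc) -> L(r, d)


def p_unsmoothed(person: str, relation: str, doc: str,
                 counts: Counts, lengths: Lengths) -> float:
    """Frequency estimate freq(p, r, d) / L(r, d); 0.0 if L(r, d) is missing."""
    length = lengths.get((relation, doc), 0)
    if length == 0:
        return 0.0
    return counts.get((person, relation, doc), 0) / length


def p_smoothed(person: str, relation: str, doc: str,
               counts: Counts, lengths: Lengths, mu: float = 0.5) -> float:
    """Interpolate the per-document estimate with the collection average."""
    docs = [d for (r, d) in lengths if r == relation]
    local = p_unsmoothed(person, relation, doc, counts, lengths)
    background = sum(
        p_unsmoothed(person, relation, d, counts, lengths) for d in docs
    ) / len(docs)
    return mu * local + (1 - mu) * background


# Invented counts for two documents (assumption, not from the slides).
counts = {
    ("Ritu Raj Tiwari", "same-section", "d1"): 2,
    ("Ritu Raj Tiwari", "same-section", "d2"): 1,
}
lengths = {("same-section", "d1"): 4, ("same-section", "d2"): 4}

p_d1 = p_smoothed("Ritu Raj Tiwari", "same-section", "d1", counts, lengths)
p_d2 = p_smoothed("Ritu Raj Tiwari", "same-section", "d2", counts, lengths)
print(p_d1, p_d2)
```

With these numbers the per-document estimates are 0.5 and 0.25, the collection average is 0.375, and smoothing pulls both values toward that average.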


Fine-grained Expert Search -- Evidence Matching

$$P(e \mid q) = P(t, p, r, d \mid q) \approx P(t \mid q)\, P(p \mid q)\, P(r \mid q)\, P(d \mid q)$$

$$P(t \mid q) = P(\mathrm{type}(t, q))$$

$$P(p \mid q) = P(p)$$

$$P(r \mid q) = P(\mathrm{type}(r))$$

$$P(d \mid q) = \frac{P(q \mid d)\, P(d)}{P(q)} \propto P(q \mid d)\, P(d)$$

$$P(e \mid q) \propto P(\mathrm{type}(t, q))\, P(p)\, P(\mathrm{type}(r))\, P(q \mid d)\, P(d)$$

Type         Sample
Query        Semantic Web Search Engine
Phrase       “Semantic Web Search Engine”
Bi-gram      “Semantic Web” “Search Engine”
Proximity    “Semantic … Web Search Engine”
Fuzzy        “Samentic Web Saerch Engine”
Stemmed      “Semantic Web Search Engin”

Relation Type
• Same Section
• Windowed Section
• Reference Section
• Title-Author
• Section Title-Body

Quality Type
• Dynamic Quality
• Static Quality
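The evidence matching score is a product of five factors: a probability for the type of topic-query match, a person prior P(p), a probability for the relation type, the dynamic quality P(q | d), and the static quality P(d). The Python sketch below shows only how the factors combine. The two lookup tables and every number in them are hypothetical, since the slides name the types but do not give their probabilities.

```python
# Hypothetical probabilities per topic-query match type (not from the slides).
TOPIC_MATCH_PROB = {
    "phrase": 1.0,
    "bigram": 0.5,
    "proximity": 0.5,
    "fuzzy": 0.25,
    "stemmed": 0.25,
}

# Hypothetical probabilities per relation type (not from the slides).
RELATION_PROB = {
    "same-section": 0.25,
    "windowed-section": 0.5,
    "reference-section": 0.5,
    "title-author": 1.0,
    "section-title-body": 1.0,
}


def evidence_score(match_type: str, relation: str, p_person: float,
                   p_query_given_doc: float, p_doc: float) -> float:
    """Unnormalized P(e|q): product of the five evidence-matching factors."""
    return (
        TOPIC_MATCH_PROB[match_type]   # P(type(t, q))
        * p_person                     # P(p)
        * RELATION_PROB[relation]      # P(type(r))
        * p_query_given_doc            # P(q | d), dynamic quality
        * p_doc                        # P(d), static quality
    )


score = evidence_score("bigram", "title-author",
                       p_person=1.0, p_query_given_doc=0.5, p_doc=0.25)
print(score)
```

Since the score is only proportional to P(e | q), it is useful for comparing pieces of evidence for the same query rather than as an absolute probability.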


Schedule

1. Introduction

2. Fine-grained Expert Search

3. Experimental Results

4. Conclusion


Experimental Result

W3C Corpus
• 331,307 web pages
• 10 training topics of TREC 2005
• 50 test topics of TREC 2005
• 49 test topics of TREC 2006

Evaluation Metrics
• Mean average precision (MAP)
• R-precision (R-P)
• Top N precision (P@N)
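The three metrics have standard definitions, sketched below in Python for a ranked list of candidate ids and a set of relevant ids. The function names and the toy rankings are made up for illustration; official TREC scores are normally computed with an evaluation tool rather than code like this.

```python
from typing import List, Sequence, Set, Tuple


def precision_at(ranked: Sequence[str], relevant: Set[str], n: int) -> float:
    """P@N: fraction of the top n results that are relevant."""
    return sum(1 for c in ranked[:n] if c in relevant) / n


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """R-precision: precision at rank R, where R is the number of relevant items."""
    r = len(relevant)
    return precision_at(ranked, relevant, r) if r else 0.0


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant item appears."""
    hits, total = 0, 0.0
    for rank, candidate in enumerate(ranked, start=1):
        if candidate in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """MAP: average precision averaged over all topics."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Two toy topics with invented candidate ids.
topic_1 = (["a", "b", "c", "d"], {"a", "c"})
topic_2 = (["x", "y"], {"y"})

print(average_precision(*topic_1))
print(r_precision(*topic_1))
print(precision_at(topic_1[0], topic_1[1], 2))
print(mean_average_precision([topic_1, topic_2]))
```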


Experimental Result -- Query Matching

                   TREC 2005                 TREC 2006
                   MAP     R-P     P@10      MAP     R-P     P@10
Baseline           0.1840  0.2136  0.3060    0.3752  0.4585  0.5604
+Bi-gram           0.1957  0.2438  0.3320    0.4140  0.4910  0.5799
+Proximity         0.2024  0.2501  0.3360    0.4530  0.5137  0.5922
+Fuzzy, Stemmed    0.2030  0.2501  0.3360    0.4580  0.5112  0.5901
Improv.            10.33%  17.09%  9.80%     22.07%  11.49%  5.30%
T-test             0.0084                    0.0000
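The "Improv." row of the Query Matching results appears to be the relative gain of the final configuration (+Fuzzy, Stemmed) over the baseline. The short Python check below reproduces those percentages from the reported baseline and final values under that reading; the helper name `relative_improvement` is made up for illustration.

```python
def relative_improvement(baseline: float, final: float) -> float:
    """Relative gain of final over baseline, in percent."""
    return (final - baseline) / baseline * 100.0


# (baseline, final) pairs copied from the Query Matching results.
query_matching = {
    "TREC 2005 MAP": (0.1840, 0.2030),
    "TREC 2005 R-P": (0.2136, 0.2501),
    "TREC 2005 P@10": (0.3060, 0.3360),
    "TREC 2006 MAP": (0.3752, 0.4580),
    "TREC 2006 R-P": (0.4585, 0.5112),
    "TREC 2006 P@10": (0.5604, 0.5901),
}

for name, (baseline, final) in query_matching.items():
    print(f"{name}: {relative_improvement(baseline, final):.2f}%")
```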


Experimental Result -- Person Matching

                     TREC 2005                 TREC 2006
                     MAP     R-P     P@10      MAP     R-P     P@10
Baseline             0.2030  0.2501  0.3360    0.4580  0.5112  0.5901
+Combined Name       0.2056  0.2539  0.3463    0.4709  0.5152  0.5931
+Abbr. Name          0.2106  0.2545  0.3400    0.5010  0.5181  0.6000
+Short Name          0.2111  0.2578  0.3400    0.5121  0.5192  0.6000
+Alias, new email    0.2156  0.2591  0.3400    0.5221  0.5212  0.6000
Improv.              6.21%   3.60%   1.19%     14.00%  1.96%   1.68%
T-test               0.0064                    0.0057


Experimental Result -- Multiple Relations

                       TREC 2005                 TREC 2006
                       MAP     R-P     P@10      MAP     R-P     P@10
Baseline               0.2156  0.2591  0.3400    0.5221  0.5212  0.6000
+Windowed Section      0.2158  0.2633  0.3380    0.5255  0.5311  0.6082
+Reference Section     0.2160  0.2630  0.3380    0.5272  0.5314  0.6061
+Title-Author          0.2234  0.2634  0.3580    0.5354  0.5355  0.6245
+Section Title-Body    0.2586  0.3107  0.3740    0.5657  0.5669  0.6510
Improv.                19.94%  19.91%  10.00%    8.35%   8.77%   8.50%
T-test                 0.0013                    0.0043


Experimental Result -- Evidence Quality

                    TREC 2005                 TREC 2006
                    MAP     R-P     P@10      MAP     R-P     P@10
Baseline            0.2586  0.3107  0.3740    0.5657  0.5669  0.6510
+Static quality     0.2711  0.3188  0.3720    0.5900  0.5813  0.6796
+Dynamic quality    0.2755  0.3252  0.3880    0.5943  0.5877  0.7061
Improv.             6.13%   4.67%   3.74%     2.86%   3.67%   8.61%
T-test              0.0360                    0.0252

Rank 1 @TREC        0.2749  0.3330  0.4520    0.5947  0.5783  0.7041


Schedule

1. Introduction

2. Fine-grained Expert Search

3. Experimental Results

4. Conclusion


Conclusion

Fine-grained expert search

Probabilistic model and its implementation

Evaluation on the TREC data set
