Advisor-advisee Relationship Mining from Research Publication Network
Chi Wang1, Jiawei Han1, Yuntao Jia1, Jie Tang2, Duo Zhang1, Yintao Yu1, Jingyi Guo2
1 University of Illinois at Urbana-Champaign, {chiwang1, hanj, yjia3, dzhang22, yintao}@illinois.edu
2 Tsinghua University {jietang, guojy07@mails}.tsinghua.edu.cn
Motivation
• Latent knowledge in information networks:
  – Relationships: friends / relatives / colleagues / enemies?
• If such relationships can be mined from links, it will benefit our study in:
  – Community structure: clustering & classification
  – Expert searching: search & ranking
  – Evolution patterns: prediction & recommendation
Overall Framework
• ai: author i
• pj: paper j
• py: paper year
• pn: number of papers
• sti,yi: starting time
• edi,yi: ending time
• ri,yi: ranking score
Heuristics
• ASSUMPTION 1: at each time t during the publication history of a node x, x is either being advised or not being advised. Once x starts to advise another node, it will never be advised again.
• ASSUMPTION 2: for a given pair of advisor and advisee, the advisor always has a longer publication history than the advisee.
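The two assumptions can be read as consistency checks on a candidate set of advising relations. The Python sketch below is only illustrative: the `Node` record, its field names, the use of the first publication year as a proxy for publication history, and the strict inequalities are all assumptions made here, not definitions from the slides.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Node:
    first_pub_year: int             # year of the author's first publication
    advisor: Optional[str] = None   # hypothesized advisor, None if never advised
    st: Optional[int] = None        # year the advising starts
    ed: Optional[int] = None        # year the advising ends


def find_violations(nodes: Dict[str, Node]) -> List[Tuple[str, int]]:
    """Return (author, assumption number) pairs for every violated assumption."""
    violations = []
    for name, node in nodes.items():
        if node.advisor is None:
            continue
        adv = nodes[node.advisor]
        # Assumption 1: the advisor must have stopped being advised
        # before starting to advise this author (strictness is assumed here).
        if adv.advisor is not None and adv.ed >= node.st:
            violations.append((name, 1))
        # Assumption 2: the advisor has the longer publication history,
        # approximated here by an earlier first publication year.
        if adv.first_pub_year >= node.first_pub_year:
            violations.append((name, 2))
    return violations


nodes = {
    "prof": Node(first_pub_year=1985),
    "alice": Node(first_pub_year=2000, advisor="prof", st=2000, ed=2004),
    "bob": Node(first_pub_year=2003, advisor="alice", st=2003, ed=2007),
}
print(find_violations(nodes))  # [('bob', 1)]: alice is still being advised in 2003
```

In this toy input, bob's advising period starts in 2003 while alice is still advised until 2004, so the pair conflicts with Assumption 1.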
Stage 1: Preprocessing
• Convert the author-paper bipartite network into a homogeneous coauthor (collaboration) network.
• Then apply a filtering step to remove unlikely advisor-advisee relations.
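The bipartite-to-coauthor conversion described above might be sketched as follows. The input layout (a list of `(year, authors)` records) and the function name `coauthor_network` are hypothetical; the slides do not specify a data format, and the filtering step is not shown here.

```python
from collections import defaultdict
from itertools import combinations


def coauthor_network(papers):
    """Project (year, authors) paper records onto author pairs.

    Returns {(a, b): {year: joint_paper_count}} with a < b alphabetically.
    """
    edges = defaultdict(lambda: defaultdict(int))
    for year, authors in papers:
        # Each unordered pair of distinct coauthors shares this paper.
        for a, b in combinations(sorted(set(authors)), 2):
            edges[(a, b)][year] += 1
    return {pair: dict(by_year) for pair, by_year in edges.items()}


papers = [
    (2001, ["alice", "prof"]),
    (2002, ["alice", "prof", "carol"]),
    (2002, ["alice", "prof"]),
]
network = coauthor_network(papers)
```

Keeping per-year joint counts on each edge, rather than a single weight, preserves the time information that the later estimates of starting and ending time rely on.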
Stage 1: Preprocessing
• Author aj is not considered to be ai’s advisor if one of the following conditions holds:
Stage 1: Preprocessing
• In addition, estimate:
  – the starting time stij, estimated as the time the two authors started to collaborate;
  – the ending time edij, which can be estimated, for instance, as the time point when the Kulczynski measure starts to decrease;
  – the local likelihood lij of aj being ai’s advisor.
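As an illustration of the time estimates, the sketch below uses the standard Kulczynski measure, the average of the two conditional ratios n_ij/n_i and n_ij/n_j, where n_i and n_j are the paper counts of the two authors and n_ij is their joint paper count. How the measure is tracked over time is not given on the slide: computing it from cumulative counts per year, and taking the last year before the first drop as the ending time, are assumptions made here, as are the function names.

```python
def kulczynski(n_i, n_j, n_ij):
    """Kulczynski measure: the mean of the two conditional co-occurrence ratios."""
    if n_i == 0 or n_j == 0:
        return 0.0
    return 0.5 * (n_ij / n_i + n_ij / n_j)


def estimate_interval(pubs_i, pubs_j, pubs_ij):
    """Estimate (st_ij, ed_ij) from {year: paper_count} dictionaries.

    pubs_i and pubs_j count all papers of each author; pubs_ij counts joint papers.
    """
    joint_years = [year for year, count in pubs_ij.items() if count > 0]
    if not joint_years:
        return None  # the two authors never collaborated
    st = min(joint_years)
    last_year = max(list(pubs_i) + list(pubs_j))
    ed, previous = last_year, None
    for year in range(st, last_year + 1):
        # Assumption: the measure for a year uses cumulative counts up to that year.
        n_i = sum(c for y, c in pubs_i.items() if y <= year)
        n_j = sum(c for y, c in pubs_j.items() if y <= year)
        n_ij = sum(c for y, c in pubs_ij.items() if y <= year)
        current = kulczynski(n_i, n_j, n_ij)
        if previous is not None and current < previous:
            ed = year - 1  # last year before the measure drops
            break
        previous = current
    return st, ed


advisee = {2001: 1, 2002: 2, 2003: 2, 2004: 2, 2005: 2}
advisor = {1995: 5, 2001: 2, 2002: 3, 2003: 3, 2004: 2, 2005: 2}
joint = {2001: 1, 2002: 2, 2003: 2}
interval = estimate_interval(advisee, advisor, joint)  # (2001, 2003)
```

In this made-up example the measure rises while every advisee paper is joint (2001 to 2003) and drops in 2004, when the advisee starts publishing without the advisor.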
Stage 2: Graph Factor Model
• For each node ai, there are three variables to decide: yi, sti, and edi. Suppose a local feature function g(yi, sti, edi) has already been defined on these three variables for any given node.
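To make the role of g(yi, sti, edi) concrete, the sketch below scores every joint assignment on a toy example by multiplying the local feature values and discarding assignments that break Assumption 1. This exhaustive search is a stand-in chosen for illustration, not the inference procedure of the graph factor model; the candidate tuples, the product scoring, and all names are assumptions.

```python
from itertools import product


def consistent(assignment):
    """Check Assumption 1: an advisor is no longer advised once advising starts."""
    for node, (advisor, st, ed, _) in assignment.items():
        if advisor is None or advisor not in assignment:
            continue
        adv_advisor, _, adv_ed, _ = assignment[advisor]
        if adv_advisor is not None and adv_ed >= st:
            return False
    return True


def best_joint_assignment(candidates):
    """candidates: {node: [(advisor or None, st, ed, g_value), ...]}."""
    nodes = sorted(candidates)
    best, best_score = None, -1.0
    # Enumerate one candidate per node; feasible only for tiny examples.
    for choice in product(*(candidates[n] for n in nodes)):
        assignment = dict(zip(nodes, choice))
        if not consistent(assignment):
            continue
        score = 1.0
        for _, _, _, g_value in choice:
            score *= g_value
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score


candidates = {
    "a": [("b", 2002, 2006, 0.8), ("c", 2005, 2008, 0.6), (None, None, None, 0.1)],
    "b": [("c", 2000, 2004, 0.7), (None, None, None, 0.3)],
    "c": [(None, None, None, 1.0)],
}
best, score = best_joint_assignment(candidates)
```

Picking each node's highest-scoring candidate independently would pair a with b and b with c, which conflicts with Assumption 1 because b would still be advised when a's advising starts. The joint search instead selects c as the advisor of both a and b.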
Experiment Results
• DBLP data: 654,628 authors and 1,076,946 publications, with publication years provided.
Datasets   RULE           SVM                     IndMAX                                       TPFG
           (heuristics)   (supervised learning)   empirical parameter   optimized parameter    empirical parameter   optimized parameter
TEST1      69.9%          73.4%                   75.2%                 78.9%                  80.2%                 84.4%
TEST2      69.8%          74.6%                   74.6%                 79.0%                  81.5%                 84.3%
TEST3      80.6%          86.7%                   83.1%                 90.9%                  88.8%                 91.3%
Case Study
Advisee          Top Ranked Advisor       Time    Note
David M. Blei    1. Michael I. Jordan     01-03   PhD advisor, 2004 grad
                 2. John D. Lafferty      05-06   Postdoc, 2006
Hong Cheng       1. Qiang Yang            02-03   MS advisor, 2003
                 2. Jiawei Han            04-08   PhD advisor, 2008
Sergey Brin      1. Rajeev Motwani        97-98   “Unofficial advisor”