Upload
melvin-owen
View
216
Download
2
Tags:
Embed Size (px)
Citation preview
1
ArnetMiner– Extraction and Mining of Academic Social Networks
1Jie Tang, 1Jing Zhang, 1Limin Yao, 1Juanzi Li, 2Li Zhang, and 2Zhong Su
1Knowledge Engineering Group, Dept. of Computer Science and Technology
Tsinghua University2IBM, China Research Lab
August 25th 2008
2
Motivation
“Academic search is treated as
document search, but ignore semantics”
“The information need is not only
about publication…”
3
Examples – Expertise search
Researcher A
• When starting awork in a new research topic;
• Or brainstorming for novel ideas.
• Who are experts in this field?
• What are the top conferences in the field?
• What are the best papers?
• What are the top research labs?
4
Examples – Citation network analysis
Researcher B • an in-depth understanding of the research field?
Self-Indexing Inverted Files for Fast Text Retrieval
StaticIndex Pruning for Information
Retrieval Systems
Signature les: An access Method for Documents and
its Analytical Performance Evaluation
FilteredDocument Retrieval with Frequency-Sorted Indexes
Vector-space Ranking with Effective Early Termination
Efficient Document Retrieval in Main Memory
A Document-centric Approach to Static Index Pruning in Text
Retrieval Systems
An Inverted Index Implementation
Parameterised Compression for Sparse Bitmaps
Introduction of Modern Information Retrieval
Memory Efficient Ranking
Topic 31: Ranking and Inverted Index
Topic 27: Information retrieval
Topic 1 : Theory
Topic 21: Framework
Topic 22: Compression
Other
Topic 23: Index method
Topic 34: Parallel computing
Basic theoryComparable workOther
Citation Relationship Type
Topics
5
Which conference should we submit the paper?
Researcher C
authors
content
Examples – Conference Suggestion
6
Who are best matching reviewers for each paper?
KDD Committee
Paper content
conference
Examples – Reviewer Suggestion
10
Ruud Bolle Office: 1S-D58 Letters: IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 USA Packages: IBM T.J. Watson Research Center 19 Skyline Drive Hawthorne, NY 10532 USA Email: [email protected] Ruud M. Bolle was born in Voorburg, The Netherlands. He received the Bachelor's Degree in Analog Electronics in 1977 and the Master's Degree in Electrical Engineering in 1980, both from Delft University of Technology, Delft, The Netherlands. In 1983 he received the Master's Degree in Applied Mathematics and in 1984 the Ph.D. in Electrical Engineering from Brown University, Providence, Rhode Island. In 1984 he became a Research Staff Member at the IBM Thomas J. Watson Research Center in the Artificial Intelligence Department of the Computer Science Department. In 1988 he became manager of the newly formed Exploratory Computer Vision Group which is part of the Math Sciences Department.
Currently, his research interests are focused on video database indexing, video processing, visual human-computer interaction and biometrics applications.
Ruud M. Bolle is a Fellow of the IEEE and the AIPR. He is Area Editor of Computer Vision and Image Understanding and Associate Editor of Pattern Recognition. Ruud M. Bolle is a Member of the IBM Academy of Technology.
DBLP: Ruud Bolle 2006
Nalini K. Ratha, Jonathan Connell, Ruud M. Bolle, Sharat Chikkerur: Cancelable Biometrics: A Case Study in Fingerprints. ICPR (4) 2006: 370-373
EE50
Sharat Chikkerur, Sharath Pankanti, Alan Jea, Nalini K. Ratha, Ruud M. Bolle: Fingerprint Representation Using Localized Texture Features. ICPR (4) 2006: 521-524
EE49
Andrew Senior, Arun Hampapur, Ying-li Tian, Lisa Brown, Sharath Pankanti, Ruud M. Bolle: Appearance models for occlusion handling. Image Vision Comput. 24(11): 1233-1243 (2006)EE48
2005
Ruud M. Bolle, Jonathan H. Connell, Sharath Pankanti, Nalini K. Ratha, Andrew W. Senior: The Relation between the ROC Curve and the CMC. AutoID 2005: 15-20EE47
Sharat Chikkerur, Venu Govindaraju, Sharath Pankanti, Ruud M. Bolle, Nalini K. Ratha: Novel Approaches for Minutiae Verification in Fingerprint Images. WACV. 2005: 111-116
EE46
...
Ruud Bolle Office: 1S-D58 Letters: IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 USA Packages: IBM T.J. Watson Research Center 19 Skyline Drive Hawthorne, NY 10532 USA Email: [email protected] Ruud M. Bolle was born in Voorburg, The Netherlands. He received the Bachelor's Degree in Analog Electronics in 1977 and the Master's Degree in Electrical Engineering in 1980, both from Delft University of Technology, Delft, The Netherlands. In 1983 he received the Master's Degree in Applied Mathematics and in 1984 the Ph.D. in Electrical Engineering from Brown University, Providence, Rhode Island. In 1984 he became a Research Staff Member at the IBM Thomas J. Watson Research Center in the Artificial Intelligence Department of the Computer Science Department. In 1988 he became manager of the newly formed Exploratory Computer Vision Group which is part of the Math Sciences Department.
Currently, his research interests are focused on video database indexing, video processing, visual human-computer interaction and biometrics applications.
Ruud M. Bolle is a Fellow of the IEEE and the AIPR. He is Area Editor of Computer Vision and Image Understanding and Associate Editor of Pattern Recognition. Ruud M. Bolle is a Member of the IBM Academy of Technology.
Motivating ExampleContact Information
Educational history
Academic services
Publications
1
1
2
2
Ruud Bolle
Position
Affiliation
Address
Address
PhdunivPhdmajor
Phddate
Msuniv
Msdate
Msmajor
BsunivBsdate
Bsmajor
Research Staff
IBM T.J. Watson Research CenterP.O. Box 704Yorktown Heights, NY 10598 USA
Brown University
1984
Electrical Engineering
Delft University of Technology
Analog Electronics
1977
Delft University of Technology
IBM T.J. Watson Research Center19 Skyline DriveHawthorne, NY 10532 USA
IBM T.J. Watson Research Center
Electrical Engineering
1980
Applied Mathematics
Msmajor
http://researchweb.watson.ibm.com/ecvg/people/bolle.html
Homepage
Ruud BolleName
video database indexingvideo processing
visual human-computer interactionbiometrics applications
Research_Interest
Photo
Publication 1#
Cancelable Biometrics: A Case Study in Fingerprints
ICPR 370
2006
Date
Start_page
Venue
Title
373
End_page
Publication 2#
Fingerprint Representation Using Localized Texture Features
ICPR 521
2006
Date
Start_page
Venue
Title
524
End_page
. . .
Co-authorCo-author
1
Ruud Bolle
2Publication #3
Publication #5
coauthor
coauthor
UIUCaffiliation
Professorposition
2
1
11
Ruud Bolle Office: 1S-D58 Letters: IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 USA Packages: IBM T.J. Watson Research Center 19 Skyline Drive Hawthorne, NY 10532 USA Email: [email protected] Ruud M. Bolle was born in Voorburg, The Netherlands. He received the Bachelor's Degree in Analog Electronics in 1977 and the Master's Degree in Electrical Engineering in 1980, both from Delft University of Technology, Delft, The Netherlands. In 1983 he received the Master's Degree in Applied Mathematics and in 1984 the Ph.D. in Electrical Engineering from Brown University, Providence, Rhode Island. In 1984 he became a Research Staff Member at the IBM Thomas J. Watson Research Center in the Artificial Intelligence Department of the Computer Science Department. In 1988 he became manager of the newly formed Exploratory Computer Vision Group which is part of the Math Sciences Department.
Currently, his research interests are focused on video database indexing, video processing, visual human-computer interaction and biometrics applications.
Ruud M. Bolle is a Fellow of the IEEE and the AIPR. He is Area Editor of Computer Vision and Image Understanding and Associate Editor of Pattern Recognition. Ruud M. Bolle is a Member of the IBM Academy of Technology.
DBLP: Ruud Bolle 2006
Nalini K. Ratha, Jonathan Connell, Ruud M. Bolle, Sharat Chikkerur: Cancelable Biometrics: A Case Study in Fingerprints. ICPR (4) 2006: 370-373
EE50
Sharat Chikkerur, Sharath Pankanti, Alan Jea, Nalini K. Ratha, Ruud M. Bolle: Fingerprint Representation Using Localized Texture Features. ICPR (4) 2006: 521-524
EE49
Andrew Senior, Arun Hampapur, Ying-li Tian, Lisa Brown, Sharath Pankanti, Ruud M. Bolle: Appearance models for occlusion handling. Image Vision Comput. 24(11): 1233-1243 (2006)EE48
2005
Ruud M. Bolle, Jonathan H. Connell, Sharath Pankanti, Nalini K. Ratha, Andrew W. Senior: The Relation between the ROC Curve and the CMC. AutoID 2005: 15-20EE47
Sharat Chikkerur, Venu Govindaraju, Sharath Pankanti, Ruud M. Bolle, Nalini K. Ratha: Novel Approaches for Minutiae Verification in Fingerprint Images. WACV. 2005: 111-116
EE46
...
Ruud Bolle Office: 1S-D58 Letters: IBM T.J. Watson Research Center P.O. Box 704 Yorktown Heights, NY 10598 USA Packages: IBM T.J. Watson Research Center 19 Skyline Drive Hawthorne, NY 10532 USA Email: [email protected] Ruud M. Bolle was born in Voorburg, The Netherlands. He received the Bachelor's Degree in Analog Electronics in 1977 and the Master's Degree in Electrical Engineering in 1980, both from Delft University of Technology, Delft, The Netherlands. In 1983 he received the Master's Degree in Applied Mathematics and in 1984 the Ph.D. in Electrical Engineering from Brown University, Providence, Rhode Island. In 1984 he became a Research Staff Member at the IBM Thomas J. Watson Research Center in the Artificial Intelligence Department of the Computer Science Department. In 1988 he became manager of the newly formed Exploratory Computer Vision Group which is part of the Math Sciences Department.
Currently, his research interests are focused on video database indexing, video processing, visual human-computer interaction and biometrics applications.
Ruud M. Bolle is a Fellow of the IEEE and the AIPR. He is Area Editor of Computer Vision and Image Understanding and Associate Editor of Pattern Recognition. Ruud M. Bolle is a Member of the IBM Academy of Technology.
Motivating ExampleContact Information
Educational history
Academic services
Publications
1
1
2
2
Ruud Bolle
Position
Affiliation
Address
Address
PhdunivPhdmajor
Phddate
Msuniv
Msdate
Msmajor
BsunivBsdate
Bsmajor
Research Staff
IBM T.J. Watson Research CenterP.O. Box 704Yorktown Heights, NY 10598 USA
Brown University
1984
Electrical Engineering
Delft University of Technology
Analog Electronics
1977
Delft University of Technology
IBM T.J. Watson Research Center19 Skyline DriveHawthorne, NY 10532 USA
IBM T.J. Watson Research Center
Electrical Engineering
1980
Applied Mathematics
Msmajor
http://researchweb.watson.ibm.com/ecvg/people/bolle.html
Homepage
Ruud BolleName
video database indexingvideo processing
visual human-computer interactionbiometrics applications
Research_Interest
Photo
Publication 1#
Cancelable Biometrics: A Case Study in Fingerprints
ICPR 370
2006
Date
Start_page
Venue
Title
373
End_page
Publication 2#
Fingerprint Representation Using Localized Texture Features
ICPR 521
2006
Date
Start_page
Venue
Title
524
End_page
. . .
Co-authorCo-author
1
Ruud Bolle
2Publication #3
Publication #5
coauthor
coauthor
UIUCaffiliation
Professorposition
2
1
Two key issues:• How to accurately extract the researcher profile information from the Web?• How to integrate the information from different sources?
12
Researcher Network Extraction
Researcher
Homepage
Phone
Address
Phduniv
Phddate
PhdmajorMsuniv
Bsmajor
Bsdate
Bsuniv
Affiliation
Postion
Msmajor
Msdate
Fax
Person Photo
Publication
Research_Interest
NameAuthored
Title
Publication_venue
Start_page
End_page
Date
Coauthor
70.60% of the researchers have at least one homepage/introducing page
There are a large number of person names having the ambiguity problem
85.6% from universities 14.4% from companies
71.9% are homepages28.1% are introducing
pages60% are natural language text
40% are in lists and tables
70% moved at least one time
Even 3 “Yi Li” graduated from the author’s lab
300 most common male names are used by 1 billion+ people (78.74%) in USA
13
Our Approach Picture – based on Markov Random Field
aYbY
cY
eY
fY
dY
( | | )
( | | ~ )
i j j i
i j j i
P Y Y Y Y
P Y Y Y Y
Special cases: - Conditional Random Fields - Hidden Markov Random Fields
Markov Property:
x1
x2
x3
co-conference
cite
coauthor
cite
coauthor
coauthor
coauthort- coauthor
cite
co-conference
x4
x5
x6
x7
x8
x9
x10
x11
y1=1
y8=1
y9=3
y11=3
y10=3
y2=1
y3=1
y4=2
y7=2
y6=2
y5=2
Researcher Profiling Name Disambiguation
14
CRFs
He is a Professor at
OTH OTH
POS
AFF
…
POS
OTH
POS
UIUC
OTH
POS
AFF
OTH
POS
AFF
……
AFF
OTH
POS
AFF
… ADR
AFF
ADR
- Green nodes are hidden vars, - Purple nodes are observations
, ,
1( | ) exp ( , | , ) ( , | , )
( ) j j e k k ve E j v V k
p y x t e y x s v y xZ x
15
Token Definitions
Standard word Words in natural language
Special word
Including several general ‘special words’
e.g. email address, IP address, URL, date, number, money, percentage, unnecessary tokens (e.g. ‘===’ and ‘###’), etc.
Image token <IMAGE src="defaul3.jpg" alt=""/>
Term base NP, like “Computer Science”
Punctuation marks
Including period, question mark, and exclamation mark
16
Feature Definition• Content features
Word features
Morphological features
Whether the current token is a word
Whether the word is capitalized
Standard Word
Image Token
Image size The size of the image
Image height/width ratio The value of height/width. The value of a person photo is often larger than 1
Image format JPG or BMP
The number of the “unique color” used in the image and the number of bits used for per pixel, i.e. 32,24,16,8,1
Face recognition Whether the current image contains a person face
Whether the filename contains (partially) the researcher name
Image filename
Whether the “alt” of the image contains (partially) the researcher name
Image “ALT”
Image positive keywords “myself”, “biology”
Image negative keywords “ads”, “banner”, “logo”
Image color
17
Profiling Experiments
• Dataset– 1,000 researchers from ArnetMiner.org
• Baseline– Amilcare– Support Vector Machines– Unified_NT (CRFs without transition features)
• Evaluation measures– Precision, Recall, F1
18
Profiling Results—5-fold cross validation
Profiling Task Unified Unified_NT SVM AmilcarePhoto 89.11 88.64 88.86 31.62
Position 69.44 64.70 64.68 56.48
Affiliation 83.52 72.16 73.86 46.65
Phone 91.10 78.72 79.71 83.33
Fax 90.83 64.28 64.17 86.88
Email 80.35 75.47 79.37 78.70
Address 86.34 75.15 77.04 66.24
Bsuniv 67.38 57.56 59.54 47.17
Bsmajor 64.20 59.18 60.75 58.67
Bsdate 53.49 40.59 28.49 52.34
Msuniv 57.55 47.49 49.78 45.00
Msmajor 63.35 61.92 62.10 57.14
Msdate 48.96 41.27 30.07 56.00
Phduniv 63.73 53.11 57.01 59.42
Phdmajor 67.92 59.30 59.67 57.93
Phddate 57.75 42.49 41.44 61.19
Overall 83.37 72.09 73.57 62.3083.37
20
Name Disambiguation
Name Affiliation
Jing Zhang (26)
Shanghai Jiao Tong Univ.
Yunnan Univ.
Tsinghua Univ.
Alabama Univ.
Univ. of California, Davis
Carnegie Mellon University
Henan Institute of Education
Proposal of a semi-supervised framework
21
Our Method to Name Disambiguation
• A hidden Markov Random Field model
• Observable Variables X represent publications
• Hidden Variables Y represent the labels of publications
• Paper relationships define the dependencies over hidden variables
x1
x2
x3
co-conference
cite
coauthor
cite
coauthor
coauthor
coauthort- coauthor
cite
co-conference
x4
x5
x6
x7
x8
x9
x10
x11
y1=1
y8=1
y9=3
y11=3
y10=3
y2=1
y3=1
y4=2
y7=2
y6=2
y5=2
22
Objective Function
( | ) ( ) ( | )Y X Y X YP P P
2
1( | ) exp( ( , ))
i
i ix X
X Y D x yZ
P
1
1
1
1
1exp( ( ))
1exp( ( ))
1exp( ( , ))
1exp( ( , ) ( ) [ ( , )])
i
i
k
NN N
i j
i j i j k k i ji j c C
V YZ
V YZ
V i jZ
D x x I y y w c y yZ
{ ( , ) ( ) [ ( , )]} ( , ) logk i
obj i j i j k k i j i ii j c C x X
f D x x I y y w c y y D x y Z
maximize
minimize1 22
23
Relationship Definition
p1: A, B, Cp2: A, Bp3: A, Dp4: C, D
Mp(0): p1 p2 p3
p1 p2
p3
1 0 00 1 00 0 1
p1: A, B, Cp2: A, Bp3: A, Dp4: C, D
Mp(1): p1 p2 p3
p1 p2
p3
1 1 01 1 00 0 1
p1: A, B, Cp2: A, Bp3: A, Dp4: C, D
Mp(2): p1 p2 p3
p1 p2
p3
1 1 11 1 01 0 1
p1: A, B, Cp2: A, Bp3: A, Dp4: C, D
Mp(3): p1 p2 p3
p1 p2
p3
1 1 11 1 11 1 1
C W Relationship Descriptionc1 w1 Co-Conference pi.pubvenue = pj.pubvenue
c2 w2 CoAuthor r, s>0, ai(r)=aj
(s)
c3 w3 Citation pi cites pj or pj cites pi
c4 w4 Constraints Feedbacks supplied by users
c5 w5 τ-CoAuthor one common author in τ extension
24
We define the distance function as follows (Basu, 04):
where
We can see that actually maps each vector xi into
another new space, i.e. A1/2xi
To simplify our question, we define A as a diagonal matrix
Parameterized Distance Function
|| || Ti i iA Ax x x
( , ) 1|| || || ||
Ti j
i ji j
D A A
Ax xx x
x x
|| ||i Ax
25
• Initialization • use constraints to generate initial k clusters
• E-Step
• M-Step• Update cluster centroid• Update parameter matrix A
EM Framework
( , ) { ( , ) ( ) [ ( , )]} ( , )k
obj i i i j j k k i j i ii j c C
f y D x x I h l w c p p D x y
x
:
:
|| ||i
i
ii l h
ii
i l h
y
A
x
x
( , ) ( , ){ ( ) [ ( , )]}
k i
obj i j i ii j k k i j
i j c C x Xm m m
f D x x D x yI l l w c p p
a a a
2 2|| || || ||
|| || || ||( , ) 2 || || || ||
|| || || ||
im i jm jTim jm i j i j
i j i j
m i j
x x x xx x x x x x
D x x x x
a x x
2 2A A
A AA A
2 2A A
A
26
Disambiguation Experiments
• Data set:
Abbr. Name#Public-
ations#Actual Person
Abbr. Name#Public-ations
#Actual Person
Cheng Chang 12 3 Gang Wu 40 16
Wen Gao 286 4 Jing Zhang 54 25
Yi Li 42 21 Kuo Zhang 6 2
Jie Tang 21 2 Hui Fang 15 3
Bin Yu 66 12 Lei Wang 109 40
Rakesh Kumar 61 5 Michael Wagner 44 12
Bing Liu 130 11 Jim Smith 33 5
27
Our Approach vs. Baseline
Data Set Person NameBaseline (Tan, 2006) Our Approach
Our Approach(w/o relation)
Prec. Rec. F1 Prec. Rec. F1 Prec. Rec. F1
Real Name
Cheng Chang 100.0 100.0 100.0 100.0 100.0 100.0 72.73 64.00 68.09Wen Gao 96.60 62.64 76.00 99.29 98.59 98.94 96.17 33.53 49.72
Yi Li 86.64 95.12 90.68 70.91 97.50 82.11 20.97 31.71 25.25Jie Tang 100.0 100.0 100.0 100.0 100.0 100.0 88.68 54.65 67.63Gang Wu 97.54 97.54 97.54 71.86 98.36 83.05 57.69 36.89 45.00
Jing Zhang 85.0 69.86 76.69 83.91 100.0 91.25 10.79 20.55 14.15Kuo Zhang 100.0 100.0 100.0 100.0 100.0 100.0 66.67 40.00 50.00Hui Fang 100.0 100.0 100.0 100.0 100.0 100.0 64.71 70.97 67.69
Bin Yu 67.22 50.25 57.51 86.53 53.00 65.74 26.50 23.25 24.77Lei Wang 68.45 41.12 51.38 88.64 89.06 88.85 24.13 25.16 24.63
Rakesh Kumar 63.36 92.41 75.18 99.14 96.91 98.01 67.11 43.04 52.45
Michael Wagner
18.35 60.26 28.13 85.19 76.16 80.42 36.89 50.33 42.57
Bing Liu 84.88 43.16 57.22 88.25 86.49 87.36 66.14 19.51 30.13Jim Smith 92.43 86.80 89.53 95.81 93.56 94.67 60.94 59.39 60.16
Avg. 82.89 78.51 80.64 90.68 92.12 91.39 54.29 40.93 46.67
29
Contribution of Relationships
0.00
20.00
40.00
60.00
80.00
100.00
Pre. Rec. F1
w/o Relationship
+CoConference
+Citation
+CoAuthor
All
30
Distribution Analysis
(1) All methods can achieve good performance
(2) Our method can achieve good performance
(3) Our method can obtain not bad results, but still need further improvements
32
The Academic Network
writewrite
cite
cite
cite
write
write
write
cite
Write
publish
publish
publish
publish
publish
publish
write
write
coauthor coauthor
Dr. Tang
Limin
Prof. Wang
Prof. Li
SVM...Association...
Tree CRF...
Semantic...EOS... Annotation...
IJCAI
ISWC
WWW
Pc member
Academic Network
Heterogeneous objects:
Paper, Person, Conf./Journal
Relationships:
•Conf./Journal publish paper
•Paper cite paper
•Person write paper
•Person is PC member of Conf./Journal
•Person is coauthor of person
Challenges:- How to model the heterogeneous objects in a unified approach?- How to apply the modeling approach to different applications?
33
Modeling the Academic Network
T
DNd
wzxad
β
Φ
α
A
θ
c
T
μ ψ
T
DNd
wzx
ad
β
Φ
α
ACθ
c
T
D
Nd
wz
β
Φ
c
η,σ2
ad x
α
A
θ
ACT1 ACT2 ACT3
authors
Topic
words
conference
34
Generative Story of ACT1 Model
• Generative process
Shafiei
Milios
NLP
MLDM
IR
ML
NLPIR
DM
Latent Dirichlet Co-clustering
Shafiei and Milios
We present a generative model for clustering documents and terms. Our model is a four hierarchical bayesian model. We present efficient inference techniques based on Markow Chain Monte Carlo. We report results in document modeling, document and terms clustering …
ICDM 0.23KDD 0.19….
mining 0.23clustering 0.19classification 0.17….
ICML 0.23NIPS 0.19….
model 0.23learning 0.19boost 0.17….
P(c|z)
P(w|z)
P(c|z)
P(w|z)
clustering
inference
ICDM
Paper
NIPS
35
ACT Model 1
Generative process:
T
DNd
wzxad
β
Φ
α
A
θ
c
T
μ ψ
ACT1
authors
Topic
words
conference
38
Hot topic on a conference
Applications
Researcher interests
Expertise search
Association search
T
DNd
wzxad
β
Φ
α
A
θ
c
T
μ ψ
Topic browser
39
Expertise Search
• Calculate the relevance of query q and different objects (i.e., papers, authors, and conferences)
• E.g.,( | ) ( | )
w qP q d P w d
( | ) ( | ) ( | )LM ACTP w d P w d P w d
( , ) ( , )( | ) (1 )d d
LMd d d
N Ntf w d tf wP w d
N N N
D
D
1 1
( | ) ( | , ) ( | , ) ( | )dAT
ACT z xz x
P w d P w z P z x P x d
40
Expertise Search Results
Arnetminer data: 14,134 authors 10,716 papers 1,434 confs/journals
Evaluation measures: pooled relevance + human judgement
Baselines: - Language Model (LM) - LDA - Author Topic (AT)
43
* Arnetminer data: > 0.5 M researcher profiles > 2M papers > 8M citation relationships > 4K conferences * Visits come from more than 165 countries* Continuously +20% increase of visits per month
* Currently, more than 1,500 unique-ip visits per day.
ArnetMiner Today
Top 10 countries1. USA 6. Canada2. China 7. Japan3. Germany 8. France4. India 9. Taiwan5. UK 10. Italy
47
Association Search
Finding associations between persons - high efficiency - Top-K associations
Usage: - to find a partner - to find a person with same interests
50
Acknowledgements
• National Science Foundation of China (NSFC)
• National 985 Funding
• Chinese Young Faculty Research Funding
• Minnesota-China Collaboration Project
• IBM CRL
• Tsinghua-Google Joint Research Project
• National Foundation Science Research (973)