School of Information, University of Michigan
Expertise Sharing Dynamics in Online Forums
Lada Adamic
joint work with Jun Zhang, Mark Ackerman, Eytan Bakshy, Jiang Yang
School of Information, University of Michigan
Outline of talk
Knows
Knowledge iN
Oozing out knowledge
“Knowledge search is like oozing out knowledge in human brains to the Internet. People who know something better than others can present their know-how, skills or knowledge.”
-- NHN CEO Chae Hwi-young
“(It is) the next generation of search… (it) is a kind of collective brain -- a searchable database of everything everyone knows. It's a culture of generosity. The fundamental belief is that everyone knows something.”
-- Eckart Walther (Yahoo Research)
Limitations of Current Systems
• The Response Time Gap
[Figure: box plot of wait time (min, 0 to 10,000) by expertise rating (low vs. high)]
• The Expertise Gap
• Difficult to infer reliability of answers
Automatically ranking expertise may be helpful.
Related work
• Analysis of online communities
  • NetScan (Smith, Fisher, et al. at Microsoft)
  • Social network analysis (LiveJournal, blog communities)
  • Motivations of online participation (Lakhani & von Hippel, Kraut)
• Graph-based ranking algorithms
  • PageRank, HITS, etc.
• Expertise sharing studies
  • Expertise recommenders
    • ContactFinder (Krulwich et al.), Answer Garden (Ackerman)
    • Small Blue (Lin)
  • Automatic evaluation of expertise levels
    • Using different text resources (Kautz et al. and many others)
    • Using email networks (Campbell et al.)
• Best answers correspond to best questions (Agichtein et al.)
Java Forum
• 87 sub-forums
• 1,438,053 messages
• community expertise network constructed:
  • 196,191 users
  • 796,270 edges
Constructing a community expertise network
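Thread 1: “Large Data, binary search or hashtable?” posted by user A, with replies from user B and user C
Thread 2: “Binary file with ASCII data” posted by user A, with a reply from user C
[Figure: four versions of the resulting network on nodes A, B, and C, with edges from the asker to each replier]
• unweighted: A→B = 1, A→C = 1
• weighted by # threads: A→B = 1, A→C = 2
• weighted by shared credit: A→B = 1/2, A→C = 1 + 1/2
• weighted with backflow: edge weights 0.9 and 0.1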
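The first three weighting schemes for Thread 1 and Thread 2 can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function name `build_expertise_network`, the `(asker, repliers)` input format, and the scheme names are assumptions made for the example, and the backflow variant is left out because the slides do not define it.

```python
from collections import defaultdict


def build_expertise_network(threads, weighting="unweighted"):
    """Return {(asker, replier): weight} from a list of (asker, [repliers]) threads.

    Hypothetical helper: edges point from the asker to each replier.
    """
    edges = defaultdict(float)
    for asker, repliers in threads:
        # Count each replier once per thread and ignore self-replies.
        unique = [r for r in dict.fromkeys(repliers) if r != asker]
        for replier in unique:
            if weighting == "unweighted":
                edges[(asker, replier)] = 1.0
            elif weighting == "threads":
                # One unit per thread in which the replier answered the asker.
                edges[(asker, replier)] += 1.0
            elif weighting == "shared_credit":
                # One unit per thread, split evenly among that thread's repliers.
                edges[(asker, replier)] += 1.0 / len(unique)
            else:
                raise ValueError(f"unknown weighting: {weighting}")
    return dict(edges)


# Thread 1: A asks, B and C reply. Thread 2: A asks, C replies.
threads = [("A", ["B", "C"]), ("A", ["C"])]
by_threads = build_expertise_network(threads, "threads")
by_credit = build_expertise_network(threads, "shared_credit")
```

With the two threads above, `by_threads` gives A→B = 1 and A→C = 2, and `by_credit` gives A→B = 1/2 and A→C = 1 + 1/2.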
Uneven participation
[Figure: cumulative probability vs. degree (k) on log-log axes, for the number of people one received replies from and the number of people one replied to; power-law fit with exponent 1.87, R² = 0.9730]
• ‘answer people’ may reply to thousands of others
• ‘question people’ are also uneven in the number of repliers to their posts, but to a lesser extent
Not Everyone Asks/Replies
• Core: a strongly connected component in which everyone asks and answers
• IN: mostly askers
• OUT: mostly helpers
The Web is a bow tie. The Java Forum network is an uneven bow tie.
fragment of the Java Forum
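The Core / IN / OUT split can be computed directly from asker-to-replier edges. The sketch below is an illustration under stated assumptions: the function names and the toy edge list are hypothetical, the largest strongly connected component is taken as the core, and the simple reachability-based search is only suitable for small graphs.

```python
from collections import defaultdict


def reachable(adj, sources):
    """All nodes reachable from `sources` by following edges in `adj`."""
    seen = set(sources)
    stack = list(sources)
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def bow_tie(edges):
    """Split nodes into (core, in_part, out_part, other) for asker -> replier edges."""
    fwd, rev = defaultdict(set), defaultdict(set)
    nodes = set()
    for asker, replier in edges:
        fwd[asker].add(replier)
        rev[replier].add(asker)
        nodes.update((asker, replier))
    # A node's strongly connected component is the set of nodes it can reach
    # that can also reach it. Keep the largest such component as the core.
    core, assigned = set(), set()
    for node in sorted(nodes):
        if node in assigned:
            continue
        scc = reachable(fwd, [node]) & reachable(rev, [node])
        assigned |= scc
        if len(scc) > len(core):
            core = scc
    in_part = reachable(rev, core) - core   # ask the core, never get asked by it
    out_part = reachable(fwd, core) - core  # answer the core, never ask it
    other = nodes - core - in_part - out_part
    return core, in_part, out_part, other


# Toy data: a, b, c ask and answer each other; q only asks; h only helps.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("q", "a"), ("b", "h"), ("x", "y")]
core, in_part, out_part, other = bow_tie(edges)
```

Because edges run from asker to replier, users who only ask end up in IN and users who only answer end up in OUT, matching the labels above.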
Relating network structure to Java expertise
• Human-rated expertise levels
  • 2 raters
  • 135 JavaForum users with >= 10 posts
  • inter-rater agreement (τ = 0.74, ρ = 0.83)
  • for evaluation of algorithms, omit users where raters disagreed by more than 1 level (τ = 0.80, ρ = 0.83)
Level  Category           Description
5      Top Java expert    Knows the core Java theory and related advanced topics deeply.
4      Java professional  Can answer all or most Java concept questions. Also knows one or more subtopics very well.
3      Java user          Knows advanced Java concepts. Can program relatively well.
2      Java learner       Knows basic concepts and can program, but is not good at advanced topics of Java.
1      Newbie             Just starting to learn Java.
Algorithm Rankings vs. Human Ratings
Simple local measures do as well as (and sometimes better than) measures incorporating the wider network topology.
[Figure: bar chart (vertical axis 0 to 0.9) comparing Top K, Kendall’s τ, and Spearman’s ρ for six rankings: # answers, z-score # answers, indegree, z-score indegree, PageRank, HITS authority]
automated vs. human ratings
[Figure: six box-plot panels of automated ranking (vertical axis, roughly -20 to 160) against human rating (LEVCOM, 2 to 10): # answers, z # answers, indegree, z indegree, PageRank, HITS authority]
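Agreement between an automated ranking and the human ratings can be measured with rank correlations such as Spearman’s ρ and Kendall’s τ. The sketch below is a plain-Python illustration. It assumes the tie-corrected τ-b variant, which the slides do not specify, and the ratings and answer counts are made-up values, not data from the study.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """Kendall's tau-b: concordant minus discordant pairs, corrected for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: counted in neither total
        if dx == 0:
            ties_x += 1
        elif dy == 0:
            ties_y += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    pairs = concordant + discordant
    return (concordant - discordant) / sqrt((pairs + ties_x) * (pairs + ties_y))


# Hypothetical example: human ratings (1 to 5) and number of answers per user.
human_rating = [5, 4, 4, 2, 1]
num_answers = [120, 80, 95, 10, 3]
rho = spearman_rho(human_rating, num_answers)
tau = kendall_tau_b(human_rating, num_answers)
```

Any of the automated scores (# answers, indegree, PageRank, and so on) could be passed in place of `num_answers` to compare them on the same footing.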
Modeling community structure to explain algorithm performance
Control parameters:
• Distribution of expertise
• Who asks questions most often?
• Who answers questions most often? (the best expert, or most likely someone a bit more expert)
ExpertiseNet Simulator
Simulating probability of expertise pairing
[Figure: two heat maps of pairing probability by asker expertise and replier expertise (each 0 to 5); color scales run from 0.02 to 0.12 and from 0 to 0.15]
suppose:
expertise is uniformly distributed
probability of posing a question is inversely proportional to expertise
pij = probability a user with expertise j replies to a user with expertise i
2 models:
• ‘best’ preferred: $p_{ij} \sim e^{\beta (j - i)} / i$
• ‘just better’ preferred: $p_{ij} \sim e^{\gamma (i - j)} / i, \quad j > i$
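The two preference models can be tabulated numerically to see how they differ. This is a sketch under assumptions, not the ExpertiseNet Simulator itself: expertise levels are taken as the integers 1 to 5 (so that the 1/i asking term is defined), a single `rate` parameter stands in for β or γ, and the weights are normalized over all asker-replier pairs.

```python
from math import exp


def pairing_matrix(levels, model, rate=1.0):
    """Return {(i, j): probability that a level-j user replies to a level-i asker}.

    Hypothetical helper; `rate` plays the role of beta ("best") or gamma ("just_better").
    """
    weights = {}
    for i in levels:
        for j in levels:
            if model == "best":
                # Higher replier expertise is always preferred.
                w = exp(rate * (j - i)) / i
            elif model == "just_better":
                # Only more expert users reply, and closer levels are preferred.
                w = exp(rate * (i - j)) / i if j > i else 0.0
            else:
                raise ValueError(f"unknown model: {model}")
            weights[(i, j)] = w
    total = sum(weights.values())
    return {pair: w / total for pair, w in weights.items()}


levels = range(1, 6)  # assumed expertise levels 1..5
best = pairing_matrix(levels, "best")
just_better = pairing_matrix(levels, "just_better")

# Most likely replier level for a level-1 asker under each model.
top_best = max(levels, key=lambda j: best[(1, j)])
top_just_better = max(levels, key=lambda j: just_better[(1, j)])
```

For a level-1 asker, the ‘best’ model favors the top level, while the ‘just better’ model favors level 2, the nearest level above the asker.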
Visualization
[Figure: network visualizations for the ‘best’ preferred and ‘just better’ models]
Degree correlation profiles
[Figure: degree correlation profiles, with asker indegree on the vertical axis, in three panels: best preferred (simulation), just better (simulation), and the Java Forum network]
Simulation can tell us when to use which algorithm
Preferred Helper: ‘just better’
Preferred Helper: ‘best available’
Different ranking algorithms perform differently
In the ‘just better’ model, a node is correctly ranked by PageRank but not by HITS
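To make the contrast between local and network-wide measures concrete, here is a minimal PageRank sketch on an asker-to-replier network. It is an illustration only: the damping factor 0.85, the uniform handling of users with no outgoing edges, and the four-user toy network are assumptions, and HITS is not implemented.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Power-iteration PageRank over asker -> replier edges (repliers gain rank)."""
    nodes = sorted({n for edge in edges for n in edge})
    out_links = {n: [] for n in nodes}
    for asker, replier in edges:
        out_links[asker].append(replier)
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Rank held by users who never ask is spread evenly over everyone.
        dangling = sum(rank[n] for n in nodes if not out_links[n])
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {n: base for n in nodes}
        for n in nodes:
            for target in out_links[n]:
                new_rank[target] += damping * rank[n] / len(out_links[n])
        rank = new_rank
    return rank


# Toy 'just better' chain: two newbies ask "mid", and "mid" asks "expert".
edges = [("newbie1", "mid"), ("newbie2", "mid"), ("mid", "expert")]
ranks = pagerank(edges)
indegree = {n: sum(1 for _, r in edges if r == n) for n in ranks}
```

In this toy chain, indegree puts `mid` first (two askers), while PageRank puts `expert` first because rank propagates along the chain of askers.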
Summary for a focused expertise sharing forum
• Expertise networks have interesting characteristics
• A set of useful metrics
• Ranking algorithms are affected by network structures
• Simulation as an analysis tool
• There are rich design opportunities
  • Find experts with the help of structural information (and content analysis)
  • Predict good answers
  • Re-order questions/answers to match expertise
UIST2007: “Expertise-Level based Interface Personalization for Online Help-seeking Communities”
• Looking at diverse sets of question-answer forums (Yahoo Answers)
• Expertise across different topics
Not everyone is a Java expert, but everyone knows something...
[Figure: example category hierarchy: cars & transportation > maintenance & repairs; beauty & style > hair]
w/ Jun Zhang, Eytan Bakshy, Mark Ackerman
Yahoo! Answers
• A community-driven knowledge market site, allowing users to ask and answer questions
• About 80 million answers since launch in Dec 2005
• Similar sites: Naver, Baidu-knows
• our sample
  • 1 month
  • 8,452,337 answers
  • 1,178,983 questions
  • unique repliers: 433,402
  • unique askers: 495,414
  • users who are both askers and helpers: 211,372
length of answer vs. # of answers
[Figure: scatter plot of content length vs. thread length by category; labeled categories: Celebrities, Music, Mathematics, Physics, Languages, Chemistry, Cooking & recipes, Programming & design, Jokes & Riddles, Parenting, Polls & Surveys, Baby Names, Wrestling, Small business, Genealogy]
network structure of a technical forum
programming & design
network structure of a discussion forum
politics
Clustering categories: technical, advice/support, discussion
Breadth of user’s post patterns
• Sum entropies at each level
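[Figure: example two-level category tree: cars & transportation (maintenance & repairs, car audio) and beauty & style (hair); level L=1 proportions 0.3 and 0.7, level L=2 proportions 0.1, 0.2, and 0.7]
$H_L = -\sum_i p_{L,i} \log(p_{L,i})$
$H_T = \sum_L H_L$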
thanks to Mark Newman
user with entropy = 0
all answers are in the Pets > Dogs subcategory
user with medium entropy
_Travel *1*
__Asia_Pacific *1*
_Pets *1*
__General___Pets *1*
_Sports *18*
__Cricket *18*
_Society___Culture *19*
__Cultures___Groups *1*
__Religion___Spirituality *18*
_Arts___Humanities *1*
__Philosophy *1*
===== entropy ======
L1 entropy: 0.99
L2 entropy: 1.09
total: 2.08
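The entropies reported for this user can be reproduced from the answer counts. The sketch below assumes natural logarithms and treats Asia Pacific and Cultures & Groups as second-level categories (under Travel and Society & Culture respectively). The function name and the readable category labels are choices made for the example.

```python
from math import log


def entropy(counts):
    """Shannon entropy (natural log) of the proportions implied by `counts`."""
    total = sum(counts)
    return -sum(c / total * log(c / total) for c in counts if c > 0)


# Answer counts per category for the user above.
level1 = {"Travel": 1, "Pets": 1, "Sports": 18,
          "Society & Culture": 19, "Arts & Humanities": 1}
level2 = {"Asia Pacific": 1, "General - Pets": 1, "Cricket": 18,
          "Cultures & Groups": 1, "Religion & Spirituality": 18, "Philosophy": 1}

h1 = entropy(level1.values())  # entropy over top-level categories
h2 = entropy(level2.values())  # entropy over second-level categories
total = h1 + h2                # sum of the entropies at each level

print(round(h1, 2), round(h2, 2), round(total, 2))  # prints: 0.99 1.09 2.08
```

A user whose answers all fall in one subcategory gets `entropy([n]) == 0` at both levels, which is the zero-entropy case shown earlier.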
user with medium entropy
_Beauty___Style *32*
__Makeup *7*
__General___Beauty___Style *3*
__Fashion___Accessories *12*
__Hair *7*
__Skin___Body *3*
_Family___Relationships *8*
__Singles___Dating *6*
__Friends *1*
__Family *1*
===== entropy ======
L1 entropy: 0.50
L2 entropy: 1.83
total: 2.33
focused on a few top level categories, during the sampled period
high entropy user
===== entropy ======
L1 entropy: 2.64
L2 entropy: 3.11
total entropy: 5.75
user with high entropy
Entropy and “expertise”
• consider 1st-level categories and the 2nd-level entropies (H_{L=2}) within them individually
level 1 category(ies)                    Pearson (entropy, score)  p-value
computers & internet, science and math   -0.22                     10^-7
family & relationships                   -0.13                     10^-13
sports                                   -0.01                     0.65
Predicting best answers
lengthier replies
Predicting best answers
                      Programming  Marriage  Wrestling
answer length         +            +         +
thread length         -            -         -
replier best answers  +            +         +
replier answers       -            -         -
R2                    0.729        0.693     0.692
people who answer about X, ask about Y
Yahoo Answers Summary
• Everyone knows something
• Differing network characteristics by category type
• Focus corresponds to performance, primarily for “factual categories”
• Predicting best answers easiest for those categories
Competing to share expertise on Witkey sites
when money is given to best answers
• previously (Java Forum): asker -> replier
• in Taskcn: submitter -> winner
task prestige network
• If winners of other tasks lose in this task, this task is more prestigious...
network of task participants
• Same two users compete twice:
• same winner 77% of the time (compared to 1/2 chance)
• Same two users compete 3x:
• same winner all 3 times in 56% of the cases (compared to 1/4 chance)
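The chance baselines quoted above follow if each of the two users is assumed equally likely to win every encounter, independently of earlier ones. Either user can be the repeat winner, so:

```latex
P(\text{same winner in all } n \text{ encounters})
  = 2 \left(\tfrac{1}{2}\right)^{n} = 2^{\,1-n},
\qquad
n = 2:\ \tfrac{1}{2},
\qquad
n = 3:\ \tfrac{1}{4}.
```

The observed 77% and 56% are well above these baselines, which is what suggests a stable difference in skill between competitors.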
Taskcn knowledge sharing community
• Askers offer cash reward for best “solution” to task
w/ Jiang Yang and Mark Ackerman
does money matter?
• more views
• not more submissions on average
• small negative correlation with task PageRank
  • task may be a bit more difficult
• average prestige negatively correlated with number of participants
• but winner’s prestige positively correlated
does the number of participants matter?
predicting who will win
• Competing against few others and having a good track record are predictive of future wins
• Some users “learn” to improve their odds of winning by selecting less popular tasks
users learn to choose less competitive tasks
successful users shorten interval between wins
summary of Taskcn findings
• Amount of reward does not correlate with
  • # submissions
  • expertise level
• It does correlate with the number of views
• Can infer expertise from new formulation of expertise networks
• Can predict future probability of winning
• Overall, users don’t increase their winning rate over time
• successful users
  • choose less popular tasks
  • decrease intervals between wins
for more info
• Competing to share expertise: Yang, J., Adamic, L., Ackerman, M.S., ICWSM2008
• Examining knowledge sharing on Yahoo Answers: Adamic, L., Zhang, J., Ackerman, M.S., Knowledge Sharing and Yahoo Answers: Everyone knows something, WWW’08
• ExpertiseRank algorithms and evaluations: Zhang, J., Ackerman, M.S., Adamic, L., Expertise Networks in Online Communities: Structure and Algorithms, WWW’07
• Simulations of expertise networks: Zhang, J., Ackerman, M.S., Adamic, L., CommunityNetSimulator: Using Simulations to Study Online Community Network Formation and Implications, C&T2007
• QuME: A Mechanism to Support Expertise Finding in Online Help-seeking Communities: Zhang, J., Ackerman, M.S., Adamic, L.A., UIST2007, Newport, RI, 2007
thanks!
Lada Adamic
http://www-personal.umich.edu/~ladamic
Jun Zhang
http://www-personal.si.umich.edu/~junzh
Mark Ackerman
http://www.eecs.umich.edu/~ackerm/
Jiang Yang
http://www.yangjiang.us/
Eytan Bakshy
http://www-personal.umich.edu/~ebakshy
Thanks to: ARI, Intel, NSF 0325347