School of Information University of Michigan Expertise Sharing Dynamics in Online Forums Lada Adamic...

Preview:

Citation preview

School of InformationUniversity of Michigan

Expertise Sharing Dynamics in Online Forums

Lada Adamic

joint work with Jun Zhang, Mark Ackerman, Eytan Bakshy, Jiang Yang

School of Information, University of Michigan

Outline of talk

Knows

Knowledge iN

Oozing out knowledge

Knowledge In ``Knowledge search is like oozing out knowledge in human brains to the Internet. People who know something better than others can present their know-how, skills or knowledge''

NHN CEO Chae Hwi-young

“(It is) the next generation of search… (it) is a kind of collective brain -- a searchable database of everything everyone knows. It's a culture of generosity. The fundamental belief is that everyone knows something.”

-- Eckart Walther (Yahoo Research)

Limitations of Current Systems

• The Response Time Gap

4939N =

ExpertiseRating

lowhigh

WAITTIME(min)

10000

9000

8000

7000

6000

5000

4000

3000

2000

1000

0

69

96

41

• The Expertise Gap • Difficult to infer reliability of answers

Automatically ranking expertise may be helpful.

Related work

• Analysis of online communities• NetScan (Smith, Fisher, et al. at Microsoft)• Social network analysis (LiveJournal, blog communities)• Motivations of online participation (Lakhani & Hippel, Kraut)

• Graph-based ranking algorithms• PageRank, HITS, etc.

• Expertise sharing studies• Expertise recommenders

• ContactFinder (Krulwich et al.), Answer Garden (Ackerman)• Small Blue (Lin)

• Automatic evaluation of expertise levels• Using different text resources (Kautz, et al, and a lot of others)• Using email networks (Campbell et al.)

• Best answers correspond to best questions (Agichtein et al.)

Java Forum

• 87 sub-forums• 1,438,053

messages• community

expertise network constructed:• 196,191 users• 796,270 edges

Constructing a community expertise network

A B C

Thread 1 Thread 2

Thread 1: Large Data, binary search or hashtable? user ARe: Large... user BRe: Large... user C

Thread 2: Binary file with ASCII data user ARe: File with... user C

A

B

C

1

1

A

B

C

1

2

A

B

C

1/2

1+1//2

A

B

C

0.9

0.1

unweighted

weighted by # threads

weighted by shared credit

weighted with backflow

Uneven participation

100

101

102

103

10-4

10-3

10-2

10-1

100

degree (k)

cum

ula

tive

prob

abili

ty

= 1.87 fit, R2 = 0.9730

number of people one received replies from

number of people one replied to

• ‘answer people’ may reply to thousands of others

• ‘question people’ are also uneven in the number of repliers to their posts, but to a lesser extent

Not Everyone Asks/Replies

• Core: A strongly connected component, in which everyone asks and answers • IN: Mostly askers.• OUT: Mostly Helpers

The Web is a bow tie The Java Forum network is

an uneven bow tie

fragment of the Java Forum

Relating network structure to Java expertise

• Human-rated expertise levels• 2 raters• 135 JavaForum users with >= 10 posts• inter-rater agreement (= 0.74, = 0.83)• for evaluation of algorithms, omit users where raters disagreed by

more than 1 level (= 0.80, = 0.83)

L Category Description

5 Top Java expert Knows the core Java theory and related advanced topics deeply.

4 Java professional Can answer all or most of Java concept questions. Also knows one or some sub topics very well,

3 Java user Knows advanced Java concepts. Can program relatively well.

2 Java learner Knows basic concepts and can program, but is not good at advanced topics of Java.

1 Newbie Just starting to learn java.

Algorithm Rankings vs. Human Ratings

simple local measures do as well (and better) than measures incorporating the wider network topology

Top K Kendall’s Spearman’s

# answersz-score # answersindegreez-score indegreePageRankHITS authority

0.9

0.8

0.7

0.6

0.5

0.4

0.3

0.2

0.1

0

10121617181920192N =

LEVCOM

1098765432

RANK of PRANK

160

140

120

100

80

60

40

20

0

- 20

9281

5

68

1

10121617181920192N =

LEVCOM

1098765432

RANK of REPLY

140

120

100

80

60

40

20

0

- 20

40

101

10121617181920192N =

LEVCOM

1098765432

RANK of ZTHREADS

160

140

120

100

80

60

40

20

0

- 20

40

1011

10121117171917192N =

LEVCOM

1098765432

RANK of HITS_AUT

140

120

100

80

60

40

20

0

- 20

33

automated vs. human ratings

# answers

human rating

auto

mat

ed r

anki

ng

10121617181920192N =

LEVCOM

1098765432

RANK of INDGR

160

140

120

100

80

60

40

20

0

- 20

40

101

10121117171917192N =

LEVCOM

1098765432

RANK of ZDGR

140

120

100

80

60

40

20

0

- 20

106104

z # answers

HITS authority

indegree

z indegree

PageRank

Modeling community structure to explain algorithm performance

Control Parameters: Distribution of expertise Who asks questions most often? Who answers questions most often? best expert most likely someone a bit more expert

ExpertiseNet Simulator

Simulating probability of expertise pairing

0 1 2 3 4 50

1

2

3

4

5

replier expertise

asker expertise

0.02

0.04

0.06

0.08

0.1

0.12

0 1 2 3 4 50

1

2

3

4

5

replier expertise

asker expertise

0

0.05

0.1

0.15

suppose:

expertise is uniformly distributed

probability of posing a question is inversely proportional to expertise

pij = probability a user with expertise j replies to a user with expertise i

2 models:‘best’ preferred ‘just better’ preferred

iep ijij /~ )( −β iep ji

ij /~ )( −γ j>i

Visualization

Best “preferred” just better

Degree correlation profiles

best preferred (simulation) just better (simulation)

Java Forum Networkas

ker

inde

gree

aske

r in

degr

ee

aske

r in

degr

ee

It can tell us when to use which algorithms

Preferred Helper: ‘just better’

Preferred Helper: ‘best available’

Different ranking algorithms perform differently

In the ‘just better’ model, a node is correctly ranked by PageRank but not by HITS

Summary for a focused expertise sharing forum

• Expertise Networks have interesting characteristics• A set of useful metrics • Ranking algorithms are affected by network structures• Simulation as an analysis tool

• There are rich design opportunities• Find experts with the help of structural information (and content

analysis) • Predict good answers • Re-order questions/answers to match expertise

UIST2007: “Expertise-Level based Interface Personalization for Online Help-seeking Communities”

• Looking at diverse sets of question-answer forums (Yahoo Answers)

• Expertise across different topics

Everyone is not a Java expert, but everyone knows something...

cars & transportation

maintenance & repairs

beauty & style

hair

w/ Jun Zhang, Eytan Bakshy, Mark Ackerman

Yahoo! Answers

• A community-driven knowledge market site, allowing users to ask and answer questions

• About 80 million answers since launch in Dec 2005• Similar sites: Naver, Baidu-knows

• our sample• 1 month• 8,452,337 answers• 1,178,983 questions• unique repliers: 433,402• unique askers: 495,414• users who are both askers and helpers: 211,372

length of answer vs. # of answers

Celebrities

MusicMathematics

Physics

Languages

ChemistryCooking & recipesProgramming & design

Jokes & Riddles

Parenting

Polls &Surveys

Baby NamesWrestling

Small business

Genealogy

thread length

cont

ent l

engt

h

network structure of a technical forum

programming & design

network structure of a discussion forum

politics

Clustering categories: technical, advice/support, discussion

Breadth of user’s post patterns

• Sum entropies at each level

cars & transportation

maintenance & repairs

beauty & style

hair

)log( ,, iLi

iLL ppH ∑−= ∑=L

LT HH

0.3 0.7

0.70.1 0.2

L=1

L=2

car audio

thanks to Mark Newman

user with entropy = 0

all answers are in the Pets > Dogs subcategory

user with medium entropy

_Travel *1*

_Asia_Pacific *1*

_Pets *1*

__General___Pets *1*

_Sports *18*

__Cricket *18*

_Society___Culture *19*

_Cultures___Groups *1*

__Religion___Spirituality *18*

_Arts___Humanities *1*

__Philosophy *1*

===== entropy ======

L1 entropy: 0.99

L2 entropy: 1.09

total: 2.08

user with medium entropy

_Beauty___Style *32*

__Makeup *7*

__General___Beauty___Style *3*

__Fashion___Accessories *12*

__Hair *7*

_Skin___Body *3*

_Family___Relationships *8* __Singles___Dating *6* __Friends *1*

__Family *1*

===== entropy ======

L1 entropy: 0.50

L2 entropy: 1.83

total: 2.33

focused on a few top level categories, during the sampled period

high entropy user

===== entropy ======

L1 entropy: 2.64

L2 entropy: 3.11

total entropy: 5.75

user with high entropy

Entropy and “expertise”

• consider 1st level categories and 2nd level entropies (HL=2) within them individually

level 1 category(ies)

Pearson (entropy,score)

p-value

computers & internet science and math

-0.22 10-7

family & relationships

-0.13 10-13

sports -0.01 0.65

Predicting best answers

lengthier replies

Predicting best answers

Programming Marriage wrestling

answer length + + +

thread length - - -

replier best answers

+ + +

replier answers - - -

R2 0.729 0.693 0.692

people who answer about X, ask about Y

Yahoo Answers Summary

• Everyone knows something• Differing network characteristics by category type• Focus corresponds to performance, primarily for “factual

categories”• Predicting best answers easiest for those categories

Competing to share expertise on Witkey sites

when money is given to best answers

• previously (Java Forum): asker -> replier• in Task CN: submitter -> winner

task prestige network

• If winners of other tasks lose in this task, this task is more prestigious...

network of task participants

• Same two users compete twice:

• same winner 77% of the time (compared to 1/2 chance)

• Same two users compete 3x:

• same winner all 3 times in 56% of the cases (compared to 1/4 chance)

Taskcn knowledge sharing community

• Askers offer cash reward for best “solution” to task

w/ Jiang Yang and Mark Ackerman

does money matter?

• more views• not more submissions on average• small negative correlation with task PageRank

• task may be a bit more difficult

• average prestige negatively correlated with number of participants

• but winner’s prestige positively correlated

does the number of participants matter?

predicting who will win

• Competing against few other and having a good track record is predictive of future wins

• Some users “learn” to improve their odds of winning by selecting less popular tasks

users learn to choose less competitive tasks

successful users shorten interval between wins

summary of Taskcn findings

• Amount of reward does not correlate with• # submissions• expertise level

• It does correlate with the number of views

• Can infer expertise from new formulation of expertise networks

• Can predict future probability of winning

• Overall, users don’t increase their winning rate over time• successful users

• choose less popular tasks• decrease intervals between wins

for more info• Competing to share expertise

Yang, J., Adamic, L., Ackerman, M.S., ICWSM2008

• Examining knowledge sharing on Yahoo Answers Adamic, L., Zhang, J., Ackerman, M.S., Knowledge Sharing and Yahoo Answers: Everyone knows something, WWW’08

• ExpertiseRank algorithms and evaluations Zhang, J., Ackerman, M.S., Adamic, L., Expertise Networks in Online Communities: Structure

and Algorithms, WWW’07

• Simulations of expertise networks Zhang, J., Ackerman, M.S., Adamic, L., CommunityNetSimulator: Using Simulations to Study

Online Community Network Formation and Implications, C&T2007

• QuME: A Mechanism to Support Expertise Finding In Online Help-seeking CommunitiesJ. Zhang, M. S. Ackerman, and L. A. Adamic, UIST2007, Newport, RI, 2007.

Lada Adamic ladamic@umich.edu

http://www-personal.umich.edu/~ladamic

thanks!

Jun Zhang junzh@umich.edu

http://www-personal.si.umich.edu/~junzh

Mark Ackerman ackerm@eecs.umich.edu

http://www.eecs.umich.edu/~ackerm/

Jiang Yang yangjian@umich.edu

http://www.yangjiang.us/

Eytan Bakshy ebakshy@umich.edu

http://www-personal.umich.edu/~ebakshy

Thanks to: ARI, Intel, NSF 0325347

Recommended