School of Information University of Michigan Expertise Sharing Dynamics in Online Forums Lada Adamic...

School of InformationUniversity of Michigan

Expertise Sharing Dynamics in Online Forums

Lada Adamic

joint work with Jun Zhang, Mark Ackerman, Eytan Bakshy, Jiang Yang

School of Information, University of Michigan

Outline of talk

Knowledge iN

Oozing out knowledge

Knowledge In ``Knowledge search is like oozing out knowledge in human brains to the Internet. People who know something better than others can present their know-how, skills or knowledge''

NHN CEO Chae Hwi-young

“(It is) the next generation of search… (it) is a kind of collective brain -- a searchable database of everything everyone knows. It's a culture of generosity. The fundamental belief is that everyone knows something.”

-- Eckart Walther (Yahoo Research)

Limitations of Current Systems

• The Response Time Gap

4939N =

ExpertiseRating

lowhigh

WAITTIME(min)

• The Expertise Gap • Difficult to infer reliability of answers

Automatically ranking expertise may be helpful.

Related work

• Analysis of online communities• NetScan (Smith, Fisher, et al. at Microsoft)• Social network analysis (LiveJournal, blog communities)• Motivations of online participation (Lakhani & Hippel, Kraut)

• Graph-based ranking algorithms• PageRank, HITS, etc.

• Expertise sharing studies• Expertise recommenders

• ContactFinder (Krulwich et al.), Answer Garden (Ackerman)• Small Blue (Lin)

• Automatic evaluation of expertise levels• Using different text resources (Kautz, et al, and a lot of others)• Using email networks (Campbell et al.)

• Best answers correspond to best questions (Agichtein et al.)

Java Forum

• 87 sub-forums• 1,438,053

messages• community

expertise network constructed:• 196,191 users• 796,270 edges

Constructing a community expertise network

Thread 1 Thread 2

Thread 1: Large Data, binary search or hashtable? user ARe: Large... user BRe: Large... user C

Thread 2: Binary file with ASCII data user ARe: File with... user C

1+1//2

unweighted

weighted by # threads

weighted by shared credit

weighted with backflow

Uneven participation

degree (k)

= 1.87 fit, R2 = 0.9730

number of people one received replies from

number of people one replied to

• ‘answer people’ may reply to thousands of others

• ‘question people’ are also uneven in the number of repliers to their posts, but to a lesser extent

Not Everyone Asks/Replies

• Core: A strongly connected component, in which everyone asks and answers • IN: Mostly askers.• OUT: Mostly Helpers

The Web is a bow tie The Java Forum network is

an uneven bow tie

fragment of the Java Forum

Relating network structure to Java expertise

• Human-rated expertise levels• 2 raters• 135 JavaForum users with >= 10 posts• inter-rater agreement (= 0.74, = 0.83)• for evaluation of algorithms, omit users where raters disagreed by

more than 1 level (= 0.80, = 0.83)

L Category Description

5 Top Java expert Knows the core Java theory and related advanced topics deeply.

4 Java professional Can answer all or most of Java concept questions. Also knows one or some sub topics very well,

3 Java user Knows advanced Java concepts. Can program relatively well.

2 Java learner Knows basic concepts and can program, but is not good at advanced topics of Java.

1 Newbie Just starting to learn java.

Algorithm Rankings vs. Human Ratings

simple local measures do as well (and better) than measures incorporating the wider network topology

Top K Kendall’s Spearman’s

# answersz-score # answersindegreez-score indegreePageRankHITS authority

10121617181920192N =

LEVCOM

1098765432

RANK of PRANK

10121617181920192N =

LEVCOM

1098765432

RANK of REPLY

10121617181920192N =

LEVCOM

1098765432

RANK of ZTHREADS

10121117171917192N =

LEVCOM

1098765432

RANK of HITS_AUT

automated vs. human ratings

# answers

human rating

10121617181920192N =

LEVCOM

1098765432

RANK of INDGR

10121117171917192N =

LEVCOM

1098765432

RANK of ZDGR

106104

z # answers

HITS authority

indegree

z indegree

PageRank

Modeling community structure to explain algorithm performance

Control Parameters: Distribution of expertise Who asks questions most often? Who answers questions most often? best expert most likely someone a bit more expert

ExpertiseNet Simulator

Simulating probability of expertise pairing

0 1 2 3 4 50

replier expertise

asker expertise

0 1 2 3 4 50

replier expertise

asker expertise

suppose:

expertise is uniformly distributed

probability of posing a question is inversely proportional to expertise

pij = probability a user with expertise j replies to a user with expertise i

2 models:‘best’ preferred ‘just better’ preferred

iep ijij /~ )( −β iep ji

ij /~ )( −γ j>i

Visualization

Best “preferred” just better

Degree correlation profiles

best preferred (simulation) just better (simulation)

Java Forum Networkas

It can tell us when to use which algorithms

Preferred Helper: ‘just better’

Preferred Helper: ‘best available’

Different ranking algorithms perform differently

In the ‘just better’ model, a node is correctly ranked by PageRank but not by HITS

Summary for a focused expertise sharing forum

• Expertise Networks have interesting characteristics• A set of useful metrics • Ranking algorithms are affected by network structures• Simulation as an analysis tool

• There are rich design opportunities• Find experts with the help of structural information (and content

analysis) • Predict good answers • Re-order questions/answers to match expertise

UIST2007: “Expertise-Level based Interface Personalization for Online Help-seeking Communities”

• Looking at diverse sets of question-answer forums (Yahoo Answers)

• Expertise across different topics

Everyone is not a Java expert, but everyone knows something...

cars & transportation

maintenance & repairs

beauty & style

w/ Jun Zhang, Eytan Bakshy, Mark Ackerman

Yahoo! Answers

• A community-driven knowledge market site, allowing users to ask and answer questions

• About 80 million answers since launch in Dec 2005• Similar sites: Naver, Baidu-knows

• our sample• 1 month• 8,452,337 answers• 1,178,983 questions• unique repliers: 433,402• unique askers: 495,414• users who are both askers and helpers: 211,372

length of answer vs. # of answers

Celebrities

MusicMathematics

Physics

Languages

ChemistryCooking & recipesProgramming & design

Jokes & Riddles

Parenting

Polls &Surveys

Baby NamesWrestling

Small business

Genealogy

thread length

network structure of a technical forum

programming & design

network structure of a discussion forum

politics

Clustering categories: technical, advice/support, discussion

Breadth of user’s post patterns

• Sum entropies at each level

cars & transportation

maintenance & repairs

beauty & style

)log( ,, iLi

iLL ppH ∑−= ∑=L

0.3 0.7

0.70.1 0.2

car audio

thanks to Mark Newman

user with entropy = 0

all answers are in the Pets > Dogs subcategory

user with medium entropy

_Travel *1*

_Asia_Pacific *1*

_Pets *1*

__General___Pets *1*

_Sports *18*

__Cricket *18*

_Society___Culture *19*

_Cultures___Groups *1*

__Religion___Spirituality *18*

_Arts___Humanities *1*

__Philosophy *1*

===== entropy ======

L1 entropy: 0.99

L2 entropy: 1.09

total: 2.08

user with medium entropy

_Beauty___Style *32*

__Makeup *7*

__General___Beauty___Style *3*

__Fashion___Accessories *12*

__Hair *7*

_Skin___Body *3*

_Family___Relationships *8* __Singles___Dating *6* __Friends *1*

__Family *1*

===== entropy ======

L1 entropy: 0.50

L2 entropy: 1.83

total: 2.33

focused on a few top level categories, during the sampled period

high entropy user

===== entropy ======

L1 entropy: 2.64

L2 entropy: 3.11

total entropy: 5.75

user with high entropy

Entropy and “expertise”

• consider 1st level categories and 2nd level entropies (HL=2) within them individually

level 1 category(ies)

Pearson (entropy,score)

p-value

computers & internet science and math

-0.22 10-7

family & relationships

-0.13 10-13

sports -0.01 0.65

Predicting best answers

lengthier replies

Predicting best answers

Programming Marriage wrestling

answer length + + +

thread length - - -

replier best answers

replier answers - - -

R2 0.729 0.693 0.692

people who answer about X, ask about Y

Yahoo Answers Summary

• Everyone knows something• Differing network characteristics by category type• Focus corresponds to performance, primarily for “factual

categories”• Predicting best answers easiest for those categories

Competing to share expertise on Witkey sites

when money is given to best answers

• previously (Java Forum): asker -> replier• in Task CN: submitter -> winner

task prestige network

• If winners of other tasks lose in this task, this task is more prestigious...

network of task participants

• Same two users compete twice:

• same winner 77% of the time (compared to 1/2 chance)

• Same two users compete 3x:

• same winner all 3 times in 56% of the cases (compared to 1/4 chance)

Taskcn knowledge sharing community

• Askers offer cash reward for best “solution” to task

w/ Jiang Yang and Mark Ackerman

does money matter?

• more views• not more submissions on average• small negative correlation with task PageRank

• task may be a bit more difficult

• average prestige negatively correlated with number of participants

• but winner’s prestige positively correlated

does the number of participants matter?

predicting who will win

• Competing against few other and having a good track record is predictive of future wins

• Some users “learn” to improve their odds of winning by selecting less popular tasks

users learn to choose less competitive tasks

successful users shorten interval between wins

summary of Taskcn findings

• Amount of reward does not correlate with• # submissions• expertise level

• It does correlate with the number of views

• Can infer expertise from new formulation of expertise networks

• Can predict future probability of winning

• Overall, users don’t increase their winning rate over time• successful users

• choose less popular tasks• decrease intervals between wins

for more info• Competing to share expertise

Yang, J., Adamic, L., Ackerman, M.S., ICWSM2008

• Examining knowledge sharing on Yahoo Answers Adamic, L., Zhang, J., Ackerman, M.S., Knowledge Sharing and Yahoo Answers: Everyone knows something, WWW’08

• ExpertiseRank algorithms and evaluations Zhang, J., Ackerman, M.S., Adamic, L., Expertise Networks in Online Communities: Structure

and Algorithms, WWW’07

• Simulations of expertise networks Zhang, J., Ackerman, M.S., Adamic, L., CommunityNetSimulator: Using Simulations to Study

Online Community Network Formation and Implications, C&T2007

• QuME: A Mechanism to Support Expertise Finding In Online Help-seeking CommunitiesJ. Zhang, M. S. Ackerman, and L. A. Adamic, UIST2007, Newport, RI, 2007.

Lada Adamic ladamic@umich.edu

http://www-personal.umich.edu/~ladamic

thanks!

Jun Zhang junzh@umich.edu

http://www-personal.si.umich.edu/~junzh

Mark Ackerman ackerm@eecs.umich.edu

http://www.eecs.umich.edu/~ackerm/

Jiang Yang yangjian@umich.edu

http://www.yangjiang.us/

Eytan Bakshy ebakshy@umich.edu

http://www-personal.umich.edu/~ebakshy

Thanks to: ARI, Intel, NSF 0325347

School of Information University of Michigan Expertise Sharing Dynamics in Online Forums Lada Adamic...

Documents

Rajat Talak, Sertac Karaman, Eytan Modianomodiano/papers/CV_C_212.pdfRajat Talak, Sertac Karaman, Eytan Modiano Abstract—Age of information (AoI) is a recently proposed metric for

Adamic and Edenic Covenant

Eytan Rosencwaig :: Design Portfolio

Data Networks Lecture 1: Introduction September 4, 2008web.mit.edu/modiano/www/6.263/Lecture1.pdf · Data Networks Lecture 1: Introduction September 4, 2008 Eytan Modiano. Eytan Modiano

Eytan World Congress Consumer Health

Rajat Talak, Sertac Karaman, Eytan ModianoRajat Talak, Sertac Karaman, Eytan Modiano Abstract—Age of information (AoI) is a recently proposed metric for measuring information freshness

fjiblt lLC5sons on jfirst llrinciplcs · fjiblt lLC5sons on jfirst llrinciplcs By ROBERT G. HUGGINS LESSON II THE ABRAHAMIC and DAVIDIAN COVENANTS I. The Adamic Inheritance ADAMIC

Lectures 22 & 23 Flow and congestion control · Lectures 22 & 23 Flow and congestion control Eytan Modiano Eytan Modiano Slide 1 Laboratory for Information and Decision Systems

Lecture 28 Network resilience Slides are modified from Lada Adamic

arXiv:1201.4145v2 [cs.SI] 28 Feb 2012 · PDF fileThe Role of Social Networks in Information Diffusion Eytan Bakshy Facebook 1601 Willow Rd. Menlo Park, CA 94025 ebakshy@fb.com Itamar

Social networks caught in the Web Lada Adamic University of Michigan

For Shira, Eytan, Noam, and Shani - Grasscity

1 A social network caught in the Web Lada Adamic and Eytan Adar (HP Labs, Palo Alto, CA) Orkut Buyukkokten (Google)

Information Evolution in Social Networks€¦ · Information Evolution in Social Networks. Lada A. Adamic, Thomas M. Lento, Eytan Adar, Pauling C. Ng (2016) Presentation for INFO

Design and Analysis of Benchmarking Experiments for ... · Design and Analysis of Benchmarking Experiments for Distributed Internet Services Eytan Bakshy Facebook Menlo Park, CA eytan@fb.com

Rajat Talak, Sertac Karaman, Eytan Modianomodiano/papers/CV_C_212.pdf · Rajat Talak, Sertac Karaman, Eytan Modiano Abstract—Age of information (AoI) is a recently proposed metric

arXiv:1402.6792v1 [cs.SI] 27 Feb 2014 · Information Evolution in Social Networks Lada A. Adamic 1; 2, Thomas M. Lento , Eytan Adar , Pauline C. Ng3 1 School of Information, University

REVIVING ADAMIC ADOPTIONISM: THE EXAMPLE OF JOHN MACQUARRIE

Data Networks Lecture 1 Introduction - MIT OpenCourseWare · PDF fileData Networks Lecture 1 Introduction Eytan Modiano Eytan Modiano ... circuit switching requires a call set-up during

How to search a social network - · PDF fileHow to search a social network Lada A. Adamic ladamic@hpl.hp.com Eytan Adar eytan@hpl.hp.com HP Labs, 1501 Page Mill Road Palo Alto, CA