February 11, 2008 WSDM 1 Preferential Behavior in Online Groups Lars Backstrom, Ravi Kumar, Cameron Marlow, Jasmine Novak, Andrew Tomkins

Preferential Behavior in Online Groups

Lars Backstrom, Ravi Kumar, Cameron Marlow, Jasmine Novak, Andrew Tomkins

Power Users

Executive Summary of Preferential Treatment

Long term power-users are:

1) 20 times more likely to receive a response upon joining

2) Twice as likely to receive a response upon becoming heavily engaged

3) 9 times more likely to have early responses come from other power-users

Outline

• Introduction to online groups

• Experimental set-up

• Statistics of group “cores”

• Statistics of heavily engaged users

• Preferential treatment of engaged users

• Model: Predicting deep engagement

Online Groups

• A majority of internet users participate in some form of “online group” related to hobbies, beliefs or offline relationships (Pew 2001)

• Groups vary along a number of dimensions:
– Scale
– Online vs. offline relationships
– Broadcast, Q/A, and interaction
– etc.

• Examples
– Ithaca Rotary Club mailing list
– Palo Alto Parenting group
– Aerosmith fan club on MySpace

Yahoo Groups

• 100 million users, 6 million groups

• Can be created by any user. This user becomes the moderator of the group
– controls privacy settings, access, memberships, etc.

• Content includes information pages, multimedia content, and message boards.

• The majority of content resides in message boards.
– Members may post a message on a new topic, or respond to a message posted earlier
– Users may read content online, or receive it by email
– ~6 million groups, ~6 billion messages

• We used data from one year: May 2005 - May 2006

Privacy and Size

• Analysis performed on several categories of groups

• Size
– Small: fewer than 20 unique posters
– Medium: 20-99 unique posters
– Large: greater than 100 posters

• Privacy
– Public: open and listed or open and unlisted
– Semi-public: restricted and listed
– Private: closed and listed, closed and unlisted or restricted and unlisted
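The size and privacy categories above can be written down as a small lookup. This is only an illustrative sketch: the function names and the string labels ("open", "restricted", "closed", "listed", "unlisted") are assumptions, and since the slide does not say which bucket a group with exactly 100 posters falls into, the sketch assumes it counts as large.

```python
def size_category(unique_posters: int) -> str:
    """Map a count of unique posters to the size buckets on the slide."""
    if unique_posters < 20:
        return "small"
    if unique_posters < 100:
        return "medium"
    # Assumption: exactly 100 posters is treated as large (the slide leaves this open).
    return "large"


# (access setting, directory listing) -> privacy category, as enumerated on the slide.
PRIVACY_CATEGORY = {
    ("open", "listed"): "public",
    ("open", "unlisted"): "public",
    ("restricted", "listed"): "semi-public",
    ("closed", "listed"): "private",
    ("closed", "unlisted"): "private",
    ("restricted", "unlisted"): "private",
}


def privacy_category(access: str, listing: str) -> str:
    """Look up the privacy category for a group's access and listing settings."""
    return PRIVACY_CATEGORY[(access, listing)]
```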

Yahoo Groups

Engaged Users & Thriving Groups

• Various degrees of user engagement
– from lurkers to heavily engaged users

• Our focus: users who are heavily engaged in the group, with a high level of posting activity

• What differentiates these engaged users? Are they treated differently? Do they behave differently?

• Look at “thriving” groups

Thriving Groups

Three requirements to be a “thriving” group:

1) Baseline users
– at least 10 users must post during the year

2) Baseline traffic
– at least two messages for every 30-day window

3) Dense period
– a two-month period during which every 7-day interval has at least 10 posts

New corpus: 44,473 groups, 1M users
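One way to read the three requirements as a filter is sketched below. It is not the authors' code: the function name and the input shape (a list of `(user_id, date)` posts) are hypothetical, and the sketch assumes a 365-day observation year, windows that slide one day at a time, and a "two-month period" of 60 days.

```python
from datetime import date, timedelta


def is_thriving(posts, start, days=365):
    """Check the three 'thriving' requirements for one group.

    posts: iterable of (user_id, date) pairs (hypothetical input shape).
    start: first day of the observation year.
    """
    posts = [(u, d) for u, d in posts if 0 <= (d - start).days < days]

    # 1) Baseline users: at least 10 distinct posters during the year.
    if len({u for u, _ in posts}) < 10:
        return False

    # Daily post counts and prefix sums, so any window count is O(1).
    daily = [0] * days
    for _, d in posts:
        daily[(d - start).days] += 1
    prefix = [0]
    for count in daily:
        prefix.append(prefix[-1] + count)

    def window(i, width):
        """Number of posts on days i .. i + width - 1."""
        return prefix[i + width] - prefix[i]

    # 2) Baseline traffic: every 30-day window has at least two messages.
    if any(window(i, 30) < 2 for i in range(days - 30 + 1)):
        return False

    # 3) Dense period: some 60-day span (assumed length) in which every
    #    7-day window has at least 10 posts, i.e. 54 consecutive good windows.
    needed = 60 - 7 + 1
    run = 0
    for i in range(days - 7 + 1):
        run = run + 1 if window(i, 7) >= 10 else 0
        if run >= needed:
            return True
    return False
```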

k-Cores

• We define the k-core of a group at time t as follows:

• For a two-week window around t, a user is in the k-core if:
– the user has replied to k distinct users in the group
– the user has been replied to by k distinct users

[Figure: example of a 3-core user]
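The definition above can be sketched as a single pass over reply records. Note that this follows the slide's per-user test, not the iterative graph-theoretic k-core. The function name and the `(replier, original_poster, timestamp)` record shape are hypothetical, and the sketch assumes that the two-week window is centered on t, that "k distinct users" means at least k, that both conditions must hold, and that self-replies are ignored.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def k_core_members(replies, t, k, window=timedelta(days=14)):
    """Return the users in the k-core of one group at time t.

    replies: iterable of (replier, original_poster, timestamp) tuples
             (hypothetical record shape).
    """
    half = window / 2  # assumption: the window is centered on t
    replied_to = defaultdict(set)  # user -> distinct users they replied to
    replied_by = defaultdict(set)  # user -> distinct users who replied to them

    for replier, target, ts in replies:
        if replier == target or abs(ts - t) > half:
            continue  # skip self-replies and replies outside the window
        replied_to[replier].add(target)
        replied_by[target].add(replier)

    # Assumption: a user needs at least k distinct partners in both directions.
    return {
        user
        for user, targets in replied_to.items()
        if len(targets) >= k and len(replied_by.get(user, ())) >= k
    }
```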

Core Size

48% of group/time pairs have a 2-core of at least 6 people

Fraction of Posters in Core

[Figure: fraction of posters in the core (axis 0 to 1) by group size (Small, Medium, Large); series: Private, Semi-public, Public]

Time Spent in Core

Small private groups: 20% in core less than 2 weeks

Large public groups: 94% in core less than 2 weeks

Small private groups: 48% in core for 200+ days

Half-life of Cores

Core Populations

• Light: briefly enters the conversation, i.e., does not enter the core

• Short Core: enters the core for less than 50 days

• Long Core: enters the core for 50 days or more

Light 774k

Short core 134k

Long core 90k
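The three populations reduce to a threshold on days spent in the core, and the counts above give each population's share of users. A small sketch, where the function name and the `days_in_core` input are hypothetical and the counts are the rounded figures from the slide:

```python
def engagement_level(days_in_core: int) -> str:
    """Classify a user by the number of days spent in the core."""
    if days_in_core <= 0:
        return "light"  # never enters the core
    if days_in_core < 50:
        return "short-core"
    return "long-core"


# Rounded population counts from the slide, in thousands of users.
counts = {"light": 774, "short-core": 134, "long-core": 90}
total = sum(counts.values())
shares = {level: n / total for level, n in counts.items()}

for level, share in shares.items():
    print(f"{level}: {share:.1%}")
```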

Long Core Users Across Groups

Example point (6, 0.55): a user who was long-core in the first 6 groups joined has a 55% probability of being long-core in the 7th.
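A point such as (6, 0.55) can be read as an empirical conditional probability. The sketch below shows one plausible way to estimate it; the function name and the input shape (one list of booleans per user, in join order, marking whether the user became long-core in each group) are assumptions, not details from the slides.

```python
def next_group_long_core_prob(histories, n):
    """Estimate P(long-core in group n+1 | long-core in the first n groups).

    histories: one list of booleans per user, ordered by join time
               (hypothetical input shape).
    Returns None when no user qualifies.
    """
    # Users who joined more than n groups and were long-core in all of the first n.
    eligible = [h for h in histories if len(h) > n and all(h[:n])]
    if not eligible:
        return None
    return sum(h[n] for h in eligible) / len(eligible)
```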

Multiple Memberships

[Figure; axis label: probability]

Preferential Treatment of Engaged Users

• Are engaged, or “long-core”, users treated differently within a group?

• Yes! We detail three key forms of preferential treatment given to heavily engaged users.

Response to Newcomer

[Figure: bar chart (axis 0 to 0.5) for Light, Short-core, and Long-core newcomers; series: Private, Semi-public, and Public groups of Small, Medium, and Large size]

Response to Core Members

[Figure: bar chart (axis 0 to 0.35) for Long-core and Short-core members; series: Public, Semi-public, Private]

Response Probability by Newcomer Type

[Figure: probability (axis 0 to 1) by newcomer type (Light, Short-core, Long-core); legend: Long-core, Column 2, Light]

Long-core Response Probability

[Figure: bar chart (axis 0.28 to 0.34) comparing "At time of joining core" with "When in core 50+ days"; series: Public, Semi-public, Private]

Summary of Preferential Treatment

Heavily engaged “long-core” users are:

1) 20 times more likely to receive a response upon joining

2) Twice as likely to receive a response upon becoming heavily engaged

3) 9 times more likely to have early responses come from other long-core users

Note: Probability of receiving a response increases until joining the core, then begins to decline.

First Post Types

100 first posts of long-core users:

Friends: the newcomer has some prior relationship with another group member

Introduction: the new poster introduces herself to the group

No decision: no information to determine a relationship

No decision 57%

Introduction 37%

Friends 6%

Pregnancy-and-pups: Coccidiosis

I am new to this board and I have enjoyed reading the posts. I am hoping you can help me learn more about coccidiosis.

I have a litter of puppies whose stool is good. They have been on Albon for 10 days. One of the puppies went home and to the vet today and has coccidiosis. Why is that? What else can I use to be sure the puppies are free of cocci?

I do appreciate all your input and all your time in helping me!

skatefans: Appropos of Barbara Cook

Hi! This is my first "Skatefans" post. (I've been reading -- don't like the word "lurking" -- Skatefans since about the summer of 2000, but haven't had the time to post before.) While everyone, of course, is entitled to their own likes and dislikes, I'd just like to add some thoughts about "Fosse" from someone who's been a very big fan of that program. I've only been following figure skating since 1997 so my frame of reference is obviously limited but, in terms of the exhibition programs that I've seen duing this period, I think "Fosse" is one of the "landmark" exhibition programs that I've seen (although I can also see some of the problems with it that people have been pointing out).

Modeling Long-Core Engagement

Factors at work creating long-core engagement:

• User factor: a user’s personality causes her to become long-core in every group she joins

• Group factor: a group is so welcoming, or its topic so engaging, that users are likely to become long-core

Model

Users Groups

Chance of being long-core in a group: 0.7

Chance of random member being a long-core user: 0.3

Model

• For each (u,g) pair, predict whether the pair is in the set of memberships H that are long-core.

• Pr[(u,g) in H] = 1 - (1-p(u))(1-p(g))

• Task is to choose the best p(u) and p(g) to reproduce H, the set of long-core memberships

• Evaluate quality by the likelihood of predicting H

• Consider three variants:
– Use only properties of users, p(g) = 0
– Use only properties of groups, p(u) = 0
– Allow both p(u) and p(g) to be arbitrary
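The formula above and its likelihood can be sketched in a few lines. This is an illustration only: the function names, the dictionary inputs, and the `eps` clamp are assumptions, and the slides do not say how the best p(u) and p(g) are found, so no fitting procedure is shown.

```python
import math


def p_long_core(p_u: float, p_g: float) -> float:
    """Pr[(u,g) in H] = 1 - (1 - p(u)) * (1 - p(g))."""
    return 1.0 - (1.0 - p_u) * (1.0 - p_g)


def log_likelihood(memberships, H, p_user, p_group, eps=1e-12):
    """Log-likelihood of the observed long-core set H under the model.

    memberships: iterable of (user, group) pairs.
    H: set of (user, group) pairs that are long-core.
    p_user, p_group: dicts of per-user and per-group probabilities
                     (hypothetical containers).
    """
    total = 0.0
    for u, g in memberships:
        p = p_long_core(p_user[u], p_group[g])
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += math.log(p) if (u, g) in H else math.log(1.0 - p)
    return total


# User-only variant: every p(g) is 0, so the prediction is just p(u).
print(p_long_core(0.5, 0.0))
```

Setting every p(g) to 0 (or every p(u) to 0) in the dictionaries gives the user-only (or group-only) variant with the same two functions.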

Analytical Results

Model                   % of correct edges
User-only, p(g) = 0     94.9
Group-only, p(u) = 0    85.6
Combined                95.1

Improvement Using Group Factor

Fin

• Social analysis of one of the world’s largest collections of online communities

• Proposed a partitioning of the data to select for active communities of engaged users

• Examined several levels of engagement: “light”, “short-core”, and “long-core”

• Identified several striking ways in which heavily engaged users are given preferential treatment from other members of the group

• Proposed a model to study factors contributing to long-term engagement and showed that both user and group factors play a role.

Fin++

Special thanks to the Groups team: Di-fa Chang, Lee Clancy, David Kopp, Bobby Lee, Maria Saltz and Gordon Strause.

Ravi Kumar, Jasmine Novak & Andrew Tomkins

{ravikuma,jnovak,atomkins}@yahoo-inc.com

Lars Backstrom [email protected]

Cameron Marlow [email protected]