Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction

MINING USER LIFECYCLES FROM ONLINE COMMUNITY PLATFORMS AND THEIR APPLICATION TO CHURN PREDICTION DR. MATTHEW ROWE SCHOOL OF COMPUTING AND COMMUNICATIONS @MROWEBOT | [email protected]

International Conference on Data Mining 2013 Dallas, USA

Identity Development: Offline

Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction

1

Development = conflicts

Development happens through

stages

User Development: ‘Online’


2

¨  Recently studied in isolated dimensions: ¤ Socially (Telecoms Networks: Miritello et al. 2013)

n Communication networks tend to a capacity

¤ Lexically (Online Communities: Danescu-Niculescu-Mizil et al. 2013) n  Language adapts to the community, before diverging

¨  Without analysing development: a)  Relative to earlier signals b)  Relative to the community of interaction

Understanding User Development enables…


3

1.  Recommender Systems ¤ Developmental Stage-based user

neighbourhoods (e.g. user-kNN) ¤ Modelling taste evolution (e.g. biases in MF)

n Current/future work (more later)

2.  Churn Prediction ¤ Forecast churners from development signals

n Focus of this talk

Table 2: Statistics of the revised review datasets used for our analysis and experiments. Reduction over theoriginal datasets are shown in parentheses.

Dataset #Users #Items #RatingsMovieLens 6,024 (-0%) 3,231 (-12.1%) 841,602 (-6.7%)Movie Tweetings 17,827 (-6.3%) 7,913 (-30.8%) 104,055 (-11.2%)Amazon 444,861 (-49.9%) 27,066 (-89.4%) 5,564,199 (-29.3%)Total 468,712 (-48.7%) 38,210 (-85.7%) 6,509,856 (-26.9%)

sion programmes. Again, we find that obscure films such as‘Cry of the Black Wolves’ was not aligned to a semantic URIdespite it having a IMDB page. We define the challenge ofaddressing the obscurity of movies, and thus their lack of aDBPedia entry, as the hipster dilemma. In the discussionssection of the paper we outline several strategies for deal-ing with this in the future given its presence not only inthis work but in the prior work of Ostuni et al. [6], and itpotential e↵ect on future work within this domain.

4.3.1 Category Ratings

To provide a first look into to the link between seman-tic categories and ratings, we examined the average ratingsof the top-2 most frequently reviewed categories across thedatasets. This was derived by recording the frequency ofcategories linked to by rated items within the training split,and then selecting the top-2 most frequent. In Figure 3 weplotted the development of the average rating score acrossthese two categories, derived using a 7-day moving averageto smooth the variance in rating fluctuations. We find thatthere are peaks and troughs in the reviewing of the itemsthat belong to the categories, in particular one can note thatfor MovieLens the scores remain relatively stable, while forMovie Tweetings ‘Independent Films’ reduce in their averagerating and ‘Directorial Debut films’ increase in their averagerating over time. Such information can be encoded withinthe biases of the recommendation models and consider thestability of a given bias in light of when the rating is beingmade: i.e. considering the fluctuation of the rating signaland how this relates to previous fluctuations.

May Jul Sep Nov

3.0

3.2

3.4

3.6

3.8

4.0

Time

Aver

age

Rat

ing

Directorial Debut Films1990s Comedy Films

(a) Lens

Mar Apr May Jun Jul Aug

5.0

6.0

7.0

8.0

Time

Aver

age

Rat

ing

Independent FilmsDirectorial Debut Films

(b) Tweetings

1998 2002 2006 2010

01

23

45

Time

Aver

age

Rat

ing

American FilmsBlack and White Films

(c) Amazon

Figure 3: Average ratings derived using a 7-daymoving average of the top-2 most frequently ratedcategories.

5. ANALYSING TASTE EVOLUTIONAnalysing the evolution and development of users’ tastes

allows one to understand how a given user is likely to ratea given item in the future. In the context of this paper we

concentrate on movie items; this domain is unique for en-compassing a variety of genres, where movies can be placedin categories (as described above). We are interested in try-ing to understand three things: (i) global evolution dynam-ics, describing how users in general transition and developin their preferences over time; (ii) local evolution dynamics,in terms of how a specific user is evolving his tastes; andfinally (iii) the association between global and local dynam-ics. This latter aspect concerns the influence that generaltrends of taste evolution have on a given user: we postulatethat certain users will be more greatly influenced by suchdynamics than others, and this is something that we controlfor in our recommendation model which follows this section.In this section we analyse for global and local dynamics, andmine the relation between local and global dynamics in in-fluencing users’ tastes. In particular, given our use of threedi↵erent datasets derived from three distinct sources, we areinterested in understanding how local and global taste evo-lution di↵ers, and the influence that one has on the otherfor the platforms’ users.

5.1 Preamble: NotationFrom this point onwards we are entering the realm of rec-

ommender systems, and therefore to aid comprehension andease of legibility we reserve the following special charactersfor set notations, as follows:

• u, v denote users• i, j denote items• r denotes a known rating value (where r 2 R), and r̂

denotes a predicted rating value• Datasets are provided as quadruples of the form (u, i, r, t) 2

D and are segmented into training (Dtrain

) and testing(D

test

) datasets by the above mentioned cuto↵ pointssuch that D

train

\Dtest

= ;• c, c0 denote category concepts within the linked data

graph and C denotes the set of categories. The graphitself is denoted by: G = hC,Ei, where e

c,c

0 2 E de-notes the set of edges, or concept relations, that con-nect concepts together within the graph. We use a di-rected graph and only consider hierarchical relations:i.e. e

c,c

0 denotes the edge connecting c and c0 throughthe triple c skos:broader c0 such that c has a broader,and thus more general category, denoted by c0.

• � denotes a mapping function between items and se-mantic categories, realised through the disambigua-tion of semantic URIs for items. The mapping func-tion either maps items to their direct parent cate-gories: �

p

: I⇥C;3 or maps items to their transitively-connected grandparent categories: �

g

: I ⇥ C.4

3I.e. Using the statement: i dcterms:subject c4I.e. Via: i dcterms:subject ?p, ?p skos:broader c

0.0 0.2 0.4 0.6 0.8 1.0

0.0

0.2

0.4

0.6

0.8

1.0

False Positive Rate

True

Pos

itive

Rate

EntropyPeriod EntropyCommunity EntropyIn−degreeOut−degreeLexicalAll

Outline


4

¨  Datasets: Online Community Platforms ¨  Defining User Lifecycles and Properties ¨  Mining Lifecycle Trajectories ¨  Predicting Churners ¨  Findings and Conclusions ¨  Future Work

Datasets: Online Community Platforms


5

1.  Facebook ‘Open University’ Groups ¤ Containing discussions about courses and degrees

2.  SAP Community Network ¤ Question-answering system for SAP technologies

3.  Server Fault ¤ Stack Overflow subsidiary site for server-related issues

examination of user lifecycles we used data collected from Facebook, the SAPCommunity Network (SAP) and Server Fault. Table 1 provides summary statis-tics of the datasets where we only considered users who had posted more than 40times within their lifetime on the platform.1 The Facebook dataset was collectedfrom groups discussing Open University courses, where users talked about theirissues with the courses and guidance on studying. The SAP Community Networkis a community question answering system related to SAP technologies whereusers post questions and provide answers related to technical issues. Similarly,Server Fault is a platform that is part of the Stack Overflow question answeringsite collection2 where users post questions related to server-related issues. Wedivided each platform’s users up into 80%/20% splits for training (and analysis)and testing, using the former in this section to examine user development andthe latter split for our later detection experiments.

Table 1. Statistics of the online community platform datasets.

Platform Time Span Post Count User CountFacebook [18-08-2007,24-01-2013] 118,432 4,745SAP [15-12-2003,20-07-2011] 427,221 32,926Server Fault [01-08-2008,31-03-2011] 234,790 33,285

3.1 Defining Lifecycle Periods

In order to examine how users develop over time we needed some means tosegment a user’s lifetime (i.e. from the first date at which they post to the dateof their final post) into discrete intervals. Prior work [6, 2, 5] has demonstratedthe extent to which users develop at their own pace and thus evolve accordingto their own ‘personal clock ’ [5]. Hence, for deriving the lifecycle periods of userswithin the platforms we adopted an activity-slicing approach that divided auser’s lifetime into 20 discrete time intervals, emulating the approach in [2], butwith an equal proportion of activity within each period. This approach functionsas follows: we derive the set of interval tuples ({[ti, tj ]} 2 T ) by first derivingthe chunk size (i.e. the number of posts in a single period) for each user, we thensort the posts in ascending date order, before deriving the start and end pointsof each interval in an incremental manner. This derives the set of time intervalsT that are specific to a given user.

3.2 Modelling User Properties

Based on the defined lifecycle periods, we now move on to defining user proper-ties, capturing social and lexical dynamics, and tracking how the properties ofusers change over time.

In-degree and Out-degree Distributions. Starting with social dynamics,we assessed the in-degree and out-degree distributions of users: the in-degree

1 Choosing 40 posts so that we had at least 2 posts per lifecycle period.2http://stackoverflow.com/

User Lifecycles: Derivation


6

Time

Primary School High School University Postgrad Postdoc Lecturing

Novice Users Asking Questions Asking & Answering Questions

Answering Questions

1 2 3 … n

1 2 3 … n

Offline Lifecycle Periods

Lifecycle Periods of a potential Question-Answering System user (conjecture!)

In reality: do not know the labels, however we can split by equal time intervals:

Yet, users non-uniformly distribute their activity across lifecycles

First Post Last Post

User Lifecycles: Properties


7

1 2 3 … n

¨  Capture period-specific user properties (in period s): ¤  In-degree distribution

n  Relative frequency distribution of senders to user u in period s ¤  Out-degree distribution

n  Relative frequency distribution of recipients from user u in s ¤  Term distribution

n  Relative frequency distribution of terms used by u in s

1 2 Divide lifetime into equal activity periods

#posts #posts =

s

We set n=20

Analysing Development: Period Entropy


8

¨  Variation in users’ properties across periods ¨  Computed period entropy for each property

interaction as follows:3

Pr(uj | !IN[t,t!]) =

|{q : p ! Pui, q ! Puj

, t " time(q) < t!, q # p}|!

uk"!IN[t,t!]

|{q : p ! Pui, q ! Puk

, t " time(q) < t!, q # p}|

(1)

And for an arbitrary user (uj ! !outuit ) who user ui has

contacted within time segment t we define the probabilityof interaction as follows:

Pr(uj | !OUT[t,t!] ) =

|{p : p ! Pui, q ! Puj

, t " time(p) < t!, p # q}|!

uk"!OUT[t,t!]

|{p : p ! Pui, q ! Puk

, t " time(p) < t!, p # q}|

(2)

Given this formulation we now have time-dependent dis-crete probability distributions for a given user’s in-degreeand out-degree distribution, thereby allowing the social

changes of users to be analysed in terms of the userscommunicating with a given user over time.

2) Term Distribution: We model the lexical dynamics ofusers based on their term usage over time. To derive theset of terms we first retrieve all posts made by a given userwithin a time interval and then remove stop words and filterout any punctuation. Having derived the set of cleaned posts,we then define the discrete probability distribution for a userui within interval [t, t!] based on the conditional probabilityof term x being used within the time interval. We define amultiset as containing the set of terms used by ui in a giventime period: x ! C[t,t!] and a mapping function µ : C[t,t!] #N that returns the multiplicity of a given term’s usage by theuser at a given time period. Thus we define the conditionalprobability for term x being used by ui during [t, t!] as:

Pr(x | [t, t!]) =µ(x)

!

x!"Cuitµ(x!)

(3)

C. Analysing User Lifecycles

To examine how users’ properties evolve throughout theirlifecycles we inspected distribution changes using threemeans: (i) analysing the variation within the distribution ineach lifecycle period; (ii) analysing the minimal change invariation when comparing one lifecycle period with earlierperiods, thus informing how the user is changing relativeto past behaviour; and (iii) analysing the variation in onelifecycle period relative to how the platform varies over thesame time interval.

1) Inspecting Individual Periods (Period Entropy): Toanalyse the variation in a user’s properties within a givenlifecycle period we derived the entropy of each probabilitydistribution within each lifecycle period. Entropy describesthe amount of variation within a random variable, and

3We use p ! q to denote message q replying to message p.

therefore provides a useful means to gauge how much agiven user is varying: (i) the people with whom he iscommunicating, and (ii) the terms that he is using withinhis posts. We define the entropy of an arbitrary probabilitydistribution P as follows:

H(P ) = $"

x

p(x) log p(x) (4)

For each platform (Facebook, SAP, and ServerFault) wederived the entropy of each user throughout their lifecycles,and thus each individual lifecycle period, based on the in-degree, out-degree and term distributions. We then recordedthe mean of these entropy values over each lifecycle period,thereby providing an assessment of the general changes thatusers go through. Figure 1 presents the lifecycle periodentropies for the different user properties and platforms,indicating differences between both. For instance, for users’in-degree we observe a slight increase for SAP and Facebookindicating that towards later portions of the lifecycle userstend to contact a wider array of people: this could bea symptom of users being replied to more often by thecommunity’s members as they become known. For out-degree we find a similar effect with an increase for SAP andFacebook indicating that users tend to communicate withmore people over time, again this is attributable to usersreplying to more people’s threads in order to help them withtheir requests.

When examining the distribution of terms, and thus thelexical dynamics of users, we find a degradation in entropyover time, indicating that users use less diverse language asthey develop in the community (for SAP and Facebook),however Server Fault users remain relatively stable. Thiscould be due to the relatively minor interaction effects thattake place on ServerFault: users largely lurk on the platformto seek answers to questions, and thus do not contributeunless it is necessary (i.e. they feel that their expertise issufficient to answer a question or that a new question isrequired), as a result it is likely that users have an implicitunderstanding of how one should formulate a post and thusthe language that should be used.

!!

!!

! !

!! !

!

! ! ! !! !

!

! !

!

0.1

0.3

0.5

0.7

Lifecycle Stages

Dis

trib

utio

n E

ntr

opy

0 0.2 0.5 0.8 1

! FacebookSAPServer Fault

(a) In-degree

! ! ! ! ! ! ! !!

! ! !! !

!! !

!!

!

0.4

0.6

0.8

Lifecycle Stages

Dis

trib

utio

n E

ntr

opy

0 0.2 0.5 0.8 1

(b) Out-degree

!

! ! ! !!

! ! !!

!! ! ! ! !

! ! ! !

2.5

3.0

3.5

4.0

4.5

Lifecycle Stages

Dis

trib

utio

n E

ntr

opy

0 0.2 0.5 0.8 1

(c) Lexical

Figure 1. Entropies of lifetime-stage distributions formed from users’in-degrees, out-degrees and lexical terms.

2) Historical Contrasts (Period Cross-Entropy): A lim-itation with focusing on time-period specific snapshots is

Generally stable trends: consistent variance in communication and terms

Analysing Development: Period Cross-Entropy


9

¨  Changes in properties relative to earlier ¨  Computed the minimised cross-entropy for each

property

that we eschew time-comparative assessments of how a useris changing relative to earlier properties. To inform suchcross-period assessment we examined the users’ in-degree,out-degree and term distributions across lifecycle periodsby computing the cross-entropy of one probability distri-bution with respect to another distribution from an lifecycleperiod, and then selecting the distribution that minimisescross-entropy. Assuming we have a probability distribution(P ) formed from a given lifecycle period ([t, t!]), and aprobability distribution (Q) from an earlier lifecycle period,then we define the cross-entropy between the distributionsas follows:

H(P,Q) = !!

x

p(x) log q(x) (5)

In the same vein as the earlier entropy analysis, wederived the period cross-entropy for each platform’s usersthroughout their lifecycles and then derived the mean cross-entropy for the 20 lifecycle periods. Figure 2 presents thecross-entropies derived for the different platforms and userproperties. We observe that for each distribution and eachplatform cross-entropies reduce throughout users’ lifecycles,suggesting that users do not tend to exhibit behaviour thathas not been seen previously. For instance, for the in-degreedistribution the cross-entropy gauges the extent to whichthe users who contact a given user at a given lifecyclestage differ from those who have contacted him previously,where a larger value indicates greater divergence. We findthat consistently across the platforms, users are contactedby people who have contacted them before and that fewernovel users appear. The same is also true for the out-degreedistributions: users contact fewer new people than they didbefore. This is symptomatic of community platforms wheredespite new users arriving within the platform, users formsub-communities in which they interact and communicatewith the same individuals. Figure 2(c) also demonstrates thatusers tend to reuse language over time and thus produce agradually decaying cross-entropy curve.

!!

!!

! ! ! ! ! ! ! ! ! !! ! ! ! !

0.0

00

.10

0.2

00

.30

Lifecycle Stages

Cro

ss E

ntr

opy

0 0.2 0.5 0.8 1


(a) In-degree

!

!

!

!!

!! !

! ! ! ! ! ! !! ! ! !

0.0

00

.05

0.1

00

.15

Lifecycle Stages

Cro

ss E

ntr

opy

0 0.2 0.5 0.8 1

(b) Out-degree

!

!! !

! ! ! ! ! ! ! ! ! ! ! ! ! ! !

0.0

0.4

0.8

1.2

Lifecycle Stages

Cro

ss E

ntr

opy

0 0.2 0.5 0.8 1

(c) Lexical

Figure 2. Cross-entropies derived from comparing users’ in-degree, out-degree and lexical term distributions with previous lifecycle periods. Wesee a consistent reduction in the cross-entropies over time.

3) Community Contrasts (Community Cross-Entropy):

For the third inspection of user lifecycles and how userproperties change, we examined how users compare with

the platform in which they are interacting over the sametime interval. We used the in-degree, out-degree and termdistributions and compared them with the same distributionsderived globally over the same time periods. For the globalprobability distributions we used the same means as forforming user-specific distributions, but rather than using theset of posts that a given user had authored (Pui

) to derivethe probability distribution, we instead used all posts. Forinstance, for the global in-degree distribution we used thefrequencies of received messages for all users. Given thediscrete probability distribution of a user from a time interval(P[t,t!]), and the global probability distribution over the sametime interval (Q[t,t!]), we derived the cross-entropy as abovebetween the distributions. (H(P[t,t!], Q[t,t!])).

As before we derived the community cross-entropy foreach platform’s users over their lifetimes and then calculatedthe mean community cross-entropy for the lifecycle periods.Figure 3 presents the plots of the cross-entropies for the in-degree, out-degree and term distributions over the lifecycleperiods. We find that for all platforms the community cross-entropy of users’ in-degree increases over time indicatingthat a given user tends to diverge in his properties fromusers of the platform. For instance, for the community cross-entropy of the in-degree distribution the divergence towardslater parts of the lifecycle indicates that users who reply to agiven user differ from the repliers in the entire community.This complements cross-period findings from above wherewe see a reduction in cross entropy, thus suggesting thatusers form sub-communities in which interaction is consis-tently performed within (i.e. reduction in new users joining).We find a similar effect for the out-degree of the userswhere divergence from the community is evident towardsthe latter stages of users’ lifecycles. The term distributiondemonstrates differing effects however: for Facebook andSAP we find that the community cross-entropy reducesinitially before rising again towards the end of the lifecycle,while for Server Fault there is a clear increase in communitycross-entropy towards the latter portions of users’ lifecyclessuggesting that the language used by the users actually tendsto diverge from that of the community in a linear manner.This effect is consistent with the findings of Danescu et al.[2] where users adapt their language to the community tobegin with, before then diverging towards the end.

V. MINING LIFECYCLE TRAJECTORIES

Inspecting how communities of users develop we haveconcentrated on assessments at the macro-level on eachplatform, examining how the social dynamics and lexical dy-namics of communities of users have changed over time. Wenow turn to examining how individual users evolve through-out their lifecycle periods. Understanding how individualusers develop over time in online community platformsallows for churners to be predicted, as we shall demonstratein the following section through our experiments, and also

Convergence on prior properties

Convergence: lack of communication with new people, or use of new terms

Analysing Development: Community Cross-Entropy


10

¨  Difference in properties relative to the community ¨  Computed cross-entropy for each property between

user @ [t,t’] and community @ [t,t’]

!! ! ! ! ! !

!!

! ! ! ! ! !!

! !

!

01

23

4

Lifecycle Stages

Cro

ss E

ntr

opy

0 0.2 0.5 0.8 1


(a) In-degree

!!

! !! !

! ! !! ! !

! ! !!

!!

!

2.0

3.0

4.0

5.0

Lifecycle Stages

Cro

ss E

ntr

opy

0 0.2 0.5 0.8 1

(b) Out-degree

!!

! !! !

!!

!!

!!

!! !

!

!

!

!

6.0

7.0

8.0

Lifecycle Stages

Cro

ss E

ntr

opy

0 0.2 0.5 0.8 1

(c) Lexical

Figure 3. Cross-entropies derived from comparing users’ in-degree, out-degree and lexical term distributions the community platform over the sametime periods. We see a increased divergence towards the end of lifecycles.

informs how online community platforms differ in termsof user development. We model the development of usersthroughout their lifecycles by mining the lifecycle trajecto-ries along different user properties (in-degree, out-degree,terms) and development measures (entropy, period-cross-entropy, community-cross-entropy). Mining is performed byselecting a suitable model to represent changes in userproperties based on community development, before thensetting the model for individual users. We begin this sectionby explaining entropy trajectories and the mining process.

A. Modelling Entropy Trajectories

Our prior analysis of the entropy of user propertiesindicated that users exhibited a generally stable entropythroughout their lifecycle periods. Therefore we chose thelinear regression model as a suitable model for the devel-opment of user properties, setting the explanatory variableto be the lifecycle period of the user and the responsevariable to be the user property’s entropy. In modellingentropy development we can characterise each user using theslope (!) of the model, thus indicating the rate of change ofentropy throughout the lifecycle periods. We induced user-specific entropy models for each platform’s users and thenexamined the cumulative frequency distribution of the !-values for the different user properties and platforms, theseare shown in Figure 4.

!4 0 2 4 6 8

0.0

0.4

0.8

!

F(x

)

FacebookSAPServer Fault

(a) In-degree

!2 !1 0 1 2 3

0.0

0.4

0.8

!

F(x

)

(b) Out-degree

!3 !1 0 1 2 3

0.0

0.4

0.8

!

F(x

)

(c) Lexical

Figure 4. Cumulative frequency distributions of linear regression models’! coefficients for the entropy of user property distributions.

The plots indicate that each platform and each userproperty has a distribution with a mean slope being closeto 0, with greater variance in the slopes of users for their

lexical entropy. This indicates that for certain users theirentropy reduces over time, while for others it tends toincrease. We omit the coefficient of determinant (R2) plotshere due to brevity, however they show that for >80% ofusers R2 > 0.4.

B. Modelling Cross-Period Trajectories

Inspection of the period cross-entropy values obtainedearlier, by deriving the minimum cross-entropy when com-paring user properties with past properties, indicated cleardecreasing trends. That is, users converge on their pastbehaviour over time. This suggests that an exponential decaymodel would be suitable for describing such reductionsthroughout user’s lifecycles. Applying such a model requiresthat users reduce in their period cross-entropy values overtime. To examine whether this was indeed the case wedefined the measure "ui

that returns the average propor-tional change value in period cross-entropy for a givenuser throughout their lifecycles, letting f(ui, [t, t!]) denote afunction that returns the period cross-entropy of an arbitraryuser property (e.g. in-degree) for a given user and timeinterval:

"ui=

1

|T |! 1

!

[t,t!],[t!,t!!]"T,t<t!<t!!

f(ui, [t, t!])! f(ui, [t!, t!!])

f(ui, [t, t!])

(6)

By deriving the distribution of average proportionalchange values (") across the different platforms and userproperties we examined the proportion of users for whomthe average change was greater than 0, and thus indicatingdecay overall. We found that for all tested measures, allusers had an average proportional change value of greaterthan 0, thus suggesting the suitability of a decaying growthmodel. The exponential decay model requires one parameterto be provided # that defines the decay rate of a given valuex (e.g. in-degree period cross-entropy) over time, where# = 1/x̄. We defined the lifecycle period for the exponentialmodel using an integer value s = {1, 2, . . . , 20}, hence[t0, t0.05] " s1, and then defined the exponential decaymodel as follows, letting f(s, ui) be a function that returnsthe period cross-entropy of an arbitrary feature (in-degree,out-degree, terms) for a given user and lifecycle period:

g(s, ui) = f(s1, ui)e#!s (7)

As we induce a per-user parameter, and thus derive amodel for each user, we can characterise the decay rate ofusers along different properties and examine how they vary.In Figure 5 we plotted the cumulative distribution of decayrates (#) for users of the three platforms and the three userproperties. We found that every distribution had a right-skewindicating that users tend to have low decay rates, whilesome users have a large decay rate - indicating that theseusers tend to converge on their prior behaviour a lot quicker

Convergence on community properties Divergence from the community

Convergence-divergence: first, adapt to community; second, separate

Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction 11

How can we model the evolution of individual users?

Solution: Mine Lifecycle Trajectories i.e. fit a curve for each user’s development measure (property and indicator)

Properties: in-degree, out-degree, terms

Indicators: period entropy, period cross-entropy, community cross-entropy Measures: property and indicator (e.g. in-degree period entropy)

Lifecycle Trajectories: Period Entropy


12

¨  Fitted per-user linear regression models ¤  Ind’ var: entropy. Dep’ var: lifecycle period [0,1] ¤ >80% of users R2 > 0.4

!! ! ! ! ! !

!!

! ! ! ! ! !!

! !

!

01

23

4

Lifecycle Stages

Cro

ss E

ntr

opy

0 0.2 0.5 0.8 1


(a) In-degree

!!

! !! !

! ! !! ! !

! ! !!

!!

!

2.0

3.0

4.0

5.0

Lifecycle Stages

Cro

ss E

ntr

opy

0 0.2 0.5 0.8 1

(b) Out-degree

!!

! !! !

!!

!!

!!

!! !

!

!

!

!

6.0

7.0

8.0

Lifecycle Stages

Cro

ss E

ntr

opy

0 0.2 0.5 0.8 1

(c) Lexical





!4 0 2 4 6 8

0.0

0.4

0.8

!

F(x

)


(a) In-degree

!2 !1 0 1 2 3

0.0

0.4

0.8

!

F(x

)

(b) Out-degree

!3 !1 0 1 2 3

0.0

0.4

0.8

!

F(x

)

(c) Lexical







"ui=

1

|T |! 1

!

[t,t!],[t!,t!!]"T,t<t!<t!!

f(ui, [t, t!])! f(ui, [t!, t!!])

f(ui, [t, t!])

(6)


g(s, ui) = f(s1, ui)e#!s (7)


Lifecycle Trajectories: Period Cross-Entropy


13

¨  Earlier: users converged on past behaviour ¤  I.e. previously seen terms, relationships, etc.

¨  First, examined the potential for exponential decay model:

¨  All users had a proportional change value <0,

hence fitted exponential decay model:

!! ! ! ! ! !

!!

! ! ! ! ! !!

! !

!

01

23

4

Lifecycle Stages

Cro

ss E

ntr

opy

0 0.2 0.5 0.8 1


(a) In-degree

!!

! !! !

! ! !! ! !

! ! !!

!!

!

2.0

3.0

4.0

5.0

Lifecycle Stages

Cro

ss E

ntr

opy

0 0.2 0.5 0.8 1

(b) Out-degree

!!

! !! !

!!

!!

!!

!! !

!

!

!

!

6.0

7.0

8.0

Lifecycle Stages

Cro

ss E

ntr

opy

0 0.2 0.5 0.8 1

(c) Lexical





!4 0 2 4 6 8

0.0

0.4

0.8

!

F(x

)


(a) In-degree

!2 !1 0 1 2 3

0.0

0.4

0.8

!

F(x

)

(b) Out-degree

!3 !1 0 1 2 3

0.0

0.4

0.8

!

F(x

)

(c) Lexical







"ui=

1

|T |! 1

!

[t,t!],[t!,t!!]"T,t<t!<t!!

f(ui, [t, t!])! f(ui, [t!, t!!])

f(ui, [t, t!])

(6)


g(s, ui) = f(s1, ui)e#!s (7)


Feature value for interval [t,t’] Feature: property and development indicator

Average proportional change value in feature

!! ! ! ! ! !

!!

! ! ! ! ! !!

! !

!

01

23

4

Lifecycle Stages

Cro

ss E

ntr

opy

0 0.2 0.5 0.8 1


(a) In-degree

!!

! !! !

! ! !! ! !

! ! !!

!!

!

2.0

3.0

4.0

5.0

Lifecycle Stages

Cro

ss E

ntr

opy

0 0.2 0.5 0.8 1

(b) Out-degree

!!

! !! !

!!

!!

!!

!! !

!

!

!

!

6.0

7.0

8.0

Lifecycle Stages

Cro

ss E

ntr

opy

0 0.2 0.5 0.8 1

(c) Lexical





!4 0 2 4 6 8

0.0

0.4

0.8

!

F(x

)


(a) In-degree

!2 !1 0 1 2 3

0.0

0.4

0.8

!

F(x

)

(b) Out-degree

!3 !1 0 1 2 3

0.0

0.4

0.8

!

F(x

)

(c) Lexical







"ui=

1

|T |! 1

!

[t,t!],[t!,t!!]"T,t<t!<t!!

f(ui, [t, t!])! f(ui, [t!, t!!])

f(ui, [t, t!])

(6)


g(s, ui) = f(s1, ui)e#!s (7)


Average of user’s features Exponential Decay Model

●● ● ● ● ● ●

●●

● ● ● ● ● ●●

● ●

●

01

23

4

Lifecycle Stages

Cro

ss E

ntro

py

0 0.2 0.5 0.8 1

● FacebookSAPServer Fault

(a) In-degree

●●

● ● ● ●● ● ●

● ● ●● ● ●

●

●●

●

2.0

3.0

4.0

5.0

Lifecycle Stages

Cro

ss E

ntro

py

0 0.2 0.5 0.8 1

(b) Out-degree

● ●

● ●● ●

●●

●●

●● ● ● ● ●

●●

●

6.0

7.0

8.0

Lifecycle Stages

Cro

ss E

ntro

py

0 0.2 0.5 0.8 1

(c) Lexical



A. Modelling Entropy TrajectoriesOur prior analysis of the entropy of user properties

indicated that users exhibited a generally stable entropythroughout their lifecycle periods. Therefore we chose thelinear regression model as a suitable model for the devel-opment of user properties, setting the explanatory variableto be the lifecycle period of the user and the responsevariable to be the user property’s entropy. In modellingentropy development we can characterise each user using theslope (�) of the model, thus indicating the rate of change ofentropy throughout the lifecycle periods. We induced user-specific entropy models for each platform’s users and thenexamined the cumulative frequency distribution of the �-values for the different user properties and platforms, theseare shown in Figure 4.

−4 0 2 4 6 8

0.0

0.4

0.8

β

F(x)


(a) In-degree

−2 −1 0 1 2 3

0.0

0.4

0.8

β

F(x)

(b) Out-degree

−3 −1 0 1 2 3

0.0

0.4

0.8

β

F(x)

(c) Lexical

Figure 4. Cumulative frequency distributions of linear regression models’� coefficients for the entropy of user property distributions.




Inspection of the period cross-entropy values obtainedearlier, by deriving the minimum cross-entropy when com-paring user properties with past properties, indicated cleardecreasing trends. That is, users converge on their pastbehaviour over time. This suggests that an exponential decaymodel would be suitable for describing such reductionsthroughout user’s lifecycles. Applying such a model requiresthat users reduce in their period cross-entropy values overtime. To examine whether this was indeed the case wedefined the measure �ui that returns the average propor-tional change value in period cross-entropy for a givenuser throughout their lifecycles, letting f(ui, [t, t

0]) denote a

function that returns the period cross-entropy of an arbitraryuser property (e.g. in-degree) for a given user and timeinterval:

�ui=

1

|T |� 1

X

[t,t0],[t0,t00]2T,t<t0<t00

f(ui, [t, t0])� f(ui, [t

0, t00])

f(ui, [t, t0])

(6)By deriving the distribution of average proportional

change values (�) across the different platforms and userproperties we examined the proportion of users for whomthe average change was greater than 0, and thus indicatingdecay overall. We found that for all tested measures, allusers had an average proportional change value of greaterthan 0, thus suggesting the suitability of a decaying growthmodel. The exponential decay model requires one parameterto be provided � that defines the decay rate of a given valuex (e.g. in-degree period cross-entropy) over time, where� = 1/x̄. We defined the lifecycle period for the exponentialmodel using an integer value s = {1, 2, . . . , 20}, hence[t0, t0.05] ⌘ s1, and then defined the exponential decaymodel as follows, letting f(s, ui) be a function that returnsthe period cross-entropy of an arbitrary feature (in-degree,out-degree, terms) for a given user and lifecycle period:

g(ui, s) = f(ui, s1)e��s (7)

As we induce a per-user parameter, and thus derive amodel for each user, we can characterise the decay rate ofusers along different properties and examine how they vary.In Figure 5 we plotted the cumulative distribution of decayrates (�) for users of the three platforms and the three userproperties. We found that every distribution had a right-skewindicating that users tend to have low decay rates, whilesome users have a large decay rate - indicating that theseusers tend to converge on their prior behaviour a lot quicker

●● ● ● ● ● ●

●●

● ● ● ● ● ●●

● ●

●

01

23

4

Lifecycle Stages

Cro

ss E

ntro

py

0 0.2 0.5 0.8 1


(a) In-degree

●●

● ● ● ●● ● ●

● ● ●● ● ●

●

●●

●

2.0

3.0

4.0

5.0

Lifecycle Stages

Cro

ss E

ntro

py

0 0.2 0.5 0.8 1

(b) Out-degree

● ●

● ●● ●

●●

●●

●● ● ● ● ●

●●

●

6.0

7.0

8.0

Lifecycle Stages

Cro

ss E

ntro

py

0 0.2 0.5 0.8 1

(c) Lexical





−4 0 2 4 6 8

0.0

0.4

0.8

β

F(x)


(a) In-degree

−2 −1 0 1 2 3

0.0

0.4

0.8

β

F(x)

(b) Out-degree

−3 −1 0 1 2 3

0.0

0.4

0.8

β

F(x)

(c) Lexical






0]) denote a


�ui=

1

|T |� 1

X

[t,t0],[t0,t00]2T,t<t0<t00

f(ui, [t, t0])� f(ui, [t

0, t00])

f(ui, [t, t0])



g(ui, s) = f(ui, s1)e��s (7)


●● ● ● ● ● ●

●●

● ● ● ● ● ●●

● ●

●

01

23

4

Lifecycle Stages

Cro

ss E

ntro

py

0 0.2 0.5 0.8 1


(a) In-degree

●●

● ● ● ●● ● ●

● ● ●● ● ●

●

●●

●

2.0

3.0

4.0

5.0

Lifecycle Stages

Cro

ss E

ntro

py

0 0.2 0.5 0.8 1

(b) Out-degree

● ●

● ●● ●

●●

●●

●● ● ● ● ●

●●

●

6.0

7.0

8.0

Lifecycle Stages

Cro

ss E

ntro

py

0 0.2 0.5 0.8 1

(c) Lexical





−4 0 2 4 6 8

0.0

0.4

0.8

β

F(x)


(a) In-degree

−2 −1 0 1 2 3

0.0

0.4

0.8

β

F(x)

(b) Out-degree

−3 −1 0 1 2 3

0.0

0.4

0.8

β

F(x)

(c) Lexical






0]) denote a


�ui=

1

|T |� 1

X

[t,t0],[t0,t00]2T,t<t0<t00

f(ui, [t, t0])� f(ui, [t

0, t00])

f(ui, [t, t0])



g(ui, s) = f(ui, s1)e��s (7)


Lifecycle Trajectories: Community Cross-Entropy


14

●● ● ● ● ● ●

●●

● ● ● ● ● ●●

● ●

●

01

23

4

Lifecycle Stages

Cro

ss E

ntro

py

0 0.2 0.5 0.8 1


(a) In-degree

●●

● ● ● ●● ● ●

● ● ●● ● ●

●

●●

●

2.0

3.0

4.0

5.0

Lifecycle Stages

Cro

ss E

ntro

py

0 0.2 0.5 0.8 1

(b) Out-degree

● ●

● ●● ●

●●

●●

●● ● ● ● ●

●●

●

6.0

7.0

8.0

Lifecycle Stages

Cro

ss E

ntro

py

0 0.2 0.5 0.8 1

(c) Lexical





−4 0 2 4 6 8

0.0

0.4

0.8

β

F(x)


(a) In-degree

−2 −1 0 1 2 3

0.0

0.4

0.8

β

F(x)

(b) Out-degree

−3 −1 0 1 2 3

0.0

0.4

0.8

β

F(x)

(c) Lexical






0]) denote a


�ui=

1

|T |� 1

X

[t,t0],[t0,t00]2T,t<t0<t00

f(ui, [t, t0])� f(ui, [t

0, t00])

f(ui, [t, t0])



g(ui, s) = f(ui, s1)e��s (7)


¨  Identified differences between platforms and properties’ trajectory models ¤  In-degree: quadratic regression

n Convergence-divergence

¤ Out-degree: linear regression n Divergence

¤ Lexical: n Facebook, SAP: quadratic regression n Server Fault: linear regression

¨  >73% of users have R2 > 0.4

Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction 15

Mining lifecycle trajectories enables users to be categorised by their behaviour…

Facilitating Churn Prediction

Predicting Churners


16

¨  Binary classification task: is user u a churner? ¨  Dataset churners: who last posted before final 10% ¨  Dataset attributes from trajectory model features

¨  Induced Logistic regression model:

subsequent interpretation, of churners in online commu-nities. Churners present a serious issue for communitymanagers and hosts as the leaving of certain users can havea detrimental effect on the community (i.e. experts leavinga question-answering community can cause an increase inunanswered queries).

In this section we define churn prediction as a binaryclassification task and use the previously examined indica-tors of lifecycle trajectories to predict whether a user is achurner or not. As we confine user lifecycle periods fromthe start of their lifecycle to the end we use the trajectoriesmined from this period to characterise how users develop.We define churners as any user who posts for the last timebefore the final 10% of the time window of our datasets,cutoff points are: 2012-07-09 for Facebook, 2010-05-11 forSAP, and 2010-12-23 for ServerFault. Our dataset is of thefollowing form: D = {(xi, yi)}, where yi denotes the classlabel of the user from one of two values: y 2 {0, 1},4while xi denotes an 11-element R-valued feature vector foreither a Facebook or SAP user, and a 10-element featurevector for a Server Fault user - given that we use a linearregression model for each user’s lexical community cross-entropy development. We model the feature vector of eachuser using the trajectory indicators from the previous section,in short Table II defines our set of features where we placeeach within a set depending on the dynamics it captures.

Table IIFEATURES USED FOR THE CHURN PREDICTION EXPERIMENTS. THE

INDICATORS OF LIFECYCLE TRAJECTORIES ARE USED TOCHARACTERISE USER EVOLUTION ALONG THE DIFFERENT USER

PROPERTIES.

Property Indicator Model Feature(s) PlatformIn-degree Period Entropy Linear Regression � All

Period Cross-Ent Exponential Decay � AllComm’ Cross-Ent Quad’ Regress’ a1, a2 All

Out-degree Period Entropy Linear Regression � AllPeriod Cross-Ent Exponential Decay � AllComm’ Cross-Ent Linear Regression � All

Lexical Period Entropy Linear Regression � AllPeriod Cross-Ent Exponential Decay � AllComm’ Cross-Ent Quad’ Regress’ a1, a2 Fb, SAPComm’ Cross-Ent Linear Regression � SF

A. Prediction Model DefinitionThe observed feature vector of user ui (xi) contains

the indicator trajectories of the user along the differentproperties. We use the logistic regression model to predictthe conditional probability of user ui churning as follows:

Pr(Y = 1 | xi) =1

1 + e��|xi

(9)

The model’s coefficients (�) define the weight attachedto each identity trajectory feature within the linear model(f(i) = �|xi). In order to derive the model’s coefficients

41 indicating churner, 0 not.

we use maximum likelihood fitting through the R statisticalsoftware package5 to select the maximum likelihood esti-mation ˆ� of the model’s coefficients. Following fitting, thederived model is used to predict the churn probability ofeach user within the test dataset.

B. Experimental SetupFor our experiments we first standardised the datasets

by combining the test (20%) and training (80%) datasetstogether and setting each indicator feature to have 0 meanand a standard deviation of 1, we then divided the datasetagain into the respective test and training splits maintainingthe same instances as before. We wanted to test the effects ofobserving different user properties and development dynam-ics on churn prediction. We therefore tested each user prop-erty in isolation, for instance using the in-degree propertyand the entropy, period cross-entropy and community cross-entropy trajectory indicators; and then each developmentmodel in isolation, for instance using the entropy modeland examining in-degree, out-degree, and term distributions;finally we combined all features together within a singlemodel. In doing so we could isolate any effects of keyfeatures on prediction performance, and thus inform modelselection for specific platforms (i.e. identifying the bestperforming model for Facebook, SAP and Server Fault).

As we used the logistic regression model for our pre-diction model we are provided with a function whose co-domain is a churn probability value for a given user withinthe closed interval [0, 1]. Therefore we evaluated the per-formance of each induced model using two evaluation mea-sures: (i) precision@k ( ¯P ), and (ii) area under the receiveroperator characteristic curve (AUC). To derive precision@kwe ranked the users by their churn probability according tothe induced model and then assessed the precision of thetop-k ranks, setting k = {1, 5, 10, 20, 50, 100}, and takingthe mean of these precision values. This assesses the extentto which the upper portion of the predicted churners arecorrect. We used the baseline measure of the probability ofa randomly selected user being a churner, thus correspondingto the probability of success in a single Bernouilli trial(setting p = |churners|/|Dtest|). To derive the area underthe receiver operator characteristic curve we varied theconfidence of an indicator function (f(x)) through discretesettings of confidence bounds � = {0, 0.05, . . . , 0.95, 0.1},thereby setting the class label for given instance (x) asfollows:

f(x) =⇢

1, if Pr(Y = 1 | xi) > � (10a)0, otherwise (10b)

For each different setting of � we measured the truepositive rate (TPR/recall) and the false positive rate (FPR),and from these measures plotted the receiver operator char-acteristic (ROC) curve. A model which maximises the area

5http://www.r-project.org/





PROPERTIES.







Pr(Y = 1 | xi) =1

1 + e��|xi

(9)







f(x) =⇢








PROPERTIES.







Pr(Y = 1 | xi) =1

1 + e��|xi

(9)







f(x) =⇢




Facebook, SAP: 11 features Server Fault: 10 features

Probability of user churning

Induce coefficients via maximum likelihood estimation

Evaluation: Setup


17

¨  User-wise dataset split: 80% training, 20% testing ¨  Experiments:

¤  Isolated user properties, isolated development indicator features, all features together

¨  Evaluation measures: 1.  Precision@k (P): Avg over k={1,5,10,20,50,100} 2.  Area Under the Receiver Operator Curve (AUC)

¨  Baseline: Success probability in single Bernoulli trial ¤  I.e. randomly selecting a churner

Evaluation: Results


18

¨  Variance in features depending on: ¤ Accuracy preference

n  I.e. precision > recall

¤ Platform n Different detection

signals for different communities

¨  Full model is never the best

under this curve (AUC) is preferable (thus achieving a valueof 1), where the baseline for this measure is 0.5.

C. ResultsTable III presents the performance of the different models

over the three platforms, showing variation in the optimummodel and evaluation measures. Interestingly, we find thatthe use of all features combined together does not yield thebest model for any of the tested platforms. For Facebook theresults indicate that the prediction model using communitycross entropy indicators performed best in terms of both ¯Pand AUC. We tested the difference between this model andnext best performing model (Full) using a Mann-Whitneytest and found the difference to be significant (at the 5%level).

For SAP we found differences in the best performingmodel according to the evaluation measure used: in-degreefor AUC and lexical features for ¯P . These differencesindicate that concentrating on top ranks and thus informingdetection of churners with high-levels of confidence can bebest achieved by assessing the term distributions of usersand their lexical dynamics, while for preferring recall theuse of in-degree distributions is preferable.

Turning now to Server Fault, the results also indicatedifferences in the best model depending on the evalua-tion measure: period cross-entropy achieves the best AUCscore, while the lexical features yield the highest ¯P . Thislatter result indicates similarities between SAP and ServerFault, both of which are Question-Answering communityplatforms, in that that the terms used by users are salientindicators of churners and for guaranteeing high precisionwhen identifying who will churn.

D. Churn PatternsUp to this point we have concentrated on predicting churn-

ers from online communities, we now turn to inspecting thequalities that discern churners from other users. One of theadvantages of using a logistic regression model as our churnprediction model is its provision of coefficients that can beinspected to understand the impact of individual featureson the likelihood of a user churner, that is: by inspectingthe coefficient of a given feature within the model we canunderstand how a change in that feature (increase/decrease)would affect the churn probability.

Table IV shows the features and coefficients from thebest performing model, based on maximising the AUC,from the previous section. For both Facebook and SAP wefind that the roots of the quadratic regression model forthe community cross-entropy of the in-degree distributionsshould be reduced, suggesting that the curve has a vertexwhich appears prior to the time interval analysed and thatthe trajectory of a churner has a steep gradient to towardsthe end of their lifecycle. This indicates that churners exhibitdivergent social networks in comparison to the platform.

Table IIIPRECISION@K (P̄ ) AND AREA UNDER THE RECEIVER OPERATOR

CHARACTERISTIC CURVE (AUC) VALUES FOR FACEBOOK, SAP ANDSERVER FAULT WHEN TESTING DIFFERENT: (I) USER PROPERTIES, (II)

DEVELOPMENT INDICATORS, (III) ALL FEATURES TOGETHER.

Platform Feature P̄ AUCFacebook Entropy 0.761 0.500

Period Cross Entropy 0.624 0.485Community Cross Entropy 0.791 0.617In-degree 0.648 0.511Out-degree 0.781 0.570Lexical 0.681 0.557Full 0.730 0.573Baseline 0.629 0.500

SAP Entropy 0.434 0.549Period Cross Entropy 0.321 0.568Community Cross Entropy 0.334 0.549In-degree 0.351 0.592Out-degree 0.250 0.503Lexical 0.438 0.539Full 0.363 0.539Baseline 0.342 0.500

Server Fault Entropy 0.392 0.526Period Cross Entropy 0.300 0.555Community Cross Entropy 0.352 0.538In-degree 0.232 0.475Out-degree 0.293 0.512Lexical 0.459 0.546Full 0.421 0.554Baseline 0.319 0.500

We also find differences between SAP and Server Fault interms of the period cross-entropy trajectory for the in-degreedistributions: as we induced exponential decay models foreach of these platforms, the decay rate indicates that forSAP this rate should be increased for churners, suggestingtheir in-degree distributions, and the extent to which theyare contacted during one time period relative to their pastcommunications, reduces at a much faster rate than onServerFault.

Table IVBEST PERFORMING PREDICTION MODEL COEFFICIENTS FOR FACEBOOK

(COMMUNITY CROSS-ENTROPY), SAP (IN-DEGREE) AND SERVERFAULT (PERIOD CROSS-ENTROPY). ALL FEATURES ARE SIGNIFICANT

WITHIN THEIR RESPECTIVE MODELS (↵ < 0.05)

Feature Facebook SAP Server FaultIn-degree Entropy � - 0.0532 -In-degree Period Cross-Ent � - 0.0139 -0.1826In-degree Comm’ Cross-Ent a1 -0.1057 -0.1878 -In-degree Comm’ Cross-Ent a2 -0.0510 -1.5104 -Out-degree Comm’ Cross-Ent � 0.3173Out-degree Period Cross-Ent � - - 0.0210Lexical Period Cross-Ent � - - 0.0557Lexical Comm’ Cross-Ent a1 0.3253 -Lexical Comm’ Cross-Ent a2 -0.0541 -

VII. DISCUSSION AND FUTURE WORK

Prior work on social network evolution by Panzarasa et al.[6] and Miritello et al. [1] found that users’ social networkstend to a limit in terms of their communication capacity.Our examination of the development of users in termsof their in-degree and out-degree distributions relative toearlier lifecycle periods (period cross-entropy) demonstrated

Evaluation: Churner Patterns


19

under this curve (AUC) is preferable (thus achieving a valueof 1), where the baseline for this measure is 0.5.

C. ResultsTable III presents the performance of the different models

over the three platforms, showing variation in the optimummodel and evaluation measures. Interestingly, we find thatthe use of all features combined together does not yield thebest model for any of the tested platforms. For Facebook theresults indicate that the prediction model using communitycross entropy indicators performed best in terms of both ¯Pand AUC. We tested the difference between this model andnext best performing model (Full) using a Mann-Whitneytest and found the difference to be significant (at the 5%level).

For SAP we found differences in the best performingmodel according to the evaluation measure used: in-degreefor AUC and lexical features for ¯P . These differencesindicate that concentrating on top ranks and thus informingdetection of churners with high-levels of confidence can bebest achieved by assessing the term distributions of usersand their lexical dynamics, while for preferring recall theuse of in-degree distributions is preferable.

Turning now to Server Fault, the results also indicatedifferences in the best model depending on the evalua-tion measure: period cross-entropy achieves the best AUCscore, while the lexical features yield the highest ¯P . Thislatter result indicates similarities between SAP and ServerFault, both of which are Question-Answering communityplatforms, in that that the terms used by users are salientindicators of churners and for guaranteeing high precisionwhen identifying who will churn.

D. Churn PatternsUp to this point we have concentrated on predicting churn-

ers from online communities, we now turn to inspecting thequalities that discern churners from other users. One of theadvantages of using a logistic regression model as our churnprediction model is its provision of coefficients that can beinspected to understand the impact of individual featureson the likelihood of a user churner, that is: by inspectingthe coefficient of a given feature within the model we canunderstand how a change in that feature (increase/decrease)would affect the churn probability.

Table IV shows the features and coefficients from thebest performing model, based on maximising the AUC,from the previous section. For both Facebook and SAP wefind that the roots of the quadratic regression model forthe community cross-entropy of the in-degree distributionsshould be reduced, suggesting that the curve has a vertexwhich appears prior to the time interval analysed and thatthe trajectory of a churner has a steep gradient to towardsthe end of their lifecycle. This indicates that churners exhibitdivergent social networks in comparison to the platform.

Table IIIPRECISION@K (P̄ ) AND AREA UNDER THE RECEIVER OPERATOR

CHARACTERISTIC CURVE (AUC) VALUES FOR FACEBOOK, SAP ANDSERVER FAULT WHEN TESTING DIFFERENT: (I) USER PROPERTIES, (II)

DEVELOPMENT INDICATORS, (III) ALL FEATURES TOGETHER.

Platform Feature P̄ AUCFacebook Entropy 0.761 0.500

Period Cross Entropy 0.624 0.485Community Cross Entropy 0.791 0.617In-degree 0.648 0.511Out-degree 0.781 0.570Lexical 0.681 0.557Full 0.730 0.573Baseline 0.629 0.500

SAP Entropy 0.434 0.549Period Cross Entropy 0.321 0.568Community Cross Entropy 0.334 0.549In-degree 0.351 0.592Out-degree 0.250 0.503Lexical 0.438 0.539Full 0.363 0.539Baseline 0.342 0.500

Server Fault Entropy 0.392 0.526Period Cross Entropy 0.300 0.555Community Cross Entropy 0.352 0.538In-degree 0.232 0.475Out-degree 0.293 0.512Lexical 0.459 0.546Full 0.421 0.554Baseline 0.319 0.500

We also find differences between SAP and Server Fault interms of the period cross-entropy trajectory for the in-degreedistributions: as we induced exponential decay models foreach of these platforms, the decay rate indicates that forSAP this rate should be increased for churners, suggestingtheir in-degree distributions, and the extent to which theyare contacted during one time period relative to their pastcommunications, reduces at a much faster rate than onServerFault.

Table IVBEST PERFORMING PREDICTION MODEL COEFFICIENTS FOR FACEBOOK

(COMMUNITY CROSS-ENTROPY), SAP (IN-DEGREE) AND SERVERFAULT (PERIOD CROSS-ENTROPY). ALL FEATURES ARE SIGNIFICANT

WITHIN THEIR RESPECTIVE MODELS (↵ < 0.05)

Feature Facebook SAP Server FaultIn-degree Entropy � - 0.0532 -In-degree Period Cross-Ent � - 0.0139 -0.1826In-degree Comm’ Cross-Ent a1 -0.1057 -0.1878 -In-degree Comm’ Cross-Ent a2 -0.0510 -1.5104 -Out-degree Comm’ Cross-Ent � 0.3173Out-degree Period Cross-Ent � - - 0.0210Lexical Period Cross-Ent � - - 0.0557Lexical Comm’ Cross-Ent a1 0.3253 -Lexical Comm’ Cross-Ent a2 -0.0541 -

VII. DISCUSSION AND FUTURE WORK

Prior work on social network evolution by Panzarasa et al.[6] and Miritello et al. [1] found that users’ social networkstend to a limit in terms of their communication capacity.Our examination of the development of users in termsof their in-degree and out-degree distributions relative toearlier lifecycle periods (period cross-entropy) demonstrated

Variance in decay coefficient: degree of communication decays a lot faster for SAP than Server Fault

Reduced quadratic coefficients: churners exhibit steep cross-community curves towards the end of their lifecycles

Conclusions


20

1.  Users communicate with a fixed-set of users ¤ Similar to findings from (Miritello et al. 2013)

2.  Convergence-divergence effect: users converge on community ‘norms’ before diverging

¤  (Erikson. 1959) theorised that younger people are susceptible to social norms

¤  (Danescu-Niculescu-Mizil et al. 2013) found users to converge on lexical norms, before diverging

3.  Variance in churner signals ¤ No common best model was found across platforms

Current & Future Work


21

1.  Regularised Linear Models ¤ Achieved ~30% AUC boost with growth and magnitude

features

2.  Evolving-Taste Recommender System ¤ Used lifecycle model (n=5) to form category-ratings profiles ¤  Identified variance in taste evolution across platforms

Pr(c|Du,s

train

) =ave rating(Du,s,c

train

)X

c

02C

u,strain

ave rating(Du,s,c

0

train

)(4)

Based on this formalisation we can assess the relativemean rating score per category for a given user and lifecyclestage. As our item-to-semantic-category mapping function(�) can either map items directly to the categories they areassigned to (�

p

) or to the grandparent categories (�g

) wecan form taste profiles of users using two di↵erent categor-ical levels. Our intuition here was that the former profiletype, formed from directly mapped categories, would leadto sparse profiles due to the greater specificity that the cat-egories denote, while the latter profile type, formed fromtransitively mapped grandparent categories would lead todenser taste profiles. This theory was influenced by the priorwork of Ostuni et al. [6], in which the authors consider onlygrandparent categories aligned to semantic URIs.

5.4 Taste Evolution: Local vs Global Dynam-ics

We now turn to looking at the evolution of users’ tastesover time in order to understand how their preferences change.We are interested in examining for two e↵ects: (i) local dy-namics, the propensity for users to develop in their owninimitable manner; and (ii) global dynamics, where usersexhibit consistent development properties. Given our useof probability distributions to model the lifecycle stage spe-cific taste profile of each user, we apply information theo-retic measures based on entropy judgements capturing: (i)consecutive lifecycle stage development, and (ii) informationtransfer between global taste profiles and the local taste pro-files at one lifecycle stage and the taste profiles of the userat the next stage.

5.4.1 Conditional Category Entropy

By using conditional entropy we can assess the informa-tion needed to describe the taste profile of a user at one timestep (Q) using his taste profile from the previous stage (P ).A reduction in conditional entropy indicates that the user’staste profile is similar to that of his previous stage’s pro-file, while an increase indicates the converse. We define theconditional entropy of two discrete probability distributions,representing taste profiles, as:

H(Q|P ) =X

x2P,

y2Q

p(x, y) logp(x)p(x, y)

(5)

We derived the conditional entropy over the 5 lifecycleperiods in a pairwise fashion, i.e. H(P

2

|P1

), . . . , H(P5

|P4

),and plotted the curve of the mean conditional entropy inFigure 5 over each dataset’s users in the training split, alsoincluding the 95% confidence intervals to show the varia-tion in the conditional entropies. Figure 5 indicates thatMovie Lens users tend to diverge in their ratings and cat-egories over time, given the increase in the mean curve to-wards later portions of the users’ lifecycles, the same is alsoevident for Movie Tweetings, however the increase is moregradual there. Amazon users, however, di↵er by showing areduction in conditional entropy towards later lifecycle peri-ods. Relating this back to our above definition of conditionalentropy, the global e↵ect that we see on Amazon indicates

that users tend to converge in their reviewing behaviour andthat previous profiles allow one to gauge how the user willrate items in the future given their category information.Conversely, for MovieLens and Movie Tweetings we see anopposite e↵ect: users’ taste profiles become less predictableas they develop; users rate items in a way that renders un-certainty in profiling from previous information.

0.22

50.

235

0.24

5

Lifecycle Stages

Con

ditio

nal E

ntro

py

●

●

●

●

1 2 3 4 5

(a) Lens

0.27

50.

280

0.28

50.

290

Lifecycle Stages

Con

ditio

nal E

ntro

py

●●

●

●

1 2 3 4 5

(b) Tweetings0.

205

0.21

00.

215

0.22

0Lifecycle Stages

Con

ditio

nal E

ntro

py

●

● ●

●

1 2 3 4 5

(c) Amazon

Figure 5: Parent category conditional entropy be-tween consecutive lifecycle stages (e.g. H(P

2

|P3

))across the datasets, together with the bounds of the95% confidence interval for the derived means.

5.4.2 Transfer Category Entropy

Earlier we postulated that the local development of a userand the global development of all users will have di↵erentinfluential e↵ects on certain users, i.e. influencing how theymay review things. We can examine for such e↵ects underthe premise that local development and global developmentare two di↵erent systems, although the former contributesto the latter we would expect a given user who is influencedby what other users rate to be more influenced by globaldevelopment, while users who follow their own path andrate things in their own bespoke manner would eschew suchglobal information. In prior work of Ver Steeg & Aram theauthors adopted transfer entropy to measure informationtransfer in social media [9]. In doing so the authors wereable to assess for influence in retweeting URLs on Twitterbetween users. Transfer entropy is closely related to Grangercausality, and given two time-series signals, allows one tocompare the e↵ect of one on the other.We adopt transfer entropy to assess how the taste profile

(Ps

) of a user at one time step (s) has been influence bylocal (P

s�1

) and global taste (Qs�1

) profiles at the previouslifecycle stage (s�1). For the latter taste profile (Q

s�1

), weform a global probability distribution as above for a singleuser but instead using all users who posted reviews withinthe time interval of s. From these definitions we can thenexamine the information transfer from a prior lifecycle stage(s� 1) to the current lifecycle stage (s) of the user. Now, as-sume that we have a random variable that describe the localcategories that have been reviewed at the current stage (Y

s

),a random variable of local categories at the previous stage(Y

s�1

). and a third random variable of global categories atthe previous stage (X

s�1

), we then define the transfer en-tropy of one lifecycle stage to another as follows, based onthe work of Schreiber [8]:

TX!Y

= H(Ys

|Ys�1

)�H(Ys

|Ys�1

, Xs�1

) (6)

Using the above probability distributions we can calculatethe transfer entropy based on the joint and conditional prob-ability distributions given the values of the random variables

Dissimilarity in taste profile from previous profile

@mrowebot [email protected]

http://www.lancaster.ac.uk/staff/rowem/

Questions? 22


Technology

Mining User Lifecycles from Online Community Platforms and their Application to Churn Prediction