How Long will She Call Me? Distribution, Social Theory and Duration Prediction

Yuxiao Dong*, Jie Tang$, Tiancheng Lou#, Bin Wu&, Nitesh V. Chawla*

*University of Notre Dame  $Tsinghua University  #Google Inc.  &Beijing U. of Posts & Telecoms

Yuxiao Dong, Jie Tang, Tiancheng Lou, Bin Wu, Nitesh V. Chawla. How Long will She Call Me? Distribution, Social Theory and Duration Prediction. In ECML/PKDD’13.



Outline

Motivation

Dynamic Distribution on Duration

Social Theory on Duration

Duration Prediction

Conclusion


Motivation

Mobile calls between humans are ubiquitous at any time …

As of May 2013, 91% of American adults had a mobile phone [1]. Mobile users can't leave their phones alone for 6 minutes and check them up to 150 times a day [2]. People make, receive, or avoid 22 phone calls every day [2].

1. Pew Internet: Mobile Reports. June 6, 2013. http://pewinternet.org/Commentary/2012/February/Pew-Internet-Mobile.aspx

2. Tomi Ahonen. Communities Dominate Brands. http://communities-dominate.blogs.com/


Duration Macro-Distribution

1. M. Seshadri, A. Srid., J. Bolot, C. Faloutsos and J. Leskovec. Mobile Call Graphs: Beyond Power-Law and Lognormal Distributions. In KDD’08.

2. P. Melo, L. Akoglu, C. Faloutsos and A. Loureiro. Surprising Patterns for the Call Duration Distribution of Mobile Phone Users. In PKDD’10.

Double Pareto lognormal (DPLN) distribution [1]. Truncated log-logistic (TLAC) distribution [2].
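For reference, a log-logistic density with scale α and shape β is f(x) = (β/α)(x/α)^(β-1) / (1 + (x/α)^β)^2. The sketch below truncates it at an upper bound by renormalizing with the CDF. It is a generic upper-truncated log-logistic, assumed for illustration only; it is not necessarily the exact TLAC parameterization of [2], and the parameter values are hypothetical.

```python
import math


def loglogistic_cdf(x: float, alpha: float, beta: float) -> float:
    """CDF of a log-logistic distribution with scale alpha and shape beta."""
    if x <= 0:
        return 0.0
    return 1.0 / (1.0 + (x / alpha) ** (-beta))


def loglogistic_pdf(x: float, alpha: float, beta: float) -> float:
    """Density (beta/alpha)(x/alpha)^(beta-1) / (1 + (x/alpha)^beta)^2."""
    if x <= 0:
        return 0.0
    z = (x / alpha) ** beta
    return (beta / x) * z / (1.0 + z) ** 2


def truncated_loglogistic_pdf(x: float, alpha: float, beta: float, upper: float) -> float:
    """Log-logistic density restricted to (0, upper] and renormalized to integrate to 1."""
    if x <= 0 or x > upper:
        return 0.0
    return loglogistic_pdf(x, alpha, beta) / loglogistic_cdf(upper, alpha, beta)


if __name__ == "__main__":
    # Hypothetical parameters: median 60 s (the median equals alpha), shape 2,
    # durations capped at one hour.
    alpha, beta, upper = 60.0, 2.0, 3600.0
    n = 100_000
    dx = upper / n
    # Midpoint-rule check that the truncated density integrates to about 1.
    mass = sum(truncated_loglogistic_pdf((i + 0.5) * dx, alpha, beta, upper) for i in range(n)) * dx
    print(f"total probability mass: {mass:.4f}")
```

Fitting α and β (and the truncation point) to real CDR durations would be a separate maximum-likelihood step not shown here.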


Mobile Data

Call Detail Records (CDR): 3.9 million CDRs; 2 months (Dec. 2007 & Jan. 2008); non-American.

Mobile Network: 272,345 users and 521,925 call edges.

Pareto principle: 20% of user pairs produce 80% of the calls.

One-week data is available at http://arnetminer.org/mobile-duration
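The Pareto statement above can be checked on any CDR sample by counting calls per user pair and measuring the share produced by the busiest 20% of pairs. A minimal sketch, assuming CDRs reduced to hypothetical (caller, callee) tuples, with pairs treated as unordered:

```python
from collections import Counter
from typing import Iterable, Tuple


def pair_call_counts(calls: Iterable[Tuple[str, str]]) -> Counter:
    """Count calls per unordered user pair (A->B and B->A go to the same pair)."""
    return Counter(tuple(sorted(call)) for call in calls)


def top_pair_share(counts: Counter, top_fraction: float = 0.2) -> float:
    """Fraction of all calls produced by the top `top_fraction` of pairs."""
    ranked = sorted(counts.values(), reverse=True)
    k = max(1, round(top_fraction * len(ranked)))
    return sum(ranked[:k]) / sum(ranked)


if __name__ == "__main__":
    # Toy data (not from the paper): one pair dominates the call volume.
    calls = [("u1", "u2")] * 40 + [("u2", "u1")] * 40 + [
        ("u3", "u4"), ("u5", "u6"), ("u7", "u8"), ("u9", "u10"),
    ] * 5
    print(top_pair_share(pair_call_counts(calls)))  # 80 of 100 calls from 1 of 5 pairs -> 0.8
```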


Roadmap

Existing Macro-Distribution: DPLN distribution; TLAC distribution.

Dynamic Dist. on Duration: temporal distribution; demographics distribution.

Social Theory on Duration: strong/weak tie; homophily; opinion leader; social balance.

Duration Prediction: dynamic factors; social factors.

[1]

1. V. Palchykov, K. Kaski, J. Kertész, A.-L. Barabási and R. I. M. Dunbar. Sex differences in intimate relationships. Scientific Reports 2:370, 2012.


Dynamic Distribution on Duration


Periodicity

Periodic patterns in mobile call duration:

Working time (8:00 AM-7:00 PM): 75 seconds on average;

Evening (7:00 PM-12:00 AM): increasing to 150 seconds around midnight;

Early morning (12:00 AM-8:00 AM): decreasing to 50 seconds.
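The period averages above come from grouping calls by the hour in which they start. A minimal sketch, assuming each CDR has been reduced to a hypothetical (hour_of_day, duration_seconds) pair; the period boundaries follow the slide, while the function names are illustrative:

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def period_of(hour: int) -> str:
    """Map an hour of day (0-23) to the three periods on the slide."""
    if 8 <= hour < 19:
        return "working"        # 8:00 AM - 7:00 PM
    if hour >= 19:
        return "evening"        # 7:00 PM - 12:00 AM
    return "early_morning"      # 12:00 AM - 8:00 AM


def mean_duration_by_period(calls: Iterable[Tuple[int, float]]) -> Dict[str, float]:
    """calls: (hour_of_day, duration_seconds) pairs; returns mean duration per period."""
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for hour, duration in calls:
        p = period_of(hour)
        totals[p] += duration
        counts[p] += 1
    return {p: totals[p] / counts[p] for p in totals}


if __name__ == "__main__":
    toy = [(9, 60), (14, 100), (21, 120), (23, 180), (2, 40)]  # toy data
    print(mean_duration_by_period(toy))
```

Replacing `period_of` with the raw hour gives the finer hour-by-hour curve.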


Demographics

Call duration vs. demographics:

Females make longer calls than males;

Calls between two females are longer than calls between two males;

Calls from males to females are longer than calls from females to males;

Younger users make longer calls.


Social Theory on Duration


Social Theory

Strong/weak tie: How long do people with a strong or weak tie call each other?

Link homophily: Do similar users tend to call each other for long or short durations?

Opinion leader: How do the calling behaviors of opinion leaders and ordinary users differ?

Social balance: How well does the duration-based network satisfy social balance theory?


Strong/Weak Tie

The number of calls (#calls) between two users measures their tie strength.

Call duration vs. social tie: the stronger the tie, the shorter the calls. If two users call each other 1,000 times in two months, the probability that a call is shorter than 60 s is 80%. This differs from the online instant-messaging network [2].

Probability that the call is < 60s.

[1]

1. http://www.thomashutter.com/index.php/2012/01/facebook-die-rolle-von-social-networks-in-der-informationsverbreitung/

2. Jure Leskovec and Eric Horvitz. Planetary-Scale Views on a Large Instant-Messaging Network. In WWW’08.
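The tie-strength curve bins user pairs by #calls and measures the fraction of their calls shorter than 60 s. A minimal sketch, assuming a hypothetical mapping from each pair to its list of call durations (in seconds) and bin edges chosen by the caller:

```python
from typing import Dict, List, Optional, Sequence, Tuple


def short_call_probability(
    pair_durations: Dict[Tuple[str, str], List[float]],
    bins: Sequence[Tuple[int, int]],
    threshold_s: float = 60.0,
) -> Dict[Tuple[int, int], Optional[float]]:
    """For each [lo, hi) bin of tie strength (#calls per pair), return the
    fraction of those pairs' calls that last less than threshold_s."""
    result: Dict[Tuple[int, int], Optional[float]] = {}
    for lo, hi in bins:
        # Pool the durations of every pair whose call count falls in the bin.
        durations = [d for ds in pair_durations.values() if lo <= len(ds) < hi for d in ds]
        if durations:
            result[(lo, hi)] = sum(d < threshold_s for d in durations) / len(durations)
        else:
            result[(lo, hi)] = None  # no pairs with this many calls
    return result
```

Logarithmic bins (1-10, 10-100, 100-1000, ...) suit call counts that span several orders of magnitude.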


Link Homophily

The number of common neighbors between two users measures their homophily.

Call duration vs. link homophily: the more common neighbors, the shorter the calls. If two users have more than 30 common neighbors, the probability that a call is shorter than 60 s is 80%.

Call duration vs. social tie + link homophily: the more homophily and the stronger the tie, the shorter the calls.

Probability that the call is < 60s.

[1]

1. Lilian Weng, Filippo Menczer, Yong-Yeol Ahn. Virality Prediction and Community Structure in Social Networks. Scientific Reports, Aug. 2013.
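Counting common neighbors requires only the adjacency sets of the call graph. A minimal sketch, assuming an unweighted, undirected graph built from hypothetical (caller, callee) edges; pairs with more than 30 common neighbors would fall into the high-homophily group described above:

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets: u and v are neighbors if either called the other."""
    adj: Dict[str, Set[str]] = defaultdict(set)
    for u, v in edges:
        if u != v:
            adj[u].add(v)
            adj[v].add(u)
    return adj


def common_neighbors(adj: Dict[str, Set[str]], u: str, v: str) -> int:
    """Homophily proxy: number of users connected to both u and v."""
    return len(adj.get(u, set()) & adj.get(v, set()))
```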


Opinion Leader

PageRank is used to mine the top 1% of users in the mobile call network as opinion leaders; the other users are ordinary users.

Call duration vs. opinion leader: OLs make shorter calls in general; the probability that an OL's call is shorter than 60 s is about 80%. Calls between two OLs are shorter.

OL: opinion leader; OU: ordinary user. Probability that the call is < 60s.

[1]

1. E. Katz. The two-step flow of communication: an up-to-date report of an hypothesis. In: Enis, Cox (eds.) Marketing Classics, 1973.
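Selecting opinion leaders amounts to ranking users by PageRank and keeping the top 1%. A minimal power-iteration sketch, assuming an unweighted graph given as out-neighbor sets (every user must appear as a key), a damping factor of 0.85, and a fixed iteration count; the slide does not state the paper's PageRank settings (edge weights, direction, damping):

```python
from typing import Dict, List, Set


def pagerank(adj: Dict[str, Set[str]], damping: float = 0.85, iters: int = 100) -> Dict[str, float]:
    """Power-iteration PageRank; users with no outgoing edges spread rank uniformly."""
    nodes = list(adj)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in nodes}
        for u in nodes:
            out = adj[u]
            if out:
                share = damping * rank[u] / len(out)
                for v in out:
                    new[v] += share
            else:  # dangling node: redistribute its rank to everyone
                for v in nodes:
                    new[v] += damping * rank[u] / n
        rank = new
    return rank


def opinion_leaders(adj: Dict[str, Set[str]], top_fraction: float = 0.01) -> List[str]:
    """Top `top_fraction` of users by PageRank (at least one user)."""
    rank = pagerank(adj)
    k = max(1, round(top_fraction * len(rank)))
    return sorted(rank, key=rank.get, reverse=True)[:k]
```

For an undirected call graph, each call edge appears in both users' neighbor sets.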


Social Balance

Call duration vs. social balance: unbalanced in terms of structural balance; balanced in terms of relationship balance.

Structural balance: a triad is balanced if all three users are friends or only one pair of them are friends. Two users are assumed to be friends if they call each other at least once.

Relationship balance: the balance rate is the percentage of triangles with an even number of negative ties. A tie is assumed to be negative based on #calls or the average duration between the two users.

< 20%, not balanced
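Both balance rates can be computed by enumerating user triples. A brute-force sketch (cubic in the number of users, so suitable only for small samples), assuming symmetric adjacency sets for the friendship graph and a caller-supplied set of negative ties, for example ties whose #calls or average duration falls below some cutoff. The slide does not specify which triples enter the structural rate; as an assumption, only triples with at least one friend pair are counted here:

```python
from itertools import combinations
from typing import Dict, FrozenSet, Optional, Set


def structural_balance_rate(adj: Dict[str, Set[str]]) -> Optional[float]:
    """Share of triples that are balanced: all three pairs are friends, or exactly one pair is."""
    total = balanced = 0
    for u, v, w in combinations(sorted(adj), 3):
        friends = (v in adj[u]) + (w in adj[u]) + (w in adj[v])
        if friends == 0:
            continue  # assumption: skip triples with no friendship at all
        total += 1
        balanced += friends in (1, 3)
    return balanced / total if total else None


def relationship_balance_rate(adj: Dict[str, Set[str]], negative: Set[FrozenSet[str]]) -> Optional[float]:
    """Share of closed triangles with an even number of negative ties."""
    total = balanced = 0
    for u, v, w in combinations(sorted(adj), 3):
        if v in adj[u] and w in adj[u] and w in adj[v]:  # closed triangle
            negs = sum(frozenset(e) in negative for e in ((u, v), (v, w), (u, w)))
            total += 1
            balanced += negs % 2 == 0
    return balanced / total if total else None
```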


Duration Prediction


Prediction Scenario

[Figure: a call network among users v1-v5 at Time 1, Time 2 and Time 3; each edge is labeled with the duration of a call.]

Attribute factors: v1: female, 29y; v2: male, 31y; v3: male, 60y; v4: female, 63y; v5: female, 27y.

Social factors: opinion leader: v5; strong tie: v4, v5; weak tie: v1, v3; homophily: v3, v5; social balance: v3, v4, v5.

Temporal factors: v5 calls v3 on Mon. 10:00 PM.

Can we predict how long this call will last?


Social Time-dependent Factor Graph (STFG)

PFG: partially labeled factor graph[1]

TRFG: social triad based factor graph[2]

STFG: partially labeled + social triad + time dependent

1. W. Tang, H. Zhuang and J. Tang. Learning to infer social ties in large networks. In ECML/PKDD’11.

2. J. Hopcroft, T. Lou and J. Tang. Who will follow you back? Reciprocal relationship prediction. In CIKM’11.


Social Time-dependent FG

Joint distribution over attribute, social, and temporal factors.

Attribute factor, social factor, temporal factor: exponential-linear functions are used to initialize the factors.
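The factor formulas on this slide did not survive extraction. As a hedged illustration, not the paper's exact equations, factor-graph models of this family typically write the joint distribution as a normalized product of exponential-linear factors. The weights α, β, γ, the feature functions φ_k, ψ_k, τ_k, and the triad cliques Y_c below are assumed notation:

```latex
\begin{aligned}
P(Y \mid X, G) &= \frac{1}{Z} \prod_{i} f(x_i, y_i) \prod_{c} g(Y_c) \prod_{i} h\big(y_i^{t}, y_i^{t-1}\big), \\
f(x_i, y_i) &= \exp\Big\{ \textstyle\sum_{k} \alpha_k \, \phi_k(x_{ik}, y_i) \Big\}, \\
g(Y_c) &= \exp\Big\{ \textstyle\sum_{k} \beta_k \, \psi_k(Y_c) \Big\}, \\
h\big(y_i^{t}, y_i^{t-1}\big) &= \exp\Big\{ \textstyle\sum_{k} \gamma_k \, \tau_k\big(y_i^{t}, y_i^{t-1}\big) \Big\}.
\end{aligned}
```

In this sketch, y_i is the long/short label of node i, x_ik its k-th attribute, Y_c a social triad, y_i^{t-1} the label of the same node at the previous time step, and Z the normalization constant.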


Social Time-dependent FG

STFG objective function.

Learning the parameters.


Learning Algorithm

Gradient descent method.

Loopy belief propagation is used to compute the expectations.

1. J. Hopcroft, T. Lou and J. Tang. Who will follow you back? Reciprocal relationship prediction. In CIKM’11.
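The learning step follows the usual log-linear recipe: the gradient of the log-likelihood with respect to each weight is the observed feature value minus its expectation under the model. The sketch below illustrates this on a hypothetical three-node toy model with one attribute weight and one social weight. To stay short, it computes the expectation by exact enumeration of all labelings instead of loopy belief propagation, which is only feasible for tiny graphs. The features, data, and learning rate are all assumptions.

```python
import itertools
import math
from typing import List, Sequence, Tuple

# Hypothetical toy instance: 3 nodes (e.g. user pairs), binary labels
# y_i in {0, 1} (0 = short call, 1 = long call).
X = [1.0, -0.5, 0.8]          # one hypothetical attribute value per node
EDGES = [(0, 1), (1, 2)]       # hypothetical social-factor edges
Y_OBS = (1, 0, 1)              # observed labels


def features(y: Sequence[int]) -> List[float]:
    """phi(X, y) = [attribute term, social agreement term]."""
    attr = sum(x * (2 * yi - 1) for x, yi in zip(X, y))
    social = sum(1.0 for i, j in EDGES if y[i] == y[j])
    return [attr, social]


def log_likelihood_and_grad(theta: Sequence[float]) -> Tuple[float, List[float]]:
    """Exact log P(Y_OBS) and its gradient: observed minus expected features."""
    configs = list(itertools.product([0, 1], repeat=len(X)))
    feats = [features(y) for y in configs]
    scores = [sum(t * f for t, f in zip(theta, fv)) for fv in feats]
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))  # log-sum-exp
    probs = [math.exp(s - log_z) for s in scores]
    expected = [sum(p * fv[k] for p, fv in zip(probs, feats)) for k in range(len(theta))]
    observed = features(Y_OBS)
    ll = sum(t * f for t, f in zip(theta, observed)) - log_z
    return ll, [o - e for o, e in zip(observed, expected)]


def train(lr: float = 0.1, steps: int = 200) -> List[float]:
    """Gradient descent on the negative log-likelihood (i.e. ascent on ll)."""
    theta = [0.0, 0.0]
    for _ in range(steps):
        _, grad = log_likelihood_and_grad(theta)
        theta = [t + lr * g for t, g in zip(theta, grad)]
    return theta
```

In a full-size model, the enumeration inside `log_likelihood_and_grad` is the part that loopy belief propagation would replace with approximate marginals.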


Experimental Setup

Prediction: Case 1: predict the duration of the next call in the future; Case 2: predict the average duration of calls in a future period.

Data: the first 7 weeks of CDR data serve as historical data. Case 1: the duration of the first call in the 8th week is the next-call target; Case 2: the average duration in the 8th week is the average-duration target.

Binary prediction: 60% of calls are shorter than 60 seconds and the remaining 40% are longer. The telephone bill jumps when a call reaches 1 minute. This work therefore sets the threshold to 60 seconds to classify calls as long or short.
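The two prediction targets and the 60-second threshold can be turned into labels with a few lines. A minimal sketch, assuming hypothetical CDR tuples (pair, week, timestamp, duration_seconds) with weeks numbered from 1, and treating a call of exactly 60 s as long (the slide does not say which side 60 s falls on):

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple

THRESHOLD_S = 60  # long call (label 1) if duration >= 60 s, else short (label 0)


def label(duration_s: float) -> int:
    return 1 if duration_s >= THRESHOLD_S else 0


def make_targets(
    calls: Iterable[Tuple[Hashable, int, float, float]], history_weeks: int = 7
) -> Tuple[Dict[Hashable, List[float]], Dict[Hashable, int], Dict[Hashable, int]]:
    """Split CDRs into history (weeks 1..7) and week-8 targets per pair.

    Returns (history durations, Case 1 labels, Case 2 labels)."""
    history: Dict[Hashable, List[float]] = defaultdict(list)
    target: Dict[Hashable, List[Tuple[float, float]]] = defaultdict(list)
    for pair, week, ts, duration in calls:
        if week <= history_weeks:
            history[pair].append(duration)
        elif week == history_weeks + 1:
            target[pair].append((ts, duration))
    case1: Dict[Hashable, int] = {}
    case2: Dict[Hashable, int] = {}
    for pair, items in target.items():
        items.sort()  # chronological order by timestamp
        case1[pair] = label(items[0][1])                            # first call in week 8
        case2[pair] = label(sum(d for _, d in items) / len(items))  # average in week 8
    return dict(history), case1, case2
```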


Experimental Setup (Cont.)

Baseline predictors:

SVM: support vector machine (SVM-light).

LRC: logistic regression (Weka).

Bnet: Bayes network.

CRF: conditional random field.

Evaluation: Precision / Recall / F1-Measure.
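Precision, recall, and F1 for the binary long/short task follow directly from the confusion counts. A minimal sketch, assuming label 1 (long call) is the positive class; the slide does not state which class the reported scores treat as positive:

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1
) -> Tuple[float, float, float]:
    """Precision, recall and F1 of `positive` predictions; 0.0 when undefined."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fp = sum(t != positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```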


Results

Case 1: Next Call Duration Prediction

Case 2: Average Call Duration Prediction


Factor Contribution

G: gender; A: age

B: social balance; T: social tie

H: homophily; O: opinion leader

W: week; D: day


STFG Convergence

Our learning algorithm is able to reach convergence quickly.


Conclusion & Future Work

Conclusions: social theories and dynamic distributions are clearly present in the duration network; our proposed model can significantly improve prediction accuracy.

Interesting observations: young females tend to make long calls, particularly in the evening; familiar people (more calls and more common neighbors) make shorter calls.

Future work: inferring call duration with a regression model; building duration prediction into a mobile application.


Thanks

Data&Code: http://arnetminer.org/mobile-duration

Yuxiao Dong, Jie Tang, Tiancheng Lou, Bin Wu, Nitesh V. Chawla. How Long will She Call Me? Distribution, Social Theory and Duration Prediction. In ECML/PKDD’13.