(collaboration with @RenaudLambiotte) @CeydaSanli CompleXity Networks Weekly big-data seminars, UCL, Louvain-la-Neuve, October 1, 2015. UNamur PLoS ONE 10(7): e0131704, 2015. Frontiers in Physics, 3(79), 2015. Social spike trains in twitter: hashtag diffusion and user communication [email protected] fcxn.wordpress.com http://www.slideshare.net/ceydasanli

C sanlitalk social_spiketrains_uclouvain2015


  • 5 mm

    Inset: The displacement field demonstrates local heterogeneities in the flow.

    A typical snapshot of an experiment: The white spots indicate the positions of the beads floating on surface waves.

    [email protected]

    http://fcxn.wordpress.com

    Month 6 General Meeting

    http://xn.unamur.be

    Fluctuations drive viral memes in online social media: Integrating criticality into network science

    Ceyda Sanli, Vsevolod Salnikov, Lionel Tabourier, and Renaud Lambiotte

    CompleXity Networks, naXys, University of Namur, Belgium.

    To spread our posts throughout an online social network such as Twitter: When do we need to post? How often should a #hashtag be posted? These questions emphasize the temporal features of our tweeting activity, which are much easier to control than what we post or how many followers we have.

    To be mobile in dense granular media such as highly packed beads on surface waves: Do single beads move independently or form a group? Is the trajectory of each bead regular in time? Quantifying the bead dynamics shows that the beads perform heterogeneous motion, with a distinct time scale that characterizes this heterogeneity.

    restricted amount of attention restricted amount of space

    A restricted amount of resources forces social and physical systems to exhibit emergent order.

    hypothesis

    Twitter users want to spread their messages, and beads under driving want to be mobile. As a result, Twitter users collectively advertise and the beads form groups to move together. Both systems self-organize and create dynamic heterogeneity.

    The fluctuations in the dynamics of both systems would have the same origin:

    Therefore, interpreting the dynamic heterogeneity of the beads in a critical limit would help to characterize viral memes (#hashtags) on Twitter.

    Refs: 1 C. Sanli et al. (arXiv - 2013). 2 L. Berthier (2011).

    Refs: 1 H. Simon (1971). 2 L. Weng et al. (2012). 3 J. P. Gleeson et al. (2014).

    [Figure panels, Twitter side: daily tweet cycles (number of tweets per unit time versus time in hours); propagation of #hashtags; Twitter #hashtag analysis of #nice and #pepsi (observation, artificial representation, and analysis: cumulative versus time in hours).]

    [Figure panels, granular side: mobility of beads (single beads and groups); spatiotemporal granular flow; quantifying dynamic heterogeneity versus time in s, with legend values from 0.652 to 0.771 [Ref: 2].]



  • The Upshot, Power of Fiction: Why Rumors Outrace the Truth Online, by Brendan Nyhan (@BrendanNyhan), Sept. 29, 2014. Photo credit: Tomi Um.

    Information diffusion in twitter

    C. Sanli, CompleXity Networks, UNamur

    tweets, retweets (RT), mentions (@), replies (RE)

    Tomi Um

    Hashtag Diffusion in Twitter

    hashtag

    hashtag spike train (count versus time)

    Social dynamic behaviour patterns 1

    Part-1:

    Part-2: RT + @ RE

    WHO, WHOM

    aU: activity of users (RT, @, RE)

    RT + @ RE

    pU: popularity of users (RT, @, RE)

    Social spike trains in twitter 1

  • Part-1: hashtag spike trains (count versus time)

  • Key result: local variation

    FIG. 7. Probability density function (PDF) of the local variation LV of real hashtag propagation (a) and random hashtag time sequence (b). Two distinct shapes are visible: (a) From high p to low p, the peak position of P(LV) shifts from low values of LV to higher values of LV. (b) P(LV) always peaks around 1 for the random sequences generated by artificial hashtag spike trains. The same color coding is applied as used in Fig. 6.

    [Panels: (a) Real activity, (b) Random activity; axes: P(LV) versus LV; legend, average popularity ⟨p⟩ per class: 91127, 18553, 1678, 318, 174, 117, 86, 68, 56, 47, 41, 35.]

    FIG. 8. Linear correlation of LV through real hashtag spike trains. (a) The linear relation between the first and the second halves of the empirical spike trains, LV(t1) and LV(t2), respectively, is investigated. The legend ranks ⟨p⟩ in different colors and symbols. (b) The Pearson correlation coefficient r(LV(t1), LV(t2)) between these quantities shows that while the temporal correlation through moderately popular hashtags is maximum, r reaches its minimum values for both bursty (high LV and low p) and regular (low LV and high p) spike trains.

    [Panels: (a) LV(t2) versus LV(t1); (b) r(LV(t1), LV(t2)), with the two ends labeled bursty and regular; legend ⟨p⟩: 91127, 18553, 1678, 318, 174, 117, 86, 68, 56, 47.]

    [8] ... '14 (ACM, New York, NY, USA, 2014) pp. 913–924.

    [9] U. França, H. Sayama, C. McSwiggen, R. Daneshvar, and Y. Bar-Yam, ArXiv e-prints (2014), arXiv:1411.0722 [physics.soc-ph].

    [10] A. Mollgaard and J. Mathiesen, ArXiv e-prints (2015), arXiv:1502.03224 [physics.soc-ph].

    [11] J. Ratkiewicz, S. Fortunato, A. Flammini, F. Menczer, and A. Vespignani, Phys. Rev. Lett. 105, 158701 (2010).

    [12] J. Borge-Holthoefer, A. Rivero, I. García, E. Cauhé, A. Ferrer, D. Ferrer, D. Francos, D. Iñiguez, M. P. Pérez, G. Ruiz, F. Sanz, F. Serrano, C. Viñas, A. Tarancón, and Y. Moreno, PLoS ONE 6, e23883 (2011).

    [13] S. González-Bailón, J. Borge-Holthoefer, A. Rivero, and Y. Moreno, Sci. Rep. 1, 197 (2011).

    [14] K. Sasahara, Y. Hirata, M. Toyoda, M. Kitsuregawa, and K. Aihara, PLoS ONE 8, e61823 (2013).

    [15] D. Y. Kenett, F. Morstatter, H. E. Stanley, and H. Liu, PLoS ONE 9, e102001 (2014).

    [16] F. Deschâtres and D. Sornette, Phys. Rev. E 72, 016112 (2005).

    [17] L. Weng, F. Menczer, and Y.-Y. Ahn, Sci. Rep. 3, 2522 (2013).

    [18] J. Cheng, L. Adamic, P. A. Dow, J. M. Kleinberg, and J. Leskovec, in Proceedings of the 23rd International Conference on World Wide Web, WWW '14 (ACM, New York, NY, USA, 2014) pp. 925–936.

    [19] L. Weng, A. Flammini, A. Vespignani, and F. Menczer, Sci. Rep. 2, 335 (2012).

    [20] J. P. Gleeson, J. A. Ward, K. P. O'Sullivan, and W. T. Lee, Phys. Rev. Lett. 112, 048701 (2014).

    [21] U. Cetin and H. O. Bingol, Phys. Rev. E 90, 032801 (2014).

    [22] J. P. Gleeson, K. P. O'Sullivan, R. A. Baños, and Y. Moreno, ArXiv e-prints (2015), arXiv:1501.05956 [physics.soc-ph].

    [23] S. Shinomoto, K. Shima, and J. Tanji, Neural Comput. 15, 2823 (2003).

    [24] S. Koyama and S. Shinomoto, Journal of Physics A: Mathematical and General 38, L531 (2005).

    [25] K. Miura, M. Okada, and S.-i. Amari, Neural Comput. 18, 2359 (2006).


    hashtag dynamics artificial dynamics

    C. Sanli and R. Lambiotte, PLoS ONE 10(7): e0131704 (2015).

    p: popularity

  • What do we address in Part-1?

    How can we measure the local temporal behavior of hashtag diffusion?

    Is there a difference in dynamics between popular and less-used hashtags?

    Is there a difference in dynamics between real hashtags and artificially generated ones?

  • Social spike trains

    [Figure: a hashtag spike train, count versus time, from t0 to tf.]

    I. LOCAL VARIATION OF A TIME SERIES

    Time series in social systems include interactions among agents. Considering an online social network such as Twitter, we address the self-organized optimization of the popularity of information. To this end, we create time series of #hashtag propagation, user activity, and user #hashtag activity.

    In a time series, if the time delay between successive events, the inter-event interval $\Delta\tau$, results from independent events, the distribution of the inter-event interval is Poissonian (the intervals are exponentially distributed). If not, many bursty events are observed and therefore the forward propagation of a signal is a function of its temporal history. Thus, quantifying $\Delta\tau$ is crucial.

    The local variation $L_v$ is an alternative way to characterize whether a time series is Poissonian or non-Poissonian. For a stationary process, $L_v$ is built from the ratio of the difference between the inter-event interval of the forward event and the inter-event interval of the backward event to the sum of these inter-event intervals. Suppose that a signal propagates at distinct times $\tau_1, \ldots, \tau_{i-1}, \tau_i, \tau_{i+1}, \ldots, \tau_N$. Then, at $\tau_i$, the inter-event interval of the forward event is $\Delta\tau_{i+1} = \tau_{i+1} - \tau_i$ and the inter-event interval of the backward event is $\Delta\tau_i = \tau_i - \tau_{i-1}$. Consequently, $L_v$ is

    $$L_v = \frac{3}{N-2} \sum_{i=2}^{N-1} \left[ \frac{(\tau_{i+1}-\tau_i) - (\tau_i-\tau_{i-1})}{(\tau_{i+1}-\tau_i) + (\tau_i-\tau_{i-1})} \right]^2 = \frac{3}{N-2} \sum_{i=2}^{N-1} \left[ \frac{\Delta\tau_{i+1} - \Delta\tau_i}{\Delta\tau_{i+1} + \Delta\tau_i} \right]^2. \quad (1)$$

    Here, $N$ is the total number of distinct times at which the time series appears. Multiple activity at the same $\tau_i$ is ignored.

    If $L_v = 1$, the distribution of the inter-event interval of a time series is Poissonian. If a time series contains a significant amount of bursty activity, the distribution is non-Poissonian. When $N < 3$, the distribution is automatically assumed to be Poissonian.
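    Eq. (1) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `local_variation` and the helper `poisson_times` are hypothetical, events sharing a time stamp are counted once, and fewer than three distinct times return 1, following the convention stated above.

```python
import random


def local_variation(times):
    """Local variation of a list of event times, following Eq. (1).

    Events sharing a time stamp are counted once, and fewer than three
    distinct times return 1.0, the convention stated in the text.
    """
    ts = sorted(set(times))
    n = len(ts)
    if n < 3:
        return 1.0
    total = 0.0
    for i in range(1, n - 1):
        forward = ts[i + 1] - ts[i]    # forward inter-event interval
        backward = ts[i] - ts[i - 1]   # backward inter-event interval
        total += ((forward - backward) / (forward + backward)) ** 2
    return 3.0 * total / (n - 2)


def poisson_times(n, rate=1.0, seed=0):
    """Hypothetical helper: n event times separated by exponential gaps."""
    rng = random.Random(seed)
    t, out = 0.0, []
    for _ in range(n):
        t += rng.expovariate(rate)
        out.append(t)
    return out


print(local_variation([0, 1, 2, 3, 4]))       # evenly spaced events
print(local_variation(poisson_times(20000)))  # independent events
```

    Evenly spaced events give 0, since every forward interval equals the backward one, while the simulated independent events give a value close to 1.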

    II. LIMITS OF $L_v$

    The rank of a time series can be defined as the number of distinct times $\tau_i$ at which activity occurred. If events occur at many different $\tau_i$, with $N$ well above 3, the signal has high rank. If a time series is too short, with $N$ close to 3, the signal has low rank.

    [Figure: panels (a) Real activity and (b) Random activity; axis labels as extracted: rh (logarithmic) and Fv; legend: <rh> = 11, <rh> = 2; reference line FV = 1.]

    FIG. 7. Local variation Lv of single #hashtag time series versus low rank rh of the corresponding #hashtag. (a) Real #hashtag propagation. (b) Randomly selected #hashtag activity from the real data set.

    [Figure: Lv versus lifetime (hour); panel (a) Real activity; legend <rh>: 91127, 18553, 1678, 318, 174, 117, 86, 68, 56, 47, 41, 35.]

    FIG. 8. Local variation Lv of single #hashtag time series versus lifetime (in hours) of the corresponding #hashtag. (a) Real #hashtag propagation. (b) Randomly selected #hashtag activity from the real data set.

    I. DAILY CYCLE OF #HASHTAGS

    [Figure: PDF (normalized probability density) of #hashtags versus time of day, 00:00 to 24:00; curves: total, rush hour, dead hour.]

    FIG. 1. Daily activity of #hashtags: Normalized probability density (PDF) of the activity versus time of day.

    II. HETEROGENEITY IN POPULARITY AND LIFE TIME OF #HASHTAGS

    [Figure: rank rh versus lifetime (hour) on log-log axes.]

    FIG. 2. Rank rh of a #hashtag versus the lifetime of the #hashtag.


    I. DAILY CYCLE OF #HASHTAGS

    [Figure: count per minute versus hour over consecutive days for #ledebat, #hollande, #sarkozy, #votehollande, #fh2012, and #france2012.]

    FIG. 1. ...

    ⟨p⟩: average of p, where p = popularity

  • Circadian pattern and local signal


  • Driving factors in our twitter data


    1. circadian human behavior (internal)

    2. political election (external)

    + complex decision-making (both internal and external)


  • Data in twitter


    {"hash":["netsci2015"],"source":"stream","user_alias":"Hiroki Sayama","corpus":["en"],"text":"Jean-Charles Delvenne: Burstiness and fat tail in temporal networks collapse their smallest eigenvalues. @netsci15 #netsci2015","_id":"010101010","date":1433419200,"at":[{"type":"","alias":"netsci15"}],"user_id":"101010101"}

    .JSON
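    A record of this form can be turned into hashtag spike trains along the following lines. This is a sketch under assumptions: one JSON object per line, the "hash" field listing the hashtags of a tweet, and "date" being a Unix time stamp in seconds. The function name `spike_trains`, the lower-casing of hashtags, and the shortened `SAMPLE` record are illustrative choices, not part of the data description.

```python
import json
from collections import defaultdict

# Shortened, hypothetical record with the same field names as the slide.
SAMPLE = (
    '{"hash": ["netsci2015"], "user_alias": "Hiroki Sayama", '
    '"text": "... @netsci15 #netsci2015", "date": 1433419200}'
)


def spike_trains(json_lines):
    """Map each hashtag to the sorted list of distinct times it was seen."""
    trains = defaultdict(set)
    for line in json_lines:
        record = json.loads(line)
        for tag in record.get("hash", []):
            # A set keeps one spike per time stamp for a given hashtag.
            trains[tag.lower()].add(record["date"])
    return {tag: sorted(times) for tag, times in trains.items()}


print(spike_trains([SAMPLE]))
```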


    Jean-Charles Delvenne: Burstiness and fat tail in temporal networks collapse their smallest eigenvalues. @netsci15 #netsci2015

    Hiroki Sayama: @HirokiSayama

  • Data set of Part-1

    9 days of the French election 2012 (May 5th):

    total activity ~ 10 million,

    hashtag activity ~ 3 million,

    unique hashtags ~ 300,000,

    number of total users ~ 475,000,

    number of users who tweeted or retweeted any hashtag at least once ~ 230,000.

    Most used hashtags (number of occurrences): #ledebat 180946, #hollande 143636, #sarkozy 116906, #votehollande 99908, #fh2012 67759, #france2012 20635.

  • Top most used hashtags

    I. DAILY CYCLE OF #HASHTAGS

    [Figure: count per minute versus hour over consecutive days for #ledebat, #hollande, #sarkozy, #votehollande, #fh2012, and #france2012, with the debate and the election marked.]

    FIG. 1. ...

  • Statistics of hashtags

  • Heterogeneity in popularity


    Heterogeneity in popularity of hashtags

    The success of a hashtag can be measured by its popularity p, defined as its number of occurrences, and equivalent to its frequency. Fig. 2 presents the Zipf-plot and the probability density function (PDF) of p, for the 295,697 unique hashtags observed in the data set. The Zipf-plot [Fig. 2(a)] indicates that more than half of the hashtags (about 60%) appear just once in the data set, with p = 1. Moreover, around 83% of the hashtags have p < 5, in the pink-colored region in the last (right) rectangle of Fig. 2(a). For moderate values of p, if we set a threshold of p to 1000 with an upper bound of 25000, only 0.15% of the hashtags fit in the yellow-colored rectangle. Finally, top hashtags with p > 25000, in the red-colored rectangle, are very rare (about 0.0001%), but more frequent than would be expected for values so large as compared to the median. These observations are confirmed in Fig. 2(b), where we show the probability distribution of p, P(p), in a log-log plot. P(p) is a clear example of a fat-tailed distribution associated with a strong heterogeneity in the system.
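    The quantities in this paragraph (popularity p, the Zipf ranking, and the share of hashtags below a threshold) can be computed as in the following sketch. The function names and the toy input are hypothetical; the only assumption about the data is a flat list with one entry per hashtag occurrence.

```python
from collections import Counter


def popularity(occurrences):
    """Popularity p of each hashtag: its number of occurrences."""
    return Counter(occurrences)


def zipf_ranking(pop):
    """Popularities sorted from high p (rank 1) to low p."""
    return sorted(pop.values(), reverse=True)


def share_below(pop, threshold):
    """Fraction of unique hashtags with popularity p < threshold."""
    return sum(1 for p in pop.values() if p < threshold) / len(pop)


# Toy input, not taken from the data set.
pop = popularity(["a", "a", "a", "b", "c", "c"])
print(zipf_ranking(pop))    # popularities in rank order
print(share_below(pop, 2))  # share of hashtags seen only once
```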

    The heterogeneity in p has been already observed [3, 6, 11, 17]. A mechanism proposed for its emergence is the competition between information overload and the limited capacity of each user [19–22], sometimes coupled with cooperative effects [3, 4]. It has been also shown that hashtags having unique textual features become more popular than hashtags presenting common textual features [28]. In this paper, we are not interested in the origin of the heterogeneity, but in its relation with temporal characteristics of hashtags.

    HASHTAG SPIKE TRAINS

    Temporal heterogeneity

    We will draw an analogy between hashtag dynamics and neuron spike trains. To this end, we introduce standard methods from spike train analysis into the field of hashtag dynamics. Hashtags are keywords associated to different topics, which can be created, tracked and reused by users. Their popularity and unambiguity make them an essential mechanism for information diffusion in Twitter. The statistical description of neuron spike sequences is essential for extracting underlying information about the brain [29]. It was originally believed that in vivo cortical neurons behave as time-dependent Poisson random spike generators, where successive inter-spike intervals are independently chosen from an exponential distribution with a time-dependent firing rate [30]. However, more recent observations have shown that the inter-spike interval distribution exhibits significant deviations from the exponential distribution, which has led to the construction of appropriate tools to describe neuron signals [23–27].

    [Figure 2: (a) popularity p versus hashtag rank on log-log axes, with shaded rectangles labeled 0.0001%, 0.15%, 83%, and 60%; (b) P(p).]

    FIG. 2. Heterogeneity in the hashtag popularity p is shown in (a) Zipf-plot and (b) probability density function (PDF), P(p). (a) Diversity in p (frequency) is visible in a power-law scaling in the log-log plot. We rank hashtags from high p (left) to low p (right). Different colored shaded rectangles highlight the value of p from red and orange (high p) to purple and pink (low p). The percentages describe the overall contributions of the corresponding rectangles. (b) Similarly, P(p) obeys a slowly decaying function and presents a power-law distribution with a fat tail. The same color scheme as in (a) is applied to visualize the contributions of different values of p.

    Similarly, a hashtag spike train is defined as the sequence of timings at which a hashtag is observed in Twitter. In this framework, we do not specify the type of dynamics of hashtags, endogenous or exogenous [16], i.e. endogenous, hashtag diffusion among members of the social network, or exogenous, the diffusion driven by external factors such as TV and newspapers; we are interested only in the timings. Each hashtag thus generates a unique hashtag spike train with a characteristic popularity p. As a first basic indicator, in Figs. 3(a,b) we show the inter-hashtag spike interval cumulative and probability distributions, CDF(Δτ) and P(Δτ), respectively. In order to avoid artificially deforming the distributions because of heterogeneity in p, we classify CDF(Δτ) and P(Δτ) in classes depending on p, illustrated by different colors in Fig. 2. We observe similar behavior across the classes, as P(Δτ) deviates strongly from an exponential


  • Heterogeneity in time

    [Figure 3: (a) CDF(Δτ) and (b) P(Δτ) versus Δτ (hour) on logarithmic axes; dashed lines at 12 hours, 1 day, 2 days, and 3 days; legend ⟨p⟩: 91127, 18553, 1678, 318, 174, 117, 86, 68, 56, 47, 41, 35, 11, 2.]

    FIG. 3. The cumulative (a), CDF(Δτ), and probability (b), P(Δτ), distributions of the inter-hashtag spike intervals. We observe that P(Δτ) exhibits, for different classes of hashtags distinguished by their popularity, non-exponential features. The different colors correspond to those in Fig. 2. The legend provides the average popularity ⟨p⟩ in each hashtag class. The dashed lines indicate the positions of 1 day, 2 days, and 3 days, where P(Δτ) gives peaks for low p (pink symbols). The binning is varied from 8 minutes to 2 hours depending on p, e.g. 8 min. for high p (red-orange), 1.5 hours for moderate p (yellow-green-blue-purple), and 2 hours for low p (pink). All P(Δτ) present maxima at 1 second, which is not shown, in order to describe the tails in a larger window.

    distribution (Poisson), $P(\Delta\tau) = \lambda e^{-\lambda \Delta\tau}$, where $\lambda$ is a firing rate (frequency and so p in our concept) at which hashtags appear. Instead, we observe fat-tailed distributions [1, 2, 7, 11, 31-33] as shown in Fig. 3(b) for high and moderate p. As mentioned in the introduction, this deviation may either originate from temporal correlations or non-stationary patterns, making the system different from a stationary, uncorrelated random signal.
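The comparison with an exponential distribution can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the synthetic Poisson train, and the test point of 4 time units are hypothetical and not taken from the paper. It extracts inter-event intervals from a list of time stamps and compares the empirical tail, 1 - CDF, with the exponential tail expected for a constant firing rate.

```python
import math
import random

def inter_event_intervals(spike_times):
    """Return the intervals between successive distinct spike times."""
    times = sorted(set(spike_times))          # one event per time stamp
    return [b - a for a, b in zip(times, times[1:])]

def empirical_ccdf(intervals, x):
    """Fraction of intervals strictly larger than x, i.e. 1 - CDF(x)."""
    return sum(1 for d in intervals if d > x) / len(intervals)

def exponential_ccdf(rate, x):
    """1 - CDF(x) for a Poisson process with a constant firing rate."""
    return math.exp(-rate * x)

# Synthetic stationary Poisson train (assumption: rate 0.5 per time unit).
rng = random.Random(0)
lam = 0.5
t, poisson_train = 0.0, []
for _ in range(20000):
    t += rng.expovariate(lam)
    poisson_train.append(t)

intervals = inter_event_intervals(poisson_train)
rate_hat = 1.0 / (sum(intervals) / len(intervals))   # estimated firing rate
gap = abs(empirical_ccdf(intervals, 4.0) - exponential_ccdf(rate_hat, 4.0))
```

For a genuinely Poissonian train `gap` stays small; a fat-tailed train of the kind described here would show a much larger empirical tail than the exponential one at long intervals.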

    Real and randomized data sets

    We will analyze two sets of data, which we now describe: The empirical data set, directly coming from the data, and a randomized data set, serving as a null model in our analysis.

    The real data set contains one spike train per hashtag, as illustrated in Fig. 4(a). The time resolution of the spikes is the same as that of the data set, that is 1 second. In situations when multiple spikes of the same hashtag

    [Figure 4: panels (a)-(d) show spike trains along a time axis for #hash1, #hash2, #hash3, the merged #hash train, the randperm(T, p) step, and the artificial #hash train.]

    FIG. 4. Real and artificial hashtag spike trains. (a) An illustration of different hashtag spike trains representing different types of hashtag propagation in the data set. (b) Merging hashtag spike trains from the real data. The black spikes indicate that only one activity is counted if multiple activities occur at the same time. (c) Randomization procedure by randperm (Matlab). T contains the full hashtag activity of the data set. randperm returns p unique independent numbers out of T, from which random time series $\ldots, \tau^r_{i-1}, \tau^r_i, \tau^r_{i+1}, \ldots$ are constructed from the full hashtag activity matrix T. (d) The resultant artificial hashtag spike train.

    take place at the same time only one event is considered. The statistics of such events are provided at the end of this subsection. In each spike train, the appearance time of the spikes is ordered from the earliest time to the latest time.

    The random data set is a randomized version of the real data set, where each spike train of size p generates a spike train of the same size with random times. In practice, we first combine all hashtag spike trains and obtain one merged hashtag spike train as illustrated in Fig. 4(b). This train carries the full history of all hashtags and, importantly, reproduces the nonstationary features of the original data in the presence of temporal correlations, burstiness, and the cyclic rhythm. As before, if two or more spikes are generated at the same time, only one spike is shown at that time in the merged spike train, e.g. see


    the most popular hashtags

  • Local analysis on social spike trains

    [Schematic: count versus time.]

  • Local variation


    [Figure 7: Fano factor F versus h (hour), from 0 to 350 hours, for (a) real activity and (b) random activity. Legend in both panels: $\langle r_h \rangle$ = 91127, 18553, 1678, 318, 174, 117, 86, 68, 56, 47, 41, 35.]

    FIG. 7. Fano factor F of single #hashtag time series versus life time (h) of the corresponding #hashtag. (a) Real #hashtag propagation. (b) Randomly selected #hashtag activity from the real data set.

    B. Local variation of #hashtag spike trains

    For a stationary process, $L_v$ is built from the ratio of the difference between the inter-event interval of the forward event and the inter-event interval of the backward event to the sum of these inter-event intervals. Suppose that a signal propagates at distinct times $\tau_1, \ldots, \tau_{i-1}, \tau_i, \tau_{i+1}, \ldots, \tau_N$. Then, at $\tau_i$, the inter-event interval of the forward event is $\Delta\tau_{i+1} = \tau_{i+1} - \tau_i$ and the inter-event interval of the backward event is $\Delta\tau_i = \tau_i - \tau_{i-1}$. Consequently, $L_v$ is

    $$L_v = \frac{3}{N-2} \sum_{i=2}^{N-1} \left( \frac{(\tau_{i+1}-\tau_i) - (\tau_i-\tau_{i-1})}{(\tau_{i+1}-\tau_i) + (\tau_i-\tau_{i-1})} \right)^2 = \frac{3}{N-2} \sum_{i=2}^{N-1} \left( \frac{\Delta\tau_{i+1} - \Delta\tau_i}{\Delta\tau_{i+1} + \Delta\tau_i} \right)^2. \tag{1}$$

    Here, N is the total number of appearances of a time series at distinct times. Multiple activity at the same $\tau_i$ is ignored.

    If $L_v = 1$, the distribution of the inter-event interval of a time series is Poissonian. If a time series contains a significant amount of bursty activity, the distribution is non-Poissonian.
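As a small worked example of Eq. (1), consider a hypothetical train of N = 4 events at times $\tau = 0, 1, 3, 4$ (these numbers are chosen purely for illustration and do not come from the data set). Writing $\Delta\tau_{i+1} = \tau_{i+1} - \tau_i$ for the forward interval at $\tau_i$, the intervals are 1, 2, 1 and the sum runs over the two interior events:

```latex
\begin{align*}
\Delta\tau_2 &= 1, \quad \Delta\tau_3 = 2, \quad \Delta\tau_4 = 1, \\
i = 2:&\quad \left(\frac{\Delta\tau_3 - \Delta\tau_2}{\Delta\tau_3 + \Delta\tau_2}\right)^2
        = \left(\frac{2 - 1}{2 + 1}\right)^2 = \frac{1}{9}, \\
i = 3:&\quad \left(\frac{\Delta\tau_4 - \Delta\tau_3}{\Delta\tau_4 + \Delta\tau_3}\right)^2
        = \left(\frac{1 - 2}{1 + 2}\right)^2 = \frac{1}{9}, \\
L_v &= \frac{3}{N - 2}\left(\frac{1}{9} + \frac{1}{9}\right)
     = \frac{3}{2} \cdot \frac{2}{9} = \frac{1}{3}.
\end{align*}
```

A perfectly regular train, with all intervals equal, makes every term vanish and gives $L_v = 0$.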


    K. Miura et al. Neural Computation 18, 2359-2386 (2006). S. Shinomoto et al. Neural Computation 15, 2823-2842 (2003).

    spike train of size p [Fig. 4(d)]. Generating independent, yet time-dependent events, the procedure is expected to create time-dependent Poisson random processes, $P(\Delta\tau, t) = \lambda(t) e^{-\lambda(t) \Delta\tau}$, where the firing rate $\lambda(t)$ in this case explicitly depends on the time of the day and of the week.
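The randomization step can be sketched as follows. This is a minimal Python illustration, not the authors' Matlab code: `random.sample` stands in for `randperm`, and the function names and the toy dictionary `real` are hypothetical. Each hashtag keeps its size p but receives p distinct times drawn from the merged train, so the artificial trains inherit the overall daily and weekly rhythm of the data.

```python
import random

def merge_spike_trains(trains):
    """Merge per-hashtag spike trains into one sorted train of distinct times."""
    merged = set()
    for times in trains.values():
        merged.update(times)                  # simultaneous spikes count once
    return sorted(merged)

def randomize_spike_trains(trains, seed=None):
    """For each train of size p, draw p distinct times from the merged train."""
    rng = random.Random(seed)
    merged = merge_spike_trains(trains)
    artificial = {}
    for tag, times in trains.items():
        p = len(set(times))                   # popularity: distinct spike times
        artificial[tag] = sorted(rng.sample(merged, p))
    return artificial

# Toy data (hypothetical): time stamps in seconds for three hashtags.
real = {"#a": [1, 2, 3, 10, 11], "#b": [2, 50, 51], "#c": [7]}
fake = randomize_spike_trains(real, seed=1)
```

Drawing each artificial train independently from the same merged train is an assumption of this sketch, based on the description of `randperm(T, p)` in Fig. 4(c).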

    Statistics of multiple tweets in 1 second. We detect multiple occurrences in 1 second for 6661 hashtags. Fig. 5 presents the probability distribution $P(c_h)$ of observing $c_h$ occurrences of a hashtag during one second, for different hashtag popularity classes. Even though $c_h > 1$ occurs rarely, we observe that this possibility is more probable for popular hashtags (red open circles), as expected. For the most popular hashtag, ledebat, one finds $\max(c_h) = 40$.
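Counting the occurrences per second can be sketched as below. The function names and the toy time stamps are hypothetical, and normalizing $P(c_h)$ over the seconds in which the hashtag is active is an assumption of this sketch; the paper does not spell out the normalization in this excerpt.

```python
from collections import Counter

def counts_per_second(timestamps):
    """Map each one-second time stamp to the number of occurrences c_h in it."""
    return Counter(int(t) for t in timestamps)

def count_distribution(timestamps):
    """Empirical P(c_h): fraction of active seconds showing c_h occurrences."""
    per_second = counts_per_second(timestamps)
    histogram = Counter(per_second.values())
    total = sum(histogram.values())
    return {c: n / total for c, n in sorted(histogram.items())}

# Toy time stamps in seconds (hypothetical) for a single hashtag.
stamps = [10, 10, 10, 11, 15, 15, 20]
dist = count_distribution(stamps)
max_count = max(counts_per_second(stamps).values())
```

Keeping only the keys of `counts_per_second` yields the deduplicated spike train used in the rest of the analysis.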

    Figure 5. The probability distribution of count of hashtag activity per second $P(c_h)$. We show that, except for the top most popular hashtags listed in Table 1 with ranking 1-11 and presented here in red symbols, multiple activity in 1 second is very rare. The different colors correspond to those in Figs. 2 and 3. The legend provides the average popularity $\langle p \rangle$ in each hashtag class.

    Local variation

    The time series of spike trains are inherently nonstationary, as shown in Fig. 1. For this reason, metrics defined for stationary processes are inadequate and might lead to incorrect conclusions. For instance, the non-exponential shapes of the inter-event time distribution $P(\Delta\tau)$ in Fig. 3 might originate either from correlated and collective dynamics, or from the nonstationarity of the hashtag propagation. Similarly, statistical indicators based on this distribution, such as its variance or Fano factor, might be affected in a similar way. For this reason, we consider here the so-called local variation $L_V$, originally defined to determine intrinsic temporal dynamics of neuron spike trains [23-27].

    Unlike quantities such as $P(\Delta\tau)$, $L_V$ compares temporal variations with their local rates and is specifically defined for nonstationary processes [27]

    $$L_V = \frac{3}{N-2} \sum_{i=2}^{N-1} \left( \frac{(\tau_{i+1}-\tau_i) - (\tau_i-\tau_{i-1})}{(\tau_{i+1}-\tau_i) + (\tau_i-\tau_{i-1})} \right)^2 \tag{1}$$

    Here, N is the total number of spikes and $\ldots, \tau_{i-1}, \tau_i, \tau_{i+1}, \ldots$ represents the successive time sequence of a single hashtag spike train. Eq. 1 also takes the form [27]

    $$L_V = \frac{3}{N-2} \sum_{i=2}^{N-1} \left( \frac{\Delta\tau_{i+1} - \Delta\tau_i}{\Delta\tau_{i+1} + \Delta\tau_i} \right)^2 \tag{2}$$

    where $\Delta\tau_{i+1} = \tau_{i+1} - \tau_i$ and $\Delta\tau_i = \tau_i - \tau_{i-1}$. $\Delta\tau_{i+1}$ quantifies the forward delay and $\Delta\tau_i$ represents the backward waiting time for an event at $\tau_i$. Importantly, the denominator normalizes the quantity such as to account for local variations of the rate at which events take place. By definition, $L_V$ takes values in the interval [0:3].
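The definition translates directly into code. The following Python function is a minimal sketch of Eq. 1 under the conventions stated in the text (spikes sharing a time stamp are counted once); the function name and the two toy trains are hypothetical.

```python
def local_variation(spike_times):
    """Local variation L_V of one spike train (Eq. 1), from spike times."""
    tau = sorted(set(spike_times))            # keep one spike per time stamp
    n = len(tau)
    if n < 3:
        raise ValueError("L_V needs at least three distinct spike times")
    total = 0.0
    for i in range(1, n - 1):                 # interior spikes, i = 2 .. N-1
        forward = tau[i + 1] - tau[i]         # forward delay
        backward = tau[i] - tau[i - 1]        # backward waiting time
        total += ((forward - backward) / (forward + backward)) ** 2
    return 3.0 * total / (n - 2)

regular = local_variation([0, 5, 10, 15, 20])      # evenly spaced spikes
alternating = local_variation([0, 1, 10, 11, 20])  # gaps 1, 9, 1, 9
```

Evenly spaced spikes give 0, while the alternating short and long gaps give 1.92, closer to the upper bound of 3 because each squared ratio can never exceed 1.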

    The local variation $L_V$ presents properties making it an interesting candidate for the analysis of hashtag spike trains [23-27]. In particular, $L_V$ is on average equal to 1 when the random process is either a stationary or a non-stationary Poisson process [23], provided only that the firing rate $\lambda(t)$ fluctuates on a time scale longer than the typical time between spikes. Deviations from 1 originate from local correlations in the underlying signal, either under the form of pairwise correlations between successive inter-event time intervals, e.g. $\Delta\tau_{i+1}$ and $\Delta\tau_i$, which tend to decrease $L_V$, or because the inter-event time distribution is non-exponential. An


    I. INTRODUCTION

    #Hashtags are keywords identified in messages of the Twitter social network. While the majority of #hashtags attract no attention, a very few of them propagate heavily. The popularity of a #hashtag can be defined as the number of times the corresponding #hashtag is used in a certain time interval. This heterogeneity in popularity has been explained by competition-induced models: users overloaded by exposure to an extensive number of #hashtags can devote only a restricted amount of attention to them. It has been claimed that competition between the massive amount of information in the network and the limited size of the memory of each user is the reason behind the observed heterogeneity in popularity. In this study, we further claim that not only competition-induced behavior but also timing-induced behavior plays a key role in the heterogeneity. The fraction of dispersed #hashtags that first appear in rush hours such as 6 p.m. is larger than the fraction that first appear in dead hours such as 4 a.m. Furthermore, the behavior of the inter-event interval, the delay between successive appearances of a #hashtag, differs significantly between popular #hashtags and quickly dispersed #hashtags.

    II. DATA SET

    total activity = 9,747,351
    unique activity = 763,262 s ≈ 8.83 days
    total #hashtags activity = 2,942,239
    unique #hashtags activity = 667,996 s ≈ 7.73 days
    total number of users = 473,243
    total number of users who tweet/retweet any #hashtag at least once = 228,525
    time interval: 9 days during the French election 2012
    all activity considers only the French time zone and tweets in any language

    Sanli et al. Temporal Pattern of Online Communication Spike Trains

    spike train $\ldots, \tau_{i-1}, \tau_i, \tau_{i+1}, \ldots$, and so compares temporal variations with their local rates [41]

    $$L_V = \frac{3}{N-2} \sum_{i=2}^{N-1} \left( \frac{(\tau_{i+1}-\tau_i) - (\tau_i-\tau_{i-1})}{(\tau_{i+1}-\tau_i) + (\tau_i-\tau_{i-1})} \right)^2 \tag{1}$$

    where N is the total number of spikes. Eq. 1 also takes the form [41]

    $$L_V = \frac{3}{N-2} \sum_{i=2}^{N-1} \left( \frac{\Delta\tau_{i+1} - \Delta\tau_i}{\Delta\tau_{i+1} + \Delta\tau_i} \right)^2 \tag{2}$$

    Here, $\Delta\tau_{i+1} = \tau_{i+1} - \tau_i$ quantifies the forward delays and $\Delta\tau_i = \tau_i - \tau_{i-1}$ represents the backward waiting times for an event at $\tau_i$. Importantly, the denominator normalizes the quantity such as to account for local variations of the rate at which events take place. By definition, $L_V$ takes values in the interval (0:3) [43]. It has been shown that $L_V$ classifies the salient dynamic patterns successfully [39, 40, 42, 43, 44]. Following the analysis of Gamma processes [39, 40, 43], conventionally applied to model inter-event intervals, and the neuron spike analysis [42], while $L_V = 1$ for temporally uncorrelated (Poisson random) irregular spike trains, $L_V \approx 3$ indicates that bursts dominate the spike trains and the presence of highly regular patterns in the trains gives $L_V \approx 0$.
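The three regimes can be illustrated with synthetic Gamma processes. For independent Gamma-distributed intervals with shape parameter $\kappa$, the ratio $\Delta\tau_i / (\Delta\tau_i + \Delta\tau_{i+1})$ follows a Beta($\kappa$, $\kappa$) distribution, which gives an expected value $L_V = 3/(2\kappa + 1)$. The Python sketch below checks this numerically; the function names, sample size, and shape values are assumptions made for illustration only.

```python
import random

def local_variation_from_intervals(intervals):
    """L_V computed from successive inter-event intervals (Eq. 2)."""
    pairs = zip(intervals, intervals[1:])
    terms = [((b - a) / (b + a)) ** 2 for a, b in pairs]
    return 3.0 * sum(terms) / len(terms)      # len(terms) equals N - 2

def gamma_intervals(shape, n, rng):
    """n independent Gamma intervals with unit mean and the given shape."""
    return [rng.gammavariate(shape, 1.0 / shape) for _ in range(n)]

rng = random.Random(2015)
results = {}
for shape in (0.5, 1.0, 10.0):                # bursty, Poisson, regular
    measured = local_variation_from_intervals(gamma_intervals(shape, 50000, rng))
    expected = 3.0 / (2.0 * shape + 1.0)
    results[shape] = (measured, expected)
```

A shape of 1 reproduces the Poisson value of 1, shapes below 1 push $L_V$ towards 3 (bursts), and large shapes push it towards 0 (regular trains).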

    We now apply the $L_V$ analysis to the user communication spike trains. Eq. 2 is evaluated on the spike trains after removing multiple spikes taking place within one second. Such events are rare and their impact on the value of $L_V$ has been shown to be limited [43]. Fig. 3 describes the distribution of $L_V$, $P(L_V)$, of the full spike trains, with RT, @, and RE all together, for the who (a, b) and the whom (c, d). Grouping $L_V$ based on the frequency $f_U$, e.g. the activity of the who $a_U$ and the popularity of the whom $p_U$, we examine the temporal patterns of the trains in different classes of $a_U$ and $p_U$. For the real data in (a, c), in Fig. 3(a), $L_V$ is always larger than 1 for any value of $a_U$, suggesting that all users playing the role of the who contact the whom in bursty communications. However, in Fig. 3(c), we observe distinct behavior of the whom users, and bursts are present only for low $p_U$. With increasing $p_U$, $L_V \approx 1$, indicating that there is no temporal correlation among the who referring to the whom, and $L_V$ is slightly smaller than 1 for the most popular users, indicating a tendency towards regularity in the time series, as also observed for the hashtag spike trains [43]. These observations are significantly different for artificial spike trains, which are constructed by randomly permuting the real full spike train and are therefore expected to generate non-stationary Poisson processes. Accordingly, all distributions are centered around 1 in this case, independently of $a_U$ and $p_U$, as shown in Figs. 3(b, d). The randomization and the construction of the null set follow the same procedure explained in detail in Ref. [43].

    Figure 3 is here

    Even though Fig. 3 represents $P(L_V)$ of the full spike trains, i.e. all interactions together, $P(L_V)$ of the individual RT, @, and RE communication spike trains describes very similar temporal behavior for both the who and the whom. Fig. 4 summarizes the detail of $P(L_V)$ comparatively: the mean of $L_V$, $\mu(L_V)$, with the corresponding standard deviations $\sigma(L_V)$ as error bars. The results highlight that neither the position of the users, whether active or passive, nor the type of interaction plays a major role in classifying the communication temporal patterns, but rather the frequency of the communication $f_U$, such as $a_U$ and $p_U$. In all of Figs. 4(a-d), we observe three regions: bursts at low $f_U$, $\log_{10} f_U < 2.5$; irregular uncorrelated (Poisson random) dynamics at moderate and high $f_U$, $\log_{10} f_U \approx 2.5$-$3$; and regular patterns at very high $f_U$, $\log_{10} f_U > 3$. This conclusion supports the importance of frequency so time parameter overall human


  • Here, we have derived a new metric, LvR, by enhancing the invariance to firing rate fluctuations, such that signaling characteristics that are specific to individual neurons can be detected with greater sensitivity. We analyzed differences in intrinsic firing characteristics among the cortical areas and found a systematic gradient of firing regularity that closely corresponded with the functional category of the cortical area; neuronal firing is relatively regular in primary and higher-order motor areas, random in visual areas, and bursty in the prefrontal area. Thus, intrinsic dynamics are present in cortical areas that may be relevant to function-specific cortical computations.

    Materials and Methods

    Spike Data AnalysisNeuronal data for 15 cortical areas were collected from awake,

    behaving monkeys in eight laboratories. Four of the 15 areas werestudied in two laboratories, thus 19 data sets were generated intotal. Single electrodes or tetrodes were used to record neuronalspikes during various task trials and inter-trial intervals. Allprocedures for animal care and experimentation were inaccordance with the guidelines of the National Institutes of Healthand approved by the animal experiment committee at therespective institution where the experiments were performed.

    The initial 2,000 ISIs of the recorded spike train for eachneuron were analyzed, which contained task trial periods andinter-trial intervals, between which the firing rate differs greatly.Spike trains that contained fewer than 2,000 ISIs, or those withmean firing rates less than 5 spikes/s, were ignored; 1,307 neuronswere accepted. An irregularity metric was computed for the entire2,000 ISIs to yield a representative value for each neuron. Theyare divided into 20 sequences of 100 ISIs for analyzing fractionalsequences; the variation of a metric for an individual neuron wasestimated by comparing metric values computed for 20 fractionalsequences.

    Firing Metrics

    Six firing metrics were used to analyze the spike data.

    The conventional coefficient of variation Cv [35,36] is defined as the ratio of the standard deviation of the ISIs, $\Delta I$, to the mean, $\bar{I}$:

    $$C_v = \frac{\Delta I}{\bar{I}}. \qquad (1)$$

    The local variation Lv [32,33] is defined as

    $$L_v = \frac{3}{n-1} \sum_{i=1}^{n-1} \left( \frac{I_i - I_{i+1}}{I_i + I_{i+1}} \right)^2, \qquad (2)$$

    where $I_i$ and $I_{i+1}$ are the $i$-th and $(i+1)$-st ISIs, and $n$ is the number of ISIs. Both Cv and Lv adopt a value of 0 for a sequence of perfectly regular intervals and are expected to take a value of 1 for a Poisson random series of events with ISIs that are independently exponentially distributed. Whereas Cv represents the global variability of an entire ISI sequence and is sensitive to firing rate fluctuations, Lv detects the instantaneous variability of ISIs: The term $\left( \frac{I_i - I_{i+1}}{I_i + I_{i+1}} \right)^2 = 1 - \frac{4 I_i I_{i+1}}{(I_i + I_{i+1})^2}$ represents the cross-correlation between consecutive intervals $I_i$ and $I_{i+1}$, each rescaled with the instantaneous spike rate $2/(I_i + I_{i+1})$. The metric is superior to standard correlation analysis because (i) the irregularity is measured separately from the firing rate; (ii) nonstationarity is eliminated by rescaling intervals with the momentary rate; and (iii) the non-Poisson feature is evaluated in the deviation from Lv = 1. Three more metrics that have been proposed for estimation of instantaneous ISI variability, SI, the geometric average of the rescaled cross-correlation of ISIs [37,38], Cv2, the coefficient of variation for a sequence of two ISIs [39], and IR, the difference of the log ISIs [34], were also used.
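    As a minimal sketch of Eqs. (1) and (2), the two metrics can be computed from a list of ISIs as below. The function names `cv` and `lv` are chosen here for illustration, and the population standard deviation is assumed for $\Delta I$ because the text does not say which estimator is used.

```python
import random
import statistics


def cv(isis):
    """Eq. (1): standard deviation of the ISIs divided by their mean."""
    return statistics.pstdev(isis) / statistics.fmean(isis)


def lv(isis):
    """Eq. (2): 3/(n-1) times the sum of squared rescaled differences of consecutive ISIs."""
    n = len(isis)
    total = sum(((a - b) / (a + b)) ** 2 for a, b in zip(isis, isis[1:]))
    return 3.0 * total / (n - 1)


# Perfectly regular intervals: both metrics are 0.
regular = [0.1] * 50
print(cv(regular), lv(regular))

# Independent, exponentially distributed ISIs (Poisson process): both should be near 1.
rng = random.Random(0)
poisson = [rng.expovariate(10.0) for _ in range(20000)]
print(cv(poisson), lv(poisson))
```

    The factor 3 in `lv` is what normalizes the Poisson case to 1: for two independent exponential intervals the squared rescaled difference has mean 1/3.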

    Figure 1 displays three types of spike sequences comprising identical sets of exponentially distributed ISIs. In terms of the ISI distributions, all of these are regarded as Poisson processes, accordingly Cv values are all identical at 1. However, these sequences clearly differ in how their ISIs are arranged; Lv may be able to detect these differences.
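    The point about identical ISI sets can be illustrated with a small experiment. The sketch below is not taken from the paper: the helper `three_arrangements` is hypothetical, "regular order" is interpreted here as sorting the intervals by length, and the alternating sequence pairs the shorter half of the intervals with the longer half.

```python
import random
import statistics


def cv(isis):
    """Coefficient of variation (population standard deviation assumed)."""
    return statistics.pstdev(isis) / statistics.fmean(isis)


def lv(isis):
    """Local variation of consecutive ISIs."""
    total = sum(((a - b) / (a + b)) ** 2 for a, b in zip(isis, isis[1:]))
    return 3.0 * total / (len(isis) - 1)


def three_arrangements(isis, rng):
    """Return the same ISIs arranged in sorted, shuffled, and short/long alternating order."""
    ordered = sorted(isis)
    shuffled = list(isis)
    rng.shuffle(shuffled)
    half = len(ordered) // 2
    alternating = []
    for short, long_ in zip(ordered[:half], ordered[half:]):
        alternating.extend([short, long_])
    alternating.extend(ordered[2 * half:])  # leftover interval when the count is odd
    return {"ordered": ordered, "shuffled": shuffled, "alternating": alternating}


rng = random.Random(1)
isis = [rng.expovariate(1.0) for _ in range(2000)]
arrangements = three_arrangements(isis, rng)
for name, seq in arrangements.items():
    print(f"{name:12s} Cv = {cv(seq):.3f}  Lv = {lv(seq):.3f}")
```

    Because the three sequences contain the same values, Cv cannot tell them apart, whereas Lv depends on which intervals are adjacent.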

    In comparison with Cv, local metrics, such as Lv, SI, Cv2, and IR, detect firing irregularities fairly invariantly with firing rate fluctuations. However, these metrics are still somewhat dependent on firing rate fluctuations. Assuming that rate dependence is caused by the refractory period that follows a spike, we can

    Figure 1. Spike sequences that have identical sets of inter-spike intervals. Intervals are aligned (A) in a regular order, (B) randomly, and (C) alternating between short and long. doi:10.1371/journal.pcbi.1000433.g001

    Author Summary

    Neurons, or nerve cells in the brain, communicate with each other using stereotyped electric pulses, called spikes. It is believed that neurons convey information mainly through the frequency of the transmitted spikes, called the firing rate. In addition, neurons may communicate some information through the finer temporal patterns of the spikes. Neuronal firing patterns may depend on cellular organization, which varies among the regions of the brain, according to the roles they play, such as sensation, association, and motion. In order to examine the relationship among signals, structure, and function, we devised a metric to detect firing irregularity intrinsic and specific to individual neurons and analyzed spike sequences from over 1,000 neurons in 15 different cortical areas. Here we report two results of this study. First, we found that neurons exhibit stable firing patterns that can be characterized as regular, random, and bursty. Second, we observed a strong correlation between the type of signaling pattern exhibited by neurons in a given area and the function of that area. This suggests that, in addition to reflecting the cellular organization of the brain, neuronal signaling patterns may also play a role in specific types of neuronal computations.

    Cortical Firing Patterns

    PLoS Computational Biology | www.ploscompbiol.org 2 July 2009 | Volume 5 | Issue 7 | e1000433

    Classification of spike trains

    C. Sanli, CompleXity Networks, UNamur

    S. Shinomoto et al. PLoS Comput. Biol. 15, 2823-2842 (2003).


    spike train of size p [Fig. 4(d)]. Generating independent, yet time-dependent events, the procedure is expected to create time-dependent Poisson random processes, $P(\tau, t) = \lambda(t) e^{-\lambda(t)\tau}$, where the firing rate $\lambda(t)$ in th