Estimating inter-event time distributions from finite
observation periods
Mikko Kivelä Aalto University
MK, M.A. Porter, Phys. Rev. E 92 052813
“It is a curious fact that researchers in statistical physics know very little about statistics.” - Anonymous referee
Temporal communication networks
● Communication between people: emails, letters, calls, SMS, messages on websites, etc.
● Temporal networks: links/nodes active at discrete times (activation events)
Time
Holme & Saramäki, Phys. Reps. 519, 97 (2012)
Inter-event times (IETs)
● Times between activations of nodes or edges in temporal networks
[Figure: timeline marking the activation events of a node or a link]
Inter-event times - why are they important?
1) Validating models of human activity
● Models predict that there should be very long IETs
− Power-law IET dist. Barabási, Nature 435, 207 (2005)
− Exponential IET dist.
2) Processes acting on top of temporal networks are affected by burstiness/long IETs
● Spreading processes: Karsai et al., Phys. Rev. E 83, 025102 (2011)
● Mixing time of random walks: Delvenne et al., arXiv:1309.4155 [physics.soc-ph] (2013)
● Opinion formation: Takaguchi & Masuda, Phys. Rev. E 84, 036115 (2011)
The tail of the IET distribution is important!
Example: burstiness & spreading
Bursts can dramatically slow down spreading on networks!
Long-tailed IET distribution = bursty event sequence = high residual waiting times (measured from a random time point) = slow spreading
M. Kivelä, R. K. Pan, K. Kaski, J. Kertész, J. Saramäki, M. Karsai: Multiscale analysis of spreading in a large communication network, J. Stat. Mech. 3, P03005 (2012)
M. Karsai, M. Kivelä, R. K. Pan, K. Kaski, J. Kertész, A. L. Barabási, J. Saramäki: Small but slow world: How network topology and burstiness slow down spreading, Physical Review E 83 (2), 025102 (2011)
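The link between long-tailed IETs and high residual waiting times can be illustrated with a small sketch. It assumes a stationary renewal process, for which the mean waiting time from a uniformly random time point to the next event is E[τ²] / (2 E[τ]). The function name and the two toy sequences are hypothetical and are not taken from the talk.

```python
import numpy as np

def mean_residual_waiting_time(iets):
    """Mean time from a uniformly random time point to the next event,
    assuming a stationary renewal process: E[tau^2] / (2 * E[tau])."""
    iets = np.asarray(iets, dtype=float)
    return np.mean(iets ** 2) / (2.0 * np.mean(iets))

# Two hypothetical IET samples with (almost exactly) the same mean IET of 1.0
regular = np.ones(100)                    # evenly spaced events
bursty = np.array([0.1] * 99 + [90.1])    # many short gaps and one very long gap

print(mean_residual_waiting_time(regular))  # half the IET for regular events
print(mean_residual_waiting_time(bursty))   # dominated by the single long gap
```

Both samples have the same event frequency, but the bursty one gives a residual waiting time that is many times larger, which is why the tail of the IET distribution matters for spreading.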
The problem
A typical example of a plot of IET distribution found in the literature
Rybski et al., Sci. Reps. 2, 560 (2012)
Is this cut-off in the power-law really a scale in the interaction patterns or is it just a "finite size effect"?
Length of the observation period: T ≈ 500 days
Questions
● Is there a finite size effect or is the dip "real"?
● What is the reason for it?
● What is the functional form of the effect?
− Linear? Vazquez et al., Phys. Rev. Lett. 98, 158702 (2007)
− Exponential? Wu et al., PNAS 107, 18803 (2010)
● How does it affect estimates of statistics such as the burstiness parameter or the residual waiting time?
● How to correct for the finite size effect?
Renewal process model
Real IET distribution p(τ) Observed IET distribution p'(τ)
● Stationary renewal process and finite observation window:
● There is a linear length bias: p'(τ) ∝ (T − τ) p(τ)
Soon & Woodroofe, J. Stat. Plan. Inference 53, 171 (1996)
[Figure: real IETs p(τ), observed IETs p'(τ), and the prediction ∝ (T − τ) p(τ). Panels: exponential, p(τ) ∝ e^(−τ), T = 5; power law, p(τ) ∝ τ^(−2.1), T = 40]
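The linear length bias can be reproduced numerically. The sketch below is only an illustration: it takes a real IET density on a uniform grid, multiplies it by the factor (T − τ), and renormalizes. The function name and the grid resolution are assumptions. The exponential density and T = 5 follow the figure panel.

```python
import numpy as np

def observed_iet_pdf(tau, p, T):
    """Apply the linear length bias, p'(tau) proportional to (T - tau) * p(tau),
    on a uniform grid and renormalize with a simple Riemann sum."""
    tau = np.asarray(tau, dtype=float)
    p = np.asarray(p, dtype=float)
    weighted = np.clip(T - tau, 0.0, None) * p   # IETs longer than T are never observed
    dtau = tau[1] - tau[0]
    return weighted / (weighted.sum() * dtau)

T = 5.0
tau = np.linspace(0.0, T, 501)
p = np.exp(-tau)                   # real IET density, p(tau) ~ e^(-tau)
p_obs = observed_iet_pdf(tau, p, T)

# The ratio of observed to real density falls off linearly and reaches zero at tau = T
ratio = p_obs / p
print(ratio[0], ratio[250], ratio[-1])
```

Dividing the observed density by the real one recovers a straight line in τ, which is the signature of the length bias.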
Strategies used in the literature to cope with finite time windows
● Periodic boundary conditions: Karsai et al., Phys. Rev. E 83, 025102 (2011)
● Rescaling the IET distribution: Holme, Europhys. Lett. 64, 427 (2003)
● Resample data with smaller T -> infer scaling law: Vazquez et al., Phys. Rev. Lett. 98, 158702 (2007)
● Select only high-frequency event sequences
− E.g., more than 10^3 to 10^5 event sequences available, but only a single sequence or 10% of the sequences used: Barabási, Nature 435, 207 (2005); Wu et al., PNAS 107, 18803 (2010)
● Nothing! (sometimes "finite size effects" are mentioned)
The solution
Censored IETs
[Figure: real IET distribution p(τ) and observed IET distribution p'(τ)]
● Two types of IETs are sampled:
− Observed
− Censored
● Bias arises when one leaves out the censored ones
● How to deal with censored IETs: survival analysis
Using Kaplan-Meier estimator to estimate the IET distribution
● Take the observed and censored IETs
− Censored IET: we know only that τ3 is longer than τfc
− d_i: number of observed IETs of length t_i
− n_i: number of IETs known to be at least of length t_i (including censored)
● Estimate the cumulative IET distribution (standard Kaplan-Meier product-limit form): P(τ ≤ t) ≈ 1 − ∏_{t_i ≤ t} (1 − d_i / n_i)
Kaplan & Meier, JASA 53, 457 (1958); Denby & Vardi, Technometrics 27, 361 (1985)
● Result is an unbiased non-parametric maximum likelihood estimator for the IET distribution
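The estimator can be sketched in a few lines of NumPy. For each distinct observed IET length, count the observed IETs of exactly that length and the IETs (observed or censored) known to be at least that long, then multiply the resulting survival factors. This is a minimal illustration, not the code behind the talk. The function names are hypothetical, and treating only the gap from the last event to the end of the window as censored is an assumption made here for simplicity.

```python
import numpy as np

def iets_from_events(event_times, T):
    """Split one event sequence observed on [0, T] into fully observed IETs and
    one forward-censored IET (last event to end of window). Assumption: the
    gap before the first event is ignored."""
    t = np.sort(np.asarray(event_times, dtype=float))
    observed = np.diff(t)
    censored = np.array([T - t[-1]]) if t.size else np.array([])
    return observed, censored

def kaplan_meier(observed, censored):
    """Return (t_i, S(t_i)), where S(t) estimates P(tau > t).
    The cumulative IET distribution is 1 - S(t)."""
    observed = np.asarray(observed, dtype=float)
    censored = np.asarray(censored, dtype=float)
    all_iets = np.concatenate([observed, censored])
    # d_i: number of observed IETs of exactly length t_i
    t_i, d_i = np.unique(observed, return_counts=True)
    # n_i: number of IETs (observed or censored) known to be at least t_i long
    n_i = np.array([(all_iets >= t).sum() for t in t_i])
    survival = np.cumprod(1.0 - d_i / n_i)
    return t_i, survival

# Hypothetical sequence: events at times 1, 2, 4, 7 in a window of length 10
obs, cens = iets_from_events([1, 2, 4, 7], T=10)
t_i, surv = kaplan_meier(obs, cens)
print(t_i, surv)
```

For many sequences, the observed and censored IETs of all sequences can be concatenated before calling `kaplan_meier`.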
[Figure: real IETs p(τ), observed IETs p'(τ), the prediction ∝ (T − τ) p(τ), and the Kaplan-Meier estimate pkm(τ). Panels: exponential, p(τ) ∝ e^(−τ), T = 5; power law, p(τ) ∝ τ^(−2.1), T = 40]
How important is the correction?
● Email communication: Eckmann et al., PNAS 101, 14333 (2004)
● Messages in POK: Rybski et al., Sci. Reps. 2, 560 (2012)
● Short messages (SMS): Wu et al., PNAS 107, 18803 (2010)

         IET: average           IET: sqrt of 2nd moment   Avg residual waiting time
         Observed  KM estimate  Observed  KM estimate     Observed  KM estimate
Email    0.908     1.51         3.20      6.88            5.62      15.6
POK      5.13      28.4         23.1      106             51.9      198
SMS      0.633     1.40         2.11      4.89            3.53      8.53

Times in days
[Figure legend: Kaplan-Meier estimate; observed IETs]
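When to worry about length bias?
● The bias depends on how far you are from the end of the time window: an IET of length τ is down-weighted by a factor proportional to T − τ
● A rule of thumb: e.g., if the largest IET in the distribution is 100 times smaller than the time window T -> max 1% error
[Figure: IET distribution from Duarte et al., ICWSM 2007, with regions marked "Affected" and "Not affected"; T = 2.5 × 10^6 s]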
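The rule of thumb can be checked with one line of arithmetic. The sketch below assumes that the relative under-representation of an IET of length τ, compared with very short IETs, is τ / T, which follows from the linear factor (T − τ). The function name is hypothetical. The window length is the one quoted for the Duarte et al. example.

```python
def relative_length_bias(tau, T):
    """Relative under-representation of IETs of length tau in a window of
    length T, compared with very short IETs: 1 - (T - tau) / T."""
    if not 0 <= tau <= T:
        raise ValueError("tau must lie in [0, T]")
    return 1.0 - (T - tau) / T

T = 2.5e6           # window length in seconds
tau_max = T / 100   # longest IET of interest: 100 times smaller than T
print(relative_length_bias(tau_max, T))  # about 0.01, i.e. at most a 1% error
```

IETs close to the window length are almost never observed, so that part of the distribution is where a correction is needed.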
Separating time sequences with different activity levels
● There is heterogeneity in the frequencies of different sequences (e.g., some people send huge numbers of emails, others only a few)
● IET distributions can have the same shape but different frequencies: data collapse
[Figure panels: observed IET distributions; Kaplan-Meier estimates]
Summary
● IETs are subject to linear length bias if there is a finite sampling window
● The bias is considerable for several popular and freely available data sets on communication
● The Kaplan-Meier estimator is an easy way to estimate the real IET distribution (other estimators exist)
● You can even use low-frequency event sequences
● Try it: http://github.com/bolozna/iet
Model with heterogeneous activity levels
Activities t0 drawn from a power-law distribution p(t0)
Poisson processes with rates t0
Estimators for single sequences
Poisson process with n expected events