Nonstationarities in teletraffic data which may spoil your statistical tests

Piotr Żuraniewski (UvA/TNO/AGH)

Felipe Mata (UAM), Michel Mandjes (UvA), Marco Mellia (POLITO)

Stationarity

• Many models assume stationarity: statistical properties do not change over time

– strong stationarity: all statistical properties remain the same over time

– weak stationarity: statistical properties up to second order (mean, variance, covariance) remain unchanged
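
For reference, weak stationarity can be written out as follows. The notation (X_t for the observed process, μ for its mean, γ for its autocovariance function) is assumed here and does not appear on the slides.

```latex
\mathbb{E}[X_t] = \mu
\quad\text{and}\quad
\operatorname{Cov}(X_t, X_{t+h}) = \gamma(h)
\qquad \text{for all } t \text{ and all lags } h,
\quad\text{so that}\quad
\operatorname{Var}(X_t) = \gamma(0).
```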

Nonstationarity – problems

• Real life: things are changing…

• Bad news: sample stationarity cannot be positively verified

• Best answer we can get: ‘we found no evidence of given type of nonstationarity’

• Some examples:

– mean shift

– polynomial deterministic trend

– variance change

Example

• Change in the number of users in VoIP system

• Model: load change in M/G/inf queue

• Sample ACF suggests very high correlation

– slow decay?

– long range dependency?

[Figure: number of users over time; sample ACF by lag]

Example

[Figure: number of users over time; sample ACF by lag]

• The changepoint detection procedure we developed allows us to separate parts with different load

• There is no significant correlation in either of these parts

• Sample ACF does not estimate the ACF in case of nonstationarity
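
The effect described above can be reproduced with a small simulation. This is a minimal sketch under assumed conditions, not the data behind the plots: the two load levels (300 and 400), the noise level, the segment lengths, and the use of independent Gaussian noise in place of an M/G/inf occupancy process are all illustrative choices.

```python
import random


def sample_acf(x, max_lag):
    """Sample autocorrelation at lags 0..max_lag (usual biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev) / n
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / n / c0
        for k in range(max_lag + 1)
    ]


rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Assumed illustration: uncorrelated noise around two different load levels.
low = [rng.gauss(300, 10) for _ in range(100)]
high = [rng.gauss(400, 10) for _ in range(100)]

acf_full = sample_acf(low + high, 20)  # series containing the mean shift
acf_low = sample_acf(low, 20)          # first segment only
acf_high = sample_acf(high, 20)        # second segment only

for lag in (1, 10, 20):
    print(lag, round(acf_full[lag], 2), round(acf_low[lag], 2), round(acf_high[lag], 2))
```

Although every observation is generated independently, the sample ACF of the full series stays large at all plotted lags, because the single global mean sits between the two levels. Computed separately on each segment, the sample ACF is small.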

[Figure: sample ACF by lag]

Changepoint detection

• Window of 50 samples presented to detection procedure

• Add newest observation, drop oldest and repeat detection procedure

• In this example: true change in window number 51

• Changepoint detection works well – see output of 500 experiments

[Figure: detection ratio by window no.; number of users over time]

Changepoint detection

• However, if we add a deterministic trend, things go wrong

• Observe high false alarm ratio after polluting data with trend
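
One way to see why a trend triggers false alarms: a mean-shift statistic cannot tell a steady drift inside the window from a jump. The sketch below compares the two halves of a single 50-sample window with and without a linear trend. The statistic, the slope of 1.0 per sample, and the noise level are assumptions chosen for illustration, not values taken from the experiments.

```python
import math
import random


def half_split_t(window):
    """Two-sample t-type statistic comparing the two halves of a window."""
    n = len(window)
    k = n // 2
    left, right = window[:k], window[k:]
    m1 = sum(left) / k
    m2 = sum(right) / (n - k)
    ss = sum((v - m1) ** 2 for v in left) + sum((v - m2) ** 2 for v in right)
    pooled_var = ss / (n - 2)
    return abs(m1 - m2) / math.sqrt(pooled_var * (1.0 / k + 1.0 / (n - k)))


rng = random.Random(1)
# No changepoint anywhere: constant level plus independent noise.
flat = [300 + rng.gauss(0, 10) for _ in range(50)]
# Same noise, polluted with an assumed linear trend of 1.0 per sample.
trended = [v + 1.0 * t for t, v in enumerate(flat)]

flat_stat = half_split_t(flat)
trend_stat = half_split_t(trended)
print("no trend:", round(flat_stat, 2), "with trend:", round(trend_stat, 2))
```

Neither series contains a changepoint, yet the trended window produces a much larger statistic, which a threshold tuned for mean shifts would flag as a change.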

[Figure: number of users over time, with trend added; detection ratio by window no.]

Work in progress

• Real VoIP data from Italian service provider and aggregated IP data from Spanish university backbone network

• Current research: estimate and remove trend from traffic

• Only then apply changepoint detection procedure(s)

[Figure: real traffic trace; horizontal axis values around 1.29 × 10^9, vertical axis from 0 to 900]

Work in progress

• Trend estimation methods:

– moving average?

– kernel/wavelets smoothing?

– parametric methods?

– time series regression?

• How to judge if estimated trend is really significant?

• Models different than M/G/inf?
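
As an illustration of the first option in the list of trend estimation methods, here is a minimal sketch of a centred moving average used to estimate a trend and subtract it. The window width of 21, the edge handling, and the simulated series are assumptions; whether this estimator suits real traffic is exactly the open question raised above.

```python
import random


def moving_average(x, width):
    """Centred moving average (odd width); windows are truncated at the edges."""
    half = width // 2
    out = []
    for i in range(len(x)):
        lo = max(0, i - half)
        hi = min(len(x), i + half + 1)
        out.append(sum(x[lo:hi]) / (hi - lo))
    return out


rng = random.Random(2)
# Assumed data: linear trend plus independent noise, no changepoint.
series = [300 + 0.5 * t + rng.gauss(0, 10) for t in range(200)]

trend_hat = moving_average(series, 21)                  # estimated trend
residual = [x - m for x, m in zip(series, trend_hat)]   # detrended series

interior = residual[10:-10]  # drop the edges, where the window is truncated
print("mean interior residual:", round(sum(interior) / len(interior), 3))
```

Note that a moving average also smooths a genuine level shift into a ramp, so the window width trades trend removal against preserving the changepoints that the detection step is meant to find.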

Conclusions

• Different types of nonstationarities may severely influence statistical tests or values of estimators

• Even if we try to detect one type of nonstationarity, the other type may ruin our original test

• We always have to pay attention to the assumptions of the theorems used

• Share your experience!
