Maurizio Naldi Università di Roma “Tor Vergata” POPULARITY DISTRIBUTIONS AND INTERNET TRAFFIC MODELLING Workshop “Statistica e Telecomunicazioni”, Roma 2-3 Luglio 2001 Università Di Roma “Tor Vergata” Dip. Informatica Sistemi Produzione




WHAT’S A POPULARITY MODEL?

Popularity models describe how users distribute their preferences among a set of objects. They are represented either as a frequency-rank plot (suitable for highly preferred objects) or as a frequency-count plot (suitable for the less preferred objects).


EXAMPLES OF FREQUENCY-RANK AND FREQUENCY-COUNT PLOTS

• A frequency-rank plot: no. of preferences vs. rank

• A frequency-count plot: no. of preferences vs. no. of objects that have those preferences
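As an illustration, both representations can be derived from a raw request log. The sketch below assumes the log is simply a list of object identifiers, one per request; the function names and the sample log are hypothetical.

```python
from collections import Counter


def frequency_rank(requests):
    """(rank, no. of preferences) pairs; rank 1 is the most requested object."""
    per_object = Counter(requests)  # object -> no. of preferences
    ordered = sorted(per_object.values(), reverse=True)
    return list(enumerate(ordered, start=1))


def frequency_count(requests):
    """(no. of preferences, no. of objects having that many preferences) pairs."""
    per_object = Counter(requests)
    objects_per_count = Counter(per_object.values())
    return sorted(objects_per_count.items())


# Hypothetical request log: each entry identifies one requested object.
log = ["a", "b", "a", "c", "a", "b", "d", "e"]
print(frequency_rank(log))   # [(1, 3), (2, 2), (3, 1), (4, 1), (5, 1)]
print(frequency_count(log))  # [(1, 3), (2, 1), (3, 1)]
```

The frequency-rank list resolves each of the top objects individually, while the frequency-count list groups the many objects that share the same small number of preferences, which is why it suits the less preferred objects.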


SOME POPULARITY MODELS (FREQUENCY-RANK LAWS)

• Zipf: \( f(r) \propto r^{-\alpha} \)

• Simon

• Yule
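For Zipf's law, written here as f(r) = A r^{-α} with A a normalising constant, the expected number of preferences per rank can be generated directly. A minimal sketch, assuming a finite set of n objects and a given total number of requests (function name and parameter values are hypothetical):

```python
def zipf_frequencies(n_objects, alpha, total_requests):
    """Expected no. of preferences for ranks 1..n_objects under f(r) = A * r**(-alpha).

    A is a normalising constant chosen so that the frequencies sum to total_requests.
    """
    weights = [r ** (-alpha) for r in range(1, n_objects + 1)]
    a = total_requests / sum(weights)
    return [a * w for w in weights]


print(zipf_frequencies(2, 1.0, 3))  # [2.0, 1.0]
freqs = zipf_frequencies(1000, 0.75, 1_000_000)
# On a log-log plot these points lie on a straight line of slope -alpha:
# log f(r) = log A - alpha * log r, so f(1)/f(2) is 2**0.75 (about 1.68).
print(freqs[0] / freqs[1])
```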


RELATIONSHIP TO PARETO’S MODEL

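If the N objects in a set are ranked by size according to Zipf’s law

\( x = A r^{-\alpha} \)

where x is the size of the object of rank r, then the number of objects having a size greater than or equal to x is

\( r = (A/x)^{1/\alpha} \)

The cumulative distribution function of the size is therefore

\( F(x) = 1 - \frac{r}{N} = 1 - \frac{1}{N}\left(\frac{A}{x}\right)^{1/\alpha} \)

i.e. of the Pareto type.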
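The Zipf-to-Pareto step can be checked numerically: generate N sizes that follow x = A r^{-α} exactly and compare the fraction of objects smaller than x with F(x) = 1 - (1/N)(A/x)^{1/α}. Since the number of objects is an integer, the two values should differ by less than 1/N. A Python sketch; A, α and N are hypothetical parameters and the function names are illustrative.

```python
def empirical_cdf(sizes, x):
    """Fraction of objects whose size is smaller than x."""
    return sum(1 for s in sizes if s < x) / len(sizes)


def pareto_cdf(x, a, alpha, n_objects):
    """F(x) = 1 - (1/N) * (A/x)**(1/alpha), valid for A*N**(-alpha) <= x <= A."""
    return 1.0 - (a / x) ** (1.0 / alpha) / n_objects


A, ALPHA, N = 1000.0, 0.75, 10_000  # hypothetical parameters
sizes = [A * r ** (-ALPHA) for r in range(1, N + 1)]  # Zipf: x = A r^(-alpha)
for x in (2.0, 10.0, 100.0):
    # Both columns should agree to within 1/N.
    print(x, empirical_cdf(sizes, x), pareto_cdf(x, A, ALPHA, N))
```

The Pareto shape parameter is 1/α, so the heavier the Zipf concentration (larger α), the lighter the Pareto tail of the size distribution.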


APPLICATIONS

Present
• Cache algorithm design
• Address cache table dimensioning
• Optimization of Video-on-Demand servers’ architecture

Possible
• Any communications context where the user has a wide choice


TRAFFIC MONITORING POINTS

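[Figure: users and sites, with two traffic observation points]

• Web proxy observation point: Some-to-All (a subset of users accessing all sites)

• Web server observation point: All-to-One (all users accessing a single site)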


OBSERVED REQUEST DISTRIBUTIONS



OBSERVED DISTRIBUTIONS OF USERS AMONG SITES


THE 20/80 (10/90) RULE

Proportion of documents [%]   Expected requests [%]   Observed requests [%]
 1                            19-46                   20-35
10                            44-68                   45-55

Proportion of requests [%]   Expected proportion of documents [%]   Observed proportion of documents [%]
70                           12-37                                  25-40
90                           54-75                                  70-80

• The expected proportion of requests for the top documents overestimates the observed one
• Fixed-proportion rules (20/80, 10/90) do not hold
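Under a pure Zipf law, the share of requests attracted by the top fraction of documents (and, conversely, the fraction of documents needed to attract a given share of requests) depends only on α and N. The slide does not state how the expected values were obtained; the sketch below simply assumes a Zipf law with known α and N, and all names and parameter values are hypothetical.

```python
def zipf_weights(n_objects, alpha):
    """Unnormalised Zipf weights r**(-alpha) for ranks 1..n_objects."""
    return [r ** (-alpha) for r in range(1, n_objects + 1)]


def top_share(n_objects, alpha, top_fraction):
    """Share of requests going to the top `top_fraction` of objects under Zipf's law."""
    weights = zipf_weights(n_objects, alpha)
    k = max(1, round(top_fraction * n_objects))  # no. of top-ranked objects
    return sum(weights[:k]) / sum(weights)


def fraction_for_share(n_objects, alpha, request_share):
    """Smallest fraction of top-ranked objects attracting at least `request_share` of requests."""
    weights = zipf_weights(n_objects, alpha)
    target = request_share * sum(weights)
    cumulative = 0.0
    for k, w in enumerate(weights, start=1):
        cumulative += w
        if cumulative >= target:
            return k / n_objects
    return 1.0


# Hypothetical parameters: 10,000 documents, alpha = 0.75.
print(top_share(10_000, 0.75, 0.01))          # share of requests for the top 1%
print(fraction_for_share(10_000, 0.75, 0.9))  # fraction of documents for 90% of requests
```

Because the result changes with both α and N, no single fixed proportion such as 20/80 can hold across different traces.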


GENERAL COMMENTS

• When Zipf’s law is fitted by linear regression on the log-log frequency-rank plot, the estimated exponent typically lies in the 0.6-0.85 range

• All log-log frequency-rank plots exhibit an initial concavity (the fitted law overestimates the preferences for the top objects)

• All log-log frequency-count plots exhibit final (count vs. frequency) spreading
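The linear-regression fit mentioned in the first comment amounts to an ordinary least-squares line through the points (log r, log f(r)), whose slope is -α. A minimal Python sketch, assuming the frequencies are positive and already sorted by decreasing rank order (the function name is hypothetical):

```python
import math


def fit_zipf_loglog(frequencies):
    """Estimate Zipf's exponent by least squares on the log-log frequency-rank plot.

    `frequencies` must be positive and sorted in decreasing order (rank 1 first).
    Returns (alpha, A) for the fitted line log f = log A - alpha * log r.
    """
    xs = [math.log(r) for r in range(1, len(frequencies) + 1)]
    ys = [math.log(f) for f in frequencies]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return -slope, math.exp(intercept)


freqs = [500 * r ** (-0.7) for r in range(1, 1001)]  # synthetic, exactly Zipf
alpha, a = fit_zipf_loglog(freqs)
print(round(alpha, 6), round(a, 6))  # 0.7 500.0
```

On real traces the points are not collinear: the initial concavity bends the top ranks away from the line, and since every rank weighs equally in the least-squares sum, the many low-ranked objects dominate the estimated slope.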


OPEN ISSUES

• Search for better models (solving the initial concavity problem)

• Search for parameter estimation methods other than linear regression

• Definition of proper goodness-of-fit tests