Transcript

Amir Rasti, Reza Rejaie

Dept. of Computer Science

University of Oregon

Peer-to-peer systems have become increasingly popular
◦ Millions of simultaneous users
◦ Significant percentage of Internet traffic

BitTorrent is one of the most popular p2p applications
◦ Responsible for 35% of all Internet traffic [Parker05]

BitTorrent is important because of
◦ Its popularity
◦ Its impact on the network


Scalable one-to-many peer-to-peer file distribution

Overlay: unstructured, random, high degree

Swarming
◦ File is divided into segments
◦ Segments are randomly distributed among peers (get rarest segment first)

Contribution
◦ Peers exchange segments and contribute their outgoing bandwidth
◦ Incentive: tit-for-tat

Tracker
◦ Torrent coordinator
◦ Periodic peer status updates

Performance: intuitively depends on
◦ Peer properties (BW, contribution, etc.)
◦ Group properties (population, content availability, churn)
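The "get rarest segment first" rule can be illustrated with a minimal Python sketch. The function name `pick_rarest_segment`, the set-based representation of segment ownership, and the lowest-index tie-break are assumptions made for illustration; they are not taken from the talk or from any particular BitTorrent client.

```python
from collections import Counter

def pick_rarest_segment(have, neighbor_segments):
    """Return the needed segment held by the fewest neighbors, or None.

    have: set of segment indices this peer already holds.
    neighbor_segments: list of sets, one per neighbor (hypothetical layout).
    Ties are broken by the lowest index so the sketch stays deterministic.
    """
    counts = Counter()
    for segments in neighbor_segments:
        counts.update(segments - have)  # count only segments still needed
    if not counts:
        return None
    return min(counts, key=lambda seg: (counts[seg], seg))

# Hypothetical neighborhood: segment 0 is held by only one neighbor.
neighbors = [{0, 1, 2}, {1, 2, 3}, {2, 3}]
choice = pick_rarest_segment({2}, neighbors)
```

Here `choice` is segment 0, the only needed segment with a single copy among the neighbors.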

Introduction


1. Modeling and analytical studies
2. Simulation studies
3. Empirical studies
◦ Capture BitTorrent system properties in operation through measurement (instrumented clients) [Legout06]
◦ Group properties [Izal04]: population, average content availability, ...
◦ No explicit notion of performance
◦ No study on the effects of underlying factors of peer performance

Related work

Characterization:
◦ Understanding group-level and peer-level properties in a torrent

Analysis:
◦ What are the main factors that affect the performance observed by individual peers?


Common approach: instrumented clients
◦ Detailed and flexible
◦ Representative?

Our approach: tracker logs
◦ Coarse granularity (30 min)
◦ Global view

Data Sets

Methodology/Approach

[Figure: tracker and tracker log file]

Tracker log sets

Source   #Torrents  Start Date  End Date  #Reports  #Sessions
RedHat   1          3/03        8/03      2M        170k
Debian   1599       2/05        3/05      32M       1268k
Games    2585       8/03        12/04     38M       4416k

Selected torrents

Torrent  File Size  #Sessions (rank)  Duration
RedHat   1.8GB      170k (3rd)        146d
Debian   677MB      139k (6th)        51d
Games    363MB      195k (2nd)        66d


Session:
◦ Set of all updates from a particular peer, from its arrival until its departure

Peer-level properties:
◦ Represent the peer's status during a session:
◦ Average download rate
◦ Average upload rate

Methodology

[Figure: a peer's session from session start to download complete; the studied zone is the leeching phase; slopes give the upload rates and download rates; avg download rate]
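The peer-level rates defined above can be sketched in Python as follows. The tuple layout `(time_s, uploaded_bytes, downloaded_bytes, left_bytes)`, the function name `average_rates`, and the sample session are assumptions for illustration; the talk does not describe the log format or the exact computation.

```python
def average_rates(updates):
    """Average (download, upload) rate in bytes/s over the leeching phase.

    updates: time-ordered list of
        (time_s, uploaded_bytes, downloaded_bytes, left_bytes)
    tuples for one session (a hypothetical layout).
    The leeching phase is taken to run from the first update to the first
    update with left_bytes == 0, or to the last update if the download
    never completes.
    """
    if len(updates) < 2:
        return None
    start = updates[0]
    end = next((u for u in updates if u[3] == 0), updates[-1])
    elapsed = end[0] - start[0]
    if elapsed <= 0:
        return None
    download_rate = (end[2] - start[2]) / elapsed
    upload_rate = (end[1] - start[1]) / elapsed
    return download_rate, upload_rate

# Hypothetical session with one update every 30 minutes.
session = [
    (0,    0,           0,           900_000_000),
    (1800, 90_000_000,  360_000_000, 540_000_000),
    (3600, 180_000_000, 900_000_000, 0),
    (5400, 400_000_000, 900_000_000, 0),  # seeding, outside the studied zone
]
rates = average_rates(session)
```

The last update is ignored because the download is already complete, so only the leeching phase contributes to both averages.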


Population, Avg. Content Availability, Churn

Sampling approach:
◦ Once every τ minutes
◦ Last update before and first update after each sample
◦ Interpolation
◦ Averaging across peers

τ determines sampling resolution

τ > average update interval

Peer view:
◦ Average of the samples during the peer's download time
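The sampling approach above can be sketched in Python. This is one possible reading of the slide: linear interpolation is assumed (the slide only says "interpolation"), the per-peer value is taken to be the fraction of the file downloaded, and the names `interpolate` and `sample_group` are hypothetical.

```python
from bisect import bisect_right

def interpolate(updates, t):
    """Linearly interpolate one peer's value at time t.

    updates: time-ordered list of (time, value) pairs for one peer.
    Uses the last update before t and the first update after t.
    Returns None if t falls outside the peer's session.
    """
    times = [u[0] for u in updates]
    if not updates or t < times[0] or t > times[-1]:
        return None
    i = bisect_right(times, t)
    if i == len(updates):  # t coincides with the last update
        return updates[-1][1]
    (t0, v0), (t1, v1) = updates[i - 1], updates[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)

def sample_group(peers, tau, t_end):
    """Sample every tau time units; return [(t, population, avg_value)]."""
    samples = []
    t = 0
    while t <= t_end:
        values = [v for v in (interpolate(p, t) for p in peers) if v is not None]
        avg = sum(values) / len(values) if values else None
        samples.append((t, len(values), avg))
        t += tau
    return samples

# Two hypothetical peers; values are fractions of the file downloaded.
peers = [
    [(0, 0.0), (40, 1.0)],
    [(20, 0.5), (60, 1.0)],
]
samples = sample_group(peers, tau=30, t_end=60)
```

Each sample reports the population (peers whose session covers the sample time) and the average value across those peers. A peer view would then average the samples that fall within that peer's download time.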

Measurement methodology

[Figure: peer update times on a timeline, with samples taken every τ]


Is download rate a good performance metric?
◦ A reference is needed to evaluate a peer's download rate
◦ Ideally, peer performance is:

$$\mathrm{Utilization} = \frac{\mathrm{Avg\_Download\_Rate}}{\mathrm{In\_Bandwidth}}$$

◦ Accurate measurement of utilization is difficult

We use the maximum observed download rate as a (lower bound) estimate for incoming bandwidth.

Standard deviation of download rate captures stability of download rate
◦ Rates close to avg. -> higher performance
◦ Normalization -> comparability

Two performance metrics:

$$\mathrm{Util} = \frac{\mathrm{Avg\_Download\_Rate}}{\mathrm{Max\_Observed\_Download\_Rate}}$$

$$\mathrm{Norm\_sdev} = \frac{\mathrm{Sdev\_Download\_Rate}}{\mathrm{Avg\_Download\_Rate}}$$

Methodology
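The two performance metrics can be computed from a peer's per-interval download rates, as in the minimal Python sketch below. The function name `performance_metrics` and the sample rates are hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption, since the slides do not say which variant was used.

```python
from statistics import mean, pstdev

def performance_metrics(download_rates):
    """Return (util, norm_sdev) for one peer's observed download rates.

    util      = average rate / maximum observed rate
    norm_sdev = standard deviation of the rate / average rate
    Returns None when the rates are empty or the average is not positive.
    """
    if not download_rates:
        return None
    avg = mean(download_rates)
    peak = max(download_rates)
    if avg <= 0 or peak <= 0:
        return None
    util = avg / peak
    norm_sdev = pstdev(download_rates) / avg  # population sdev (assumption)
    return util, norm_sdev

# Hypothetical per-interval download rates in KB/s.
rates = [100.0, 200.0, 300.0, 400.0]
util, norm_sdev = performance_metrics(rates)
```

Both values are dimensionless, which is what makes peers with very different access links comparable.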

Similar distribution across 3 different torrents

Utilization has an almost uniform distribution
◦ Nearly fixed probability density

90% show closely uniform distribution

Diverse performance, no dominant modes


Characterization Results/Peer-Properties


Content availability
◦ 75% of peers in RH observe an average content availability of 50%
◦ No content shortage

Avg. population
◦ Very different
◦ Flash crowd in RH

Characterization Results/Peer-view of group properties

[Figure annotation: initial flash crowd]


Underlying factors

Remember the second question
◦ What are the peer- or group-level properties that primarily determine the performance observed by individual peers in a torrent?

Performance metrics:
◦ Utilization and stability

Possible underlying factors:
◦ Group-level properties: population, churn, content availability
◦ Peer-level properties: upload rate, etc.

Approach to identify underlying factors
◦ Scatter plot
◦ Linear regression (using S-Plus)
◦ Spearman's rank correlation (S-Plus)


Utilization vs. average group content availability
◦ No obvious correlation

Utilization vs. average group population
◦ Vertical patterns
◦ No obvious correlation

Statistical Analysis/Scatter-plots


Model      R-Square  outbw.50p  avg.grp.pop  avg.grp.cont.avail  avg.grp.churn
util       0.0651    0.0091     -0.1206      0.3493              0.0015
util-log   0.0603    0.0965     -0.0311      0.4367              0
util-step  0.0603    0.0965     -0.0309      0.4358              removed
sdev       0.0709    -0.0142    0.2245       -0.3344             -0.0029
sdev-log   0.0741    -0.1585    0.0778       -0.6486             -0.0005
sdev-step  0.0741    -0.1585    0.0778       -0.6486             -0.0005

Suggested techniques result in marginal improvement (R-squared)

No single parameter with dominant effect

Seed percentage was removed by step(), which suggests that the number of seeds is sufficient

Statistical Analysis/Linear Regression

Several values to consider:

R-squared measures goodness of fit, in the range [0, 1]

P-value: the probability of obtaining a result as impressive just by chance
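To make the R-squared value concrete, here is a minimal Python sketch for a least-squares fit with a single predictor. The talk used S-Plus with several predictors; the single-predictor form, the function name `r_squared`, and the sample data are simplifying assumptions for illustration only.

```python
def r_squared(x, y):
    """R-squared of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical data: a noisy upward trend.
fit_quality = r_squared([1, 2, 3, 4], [1, 3, 2, 4])
```

A value near 1 means the line explains most of the variance in `y`; values near 0, like those in the regression table, mean the predictors explain very little of it.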


Torrent  Perf.      up.dev  Avg.Pop  Avg.Cont  Avg.Churn
RH       inbw.util  -0.46   -0.13    0.05      -0.12
RH       inbw.sdev  0.49    0.2      -0.03     0.19
DE       inbw.util  -0.42   -0.02    0.1       -0.02
DE       inbw.sdev  0.47    0.03     -0.1      0
GA       inbw.util  -0.36   -0.05    0.04      -0.05
GA       inbw.sdev  0.47    0.14     -0.11     0.14

Highest correlation with deviation of upload rate for all torrents -> Tit-for-tat effect

Two perf. metrics are similarly affected with opposite signs

GA: Little correlation with util. -> unreliable metric

DE: Slightly larger effect from content avail.
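Spearman's rank correlation is the Pearson correlation of the ranks of two variables. The Python sketch below shows the computation under that definition; the talk used S-Plus, so the function names (`ranks`, `pearson`, `spearman`) and the sample values are hypothetical and are not the data behind the table.

```python
from math import sqrt

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = shared_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical peers: larger upload-rate deviation, lower utilization.
upload_dev = [1, 2, 3, 4, 5]
utilization = [0.9, 0.7, 0.8, 0.3, 0.1]
rho = spearman(upload_dev, utilization)
```

Because only ranks matter, the measure captures any monotonic relationship, not just a linear one, which suits skewed quantities such as rates.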

Statistical Analysis/Spearman’s Rank correlation


Conclusions
◦ No single factor determines the performance observed by peers
◦ Outgoing bandwidth seems to have the largest effect
  Tit-for-tat is working
◦ There often appears to be a sufficient number of seeds available (non-factor on performance)
◦ Capturing comparable performance is hard
◦ Performance of the peers in a torrent is rather diverse
  Instrumented clients cannot reflect a representative picture.

Future work
◦ Active monitoring of BitTorrent
◦ BitTorrent overlay topology using the peer exchange feature
◦ Characterizing new features: DHT, super-seeding, peer exchange

Thank you!

