Amir Rasti, Reza Rejaie
Dept. of Computer Science, University of Oregon


Peer-to-peer systems have become increasingly popular
◦ Millions of simultaneous users
◦ Significant percentage of Internet traffic

BitTorrent is one of the most popular p2p applications
◦ Responsible for 35% of all Internet traffic [Parker05]

BitTorrent is important because of
◦ Its popularity
◦ Its impact on the network

Introduction

Scalable one-to-many peer-to-peer file distribution
Overlay: unstructured, random, high degree
Swarming
◦ File is divided into segments
◦ Segments are randomly distributed among peers; peers get the rarest segment first
Contribution
◦ Peers exchange segments and contribute their outgoing bandwidth
◦ Incentive: tit-for-tat
Tracker
◦ Torrent coordinator
◦ Periodic peer status updates
Performance: intuitively depends on
◦ Peer properties (bandwidth, contribution, etc.)
◦ Group properties (population, content availability, churn)
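The "rarest segment first" rule in the swarming description can be illustrated with a minimal sketch. The function name `pick_rarest_segment`, the set-based data layout, and the random tie-breaking are assumptions made for illustration; they are not taken from the slides or from any particular BitTorrent client.

```python
import random
from collections import Counter


def pick_rarest_segment(have, neighbor_segments, rng=random):
    """Pick a missing segment held by the fewest neighbors (hypothetical helper).

    have: set of segment indices this peer already holds.
    neighbor_segments: dict mapping neighbor id -> set of segment indices it holds.
    Returns a segment index, or None if no neighbor offers a missing segment.
    """
    # Count how many neighbors hold each segment.
    counts = Counter()
    for segments in neighbor_segments.values():
        counts.update(segments)

    # Only segments we still lack are candidates.
    candidates = [seg for seg in counts if seg not in have]
    if not candidates:
        return None

    # Keep the least-replicated candidates and break ties at random (assumed policy).
    rarest = min(counts[seg] for seg in candidates)
    return rng.choice(sorted(seg for seg in candidates if counts[seg] == rarest))


neighbors = {"A": {0, 1, 2}, "B": {1, 2}, "C": {2, 3}}
choice = pick_rarest_segment(have={2}, neighbor_segments=neighbors)
print(choice)  # segment 0 or 3, each held by a single neighbor
```

Requesting the least-replicated segment first tends to even out replication across the swarm, which is presumably why content availability matters as a group property later in the talk.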

Related work

1. Modeling and analytical studies
2. Simulation studies
3. Empirical studies
◦ Capture BitTorrent system properties in operation through measurement (instrumented clients) [Legout06]
◦ Group properties [Izal04]: population, average content availability, ...
◦ No explicit notion of performance
◦ No study on the effects of underlying factors of peer performance

Characterization:
◦ Understanding group-level and peer-level properties in a torrent
Analysis:
◦ What are the main factors that affect the performance observed by individual peers?

Methodology/Approach

Common approach: instrumented clients
◦ Detailed and flexible
◦ Representative?
Our approach: tracker logs
◦ Coarse granularity (30 min)
◦ Global view

[Figure: tracker and its tracker log file]

Data sets

Tracker log sets
Source   #Torrents  Start date  End date  #Reports  #Sessions
RedHat   1          3/03        8/03      2M        170k
Debian   1599       2/05        3/05      32M       1268k
Games    2585       8/03        12/04     38M       4416k

Selected torrents
Torrent  File size  #Sessions (rank)  Duration
RedHat   1.8GB      170k (3rd)        146d
Debian   677MB      139k (6th)        51d
Games    363MB      195k (2nd)        66d

Methodology

Session:
◦ Set of all updates from a particular peer, from its arrival until its departure
Peer-level properties:
◦ Represent the peer’s status during a session
◦ Average download rate
◦ Average upload rate

[Figure: session timeline from session start to download complete, with the studied zone (leeching) marked; the slopes between updates give the upload and download rates, and the average download rate is shown.]
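A minimal sketch of how the average rates over the leeching phase could be derived from one peer's session. The `Update` record and its field names are assumptions for illustration (the slides do not show the log format); they mirror the cumulative `downloaded`, `uploaded` and `left` counters that peers report to a tracker.

```python
from dataclasses import dataclass


@dataclass
class Update:
    """One tracker report from a peer (hypothetical record layout)."""
    t: float          # report time in seconds
    downloaded: int   # cumulative bytes downloaded
    uploaded: int     # cumulative bytes uploaded
    left: int         # bytes still missing; 0 once the download is complete


def leeching_rates(updates):
    """Return (avg download rate, avg upload rate) in bytes/s over the leeching phase.

    The leeching phase runs from the first report to the first report with left == 0.
    Returns None if the phase has fewer than two reports or zero duration.
    """
    ordered = sorted(updates, key=lambda u: u.t)
    leech = []
    for u in ordered:
        leech.append(u)
        if u.left == 0:
            break  # download complete: the peer is a seed from here on
    if len(leech) < 2:
        return None
    first, last = leech[0], leech[-1]
    duration = last.t - first.t
    if duration <= 0:
        return None
    return ((last.downloaded - first.downloaded) / duration,
            (last.uploaded - first.uploaded) / duration)


session = [
    Update(0, 0, 0, 3600),
    Update(1800, 1800, 900, 1800),
    Update(3600, 3600, 1800, 0),    # download completes here
    Update(5400, 3600, 5400, 0),    # seeding, outside the studied zone
]
print(leeching_rates(session))  # (1.0, 0.5)
```

Because the tracker reports are coarse (roughly one every 30 minutes in these logs), the end of the leeching phase can only be located to the nearest report.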

Measurement methodology

Group-level properties: population, average content availability, churn
Sampling approach:
◦ Sample once every τ minutes
◦ Use the last update before and the first update after each sample
◦ Interpolation
◦ Averaging across peers
τ determines the sampling resolution
τ > average update interval
Peer view:
◦ Average of the samples during the peer’s download time

[Figure: update times along a timeline, sampled every τ]
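The sampling approach can be sketched as follows. This is an illustration under stated assumptions: the slides only say "interpolation", so linear interpolation is assumed; each peer's updates are assumed to be `(time, fraction_complete)` pairs; and the names `interpolate` and `sample_group` are hypothetical.

```python
from bisect import bisect_right


def interpolate(updates, t):
    """Linearly interpolate a peer's value at time t from its bracketing updates.

    updates: list of (time, value) pairs sorted by time.
    Returns None when t falls outside the peer's session.
    """
    times = [time for time, _ in updates]
    i = bisect_right(times, t)  # updates[i - 1] is the last update at or before t
    if i == 0:
        return None  # peer has not arrived yet
    t0, v0 = updates[i - 1]
    if t0 == t:
        return v0  # a report landed exactly on the sample time
    if i == len(updates):
        return None  # peer has already departed
    t1, v1 = updates[i]  # first update after t
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def sample_group(peers, tau, t_end):
    """Return (time, population, average content availability) every tau minutes."""
    samples = []
    t = 0
    while t <= t_end:
        values = [v for v in (interpolate(u, t) for u in peers.values())
                  if v is not None]
        avg = sum(values) / len(values) if values else None
        samples.append((t, len(values), avg))
        t += tau
    return samples


peers = {
    "p1": [(0, 0.0), (20, 0.5), (40, 1.0)],
    "p2": [(20, 0.5), (40, 0.5), (60, 0.5)],
}
samples = sample_group(peers, tau=30, t_end=60)
print(samples)  # [(0, 1, 0.0), (30, 2, 0.625), (60, 1, 0.5)]
```

The peer view described above would then average only those samples whose time falls within a given peer's download period.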

Methodology

Is download rate a good performance metric?
◦ A reference is needed to evaluate a peer’s download rate
◦ Ideally, peer performance is:

\[
\text{Utilization} = \frac{\text{Avg\_Download\_rate}}{\text{In\_Bandwidth}}
\]

◦ Accurate measurement of utilization is difficult
We use the maximum observed download rate as a (lower bound) estimate of the incoming bandwidth.
The standard deviation of the download rate captures the stability of the download rate
◦ Rates close to avg. -> higher performance
◦ Normalization -> comparability
Two performance metrics:

\[
\text{Util} = \frac{\text{Avg\_Download\_Rate}}{\text{Max\_Observed\_Download\_Rate}}
\]

\[
\text{Norm\_sdev\_Download\_rate} = \frac{\text{Sdev}}{\text{Avg\_Download\_Rate}}
\]
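As a small worked example, the two metrics (average download rate divided by the maximum observed download rate, and the standard deviation of the download rate divided by its average) can be computed from a peer's per-interval download rates. The function name is hypothetical, and the population standard deviation (`pstdev`) is an assumption, since the slides do not say which estimator was used.

```python
from statistics import mean, pstdev


def performance_metrics(rates):
    """Return (utilization, normalized sdev) for a list of download rates.

    utilization     = average rate / maximum observed rate
    normalized sdev = standard deviation of the rates / average rate
    """
    if not rates:
        raise ValueError("at least one rate is required")
    avg = mean(rates)
    if avg <= 0:
        raise ValueError("average download rate must be positive")
    util = avg / max(rates)
    norm_sdev = pstdev(rates) / avg  # population sdev: an assumed choice
    return util, norm_sdev


# Per-interval download rates of one peer, in arbitrary units.
util, norm_sdev = performance_metrics([2, 4, 4, 4, 5, 5, 7, 9])
print(round(util, 3), round(norm_sdev, 3))  # 0.556 0.4
```

Since the maximum observed rate is only a lower bound on the incoming bandwidth, this utilization value is an upper bound on the true utilization.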

Characterization Results/Peer-Properties

Similar distribution across 3 different torrents
Utilization has an almost uniform distribution
◦ Nearly fixed probability density
90% show closely uniform distribution
Diverse performance
No dominant modes

Characterization Results/Peer-view of group properties

Content availability
◦ 75% of peers in RH (RedHat) observe an average content availability of 50%
◦ No content shortage
Avg. population
◦ Very different across torrents
◦ Flash crowd in RH

[Figure label: initial flash crowd]

Underlying factors

Recall the second question:
◦ What are the peer- or group-level properties that primarily determine the performance observed by individual peers in a torrent?
Performance metrics:
◦ Utilization and stability
Possible underlying factors:
◦ Group-level properties: population, churn, content availability
◦ Peer-level properties: upload rate, etc.
Approach to identify underlying factors:
◦ Scatter plots
◦ Linear regression (using S-PLUS)
◦ Spearman’s rank correlation (S-PLUS)

Statistical Analysis/Scatter-plots

Utilization vs. average group content availability
◦ No obvious correlation
Utilization vs. average group population
◦ Vertical patterns
◦ No obvious correlation

Statistical Analysis/Linear Regression

Several values to consider:
◦ R-squared measures goodness of fit, in the range [0, 1]
◦ P-value: the “probability of obtaining a result as impressive” just by chance

Model      R-Square  outbw.50p  avg.grp.pop  avg.grp.cont.avail  avg.grp.churn
util       0.0651    0.0091     -0.1206      0.3493              0.0015
util-log   0.0603    0.0965     -0.0311      0.4367              0
util-step  0.0603    0.0965     -0.0309      0.4358              removed
sdev       0.0709    -0.0142    0.2245       -0.3344             -0.0029
sdev-log   0.0741    -0.1585    0.0778       -0.6486             -0.0005
sdev-step  0.0741    -0.1585    0.0778       -0.6486             -0.0005

Suggested techniques result in only marginal improvement (R-squared)
No single parameter has a dominant effect
Seed percentage was removed by step(), which suggests that the number of seeds is sufficient
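To make the R-squared value concrete, here is a minimal sketch of an ordinary least-squares fit with a single predictor and its R-squared. It is a simplification for illustration only: the study fitted several factors at once in S-PLUS, whereas this hypothetical `fit_line` helper handles one factor and uses made-up data.

```python
def fit_line(x, y):
    """Fit y = a + b*x by least squares; return (a, b, r_squared)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x must not be constant")
    b = sxy / sxx
    a = y_mean - b * x_mean
    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y must not be constant")
    return a, b, 1 - ss_res / ss_tot


# Made-up data: one factor (x) against one performance metric (y).
a, b, r2 = fit_line([1, 2, 3, 4], [1, 3, 2, 4])
print(round(a, 2), round(b, 2), round(r2, 2))  # 0.5 0.8 0.64
```

An R-squared of 0.64 means the fitted line accounts for 64% of the variance in y; the values around 0.06 to 0.07 in the table therefore indicate that the fitted models explain very little of the variation in peer performance.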

Statistical Analysis/Spearman’s Rank correlation

Torrent  Perf.      up.dev  Avg.Pop  Avg.Cont  Avg.Churn
RH       inbw.util  -0.46   -0.13    0.05      -0.12
RH       inbw.sdev  0.49    0.2      -0.03     0.19
DE       inbw.util  -0.42   -0.02    0.1       -0.02
DE       inbw.sdev  0.47    0.03     -0.1      0
GA       inbw.util  -0.36   -0.05    0.04      -0.05
GA       inbw.sdev  0.47    0.14     -0.11     0.14

Highest correlation with deviation of upload rate for all torrents -> tit-for-tat effect
The two performance metrics are similarly affected, with opposite signs
GA: little correlation with util. -> unreliable metric
DE: slightly larger effect from content availability
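Spearman's rank correlation is the Pearson correlation computed on the ranks of the two variables, which makes it sensitive to any monotonic relationship rather than only a linear one. The sketch below is a plain-Python illustration with tied values given their average rank; it is not the S-PLUS routine used in the study, and the function names and data are hypothetical.

```python
def average_ranks(values):
    """Return 1-based ranks of values, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank vectors of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length, at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    if vx == 0 or vy == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (vx * vy) ** 0.5


# Made-up data: a monotonic but non-linear relationship still gives rho = 1.
print(spearman([1, 2, 3, 4, 5], [1, 4, 9, 16, 25]))  # 1.0
```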

Conclusions
◦ No single factor determines the performance observed by peers
◦ Outgoing bandwidth seems to have the largest effect
  ◦ Tit-for-tat is working
◦ There often appears to be a sufficient number of seeds available (a non-factor for performance)
◦ Capturing comparable performance is hard
◦ Performance of the peers in a torrent is rather diverse
  ◦ Instrumented clients cannot reflect a representative picture
Future work
◦ Active monitoring of BitTorrent
◦ BitTorrent overlay topology using the peer exchange feature
◦ Characterizing new features: DHT, super-seeding, peer exchange

Thank you!