BitTorrent
BitTorrent network
On the itinerary
Introduction to BitTorrent
  Basics & properties
3 interesting analysis results
Publishing
How to publish a (usually large) file?
Dedicated server:
  Easy to manage, easy to find, persistent service
  Nevertheless…
BitTorrent:
  Organizes multiple clients that share the same file
  Leverages the upload bandwidth of the participants
  Self-scaling, resilient, operates well in “flash crowd” periods
  Takes 50%-60% of all p2p traffic
The Basics of BitTorrent
Content provider (everyday people) wants to publish a file (the initial seed):
  Creates a meta file (*.torrent)
  Publishes this (light) *.torrent file on a web server
  The file is broken into small blocks (32-256 KB)
  Uploads blocks to other peers
  The goal: to publish the file to many nodes with the help of other peers, with minimum load on the seeder
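A minimal sketch of the publishing step, assuming the meta file records one SHA-1 digest per block. The function names, the dictionary layout and the tracker URL are hypothetical; a real *.torrent file is bencoded and has its own field names.

```python
import hashlib


def split_into_blocks(data: bytes, block_size: int = 256 * 1024) -> list[bytes]:
    """Cut the file content into fixed-size blocks; the last one may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def build_meta(name: str, data: bytes, tracker_url: str, block_size: int) -> dict:
    """Build a plain-dict stand-in for the (light) meta file.

    Assumption: one SHA-1 digest per block lets a peer verify each block
    independently of who sent it.
    """
    blocks = split_into_blocks(data, block_size)
    return {
        "name": name,
        "length": len(data),
        "tracker": tracker_url,
        "block_size": block_size,
        "block_hashes": [hashlib.sha1(block).hexdigest() for block in blocks],
    }
```

The meta file stays small because it holds only digests, not content: a 600 KB file with 256 KB blocks yields three digests.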
The Basics of BitTorrent
Third party (the tracker):
  A tracker site keeps track of the active participants + extra statistics
  Upon requests from nodes, supplies a random subset of active nodes
  Receives updates from the active nodes
  Keeps track of new nodes joining the ‘torrent’ (or ‘swarm’) and of nodes that left it
The Basics of BitTorrent
Peer (leecher) who is interested: Obtains the public *.torrent file Being directed to the tracker Obtains a list of random neighbors (~40) Downloads and uploads blocks to its ‘best’
neighbors (choking and unchoking) Upon download completion, becomes a seed
BitTorrent demo (Wikipedia)
BitTorrent basic schemes
(Immediate) problems arise:
  Last block problem: assuming nodes depart upon completion, how would leechers obtain the last block?
  Free riding problem: peers are willing to download, but unwilling to serve others
Simple and effective solutions:
  Last block problem: nodes employ a Local Rarest First policy
  Free riding problem: nodes employ a “Tit For Tat” policy, i.e. give more to those from whom you receive more
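Both policies can be sketched in a few lines. The function names and the default of 4 upload slots are assumptions for illustration. Real clients add details omitted here, such as optimistic unchoking.

```python
from collections import Counter
from typing import Optional


def pick_rarest_block(my_blocks: set[int],
                      neighbor_blocks: dict[str, set[int]]) -> Optional[int]:
    """Local Rarest First: among blocks we still miss, pick the one held by
    the fewest neighbors (ties broken by lowest block index)."""
    counts: Counter = Counter()
    for blocks in neighbor_blocks.values():
        counts.update(blocks - my_blocks)
    if not counts:
        return None  # nothing useful is on offer
    return min(counts, key=lambda block: (counts[block], block))


def choose_unchoked(download_rates: dict[str, float], slots: int = 4) -> list[str]:
    """Tit For Tat: upload to the neighbors that currently give us the most."""
    ranked = sorted(download_rates, key=lambda peer: (-download_rates[peer], peer))
    return ranked[:slots]
```

Fetching rare blocks first keeps every block replicated in the swarm, so a departing peer is less likely to take the only copy of a block with it.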
On the itinerary
One interesting limitation of BitTorrent networks
  BitTorrent provides poor service availability, shown via analysis of tracker logs over a long period
Proposition of a peer selection approach
  Lowers the costs and resource usage of ISPs
  Does not require cooperation between ISPs and peers
  Implemented as part of the Azureus client, codename ‘Ono’
Tradeoffs
  Performance (avg. download time) vs. fairness (avg. share ratio)
BitTorrent limitations
Taken from the paper “Measurements, Analysis and Modeling of BitTorrent-like systems” [1]
  Inspects overall performance over the lifetime of a torrent
  Analysis is based on traces
  A model is derived and used to draw conclusions
  Verifies that the derived conclusions match the observed behavior
Limitations were found:
  Poor service availability (coming next)
  Fluctuating download performance
  Unfair service to peers
Service availability - Analysis
Analysis is based on traces
  Tracker logs (~1500 torrents, sampled every 30 sec)
  Traces from servers that publish *.torrent files
Extracted data
  Peer identities
  Birth time of the torrent, size
  For each peer in the same torrent: arrival time, download & upload bandwidth, cumulative downloaded & uploaded bytes
Service availability - Analysis
Y-axis at time t: the total number of requests for all torrents in the trace minus the cumulative number of requests that all torrents have received by time t after their birth.
Similar observations are seen in the *.torrent metadata traces
Service availability - Analysis
This suggests that the rate of requests decreases exponentially from the time a torrent is born
Notice that this is a cumulative measure
Does a specific torrent behave the same?
  Use the least-squares method to measure how much a specific torrent deviates from this logarithmic fitting
  Analyze the distribution of the average relative deviation (which is mostly small: on average 6%)
Service availability – A model
Based on the observations, define the torrent popularity at time t as the peer arrival rate, which is the derivative of the peer arrival time distribution for that torrent.
Arrival rate: $\lambda(t) = \lambda_0 e^{-t/\tau}$
  Where $\lambda_0$ is the initial arrival rate when the torrent starts
  And $\tau$ is the attenuation parameter
  Both are evaluated from the observations
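A sketch of how the two parameters could be estimated by linear regression, assuming the exponentially decreasing arrival rate has the form λ(t) = λ0·e^(−t/τ) (the symbol names are assumptions). Under that form, the number of requests still to come after time t is λ0·τ·e^(−t/τ), so its logarithm is a straight line in t with slope −1/τ. The function name and the synthetic inputs are hypothetical; this is not the paper's own fitting code.

```python
import math


def fit_exponential_decay(times: list[float], remaining: list[float]) -> tuple[float, float]:
    """Least-squares line through (t, ln remaining); returns (tau, lambda0).

    Model assumed: remaining(t) = lambda0 * tau * exp(-t / tau).
    """
    logs = [math.log(y) for y in remaining]
    n = len(times)
    mean_t = sum(times) / n
    mean_log = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times)
    sxy = sum((t - mean_t) * (v - mean_log) for t, v in zip(times, logs))
    slope = sxy / sxx
    intercept = mean_log - slope * mean_t
    tau = -1.0 / slope
    lambda0 = math.exp(intercept) / tau
    return tau, lambda0
```

Fitting in log space reduces the exponential fit to ordinary linear regression, which matches the slides' mention of a logarithmic fitting.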
Service availability – A model
Define the torrent lifespan: the duration from the birth to the time at which no complete copy remains (thus new leechers would not be able to complete the download)
The inter-arrival time between two successively arriving peers can be approximated as $1/\lambda(t)$, the reciprocal of the peer arrival rate $\lambda(t)$ at time t
Assume seeds leave the system at rate $\gamma$; then the average service time for a seed is approximately $1/\gamma$
Service availability – A model
Look at consecutive peers that join the torrent
Peers n and (n+1) join the torrent at times t(n) and t(n+1) respectively
The inter-arrival time between them is approximately $t(n+1) - t(n) \approx 1/\lambda(t(n))$, where $\lambda(t)$ is the peer arrival rate at time t
Peer n downloads the file with speed u(n) and stays in the torrent for time duration
When peer arrival rate is small enough (n is large), peer n+1, with speed u(n+1) <= u(n) could only be served by peer n
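Service availability – A model
Thus, when the inter-arrival time exceeds the average seed service time, $1/\lambda(t) > 1/\gamma$ (with $\lambda(t)$ the peer arrival rate and $\gamma$ the seed departure rate), peer n+1 can’t complete the download and the torrent is dead
Using the definition of the arrival rate, $\lambda(t) = \lambda_0 e^{-t/\tau}$, we get the torrent lifespan: $T_{life} = \tau \ln(\lambda_0 / \gamma)$
Both $\lambda_0$ and $\tau$ are extracted from the trace (using linear regression), as well as $\gamma$
Compare results from the model to what the trace holds
Results match very well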
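The lifespan argument can be checked numerically. The sketch below assumes an arrival rate λ(t) = λ0·e^(−t/τ) and a seed departure rate γ (symbol names are assumptions), and declares the torrent dead once λ(t) drops below γ, i.e. once peers arrive less often than seeds leave. The parameter values in the usage note are made up and are not taken from the trace.

```python
import math


def arrival_rate(t: float, lambda0: float, tau: float) -> float:
    """Exponentially decreasing peer arrival rate (assumed form)."""
    return lambda0 * math.exp(-t / tau)


def torrent_lifespan(lambda0: float, tau: float, gamma: float) -> float:
    """Time at which the arrival rate falls to the seed departure rate.

    Solves lambda0 * exp(-t / tau) = gamma for t.
    """
    if lambda0 <= gamma:
        return 0.0  # seeds already leave faster than peers arrive
    return tau * math.log(lambda0 / gamma)
```

For example, with the hypothetical values λ0 = 100, τ = 2 and γ = 1, the lifespan is 2·ln(100), and the arrival rate at that moment equals γ.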
Model vs. Observations
Comparison of torrent lifespans: the average lifespan is 8.89 according to the trace and 8.34 based on the model
Service availability - Summary
Conclusions were obtained relying on extensive trace analysis and modeling
Existing BitTorrent systems provide poor service availability
This is due to the exponentially decreasing peer arrival rate
This provides strong motivation for inter-torrent collaboration
Next on the itinerary
One interesting limitation of BitTorrent networks
  BitTorrent provides poor service availability, shown via analysis of tracker logs over a long period
Proposition of a peer selection approach
  Lowers the costs and resource usage of ISPs
  Does not require cooperation between ISPs and peers
  Implemented as part of the Azureus client, codename ‘Ono’
Tradeoffs
  Performance (avg. download time) vs. fairness (avg. share ratio)
Reducing cross-ISP traffic
Taken from the paper “Taming the torrent” [2]
Motivation: the overwhelming popularity of p2p (70% of internet traffic worldwide) yielded significant revenues for ISPs.
However, p2p traffic has significantly increased ISPs’ costs, particularly in terms of cross-ISP traffic
This has driven ISPs to try to forcefully reduce p2p traffic
  Blocking specific ports, tricking clients into closing connections
  Deep packet inspection
  Caching
Reducing cross-ISP traffic
One approach to alleviate this pain is to use an Oracle that provides knowledge about which peers are in the same ISP
  This would benefit both ISPs and the p2p community
  But this requires p2p users and ISPs to collaborate and to trust each other
  Not likely to be adopted
Reducing cross-ISP traffic
Another approach is to recycle data that is already being collected by Content Distribution Networks (CDNs)
  CDNs attempt to improve web performance by redirecting requests to replica servers
  The goal is to help content providers (e.g. CNN) distribute content by redirecting requests to replica servers that:
    Are topologically proximate
    Provide lower latency
CDNs as oracles
Hypothesis: when peers exhibit similar redirection behavior, they are likely to be close to the replica server, and thus to each other
  Represent redirection behavior using ratio-maps
  Each ratio represents the frequency of redirection to a specific replica
  The number of replicas is usually small (max 31)
  Keep a time window (~ a day)
CDNs redirections as ratio-maps
The ratio-map of a peer a is a set of (replica server, ratio) pairs
Specifically, if peer a is redirected toward replica server r1 75% of the time window, and toward replica server r2 25% of the time window, then the corresponding ratio-map is $\mu_a = \{(r_1, 0.75), (r_2, 0.25)\}$
The sum of all ratios in a given ratio-map equals one
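A minimal sketch of building a ratio-map from observed redirections. It approximates "fraction of the time window" by "fraction of lookups within the window", which is an assumption, and the function name is hypothetical.

```python
from collections import Counter


def ratio_map(redirections: list[str]) -> dict[str, float]:
    """Map each replica server to the share of redirections that went to it.

    `redirections` holds one replica name per lookup made during the window.
    """
    total = len(redirections)
    if total == 0:
        return {}
    counts = Counter(redirections)
    return {replica: count / total for replica, count in counts.items()}
```

Three lookups answered with r1 and one with r2 reproduce the 75% / 25% example above, and the ratios sum to one by construction.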
Similarity via ratio-maps
Define a metric that, given two peers, produces a value describing the similarity between the peers’ redirection behavior
  We are looking for overlap in the redirection frequency maps of each pair of peers
  Use cosine-similarity between two peers a and b
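Cosine-similarity
Distance(a,b): $\mathrm{cos\_sim}(a,b) = \frac{\sum_{i \in I_a} \nu_{a,i}\,\nu_{b,i}}{\sqrt{\sum_{i \in I_a} \nu_{a,i}^2 \cdot \sum_{i \in I_b} \nu_{b,i}^2}}$
  Sums are over the set $I_a$ ($I_b$) of replica servers to which peer a (b) has been redirected over the time window
  $\nu_{a,i}$ is the ratio of time that peer a has been redirected to replica server i (and $\nu_{b,i} = 0$ if b was never redirected to i)
Cosine-similarity is analogous to a normalized dot product
  When maps are identical, it equals 1
  When maps are orthogonal (no common replica), it equals 0
  Values lie in [0,1]
  Determine a threshold (0.15)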
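Cosine similarity between two ratio-maps can be sketched as follows, with each map represented as a dictionary from replica name to ratio. The function names are hypothetical, and treating "strictly above the threshold" as the acceptance rule is an assumption.

```python
import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Normalized dot product of two ratio-maps; missing replicas count as 0."""
    dot = sum(ratio * b.get(replica, 0.0) for replica, ratio in a.items())
    norm_a = math.sqrt(sum(r * r for r in a.values()))
    norm_b = math.sqrt(sum(r * r for r in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty map is similar to nothing
    return dot / (norm_a * norm_b)


def is_nearby(a: dict[str, float], b: dict[str, float], threshold: float = 0.15) -> bool:
    """Recommend the peer when similarity exceeds the threshold."""
    return cosine_similarity(a, b) > threshold
```

Because ratios are non-negative, the result always falls in [0,1], which is what makes a single fixed threshold usable.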
CDNs implementation
Ono, an extension to the Azureus client
  Upon handshake, two peers exchange ratio-maps
  This enables Ono to perform biased peer selection
  Performs a DNS lookup for each CDN name to determine redirection behavior and encodes it in ratio-maps
  Periodically updates the ratio-maps
Overhead is extremely small
  18KB upstream, 36KB downstream per day
  Computation of cosine-similarity is easy
Ono-recommended empirical results
Over 120,000 peers use Ono
Ono collects extra network data
  Ping and traceroute to replica servers and peers
Obtain feedback on the biased peer selection
  Not easy to determine cross-ISP hops
  Counting IP hops is easy and gives some measure
Compare Ono-recommended peer selection to random peer selection
Ono-recommended empirical results
Cumulative distribution function of the number of IP hops taken along paths between an Ono client and its peers.
Each value represents the average number of hops for all peers seen by a particular Ono client during a 6-hour interval
Ono finds shorter paths
  The median is less than half
  More than 20% are only one hop away, vs. less than 2%
Ono-recommended empirical results
Each IP address was mapped to its corresponding Autonomous System (AS) id
Similar to the previous graph
  Over 33% of paths found by Ono do not leave the origin AS, vs. less than 10% in the random case
  Median AS hops is one
CDNs as oracles - Summary
Recycling network views collected by CDNs
Good internet citizenship in terms of reducing cross-ISP traffic
Performance of peers is not affected
Scalable (the more clients adopt it, the more accurate the bias would get)
Available easily and freely
Last on the itinerary
One interesting limitation of BitTorrent networks
  BitTorrent provides poor service availability, shown via analysis of tracker logs over a long period
Proposition of a peer selection approach
  Lowers the costs and resource usage of ISPs
  Does not require cooperation between ISPs and peers
  Implemented as part of the Azureus client, codename ‘Ono’
Tradeoffs
  Performance (avg. download time) vs. fairness (avg. share ratio)
BitTorrent Tradeoffs
Taken from the paper “The delicate tradeoffs in BitTorrent-like file sharing protocol design” [3]
Peers that participate in BT are heterogeneous with regard to download and upload capacities
Taking a system approach
  The system throughput depends critically on the “fat” peers
  However, this might result in unfairness towards those who contribute more
  This in turn would encourage peers to supply a low upload rate to others
BitTorrent Tradeoffs
A user would want the download rate it gets to be proportional to the upload rate it supplies
Assuming peers take the system overview
  Long-lasting, steady-state, rational
Two parameters and their interrelations are explored
  Performance: minimum average download time
  Fairness: ratio between give and take
Model
W.L.O.G. the file is of size 1
Assume an average peer arrival rate of $\lambda$
Assume peers do not abort
Upon completion, a peer leaves the torrent (BT provides no incentive for seeding)
Assume n classes (types) of peers
  For each new peer arrival, with probability $p_i$ it belongs to type i
  Thus, the average arrival rate for class i is $\lambda_i = p_i \lambda$
  Class i has Ui and Di as upload and download capacities
  Assume U1>U2>…>Un (type 1 are the “fat” ones…)
Model
Visualization of the model for n=2
Model – measuring performance
Assume (quite naturally) that the bottleneck is the upload capacity
  i.e. no network bottleneck such as server saturation
The file uploading capacity of the entire system is
Consider the steady state:
  Define as the average number of type i peers
  Approximation:
  Substitute the latter in the former, in s.s.:
Model – measuring performance
In steady state, the capacity should be equal to the arrival rate (as the file size is 1)
  Obtain
  Define “share-ratio” and rewrite as
In a steady and balanced system, the share-ratio should be 1
Average system download time
Those two equations define the solution space, as well as the resulting performance
Feasible solutions are the set
Model – measuring fairness
Consider the share-ratio as a good and natural measure of fairness
Define a fairness index:
  Measures how equal the ratios are (if all are the same, it equals 1)
  (After some work) obtain:
  Also expressed in terms of the upload and download of each class
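The exact fairness index used in [3] is not recoverable from these slides. As an illustration only, the sketch below uses Jain's fairness index, a well-known index with the stated property (it equals 1 when all ratios are the same). Whether the paper uses this form, or a variant weighted by class, is an assumption.

```python
def jain_fairness_index(ratios: list[float]) -> float:
    """Jain's index: (sum x)^2 / (n * sum x^2); 1.0 means perfectly equal."""
    if not ratios:
        raise ValueError("need at least one share-ratio")
    total = sum(ratios)
    squares = sum(r * r for r in ratios)
    if squares == 0.0:
        return 1.0  # all ratios are zero, hence all equal
    return total * total / (len(ratios) * squares)
```

The index drops toward 1/n as the share-ratios become more unequal, so two classes with ratios 2 and 1 score below a system in which every class has ratio 1.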
Rate strategies
General assumption: all peers maximize their upload capacity
  Based on experiments
We want the optimal average download time T
  Solve the constrained problem
  Use the Lagrange multiplier method
Rate strategies
Optimal average download time T that is obtained is:
We get an assignment for
  The system gives the “thin” peers (other than type-1) maximum upload capacity
  “Thin” peers get more than they contribute
  Calculate the fairness index under this solution (shown to be quite low)
Rate strategies
Now, apply the same strategy to achieve optimal fairness
  Then check what the resulting performance measure is
  Optimal fairness is achieved when
  We get different assignments for
Compare the two:
  In terms of system performance, we have:
  In terms of system fairness, we have:
Entire design space
Actually those are only two of infinitely many solutions for assigning
The space lies on a curve
Example: a system of two types with specific capacities
Simulation results
Experiments with a two-type system (with the same characteristics as the last system)
  Average downloading time:
  Fairness:
Simulation results
Fundamental tradeoffs:
  Taken for the extreme values of the two strategies:
Summary:
  Cannot enjoy both heavens
  Current BitTorrent implementations lie somewhere on the curve
Think about …
With regard to the first paper (service availability), how was the ~8.3 average torrent lifespan deduced?
Thank you (those who are still awake…)
Papers
[1] Lei Guo, Songqing Chen, Zhen Xiao, Enhua Tan, Xiaoning Ding, and Xiaodong Zhang. Measurements, Analysis, and Modeling of BitTorrent-like Systems.
[2] David R. Choffnes and Fabián E. Bustamante. Taming the Torrent - A Practical Approach to Reducing Cross-ISP Traffic in Peer-to-Peer Systems.
[3] Bin Fan, Dah-Ming Chiu, and John C.S. Lui. The Delicate Tradeoffs in BitTorrent-like File Sharing Protocol Design.