Upload
martina-hunter
View
218
Download
0
Embed Size (px)
Citation preview
1
Understanding Pollution Dynamics in P2P File
Sharing
Uichin Lee, Min Choi*, Junghoo ChoM. Y. Sanadidi, Mario Gerla
UCLA, KAIST*
IPTPS’06
Elaine
2
Outline
Pollution in P2P file sharing system Related work User behavior study Analytic pollution model and its result The impact of pollution on P2P traffic load
3
Pollution in P2P file sharing system
Object in p2p file sharing system is composed of two parts Meta-data (song title,artist name, length, encodin
g scheme, etc.) Content
4
Pollution in P2P file sharing system
P2P file sharing system with search capability (e.g Emule, Gnutella, kazaa, etc.)
Query for song A
.
.
.
P2P network
Responses for song A Get return list of available peers
Select one source for download
5
Pollution in P2P file sharing system
Definition of Pollution A file’ s actual content does’ t match its meta-data
description!
MA
CA
Clean file of song A
MA
CB
Tampering withMeta-data
MA
CA’
Tampering withContent
6
Pollution in P2P file sharing system
File sharing system impact sales in music, video, games industry, television networks How pollution attack works.
Injecting a massive number of decoys into the peer-to-peer network, to reduce the availability of the targeted item
A Case 2003 Madonna’s new album
Pollution companies occurs Overpeer, Loudeye, Retspan
7
Related work1. [Content Availability, Pollution and Poisoning in File Sharing Peer to
Peer Networks] , ACM conference on Electronic commerce Understanding different pollution attack impact on content availabili
ty in peer-to-peer file sharing networks
2. [Denial of Service Resilience in peer to Peer File Sharing Systems] , SIGMETRICS
Made the first attempt to model the dynamics of P2P file pollution attacks
3. [Pollution in P2P File Sharing System], INFOCOM KaZaA is severely polluted Given that polluters have limited capabilities
(bandwidth/processing power), current level of pollution is too high
Not only does a user always detect the polluted file, but he also deletes it
8
Related work
This paper A more accurate model to model pollution
Polluted files indeed spread from user to user over the network
Pollution has a significant impact on P2P traffic load
9
User behavior study
Goal How does user behavior impact pollution spread?
Two stages of this study Questionnaire : familiarity / usage patterns Behavior observation : awareness / slackness
30 graduate students (UCLA, KAIST)
10
User behavior studyP2P Familiarity
1. Have you ever used P2P file sharing?
2. Do you frequently share files with P2P systems?
3. Do you know how to enable or disable sharing local files?
4. Do you know how popular P2P software works?
5. Do you know about multipart downloading or swarming?
Familiarity is high
11
User behavior study
P2P Usage Pattern
1) Preparation Stage
qualityavailability
57%20%20% file size
Downloaddecision
2) Download Stage
frequent size-dependent
41%35%20% check later
Checkingfrequency Re-download?
yes23%file size dep.57%
3) Post-download Stage
Pollution experience
yes70%
Failed in noticing pollution
yes30%
Sharing yes43.3%
12
User behavior study
Summary1. Even sophisticated P2P users sometimes fail to
recognize polluted files
2. Users do not check the quality of a downloaded file immediately after the completion of download
3. Not all users are cooperative in sharing downloaded files
4. Users make their download decisions primarily based quality of a file.
13
User Behavior Observation
Mainly interested in measuring the following two parameters Awareness probability
The fraction of users who recognize pollution in a downloaded file
Slackness distribution Distribution of intervals between download completion
time and quality checking time.
14
User Behavior Observation
Experiment Setup Seeding the server with genuine/polluted Mp3 files Modified P2P software to monitor user behavior Users are asked to use it and to download files After each downloaded topic , asking users about familiari
ty and if polluted or not Controlled downloading speed (50K - 1Mbps) One month
15
User Behavior Observation
Pollution techniques (on MP3 files) Meta-data modification : changed names Quality degradation Incomplete file : cut (30-60 seconds beg./end.) Noise insertion: every 20 seconds Shuffled content : randomly shuffled content
20 genuine songs 5*4 = 20
16
User Behavior Observation
User Awareness
0 0.2 0.4 0.6 0.8 1
Ashlee Simpson - ShadowAvril Lavigne - My Happy Ending
Frankie J - ObsessionHoobastank - The Reason
Akon - LonelyGreen Day - Boulevard of Broken Dreams
Kelly Clarkson - Since You've Been GoneMaroon5 - This Love
Black Eyed Peas - Don't Phunk With My H.Fat J oe - Get It Poppin
Carrie Underwood - Inside Your HeavenLife House -You And Me
Will Smith - SwitchMissy Elliott - Lose ControlDHT - Listen To Your Heart
Kelly Clarkston - Behind These Hazel E.
50 Cent - J ust A Lil BitGwen Stefani - Hollaback Girl
Rihanna - Pon De ReplayThe Pussycat Dolls - Don't Cha
Shuffled Content
Noise Insertion
Incomplete File
Quality degradation
Meta-data Modification
17
User Behavior Observation
slackness The elapsed time betwee
n download completion and pollution checking.
Summary:(1) P2P users are lacking in pollution awareness (2) Slackness distribution shows a bimodal form.
18
POLLUTION MODEL Discrete time analysis by extending the previous model and
incorporating study results Total M users in the system Only one kind of file in the system G0/B0 : initial # genuine/bogus copies Download process
1. At step k, a user (never downloaded before) downloads a file with probability sk (i.e., interest level)
2. After download, the authenticity is checked after an interval t , where t <= L (max. slackness)
3. Realizes bogus with probability pa (i.e., awareness), and delete directly; if so, he will try again with probability pr (i.e., re-download prob.), and goes back to step 2
4. Share the file with probability pc (i.e., cooperativeness)
19
Pollution Model
# downloads at time step k (Nk)
Ever downloaded users (Dk)
gk (bk) : # of users currently downloading genuine(bad) copies Total Gk and Bk files are shared in the system
kkkk rsDMN )(
kkkk sDMDD )(1
kk
kkk BG
BNb
kk
kkk BG
GNg
New Trials Re-downloadsNew Trials
20
Pollution Model
Total # genuine files (Gk+1)
Total # bogus files (Bk+1)
Prob. of not sharing bad files
# re-downloads at k+1
L
jjkjckkk gspgGG
111 )1(
L
jjkjDkkk bspbBB
111
L
jjkjark bsppr
111
L: max. slacknessgk: # incoming good files at kbk: # incoming bad files at ksj: prob. of checking after jpc: cooperation probabilitypa: awareness probabilitypr: re-download probability
)1)(1( caaD pppp
Max Slackness: L=3
kk-1k-2
gk-2s3
gks1gk-1s2
Total # checking at k: gk-2s3+ gk-1s2 + gks1
21
Analytic Results
Metrics for measuring the efficacy of pollution Pollution level (for a given time slot)
Bk / Gk
Settings M=15,000 (total number of users) L=48 (max. slackness) sk= 1/24 (gets interested in every 24 hrs.)
pr = 1 (re-downloads always!)
pc = 0.25 (cooperativeness) Initial pollution level = 20
22
Analytic Results Comparison with previous model
Where people’ re perfect at recognizing pollution
1. Polluted files indeed spread due to the lack of awareness2. Such a high level of pollution in KaZaA [7] can be explained using our model
23
Analytic Results The effectiveness of increasing the initial pollution
level by the polluter vs Retry probability
The more that users are impatient, the more the polluter is successful in polluting files
24
Analytic Results The effectiveness of increasing the initial pollution
level by the polluter vs User awareness
1. As awareness increases, a higher k does not providethe polluter much improvement.
2. Awareness is critical to make an effective attack (The polluter can’t controll pc, pr )
25
Analytic Results
As the level of pollution increases, awareness becomes muchmore important than user cooperativeness for the growth ofgenuine copies
Number of genuine files in steady state as a function of cooperativeness and awareness
26
Popular files are targets of the polluters!! Users will re-download with probability pr
At time step ts, the total # of retrial
1. In the worst case, # of re-downloads is x3 larger!!
2. 60% of the Internet traffic is P2P
IMPACT ON INTERNET TRAFFIC LOAD
st
k
L
jjkjar bspp
1 11
27
Conclusion
User behavior study shows Users are not error-free in recognizing pollution Users’ slackness follows a bimodal distribution
Developed an analytical model Analytic model shows
Awareness is one of the key factors in pollution dynamics
Pollution has a great impact on the P2P traffic loads
28
Reference1. [Content Availability, Pollution and Poisoning in File S
haring Peer to Peer Networks] , ACM conference on Electronic commerce
2. [Denial of Service Resilience in peer to Peer File Sharing Systems] , SIGMETRICS
3. [Understanding Pollution Dynamics in P2P File Sharing], INFOCOM
4) [Understanding Pollution Dynamics in P2P File Sharing], IPTPS ’06
Part of power point from author (with mark)