1 Understanding Pollution Dynamics in P2P File Sharing Uichin Lee, Min Choi * , Junghoo Cho M. Y. Sanadidi, Mario Gerla UCLA, KAIST * IPTPS’06 Elaine


Understanding Pollution Dynamics in P2P File Sharing

Uichin Lee, Min Choi*, Junghoo Cho, M. Y. Sanadidi, Mario Gerla

UCLA, KAIST*

IPTPS’06

Elaine


Outline

Pollution in P2P file sharing system
Related work
User behavior study
Analytic pollution model and its result
The impact of pollution on P2P traffic load


Pollution in P2P file sharing system

An object in a P2P file sharing system is composed of two parts:
  Meta-data (song title, artist name, length, encoding scheme, etc.)
  Content


Pollution in P2P file sharing system

P2P file sharing system with search capability (e.g., eMule, Gnutella, KaZaA, etc.)

1. Send a query for song A to the P2P network
2. Receive responses for song A: a list of available peers
3. Select one source for download


Pollution in P2P file sharing system

Definition of pollution: a file's actual content doesn't match its meta-data description!

  Clean file of song A: meta-data $M_A$, content $C_A$
  Tampering with meta-data: meta-data $M_A$, content $C_B$
  Tampering with content: meta-data $M_A$, content $C_{A'}$
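The definition above can be illustrated with a minimal Python sketch. The `SharedFile` class, its field names, and the `tampered` flag are hypothetical and only mirror the three cases on this slide (clean file, tampered meta-data, tampered content); they are not taken from any real P2P client.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedFile:
    """Hypothetical model of a shared object: meta-data plus content."""
    meta_song: str          # song the meta-data claims (M_A)
    content_song: str       # song the content really holds (C_A or C_B)
    tampered: bool = False  # True if the content was degraded (C_A')


def is_polluted(f: SharedFile) -> bool:
    # Polluted when the content does not match the meta-data description,
    # either because it is another song or because it was tampered with.
    return f.meta_song != f.content_song or f.tampered


clean = SharedFile("A", "A")                           # M_A with C_A
meta_tampered = SharedFile("A", "B")                   # M_A with C_B
content_tampered = SharedFile("A", "A", tampered=True) # M_A with C_A'
```

In practice a user cannot read such a flag directly and has to listen to the file, which is why awareness matters in the later slides.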


Pollution in P2P file sharing system

File sharing systems impact sales in the music, video, and game industries and in television networks

How a pollution attack works
  Inject a massive number of decoys into the peer-to-peer network to reduce the availability of the targeted item

A case: Madonna's new album (2003)

Pollution companies have emerged: Overpeer, Loudeye, Retspan


Related work

1. [Content Availability, Pollution and Poisoning in File Sharing Peer to Peer Networks], ACM Conference on Electronic Commerce
   Studies the impact of different pollution attacks on content availability in peer-to-peer file sharing networks

2. [Denial of Service Resilience in Peer to Peer File Sharing Systems], SIGMETRICS
   Made the first attempt to model the dynamics of P2P file pollution attacks

3. [Pollution in P2P File Sharing System], INFOCOM
   KaZaA is severely polluted
   Given that polluters have limited capabilities (bandwidth/processing power), the current level of pollution is too high
   Previous models assume that a user not only always detects a polluted file but also deletes it


Related work

This paper

A more accurate model of pollution

Polluted files indeed spread from user to user over the network

Pollution has a significant impact on P2P traffic load


User behavior study

Goal: how does user behavior impact pollution spread?

Two stages of this study
  Questionnaire: familiarity / usage patterns
  Behavior observation: awareness / slackness

Participants: 30 graduate students (UCLA, KAIST)


User behavior study

P2P Familiarity

1. Have you ever used P2P file sharing?

2. Do you frequently share files with P2P systems?

3. Do you know how to enable or disable sharing local files?

4. Do you know how popular P2P software works?

5. Do you know about multipart downloading or swarming?

Familiarity is high


User behavior study

P2P Usage Pattern

1) Preparation stage
   Download decision: quality 57%, availability 20%, file size 20%

2) Download stage
   Checking frequency: frequent 41%, size-dependent 35%, check later 20%
   Re-download?: yes 23%, file-size dependent 57%

3) Post-download stage
   Pollution experience: yes 70%
   Failed in noticing pollution: yes 30%
   Sharing: yes 43.3%


User behavior study

Summary

1. Even sophisticated P2P users sometimes fail to recognize polluted files

2. Users do not check the quality of a downloaded file immediately after the download completes

3. Not all users are cooperative in sharing downloaded files

4. Users make their download decisions primarily based on the quality of a file


User Behavior Observation

Mainly interested in measuring the following two parameters:

Awareness probability
  The fraction of users who recognize pollution in a downloaded file

Slackness distribution
  Distribution of the intervals between download completion time and quality checking time
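A minimal Python sketch of how these two parameters could be estimated from logged observations. The record layout and the sample values in `observations` are hypothetical and are not data from the study.

```python
from collections import Counter

# Hypothetical records, one per download:
# (file_was_polluted, user_reported_polluted, slackness_in_hours)
observations = [
    (True, True, 0),
    (True, False, 1),
    (True, True, 24),
    (False, False, 0),
    (True, False, 30),
]


def awareness_probability(records):
    """Fraction of polluted downloads that the user recognized as polluted."""
    polluted = [r for r in records if r[0]]
    if not polluted:
        return 0.0
    return sum(1 for r in polluted if r[1]) / len(polluted)


def slackness_distribution(records):
    """Empirical distribution of the delay between completion and checking."""
    counts = Counter(r[2] for r in records)
    total = len(records)
    return {hours: n / total for hours, n in sorted(counts.items())}


p_a = awareness_probability(observations)
dist = slackness_distribution(observations)
```

The resulting `dist` plays the role of the per-delay checking probabilities used later in the pollution model.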


User Behavior Observation

Experiment setup
  Seeded the server with genuine/polluted MP3 files
  Modified P2P software to monitor user behavior
  Users are asked to use it and to download files
  After each download, users are asked about their familiarity and whether the file was polluted or not
  Controlled downloading speed (50K - 1Mbps)
  Duration: one month


User Behavior Observation

Pollution techniques (on MP3 files)
  Meta-data modification: changed names
  Quality degradation
  Incomplete file: 30-60 seconds cut from the beginning/end
  Noise insertion: every 20 seconds
  Shuffled content: randomly shuffled content

20 genuine songs; 5 techniques * 4 = 20


User Behavior Observation

User Awareness

[Figure: user awareness (scale 0 to 1) for each test song, grouped by pollution technique: shuffled content, noise insertion, incomplete file, quality degradation, meta-data modification.]


User Behavior Observation

Slackness: the elapsed time between download completion and pollution checking.

Summary:
(1) P2P users are lacking in pollution awareness
(2) The slackness distribution shows a bimodal form


POLLUTION MODEL

Discrete time analysis, extending the previous model and incorporating the study results
  Total of $M$ users in the system
  Only one kind of file in the system
  $G_0$ / $B_0$: initial # of genuine/bogus copies

Download process

1. At step $k$, a user who has never downloaded before downloads a file with probability $s_k$ (i.e., interest level)

2. After the download, the authenticity is checked after an interval $t$, where $t \le L$ (max. slackness)

3. If the file is bogus, the user realizes this with probability $p_a$ (i.e., awareness) and deletes it immediately; if so, the user tries again with probability $p_r$ (i.e., re-download prob.) and goes back to step 2

4. The user shares the file with probability $p_c$ (i.e., cooperativeness)
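The download process above can be sketched as a small Monte Carlo routine for a single interested user. This is an illustrative sketch, not the authors' code: the function name, the fixed `bogus_fraction`, and the `max_downloads` guard are assumptions, and the slackness delay in step 2 is not simulated because it only shifts events in time.

```python
import random


def simulate_user(bogus_fraction, p_a, p_r, p_c, rng, max_downloads=1000):
    """Follow one user through steps 2-4.

    Returns (number_of_downloads, shares_a_file, shared_file_is_bogus).
    bogus_fraction is the chance that a downloaded copy is bogus (assumed fixed).
    """
    downloads = 0
    while downloads < max_downloads:
        downloads += 1
        is_bogus = rng.random() < bogus_fraction
        # Step 2: the file is checked after some slackness interval (not timed here).
        # Step 3: a bogus file is recognized with probability p_a and deleted.
        if is_bogus and rng.random() < p_a:
            if rng.random() < p_r:
                continue  # try again with probability p_r
            return downloads, False, False
        # Step 4: the kept file (genuine, or bogus but unnoticed) is shared
        # with probability p_c.
        shares = rng.random() < p_c
        return downloads, shares, shares and is_bogus
    return downloads, False, False  # guard hit: user gave up


rng = random.Random(0)
result = simulate_user(bogus_fraction=0.8, p_a=0.5, p_r=1.0, p_c=0.25, rng=rng)
```

An unaware user (`p_a = 0`) who is cooperative ends up sharing the bogus copy, which is the mechanism by which polluted files spread from user to user.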


Pollution Model

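# downloads at time step $k$ ($N_k$): new trials plus re-downloads

$N_k = \underbrace{(M - D_k)\, s_k}_{\text{new trials}} + \underbrace{r_k}_{\text{re-downloads}}$

Ever-downloaded users ($D_k$)

$D_{k+1} = D_k + \underbrace{(M - D_k)\, s_k}_{\text{new trials}}$

$g_k$ ($b_k$): # of users currently downloading genuine (bad) copies, given that a total of $G_k$ genuine and $B_k$ bogus files are shared in the system

$b_k = N_k \dfrac{B_k}{G_k + B_k}$

$g_k = N_k \dfrac{G_k}{G_k + B_k}$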
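A minimal Python sketch of one time step of these quantities, assuming the relations $N_k = (M - D_k) s_k + r_k$, $D_{k+1} = D_k + (M - D_k) s_k$, $g_k = N_k G_k / (G_k + B_k)$ and $b_k = N_k B_k / (G_k + B_k)$. The function name and the example numbers are hypothetical.

```python
def step_downloads(M, D_k, s_k, r_k, G_k, B_k):
    """Return (N_k, D_next, g_k, b_k) for one time step."""
    new_trials = (M - D_k) * s_k   # users downloading for the first time
    N_k = new_trials + r_k         # plus re-downloads from earlier steps
    D_next = D_k + new_trials      # users who have ever downloaded
    shared = G_k + B_k
    g_k = N_k * G_k / shared       # downloads that pick a genuine copy
    b_k = N_k * B_k / shared       # downloads that pick a bogus copy
    return N_k, D_next, g_k, b_k


# Hypothetical numbers: 1000 users, nobody has downloaded yet,
# 10 genuine and 30 bogus copies shared.
N_k, D_next, g_k, b_k = step_downloads(M=1000, D_k=0, s_k=0.25, r_k=0,
                                       G_k=10, B_k=30)
```

Downloads are split in proportion to the shared copies, so a polluter who injects more bogus copies captures a larger share of each step's downloads.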


Pollution Model

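Total # genuine files ($G_{k+1}$)

$G_{k+1} = G_k + g_k - (1 - p_c) \sum_{j=1}^{L} s_j\, g_{k-j+1}$

Total # bogus files ($B_{k+1}$)

$B_{k+1} = B_k + b_k - p_D \sum_{j=1}^{L} s_j\, b_{k-j+1}$

Prob. of not sharing bad files

$p_D = p_a + (1 - p_a)(1 - p_c)$

# re-downloads at $k+1$

$r_{k+1} = p_r\, p_a \sum_{j=1}^{L} s_j\, b_{k-j+1}$

Notation
  $L$: max. slackness
  $g_k$: # incoming good files at $k$
  $b_k$: # incoming bad files at $k$
  $s_j$: prob. of checking after $j$
  $p_c$: cooperation probability
  $p_a$: awareness probability
  $p_r$: re-download probability

Example with max. slackness $L = 3$: files that arrived at steps $k-2$, $k-1$, and $k$ are checked at step $k$ with probabilities $s_3$, $s_2$, and $s_1$.

Total # checking at $k$: $g_{k-2} s_3 + g_{k-1} s_2 + g_k s_1$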
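A small worked sketch of the two building blocks on this slide: the probability of not sharing a bad file, $p_D = p_a + (1 - p_a)(1 - p_c)$, and the number of files checked at step $k$, $\sum_{j=1}^{L} s_j\, g_{k-j+1}$. The function names and numeric values are hypothetical, and terms that refer to steps before the start are treated as zero (an assumption).

```python
def p_not_shared_bad(p_a, p_c):
    """A bad file is not shared if it is detected, or undetected but not shared."""
    return p_a + (1 - p_a) * (1 - p_c)


def checked_at(k, incoming, s):
    """Files checked at step k: sum over j of s_j * incoming[k - j + 1].

    incoming[i] is the number of files that arrived at step i;
    s[j - 1] holds s_j, so len(s) is the max. slackness L.
    """
    total = 0.0
    for j in range(1, len(s) + 1):
        idx = k - j + 1
        if idx >= 0:  # nothing arrived before step 0
            total += s[j - 1] * incoming[idx]
    return total


# L = 3 example: g_0 = 8, g_1 = 4, g_2 = 16 and s_1, s_2, s_3 = 0.5, 0.25, 0.25
g = [8, 4, 16]
s = [0.5, 0.25, 0.25]
checked = checked_at(2, g, s)     # g_2*s_1 + g_1*s_2 + g_0*s_3
p_D = p_not_shared_bad(0.5, 0.25)
```

With these numbers the sum is 16(0.5) + 4(0.25) + 8(0.25) = 11 and $p_D$ = 0.5 + 0.5(0.75) = 0.875.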


Analytic Results

Metrics for measuring the efficacy of pollution
  Pollution level (for a given time slot): $B_k / G_k$

Settings
  $M = 15{,}000$ (total number of users)
  $L = 48$ (max. slackness)
  $s_k = 1/24$ (a user gets interested once every 24 hrs.)
  $p_r = 1$ (always re-downloads!)
  $p_c = 0.25$ (cooperativeness)
  Initial pollution level = 20
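A minimal Python sketch that iterates the model with the settings above and records the pollution level $B_k / G_k$ per step. Several inputs are assumptions, since the slide does not give them: a uniform slackness distribution `slack` (the study reports a bimodal one), the awareness value `p_a`, the starting counts `G0 = 5` and `B0 = 100` (chosen only so that `B0 / G0 = 20`), and the number of steps. The code distinguishes the interest level (`interest`, written $s_k$ on the slides) from the checking probabilities (`slack`, written $s_j$).

```python
def simulate(M, L, interest, p_a, p_r, p_c, G0, B0, slack, steps):
    """Iterate the pollution model and return the pollution level B/G per step."""
    p_D = p_a + (1 - p_a) * (1 - p_c)   # prob. of not sharing a bad file
    G, B, D, r = float(G0), float(B0), 0.0, 0.0
    g_hist, b_hist, levels = [], [], []

    def checked(hist, k):
        # sum_{j=1}^{L} s_j * hist[k - j + 1]; steps before 0 contribute nothing
        return sum(slack[j - 1] * hist[k - j + 1]
                   for j in range(1, L + 1) if k - j + 1 >= 0)

    for k in range(steps):
        new_trials = (M - D) * interest
        N = new_trials + r               # N_k = (M - D_k) s_k + r_k
        g = N * G / (G + B)
        b = N * B / (G + B)
        g_hist.append(g)
        b_hist.append(b)
        cg, cb = checked(g_hist, k), checked(b_hist, k)
        G = G + g - (1 - p_c) * cg       # genuine copies not shared leave the pool
        B = B + b - p_D * cb             # bad copies detected or not shared leave
        r = p_r * p_a * cb               # re-downloads at the next step
        D = D + new_trials
        levels.append(B / G)
    return levels


L = 48
uniform_slack = [1.0 / L] * L            # assumption, not the measured distribution
levels = simulate(M=15000, L=L, interest=1 / 24, p_a=0.5, p_r=1.0, p_c=0.25,
                  G0=5, B0=100, slack=uniform_slack, steps=240)
print(levels[-1])
```

Re-running with different `p_a` values is one way to see why awareness matters: with `p_a = 0` nothing ever removes bogus copies faster than genuine ones, so the level stays at its initial value.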


Analytic Results

Comparison with the previous model, in which users are perfect at recognizing pollution

1. Polluted files indeed spread due to the lack of awareness
2. The high level of pollution observed in KaZaA [7] can be explained using our model


Analytic Results

The effectiveness of increasing the initial pollution level by the polluter vs. retry probability

The more impatient users are, the more successful the polluter is in polluting files


Analytic Results

The effectiveness of increasing the initial pollution level by the polluter vs. user awareness

1. As awareness increases, a higher initial pollution level k does not provide the polluter much improvement.

2. Awareness is critical to making an attack effective (the polluter can't control $p_c$, $p_r$)


Analytic Results

As the level of pollution increases, awareness becomes much more important than user cooperativeness for the growth of genuine copies

[Figure: number of genuine files in steady state as a function of cooperativeness and awareness]


IMPACT ON INTERNET TRAFFIC LOAD

Popular files are targets of the polluters!!

Users will re-download with probability $p_r$

At time step $t_s$, the total # of retrials is

$\sum_{k=1}^{t_s} p_r\, p_a \sum_{j=1}^{L} s_j\, b_{k-j+1}$

1. In the worst case, the # of re-downloads is 3x larger!!

2. 60% of the Internet traffic is P2P
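A minimal Python sketch of the retrial total $\sum_{k=1}^{t_s} p_r\, p_a \sum_{j=1}^{L} s_j\, b_{k-j+1}$. The function name and the sample numbers are hypothetical; `b[i]` is taken to be the number of bad files downloaded at step `i`, and steps outside the recorded history are treated as zero (an assumption).

```python
def total_retrials(b, s, p_r, p_a, t_s):
    """Sum the re-downloads triggered at steps 1..t_s.

    b[i]   : number of bad files downloaded at step i
    s[j-1] : probability s_j of checking a file j steps after download
    """
    total = 0.0
    for k in range(1, t_s + 1):
        for j in range(1, len(s) + 1):
            idx = k - j + 1
            if 0 <= idx < len(b):
                total += p_r * p_a * s[j - 1] * b[idx]
    return total


# Hypothetical history: b_0 = 0, b_1 = 10, b_2 = 20, with L = 2
retrials = total_retrials(b=[0, 10, 20], s=[0.5, 0.5], p_r=1.0, p_a=0.5, t_s=2)
```

Here step 1 contributes 0.5(0.5)(10) = 2.5 and step 2 contributes 0.5(0.5)(20) + 0.5(0.5)(10) = 7.5, for 10 retrials in total; every one of them is an extra download that adds to the traffic load.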


Conclusion

User behavior study shows
  Users are not error-free in recognizing pollution
  Users' slackness follows a bimodal distribution

Developed an analytical model

Analytic model shows
  Awareness is one of the key factors in pollution dynamics
  Pollution has a great impact on the P2P traffic load


Reference

1. [Content Availability, Pollution and Poisoning in File Sharing Peer to Peer Networks], ACM Conference on Electronic Commerce

2. [Denial of Service Resilience in Peer to Peer File Sharing Systems], SIGMETRICS

3. [Pollution in P2P File Sharing System], INFOCOM

4. [Understanding Pollution Dynamics in P2P File Sharing], IPTPS '06

Some slides are taken from the authors' PowerPoint (marked)