
Credo: Thwarting collusion attacks in peer-to-peer content …trandinh/publications/Credo_TR.pdf · 2011. 7. 30. · Credo: Thwarting collusion attacks in peer-to-peer content distribution


Credo: Thwarting collusion attacks in peer-to-peer content distribution systems

Nguyen Tran, Jinyang Li, Lakshminarayanan Subramanian
New York University

Abstract: Current peer-to-peer content distribution networks do not effectively encourage users to act as seeders; this results in suboptimal behavior as many users disconnect immediately upon retrieving files. Reputation systems can incentivize users to remain active and seed files. Unfortunately, existing reputation systems are unsatisfactory: they do not accurately capture a node’s net contribution in the face of Sybil and collusion attacks.

In this paper, we present Credo, a robust credit-based reputation system with a novel reputation algorithm to compute node reputation based on the set of uploads reported by peer nodes. Credo’s algorithm employs two novel techniques, credit diversity and modeling good behavior, which can effectively limit the reputation gains of colluding adversaries. Using simulations, we show that Credo is resilient against attack and provides an effective incentive, allowing nodes with more upload contributions to experience faster downloads. We have incorporated Credo in the Azureus BitTorrent client and show that overall system performance is significantly improved in the PlanetLab deployment. Additionally, we show that our implementation scales well and is capable of handling systems with large numbers of nodes.

I. INTRODUCTION

With the recent growth in demand for high-quality multimedia content, capacity requirements for content distribution networks (CDNs) have increased proportionally. Peer-to-peer content distribution is a natural low-cost option to scale distribution capacity. In a P2P CDN model, content providers serve content using a small number of “official” seeder nodes and rely on participating users to upload previously downloaded content to others. By aggregating the bandwidth of thousands or millions of participating users, P2P CDNs promise extremely high capacity at very low cost. However, in order to reach their full potential, P2P CDNs must address the long-standing challenge of incentivizing users to upload content to others.

The P2P incentive problem has been widely studied in the past. Unfortunately, popular solutions such as the tit-for-tat mechanism provided by BitTorrent [4] and its variants [19], [11] are insufficient. BitTorrent only incentivizes those peers that are actively downloading the same file to upload to each other. Once a user completes a download, she has no incentive to act as a seeder and continue uploading. In practice, the distribution of content on popular torrent networks such as PirateBay is often heavy-tailed, with most files having only a few simultaneous downloaders. As a result, tit-for-tat is of little use as an incentive mechanism in these scenarios. An ideal incentive mechanism should motivate users to contribute to the P2P CDN even after completing their downloads. This is sometimes referred to as the seeder promotion problem [23].

Existing studies [16] as well as our own measurement experiments suggest that P2P CDNs can achieve a significant performance boost by addressing the seeder promotion problem (Section II). However, the current solutions to the seeder promotion problem are unsatisfactory. Popular private BitTorrent communities such as TorrentLeech and What.CD maintain an invitation-only membership. These communities keep track of members’ self-reported upload contribution, and kick out members that have failed to make the required amount of contribution. For example, in TorrentLeech, each peer must serve as a seeder for a file for 24 hours after downloading and upload at least 0.4 times its downloaded amount. Unfortunately, selfish nodes can purposefully misreport their upload contributions, an attack that is becoming increasingly common in private BitTorrent communities [7]. Current defenses against such attacks are very limited and involve banning client software which can be modified to misreport information (e.g. Vuze, Deluge).

One promising direction for solving the seeder promotion problem is to design a robust reputation system. In such a system, a user’s reputation score accurately captures her upload contribution. To encourage contribution, P2P CDNs can give preferential treatment to users with high reputation. Thus, the more a peer contributes (in terms of its uploads), the better the service (in terms of download speed) it gets. The major challenge in designing such a reputation system is ensuring that a user’s reputation score accurately reflects her upload contribution; malicious users should not be able to acquire excessively high reputation scores via collusion or Sybil attacks. In this paper, we propose Credo, a credit-based reputation system that addresses both challenges (capturing contribution accurately and resisting such attacks), and show how it can be applied to P2P CDNs to solve the seeder promotion problem.

To track a node’s contribution in Credo, a seeder collects a signed upload receipt whenever it uploads a chunk of data to another node. Credo employs a central server to periodically aggregate upload receipts from nodes and compute a reputation score for each node based on these receipts. The central server computes the reputation score by keeping track of a credit set (C) for each node and transferring a credit from node A’s set to node B’s set if B has uploaded a data chunk to A, as indicated by the corresponding upload receipt. A node with many uploads will have a large number of credits, while a node with many downloads has a large number of debits (i.e. there are many credits issued by that node in others’ credit sets). Naively using the size of the credit set as a node’s reputation score is vulnerable to attacks. Instead, Credo measures a node’s reputation score in two steps. First, Credo models the distribution of node debits and filters the node’s credit set according to the measured distribution. Then, Credo measures the node’s reputation score by its filtered credit set’s diversity, which counts no more than a few credits from the same node. The combination of filtering and diversity techniques ensures that the maximum reputation of an adversarial node is effectively bounded, and that the bound is independent of the collusion size. Moreover, an adversarial node’s reputation score will eventually decrease after it has downloaded a bounded amount of data from honest nodes.

We make three contributions in this paper.

1) We propose the design of Credo. To the best of our knowledge, Credo is the first reputation scheme that accurately assigns nodes reputation scores which reflect their contributions while providing quantifiable guarantees for resistance against Sybil and collusion attacks.

2) We present an analysis of Credo’s security guarantees in the face of k colluding adversarial nodes, each controlling s Sybil identities. Each adversary’s reputation score is upper bounded by min{λ · k · s, O(s · d̄)}, where λ is a constant and d̄ is the average debits of honest nodes. Furthermore, an adversarial node can download at most O(s · d̄) data chunks before its reputation score is diminished.

3) We show that our implementation of Credo scales to handle a large number of peer nodes. Simulation results based on the transfer log of an existing P2P CDN show that Credo can accurately capture node contribution and defend against attacks in practical workloads.

II. SEEDER PROMOTION PROBLEM

The importance of seeder promotion. BitTorrent’s tit-for-tat mechanism only motivates nodes that are currently downloading the same file to upload to each other. Once a node finishes downloading, there is no incentive for it to stay online and upload to others. Despite this lack of incentive, BitTorrent works fairly well in practice when there are a few altruistic high-capacity nodes available [20]. Therefore, it is worth investigating how much practical performance gain is achievable if nodes are incentivized to stay online and become seeders after completing downloads. One way to quantify the impact of seeder promotion is to compare the performance of public BitTorrents to that of private BitTorrents. While there is no incentive for seeding in a public BitTorrent, a private BitTorrent demands a certain level of upload contribution from each node in order to maintain the node’s membership.

We measured the download speed in a public BitTorrent system (PirateBay) as well as three private BitTorrent communities (Demonoid, What.CD and TorrentLeech). Of the three private BitTorrent communities, What.CD and TorrentLeech demand a certain minimum level of upload contribution (or sharing ratio) from each member in order to stay in the community, while Demonoid has no such requirement. For each BitTorrent community, we joined 100 recently active swarms in that community and measured the download speeds of nodes in the swarm by periodically connecting to a node to obtain its download progress.

Figure 1 shows the cumulative distribution of observed download speed in all four BitTorrent communities. As we can see, private communities which incentivize upload contribution (i.e. TorrentLeech and What.CD) achieve 6-7× the median download speed of PirateBay (a public BitTorrent) and Demonoid (a private community with no upload incentive). We also found that the ratio of seeders to leechers in TorrentLeech and What.CD is more than 10× that in PirateBay and Demonoid; this difference in available seeders is a large component of the performance gap between the private and public systems. Our observations are similar to those of another recent measurement study on a different set of private BitTorrent communities [16]. Based on these results, we hypothesize that P2P CDNs can achieve a significant performance gain by addressing the seeder promotion problem.

Fig. 1. CDF of achieved download speeds in various BitTorrent communities (x-axis: download speed in kbps, log scale from 1 to 10000; y-axis: CDF from 0 to 1; series: PirateBay, Demonoid, What.CD, TorrentLeech). The download speeds in communities that incentivize seeding (TorrentLeech and What.CD) are significantly better than those of the public BitTorrent (PirateBay) and the private community (Demonoid) with no seeding incentives.

Fairness incentivizes contribution. How can selfish nodes be motivated to act as seeders? We assume that the utility of each peer is characterized by its average download speed and that the goal of each peer is to employ a strategy that maximizes its download speed. It is worth pointing out that a selfish peer is not necessarily interested in minimizing its upload cost: each user has a different threshold for acceptable upload cost. This model of selfish peers is similar to that proposed in [11]. A P2P CDN is considered fair if the more a peer contributes to the system (i.e. uploads) relative to its consumption (i.e. downloads), the better the average download speed it experiences when competing with other downloaders. We hypothesize that, in a fair P2P CDN, nodes are motivated to act as seeders to achieve better download speeds in return.

Achieving fairness is more flexible than enforcing a specific sharing ratio as done in private BitTorrent communities such as TorrentLeech. With a specific sharing ratio, a peer has no incentive to upload more than its required sharing ratio. Worse yet, a peer unable to meet the sharing ratio requirement for various non-selfish reasons (e.g. it is seeding unpopular files or has small upload capacity compared to its download capacity) risks getting expelled from the system.

Our fairness notion maps naturally to a reputation system where each peer’s reputation score reflects its net contribution (i.e. its uploads minus its downloads) and each peer allocates its upload capacity to active downloaders according to their reputation scores. However, this reputation-based approach faces two practical challenges: (a) how to capture a peer’s net contribution, and (b) how to defend against attacks on the reputation system itself. The goal of our work is to design a reputation system that addresses both of these challenges.

Fig. 2. The upload graph formed by receipts, in two panels (a) and (b); e.g. the link C → B of weight 2 denotes that B has uploaded 2 data chunks to C. The naive method of measuring each node’s net contribution by the difference between a node’s weighted incoming links and outgoing links is vulnerable to the Sybil attack. For example, in (b), E instructs its Sybil identity E′ to generate many upload receipts for E.

III. APPROACH

At a high level, Credo keeps track of each node’s upload contribution using signed “receipts”: in exchange for downloading a data chunk from seeder B, node A generates a signed receipt (A → B) and gives it to B. Credo employs a central server to periodically aggregate upload receipts collected by all nodes and compute each node’s reputation score based on these receipts. To promote contribution, each seeder preferentially uploads to the nodes with the highest reputation scores among competing download requests.

Credo performs Sybil-resilient member admission control to prevent an attacker from joining the system with an arbitrarily large number of Sybil identities. However, each attacker can still participate in the system with a few (s) Sybil identities to manipulate Credo’s reputation mechanism. The main contribution of Credo is a centralized reputation algorithm that calculates a reputation score for each node based on the collection of upload receipts. The algorithm achieves quantifiable security guarantees for its defense against both Sybil and collusion attacks. In a Sybil attack, an adversarial node uses the few Sybil identities under its control to boost its own reputation score without performing actual uploads. Furthermore, several attackers can collude, together with their respective Sybil identities, to boost each other’s reputation scores. Next, we describe the two key ideas in Credo’s reputation algorithm.

1. Measuring contribution using credit diversity. The set of receipts from all nodes forms the upload graph. Figure 2 shows two example graphs, where the link C → B with weight 2 indicates that B has uploaded 2 data chunks to C. The naive method of measuring a node’s net contribution is to calculate the difference between a node’s weighted incoming links and its outgoing links. For example, in Figure 2(a), A’s net contribution is calculated as 5. The naive method accurately captures the net contribution of honest nodes but is extremely vulnerable to Sybil and collusion attacks. For example, in Figure 2(b), attacker E instructs its Sybil identity E′ to generate 100 upload receipts for E, thereby increasing the net contribution of E to 90 without having to perform any actual uploads. How can we measure a node’s net contribution while remaining resilient to this type of attack?
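The naive method and its weakness can be illustrated with a minimal Python sketch. The function name `naive_net_contribution` and the edge list are illustrative assumptions; the edges are the ones the text describes for Figure 2(b), namely E′ → E with weight 100, E → B with weight 10, and C → B with weight 2.

```python
from collections import defaultdict

def naive_net_contribution(links):
    """Naive score: weighted incoming links minus weighted outgoing links.

    `links` is an iterable of (downloader, uploader, weight) tuples, one per
    link downloader -> uploader in the upload graph.
    """
    net = defaultdict(int)
    for downloader, uploader, weight in links:
        net[uploader] += weight    # incoming link: chunks this node uploaded
        net[downloader] -= weight  # outgoing link: chunks this node downloaded
    return dict(net)

# Edges as described in the text for Figure 2(b).
fig_2b = [("E'", "E", 100), ("E", "B", 10), ("C", "B", 2)]
scores = naive_net_contribution(fig_2b)
print(scores["E"])  # 90, although E performed no real uploads
```

The Sybil E′ pays for the attack with a score of -100, but that identity is disposable, which is why the naive score offers no protection.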

Intuitively, node A’s upload contribution in Figure 2(a) is more “diverse” than that of E in Figure 2(b), because the node (D) that A has uploaded to has also uploaded to other nodes, while the node (E′) that E has uploaded to has made no contribution. We capture this notion of diversity using the concept of credit transfers. Credo’s reputation algorithm maintains two quantities for each node: (1) the node’s credit set (C), represented as a multiset; (2) the node’s debits (d), represented as a number. For example, C_A = {C : 1, D : 2} indicates that A’s credit set consists of 1 credit issued by C and 2 credits issued by D. The algorithm processes every link in the upload graph by performing a “credit transfer”. For example, to process A → B, the algorithm removes one randomly chosen credit from A’s credit set and adds it to B’s set. The issuer of a credit does not change as it is transferred to another credit set. If A’s credit set is empty, the algorithm increments A’s debits (d_A) by 1 and adds a new credit issued by A to B’s credit set.

The credit-based processing causes nodes with more “diverse” contribution to have credit sets with more distinct credits. For example, A’s credit set in Figure 2(a) may become C_A = {C : 1, B : 3, D : 1}, while E’s credit set in Figure 2(b) is simply C_E = {E′ : 100}. We measure credit diversity by the number of distinct issuers in a credit set. However, since a node might repeatedly upload to the same node, we count up to λ credits (λ > 1) from each distinct issuer. The default λ value is 3. For example, for C_A = {C : 1, B : 3, D : 1}, diversity(C_A) = 5. For C_E = {E′ : 100}, diversity(C_E) = 3.
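The diversity measure can be written as a short Python sketch. The function name `diversity` and the use of `collections.Counter` as the multiset are assumptions made for illustration; the cap of λ = 3 and the two credit sets are the ones given in the text.

```python
from collections import Counter

LAMBDA = 3  # default per-issuer cap stated in the text

def diversity(credit_set, lam=LAMBDA):
    """Count at most `lam` credits from each distinct issuer."""
    return sum(min(count, lam) for count in credit_set.values())

c_a = Counter({"C": 1, "B": 3, "D": 1})
c_e = Counter({"E'": 100})
print(diversity(c_a))  # 5
print(diversity(c_e))  # 3
```

Capping each issuer's contribution is what makes 100 receipts from a single Sybil worth no more than 3.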

A node’s reputation is calculated based on its credit diversity and its debits:

rep = diversity(C) − d    (1)

Credit diversity limits the maximum reputation gain of Sybil attacks. If an adversarial node has only s Sybil identities, its maximum reputation score without performing any uploads is only λ · s. Moreover, even if a set of k adversaries, each with s Sybil identities, collude, the reputation score of each adversarial node is bounded by λ · k · s.
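As a small worked check of the λ · s bound, the sketch below evaluates Equation (1) for a hypothetical adversary. The identifiers `reputation` and `sybil_credits`, and the values s = 4 and 1000 receipts per Sybil, are illustrative assumptions rather than figures from the paper.

```python
def diversity(credit_set, lam=3):
    """Count at most `lam` credits from each distinct issuer."""
    return sum(min(count, lam) for count in credit_set.values())

def reputation(credit_set, debits, lam=3):
    """Equation (1): rep = diversity(C) - d."""
    return diversity(credit_set, lam) - debits

# Hypothetical adversary: s Sybils, each issuing 1000 bogus receipts.
s, lam = 4, 3
sybil_credits = {f"sybil{i}": 1000 for i in range(s)}
rep = reputation(sybil_credits, debits=0, lam=lam)
print(rep)  # 12, i.e. lam * s, no matter how many receipts each Sybil issues
```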

Credit diversity may underestimate the contribution of an honest node if it repeatedly performs many uploads to another node that has made little contribution itself, a behavior indistinguishable from that of an adversarial node launching a Sybil attack. Fortunately, such a scenario is unlikely to occur: honest nodes preferentially upload to the downloaders with the highest reputations, and as high-reputation nodes have large and diverse credit sets themselves, an honest seeder will be able to increase its credit diversity by obtaining upload receipts from them, as our later evaluations demonstrate (Section VII).

Existing graph-based reputation schemes are based on eigenvalue [2], [9] or max-flow computations [6], [3]. However, with eigenvalue methods [2], [9], a node’s reputation score does not necessarily reflect its net contribution. In particular, a node can improve its reputation score more by uploading the same amount of data to those nodes with higher reputations. Credo employs a credit transfer mechanism similar to that of currency systems [18], [26], [1], [23], but its use of credits is very different. Currency systems give the same download service to all nodes who possess currency tokens and deny service to “bankrupted” nodes. By contrast, Credo uses each node’s credit set to calculate its reputation score and gives preferential service to those nodes with higher reputations.

Fig. 3. Two adversarial nodes (E, F) collude by obtaining 3 upload receipts from each of their respective Sybil identities (E′, F′). As E downloads from honest node A (dotted link E → A), E can replenish its diminished credit set by obtaining new upload receipts from Sybils (dotted links E′ → E, F′ → E).

2. Credit filtering based on good behavior. Although using credit diversity limits the maximum reputation score of each adversarial node, the maximum reputation score still increases as the size of the collusion (k) increases. Even worse, each colluding adversarial node can still maintain its maximum reputation score without any contribution, no matter how much data it has downloaded from others. That is because every Sybil node can issue arbitrarily many upload receipts to replenish the credit set of an adversarial node. For example, in Figure 3, adversarial nodes E and F collude with their respective Sybils to achieve a credit set of diversity 6. If E downloads 6 units of data from honest node A (shown by the dotted link E → A), E’s credit set should ideally decrease by 6. However, E could easily request more upload receipts from the Sybils E′ and F′ (shown by dotted links E′ → E, F′ → E) to maintain a credit diversity of 2λ at all times, no matter how much data it downloads from honest nodes.

To address these limitations, Credo’s reputation algorithm explicitly models the typical behavior of honest nodes. More concretely, the algorithm measures the distribution of debits of all nodes. The debit distribution as observed in the credit set of an honest node should not deviate too much from the measured distribution. We use this knowledge to filter a node’s credit set to obtain a subset of credits, C′ ⊂ C, such that the observed debit distribution in C′ conforms to the measured distribution. We augment Equation 1 to use the filtered set (C′) for calculating credit diversity. This filtering technique forces rational adversarial nodes to limit the number of upload receipts their Sybils generate so that credits from their Sybils do not get filtered. In other words, the number of upload receipts generated by Sybils must be similar to that generated by honest nodes. As a result, the amount of “free” credit an adversarial node gets from Sybils is bounded by O(s · d̄), which is independent of the collusion size (k), where d̄ is the average debit of honest nodes. Moreover, an adversarial node’s reputation score goes down after it uses up the “free” credits from Sybils.

IV. CREDO DESIGN

In this section, we first describe the system architecture, including how nodes generate upload receipts and how reputation scores are utilized. Next, we explain how Credo computes node reputations based on aggregated upload receipts.

A. System Architecture

The Credo system consists of a trusted central server as well as a large collection of peer nodes. The central server performs Sybil-resilient node admission control and is responsible for aggregating upload receipts and computing reputation for all peers. A peer node may act as a seeder, which stores a complete file and uploads chunks of it to others. A peer node may also act as a leecher to download missing file chunks from other seeders or leechers. Credo focuses on the interaction between seeders and leechers, which constitutes the majority (> 80%) of data transfers [16], and lets the normal BitTorrent protocol handle data exchange among leechers.

Node admission: The central server admits a new node into the system by generating a public/private key pair for the admitted node. In order to prevent an attacker from joining the system with an arbitrarily large number of Sybil identities, the central server employs existing Sybil-resilient admission control methods, such as schemes based on a fairly strong type of user identity (e.g. credit card numbers or cellphone numbers) or algorithms based on the social network among users [28], [24]. Sybil-resilient admission control cannot prevent an adversarial node from joining the system, but it limits each adversarial node to a small number (s) of Sybil identities.

Obtain upload receipts for seeding: A seeder gets an upload receipt after uploading one data chunk (of size up to 1 MB) to a leecher. For example, if seeder A has uploaded to B, A obtains a receipt of the form [A ← B, SHA1(data), ts] signed by B’s private key. SHA1(data) is the hash of the data block being uploaded from A to B, and ts is the current timestamp according to B.

A leecher might refuse to generate the required upload receipt after downloading data from a seeder. To deter such behavior, we borrow the fair-exchange protocol from BAR Gossip [13]. Specifically, to upload data to B, seeder A first transfers the encrypted data chunk to B and then gives B the decryption key only if A receives a valid upload receipt from B. The symmetric key used by A to encrypt data is uniquely determined by A’s private key (Prv_A) and the content hash of the data chunk (SHA1(data)). In addition to the encrypted data, A also includes a signed tuple [SHA1(data), SHA1(encrypted data)] to commit to its claim that the encrypted data corresponds to the data chunk being requested. B can only obtain the decryption key after sending the proper upload receipt to A. If B does not receive a valid decryption key from A after it has sent the upload receipt, B sends the upload receipt to the central server. Since the central server knows all nodes’ private keys, it can regenerate the decryption key based on A’s private key and SHA1(data) and give it to B. If seeder A uploads garbage data to B, B will report A’s misbehavior to the central server with a verifiable proof including the encrypted data chunk and the signed [SHA1(data), SHA1(encrypted data)].
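The key-derivation idea behind this exchange can be sketched in Python under several loudly stated assumptions. The paper only says that the key is uniquely determined by Prv_A and SHA1(data); using HMAC-SHA256 as the derivation function is an assumption, `toy_stream_cipher` is an insecure stand-in for a real symmetric cipher, the private key is placeholder bytes, and the signatures on the receipt and on the binding tuple are omitted.

```python
import hashlib
import hmac

def derive_key(prv_a: bytes, chunk_hash: bytes) -> bytes:
    """Deterministic per-chunk key (assumed construction: HMAC-SHA256)."""
    return hmac.new(prv_a, chunk_hash, hashlib.sha256).digest()

def toy_stream_cipher(key: bytes, data: bytes) -> bytes:
    """XOR with a SHA-256 counter keystream. Illustration only, NOT secure.

    Applying the function twice with the same key returns the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        block = hashlib.sha256(key + offset.to_bytes(8, "big")).digest()
        out.extend(b ^ k for b, k in zip(data[offset:offset + 32], block))
    return bytes(out)

prv_a = b"placeholder-private-key-of-A"   # hypothetical key material
chunk = b"example chunk contents " * 10
chunk_hash = hashlib.sha1(chunk).digest()

# Step 1: A sends the encrypted chunk plus the binding tuple
# [SHA1(data), SHA1(encrypted data)] (A's signature on it is omitted here).
key = derive_key(prv_a, chunk_hash)
encrypted = toy_stream_cipher(key, chunk)
binding = (chunk_hash, hashlib.sha1(encrypted).digest())

# Step 2: B checks that the binding matches the bytes it received and would
# then send its signed upload receipt to A.
binding_matches = binding[1] == hashlib.sha1(encrypted).digest()

# Step 3: if A withholds the key, the server re-derives it from A's private
# key and SHA1(data); B decrypts and checks the content hash.
server_key = derive_key(prv_a, chunk_hash)
recovered = toy_stream_cipher(server_key, encrypted)
delivered_ok = hashlib.sha1(recovered).digest() == binding[0]
print(binding_matches, delivered_ok)
```

Because the key depends only on values the server already knows, the server can resolve a dispute without storing any per-transfer state; if `delivered_ok` were false, the encrypted chunk and the signed binding tuple would serve as B's proof of misbehavior.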

Aggregate upload receipts: Every peer node periodically transfers its newly received upload receipts to the central server. Since all honest nodes should verify every received upload receipt, the central server only samples a small fraction (e.g. 10%) of aggregated receipts to check their validity. Upon detecting an invalid receipt, the server can punish the corresponding seeder for presenting the bogus receipt by reducing its reputation score or suspending its membership.

The central server computes a reputation score for each node every few hours (the default is four). Only upload receipts generated in the last τ time period are used for reputation computation. We pick τ to be three weeks, a period short enough that a node is motivated to continuously contribute to the system, yet long enough for a node’s past contribution to affect its current reputation score.

Each node can request a signed reputation certificate from the central server indicating its current reputation score. A leecher presents the reputation certificate as part of its download request to a seeder. Each seeder in Credo designates a small number of upload slots to serve a few leechers at a time. When there are more download requests than upload slots, the seeder serves the leechers with the top reputation scores among all competing requests.
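The slot allocation rule can be sketched in a few lines of Python. The function name `pick_leechers`, the node identifiers and the scores are hypothetical, and verification of the server's signature on each reputation certificate is assumed to have happened already.

```python
import heapq

def pick_leechers(requests, upload_slots):
    """Return the requesters with the highest reputation scores.

    `requests` maps a leecher id to the score on its (already verified)
    reputation certificate.
    """
    return heapq.nlargest(upload_slots, requests, key=requests.get)

requests = {"n1": 4, "n2": -2, "n3": 9, "n4": 0}
served = pick_leechers(requests, upload_slots=2)
print(served)  # ['n3', 'n1']
```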

B. Credit-based reputation computation

The most interesting component of Credo is the algorithm used by the central server to compute node reputation based on the collection of aggregated upload receipts. As summarized earlier in Section III, the algorithm processes the upload graph by performing credit transfers between nodes. The algorithm resists Sybil and collusion attacks by filtering each node’s credit set according to modeled honest behavior and quantifying a node’s upload contribution by its credit diversity.

1) Credit transfer: The set of upload receipts forms a directed graph with weighted links. A cycle in the graph with the same weight on each link represents a fair exchange among those nodes. Therefore, the algorithm first prunes the graph by removing such cycles. For example, if there exist link A → B with weight 5, link B → C with weight 2, and link C → A with weight 2, the pruned graph only contains link A → B with weight 3. Since all cycles are removed, the pruned graph becomes a directed acyclic graph (DAG).
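One way to realize the pruning step is to repeatedly find a cycle and subtract its smallest link weight from every link on it. The sketch below follows that approach under stated assumptions: the text does not specify how cycles are found or in which order, so the depth-first search and the `links` dictionary layout are illustrative choices, and the result may depend on the order in which cycles are discovered.

```python
def find_cycle(graph):
    """Return nodes [v0, ..., vk] forming a cycle v0 -> ... -> vk -> v0, or None."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = {}
    stack = []

    def visit(u):
        # Recursive DFS; a very deep graph would need an iterative version.
        color[u] = GREY
        stack.append(u)
        for v in graph.get(u, {}):
            state = color.get(v, WHITE)
            if state == GREY:
                return stack[stack.index(v):]
            if state == WHITE:
                cycle = visit(v)
                if cycle:
                    return cycle
        stack.pop()
        color[u] = BLACK
        return None

    for node in list(graph):
        if color.get(node, WHITE) == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


def prune_cycles(links):
    """links: {(src, dst): weight}. Returns a new dict with cycles cancelled."""
    graph = {}
    for (src, dst), weight in links.items():
        if weight > 0:
            graph.setdefault(src, {})[dst] = weight
    while True:
        cycle = find_cycle(graph)
        if cycle is None:
            break
        edges = list(zip(cycle, cycle[1:] + cycle[:1]))
        common = min(graph[u][v] for u, v in edges)
        for u, v in edges:
            graph[u][v] -= common
            if graph[u][v] == 0:
                del graph[u][v]  # at least one link disappears per round
    return {(u, v): w for u, nbrs in graph.items() for v, w in nbrs.items()}
```

Each round removes at least one link, so the loop terminates with an acyclic graph.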

The algorithm processes the graph in topological sort order, i.e. a node is processed only after all of its predecessors have been processed. For example, nodes in the graph of Figure 2(a) are processed in the order C, B, D, A, and the processing order for Figure 2(b) is {E′, C}, E, B. All nodes initially have an empty credit set ($C = \emptyset$) and zero debits ($d = 0$). Each node is processed by examining all links pointing to it. To process link A → B of weight $w$, the algorithm first checks whether the number of credits in $C_A$ is more than $w$. If so, we subtract $w$ randomly chosen credits from $C_A$ and add them to $C_B$. Otherwise, the credits in $C_A$ are transferred and we make up for the difference $x$ by adding $x$ credits issued by A to $C_B$ and incrementing A’s debits by $x$. In Figure 2(a), C ends up with credit set $C_C = \emptyset$ and $d_C = 2$, and B has credit set $C_B = \emptyset$ and $d_B = 1$. In Figure 2(b), B ends up with credit set $C_B = \{E' : 10, C : 2\}$ and $d_B = 0$. The algorithm continues until it has processed all links in the DAG.
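The transfer pass can be sketched as below. This is an illustrative reading of the description rather than Credo's code: link A → B of weight w is treated as A paying w credits to B, credit sets are stored as `Counter` objects mapping issuer to count, and the seeded random generator is an assumption made for reproducibility.

```python
import random
from collections import Counter
from graphlib import TopologicalSorter  # Python 3.9+


def transfer_credits(links, rng=None):
    """links: {(payer, payee): weight} describing a DAG with integer weights.

    Returns (credits, debits): credits[n] maps issuer -> count for node n,
    debits[n] is the number of credits n had to issue itself.
    """
    rng = rng or random.Random(0)
    nodes = {n for link in links for n in link}
    credits = {n: Counter() for n in nodes}
    debits = {n: 0 for n in nodes}

    incoming = {n: [] for n in nodes}
    for (payer, payee), weight in links.items():
        incoming[payee].append((payer, weight))

    # TopologicalSorter takes node -> predecessors, so every payer has
    # received all of its credits before it pays anyone.
    order = TopologicalSorter({n: [p for p, _ in incoming[n]] for n in nodes})
    for payee in order.static_order():
        for payer, weight in incoming[payee]:
            held = list(credits[payer].elements())
            moved = rng.sample(held, min(weight, len(held)))
            credits[payer].subtract(moved)
            credits[payee].update(moved)
            shortfall = weight - len(moved)
            if shortfall:
                credits[payee][payer] += shortfall  # fresh credits issued by payer
                debits[payer] += shortfall
    # Unary plus drops issuers whose count fell to zero.
    return {n: +c for n, c in credits.items()}, debits
```

For the chain A → B (weight 3), B → C (weight 2), A issues three credits and ends with three debits, B passes two of them on to C, and B keeps one.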

After processing the graph, to calculate a node’s reputation score, the algorithm first filters its credit set $C$ to obtain a set of “good” credits, $C' \subset C$. It then computes the diversity of $C'$ to bound the maximum reputation gain of collusion and Sybil attacks. When calculating $diversity(C')$, we count no more than $\lambda$ credits from the same node. The final reputation score is calculated as $diversity(C') - d$. In the next section, we explain how the filtering step is performed to mitigate sustained collusion and Sybil attacks.
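The score itself is a short computation once the filtered credit set is known. A minimal sketch, assuming the filtered set is given as an issuer-to-count mapping. The value λ = 3 used in the usage note below is chosen for illustration only.

```python
from collections import Counter


def diversity(filtered_credits, lam):
    """Count at most `lam` credits per issuer."""
    return sum(min(count, lam) for count in filtered_credits.values())


def reputation(filtered_credits, debits, lam):
    """Reputation score: diversity(C') - d."""
    return diversity(filtered_credits, lam) - debits
```

With the credit set {E′: 10, C: 2} and λ = 3, only three of the ten credits from E′ count, so the diversity is 3 + 2 = 5.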

2) Credit filtering: The goal of credit filtering is to choose a subset of credits $C'$ such that the distribution of the debits for the issuers of credits in $C'$ approximates the overall debit distribution of honest nodes in the system.

Computing the overall debit distribution: Let $X$ be the random variable of the debit of a node chosen randomly among those nodes with positive debits. Since a few Sybil identities may skew the distribution of $X$ with extremely large debits, we use a truncated distribution of $X$ that excludes a small fraction ($\delta$) of nodes with the most debits. As a result, as long as the colluding Sybil identities do not exceed a $\delta$ fraction of all nodes, they cannot affect the measured distribution of $X$.

Let $Z$ be the random variable of the debit of the issuing node for a randomly chosen credit in the credit sets of all nodes. We model an honest node’s credit set as a collection of randomly chosen credits in the system. Thus, we expect the debit distribution corresponding to the collection of credits in an honest node’s credit set to approximate the distribution of $Z$.

We can derive the distribution of $Z$ from that of $X$ as follows:

$$\Pr(Z = x) = \frac{\Pr(X = x) \cdot x}{E(X)} \tag{2}$$

where the linear scale factor accounts for the fact that nodes with higher debits issue proportionally more credits.
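Equation 2 can be evaluated directly from a list of observed debits. The sketch below is illustrative: it assumes the truncation drops the top δ fraction of all nodes (one reading of the description), and the function names are not from Credo.

```python
from collections import Counter


def truncated_debits(debits, delta):
    """Keep positive debits, dropping the int(delta * n) largest values."""
    n_drop = int(delta * len(debits))
    positive = sorted(d for d in debits if d > 0)
    return positive[:max(0, len(positive) - n_drop)]


def x_distribution(samples):
    """Empirical Pr(X = x) over the truncated positive debits."""
    total = len(samples)
    return {x: count / total for x, count in Counter(samples).items()}


def z_distribution(pr_x):
    """Pr(Z = x) = Pr(X = x) * x / E(X), as in Equation 2."""
    mean = sum(x * p for x, p in pr_x.items())
    return {x: p * x / mean for x, p in pr_x.items()}
```

For debits [0, 0, 1, 1, 2, 4, 50, 100] and δ = 0.25, the two largest values are dropped, E(X) = 2, and the credit-weighted distribution shifts mass toward the issuer with debit 4.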

The algorithm represents the Z-distribution using a set of probability density bounds that correspond to $m$ bins, as shown in Figure 4. Let the range of the $i$-th bin be $[b_i, b_{i+1})$. The ranges of the bins are chosen so that the size of successive bins increases exponentially, i.e. $\frac{b_{i+1}}{b_i} = \gamma$, where $\gamma$ is a small constant bigger than 1. Our security bound for Credo’s collusion resilience depends on the choice of $\gamma$ (Section V). The probability density of the $i$-th bin is calculated as $\Pr(b_i \le Z < b_{i+1})$. Let $p_i = \frac{\Pr(b_i \le X < b_{i+1}) \cdot b_i}{E(X)}$. We can see that $p_i$ is a lower bound of $\Pr(b_i \le Z < b_{i+1})$, because every $x$ in the $i$-th bin is at least $b_i$:

$$\Pr(b_i \le Z < b_{i+1}) = \frac{\sum_{b_i \le x < b_{i+1}} \Pr(X = x) \cdot x}{E(X)} \ge \frac{b_i \cdot \sum_{b_i \le x < b_{i+1}} \Pr(X = x)}{E(X)} = p_i$$
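The bins and their lower bounds can be computed as in the following sketch, which takes Pr(X = x) as a dictionary. The function names and the choice of b_0 and γ in the usage note are illustrative assumptions.

```python
def bin_edges(b0, gamma, m):
    """Edges b_0 .. b_m with b_{i+1} / b_i = gamma."""
    return [b0 * gamma ** i for i in range(m + 1)]


def lower_bounds(pr_x, edges):
    """p_i = Pr(b_i <= X < b_{i+1}) * b_i / E(X) for each bin i."""
    mean = sum(x * p for x, p in pr_x.items())
    bounds = []
    for lo, hi in zip(edges, edges[1:]):
        mass = sum(p for x, p in pr_x.items() if lo <= x < hi)
        bounds.append(mass * lo / mean)
    return bounds


def bin_mass_z(pr_x, edges):
    """Exact Pr(b_i <= Z < b_{i+1}), for comparison with the bounds."""
    mean = sum(x * p for x, p in pr_x.items())
    return [
        sum(p * x for x, p in pr_x.items() if lo <= x < hi) / mean
        for lo, hi in zip(edges, edges[1:])
    ]
```

For example, with `edges = bin_edges(1, 2, 3)` and `pr_x = {1: 0.5, 2: 0.25, 3: 0.25}`, each entry of `lower_bounds(pr_x, edges)` is no larger than the matching entry of `bin_mass_z(pr_x, edges)`. A smaller γ narrows the bins and tightens the bounds.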

The set of lower bounds $p_i$ is used to ensure that the observed debit distribution for credits in the filtered credit set does not deviate too much from the overall Z-distribution. In particular, let $Z'$ be the random variable of the debit of the issuing node for a random credit in a filtered credit set $C'$; it must satisfy $\Pr(b_i \le Z' < b_{i+1}) \ge p_i$ for all $i$.

Fig. 4. Credo models the behavior of honest nodes with the Z-distribution ($Z$ is the random variable corresponding to the debit of the issuer of a randomly chosen credit). Credo represents the Z-distribution using $m$ bins with exponentially increasing sizes. [Plot: probability density $\Pr(Z = x)$ versus the debits of the issuer of a credit, with bin boundaries $b_0$ to $b_3$ and bounds $p_0$ to $p_2$.]

Fig. 5. Credo limits sustained collusion attacks using the $p_i$ test. In the example, the grey bars correspond to $C$ of an adversarial node. The white bars correspond to the resulting $C'$ that fits the Z-distribution (dotted lines). [Plot: number of credits in each bin versus the debits of the issuer of a credit in credit set $C$, before and after filtering.]

Filter credit set using Z-distribution: To extract a subset of credits $C'$ whose debit distribution matches the Z-distribution, the algorithm chooses credits for $C'$ so that the fraction of credits in the $i$-th bin is at least $p_i$. A credit is classified to the $i$-th bin if its issuer has a debit value within $[b_i, b_{i+1})$.

When the Sybils issue an abnormally large amount of credits, the credit set of an adversarial node contains an “unusually” large number of credits in bins with large $i$. This causes the fraction of credits in bins with small $i$ to fall below the required lower bound. The filtering step evicts credits in the bins with large $i$ in order to increase the fraction of credits in bins with smaller $i$. Figure 5 gives an example. As can be seen, the original credit set $C$ consists of many credits in bins corresponding to issuers with large debits (the high grey bar at the rightmost side). After filtering, the number of credits accepted in that range is significantly reduced (the white bar at the rightmost side) according to the expected lower bound for each bin (the dotted lines).

The filtering process proceeds as follows. Let $c_i$ be the number of credits classified to the $i$-th bin. When filtering the credit set to arrive at $C'$, the algorithm ensures that:

$$\frac{c_i}{|C'| \cdot p_i} \ge 1 \quad \forall i \in [0, m) \tag{3}$$

We start with the original credit set $C$ and check the test specified by Equation 3. If the test fails, we take the $i$-th bin with the highest value of $\frac{c_i}{|C'| \cdot p_i}$ and remove one random credit from it. We repeat this process until the test passes or no credits remain. The remainder forms the filtered set $C'$.
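The eviction loop can be sketched on per-bin counts alone, since the test in Equation 3 only depends on how many credits each bin holds. This is a simplified illustration: it tracks counts rather than individual credits (so the random choice within a bin is not modeled), and treating a bin with $p_i = 0$ as unboundedly over-represented is an assumption of the sketch.

```python
def filter_credit_counts(counts, bounds):
    """counts[i]: credits in bin i; bounds[i]: lower bound p_i.

    Evicts one credit at a time from the most over-represented bin until
    every bin holds at least a p_i fraction of the remaining credits.
    """
    counts = list(counts)
    while True:
        total = sum(counts)
        if total == 0 or all(c >= total * p for c, p in zip(counts, bounds)):
            return counts
        # Ratio c_i / (|C'| * p_i); empty bins can never be chosen.
        ratios = [
            (float("inf") if p == 0 else c / (total * p)) if c > 0 else -1.0
            for c, p in zip(counts, bounds)
        ]
        worst = ratios.index(max(ratios))
        counts[worst] -= 1
```

With bounds of 0.25 per bin, a credit set of [2, 2, 96] shrinks to [2, 2, 4]: the 92 surplus credits in the large-debit bin are evicted, while a balanced set such as [5, 5, 10] passes unchanged.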

V. SECURITY PROPERTIES

In this section, we present analysis results to quantify how Credo limits the maximum reputation score of adversarial nodes and prevents sustained collusion attacks. Colluding adversarial nodes exchange upload receipts issued by their Sybil identities (see example in Figure 3). We assume that colluders are self-interested individuals: they divide the Sybil-issued receipts among themselves so that each adversarial node benefits equally from the collusion. For simplicity of discussion, we only show the analysis for the scenario where adversarial nodes do not contribute any uploads to the system.

Observation 5.1: Suppose there are $k$ self-interested colluding adversarial nodes, each with $s$ Sybil identities. Credo limits the maximum reputation of an adversarial node to $\min\{\lambda \cdot k \cdot s,\ s \cdot \gamma \cdot \bar{d}\}$, where $\bar{d}$ is the average debits of an honest node. Moreover, the average number of data chunks that an adversarial node can download with the maximum reputation score is at most $s \cdot \gamma \cdot \bar{d}$.

Proof: Since an adversarial node has no upload contribution, the credits in its credit set belong to Sybil identities in the collusion group. Hence, the maximum credit diversity of $k$ colluding adversaries is bounded by $\lambda \cdot k \cdot s$.

Next, we prove the bound on the maximum number of credits in an adversarial node’s credit set that do not get filtered. Let $X'$ be the random variable of the debit of a Sybil identity and let $Z'$ be the random variable of the debit of the issuer (a Sybil identity) of a randomly chosen credit among the credits of adversarial nodes. We know that $\Pr(Z' = x) = \frac{\Pr(X' = x) \cdot x}{E(X')}$. Since adversarial nodes divide the upload receipts issued by Sybils among themselves, the debit distribution for each adversary’s credit set can be approximated by the overall distribution of $Z'$.

The filtering process ensures that the filtered credit set of an adversary passes the set of $p_i$ tests, i.e.:

$$p_i < \Pr(b_i \le Z' < b_{i+1}) < \frac{\Pr(b_i \le X' < b_{i+1}) \cdot b_{i+1}}{E(X')} \tag{4}$$

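Substituting $p_i = \frac{\Pr(b_i \le X < b_{i+1}) \cdot b_i}{E(X)}$ into Inequality 4 and rearranging, we obtain:

$$E(X') \le \frac{b_{i+1}}{b_i} \cdot E(X) \cdot \frac{\Pr(b_i \le X' < b_{i+1})}{\Pr(b_i \le X < b_{i+1})} = \gamma \cdot E(X) \cdot \frac{\Pr(b_i \le X' < b_{i+1})}{\Pr(b_i \le X < b_{i+1})} \tag{5}$$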

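In Inequality 5, $\gamma$ is determined by the number of chosen bins ($m$) such that $\gamma = \frac{b_{i+1}}{b_i}$ for all $i$. Moreover, Inequality 5 holds for all $i$. As the last step of simplification, we use the property that for any positive numbers $a, b_1, c_1, b_2, c_2$, if $a \le \frac{b_1}{c_1}$ and $a \le \frac{b_2}{c_2}$, then $a \le \frac{b_1 + b_2}{c_1 + c_2}$. Applying this observation to Inequality 5 for all $i$, and noting that the bin probabilities of $X'$ and of $X$ each sum to 1 over all bins, we obtain:

$$E(X') \le \gamma \cdot E(X) \tag{6}$$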


Because the total number of credits from $k \cdot s$ Sybil identities is $k \cdot s \cdot E(X')$, each adversarial node has at most $s \cdot E(X') \le s \cdot \gamma \cdot E(X)$ credits in its credit set. By definition, $E(X)$ is the expected debits of nodes after excluding the $\delta \cdot n$ nodes with the most debits, so $E(X) \le \bar{d}$, where $\bar{d}$ is the expected debits of an honest node with positive debit. Thus, each adversarial node has at most $s \cdot \gamma \cdot \bar{d}$ credits issued by the Sybils. This bound is independent of the collusion size ($k$). As a result, the reputation of an adversarial node will drop after $s \cdot \gamma \cdot \bar{d}$ downloads.

It is interesting to note that, for a given range $[b_0, b_m]$ of the distribution of $X$, the number of bins ($m$) uniquely determines $\gamma$. Specifically, $\gamma = \sqrt[m]{\frac{b_m}{b_0}}$. Therefore, $\gamma$ decreases as we increase the number of bins ($m$), improving the bound on sustained collusion attacks. However, when $m$ is too large, we also risk unnecessarily filtering out too many credits from an honest node’s credit set.
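The relation between $m$ and $\gamma$, and the bound of Observation 5.1, are simple enough to evaluate numerically. A minimal sketch follows. The function names are illustrative, and the parameter values in the usage note (λ = 3, γ = 2, an average honest debit of 1000) are hypothetical.

```python
def gamma_for_bins(b0, bm, m):
    """gamma = (b_m / b_0) ** (1 / m)."""
    return (bm / b0) ** (1.0 / m)


def max_adversary_reputation(lam, k, s, gamma, avg_honest_debit):
    """min(lambda * k * s, s * gamma * d_bar), as in Observation 5.1."""
    return min(lam * k * s, s * gamma * avg_honest_debit)
```

For a debit range of [1, 1024], ten bins give γ close to 2 and twenty bins give a smaller γ. With k = 200 adversaries and s = 2 Sybils each, the diversity term λ·k·s is the binding one under the hypothetical values above.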

VI. IMPLEMENTATION

We have built the central server and the client node implementation in Java.

Central server: Our server implementation consists of 3000+ lines of code. The implementation logs upload receipts received from client nodes to disk. Every 4 hours, the server reads all collected receipts from disk and invokes the reputation algorithm to compute node reputation. The reputation calculation algorithm employs 12 concurrent threads to achieve speedup on multicore machines.

Credo client: Our client implementation is based on the open-source Azureus BitTorrent implementation. A Credo client can engage in the original BitTorrent protocol with other BitTorrent clients. When two Credo clients first meet, they exchange CredoHandshake messages (a new message type that we added to the existing BitTorrent protocol). A CredoHandshake consists of the node’s public key certificate and its reputation score signed by the central server.

After finishing the handshake, if both peers are leechers, they use the normal BitTorrent protocol to exchange data blocks among themselves. Otherwise, the seeder chooses the leecher with the highest reputation to serve. Specifically, we modified the two functions calculateUnchokes and getImmediateUnchokes in SeedingUnchoker.java to pick the leecher with the highest reputation to unchoke.

Once a leecher is unchoked, it sends a CredoRequest message to request a set of data blocks. The total size of data blocks requested is less than 1MB (the default data size per upload receipt in Credo). The seeder uploads encrypted data to the leecher. After finishing downloading, the leecher sends a CredoPay message with the required upload receipt. Finally, the seeder sends back a CredoKey message that contains the decryption key for the data. Every hour, every seeder sends the receipts it received in the last hour to the central server.

VII. EVALUATION

This section evaluates whether Credo’s reputation scheme gives the right incentive for nodes to contribute in the face of collusion and Sybil attacks. Our key results are:

• Nodes with higher net contribution achieve faster downloads in Credo. Thus, Credo gives nodes an incentive to contribute as seeders.

• Colluders who do not upload any data to the system have bounded maximum reputation scores. Furthermore, as they download data from honest nodes, their reputation scores decrease.

• The implementation of Credo incurs reasonable traffic and computation overhead and can potentially scale to handle a large number of peer nodes.

We use a combination of simulations and experiments with our prototype implementation to demonstrate these results.

A. Simulations

1) Simulation setup: We simulate a network of 60000 nodes for a 3 month period. We set the upload speed limit of each node to 200KB per second. A node divides its upload capacity among 4 upload slots of 50KB per second each. The download speed of a node is 5 times its upload speed, i.e. 1MB per second. We control the upload contribution of a node by the willingness parameter. Whenever a seeder has a free upload slot, it decides to upload to some leecher with a probability proportional to its willingness. We set the willingness of nodes in our simulation to follow the distribution of upload capacity as measured in [19]. Our simulations set the receipt expiration period ($\tau$) to 3 weeks.

We inject 10 files, each of 200MB, into the network at the beginning. After that, a new file is injected when all nodes that want a particular file have finished downloading it. In other words, there are always 10 files being downloaded in the network. Not every node wants every file: when we inject a new file, we randomly choose 600 nodes to download the file. The probability that a node is chosen to download the file is proportional to its demand. We model two types of demand: 1) all nodes have identical demands, 2) the demand of a node follows the demand distribution observed in the Maze file sharing system [27]. We also choose 10 random nodes as the initial seeders.

2) Credo incentivizes contribution: More contribution leads to faster download: Figure 6 plots the download time as a function of a node’s net contribution. We measure a node’s net contribution during the last $\tau$ time period as the number of its uploaded chunks minus the number of its downloaded chunks. We record the net contribution and download time after a node finishes downloading a file. For the results shown in Figure 6, we grouped the net contribution of different nodes into bins of size 50, and computed the average and standard deviation of the download time in each bin.

Figure 6 shows that a node achieves faster download times when it has a higher net contribution, for both the Maze demand model and the identical demand model.


Fig. 6. Average download time as a function of a node’s net contribution. The average download time decreases as a node’s net contribution increases. [Plot: average download time (sec) versus average net contribution (chunks); series: Maze demand, identical demand.]

Fig. 7. Average node reputation as a function of a node’s net contribution. A node’s reputation score increases linearly as its net contribution increases. [Plot: average node reputation versus average net contribution (chunks); series: Maze demand.]

Fig. 8. Cumulative distribution of nodes’ average net contribution. A large fraction of nodes (0.7) have negative net contribution. [Plot: CDF versus average net contribution (chunks); series: Maze demand, identical demand.]

Fig. 9. Credit diversity as a function of credit set size. Credit diversity correlates strongly with credit set size for large credit sets (> 5000) and more weakly for nodes with smaller credit sets. [Plot: credit diversity versus size of credit set.]

Fig. 10. Reputation of colluding adversaries and honest nodes as a function of net contribution. Adversaries vary demand in different simulations. As demand increases, adversaries’ reputations decrease. [Plot: average reputation versus average net contribution (chunks); series: honest nodes, colluders.]

Fig. 11. Download time of colluding adversaries and honest nodes as a function of net contribution; colluding adversaries vary demand in different simulations. [Plot: average download time (sec) versus average net contribution (chunks); series: honest nodes, colluders.]

A node’s reputation captures its net contribution: A node achieves faster downloads when it has a higher reputation. Figure 7 shows a node’s reputation as a function of its net contribution. We recorded a node’s reputation score as well as its net contribution when it finished downloading a file. We computed the average net contribution and average reputation for each node and plot the two quantities in Figure 7. We can see that the reputation increases linearly as a node’s net contribution increases.

To further quantify how accurately Credo’s reputation score captures a node’s net contribution, we calculate the standard $A'$ metric, which is defined as the probability that the reputation of node A is greater than the reputation of node B given that A has a higher net contribution, for two random nodes A and B. In Figure 7, $A' = 0.95$, which shows that Credo’s reputation score accurately captures a node’s net contribution.
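The metric as defined here can be estimated by enumerating node pairs. The sketch below is an illustration under stated assumptions: it skips pairs with equal net contribution, counts reputation ties as disagreement, and enumerates all pairs, which is quadratic and would need sampling for a network of tens of thousands of nodes.

```python
from itertools import combinations


def a_prime(nodes):
    """nodes: list of (net_contribution, reputation) pairs.

    Returns the fraction of pairs in which the node with the higher net
    contribution also has the strictly higher reputation.
    """
    agree = comparable = 0
    for (c1, r1), (c2, r2) in combinations(nodes, 2):
        if c1 == c2:
            continue  # the metric conditions on one node contributing more
        comparable += 1
        hi_rep, lo_rep = (r1, r2) if c1 > c2 else (r2, r1)
        if hi_rep > lo_rep:
            agree += 1
    return agree / comparable if comparable else float("nan")
```

A reputation that ranks nodes exactly by net contribution yields 1.0, and a fully inverted ranking yields 0.0.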

Figure 7 is divided into two regions: negative net contribution on the left, and positive net contribution on the right. We observe that the derivative of the curve on the left is approximately 1. That is because nodes with negative net contribution rarely earn credits, i.e. $diversity(C') \approx 0$. The reputation of those nodes is essentially the negation of their debits. Figure 8 shows the cumulative distribution of nodes’ net contribution. We can see that more than 70% of nodes have negative net contribution in simulations with both the Maze and identical demand models.

Nodes with positive net contribution are rarely associated with positive debits. Their reputation is mainly the diversity of their credit sets. The derivative of the curve in the positive contribution region of Figure 7 is only slightly smaller than 1. This shows that filtering credits and measuring credit diversity do not significantly underestimate the net contribution of honest nodes.

We also take a closer look at the reputation scores of nodes with positive net contributions. Figure 9 shows credit diversity as a function of the size of the credit set at the end of the simulation. We observe that credit diversity is close to the size of the credit set when the size is greater than 5000. This means that credit filtering has little negative effect on the credit sets of honest nodes with enough upload contribution. When the credit set size is smaller than 5000, there are some sets whose credit diversity is much smaller than their actual size. This is because when the credit set is small, it is difficult to approximate the Z-distribution. As a result, many credits are filtered out.

3) Credo’s defense against colluders is robust: We evaluate how Credo performs under the collusion attack by designating 200 nodes as adversarial nodes. Each adversarial node controls 2 Sybil nodes. They collude with each other to form a collusion group of size 600, i.e. 1% of the system. In each $\tau$ interval, the Sybil nodes issue upload receipts to optimize the amount of credits that can pass the filtering step and achieve the bound in Observation 5.1 (Section V). The credits are divided equally among adversarial nodes. Adversaries use those credits to download files. We also vary the number of files that adversaries want to download in different simulations.

In Figure 10, we plot the reputation of the adversarial nodes as well as honest nodes as a function of their net contributions. Because the adversarial nodes never upload to other nodes, their net contribution is always negative. Their maximum reputation score is 1200 because there are 400 colluding Sybil nodes. As their download demand increases, adversarial nodes require Sybils to issue more and more upload receipts, which are eventually filtered. As a result, their reputation scores decrease.

We plot the download time as a function of the net contribution for adversarial nodes as well as honest nodes in Figure 11. As expected, the download time increases as adversarial nodes’ demand increases, i.e. as their net contribution decreases.

4) Credo vs Eigenvector-based reputation: An alternative way to compute nodes’ reputation at the central server is to find the eigenvector of the contribution graph induced from upload receipts. This eigenvector-based reputation was originally designed for web ranking [2] but has been used to compute node reputations in P2P systems [14], [9]. However, the reputation score of the eigenvector-based scheme, called the EB score, is not designed to capture the net contribution of a node. We show that when used in a P2P CDN, EB scores do not accurately capture a node’s net contribution and are also more vulnerable to collusion than Credo.

Fig. 12. Eigenvector-based (EB) reputation of colluding adversaries and honest nodes as a function of net contribution on the Maze workload. Adversaries who are in the bottom 0.2% of net contribution can get into the top 2% of reputation by colluding. EB reputation does not capture net contribution well among honest nodes ($A' = 0.71$). [Plot: EB reputation versus net contribution (1000 chunks); series: honest nodes, colluders.]

For a real-world workload, we use a Maze trace collected in December 2005, which records two weeks of upload and download activities of nodes in the system. A node A that uploads $w$ MB to node B is represented by a directed edge from B to A in a graph $G$ with weight $w$. We compute EB reputation scores using the standard PageRank method. We vary the probability of resetting a random walk, $\epsilon$, and find that $\epsilon = 0.15$ produces the best balance between defending against collusion and capturing net contribution. To model the attack, we choose the 50 nodes that have uploaded at least 50MB and have the lowest net contribution as adversaries. Each adversary introduces 5 more Sybils to the graph, i.e. they form a collusion group of 300 nodes. They collude by having the 50 adversaries create high-weight (1,000,000) edges to each other. Each pair of an adversary and a Sybil also creates a pair of high-weight directed edges. Figure 12 shows the EB reputation of honest nodes and adversaries using the Maze workload. Each point in the graph represents a node in the system. As we can see, EB reputation does not capture net contribution well ($A' = 0.71$). Moreover, it has very poor defense against collusion. The 50 adversaries we choose are among the bottom 0.2% in terms of net contribution, but they can get into the top 2% in terms of reputation by colluding.
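To make the vulnerability concrete, the following is a small weighted PageRank by power iteration, with edges pointing from downloader to uploader as in the graph described above. It is a generic textbook formulation, not the code used for the experiments. The handling of nodes without outgoing edges (their score is spread uniformly) and the tiny example graphs in the usage note are assumptions.

```python
def pagerank(edges, reset=0.15, iterations=100):
    """edges: {(src, dst): weight}; returns node -> score, summing to 1."""
    nodes = sorted({n for edge in edges for n in edge})
    out_weight = {n: 0.0 for n in nodes}
    for (src, _), weight in edges.items():
        out_weight[src] += weight
    score = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        # Score held by nodes with no outgoing edges is spread uniformly.
        dangling = sum(score[n] for n in nodes if out_weight[n] == 0)
        nxt = {n: (reset + (1 - reset) * dangling) / len(nodes) for n in nodes}
        for (src, dst), weight in edges.items():
            nxt[dst] += (1 - reset) * score[src] * weight / out_weight[src]
        score = nxt
    return score
```

In a toy graph where b, c, and x each download from a, node a ranks above x. If x adds a Sybil y and the two exchange edges of weight 1,000,000, most of the random walk's mass is trapped between x and y, and x overtakes a without uploading anything.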

To compute Credo reputation, we feed the central server with receipts generated from the Maze trace. The same 50 adversaries collude by having the 250 Sybils issue credits in an optimal way, assuming that they know the Z-distribution. Figure 13 shows the reputation of honest nodes and adversaries as a function of net contribution. As we can see, Credo captures net contribution well ($A' = 0.96$). We notice that the reputation of nodes that have a net contribution greater than 5,000 is much smaller than the size of their credit sets. That is because these nodes upload a huge amount of data (5GB) in 5 days, and they upload much more than 3MB to other nodes. The diversity technique penalizes these nodes’ reputation because of their repeated interaction with other nodes. Nevertheless, these nodes still remain top-reputation nodes. This graph also shows that Credo’s defense against collusion is better than that of EB. The 50 adversaries remain in the bottom 22% in terms of reputation even after colluding.

We also do the same comparison on a synthetic workload from a 1000-node network. To generate the workload, we create upload receipts in which the downloader is chosen randomly and the uploader is picked according to the nodes’ willingness to contribute. We set the willingness to increase linearly with nodes’ ID. We chose 10 adversaries that uploaded at least 10 chunks among those which have the worst net contribution. Each adversary brings in 2 Sybils. The attack strategy is the same as in the previous experiment. Figure 14 shows the reputation of honest nodes and adversaries in Credo and EB reputation. We scale the EB reputation so that it fits into one graph with Credo reputation. Since the synthetic workload has less repeated interaction and is more uniform, both reputations capture net contribution better than in the Maze trace. Still, Credo reputation is better than eigenvector-based reputation ($A' = 0.998$ vs $A' = 0.898$). Credo also has better defense against collusion. The adversaries remain in the bottom 30% in Credo reputation, while they can get to the top 1% in EB reputation.

Fig. 13. Credo reputation of colluding adversaries and honest nodes as a function of net contribution on the Maze workload. Adversaries who are in the bottom 0.2% of net contribution can only get in the top 80-60% of reputation by colluding. Credo reputation captures net contribution well among honest nodes ($A' = 0.96$). [Plot: Credo reputation (1000) versus net contribution (1000 chunks); series: honest nodes, colluders.]

Fig. 14. Reputation of colluding adversaries and honest nodes as a function of net contribution on the synthetic workload for both Credo and eigenvector-based reputation. Credo outperforms eigenvector-based reputation in both capturing net contribution and defending against collusion. [Plot: reputation (100) versus net contribution (100 chunks); series: Credo (honest nodes), Credo (colluders), PageRank (honest nodes), PageRank (colluders).]

B. Experiments using Credo’s prototype

1) Scalability of the central server: The Credo central server needs to receive upload receipts from all nodes, verify them, and compute nodes’ reputation. We show that our implementation running on an 8-core machine (2.27GHz CPUs and 16GB memory) can easily handle the traffic of 260,000 nodes in the Maze network.

Bandwidth: We show that the bandwidth required for the central server to receive upload receipts is modest. The aggregate traffic among Maze nodes in 2 weeks is 280TB. A fraction of this data traffic is between leechers only, which results in neither upload receipts nor receipt upload traffic to the central server. In typical public and private BitTorrent communities, more than 80% of the traffic is between seeders and leechers [16]. Even if all of the traffic is upload from seeders to leechers, the central server receives only 1.4GB of uploaded receipts per day (each receipt is 70 bytes in size and captures a 1MB chunk transfer). This means the central server only needs a modest download capacity of 130Kbps to receive upload receipts.

Storage: After verifying an upload receipt, the central server stores it as a triple ⟨SHA1(data), uploaderID, downloaderID⟩, which requires 28 bytes on disk. Therefore, it requires 8GB of disk space to store 2 weeks of upload activities in order to compute reputation.

CPU: The central server needs to verify a fraction of the upload receipts it receives. Our implementation can verify 400 receipts per second using a single thread, i.e. 35 million receipts per day. This means that a single-threaded verifier can handle data traffic that is 17.5 times bigger than observed in the Maze system.
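These estimates can be reproduced with a few lines of arithmetic. The sketch below uses decimal units (1TB = 10^12 bytes) and assumes the 10% verification sampling rate given earlier as an example. Both are assumptions about how the figures were derived.

```python
TB = 10 ** 12
MB = 10 ** 6

traffic_bytes = 280 * TB      # Maze traffic over two weeks
days = 14
chunk_bytes = 1 * MB          # one receipt per 1MB chunk
receipt_bytes = 70            # size of a receipt on the wire
stored_bytes = 28             # size of a stored triple
verify_rate = 400             # receipts verified per second, one thread
sample_fraction = 0.10        # assumed share of receipts that are verified

receipts_total = traffic_bytes // chunk_bytes
receipts_per_day = receipts_total // days

# Bandwidth needed to receive receipts, in bytes per day and kilobits per second.
upload_bytes_per_day = receipts_per_day * receipt_bytes
upload_kbps = upload_bytes_per_day * 8 / 86400 / 1000

# Disk space for two weeks of stored triples.
storage_bytes = receipts_total * stored_bytes

# Verification capacity relative to the sampled Maze load.
verified_per_day = verify_rate * 86400
headroom = verified_per_day / (receipts_per_day * sample_fraction)
```

The results land at 1.4GB of receipts per day, roughly 130Kbps, just under 8GB of storage, and a verification headroom slightly above 17, which matches the stated 17.5 once 34.56 million is rounded to 35 million receipts per day.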

The central server also needs to compute the reputation score for every node. Our current implementation takes 1 hour for the central server to process the upload receipts corresponding to 2 weeks of traffic in the Maze system. Our default period for updating reputation is 4 hours. Within that amount of time, our central server can compute reputation for a network of 1,000,000 nodes whose traffic demand is similar to that observed in the Maze system.

2) Deployment on PlanetLab: To examine the real performance benefits when nodes are incentivized to contribute, we compare two scenarios:

1) Every node runs the Credo client implementation. Nodes are incentivized to stay online and serve other nodes in order to gain reputation after downloading a file.

2) Every node runs the original Azureus client implementation. Since there is no incentive to stay online after downloading a file, nodes go offline immediately after finishing downloading the file.

We experiment with both scenarios on 210 PlanetLab nodes. We inject a file to one seeder at the beginning. We set the file size to 25MB to make the experiment finish in a reasonable time (< 2 hours). Other nodes arrive to download the file one at a time, every 15 seconds. We set the application-level throughput limits using the distribution in [19]. The download throughput limit is 5 times larger than the upload throughput limit for every node, in order to capture the asymmetry of upload and download throughput in wide area networks.

Figure 15 plots the cumulative distribution of complete download time for each node in both scenarios. We observe that both the average and the median download time improve significantly when nodes are incentivized to stay online in scenario 1. The average download time drops from 935 seconds (scenario 2) to 347 seconds (scenario 1). The median download time also drops from 638 seconds to 172 seconds.

Fig. 15. Cumulative distribution of download time in two scenarios: 1) nodes are incentivized by the Credo protocol to stay online and continue uploading to others after downloading a file, 2) nodes go offline immediately after getting a file. The average and median download times in scenario 1 are significantly smaller than those of scenario 2. [Plot: CDF versus completion time (sec); series: scenario 1, scenario 2.]

This result shows that the aggregate capacity of the system improves by a factor of 2.7 when nodes are incentivized to contribute. The reason is that, because the download capacity of nodes is higher than their upload capacity, download capacities are always under-utilized when there are not enough seeders. There is only 1 seeder at any instant in scenario 2. On the other hand, in scenario 1, after some nodes finish downloading the file, they contribute to the aggregate seeding capacity of the system. Other nodes can utilize their high download capacities to get the file faster.

VIII. RELATED WORK

We survey existing proposals for P2P reputation and currency systems and discuss why they do not completely address the seeder promotion problem. We also discuss existing work on catching misbehaving nodes.

Reputation systems. Reputation systems incentivize peers to upload in order to gain reputation; peers with high reputation values are promised better download performance in the future [9], [6], [14], [20]. Most existing reputation systems are graph-based, where each graph edge is formed between a pair of nodes that have had direct interactions. EigenTrust [9] uses a PageRank-style [2] propagation algorithm on the interaction graph. To reduce the chances of collusion in PageRank calculation [29], OneHop reputation [20] and multi-level tit-for-tat [14] restrict reputation propagation to one hop or a few hops. Max-flow based calculation [6], [3] can be used to defend against collusion.

Graph-based reputation schemes such as EigenTrust are not designed to capture a node’s net contribution. In particular, a node linking to another node with higher reputation achieves a higher reputation gain. Thus, a strategic peer can gain an unfair advantage by selectively contributing to certain peers. As a result, graph-based reputations do not satisfy the desired fairness property. Moreover, the defense against collusion of existing graph-based reputation systems is weak. For example, colluders can increase the net reputation of the collusion group by 1/ε, where ε is the probability of resetting a random walk, in a PageRank-style reputation computation like EigenTrust [9] or multi-level tit-for-tat [14]. In max-flow based reputation systems like [6], [3], colluders can get higher reputation by recruiting into the collusion group more nodes that have interactions with nodes outside the group. This gives honest nodes an incentive to collude. However, graph-based reputation can be particularly advantageous in a live streaming environment where nodes with different capacities should be incentivized to place themselves at different positions in the underlying stream distribution graph, as is done in the Contracts system [21].
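To make the collusion weakness of random-walk reputations concrete, the following minimal sketch runs a generic PageRank-style power iteration with reset probability ε on a four-node graph. It is an illustration only, not the exact algorithm of [9] or [14]; the function name `pagerank`, the graph, and the value ε = 0.15 are assumptions chosen for the example.

```python
def pagerank(out_links, eps=0.15, iters=200):
    """Generic PageRank-style power iteration (illustrative assumption).

    out_links[i] lists the nodes that node i vouches for; every node is
    assumed to have at least one out-link. With probability eps the random
    walk resets to a uniformly random node.
    """
    n = len(out_links)
    score = [1.0 / n] * n
    for _ in range(iters):
        nxt = [eps / n] * n
        for src, targets in enumerate(out_links):
            share = (1.0 - eps) * score[src] / len(targets)
            for dst in targets:
                nxt[dst] += share
        score = nxt
    return score


# Nodes 0 and 1 are honest; nodes 2 and 3 are the potential colluders.
everyone_honest = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]
# Colluders vouch only for each other, so walk mass that enters the
# group leaves it only through a reset.
with_collusion = [[1, 2, 3], [0, 2, 3], [3], [2]]

honest_total = sum(pagerank(everyone_honest)[2:])
collusion_total = sum(pagerank(with_collusion)[2:])
print(f"colluders' share: honest={honest_total:.3f}, colluding={collusion_total:.3f}")
```

In this toy graph the two colluders hold half of the total score when they behave like everyone else, and a much larger share once they link only to each other, without contributing anything more to honest nodes.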

Another way to design reputation is to use the upload entropy metric described in [15]. The upload entropy metric and the credit diversity technique in Credo share the same insight, which is used to separate honest nodes from a single adversarial node: honest nodes interact with a diverse set of nodes, while a single adversarial node only interacts with a small number of Sybils. Thus, the upload entropy metric shares the same weakness as credit diversity: it cannot protect the system when many adversarial nodes collude.
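The diversity insight can be sketched with a small example. The exact definition of upload entropy in [15] may differ; the code below assumes, for illustration only, that it is the Shannon entropy of the fraction of a node’s uploaded bytes received by each peer. The function name `upload_entropy` and the peer names are hypothetical.

```python
import math


def upload_entropy(uploads):
    """Shannon entropy (in bits) of a node's upload distribution.

    uploads maps a receiver id to the amount uploaded to that receiver.
    This definition is an assumption made for illustration.
    """
    total = sum(uploads.values())
    if total <= 0:
        return 0.0
    entropy = 0.0
    for amount in uploads.values():
        if amount > 0:
            p = amount / total
            entropy -= p * math.log2(p)
    return entropy


# An honest node uploads evenly to 8 distinct peers.
honest = upload_entropy({f"peer{i}": 10 for i in range(8)})
# A single adversary uploads only to its 2 Sybil identities.
lone_attacker = upload_entropy({"sybil0": 40, "sybil1": 40})
# One member of a 9-node collusion group uploads evenly to the 8 others.
group_member = upload_entropy({f"colluder{i}": 10 for i in range(8)})

print(honest, lone_attacker, group_member)
```

Under this assumed definition the lone attacker scores well below the honest node, but a member of a large collusion group is indistinguishable from the honest node, which is the weakness noted above.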

Currency systems. Currency systems must maintain system liquidity according to the current demand and hoarding levels [10]. Karma [25] scales currencies based on the number of active nodes. Antfarm [18] adjusts the amount of tokens based on the number of active nodes and active swarms.

Currency systems such as Dandelion [23], BitStore [22], PACE [1], Antfarm [18] and others [26], [25] incentivize peers to upload to others in exchange for tokens that entitle them to future downloads. The tokens can be directly transferable among peers [26], reclaimed by the central party upon each use [18], [26], or they can reside entirely at the central party, as in Dandelion [23]. Credo is not a currency system: although it uses the concept of credit transfer, it uses credits only to calculate each node’s reputation score. Existing currency system proposals enforce a strict “download as much as you upload” policy in which a node with no currency is not allowed to download. Compared to our notion of fairness, the policy enforced by currency systems is less desirable: all peers achieve the same download speed as long as they have non-zero currency tokens, so a peer has no incentive to contribute more than what is necessary to satisfy its own demand. Moreover, currency systems face the daunting challenge of maintaining the monetary supply according to current demand and hoarding levels at all times [10] to avoid undesired inflation or the bankruptcy of many nodes.

Detecting and punishing deviant behaviors. It is not enough to just incentivize contribution; we also need disincentives to discourage peers from cheating. BAR Gossip [13] and FlightPath [12] introduce the idea of centralized punishments based on cryptographic proof-of-misbehavior and use it to ensure the fair exchange of data within a single swarm. In the context of peer-to-peer storage systems, Ngan et al. [17] and Samsara [5] rely on verifiable records and periodic auditing to check that peers indeed store data as they claim. SHARP [8] also audits to ensure that an autonomous system complies with its resource contract.

Credo does not need general-purpose auditing because credits are only used to calculate a peer’s reputation instead of serving as resource claims. We borrow the idea of cryptographic proof-of-misbehavior from BAR Gossip [13] to detect and penalize misbehaving nodes.

IX. CONCLUSION

This paper describes the design of Credo, a robust credit-based reputation system that addresses the seeder promotion problem in P2P. Credo reputation captures a node’s true net contribution and is robust in the face of Sybil attacks and collusion.

REFERENCES

[1] APERJIS, C., FREEDMAN, M. J., AND JOHARI, R. Peer-assisted content distribution with prices. In CoNext (2008).

[2] BRIN, S., AND PAGE, L. The anatomy of a large-scale hypertextual web search engine. In WWW (1998).

[3] CHENG, A., AND FRIEDMAN, E. Sybilproof reputation mechanisms. In P2PECON (2005).

[4] COHEN, B. Incentives build robustness in bittorrent. In Economics of Peer-to-Peer Systems (2003).

[5] COX, L. P., AND NOBLE, B. Samsara: Honor among thieves in peer-to-peer storage. In 19th ACM Symposium on Operating Systems Principles (SOSP) (2003).

[6] FELDMAN, M., LAI, K., STOICA, I., AND CHUANG, J. Robust incentive techniques for peer-to-peer networks. In Electronic Commerce (2004).

[7] FILENETWORKS, T. Vuze and deluge to be banned on what.cd and waffles.fm. http://filenetworks.blogspot.com/2010/02/vuze-and-deluge-to-be-banned-on-whatcd.html.

[8] FU, Y., CHASE, J. S., CHUN, B., SCHWAB, S., AND VAHDAT, A. Sharp: An architecture for secure resource peering. In ACM SOSP (2003).

[9] KAMVAR, S. D., SCHLOSSER, M. T., AND GARCIA-MOLINA, H. The eigentrust algorithm for reputation management in p2p networks. In WWW (2003).

[10] KASH, I., FRIEDMAN, E., AND HALPERN, J. Optimizing scrip systems: Efficiency, crashes, hoarders, and altruists. In Electronic Commerce (2007).

[11] LEVIN, D., LACURTS, K., SPRING, N., AND BHATTACHARJEE, B. Bittorrent is an auction: analyzing and improving bittorrent’s incentives. In SIGCOMM (2008).

[12] LI, H., CLEMENT, A., MARCHETTI, M., KAPRITSOS, M., ROBISON, L., ALVISI, L., AND DAHLIN, M. Flightpath: Obedience vs. choice in cooperative services. In OSDI (2008).

[13] LI, H., CLEMENT, A., WONG, E., NAPPER, J., ROY, I., ALVISI, L., AND DAHLIN, M. Bar gossip. In OSDI (2006).

[14] LIAN, Q., PENG, Y., YANG, M., ZHANG, Z., DAI, Y., AND LI, X. Robust incentives via multi-level tit-for-tat. In IPTPS (2006).

[15] LIU, Z., DHUNGEL, P., WU, D., ZHANG, C., AND ROSS, K. W. Understanding and improving ratio incentives in private communities. In ICDCS (2010).

[16] MEULPOLDER, J., ACUNTO, L. D., CAPOTA, M., WOJCIECHOWSKI, M., POUWELSE, J., EPEMA, D., AND SIPS, H. Public and private bittorrent communities: A measurement study. In IPTPS (2010).

[17] NGAN, T.-W., WALLACH, D., AND DRUSCHEL, P. Enforcing fair sharing of peer-to-peer resources. In IPTPS (2003).

[18] PETERSON, R. S., AND SIRER, E. G. Antfarm: Efficient content distribution with managed swarms. In NSDI (2009).

[19] PIATEK, M., ISDAL, T., ANDERSON, T., KRISHNAMURTHY, A., AND VENKATARAMANI, A. Do incentives build robustness in bittorrent? In NSDI (2007).

[20] PIATEK, M., ISDAL, T., KRISHNAMURTHY, A., AND ANDERSON, T. One hop reputations for peer to peer file sharing workloads. In NSDI (2008).

[21] PIATEK, M., KRISHNAMURTHY, A., VENKATARAMANI, A., YANG, R., AND ZHANG, D. Contracts: Practical contribution incentives for p2p live streaming. In NSDI (2010).

[22] RAMACHANDRAN, A., DAS SARMA, A., AND FEAMSTER, N. Bitstore: An incentive-compatible solution for blocked downloads in bittorrent. In NetEcon (2007).

[23] SIRIVIANOS, M., PARK, J. H., YANG, X., AND JARECKI, S. Dandelion: Cooperative content distribution with robust incentives. In USENIX ATC (2007).

[24] TRAN, N., LI, J., SUBRAMANIAN, L., AND CHOW, S. S. Optimal sybil-resilient node admission control. In INFOCOM (2011).

[25] VISHNUMURTHY, V., CHANDRAKUMAR, S., AND SIRER, E. Karma: A secure economic framework for peer-to-peer resource sharing. In P2P-ECON (2003).

[26] YANG, B., AND GARCIA-MOLINA, H. Ppay: micropayments for peer-to-peer systems. In CCS (2003).

[27] YANG, M., CHEN, H., ZHAO, B. Y., DAI, Y., AND ZHANG, Z. Deployment of a large-scale peer-to-peer social network. In USENIX WORLDS (2004).

[28] YU, H., GIBBONS, P., KAMINSKY, M., AND XIAO, F. Sybillimit: A near-optimal social network defense against sybil attacks. In IEEE Symposium on Security and Privacy (2008).

[29] ZHANG, H., GOEL, A., GOVINDAN, R., AND MASON, K. Making eigenvector-based reputation systems robust to collusions. In Proc. of the Third Workshop on Algorithms and Models for the Web Graph (2004).