
Assessing the authenticity of Distributed Hash Table entries in Peer-to-Peer networks

Tarik El Yassem & Jop van der Lelie
System and Network Engineering (MSc)

(Dated: April 2, 2012)

This paper describes two attacks against the Mainline Distributed Hash Table that can be leveraged to inject false information into BitTorrent monitoring systems. The attacks show that information in the DHT might be falsified and is therefore not reliable enough to serve as proof. The only way a monitoring system can reliably assess whether a certain node is really offering a file for download is to connect to the suspect system and attempt to download the file.

I. INTRODUCTION

BitTorrent is a popular filesharing protocol which makes use of peer-to-peer technology. In the classic BitTorrent implementation a peer creates a torrent file containing a list of files that it wants to share and starts to seed it. Other peers will then download small pieces of the files listed in the torrent from the "seeder" and also start sharing the pieces they already have. In order to know which peers to connect to, a tracker keeps a list of a "swarm" of peers who are downloading a certain torrent. Because each peer first has to contact the tracker to retrieve the other peers of the swarm, this is called a centralized peer-to-peer architecture. Mostly because of legal action against these trackers[1], such architectures are moving towards a decentralized peer-to-peer architecture. In such an architecture no centralized tracker is used and peers use a deterministic procedure to construct an overlay network in which each peer acts as a small tracker. The most common procedure used for this is a Distributed Hash Table (DHT). Inside the overlay network that is constructed, peers query each other to retrieve the list of peers connected to the swarm of a torrent. Recent developments such as the website youhavedownloaded.com, which supposedly shows the download history of BitTorrent users, arouse interest in the workings and reliability of distributed hash tables from a forensics perspective. Can information that is retrieved from decentralized peer-to-peer networks using a DHT be trusted, and to what extent?

II. THEORY

A. BitTorrent

BitTorrent is a popular peer-to-peer filesharing protocol used to distribute large files across the internet. It works with metainfo files (torrents), which are bencoded dictionaries containing an announce section and an info section[2]. The announce section contains the URL of the tracker responsible for this torrent, and the info section has information about the files associated with this torrent. This information consists of the name, size and a SHA-1 hash of the file(s). Because each file is split into a number of smaller pieces, the size and total number of the pieces are also included in the torrent. In order to download files using BitTorrent, a client has to perform the following three steps:

1. Content discovery: retrieving the torrent

2. Peer discovery: get peer list of the torrent

3. Data exchange: exchanging data with the peers

For the content discovery a client downloads a torrent file, most often from a website, and generates a 160-bit SHA-1 "infohash" of the info section of the torrent. It also randomly generates its own 160-bit peer-ID. The client then queries the tracker URL from the metainfo file with the infohash, its peer-ID, the client's IP and TCP port, the total amount uploaded and downloaded so far, and the number of bytes it still needs to download before all files are complete. The tracker responds with a list of peers associated with the torrent, including the peer-ID, IP and TCP port of each peer. The client itself is also added to the peer list on the tracker, and peers can exchange pieces inside the swarm of peers. Each peer will automatically share the pieces it already has with others in the swarm. This achieves a much lower cost for the content provider of the original files. One of the disadvantages of BitTorrent is the dependency on the tracker, which acts as a centralized entity and therefore does not scale. The tracker is also a single point of failure and is often the target of law enforcement and copyright protection agencies. To make BitTorrent a fully decentralized peer-to-peer network, the role of the tracker may be delegated to the peers by using a Distributed Hash Table (DHT). The peer discovery is then done on the overlay network itself instead of by the tracker. Most BitTorrent clients implement both ways of peer discovery and the user can choose which to use.
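The infohash computation described above can be illustrated with a minimal sketch. The tiny `bencode` helper and the contents of `example_info` are assumptions made for illustration only; a real client hashes the exact bytes of the info section as they appear in the torrent file, which matches a re-encoding only when the file is canonically bencoded.

```python
import hashlib


def bencode(value):
    """Encode ints, byte/text strings, lists and dicts using bencoding."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda item: item[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def infohash(info):
    """Return the 20-byte (160-bit) SHA-1 digest of the bencoded info section."""
    return hashlib.sha1(bencode(info)).digest()


# Hypothetical single-file torrent; all values are made up for illustration.
example_info = {
    "name": "example.txt",
    "length": 12,
    "piece length": 16384,
    "pieces": b"\x00" * 20,  # placeholder for the concatenated piece hashes
}
example_infohash = infohash(example_info)
```

Because the infohash depends only on the info section, two torrents with different tracker URLs but identical info sections map to the same key.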

B. Distributed Hash Table

A Distributed Hash Table (DHT) provides a lookup service for all nodes in the network, similar to hash table lookups. Given a certain key, each node can query the DHT of any other node to retrieve the associated value. Each node is responsible for maintaining the mapping between certain keys and values, which allows DHT-based networks to scale very well. There are many services that make use of a DHT, such as distributed file systems, domain name services, instant messaging and peer-to-peer filesharing. One DHT design used for peer-to-peer filesharing is called Kademlia, in which no tracker is used to coordinate the downloads on the network.

C. Kademlia

Kademlia[3] is an implementation of a peer-to-peer Distributed Hash Table (DHT). It works by assigning each node a 160-bit identifier (ID) which is supposed to be unique and randomly chosen. Every node stores <key, value> pairs of nodes that have an ID which is "close" to its own ID. A lookup algorithm can be used to look up nodes "closer" to any desired ID.

In Kademlia, both nodes and keys are identified by a 160-bit identifier, and the whole protocol relies on the distance between two identifiers. Given two identifiers x and y, Kademlia defines the distance between them as their bitwise exclusive or (XOR) interpreted as an integer, d(x, y) = x ⊕ y. Note that the XOR metric is symmetric, so d(x, y) = d(y, x).
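As a small worked example of the XOR metric, the sketch below interprets two identifiers as big-endian integers and XORs them. The function name `xor_distance` and the one-byte identifiers are illustrative assumptions; real identifiers are 20 bytes long.

```python
def xor_distance(x: bytes, y: bytes) -> int:
    """Kademlia distance: bitwise XOR of two IDs, interpreted as an integer."""
    if len(x) != len(y):
        raise ValueError("identifiers must have the same length")
    return int.from_bytes(x, "big") ^ int.from_bytes(y, "big")


# Toy one-byte identifiers: 0b101 XOR 0b011 = 0b110 = 6.
d = xor_distance(b"\x05", b"\x03")
```

An identifier has distance 0 only to itself, and swapping the arguments gives the same result, which is the symmetry noted above.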

Each node keeps lists of recently contacted nodes called the k-buckets. Inside each k-bucket the node keeps a list of <IP, port, ID> triples for nodes with a distance between 2^i and 2^(i+1) from itself, where 0 ≤ i < 160. The k-buckets are kept ordered with the last seen node on top and can grow to a maximum size k.
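The bucket a contact belongs to follows directly from the distance range: if 2^i ≤ d < 2^(i+1), then i is the position of the highest set bit of d. A minimal sketch, with identifiers given as integers and the function name `bucket_index` chosen here for illustration:

```python
def bucket_index(own_id: int, other_id: int) -> int:
    """Return i such that 2**i <= (own_id XOR other_id) < 2**(i + 1)."""
    distance = own_id ^ other_id
    if distance == 0:
        raise ValueError("a node does not store itself in a k-bucket")
    # bit_length() is the position of the highest set bit, counted from 1.
    return distance.bit_length() - 1
```

For 160-bit identifiers the result ranges from 0 (the IDs differ only in the last bit) to 159 (the IDs differ in the first bit).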

When a node wants to find the k closest nodes to a given ID, it recursively queries α nodes from its closest non-empty k-bucket. Each queried node responds with the k closest nodes it knows of, so the querying node can update its own k-buckets. This process is repeated until no closer nodes are found. This is called a node lookup and is the most important procedure a participant can undertake in Kademlia. The procedure ensures that only O(log n) of the n nodes in the system are queried during a node lookup.
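The lookup loop can be sketched as follows. This is a simplified, sequential illustration and not the algorithm of any particular client: identifiers are small integers, `query` is a hypothetical callback standing in for a network request, and the toy network (16 nodes that each know the four nodes differing from them in exactly one bit) is made up for the example.

```python
def node_lookup(target, start_contacts, query, k=8, alpha=3):
    """Return the k closest known nodes to target, querying alpha at a time."""
    shortlist = set(start_contacts)
    queried = set()

    def by_distance(nodes):
        return sorted(nodes, key=lambda n: n ^ target)

    while True:
        closest = by_distance(shortlist)[:k]
        pending = [n for n in closest if n not in queried][:alpha]
        if not pending:
            break  # all of the k closest known nodes have been asked
        for node in pending:
            queried.add(node)
            shortlist.update(query(node, target))
    return by_distance(shortlist)[:k]


def toy_query(node, target, k=2):
    """Toy 4-bit network: each node knows the nodes differing in one bit."""
    neighbours = [node ^ (1 << bit) for bit in range(4)]
    return sorted(neighbours, key=lambda n: n ^ target)[:k]


result = node_lookup(13, [3, 0, 6, 10], toy_query, k=2, alpha=1)
```

In this toy run the lookup converges on node 13 itself and its nearest neighbour 12 after querying four nodes (10, 14, 12 and 13), each step moving closer to the target.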

There are a few implementations that are based on Kademlia, such as Kad, Azureus DHT and Mainline DHT. The latter is used by most BitTorrent clients, such as the official BitTorrent client, uTorrent, BitComet, Transmission and BitSpirit.

D. Mainline DHT

The Mainline DHT implementation[4] is based on Kademlia and also uses a 160-bit identifier for keys, where the "infohash" of a torrent file is also considered a key. Mainline DHT uses k = 8 for the number of nodes in the k-buckets. All messages sent in the network are sent along with a token using the UDP protocol. The token can be used to validate the response to a certain query. In Mainline DHT there are four different messages: ping, find_node, announce_peer and get_peers. The ping message just probes a node to see if it is online. With find_node a node will return the <IP, port, peer-ID> values of the k closest nodes it knows of by looking in its k-buckets. With the announce_peer message, a node can instruct another node to add the given peer to its peer list for a given infohash. The get_peers message behaves like find_node, but if the node has previously received an announce_peer query for the infohash, it will return the list of peers.
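To make the message format concrete, the sketch below builds a get_peers and an announce_peer query as bencoded dictionaries, following the layout given in the BEP[4]. The helper names, the node ID, the infohash, the port and the token are placeholder values chosen for illustration.

```python
def bencode(value):
    """Minimal bencoder for ints, byte strings and dicts with byte-string keys."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, dict):
        return b"d" + b"".join(
            bencode(k) + bencode(value[k]) for k in sorted(value)
        ) + b"e"
    raise TypeError("cannot bencode %r" % type(value))


def get_peers_query(node_id, info_hash, transaction_id=b"aa"):
    args = {b"id": node_id, b"info_hash": info_hash}
    return bencode({b"t": transaction_id, b"y": b"q",
                    b"q": b"get_peers", b"a": args})


def announce_peer_query(node_id, info_hash, port, token, transaction_id=b"aa"):
    # Note that the arguments carry a port but no IP address field.
    args = {b"id": node_id, b"info_hash": info_hash,
            b"port": port, b"token": token}
    return bencode({b"t": transaction_id, b"y": b"q",
                    b"q": b"announce_peer", b"a": args})


NODE_ID = b"abcdefghij0123456789"    # placeholder 20-byte node ID
INFO_HASH = b"mnopqrstuvwxyz123456"  # placeholder 20-byte infohash

get_peers_packet = get_peers_query(NODE_ID, INFO_HASH)
announce_packet = announce_peer_query(NODE_ID, INFO_HASH, 6881, b"aoeusnth")
```

The announce_peer arguments contain a TCP port but no IP address, so the receiving node has to take the peer's address from the UDP packet that carried the message.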

When a client wants to add a torrent to the DHT network, it will first do a recursive node lookup to find the k closest nodes to the infohash of the torrent. Afterwards it will send the k nodes an announce_peer message with its own IP address and the TCP port used in the BitTorrent protocol to download the files. Other nodes can find the list of peers by sending a get_peers message with the infohash. This will eventually return the peer list of the newly created torrent.

For a node to keep its k-buckets updated, the entry of a node in the bucket is renewed whenever the node receives a message (request or reply) from that entry. To prevent a k-bucket from running out of live IDs, the node will refresh the bucket if no node lookup was performed for that bucket in the past 30 minutes. A refresh means it will randomly choose a node from the bucket and perform a node lookup for that ID.

Figure 1 illustrates the operation of the Mainline DHT. First a node creates a new infohash for a certain torrent. The client announces the infohash to the nodes nearest to the infohash. In this example the infohash = 41, and it is announced (1) to the node with id = 42, which is the closest node ID to the infohash. Next, a client searches for peers to download the file from (2). It knows the infohash value, so it first searches for nearby nodes by issuing get_peers messages. In our simplified example, it directly finds the responsible node at 42. In reality, however, the client will need to find its way through the DHT in order to find the closest node. It receives back the IP address and TCP port of the clients that share the file (3), so the BitTorrent client may download the file (4).

E. Monitoring the DHT network

In multiple recent news articles[5][6], people and organisations have been accused of having provided or downloaded certain content via BitTorrent. Copyright organisations have prosecuted individuals for providing or downloading copyrighted material[7][8]; the copyright organisations themselves have in turn been accused[9] of the same act. The same accusations have been made against governmental entities such as the French prime minister[10] and the Dutch parliament[11]. There have been many reports of similar events within the same time frame. The youhavedownloaded.com website claims it can show if and what files have been downloaded from BitTorrent from a given IP address. These cases have made us question the reliability of information that can be gained by monitoring BitTorrent traffic. This section discusses the possibilities of manipulating BitTorrent traffic in order to ascertain its reliability. Manipulation requires misleading an observer; therefore possible ways of observing BitTorrent filesharing and downloading are described before manipulation techniques are discussed.

FIG. 1: Illustration of the get_peers mechanism

Monitoring peer-to-peer communication is not a trivial task due to its distributed nature. It is not sufficient to place a sensor near a single node, as only a relatively small portion of all traffic in the p2p network will pass through a single node. A well-known attack against p2p networks is the Sybil attack[12], in which false nodes are injected into the network in order to monitor or manipulate traffic. Many monitoring systems, such as those described by Memon et al.[13] and Timpanaro et al.[14], are based on the Sybil attack. However, a Sybil attack is not sufficient for large-scale monitoring, since a very high number of Sybils would need to be placed into the network in order to monitor a decent portion of all traffic.

Le Blond et al.[15] use tracker crawling as a way to monitor BitTorrent users on a large scale. Zhang et al.[16] describe a similar technique but also obtained peer lists from the Vuze and Mainline DHTs to monitor specific torrent file sharers. Ways to perform large-scale monitoring of BitTorrent users by crawling the DHT have been described by Wolchok et al.[17] and Liuy et al.[18]. Piatek et al.[19] describe techniques for implicating arbitrary network endpoints in illegal content sharing and demonstrate the effectiveness of these techniques experimentally, attracting real DMCA complaints for nonsense devices. Their work focuses on trackers, and the techniques they describe require a misreporting client, a man-in-the-middle attack or malware.

The use of trackers has become less popular due to the increasing number of takedowns and censorship of torrent tracking websites. Inside the DHT network there are two ways to monitor traffic: one could either actively query others to receive peer information, or passively wait for others to query the monitoring node for peers.

It is unknown what kind of techniques are used by youhavedownloaded.com and by other, unknown monitoring systems. It is assumed that a system that monitors DHT traffic uses the techniques described previously.

F. Poisoning the DHT network

There are various weaknesses in the DHT implementations that can be used to poison the DHT traffic. For instance, one could perform an index poisoning attack as described by Liang et al.[20], which floods the overlay network with bogus identifiers for files. When a user searches the network and wants to download one of the files, the client will fail to look up the given identifier, which results in a DoS. Similar attacks against the Kad network have been described by Steiner et al.[21] and Locher et al.[22]. Countermeasures have been suggested in various forms, mostly focussing on authentication, such as described by Lou et al.[23].

The goal of most DHT poisoning attacks is a DoS against either the network or arbitrary IP addresses. DHT poisoning has not been researched from a forensics perspective. With recent developments where companies try to sue individuals, or where accusations are made against organisations or people based on information gathered in the DHT network, it is important to assess the authenticity of these DHT entries. In our attacks we describe how self-forged messages could feed falsified information to monitoring nodes inside the DHT network.

III. EXPERIMENTAL METHODS

As described in the previous section, a DHT monitor can be manipulated into believing that a certain system has shared a file, or downloaded a file itself. By supplying the monitor with either a forged announce_peer or get_peers message, the system can be given a false impression of a share or download. The monitoring system must be able to identify requests for a given file. The only way it can do this is by monitoring for specific infohashes, as the infohash is unique for a given file. If an attacker wishes to implicate an arbitrary system, the attacker must use an infohash that is likely to be monitored. Furthermore, the attacker likely wants to use an infohash of an illegal or offensive file to cause a negative impact on the system owner the attacker is impersonating.

1. Injecting an arbitrary peer into a peerlist

To mislead a monitoring system into believing a certain IP address is exchanging data on a certain torrent file, the attacker needs to perform the following steps:

• find or create an infohash corresponding to a certain torrent file

• join the DHT

• send an announce_peer message with an arbitrary TCP port within a UDP packet with a spoofed source IP to nodes near the infohash

When the announce_peer message is received by the nodes, they will extract the infohash and TCP port from the message and look up the source IP in the UDP packet header. Each node will then append the IP and TCP port to the peer list associated with the given infohash. Whenever the monitoring node queries the same node with a get_peers query for that infohash, the node will reply with the poisoned peer list, which seems legitimate to the monitor.
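The reason a spoofed source address ends up in the peer list can be sketched with a toy node. `NaiveNode` is a hypothetical model that performs no token or address validation; it is not the behaviour of any specific client, and the addresses come from the documentation ranges.

```python
from collections import defaultdict


class NaiveNode:
    """Toy DHT node that trusts the UDP source address of every announce."""

    def __init__(self):
        self.peers = defaultdict(set)  # infohash -> {(ip, tcp_port), ...}

    def on_announce_peer(self, args, udp_source):
        # The message only carries the TCP port; the IP is taken from the
        # (spoofable) source address of the UDP packet.
        source_ip, _source_port = udp_source
        self.peers[args["info_hash"]].add((source_ip, args["port"]))

    def on_get_peers(self, args):
        return sorted(self.peers.get(args["info_hash"], ()))


node = NaiveNode()
infohash = b"\x29" * 20  # placeholder infohash

# Attacker sends the announce with the victim's address as UDP source.
node.on_announce_peer({"info_hash": infohash, "port": 6881},
                      udp_source=("203.0.113.7", 40000))

# A monitor later asks the same node for peers of that infohash.
reported = node.on_get_peers({"info_hash": infohash})
```

The monitor receives the victim's address although the victim never sent a packet.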

This weakness might also be exploited in a denial of service attack in which many nodes try to connect to one peer, as suggested in the further research section.

2. Replying with a poisoned peerlist

To mislead a monitoring system into believing a certain IP address has downloaded or is in the process of downloading a certain file, the attacker has to add the IP to the peer list of a torrent file. He can do this by performing the following steps:

• find or create an infohash corresponding to a certain torrent file

• choose a node-ID that is close to the targeted infohash

• join the DHT with the chosen node-ID

• reply with false peer information to a get_peers query

This attack exploits a well-known vulnerability of DHTs described by Steiner et al.[21]: the ability to choose an arbitrary node-ID. By doing so an attacker may become the node responsible for serving get_peers queries for a certain torrent file, as described by the Kademlia protocol.

Please note that this attack only suggests that a given node with the provided IP is exchanging data of that torrent. Since the actual data exchange occurs through a direct TCP connection over the BitTorrent protocol, this data exchange cannot be seen by the monitor on the DHT network. When a peer list is returned in reply to a get_peers query of the monitor, the supplied IP addresses might be falsely implicated.
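Choosing a node-ID close to an infohash can be sketched by copying a prefix of the infohash and randomising the remaining bits. The function name `close_node_id` and the default of 128 shared bits are illustrative assumptions, not values taken from our experiments.

```python
import os


def close_node_id(infohash: bytes, shared_bits: int = 128) -> bytes:
    """Return a 160-bit ID whose first shared_bits bits equal the infohash."""
    if len(infohash) != 20:
        raise ValueError("infohash must be 20 bytes")
    if not 0 <= shared_bits <= 160:
        raise ValueError("shared_bits must be between 0 and 160")
    free_bits = 160 - shared_bits
    target = int.from_bytes(infohash, "big")
    prefix = (target >> free_bits) << free_bits      # keep the leading bits
    random_tail = int.from_bytes(os.urandom(20), "big") & ((1 << free_bits) - 1)
    return (prefix | random_tail).to_bytes(20, "big")


infohash = bytes(range(20))  # placeholder infohash
node_id = close_node_id(infohash, shared_bits=128)
distance = int.from_bytes(node_id, "big") ^ int.from_bytes(infohash, "big")
```

With 128 shared bits the XOR distance is below 2^32, which is far closer than a randomly chosen ID is likely to be in a network of realistic size.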

IV. RESULTS

In the previous section we described two attacks to manipulate DHT messages so that an arbitrary IP can be put onto the peer list of a torrent, which suggests it is exchanging data. In this section we describe the results of the attacks we performed on a DHT-based network. As stated in the theory section, we focussed on the Mainline DHT implementation because this is the most used implementation of Kademlia. Currently no formal Mainline DHT specification exists, but there is a BitTorrent Enhancement Proposal (BEP) based on the implementation of the official BitTorrent client, available on the BitTorrent website[4]. It is important to note that the BEP is loosely specified and some details are left to be decided upon by the programmer.

In order to verify our understanding of the workings of the Mainline DHT, and to monitor the captured traffic, a Python implementation of Mainline DHT can be used. Look-MLKademlia[24] can filter DHT traffic from a regular packet capture file and gives detailed information about the DHT messages.

The first attack, injecting an arbitrary peer onto the peer list of a certain infohash, was performed by capturing legitimate DHT traffic using Wireshark. We captured DHT traffic on one of our nodes, which was acting as a tracker of a certain torrent because its node ID happened to be the closest one to the infohash. After capturing an announce_peer message from an arbitrary node to our tracker, we resent the same announce_peer message but modified the UDP header with a spoofed source IP and recalculated the checksums. We sent the packet, monitored the network traffic and actively queried the DHT for the peers of the infohash.

The second attack, responding to get_peers requests with false IP addresses and TCP ports, was performed by adjusting a Mainline DHT implementation. We modified the code of pymdht[25], a Python implementation of Mainline DHT. This tool was also used to choose the node-ID. We ran the modified pymdht on a separate system and chose a node-ID close to the infohash of the test torrent. We then replied with the false information to all get_peers queries we received, and verified the workings of the attack by studying the captured network traffic.


V. ANALYSIS

The theoretical possibility of providing crafted messages to a node in response to a get_peers message, and of spoofing announce_peer messages, was described in the previous section. We implemented the two attacks and tested them on a Mainline DHT network.

Implementing the false get_peers response was trivial: we used a modified version of the pymdht tool, which is a Python implementation of Mainline DHT. The modified version added a false IP address to the get_peers response. We executed the attack by using a test client that downloaded a test torrent. The packet capture showed that the client received the modified response. The get_peers request includes a token, but since the attacker himself has control over the receiving node, the attacker can send the token back with the crafted response and thus look like a legitimate participant of the swarm.

We did not succeed in becoming the responsible node for a given infohash as described by Steiner et al.[21] and Garcia[26], though Dar writes[27] that Garcia also had trouble implementing it in practice. This is probably due to the implementation of the Mainline DHT in the pymdht tool. Because of this we could not test our attack against get_peers requests for a specific infohash, but we could see our false IP address being added to the peer list that others requested at our node. A naive monitoring system that makes decisions based on this network traffic can thus be fooled into believing the injected node is sharing a given file.

For the announce_peer attack, we sniffed a genuine announce_peer message and then modified the source IP. We executed the attack and monitored the network traffic for responses. Unfortunately we did not succeed in validating this attack in practice. No responses were returned to our announce_peer message and no effects were seen when doing a lookup on the infohash. We verified that our IP and UDP headers were correct by sending a get_peers message. In this case we did get a valid reply, which confirmed that our tool was working correctly. The spoofing of the IP address was also successful, as a response was received at the spoofed IP address. This attack requires that the internet connection of the attacker does not apply egress filtering or other filtering based on source IP. As for our announce_peer query, all header fields were carefully scrutinized in order to prevent any mistakes with header length or checksum fields, but the victims that we tried still did not respond to our message. The attack would have been easier to verify if we could have become the responsible node for the infohash we attacked, so that we could choose which implementation of the Mainline DHT the victim was using.

It is unclear what kind of defences were implemented by various clients. But it is clear that any requirement that is not explicitly stated by the Mainline BEP would make our attacks more difficult. Some suggestions[23][28] have been made to solve the client-side node ID assignment problem. The attacker would then need to find a way to become the closest node, or flood a node with forged get_peers replies. Because all DHT traffic is UDP, the second attack is theoretically possible, since the attacker can spoof the source IP. The attacker would not have knowledge of the token and would need to brute force it. The token is specified by the BEP as "a short binary string"; however, most implementations use a 4-byte token derived from the SHA-1 hash of the IP address along with a secret. The way the secret is calculated varies between implementations.
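A token of the kind described above can be sketched as follows. The concatenation order, the truncation to the first 4 bytes and the 16-byte random secret are assumptions made for illustration; as noted, real implementations differ in these details.

```python
import hashlib
import hmac
import os


def make_token(ip: str, secret: bytes) -> bytes:
    """Assumed scheme: first 4 bytes of SHA-1(ip || secret)."""
    return hashlib.sha1(ip.encode("ascii") + secret).digest()[:4]


def check_token(token: bytes, ip: str, secret: bytes) -> bool:
    """Accept a token only if it matches the address it is presented from."""
    return hmac.compare_digest(token, make_token(ip, secret))


secret = os.urandom(16)  # hypothetical per-node secret
token = make_token("198.51.100.23", secret)

# Number of candidate values an attacker without the secret has to try.
token_space = 256 ** 4
```

Under these assumptions a token is bound to the address it was issued to, and an attacker who cannot observe it faces 2^32 possible values, or about 2^31 guesses on average.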

We did notice differences in announce_peer message contents: sometimes an announce_peer message would also contain a field stating it was an announce_peer message. There were also some other differences in captured announce_peer packets. We believe this is caused by different implementations. It is possible that the packet we used for our announce_peer attack was incompatible with the attacked client. Further research would be necessary to fully implement our attack.

VI. CONCLUSION

Systems that monitor BitTorrent usage mostly process information from BitTorrent trackers. Some monitoring systems also monitor DHT networks and use crawling techniques, or monitor a relatively small portion of traffic by injecting false nodes. DHTs based on Kademlia specifically use the UDP protocol in order to allow nodes to find peers. We have focussed our research on Mainline DHT. These DHTs allow clients to choose their node ID, thus allowing them to choose a position relative to an infohash.

These properties make it possible to forge messages in such a way that a monitoring system can be manipulated. Piatek et al.[19] proved that monitoring systems that rely on tracker information can be manipulated to implicate innocent users.

We have shown it to be theoretically possible to implicate arbitrary users by forging messages within the DHT. We succeeded in implementing only one of the attacks. Unfortunately we did not find the reason why one of our attacks did not seem to work. Theoretically, the attack should be possible, and the separate stages of the attack all worked in practice, but when combined something seemed to go wrong. Many clients have implemented undocumented changes to the protocol specification, so it is likely that an implementation-specific requirement hinders our attack. The protocol itself has been proven weak, and we therefore believe that there is enough reason to doubt the reliability of information learned from a DHT.

Naive monitoring systems may be manipulated to implicate innocent users. The only way a monitoring system can reliably assess whether a certain node is really offering a file for download is to connect to the specified IP and TCP destination port and attempt to download the file. We believe that innocent users might have been falsely implicated by an attack such as described in this paper. The youhavedownloaded.com site developer has stated that information from DHTs has been used. People and organisations such as national parliaments, prime ministers and copyright organisations have been implicated by youhavedownloaded.com; however, the implicated people or organisations claimed to have been framed. Since the inner workings of youhavedownloaded.com have not been made public, we have to resort to speculation. We suspect that youhavedownloaded.com does not attempt to download files from peers and relies on information from the DHT, which we have shown to be easy to manipulate.
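The first step of such a verification, connecting to the reported IP and TCP port and exchanging the BitTorrent handshake defined in the BitTorrent protocol specification[2], can be sketched as below. The function names and the five-second timeout are illustrative assumptions. A matching handshake only shows that the peer speaks BitTorrent for that infohash; a monitor would still have to request pieces and check them against the piece hashes in the torrent before concluding that the file is offered.

```python
import socket

PSTR = b"BitTorrent protocol"
HANDSHAKE_LENGTH = 1 + len(PSTR) + 8 + 20 + 20  # 68 bytes


def build_handshake(infohash: bytes, peer_id: bytes) -> bytes:
    if len(infohash) != 20 or len(peer_id) != 20:
        raise ValueError("infohash and peer_id must be 20 bytes each")
    # length prefix, protocol string, 8 reserved bytes, infohash, peer ID
    return bytes([len(PSTR)]) + PSTR + bytes(8) + infohash + peer_id


def parse_handshake(data: bytes):
    """Return (infohash, peer_id) or raise ValueError."""
    if (len(data) != HANDSHAKE_LENGTH or data[0] != len(PSTR)
            or data[1:20] != PSTR):
        raise ValueError("not a BitTorrent handshake")
    return data[28:48], data[48:68]


def probe_peer(ip, port, infohash, peer_id, timeout=5.0):
    """Return True if the peer answers with a handshake for this infohash."""
    try:
        with socket.create_connection((ip, port), timeout=timeout) as conn:
            conn.sendall(build_handshake(infohash, peer_id))
            reply = b""
            while len(reply) < HANDSHAKE_LENGTH:
                chunk = conn.recv(HANDSHAKE_LENGTH - len(reply))
                if not chunk:
                    break
                reply += chunk
        return parse_handshake(reply)[0] == infohash
    except (OSError, ValueError):
        return False
```

An address injected through a forged DHT message would typically fail this probe, because the implicated host is not listening for BitTorrent connections on the reported port.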

The implications of successfully framing an innocentperson can be severe, they might suffer damage to theirpublic image, be penalized by actions of copyright hold-ers or be investigated by law enforcement. Anyone whointerprets the results from a monitoring system, or re-ports on findings based on such a system should realisethat DHT information is not reliable enough to draw con-clusions upon. The workings of the monitoring systemshould be scrutinised in order to verify that informationprocessed by the monitoring system cannot be forged.

VII. SUGGESTIONS FOR FURTHERRESEARCH

While analysing our attack, we noticed that an at-tacker might also use this technique to launch a possibledenial of service attack through DHT nodes. This couldwork by inserting a large amount of target IP addresses,each with a different port in a get peers reply for popu-lar infohashes. Unsuspecting clients who would want todownload the torrent represented by the infohash wouldtry to make a TCP connection to the target IP. We havenot investigated the possible effectiveness of such an at-tack, but recognize that the attack would be very hardto counter.

Our attack might be mitigated by a monitoring system that keeps track of the nearest nodes and compares their responses to get_peers requests. This strategy, along with other possible mitigation techniques, might be worthwhile to research.
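One way the comparison of get_peers responses could look is sketched below. This is an illustration only and has not been evaluated: the function name, the `min_nodes` threshold and the example addresses are hypothetical, and the sketch assumes that a forged peer entry is returned by a single malicious node only. An attacker who controls several of the nearest nodes would defeat it.

```python
from collections import Counter


def peers_confirmed_by_quorum(responses, min_nodes=2):
    """Keep only peers that at least `min_nodes` distinct DHT nodes reported.

    `responses` maps a DHT node ID to the (ip, port) peers that node
    returned in its get_peers reply for one infohash.
    """
    votes = Counter()
    for peers in responses.values():
        votes.update(set(peers))  # each node gets one vote per peer
    return {peer for peer, count in votes.items() if count >= min_nodes}


# Example with documentation-only addresses: node "c" alone reports an
# extra peer, which is therefore dropped.
responses = {
    "a": [("192.0.2.10", 6881)],
    "b": [("192.0.2.10", 6881)],
    "c": [("192.0.2.10", 6881), ("198.51.100.7", 43885)],
}
confirmed = peers_confirmed_by_quorum(responses)
```

Peers that fall below the threshold are not necessarily forged, since honest nodes may simply hold different subsets of the swarm, so such entries would be candidates for closer inspection rather than proof of an attack.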

[1] http://arst.ch/o1v.
[2] http://www.bittorrent.org/beps/bep_0003.html.
[3] Petar Maymounkov and David Mazières. Kademlia: A peer-to-peer information system based on the xor metric. 2002.
[4] http://www.bittorrent.org/beps/bep_0005.html.
[5] http://www.channelnews.com.au/Content_And_Management/IPTV/A7P6R8E7l.
[6] http://cnews.canoe.ca/CNEWS/Politics/2012/01/03/19196571.html.
[7] Bendover. http://www.bbc.co.uk/newsbeat/17525349.
[8] http://arst.ch/t2t.
[9] http://news.cnet.com/8301-27080_3-57345342-245/bittorrent-downloads-linked-to-riaa-dhs-ip-addresses/.
[10] http://news.cnet.com/8301-27080_3-57343665-245/.
[11] http://torrentfreak.com/dutch-parliament-downloading-movies-and-music-will-stay-legal-111224/.
[12] Guido Urdaneta, Guillaume Pierre, and Maarten van Steen. A survey of dht security techniques. 2009.
[13] Yang Guo, Ghulam Memon, Reza Rejaie, and Daniel Stutzbach. Large-scale monitoring of dht traffic. 2009.
[14] Thibault Cholez, Isabelle Chrisment, and Olivier Festor. Monitoring and controlling content access in kad. 2010.
[15] Stevens Le Blond, Arnaud Legout, Fabrice Lefessant, Walid Dabbous, and Mohamed Ali Kaafar. Spying the world from your laptop: Identifying and profiling content providers and big downloaders in bittorrent. 2010.
[16] Di Wu, Chao Zhang, Prithula Dhungel, and Keith W. Ross. Unraveling the bittorrent ecosystem. 2011.
[17] Scott Wolchok and J. Alex Halderman. Crawling bittorrent dhts for fun and profit. 2010.
[18] Tao Meng, Xiangtao Liu, Kai Cai, and Xueqi Cheng. Rainbow: a robust and versatile measurement tool for kademlia-based dht networks. 2010.
[19] Tadayoshi Kohno, Michael Piatek, and Arvind Krishnamurthy. Challenges and directions for monitoring p2p file sharing networks – or – why my printer received a dmca takedown notice. 2008.
[20] Keith W. Ross, Jian Liang, and Naoum Naoumov. The index poisoning attack in p2p file sharing systems.
[21] Taoufik En-Najjary, Moritz Steiner, and Ernst W. Biersack. Exploiting kad: Possible uses and misuses.
[22] Thomas Locher, David Mysicka, Stefan Schmid, and Roger Wattenhofer. Poisoning the kad network. 2010.
[23] Xiaosong Lou and Kai Hwang. Prevention of index-poisoning ddos attacks in peer-to-peer file-sharing networks.
[24] https://github.com/rauljim/Look-MLKademlia.
[25] https://github.com/rauljim/pymdht.
[26] Ismael Saad Garcia. Exploring mainline dht: an experimental approach.
[27] Sara Dar. Lookup analyzer in bittorrent mainline dht look@mlkademlia.
[28] Luca Maria Aiello, Marco Milanesio, Giancarlo Ruffo, and Rossano Schifanella. Tempering kademlia with a robust identity based system. 2008.


Appendix A: Forged announce peer packet

Ethernet
    00 18 8b 0e a3 0b    dst      00:18:8b:0e:a3:0b
    00 19 b9 aa 67 06    src      00:19:b9:aa:67:06
    08 00                type     0x800

IP
    45                   version  4L
                         ihl      5L
    00                   tos      0x0
    01 81                len      385
    11 88                id       4488
    00 00                flags
                         frag     0L
    80                   ttl      128
    11                   proto    udp
    da 23                chksum   0xda23
    91 64 62 cd          src      145.100.98.205
    4d 23 0c 6c          dst      77.35.12.108
                         options  []

UDP
    ab 6d                sport    43885
    78 29                dport    30761
    01 6d                len      365
    99 9d                chksum   0x999d

Raw
    64 31 3a 72 64 32
    3a 69 64 32 30 3a 8a ec 80 55 45 db a4 ca b5 c9
    06 45 a2 01 47 2a f1 ac 83 db 32 3a 69 70 34 3a
    53 55 0a 1b 31 3a 6e 31 36 3a 31 30 6d 62 2e 62
    69 6e 2e 6b 70 6e 2e 62 69 6e 35 3a 6e 6f 64 65
    73 32 30 38 3a 8a ec 80 bc f2 96 89 1f ce dd 31
    e3 7e 17 0d bf 60 7f 7b 79 58 a2 42 a5 c8 d5 8a
    ec 82 2b b7 0f e5 9b 8c 0f 0c ce 38 a9 36 9d 3b
    b7 35 9c 57 14 4a 94 27 14 8a ec 00 a7 d0 1d 06
    c5 b7 41 78 4d 9b 0f e2 c4 d9 6f 42 be 7d 1a ef
    11 e9 fd 8a ec 06 51 a4 a0 e3 a3 8d 65 5a 63 4e
    4b de fc 8a 06 12 ff bb 6a 6c 10 78 41 8a ec 16
    ef 08 c8 9d 24 f3 b0 af 33 db b7 6a b6 45 0b 25
    ae 29 63 23 56 63 48 8a ec 34 4d 8e 9f 21 8e 0a
    15 8a b8 9d 9c 11 14 0e 3f 92 d2 be 16 49 c7 3e
    c1 8a ec 48 e8 03 55 fa 53 8a 4a 2a 02 e3 ee bb
    11 02 64 83 03 4d 60 13 92 56 fc 8a ec 54 31 32
    22 90 b6 db d7 e9 f5 24 3b 0e 2c f6 b7 4e ab 4e
    3e fa 17 7a b8 35 3a 74 6f 6b 65 6e 32 30 3a 15
    0b a3 7f f2 c6 04 85 0d 48 f7 4d 5c f5 45 4f d5
    ce ad e7 36 3a 76 61 6c 75 65 73 6c 36 3a 91 64
    62 d8 ab 6d 65 65 31 3a 74 34 3a 52 07 00 00 31
    3a 76 34 3a 55 54 68 95 31 3a 79 31 3a 72 65

    load     'd1:rd2:id20:\x8a\[...]
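The single entry in the values list of the payload above is the six bytes that follow the `6:` length prefix near the end of the raw data. In the compact peer format of the DHT protocol [4] these are a 4-byte IPv4 address followed by a 2-byte port in network byte order. The sketch below decodes that entry; the helper name `decode_compact_peer` is hypothetical and not part of any tool used in this paper.

```python
import socket
import struct


def decode_compact_peer(entry: bytes):
    """Decode 6 bytes of compact peer info into an (ip, port) tuple."""
    if len(entry) != 6:
        raise ValueError("compact peer info must be exactly 6 bytes")
    ip = socket.inet_ntoa(entry[:4])          # 4-byte IPv4 address
    (port,) = struct.unpack("!H", entry[4:])  # 2-byte big-endian port
    return ip, port


# The entry taken from the values list in the packet above.
forged_entry = bytes.fromhex("91 64 62 d8 ab 6d")
print(decode_compact_peer(forged_entry))  # ('145.100.98.216', 43885)
```

The decoded address, 145.100.98.216, differs from the IP source address of the packet itself (145.100.98.205), which illustrates that the peer listed in a reply need not be the host that sent it.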