
1 Dept of CSE

VISVESVARAIAH TECHNOLOGICAL UNIVERSITY

BELGAUM

DHARWAD – 580 002

A seminar report on

BITTORRENT PROTOCOL

Submitted by

Rajani .B. Paraddi

2SD06CS071

8th semester

VISVESVARAIAH TECHNOLOGICAL UNIVERSITY

BELGAUM

DEPARTMENT OF COMPUTER SCIENCE ENGINEERING

CERTIFICATE

Certified that the seminar work entitled “BITTORRENT PROTOCOL” is a bonafide work presented by Rajani.B.Paraddi bearing USN 2SD06CS071 in partial fulfillment for the award of degree of Bachelor of Engineering in Computer Science Engineering of the Vishveshwaraiah Technological University, Belgaum during the year 2009-10. The seminar report has been approved as it satisfies the academic requirements with respect to seminar work presented for the Bachelor of Engineering Degree.

Staff In Charge

H.O.D CSE

Name: Rajani .B. Paraddi USN: 2SD06CS071


Index

1. Introduction

1.1. Overview

1.2. History

2. BitTorrent and Other approaches

2.1. Other P2P Methods

2.2. Typical HTTP File Transfer

2.3. The DAP method

2.4. The BitTorrent Approach

3. Working of BitTorrent

4. Terminology

5. Architecture of BitTorrent

5.1. Metainfo File

5.2. Tracker

5.3. Peers

5.4. Data

5.5. Bittorrent Clients

6. Vulnerabilities of BitTorrent

6.1. Attacks on bittorrent

6.2. Solutions

7. Conclusion

8. References

1. Introduction[1]

1.1 Overview

BitTorrent is a peer-to-peer file sharing protocol used to distribute large amounts of data, and it is one of the most common protocols for transferring large files. It makes such transfers easier by taking a different approach from conventional methods. A user can obtain multiple files simultaneously without any considerable loss of transfer rate, and a user does not just obtain a file but also shares it with others. This is the key difference between BitTorrent and the conventional file transfer methods. A user shares the file he is obtaining, so that other users trying to obtain the same file find it easier to do so, and those users in turn become involved in the sharing process. Thus the larger the number of users demanding a file, the more easily it can be transferred between them.

The BitTorrent protocol is built on a technology that makes it possible to distribute large amounts of data without the need for a high-capacity server and expensive bandwidth. This is the most striking feature of this file transfer protocol. The transfer of a file never depends on a single source holding the original copy; instead the load is distributed across a number of such sources. Not just the sources are responsible for the transfer: the clients who want to obtain the file are also involved in the process. This spreads the load across the users and partially frees the main source from the process, which reduces the network traffic imposed on it. Because of this, BitTorrent has become one of the most popular file transfer mechanisms in today’s world. Though the mechanism itself is not as simple as an ordinary file transfer protocol, it has gained its popularity because of the sharing policy that it imposes on its users.

1.2 History

BitTorrent was created by a programmer named Bram Cohen. After inventing this new technology he said, "I decided I finally wanted to work on a project that people would actually use, would actually work and would actually be fun". Before BitTorrent, there were other techniques for file sharing, but they did not utilize bandwidth effectively, and bandwidth had become a bottleneck in such methods. Most users could simply download files without being required to upload. This put a lot of network load on the original sources and on a small number of users, and left the bandwidth of the remaining users unused. The main intention behind Cohen’s invention was to make maximum use of the bandwidth of all the users involved in sharing a file. To achieve this, every person who wants to download a file has to contribute to the uploading process as well. This novel concept gave birth to a new peer-to-peer file sharing protocol called BitTorrent. Cohen invented the protocol in April 2001. The first usable version of BitTorrent appeared in October 2002, but the system needed a lot of fine-tuning. BitTorrent really started to take off in early 2003.

2. BitTorrent and Other approaches[3]

2.1 Other P2P Methods

The most common method by which files are transferred on the Internet is the client-server model. A central server sends the entire file to each client that requests it; this is how both HTTP and FTP work. The clients only speak to the server, and never to each other. The main advantages of this method are that it is simple to set up, and that the files are usually available, since the servers tend to be dedicated to the task of serving and are always on and connected to the Internet. However, this model has a significant problem with files that are large or very popular, or both: it takes a great deal of bandwidth and server resources to distribute such a file, since the server must transmit the entire file to each client. Perhaps you have tried to download a demo of a newly released game, or CD images of a new Linux distribution, and found that all the servers report "too many users," or that there is a long queue to wait through. The concept of mirrors partially addresses this shortcoming by distributing the load across multiple servers. But it requires a lot of coordination and effort to set up an efficient network of mirrors, and it is usually only feasible for the busiest of sites.

2.2 A Typical HTTP File Transfer

The most common type of file transfer is through an HTTP server. In this method, an HTTP server listens for clients’ requests and serves them. The client depends entirely on the lone server that is providing the file, so the overall download is bounded by the limitations of that server. This kind of transfer also has a single point of failure: if the server crashes, the whole download process will cease. A single server can handle many such clients and serve the requested file simultaneously to all of them. The file being served is available as one single piece, which means that if the download stops abruptly in the middle, the whole file has to be downloaded again. The BitTorrent protocol overcomes these shortcomings and is therefore more robust, which is why many people choose it over this traditional method of file transfer.

Fig 2.1: HTTP/FTP File Transfer

2.3 The DAP method

Download Accelerator Plus (DAP) is a popular download accelerator. DAP's key features include the ability to accelerate downloading of files over the FTP and HTTP protocols, to pause and resume downloads, and to recover from dropped internet connections.

On the Internet the same file is often hosted on numerous mirror sites, such as at universities and on ISP servers. DAP senses when a user begins downloading a file and identifies available mirror sites that host the requested file. As soon as it is triggered, DAP's client-side optimization begins to determine, in real time, which mirror sites offer the fastest response for the specific user's location. The file is downloaded in several segments simultaneously through multiple connections from the most responsive server(s) and reassembled at the user's PC. This results in better utilization of the user's available bandwidth, and it ensures that each available mirror server is used to serve the users who benefit most from it. This in turn balances the load among the available servers and reduces download times for users while allowing them to receive maximum benefit from their available bandwidth. DAP's resume functionality and its ability to continue downloading even when one of the participating connections has dropped also provide users with a more reliable download experience.

2.4 The BitTorrent Approach

In BitTorrent, the data to be shared is divided into many equal-sized portions called pieces. Each piece is further sub-divided into equal-sized sub-pieces called blocks. All clients interested in sharing this data are grouped into a swarm, and each swarm is managed by a central entity called the tracker. BitTorrent has revolutionized the way files are shared between people. It does not require a user to download a file completely from a single server. Instead, a file can be downloaded from many users who are themselves downloading the same file. A user who has the complete file, called the seed, initiates the download by transferring pieces of the file to the other users. Once a user has a considerable number of pieces, he too can start sharing them with other users who are yet to receive those pieces. This means a client does not depend completely on a server, and the overall load on the server is reduced.

Fig 2.2 : BitTorrent File Transfer

Each client independently obtains a file, called a torrent, that contains the location of the tracker along with a hash of each piece. Clients keep each other updated on the status of their download. Clients download blocks from other (randomly chosen) clients who claim they have the corresponding data. Accordingly, clients also send data that they have previously downloaded to other clients. Once a client receives all the blocks for a given piece, he can verify the hash of that piece against the hash provided in the torrent. Thus once a client has downloaded and verified all pieces, he can be confident that he has the complete data.
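The per-piece check described here can be sketched in a few lines of Python. This is a minimal illustration, not any client's actual code: it assumes the hashes are SHA-1 digests (as the metainfo section of this report states) stored back to back as 20-byte values, and the function names `split_piece_hashes` and `verify_piece` are hypothetical.

```python
import hashlib

SHA1_SIZE = 20  # a SHA-1 digest is 20 bytes long


def split_piece_hashes(pieces: bytes) -> list[bytes]:
    """Split a concatenated hash string into one 20-byte digest per piece."""
    if len(pieces) % SHA1_SIZE != 0:
        raise ValueError("hash string length must be a multiple of 20")
    return [pieces[i:i + SHA1_SIZE] for i in range(0, len(pieces), SHA1_SIZE)]


def verify_piece(index: int, data: bytes, piece_hashes: list[bytes]) -> bool:
    """Return True if the downloaded piece matches the hash from the torrent."""
    return hashlib.sha1(data).digest() == piece_hashes[index]
```

A client would call `verify_piece` once all blocks of a piece have arrived, and discard and re-request the piece if the check fails.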

Both BitTorrent and DAP download files from multiple sources, and in both approaches the files are divided into pieces. But BitTorrent has features that DAP lacks, which have made it the more popular of the two. In BitTorrent the users participate actively in sharing files along with the servers. This is the uniqueness of the protocol. It also requires a dedicated server, called the tracker, to handle the peers connected in the network. File transfer in DAP takes place through the traditional HTTP or FTP protocols, which means that the transfer rate is always limited by the servers’ bandwidth. If these servers are flooded with requests they may break down, and the transfer will terminate. This is not the case in BitTorrent, since the process does not depend on servers alone: the load is distributed across the network between peers and servers. This makes BitTorrent far better than competing approaches such as DAP.

3. Working of BitTorrent[4]

As previously explained, BitTorrent’s design makes it extremely efficient at sharing large data files among interested peers. BitTorrent scales well and is a superior method for transferring and disseminating files between interested peers while limiting free riding (peers who download but do not upload) among those same peers. BitTorrent is based on a “tit for tat” reciprocity agreement between users that ultimately results in Pareto efficiency. Pareto efficiency is an important economic concept: an allocation of resources is Pareto efficient when no peer can be made better off without making another worse off. Cohen’s vision of peers simultaneously helping each other by uploading and downloading has been realized by the BitTorrent system.

The protocol shares data through what are known as torrents. For a torrent to be alive or active it must have several key components. These include a tracker server, a .torrent file, a web server where the .torrent file is stored, and a complete copy of the file being exchanged. Each of these components is described in the following paragraphs. The file being exchanged is the essence of the torrent, and a peer holding a complete copy is referred to as a seed. A seed is a peer in the BitTorrent network willing to share a file with other peers in the network.

Fig 3.1 : A Typical BitTorrent System

Peers lacking the file and seeking it from seeds are called leechers. While seeds only upload to leechers, leechers may both download from seeds and upload to other leechers. BitTorrent’s protocol is designed so that leeching peers seek each other out for data transfer in a process known as “optimistic unchoking”. Together, the seeds and leechers engaged in a file transfer are referred to as a swarm. A swarm is coordinated by a tracker server serving the particular torrent, and interested peers find the tracker via metadata known as a .torrent file. Since BitTorrent has no built-in search functionality, .torrent files are usually located via HTTP through search engines or trackers.

The first step in the BitTorrent exchange occurs when a peer downloads a .torrent file from a server. The role of .torrent files is to provide the metadata that allows the protocol to function; .torrent files can be viewed as surrogates for the files being shared. These .torrent files contain the key pieces of data the protocol needs, including the file length, the assigned name, hashing information about the file and the URL of the tracker coordinating the torrent activity. Torrent files can be created using a program such as MakeTorrent, another open source tool available under the free software model.

When a .torrent file is opened by the peer’s client software, the peer connects to the tracker server responsible for coordinating activity for that specific torrent. The tracker and client communicate by a protocol layered on top of HTTP, and the tracker’s key role is to coordinate peers seeking the same file. As Cohen envisioned it, “The tracker’s responsibilities are strictly limited to helping peers find each other”. In reality the tracker’s role is a bit more complex, as many trackers collect data about the peers engaged in a swarm.

Leechers and seeds are coordinated by the tracker server, and the peers periodically update the tracker on their status, allowing the tracker to have a global view of the system. The data monitored by the tracker can include peer IP addresses, the amount of data uploaded and downloaded by specific peers, data transfer rates among peers, the percentage of the total file downloaded, the length of time connected to the tracker, and the ratio of sharing among peers. Usually a tracker coordinates multiple torrents, and the most popular trackers are busy coordinating thousands of swarms simultaneously.

It should be noted that .torrent files are not the actual file being shared; rather, .torrent files are the metadata which allow trackers and peers to coordinate their activities. As previously mentioned, the complete file is actually stored on seed nodes and not on the tracker server. Since .torrent files are small and require little space to store, one server can easily host thousands of .torrent files without prohibitive server or bandwidth requirements.

4. Terminology

These are the common terms that one comes across in a typical BitTorrent file transfer.

• Torrent : The small metadata file you receive from the web server (the one that ends in .torrent). Metadata here means that the file contains information about the data you want to download, not the data itself.

• Peer : Another computer on the internet that you connect to and transfer data with. Generally a peer does not have the complete file.

• Leeches : Similar to peers in that they do not have the complete file. The main difference between the two is that a leech will not upload once the file is downloaded.

• Seed : A computer that has a complete copy of a certain torrent. Once a client has downloaded a file completely, he can continue to upload the file, which is called seeding. This is good practice in the BitTorrent world since it allows other users to obtain the file easily.

• Reseed : When there are zero seeds for a given torrent, eventually all the peers get stuck with an incomplete file, since no one in the swarm has the missing pieces. When this happens, a seed must connect to the swarm so that those missing pieces can be transferred. This is called reseeding.

• Swarm : The group of machines that are collectively connected for a particular file.

• Tracker : A server on the Internet that coordinates the actions of BitTorrent clients. The clients are in constant touch with this server to learn about the peers in the swarm.

• Share ratio : The ratio of the amount of a file uploaded to the amount downloaded. A ratio of 1 means that one has uploaded the same amount of a file as has been downloaded.

• Distributed copies : Sometimes the peers in a swarm collectively have a complete file. Such copies are called distributed copies.

• Choked : The state of an uploader that does not want to send anything on a link. In such cases, the connection is said to be choked.

• Interested : The state of a downloader when the other end has some pieces that the downloader wants. The downloader is then said to be interested in the other end.

• Snubbed : If the client has not received anything after a certain period, it marks a connection as snubbed, in that the peer on the other end has chosen not to send for a while.

• Optimistic unchoking : Periodically, the client shakes up the list of uploaders and tries sending on different connections that were previously choked, choking the connections it was just using. This is called optimistic unchoking.

5. Architecture of BitTorrent

The BitTorrent protocol can be split into the following five main components:

• Metainfo File - A file which contains all the details necessary for the protocol to operate.

• Tracker - A server which helps to manage the BitTorrent protocol.

• Peers - Users exchanging data via the BitTorrent protocol.

• Data - The files being transferred across the protocol.

• Client - The program which sits on a peer's computer and implements the protocol.

Peers use TCP (Transmission Control Protocol) to communicate and send data. This protocol is preferable to others such as UDP (User Datagram Protocol) because TCP guarantees reliable and in-order delivery of data from sender to receiver. UDP gives no such guarantees, and data can arrive out of order or be lost altogether. The tracker allows peers to query which peers have what data, and allows them to begin communication. Peers communicate with the tracker in plain text via HTTP (Hypertext Transfer Protocol). The following diagram illustrates how peers interact with each other, and also communicate with a central tracker.

Fig 5.1 : Architecture of a BitTorrent System

5.1 Metainfo File [2]

When someone wants to publish data using the BitTorrent protocol, they must create a metainfo file. This file is specific to the data they are publishing, and contains all the information about a torrent, such as the data to be included and the address (announce URL) of the tracker to connect to. A tracker is a server which 'manages' a torrent, and is discussed in the next section. The file is given a '.torrent' extension, and the data is extracted from the file by a BitTorrent client. This is a program which runs on the user's computer and implements the BitTorrent protocol. Every metainfo file must contain the following information (or 'keys'):

• info: A dictionary which describes the file(s) of the torrent: either the single file, or the directory structure for multiple files. The SHA-1 hashes of every data piece are stored here.

• announce: The announce URL of the tracker, as a string.

The following are optional keys which can also be used:

• announce-list: Used to list backup trackers.

• creation date: The creation time of the torrent as a UNIX timestamp (integer seconds since 1-Jan-1970 00:00:00 UTC).

• comment: Any comments by the author.

• created by: Name and version of the program used to create the metainfo file.

These keys are structured in the metainfo file as follows:

{'info': {'piece length': 131072,
          'length': 38190848,
          'name': 'Cory_Doctorow_Microsoft_Research_DRM_talk.mp3',
          'pieces': '\xcb\xfaz\r\x9b\xe1\x9a\xe1\x83\x91~\xed@\.....'},
 'announce': 'http://tracker.var.cc:6969/announce',
 'creation date': 1089749086}
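As a worked check on the example above, the 'length' and 'piece length' values fix how many pieces the file has and how long the 'pieces' hash string must be. The sketch below assumes one 20-byte SHA-1 digest per piece and that only the final piece may be shorter than the others; `piece_layout` is a hypothetical helper name.

```python
SHA1_SIZE = 20  # bytes per SHA-1 digest


def piece_layout(file_length: int, piece_length: int) -> tuple[int, int, int]:
    """Return (number of pieces, length of last piece, size of 'pieces' string)."""
    if file_length <= 0 or piece_length <= 0:
        raise ValueError("lengths must be positive")
    # Round up: a partial final piece still counts as a piece.
    num_pieces = (file_length + piece_length - 1) // piece_length
    last_piece = file_length - (num_pieces - 1) * piece_length
    return num_pieces, last_piece, num_pieces * SHA1_SIZE


# Values taken from the example metainfo file.
print(piece_layout(38190848, 131072))  # (292, 48896, 5840)
```

So the example file splits into 291 full pieces of 131072 bytes plus one final piece of 48896 bytes, and its 'pieces' value would be 5840 bytes long.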

Instead of transmitting the keys in plain text format, the keys contained in the metainfo file are encoded before they are sent. Encoding is done using a BitTorrent-specific method known as 'bencoding'.

5.1.1 Bencoding:

Bencoding is used by BitTorrent to send loosely structured data between the BitTorrent client and a tracker. Bencoding supports byte strings, integers, lists and dictionaries. It uses the beginning delimiters 'i' / 'l' / 'd' for integers, lists and dictionaries respectively. The ending delimiter is always 'e'. Delimiters are not used for byte strings.

Bencoding Structure:

• Byte Strings: <string length in base ten ASCII>:<string data>

• Integers: i<base ten ASCII>e

• Lists: l<bencoded values>e

• Dictionaries: d<bencoded string><bencoded element>e

Negative integers are allowed, but prefixing a number with a zero is not permitted. However, '0' itself is allowed.

Examples of bencoding:

4:spam // represents the string "spam"

i3e // represents the integer 3

l4:spam4:eggse // represents the list of two strings: ["spam","eggs"]

d4:spaml1:a1:bee // represents the dictionary {"spam" => ["a" , "b"] }
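The four rules above translate almost directly into a small recursive encoder. The following is an illustrative sketch, not a reference implementation: the function name `bencode` is hypothetical, text strings are assumed to be sent as UTF-8, and dictionary keys are emitted in sorted order, which the BitTorrent specification requires although this report does not mention it.

```python
def bencode(value) -> bytes:
    """Encode ints, byte/text strings, lists and dicts using bencoding."""
    if isinstance(value, bool):
        # bool is a subclass of int in Python; reject it to avoid surprises.
        raise TypeError("bencoding has no boolean type")
    if isinstance(value, int):
        # str() never produces leading zeros, and gives "-3" for negatives.
        return b"i" + str(value).encode("ascii") + b"e"
    if isinstance(value, str):
        value = value.encode("utf-8")  # assumption: text is sent as UTF-8
    if isinstance(value, bytes):
        return str(len(value)).encode("ascii") + b":" + value
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        items = []
        for key, item in value.items():
            raw_key = key.encode("utf-8") if isinstance(key, str) else key
            if not isinstance(raw_key, bytes):
                raise TypeError("dictionary keys must be strings")
            items.append((raw_key, item))
        items.sort(key=lambda pair: pair[0])  # keys sorted as raw bytes
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


print(bencode({"spam": ["a", "b"]}))  # b'd4:spaml1:a1:bee'
```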

5.1.2 Metainfo File Distribution :

Because all the information needed for the torrent is included in a single file, this file can easily be distributed via other protocols, and as the file is replicated, the number of peers can increase very quickly. The most popular method of distribution is a public indexing site which hosts the metainfo files. A seed uploads the file, and others can then download a copy of it over HTTP and participate in the torrent.

5.2 Tracker[2]

A tracker is used to manage the users participating in a torrent (known as peers). It stores statistics about the torrent, but its main role is to allow peers to 'find each other' and start communication, i.e. to find peers with the data they require. Peers know nothing of each other until a response is received from the tracker. Whenever a peer contacts the tracker, it reports which pieces of the file it has. That way, when another peer queries the tracker, the tracker can provide a random list of peers who are participating in the torrent and have the required piece.

A tracker is an HTTP/HTTPS service and typically works on port 6969. The address of the tracker managing a torrent is specified in the metainfo file; a single tracker can manage multiple torrents. Multiple trackers can also be specified as backups, which are handled by the BitTorrent client running on the user's computer. BitTorrent clients communicate with the tracker using HTTP GET requests, in the standard CGI style: a "?" is appended to the URL, followed by the parameters separated by "&".

Fig 5.2 : Tracker

The parameters accepted by the tracker are:

• info_hash: 20-byte SHA1 hash of the info key from the metainfo file.

• peer_id: 20-byte string used as a unique ID for the client.

• port: The port number the client is listening on.

• uploaded: The total amount uploaded since the client sent the 'started' event to the tracker, in base ten ASCII.

• downloaded: The total amount downloaded since the client sent the 'started' event to the tracker, in base ten ASCII.

• left: The number of bytes the client still has to download, in base ten ASCII.

• compact: Indicates that the client accepts compact responses. The peer list is then replaced by a string of 6 bytes per peer: the first 4 bytes are the host, and the last 2 bytes are the port.

• event: If specified, must be one of the following: started, stopped, completed.

• ip: (optional) The IP address of the client machine, in dotted format.

• numwant: (optional) The number of peers the client wishes to receive from the tracker.

• key: (optional) Allows a client to identify itself if its IP address changes.

• trackerid: (optional) If a previous announce contained a tracker id, it should be set here.
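A request carrying these parameters can be assembled with the Python standard library. The sketch below only builds the URL and sends nothing over the network; `build_announce_url` is a hypothetical helper, the peer id and info hash in the usage line are placeholder values, and the binary fields are assumed to be percent-escaped byte by byte.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def build_announce_url(announce: str, info_hash: bytes, peer_id: bytes,
                       port: int, uploaded: int, downloaded: int, left: int,
                       event: Optional[str] = None, compact: bool = True) -> str:
    """Append the tracker GET parameters to an announce URL."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must both be 20 bytes")
    params = {
        "info_hash": info_hash,   # raw bytes, escaped by urlencode below
        "peer_id": peer_id,
        "port": port,
        "uploaded": uploaded,     # integers are rendered in base ten
        "downloaded": downloaded,
        "left": left,
        "compact": 1 if compact else 0,
    }
    if event is not None:
        params["event"] = event   # started, stopped or completed
    separator = "&" if "?" in announce else "?"
    # quote_via=quote escapes spaces as %20 rather than '+'.
    return announce + separator + urlencode(params, quote_via=quote)


url = build_announce_url("http://tracker.var.cc:6969/announce",
                         b"\x12" * 20, b"-XX0001-123456789012",
                         6881, 0, 0, 38190848, event="started")
print(url)
```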

The tracker then responds with a "text/plain" document with the following keys:

• failure message: If present, then no other keys are included. The value is a human-readable error message explaining why the request failed.

• warning message: Similar to failure message, but the response still gets processed.

• interval: The number of seconds a client should wait between sending regular requests to the tracker.

• min interval: Minimum announce interval.

• tracker id: A string that the client should send back with its next announce.

• complete: Number of peers with the complete file (seeds).

• incomplete: Number of non-seeding peers (leechers).

• peers: A list of dictionaries, one per peer, each containing the peer id, IP address and port.
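When the client has asked for a compact response, the peer list arrives as a packed string of 6 bytes per peer instead of a list of dictionaries. A minimal sketch of unpacking it follows; it assumes IPv4 addresses and a port stored in network (big-endian) byte order, and `parse_compact_peers` is a hypothetical name.

```python
import socket
import struct


def parse_compact_peers(blob: bytes) -> list[tuple[str, int]]:
    """Decode a compact peer string into (host, port) pairs."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peer string must be a multiple of 6 bytes")
    peers = []
    for offset in range(0, len(blob), 6):
        host = socket.inet_ntoa(blob[offset:offset + 4])   # first 4 bytes: host
        (port,) = struct.unpack("!H", blob[offset + 4:offset + 6])  # last 2: port
        peers.append((host, port))
    return peers


print(parse_compact_peers(bytes([192, 168, 0, 1, 0x1A, 0xE1])))
# [('192.168.0.1', 6881)]
```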

5.2.1 Scraping

Scraping is the process of querying the state of a given torrent (or all torrents) that the tracker is managing. The result is known as a "scrape page". To find the scrape URL, start with the announce URL and find the last '/'. If the text immediately following that '/' is 'announce', substitute 'scrape' for it to obtain the scrape page.

Examples:

Announce URL → Scrape URL

http://example.com/announce → http://example.com/scrape

http://example.com/a/announce → http://example.com/a/scrape

http://example.com/announce.php → http://example.com/scrape.php
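The substitution rule can be written as a short function. This is a sketch of the rule as stated in this section; `scrape_url` is a hypothetical name, and returning `None` when the rule does not apply is a design choice of the sketch, standing in for "this tracker does not advertise a scrape page".

```python
from typing import Optional


def scrape_url(announce_url: str) -> Optional[str]:
    """Derive the scrape URL from an announce URL, or None if not possible."""
    head, slash, last = announce_url.rpartition("/")
    # The text right after the last '/' must begin with 'announce'.
    if not slash or not last.startswith("announce"):
        return None
    return head + "/" + "scrape" + last[len("announce"):]


print(scrape_url("http://example.com/announce.php"))
# http://example.com/scrape.php
```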

The tracker then responds with a "text/plain" document with the following bencoded keys:

• files: A dictionary containing one key-value pair for each torrent. Each key is a 20-byte binary hash value. The value of that key is a nested dictionary with the following keys:

    • complete: The number of peers with the entire file (seeds).

    • downloaded: The total number of times the entire file has been downloaded.

    • incomplete: The number of active downloaders (leechers).

    • name: (optional) The torrent name.

5.3 Peers[4]

Peers are the other users participating in a torrent, who have either a partial file or the complete file (in which case they are known as seeds). Pieces are requested from peers, but are not guaranteed to be sent, depending on the status of the peer. BitTorrent uses TCP (Transmission Control Protocol) ports 6881-6889 to send messages and data between peers and, unlike some other protocols, does not use UDP (User Datagram Protocol).

5.3.1 Piece Selection

Peers continuously queue up the pieces which they require for download. Therefore the tracker is constantly replying to the peer with a list of peers who have the requested pieces. Which piece is requested depends upon the BitTorrent client. There are three stages of piece selection, and the one in use depends on how far the peer's download has progressed.

5.3.2 Random First Piece

When downloading first begins, the peer has nothing to upload, so a piece is selected at random to get the download started. Random pieces are then chosen until the first piece is completed and checked. Once this happens, the 'rarest first' strategy begins.

5.3.3 Rarest First

When a peer selects which piece to download next, it chooses the rarest piece in the current swarm, i.e. the piece held by the lowest number of peers. This means that the most common pieces are left until later, and the focus goes to replicating rarer pieces.

At the beginning of a torrent, there is only one seed with the complete file. There would be a possible bottleneck if multiple downloaders were trying to access the same piece. Rarest first avoids this because different peers have different pieces. As more peers connect, rarest first takes some load off the original seed, as peers begin to download from one another.

Eventually the original seed will disappear from a torrent. This could be for cost reasons, or most commonly because of bandwidth issues. Losing a seed runs the risk of pieces being lost if no current downloaders have them. Rarest first works to prevent the loss of pieces by replicating the pieces most at risk as quickly as possible. If the original seed leaves before at least one other peer has the complete file, and some piece is held by no remaining peer, then no one will reach completion unless a seed re-connects.
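The rarest-first choice amounts to counting, for every piece the client still needs, how many known peers hold it, and then picking one of the pieces with the lowest count. The sketch below makes some simplifying assumptions: each peer's holdings are available as a set of piece indices, ties are broken at random, and the names `rarest_first` and `peer_pieces` are hypothetical.

```python
import random
from collections import Counter
from typing import Optional


def rarest_first(needed: set[int], peer_pieces: dict[str, set[int]],
                 rng=random) -> Optional[int]:
    """Pick a needed piece held by the fewest peers, or None if none is on offer."""
    availability = Counter()
    for pieces in peer_pieces.values():
        # Count only pieces we still need; each peer counts once per piece.
        availability.update(pieces & needed)
    if not availability:
        return None  # no connected peer has anything we need
    lowest = min(availability.values())
    candidates = sorted(i for i, count in availability.items() if count == lowest)
    return rng.choice(candidates)  # random tie-break among equally rare pieces


peers = {"a": {0, 1}, "b": {0, 2}, "c": {0, 1}}
print(rarest_first({0, 1, 2}, peers))  # 2 (held by only one peer)
```

Pieces that no peer currently holds are skipped rather than selected, since they cannot be requested from anyone.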

5.3.4 Endgame Mode

When a download nears completion and the peer is waiting for a piece from a peer with slow transfer rates, completion may be delayed. To prevent this, the remaining sub-pieces are requested from all peers in the current swarm.

5.3.5 Peer Distribution

The role of the tracker ends once peers have 'found each other'. From then on, communication is done directly between peers, and the tracker is not involved. The set of peers a BitTorrent client is in communication with is known as a swarm.

To maintain the integrity of the data which has been downloaded, a peer does not report that it has a piece until it has checked the hash of that piece against the one contained in the metainfo file.
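The hash check can be illustrated as follows. The sketch assumes, following the BitTorrent specification [2], that the metainfo file stores the piece hashes as one string of concatenated 20 byte SHA1 digests; the function and parameter names are hypothetical.

```python
import hashlib

def piece_is_valid(index, data, pieces_field):
    """Compare the SHA1 of `data` with the 20 byte digest stored for piece `index`.

    pieces_field : concatenated 20 byte digests, as assumed for the metainfo file
    """
    expected = pieces_field[index * 20:(index + 1) * 20]
    # An out-of-range index yields a short slice and is rejected.
    return len(expected) == 20 and hashlib.sha1(data).digest() == expected
```

Only when this check succeeds would a client announce the piece to the other peers.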

Peers will continue to download data from all available peers that they can, i.e. peers that possess the required pieces. Peers can block others from downloading data if necessary. This is known as choking.

5.3.6 Choking[2]

When a peer receives a request for a piece from another peer, it can opt to refuse to transmit that piece. If this happens, the requesting peer is said to be choked. This can be done for different reasons, but the most common is that a client will only maintain a limited number of simultaneous uploads (max_uploads). All further requests to the client will be marked as choked. The default for max_uploads is usually 4.

Fig 5.3 : Choking by a peer


The peer will then remain choked until an unchoke message is sent. Another case in which a peer may be choked is when it is downloading from a seed, since the seed requires no pieces in return. To ensure fairness between peers, there is a system in place which rotates which peers are downloading. This is known as optimistic unchoking.

5.3.7 Optimistic Unchoking[2]

To ensure that connections with the best data transfer rates are not the only ones favoured, each peer has a reserved 'optimistic unchoke' which is left unchoked regardless of the current transfer rate. The peer assigned to this slot is rotated every 30 seconds, which is enough time for the upload / download rates to reach maximum capacity.

The peers then cooperate using the tit for tat strategy, where the downloader responds in one period with the same action the uploader used in the last period.
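A minimal sketch of how a client might combine choking, tit for tat and the optimistic unchoke is given below. It assumes that the optimistic slot counts as one of the max_uploads slots and that the remaining slots go to the peers that have recently uploaded to us fastest; the function name, the parameters and the rotation by a simple period counter are all hypothetical.

```python
def choose_unchoked(download_rates, interested, optimistic_order, tick, max_uploads=4):
    """Return the set of peers to leave unchoked for the current period.

    download_rates   : peer -> bytes/s recently received from that peer
    interested       : set of peers that currently want data from us
    optimistic_order : fixed list of peers used to rotate the optimistic slot
    tick             : number of 30 second periods elapsed so far
    """
    # Tit for tat: reward the peers that have been uploading to us fastest.
    ranked = sorted(interested, key=lambda p: download_rates.get(p, 0), reverse=True)
    regular = ranked[:max_uploads - 1]
    # Optimistic unchoke: one slot rotates regardless of transfer rate.
    rest = [p for p in optimistic_order if p in interested and p not in regular]
    optimistic = [rest[tick % len(rest)]] if rest else []
    return set(regular) | set(optimistic)
```

Calling the function once per 30 second period with an increasing tick gives every slow or new peer a turn in the optimistic slot.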

5.3.8 Communication Between Peers

Peers which are exchanging data are in constant communication. Connections are symmetrical, and therefore messages can be exchanged in both directions. The exchange consists of a handshake, followed by a never-ending stream of length-prefixed messages.

5.3.9 Handshaking[2]

Handshaking is performed as follows:

1. The handshake starts with character 19 (base 10) followed by the 19-character string 'BitTorrent protocol' and then eight reserved bytes, which are normally all zero.

2. A 20 byte SHA1 hash of the bencoded info value from the metainfo is then sent. If this does not match between peers the connection is closed.

3. A 20 byte peer id is sent which is then used in tracker requests and included in peer requests. If the peer id does not match the one expected, the connection is closed.
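The handshake layout can be sketched as below. Note that, per the BitTorrent specification [2], the protocol string is spelled 'BitTorrent protocol' (19 bytes) and is followed by eight reserved bytes before the info hash, giving 68 bytes in total. The function names are hypothetical, and the peer id check is left to the caller.

```python
PSTR = b"BitTorrent protocol"  # 19 bytes, hence the leading length byte 19

def build_handshake(info_hash, peer_id):
    """Assemble <19><pstr><8 reserved bytes><info_hash><peer_id>."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must each be 20 bytes")
    return bytes([len(PSTR)]) + PSTR + bytes(8) + info_hash + peer_id

def parse_handshake(data, expected_info_hash):
    """Return the remote peer id, or None if the handshake should be rejected."""
    if len(data) != 68 or data[0] != 19 or data[1:20] != PSTR:
        return None
    if data[28:48] != expected_info_hash:
        return None  # different torrent: the connection would be closed
    return data[48:68]
```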

5.3.10 Message Stream[2]

This constant stream of messages allows all peers in the swarm to send data and to control their interactions with other peers.

A peer will be 'interested' in another peer if that peer has pieces it requires. If the interested peer is not choked by the peer holding the data, then the data will be transferred. After handshaking, connections start out by default as choked and not interested.


ID  Message         Structure                                  Additional Information

0   choke           <len=0001><id=0>                           Fixed length, no payload. Tells the receiving peer that its requests for data will not be served.

1   unchoke         <len=0001><id=1>                           Fixed length, no payload. Unblocks the peer; if it is still interested in the data, upload will begin.

2   interested      <len=0001><id=2>                           Fixed length, no payload. The sender is interested because the peer has data it requires.

3   not interested  <len=0001><id=3>                           Fixed length, no payload. The peer does not have any data the sender requires.

4   have            <len=0005><id=4><piece index>              Fixed length. Payload is the zero-based index of a piece that the sender has downloaded and verified.

5   bitfield        <len=0001+X><id=5><bitfield>               Variable length, X is the length of the bitfield. Optional; sent immediately after handshaking, and only if the client has pieces. Payload represents the pieces that have been successfully downloaded.

6   request         <len=0013><id=6><index><begin><length>     Fixed length, used to request a block within a piece. The payload contains integer values specifying the piece index, the begin offset and the length.

7   piece           <len=0009+X><id=7><index><begin><block>    Variable length, X is the length of the block. Sent in response to a request message. The payload contains the piece index, the begin offset and the block data.

8   cancel          <len=0013><id=8><index><begin><length>     Fixed length, used to cancel block requests. Payload is the same as for 'request'. Typically used during 'end game' mode.
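The framing in the table can be illustrated with a short encoder and decoder. The sketch assumes, following the specification [2], that the length prefix is a four byte big-endian integer and that a zero-length message (not listed in the table) acts as a keep-alive; the function names are hypothetical.

```python
import struct

(CHOKE, UNCHOKE, INTERESTED, NOT_INTERESTED,
 HAVE, BITFIELD, REQUEST, PIECE, CANCEL) = range(9)

def encode(msg_id, payload=b""):
    """Length prefix (4 bytes, big-endian) + 1 byte id + payload."""
    return struct.pack(">IB", 1 + len(payload), msg_id) + payload

def encode_request(index, begin, length):
    """Build a 'request' message for one block within a piece."""
    return encode(REQUEST, struct.pack(">III", index, begin, length))

def decode(buffer):
    """Split one message off the front of `buffer`.

    Returns (msg_id, payload, rest), or None if the message is incomplete.
    """
    if len(buffer) < 4:
        return None
    (length,) = struct.unpack(">I", buffer[:4])
    if len(buffer) < 4 + length:
        return None
    if length == 0:
        return (None, b"", buffer[4:])  # keep-alive (assumption, see lead-in)
    return (buffer[4], buffer[5:4 + length], buffer[4 + length:])
```

Because every message carries its own length, a receiver can call decode repeatedly on its input buffer and simply wait for more bytes whenever None is returned.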


5.4 Data

BitTorrent is very versatile, and can be used to transfer a single file, or multiple files of any type, contained within any number of directories. File sizes can vary hugely, from kilobytes to hundreds of gigabytes.

5.4.1 Piece Size

Data is split into smaller pieces which are sent between peers using the BitTorrent protocol. These pieces are of a fixed size, which makes it possible to keep tabs on who has which pieces of data. This also breaks the file into verifiable pieces: each piece is assigned a hash code, which can be checked by the downloader for data integrity. These hashes are stored as part of the metainfo file.
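As an illustration of how the per-piece hash codes could be produced when a torrent is created, the sketch below splits a byte string into fixed-size pieces and hashes each one with SHA1, the hash used by the specification [2]. The function name is hypothetical, and a real client would read the data from disk rather than hold it in memory.

```python
import hashlib

def piece_hashes(data, piece_length):
    """Return one 20 byte SHA1 digest per piece of `data`.

    The final piece may be shorter than piece_length.
    """
    return [hashlib.sha1(data[off:off + piece_length]).digest()
            for off in range(0, len(data), piece_length)]
```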

The size of the pieces remains constant throughout all files in the torrent, except for the final piece, which is irregular. The piece size a torrent is allocated depends on the amount of data. Piece sizes which are too large will cause inefficiency when downloading (there is a larger risk of data corruption in larger pieces, because fewer integrity checks are performed), whereas if the piece sizes are too small, more hash checks will need to be run.

As the number of pieces increases, more hash codes need to be stored in the metainfo file. Therefore, as a rule of thumb, the piece size should be selected so that the metainfo file is no larger than 50-75 KB. The main reason for this is to limit the amount of hosting storage and bandwidth needed by indexing servers. The most common piece sizes are 256 KB, 512 KB and 1 MB. The number of pieces is therefore total length / piece size, rounded up to account for the final partial piece.

For example, a 1.4 MB (1400 KB) file could be split into the following pieces. This shows 5 * 256 KB pieces, and a final piece of 120 KB.

Fig 5.4 : Pieces of a file
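The arithmetic of this example can be checked with a small helper. The function name piece_layout is hypothetical; both arguments are expected in the same unit (KB in the example).

```python
def piece_layout(total_length, piece_length):
    """Return (number_of_pieces, size_of_final_piece)."""
    full, remainder = divmod(total_length, piece_length)
    if remainder == 0:
        # The data divides evenly, so the final piece is a full one.
        return full, (piece_length if full else 0)
    return full + 1, remainder
```

For a 1400 KB file and 256 KB pieces this gives 6 pieces in total, the last of which is 120 KB.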

5.5 BitTorrent Clients

A BitTorrent client is an executable program which implements the BitTorrent protocol. It runs on top of the operating system on a user's machine, and handles interactions with the tracker and peers. The client is responsible for controlling the reading / writing of files, opening sockets etc.

A metainfo file must be opened by the client to start taking part in a torrent. Once the file is read, the necessary data is extracted, and a socket must be opened to contact the tracker. BitTorrent clients use TCP ports 6881-6999. To find an available port, the client will start at the lowest port and work upwards until it finds one it can use. This means that a client will only use one port, and a second BitTorrent client opened on the same machine will use another port. A client can handle multiple torrents running concurrently.
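The port search described above might look like the following sketch, which uses the standard socket module. The function name, the default host and the use of a bind failure to detect a port that is already taken are assumptions made for illustration.

```python
import socket

def bind_first_free_port(first=6881, last=6999, host="127.0.0.1"):
    """Return a listening TCP socket bound to the lowest free port in the range."""
    for port in range(first, last + 1):
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.bind((host, port))
        except OSError:
            sock.close()  # port already in use: try the next one up
            continue
        sock.listen()
        return sock
    raise OSError("no free port between %d and %d" % (first, last))
```

Starting a second client while the first is still listening would make the first bind attempt fail, so the second client ends up on a different port.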


6. Vulnerabilities of BitTorrent

6.1 Attacks on BitTorrent

As we have seen so far, BitTorrent is one of the most favoured file transfer protocols in today’s world. But it has been exposed to various attacks in the recent past, because its vulnerabilities are being exploited by the hacker community. Here are some of the attacks that are commonly seen.

6.1.1 Pollution attack

1. The peers receive the peer list from the tracker.

2. One peer contacts the attacker for a chunk of the file.

3. The attacker sends back a false chunk.

4. This false chunk will fail its hash check and will be discarded.

5. The attacker requests all chunks from the swarm and wastes the peers' upload bandwidth.

6.1.2 DDOS attack

DDOS stands for distributed denial of service. This attack is possible because the BitTorrent tracker has no mechanism for validating peers. This means there is no way to trace the culprit in this kind of attack. Attacks of this kind are also possible because of the modifications that can be made to the client software.

1. The attacker downloads a large number of torrent files from a web server.

2. The attacker parses the torrent files with a modified BitTorrent client and, as he announces that he is joining the swarm, replaces his own IP address and port number with the victim's.

3. As the tracker receives requests for a list of participating peers from other clients, it sends the victim's IP address and port number.

4. The peers then attempt to connect to the victim to try to download a chunk of the file.

6.1.3 Bandwidth Shaping

Many ISPs do not encourage their users to use BitTorrent. This is because BitTorrent is usually used to transfer large files, which increases the traffic over the ISPs' networks to a large extent. To avoid such exploding traffic on their servers, many ISPs have started to block the traffic caused by BitTorrent. This can be done by sniffing the packets that pass through and detecting whether they follow the BitTorrent protocol. ISPs make use of filters to find such packets and block them from passing through their servers.

6.2 Solutions

Here are a few solutions to the attacks that were discussed above.

6.2.1 Pollution attack

The peers which perform such attacks are identified by tracing their IPs. These IPs are then blacklisted to avoid further communication with them: blacklisted IPs are denied connections with other peers. This is done by using software like Peer Guardian or moBlock, which download the list of blacklisted IPs from the internet.
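The idea of checking a connecting peer against a blacklist can be sketched with the standard ipaddress module. The list format used here (one address or CIDR range per line, with '#' comments) is an assumption for illustration only and is not the file format of Peer Guardian or moBlock; the addresses are reserved documentation addresses.

```python
import ipaddress

def load_blocklist(lines):
    """Parse one address or CIDR range per line; '#' starts a comment."""
    networks = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if entry:
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks

def is_blocked(address, networks):
    """True if `address` falls inside any blacklisted network."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in networks)
```

A client would call is_blocked on the remote address before accepting or opening a peer connection.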


6.2.2 DDOS attack

The main solution to this kind of attack is to have clients parse the response from the tracker. In the case where a host (tracker) does not respond to a peer’s request with a valid BitTorrent protocol message, it should be inferred that this host is not running BitTorrent. The peer should then exclude that address from its tracker list, or set a high retry interval for that specific tracker. Another fix would be for web sites hosting torrents to check and report whether all trackers are active, or even remove the non-responding trackers from the tracker list in the torrent. Another measure could be to restrict the size of the tracker list to reduce the effectiveness of such an attack.
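One possible way for a client to apply the first of these measures is sketched below: the retry interval of a tracker doubles with every invalid response, and the tracker is dropped from the list after a fixed number of failures. The class name, the base interval of 60 seconds, the failure limit and the example URLs are all hypothetical.

```python
class TrackerList:
    """Tracks failures per tracker URL and backs off unresponsive ones."""

    def __init__(self, urls, base_interval=60, max_failures=5):
        self.base_interval = base_interval
        self.max_failures = max_failures
        self.failures = {url: 0 for url in urls}

    def report(self, url, valid_response):
        """Record whether `url` answered with a valid tracker response."""
        if url not in self.failures:
            return  # already excluded
        if valid_response:
            self.failures[url] = 0
            return
        self.failures[url] += 1
        if self.failures[url] >= self.max_failures:
            del self.failures[url]  # exclude the address altogether

    def retry_interval(self, url):
        """Seconds to wait before the next announce; doubles per failure."""
        return self.base_interval * 2 ** self.failures[url]

    def active(self):
        """Tracker URLs that have not been excluded."""
        return list(self.failures)
```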


6.2.3 Bandwidth Shaping

There are broadly two approaches followed to counter this type of attack. The first method is to encrypt the packets sent by means of the BitTorrent protocol. By doing this, the filters that sniff packets will not be able to detect that such packets belong to the BitTorrent protocol, and thus the packets can sneak through such filters. Another approach is to make use of tunnels. Tunnels are dedicated paths which avoid the filters by using VPN software that connects to unfiltered networks. This bypasses the filters, so the packets can be transmitted across networks.

7. Conclusion

BitTorrent pioneered mesh-based file distribution that effectively utilizes all the uplinks of participating nodes. Most follow-on research used similar distributed and randomized algorithms for peer and piece selection, but with different emphasis or twists. This work takes a different approach to the mesh-based file distribution problem by considering it as a scheduling problem, and strives to derive an optimal schedule that could minimize the total elapsed time. BitTorrent’s application in this information sharing age is almost priceless. However, it is still not perfected, as it is still prone to malicious attacks and acts of misuse. Moreover, the lifespan of each torrent is still not satisfactory, which means that a file can only be distributed for a limited period of time. Thus, further analysis and a more thorough study of the protocol will enable one to discover more ways to improve it.

8. References

1. Information on BitTorrent Protocol, en.wikipedia.org/wiki/BitTorrent_(protocol)

2. BitTorrent Specifications, http://wiki.theory.org/BitTorrentSpecification

3. Other Information, http://www.dessent.net/btfaq/#compare

4. Cohen, Bram (2003), Incentives Build Robustness in BitTorrent, May 22 2003, http://www.bitconjurer.org/BitTorrent/bittorrentecon.pdf