sanjay-ravishankar
The BitTorrent Protocol
Common Scenario
• Millions want to download the same popular, huge files (for free)
  – Software
  – Media (the real example!)
• Client-server model fails
  – Single server fails
  – Can’t afford to deploy enough servers
[Figure: Client-Server. Many “interested” end-hosts reach a single source through routers; the source is overloaded.]
Peer-to-Peer
• A model of communication where every node in the network acts alike.
• As opposed to the Client-Server model, where one node provides services and other nodes use the services.
Advantages of P2P Computing
• No central point of failure
  – E.g., the Internet and the Web do not have a central point of failure.
  – Most Internet and web services use the client-server model (e.g., HTTP), so a specific service does have a central point of failure.
• Scalability
  – Since every peer is alike, it is possible to add more peers to the system and scale to larger networks.
Disadvantages of P2P Computing
• Decentralized coordination.
• Not all nodes are created equal.
BitTorrent
• Written by Bram Cohen in 2001
• Designed to transfer large files
• 160 million clients, 100 million active users
• Used by many different people and organisations
• The more popular a large video, audio or software file, the faster and cheaper it can be transferred with BitTorrent
• “Pull-based” “swarming” approach
  – Each file is split into smaller pieces
  – Nodes request desired pieces from neighbors
    • As opposed to parents pushing data that they receive
  – Pieces are not downloaded in sequential order
• Encourages contribution by all nodes
• Peer-to-peer in nature
• Works even if clients join simultaneously (“flash crowd”)
• The BitTorrent protocol is implemented in applications called BitTorrent clients, such as uTorrent and BitComet.
BitTorrent Terminology
• Peer - A node or computer that does not have the complete file
• Seed or seeder - A computer with a complete copy of a BitTorrent file
• Swarm - A group of computers simultaneously sending (uploading) or receiving (downloading) the same file
• .torrent - A pointer file that directs your computer to the file you want to download
• Tracker - A server that manages the BitTorrent file-transfer process
BitTorrent Swarm
• Swarm
  – Set of peers all downloading the same file
  – Organized as a random mesh
• Each node knows the list of pieces downloaded by its neighbors
• Each node requests pieces it does not own from its neighbors
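The bookkeeping described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the peer names, the 4-piece file, and the helper `wanted_pieces` are hypothetical, and a real client tracks this state with per-connection bitfields rather than plain sets.

```python
def wanted_pieces(have: set[int], neighbor_has: dict[str, set[int]]) -> dict[str, set[int]]:
    """For each neighbor, list the pieces it holds that we still lack."""
    wanted = {}
    for peer, pieces in neighbor_has.items():
        missing = pieces - have          # pieces the neighbor has and we do not
        if missing:
            wanted[peer] = missing
    return wanted

# Hypothetical swarm state: we hold pieces 0 and 1 of a 4-piece file.
mine = {0, 1}
neighbors = {"peer_a": {0, 1, 2}, "peer_b": {1, 3}, "peer_c": {0}}
requests = wanted_pieces(mine, neighbors)
print(requests)
```

Here `peer_c` drops out because it holds nothing we are missing.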
A *.torrent guides users to owners of a file
1. The user obtains a *.torrent file. The file contains meta-info about the target file.
2. The user loads the *.torrent file into a BitTorrent client, which then looks up the tracker named in the file.
3. Armed with a list of peers holding pieces of the file, the user downloads from many peers.
Tracker coordinates peers.

All peers act as a source
• Peers exchange different pieces of the file with one another until they assemble the whole file.
• As soon as a user has a piece of the file on their machine, they can become a source of that piece for other peers, helping speed up the download.

Seed
• A machine with a complete copy (the seed) can distribute pieces to multiple peers whose copies are still incomplete.
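As a rough sketch of how a client asks the tracker for peers: in the original BitTorrent specification the request is an HTTP GET on the tracker's announce URL, carrying query parameters such as `info_hash`, `peer_id`, `port`, `uploaded`, `downloaded`, `left` and `event`. The tracker URL, hash bytes, peer ID and helper name below are made-up values for illustration, and the sketch only builds the URL without sending it.

```python
from urllib.parse import urlencode

def announce_url(tracker: str, info_hash: bytes, peer_id: bytes, port: int, left: int) -> str:
    """Build a tracker announce URL (illustrative; no request is sent)."""
    params = {
        "info_hash": info_hash,   # 20-byte SHA-1 of the bencoded info dictionary
        "peer_id": peer_id,       # 20-byte identifier chosen by the client
        "port": port,             # port this client listens on
        "uploaded": 0,
        "downloaded": 0,
        "left": left,             # bytes still needed to complete the download
        "event": "started",
    }
    # urlencode percent-encodes raw bytes values, which info_hash requires.
    return tracker + "?" + urlencode(params)

url = announce_url(
    "http://tracker.example/announce",   # hypothetical tracker
    bytes(range(20)),                    # placeholder hash, not a real torrent
    b"-XX0001-abcdefghijkl",             # placeholder 20-byte peer ID
    6881,
    7,
)
print(url)
```

The tracker's reply, which this sketch does not model, contains the list of peers to contact.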
• All data in a metainfo (.torrent) file is bencoded.
• announce: contains the URL of the “tracker”
• creation date
• comment: comments from the author (optional)
• created by: (optional)
• info: a dictionary that describes the file(s) of the torrent, including:
  – piece length: number of bytes in each piece (integer)
  – pieces: string consisting of the concatenation of all 20-byte SHA-1 hash values, one per piece
The key ingredients of the *.torrent file are the tracker’s address and the unique SHA-1 hash
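To make the metainfo layout concrete, here is a minimal sketch of a bencode decoder applied to a tiny hand-built torrent. The payload, file name and tracker URL are invented for the example, the helper names `bdecode` and `bstr` are hypothetical, and the decoder does no validation of malformed input.

```python
import hashlib

def bdecode(data: bytes, i: int = 0):
    """Decode one bencoded value starting at index i; return (value, next_index)."""
    c = data[i:i + 1]
    if c == b"i":                        # integer: i<digits>e
        end = data.index(b"e", i)
        return int(data[i + 1:end]), end + 1
    if c == b"l":                        # list: l<items>e
        i += 1
        items = []
        while data[i:i + 1] != b"e":
            item, i = bdecode(data, i)
            items.append(item)
        return items, i + 1
    if c == b"d":                        # dictionary: d<key><value>...e
        i += 1
        result = {}
        while data[i:i + 1] != b"e":
            key, i = bdecode(data, i)
            result[key], i = bdecode(data, i)
        return result, i + 1
    colon = data.index(b":", i)          # byte string: <length>:<bytes>
    start = colon + 1
    length = int(data[i:colon])
    return data[start:start + length], start + length

def bstr(s: bytes) -> bytes:
    """Bencode a byte string (helper used only to build the demo torrent)."""
    return str(len(s)).encode() + b":" + s

# Hypothetical 7-byte payload split into 4-byte pieces.
payload, piece_length = b"abcdefg", 4
chunks = [payload[k:k + piece_length] for k in range(0, len(payload), piece_length)]
pieces = b"".join(hashlib.sha1(chunk).digest() for chunk in chunks)

torrent = (
    b"d" + bstr(b"announce") + bstr(b"http://tracker.example/announce")
    + bstr(b"info") + b"d"
    + bstr(b"length") + b"i7e"
    + bstr(b"name") + bstr(b"demo.bin")
    + bstr(b"piece length") + b"i4e"
    + bstr(b"pieces") + bstr(pieces)
    + b"e" + b"e"
)

meta, _ = bdecode(torrent)
info = meta[b"info"]
# Split the concatenated digest string back into one 20-byte hash per piece.
hashes = [info[b"pieces"][k:k + 20] for k in range(0, len(info[b"pieces"]), 20)]
verified = all(hashlib.sha1(chunk).digest() == h for chunk, h in zip(chunks, hashes))
print(meta[b"announce"], info[b"piece length"], len(hashes), verified)
```

A downloader would run the same hash comparison on every piece it receives and discard any piece whose digest does not match.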
BitTorrent Download
• Download and install the BitTorrent client software
• Check and configure firewall and/or router for BitTorrent (if applicable)
• Find files to download
• Download and open the .torrent pointer file
• Let BitTorrent give and receive pieces of the file
• Stay connected after the download completes to keep sharing the file with others
Upload and Publish File
• Download and install the BitTorrent client software
• Create a new .torrent file
• Publish the .torrent file on torrent search index sites such as The Pirate Bay
Peer-to-peer transactions: Choosing pieces to request
• Rarest-first: Look at all pieces at all peers, and request the piece that’s owned by the fewest peers
  – Increases diversity in the pieces downloaded
    • Avoids the case where a node and each of its peers have exactly the same pieces; increases throughput
  – Increases the likelihood that all pieces are still available even if the original seed leaves before any one node has downloaded the entire file
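A minimal sketch of rarest-first selection, assuming the node already knows which pieces each neighbor holds. The peer names and piece sets are hypothetical, and breaking ties by lowest piece index is a simplification chosen for this example; a client could just as well pick randomly among equally rare pieces.

```python
from collections import Counter
from typing import Optional

def rarest_first(have: set[int], peer_pieces: dict[str, set[int]]) -> Optional[int]:
    """Return the missing piece held by the fewest peers, or None if none is available."""
    counts = Counter(
        piece
        for pieces in peer_pieces.values()
        for piece in pieces
        if piece not in have
    )
    if not counts:
        return None
    # Sort key: (number of owners, piece index), so the rarest piece wins.
    return min(counts, key=lambda piece: (counts[piece], piece))

# Hypothetical neighbors: piece 2 is held by only one of them.
peers = {"a": {0, 1, 2}, "b": {0, 1}, "c": {0, 3}, "d": {1, 3}}
choice = rarest_first({0}, peers)
print(choice)
```

With these sets, piece 1 has three owners, piece 3 has two, and piece 2 has one, so piece 2 is requested first.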
Choosing pieces to request
• Random first piece:
  – When a peer starts to download, request a random piece.
    • So as to assemble the first complete piece quickly
    • Then participate in uploads
  – When the first complete piece is assembled, switch to rarest-first
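The two policies can be combined in one small selector, sketched here under assumptions: `next_piece` is a hypothetical helper, the peer data is invented, and "no piece held yet" stands in for the bootstrap phase.

```python
import random
from collections import Counter
from typing import Optional

def next_piece(have: set[int], peer_pieces: dict[str, set[int]],
               rng: random.Random) -> Optional[int]:
    """Pick a random piece while we hold nothing, then fall back to rarest-first."""
    available = Counter(
        piece
        for pieces in peer_pieces.values()
        for piece in pieces
        if piece not in have
    )
    if not available:
        return None
    if not have:
        # Bootstrap: any piece will do, the goal is to have something to upload.
        return rng.choice(sorted(available))
    # Steady state: request the piece with the fewest owners.
    return min(available, key=lambda piece: (available[piece], piece))

peers = {"a": {0, 1, 2}, "b": {0, 1}, "c": {0, 3}, "d": {1, 3}}
rng = random.Random(0)                     # seeded only to make the demo repeatable
first = next_piece(set(), peers, rng)      # random bootstrap pick
later = next_piece({0}, peers, rng)        # rarest-first once a piece is held
print(first, later)
```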
Why BitTorrent took off
• Better performance through “pull-based” transfer
  – Slow nodes don’t bog down other nodes
• Allows uploading from hosts that have downloaded parts of a file
Why BitTorrent took off
• Practical reasons (perhaps more important!)
  – Working implementation (Bram Cohen) with simple, well-defined interfaces for plugging in new content
  – Many recent competitors got sued / shut down
    • Napster, Kazaa
  – Users use well-known, trusted sources to locate content
    • Avoids the pollution problem, where garbage is passed off as authentic content
Pros and cons of BitTorrent
• Pros
  – Proficient in utilizing partially downloaded files
  – Discourages “freeloading”
    • By rewarding fastest uploaders
  – Encourages diversity through “rarest-first”
    • Extends lifetime of swarm
  – Works well for “hot content”
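One way to picture "rewarding fastest uploaders" is a ranking step: a node uploads only to the few peers that have recently uploaded to it at the highest rates. The sketch below assumes hypothetical peer names, rates in KB/s, and a slot count of 3; `choose_unchoked` is an illustrative helper, not a real client's API. Real clients typically also give an occasional random peer a chance, which this sketch leaves out.

```python
def choose_unchoked(upload_rates: dict[str, float], slots: int = 4) -> list[str]:
    """Pick the peers we will upload to: those that recently uploaded to us fastest."""
    ranked = sorted(upload_rates, key=lambda peer: upload_rates[peer], reverse=True)
    return ranked[:slots]

# Hypothetical rates (KB/s) at which each peer has been uploading to us.
rates = {"a": 120.0, "b": 0.0, "c": 45.5, "d": 300.0, "e": 80.0}
unchoked = choose_unchoked(rates, slots=3)
print(unchoked)
```

Peer `b`, which contributes nothing, never makes the cut, which is the disincentive to freeloading.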
Pros and cons of BitTorrent
• Cons
  – Assumes all interested peers are active at the same time; performance deteriorates if the swarm “cools off”
  – Even worse: no trackers for obscure content
Pros and cons of BitTorrent
• Dependence on centralized tracker: pro/con?
  – Single point of failure: new nodes can’t enter the swarm if the tracker goes down
  – Lack of a search feature
    • Prevents pollution attacks
    • Users need to resort to out-of-band search: well-known torrent-hosting sites / plain old web search
“Trackerless” BitTorrent
• To be more precise, “BitTorrent without a centralized tracker”
• E.g.: Azureus
• Uses a Distributed Hash Table (Kademlia DHT)
• Tracker run by a normal end-host (not a web server anymore)
  – The original seeder could itself be the tracker
  – Or have a node in the DHT randomly picked to act as the tracker
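As a sketch of how a Kademlia-style DHT can pick the node that acts as tracker: node IDs and keys share one ID space, and the nodes whose IDs are closest to a torrent's key under the XOR metric are responsible for it. Because IDs come from hashes, the choice is effectively pseudo-random. The node labels, the key label and both helper functions below are assumptions made for illustration; real DHT nodes assign IDs and route lookups in more involved ways.

```python
import hashlib

def make_id(label: str) -> int:
    """Derive a 160-bit ID by hashing a label (illustrative only)."""
    return int.from_bytes(hashlib.sha1(label.encode()).digest(), "big")

def closest_nodes(key: int, nodes: list[int], k: int = 2) -> list[int]:
    """Return the k node IDs nearest to key under the XOR distance metric."""
    return sorted(nodes, key=lambda node: node ^ key)[:k]

# Small worked case: distances to 0b1010 are 2, 8, 5 and 1 respectively.
small = closest_nodes(0b1010, [0b1000, 0b0010, 0b1111, 0b1011], k=2)

# Hypothetical swarm: eight nodes and one torrent key in the 160-bit space.
key = make_id("demo torrent")
nodes = [make_id(f"node-{n}") for n in range(8)]
responsible = closest_nodes(key, nodes)
print(small, len(responsible))
```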
Why is (studying) BitTorrent important?
• BitTorrent consumes a significant amount of Internet traffic today
  – In 2004, BitTorrent accounted for 35 to 60% of all Internet traffic (according to CacheLogic)
  – BT is also used for legal software distribution (e.g., Linux ISOs)
Companies using BitTorrent Technology
• With help from BitTorrent, Facebook can now push hundreds of megabytes of new code to all servers worldwide in just a minute.
• Twitter uses BitTorrent to deploy files across its many servers more efficiently. The project, dubbed ‘Murder’, is based on the open-source BitTornado BitTorrent client.
Conclusion
• BitTorrent is a well thought-out protocol that embraces aspects of cooperation and self-optimizing mechanisms.
• BitTorrent proposes solutions to current optimization and scalability problems.
Thank you for your attention.