Introduction to P2P systems

License: Attribution-ShareAlike 2.5

You are free:

to copy, distribute, display, and perform the work

to make derivative works

to make commercial use of the work

Under the following conditions:

Attribution. You must give the original author credit.

Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a licence identical to this one.

For any reuse or distribution, you must make clear to others the licence terms of this work.

Any of these conditions can be waived if you get permission from the copyright holder.

Your fair use and other rights are in no way affected by the above.

This is a human-readable summary of the Legal Code (the full licence).

P2P is about sharing resources

Your CPU time

Your bandwidth

Your disk space

What is P2P

From Wikipedia: A peer-to-peer computer network is a network that relies on the computing power and bandwidth of the participants in the network rather than concentrating it in a relatively low number of servers.

P2P and GRID

From Wikipedia

Grid computing […] performs higher throughput computing by taking advantage of many networked computers to model a virtual computer architecture.

Topology Comparison

[Figure: three topologies side by side: Client/server, GRID, P2P. Labels: server, client, client=server.]

Overlay

Crs4.it Australian ISP

Mobile phones in cell xyz

Three main issues in P2P systems

Bootstrapping

Index/Lookup (query)

Delivery of large objects (in case of file sharing)

A la Napster

Query / Query Hits

GET <file>

Copyright issues with Napster

Napster claimed that the law allows people to share music with friends.

The court rejected this position and Napster was shut down.

Gnutella Overlay

Requestor Responder

Gnutella Messages

Byte          Description
0 - 15        GUID
16            Message type (ping, pong, push, query, queryhit)
17            TTL
18            Hops
19 - 22       Payload length
23 onwards    Payload (payload length bytes)
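The header layout above can be sketched as a small parser. This is a minimal sketch, not a reference implementation: the numeric type codes and the little-endian byte order of the length field are assumptions taken from the commonly documented Gnutella 0.4 format, and the function names are hypothetical.

```python
import struct

# Assumed payload type codes (as commonly documented for Gnutella 0.4).
PAYLOAD_TYPES = {
    0x00: "ping",
    0x01: "pong",
    0x40: "push",
    0x80: "query",
    0x81: "queryhit",
}

# 16-byte GUID, 1-byte type, 1-byte TTL, 1-byte hops,
# 4-byte payload length (little-endian is an assumption here).
HEADER = struct.Struct("<16sBBBI")


def pack_header(guid, ptype, ttl, hops, payload_len):
    """Build the 23-byte header that precedes every payload."""
    return HEADER.pack(guid, ptype, ttl, hops, payload_len)


def parse_header(data):
    """Decode the first 23 bytes of a message into a dict."""
    if len(data) < HEADER.size:
        raise ValueError("a header needs at least 23 bytes")
    guid, ptype, ttl, hops, length = HEADER.unpack(data[:HEADER.size])
    return {
        "guid": guid,
        "type": PAYLOAD_TYPES.get(ptype, "unknown"),
        "ttl": ttl,
        "hops": hops,
        "payload_length": length,
    }
```

The payload itself then sits in `data[23:23 + payload_length]`.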

Gnutella messages

ping: discover hosts on network

pong: reply to ping

query: search for a file

query hit: reply to query

push: download request for firewalled servents

Ref. http://rfc-gnutella.sourceforge.net/developer/stable/index.html

Gnutella: PING

Requestor

Gnutella: PONG

Requestor

Gnutella: QUERY

Requestor

Gnutella: QUERY-HITS

Requestor

QUERY-HITS

Responder 1

Responder 2

Gnutella: GET the file

Requestor Responder 1

GET file HTTP/1.1

Gnutella, behind firewalls

Requestor Responder

GET file

Gnutella, behind firewalls (2)

Requestor

Responder

Gnutella, behind firewalls (3)

Requestor

Responder

Bootstrapping in Gnutella

X-Try

Ping/Pong

Storing from QueryHit messages

GWebCache

Open issues in Gnutella

Latency

Scalability

Vulnerability

Privacy

Security

Is Gnutella obsolete?

Alive and kicking

Version 0.6 of the protocol prevents pure flooding and uses smart routing based on Ultrapeers

More than 2 million users, with 500,000 nodes always up

Popularity of P2P Networks (measured by Slyck.com)

Latest statistics, taken 2006-02-26 22:14:12:

eDonkey2K users: 3,474,261

FastTrack users: 2,609,688

Gnutella users: 2,219,539

Overnet users: 578,521

MP2P users: 252,893

Filetopia users: 4,806

Hub (Gnutella2 et al.)

Hub Web

Hub Requirements

> 100 sockets

CPU and RAM for servicing the network

Uptime (>2 hours)

Broadband (also for upload)

Able to receive inbound TCP and/or UDP (IP in the global address space, no NAT)

Hub Tasks

Keep up-to-date information about other hubs

Manage routing tables to route messages efficiently

Manage filters for query messages

Monitor their own resources

Query Hash Table

QHTs tell a node that a particular neighbour (and possibly its descendants) will not be able to provide any matching objects for a given query.

Such queries can be discarded confidently.

Neighbours know what their neighbours do not have, but cannot say for sure what they do have.

What is Hashing

From Wikipedia, the free encyclopedia: A hash function or hash algorithm is a function for examining the input data and producing an output hash value. The process of computing such a value is known as hashing. The process of hashing has the property that two different inputs are unlikely to hash to the same hash value.

What is Hashing (2)

For an N-bit hash, two given different inputs collide with probability of about 2^(-N)

Query Hash Table

1 1 1 1 1 1 1 1 1 1

0 1 2 2^N

0 <= Hash(word) < 2^N
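A query hash table of this kind can be sketched as a flat array indexed by the hash of each word. This is a minimal sketch under stated assumptions: SHA-1 truncated to N bits stands in for whatever hash function a real network uses, the "1 means present" convention is an assumption, and the class name is hypothetical.

```python
import hashlib


class QueryHashTable:
    """Toy QHT: one slot per possible N-bit hash value."""

    def __init__(self, bits=16):
        self.size = 1 << bits              # 2^N slots
        self.table = bytearray(self.size)  # one byte per slot, 0 = nothing hashed here

    def _slot(self, word):
        # Assumed hash: first 4 bytes of SHA-1, reduced to the table size.
        digest = hashlib.sha1(word.lower().encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % self.size

    def add(self, word):
        self.table[self._slot(word)] = 1

    def might_have(self, word):
        # False means "definitely not present"; True only means "possibly",
        # because two different words can land in the same slot.
        return self.table[self._slot(word)] == 1
```

A `False` answer is reliable, which is why a hub can drop the query; a `True` answer may be a collision.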

Query Filtering

If any of the lookups based on URNs found a hit, send the query packet

If at least two thirds of lookups based on words found a hit, send

Otherwise, drop the packet

Consider all text content in the query, including generic search text and metadata search text if it is present.

Tokenize quoted phrases into words, ignoring the phrase at this level
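The forwarding rule above can be sketched as follows. This is an illustrative sketch only: `neighbour_has` is a hypothetical callback standing in for a lookup in the neighbour's query hash table, and the tokenizer is a simplification that keeps word characters and drops everything else.

```python
import re


def tokenize(query_text):
    # Quoted phrases are split into their words; the phrase itself is ignored.
    return re.findall(r"\w+", query_text.lower())


def should_forward(query_text, urns, neighbour_has):
    """Decide whether to send the query packet to one neighbour."""
    # Any URN hit is enough to send the packet.
    if any(neighbour_has(urn) for urn in urns):
        return True
    words = tokenize(query_text)
    if not words:
        return False
    hits = sum(1 for word in words if neighbour_has(word))
    # At least two thirds of the word lookups must hit (integer arithmetic).
    return 3 * hits >= 2 * len(words)
```

For instance, with a neighbour that advertises `free` and `music`, the query `"free music" video` is forwarded (2 of 3 words hit), while `free video clip` is dropped (1 of 3).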

Distributed hashtables

Distributed Hashtables

Main features: a key is mapped onto a node of the network.

Several proposals: Chord, Pastry and Kademlia.

Lookup(key) reaches the right node with O(log(N)) hops.

Possible applications of DHT

DHT DNS

Content lookup

Web search engine

DNS over DHT (1)

Problem: how to register a name for an IP address

Assign a name to your machine, for example ‘mymachine’

Check whether this name is available using the DHT operation get(‘mymachine’).

If the result is null, register the name and the IP with the DHT operation put(‘mymachine’, 212.22..)

DNS over DHT (2)

Problem: how to resolve a name to an IP address

Use the DHT operation get(hostname).

The result, if not null, is the IP address you’re searching for
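The register and resolve steps can be sketched against a stand-in DHT. This is a minimal sketch: `ToyDHT` is a hypothetical in-memory dictionary that only mimics the get/put interface, and the IP address is a documentation-range placeholder.

```python
class ToyDHT:
    """In-memory stand-in exposing only get(key) and put(key, value)."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)  # None plays the role of "null"

    def put(self, key, value):
        self._store[key] = value


def register(dht, name, ip):
    """Register name -> ip only if the name is still free."""
    if dht.get(name) is not None:
        return False  # name already taken
    dht.put(name, ip)
    return True


def resolve(dht, name):
    """Return the IP registered for name, or None if unknown."""
    return dht.get(name)


dht = ToyDHT()
register(dht, "mymachine", "192.0.2.10")  # hypothetical address
print(resolve(dht, "mymachine"))
```

In a real network the check-then-put sequence is not atomic, so two nodes could claim the same name at the same time; the sketch ignores that.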

Content indexing/lookup on DHT

A content item has a set of metadata (e.g. author, editor, genre, …)

Build a separate DHT-based index for each metadata field, e.g. the index for author:

put(‘john’, http://host/dir/content.avi)

How DHT works

In a DHT each node has a node ID which belongs to a set S (for instance the set of bitstrings with length 160)

Keys must also be hashed into the same set S (hash(key) belongs to S)
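Placing node IDs and keys in the same set S can be sketched as below. This is a sketch under assumptions: SHA-1 is used because it happens to produce 160 bits, and drawing node IDs at random is only one possible choice; the function names are hypothetical.

```python
import hashlib
import secrets

ID_BITS = 160  # S is the set of 160-bit strings, seen here as integers


def key_to_id(key):
    """Hash an application key into S (SHA-1 is an assumed choice)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def new_node_id():
    """Pick a node ID in S (random choice is an assumption)."""
    return secrets.randbits(ID_BITS)
```

Because both functions return values in the same range, a key can be compared directly with node IDs to decide which node is responsible for it.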

Web crawlers and DHT

Assume a network of nodes in a DHT

Assume each node also runs a crawler.

For each word in a Web page it performs Put(word,URL)

So a distributed index of the Web is built[1]

Web search and DHT

When the user types a keyword ‘foo’, look up the DHT with Get(‘foo’)

The DHT will return the list of URLs indexed under ‘foo’
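The crawler and search steps amount to building and querying an inverted index. The sketch below uses a hypothetical in-memory class in place of a real DHT, where Put adds a URL to the set stored under a word and Get returns that set; the URLs are placeholders.

```python
import re
from collections import defaultdict


class ToyIndexDHT:
    """Stand-in DHT whose values are sets of URLs."""

    def __init__(self):
        self._store = defaultdict(set)

    def put(self, word, url):
        self._store[word].add(url)

    def get(self, word):
        return sorted(self._store.get(word, ()))


def index_page(dht, url, text):
    """What a crawler node would do for one page: Put(word, URL) per word."""
    for word in set(re.findall(r"\w+", text.lower())):
        dht.put(word, url)


dht = ToyIndexDHT()
index_page(dht, "http://example.org/a", "Foo bar foo")
index_page(dht, "http://example.org/b", "foo baz")
print(dht.get("foo"))
```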

Kademlia

S = [00...0 - 11...1], the set of 160-bit strings

Each node has a node ID in S

For each 'key', hash(key) is in S

Kademlia distance

Given x, y in S

Define the distance d(x,y) = xor(x,y)

d has the following properties:

d(x,y) = d(y,x)

d(x,x) = 0

d(x,y) + d(y,z) >= d(x,z)
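The XOR distance and its three properties can be checked numerically. This is a small sketch that treats IDs as Python integers and tests the properties exhaustively on 5-bit values rather than on the full 160-bit space.

```python
def distance(x, y):
    """Kademlia distance: bitwise XOR of the two IDs, read as an integer."""
    return x ^ y


ids = range(32)  # all 5-bit IDs, small enough to check every combination

symmetric = all(distance(x, y) == distance(y, x) for x in ids for y in ids)
identity = all(distance(x, x) == 0 for x in ids)
triangle = all(
    distance(x, y) + distance(y, z) >= distance(x, z)
    for x in ids for y in ids for z in ids
)
print(symmetric, identity, triangle)
```

The triangle inequality holds because xor(x,z) = xor(xor(x,y), xor(y,z)) and a XOR of two numbers never exceeds their sum.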

k-Buckets in kademlia

Each node stores an array of lists: list[i], i = 0,1, ... , 159

list[i] stores up to k tuples: (IP,port,ID)

list[i] stores tuples whose ID is: 2^i <= d(this,ID) < 2^(i+1)

list[i] is ordered as LRS (last recent
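The bucket condition 2^i <= d < 2^(i+1) says that i is the position of the highest set bit of the distance. A minimal sketch, with a hypothetical function name and IDs as integers:

```python
def bucket_index(this_id, other_id):
    """Return i such that 2**i <= xor(this_id, other_id) < 2**(i + 1)."""
    d = this_id ^ other_id
    if d == 0:
        raise ValueError("a node does not store itself in a bucket")
    # bit_length() is the number of bits needed to write d,
    # so the highest set bit sits at position bit_length() - 1.
    return d.bit_length() - 1
```

IDs that differ from this node only in the lowest bit land in list[0]; IDs that differ in the top bit land in list[159].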

Tree for nodes in kademlia

k-Buckets in kademlia

For small values of i, list[i] has few elements

For larger values of i, list[i] is likely to contain more elements.

Operations in kademlia

PING (IP, port)

STORE (key, value)

FIND_VALUE (key)

FIND_NODE (ID)

Lookup in Kademlia

FIND_NODE(hash(key))

Compute D = xor(this, hash(key))

Find a tuples in list[i], the bucket whose range contains D (e.g. a = 3)

Send FIND_NODE(hash(key)) to those a nodes

The replies contain other node addresses

Reiterate FIND_NODE(hash(key)) on them

Stop when no new addresses are received
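The iterative lookup can be simulated in memory. This is a simplified sketch, not the full Kademlia procedure: each node keeps a flat contact set instead of k-buckets, replies are direct method calls rather than network messages, and `K`, `Node`, `build_hypercube` and `lookup` are hypothetical names and values chosen for the illustration.

```python
ALPHA = 3  # nodes queried per round (the "a" of the slide)
K = 4      # contacts returned in one FIND_NODE reply (assumed value)


class Node:
    def __init__(self, node_id):
        self.id = node_id
        self.contacts = set()  # flat set of known IDs, for brevity

    def find_node(self, target):
        # Reply with the K known contacts closest to target by XOR distance.
        return sorted(self.contacts, key=lambda c: c ^ target)[:K]


def build_hypercube(bits):
    """Toy network: each node knows the IDs that differ from it in one bit."""
    network = {i: Node(i) for i in range(1 << bits)}
    for i, node in network.items():
        node.contacts = {i ^ (1 << b) for b in range(bits)}
    return network


def lookup(network, start_id, target):
    """Return the ID closest to target among all addresses learned."""
    known = set(network[start_id].contacts) | {start_id}
    queried = set()
    while True:
        pending = sorted(known - queried, key=lambda c: c ^ target)[:ALPHA]
        received = set()
        for node_id in pending:
            queried.add(node_id)
            received.update(network[node_id].find_node(target))
        if received <= known:  # no new addresses: stop
            return min(known, key=lambda c: c ^ target)
        known |= received


network = build_hypercube(6)    # 64 nodes with 6-bit IDs
print(lookup(network, 0, 45))
```

Each round moves the set of known addresses closer to the target, so in this toy network the node with ID 45 is reached after a few rounds even though node 0 initially knows only six neighbours.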

Nodes Joining and Leaving

Whenever one node asks another for its contacts, the called node stores the contact information of the caller.

When a node joins the network it takes some of the contacts of an arbitrary node and uses them as its own.

It then does a search for itself. This results in other nodes being called, which makes them aware of the new node's existence

Node Joining and Leaving (2)

A new node may have become the closest node to certain keys

The previous closest nodes will replicate the appropriate key/value pairs to the new node

Ignoring replication, the cost of a node joining is only O(log n) messages.

Range Query in DHT (1)

A DHT maps a key onto a node

It is easy to look up a value given a key

It is hard to look up values in a range of keys

Example 1: look up all tuples with ‘aaaa’ < key < ‘bbbb’

Example 2: look up all tuples with ’39,88’ < lat < ’39,94’

References (1)

Napster Timeline http://www.cnn.tv/SPECIALS/2001/napster/timeline.html

The Gnutella Developer Forum http://www.the-gdf.org/wiki/index.php?title=Main_Page

History of Gnutella in ‘Gnutella’ http://ntrg.cs.tcd.ie/undergrad/4ba2.02-03/p5.html

Slyck.com

DHT Links http://www.etse.urv.es/~cpairot/dhts.html

References (2)

YACY (DHT Web search/index) http://www.yacy.net/yacy/

Kademlia: A Peer-to-peer Information System Based on the XOR Metric. (paper)

Khashmir – Kademlia in Python http://khashmir.sourceforge.net/

A Case Study in Building Layered DHT Applications (paper on range query/DHT) http://www.placelab.org/publications/pubs/IRS-TR-05-001.pdf
