Structuring Unstructured Peer-to-Peer Networks

Stefan Schmid, Roger Wattenhofer

Distributed Computing Group

HiPC 2007, Goa, India

Stefan Schmid @ HiPC 2007 2

Networks…

• Examples: Internet graph, Web graph, neuron networks, social graphs, public transportation networks

Different properties:

• Natural vs. Man-made

• Robustness

• Diameter

• Routability

• ...

An Interesting Network: Peer-to-Peer Network

• Popular examples:
- File sharing: BitTorrent, eMule, Kazaa, ...
- Streaming: Zattoo, Joost, ...
- Internet telephony: Skype, ...
- etc.

• Important: P2P accounts for much of today's Internet traffic! (source: cachelogic.com)

• Network of peers, e.g., to share files

• Desirable properties:
- Scalability

- Low degree, low network diameter

- Fast routing

- etc.

Some of Our Own Applications

• Wuala online storage system

- Student project, start-up, http://wua.la

• Pulsar streaming

- tilllate.com, DJ events, ...; pstreams.com

- cheap infrastructure at content provider

• BitThief BitTorrent downloads

• Distributed Computations

- BOINC client for the ECC discrete logarithm challenge

Successful paradigm & technology, but still important research challenges!

Structured vs. Unstructured Topologies

• Old "P2P" systems such as Napster were based on a central server

- Server stores index: search for contents is simple

- Problem: single point of failure

- Legal issues...

• Unstructured systems, e.g., Gnutella, allow arbitrary topologies and arbitrary data placement

- Peers just connect to an arbitrary set of other peers

- No single point of failure

- But often inefficient: routing based on flooding or random walk

• Structured systems, e.g., eMule's Kad network, give guarantees

- Proactive maintenance of topology

- Provable network diameter and peer degree

- Routing possible, e.g., lookups in log(n) hops (possibly also with low stretch)

What is "better"?

• Unstructured systems have less maintenance overhead

- Peers can join and leave wherever they want

• Unstructured systems allow for a richer set of queries

- e.g., range queries, Boolean queries

• Most importantly: despite the attractive properties of structured networks (and the large body of research on them), today's predominant networks are still unstructured (e.g., Gnutella, BitTorrent, etc.)

Really? Flooding is always possible!

• But unstructured systems often have scalability problems

- When Napster was unplugged, Gnutella went down.

Discussion needs to be continued...!

Routing in Arbitrary Topologies?

• How to find a file in an arbitrary network?

• Option 1: Flooding (up to a certain hop radius r)

- Robust, but does not scale.

- Does not find the "needles", but does a good job at finding popular files.

• Option 2: Random Walks

- Fewer messages, but no lookup performance guarantee.

- Potentially large delay (solution: many parallel "walkers")

- Walkers can be lost...

- Analysis difficult.

- Again: good for finding popular content, bad for finding needles.
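To make the random-walk trade-off concrete, here is a minimal sketch of a search by k parallel walkers of length L on an adjacency-list graph. The function name `random_walk_search`, its parameters, and the rule that a walker stops once it reaches the target are illustrative assumptions, not the protocol of any particular client.

```python
import random

def random_walk_search(graph, source, target, walkers, length, seed=0):
    """Search for `target` with `walkers` independent random walks of at most `length` hops.

    graph: dict mapping each peer to a list of its neighbors.
    Returns (found, messages): whether any walker reached `target`, and the
    total number of hop messages sent (at most walkers * length).
    """
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    found = False
    messages = 0
    for _ in range(walkers):  # walkers run "in parallel"; simulated one after another here
        node = source
        for _ in range(length):
            if node == target:
                break  # assumption: a walker stops once it holds the file
            node = rng.choice(graph[node])
            messages += 1
        found = found or node == target
    return found, messages
```

The message cost is bounded by walkers × length no matter how large the network is, which is why random walks use fewer messages than flooding; the price is that a rare item beyond the walkers' reach is simply not found.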

Flooding

• This talk considers search operations by flooding.

• Efficiency of flooding?

Very efficient on trees! With cycles: many redundant transmissions...

Flooding efficiency depends on network topology!
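The effect of the topology can be checked with a small simulation. The sketch below floods a query with a hop limit (TTL) over an adjacency-list graph, assuming synchronous rounds and that each peer forwards a query only the first time it receives it, to all neighbors except the sender. It counts how many transmissions are redundant, i.e., reach a peer that already has the query. The function name `flood` is hypothetical.

```python
from collections import deque

def flood(graph, source, ttl):
    """Flood from `source` for `ttl` hops over graph: dict peer -> list of neighbors.

    Returns (reached, messages, redundant): the set of peers reached, the total
    number of transmissions, and how many of them hit an already-reached peer.
    """
    reached = {source}
    messages = redundant = 0
    # FIFO order models synchronous rounds: a peer first hears the query
    # with the largest remaining TTL it will ever see.
    queue = deque([(source, None, ttl)])
    while queue:
        node, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue
        for nb in graph[node]:
            if nb == sender:
                continue  # do not send the query straight back
            messages += 1
            if nb in reached:
                redundant += 1  # duplicate: received but not forwarded again
            else:
                reached.add(nb)
                queue.append((nb, node, hops_left - 1))
    return reached, messages, redundant
```

On a tree every transmission reaches a new peer, so `redundant` stays 0; every cycle that fits within the flooding radius produces duplicate transmissions.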

Clustella

• We propose Clustella

- a new P2P client for unstructured peer-to-peer systems

- based on flooding, but with "smart neighbor selection"

- allows for more efficient flooding!

Vision

• Clustella Vision: unstructured p2p network

(Figure legend: normal client, Clustella client)

By connecting to peers in far-away parts of the network, a Clustella client avoids small cycles in the topology, which makes flooding more efficient. Not only do Clustella clients benefit, but so do all other clients in the network.

Flood Coverage

• Main open question: How to connect to remote peers?

• Given a set of potential neighbors, it would be useful to know the hop distance to each of those!

• Then, we could connect to the one furthest away...

• Goal: Maximize flood coverage, i.e., maximize the minimum number of nodes reached by an r-hop flood, using only local information and despite dynamics
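When the whole graph is known, flood coverage can be computed directly, which is handy for evaluating a topology offline (a real peer has only local information, which is the problem Clustella's beacon mechanism addresses). The sketch below assumes an adjacency-list graph, uses breadth-first search for hop distances, and counts the flooding peer itself as reached. `pick_furthest` illustrates "connect to the one furthest away" under the assumption that exact hop distances are available; all names are illustrative.

```python
from collections import deque

def hop_distances(graph, source, limit=None):
    """BFS hop distances from `source`, optionally only up to `limit` hops."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if limit is not None and dist[node] == limit:
            continue  # do not expand beyond the flooding radius
        for nb in graph[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def flood_coverage(graph, r):
    """Minimum, over all peers, of the number of peers (itself included) reached by an r-hop flood."""
    return min(len(hop_distances(graph, v, r)) for v in graph)

def pick_furthest(graph, peer, candidates):
    """Choose the candidate with the largest hop distance from `peer` (unreachable counts as infinite)."""
    dist = hop_distances(graph, peer)
    return max(candidates, key=lambda c: dist.get(c, float("inf")))
```

On a path, the endpoints limit the coverage; closing the path into a ring raises the minimum, which is the kind of improvement smart neighbor selection aims for.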

Hop-Estimation With Clustering

• Main idea: Use clustering!

- Divide network into different clusters.

- Peers in different clusters belong to different network regions and can safely be connected without creating small cycles.

• How to achieve such a clustering? Introduction of beacons!

- Two parameters: radius Rd and radius Rb (Rd < Rb)

- If a peer has no beacon in its Rd-neighborhood, it becomes a beacon itself.

- A peer knows all beacons in its Rb-neighborhood.

- Rb roughly equals the flooding radius R
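The two beacon rules can be simulated centrally as follows. This is only a sketch: the slides do not say in which order peers decide, so the code assumes they decide one after another in a given order (e.g., join order), and it computes neighborhoods with BFS on a known graph rather than through piggy-backed packets. `within`, `elect_beacons`, and `known_beacons` are hypothetical names.

```python
from collections import deque

def within(graph, source, radius):
    """Set of peers within `radius` hops of `source` (including `source`)."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == radius:
            continue
        for nb in graph[node]:
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return seen

def elect_beacons(graph, rd, order=None):
    """A peer with no beacon in its Rd-neighborhood becomes a beacon itself."""
    beacons = set()
    for peer in order if order is not None else sorted(graph):
        if not (within(graph, peer, rd) & beacons):
            beacons.add(peer)
    return beacons

def known_beacons(graph, beacons, rb):
    """Each peer knows all beacons in its Rb-neighborhood (Rd < Rb)."""
    return {peer: within(graph, peer, rb) & beacons for peer in graph}
```

After the election every peer has a beacon within Rd hops, and peers whose known-beacon sets are disjoint lie in different regions of the network.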

Clustella Mechanism (1)

• One beacon in radius Rd

• Beacon known in radius Rb

• Flooding radius R

• Beacons append their ID to all packets (piggy-back)

• If a packet expires before covering the entire Rb-neighborhood, other peers (here: p'') forward the beacon information

• The entire Rb-neighborhood will know beacon p'

• Peers try to connect to peers with which they have no beacons in common!
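Once each peer knows the beacon IDs in its Rb-neighborhood, the selection rule reduces to set comparisons. The sketch assumes a peer has collected, for each candidate, the set of beacon IDs that candidate reported; it prefers a candidate with no beacons in common and, as an assumed fallback not stated in the slides, otherwise takes the one with the smallest overlap. Names are illustrative.

```python
def rank_candidates(own_beacons, candidate_beacons):
    """Order candidates by how many beacons they share with us (fewest first).

    own_beacons: set of beacon IDs this peer knows.
    candidate_beacons: dict mapping candidate peer ID -> set of beacon IDs it knows.
    """
    return sorted(candidate_beacons,
                  key=lambda c: len(own_beacons & candidate_beacons[c]))

def choose_neighbor(own_beacons, candidate_beacons):
    """Return a candidate sharing no beacon with us if one exists, else the least-overlapping one."""
    ranked = rank_candidates(own_beacons, candidate_beacons)
    return ranked[0] if ranked else None
```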

Clustella Mechanism (2)

• Edges are undirected

• All peers have degree d or d+1

• A connection request is accepted if the peer's own degree is d or smaller; otherwise, a neighbor with an open slot may accept it, or an existing connection is broken up

• The invariant is quickly reestablished!

• Neighbors of an existing neighbor are also good candidates, as they are located in the same network region.
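The degree rule can be sketched as a connection handler on an undirected graph. The slides leave the details open, so the following assumes: a peer accepts if its degree is at most d; otherwise it hands the request to a neighbor that still has an open slot; if none has, it breaks one existing link, and the peer that lost it looks for a replacement itself (for instance among the neighbors of its neighbors). `handle_request` and the choice of which link to break are hypothetical.

```python
def connect(graph, a, b):
    """Add an undirected edge between peers a and b."""
    graph[a].add(b)
    graph[b].add(a)

def disconnect(graph, a, b):
    """Remove the undirected edge between peers a and b."""
    graph[a].discard(b)
    graph[b].discard(a)

def handle_request(graph, peer, newcomer, d):
    """Process `newcomer`'s request to connect to `peer` (graph: dict peer -> set of neighbors).

    Returns (accepted_by, dropped): the peer that accepted the connection and,
    if an existing link had to be broken, the peer that lost it (else None).
    The dropped peer's degree has decreased, so it may need a new neighbor.
    """
    graph.setdefault(newcomer, set())
    if len(graph[peer]) <= d:
        connect(graph, peer, newcomer)  # degree becomes at most d + 1
        return peer, None
    # Hypothetical rule: hand the request to a neighbor that still has an open slot.
    for nb in sorted(graph[peer]):
        if nb != newcomer and newcomer not in graph[nb] and len(graph[nb]) <= d:
            connect(graph, nb, newcomer)
            return nb, None
    # Hypothetical rule: otherwise break one existing link to make room.
    dropped = max(graph[peer])  # arbitrary but deterministic choice for the sketch
    disconnect(graph, peer, dropped)
    connect(graph, peer, newcomer)
    return peer, dropped
```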

Two Challenges

• Evaluation of current neighbors
- Existing neighbors are always in the same network region
- Evaluating their quality and comparing them to alternative neighbors is difficult
- Idea: include routes in packets, and exclude beacons that are known via that neighbor only

• Dynamics
- Clustella must be robust to churn, i.e., frequent joins and leaves

- E.g., node crash: Clustella peer p stores some neighbors for each of its neighbors q; these neighbors are good candidates as they are in the same network region as q
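Both remedies can be sketched with plain data structures. For evaluating a neighbor q, the sketch assumes every beacon announcement arriving at peer p carries the route it travelled (the beacon followed by the peers it passed), and simply ignores announcements whose route contains q; what remains is what p would know without q. For churn, it assumes p keeps a backup list of each neighbor's neighbors and picks replacements from it when that neighbor crashes. All names are hypothetical.

```python
def beacons_excluding(announcements, neighbor):
    """Beacons known without relying on `neighbor`.

    announcements: list of (beacon_id, route) pairs received by this peer,
    where route is the sequence of peers the announcement passed through.
    """
    return {beacon for beacon, route in announcements if neighbor not in route}

def neighbor_overlap(announcements, neighbor, neighbor_beacons):
    """Number of `neighbor`'s beacons we would also know without it (lower is better)."""
    return len(beacons_excluding(announcements, neighbor) & neighbor_beacons)

def replace_crashed(neighbors, backups, crashed, me, k=1):
    """Drop `crashed` and pick up to k replacements among its stored neighbors.

    neighbors: set of our current neighbors (modified in place).
    backups: dict neighbor -> list of that neighbor's neighbors, stored earlier.
    """
    neighbors.discard(crashed)
    candidates = [c for c in backups.pop(crashed, []) if c != me and c not in neighbors]
    chosen = candidates[:k]
    neighbors.update(chosen)
    return chosen
```

Without the route filter, p would always appear to share q's beacons, since p learns part of its beacon set through q in the first place.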

Evaluation

• Simulation of three different neighbor selection strategies
- Gnutella-like (unfair?): Peers join at some well-known entry point and ask for their neighbors' neighbors until they reach full degree
- Random walk (more interesting?): Peers find new peers by a random walk of length L
- Clustella: Peers find new neighbors by exploring the network using a walk of length L and by taking beacon information into account

• Results
- Gnutella-like topologies result in very inefficient flooding operations
- Clustella yields higher flood coverage than the random walk strategy

Future Work

• Hierarchical clustering (beacons with different radii)
- Even a small hierarchy can yield better flood coverage

- However, maintaining the hierarchy can be expensive under churn!

- Moreover, fairness must be guaranteed: high-level beacon peers should not have more work to do!

• Smaller messages
- Reducing the message sizes for large radii is important!
- Idea: use Bloom filters instead of sending beacon IDs explicitly
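A Bloom filter replaces the explicit list of beacon IDs by a fixed-size bit array: each ID sets k bit positions, and a membership test may give false positives but never false negatives. A minimal sketch, assuming string beacon IDs, a 256-bit filter, and k ≤ 8 positions taken from one SHA-256 digest (sizes and hashing scheme are illustrative choices, not taken from Clustella):

```python
import hashlib

class BloomFilter:
    """Fixed-size set summary: no false negatives, occasional false positives."""

    def __init__(self, m_bits=256, k_hashes=4):
        assert 1 <= k_hashes <= 8, "one SHA-256 digest yields at most 8 four-byte chunks"
        self.m = m_bits
        self.k = k_hashes
        self.bits = 0  # Python int used as a bit array

    def _positions(self, item):
        # Derive k bit positions from one SHA-256 digest (4 bytes per position).
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        for i in range(self.k):
            chunk = digest[4 * i: 4 * i + 4]
            yield int.from_bytes(chunk, "big") % self.m

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def to_bytes(self):
        """Serialized form that would be piggy-backed instead of the ID list."""
        return self.bits.to_bytes((self.m + 7) // 8, "big")
```

A peer receiving such a filter can test its own beacon IDs against it. A false positive can only make a candidate look closer than it really is; it never hides a shared beacon.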

Conclusion

• We believe that structuring topologies can be beneficial to peer-to-peer systems!

• Clustering with beacons is simple and probably also useful in other applications, e.g., in music graphs

• An implementation must ensure fairness and use small message sizes.

• A good choice of parameters is important for both efficiency and stability.

• Incorporation into Gnutella??

Thank you.

Thank you for your interest.