180
Digital Enterprise Research Institute www.deri.ie Introduction to Peer-to-Peer Manf red Hauswir t h, Marcel Karn stedt Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

P2P Tutorial

Embed Size (px)

Citation preview

Page 1: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 1/180

Digital Enterprise Research Institute www.deri.ie

Introduction to Peer-to-Peer

Manfred Hauswirth, Marcel Karnstedt

Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Page 2: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 2/180

Di ital Enter rise Research Institute www.deri.ie

Goals of the Tutorial

Position the P2P paradigmin the design space ofdistributed systems

Get an overview of P2P

systems and the underlyingconcepts

 decentralized datamanagement in P2Psystems

Page 3: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 3/180

Di ital Enter rise Research Institute www.deri.ie

What is P2P?

Clay Shirkey (The Accelerator Group):

eer- o- peer s a c ass o app ca ons a a e a van age o

resources—storage, cycles, content, human presence—available at the edges of the Internet. Because accessing

environment of unstable connectivity and unpredictable IP

addresses, peer- to- peer nodes must operate outside the DNSand have si nif icant or total autonom of cent ral servers.” 

P2P “litmus test:”

 – Does it allow for variable connectivity and temporary network

 – Does it give the nodes at the edges of the network signif icantautonomy?

~ - 

Page 4: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 4/180

Di ital Enter rise Research Institute www.deri.ie

P2P in a historical Context

The original Internet was designed as a P2P system

 

 – no f irewalls / no network address translation – no asymmetric connections (V.90, ADSL, cable, etc.)

t e ac - t en er apps” an te net are utanyone could telnet/ FTP anyone else

servers acted as clients and vice versa

cooperation was a central goal and “value”: no spam orexhaustive bandwidth consumpt ion

“ - ” 

Usenet News

DNS

The emergence of P2P can be seen as a renaissance

of the original Internet model

Page 5: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 5/180

Di ital Enter rise Research Institute www.deri.ie

What is P2P?

Every participating node acts as botha client and a server (“servent ”)

Every node “pays” its part icipation by

providing access to (some of) itsresources

no central coord ination

no cent ral database 

system

global behavior emerges from localinteractions

all existing data and services areaccessible from any peer

peers are autonomous

peers an connec ons are unre a e

Page 6: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 6/180

Di ital Enter rise Research Institute www.deri.ie

Where is P2P – System layers ?

User 

Users

QoSuses

 

Applicat ion layer E- commerce systems can be P2P

Application

ex loits

or cent ralized

Information management

Information

Management

 P2P or centralized

Networks are P2P

Qosexploits Internet

Page 7: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 7/180

Di ital Enter rise Research Institute www.deri.ie

Types of P2P Systems

E- commerce systems

, , , …

File sharing systems Napster, Gnutella, Freenet, …

Distributed Databases

Mariposa [Stonebraker96], … etwor s

Arpanet

-

Page 8: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 8/180

Di ital Enter rise Research Institute www.deri.ie

How much P2P is involved?

 

informationmanagement

 

interaction

noyesyesaps er

,Freenet

Page 9: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 9/180

Di ital Enter rise Research Institute www.deri.ie

Related Approaches

Related distributed information system approaches:

Event - based systems

Push systems

Mobile agents

 

Page 10: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 10/180

Di ital Enter rise Research Institute www.deri.ie

Event-based (publish/subscribe)

System model

Com onents ( eers) in teract b enerat in and receivin events

Components declare interest in receiving specific (patterns of)events and are not if ied upon their occurrence

Su orts a hi hl f lex ible interact ion between loosel - cou ledcomponents

Subscribe to

XY

X followed by Y

XY

Page 11: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 11/180

Di ital Enter rise Research Institute www.deri.ie

Event-based vs. Peer-to-Peer

Common properties:

 

dynamic binding between producers and consumers

Subscript ion to events ~ “passive” queries

EB: notification

P2P: active discovery u scr p on anguage suppor s more sop s ca e

queries and pattern matching (event patterns withtime dependencies)

Event - based systems typically have a specializedevent distribution infrastructure

: 2 no e types, 2 : 1 no e type

EB infrastructure must be deployed

Page 12: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 12/180

Di ital Enter rise Research Institute www.deri.ie

Push Systems

A set of designatedbroadcasters offer information

that is pre- grouped in channels(weather, news, etc.)

ece vers su scr e o c anne s

of their interest and receivechannel information as it isbein “broadcast” t imeldistribution)

Receivers may have to pay priorto receiving the information

pay- per- v ew, at ee, etc. Pull push

Page 13: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 13/180

Di ital Enter rise Research Institute www.deri.ie

Push Systems vs. Peer-to-Peer

Asymmetric communication style (P2P: symmetric)

ocus s on t me y ata str ut on not on scovery

Filtering may be deployed to reduce data

Subscript ion to channels is prerequisite

Producer/ consumer binding is stat ic

Push systems require a specialized distribut ioninfrastructure

us : no e ypes, : no e ype Push inf rastructure must be deployed

Page 14: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 14/180

Di ital Enter rise Research Institute www.deri.ie

Mobile Agents

A mobi le agent is a computat ional ent ityt at moves aroun n a networ at t s

own volit ion to accomplish a task onbehalf of its owner

 

“learns” (“Whom to visit next?”)

Mobility (heterogeneous network!)

,

Strong: code, data, execution Stack

Page 15: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 15/180

Di ital Enter rise Research Institute www.deri.ie

Mobile Agents vs. Peer-to-Peer

Very similar in terms of search and navigat ion

,

MA: the nodes propagate the agents

Mobile agent ~ “act ive” query

Mobile agent systems require a considerably moresophisticated environment

 

securi ty (protect the receiving node from malicious mobileagents and vice versa)

In many domains P2P systems can take over more apt for distributed data management

 bandwidth, securit y, etc.)

Page 16: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 16/180

Di ital Enter rise Research Institute www.deri.ie

Distributed Databases

Fragmenting large databases (e.g., relational) over

Efficient processing of complex queries (e.g., SQL)by decomposing them

Efficient update strategies (e.g., lazy vs. eager)

Consistent transact ions (e.g., 2 phase commit ) orma y approac es re y on cen ra coor na on

Page 17: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 17/180

Di ital Enter rise Research Institute www.deri.ie

Distributed Databases vs. Peer-to-Peer

Data distribut ion is a key issue for P2P systems

 

scalability LH* family of scalable hash index structures [Litwin97]

Snowball: scalable storage system for workstation clusters[Vingralek98]

Fat - Bt ree: a scalable B- Tree for arallel DB Yokota 9 

Approaches in distributed DB that addressautonomy (and scalabil it y)

Mariposa: distributed relational DBMS based on anunderlying economic model [Stonebraker96]

P2P Data Mana ement has to address bothscalability and autonomy

Page 18: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 18/180

Di ital Enter rise Research Institute www.deri.ie

Usage Patterns to Position P2P

Discovering informat ion is the predominant problem

 

ad hoc requests, irregular E.g., new town — where is the next car rental?

,

ot cat on: event- ase systems

not if icat ion for (correlated) events (event patterns)

E. ., not if me when m stocks dro below a threshold

pus

Systematic discovery: P2P systems

f ind certain type of information on a regular basis

search engines, MA

. ., Cont inuous information feed: push systems

subscrip t ion to a certain information type

event-based

E.g., sports channel, updates are sent as soon as available

Page 19: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 19/180

Di ital Enter rise Research Institute www.deri.ie

The Interaction Spectrum

Event-based s stems Mobile a entsPush systems Peer-to-peer systems

pass ve active

Page 20: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 20/180

Di ital Enter rise Research Institute www.deri.ie

Peer-to-Peer vs. C/S and Web

-

Session-based

Web-basedPeer-to-Peer

Coupling tight loose very loose

Comm.Style

asymmetric asymmetric symmetric

 

Clients (1000) (1,000,000) high (1,000,000)

Number of few 10

manynone 0

ervers ,

Page 21: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 21/180

Di ital Enter rise Research Institute www.deri.ie

Coupling vs. Scalability

     p      l      i     n     g

      C     o

session-based

push-based

web-based

-

peer-to-peer

Scalability

Page 22: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 22/180

Di ital Enter rise Research Institute www.deri.ie

P2P System Models

Centralized model

 

(single point of failure) direct contact between requestors and providers

xamp e: aps er

Decent ralized model

Examples: Freenet, Gnutella

no g lobal index, no central coordination, g lobal behavior emergesfrom local interact ions, etc.

direct contact between requestors and providers (Gnutella) or

mediated by a chain of intermediaries (Freenet) Hierarchical model

int roduction of “super- peers”

mix of centralized and decentralized model

Page 23: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 23/180

Di ital Enter rise Research Institute www.deri.ie

Centralized Information Systems

Web search engineClient 

Example: Google 150 Mio searches/ day

Client

Client

1- 2 Terabytes of data(April 2001)

1

2Google

Server

Client Client

"aberer"home page of Karl Aberer … ClientClient

Strengths

ClientClientClient

Global ranking Fast response time

Google: 15000 servers Infrastructure, administrat ion, cost

A new company for every global applicat ion ?

Page 24: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 24/180

Di ital Enter rise Research Institute www.deri.ie

(Semi-)Decentralized Information Systems

P2P Music file sharing

 

Example: Napster 1.57 Mio. Users 3

Peer

PeerPeerX

10 TeraByte of data(2 Mio songs, 220 songs per user)

(February 2001)

1

Napster

Peer Peer

Find<title> "brick in the wall"

2

ResultRequest and transfer file

erver

PeerPeer

<artist> "pink floyd"<size> "1 MB"<category> "rock" schema you find f.mp3 at peer x.

from peer X directly PeerPeer

Peer

Napster: 100 servers

Page 25: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 25/180

Di ital Enter rise Research Institute www.deri.ie

Lessons Learned from Napster

Strengths: Resource Sharing Every node “pays” it s part icipat ion by providing access to i ts resources

 – p ys ca resources s , ne wor , now e ge anno a ons , owners p es

Every participating node acts as both a client and a server (“servent”): P2P global information system without huge investment

decentralization of cost and administration = avoiding resource bot tlenecks

Weaknesses: Cent ralization server is single point of failure

copying copyrighted material made Napster target of legal att ack

increasing degree of resource sharing and decentralization

Centralized

System

Decentralized

System

Page 26: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 26/180

Di ital Enter rise Research Institute www.deri.ie

Fully Decentralized Information Systems

P2P file sharing • Strengths– Good response time, scalable

 

Example: Gnutella 40.000 nodes, 3 Mio files

– No infrastructure, no administration

– No single point of failure• Weaknesses

(August 2000) – High network traffic

– No structured search

– Free-riding

Find"brick in the wall"

I have"brick_in_the_wall.mp3"….

Self-organizing System

Gnutella: no servers

Page 27: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 27/180

Di ital Enter rise Research Institute www.deri.ie

Self-Organization

Self - organized systems well known f rom physics,

distribut ion of control ( = decentralizat ion = symmetry inroles = P2P)

oca n erac ons, n orma on an ec s ons

emergence of global structures

failure resil ience

Page 28: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 28/180

Di ital Enter rise Research Institute www.deri.ie

P2P Architectures

Principle of self - organizat ion can be applied at

NetworkingLayerInternet Routing TCP/ IP,

DNS

Data AccessLayer

OverlayNetworks

ResourceLocation

Gnutella,FreeNet

Service Layer P2P Messaging, Napster,applicat ions Distributed

ProcessingSeti,Groove

User Layer UserCommunities

Collaborat ion eBay, Ciao

Original Internet designed as decentralized system:

~ - the Internet

support appl icat ion- specif ic addresses

Page 29: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 29/180

Di ital Enter rise Research Institute www.deri.ie

Resource Location in P2P Systems

Problem: Peers need to locate distr ibuted information

Peers wit h address p store data items d that are identif ied by a key kd

Given a key d (or a predicate on d) locate a peer that stores ,i.e. locate the index information (k

d, p)

Thus, the data we have to manage consists of the key- value pairs (kd, p)

Can such a distributed database be maintained and accessed by aset of peers without central cont rol ?

P1

P2 P3

P4

kd ="jingle" ?

P5

P6P7

P8kd="jingle-bells"p="P8"

d=" jingle-bells.mp3""

("jingle",P8)

Page 30: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 30/180

Di ital Enter rise Research Institutewww.deri.ie

Resource Location Problem

Operations

search for a key at a peer: p- > search(k)

update a key at a peer: p- > update(k,p')

peers joining and leaving the network: p- > join(p’) Performance Criteria (for search)

. .

message bandwidth, e.g. messages(query) Log(size(database))messages(update) Log(size(database))

storage space used, e.g. storagespace(peer) Log(size(database)) resilience to failures (network, peers)

Qualitat ive Cri teria

complex search predicates: equality, prefix , containment, similaritysearch

use of g lobal knowledge peer autonomy

peer anonymity and t rust

security (e.g. denial of service attacks)

Page 31: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 31/180

Di ital Enter rise Research Institute www.deri.ie

Summary

What is a P2P System ?

What is emergence ?

At which layers can the P2P architecture occur ?

 locat ion system ?

Page 32: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 32/180

Di ital Enter rise Research Institute www.deri.ie

Unstructured P2P Overlay Networks

No index informat ion is used

i.e. the information (k, ) is onl available directl from

Simplest approach: Message Flooding (Gossiping)

send query message to C neighbors

- -

messages have IDs to eliminate cycles

k="jingle-bells"

Example: C=3, TTL=2

Page 33: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 33/180

Di ital Enter rise Research Institute www.deri.ie

Gnutella

Developed in a 14 days “quick hack” by Nullsoft (winamp)

Orig inally intended for exchange of recipes

vo ut on o nute a

Published under GNU General Public License on the Nullsoft web server Taken off after a couple of hours by AOL (owner of Nullsoft)

“ ” 

Gnutella protocol was reverse engineered from downloaded versions ofthe original Gnutella software

Third- party clients were publ ished and Gnutella started to spread Based on message f looding

Typical values C= 4, TTL= 7

One request leads to messages240,26)1(**20

TTL

i

iC C 

 

least one Gnutella host (gnutellahosts.com:6346; outside the Gnutellaprotocol specification)

Neighbors are found using a basic discovery protocol (ping- pong

Page 34: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 34/180

Di ital Enter rise Research Institute www.deri.ie

Gnutella: Protocol Message Types

 

PingAnnounce availability and probe for

other servents

None

Pong Response to a ping IP address and port# of responding servent;

Query Search request Minimum network bandwidth of responding

servent; search criteria

QueryHit Returned by servents that have IP address, port# and network bandwidth of

result set

Push File download requests for

servents behind a firewall

Servent identifier; index of requested file; IP

address and port to send file to

Page 35: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 35/180

Di ital Enter rise Research Institute www.deri.ie

Gnutella: Meeting Peers (Ping/Pong) 

CA

B D

A’s ping

B’s pongC’s pong

D’s pong

 s pong

Page 36: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 36/180

Di ital Enter rise Research Institute www.deri.ie

Gnutella: Searching (Query/QueryHit/GET) 

X.m 3GET X.mp3 X.mp3

CA

B D

’ . ., .

C’s query hit

E’s query hit X.mp3

Page 37: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 37/180

Di ital Enter rise Research Institute www.deri.ie

Popularity of Queries [Sripanidkulchai01]

Very popular documents are approx imately equally popular

Less o ular documents follow a Zi f- like distribut ion i.e. 

the probabil ity of seeing a query for the ith most popularquery is proport ional to 1/(ialpha)

Access fre uenc of web documents also follows Zi f- like distributions caching might work for Gnutella

Page 38: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 38/180

Di ital Enter rise Research Institute www.deri.ie

Free-riding on Gnutella [Adar00]

24 hour sampling period:

o nute a users s are no es

50% of all responses are returned by top 1% of sharing hosts

 

Problems:

Degradation of system performance: collapse? Increase of system vulnerability

“Centralized” (“backbone”) Gnutella copyright issues?

 

H1: A signif icant por t ion of Gnutella peers are free riders.

H2: Free riders are distr ibuted evenly across domains

H3: Often hosts share files nobody is interested in (are not

downloaded)

d

Page 39: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 39/180

Di ital Enter rise Research Institute www.deri.ie

Free-riding Statistics - 1 [Adar00]

H1: Most Gnutella users are free riders

,

22,084 (66%) of t he peers share no f iles

24,347 (73%) share ten or less files

Top 1 percent (333) hosts share 37% (1,142,645) of t otal fi les shared

Top 5 percent (1,667) hosts share 70% (1,142,645) of total f iles shared

Top 10 percent (3,334) hosts share 87% (2,692,082) of t otal fi les shared

F idi S i i 2

Page 40: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 40/180

Di ital Enter rise Research Institute www.deri.ie

Free-riding Statistics - 2 [Adar00]

H3: Man servents share files nobod downloads

Of 11,585 sharing hosts: Top 1% of sites provide nearly 47% of all answers

Top 25% of sites provide 98% of all answers

7,349 (63%) never provide a query response

T l f G ll

Page 41: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 41/180

Di ital Enter rise Research Institute www.deri.ie

Topology of Gnutella [Jovanovic01]

Small- world propert ies veri f ied (“f ind everything close by”)

Backbone + outskirts

G t ll B kb

Page 42: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 42/180

Di ital Enter rise Research Institute www.deri.ie

Gnutella Backbone [Jovanovic01]

C t i f Q i

Page 43: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 43/180

Di ital Enter rise Research Institute www.deri.ie

Categories of Queries [Sripanidkulchai01]

Categorized top 20 queries

C hi i G t ll

Page 44: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 44/180

Di ital Enter rise Research Institute www.deri.ie

Caching in Gnutella [Sripanidkulchai01]

Average bandwidth consumpt ion in tests: 3.5Mbps

Best case: trace 2 (73% hit rate = 3.7 t imes traff icreduction)

Gnutella: Bandwidth Barriers

Page 45: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 45/180

Di ital Enter rise Research Institute www.deri.ie

Gnutella: Bandwidth Barriers

Clip2 measured Gnutella over 1 month:

typ ca query s ts ong nc u ng ea ers

25% of the traff ic are queries, 50% pings, 25% other on avera e each eer seems to have 3 other eers activel

connected

Clip2 found a scalabili ty barr ier with substant ial

10 queries/ sec

* 560 bits/ query

o accoun or e o er quar ers o message ra c

* 3 simultaneous connect ions

67,200 bps

quer es sec max mum n e presence o many a up users

won’t improve (more bandwidth - larger files)

Gnutella: Summary

Page 46: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 46/180

Di ital Enter rise Research Institute www.deri.ie

Gnutella: Summary

Completely decent ralized

 

High fault tolerance Adopts well and dynamically to changing peer populations

Protocol causes high network traffic (e.g., 3.5Mbps). For example:

4 connections C / peer, TTL = 7

1 ping packet can cause packets240,26)1(**2

TTL iC C 

No est imates on the durat ion of queries can be given

No probabilit y for successful queries can be given

 

Free riding is a problem

Reputat ion of peers is not addressed

mp e, ro ust, an sca a e at t e moment

Modern Gnutella

Page 47: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 47/180

Di ital Enter rise Research Institute www.deri.ie

Modern Gnutella

Lots of improvements

Hybrid Super- Peer architecture “Gnutella + DHT”

Improvements of Message Flooding

Page 48: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 48/180

Di ital Enter rise Research Institute www.deri.ie

Improvements of Message Flooding

Expanding Ring

start search with small TTL (e.g. TTL = 1)

no success terat ve y ncrease e.g. = + 2

k- Random Walkers forward query to one randomly chosen neighbor only, with large

TTL

start k random walkers

random walker periodically checks with requester whether to

continue

Discussion Unstructured Networks

Page 49: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 49/180

Di ital Enter rise Research Institute www.deri.ie

Discussion Unstructured Networks

Performance

 

Message Bandwidth: high – improvements through random walkers, but essent ially the

Storage cost: low (only local neighborhood)

Update and maintenance cost: low (only local updates) Resilience to failures good: mult iple paths are explored

and data is repl icated

Quali tat ive Cri teria search predicates: very flex ible, any predicate is possible

global knowledge: none required

peer autonomy: high

Summary

Page 50: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 50/180

Di ital Enter rise Research Institute www.deri.ie

Summary

How are unstructured P2P networks characterized ?

What is the purpose of the ping/ pong messages inGnutella ?

Why is search latency in Gnutella low ?

Which are methods to reduce message bandwidth

Hierarchical P2P Overlay Networks

Page 51: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 51/180

Di ital Enter rise Research Institute www.deri.ie

Hierarchical P2P Overlay Networks

Servers provide index information, i.e. the

servers

Simplest Approach

one central server

user register f iles

index server

 as P2P archi tecture

k="jingle-bells"

Napster

Page 52: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 52/180

Di ital Enter rise Research Institute www.deri.ie

Napster

Central (virtual) database which holds an index of offeredMP3/ WMA files

Clients connect to this server, identify themselves (account)

and send a list of MP3/ WMA fi les they are sharing (C/ S) Other clients can search the index and learn from which

Addit ional services at server (chat etc.)

 register

(user, files)

A B“A has X.mp3”

Download X.mp3

Superpeer Networks

Page 53: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 53/180

Di ital Enter rise Research Institute www.deri.ie

Superpeer Networks

Improvement of Central Index Server (Morpheus, Kaaza)

mult iple index servers build a P2P network

clients are associated with one (or more) superpeers

superpeers use message flooding to forward search requests

Experiences

redundant superpeersare goo

superpeers should havehigh outdegree (> 20)

 

Discussion

Page 54: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 54/180

Di ital Enter rise Research Institute www.deri.ie

Discussion

Performance

 

Message Bandwidth: low – with superpeers flooding occurs, but the number of

Storage cost: low at client, high at index server

Update cost: low (no repl icat ion)

Resilience to failures: bad (system has single- point offailure)

Quali tat ive Cri teria search predicates: very flex ible, any predicate is possible

global knowledge: server

peer autonomy: low

Summary

Page 55: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 55/180

Di ital Enter rise Research Institute www.deri.ie

y

Which are the two levels of P2P networks in

are they related ?

Which problem of distribut ion is avoided insuperpeer networks and addressed in structured

between nodes and functional layers ?

Structured P2P Overlay Networks

Page 56: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 56/180

Di ital Enter rise Research Institute www.deri.ie

y

Unstructured overlay networks – what we learned

 

robustness (almost impossible to “ki ll” – no centralauthority)

er ormance

search latency O(log n), n number of peers

 

Drawbacks

t remendous bandwidth consumpt ion for search

free rid ing

 

Efficient Resource Location

Page 57: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 57/180

Di ital Enter rise Research Institute www.deri.ie

update cost

hi h

FULL REPLICATION

 

low STRUCTURED P2P OVERLAY

search cost

lowlow

high

(e.g. prefix routing)

maximal bandwidthhighUNSTRUCTURED P2POVERLAY NETWORKS

(e.g. Gnutella)

(e.g. Napster)

Distribution of Index Information

Page 58: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 58/180

Di ital Enter rise Research Institute www.deri.ie

Goal: provide eff icient search using few messages without usingdesignated servers

, . .maintains and provides part of the index information (k, p)

Diff icult: d istribut ing the data access structure to support eff icientsearch

server

Search starts here

Where to start the search?

data access

index information I

structure

eers storin data and index information

I1 I2 I3 I4

peers (storing data)

Approaches

Page 59: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 59/180

Di ital Enter rise Research Institute www.deri.ie

Different strategies

-

Chord: constructing a distr ibuted hash table CAN: Rout ing in a d- dimensional space

Freenet: caching index information along search paths

Commonalities

eac peer ma n a ns a sma par o e n ex n orma on(rout ing table)

searches performed by d irected message forwarding

Differences performance and qualitative criteria

P-Grid

Page 60: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 60/180

Di ital Enter rise Research Institute www.deri.ie

Search tree (prefix tree)?

0?? 1??data 101

00? 01? 10? 11?101

?

000 001 010 011 100 101 110 111

!101N objects log2(N) steps

Scalable data access structures

Page 61: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 61/180

Di ital Enter rise Research Institute www.deri.ie

Assume number of data objects > > storage of one

Distributed storage

Given a data access structure

Size of data access structure = number of data objects

ze o a a access s ruc ure > > s orage o one no e

 

Non-scalable Distribution of Search Tree

Page 62: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 62/180

Di ital Enter rise Research Institute www.deri.ie

• Distribute search tree over peers

???

0?? 1??

00? 01? 10? 11?

000 001 010 011 100 101 110 111

peer 1 peer 2 peer 3 peer 4

Scalable Distribution of Search Tree

Page 63: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 63/180

Di ital Enter rise Research Institute www.deri.ie

"Napster"

bottleneck???

0?? 1??

00? 01? 10? 11?

000 001 010 011 100 101 110 111

peer 1 peer 2 peer 3 peer 4

Scalable data access structures

Page 64: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 64/180

Di ital Enter rise Research Institute www.deri.ie

Associate each peer with a complete path

???

0?? 1??

00? 01? 10? 11?

000 001 010 011 100 101 110 111

peer 1 peer 2 peer 3 peer 4

Scalable data access structures

Page 65: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 65/180

Di ital Enter rise Research Institute www.deri.ie

???

1??peer 1 peer 2

10? eer 4

knows more aboutthis part of the tree

100 101

knows more aboutthis part of the tree

peer 3

The result is P-Grid

Page 66: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 66/180

Di ital Enter rise Research Institute www.deri.ie

101Peers cooperate in search

???101

?

11?

1??peer 1 peer 2

peer 3

???

101 Message 

peer 4

110 1111??peer 1 peer 2

?101

to peer 3 

101 ?

100 101

10? peer 4

101!

peer 3

Construction

Page 67: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 67/180

Di ital Enter rise Research Institute www.deri.ie

Split t ing Approach (P- Grid)

peers meet and decide whether to extend search t ree by split t ingt e ata space

peers can perform load balancing considering their storage load

networks with dif ferent origins can merge, like Gnutella, Freenet

Node Insert ion Approach (Chord, CAN, …)

" " 

nodes route from a gateway node to their node- id to populatethe routing table

 

Repl icat ion of data items and rout ing table entries is used toincrease failure resil ience

P-Grid Discussion

Page 68: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 68/180

Di ital Enter rise Research Institute www.deri.ie

Performance

,

Message Bandwidth: O(log n) (selective routing)

Storage cost: O(log n) (rout ing table)

Update cost: low (like search)

  search predicates: prefix searches

global knowledge: key hashing

peer autonomy: peers can locally decide on their role(spli t t ing decision)

DHT example: Chord

Page 69: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 69/180

Di ital Enter rise Research Institute www.deri.ie

Hashing of search keys AND peer addresses on binary keys oflength m e.g. m= 8, key("jingle- bells.mp3")= 17, key(196.178.0.1)= 3

Data keys are stored at next larger node key

,data with hashed identifier k, thenk ] predecessor(p), p ]k

m=832 keys

storedat

predecessor

p2

Search possibilities1. every peer knows every other

O(n) routing table size 

p3.

O(n) search cost

l h

Routing Tables

Page 70: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 70/180

Di ital Enter rise Research Institute www.deri.ie

Every peer knows m peers with exponent iallyincreasing distance

 Eac peer p stores a routing ta e

First peer with hashed identifier si such thatsi =successor(p+2i-1) for i=1,..,m

 e wr e a so si = nger , ppp+2

p+4

p+1

+

s1, s2, s3

s

p2 1 p2

2 p2

s4

3

p

4 p3

5 p4

p+16Search

O(log n) routing table sizep4

Di it l E t i R h I tit t d i i

Search

Page 71: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 71/180

Di ital Enter rise Research Institute www.deri.ie

search(p, k)

find in routing table largest (i, p*) such that p* [p,k[

/* largest peer key smaller than the searched data key */

if such a p* exists then search(p*, k)else return (successor(p)) // found

pp+2

p+4

p+1

s1, s2, s3

Search

p2

p+

s4

5

k1k2O(log n) search cost

RT with exp. increasing

p+16

   probability

p4

Di ital Enter rise Research Institute www deri ie

Node Insertion

Page 72: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 72/180

Di ital Enter rise Research Institute www.deri.ie

New node q joining the network

q asks ex ist ing node p to f ind predecessor and fingers

cost: O(log2 n)

p p+2p+1 qp+

p2 routing table

of

routing table

ofp+8 i si

1 q

i si

1 p2

p3p4

2 q

3 p2

2 p2

3 p3

4 3p+16

5 p4 5 p4

Di ital Enter rise Research Institute www deri ie

Load Balancing in Chord

Page 73: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 73/180

Di ital Enter rise Research Institute www.deri.ie

5 10^5 keys

uniform data distribution

50 keys per node?

NO, as IP addresses do

not map uniformly into

data ke s ace.

Di ital Enter rise Research Institute www deri ie

Length of Search Paths

Page 74: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 74/180

Di ital Enter rise Research Institute www.deri.ie

100 2^12 keys

Path length ½ Log2(n)

RTs can be seenas an embeddingof search treesinto the network

 an t us searc

starts at a randomlyselected tree depth

Di ital Enter rise Research Institute www.deri.ie

Chord Discussion

Page 75: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 75/180

ta te se esea c st tute www.deri.ie

Performance

Node join/ leave cost: O(log2 n)

Resilience to failures: replicat ion to successor nodes

Quali tat ive Cri teria

searc pre ca es: equa y o eys on y

global knowledge: key hashing, network orig in

peer autonomy: nodes have by vir tue of their address aspecific role in the network

Di ital Enter rise Research Institute www.deri.ie

Topological Routing (CAN)

Page 76: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 76/180

Based on hashing of keys into a d-dimensional space (a torus)

Each peer is responsible for keys of a subvolume of the space (a zone)

Each peer stores the adresses of peers responsible for the neighboringzones for rout ing

Search requests are greedi ly forwarded to t he peers in the closest zones

Assignment of peers to zones depends on a random select ion madeby the peer

Di ital Enter rise Research Institute www.deri.ie

Network Search and Join

Page 77: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 77/180

 o e o ns e ne wor y c oos ng a coor na e n e vo ume o=> O(d) updates or RTs

Di ital Enter rise Research Institute www.deri.ie

CAN Refinements

Page 78: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 78/180

Multiple Realities

 

Nodes hold a zone in each of them

Creates r replicas of the (key, value) pairs

Increases robustness

Reduces path length as search can be cont inued in the

realit where the tar et is closest

Overloading zones

Different peers are responsible for the same zone

Splits are only performed i f a maximum occupancy (e.g. 4)

Nodes know all other nodes in the same zone

But only one of the neighbors

Di ital Enter rise Research Institute www.deri.ie

CAN Path Length

Page 79: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 79/180

Di ital Enter rise Research Institute www.deri.ie

CAN Discussion

Page 80: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 80/180

Performance1/ d ,

probabil ity, provable)

Message Bandwidth: O(d n1/ d), (select ive rout ing) 

Update cost: low (like search)

Node join/ leave cost: O(d n1/ d)

Resilience to failures: reali t ies and overloading

Qualitat ive Criteria

search predicates: spatial distance of mult idimensional keys

global knowledge: key hashing, network origin

 space

Di ital Enter rise Research Institute www.deri.ie

Dynamical Clustering (Freenet)

Page 81: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 81/180

Freenet Background

P2P system which supports publicat ion, repl icat ion, and retr ieval

Protects anonymity of authors and readers: infeasible to

determine the orig in or dest ination of data Nodes are not aware of what they store (keys and f iles are sent

and stored encrypted)

Uses an adaptive routing and caching strategy

 Key Data Address

8e47683isdd0932uje89 ZT38hwe01h02hdhgdzu tcp/125.45.12.56:6474

456r5wero04d903iksd0 Rhweui12340jhd091230 tcp/67.12.4.65:4711

f3682jkjdn9ndaqmmxia eqwe1089341ih0zuhge3 tcp/127.156.78.20:8811

wen09hjfdh03uhn4218 erwq038382hjh3728ee7 tcp/78.6.6.7:2544712345jb89b8nbopledh tcp/40.56.123.234:1111

d0ui43203803ujoejqhh tcp/128.121.89.12:9991

Di ital Enter rise Research Institute www.deri.ie

Freenet Routing

Page 82: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 82/180

If a search request arrives

Either the data is in the table

r t e request s orwar e to t e a resses w t t e mostsimi lar keys (lexicographic similarit y, edit distance) til l an answer

is found or TTL reached (e.g. TTL = 500)

If an answer arrives

The key, address and data of the answer are inserted into thetable

 

Qualit y of rout ing should improve over t ime

Node is l isted under certain ke in routin tables

Therefore gets more requests for similar keys

Therefore tends to store more entries with similar keys(clustering) when receiving result s and caching them

ynam c rep ca on o a a

Di ital Enter rise Research Institute www.deri.ie

Freenet Routing

Page 83: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 83/180

peer p has k

eer ' has k'

search k

response (k,p)

new link established

' ',

Di ital Enter rise Research Institute www.deri.ie

Freenet: Inserting Files

Page 84: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 84/180

First a the key of the file is calculated

An insert message with this proposed key and ahops- to- live value is sent to the neighbor with themost similar key

en every peer c ec s w e er e propose eyis already present in it s local store

es return stored f ile or i inal re uester must ro osenew key)

no route to next peer for further checking (rout ing usesthe same ke sim ilarit measure as searchin 

continue until hops- to- live are 0 or failure

Di ital Enter rise Research Institute www.deri.ie

Freenet: Evolution of Path Length

Page 85: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 85/180

1000 identical nodes

max 50 data

max 200

references/node Initial references:

(i-1, i-2, i+1, i+2) mod n

each time-step:

- randomly insert

- TTL=20

every 100 time-steps:300 requests

median path length 500 6=

random nodes andmeasure actual path

length (failure=500).

Di ital Enter rise Research Institute www.deri.ie

Freenet Discussion

Page 86: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 86/180

Performance

 

Message Bandwidth: low (select ive rout ing)

Storage cost: relatively low (experimentally not validated !)

Update cost: low (like search)

 – but a bootstrapping phase is required

 data and keys)

ua a ve r er a

search predicates: with encrypt ion only equali ty of keys

lobal knowled e: none

peer autonomy: high (with encrypt ion r isk of stor ingundesired data)

Di ital Enter rise Research Institute www.deri.ie

Comparison

Page 87: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 87/180

Paradi m Search T eSearch Cost

messages

Gnutella Breadth-first String2* *( 1)

TTL iC C 

 

FreenetDepth-first

search on graphEquality O(Log n) ?

ChordImplicit binary

search treesEquality O(Log n)

d-dimensional^

space 

P-Grid Binary prefixtrees

Prefix O(Log n)

Di ital Enter rise Research Institute www.deri.ie

Small World Graphs

Page 88: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 88/180

Each P2P system can beinter reted as a directed ra h(overlay network)

peers correspond to nodes

links

Find a decentralized algorithm(greedy rout ing) to route amessage rom any no e to any

other node B with few hopscompared to the size of t he graph

Requires the ex istence of shortpaths in the graph

Di ital Enter rise Research Institute www.deri.ie

Milgram’s Experiment

Fi di h h i f i

Page 89: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 89/180

Finding short chains of acquaintanceslinking pairs of people in USA who didn’tknow each other;

Source person in Nebraska 

Target person in Massachusetts.

Average length of the chains that werecompleted was between 5 and 6 steps

“Six degrees of separat ion” principle

BIG QUESTION: WHY there should be short chains of ac uaintances

linking together arbit rary pairs of strangers???

Di ital Enter rise Research Institute www.deri.ie

Random Graphs

Page 90: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 90/180

For many years typical explanation was - random

Low diameter: expected distance between two nodes is

logkN, where k is the outdegree and N the number of nodes

en pa rs or ver ces are se ec e un orm y a ran omthey are connected by a short path with high probabil ity

But there are some inaccuracies

If A and B have a common f riend C it is more likely that theythemselves will be friends! (clustering)

Many real world networks (social networks, biologicalnetworks in nature, art if icial networks – power grid, WWW)

exhibit this clustering property

Random networks are NOT clustered.

Di ital Enter rise Research Institute www.deri.ie

Clustering

Page 91: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 91/180

Clustering measures the fract ion of neighbors of a node thatare connected themselves

 

but also a high d iameter

Random Graphs have a low clustering coeff icient 

Both models do match the properties expected from realnetworks!

Random Graph (k= 4)

Short ath length

Regular Graph(k=4)

L~logkN

Almost no clustering

~

Long paths

L ~ n/ (2k)

C~3/ 4

Di ital Enter rise Research Institute www.deri.ie

Small-World Networks

Page 92: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 92/180

Random rewir ing of regular graph (by Watts and

With probabil ity p rewire each link in a regular graph to a

randomly selected node

esu ng grap as proper es, o o regu ar anrandom graphs

 – High clustering and short path length

Freenet has been shown to result in small world graphs

Di ital Enter rise Research Institute www.deri.ie

Flashback: Freenet Search Performance

Page 93: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 93/180

Modifying rout ing tables in Freenet through" " 

Studies show that Freenet graphs have small- world

properties

Explains improving search performance

Regular graph:

n nodes, k nearest neighbors

path length ~ n/2k

4096/16 = 256

Random graph:

path length ~ log (n)/log(k)

~ 4

Rewired graph (1% of nodes):

path length ~ random graph

clustering ~ regular graph

Small World Graph

Di ital Enter rise Research Institute www.deri.ie

Search in Small World Graphs

Page 94: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 94/180

BUT! Watts- Strogatz can provide a model

existence of short paths 

It does not explain how the shortest paths

also Gnutella networks are small- world graphs

 

Di ital Enter rise Research Institute www.deri.ie

P2P Overlay Networks as Graphs

Page 95: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 95/180

Each P2P system can be

graph …

peers correspond to nodes

rout ing table entries as directedlinks

… embedded in some space

P- Grid: interval [0,1]

Chord: ring [0,1)

CAN: d- dimensional torus

 lexicographical d istance

Di ital Enter rise Research Institute www.deri.ie

Kleinberg’s Small-World Model

Kl i b ’ S ll W ld’ d l

Page 96: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 96/180

Kleinberg’s Small- World’s model

Embed the graph into an r- dimensional grid

constant number p of short range links (neighborhood)

q long range links: choose long- range links such that the probability

to have a long range contact is proportional to 1/ dr

 

Decentralized (greedy) rout ing performs best if f. r = dimension ofspace

r = 2

Di ital Enter rise Research Institute www.deri.ie

Influence of “r” (1)

Page 97: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 97/180

• Each peer u has link to the peer v with probability proportional to r vud  ),(

1

, .

• Optimal value: r = dim = dimension of the space 

algorithm can quickly approach the neighborhood of target, but then slowsdown till finally reaches target itself).

• If r > dim we tend to choose more close neighbors (algorithm finds quickly

’arge n s ne g or oo , u reac es s ow y s ar away .• When r = 0 – long range contacts are chosen uniformly. Random graph theory

proves that there exist short paths between every pair of vertices, BUT there is no decentralized algorithm capable finding these paths

Di ital Enter rise Research Institute www.deri.ie

Influence of “r” (2)

Given node u if we can part it ion the remaining peers into sets A1 A2

Page 98: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 98/180

Given node u if we can part it ion the remaining peers into sets A1, A2 ,A3, … , AlogN  , where Ai , consists of all nodes whose distance from u is

i i+1 - .. .

Then given r = dim each long range contact of u is nearly equally likely to

belong to any of t he sets Ai 

When q = log N  – on average each node will have a link in each set of A

A1

A3

4

Di ital Enter rise Research Institute www.deri.ie

DHTs and Kleinberg model

Page 99: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 99/180

P- Grid ’s

Kleinberg’smodel

Di ital Enter rise Research Institute www.deri.ie

Conclusions from Kleinberg's Model

With respect to the Watts and Strogatz model

Page 100: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 100/180

With respect to the Watts and Strogatz model

there is no decentralized algorit hm capable performing effective search

J. Kleinberg presented the inf inite family of Small World networks thatgeneralizes the Watts and Strogatz model and shows that decentralizedsearch algorithms can find short paths with high probability

ere ex s on y one un que mo e w n a am y or w cdecentralized algorithms are effective.

With respect to overlay networks

Many of the structured P2P overlay networks are similar t o Kleinberg’s

model (e.g. Chord, randomized version, q= log N, r= 1)

Unstructured overlay networks also fi t into the model (e.g. Gnutella q= 5,r=0)

Some variants of structured P2P overla networks are havin noneighborhood lattice (e.g. P- Grid , p= 0)

Extensions to spaces beyond regular gr ids are possible (e.g. arbit rarymetric spaces)

Di ital Enter rise Research Institute www.deri.ie

Summary

How can we characterize P2P overlay networks such that we

Page 101: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 101/180

How can we characterize P2P overlay networks such that wecan study them using graph- theoretic approaches?

What is the main di f ference between a random graph and aSW ra h?

What is the main dif ference between the Watts/ Strogatz and

What is the relat ionship between structured overlay networks

What are possible variat ions of the small world graph model?

Di ital Enter rise Research Institute www.deri.ie

Specific problems: Identity management

D fi iti

Page 102: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 102/180

Definition:

 ons s en mapp ng o a se o a r u es on o anidentifier in a unique, deterministic, and secure way

Ident if icat ion is an essent ial building block indistr ibuted (information) systems

xamp es: rec ory serv ces DNS: symbolic host names IP address

X.500: dist in uished name ob ect at tributes 

UDDI: query

web service specificat ion

Di ital Enter rise Research Institute www.deri.ie

Identity management issues

Data management

Page 103: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 103/180

Data management

Centralized vs. distr ibuted data management

 – Degree of decentralizat ion (orthogonal to the distr ibut ion of

Update consistency

Security

Access perm issions (+ management)

Requires unique identi f icat ion

es ence aga ns a ac s

Infrastructure Third art

Scalabili ty, robustness, 24/ 7 availabil ity, etc.

Di ital Enter rise Research Institute www.deri.ie

Use case: Dynamic IP addresses

Most computers on the Internet have a dynamic IP

Page 104: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 104/180

Most computers on the Internet have a dynamic IP

Limited number of IP addresses

Dynamic Host Configuration Protocol (lease time)

Host mobil ity (physical mobil ity)

Problem for any system that bui lds a distr ibuted

Use Case:

 

P2P systems (Chord, DKS, Pastry, P-Grid, etc.)

Di ital Enter rise Research Institute www.deri.ie

P2P systems and dynamic IP addresses

Structured P2P systems (Chord Pastry P- Grid DKS

Page 105: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 105/180

Structured P2P systems (Chord, Pastry, P Grid, DKS,

These systems construct a distr ibuted index routing

tables ynam c a resses rou ng a es ecome

inconsistent system can break down

Unstructured (Gnutella) and hierarchical (FastTrack)

systems

Less of a problem

,

 – high bandwidth consumption, or – single point of failures, or

 – ,

 – etc.

Di ital Enter rise Research Institute www.deri.ie

Problems to address

How to f ind out that an IP address has become invalid?

Page 106: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 106/180

No res onse

 – Network problem or did the peer get a new address?

Response – Is i t sti l l the same eer? (authenticit , re la , man- in- the-middle att acks)

Frequency of address changes is crucial

Peers can join and leave at any moment

IP address can change at any moment

Security: DOS attacks are very simple

 

EvilHacker.org part icipates in the overlay and thus f indsout IP addresses

.random hosts or itself

A scalable and secure infrastructure is required

Di ital Enter rise Research Institute www.deri.ie

Problems of current P2P approaches

Third- party infrastructures are required

Page 107: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 107/180

Third party infrastructures are required

 

Maintenance protocols may compromise structural

propert ies (e.g., load- balancing)

Previous knowledge is lost (e.g., reputat ion of thepeer, QoS, etc.)

o curren approac a resses secur y Only the owner should be allowed to update the mapping

DOS re la man- in- the- middle etc. are not addressed 

Di ital Enter rise Research Institute www.deri.ie

IPv6 as an alternative

With IPv6 dynamic addresses or NAT are no longer

Page 108: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 108/180

y g

IPv6 address space is ~3,4 * 1038 (or 1030 addresses per

person on the planet) v curren a ress space s

IPsec (included in IPv6)

solves authent icat ion roblem 

DOS attacks are more di f f icult

Mobili ty is addressed

IPv6: home/ foreign address

IPv4: mobi li ty extension but not suppor ted on a large scale

 

Di ital Enter rise Research Institute www.deri.ie

DNS as an alternative

DNS extensions

Page 109: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 109/180

 DNS could support secure updates

Problems – ery eavy- we g

 – Configuration is very diff icult and error- prone

 – Not for the “normal user” (as in a P2P system)

DynDNS (and similar services) Hosts maintain a consistent name/ address mapping in a

special DNS domain via a special client

Problems

 – Centralized scalability

 –  

Di ital Enter rise Research Institute www.deri.ie

Basic idea

DNS FQDN IP addressstatic

i

Page 110: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 110/180

DNS FQDN IP addressmapping

lookup IPaddress

Index

routing based on

FQDN (any overlay)

- r

Data

routing based on

lookup IPaddress

logical identifier

P-Grid

logical identifier  IP address

DYNAMIC mapping

Di ital Enter rise Research Institute www.deri.ie

Informal discussion

Use unique logical ident if iers (UUIDs) instead of physicalid ifi (IP dd )

Page 111: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 111/180

identifiers (IP addresses):

Peer identifiers

Routing based on UUIDs

 the logical and the physical ident if ier

- Rate of changes < self- healing rate

Dynamic equi libr ium

Advantages: General identification facility disentangling logical identifiers

from network structure

Tracking of chances (for example, reputat ion)

Di ital Enter rise Research Institute www.deri.ie

Maintenance and routing

Universal Unique Ident if ier (UUID) are generatedlocally

Page 112: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 112/180

locally

Cryptographically secure hash function global

uniqueness

Index / rout ing tables: UUID

Peers maintain an up- to- date UUID- IP mappings in P- Grid

ou ng Peers cache known UUID- IP mappings

Ma in ex ists in cache ident it of tar et eer is

checked before forwarding

No mapping is known query P- Grid for mapping

Di ital Enter rise Research Institute www.deri.ie

Security issues

Peer generates publ ic/ pr ivate key

UUID bli k P G id

Page 113: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 113/180

UUID- public key P- Grid

Why is it secure?

Data in P- Grid is stored at a number of random replicas

Hard to attack

Request are – signed (private key) authenticity & access permissions

 – time- stamped no replay possible Quorums are required for each request

et ter secur ty t an

Quorums Independent, random paths avoid weakest link problem

113

Revoke and update of securit y relevant information is possible

Di ital Enter rise Research Institute www.deri.ie

Directory maintenance: self-healing

Repair strategies

Page 114: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 114/180

 

Lazy repair: repair a routing table when all references at one level

become stale

DNS

lookup IP lookup IP address

routing based onlogical identifier

 

rou ng ase onFQDN (any overlay)

maintain logical identifier  IP address

mapping: eager or lazy

Page 115: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 115/180

Di ital Enter rise Research Institute www.deri.ie

Eager repair strategy

System is in a dynamic equilibr ium if the rate of changes due tochanging mappings and t he rate of repairs is equal

Page 116: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 116/180

 

LHS 

rup changes per 1- rup queries

Nrec – 1 addi tional recursive queries

Repair makes sense only if t he rout ing ent ry to be repaired

A repair is possib le only if recursive query succeeds

RHSRate of entries turning stale

rup c anges

1- pdyn probabil ity of non- stale references (only these can turn stale)

r references at each peer for each of log2n levels

Di ital Enter rise Research Institute www.deri.ie

Lazy repair strategy

Repair only if all references of a level are stale

Not all rout ing entr ies are treated uniform ly

Page 117: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 117/180

Not all rout ing entr ies are treated uniform ly

The number of stale ent ries for each rout ing level at each peerdefines the state of that level

Markovian model

0 ref stale

1 ref stale

2 ref stale

r ref stale

IDchange

IDchange

IDchange

IDchange

repairs

Dynamic equi librium equation

inflow = outf low (for each state) At dynamic equil ibr ium, the number of rout ing levels with

given number of stale ent ries over the whole system shouldnot change

N.B. We distinguish stale entries from offline peers

Di ital Enter rise Research Institute www.deri.ie

Lazy repair: Analytical vs.simulation results

Number of messages vs. rate of change (N=128,256,512,1024,replication factor is 8)

Page 118: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 118/180

80

Lazy Rec., Mess. vs. r_up for p_on=1 n=N 8 ,

sim,N=256ana,N=128

sim,N=128

rup60

sim,N=1024

ana,N=512sim,N=512ana,N=256

40

,

20

0.025 0.05 0.075 0.1 0.125 0.15r_up

Di ital Enter rise Research Institute www.deri.ie

Effect of pon on stability andmessage overhead

In networks with more online peers the lazy strategy is advantageousbut collapses earlier

Page 119: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 119/180

1200

Msg Lazy vs. Eager rec. r_up=0.2, lg_2 n =5

Directory "unstable"

800

Eager rec

Lazy rec

400

600

200

Directory "stable"

0.3 0.4 0.5 0.6 0.7 0.8 0.9 _

Di ital Enter rise Research Institute www.deri.ie

Summary

Decentralized, self - maintaining, light- weight , and

Page 120: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 120/180

secure rec ory serv ce

Robust and applicable in unreliable environments

Contributions

Logical independence of identi ty from network properties

General approach for ident if icat ion Structural propert ies are maintained

 

Dynamic resilience of a P2P system under “churn”

Di ital Enter rise Research Institute www.deri.ie

GridVine: Peer Data Management

Searching semantically richer objectsin lar e scale hetero eneous networks

Page 121: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 121/180

in lar e scale hetero eneous networks<xap:CreateDate>2001-12-

19T18:49:03Z</xap:CreateDate>

<xap:ModifyDate>2001-12-

date?

<es:DofCreation> 05/08/2004 </es:DofCreation>

?

? ? ?

<myRDF:Date> Jan 1, 2005 </myRDF:Date>

➠ Lack of semantic interoperability

Di ital Enter rise Research Institute www.deri.ie

Information Heterogeneity 

Syntactic discrepancies

Page 122: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 122/180

Semantic hetero eneit

mage c a e

 A0657B25 05.08.04<es:cDate> 05/08/2004 </es:cDate>VS

Extensible standards (XML, RDF, XMP, PSA, WinFS...)

<rdf:Property rdf:ID="width"> <rdf:Property rdf:ID=“Length-Y">

<rdfs:subPropertyOf rdf:resource="#length"/>

</rdf:Property>

-

<rdfs:subPropertyOf rdf:resource="#length"/>

</rdf:Property>VS

  . .,protein information, picture metadata)

➠ Shared re resentation is not enou h

Di ital Enter rise Research Institute www.deri.ie

Integrating Data in DistributedDatabases 

The Wrapper- Mediator architecture

Page 123: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 123/180

D a t   e

D a t  

Di ital Enter rise Research Institute www.deri.ie

Integrating Data in the new WebEcology

Mediated ArchitecturesMediated Architectures Large Scale Information SystemsLarge Scale Information Systems(e.g., WWW))(e.g., WWW))

Page 124: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 124/180

ScaleScale Number of sources < 100 Number of sources > 1000

UncertaintyUncertainty Consistent Data-

Uncertain Data-

- Manually curated data

Schemas created byadministrators

- Semi- automatic creation of data

Schemas created by end users

DynamicityDynam icit y Relat ively stable set of sources- stable mediator

Sources known a priori 

Network churn- node fai lures

Unknown sources

ExpressivenessExpressiveness Relat ional Data

Structured Schemas- Integrity constraints

Semi- structured data

Schematas- Few integrity constraints

Di ital Enter rise Research Institute www.deri.ie

Decentralized Interoperability

Q1=

<GUID>$p/GUID</GUID>

FOR $p IN /Photoshop_Image

Q2=

<GUID>$p/GUID</GUID>

FOR $p IN T12

p rea or o

Page 125: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 125/180

p rea or o

Photoshop(own schema)

WinFS

known schema

p rea or o

<Photoshop_Image>

<GUID>178A8CD8865</GUID>

<Creator>Robinson</Creator>

<Subject>

<WinFSImage>

<GUID>178A8CD8866</GUID>

<Author>

<Dis la Name>=<Bag>

<Item>

Tunbridge Wells

</Item>

<Item>Royal Council</Item>

Henry Peach Robinson<DisplayName>

<Role>Photographer</Role>

<Author>

<Keyword>

 

<Photoshop_Image><GUID>$fs/GUID</GUID>

<Creator>

$fs/Author/DisplayName 

</Subject>

</Photoshop_Image>

Tunbridge

</Keyword>

<Keyword>Council</Keyword>

</WinFSImage>

</Creator>

</Photoshop_Image>

FOR $fs IN /WinFSImage

➠ Extending integrat ion techniques to decent ralized sett ings

Di ital Enter rise Research Institute www.deri.ie

Peer Data Management Systems

<xap:CreateDate>2001-12-

19T18:49:03Z</xap:CreateDate>

<xa :Modif Date>2001-12-

date?

Page 126: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 126/180

19T20:09:28Z</xap:ModifyDate> 

es:cDate xap:CreateDate

<myRDF:Date> Jan 1, 2005

</myRDF:Date>

Pairwise mappings

Peer Data Management Systems (PDMS)Local mappings overcome global heterogeneity

Iterative query reformulation

Di ital Enter rise Research Institute www.deri.ie

3-Tier Network

Semantic

e a on

Page 127: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 127/180

e a on

Layer 

Jupp /

P-Grid

DHTs

Overlay

Layer 

Internet

Layer 

Di ital Enter rise Research Institute www.deri.ie

Data-Centric P2P Systems 

Piazza / Hyperion

 

Page 128: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 128/180

 – LAV- style query reformulation in P2P sett ings?

Network- intensive – arge- sca e ep oymen

Perfect mappings

PIER

Scales a relational engine on top of a DHT

Fixed schema

RDFPeers

Indexes RDF triples in a DHT

 

Di ital Enter rise Research Institute www.deri.ie

Conclusions 

More and more machine- processable (semi- structured)data available

Page 129: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 129/180

Peer Production

Human Computat ion

 

Top- down efforts to align data have failed largely

SUMO

Emergent SemanticsBottom- up

Best-Effort 

➠ Only resort to foster interoperabili ty in the large scaleecen ra ze a a spaces curren y emerg ng

Di ital Enter rise Research Institute www.deri.ie

Credits

Karl Aberer

-

Page 130: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 130/180

Anwitaman Datta

Roman Schmidt

Di ital Enter rise Research Institute www.deri.ie

Are you still awake?

Page 131: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 131/180

Digital Enterprise Research Institute www.deri.ie

Page 132: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 132/180

Introduction to Peer-to-Peer–

Manfred Hauswirth, Marcel Karnstedt

Copyright 2009 Digital Enterprise Research Institute. All rights reserved.

Di ital Enter rise Research Institute www.deri.ie

Outline – Part 2

The Vision: A Universal Storageor e a a

Page 133: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 133/180

or e a a

A Distributed Universal Storage

 

Operators

Query Engine

Mappings & Query Expansion

The Praxis: Implementation

 

133 of XYZ

Di ital Enter rise Research Institute www.deri.ie

Outline – Part 2

The Vision: A Universal Storageor e a a

Page 134: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 134/180

o e a a

A Distributed Universal Storage

 

Operators

Query Engine

Mappings & Query Expansion

The Praxis: Implementation

 

134 of XYZ

Di ital Enter rise Research Institute www.deri.ie

Examples

Page 135: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 135/180

, ,dynamic, structured, deep, l inked, …

Clique project:

Integrated structrured storage

Data pre- processing, integration, restructuring, index ing

135

Data access API and query processor for complex queries

135 Marcel Karnstedt IFIP Database Meeting Nicosia, Cyprus, 2009

Di ital Enter rise Research Institute www.deri.ie

Public Data Management

....Semantic & social Web, encyclopedias,“ “ 

Page 136: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 136/180

Datasets, which are

Maintained by large communities in a distributed way Of public interest

„Homogenized“ database, extensible and flexible,

136 of XYZ

, ,

Di ital Enter rise Research Institute www.deri.ie

Main Challenges

Data management

 

Page 137: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 137/180

Security, trust, fairness

Guarantees consistenc inte rit

CAP- Theorem [Gilbert et al. 2002], ACID vs. BASE

Query expressiveness

DB- like queries with advanced funct ionalit y Support of IR queries and simi larity is mandatory

-

Efficient processing

Eff icient query operators

137 of XYZ

Cost awareness in changing situations

Di ital Enter rise Research Institute www.deri.ie

Approaches

Who pays the load?

Who owns the data?

Page 138: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 138/180

views over 100.000

 

Efficientquery processing?Do we trust them?

Sindice, YARS

Jena, OracleSW- Store

PIER, PeerDB

AmbientDBRDFPeers

138 of XYZ

UniStore

Di ital Enter rise Research Institute www.deri.ie

Outline – Part 2

The Vision: A Universal Storageor e a a

Page 139: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 139/180

A Distributed Universal Storage

 

Operators

Query Engine

Mappings & Query Expansion The Praxis: Implementation

 

139 of XYZ

Di ital Enter rise Research Institute www.deri.ie

Influences

Robustness,self-organization,

sca a ty

Page 140: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 140/180

Efficient

lookups

P2P paradigm

PDMSDHTs &

SDDS

Index structures Distributed DBSSensor 

networks

Transparency,

query processing

Data streams

140 of XYZ

Di ital Enter rise Research Institute www.deri.ie

The Big Picture

Who wrote an article for cool

Who wrote an article for cool

DB

Page 141: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 141/180

DB

Wikipedia(article,author)“Pulp Fiction”,”MK”Wikipedia(article,author)“Pulp Fiction”,”MK”

Del.icio.us bookmark ta creatorDel.icio.us bookmark ta creator

DBPedia(link,wikilink,category)“Pulp Fictoon”,”Q. Tarantino”,”movie”

DBPedia(link,wikilink,category)“Pulp Fictoon”,”Q. Tarantino”,”movie”

“http://…pfiction”,”cool”,”MKa”“http://…pfiction”,”cool”,”MKa”

141 of XYZ

Di ital Enter rise Research Institute www.deri.ie

Layers of Processing

Scheduling,Ada tation,

Costs

Page 142: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 142/180

Processing Strategies

Similarity /Approximate

Query Operators

Multicast

Routing

Aggregation,Range

142 of XYZ

Di ital Enter rise Research Institute www.deri.ie

Outline – Part 2

The Vision: A Universal Storageor e a a

Page 143: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 143/180

A Distributed Universal Storage

 

Operators

Query Engine

Mappings & Query Expansion The Praxis: Implementation

 

143 of XYZ

Di ital Enter rise Research Institute www.deri.ie

Universal Relation Model

Since the eight ies: Model for simpl if ied retrieval inrelat ional databases

Universal relat ion containing all attributes

Page 144: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 144/180

Universal relat ion containing all attributes

Simplif ies navigation over mult iple relat ions dur ingquery formulat ion

SW- Store

1 2 3 1 2 1...

144 of XYZ

Di ital Enter rise Research Institute www.deri.ie

Triple Store

Universal relat ion model  

value)

Page 145: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 145/180

)

Similar to RDF: subject, predicate, object

OID Car Mileage HP Price

232 Volvo V70 34.000 180 28.000

SW- Store

Extensible

232 Car Volvo V70

Sindice, YARS...

Self-descriptive

No need for 

232 Mileage 34.000

232 HP 180

145 of XYZ

representing null valuesr ce .

Di ital Enter rise Research Institute www.deri.ie

Indexing

Indexing of att ributes = key for Hashing

 

For tuple (oid v1 v2 ) of R(OID A1 A2 )

Page 146: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 146/180

For tuple (oid, v1, v2, ...) of R(OID, A1, A2, ...)

232 Car Volvo V70

232 Mileage 34.000

YARSHexastore

232 Price 28.000

h(oid) for object lookup

h(A1 || v1)

h(A2 || v2) for Ai≥

v... (prefix search)

146 of XYZ

...trade-off storage vs. performance

Di ital Enter rise Research Institute www.deri.ie

P-Grid : Range Queries

Page 147: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 147/180

[Datta et al. 2005]

Di ital Enter rise Research Institute www.deri.ie

Outline – Part 2

The Vision: A Universal Storageor e a a

A Distrib ted Uni ersal Storage

Page 148: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 148/180

A Distributed Universal Storage

 

Operators

Query Engine

Mappings & Query Expansion The Praxis: Implementation

 

148 of XYZ

Di ital Enter rise Research Institute www.deri.ie

VQL

Query language VQL

 

Conjunct ive queries

Page 149: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 149/180

j q

Enhanced by advanced operators Operations both on instance and schema level

Basic query form

SELECT ?oid, ?val

WHERE { ?oid price ?val }

SELECT ?oid, ?val

WHERE { ?oid price ?val }

...

LIMIT ...

...

LIMIT ...

149 of XYZ

SPARQL

Di ital Enter rise Research Institute www.deri.ie

Similarity Queries

 

FILTER (edist(?value, v) < 2) }

FILTER (edist(?value, v) < 2) }

Page 150: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 150/180

String similari ty:

Edit distance using (posit ional) q- Grams

LSH forestSWAM

[Gravano et al. 2001, Schallehn et al. 2004] Requires addit ional key- value pairs in P- Grid

, ,

 – h(q- gram i(A)) oid

 – h(q- gram i(v)) oid

150 of XYZ

 

Di ital Enter rise Research Institute www.deri.ie

String Similarity

0

0 0 1

1

1

Page 151: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 151/180

.. ..

0 0 11

. ....

00…0 11…1

“All values in distance d= 1…”

Quer ran e for att ribute vs. uer d+ 1 - rams

151 of XYZ

[NetDB06]

Di ital Enter rise Research Institute www.deri.ie

More Operators

Similarity joins [NetDB06]

WHERE { ?o1 attr 1 ?v1 . ?o2 attr 2 ?v2

FILTER (edist(?v1 ?v2) < k) }

WHERE { ?o1 attr 1 ?v1 . ?o2 attr 2 ?v2

FILTER (edist(?v1 ?v2) < k) }

Page 152: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 152/180

( ( 1, 2) ) }( ( 1, 2) ) }

Ranking queries: top-k, skyline [DBRank07]

 

ORDER BY ?v LIMIT k 

 

ORDER BY ?v LIMIT k 

WHERE { ?o attr ?v }

ORDER BY ?v NN “A String“ LIMIT k 

WHERE { ?o attr ?v }

ORDER BY ?v NN “A String“ LIMIT k 

152 of XYZ

WHERE { ?o attr1 ?x . ?o attr2 ?y}

SKYLINE OF ?x MIN, ?y MAX

WHERE { ?o attr1 ?x . ?o attr2 ?y}

SKYLINE OF ?x MIN, ?y MAX

Di ital Enter rise Research Institute www.deri.ie

String Similarity Joins

Page 153: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 153/180

Doubled sequent ial Doubled parallel

Cloud services

153 of XYZ

Parallel and sequent ial Sequent ial and parallel

Di ital Enter rise Research Institute www.deri.ie

Skyline Queries

Ob ects that are not “dominated“ b other ob ects 

Scoring function on mult iple attributes, no weight ing

Page 154: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 154/180

dominated objects 

rice

154 of XYZ

mileage

Di ital Enter rise Research Institute www.deri.ie

Skylines: Basic Idea

DSL

“Frame Skyline“ algorithm

Minimum of f irst

Page 155: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 155/180

dimension definesmaximum for seconddimension

 

a frame narrowing thesearch space

155 of XYZ

Di ital Enter rise Research Institute www.deri.ie

Skylines: Processing

...

...

Page 156: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 156/180

...

Min x x 

Min y 

1. Find minimum in one selective dimension x2. Use y value of min(x) to l im it search range

.  

4. Always ship current skyline with query

5. Determine lobal sk line at one eer

156 of XYZ

6. Opt ionally: distributed range querying “on theway“ to min(x)

Di ital Enter rise Research Institute www.deri.ie

Skylines: More Dimensions

All project ions to 2dsub- spaces are skylinecandidates

Page 157: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 157/180

candidates

Objects of the searchedframe can dominate

projections

 dominate objects of the

searched f rame

157 of XYZ

Di ital Enter rise Research Institute www.deri.ie

Outline – Part 2

The Vision: A Universal Storageor e a a

A Distributed Universal Storage

Page 158: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 158/180

g

 

Operators

Query Engine

Mappings & Query Expansion The Praxis: Implementation

 

158 of XYZ

Di ital Enter rise Research Institute www.deri.ie

Query Execution

Goal: stateless processing „Push“ approach 

(based on Mutant Query Plans [Papadimos et al. 2002])

Receiver peer is ident if ied by applying the hash funct ion

Page 159: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 159/180

Receiver peer is ident if ied by applying the hash funct ion

Multiple instances of the plan t ravel t rough the network

A,1 , A,2 {(A,2)} B PIER, PeerDB

p1

p4

{(B,2),(C,1),(B,2),(C,4)}

{(A,2,B,2,C,1), (A,2,B,2,C,4)}

(A) B

(A) B

DARQ

p0 2

p3p5

{(A,3),(A,4)}p0

{(A,5),(B, 5)}(A) B

{(A,3),(A,4)} B

159 of XYZ

{(A,5),(A,6)}, , ,

{(A,5)} B

Page 160: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 160/180

Di ital Enter rise Research Institute www.deri.ie

Guarantees: Completeness

Problem:“ “ ‘ complete results

Goal:

Page 161: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 161/180

Goal:

Allow to est imate result completeness

 – For the user (“98% of all possible answers“)

 – For blocking query operators (aggregators, ranking- based

operators) in order t o guarantee a certain level ofcompleteness

Idea:

Not feasible on data level, but perhaps on peer level?!

161 of XYZ

Di ital Enter rise Research Institute www.deri.ie

Completeness Estimation

Estimation on peer level

Support of probabilistic guarantees

Page 162: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 162/180

Accuracy improved by Milestone messages (MiMes)

...Seaweed

[CIKM08, WIDM08]

Join(A=B) P1B P2B P3B PmB...

Extract(A)sequ Extract(B)rangeP1

A P2A P3

A PnA...

162 of XYZ

Query graph0 Routing graph level

Routing

pointPx

Y

Di ital Enter rise Research Institute www.deri.ie

CERQ: Initial Estimation

0 1

P1000 001

Page 163: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 163/180

0000 0001

P0 P2

P3

P4

Example: P-Grid range query 00100-1101 at P0

Predict trie on information from: 

2) local routing table (at least one node per level/sub-tree)

=> estimates 8 (out of 10)

163 of XYZ

...the better, the more information is kept in each routing table

[P2P07]

Di ital Enter rise Research Institute www.deri.ie

CERQ: Estimation Refinement

The ini t ial est imate might not be correct 

Piggy- back information

Query contains est imation of peers in sub- tree

Page 164: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 164/180

Query contains est imation of peers in sub tree

Query replies can contain corrections

0 1

Sub-query P0 -> P3

Sub-tree 001*P1

0100

000 001

Estimate: 3 peers

P3’s routing table

contains peer(s) of 

P0 P2

P3

P4

0000 0001

164 of XYZ

sub-tree 0010*

Di ital Enter rise Research Institute www.deri.ie

CERQ: Further Improvements

Use of structural replication 

Each ent ry might have a di fferent path

Page 165: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 165/180

Every path allows peers to learn more about sub- trees More ent ries mean better init ial estimates

 

P2P networks are dynamic

Though the structure is likely to be stable

The learned structure can be cached for later estimates

165 of XYZ

Di ital Enter rise Research Institute www.deri.ie

CERQ: Other Overlays

SkipGraphs Most similar to P- Grid

Rout ing information of mult iple levels

Unknown number of peers in bucket layer

Page 166: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 166/180

Prefix hash tree (Chord)

Peers build a tree- hierarchy

Only applicable if number of children is known

 

Forwarding along neighbors

No estimation can be given

166 of XYZ

= e ea can e mappe un er cer a n con ons

Di ital Enter rise Research Institute www.deri.ie

Outline – Part 2

The Vision: A Universal Storageor e a a

A Distributed Universal Storage

Page 167: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 167/180

 

Operators

Query Engine

Mappings & Query Expansion The Praxis: Implementation

 

167 of XYZ

Di ital Enter rise Research Institute www.deri.ie

Representing Mappings

Simple kind of at tribute correspondences

subsumes

Page 168: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 168/180

A1 A2 A3 A4 A5 A6

Triple representation

(A4, equiv, A5)

(A3, subsumes, A6)PeerDBSQPeer

168 of XYZ

 

[Ideas08]

Di ital Enter rise Research Institute www.deri.ie

Query Expansion

Unexpanded query

Page 169: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 169/180

GridVinePDMS

Map operators added

169 of XYZFirst mapping Expanded query

Di ital Enter rise Research Institute www.deri.ie

Outline – Part 2

The Vision: A Universal Storageor e a a

A Distributed Universal Storage

Page 170: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 170/180

 

Operators

Query Engine

Mappings & Query Expansion The Praxis: Implementation

 

170 of XYZ

Di ital Enter rise Research Institute www.deri.ie

UniStore

Page 171: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 171/180

Di ital Enter rise Research Institute www.deri.ie

Evaluation: Similarity Joins

Page 172: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 172/180

c1: seq & par/ seq

 

c3: seq & par/ par

c4: par& par/ par

172 of XYZ

c5: par & local

Di ital Enter rise Research Institute www.deri.ie

Evaluation: CERQ 

1 reference 3 references 5references

Page 173: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 173/180

 

More references lead to (as expected)

Better init ial est imate

Less corrections

Smaller errors

173 of XYZ

 

Replicat ion factor 1 (similar results for factor 2)

Di ital Enter rise Research Institute www.deri.ie

Evaluation: Completeness

Page 174: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 174/180

Min, max, avg 74 peers Min, max, avg 50 peers

174 of XYZ Without MiMes With MiMes

Di ital Enter rise Research Institute www.deri.ie

Outline – Part 2

The Vision: A Universal Storageor e a a

A Distributed Universal Storage

Page 175: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 175/180

 

Operators

Query Engine

Mappings & Query Expansion

The Praxis: Implementation

 

175 of XYZ

Di ital Enter rise Research Institute www.deri.ie

Summary & Outlook

Web data is huge, heterogeneous, structured,linked

Modern applicat ions require a universal and f lexiblestorage

DB like and RDF lik ing

Page 176: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 176/180

DB- like and RDF- lik ing DHTs well- suited for large- scale data management

UniStore as one solution

o us an sca a e, un versa an g - we g

Sophisticated query capabilities

Adapt ive, cost- based, stateless and parallel QP

Guarantees, semantic layer

…and all on totally decentralised and self-organising P2P!! 

176 of XYZ

 

Privacy & Trust, reputat ion

Guarantees, consistency, integri ty

Di ital Enter rise Research Institute www.deri.ie

Acknowledgements

Kai- Uwe Satt ler, Manfred Hauswir th, Kat ja Hose,

Sapkota, Conor Hayes, ...

Page 177: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 177/180

Students: Mart in Richtarsky, Michael Haß, JessicaMüller, Mario Wiegandt, Stefan Schwalm, Matthias

, , ...

Supported by the Science Foundation Ireland underGrant No. SFI/ 08/ CE/ I1380 (Lion- 2) and under

Grant No. 08/ SRC/ I1407 (Clique: Graph & Network

177 of XYZ

 

Di ital Enter rise Research Institute www.deri.ie

Related Systems

Sindice: Sindice. The semantic web index. http:/ / sindice.com/ 

YARS: A. Harth, J. Umbr ich, A. Hogan, S. Decker, Yars2: A federated repository forquerying graph structured data f rom the web, in: Proc. of ISWC/ ASWC, 2007.

 Jena: Wilkinson, K., Sayers, C., Kuno, H., Reynolds, D.: Efficient RDF storage and retrievalin Jena2. In: SWDB, pp. 131–150 (2003)

Oracle: Chong, E.I., Das, S., Eadon, G., Srinivasan, J.: An efficient SQL- based RDFquerying scheme. In: VLDB, pp. 1216–1227 (2005)

- . . , . , . . , . . -

Page 178: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 178/180

  . . , . , . . , . . part it ioned DBMS for Semant ic Web data management . The VLDB Journal (2009) 18 :385– 406

PIER: R. Huebsch, J. M. Hellerstein, N. Lanham, B. Thau Loo, S. Shenker, and I. Stoica.Querying the Internet with PIER. In VLDB’03, pages 321–332, 2003.

RDFPeers: M. Cai and M. Frank. RDFPeers: a scalable distributed RDF repository based

on a structured peer- to- peer network . In WWW’04, pages 650–657, 2004. PeerDB: W. S. Ng, B. Ch. Ooi, and K.- L. Tan. PeerDB: A P2P- based System for Dist ributed

Data Sharing. In ICDE ’03, pages 633–644, 2003.

AmbientDB: P. Boncz and C. Trei tel . AmbientDB: Relat ional Quer Processin in a P2P Network . In Workshop On Databases, Information Systems and P2P Comput ing,(DBISP2P’03), pages 153–168, 2003.

Hexastore: Weiss, C., Karras, P., Bernstein, A.: Hexastore: sextuple indexing for

semantic web data management. VLDB, 2008’

178 of XYZ

  . . . , .W3C Candidate Recommendation.

Di ital Enter rise Research Institute www.deri.ie

Related Systems /2

LSH forest: M. Bawa, T. Cond ie, and P. Ganesan. LSH forest: self - tuning indexes forsimi larit y search. In WWW’05, pages 651–660, 2005.

SWAM: F. Banaei- Kashani and C. Shahabi. SWAM: a family of access methods fors m ar ty- searc n peer- to- peer ata net wor s. n , pages – , .

Cloud services: M. Brantner, D. Florescu, D. Graf, D. Kossmann, and T. Kraska. Bui ld inga database on S3. In SIGMOD ’08, pages 251–264, 2008.

DSL: P. Wu, C. Zhan, Y. Feng, B. Zhao, D. Agrawal, and A. El Abbadi. Parallelizing skyline

ueries for scalable distribution. In EDBT’06 a es 112–130 2006.Sk f S Q C O S f

Page 179: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 179/180

Skyframe: S. Wang, Q. H. Vu, B. Ch. Ooi, A. K. H. Tung, and L. Xu. Skyframe: aframework for skyline query processing in peer- to- peer systems. The VLDB Journal,18(1):345–362, 2009.

DARQ : B. Quilitz and U. Leser. Querying Distributed RDF Data Sources with SPARQL. In’ – , , .

ObjectGlobe: R. Braumandl, M. Keidl , A. Kemper, D. Kossmann, A. Kreutz, S. Seltzsam,and K. Stocker. Ob jectGlobe: Ubiquit ous query processing on t he Internet. VLDB Journal,10(1):48–71, 2001.

Seaweed: D. Narayanan, A. Donnelly, R. Mortier, and A. Rowstron. Delay aware queryingw t seawee . n , pages – , .

SQPeer: G. Kokk inid is, E. Sid irourgos, and V. Chr istophides. Query Processing in RDF/ S-Based P2P Database Systems. Semantic Web and Peer- to- Peer, chapt er 4, pages 59–81.

Springer, 2006. GridVine: K. Aberer P. Cudr´e- Mauroux M. Hauswirth and T. Van Pelt. GridVine:

179 of XYZ

Bui ld ing Internet- Scale Semant ic Overlay Networks. In ISWC’04, pages 107–121, 2004.

PDMS: A. Y. Halevy, Z. G. Ives, P. Mork, and I. Tatarinov. Piazza: data managementinf ras- tructure for semantic web applications. In WWW’03, pages 556–567, 2003.

Di ital Enter rise Research Institute www.deri.ie

Thank you!

[CIKM08]: M. Karnstedt , K.Satt ler, M. Haß, M. Hauswir th, B. Sapkota, R. Schmidt: Estimating theNumber of Answers with Guarantees for Structured Queries in P2P Databases, CIKM 2008,

Napa, USA. [WIDM08]: M. Karnstedt , K. Satt ler, M. Haß, M. Hauswirt h, B. Sapkota, R. Schmidt:

-Applicat ions, WIDM'08 icw CIKM'08, Napa, USA, 2008.

[Ideas08]: M. Karnstedt, K.Satt ler, M. Hauswir th, B. Sapkota, R. Schmidt: Ad- hoc Integrat ionand Querying of Semantic Web Data, Ideas 2008, Coimbra, Portugal.

[DBRank07]: M. Karnstedt, J. Müller, K. Satt ler, Cost- Aware Skyl ine Queries in Structured

‘, , , .

Page 180: P2P Tutorial

8/3/2019 P2P Tutorial

http://slidepdf.com/reader/full/p2p-tutorial 180/180

[ICDE07]: M. Karnstedt , K.Satt ler, M. Richtarsky, J. Müller, M. Hauswirt h, R. Schmidt, R. John:UniStore: Querying a DHT- based Universal Storage, ICDE 2007 Demonstration.

[P2P07]: M. Karnstedt , K.Satt ler, R. Schmidt: Completeness Estimation of Range Queries inStructured Overlays, P2P 2007, Galway, Ireland.

  . , . , . , . -Queries in Structured Overlays, P2P 2006, Cambr idge, UK

[NetDB06]: M. Karnstedt, K. Sattler, M. Hauswirth, R. Schmidt: Similarity Queries on StructuredData in Structured Overlays, NetDB'06 @ ICDE 2006, Atlanta, GA.

[Gilbert et al. 2002]: S. Gilbert and N. Lynch. Brewer’s conjecture and the feasibility of, , - . , – , .

[Datta et al. 2005]: A. Datta, M. Hauswir th, R. Schmidt, R. John, and K. Aberer. Range queries intr ie- structured overlays. In P2P’05, pages 57–66, 2005.

[Papadimos et al. 2002]: V. Papadimos, D. Maier. Mutant Query Plans. Information andSoftware Technology, 44(4):197–206, April 2002.

 

180 of XYZ

c a e n e a . : . c a e n, . e s , . a er: uppor ng m ar y pera ons aseon Approx imate Str ing Matching on the Web, CoopIS 2004, Larnaca.

[Gravano et al. 2001]: L. Gravano et al.; Approx im ate String Joins in a Database (almost) forFree, VLDB 2001, Roma.