UNIVERSITY OF JYVÄSKYLÄ
Resource Discovery in Unstructured P2P Networks
Distributed Systems Research Seminar on 22.3.2007
Mikko Vapa, research student
P2P Computing Group
Department of Mathematical Information Technology
http://www.mit.jyu.fi/cheesefactory
Resource Discovery Problem
• In the peer-to-peer (P2P) resource discovery problem, any node in the network can possess resources and can also query other nodes for resources
[Figure: a four-node network (Node 1 to Node 4) in which Node 1 asks where a resource is]
A Simple Solution for the Problem
• The most studied P2P network, Gnutella, for example, used the Breadth-First Search (BFS) flooding algorithm, which sends the query to all neighbors
• Result: all resources in the network can be found, but the network gets congested and many of the packets are useless
[Figure: Node 1 floods its query through the four-node network; Node 2 and Node 4 each answer "I have it!" and the replies travel back to Node 1]
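The flooding behavior described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the four-node topology, the `holders` set and the `ttl` value are hypothetical, and a node is assumed to forward the query to every neighbor except the one it received it from.

```python
from collections import deque

def bfs_flood(graph, start, holders, ttl):
    """Flood a query from `start`; return (holders found, query packets sent)."""
    seen = {start}
    queue = deque([(start, None, ttl)])
    found, packets = set(), 0
    while queue:
        node, sender, hops_left = queue.popleft()
        if hops_left == 0:
            continue
        for neighbor in graph[node]:
            if neighbor == sender:
                continue              # do not send the query straight back
            packets += 1              # every forwarded copy costs one packet
            if neighbor in seen:
                continue              # duplicate: dropped, but already paid for
            seen.add(neighbor)
            if neighbor in holders:
                found.add(neighbor)
            queue.append((neighbor, node, hops_left - 1))
    return found, packets

# Hypothetical topology; nodes 2 and 4 hold the resource.
graph = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2, 4], 4: [1, 3]}
found, packets = bfs_flood(graph, start=1, holders={2, 4}, ttl=2)
print(found, packets)
```

In this toy network three packets would have been enough to reach every node, so the remaining packets are the duplicates that the slide calls useless.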
Near-Optimal Solution: Steiner Minimum Tree Problem
• Optimal paths for resource discovery can be found with a non-distributed algorithm, which requires global knowledge of the topology and the resources
• More precisely, the problem can be formulated as the task of finding a Steiner Minimum Tree (SMT) in a graph:
MST k-Steiner Minimum Tree Algorithm
• The MST k-Steiner Minimum Tree algorithm was developed to find an approximate solution:
MST k-Steiner Minimum Tree Algorithm
[Figure: worked example on a weighted graph with resource nodes r1 to r5 and intermediate nodes m1 to m3, in six panels: Graph G; Graph GR after step (1); Tree TR after step (2); Graph H after step (3); Tree T after step (4); Tree T after step (5)]
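Time Complexity: an O(·) expression in E and log E (garbled in the source as "EEO log2"), where E = number of edges in graph G
Worst-Case Approximation Ratio: an expression in R (garbled in the source as "2 R"), where R = available resources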
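The five steps named in the figure panels (distance graph GR, its spanning tree TR, expanded graph H, spanning tree T, pruned tree T) resemble the classic MST-based Steiner tree heuristic of Kou, Markowsky and Berman. The Python sketch below implements that classic heuristic on the assumption that the resemblance holds. It does not cover the k-selection part (choosing which k of the R resources to connect), and the function names and the example graph are hypothetical.

```python
import heapq
from itertools import combinations

def dijkstra(adj, src):
    """Shortest distances and predecessor links from src."""
    dist, prev = {src: 0}, {}
    heap = [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v, w in adj[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, prev

def kruskal(nodes, edges):
    """Minimum spanning tree of (weight, u, v) edges using union-find."""
    parent = {n: n for n in nodes}
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

def steiner_tree_approx(adj, terminals):
    """Approximate Steiner tree over `terminals` in a connected undirected graph."""
    terminals = list(terminals)
    paths = {t: dijkstra(adj, t) for t in terminals}
    # Step 1: complete distance graph over the terminals.
    closure = [(paths[a][0][b], a, b) for a, b in combinations(terminals, 2)]
    # Step 2: minimum spanning tree of the distance graph.
    t_r = kruskal(terminals, closure)
    # Step 3: expand every tree edge into its shortest path in the original graph.
    h_edges = set()
    for _, a, b in t_r:
        prev, node = paths[a][1], b
        while node != a:
            p = prev[node]
            h_edges.add((adj[p][node], *sorted((p, node))))
            node = p
    # Step 4: minimum spanning tree of the expanded subgraph.
    h_nodes = {n for _, u, v in h_edges for n in (u, v)}
    tree = kruskal(h_nodes, h_edges)
    # Step 5: repeatedly prune leaves that are not terminals.
    while True:
        degree = {}
        for _, u, v in tree:
            degree[u] = degree.get(u, 0) + 1
            degree[v] = degree.get(v, 0) + 1
        leaves = {n for n, d in degree.items() if d == 1 and n not in terminals}
        if not leaves:
            return tree
        tree = [e for e in tree if e[1] not in leaves and e[2] not in leaves]

# Hypothetical graph: terminals a, b, c are cheaply connected through hub m.
adj = {
    "a": {"m": 1, "b": 3, "c": 3},
    "b": {"m": 1, "a": 3, "c": 3},
    "c": {"m": 1, "a": 3, "b": 3},
    "m": {"a": 1, "b": 1, "c": 1},
}
tree = steiner_tree_approx(adj, ["a", "b", "c"])
total = sum(w for w, _, _ in tree)
```

In a resource discovery setting the terminals would be the querying node plus the nodes holding the resource, and the number of tree edges would correspond to the number of query packets.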
Efficiency = Found Replies / Query Packets
• The MST k-Steiner Minimum Tree algorithm shows that the paths used by current local search algorithms for peer-to-peer networks are far from optimal
[Figure: "Efficiency of the Algorithms". x-axis: % of Resources (0 to 100); y-axis: Efficiency (0 to 1.2); series: Steiner, HDS, BFS]
[Figure: Gnutella topology of ~75000 nodes. x-axis: % of Resources (0 to 100); y-axis: Efficiency (logarithmic, 0.001 to 1); series: k-Steiner, HDS, RWSA, BFS]
[Figure: query patterns of Highest Degree Search and K-Steiner Minimum Tree]
The K-Steiner Tree algorithm locates 9 resource instances with 11 query packets. For this query the approximated solution is also the optimal solution. HDS uses almost twice as many query packets for this query.
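As a worked instance of the efficiency measure (found replies divided by query packets), the K-Steiner figures quoted above can be plugged in directly. The function name is illustrative only.

```python
def efficiency(found_replies, query_packets):
    """Efficiency = found replies / query packets."""
    return found_replies / query_packets

# K-Steiner example from the slide: 9 resource instances, 11 query packets.
k_steiner = efficiency(9, 11)
print(round(k_steiner, 3))  # 0.818
```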
Hops
• MST k-Steiner does not use the shortest paths to locate resources
[Figure: "Hops Used by the Algorithms". x-axis: % of Resources (0 to 100); y-axis: Hops (0 to 6); series: Steiner, BFS]
Branching Factor = Average Number of Children of Each Node Having Children in a Search Tree
• MST k-Steiner starts out searching in a single direction, but switches to searching in multiple directions as more resources need to be located
[Figure: "Branching Factor of the Algorithms". x-axis: % of Resources (0 to 100); y-axis: Branching Factor (0 to 7); series: BFS, Steiner]
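The branching factor definition can be made concrete with a short Python sketch. The dictionary representation of a search tree (each node mapped to the nodes it forwarded the query to) and both example trees are hypothetical.

```python
def branching_factor(children):
    """Average number of children over the nodes that have at least one child."""
    counts = [len(kids) for kids in children.values() if kids]
    return sum(counts) / len(counts) if counts else 0.0

# A single search direction: every forwarding node has exactly one child.
chain = {"n1": ["n2"], "n2": ["n3"], "n3": []}
# Multiple search directions: n1 forwards to three neighbors, n2 to one.
spread = {"n1": ["n2", "n3", "n4"], "n2": ["n5"], "n3": [], "n4": [], "n5": []}

print(branching_factor(chain))   # 1.0
print(branching_factor(spread))  # 2.0
```

A value of 1 corresponds to a search in one direction; values above 1 indicate that the query fans out.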
MST k-Steiner Minimum Tree Algorithm
• Ways to improve MST k-Steiner:
– Conducting an extensive survey of related work in graph theory and k-Steiner Minimum Trees, and modifying the problem to support multiple resource instances on the same node (Prize Collecting Steiner Tree problem with Quota)
– Getting the results published: Vapa M., Auvinen A., Ivanchenko Y., Kotilainen N., Vuori J., "K-Steiner Minimum Tree Is An Upper Bound for Unstructured Peer-to-Peer Resource Discovery Algorithms", submitted to Euro-Par 2007
– All the tools are now available for discovering the theoretical limit of peer-to-peer technology, in terms of the total traffic a given peer-to-peer network induces on a telecommunication network compared with the client-server approach
– However, real-world use of a "Distributed k-Steiner minimum tree resource discovery algorithm" seems to be impossible, because all caching in P2P networks is likely to be useless (wide namespace, dynamic peers, dynamic topology and possibly changing content)
Distributed Resource Discovery
• Distributed resource discovery has to be solved with a distributed algorithm, so the k-Steiner Minimum Tree cannot be used directly
• In distributed resource discovery, each node has to forward the query based on local knowledge only
[Figure: Node 1 sends its query to Node 2, which replies "I have it!" and then asks "But whom should I forward this query further?"; the topology beyond both nodes is unknown]
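One simple local forwarding rule is Highest Degree Search (HDS), which these slides use as a comparison algorithm. The Python sketch below assumes that HDS forwards the query to the highest-degree neighbor that has not yet seen it, and that a node knows its neighbors' degrees. Both are assumptions, as the slides do not spell out the rule. The topology and the packet budget are hypothetical.

```python
def highest_degree_search(graph, start, holders, max_packets):
    """Walk the query towards high-degree neighbors using only local knowledge."""
    visited = {start}
    current, found, packets = start, set(), 0
    while packets < max_packets:
        candidates = [n for n in graph[current] if n not in visited]
        if not candidates:
            break  # dead end; a fuller implementation might backtrack here
        # Local decision: only the degrees of direct neighbors are consulted.
        current = max(candidates, key=lambda n: len(graph[n]))
        visited.add(current)
        packets += 1
        if current in holders:
            found.add(current)
    return found, packets

# Hypothetical topology with a hub "h"; node "a" holds the resource.
graph = {
    "s": ["x", "h"],
    "x": ["s"],
    "h": ["s", "a", "b", "c"],
    "a": ["h"],
    "b": ["h"],
    "c": ["h"],
}
found, packets = highest_degree_search(graph, "s", holders={"a"}, max_packets=5)
print(found, packets)
```

The sketch follows a single search direction and stops at the first dead end, which keeps the packet count low at the price of longer paths.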
Our Solution: NeuroSearch
• The NeuroSearch resource discovery algorithm uses neural networks and evolution to adapt its behavior to a given environment:
– a neural network decides whether or not to pass the query further down a connection
– evolution breeds and selects the best neural network from a large class of local search algorithms
[Figure: an incoming query is evaluated separately for each neighbor node, with a "Forward the query" decision per connection]
NeuroSearch’s Inputs
• The internal structure of the NeuroSearch algorithm
• Multiple layers enable the algorithm to express non-linear behavior
• With enough neurons, the algorithm can approximate any decision function
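A per-connection forwarding decision made by a small multi-layer network might look like the following Python sketch. It is not the NeuroSearch network itself: the two inputs, the layer sizes, the tanh activation and the hand-picked weights are all hypothetical, chosen only to show how a hidden layer feeds a yes/no forwarding decision.

```python
import math

def forward_decision(inputs, hidden_weights, output_weights):
    """Return True if the query should be forwarded down this connection."""
    with_bias = inputs + [1.0]                      # constant bias input
    hidden = [math.tanh(sum(w * x for w, x in zip(row, with_bias)))
              for row in hidden_weights]            # non-linear hidden layer
    score = sum(w * h for w, h in zip(output_weights, hidden + [1.0]))
    return score > 0.0                              # threshold the output neuron

# Hypothetical weights: two hidden neurons, each with two inputs plus a bias.
hidden_weights = [[2.0, -1.0, 0.0], [-1.0, 2.0, 0.0]]
output_weights = [1.0, -1.0, 0.0]

# Hypothetical inputs, e.g. a normalised neighbor degree and a normalised hop count.
print(forward_decision([0.8, 0.2], hidden_weights, output_weights))  # True
print(forward_decision([0.2, 0.8], hidden_weights, output_weights))  # False
```

The weights are the only part that changes between candidate algorithms, so they are what a training procedure has to adjust.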
NeuroSearch’s Training Program
• The neural network's weights define how it behaves, so they must be adjusted to the right values
• This is done with an iterative optimization process based on evolution and Gaussian mutation
[Flowchart of the training program:]
1. Define the P2P network conditions
2. Define the fitness requirements for the algorithm
3. Create candidate algorithms randomly
4. Select the best ones for the next generation
5. Breed a new population
6. Iterate steps 4 and 5 for thousands of generations
7. Finally select the best algorithm for these conditions
8. Compare the best one against other local search algorithms
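The select, breed and iterate loop can be sketched as a small evolutionary algorithm with Gaussian mutation. This is a generic sketch and not the authors' training program: the population size, number of survivors, mutation strength and especially the toy fitness function are assumptions. In the real setting, fitness would come from simulating queries in the chosen P2P network.

```python
import random

def evolve(fitness, n_weights, population_size=20, survivors=5,
           generations=200, sigma=0.1, seed=0):
    """Evolve a weight vector by selection and Gaussian mutation."""
    rng = random.Random(seed)
    # Create candidate weight vectors randomly.
    population = [[rng.gauss(0.0, 1.0) for _ in range(n_weights)]
                  for _ in range(population_size)]
    for _ in range(generations):
        # Select the best candidates for the next generation.
        population.sort(key=fitness, reverse=True)
        parents = population[:survivors]
        # Breed a new population: copy a random parent and perturb each weight.
        children = [[w + rng.gauss(0.0, sigma) for w in rng.choice(parents)]
                    for _ in range(population_size - survivors)]
        population = parents + children
    # Finally select the best candidate.
    return max(population, key=fitness)

# Toy stand-in for the real fitness: closeness to a hypothetical target vector.
target = [0.5, -0.3, 0.8]

def toy_fitness(weights):
    return -sum((w - t) ** 2 for w, t in zip(weights, target))

best = evolve(toy_fitness, n_weights=3)
print(best, toy_fitness(best))
```

Because the parents survive unchanged into the next generation, the best fitness in the population never decreases from one generation to the next.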
Typical Query Pattern of NeuroSearch
NeuroSearch uses 26 query packets to locate 11 resource instances. There is a total of 17 resource instances available, so locating 9 resource instances would have been enough to reach 50% of resource instances.
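The figure of 9 instances follows from rounding the 50% target up to a whole number of instances, as this short Python check shows. The function name is illustrative only.

```python
import math

def instances_needed(total_instances, target_fraction):
    """Smallest whole number of instances that reaches the target fraction."""
    return math.ceil(total_instances * target_fraction)

needed = instances_needed(17, 0.5)   # 17 * 0.5 = 8.5, rounded up
surplus = 11 - needed                # instances located beyond the target
print(needed, surplus)               # 9 2
```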
Efficiency of the Algorithms in a 100-Node Power-Law Topology
[Figure: x-axis: % of Resources (0 to 100); y-axis: Efficiency (0 to 1.2); series: Optimal, Steiner, HDS, NeuroSearch, BFS]
Ranking List
• Highest Degree Search is currently the best known local search algorithm for the power-law distributed scenario
[Chart labels: NeuroSearch 2004, NeuroSearch 2003]
Ideal Algorithm
• NeuroSearch is close to HDS in performance, but different in nature:
– NeuroSearch's maximum number of hops is far lower than that of single-direction search algorithms, which results in low search latency
• The ideal would be to find an algorithm that:
– Has a low maximum number of hops
– Has high efficiency regardless of how many resources need to be located
– Sustains these properties in many P2P scenarios
[Figure: "Hops Used by the Algorithms". x-axis: % of Resources (0 to 100); y-axis: Hops (0 to 180); series: HDS, NeuroSearch 16:4 7 inputs, NeuroSearch 10:10 27 inputs, NeuroSearch 16:4 23 inputs, NeuroSearch 30:20 23 inputs, Steiner, BFS]
Future Work
• The first versions of NeuroSearch are now ready and analyzed
• Ways to enhance NeuroSearch include:
– History-based inputs to allow more accurate decisions
– Studying the scalability factors that affect NeuroSearch as the P2P network grows
– Analysis of the behavior in dynamic conditions
– Speeding up the optimization process by parallelizing the evolutionary algorithm using distributed computing
– The computational cost is demanding and replacing the optimization algorithm does not help (see: Neri, Kotilainen, Vapa, "An Adaptive Global-Local Memetic Algorithm to Discover Resources in P2P Networks", to be published in EvoCOMNET 2007)
• A less flexible approximator could replace the neural network
References
• M. Vapa, A. Auvinen, Y. Ivanchenko, N. Kotilainen, J. Vuori, "K-Steiner Minimum Tree Is An Upper Bound for Unstructured Peer-to-Peer Resource Discovery Algorithms", submitted to Euro-Par 2007.
• F. Neri, N. Kotilainen, M. Vapa, "An Adaptive Global-Local Memetic Algorithm to Discover Resources in P2P Networks", to be published in EvoCOMNET 2007.
• M. Vapa, N. Kotilainen, H. Kainulainen, J. Vuori, "Resource Discovery in P2P Networks Using Evolutionary Neural Networks", International Conference on Advances in Intelligent Systems – Theory and Applications (AISTA 2004), 15.-18.11.2004.