Paraskevi Raftopoulou (1,2) and Euripides G.M. Petrakis (2)
(1) Max-Planck Institute for Informatics, Saarbruecken, Germany http://www.mpi-inf.mpg.de/
(2) Technical University of Crete, Chania, Greece http://www.intelligence.tuc.gr/
A Measure for Cluster Cohesion in Semantic Overlay Networks
Workshop on Large-Scale Distributed Systems for Information Retrieval Napa Valley, California, 30 October, 2008
Paraskevi Raftopoulou Max-Planck Institute for Informatics & Technical University of Crete
Outline
  Motivation & Related work
  Distributed resource sharing
  iCluster architecture
  Measuring clustering quality
  Experimental evaluation
  Conclusion
Motivation & Related work
Motivation
  Resource sharing is at the core of today’s computing (Web, P2P, Grid)
  Information retrieval functionality is needed
  Overlay networks are a good technology to build on
  Measures are used for evaluating network organisation and retrieval efficiency
Related Work
  Semantic Overlay Networks
    Initial approaches include: [KJ04], [SMZ03], [PMW07]
    Based on the idea of small-world networks: [Smi04], [LLS04], [VSI06], DESENT
  Concepts & measures quantifying network organisation
    (generalised) Clustering coefficient: [WS98], [HAH07]
    Extensions/modifications: [FHJS02], [BGW08], [RMJ07], [FH06]
Distributed resource sharing
Semantic overlay networks
  Self-organising overlay networks
  The idea: peers that are semantically, thematically, or socially close (i.e., sharing similar interests or resources) are organised into groups. Queries are routed to the appropriate group.
  Peers hold routing indices with links to other peers
  Peers connected to each other are called neighbours
  Support rich data models and expressive query languages
Rewiring strategies
  Techniques for self-organising peers:
    abandon old connections and create new ones
    periodic process
  Inspired by the ‘small world effect’:
    reach anybody in a small number of routing hops
Small-world networks
  Most peers are not neighbours of one another
  Peers can be reached from every other peer by a small number of hops
  Main characteristics:
  1. small average shortest path length: most pairs of peers will be connected by at least one short path
  2. high clustering coefficient: there are cliques and subgraphs that are characterised by connections between almost any two peers within them
iCluster architecture
iCluster basics
  (i) intelligent + (Cluster) clustering = iClusterDL
  Contributions:
    Architecture and protocols to support IR functionality
      seamless and easy integration of peers
      scalable, fast query processing
    Self-organising peers based on SONs
      support rich query models
      benefits from loosely-connected peers
iCluster Protocols
  Peer join/leave
  Peer rewiring
  Query processing
  Document retrieval
Peer rewiring
  A peer p:
  1. computes its intra-cluster similarity (average similarity with its neighbours)
  2. initiates rewiring if similarity < threshold θ
  3. sends a message (msg) with its interest to m neighbours
  All peers receiving msg append their interest and forward msg to m neighbours
  The message is sent back to p when TTL τR = 0
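The rewiring steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the iCluster implementation: interests are modelled as sparse term-weight dicts compared by cosine similarity, the m neighbours are picked at random, already-visited peers are skipped, and all function names (`cosine`, `intra_cluster_similarity`, `needs_rewiring`, `rewiring_walk`) are hypothetical. What p does with the collected interests is not stated on the slide and is left out.

```python
import math
import random

def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors (dicts). Assumed measure."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def intra_cluster_similarity(peer, interests, neighbours):
    """Step 1: average similarity between a peer and its neighbours."""
    ns = neighbours[peer]
    if not ns:
        return 0.0
    return sum(cosine(interests[peer], interests[n]) for n in ns) / len(ns)

def needs_rewiring(peer, interests, neighbours, theta):
    """Step 2: rewiring starts only if the similarity is below theta."""
    return intra_cluster_similarity(peer, interests, neighbours) < theta

def rewiring_walk(peer, interests, neighbours, m, ttl, rng=random):
    """Step 3: forward msg to m neighbours per hop until the TTL reaches 0.

    Returns the interests appended to msg, i.e. what is sent back to `peer`.
    Assumption: neighbours are chosen at random and visited peers are skipped.
    """
    collected = {peer: interests[peer]}
    frontier = [peer]
    while ttl > 0 and frontier:
        next_frontier = []
        for p in frontier:
            ns = neighbours[p]  # list of neighbour ids
            for n in rng.sample(ns, min(m, len(ns))):
                if n not in collected:
                    collected[n] = interests[n]  # peer appends its interest
                    next_frontier.append(n)
        frontier = next_frontier
        ttl -= 1
    return collected
```

With the values from the experimental setup, a call would look like `rewiring_walk(p, interests, neighbours, m=2, ttl=4)` after `needs_rewiring(p, interests, neighbours, theta=0.9)` returns true.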
Query processing
  A peer p:
  1. compares q against its interests & selects the interest int most similar to q
  2. if similarity ≥ threshold θ, forwards a message (msg) including q to all its neighbours with TTL τb
  3. if similarity < threshold θ, forwards msg to the m of its neighbours most similar to q
  All peers receiving msg repeat the same process
  The message is forwarded until TTL τf = 0
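The forwarding rules above can be sketched in Python as follows. The slide does not spell out how the two TTLs interact, so this is only one possible reading: a query travels by fixed forwarding (TTL τf) until it first reaches a similar peer, which then broadcasts it with a fresh TTL τb. Cosine similarity over term-weight dicts, the assumption that a peer can score its neighbours' interests from its routing index, and the names `best_similarity` and `route_query` are all hypothetical.

```python
import math
from collections import deque

def cosine(a, b):
    """Cosine similarity of two sparse term-weight vectors (dicts). Assumed measure."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def best_similarity(query, peer_interests):
    """Step 1: similarity between q and the peer's most similar interest."""
    return max((cosine(query, i) for i in peer_interests), default=0.0)

def route_query(start, query, interests, neighbours, theta, m, ttl_f, ttl_b):
    """Return the peers reached by the query whose similarity to q is >= theta."""
    matched = set()
    best_ttl = {}  # (peer, mode) -> largest TTL already handled, to avoid rework
    queue = deque([(start, ttl_f, False)])
    while queue:
        peer, ttl, broadcasting = queue.popleft()
        if best_ttl.get((peer, broadcasting), -1) >= ttl:
            continue
        best_ttl[(peer, broadcasting)] = ttl
        similar = best_similarity(query, interests[peer]) >= theta
        if similar:
            matched.add(peer)
        if similar and not broadcasting:
            # Step 2 (assumed reading): switch to broadcast with TTL tau_b.
            ttl, broadcasting = ttl_b, True
        if ttl == 0:
            continue
        if broadcasting:
            targets = neighbours[peer]
        else:
            # Step 3: fixed forwarding to the m neighbours most similar to q.
            targets = sorted(neighbours[peer],
                             key=lambda n: best_similarity(query, interests[n]),
                             reverse=True)[:m]
        for n in targets:
            queue.append((n, ttl - 1, broadcasting))
    return matched
```

Under this reading, the broadcast radius τb is what bounds how much of a cluster a query can cover once it arrives there.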
Measuring clustering quality
Clustering coefficient
  The ratio of the number of links between the peers within pi’s neighborhood to the number of links that could possibly exist between them

  \[ c_i = \frac{\left|\{(p_j, p_k) : p_j, p_k \in RI_i,\ p_k \in RI_j\}\right|}{s\,(s-1)} \]

  where RI_i is the routing index of pi and s is the number of short-range links
  Takes values in the interval [0, 1]
    if ci = 1, every peer connected to pi is also connected to every other peer within the neighborhood
    if ci = 0, no peer that is connected to pi connects to any other peer connected to pi
  [Figure: example neighborhoods of pi with ci = 1/6, ci = 1/2, ci = 1, and ci = 0]
  Takes into account only the immediate neighbours of the peer
  Takes high values when there are cliques
  Loses the general view of the network
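As a worked illustration of the definition above, the clustering coefficient can be computed directly from the routing indices. This is a minimal Python sketch; the `routing_index` structure (a dict mapping each peer to the set of peers it links to) and the use of the peer's actual neighbour count as the denominator size are assumptions made for the example.

```python
def clustering_coefficient(peer, routing_index):
    """Links among the peer's neighbours divided by the links that could exist.

    routing_index: dict peer -> set of peers it links to (directed links).
    Assumption: the denominator uses the peer's actual number of neighbours.
    """
    nbrs = routing_index[peer]
    s = len(nbrs)
    if s < 2:
        return 0.0  # fewer than two neighbours: no link between them is possible
    links = sum(1 for pj in nbrs for pk in nbrs
                if pj != pk and pk in routing_index[pj])
    return links / (s * (s - 1))
```

For a peer with three neighbours, a single directed link among them gives 1/6, three links give 1/2, and all six give 1.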
Clustering efficiency
  A new measure that quantifies network organisation and reflects retrieval effectiveness
  Based on the network organisation and on the query processing protocols
  Consider that a peer pi’s neighborhood consists of all peers within radius τb around pi
Clustering efficiency
  The number of peers similar to pi that can be reached from pi within τb hops divided by the total number of similar peers

  \[ \kappa_i = \frac{\left|\{p_j : d_G(p_i, p_j) \le \tau_b,\ sim(p_i, p_j) \ge \theta\}_{j=1}^{N}\right|}{\left|\{p_k : sim(p_i, p_k) \ge \theta\}_{k=1}^{N}\right|} \]

  Takes values in the interval [0, 1]
    if κi = 1, the neighborhood of pi contains all peers similar to pi
    if κi = 0, the neighborhood of pi contains no peer similar to pi
  [Figure: example neighborhood of pi with ci = 0 and κi = 1]
  Gives information about the underlying network organisation involving more than just the immediate neighbors
  Looks at how the network is organised at a larger scale
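The measure above can be sketched in Python with a breadth-first search bounded by τb. This is a minimal sketch under stated assumptions: `neighbours` is a dict with one entry per peer in the network, `sim` is any pairwise similarity function supplied by the caller, a peer is not counted as similar to itself, and κi is taken as 0 when no similar peer exists. The function names are hypothetical.

```python
from collections import deque

def peers_within(start, neighbours, radius):
    """Peers reachable from `start` in at most `radius` hops (start excluded)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        p = queue.popleft()
        if dist[p] == radius:
            continue  # do not expand beyond the radius
        for n in neighbours[p]:
            if n not in dist:
                dist[n] = dist[p] + 1
                queue.append(n)
    del dist[start]
    return set(dist)

def clustering_efficiency(peer, neighbours, sim, theta, tau_b):
    """Similar peers reachable within tau_b hops / all similar peers."""
    similar = {p for p in neighbours if p != peer and sim(peer, p) >= theta}
    if not similar:
        return 0.0  # assumption: undefined ratio treated as 0
    reachable = peers_within(peer, neighbours, tau_b)
    return len(similar & reachable) / len(similar)
```

A chain of peers illustrates the contrast with the clustering coefficient: a peer with a single neighbour has no links among its neighbours, yet its clustering efficiency can still be 1 if every similar peer lies within τb hops.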
Experimental evaluation
Experimental Evaluation
  Used different parameters:
    Data corpus
    Similarity threshold
    Query TTL
    Forwarding strategies

  Parameter               Symbol  Value
  peers                   N       2,000
  short-range links       s       8
  long-range links        l       4
  similarity threshold    θ       0.9
  rewiring TTL            τR      4
  fixed forwarding TTL    τf      6
  broadcast TTL           τb      2
  message fanout          m       2

  Data corpora:
    OHSUMED TREC: 30,000 medical articles, 10 categories
    TREC-6: 556,000 documents, 100 categories
  Rewiring schedule:
    the start of the rewiring is randomly chosen from the time interval [0, 4K]
    the periodicity is randomly selected from a normal distribution of 2K
  Looked into the:
    Network organisation
    Recall
  The better the network organisation is, the better the performance of retrievals should be!
  The experiments are intended to:
    associate the performance of retrievals with the quality of network organisation
    recommend the clustering measure that better represents this association
Experimental Evaluation
Clustering coefficient ci for different forwarding strategies
Experimental Evaluation
Clustering efficiency κi for different forwarding strategies
Experimental Evaluation
Retrieval
Outlook
Conclusion
  The idea
    focus on IR on top of SONs
    look at how the network is organised at a large scale
  Clustering efficiency
    quantifies the underlying (dynamic) P2P structure
    reflects retrieval effectiveness
  The results indicate that the clustering efficiency measure models network clustering quality better than other existing measures