Niloy Ganguly (Zentrum für Hochleistungsrechnen (ZHR) – TU Dresden) <[email protected]>
Project funded by the Future and Emerging Technologies arm of the IST Programme
Immune System and Search Technology
Designing a Fast Search Algorithm for P2P Networks Using Concepts from Immune Systems
Overview of the Presentation
● P2P Network
– Paradigm for Decentralised Computing
● Immune System Features
● Experimental Setup
● Simulation Results
Peer To Peer Network
● Most Direct Method of Connecting Computers
– Simple
– Inexpensive
– No Boss
– No Regulation
Peer To Peer Network
● PCs at the edge of the network are called “Peers”
● Peers can retrieve objects directly from each other
Advantages of a P2P Network
● A large collection of peers may be available for content distribution, sometimes millions
● The user takes advantage of the network’s currently available resources
Peer To Peer Network
● Problem of Hugeness
– Emergence of Protocol
● Protocols are classified by Network Structure and by Degree of Centralization
– Hybrid Decentralized: Napster (unstructured)
– Pure Decentralized: Gnutella (unstructured), Freenet (loosely structured), CAN, CHORD (structured)
– Partially Centralized: FastTrack, Kazaa, Morpheus (unstructured)
P2P: Hybrid Decentralized (Napster)
Hybrid Decentralized – Napster; Pure Decentralized – Gnutella; Partially Centralized – Kazaa
● When a peer connects, it informs the central server of:
– its IP address
– its content
● Alice queries the centralized directory server for “Das Wunder von Bern”
● Alice requests the file from Bob
[Figure: a centralized directory server connected to the peers, including Alice and Bob, with arrows numbered by step]
While file transfer is decentralized, locating content is highly centralized
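The centralized lookup can be illustrated with a minimal sketch. The class name `DirectoryServer`, its methods and the peer addresses are hypothetical and chosen only for illustration; they do not reproduce the actual Napster protocol.

```python
class DirectoryServer:
    """Hypothetical central index: file title -> set of peer addresses."""

    def __init__(self):
        self.index = {}

    def register(self, peer_addr, titles):
        # A connecting peer reports its address and its content.
        for title in titles:
            self.index.setdefault(title, set()).add(peer_addr)

    def lookup(self, title):
        # A query is answered from the central index alone.
        return sorted(self.index.get(title, set()))


server = DirectoryServer()
server.register("bob:6699", ["Das Wunder von Bern"])
server.register("carol:6699", ["Other File"])
hits = server.lookup("Das Wunder von Bern")
# The file itself would then be fetched directly from a peer listed in `hits`.
```

Only the lookup goes through the server; if the server is unavailable, no content can be located even though every peer still holds its files.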
● Fast
● Single point of failure
– Application crash
● Performance bottleneck
● Huge database to maintain
● Copyright infringement
– Legal proceedings may result in the company having to shut down the directory server
P2P: Intermediate Arrangement (Kazaa)
Feature
Has a centralized server that
• maintains user registrations,
• logs users into the system to keep statistics,
• provides downloads of client software.
Two client types are supported:
• Supernodes (fast CPUs + high-bandwidth connections)
• Nodes (slower CPUs and/or connections)
Supernode addresses are provided in the initial download. Supernodes also maintain searchable indexes and proxy search requests for users.
P2P: Pure Decentralized (Gnutella)
Basic Feature
● no hierarchy, peers have similar responsibilities: no group leader
● no peer maintains directory info
● highly decentralized
Joining Algorithm
● use bootstrap node to learn about others
● Join message
Message Query:
● Send query to neighbors
● If a queried peer has the object, it sends a message back to the querying peer
● The queried peer also forwards the query to its immediate neighbors
● The results are carried back to the user
● Message flooding occurs
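The query steps above can be sketched as a breadth-first flood. This is a minimal sketch under stated assumptions: the hop limit `ttl`, the function name `flood_query` and the five-node example graph are illustrative and not taken from the slides.

```python
from collections import deque


def flood_query(graph, holders, start, ttl):
    """Breadth-first flooding: every peer forwards the query to all of its
    neighbors until the hop limit (ttl, an assumed parameter) is used up.
    Returns (peers that hold the object, number of messages sent)."""
    seen = {start}
    frontier = deque([(start, 0)])
    hits, messages = [], 0
    while frontier:
        node, hops = frontier.popleft()
        if node in holders and node != start:
            hits.append(node)
        if hops == ttl:
            continue
        for nb in graph[node]:
            messages += 1          # one message per forwarded copy
            if nb not in seen:     # a peer processes each query only once
                seen.add(nb)
                frontier.append((nb, hops + 1))
    return hits, messages


graph = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
hits, messages = flood_query(graph, holders={"E"}, start="A", ttl=3)
```

In this toy graph nine messages are sent to locate a single object three hops away, which gives a feel for why query traffic grows quickly in a flooded network.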
Pros:
● Totally decentralized query
● Robust: a query doesn't stop when one of the nodes breaks down
● Fresh results: no outdated index
Cons:
● Query radius: the query radius can be long
● Excessive query traffic: 25% of the total traffic is query traffic
● Total traffic in the Gnutella network is 1.7 Gbps, which is 1.7% of the total traffic in the US Internet backbone
Challenges Ahead:
● Reduce query time
● Stop flooding; use an intelligent search method to avoid network congestion
Relation Between Data and Topology
Structured and Loosely Structured Topology
Unstructured – Gnutella; Structured – CHORD; Loosely Structured – Freenet
P2P: Structured Decentralized Network
Distributed Hash Table:
● Data or metadata is carefully placed across nodes in a deterministic fashion
● Every file and every node (IP) generates a unique hash address that helps in the placement of data
● Each node has to keep information about a limited number of neighbors
● Search is very fast, typically of the order log(n)
● Extremely scalable
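Deterministic placement can be sketched by hashing both node addresses and file names onto one identifier ring. This is an illustration only: the 16-bit ring size, the successor rule and the function names are assumptions, and the sketch does not implement the O(log n) routing tables of CHORD or CAN.

```python
import hashlib
from bisect import bisect_left

RING_BITS = 16  # assumption: a small identifier space for illustration


def ring_id(name):
    """Hash a node address or a file name onto the identifier ring."""
    digest = hashlib.sha1(name.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << RING_BITS)


def responsible_node(key, nodes):
    """Return the node whose id is the first one at or after the key's id,
    wrapping around the ring (successor rule)."""
    ring = sorted((ring_id(n), n) for n in nodes)
    ids = [i for i, _ in ring]
    pos = bisect_left(ids, ring_id(key)) % len(ring)
    return ring[pos][1]


nodes = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
owner = responsible_node("song.mp3", nodes)
```

Because placement depends only on the hashes, any peer that knows the node set computes the same owner for a given file name, which is what makes exact-match lookup fast.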
Disadvantages
● Locality is destroyed.
– Data items (i.e. files) from a single site are not usually co-located, meaning that opportunities for enhanced browsing, pre-fetching and efficient searching are lost.
● Useful application-level information is lost.
– The data used by many applications is naturally described using hierarchies, which expose relationships between items near to each other. The virtualization of the file namespace by generating keys discards this information.
● P2P networks are extremely transient
● It is difficult to support keyword search rather than only exact-match queries
P2P: Loosely Structured Network
● Freenet is in between the two.
● File locations are affected by routing hints, but they are not completely specified, so not all searches succeed.
● It essentially pools unused disk space in peer computers to create a collaborative virtual file system.
● Files are replicated when they are searched.
Search
Search Mechanism
● Topology
● Placement of Data
● Message Routing
Search Criterion
● Expressiveness (key lookup, keyword, ranked keyword)
● Efficiency (bandwidth, processing power, storage)
● Quality of Service (number of results, response time)
● Robustness (stability in the presence of failures)
System Requirement
● Autonomy (freedom to choose how much data to store and where to store it)
Artificial Immune System
● Relatively new branch of computer science
– Using natural immune system as a metaphor for solving computational problems
– Not modelling the immune system
● Variety of applications so far …
– Fault diagnosis (Ishida)
– Computer security (Forrest, Kim)
– Novelty detection (Dasgupta)
– Robot behaviour (Lee)
– Machine learning (Hunt, Timmis, de Castro)
● AIS are computational systems, inspired by theoretical immunology and observed immune functions, which are applied to complex problem domains (Timmis, 2001)
Why the Immune System?
● Recognition
– Anomaly detection
– Noise tolerance
● Robustness
● Feature extraction
● Diversity
● Memory
● Distributed
● Adaptive
Role of the Immune System
● Protect our bodies from infection
● Primary immune response
– Launch a response to invading pathogens
● Secondary immune response
– Remember past encounters
– Faster response the second time around
[Figure: organs of the immune system, labelled lymphatic vessels, lymph nodes, thymus, spleen, tonsils and adenoids, bone marrow, appendix and Peyer’s patches, grouped into primary and secondary lymphoid organs]
[Figure: steps (I) to (VII) of the immune response, with labels MHC protein, antigen, APC, peptide, T-cell, activated T-cell, B-cell, lymphokines and activated B-cell (plasma cell)]
[Figure: B-cell receptors binding to epitopes on an antigen]
Immune recognition is based on the complementarity between the binding region of the receptor and a portion of the antigen called the epitope.
Role of the Immune System
● Autoimmune reaction (self/non-self discrimination)
● Self is presented at the beginning
General Framework for AIS
● Application Domain: P2P Network Search
● Representation: Search Item – Antigen
● Affinity Measures: Similarity (message, search item)
● Immune Algorithms: ImmuneSearch Algorithm
● Solution
Reiterating the Perspective
Search Mechanism
● Topology
● Message Routing
Search Criterion
● Efficiency (stop packet flooding)
● Quality of Service (number of results)
● Robustness (stability in the presence of failures)
System Requirement
● Autonomy (freedom from storing data)
Modeling the Network
[Figure: a user node with Information Profile – Pop and Search Profile – Classical]
● A profile is thought of as continuous
● It is represented by a 10-bit binary string
● That is, it is assumed there are 1024 categories
● Profiles close to each other (pop, rap) are close in terms of Hamming distance
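The profile encoding can be made concrete with a small sketch. The bit patterns assigned to "pop", "rap" and "classical" below are hypothetical; only the 10-bit width and the use of Hamming distance come from the slide.

```python
PROFILE_BITS = 10  # from the slide: 10-bit profiles, 2**10 = 1024 categories


def hamming(a, b):
    """Number of bit positions in which two profiles differ."""
    return bin(a ^ b).count("1")


pop = 0b0000001111        # hypothetical code for "pop"
rap = 0b0000001110        # hypothetical: differs from pop in one bit
classical = 0b1111000000  # hypothetical: far away from pop

close = hamming(pop, rap)        # similar categories, small distance
far = hamming(pop, classical)    # dissimilar categories, large distance
```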
Modeling the Network
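Zipf's Law (Information and Search Profile)
[Figure: example network of 16 nodes, each labelled with a token; token 1 appears 7 times, token 0 appears 4 times, token 3 appears 3 times and token 2 appears twice]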
Zipf's Law: a power law used to calculate the probability of occurrence of a pattern r
● $P_r \propto 1/i^a$, where r is the i-th most frequent keyword and a is a constant close to 1
● $N_r = K/i^a$, with $\sum_r N_r = N$
● Example: N = 16, K = 7.68, a = 1 gives K/1 ≈ 7, K/2 ≈ 4, K/3 ≈ 3, K/4 ≈ 2
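The constant K in the example follows from requiring that the counts sum to N. The sketch below reproduces that arithmetic; the function name `zipf_counts` is hypothetical, and the use of four tokens is an assumption inferred from the four terms listed on the slide.

```python
def zipf_counts(n_total, n_tokens, a=1.0):
    """Return K and the real-valued counts N_r = K / i**a for i = 1..n_tokens,
    with K chosen so that the counts sum to n_total."""
    harmonic = sum(1.0 / i**a for i in range(1, n_tokens + 1))
    k = n_total / harmonic
    return k, [k / i**a for i in range(1, n_tokens + 1)]


# N = 16 nodes, a = 1, four tokens (assumed)
k, counts = zipf_counts(n_total=16, n_tokens=4, a=1.0)
```

With these inputs K = 16 / (1 + 1/2 + 1/3 + 1/4) = 7.68, and the real-valued counts are 7.68, 3.84, 2.56 and 1.92; the slide quotes whole-number counts 7, 4, 3 and 2, which also sum to 16.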
Search the Network – Flooding
Flooding essentially implies sending the message packet to all the neighboring nodes
Search the Network – Random Walk
A message packet travels at will, moving to a randomly chosen neighbor at each step
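A random walk of a single packet can be sketched as follows. The function name, the step limit and the three-node example graph are assumptions for illustration; the simulations in this presentation may use several packets at once.

```python
import random


def random_walk(graph, holders, start, max_steps, rng):
    """Move a single query packet to a uniformly chosen neighbor at each
    step; stop at the first peer that holds the object."""
    node = start
    for step in range(1, max_steps + 1):
        node = rng.choice(graph[node])
        if node in holders:
            return node, step
    return None, max_steps


graph = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
found, steps = random_walk(graph, {"C"}, "A", max_steps=50, rng=random.Random(0))
```

Unlike flooding, the packet generates exactly one message per time step, at the price of a less predictable time to the first hit.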
Search the Network – Immune Search
The algorithm consists of two parts:
1. The movement of message packets
– Proliferation
– Mutation
2. Rearrangement of topology
[Figure label: high concentration of packets]
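The proliferation rule can be pictured with a rough sketch. Everything specific here is an assumption: the function name `forward_packet`, the number of copies, and the fallback to a single random move are illustrative, and the mutation step is not modelled because the slides do not specify it. The threshold of 2 mirrors the "HD(Search, query) < 2" condition quoted for the experiments.

```python
import random


def hamming(a, b):
    """Number of differing bits between two profiles."""
    return bin(a ^ b).count("1")


def forward_packet(node, query, profiles, graph, rng, threshold=2, copies=2):
    """Return the list of neighbors that receive the packet next.
    Near a matching profile the packet proliferates (several copies);
    elsewhere a single copy moves on to one random neighbor."""
    neighbors = graph[node]
    if hamming(profiles[node], query) < threshold:
        k = min(copies, len(neighbors))
        return rng.sample(neighbors, k)
    return [rng.choice(neighbors)]


graph = {"u": ["v", "w", "x"], "v": ["u"], "w": ["u"], "x": ["u"]}
profiles = {"u": 0b1010101010, "v": 0, "w": 0, "x": 0}
rng = random.Random(0)
near = forward_packet("u", 0b1010101011, profiles, graph, rng)  # distance 1
far = forward_packet("u", 0b0101010101, profiles, graph, rng)   # distance 10
```

The number of packets therefore grows only around nodes whose profile resembles the query, which is one way to read the later remark that proliferation is self-regulatory.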
Search the Network – Immune Search
Aim: cluster similar nodes (similar in information and search profile)
Algorithm: move nodes similar to the user node closer to the user (change their neighborhood)
Movement depends on
1. The distance from the user node
2. Amount of matching
3. Age
[Figure: example in which a node does not move (No Movement)]
Experimental Results
Experiment:
• Run for 100 generations, without changing the participating nodes
• In each generation, 100 searches are made by users selected randomly
Efficiency
• Number of search items found in 50 time steps
Comparison
• Random Walk, Flooding
• Proliferation
Experimental Results: Fairness Criteria
• The search criterion is the same: HD(Search, query), where HD is the Hamming distance
• The number of query packets is the same
• The initial number of packets in Random Walk is higher than in Proliferation
• Flooding is not continued for 50 time steps
• Proliferation1 and ImmuneSearch have the same proliferation rate
– Proliferate if HD(Search, query) < 2
• Proliferation2 has a higher proliferation rate
– Proliferate if HD(Search, query) < 3
• Proliferation2 has almost the same number of packets as ImmuneSearch
Experimental Results (Cost)
Number of packets staying for 50 time steps
• Limited Flooding – 16
• ImmuneSearch – 2
• Proliferation1 – 2
Proliferation is self-regulatory
[Figure: performance of ImmuneSearch, Proliferation1, Proliferation2, Random Walk and Limited Flooding]
Clustering (Most Frequent Token)
● Clustering is very fast: clusters form within 24 generations
● Not one cluster but two or three clusters
● Information profile and search profile intermingle
● So clusters are not very tight
● This allows proliferation to flourish without much waste
● Less frequent tokens can cluster as well
[Figure legend: Information Profile – Pop; Search Profile – Classical]
Clustering (Less Frequent Token)
[Figure: clustering of the second, third and eleventh most frequent tokens]
Experimental Results
Experiment: replace 5%, 10%, ..., 50% of the nodes at each generation
Experimental Results
Observations
• ImmuneSearch is better than simple proliferation up to 50% replacement
• 5% replacement is sometimes better than the scheme without replacement
Clustering (In Changing Condition)
[Figure: clustering of the most frequent tokens with 5%, 10% and 20% replacement]
Experimental Results (Amount of Change in Neighborhood)
● Change of 20% of the nodes after 100 generations without replacement
● The neighborhood change rate drops after some time
● With 5% continuous replacement, the neighborhood keeps changing at a more or less constant rate
● The new nodes participate in this change
Discussion and Future Work
● The network works as a self-correcting, self-organizing system
● The proliferation-mutation combination is a good alternative to random walk and flooding
● Topology evolution helps in enhancing the performance of the network
● The design is robust
● Future work: simulate it on other overlay topologies
Questions and Answers