downloading - IJS - Institut "Jozef Stefan"

EIGEN REP: REPUTATION MANAGEMENT IN P2P NETWORKS

Because P2P systems are becoming more popular by the day, mechanisms for secure content sharing are necessary. This need follows from the anonymous and open nature of P2P systems, where malicious peers trying to subvert the system can easily inject and spread inauthentic files.

Reputation mechanisms in general deal with the generation, discovery and aggregation of rating information in electronic commerce and P2P systems [Bin Yu et al.]. Identifying malicious peers as the sources of inauthentic or poor-quality files is far more effective than tracking down the files themselves, whose number can be virtually unlimited.

This reputation mechanism (EigenRep) uses power iteration (an eigenvalue algorithm) to compute a global reputation value for every peer in the network. A peer later uses this global reputation value to decide whether to interact with another peer. It should provide an effective way of detecting malicious peers and, as shown later, malicious groups too. In other words, it should help isolate malicious peers from the P2P network, since peers that have not earned enough reputation, or have earned a bad one, are not allowed to participate in P2P interactions.

Furthermore, this approach helps distribute the load of computing and storing global values among all peers in the network, leading to minimal network overhead.

The challenge for reputation systems in a distributed environment is how to aggregate the local reputation values without a centralized storage and management facility. The aggregation must take a large enough number of peers into account to give an accurate view of a peer's reputation, without congesting the network with system messages asking for each peer's local reputation values.

DESIGN

Five issues are considered while designing the system:

1) There should be no central authority that dictates or defines the shared ethics of peers;

2) A peer's reputation should be associated with an opaque ID, rather than some external identity (IP address);

3) No profit for newcomers: reputation should be hard to earn, and only through consistent good behavior, which protects the system from whitewashers;

4) Minimal overhead in terms of computing power, infrastructure, storage and message complexity;

5) Robustness to malicious collectives.

The approach used is based on transitive trust: a peer will also trust the trusted peers of its trusted peers.

Advantage: high efficiency in reducing the number of unsatisfactory downloads, even when malicious groups make up a large share of the network (40-50%).

Drawback: the approach prevents network congestion and reduces message complexity, but if a malicious group manages to operate successfully despite the reputation mechanism, the system will be subverted much faster.

The global reputation value of each peer i is given by the local reputation values assigned to i by other peers, weighted by the global reputations of the assigning peers.
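As a sketch of what this sentence says in symbols: write t_i for the global reputation value of peer i and c_ji for the normalized local reputation value that peer j assigns to peer i. The symbols t and C are assumptions introduced here for illustration; they do not appear in the text.

$$
t_i = \sum_{j} c_{ji}\, t_j
\qquad\text{or, in matrix form,}\qquad
\vec{t} = C^{T}\,\vec{t}
$$

Under this reading, the global reputation vector is an eigenvector of C^T with eigenvalue 1, which is why an eigenvalue method such as power iteration can be used to compute it.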

Local reputation values must be normalized before they can be aggregated. Normalization prevents malicious peers from skewing the result by assigning arbitrarily high reputation values to other malicious peers and arbitrarily low values to good peers.

$$
c_{ij} = \frac{\max(s_{ij}, 0)}{\sum_{j} \max(s_{ij}, 0)}
$$

Drawbacks of normalization: the normalized vector does not distinguish between a peer that peer i has never interacted with and one that peer i has had a poor experience with. In addition, c_ij is relative: if c_ij = c_ik, we know that peers j and k have the same reputation in the eyes of i, but we don't know whether both are very reputable or both are only mediocre.

Nevertheless, this approach is used because it leads to a good probabilistic model and still achieves substantially good results.
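The normalization and its first drawback can be illustrated with a minimal Python sketch. The function name, the dictionary layout and the choice to return None when all scores are non-positive are assumptions made for this example, not part of EigenRep itself.

```python
def normalize_local_trust(s_i):
    """Turn the raw local scores s_ij of one peer i into normalized values c_ij.

    s_i maps a peer ID j to the raw score s_ij that peer i holds for j.
    Returns None when no score is positive, because the formula is then
    undefined (division by zero); handling that case is left to the caller.
    """
    clipped = {j: max(float(s), 0.0) for j, s in s_i.items()}
    total = sum(clipped.values())
    if total == 0.0:
        return None
    return {j: v / total for j, v in clipped.items()}


# Peer i had good experiences with b and e, a bad one with c, none with d.
c_i = normalize_local_trust({"b": 4, "c": -3, "d": 0, "e": 1})
print(c_i)
```

In this example the peer with a bad record ("c") and the unknown peer ("d") both end up with a normalized value of 0, which is exactly the loss of information described above.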

// Note: what are the limits and the constraints? At what number of peers, i.e. at what network size, do these calculations stop being acceptable?

Three more practical issues are addressed:

1) A priori notions of trust (the first few peers that join the network are usually trustworthy): a distribution p over the set of pre-trusted peers P is defined, such that p_i = 1/|P| if i ∈ P and p_i = 0 otherwise.

2) Inactive peers: peers that don't download from anybody else, or that assign a zero local reputation value to every other peer. In this case p is used as the start vector, i.e. if peer i doesn't know or trust anybody, it chooses to trust the pre-trusted peers.

3) Malicious collectives: they are broken up by having each peer place some trust in the pre-trusted peers (which are not part of the collective).
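The three points above can be combined into one centralized power-iteration sketch. This is an illustration under stated assumptions, not the distributed algorithm: the function name, the mixing weight `a` (the text only says that each peer puts "some trust" in the pre-trusted peers), the tolerance and the iteration cap are all hypothetical choices.

```python
def global_trust(C, pretrusted, a=0.15, tol=1e-10, max_iter=1000):
    """Approximate the global trust vector by power iteration.

    C[i][j]    normalized local trust c_ij that peer i places in peer j
               (each row sums to 1, or is all zeros for an inactive peer).
    pretrusted set P of indices of pre-trusted peers.
    a          assumed weight of the trust every peer puts in P.
    """
    n = len(C)
    # Point 1: p_i = 1/|P| for pre-trusted peers, 0 otherwise.
    p = [1.0 / len(pretrusted) if i in pretrusted else 0.0 for i in range(n)]
    # Point 2: an inactive peer (all-zero row) trusts the pre-trusted peers.
    rows = [row if sum(row) > 0 else p for row in C]
    t = list(p)  # start vector
    for _ in range(max_iter):
        # Point 3: mix a share `a` of pre-trusted trust into every step.
        new_t = [
            (1 - a) * sum(rows[j][i] * t[j] for j in range(n)) + a * p[i]
            for i in range(n)
        ]
        converged = sum(abs(x - y) for x, y in zip(new_t, t)) < tol
        t = new_t
        if converged:
            break
    return t


# Peers 0 and 1 trust each other; peers 2 and 3 form a collective that
# only trusts itself. Peer 0 is pre-trusted.
C = [
    [0, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]
print(global_trust(C, pretrusted={0}))
```

In this toy network no good peer ever rates the collective, so peers 2 and 3 receive no global trust however highly they rate each other. A collective that manages to obtain some positive ratings from good peers would not be shut out this cleanly.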

Security issues are handled by implementing two ideas:

1) The current trust value of a peer must not be computed by, or reside at, the peer itself;

2) The trust value of a peer is computed by more than one peer; these are called its mother peers.

The assignment of mother peers {M} is done using a DHT (distributed hash table), which uses a hash function to deterministically map keys (e.g. file names) into points in a logical coordinate space.

The coordinate space is dynamically partitioned among the peers in the system so that every peer covers a particular region of it. Each peer stores the (key, value) pairs that map into its region.

Here, a mother peer is located by hashing a unique ID of the peer (e.g. its IP address and TCP port) into a point in the DHT. The peer that currently covers this point as part of its DHT region is appointed as the mother of that peer.

In other words: the DHT coordinates pos_i of the mother peers (M) of peer i are determined by applying a set of one-way hash functions h_0, h_1, …, h_{M-1} to the unique ID of peer i.
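A minimal sketch of how such mother positions could be derived is given below. The text does not name concrete hash functions, so this example stands in SHA-256 salted with the index k for the family h_0 … h_{M-1}; the function name, the salting scheme, the unit-square coordinate space and the sample peer ID are assumptions.

```python
import hashlib


def mother_positions(peer_id, num_mothers, dims=2):
    """Map a peer's unique ID to one point per mother in a unit coordinate space.

    The k-th hash function is imitated by hashing the string "k|peer_id".
    Each coordinate is built from 8 bytes of the digest, scaled to the
    unit interval.
    """
    if not 1 <= dims <= 4:
        raise ValueError("a SHA-256 digest has 32 bytes, enough for at most 4 coordinates")
    positions = []
    for k in range(num_mothers):
        digest = hashlib.sha256(f"{k}|{peer_id}".encode("utf-8")).digest()
        point = tuple(
            int.from_bytes(digest[8 * d : 8 * d + 8], "big") / 2**64
            for d in range(dims)
        )
        positions.append(point)
    return positions


# Hypothetical peer ID built from an IP address and a TCP port.
for k, point in enumerate(mother_positions("10.0.0.7:6346", num_mothers=3)):
    print(f"h_{k} -> {point}")
```

The peer whose DHT region contains a given point would then act as one of the mothers. The region lookup itself depends on the DHT in use and is not sketched here.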

Since each peer also acts as a mother peer, it is assigned a set of daughters D_i. Thus, every peer stores and reports the opinion vector of each of its daughters. In addition, each peer i stores and reports two more values per daughter d: B^d_i, the set of peers that daughter d downloaded files from, and A^d_i, the set of peers that downloaded files from daughter d. B^d_i is provided to peer i by its daughter d when d submits its trust assessments of the peers it has downloaded files from. Similarly, A^d_i is provided by all the peers that interacted with daughter d, when they submit their trust assessments of d.

Introducing DHTs increases the robustness of the system: the data stored by a mother does not depend on that mother's presence in the system, which matters especially when a mother fails.

The secure algorithm gives the system several important properties:

1) Because the hash functions are one-way, there is no way for a peer to know which peer's ID it computes the trust value for, which prevents malicious peers from giving good trust assessments to other malicious peers; (ANONYMITY)

2) Peers can't choose their own coordinates in the hash space, so a peer cannot place itself at the point given by the hash value of its own ID and thus give itself good trust values; (RANDOMIZATION)

3) Since each peer has several mothers, several multi-dimensional hash functions are used to create several coordinate spaces, mapping the peer's unique ID to a different point in each multi-dimensional hash space.

// Note: Are peers aware of their own trust values? (i.e. is there a constraint for the peer to query all of its mothers to get the trust assessment of itself)

Once a global trust value has been computed and stored, it can be used in several ways:

1) One purpose of the global value is to isolate malicious peers from the network. This can be done by tying the probability of downloading a file from a particular peer to its global trust value. This limits the number of unsatisfactory downloads on the network while still allowing newcomers to build trust;

2) Another use of the global trust value is to give peers an incentive to share “good” files, i.e. files that are authentic. This can be done by rewarding good peers with greater bandwidth, increased connectivity etc. A welcome side effect is that it may motivate a non-malicious peer to delete inauthentic files that it downloaded by mistake, thus cleaning the system.

SIMULATIONS

Simulations are based on a typical P2P network model: peers are queried, and queries are propagated in the usual Gnutella way, by broadcasting them to other peers.

Interactions between peers are computed based on a probabilistic content delivery model. Namely, it is assumed that each peer is interested in only a subset of the content available on the network, i.e. each peer picks a number of content categories and shares files only in these categories.

When the simulator generates a query, it actually generates the category and the rank of the file that will satisfy that query. Each peer that receives the query checks whether it supports the category and whether it holds the file.

Different threat models are considered. The simulator settings represent a fairly pessimistic scenario: malicious peers connect to the most highly connected peer when joining the network, they have large bandwidth, and they respond to more than 20% of the queries issued by the rest of the peers in the network.

The metric of interest is the number of inauthentic downloads versus the number of authentic downloads. Non-trust-based and reputation-based scenarios are compared.

Two different trust-based algorithms for selecting download sources are considered:

1) Deterministic: the peer with the highest trust value is chosen as the download source;

2) Probabilistic: a peer is chosen with a certain probability, and peers with trust value 0 are given a 10% probability of being chosen as the download source (striking a balance between newcomers and malicious peers with no trust).
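The two selection strategies can be sketched as follows. The summary does not spell out the exact probabilities, so this example makes two assumptions: with probability 10% the source is drawn uniformly from the zero-trust responders, and otherwise it is drawn from the remaining responders with probability proportional to their trust value. Function and parameter names are hypothetical.

```python
import random


def choose_source_deterministic(trust):
    """Pick the responder with the highest global trust value."""
    return max(trust, key=trust.get)


def choose_source_probabilistic(trust, rng=random, zero_trust_prob=0.10):
    """Pick a download source among the peers that answered a query.

    trust maps each responding peer to its global trust value (>= 0).
    """
    if not trust:
        raise ValueError("no peer answered the query")
    zero = [peer for peer, value in trust.items() if value == 0]
    positive = [peer for peer, value in trust.items() if value > 0]
    # Give zero-trust peers (possibly newcomers) their assumed 10% chance.
    if zero and (not positive or rng.random() < zero_trust_prob):
        return rng.choice(zero)
    weights = [trust[peer] for peer in positive]
    return rng.choices(positive, weights=weights, k=1)[0]


responders = {"a": 0.6, "b": 0.3, "newcomer": 0.0}
print(choose_source_deterministic(responders))
print(choose_source_probabilistic(responders, rng=random.Random(42)))
```

The deterministic variant always sends traffic to the same top-ranked peer, whereas the probabilistic variant spreads the load and lets a peer with no history be picked occasionally.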

THREAT MODELS

1) No malicious peers and no pre-trusted peers present:

a. Vast load imbalance;

b. No chance for new peers to build up reputation;

c. Much better performance with the probabilistic model than with the deterministic one, but still only slightly better than the non-trust-based model.

A) Malicious peers always upload inauthentic files and assign good trust values to the malicious peers they have come across;

B) Malicious peers always upload inauthentic files and assign good trust values to the malicious peers they cooperate with in a colluding group;

C) Malicious peers upload inauthentic files in f% of all cases, and authentic files in the remaining cases, in order to get some positive ratings from good peers and thus build some reputation;

//Note: according to the Figures, f% is the percentage of authentic files when C-model peers were chosen as a download source

D) One group of malicious peers provides only authentic files and uses the trust it has gained to boost the trust values of another group of malicious peers that provides only inauthentic files.

Simulation results for models A and B are very similar because of the presence of pre-trusted peers in model B; otherwise, forming a malicious group would heavily boost the trust values of malicious nodes. Both models perform much better than the case where no trust model is implemented at all.

Simulation results for model C show that malicious peers have maximum impact on the network when providing 50% authentic files. This comes at a cost for them: not only do they take part in sharing and spreading good files across the network, they also have to maintain a repository of authentic files, which adds maintenance overhead. Also, in the long run, these peers lose their reputation anyway.

The simulation for model D considers a malicious group consisting of peers of both type B and type D. Type D peers act as normal peers that try to gain good global trust and in turn assign it to the type B peers.

The paper describes two other threat models, named Model E and Model F.

The first one represents the Sybil attack, where a peer can create a virtually unlimited number of peers on the network. After such a peer is chosen for a transaction, it sends an inauthentic file, disconnects and re-enters the system with a completely new identity, thus outnumbering the good peers and lowering their chance of being chosen for a transaction.

A possible countermeasure is introducing some cost for every new ID.

The second model, Virus Disseminators, is not addressed by EigenTrust. In this model a malicious peer sends one inauthentic file every 100 transactions; the rest of the files are authentic. Since EigenTrust doesn't completely eliminate corrupt files, this can cause great damage when executables are shared. This kind of problem is left unaddressed on the assumption that today's problems with P2P networks are mainly about flooding the network with inauthentic files (since the network is used for sharing files and digital media), not about distributing malicious executables (which are rarely used).
