
Reputation Network Analysis for Email Filtering

Ravi Emani, Ramesh Ravindran


This presentation describes:

An email scoring mechanism based on a social network augmented with reputation ratings

An algorithm for inferring reputation ratings

Integration into a mail application: TrustMail


Preventing Spam

Goal: prevent spam from reaching the user's mailbox at all

Methods:

- Whitelist filters

- Social networks

- Connecting users


Whitelist Filters

Messages are accepted according to a list of approved addresses created by the user

Advantages

- No spam reaches the user's inbox

- Spam is diverted into a low-priority folder

Disadvantages

- Extra burden on the user, who must build and maintain the list

- Valid email from senders who are not on the list is filtered as well
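A whitelist filter reduces to a set-membership test on the sender address. The sketch below assumes a hypothetical message format of (sender, subject) pairs and a case-insensitive match on the whole address (a simplification); anything not on the list is routed to the low-priority folder rather than deleted.

```python
from typing import Iterable, List, Set, Tuple

Message = Tuple[str, str]  # (sender address, subject) -- hypothetical format


def route(messages: Iterable[Message],
          whitelist: Set[str]) -> Tuple[List[Message], List[Message]]:
    """Split messages into (inbox, low_priority) by sender-address membership."""
    allowed = {addr.lower() for addr in whitelist}  # simplified: ignore case
    inbox: List[Message] = []
    low_priority: List[Message] = []
    for sender, subject in messages:
        target = inbox if sender.lower() in allowed else low_priority
        target.append((sender, subject))
    return inbox, low_priority
```

Running it on a mix of a listed friend, a new colleague and a spammer shows the disadvantage listed above: the colleague's legitimate message lands in the low-priority folder next to the spam, because the filter only knows the list.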


Social Networks

Proposed by Boykin and Roychowdhury

A social network is created from the messages received by the user

Messages are identified as spam, valid or unknown based on clustering thresholds and structural properties such as the propensity for local clustering

Classifies about 50% of the user's email as either spam or valid
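The idea of classifying senders by local clustering can be sketched as follows. This is not Boykin and Roychowdhury's exact procedure: the header format, the choice to link each sender to each recipient, and the two thresholds (`spam_max`, `valid_min`) are assumptions for illustration. The intuition is that a friend's correspondents tend to know each other (high clustering), while a spammer's recipients do not (clustering near zero).

```python
from itertools import combinations
from typing import Dict, Iterable, List, Optional, Set, Tuple

# Hypothetical header format: (sender, recipients from To/Cc).
Header = Tuple[str, List[str]]
Graph = Dict[str, Set[str]]


def build_graph(headers: Iterable[Header], owner: str) -> Graph:
    """Link each sender to each recipient. The mailbox owner is left out,
    because every address in the inbox is connected to them."""
    graph: Graph = {}
    for sender, recipients in headers:
        for r in recipients:
            if owner in (sender, r) or r == sender:
                continue
            graph.setdefault(sender, set()).add(r)
            graph.setdefault(r, set()).add(sender)
    return graph


def clustering(graph: Graph, node: str) -> Optional[float]:
    """Local clustering coefficient: fraction of neighbour pairs that are linked."""
    nbrs = graph.get(node, set())
    k = len(nbrs)
    if k < 2:
        return None  # undefined for fewer than two neighbours
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


def classify(graph: Graph, node: str,
             spam_max: float = 0.0, valid_min: float = 0.3) -> str:
    """Hypothetical thresholds: no clustering -> spam, high clustering -> valid."""
    c = clustering(graph, node)
    if c is None:
        return "unknown"
    if c <= spam_max:
        return "spam"
    if c >= valid_min:
        return "valid"
    return "unknown"
```

Senders with too few connections fall into "unknown", which matches the observation that only part of the user's email can be classified this way.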


Optimization

An extension of whitelisting and social-network-based filtering

Uses a network that connects users

Users assign a ‘reputation’ or ‘trust’ score to the people they know

The result is a large reputation network connecting thousands of users

A score is shown next to each message in the inbox, and messages can be sorted by it


Optimization

Overcomes the problems of whitelists

More reliable than whitelists, even though the user bears the burden of creating an initial set of reputation ratings

Comparatively less work for the user


Creating the Reputation Network

Uses a distributed, web-based social network

Reputation ratings are inferred from one user to another

Individuals are connected to each person they rated

The result is a large, interconnected network of users


How is it related to the Semantic Web?

The only requirement is that individuals assert their reputation ratings for one another in the network

Individuals control their own data

Data is maintained in a distributed fashion

Data can be stored anywhere and integrated through a common foundation
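Because each person publishes their own ratings, building the network amounts to merging many small documents. The sketch below assumes a hypothetical document format of (owner, assertions), where each assertion is (rater, ratee, rating) on an assumed 1 to 10 scale. To reflect that individuals control their own data, it keeps only assertions in which the rater is the document's owner.

```python
from typing import Dict, Iterable, List, Tuple

Assertion = Tuple[str, str, int]          # (rater, ratee, rating) -- hypothetical
Document = Tuple[str, List[Assertion]]    # (owner, assertions) -- hypothetical


def build_network(documents: Iterable[Document]) -> Dict[str, Dict[str, int]]:
    """Merge distributed rating documents into one adjacency map rater -> {ratee: rating}."""
    network: Dict[str, Dict[str, int]] = {}
    for owner, assertions in documents:
        for rater, ratee, value in assertions:
            if rater != owner:         # ignore claims made on someone else's behalf
                continue
            if not 1 <= value <= 10:   # assumed rating scale
                continue
            network.setdefault(rater, {})[ratee] = value
    return network
```

Each rater becomes a node connected to every person it rated, which is the interconnected network described on the previous slide.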


Role of the Semantic Web

The Semantic Web, along with its component languages RDF, RDFS and OWL, builds on the existing web architecture

Supports distributed data management

Users create ontologies with classes and properties, and then create instances of those classes

The instances of the classes help describe data on the web


FOAF Project

The Friend-Of-A-Friend project, developed on the Semantic Web

An ontological vocabulary for describing people and their relationships

Extended with a mechanism for describing reputation relationships

Allows people to rate the reputation or trustworthiness of another person
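A rating extends an ordinary FOAF `knows` statement with a value. The sketch below serialises one rating as N-Triples. Only `foaf:Person`, `foaf:knows` and `rdf:type` are real vocabulary terms here; the `TRUST` namespace and its properties (`hasRating`, `ratedPerson`, `value`) are hypothetical placeholders, not the trust ontology actually used by the project. A blank node carries the rating so that the value stays attached to a specific (rater, ratee) pair.

```python
from typing import List

FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_INT = "http://www.w3.org/2001/XMLSchema#integer"
TRUST = "http://example.org/trust#"  # hypothetical namespace, not a published one


def iri(x: str) -> str:
    return f"<{x}>"


def rating_triples(rater: str, ratee: str, value: int, node_id: str) -> List[str]:
    """Serialise one reputation rating as N-Triples lines."""
    b = f"_:{node_id}"  # blank node holding the rating
    return [
        f"{iri(rater)} {iri(RDF_TYPE)} {iri(FOAF + 'Person')} .",
        f"{iri(rater)} {iri(FOAF + 'knows')} {iri(ratee)} .",
        f"{iri(rater)} {iri(TRUST + 'hasRating')} {b} .",
        f"{b} {iri(TRUST + 'ratedPerson')} {iri(ratee)} .",
        f'{b} {iri(TRUST + "value")} "{value}"^^{iri(XSD_INT)} .',
    ]
```

Because the output is plain RDF, each person can publish it on their own site, and an aggregator can merge documents from anywhere.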


Fig: The reputation network developed as part of the semantic web trust project at http://trust.mindswap.org.


Algorithms for Inferring Reputation between Individuals

Recommendations are made to one person (the source) about the reputation of another person (the sink)

The trust and reputation literature contains many different metrics

These metrics are categorized according to the perspective used for making calculations


Perspective in Reputation Inference Algorithms

Global metrics calculate a single value for each entity in the network

Local metrics calculate a reputation rating from the perspective of a particular individual in the network

In a global system, an entity always has the same inferred rating

In a local system, an entity can be rated differently depending on the node for which the inference is made


Perspective in Reputation Inference Algorithms

Global metrics can be highly effective in situations where the experiences of users are similar

Local metrics can be appropriate where users' opinions about the same topic vary

[Figure: example reputation network with nodes A, B, C, D and E and edge ratings 10, 1, 9 and 10]
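The difference between the two perspectives can be shown with hypothetical ratings in which two observers reach the same person through neighbours who disagree. Both functions are deliberately simple stand-ins, not metrics from the literature: the global score is the mean of all ratings a node receives, and the local score is a one-hop average of the observer's neighbours' ratings, weighted by the observer's trust in each neighbour.

```python
from statistics import mean
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # ratings[a][b]: a's rating of b


def global_score(ratings: Ratings, node: str) -> Optional[float]:
    """Global stand-in: one value per node, the mean of every rating it receives."""
    received = [r[node] for r in ratings.values() if node in r]
    return mean(received) if received else None


def local_score(ratings: Ratings, observer: str, node: str) -> Optional[float]:
    """Local stand-in (one hop): neighbours' ratings of `node`, weighted by the
    observer's rating of each neighbour."""
    mine = ratings.get(observer, {})
    if node in mine:
        return mine[node]
    pairs = [(w, ratings[j][node]) for j, w in mine.items()
             if node in ratings.get(j, {})]
    total = sum(w for w, _ in pairs)
    return sum(w * t for w, t in pairs) / total if total else None


# Hypothetical network: A trusts B, D trusts C, and B and C disagree about E.
ratings = {"A": {"B": 9}, "D": {"C": 9}, "B": {"E": 10}, "C": {"E": 1}}
```

With these ratings the global score of E is 5.5 for everyone, while the local score is 10.0 from A's perspective and 1.0 from D's, which is the situation in which a local metric is more appropriate.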


Accurate Metrics for Inferring Reputation

The inferred rating from the source to the sink is a weighted average of the neighbors' reputation ratings of the sink. The reputation rating from source i to sink s is written $t_{is}$.

No inference is needed if the source is directly connected to the sink.

Otherwise, the reputation rating is the weighted average of the ratings returned for the sink by each of the source's n neighbors, where each neighbor's rating is weighted by the source's own rating of that neighbor.


getRating(source, sink)
    mark source as seen
    if source has no rating for sink
        denom = 0
        num = 0
        for each j in neighbors(source)
            if j has not been seen
                jRating = getRating(j, sink)
                mark j unseen
                if jRating is defined
                    j2sink = min(rating(source, j), jRating)
                    denom += rating(source, j)
                    num += rating(source, j) * j2sink
        if denom > 0
            rating(source, sink) = num / denom
    return rating(source, sink)
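A runnable sketch of the getRating procedure, assuming direct ratings are stored as nested dictionaries (`ratings[a][b]` is a's rating of b) on an assumed 1 to 10 scale. It divides by the sum of the source's ratings of its neighbors so the result is a weighted average that stays on the rating scale. Unlike the pseudocode, it does not cache inferred ratings back into the network, so repeated queries cannot be affected by an earlier traversal.

```python
from typing import Dict, Optional, Set

# ratings[a][b] is a's direct reputation rating of b (assumed 1-10 scale).
Ratings = Dict[str, Dict[str, float]]


def get_rating(ratings: Ratings, source: str, sink: str,
               seen: Optional[Set[str]] = None) -> Optional[float]:
    """Infer source's rating of sink; None if no unvisited path reaches the sink."""
    if seen is None:
        seen = set()
    seen.add(source)                      # mark source as seen
    direct = ratings.get(source, {})
    if sink in direct:                    # directly connected: no inference needed
        return direct[sink]
    num = 0.0
    denom = 0.0
    for j, t_ij in direct.items():
        if j in seen:
            continue
        t_js = get_rating(ratings, j, sink, seen)
        seen.discard(j)                   # mark j unseen so other paths may reuse it
        if t_js is None:
            continue
        num += t_ij * min(t_ij, t_js)     # never trust the sink more than j
        denom += t_ij                     # weight by the source's rating of j
    return num / denom if denom > 0 else None
```

For example, with `{"alice": {"bob": 9, "carol": 4}, "bob": {"dave": 6}, "carol": {"dave": 10}}`, alice's inferred rating of dave is (9·6 + 4·4) / (9 + 4) = 70/13, roughly 5.38: carol's enthusiastic 10 is capped at 4 because alice trusts carol only that much. The "seen" set prevents infinite recursion around cycles; because nodes are marked unseen after each branch, every simple path is explored, which is exponential in the worst case.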


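Accurate Metrics for Inferring Reputation

The weighting of $t_{is}$ can be written concisely as follows:

$$t_{is} = \frac{\displaystyle\sum_{j=0}^{n} \begin{cases} t_{ij}\, t_{js} & \text{if } t_{ij} \ge t_{js} \\ t_{ij}^{2} & \text{if } t_{ij} < t_{js} \end{cases}}{\displaystyle\sum_{j=0}^{n} t_{ij}}$$

The condition in this formula ensures that the source never trusts the sink more than any intermediate node: each term equals $t_{ij} \min(t_{ij}, t_{js})$, so a neighbor's rating of the sink is capped at the source's rating of that neighbor.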
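A worked example with hypothetical ratings: source i rates neighbor $j_1$ at 9 and neighbor $j_2$ at 4, and those neighbors rate the sink s at 6 and 10. Each term in the numerator is $t_{ij} \min(t_{ij}, t_{js})$ and the denominator is $\sum_j t_{ij}$. Since $9 \ge 6$, $j_1$ contributes $9 \cdot 6$; since $4 < 10$, $j_2$ contributes $4^2$.

```latex
t_{is} = \frac{9 \cdot 6 + 4^{2}}{9 + 4}
       = \frac{54 + 16}{13}
       = \frac{70}{13}
       \approx 5.38
```

Without the cap, $j_2$'s rating of 10 would pull the result up to $(54 + 40)/13 \approx 7.23$, even though the source trusts $j_2$ only at 4.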


Reputation Metric Evaluation

To determine the accuracy of this metric:

Iterate through each individual i in the network and record the reputation rating $t_{ij}$ that i gives each neighbor j

Then remove the connection from i to j and record the inferred reputation rating $t'_{ij}$

The accuracy is measured by the error $|t_{ij} - t'_{ij}|$; a smaller error means a more accurate inference
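The evaluation loop can be sketched independently of the inference metric. The sketch below takes any inference function with the hypothetical signature `infer(ratings, source, sink)` returning a rating or `None`, hides each direct rating in turn, re-infers it, and records the absolute error. Pairs that become unreachable once the edge is hidden are skipped. `neighbor_average` is a deliberately simple stand-in used only to exercise the loop; it is not the weighted metric described above.

```python
import copy
from typing import Callable, Dict, List, Optional, Tuple

Ratings = Dict[str, Dict[str, float]]
Infer = Callable[[Ratings, str, str], Optional[float]]


def leave_one_out_errors(ratings: Ratings, infer: Infer) -> List[Tuple[str, str, float]]:
    """For every direct rating t_ij, hide it, infer t'_ij, and record |t_ij - t'_ij|."""
    errors = []
    for i, neighbors in ratings.items():
        for j, t_ij in neighbors.items():
            reduced = copy.deepcopy(ratings)  # leave the original network intact
            del reduced[i][j]                 # remove the connection from i to j
            inferred = infer(reduced, i, j)
            if inferred is not None:
                errors.append((i, j, abs(t_ij - inferred)))
    return errors


def neighbor_average(ratings: Ratings, source: str, sink: str) -> Optional[float]:
    """Stand-in inference: plain mean of the neighbours' direct ratings of sink."""
    vals = [ratings[j][sink] for j in ratings.get(source, {})
            if sink in ratings.get(j, {})]
    return sum(vals) / len(vals) if vals else None
```

Averaging the recorded errors gives a single accuracy figure for comparing metrics. Copying the whole network per edge is quadratic, which is acceptable for a sketch but not for a network of thousands of users.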


TrustMail: A Prototype

Message Scoring System

Adds reputation ratings to the folder view, next to each message

Lets the user sort messages by their reputation ratings

Highlights important and relevant messages
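A folder view of this kind can be sketched as annotate, sort and flag. The scoring function is passed in (for instance, an inferred rating of the sender from the reading user's perspective); the (sender, subject) message format and the highlight threshold of 8.0 are assumptions. Messages from senders with no score are placed last rather than discarded.

```python
from typing import Callable, List, Optional, Tuple

Message = Tuple[str, str]                  # (sender, subject) -- hypothetical
Scorer = Callable[[str], Optional[float]]  # sender -> reputation score or None


def score_folder(messages: List[Message], score: Scorer,
                 highlight_at: float = 8.0) -> List[dict]:
    """Annotate each message with its sender's score, sort, and flag high scores."""
    rows = [(score(sender), sender, subject) for sender, subject in messages]
    # Scored messages first, highest score first; unscored senders go last.
    rows.sort(key=lambda row: (row[0] is None, -(row[0] or 0.0)))
    return [{"sender": s, "subject": subj, "score": sc,
             "highlight": sc is not None and sc >= highlight_at}
            for sc, s, subj in rows]
```

Passing a plain dictionary's `get` method as the scorer is enough to try it out; in a real client the scorer would query the reputation network.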


Conclusion and Future Work

Our algorithm infers reputation relationships in a network

Benefit: valid email from unknown senders can receive high scores because of the senders' connections within the social network

Future work involves refining the algorithm for inferring reputation ratings


Conclusion and Future Work

Future work may also involve developing and studying the TrustMail interface

The number of ratings received will change with the size of the network

Important issues to be considered:

- Which techniques combine best with reputation filtering

- The percentage of messages accurately scored


References

Boykin, P. O. & Roychowdhury, V. "Personal email networks: an effective anti-spam tool," http://www.arxiv.org/abs/cond-mat/0402143, 2004.

http://sites.wiwiss.fu-berlin.de/suhl/bizer/SWTSGuide/

RDFWeb. "FOAF: The Friend of a Friend Vocabulary," http://xmlns.com/foaf/0.1/

Golbeck, Jennifer, Bijan Parsia, James Hendler. "Trust Networks on the Semantic Web."

Richardson, Matthew, Rakesh Agrawal, Pedro Domingos. "Trust Management for the Semantic Web," Proceedings of the Second International Semantic Web Conference, Sanibel Island, Florida, 2003.