UT DALLAS
Erik Jonsson School of Engineering & Computer Science
FEARLESS engineering

Data and Applications Security
Security and Privacy in Online Social Networks

Murat Kantarcioglu
Bhavani Thuraisingham

Thanks to Raymond Heatherly and Barbara Carminati for helping in slide preparations

November 2010



Outline

• Introduction to Social Networks
• Properties of Social Networks
• Social Network Analysis Basics
• Data Privacy Basics
• Privacy and Social Networks
• Access control issues for Online Social Networks


Social Networks

• Social networks have important implications for our daily lives.
  – Spread of information
  – Spread of disease
  – Economics
  – Marketing
• Social network analysis could be used for many activities related to information and security informatics.
  – Terrorist network analysis


Enron Social Graph*

* http://jheer.org/enron/


Romantic Relations at “Jefferson High School”


Emergence of Online Social Networks

• Online social networks are becoming increasingly popular.
• Example: Facebook*
  – Facebook has more than 200 million active users.
  – More than 100 million users log on to Facebook at least once each day.
  – More than two-thirds of Facebook users are outside of college.
  – The fastest growing demographic is those 35 years old and older.

*http://www.facebook.com/press/info.php?statistics


Properties of Social Networks

• “Small-world” phenomenon
  – Milgram asked participants to pass a letter to one of their close contacts in order to get it to an assigned individual.
  – Most of the letters were lost (~75% of the letters).
  – The letters that reached their destination had passed through only about six people.
  – Origin of the “six degrees” idea.
  – The mean geodesic distance l of a graph grows logarithmically, or even more slowly, with the network size n ($d_{ij}$ is the shortest distance between nodes i and j):

$$ l = \frac{2}{n(n+1)} \sum_{i \ge j} d_{ij} $$
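The mean geodesic distance can be computed with one breadth-first search per node. The sketch below is illustrative only: it assumes an unweighted, connected, undirected graph stored as an adjacency dictionary, and it assumes the convention in which self-pairs (distance 0) are counted, so the normalizer is n(n+1)/2. The toy graph and function names are hypothetical.

```python
from collections import deque


def bfs_distances(adj, source):
    """Return shortest-path hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def mean_geodesic_distance(adj):
    """Average d_ij over all pairs i >= j (self-pairs contribute 0)."""
    nodes = list(adj)
    n = len(nodes)
    total = 0
    for i, u in enumerate(nodes):
        dist = bfs_distances(adj, u)
        if len(dist) != n:
            raise ValueError("graph must be connected")
        # Count each unordered pair once: only nodes at or before u.
        total += sum(dist[v] for v in nodes[: i + 1])
    return 2 * total / (n * (n + 1))


# Hypothetical toy graph: a path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(mean_geodesic_distance(path))
```

On large networks an exact all-pairs computation is expensive, so sampling a subset of source nodes is a common shortcut.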


“Small-World” Example: Six Degrees of Kevin Bacon


Properties of Social Networks

• Degree distribution
• Clustering
• Other important properties
  – Community structure
  – Assortativity
  – Clustering patterns
  – Homophily
  – …

• Many of these properties could be used for analyzing social networks.


Social Network Mining

• Social network data is represented as a graph
  – Individuals are represented as nodes
    • Nodes may have attributes to represent personal traits
  – Relationships are represented as edges
    • Edges may have attributes to represent relationship types
    • Edges may be directed
• Common social network mining tasks
  – Node classification
  – Link prediction
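As a rough illustration of this graph model, the sketch below stores node and edge attributes in plain dictionaries and scores candidate links by counting common neighbors. Common-neighbor counting is just one simple link-prediction heuristic chosen here for illustration; the slides do not prescribe a method, and all names and data are made up.

```python
from itertools import combinations

# Nodes carry attributes (personal traits); all values here are made up.
nodes = {
    "alice": {"hometown": "dallas"},
    "bob": {"hometown": "austin"},
    "carol": {"hometown": "dallas"},
    "dave": {"hometown": "plano"},
}

# Undirected edges carry a relationship-type attribute.
edges = {
    frozenset(("alice", "bob")): {"type": "friend"},
    frozenset(("alice", "carol")): {"type": "family"},
    frozenset(("bob", "carol")): {"type": "friend"},
    frozenset(("carol", "dave")): {"type": "colleague"},
}


def neighbors(node):
    """Return the set of nodes that share an edge with `node`."""
    return {other for edge in edges if node in edge for other in edge if other != node}


def predict_links():
    """Score each non-adjacent pair by its number of common neighbors."""
    scores = {}
    for u, v in combinations(sorted(nodes), 2):
        if frozenset((u, v)) not in edges:
            scores[(u, v)] = len(neighbors(u) & neighbors(v))
    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


print(predict_links())
```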


Data Privacy Basics

• How to share data without violating privacy?
• Meaning of privacy?
  – Identity disclosure
  – Sensitive attribute disclosure
• Current techniques for structured data
  – k-anonymity
  – l-diversity
  – Secure multi-party computation
• Problem: publishing private data while, at the same time, protecting individual privacy
• Challenges:
  – How to quantify privacy protection?
  – How to maximize the usefulness of published data?
  – How to minimize the risk of disclosure?
  – …


Sanitization and Anonymization

• Automated de-identification of private data with certain privacy guarantees
  – As opposed to the “formal determination by statisticians” requirement of HIPAA
• Two major research directions
  1. Perturbation (e.g., random noise addition)
  2. Anonymization (e.g., k-anonymization)
• Removing unique identifiers is not sufficient
• Quasi-identifier (QI)
  – Maximal set of attributes that could help identify individuals
  – Assumed to be publicly available (e.g., voter registration lists)
• As a process
  1. Remove all unique identifiers
  2. Identify QI attributes, model adversary’s background knowledge
  3. Enforce some privacy definition (e.g., k-anonymity)


Re-identifying “anonymous” data (Sweeney ’01)

• 37 US states mandate collection of information
• She purchased the voter registration list for Cambridge, Massachusetts
  – 54,805 people
• 69% unique on postal code and birth date
• 87% unique US-wide on all three (postal code, birth date, and sex)
• Solution: k-anonymity
  – Any combination of values appears at least k times
• Developed systems that guarantee k-anonymity
  – Minimize distortion of results
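The uniqueness percentages above measure how many records are the only ones with their combination of quasi-identifier values. A minimal sketch of that measurement follows; the records and field names are invented for illustration and are not taken from the study.

```python
from collections import Counter


def fraction_unique(records, quasi_identifiers):
    """Fraction of records whose quasi-identifier combination occurs exactly once."""
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    counts = Counter(keys)
    return sum(1 for k in keys if counts[k] == 1) / len(records)


# Made-up records for illustration only.
voters = [
    {"zip": "02138", "birth_date": "1960-01-02", "sex": "F"},
    {"zip": "02138", "birth_date": "1960-01-02", "sex": "M"},
    {"zip": "02139", "birth_date": "1971-05-30", "sex": "F"},
    {"zip": "02139", "birth_date": "1971-05-30", "sex": "F"},
]

print(fraction_unique(voters, ["zip", "birth_date"]))
print(fraction_unique(voters, ["zip", "birth_date", "sex"]))
```

Adding an attribute to the quasi-identifier can only keep or increase the fraction of unique records, which is why the three-attribute figure is higher than the two-attribute one.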


k-Anonymity

• Each released record should be indistinguishable from at least (k-1) others on its QI attributes
• Alternatively: the cardinality of any query result on the released data should be at least k
• k-anonymity is the first of many privacy definitions in this line of work
  – l-diversity, t-closeness, m-invariance, delta-presence...
• Complementary release attack
  – Different releases can be linked together to compromise k-anonymity.
  – Solution:
    • Consider all of the released tables before releasing a new one, and try to avoid linking.
    • Other data holders may release data that can be used in this kind of attack. In general, this kind of attack is hard to prevent completely.
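The k-anonymity condition can be checked by grouping records on their QI attributes and verifying that every group has at least k members. The sketch below assumes a table held as a list of dictionaries; the ZIP-truncation step is a hypothetical generalization chosen for illustration, not a method from the slides.

```python
from collections import Counter


def is_k_anonymous(records, quasi_identifiers, k):
    """True if every QI value combination appears in at least k records."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return all(c >= k for c in counts.values())


def generalize_zip(record, digits=3):
    """Hypothetical generalization: keep only the leading digits of the ZIP code."""
    out = dict(record)
    out["zip"] = record["zip"][:digits] + "*" * (len(record["zip"]) - digits)
    return out


# Made-up table for illustration only.
released = [
    {"zip": "75080", "age_range": "20-29", "disease": "flu"},
    {"zip": "75081", "age_range": "20-29", "disease": "asthma"},
    {"zip": "75082", "age_range": "30-39", "disease": "flu"},
    {"zip": "75083", "age_range": "30-39", "disease": "diabetes"},
]

generalized = [generalize_zip(r) for r in released]
print(is_k_anonymous(released, ["zip", "age_range"], 2))
print(is_k_anonymous(generalized, ["zip", "age_range"], 2))
```

This check covers a single release only; it does not detect the complementary release attack, which requires reasoning over several tables together.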


L-diversity principles

• L-diversity principle: A q-block is l-diverse if it contains at least l “well-represented” values for the sensitive attribute S. A table is l-diverse if every q-block is l-diverse.
• l-diversity may be difficult and unnecessary to achieve.
  – A single sensitive attribute
    • Two values: HIV positive (1%) and HIV negative (99%)
    • Very different degrees of sensitivity
  – l-diversity is unnecessary to achieve
    • 2-diversity is unnecessary for an equivalence class that contains only negative records
  – l-diversity is difficult to achieve
    • Suppose there are 10000 records in total
    • To have distinct 2-diversity, there can be at most 10000*1%=100 equivalence classes
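A sketch of the distinct l-diversity check and of the counting argument above: every equivalence class needs at least one record with the rare value, so the number of classes is bounded by the number of rare records. The table, field names, and helper names are hypothetical.

```python
from collections import defaultdict


def is_distinct_l_diverse(records, quasi_identifiers, sensitive, l):
    """True if every q-block holds at least l distinct sensitive values."""
    blocks = defaultdict(set)
    for r in records:
        blocks[tuple(r[q] for q in quasi_identifiers)].add(r[sensitive])
    return all(len(values) >= l for values in blocks.values())


def max_two_diverse_classes(total_records, rare_fraction):
    """Upper bound on equivalence classes when each needs one rare-valued record."""
    return int(round(total_records * rare_fraction))


# Made-up table: the second block contains only negative records.
table = [
    {"zip": "750**", "status": "negative"},
    {"zip": "750**", "status": "positive"},
    {"zip": "751**", "status": "negative"},
    {"zip": "751**", "status": "negative"},
]

print(is_distinct_l_diverse(table, ["zip"], "status", 2))
print(max_two_diverse_classes(10000, 0.01))
```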


Privacy Preserving Distributed Data Mining

• Goal of data mining is summary results
  – Association rules
  – Classifiers
  – Clusters
• The results alone need not violate privacy
  – Contain no individually identifiable values
  – Reflect overall results, not individual organizations
• The problem is computing the results without access to the data!
  – Data needed for data mining may be distributed among parties
    • Credit card fraud data
  – Inability to share data due to privacy reasons
    • HIPAA
  – Even partial results may need to be kept private


Secure Multi-Party Computation (SMC)

• The goal is computing a function $f(x_1, x_2, \ldots, x_n)$ without revealing $x_i$
• Semi-honest model
  – Parties follow the protocol
• Malicious model
  – Parties may or may not follow the protocol
• We cannot do better than the ideal situation in which a trusted third party exists
• Generic SMC is too inefficient for PPDDM (privacy-preserving distributed data mining)
  – Enhancements being explored
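To make the idea concrete, the sketch below simulates a secure sum, one of the simplest SMC functions, using additive secret sharing in the semi-honest model. It runs all parties inside one process purely for illustration; the modulus, function names, and inputs are assumptions, and a real protocol would exchange the shares over authenticated private channels.

```python
import secrets

MODULUS = 2**61 - 1  # hypothetical public modulus, larger than any possible sum


def share(value, n_parties):
    """Split value into n additive shares that sum to value mod MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % MODULUS)
    return shares


def secure_sum(private_inputs):
    """Simulate f(x_1, ..., x_n) = x_1 + ... + x_n without pooling raw inputs."""
    n = len(private_inputs)
    # Party i sends its j-th share to party j; no one sees a raw input.
    distributed = [share(x, n) for x in private_inputs]
    # Each party adds up the shares it received and publishes only that partial sum.
    partial_sums = [sum(distributed[i][j] for i in range(n)) % MODULUS for j in range(n)]
    return sum(partial_sums) % MODULUS


print(secure_sum([12, 30, 7]))
```

Each individual share is uniformly random, so a single party's view reveals nothing about another party's input beyond what the final sum implies. Inputs are assumed to be non-negative integers whose total stays below the modulus.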


Graph Model

• Graph represented by a set of homogeneous vertices and a set of homogeneous edges

• Each node also has a set of Details, one of which is considered private.

Lindamood et al. 09 & Heatherly et al. 09


Naïve Bayes Classification

• Classification based only on specified attributes in the node

Lindamood et al. 09 & Heatherly et al. 09
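Classification from a node's own details can be sketched with a textbook Naïve Bayes model. The code below is a generic version with add-one smoothing that scores only the details a profile lists; it is not claimed to be the exact formulation used in the cited work, and the training profiles are invented.

```python
import math
from collections import Counter, defaultdict


def train(profiles):
    """profiles: list of (set_of_details, class_label) pairs."""
    class_counts = Counter(label for _, label in profiles)
    detail_counts = defaultdict(Counter)
    vocabulary = set()
    for details, label in profiles:
        detail_counts[label].update(details)
        vocabulary.update(details)
    return class_counts, detail_counts, vocabulary


def predict(details, model):
    """Return the class with the highest log-posterior for the listed details."""
    class_counts, detail_counts, vocabulary = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        # log P(class) + sum of log P(detail | class), with add-one smoothing.
        score = math.log(count / total)
        for d in details:
            if d in vocabulary:  # details never seen in training are ignored
                score += math.log((detail_counts[label][d] + 1) / (count + 2))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Made-up training profiles for illustration only.
training = [
    ({"group: the democratic party"}, "liberal"),
    ({"group: equal rights for gays", "group: the democratic party"}, "liberal"),
    ({"group: college republicans"}, "conservative"),
    ({"group: texas conservatives", "group: college republicans"}, "conservative"),
]
model = train(training)
print(predict({"group: college republicans"}, model))
```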


Naïve Bayes with Links

• Rather than calculating the probability of a link from person n_x to person n_y, we calculate the probability of a link from n_x to a person with n_y’s traits

Lindamood et al. 09 & Heatherly et al. 09


Link Weights

• Links also have associated weights
• A weight represents how ‘close’ a friendship is suspected to be, using the following formula:

Lindamood et al. 09 & Heatherly et al. 09


Collective Inference

• Collection of techniques that use node attributes and the link structure to refine classifications.

• Uses local classifiers to establish a set of priors for each node

• Uses traditional relational classifiers as the iterative step in classification

Lindamood et al. 09 & Heatherly et al. 09
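The iterative step can be sketched as follows. This is a simplified, synchronous-update loop in the spirit of relaxation labeling, using a weighted-vote relational neighbor rule as the relational classifier. The function name, the data layout, and the choice to hold labeled nodes fixed are assumptions made for this illustration, not details taken from the cited work.

```python
def collective_inference(adj, priors, known, iterations=10):
    """Refine per-node class probabilities using the link structure.

    adj:    node -> {neighbor: link weight}
    priors: node -> P(class = 1) from a local classifier (every node in adj)
    known:  node -> fixed probability for nodes whose label is observed
    """
    probs = dict(priors)
    probs.update(known)
    for _ in range(iterations):
        updated = {}
        for node, nbrs in adj.items():
            if node in known or not nbrs:
                updated[node] = probs[node]
                continue
            total_weight = sum(nbrs.values())
            # Weighted-vote relational neighbor: weighted mean of neighbor estimates.
            updated[node] = sum(w * probs[v] for v, w in nbrs.items()) / total_weight
        probs = updated  # all nodes are updated together from the previous round
    return probs


# Hypothetical chain a - b - c; a and c are labeled, b is inferred.
adj = {"a": {"b": 3.0}, "b": {"a": 3.0, "c": 1.0}, "c": {"b": 1.0}}
priors = {"a": 0.5, "b": 0.5, "c": 0.5}
known = {"a": 1.0, "c": 0.0}
print(collective_inference(adj, priors, known)["b"])
```

Here the heavier link to `a` pulls the estimate for `b` toward class 1, which is the effect link weights are meant to have.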


Relational Classifiers

• Class Distribution Relational Neighbor
• Weighted-Vote Relational Neighbor
• Network-only Bayes Classifier
• Network-only Link-based Classification

Lindamood et al. 09 & Heatherly et al. 09


Experimental Data

• 167,000 profiles from the Facebook online social network

• Restricted to public profiles in the Dallas/Fort Worth network

• Over 3 million links

Lindamood et al. 09 & Heatherly et al. 09


General Data Properties

Property | Value
Diameter of the largest component | 16
Number of nodes | 167,390
Number of friendship links | 3,342,009
Total number of listed traits | 4,493,436
Total number of unique traits | 110,407
Number of components | 18
Probability Liberal | .45
Probability Conservative | .55

Lindamood et al. 09 & Heatherly et al. 09


Inference Methods

• Details only: Uses Naïve Bayes classifier to predict attribute

• Links Only: Uses only the link structure to predict attribute

• Average: Classifies based on an average of the probabilities computed by Details and Links

Lindamood et al. 09 & Heatherly et al. 09


Predicting Private Details

• Attempt to predict the value of the political affiliation attribute

• Three Inference Methods used as the local classifier

• Relaxation labeling used as the Collective Inference method

Lindamood et al. 09 & Heatherly et al. 09


Removing Details

• Ensures that no ‘false’ information is added to the network; all details in the released graph were entered by the user

• Details that have the highest global probability of indicating political affiliation are removed from the network

Lindamood et al. 09 & Heatherly et al. 09
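A rough sketch of detail removal follows. The ranking score used here (how consistently a detail co-occurs with a single class, then how many users list it) is a stand-in chosen for illustration; the cited work's exact measure is not given in these slides. Profiles and names are made up.

```python
from collections import Counter, defaultdict


def rank_details(profiles):
    """profiles: node -> (set_of_details, label). Most indicative details first."""
    by_detail = defaultdict(Counter)
    for details, label in profiles.values():
        for d in details:
            by_detail[d][label] += 1

    def score(d):
        counts = by_detail[d]
        total = sum(counts.values())
        # Hypothetical score: purity of the detail, then how many users list it.
        return (max(counts.values()) / total, total)

    return sorted(by_detail, key=lambda d: (score(d), d), reverse=True)


def remove_top_details(profiles, k):
    """Drop the k highest-ranked details everywhere; nothing is ever added."""
    to_remove = set(rank_details(profiles)[:k])
    return {n: (details - to_remove, label) for n, (details, label) in profiles.items()}


# Made-up profiles for illustration only.
profiles = {
    "u1": ({"group: college republicans", "hometown: dallas"}, "conservative"),
    "u2": ({"group: college republicans", "hometown: plano"}, "conservative"),
    "u3": ({"group: the democratic party", "hometown: dallas"}, "liberal"),
}
sanitized = remove_top_details(profiles, 1)
print(sanitized["u1"][0])
```

Because the sanitized detail sets are always subsets of the originals, every detail in the released graph was entered by the user.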


Removing Links

• Ensures that the link structure of the released graph is a subset of the original graph

• For each node, removes the links to the neighbors that are most like that node

Lindamood et al. 09 & Heatherly et al. 09


Most Liberal Traits

Trait Name | Trait Value | Weight Liberal
Group | legalize same sex marriage | 46.16066789
Group | every time i find out a cute boy is conservative a little part of me dies | 39.68599463
Group | equal rights for gays | 33.83786875
Group | the democratic party | 32.12011605
Group | not a bush fan | 31.95260895
Group | people who cannot understand people who voted for bush | 30.80812425
Group | government religion disaster | 29.98977927
Group | buck fush | 27.05782866

Lindamood et al. 09 & Heatherly et al. 09


Most Conservative Traits

Trait Name | Trait Value | Weight Conservative
Group | george w bush is my homeboy | 45.88831329
Group | college republicans | 40.51122488
Group | texas conservatives | 32.23171423
Group | bears for bush | 30.86484689
Group | kerry is a fairy | 28.50250433
Group | aggie republicans | 27.64720818
Group | keep facebook clean | 23.653477
Group | i voted for bush | 23.43173116
Group | protect marriage one man one woman | 21.60830487

Lindamood et al. 09 & Heatherly et al. 09


Most Liberal Traits per Trait Name

Trait Name | Trait Value | Weight Liberal
activities | amnesty international | 4.659100601
Employer | hot topic | 2.753844959
favorite tv shows | queer as folk | 9.762900035
grad school | computer science | 1.698146579
hometown | mumbai | 3.566007713
Relationship Status | in an open relationship | 1.617950632
religious views | agnostic | 3.15756412
looking for | whatever i can get | 1.703651985

Lindamood et al. 09 & Heatherly et al. 09


Experiments

• Conducted on 35,000 nodes which recorded political affiliation

• Tests removing 0 details and 0 links, 10 details and 0 links, 0 details and 10 links, and 10 details and 10 links

• Varied Training Set size from 10% of available nodes to 90%

Lindamood et al. 09 & Heatherly et al. 09


Local Classifier Results

Lindamood et al. 09 & Heatherly et al. 09


Collective Inference Results

Lindamood et al. 09 & Heatherly et al. 09


Online Social Networks Access Control Issues

• Current access control systems for online social networks are either too restrictive or too loose
  – “selected friends”
    • Bebo, Facebook, and Multiply
  – “neighbors” (i.e., the set of users having musical preferences and tastes similar to mine)
    • Last.fm
  – “friends of friends”
    • Facebook, Friendster, Orkut
  – “contacts of my contacts” (2nd degree contacts), “3rd” and “4th degree contacts”
    • Xing


Challenges

I want only my family and close friends to see this picture.


Requirements

• Many different online social networks with different terminology
  – Facebook vs LinkedIn
• We need to have flexible models that can represent
  – Users’ profiles
  – Relationships among users
    • (e.g., Bob is Alice’s close friend)
  – Resources
    • (e.g., online photo albums)
  – Relationships among users and resources
    • (e.g., Bob is the owner of the photo album and Alice is tagged in this photo)
  – Actions (e.g., post a message on someone’s wall)


Overview of the Solution

• We use semantic web technologies (e.g., OWL) to represent the social network knowledge base.

• We use the Semantic Web Rule Language (SWRL) to represent various security, admin and filter policies.


Modeling User Profiles and Resources

• Existing ontologies such as FOAF could be extended to capture user profiles.

• Relationships among resources could be captured by using OWL concepts
  – PhotoAlbum rdfs:subClassOf Resource
  – PhotoAlbum consistsOf Photos


Modeling Relationships Among Users

• We model relationships among users by defining an n-ary relationship:

    :Christine
        a :Person ;
        :has_friend _:Friendship_Relation_1 .

    _:Friendship_Relation_1
        a :Friendship_Relation ;
        :Friendship_trust :HIGH ;
        :Friendship_value :Mike .

• OWL reasoners cannot be used to infer some relationships, such as Christine being a third-degree friend of John.
  – Such computations need to be done separately and represented by using a new class.
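One way to do this computation outside the reasoner is a breadth-first search over the friendship graph. The sketch below assumes friendships have been exported to an adjacency dictionary; the intermediate person "Ann" and the function name are hypothetical.

```python
from collections import deque


def friendship_degree(friends, start, target):
    """Smallest number of friendship hops from start to target, or None if unreachable."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        person, degree = queue.popleft()
        if person == target:
            return degree
        for friend in friends.get(person, ()):
            if friend not in seen:
                seen.add(friend)
                queue.append((friend, degree + 1))
    return None


# Hypothetical friendship lists extracted from the knowledge base.
friends = {
    "Christine": ["Mike"],
    "Mike": ["Christine", "Ann"],
    "Ann": ["Mike", "John"],
    "John": ["Ann"],
}
print(friendship_degree(friends, "Christine", "John"))
```

The computed degree could then be asserted back into the knowledge base, for example as membership in a class of third-degree friends, so that policies can refer to it.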


Specifying Policies Using OSN Knowledge Base

• Most of the OSN information could be captured using OWL to represent a rich set of concepts

• This makes it possible to specify very flexible access control policies
  – “Photos could be accessed by friends only” automatically implies that a closeFriend can access the photos too.
  – Policies could easily be defined based on user-resource relationships.


Security Policies for OSNs

• Access control policies

• Filtering policies
  – Could be specified by user
  – Could be specified by authorized user

• Admin policies
  – Security admin specifies who is authorized to specify filtering and access control policies
  – Example: if U1 isParentOf U2 and U2 is a child, then U1 can specify filtering policies for U2.


Security Policy Specification (using semantic web technologies)

• Semantic Web Rule Language (SWRL) is used for specifying access control, filtering and authorization policies.

• SWRL is based on OWL:
  – all rules are expressed in terms of OWL concepts (classes, properties, individuals, literals…).
• Using SWRL, subject, object and actions are specified
• Rules can have different authorizations that state the subject’s rights on the target object.


Knowledge Base for Authorizations and Prohibitions

• Authorizations/prohibitions need to be specified using OWL
  – A different object property for each action supported by the OSN.
  – Authorizations/prohibitions could automatically propagate based on action hierarchies
    • Assume “post” is a subproperty of “write”
    • If a user is given “post” permission, then the user will have “write” permission as well
• Admin prohibitions need to be specified slightly differently: (Supervisor, Target, Object, Privilege)
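The propagation along an action hierarchy can be mimicked outside OWL with a small closure computation. The sketch below follows the subproperty semantics stated above (an assertion of a subproperty implies its super-property); the hierarchy, grant triples, and function names are hypothetical.

```python
# Hypothetical action hierarchy: each action maps to its direct super-properties.
SUPER_PROPERTIES = {"post": {"write"}, "write": set(), "read": set()}


def implied_actions(action):
    """All actions implied by `action` through the subproperty hierarchy."""
    implied, stack = set(), [action]
    while stack:
        current = stack.pop()
        if current not in implied:
            implied.add(current)
            stack.extend(SUPER_PROPERTIES.get(current, ()))
    return implied


def is_authorized(grants, user, action, resource):
    """grants: set of (user, action, resource) triples asserted in the knowledge base."""
    return any(
        g_user == user and g_res == resource and action in implied_actions(g_action)
        for g_user, g_action, g_res in grants
    )


grants = {("alice", "post", "wall1")}
print(is_authorized(grants, "alice", "write", "wall1"))
```

In the OWL-based design a reasoner would derive the same conclusion from the subproperty axiom, so no explicit closure code would be needed there.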


Security Rule Examples

• SWRL rule specification depends on the authorization and OSN knowledge bases.
  – It is not possible to specify generic rules

• Examples:


Security Rule Enforcement

• A reference monitor evaluates the requests.
• Admin requests for access control could be evaluated by rule rewriting
  – Example: Assume Bob submits the following admin request
  – Rewrite as the following rule


Security Rule Enforcement

• Admin requests for prohibitions could be rewritten as well.
  – Example: Bob issues the following prohibition request
  – Rewritten version

• Access control requests need to consider both filter and access control policies
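A reference monitor that consults both kinds of policy might look like the sketch below. The decision semantics (some access rule must permit and no filter rule may block) are an assumption made for this illustration, as are the rule functions and the data; the slides do not spell out how conflicts are resolved.

```python
def decide(request, access_rules, filter_rules):
    """request: (subject, action, resource). Each rule is a predicate over the request.

    Assumed semantics: some access rule must permit, and no filter rule may block.
    """
    permitted = any(rule(request) for rule in access_rules)
    blocked = any(rule(request) for rule in filter_rules)
    return permitted and not blocked


FRIENDS = {("bob", "alice")}  # hypothetical (owner, friend) pairs
OWNERS = {"album1": "bob"}    # hypothetical resource ownership


def friends_can_read(request):
    """Access rule: friends of a resource's owner may read it."""
    subject, action, resource = request
    return action == "read" and (OWNERS.get(resource), subject) in FRIENDS


def block_album1_for_alice(request):
    """Filter rule: alice is kept from reading album1."""
    return request == ("alice", "read", "album1")


print(decide(("alice", "read", "album1"), [friends_can_read], []))
print(decide(("alice", "read", "album1"), [friends_can_read], [block_album1_for_alice]))
```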


Framework Architecture

[Figure: framework architecture. Components: Social Network Application, Reference Monitor, Semantic Web Reasoning Engine, Policy Store, SN Knowledge Base. Labeled flows: Access request, Access Decision, Modified Access request, Policy Retrieval, Reasoning Result, Knowledge Base Queries.]


Conclusions

• Various attacks exist to
  – Identify nodes in anonymized data
  – Infer private details
• Recent attempts to increase social network access control to limit some of the attacks
• Balancing privacy, security and usability on online social networks will be an important challenge
• Directions
  – Scalability
    • We are currently implementing such a system to test its scalability.
  – Usability
    • Create techniques to automatically learn rules
    • Create simple user interfaces so that users can easily specify these rules.