Research Overview

Xintao Wu
Aug 25, 2014


Outline

• Introduction
• Privacy Preserving Social Network Analysis
  • Input perturbation
  • Output perturbation
• Fraud Detection in Social Networks
  • Spectral analysis of graph topology
  • Detecting Random Link Attacks
  • Detecting weak anomalies
• Sample Projects
• Conclusions and Future work

Trustworthy Computing

• Trustworthy = reliability, security, privacy, usability
• Sample research challenges
  • Understand and capture emergent behaviors/interactions among regular users, fraudsters, and victims
  • Design secure, survivable, persistent systems that keep working under attack
  • Enable privacy protection in collecting, analyzing, and sharing personal data

Privacy Breach Cases

• Nydia Velázquez (1994): a medical record of her suicide attempt was disclosed
• AOL Search Log (2006): the anonymized release of 650K users' search histories lasted less than 24 hours
• Netflix Contest (2009): the $1M contest was cancelled due to a privacy lawsuit
• 23andMe (2013): the FDA ordered genetic testing to be discontinued due to genetic privacy

Acxiom

• Privacy
  • In 2003, EPIC alleged that Acxiom provided consumer information to the US Army "to determine how information from public and private records might be analyzed to help defend military bases from attack."
  • In 2013, Acxiom was among nine companies that the FTC investigated to see how they collect and use consumer data.
• Security
  • In 2003, more than 1.6 billion customer records were stolen during the transmission of information to and from Acxiom's clients.

Privacy Regulation (Forrester)

[Figure: privacy regulation levels. Legend: Most restricted; Restricted; Some restrictions; Minimal restrictions; Effectively no restrictions; No legislation or no information]

Privacy Protection Laws

• USA
  • HIPAA for health care
  • Gramm-Leach-Bliley Act of 1999 for financial institutions
  • COPPA for children's online privacy
  • State regulations, e.g., California Senate Bill 1386
• Canada
  • PIPEDA 2000: Personal Information Protection and Electronic Documents Act
• European Union
  • Directive 95/46/EC: provides guidelines for member state legislation and forbids sharing data with states that do not protect privacy
• Contractual obligations
  • Individuals should have notice about how their data is used and have opt-out choices

Privacy Preserving Data Mining

ssn  name  zip    race   …  age  sex  income  …  disease
           28223  Asian  …  20   M    85k     …  Cancer
           28223  Asian  …  30   F    70k     …  Flu
           28262  Black  …  20   M    120k    …  Heart
           28261  White  …  26   M    23k     …  Cancer
           .      .      …  .    .    .       …  .
           28223  Asian  …  20   M    110k    …  Flu

• 69% unique on zip and birth date
• 87% unique on zip, birth date, and gender
• Generalization (k-anonymity, l-diversity, t-closeness)
• Randomization
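The uniqueness figures above suggest counting how many records share each combination of quasi-identifiers. The following is a minimal Python sketch of that check, assuming the five visible rows of the table and treating zip, race, age, and sex as the quasi-identifiers. The names `k_anonymity` and `generalize_zip` are illustrative and do not come from the slides.

```python
from collections import Counter

# The five visible rows of the slide's table (ssn and name already removed).
ROWS = [
    {"zip": "28223", "race": "Asian", "age": 20, "sex": "M", "income": "85k", "disease": "Cancer"},
    {"zip": "28223", "race": "Asian", "age": 30, "sex": "F", "income": "70k", "disease": "Flu"},
    {"zip": "28262", "race": "Black", "age": 20, "sex": "M", "income": "120k", "disease": "Heart"},
    {"zip": "28261", "race": "White", "age": 26, "sex": "M", "income": "23k", "disease": "Cancer"},
    {"zip": "28223", "race": "Asian", "age": 20, "sex": "M", "income": "110k", "disease": "Flu"},
]


def k_anonymity(rows, quasi_identifiers):
    """Return the size of the smallest group of rows that share the same
    values on all quasi-identifier columns (the table's k)."""
    groups = Counter(tuple(r[q] for q in quasi_identifiers) for r in rows)
    return min(groups.values())


def generalize_zip(row, digits=3):
    """Illustrative generalization: keep only the first `digits` of the zip."""
    out = dict(row)
    out["zip"] = row["zip"][:digits] + "*" * (len(row["zip"]) - digits)
    return out
```

On these rows, `k_anonymity(ROWS, ["zip", "race", "age", "sex"])` is 1 because several records are unique, while generalizing every zip to `282**` puts all five rows into one zip group.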

Social Network Data

Data owner:

name    sex  age  disease  salary
Ada     F    18   cancer   25k
Bob     M    25   heart    110k
Cathy   F    20   cancer   70k
Dell    M    65   flu      65k
Ed      M    60   cancer   300k
Fred    M    24   flu      20k
George  M    22   cancer   45k
Harry   M    40   flu      95k
Irene   F    45   heart    70k

Release to data miner:

id  sex  age  disease  salary
5   F    Y    cancer   25k
3   M    Y    heart    110k
6   F    Y    cancer   70k
1   M    O    flu      65k
7   M    O    cancer   300k
2   M    Y    flu      20k
9   M    Y    cancer   45k
4   M    M    flu      95k
8   F    M    heart    70k

Threat of Re-identification

The attacker attacks the released data:

id  sex  age  disease  salary
5   F    Y    cancer   25k
3   M    Y    heart    110k
6   F    Y    cancer   70k
1   M    O    flu      65k
7   M    O    cancer   300k
2   M    Y    flu      20k
9   M    Y    cancer   45k
4   M    M    flu      95k
8   F    M    heart    70k

Privacy breaches

• Identity disclosure
• Link disclosure
• Attribute disclosure
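To make the identity and attribute disclosure cases concrete, here is a minimal Python sketch of a linking attack on the released table. It assumes the attacker already knows a target's sex and coarse age group, and that Y/M/O stand for young, middle-aged, and old (the slides do not spell this out). The function names are illustrative.

```python
# Released table from the slide: (id, sex, age group, disease, salary).
RELEASED = [
    (5, "F", "Y", "cancer", "25k"),
    (3, "M", "Y", "heart", "110k"),
    (6, "F", "Y", "cancer", "70k"),
    (1, "M", "O", "flu", "65k"),
    (7, "M", "O", "cancer", "300k"),
    (2, "M", "Y", "flu", "20k"),
    (9, "M", "Y", "cancer", "45k"),
    (4, "M", "M", "flu", "95k"),
    (8, "F", "M", "heart", "70k"),
]


def candidates(sex, age_group):
    """Rows consistent with the attacker's background knowledge about a target."""
    return [row for row in RELEASED if row[1] == sex and row[2] == age_group]


def attack(sex, age_group):
    """Summarize what the attacker learns from the matching rows."""
    matches = candidates(sex, age_group)
    diseases = {row[3] for row in matches}
    return {
        # Exactly one match: the target's record is pinpointed.
        "identity_disclosed": len(matches) == 1,
        # All matches share one disease: the sensitive value leaks anyway.
        "attribute_disclosed": len(diseases) == 1,
        "candidate_ids": [row[0] for row in matches],
    }
```

For a middle-aged woman the attack returns the single record with id 8, so both her identity and her disease are disclosed. For a young woman two records match, yet both list cancer, so the attribute is disclosed even though the identity is not.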

Privacy Preservation in Social Network Analysis

• Input Perturbation
  • K-anonymity
  • Generalization
  • Randomization
• Output Perturbation
  • Background on differential privacy
  • Differential privacy preserving social network mining
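One basic form of the graph randomization listed above deletes k true edges at random and adds k false ones, so the released graph keeps its edge count. The following Python sketch works under that assumption. The function name and parameters are illustrative, and this is a plain add/delete scheme, not the feature-preserving procedures from the cited papers.

```python
import random
from itertools import combinations


def rand_add_del(nodes, edges, k, seed=None):
    """Delete k random existing edges, then add k random edges that were
    absent from the original graph (undirected, no self-loops).
    Returns the perturbed edge set as a set of 2-element frozensets."""
    rng = random.Random(seed)
    edge_set = {frozenset(e) for e in edges}
    non_edges = [frozenset(p) for p in combinations(nodes, 2)
                 if frozenset(p) not in edge_set]
    if k > len(edge_set) or k > len(non_edges):
        raise ValueError("k is too large for this graph")
    # Sort first so that sampling is reproducible for a given seed.
    removed = set(rng.sample(sorted(edge_set, key=sorted), k))
    added = set(rng.sample(non_edges, k))
    return (edge_set - removed) | added
```

Because added edges are drawn only from pairs that were not connected originally, a deleted edge is never re-added and the number of edges stays the same.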

Our Work

• Feature preserving randomization
  • Spectrum preserving randomization (SDM08)
  • Markov chain based feature preserving randomization (SDM09)
  • Reconstruction from randomized graph (SDM10)
• Link privacy (from the attacker's perspective)
  • Exploiting node similarity feature (PAKDD09 Best Student Paper Runner-up Award)
  • Exploiting graph space via Markov chain (SDM09)

PSNet (NSF-0831204)

Output Perturbation

Data owner:

name    sex  age  disease  salary
Ada     F    18   cancer   25k
Bob     M    25   heart    110k
Cathy   F    20   cancer   70k
Dell    M    65   flu      65k
Ed      M    60   cancer   300k
Fred    M    24   flu      20k
George  M    22   cancer   45k
Harry   M    40   flu      95k
Irene   F    45   heart    70k

• The data miner sends query f
• The data owner returns query result + noise
• The noisy result cannot be used to derive whether any individual is included in the database
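The guarantee described above is usually formalized as ε-differential privacy. In standard notation (the symbols M, ε, and S are not on the slide), a randomized mechanism M is ε-differentially private if, for all databases x and x' that differ in one individual's record and all sets S of possible outputs:

```latex
\Pr[M(x) \in S] \;\le\; e^{\varepsilon} \cdot \Pr[M(x') \in S]
```

A smaller ε forces the two output distributions closer together, so the answer reveals less about whether any one individual is in the database.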

Differential Guarantee [Dwork, TCC06]

Database x:

name   disease
Ada    cancer
Bob    heart
Cathy  cancer
Dell   flu
Ed     cancer
Fred   flu

f = count(#cancer): f(x) + noise = 3 + noise

Neighboring database x' (the same table with one cancer patient's record opted out):

f = count(#cancer): f(x') + noise = 2 + noise

achieving Opt-Out
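The count query above can be answered with the Laplace mechanism, a standard way to add the noise. The following Python sketch assumes the six-row table from the slide, a privacy parameter `epsilon` chosen by the data owner, and an arbitrary choice of Ada as the person who opts out. The function names are illustrative.

```python
import random

# The six-row database x from the slide.
DATABASE = {"Ada": "cancer", "Bob": "heart", "Cathy": "cancer",
            "Dell": "flu", "Ed": "cancer", "Fred": "flu"}


def count_cancer(db):
    """The query f: number of records whose disease is cancer."""
    return sum(1 for disease in db.values() if disease == "cancer")


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two independent
    exponential variables that each have mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(db, epsilon, rng):
    """Adding or removing one record changes a count by at most 1
    (sensitivity 1), so Laplace noise with scale 1/epsilon is used."""
    return count_cancer(db) + laplace_noise(1.0 / epsilon, rng)


# Neighboring database x': Ada opts out (an arbitrary choice for illustration).
NEIGHBOR = {name: d for name, d in DATABASE.items() if name != "Ada"}
```

Calling `private_count(DATABASE, 0.5, random.Random())` and `private_count(NEIGHBOR, 0.5, random.Random())` returns noisy versions of 3 and 2 whose distributions overlap heavily, which is what keeps an observer from telling whether Ada's record was present.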

Our Work

• DP-preserving cluster coefficient (ASONAM12)
  • Divide and conquer
  • Smooth sensitivity
• DP-preserving spectral graph analysis (PAKDD13)
  • LNPP: based on the Laplace Noise Perturbation
  • SBMF: based on the Exponential Mechanism and MBF density
• Linear-refinement of DP-preserving query answering (PAKDD13 Best Application Paper)
• DP-preserving graph generation based on degree correlation (TDP13)

SMASH (NIH R01GM103309)

Outline

• Introduction
• Privacy Preserving Social Network Analysis
  • Input perturbation
  • Output perturbation
• Fraud Detection
  • Spectral analysis of graph topology
  • Detecting Random Link Attacks
  • Detecting weak anomalies
• Sample Projects
• Conclusions and Future work

Cyber Fraud

• Cyber crime
  • Costs the US economy $400 billion annually
• OSN Fraud and Attack
  • Sybil attack, spam, viral marketing, fraudulent auction, brand jacking, denial of service, etc.
  • Fake followers on Twitter (used in viral marketing) are worth $360 million annually on the black market.

Topology-based Detection

• Fraud Characterization
  • Individual vs. collusive
  • Robot vs. money-motivated regular user
  • Random vs. selective target
  • Static vs. dynamic
• Traditional topology-based detection methods
  • incur high computational cost
  • have difficulty detecting collaborative attacks or subtle anomalies

Random Link Attack [Shrivastava ICDE08]

• An abstraction of collaborative attacks including spam, viral marketing, etc.
• The attacker creates some fake nodes and uses them to attack a large set of randomly selected regular nodes.
• Fake nodes also mimic the real graph structure among themselves to evade detection.
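The attack model above can be simulated in a few lines. The following Python sketch assumes each fake node links to a fixed number of uniformly chosen regular nodes, and uses independent random edges among the fake nodes as a simple stand-in for "mimicking the real graph structure". All names and parameters are illustrative and are not taken from the cited paper.

```python
import random


def random_link_attack(regular_nodes, n_fake, victims_per_fake, p_internal, seed=None):
    """Simulate a random link attack on a list of regular node ids.

    Returns (fake_nodes, attack_edges, internal_edges):
      attack_edges   - (fake, victim) pairs, victims drawn uniformly at random
      internal_edges - edges among fake nodes, each present with prob. p_internal
    """
    rng = random.Random(seed)
    fake_nodes = [f"fake{i}" for i in range(n_fake)]

    attack_edges = set()
    for fake in fake_nodes:
        # sample() draws distinct victims, so each fake node gets exactly
        # victims_per_fake attack links.
        for victim in rng.sample(regular_nodes, victims_per_fake):
            attack_edges.add((fake, victim))

    internal_edges = set()
    for i in range(n_fake):
        for j in range(i + 1, n_fake):
            if rng.random() < p_internal:
                internal_edges.add((fake_nodes[i], fake_nodes[j]))

    return fake_nodes, attack_edges, internal_edges
```

Adding the returned edges to a real graph gives a test bed for a detector: the fake nodes and their victims are the ground truth it should recover.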

Spectral Graph Analysis based Fraud Detection

• Examine the spectral space of graph topology.
• A network with n nodes and m edges that is undirected and unweighted, without considering link/node attribute information
• Adjacency Matrix A (symmetric)
• Adjacency Eigenspace
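Because A is real and symmetric, it has a full set of real eigenvalues and orthonormal eigenvectors. In the usual notation (the ordering of the eigenvalues is an assumption here, since the slide's formula did not survive extraction), the adjacency eigenspace comes from the decomposition:

```latex
A = \sum_{i=1}^{n} \lambda_i \, x_i x_i^{T},
\qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n,
\qquad x_i^{T} x_j = \begin{cases} 1 & i = j \\ 0 & i \ne j \end{cases}
```

Here each x_i is an n-dimensional eigenvector with one entry per node, which is what allows nodes to be projected into the space spanned by a few leading eigenvectors.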

Eigenspace

Principal Minor

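Projecting Node in Spectral Space [SDM09]

Spectral coordinate of node u: $\alpha_u = (x_{1u}, x_{2u}, \ldots, x_{ku})$, the u-th row of the matrix whose columns are the eigenvectors $x_1, \ldots, x_k$:

$$
\begin{pmatrix} x_1 & x_2 & \cdots & x_k \end{pmatrix}
=
\begin{pmatrix}
x_{11} & x_{21} & \cdots & x_{k1} \\
x_{12} & x_{22} & \cdots & x_{k2} \\
\vdots & \vdots &        & \vdots \\
x_{1n} & x_{2n} & \cdots & x_{kn}
\end{pmatrix}
$$

k-orthogonal line pattern:

$$
\frac{\alpha_u \cdot \alpha_v}{\|\alpha_u\|_2 \, \|\alpha_v\|_2} \approx 1
\quad \text{when nodes } u, v \text{ are from the same community}
$$

$$
\alpha_u \cdot \alpha_v \approx 0
\quad \text{when nodes } u, v \text{ are from different communities}
$$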
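The line pattern above can be checked on a toy graph. The following Python sketch builds two 5-node cliques joined by one bridge edge, finds the two leading eigenvectors by power iteration with deflation, and compares spectral coordinates by cosine. The graph, the helper names, and the choice of power iteration are assumptions made for illustration, and this is not the detection algorithm from the cited papers.

```python
import math


def mat_vec(a, v):
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def top_eigenpairs(a, k, iters=2000):
    """Leading k eigenpairs of a symmetric matrix by power iteration with
    deflation. Assumes the k eigenvalues largest in magnitude are the
    wanted (positive) ones, which holds for the toy graph below."""
    n = len(a)
    a = [row[:] for row in a]                       # work on a copy
    pairs = []
    for _ in range(k):
        v = normalize([float(i + 1) for i in range(n)])  # fixed start vector
        for _ in range(iters):
            v = normalize(mat_vec(a, v))
        lam = sum(x * y for x, y in zip(v, mat_vec(a, v)))
        pairs.append((lam, v))
        for i in range(n):                          # deflate: A <- A - lam v v^T
            for j in range(n):
                a[i][j] -= lam * v[i] * v[j]
    return pairs


def two_cliques(size):
    """Adjacency matrix of two cliques of `size` nodes joined by one bridge edge."""
    n = 2 * size
    a = [[0.0] * n for _ in range(n)]
    for block in (range(size), range(size, n)):
        for i in block:
            for j in block:
                if i != j:
                    a[i][j] = 1.0
    a[size - 1][size] = a[size][size - 1] = 1.0
    return a


def cosine(p, q):
    dot = sum(x * y for x, y in zip(p, q))
    return dot / (math.sqrt(sum(x * x for x in p)) * math.sqrt(sum(y * y for y in q)))


A = two_cliques(5)
(_, x1), (_, x2) = top_eigenpairs(A, k=2)
coords = list(zip(x1, x2))                 # coords[u] is node u's spectral coordinate
same = cosine(coords[0], coords[1])        # both nodes in the first clique
different = cosine(coords[0], coords[9])   # one node from each clique
```

On this graph `same` comes out close to 1 and `different` close to 0, so the two communities fall along two nearly orthogonal lines in the 2-dimensional spectral space.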

Example

Spectral coordinate: $\alpha_u = (x_{1u}, x_{2u}, \ldots, x_{ku})$

Polbook Network

Evaluation on Web spam challenge data [ICDE11]

• A snapshot of websites in the .UK domain (2007) with 114K nodes and 1.8M links, plus a mix of 8 RLAs with varied sizes and connection patterns
• SPCTRA: based on spectral space
• GREEDY: based on outer-triangles [Shrivastava, ICDE08]
• Much faster: 36s vs. 26h

Outline

• Introduction
• Privacy Preserving Social Network Analysis
  • Input perturbation
  • Output perturbation
• Fraud Detection
  • Spectral analysis of graph topology
  • Detecting random link attacks
  • Detecting weak anomalies
• Sample Projects
• Conclusions and Future work

Privacy Preserving Data Mining (NSF CAREER)

Genetic Privacy (NSF SCH pending)

BIBM13 Best Paper Award

oSafari (NSF SaTC)

Manipulation in E-Commerce (NSF III pending)

• Structured Topic Analysis
• Spectral Bipartite Graph Analysis
• D-S based Evidence Fusion

• Bot-committed
• Money-motivated

• Reviews
• Ratings
• Ranks

Privacy Preserving Database Application testing (NSF 0310974)

[Figure: system diagram. Labels: Production db; ER; Data; DDL; Catalog; R, NR, S; Conflict resolution; Disclosure Assessment; Rule Analyzer; R', NR', S'; Schema & Domain Filter; Schema', Domain'; Data Generator; Mock DB; User]

Data Generation for Testing DB Applications (NSF 0915059)

How to generate data to cover paths?

Outline

• Introduction
• Privacy Preserving Social Network Analysis
  • Input perturbation
  • Output perturbation
• Fraud Detection
  • Spectral analysis of graph topology
  • Detecting Random Link Attacks
  • Detecting weak anomalies
• Sample Projects
• Conclusions and Future work

Big Data Computing

• Drowning in data
  • Volume, Velocity, Variety, and Veracity
  • 2.5 exabytes every day
  • Web data, healthcare, e-commerce, social network
• Advancing technology
  • Cheap storage/processing power
  • Growth in huge data centers
  • Data is in the "cloud": Amazon AWS, Hadoop, Azure
  • Computing is in the "cloud"

Social Media Customer Analytics

• Network topology (friendship, followship, interaction)
• Structured profile

name  sex  age  disease  salary
Ada   F    18   cancer   25k
Bob   M    25   heart    110k

id  sex  age  address  income
5   F    Y    NC       25k
3   M    Y    SC       110k

• Retweet sequence
• Product and review
• Unstructured text (e.g., blog, tweet)
• Transaction database

• Entity resolution
• Patterns
• Temporal/spatial
• Scalability
• Visualization
• Sentiment
• Privacy

• Velocity, Variety
• 10GB tweets per day
• Belk and Lowe's
• Chancellor's special fund


Samsung AVC Denial Log Analysis

• Volume and Velocity: 1 million log files per day, each with thousands of entries
• S3, Hive and EMR

Drivers of Data Computing

• 6A's: Anytime, Anywhere, Access to Anything by Anyone Authorized
• 4V's: Volume, Velocity, Variety, Veracity
• Reliability, Security, Privacy, Usability

Thank You! Questions?

Collaborators: Aidong Lu, Xinghua Shi, Jun Li (Oregon), Dejing Dou (Oregon), Tao Xie (UIUC)

Doctoral graduates: Songtao Guo, Ling Guo, Kai Pan, Leting Wu, Xiaowei Ying

Doctoral Students: Yue Wang, Yuemeng Li, Zhilin Luo (visiting)