Involuntary Information Leakage in Social Network Services

DESCRIPTION

Disclosing personal information in online social network services is a double-edged sword. Information exposure is usually a plus, even a must, if people want to participate in social communities; however, leakage of personal information, especially one's identity, may invite malicious attacks from the real world and cyberspace, such as stalking, reputation slander, personalized spamming and phishing. Even if people do not reveal their personal information online, others may do so. In this paper, we consider the problem of involuntary information leakage in social network services and demonstrate its seriousness with a case study of Wretch, the biggest social network site in Taiwan. Wretch allows users to annotate their friends' profiles with a one-line description, from which a friend's private information, such as real name, age, and school attendance records, may be inferred without the information owner's knowledge. Our analysis results show that users' efforts to protect their privacy cannot prevent their personal information from being revealed online. In the 592,548 effective profiles that we collected, the first name of 72% of the accounts and the full name of 30% of the accounts could be easily inferred by using a number of heuristics. The age of 15% of the account holders and at least one school attended by 42% of the holders could also be inferred. We discuss several potential means of mitigating the identified involuntary information leakage problem.

2009/2/2

Ieng-Fat Lam, Kuan-Ta Chen, and Ling-Jyh Chen
Institute of Information Science, Academia Sinica

Presenter: Ieng-Fat Lam

Involuntary Information Leakage in Social Network Services

2

Outline

Introduction

Motivation

Research Method

Results

Discussion

Conclusion

3

Social Networking Services (SNSs)

For example
• Myspace, Facebook, Orkut, Yahoo! 360
• Mixi, GREE (Japan)
• Wretch (Taiwan)

SNSs have become very popular

They host millions of profiles

Introduction ::

4

Users in SNSs

Social activities
• Meet new friends, contact existing friends
• Share resources over the Internet

Personal information is usually published
• Photos
• Identity information
• Contact information

Introduction ::

5

Disclosing personal information

Double-edged sword
• Lets other people get to know you and search for you
• But some people may not respond nicely
• Risk of personal information being used by malicious people

I am Lee-Da Nu! I love movies. I am 23 years old, single!!

Introduction ::

6

Not revealing personal information?

I never disclose my info on the Internet!

Introduction ::

7

Information revealed by friends

Introduction ::

8

Information revealed

[I got it!]
Real name: Andrew Richman
Gender: Male
Age: 20 ~ 22
Education record:
  Sunrise elementary school
  St. John secondary school
  St. Paul University

Introduction ::

9

Involuntary information leakage

A user may want to protect his/her identity
• But it may be unintentionally revealed by friends
• Such leakage is hard to detect
  Due to the distributed nature of the Internet
• It is becoming a serious threat to privacy

Motivation ::

10

In this study

We would like to
• Investigate the extent of involuntary information leakage
• Gather data from Wretch (http://www.wretch.cc)
  The most popular SNS in Taiwan
  About 4 million user profiles
• Quantify the degree of such leakage
  Real name, age, and education record
• Discuss potential means to mitigate the problem

Motivation ::

11

Data Collection

User ID list (crawled): john123, Aron, roserose, iamboy, …

1. Pick an ID randomly
2. Obtain the user profile and friend list (HTML)
3. Parse and save the crawled user data
4. Add the user IDs from the friend list (Andy, Orange, …) to the ID list
5. Update the ID list

Research Method ::
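The collection loop above behaves like a random-order graph crawl over user IDs. A minimal sketch in Python, under assumptions: `fetch_profile` is a hypothetical callback standing in for the HTML download and parsing, which the slides do not detail, and the toy `SITE` dictionary replaces real HTTP requests.

```python
import random

def crawl(seed_ids, fetch_profile, max_profiles, seed=0):
    """Random-order crawl over user IDs, following friend lists.

    fetch_profile(user_id) -> (profile, friend_ids) is a hypothetical
    callback; it is not part of the original study's description.
    """
    rng = random.Random(seed)
    pending = list(dict.fromkeys(seed_ids))  # IDs not yet visited
    known = set(pending)                     # every ID seen so far
    crawled = {}
    while pending and len(crawled) < max_profiles:
        # Step 1: pick an ID at random (swap-and-pop avoids list shifting).
        i = rng.randrange(len(pending))
        pending[i], pending[-1] = pending[-1], pending[i]
        user_id = pending.pop()
        # Steps 2-3: obtain, parse, and save the profile and friend list.
        profile, friends = fetch_profile(user_id)
        crawled[user_id] = {"profile": profile, "friends": list(friends)}
        # Steps 4-5: add previously unseen friend IDs to the ID list.
        for friend in friends:
            if friend not in known:
                known.add(friend)
                pending.append(friend)
    return crawled

# Toy in-memory "site" (made-up IDs) used instead of real requests.
SITE = {"a": ["b", "c"], "b": ["a"], "c": ["d"], "d": []}
result = crawl(["a"], lambda uid: ({"id": uid}, SITE[uid]), max_profiles=10)
```

Keeping a separate `known` set means an ID that shows up in several friend lists is queued only once.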

12

An example

[Screenshot: a user profile and its friend list]

Research Method ::

13

Overview of Crawled Data

Wretch data
Number of users               766,972 (20%)
Number of effective users     592,548 (15%)
Number of connections         7,619,212
Avg. connections per user     11.5

* An effective user has at least one "outgoing" friend connection

Research Method ::

14

Analysis of Name Leakage

Friend annotations in Wretch
• Free-form text that describes a friend
• Used for
  Classification
  The real name or nickname of a friend
  A feature of a friend

For example
• *Beauty Cathy Brown – The hottest girl of Nightingale High School
• [[ School Mate ]] Tony MY BUDDY

Research Method ::

15

Name Inference Process

1. Obtain friend annotations (for each profile)
2. Generate name candidates

Infer first name

Research Method ::

16

Generate name candidates

To infer the real name of a profile
• Collect all of its incoming annotations
• Extract name candidates from the annotations

Incoming annotations (from friends such as Andy, Aron, and Sammy):
Andrew!!
Yo~ Bros. Andrew!!
Old Mr. Richman!!
Cool~~ Andrew Richman!!

Research Method ::

17

Generate name candidates (cont.)

Extraction method
• Break the text into tokens by
  Symbols: <space>, <tab>, '#', '@', etc.
  Punctuation marks: ' " , . () []
  Connective words (in Chinese)
• Chinese-specific naming rules, e.g. 陳寬達 (Chen Kuan-Ta)
  Two-character tokens as first name candidates
  Three-character tokens as full name candidates
• A duplication count is associated with each candidate
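The extraction step can be sketched as follows. This is an illustration under assumptions, not the authors' code: the delimiter set adds `~` and `!` (seen in the deck's sample annotations) to the symbols listed, `CONNECTIVES` is a placeholder because the slides do not list the connective words used, and a token is counted once per annotation in which it appears.

```python
import re
from collections import Counter

# Symbols and punctuation from the slide, plus '~' and '!' (assumed).
DELIMITERS = re.compile(r"""[\s#@~!,.()\[\]'"‘’“”]+""")
# Placeholder connective words; the actual list is not given in the slides.
CONNECTIVES = ("和", "與")

def tokenize(annotation):
    """Split one annotation into tokens at delimiters and connective words."""
    for word in CONNECTIVES:
        annotation = annotation.replace(word, " ")
    return [t for t in DELIMITERS.split(annotation) if t]

def name_candidates(annotations):
    """Return (first_name_candidates, full_name_candidates).

    Each maps a token to the number of annotations containing it.
    Two-character tokens are first name candidates and three-character
    tokens are full name candidates, following the Chinese naming rule.
    """
    occurrences = Counter()
    for annotation in annotations:
        occurrences.update(set(tokenize(annotation)))
    first = {t: n for t, n in occurrences.items() if len(t) == 2}
    full = {t: n for t, n in occurrences.items() if len(t) == 3}
    return first, full

annotations = ["德榮!!", "喔~德榮~德榮兄!!", "老劉~!!", "超帥~~ 劉德榮!!"]
first, full = name_candidates(annotations)
```

Here 德榮 appears in two annotations; the bracketed duplication count of 1 shown for it in the deck's example appears to be this occurrence count minus one, which is an inference from that example rather than a stated definition.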

Research Method ::

18

An example

Incoming annotations (Chinese original, with English gloss), e.g. from friend Andy:
德榮!! (Andrew!!)
喔~德榮~德榮兄!! (Yo~ Andrew~ Bros Andrew!!)
老劉~!! (Old Mr. Richman~!!)
超帥~~ 劉德榮!! (Cool~~ Andrew Richman!!)

Name candidates [duplication count]
德榮 (Andrew) [1]
超帥 (Cool) [0]
劉德榮 (Andrew Richman) [0]
德榮兄 (Bros Andrew) [0]
喔 (Yo) [0]
老劉 (Old Mr. Richman) [0]

Figure labels: full name candidates, first name candidates, duplication count

Research Method ::

19

Inference of full name (1 / 5)

Common family name
• The family name part is a common family name
• The duplication count is greater than 1
• For example, for the full name candidate “Andrew Richman”
  If “Andrew Richman” appears in more than one annotation
  If “Richman” is a common family name

Research Method ::

[1] Chih-Hao Tsai, “Common Chinese Names”, http://technology.chtsai.org/namefreq/

20

Inference of full name (2 / 5)

First name as a substring of full name
• A first name candidate is a substring of the full name candidate
  In the right position
• The duplication count is greater than 1
• For example, for the full name candidate “Andrew Richman”
  If “Andrew Richman” appears in more than one annotation
  If “Andrew” is also a first name candidate
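The first two rules can be sketched together in Python. Several things here are assumptions: `COMMON_FAMILY_NAMES` is a three-entry illustrative sample rather than the external list the slides cite, family names are taken to be a single character, "the right position" is read as "the given-name part after the family name", and the duplication requirement is read as "appears in more than one annotation".

```python
# Illustrative sample only; the study used an external list of common names.
COMMON_FAMILY_NAMES = {"陳", "林", "劉"}

def rule_common_family_name(full, occurrences):
    """Rule 1: repeated candidate whose first character is a common family name."""
    return occurrences.get(full, 0) > 1 and full[0] in COMMON_FAMILY_NAMES

def rule_first_name_substring(full, occurrences, first_candidates):
    """Rule 2: repeated candidate whose given-name part is a first name candidate."""
    return occurrences.get(full, 0) > 1 and full[1:] in first_candidates

def infer_full_name(occurrences, first_candidates):
    """Try rule 1, then rule 2, over three-character candidates.

    occurrences maps a token to the number of annotations containing it.
    Candidates are tried from most to least frequent (an assumed ordering).
    """
    full_candidates = sorted(
        (c for c in occurrences if len(c) == 3),
        key=lambda c: -occurrences[c],
    )
    for candidate in full_candidates:
        if rule_common_family_name(candidate, occurrences):
            return candidate
    for candidate in full_candidates:
        if rule_first_name_substring(candidate, occurrences, first_candidates):
            return candidate
    return None

occurrences = {"劉德榮": 2, "德榮": 3, "德榮兄": 2, "超帥": 1}
inferred = infer_full_name(occurrences, first_candidates={"德榮", "超帥"})
```

With these made-up counts, 劉德榮 satisfies both rules, while 德榮兄 fails both because 德 is not in the sample family-name list and 榮兄 is not a first name candidate.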

Research Method ::

21

Inference of full name (3 / 5)

Common full name
• Compare with an existing full name list
• National college exam enrollment list
  List maintained from 1994 to 2007
  574,010 distinct full names

Research Method ::

[2] Chih-Hao Tsai, “A list of Chinese Names”, http://technology.chtsai.org/namelist/

22

Inference of full name (4 / 5)

Nickname decomposition
• A Chinese name has the form FN GN1-GN2 (陳寬達)
• Possible nicknames:
  Prefix + X
  Prefix + X + X
  X + postfix
  where X can be FN, GN1 or GN2

For “Andrew Richman”
• We also have “Bros Andrew”
• “Bros” is a predefined prefix
• Removing “Bros” gives “Andrew”
• “Andrew” is in “Andrew Richman”

Research Method ::
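A minimal sketch of the decomposition check, assuming a three-character name with a one-character family name. `PREFIXES` and `POSTFIXES` are illustrative samples, not the predefined lists used in the study, and treating the whole given name (GN1 + GN2) as a valid X is an added assumption suggested by the “Bros Andrew” example.

```python
# Illustrative affixes only; the predefined lists are not given in the slides.
PREFIXES = ("老", "小", "阿")
POSTFIXES = ("兄", "哥", "姐")

def nickname_cores(token):
    """Return every X obtainable by undoing one of the three nickname patterns."""
    cores = set()
    for prefix in PREFIXES:
        if token.startswith(prefix) and len(token) > len(prefix):
            rest = token[len(prefix):]
            cores.add(rest)                      # Prefix + X
            if len(rest) == 2 and rest[0] == rest[1]:
                cores.add(rest[0])               # Prefix + X + X
    for postfix in POSTFIXES:
        if token.endswith(postfix) and len(token) > len(postfix):
            cores.add(token[:-len(postfix)])     # X + postfix
    return cores

def supports_full_name(full_name, token):
    """True if the token decomposes into a part of the full name candidate."""
    if len(full_name) != 3:
        return False
    fn, gn1, gn2 = full_name
    parts = {fn, gn1, gn2, gn1 + gn2}  # gn1 + gn2 is an assumed extension
    return bool(nickname_cores(token) & parts)
```

For instance, `supports_full_name("劉德榮", "老劉")` strips the prefix 老 and finds the family name 劉, which counts as support for the full name candidate.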

23

Inference of full name (5 / 5)

Common word removal
• Applied if the rules above find no matching candidate
• The duplication count is greater than 1
• The full name candidate is not a nickname
  It does not contain any nickname prefix or postfix
• It is not (and is not based on) a common word
  Compared against 100,511 common words
• Select the candidate with the highest duplication count
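This last-resort rule is essentially a filter followed by a maximum. A sketch under assumptions: "contains a nickname prefix or postfix" is read as "starts with a prefix or ends with a postfix", "based on a common word" is read as "contains a common word as a substring", and the sample data below is made up for illustration.

```python
def fallback_full_name(occurrences, common_words, prefixes, postfixes):
    """Pick the most frequent repeated candidate that survives the filters.

    occurrences maps a token to the number of annotations containing it.
    Returns None when no candidate is eligible.
    """
    def looks_like_nickname(token):
        return token.startswith(tuple(prefixes)) or token.endswith(tuple(postfixes))

    def based_on_common_word(token):
        return any(word in token for word in common_words)

    eligible = [
        token for token, count in occurrences.items()
        if len(token) == 3 and count > 1
        and not looks_like_nickname(token)
        and not based_on_common_word(token)
    ]
    return max(eligible, key=lambda token: occurrences[token], default=None)

# Made-up counts and word lists, for illustration only.
sample = {"劉德榮": 3, "德榮兄": 4, "超帥哦": 5, "王小明": 2}
chosen = fallback_full_name(
    sample,
    common_words={"超帥"},
    prefixes=("老", "小", "阿"),
    postfixes=("兄", "哥"),
)
```

In this sample the two most frequent tokens are rejected (one ends with a postfix, one contains a common word), so the rule falls through to 劉德榮.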

Research Method ::

24

Inference of First Name

Uses the same method as the inference of full name
• Common first name
  Compare with 208,581 first names
  Requires a duplication count greater than 1
• Nickname decomposition
• Common word removal

Research Method ::

25

Name Inference Results

Ratio of inferred names

Type of name                Ratio of name inference
Nickname                    60%
Real name (full name)       30%
First name                  72%
Real name or first name     78%

Results ::

26

Validation

Manual examination of real names
• Randomly selected 1,000 profiles
• 738 of them are unique and correct
  Further examination gave similar results
• Wrong cases: the user's nickname
• Sufficient to support the conjecture
  Involuntary real name leakage occurs in real-life social network systems, and the degree of leakage is significant

Results ::

27

Ratio of Name Leakage

Figure 2: Ratio of name leakage based on users’ gender

Figure 3: Relation of users’ age and ratio of name leakage

Results ::

28

Risk Analysis

To confirm that the identity leakage is involuntary
• We check the inferred name against the user's profile
  Fewer than 0.1% of users reveal their real names

To quantify the tendency to use real names
• Degree of Using Real name (DUR)
  Ratio of a user's outgoing annotations that contain the real name of the annotation target
• Degree of being Called by Real name (DCR)
  Ratio of incoming annotations that contain the user's real name

Results ::

29

Example of DUR and DCR

“Andrew”

[Friend] Raymond Aron

Our King!

[Friend]John Lennon

Yo~What’sup man

[Friend] Jay leno

[Friend]David Jones

Cool~Andrew Richman

[Friend]Sammy Hagar

Bros Andrew

Criteria      DUR    DCR
First name    4/5    1/5
Full name     1/5    1/5
Either        5/5    2/5
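Both measures reduce to the same counting routine applied to different annotation sets. The sketch below assumes, as the tables suggest, that the "first name" row counts annotations containing the first name but not the full name. The fifth incoming annotation is a hypothetical placeholder, since only four incoming annotations are legible in the example.

```python
from fractions import Fraction

def real_name_ratios(items):
    """items: list of (annotation, first_name, full_name) triples.

    For DCR, pass a user's incoming annotations with that user's names.
    For DUR, pass a user's outgoing annotations, each with its target's names.
    """
    total = len(items)
    if total == 0:
        return {"first": Fraction(0), "full": Fraction(0), "either": Fraction(0)}
    full = sum(1 for text, _, full_name in items if full_name in text)
    first = sum(
        1 for text, first_name, full_name in items
        if first_name in text and full_name not in text
    )
    return {
        "first": Fraction(first, total),
        "full": Fraction(full, total),
        "either": Fraction(first + full, total),
    }

incoming = [
    "Our King!",
    "Yo~What'sup man",
    "Cool~Andrew Richman",
    "Bros Andrew",
    "Hi there",  # hypothetical placeholder for the fifth annotation
]
dcr = real_name_ratios([(text, "Andrew", "Andrew Richman") for text in incoming])
```

With this input the DCR values come out as 1/5, 1/5 and 2/5, matching the DCR column of the table.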

Results ::

30

Positive relation between DUR and DCR

Figure 4: Relation of DUR and DCR

Results ::

31

Involuntary leakage of age and education records

Inferring age
• Works in rounds
• If X has disclosed an age and has a friend Y
• If X and Y have a relation such as “classmate”, “same class”, …
• Assign the age of X to Y
• Then check Y’s “classmates”
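The round-based procedure can be read as a breadth-first propagation over "classmate" links. A minimal sketch with assumptions stated: the classmate relation is treated as symmetric, and when several sources could reach the same user the first assignment wins; the slides do not say how such conflicts were resolved.

```python
from collections import defaultdict

def infer_ages(disclosed_ages, classmate_pairs):
    """Propagate self-disclosed ages along classmate links, round by round.

    disclosed_ages: dict user -> age (self-disclosed).
    classmate_pairs: iterable of (x, y) pairs linked by a classmate annotation.
    Returns a dict containing both disclosed and inferred ages.
    """
    neighbours = defaultdict(set)
    for x, y in classmate_pairs:
        neighbours[x].add(y)
        neighbours[y].add(x)  # symmetry is an assumption

    ages = dict(disclosed_ages)
    frontier = sorted(ages)  # round 0: users who disclosed an age
    while frontier:
        next_frontier = []
        for x in frontier:
            for y in sorted(neighbours[x]):
                if y not in ages:          # first assignment wins (assumed)
                    ages[y] = ages[x]
                    next_frontier.append(y)
        frontier = next_frontier           # next round: newly labelled users
    return ages

ages = infer_ages({"x": 20}, [("x", "y"), ("y", "z"), ("p", "q")])
```

In the toy input, `y` is labelled in the first round and `z` in the second, while `p` and `q` stay unlabelled because no one in their component disclosed an age.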

Research Method ::

32

Involuntary leakage of age and education records

Inferring education records
• Same method as inferring age
• Divided into four education levels, inferred separately
  Elementary school
  Junior high school
  Senior high school
  College
• Relations are defined by keywords such as “same school”, “same college”, etc.

Research Method ::

33

Inference results

Figure 5: Inference results of users' ages

Results ::

Figure 6: Inference results of users' education records

34

Validation

Cross-validation
• Verify inferred ages
  Based on self-disclosed education records
• Verify inferred education records
  Based on self-disclosed ages
• The age difference should be small
  If it is, our inference results are accurate

Results ::

35

Validation Results

Figure 7: The inferred age differences between pairs of self-disclosed schoolmates in the four education levels

Figure 8: The self-disclosed age differences between pairs of inferred schoolmates in the four education levels

Results ::

36

Threats caused by identity leakage

Stalking

Spamming
• In our data set
  46% of users disclosed a valid email address
  Spam can be sent with friends’ (spoofed) email addresses

Phishing
• Spear phishing / social phishing
  Includes personal information in the phishing email
  Spoofs a friend’s email address

Discussion ::

37

Spear Phishing or Spam

Dear Mr. Richman, We are eBay customer service, we concern about your security, please update your personal information.

Dear Mr. Andrew Richman, You win 100,000,000 USD!! Which from lottery of St. Paul University fund.

Discussion ::

38

Social Phishing or Spam

From sammy_cow@gmail.com:
Hey, Andrew, I am Sammy, I recommend you a cool site!! http://spam.com

From david886@yahoo.com:
Bros, I am David! St. Paul University student association have a party on next month, you need to transfer the registration fee ASAP, see you there.

Discussion ::

39

Potential Solutions

Four possible ways to mitigate the problem
A. Personal privacy settings
B. Browsing scope settings
C. Owner’s confirmation
D. Applying Disclosure Control of Natural Language information (DCNL)
   ‧ Proposed by Haruno Kataoka et al.

Discussion ::

40

Personal Privacy Settings

1. Hide personal information
2. Hide social connections (by level)
3. Deny annotations that use certain words
4. Restrict which users can access friend relations or annotations

Profile: Don’t call me by my real name, call me 007!

Discussion ::

41

Browsing Scope Settings

Prevent large-scale download of user profiles
• Including through third-party APIs

Limit browsing scope
• Group partitioning / “invitation letter” mechanism

Malicious man

Discussion ::

42

Owner’s Confirmation

Required for every operation related to a friend relation

At least prevents unintentional personal information leakage

Friend: I want to use “Cool Andrew Richman”, may I?
Owner: Sure!!! My name is public, everyone knows me!!
Malicious man: Hey Mr. Richman, you are the lucky winner!

Discussion ::

43

Applying DCNL (Haruno Kataoka et al.)

An ideal way to preserve
• Searchability
• Availability
• Connectivity
• While no sensitive information is disclosed
• Rather than being “insecure” or “un-enjoyable”

An implementation is expected
• Support for different languages would be best

Discussion ::

44

Conclusion

We quantify the extent of name leakage
• Using the Wretch data set
• 78% of users are at risk of involuntary name leakage
• Users’ ages and education records are also at risk
  They can be inferred from friends’ disclosed information

Beware of Internet scams and phishing

Conclusion ::

45

Questions?

Thank you!

46

Ratio of self-disclosure

Research Method ::

Figure 1: Ratio of Self-disclosure
