(Ab)using Identifiers Ben Gross University of Illinois Urbana Champaign Library and Information Science bgross@acm.org http://bengross.com/ @ BayCHI 2009-11-10

(Ab)using Identifiers: Indiscernibility of Identity

Ben Gross at BayCHI November 10, 2009

Page 1: (Ab)using Identifiers: Indiscernibility of Identity

@

(Ab)using Identifiers

Ben Gross
University of Illinois Urbana Champaign

Library and Information Science
bgross@acm.org

http://bengross.com/

@ BayCHI 2009-11-10

Page 4: (Ab)using Identifiers: Indiscernibility of Identity

@

How many @ do you have?

Social network profiles

Web site logins

Instant messenger IDs

Email addresses

Phone numbers

Domain names

Page 5: (Ab)using Identifiers: Indiscernibility of Identity

@

All your @’s are belong to us

Page 6: (Ab)using Identifiers: Indiscernibility of Identity

@

Why you might care

•Usability implications

•Productivity implications

•Security implications

•Employee satisfaction

Page 7: (Ab)using Identifiers: Indiscernibility of Identity

@

How did I get here?

•“I only have one email address...”

•“Well, except that one I only use for...”

•“And that other one I use with...”

Page 8: (Ab)using Identifiers: Indiscernibility of Identity

@

Half a million users

“... average user has 6.5 passwords, each of which is shared across 3.9 different sites. Each user has about 25 accounts that require passwords, and types an average of 8 passwords per day.”

Dinei Florêncio and Cormac Herley. A Large-Scale Study of Web Password Habits. WWW ’07

Page 9: (Ab)using Identifiers: Indiscernibility of Identity

@

Population

•Qualitative in-depth interview study

•44 people across two Bay Area firms

•Financial services firm (regulated)

•Design firm (unregulated)

Page 10: (Ab)using Identifiers: Indiscernibility of Identity

@

Data

•Financial services

  Average # of email addresses = 1.8 (min 1 / max 4)
  IM = 1.8 (min 1 / max 4)

•Design firm

  Average # of email addresses = 3.6 (min 1 / max 10)
  IM = 1.7 (min 1 / max 3)

•Combined total

  Average = 3.3

Page 11: (Ab)using Identifiers: Indiscernibility of Identity

@

“The individual in ordinary work situations presents himself and his activity to others, the ways in which he guides and controls the impression they form of him and the kinds of things he may and may not do while sustaining his performance before them.”

Erving Goffman, The Presentation of Self in Everyday Life, 1959.

Page 12: (Ab)using Identifiers: Indiscernibility of Identity

@

Why more than one?

Page 13: (Ab)using Identifiers: Indiscernibility of Identity

@

Social factors

•“I knew that my college one wasn't forever, so I wanted something more permanent after I graduated.”

•“...I didn't like the name that I picked when it was my first email.”

•“...you just say oh my first name and last name at gmail.com ... something easy to remember.”

Page 14: (Ab)using Identifiers: Indiscernibility of Identity

@

Technical factors

•Namespace saturation AKA the [email protected] problem

•Firewalls and VPNs AKA “They don’t let me use Hotmail at work...”

•Configuration problems AKA “What does SMTP-AUTH with MD5 checksums on port 567 mean?”

Page 15: (Ab)using Identifiers: Indiscernibility of Identity

@

Regulatory factors

Page 16: (Ab)using Identifiers: Indiscernibility of Identity

@

It’s Just Data...

“We’re an information economy. They teach you that in school. What they don't tell you is that it's impossible to move, to live, to operate at any level without leaving traces, bits, seemingly meaningless fragments that can be retrieved, amplified...”

William Gibson, Johnny Mnemonic

Page 17: (Ab)using Identifiers: Indiscernibility of Identity

@

What’s Underneath?

•Developer Tools

•FireBug/FireCookie

•Safari Web Inspector

•Charles Proxy/HTTP Analyzer

•Forensic Tools

Page 18: (Ab)using Identifiers: Indiscernibility of Identity

@

Cookies

Page 19: (Ab)using Identifiers: Indiscernibility of Identity

@

More detail

Page 20: (Ab)using Identifiers: Indiscernibility of Identity

@

Bake Your Own

Page 22: (Ab)using Identifiers: Indiscernibility of Identity

@

Referer (sic)

•adsl-75-18-132-43.dsl.pltn13.sbcglobal.net - - [10/Nov/2009:14:50:56 -0800] "GET /wireless.html HTTP/1.1" 200 29149 "http://bengross.com/voip.html" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us) AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9"
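The log entry above is in the common Apache "combined" layout, so each field can be pulled apart mechanically. The following is a minimal sketch of that, not anything shown in the talk: the regular expression, the `parse_line` helper, and the field names are illustrative assumptions.

```python
import re

# Assumed layout (Apache "combined" style):
# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = COMBINED.match(line)
    return m.groupdict() if m else None

# The log line from the slide, split only for readability.
LINE = ('adsl-75-18-132-43.dsl.pltn13.sbcglobal.net - - [10/Nov/2009:14:50:56 -0800] '
        '"GET /wireless.html HTTP/1.1" 200 29149 "http://bengross.com/voip.html" '
        '"Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us) AppleWebKit/531.9 '
        '(KHTML, like Gecko) Version/4.0.3 Safari/531.9"')

entry = parse_line(LINE)
print(entry["host"])     # which ISP line the visitor came from
print(entry["referer"])  # the page the visitor was on before this request
print(entry["agent"])    # browser, OS version, and locale
```

A single request thus reveals the visitor's network, the previous page, and a fairly specific browser and OS build.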

Page 23: (Ab)using Identifiers: Indiscernibility of Identity

@

Leaky Headers

On the Leakage of Personally Identifiable Information Via Online Social Networks

Balachander Krishnamurthy and Craig Wills

Page 24: (Ab)using Identifiers: Indiscernibility of Identity

@

More Options

•URL Munging and Session IDs in URL

•Flash Cookies/Local Shared Object

•Silverlight Cookies

•Virtual Page Views, Event (Google Analytics) User Defined Values

Page 25: (Ab)using Identifiers: Indiscernibility of Identity

@

Synthetic IDs

•Everything in the Referer header can be used to form a synthetic identifier.

•The User Agent is a good source

•IP addresses if you have them

•Screen dimensions, user agent

•Hash of IP address/remote ports
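To make the idea concrete, here is a minimal sketch of how the weak signals listed above might be combined into one stable value. The function name `synthetic_id`, the choice of SHA-256, the field separator, and the 16-character truncation are all assumptions for illustration, not a scheme described in the talk.

```python
import hashlib

def synthetic_id(ip, user_agent, screen, accept_language=""):
    """Hash several individually weak signals into one identifier (hypothetical scheme)."""
    width, height = screen
    # Join the signals with a separator so field boundaries stay unambiguous.
    material = "|".join([ip, user_agent, f"{width}x{height}", accept_language])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()[:16]

ua = ("Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; en-us) "
      "AppleWebKit/531.9 (KHTML, like Gecko) Version/4.0.3 Safari/531.9")

first_visit = synthetic_id("75.18.132.43", ua, (1440, 900), "en-us")
second_visit = synthetic_id("75.18.132.43", ua, (1440, 900), "en-us")
other_screen = synthetic_id("75.18.132.43", ua, (1280, 800), "en-us")

print(first_visit == second_visit)  # same inputs give the same ID, no cookie needed
print(first_visit == other_screen)  # any changed signal gives a different ID
```

The identifier survives cookie deletion because nothing is stored on the client; it is recomputed from what the browser sends on every request.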

Page 26: (Ab)using Identifiers: Indiscernibility of Identity

@

Other Sources of Bits

•Last Modified and ETag headers

•HTTP Keepalive

•SSL Session IDs

•TCP Timestamps
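As one example from this list, an ETag can carry an identifier because browsers send the cached value back in `If-None-Match` when revalidating. The sketch below simulates only the server-side decision as a plain function; `respond`, its return shape, and the use of a random UUID as the tag are assumptions for illustration.

```python
import uuid

def respond(request_headers):
    """Return (status, response_headers, visitor_id) for a cacheable resource."""
    tag = request_headers.get("If-None-Match")
    if tag:
        # Returning visitor: the browser replayed the ETag issued earlier.
        return 304, {"ETag": tag}, tag.strip('"')
    # New visitor: issue a unique ETag that doubles as an identifier.
    visitor_id = uuid.uuid4().hex
    headers = {"ETag": f'"{visitor_id}"', "Cache-Control": "private, no-cache"}
    return 200, headers, visitor_id

status1, headers1, id1 = respond({})
status2, headers2, id2 = respond({"If-None-Match": headers1["ETag"]})

print(status1, status2)  # 200 304
print(id1 == id2)        # True
```

Clearing cookies does not help here; the value lives in the browser cache.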

Page 27: (Ab)using Identifiers: Indiscernibility of Identity

@

The Art of Being Lost

•“We do not collect personal contact information from visitors to your website. Personal contact information means billing address, physical address, individual name, email address, etc.” (OpenTracker.com)

Page 28: (Ab)using Identifiers: Indiscernibility of Identity

@

Netflix Data Released

•Dataset contains 100,480,507 movie ratings, created by 480,189 Netflix subscribers between December 1999 and December 2005.

•“...all customer identifying information has been removed; all that remains are ratings and dates. This follows our privacy policy...”

•No unique identifiers or quasi-identifiers

Page 29: (Ab)using Identifiers: Indiscernibility of Identity

@

You Only Need Two

•Robust De-anonymization of Large Sparse Datasets by Arvind Narayanan and Vitaly Shmatikov

•IMDb as a source of entropy

•“With 8 movie ratings (of which 2 may be completely wrong) and dates that may have a 14-day error, 99% of records can be uniquely identified in the dataset.”
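The matching idea in the quote can be illustrated with a toy example. This is a deliberately simplified sketch, not Narayanan and Shmatikov's scoring algorithm: it counts how many publicly known (movie, rating, date) observations agree with each anonymized record within a date window, and accepts the record only if it clearly beats the runner-up. All names and the toy data are hypothetical.

```python
from datetime import date

def matches(aux_item, record, max_days=14, max_rating_diff=0):
    """True if one known (movie, rating, date) observation fits the record."""
    movie, rating, day = aux_item
    if movie not in record:
        return False
    rec_rating, rec_day = record[movie]
    return (abs(rec_rating - rating) <= max_rating_diff
            and abs((rec_day - day).days) <= max_days)

def best_candidate(aux, dataset, min_hits):
    """Return the record ID that best fits the auxiliary data, or None if ambiguous."""
    if not dataset:
        return None
    scores = {rid: sum(matches(item, rec) for item in aux)
              for rid, rec in dataset.items()}
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    top_id, top = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else -1
    # Require enough hits and a strict margin over the second-best record.
    return top_id if top >= min_hits and top > runner_up else None

# Toy "anonymized" dataset: record ID -> {movie: (rating, date)}.
dataset = {
    "r1": {"A": (5, date(2004, 3, 1)), "B": (2, date(2004, 5, 10)), "C": (4, date(2005, 1, 2))},
    "r2": {"A": (5, date(2003, 7, 7)), "D": (3, date(2004, 5, 12))},
}
# What an attacker might learn from public reviews; the last rating is wrong.
aux = [("A", 5, date(2004, 3, 9)), ("B", 2, date(2004, 5, 1)), ("C", 1, date(2005, 1, 3))]

print(best_candidate(aux, dataset, min_hits=2))  # r1
```

Even with one wrong rating and dates off by more than a week, the sparse record is singled out, which is the point of the slide.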

Page 30: (Ab)using Identifiers: Indiscernibility of Identity

@

It comes down to this

“Q: If you don't publicly rate movies on IMDb and similar forums, there is nothing to worry about.

A: ...you should not ever mention any movies you watched prior to 2005 on a public blog or website.

Everybody who was a Netflix subscriber prior to 2005 should restrain themselves from these activities...

We do not think this is a feasible privacy policy.”

FAQ, “How to Break Anonymity of the Netflix Prize Dataset”

Page 31: (Ab)using Identifiers: Indiscernibility of Identity

@

Guessing Your SSN

•Predicting Social Security Numbers from Public Data by Alessandro Acquisti and Ralph Gross

•...I’ll just need the last 4 of your SSN for verification purposes...

•“...we accurately predicted the first 5 digits of 2% of California records with 1980 birthdays, and 90% of Vermont records with 1995 birthdays.”

Page 32: (Ab)using Identifiers: Indiscernibility of Identity

@

Disclosure and UI

•“Facebook Beacon is a way for you to bring actions you take online into Facebook. Beacon works by allowing affiliate websites to send stories about actions you take to Facebook.”

•Launched November 2007

•Class action lawsuit August 2008

•Shut down September 2009

Page 33: (Ab)using Identifiers: Indiscernibility of Identity

@

Opt Out: First Try

Page 34: (Ab)using Identifiers: Indiscernibility of Identity

@

Opt Out: Second Try

Page 35: (Ab)using Identifiers: Indiscernibility of Identity

@

Evasion

•Ghostery

•Opt Out Tools

•Ad Blockers/Flash Blockers

•HTTP Cookie/LSO Managers

•Header Modification Tools

•Proxies/Tor

Page 40: (Ab)using Identifiers: Indiscernibility of Identity

@

What’s Next?

•Geolocation

•Roll up for more large collections

•More of the additional bits needed for de-anonymization are available via social networks

Page 41: (Ab)using Identifiers: Indiscernibility of Identity

@

Ben Gross
University of Illinois Urbana Champaign

Library and Information Science
bgross@acm.org

http://bengross.com/

@