Location Privacy. Where do we stand and where are we going?
Fernando Pérez-González
Signal Theory and Communications Department
Universidad de Vigo - SPAIN
Why do we like location-based apps?
Google maps
Foursquare
Facebook place tips
Waze
And, of course…
How can you be geolocated? (without you fully knowing)
IP-based Geolocation
Source: GeoIPTool
Metadata-based Geolocation
Landmark recognition Geolocation
Biometric geolocation
Credit card usage Geolocation
Triangulation and other geolocation techniques
Signal strength-based triangulation
Source: The Wrongful Convictions Blog
Multilateration: Time Difference of Arrival (TDOA)
Source: [Fujii et al. 2015]
Wardriving geolocation (Wigle)
Source: Wigle.net
Electrical Network Frequency Geolocation
Why is it dangerous?
Buster busted!
6 months in the life of Malte Spitz (2009-2010)
Source: http://www.zeit.de/datenschutz/malte-spitz-data-retention
Are we concerned about it?
Are people really concerned about location privacy?
• Survey by Skyhook Wireless (July 2015) of 1,000 smartphone app users.
• 40% hesitate to share or don't share location with apps.
• 20% turned off location for all their apps.
• Why don't people share their location?
  • 50% privacy concerns.
  • 23% don't see value in location data.
  • 19% say it drains their battery.
• Why do people turn off location?
  • 63% battery draining.
  • 45% privacy.
  • 20% avoid advertising.
How much is geolocation data worth?
How much value do we give to location data? [Staiano et al. 2014]
[Figure: daily value (€) assigned by participants to each data category]
• Many participants opted out of revealing geolocation information.
• Avg. daily value of location info: €3.
• Strong correlation between the amount traveled and the value given to location data.
Earn money as you share data
• GeoTask
• £1 PayPal cash voucher per 100 days of location data sharing (£0.01/day)
Financial Times in 2013: advertisers are willing to pay a mere $0.0005 per person for general information such as their age, gender and location, or $0.50 per 1,000 people.
Pay as you drive
• The formula can be a function of the number of miles driven, the type of driving, the age of the driver, the types of roads used…
• Up to 40% reduction in the cost of insurance.
BIA/Kelsey projects U.S. location-targeted mobile ad spending to grow from $9.8 billion in 2015 to $29.5 billion in 2020.
That's about $90 per person per year!
SAP (Germany) estimates wireless-carrier revenue from selling mobile-user behavior data at $5.5 billion in 2015 and predicts $9.6 billion for 2016.
How about anonymization/pseudonymization?
Anonymity
Problems:
• Difficult authentication and personalization.
• The operating system or apps may access location before anonymization.

[Diagram: user → location → anonymity provider (local/central) → location → service provider]
Pseudonymity
Problems:
• The operating system or apps may access location data before pseudonymization.
• Deanonymization.

[Diagram: user → pseudonym + location → service provider]
Deanonymization based on home location [Hoh, Gruteser 2006]
• Data from GPS traces of the larger Detroit area (1 min resolution).
• No data when the vehicle is parked.
• K-means algorithm for clustering locations + 2 heuristics:
  • Eliminate centroids that don't have evening visits.
  • Eliminate centroids outside residential areas (manually).
Source: [Hoh, Gruteser 2006]
Deanonymization based on home location [Krumm 2007]
• 2-week GPS data from 172 subjects (avg. 6 s resolution).
• Use a heuristic to single out trips by car.
• Then use several heuristics: the destination closest to 3 a.m. is home; the place where the individual spends most time is home; the center of the cluster with most points is home.
• Use reverse geocoding and white pages to deanonymize. Success is measured by finding out the name of the individual.
• Positive identification rates around 5%.
• Even noise addition with std = 500 m gives around 5% success when measured by finding the correct address.
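The home-finding heuristics above can be sketched in a few lines. The night window, the ~100 m rounding grid, and the function name below are illustrative choices of mine, not parameters from the paper.

```python
from collections import Counter

def guess_home(points, night=(0, 6)):
    """Krumm-style sketch: guess home as the grid cell holding the most
    night-time GPS fixes. points: iterable of (hour, lat, lon) tuples."""
    cells = Counter()
    for hour, lat, lon in points:
        if night[0] <= hour < night[1]:  # keep only night-time fixes
            # rounding to 3 decimals (~100 m) groups nearby fixes into one cell
            cells[(round(lat, 3), round(lon, 3))] += 1
    return max(cells, key=cells.get) if cells else None
```

Combined with reverse geocoding of the returned cell, this is essentially the attack pipeline described above.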
Mobile trace uniqueness [de Montjoye et al. 2013]
• Study on 15 months of mobility data; 0.5M individuals.
• With hourly updates and spatial resolution given by cell carrier antennas, only 4 points suffice to identify 95% of individuals.
• Uniqueness of mobility traces decays as the 1/10th power of their resolution.
Source: [de Montjoye et al. 2013]
Location privacy protection mechanisms
Location white lies
Source: Caro Spark (CC BY-NC-ND)
Location based privacy mechanisms
[Diagram: input location X → LPPM → output pseudolocation Z]
Source: Motherboards.org
Location privacy protection mechanisms (LPPMs)
• The mechanism maps the true location X to an output pseudolocation Z.
• The mechanism may be deterministic (e.g., quantization) or stochastic (e.g., noise addition).
• The function may depend on other contextual (e.g., time) or user-tunable (e.g., privacy level) parameters.
• When the mechanism is stochastic, there is an underlying probability density function, i.e., f(Z|X).
Hiding
Perturbation: (independent) noise addition
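A minimal sketch of independent noise addition in latitude/longitude coordinates; the Gaussian noise model and the 300 m default scale are illustrative assumptions of mine, not taken from any particular scheme.

```python
import math
import random

def perturb(lat, lon, sigma_m=300.0):
    """Stochastic LPPM sketch: add independent zero-mean Gaussian noise.

    sigma_m: noise standard deviation in meters (illustrative value).
    One degree of latitude is ~111,320 m; longitude shrinks by cos(lat)."""
    m_per_deg = 111320.0
    dlat = random.gauss(0.0, sigma_m) / m_per_deg
    dlon = random.gauss(0.0, sigma_m) / (m_per_deg * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon
```

Each call reports a fresh noisy location, so repeated queries from the same place leak information by averaging, one reason noise addition alone is a weak defense.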
Perturbation: quantization
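A deterministic counterpart is quantization: snap every location to the center of a grid cell. The 0.01° cell size below is an illustrative choice (roughly 1 km at mid latitudes).

```python
import math

def quantize(lat, lon, cell=0.01):
    """Deterministic LPPM sketch: report the center of the grid cell
    containing (lat, lon). All points inside a cell map to the same output."""
    qlat = (math.floor(lat / cell) + 0.5) * cell
    qlon = (math.floor(lon / cell) + 0.5) * cell
    return qlat, qlon
```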
Obfuscation
Spatial Cloaking
How to commit the perfect murder
Space-time Cloaking
[Figure: cloaking region extended along the time axis]
Dummies
User-centric vs. Centralized LPPM
User-centric
User-centric vs. Centralized LPPM
Centralized
Utility vs. Privacy
[Figure: utility decreases as privacy increases]
• In broad terms: the more privacy, the less utility.
Very nice, but…
• There are two main problems:
  • How do we measure utility?
  • How do we measure privacy?
How to measure utility?
[Figure: utility loss as the distance between the real position and the reported pseudolocation]
A note about distances
[Figure: two distance measures, d₁ and d₂]
Adversarial definition of privacy [Shokri et al 2011-]
• Assume a stochastic mechanism f(Z|X) for the user.
• The adversary constructs a (possibly stochastic) estimation remapping r(X̂|Z).
• The prior π(X) is assumed available to the adversary.
• d_p(x̂, x): distance between the estimate x̂ and the true location x.
• d_q(x, z): distance between the true location x and the pseudolocation z.

[Diagram: x → LPPM → z → Adversary → x̂]
Adversarial definition of privacy [Shokri et al 2011-]
• Establish a cap on the average utility loss: E{d_q(X, Z)} ≤ QL.
• This is a Stackelberg game in which the user chooses first and the adversary plays second.
• Find the optimal adversarial 'remapping':
  r*(X̂|Z) = argmin E{d_p(X̂, X) | Z},
  where E{d_p(X̂, X) | Z} = Σ over x̂, x of r(x̂|Z) f(x|Z) d_p(x̂, x)
  and, by Bayes' rule, f(X|Z) = f(Z|X) π(X) / f(Z)   (LPPM × prior).
• The optimal remapping depends on f(Z|X) and π(X).
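On a discrete set of locations the optimal remapping can be computed directly from the posterior; the dictionary encoding and the toy numbers in the test are my own sketch of the idea, not code from the papers.

```python
def optimal_remap(prior, mech, d_p, z):
    """Adversary's optimal remapping r*(x̂|z) on a discrete location set.

    prior: dict x -> π(x); mech: dict (z, x) -> f(z|x); d_p(x̂, x): distance.
    Computes the posterior f(x|z) = f(z|x)·π(x)/f(z) and returns the x̂
    minimizing the conditional expected distance E{d_p(x̂, X) | Z=z}."""
    f_z = sum(mech[(z, x)] * prior[x] for x in prior)
    post = {x: mech[(z, x)] * prior[x] / f_z for x in prior}
    # restrict candidate estimates x̂ to the prior's support (a modeling choice)
    return min(prior, key=lambda xh: sum(post[x] * d_p(xh, x) for x in prior))
```

With a symmetric mechanism and a uniform prior the best estimate is simply the reported location, which is the "do nothing" case discussed below.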
Example: uniform noise addition
[Figure: prior π(x), uniform-noise LPPM density f(Z = z|X = x), and the adversary's remapped estimate x̂]
Adversarial definition of privacy [Shokri et al 2011-]
• When for a given Z there are several minimizers, the function r*(X̂|Z) becomes stochastic.
• The user now must maximize privacy:
  max over f(Z|X) of E{d_p(X̂, X)} = max of Σ over z, x̂, x of r*(x̂|z) f(z|x) π(x) d_p(x̂, x),
  which is achieved for some mechanism f*(Z|X).
• Privacy is defined as E{d_p(X̂, X)} after solving this maxmin problem.
An interesting result
• When d_p = d_q:
  r*(X̂|Z = z) = δ(X̂ − z), i.e., do nothing!
• When d_p = d_q, the following identity must hold: Privacy = Utility Loss.
• When both user and adversary play optimally:
  f*(Z = z|X) = argmin over z of E{d_p(z, X)};
  for d_p = d_q = d₂, ẑ = E{X | Z = z}.
The Utility Loss-Privacy plane
[Figure: Utility Loss vs. Privacy plane. The optimal mechanism bounds one achievable region and the optimal adversary another; adversarial strategies 1-4 lie on the adversary's playing line P = UL.]
What’s wrong with priors?
• Is it realistic to assume that the adversary knows the prior?
• The adversary no longer plays optimally with the 'wrong' prior.
• Shokri's privacy definition is prior-dependent.
• The definition of differential privacy is prior-independent:
  |log Pr{A(D₁) ∈ S} − log Pr{A(D₂) ∈ S}| ≤ ε
  - D₁, D₂: two databases differing in a single element.
  - A: randomized algorithm.
  - S: any subset of im(A).
Geoindistinguishability [Chatzikokolakis et al 2013-]
• A mechanism is ε-geo-indistinguishable iff
  |log f(z|X = x) − log f(z|X = x′)| ≤ ε d_p(x, x′)  for all x, x′, z.
• Differential privacy corresponds to d_p = Hamming distance.
• The definition is prior-independent.
• It guarantees a small leakage of information BUT is no defense against EVERY adversary: with proper side information, the adversary can learn a lot!
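For a discrete mechanism the condition can be checked mechanically; the dict-of-dicts encoding and the numerical tolerance below are assumptions of this sketch.

```python
import math

def is_geo_ind(mech, eps, d):
    """Check ε-geo-indistinguishability of a discrete mechanism.

    mech[x][z] = f(z|X=x). Fails if some log-ratio exceeds ε·d(x, x'),
    in particular when f(z|x) > 0 but f(z|x') = 0 (unbounded ratio)."""
    locs = list(mech)
    outputs = {z for x in locs for z in mech[x]}
    for x in locs:
        for xp in locs:
            for z in outputs:
                fx = mech[x].get(z, 0.0)
                fxp = mech[xp].get(z, 0.0)
                if fx == 0.0 and fxp == 0.0:
                    continue
                if fx == 0.0 or fxp == 0.0:
                    return False  # one density vanishes where the other does not
                if abs(math.log(fx) - math.log(fxp)) > eps * d(x, xp) + 1e-12:
                    return False
    return True
```

A truncated-uniform mechanism fails this check because some log-ratios are unbounded, consistent with the observation below that uniform mechanisms are not geo-indistinguishable.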
Uniform mechanisms do not provide geo-ind
[Figure: two overlapping uniform densities f(Z|X = x) and f(Z|X = x′); wherever one is zero and the other is not, |log f(z|X = x) − log f(z|X = x′)| is unbounded, so the geo-ind condition cannot hold]
Laplacian mechanism
• Laplacian distribution in polar coordinates:
  f(z|X = x) = (ε²/2π) e^(−ε d₂(x, z))
• Then,
  |log f(z|X = x) − log f(z|X = x′)| = ε |d₂(z, x) − d₂(z, x′)| ≤ ε d₂(x, x′)  (triangle inequality)
• The Laplacian mechanism satisfies the geo-ind condition.
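Sampling from this density is straightforward in polar coordinates: the angle is uniform, and the radius has density ε²·r·e^(−εr), a Gamma(2, 1/ε) distribution, so the mean displacement is 2/ε. A minimal sketch (planar coordinates and names are my own choices):

```python
import math
import random

def planar_laplace(x, y, eps):
    """Draw a pseudolocation z from f(z|x) = (ε²/2π)·exp(−ε·d₂(x, z)).

    Angle: uniform on [0, 2π). Radius: p(r) ∝ r·exp(−ε·r), i.e.
    Gamma(shape=2, scale=1/ε), so the expected displacement is 2/ε."""
    theta = random.uniform(0.0, 2.0 * math.pi)
    r = random.gammavariate(2.0, 1.0 / eps)
    return x + r * math.cos(theta), y + r * math.sin(theta)
```

For coordinates in meters, eps = 0.01 m⁻¹ gives an expected displacement of 200 m; ε directly trades privacy for utility.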
Optimal mechanisms for geo-ind
• Minimize the quality loss (i.e., E{d_q(X, Z)}) subject to the geo-ind constraint.
• Fact: the geo-ind constraint is preserved under any adversarial remapping r(X̂|Z).
• The optimal mechanism is then
  f*(Z|X) = argmin E{d_q(X, Z)},
  where E{d_q(X, Z)} = Σ over X, Z of f(Z|X) π(X) d_q(X, Z).
• The optimal adversarial remapping would find
  r*(X̂|Z) = argmin E{d_p(X̂, X) | Z}.
Optimal mechanisms for geo-ind
• If d_p = d_q, the adversary does nothing: the minimization of the QL has already been done by the mechanism!
• But if the adversary does nothing, Privacy = QL.
• The operating value thus depends on ε (the smaller ε, the larger the privacy).
Where are we going?
Sensitivity [Bertino et al. 2010]
• The mechanism should weigh the importance given by the user to each location.
• This can be specified semantically by defining categories.
• Sensitivity of a region: the probability that the user, known to be in that region, is actually in a sensitive place.
• For other mechanisms: an open problem.
Graph-based models

[Figure: the map is abstracted into a graph of locations; a trace is a path on the graph]
Graph-based models
• A trace is a path together with times: {(X_i, t_i)}, i = 1, …, N.
• Common assumption for an adversary: the true trace can be described through a Markov chain.
• Prior transition probabilities P(S_m|S_n) between states can be estimated if training traces are (at least partially) available.

[Figure: training data yields state probabilities P(S_k) and transition probabilities P(S_m|S_n) on the graph]
Graph-based models
• Shokri et al.'s approach: depending on what the adversary wants to learn, apply a different method.
• Maximum likelihood: find the most likely trace given the observed trace:
  {X̂_i, t_i} = argmax over {X_i, t_i} of f({Z_i, t_i} | {X_i, t_i}), i = 1, …, N.
• Dynamic programming (e.g., the Viterbi algorithm) can be used.
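A minimal Viterbi sketch for this maximum-likelihood attack on a discrete state graph; the dictionary encoding and the two-state example in the test are illustrative, not Shokri et al.'s implementation.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence given the observed (obfuscated) trace.

    start_p[s]: prior; trans_p[s][s2]: Markov transition probabilities;
    emit_p[s][o]: probability that true state s is reported as o (the LPPM)."""
    # best[s] = (probability of the best path ending in s, that path)
    best = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        prev = best
        best = {}
        for s in states:
            # pick the predecessor r maximizing the path probability into s
            p, path = max(((prev[r][0] * trans_p[r][s], prev[r][1])
                           for r in states), key=lambda t: t[0])
            best[s] = (p * emit_p[s][o], path + [s])
    return max(best.values(), key=lambda t: t[0])[1]
```

With transition priors estimated from training traces (previous slide) and the LPPM as the emission model, this recovers the most likely true trace from the observed one.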
Graph-based models
• Distribution estimation: estimate the probabilities of all traces using the Metropolis-Hastings algorithm.
Graph-based models
• Location estimation: find the most likely node at time t_k:
  X̂_k = argmax over X_k of f(X_k | {Z_i, t_i}, i = 1, …, N).
• Can be solved using the forward-backward algorithm to recursively compute the probabilities.
Privacy as a zero-sum game
[Figure: Utility Loss vs. Privacy plane; when the optimal mechanism plays against the optimal adversary, the game sits on the P = UL line, where Privacy + Utility = constant.]
Adding a new dimension: bandwidth
• Example: send dummies on a grid, n = 3 (8 dummies).
• Without dummies (spacing s): Privacy ≈ d₂(s), Utility Loss ≈ d₂(s).
• With dummies spread over S: Privacy ≈ d₂(S), Utility Loss ≈ d₂(s) = d₂(S/3).
The Utility Loss-Privacy-Bandwidth region
[Figure: Utility Loss vs. Privacy with 9 times the bandwidth; the optimal-mechanism region now reaches P = 3·UL instead of P = UL, a privacy gain due to dummying at the cost of service-provider and user utility loss.]
Space-time cloaking
• Privacy ≈ k-anonymity ≈ area × time × pop. density
• Utility Loss ≈ area
• Delay ≈ time
Privacy-preserving queries
Retrieval in the Encrypted Domain

[Diagram: the user sends an encrypted query to the service provider and receives an encrypted reply]
Thanks!
Grupo Procesado de Señal en Comunicaciones
What utility? An example
• Privacy ≈ k-anonymity ≈ area × time × pop. density
• Utility ≈ 1/area ≈ 1/d²_max
But delay also counts…
[Figure: utility vs. privacy trade-off curves for delays of 5, 10, and 15 minutes]
What utility? Another example
• Space-time slicing
• Is this related to bandwidth?