Location Privacy. Where do we stand and where are we going?
Fernando Pérez-González
Signal Theory and Communications Department
Universidad de Vigo - SPAIN
Why do we like location-based apps?
Google maps
Foursquare
Facebook place tips
Waze
And, of course…
How can you be geolocated? (without you fully knowing)
IP-based Geolocation
Source: GeoIPTool
Metadata-based Geolocation
Landmark recognition Geolocation
Biometric geolocation
Credit card usage Geolocation
Triangulation and other geolocation techniques
Signal strength-based triangulation
Source: The Wrongful Convictions Blog
Multilateration: Time Difference of Arrival (TDOA)
Source: [Fujii et al. 2015]
Wardriving geolocation (Wigle)
Source: Wigle.net
Electrical Network Frequency Geolocation
Why is it dangerous?
Buster busted!
6 months in the life of Malte Spitz (2009-2010)
Source: http://www.zeit.de/datenschutz/malte-spitz-data-retention
Are we concerned about it?
Are people really concerned about location privacy?
• Survey by Skyhook Wireless (July 2015) of 1,000 smartphone app users.
• 40% hesitate to share or don't share location with apps.
• 20% turned off location for all their apps.
• Why don't people share their location?
  • 50% privacy concerns.
  • 23% don't see value in location data.
  • 19% say it drains their battery.
• Why do people turn off location?
  • 63% battery draining.
  • 45% privacy.
  • 20% avoid advertising.
How much is geolocation data worth?
How much value do we give to location data? [Staiano et al. 2014]
[Figure: daily value (€) assigned by participants to each data category]
• Many participants opted out of revealing geolocation information.
• Avg. daily value of location info: €3.
• Strong correlation between the amount traveled and the value given to location data.
Earn money as you share data
• GeoTask
• £1 PayPal cash voucher per 100 days of location data sharing (£0.01/day)
Financial Times in 2013: advertisers are willing to pay a mere $0.0005 per person for general information such as their age, gender and location, or $0.50 per 1,000 people.
Pay as you drive
• The formula can be a function of the number of miles driven, the type of driving, the age of the driver, the types of roads used…
• Up to 40% reduction in the cost of insurance.
BIA/Kelsey projects U.S. location-targeted mobile ad spending to grow from $9.8 billion in 2015 to $29.5 billion in 2020.
That's about $90 per person per year!
SAP (Germany) estimates wireless-carrier revenue from selling mobile-user behavior data at $5.5 billion in 2015 and predicts $9.6 billion for 2016.
How about anonymization/pseudonymization?
Anonymity
Problems:
• Difficult authentication and personalization.
• The operating system or apps may access location before anonymization.

[Diagram: user → location → anonymity provider (local/central) → location → service provider]
Pseudonymity
Problems:
• The operating system or apps may access location data before pseudonymization.
• Deanonymization.

[Diagram: user → pseudonym + location → service provider]
Deanonymization based on home location [Hoh, Gruteser 2006]
• Data from GPS traces of the larger Detroit area (1 min resolution).
• No data when the vehicle is parked.
• K-means algorithm for clustering locations + 2 heuristics:
  • Eliminate centroids that don't have evening visits.
  • Eliminate centroids outside residential areas (manually).
Source: [Hoh, Gruteser 2006]
Deanonymization based on home location [Krumm 2007]
• 2-week GPS data from 172 subjects (avg. 6 s resolution).
• Use a heuristic to single out trips by car.
• Then use several heuristics: the destination closest to 3 a.m. is home; the place where the individual spends most time is home; the center of the cluster with most points is home.
• Use reverse geocoding and white pages to deanonymize. Success is measured by finding out the name of the individual.
• Positive identification rates around 5%.
• Even noise addition with std = 500 m gives around 5% success when measured by finding the correct address.
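The home-finding heuristics above can be sketched in a few lines. The night window, the ~100 m rounding grid, and the function name below are illustrative choices of mine, not parameters from the paper.

```python
from collections import Counter

def guess_home(points, night=(0, 6)):
    """Krumm-style sketch: guess home as the grid cell holding the most
    night-time GPS fixes. points: iterable of (hour, lat, lon) tuples."""
    cells = Counter()
    for hour, lat, lon in points:
        if night[0] <= hour < night[1]:  # keep only night-time fixes
            # rounding to 3 decimals (~100 m) groups nearby fixes into one cell
            cells[(round(lat, 3), round(lon, 3))] += 1
    return max(cells, key=cells.get) if cells else None
```

Combined with reverse geocoding of the returned cell, this is essentially the attack pipeline described above.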
Mobile trace uniqueness [de Montjoye et al. 2013]
• Study on 15 months of mobility data; 0.5M individuals.
• With hourly updates and spatial resolution given by cell carrier antennas, only 4 points suffice to identify 95% of individuals.
• Uniqueness of mobility traces decays as the 1/10th power of their resolution.
Source: [de Montjoye et al. 2013]
Location privacy protection mechanisms
Location white lies
Source: Caro Spark (CC BY-NC-ND)
Location based privacy mechanisms
[Diagram: input location X → LPPM → output pseudolocation Z]
Source: Motherboards.org
Location privacy protection mechanisms (LPPMs)
• The mechanism maps the true location X to an output pseudolocation Z.
• The mechanism may be deterministic (e.g., quantization) or stochastic (e.g., noise addition).
• The function may depend on other contextual (e.g., time) or user-tunable (e.g., privacy level) parameters.
• When the mechanism is stochastic, there is an underlying probability density function, i.e., f(Z|X).
Hiding
Perturbation: (independent) noise addition
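A minimal sketch of independent noise addition in latitude/longitude coordinates; the Gaussian noise model and the 300 m default scale are illustrative assumptions of mine, not taken from any particular scheme.

```python
import math
import random

def perturb(lat, lon, sigma_m=300.0):
    """Stochastic LPPM sketch: add independent zero-mean Gaussian noise.

    sigma_m: noise standard deviation in meters (illustrative value).
    One degree of latitude is ~111,320 m; longitude shrinks by cos(lat)."""
    m_per_deg = 111320.0
    dlat = random.gauss(0.0, sigma_m) / m_per_deg
    dlon = random.gauss(0.0, sigma_m) / (m_per_deg * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon
```

Each call reports a fresh noisy location, so repeated queries from the same place leak information by averaging, one reason noise addition alone is a weak defense.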
Perturbation: quantization
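A deterministic counterpart is quantization: snap every location to the center of a grid cell. The 0.01° cell size below is an illustrative choice (roughly 1 km at mid latitudes).

```python
import math

def quantize(lat, lon, cell=0.01):
    """Deterministic LPPM sketch: report the center of the grid cell
    containing (lat, lon). All points inside a cell map to the same output."""
    qlat = (math.floor(lat / cell) + 0.5) * cell
    qlon = (math.floor(lon / cell) + 0.5) * cell
    return qlat, qlon
```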
Obfuscation
Spatial Cloaking
How to commit the perfect murder
Space-time Cloaking
[Figure: cloaking region extended along the time axis]
Dummies
User-centric vs. Centralized LPPM
User-centric
User-centric vs. Centralized LPPM
Centralized
Utility vs. Privacy
[Figure: utility decreases as privacy increases]
• In broad terms: the more privacy, the less utility.
Very nice, but…
• There are two main problems:
  • How do we measure utility?
  • How do we measure privacy?
How to measure utility?
[Figure: utility loss as the distance between the real position and the reported pseudolocation]
A note about distances
[Figure: two distance measures, d₁ and d₂]
Adversarial definition of privacy [Shokri et al 2011-]
• Assume a stochastic mechanism f(Z|X) for the user.
• The adversary constructs a (possibly stochastic) estimation remapping r(X̂|Z).
• The prior π(X) is assumed available to the adversary.
• d_p(x̂, x): distance between the estimate x̂ and the true location x.
• d_q(x, z): distance between the true location x and the pseudolocation z.

[Diagram: x → LPPM → z → Adversary → x̂]
Adversarial definition of privacy [Shokri et al 2011-]
• Establish a cap on the average utility loss: E{d_q(X, Z)} ≤ QL.
• This is a Stackelberg game in which the user chooses first and the adversary plays second.
• Find the optimal adversarial 'remapping':
  r*(X̂|Z) = argmin E{d_p(X̂, X) | Z},
  where E{d_p(X̂, X) | Z} = Σ over x̂, x of r(x̂|Z) f(x|Z) d_p(x̂, x)
  and, by Bayes' rule, f(X|Z) = f(Z|X) π(X) / f(Z)   (LPPM × prior).
• The optimal remapping depends on f(Z|X) and π(X).
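On a discrete set of locations the optimal remapping can be computed directly from the posterior; the dictionary encoding and the toy numbers in the test are my own sketch of the idea, not code from the papers.

```python
def optimal_remap(prior, mech, d_p, z):
    """Adversary's optimal remapping r*(x̂|z) on a discrete location set.

    prior: dict x -> π(x); mech: dict (z, x) -> f(z|x); d_p(x̂, x): distance.
    Computes the posterior f(x|z) = f(z|x)·π(x)/f(z) and returns the x̂
    minimizing the conditional expected distance E{d_p(x̂, X) | Z=z}."""
    f_z = sum(mech[(z, x)] * prior[x] for x in prior)
    post = {x: mech[(z, x)] * prior[x] / f_z for x in prior}
    # restrict candidate estimates x̂ to the prior's support (a modeling choice)
    return min(prior, key=lambda xh: sum(post[x] * d_p(xh, x) for x in prior))
```

With a symmetric mechanism and a uniform prior the best estimate is simply the reported location, which is the "do nothing" case discussed below.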
Example: uniform noise addition
[Figure: prior π(x), uniform-noise LPPM density f(Z = z|X = x), and the adversary's remapped estimate x̂]
Adversarial definition of privacy [Shokri et al 2011-]
• When for a given Z there are several minimizers, the function r*(X̂|Z) becomes stochastic.
• The user now must maximize privacy:
  max over f(Z|X) of E{d_p(X̂, X)} = max of Σ over z, x̂, x of r*(x̂|z) f(z|x) π(x) d_p(x̂, x),
  which is achieved for some mechanism f*(Z|X).
• Privacy is defined as E{d_p(X̂, X)} after solving this maxmin problem.
An interesting result
• When d_p = d_q:
  r*(X̂|Z = z) = δ(X̂ − z), i.e., do nothing!
• When d_p = d_q, the following identity must hold: Privacy = Utility Loss.
• When both user and adversary play optimally:
  f*(Z = z|X) = argmin over z of E{d_p(z, X)};
  for d_p = d_q = d₂, ẑ = E{X | Z = z}.
The Utility Loss-Privacy plane
[Figure: Utility Loss vs. Privacy plane. The optimal mechanism bounds one achievable region and the optimal adversary another; adversarial strategies 1-4 lie on the adversary's playing line P = UL.]
What’s wrong with priors?
• Is it realistic to assume that the adversary knows the prior?
• The adversary no longer plays optimally with the 'wrong' prior.
• Shokri's privacy definition is prior-dependent.
• The definition of differential privacy is prior-independent:
  |log Pr{A(D₁) ∈ S} − log Pr{A(D₂) ∈ S}| ≤ ε
  - D₁, D₂: two databases differing in a single element.
  - A: randomized algorithm.
  - S: any subset of im(A).
Geoindistinguishability [Chatzikokolakis et al 2013-]
• A mechanism is ε-geo-indistinguishable iff
  |log f(z|X = x) − log f(z|X = x′)| ≤ ε d_p(x, x′)  for all x, x′, z.
• Differential privacy corresponds to d_p = Hamming distance.
• The definition is prior-independent.
• It guarantees a small leakage of information BUT is no defense against EVERY adversary: with proper side information, the adversary can learn a lot!
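For a discrete mechanism the condition can be checked mechanically; the dict-of-dicts encoding and the numerical tolerance below are assumptions of this sketch.

```python
import math

def is_geo_ind(mech, eps, d):
    """Check ε-geo-indistinguishability of a discrete mechanism.

    mech[x][z] = f(z|X=x). Fails if some log-ratio exceeds ε·d(x, x'),
    in particular when f(z|x) > 0 but f(z|x') = 0 (unbounded ratio)."""
    locs = list(mech)
    outputs = {z for x in locs for z in mech[x]}
    for x in locs:
        for xp in locs:
            for z in outputs:
                fx = mech[x].get(z, 0.0)
                fxp = mech[xp].get(z, 0.0)
                if fx == 0.0 and fxp == 0.0:
                    continue
                if fx == 0.0 or fxp == 0.0:
                    return False  # one density vanishes where the other does not
                if abs(math.log(fx) - math.log(fxp)) > eps * d(x, xp) + 1e-12:
                    return False
    return True
```

A truncated-uniform mechanism fails this check because some log-ratios are unbounded, consistent with the observation below that uniform mechanisms are not geo-indistinguishable.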
Uniform mechanisms do not provide geo-ind
[Figure: two overlapping uniform densities f(Z|X = x) and f(Z|X = x′); wherever one is zero and the other is not, |log f(z|X = x) − log f(z|X = x′)| is unbounded, so the geo-ind condition cannot hold]
Laplacian mechanism
• Laplacian distribution in polar coordinates:
  f(z|X = x) = (ε²/2π) e^(−ε d₂(x, z))
• Then,
  |log f(z|X = x) − log f(z|X = x′)| = ε |d₂(z, x) − d₂(z, x′)| ≤ ε d₂(x, x′)  (triangle inequality)
• The Laplacian mechanism satisfies the geo-ind condition.
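Sampling from this density is straightforward in polar coordinates: the angle is uniform, and the radius has density ε²·r·e^(−εr), a Gamma(2, 1/ε) distribution, so the mean displacement is 2/ε. A minimal sketch (planar coordinates and names are my own choices):

```python
import math
import random

def planar_laplace(x, y, eps):
    """Draw a pseudolocation z from f(z|x) = (ε²/2π)·exp(−ε·d₂(x, z)).

    Angle: uniform on [0, 2π). Radius: p(r) ∝ r·exp(−ε·r), i.e.
    Gamma(shape=2, scale=1/ε), so the expected displacement is 2/ε."""
    theta = random.uniform(0.0, 2.0 * math.pi)
    r = random.gammavariate(2.0, 1.0 / eps)
    return x + r * math.cos(theta), y + r * math.sin(theta)
```

For coordinates in meters, eps = 0.01 m⁻¹ gives an expected displacement of 200 m; ε directly trades privacy for utility.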
Optimal mechanisms for geo-ind
• Minimize the quality loss (i.e., E{d_q(X, Z)}) subject to the geo-ind constraint.
• Fact: the geo-ind constraint is preserved under any adversarial remapping r(X̂|Z).
• The optimal mechanism is then
  f*(Z|X) = argmin E{d_q(X, Z)},
  where E{d_q(X, Z)} = Σ over X, Z of f(Z|X) π(X) d_q(X, Z).
• The optimal adversarial remapping would find
  r*(X̂|Z) = argmin E{d_p(X̂, X) | Z}.
Optimal mechanisms for geo-ind
• If d_p = d_q, the adversary does nothing: the minimization of the QL has already been done by the mechanism!
• But if the adversary does nothing, Privacy = QL.
• The operating value thus depends on ε (the smaller ε, the larger the privacy).
Where are we going?
Sensitivity [Bertino et al. 2010]
• The mechanism should weigh the importance given by the user to each location.
• This can be specified semantically by defining categories.
• Sensitivity of a region: the probability that the user, known to be in that region, is actually in a sensitive place.
• For other mechanisms: an open problem.
Graph-based models

[Figure: the map is abstracted into a graph of locations; a trace is a path on the graph]
Graph-based models
• A trace is a path together with times: {(X_i, t_i)}, i = 1, …, N.
• Common assumption for an adversary: the true trace can be described through a Markov chain.
• Prior transition probabilities P(S_m|S_n) between states can be estimated if training traces are (at least partially) available.

[Figure: training data yields state probabilities P(S_k) and transition probabilities P(S_m|S_n) on the graph]
Graph-based models
• Shokri et al.'s approach: depending on what the adversary wants to learn, apply a different method.
• Maximum likelihood: find the most likely trace given the observed trace:
  {X̂_i, t_i} = argmax over {X_i, t_i} of f({Z_i, t_i} | {X_i, t_i}), i = 1, …, N.
• Dynamic programming (e.g., the Viterbi algorithm) can be used.
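A minimal Viterbi sketch for this maximum-likelihood attack on a discrete state graph; the dictionary encoding and the two-state example in the test are illustrative, not Shokri et al.'s implementation.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Most likely state sequence given the observed (obfuscated) trace.

    start_p[s]: prior; trans_p[s][s2]: Markov transition probabilities;
    emit_p[s][o]: probability that true state s is reported as o (the LPPM)."""
    # best[s] = (probability of the best path ending in s, that path)
    best = {s: (start_p[s] * emit_p[s][obs[0]], [s]) for s in states}
    for o in obs[1:]:
        prev = best
        best = {}
        for s in states:
            # pick the predecessor r maximizing the path probability into s
            p, path = max(((prev[r][0] * trans_p[r][s], prev[r][1])
                           for r in states), key=lambda t: t[0])
            best[s] = (p * emit_p[s][o], path + [s])
    return max(best.values(), key=lambda t: t[0])[1]
```

With transition priors estimated from training traces (previous slide) and the LPPM as the emission model, this recovers the most likely true trace from the observed one.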
Graph-based models
• Distribution estimation: estimate the probabilities of all traces using the Metropolis-Hastings algorithm.
Graph-based models
• Location estimation: find the most likely node at time t_k:
  X̂_k = argmax over X_k of f(X_k | {Z_i, t_i}, i = 1, …, N).
• Can be solved using the forward-backward algorithm to recursively compute the probabilities.
Privacy as a zero-sum game
[Figure: Utility Loss vs. Privacy plane; when the optimal mechanism plays against the optimal adversary, the game sits on the P = UL line, where Privacy + Utility = constant.]
Adding a new dimension: bandwidth
• Example: send dummies on a grid, n = 3 (8 dummies).
• Without dummies (spacing s): Privacy ≈ d₂(s), Utility Loss ≈ d₂(s).
• With dummies spread over S: Privacy ≈ d₂(S), Utility Loss ≈ d₂(s) = d₂(S/3).
The Utility Loss-Privacy-Bandwidth region
[Figure: Utility Loss vs. Privacy with 9 times the bandwidth; the optimal-mechanism region now reaches P = 3·UL instead of P = UL, a privacy gain due to dummying at the cost of service-provider and user utility loss.]
Space-time cloaking
• Privacy ≈ k-anonymity ≈ area × time × pop. density
• Utility Loss ≈ area
• Delay ≈ time
Privacy-preserving queries
Retrieval in the Encrypted Domain

[Diagram: the user sends an encrypted query to the service provider and receives an encrypted reply]
Thanks!
Grupo Procesado de Señal en Comunicaciones
What utility? An example
• Privacy ≈ k-anonymity ≈ area × time × pop. density
• Utility ≈ 1/area ≈ 1/d²_max
But delay also counts…
[Figure: utility vs. privacy trade-off curves for delays of 5, 10, and 15 minutes]
What utility? Another example
• Space-time slicing
• Is this related to bandwidth?