
Literature Review: The Role of Signal Processing in Meeting Privacy Challenges: An Overview



Signal Processing and Data Privacy

Literature Review

By Kato Mivule

COSC891 Fall 2013

The Role of Signal Processing in Meeting Privacy Challenges: An Overview

• Sankar, L.; Trappe, W.; Ramchandran, K.; Poor, H.V.; Debbah, M., "The Role of Signal Processing in Meeting Privacy Challenges: An Overview," IEEE Signal Processing Magazine, vol. 30, no. 5, pp. 95-106, Sept. 2013, doi: 10.1109/MSP.2013.2264541

Bowie State University Department of Computer Science

Introduction: Information Leakage Everywhere

• The growth of information technology has made personal data easily available.

• This overflow of unconstrained personal data raises privacy concerns.

• Yet such data sources have remarkable value (utility) to their users.

• This leads to a tension between data privacy and utility needs.


Introduction: Information Leakage Everywhere

• Users post information to social networks, unaware of the privacy risks.

• Companies use the cloud for data processing, unaware of the privacy risks.

• The cloud itself poses a risk of "leakage" of private data.


Privacy differs from Security

• Data privacy deals with confidentiality control.

• Data security involves the handling of accessibility control.

• Data privacy is the process of protecting against unauthorized data disclosure.

• Data security is the control of data against unauthorized access [1, 2].

• To exemplify this fundamental point, a house might be secured with locks to ensure access control; however, bystanders could still look inside the house from a distance if there are no curtains in the windows. Thus there is no privacy even while access is denied to the bystanders.


Privacy differs from Cryptography

• Cryptography works as an access control methodology.

• Privacy works as a confidentiality control method.

• After decryption, a plaintext database is vulnerable to "insider knowledge" attacks.

• After decryption, a plaintext database loses its confidentiality.


Adversaries could be insiders

• Every user is a potential adversary.

• A database might be secure but vulnerable to confidentiality breaches.

• For example, a user may learn private information by inference.


Inference Vulnerabilities – Image Source: Sankar, Trappe, Ramchandran, Poor, Debbah (2013)

Attribute types in statistical databases

• The authors classify attributes as public or private.

• More on attributes:

• PII – Personally Identifiable Information attributes

• Quasi-identifier attributes

• Non-sensitive attributes

• Sensitive attributes


Privacy versus utility – a.k.a. the Utility-Privacy (U-P) tradeoff problem.

• Data utility – how beneficial a privatized dataset is to a user.

• Data utility (usefulness) diminishes during the data privacy process:

• When PII is removed.

• When data is perturbed.

• Achieving equilibrium between data privacy and utility needs is an intractable problem.

• "Perfect privacy can be achieved by publishing nothing at all, but this has no utility; perfect utility can be obtained by publishing the data exactly as received, but this offers no privacy." Dwork (2006)


Types of Privacy

• Database privacy

• Consumer privacy

• Competitive privacy


Data Privacy Mechanisms

The authors highlight k-anonymity and differential privacy.

• Non-perturbative methods: original data values are not modified.

• k-anonymity

• l-diversity

• Suppression

• Generalization

• Perturbative methods: original data values are transformed.

• Noise addition

• Multiplicative noise

• Logarithmic noise

• Differential privacy
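To make the non-perturbative methods above concrete, here is a minimal Python sketch of suppression and generalization; the record fields, 10-year age bucket, and ZIP-prefix rule are illustrative assumptions, not taken from the paper.

# Suppression removes an identifying value; generalization coarsens it.
def suppress(value):
    # Suppression: replace the identifying value outright.
    return "*"

def generalize_age(age):
    # Generalization: coarsen an exact age into a 10-year bucket, e.g. 37 -> "30-39".
    lo = (age // 10) * 10
    return f"{lo}-{lo + 9}"

record = {"name": "Alice", "age": 37, "zip": "20715"}
anonymized = {
    "name": suppress(record["name"]),
    "age": generalize_age(record["age"]),
    "zip": record["zip"][:3] + "**",  # generalize the ZIP code to a 3-digit prefix
}
print(anonymized)  # {'name': '*', 'age': '30-39', 'zip': '207**'}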


Data Privacy Mechanisms – Perturbation methods – data values modified

• Noise addition: random values are added to sensitive numerical attribute values to ensure privacy. The general expression is:

• X + ε = Z

• X is the original continuous dataset, ε is the set of random values (noise) with distribution ε ~ N(0, σ²) that is added to X, and Z is the privatized dataset.

• Multiplicative noise: random values with mean µ = 1 and variance σ² are multiplied with the original values. The general expression is:

• X_j ε_j = Y_j

• Logarithmic noise: a logarithmic adjustment of the original values is done first:

• ln(X_j) = Y_j

• Random values ε_j are then created and added to the logarithmic values Y_j, producing the privatized values Z_j, as shown below:

• Y_j + ε_j = Z_j
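The three perturbative schemes above can be sketched in a few lines of Python; this is a minimal illustration assuming NumPy, with hypothetical attribute values and arbitrarily chosen noise parameters.

import numpy as np

rng = np.random.default_rng(seed=0)
X = np.array([52.0, 61.5, 47.2, 58.9])  # hypothetical sensitive attribute values

# Additive noise: Z = X + e, with e ~ N(0, sigma^2)
sigma = 2.0
Z_add = X + rng.normal(loc=0.0, scale=sigma, size=X.shape)

# Multiplicative noise: Y_j = X_j * e_j, with e_j ~ N(mu=1, sigma^2)
Z_mult = X * rng.normal(loc=1.0, scale=0.1, size=X.shape)

# Logarithmic noise: Y_j = ln(X_j), then Z_j = Y_j + e_j
Y = np.log(X)
Z_log = Y + rng.normal(loc=0.0, scale=0.05, size=X.shape)

print(Z_add, Z_mult, Z_log)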


Data Privacy Mechanisms – Perturbation methods – data values modified

Differential Privacy (DP):

• DP is enforced by adding Laplace noise to query results from the database.

• With DP, the users of the database cannot discern if an item has been changed in that database.

• A DP mechanism satisfies the following criterion:

• P[q_n(D1) ∈ R] / P[q_n(D2) ∈ R] ≤ e^ε

• Laplace noise Laplace(0, b) is generated and added to f(x), the original query response, with scale:

• b = Δf / ε

• Δf is the max difference between query responses (the most influential observation):

• Δf = max |f(D1) − f(D2)|

• Finally: differentially private data = f(x) + Laplace(0, b)
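A minimal Python sketch of the Laplace mechanism as stated above, assuming NumPy; the function name and the counting-query example are ours, though a counting query's sensitivity Δf = 1 is a standard fact (adding or removing one record changes the count by at most 1).

import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
    # b = delta_f / epsilon, the Laplace scale from the slide above
    rng = rng or np.random.default_rng()
    b = sensitivity / epsilon
    return true_answer + rng.laplace(loc=0.0, scale=b)

# Example: a counting query with sensitivity delta_f = 1
print(laplace_mechanism(true_answer=42, sensitivity=1.0, epsilon=0.5))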


Utility-Privacy (U-P) Tradeoff

• Data sanitization is concerned with:

• The statistics of the output that achieve a desired level of utility and privacy

• Deciding which input values to perturb.

• How to probabilistically perturb values.

• The U-P tradeoff framework requires the following three components:

• A (statistical) model for the data

• Measures for privacy and utility

• A method to formalize the mappings from X to X̂ (the sanitized database)
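One classical way to realize such a probabilistic mapping from X to X̂ is randomized response; this technique is our illustrative choice for the sketch below, not one prescribed by the paper.

import numpy as np

def randomized_response(bit, p_truth=0.75, rng=None):
    # With probability p_truth report the true value; otherwise report
    # a uniformly random bit, so any single answer is deniable.
    rng = rng or np.random.default_rng()
    if rng.random() < p_truth:
        return bit
    return int(rng.random() < 0.5)

print([randomized_response(1) for _ in range(10)])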



Tension between privacy and utility – Image Source: Sankar, Trappe, Ramchandran, Poor, Debbah (2013)

Data Utility Measure

• Utility captures how close the revealed database is to the original.

• A possible measure for the utility u is the requirement that the average distortion of the public variables is upper bounded, for each ε > 0 and all sufficiently large n:

• u ≡ E[(1/n) Σ_{i=1}^{n} ρ(X_{Kr,i}, X̂_{Kr,i})] ≤ D + ε

• ρ(·, ·) is the distortion function

• E is the expectation over the joint distribution of (X_{Kr,i}, X̂_{Kr,i})

• The subscript i denotes the i-th entry of the database

• Distance based distortion examples include:

• Euclidean distance

• Hamming distance

• Kullback-Leibler divergence
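As a sketch, the bounded-distortion utility measure above can be estimated empirically by averaging a chosen distortion function over original and privatized entries; the data values, noise scale, and function names here are hypothetical, with squared error standing in for a Euclidean-style distortion.

import numpy as np

def avg_distortion(X, X_hat, rho):
    # Empirical estimate of u = E[(1/n) * sum_i rho(X_i, X_hat_i)]
    return float(np.mean([rho(x, xh) for x, xh in zip(X, X_hat)]))

squared_error = lambda x, xh: (x - xh) ** 2  # Euclidean-style distortion
hamming = lambda x, xh: float(x != xh)       # Hamming (0/1) distortion

X = np.array([52.0, 61.5, 47.2])
X_hat = X + np.random.default_rng(0).normal(0.0, 2.0, size=X.shape)
print(avg_distortion(X, X_hat, squared_error))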


Privacy Quantification

• Entropy is used as a measure of information or uncertainty.

• Privacy requires that there is randomness or uncertainty in all the private variables:

• e ≡ (1/n) H(X_{Kh} | J) ≥ E − ε

• H(·|·) is Shannon's conditional entropy

• X and Y are two random variables with a joint distribution p_XY

• The conditional entropy H(X|Y) = −Σ_{(x,y)} p_XY(x, y) log p_{X|Y}(x|y)
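A minimal Python sketch of computing Shannon's conditional entropy H(X|Y) from a joint distribution, assuming NumPy and base-2 logarithms; the joint distribution values are hypothetical.

import numpy as np

def conditional_entropy(p_xy):
    # H(X|Y) = -sum_{x,y} p(x,y) * log2 p(x|y), with p(x|y) = p(x,y) / p(y)
    p_y = p_xy.sum(axis=0)  # marginal of Y (columns index y)
    h = 0.0
    for x in range(p_xy.shape[0]):
        for y in range(p_xy.shape[1]):
            if p_xy[x, y] > 0:
                h -= p_xy[x, y] * np.log2(p_xy[x, y] / p_y[y])
    return h

# Hypothetical joint distribution of a private attribute X (rows)
# and a released value Y (columns); entries sum to 1.
p_xy = np.array([[0.25, 0.10],
                 [0.05, 0.60]])
print(conditional_entropy(p_xy))  # residual uncertainty about X given Y, in bits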


Signal Processing applications of Data Privacy

Categories of database privacy:

• Statistical Data Privacy: which involves guaranteeing privacy of any individual in a database that is used for statistical information processing (utility).

• Competitive Privacy: which involves information sharing for a common system good (utility) between competing agents that comprise the system.

• Consumer Privacy: guaranteeing privacy in smart devices.

• Image Classification Privacy: privacy guarantees in biometric identification.

• The FBI plans to spend US$1 billion on a face-recognition system to scan surveillance video.

• Civil rights groups are raising objections about possible privacy violations.

• Privacy-preserving algorithms could be employed to find a balance by focusing on criminals only.



Database Privacy Categories – Image Source: Sankar, Trappe, Ramchandran, Poor, Debbah (2013)

Conclusion

• A general overview of the data privacy and utility problem is given.

• Most data privacy implementations center around data perturbation methods.

• Signal processing could be applied to:

• Finding the optimal balance between privacy and utility

• Filtering out unneeded noise during the perturbation process.

• The paper focuses largely on the data privacy and utility problem.

• The paper offers a new quantification approach to the data privacy and utility problem.

• The actual application and implementation of signal processing to data privacy problems is left to the reader.

References

• Sankar, L.; Trappe, W.; Ramchandran, K.; Poor, H.V.; Debbah, M., "The Role of Signal Processing in Meeting Privacy Challenges: An Overview," IEEE Signal Processing Magazine, vol. 30, no. 5, pp. 95-106, Sept. 2013, doi: 10.1109/MSP.2013.2264541

• Mivule, K.; Turner, C., "A Comparative Analysis of Data Privacy and Utility Parameter Adjustment, Using Machine Learning Classification as a Gauge," Complex Adaptive Systems 2013, Nov. 13-15, 2013, Baltimore, MD, USA (in press)

• Mivule, K.; Turner, C., "A Review of Privacy Essentials for Confidential Mobile Data Transactions," arXiv preprint arXiv:1309.3953, Sept. 2013, online: http://arxiv.org/pdf/1309.3953v1

• Mivule, K., "Utilizing Noise Addition for Data Privacy, an Overview," Proceedings of the International Conference on Information and Knowledge Engineering (IKE 2012), pp. 65-71, Las Vegas, NV, USA
