Fairness, Privacy, and Social Norms

Omer Reingold, MSR-SVC
“Fairness through awareness” with Cynthia Dwork, Moritz Hardt, Toni Pitassi, Rich Zemel
+ Musings with Cynthia Dwork, Guy Rothblum and Salil Vadhan

In This Talk
• Fairness in Classification (individual-based notion)
  – Connection between Fairness and Privacy
  – DP beyond Hamming distance
• A notion of privacy beyond the DB setting
• Empowering society to make choices on privacy

Fairness in Classification

[Diagram: example classification tasks: paper acceptance, schooling, advertising, banking, health care, financial aid, taxation]

Concern: Discrimination
• Population includes minorities
  – Ethnic, religious, medical, geographic
  – Protected by law, policy, ethics
• A catalog of evils: redlining, reverse tokenism, self-fulfilling prophecy, … discrimination may be subtle!

Credit Application (WSJ 8/4/10)

• User visits capitalone.com
• Capital One uses tracking information provided by the tracking network [x+1] to personalize offers
• Concern: steering minorities into higher rates (illegal)*

Here: A CS Perspective
• An individual-based notion of fairness – fairness through awareness
• Versatile framework for obtaining and understanding fairness
• Lots of open problems/directions
  – Fairness vs. Privacy

Other notions of “fairness” in CS
• Fair scheduling
• Distributed computing
• Envy-freeness
• Cake cutting
• Stable matching
• More closely related notions outside of CS (Economics, Political Studies, …)
  – Rawls, Roemer, Fleurbaey, Young, Calsamiglia

Fairness and Privacy (1)
• [Dwork & Mulligan 2012] Objections to online behavioral targeting are often expressed in terms of privacy. In many cases the underlying concern is better described in terms of fairness (e.g., price discrimination, being mistreated).
  – Other major concern: feeling of “ickiness” [Tene]
• Privacy does not imply fairness
  – Definitions and techniques useful
  – Can fairness imply privacy (beyond the DB setting)?

[Diagram: individuals in V are mapped by the ad network (x+1) via x ↦ M(x) to outcomes in O; the vendor (Capital One) maps outcomes to actions in A]

[Diagram: individuals in V are mapped via x ↦ M(x) to outcomes in O]
Our goal: achieve fairness in the first step (the mapping).
Assume an unknown, untrusted, un-auditable vendor.

First attempt…

Fairness through Blindness


• Ignore all irrelevant/protected attributes
  – e.g., Facebook “sex” & “interested in men/women”
• Point of failure: redundant encodings
  – Machine learning: you don’t need to see the label to be able to predict it (see the sketch below)
  – e.g., redlining
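A minimal sketch of why blindness fails (the file name, column names, and data here are hypothetical, not from the talk): if the dropped protected attribute is recoverable from the remaining features, e.g., from ZIP code as in redlining, a downstream classifier can effectively still use it.

    # Minimal sketch (hypothetical dataset with numeric features and a
    # "protected_attr" column): test whether the ignored attribute is
    # redundantly encoded in the rest of the features.
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    df = pd.read_csv("applicants.csv")            # hypothetical data
    X = df.drop(columns=["protected_attr"])       # the "blind" feature set
    y = df["protected_attr"]                      # the attribute we pretend not to see

    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("protected attribute predicted with accuracy:", clf.score(X_te, y_te))

High accuracy here means the “blind” features form a redundant encoding of the protected attribute.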

Second attempt…

Group Fairness (Statistical Parity)

• Equalize minority S with the general population T at the level of outcomes
  – Pr[outcome o | S] = Pr[outcome o | T]
• Insufficient as a notion of fairness (see the sketch below)
  – Has some merit, but can be abused
  – Example: advertise a burger joint to carnivores in T and to vegans in S
  – Example: self-fulfilling prophecy
  – Example: multiculturalism …
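A minimal sketch (hypothetical data) of statistical parity as a check; it only compares outcome frequencies between the groups, so parity can hold even when the outcomes are targeted abusively inside the protected group.

    # Minimal sketch: largest gap |Pr[o | S] - Pr[o | T]| over outcomes o.
    from collections import Counter

    def parity_gap(outcomes_S, outcomes_T):
        """Largest difference in outcome frequencies between S and T."""
        cS, cT = Counter(outcomes_S), Counter(outcomes_T)
        nS, nT = len(outcomes_S), len(outcomes_T)
        return max(abs(cS[o] / nS - cT[o] / nT) for o in set(cS) | set(cT))

    # Both groups see the ad at the same 50% rate, so parity holds (gap 0),
    # even if within S the ad was deliberately shown to the "wrong" individuals.
    print(parity_gap(["ad", "no ad"], ["ad", "ad", "no ad", "no ad"]))  # -> 0.0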

Lesson: Fairness is task-specific

• Fairness requires understanding of classification task (this is where utility and fairness are in accord)
  – Cultural understanding of protected groups
  – Awareness!

Our approach…

Individual Fairness

Treat similar individuals similarly
• Similar for the purpose of (fairness in) the classification task
• Similar distribution over outcomes

• Assume task-specific similarity metric
  – Extent to which two individuals are similar w.r.t. the classification task at hand
• Possibly captures some ground truth or society’s best approximation
  – Or instead: society’s norms
• Open to public discussion, refinement
• Our framework is agnostic to the choice of metric
• User control?

Metric – Who Decides?

• Financial/insurance risk metrics
  – Already widely used (though secret)
• IBM’s AALIM health care metric
  – Health metric for treating similar patients similarly
• Roemer’s relative effort metric
  – Well-known approach in Economics/Political theory
• Machine Learning

Maybe not so much science fiction after all…

Metric – Starting Points

Randomized Mapping (Classification)
[Diagram: each individual x in V is mapped to a distribution M(x) over outcomes in O]

[Diagram: individuals x, y in V that are close according to the metric d: V × V → R are mapped to M(x), M(y) over outcomes in O]

Towards Formal Definition
[Diagram: individuals x, y in V that are close according to the metric d: V × V → R are mapped to close distributions M(x), M(y) over outcomes in O]
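Stated as a formula (the Lipschitz condition from “Fairness through awareness”; here D is a distance on distributions over O, e.g., total variation or D_∞):

    M : V \to \Delta(O) \text{ is } d\text{-fair if } \; D\big(M(x), M(y)\big) \le d(x, y) \quad \text{for all } x, y \in V.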

Fairness and D-Privacy (2)
[Diagram: databases x, y in V that are close according to the Hamming metric d: V × V → R are mapped to close distributions over sanitizations in O]
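In symbols, the analogy this slide draws: instantiating the same Lipschitz condition with V = databases, d = (ε times) Hamming distance, and D = D_∞ (max divergence) yields differential privacy:

    D_\infty\big(M(x), M(y)\big) \le \epsilon \cdot d_{\mathrm{Hamming}}(x, y) \quad \text{for all databases } x, y.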

Key elements of our approach…

Efficiency (with utility maximization)

Efficient Procedure
• Given: a metric d: V × V → R and the vendor’s loss function L: V × O → R
• Find a d-fair mapping M from individuals V to outcomes O (x ↦ M(x))
• Minimize the vendor’s expected loss subject to the fairness condition (see the sketch below)
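A minimal sketch of that optimization on a toy instance (cvxpy and the random L and d below are illustrative assumptions, not the paper’s code): choose a distribution over outcomes for every individual, minimize average expected loss, and impose the Lipschitz/fairness constraint with total-variation distance on distributions, which makes the whole problem a linear program.

    # Minimal sketch: fair-mapping LP on a toy instance.
    import cvxpy as cp
    import numpy as np

    n, k = 4, 2                        # individuals, outcomes (toy sizes)
    L = np.random.rand(n, k)           # hypothetical loss L(x, o)
    d = np.random.rand(n, n)           # hypothetical similarity metric d(x, y)
    d = (d + d.T) / 2
    np.fill_diagonal(d, 0)

    mu = cp.Variable((n, k), nonneg=True)        # mu[x] = distribution M(x)
    constraints = [cp.sum(mu, axis=1) == 1]
    for x in range(n):
        for y in range(x + 1, n):
            # total variation distance between M(x) and M(y) at most d(x, y)
            constraints.append(0.5 * cp.norm(mu[x] - mu[y], 1) <= d[x, y])

    prob = cp.Problem(cp.Minimize(cp.sum(cp.multiply(L, mu)) / n), constraints)
    prob.solve()
    print("expected loss:", prob.value)
    print(mu.value)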

More Specific Questions we Address
• How to efficiently construct the mapping M: V → Δ(O)
• When does individual fairness imply group fairness (statistical parity)?
  – For a specific metric, which sub-communities are treated similarly?
• Framework for achieving “fair affirmative action” (ensuring minimal violation of the fairness condition)

Fairness vs. Privacy
• Privacy does not imply fairness.
• Can (our definition of) fairness imply privacy?
• Differential Privacy [Dwork-McSherry-Nissim-Smith ’06]: privacy for individuals whose information is part of a database:
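For reference, the standard [DMNS ’06] definition being invoked here: a randomized mechanism M is ε-differentially private if, for every two databases x, y differing in a single row and every set S of outputs,

    \Pr[M(x) \in S] \le e^{\epsilon} \cdot \Pr[M(y) \in S].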

Privacy on the Web?

• No longer protected by the data of others – my traces can be used directly to compromise my privacy.

• Can fairness be viewed as a measure of privacy?
  – Can fairness “blend me in with the (surrounding) crowd”?

Relation to K-Anonymity
• Critique of k-anonymity: blending with others that have the same sensitive property X is a small consolation.
• “Our” notion of privacy is as good as the metric!
• If your surroundings are “normative”, this may imply meaningful protection (and substantiate users’ currently unjustified sense of security).

Simple Observation: Who Are You, Mr. Reingold?
• If all new information about me obeys our fairness definition, with metrics under which the two possible Omers are very close, then your confidence won’t increase by much …


Do We Like It?
Challenge – Accumulated Leakage:
• Different applications require different metrics.
• Less of an issue for fairness …

D-Privacy with Other Metrics
• This work gives additional motivation to study differential privacy beyond Hamming distance.
• Well motivated even in the context of database privacy (there since the original paper).
• Example: privacy of social networks [Kifer-Machanavajjhala SIGMOD ’11]
  – Privacy depends on context

• Privacy is a matter of social norms.
• Our burden: give tools to decision makers.

What is the Privacy in DP?
• Original motivation mainly given in terms of opt-out/opt-in incentives: worry about an individual deciding whether to participate.
• A different point of view: a committee that needs to approve a proposed study in the first place.
  – Does the study incur only a tolerable amount of privacy loss for any particular individual?

On Correlations and Priors
• Assume that rows are selected independently and there is no prior information on the database:
  – DP protects the privacy of each individual.
• But in the presence of prior information, privacy can be grossly violated [Dwork-Naor ’10]

• Pufferfish [Kifer-Machanavajjhala]: a semantic approach to the privacy of correlated data
  – Protect privacy in the presence of pre-specified adversaries
• The interesting case may be when there is a conflict between privacy and utility

Individual-Oriented Sanitization
• Assume you only care about the privacy of Alice.
• Further assume that the data of Alice is correlated with the data of at most 10 others.
• Then it is enough to erase these 11 rows from the database.
• Even if she is correlated with more, expunging more than 11 rows may exceed the (society-defined) legitimate expectation of privacy (e.g., in a health study).
• Differential privacy simultaneously gives a “comparable” level of privacy to everyone.

Other variants of DP
• Suggests and interprets other variants of DP, defined by the sanitization we allow individuals.
• For example, in social networks, what is the reasonable expectation of privacy for an individual?
  – Erase your neighborhood?
  – Erase information originating from you?
• Another variant: change a few entries in each column.

Objections
• Adam Smith: this informal interpretation may lose too much; for example, the distance in the definition of DP is subtle.
• Jonathan Katz: How do you set epsilon?
• Omer Reingold: How do you incorporate input from machine learning into the decision process of policy makers?

Lots of open problems/directions
• Metric
  – Social aspects, who will define them?
  – How to generate metric (semi-)automatically, metric oracle?
• Connection to Econ literature/problems
  – Rawls, Roemer, Fleurbaey, Young, Calsamiglia
  – Local vs. global distributive fairness? Composition?
• Case study (e.g., in health care)
  – Start from AALIM?
• Quantitative trade-offs in concrete settings

Lots of open problems/directions
• Further explore connections and implications for privacy.
• Additional study of DP with other metrics.
• Completely different definitions of privacy?
• …

Thank you.

Questions?
