Preserving Worker Privacy in Crowdsourcing
Hiroshi Kajino (1), Hiromi Arai (2), Hisashi Kashima (3)
(1) The University of Tokyo, (2) RIKEN, (3) Kyoto University
ECML/PKDD 2014, 18/09/14
Outline
■ Introduction & Existing Work
  □ Crowdsourcing: Outsourcing to unspecified people
  □ Quality control: The quality of results varies
■ Proposed Problem Setting
  □ Worker privacy: Sensitive information about workers can be inferred
  □ Worker-private quality control problem
■ Proposed Method
  □ Existing quality control method + secure computation
■ Experiments
  □ Accuracy: Validate the approximation used in secure computation
  □ Computation time: Validate the computational overhead
Propose & address a worker privacy problem in crowdsourcing
Research Target
■ Crowdsourcing
  □ Pros: Easy to use at low cost
    • Industry: Reduces the financial and time costs of outsourcing
    • Academia: Has triggered new AI research areas (human computation)
  □ Cons: Quality issues, privacy issues, etc.
Crowdsourcing is a method to outsource tasks to unspecified workers
[Figure: The requester (1) submits instances to workers, who (2) return answers. Example instance: a CAPTCHA reading "overlooks inquiry" (http://www.captcha.net/)]
Existing Work
■ The quality of answers depends on the abilities of workers
  □ Collecting labels from multiple workers is necessary
■ Quality control problem (in a labeling task)
  □ Input: Crowd labels $\{y_{ij} \in \{0,1\} \mid i = 1,\dots,I,\ j = 1,\dots,J\}$
  □ Output: Estimated true labels $\{y_i \in \{0,1\} \mid i = 1,\dots,I\}$
Estimate ground-truth labels by aggregating multiple workers' answers
Task example: Label whether an image contains a bird
[Figure: Crowd labels $y_{ij}$ arranged by instance $i$ and worker $j$ (1 = Bird, 0 = Not Bird); the ground-truth labels are unknown (?).]
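A simple way to aggregate such labels is an unweighted majority vote. The Python sketch below is only a baseline for illustration, not the method of these slides; the label-matrix layout (rows are instances, columns are workers, `None` marks a missing label) and the tie-breaking rule are assumptions.

```python
from typing import List, Optional, Sequence


def majority_vote(labels: Sequence[Sequence[Optional[int]]]) -> List[int]:
    """Aggregate crowd labels y_ij in {0, 1} by unweighted majority vote.

    labels[i][j] is worker j's label for instance i, or None if worker j
    did not label instance i. Ties are broken toward 1 (an arbitrary choice).
    """
    estimates = []
    for row in labels:
        given = [y for y in row if y is not None]
        ones = sum(given)
        estimates.append(1 if 2 * ones >= len(given) else 0)
    return estimates


# Three instances labeled by three workers; None marks a missing label.
crowd = [
    [1, 1, 0],
    [0, None, 0],
    [1, 0, None],
]
print(majority_vote(crowd))  # [1, 0, 1]
```

Majority vote treats every worker as equally reliable, which is the limitation that the latent class method addresses by estimating per-worker abilities.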
Existing Work
■ Latent Class Method [Dawid & Skene, 1979]
  □ Model: Latent class model
    • $p = \Pr[y_i = 1]$: Probability that the true label is 1
    • $\alpha_j = \Pr[y_{ij} = y_i \mid y_i = 1]$
    • $\beta_j = \Pr[y_{ij} = y_i \mid y_i = 0]$
    • $I$, $J$: #(instances), #(workers)
  □ Inference: Given $\{y_{ij}\}$, estimate $\{y_i\}$, $\{\alpha_j, \beta_j\}$, and $p$
    • E-step: Estimate $\{y_i\}$, fixing $\{\alpha_j, \beta_j\}$ and $p$
    • M-step: Estimate $\{\alpha_j, \beta_j\}$ and $p$, fixing $\{y_i\}$
Estimate consensus labels by inferring worker models
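[Figure: Graphical model of the latent class model with $p$, $y_i$, $y_{ij}$, and the abilities $\alpha_j, \beta_j$ of worker $j$; plates over $I$ instances and $J$ workers]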
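The E- and M-steps above can be sketched in Python for the binary case. This is a minimal, non-private reference under stated assumptions: a label matrix with `None` for missing labels, initialization from the fraction of 1-labels per instance, a clipping constant that keeps probabilities away from 0 and 1, and a fixed iteration count. These choices and the function names are illustrative, not taken from the slides.

```python
from typing import List, Optional, Sequence, Tuple

Labels = Sequence[Sequence[Optional[int]]]  # labels[i][j] in {0, 1}, or None if missing
EPS = 1e-6  # keeps probabilities away from 0 and 1 (illustrative choice)


def clip(x: float) -> float:
    return min(max(x, EPS), 1.0 - EPS)


def e_step(labels: Labels, p: float, alpha: List[float], beta: List[float]) -> List[float]:
    """mu_i = Pr[y_i = 1 | data] under the current p, alpha_j, beta_j."""
    mu = []
    for row in labels:
        a, b = p, 1.0 - p  # joint probability of the row with y_i = 1 / y_i = 0
        for j, y in enumerate(row):
            if y is None:  # worker j did not label instance i
                continue
            a *= alpha[j] if y == 1 else 1.0 - alpha[j]
            b *= beta[j] if y == 0 else 1.0 - beta[j]
        mu.append(a / (a + b))
    return mu


def m_step(labels: Labels, mu: List[float]) -> Tuple[float, List[float], List[float]]:
    """Re-estimate p, alpha_j, beta_j from the soft labels mu."""
    p = clip(sum(mu) / len(mu))
    alpha, beta = [], []
    for j in range(len(labels[0])):
        pos = neg = agree_pos = agree_neg = 0.0
        for row, m in zip(labels, mu):
            y = row[j]
            if y is None:
                continue
            pos += m
            neg += 1.0 - m
            agree_pos += m * y
            agree_neg += (1.0 - m) * (1 - y)
        alpha.append(clip(agree_pos / pos) if pos > 0 else 0.5)
        beta.append(clip(agree_neg / neg) if neg > 0 else 0.5)
    return p, alpha, beta


def latent_class_em(labels: Labels, n_iter: int = 50):
    """Run EM; mu is initialized with the fraction of 1-labels per instance."""
    mu = []
    for row in labels:
        given = [y for y in row if y is not None]
        mu.append(sum(given) / len(given) if given else 0.5)
    for _ in range(n_iter):
        p, alpha, beta = m_step(labels, mu)
        mu = e_step(labels, p, alpha, beta)
    return mu, p, alpha, beta


# Workers 0 and 1 agree; worker 2 always gives the opposite label.
crowd = [[1, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 1]]
mu, p, alpha, beta = latent_class_em(crowd)
print([round(m, 3) for m in mu], [round(a, 3) for a in alpha])
```

On this toy data, EM learns that worker 2 is unreliable (low $\alpha_2, \beta_2$) instead of counting its vote equally.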
Worker Privacy Issue
■ Sensitive information in answers
  □ Location
    • AED4 collects the locations of AEDs on a map
    • A worker's movement history can be revealed
  □ Personal information in questionnaire tasks
    • Workers' interests and personal information (quasi-identifiers)
    • Joining with other data sets can identify anonymous workers
  □ Ability
    • Quality control methods reveal the ability of each worker
    • This may demotivate workers from joining volunteer-based crowdsourcing
Simply passing answers to the requester can invade worker privacy
Our Problem Setting
■ Worker-Private Quality Control Problem
  □ Input: Crowd labels $\{y_{ij} \mid i = 1,\dots,I,\ j = 1,\dots,J\}$
  □ Output: Estimated true labels $\{y_i \mid i = 1,\dots,I\}$
  □ Subject to: Labels and abilities are kept worker-private
cf. A similar definition can be found in query auditing.
We propose a worker-private quality control problem
Definition: Worker $j$'s value $v_j$ is worker-private if others cannot determine $v_j$ uniquely.
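To illustrate the definition, here is a toy Python check under an assumed threat model: the only public information is the sum of all workers' binary labels, and the adversary is worker 0, who also knows its own label. The function enumerates every label assignment consistent with that knowledge. This setting and the function name are hypothetical simplifications, not the full threat model of the paper.

```python
from itertools import product
from typing import Set


def consistent_values(n_workers: int, my_label: int, public_sum: int, target: int) -> Set[int]:
    """Labels of worker `target` that are consistent with what worker 0 knows.

    Toy threat model (assumption): worker 0 knows its own binary label and the
    public sum of all workers' labels, and nothing else.
    """
    values = set()
    for others in product((0, 1), repeat=n_workers - 1):
        labels = (my_label,) + others
        if sum(labels) == public_sum:
            values.add(labels[target])
    return values


# Two workers: the sum pins down worker 1's label, so it is NOT worker-private.
print(consistent_values(2, my_label=1, public_sum=1, target=1))  # {0}
# Three workers: worker 1's label could be 0 or 1, so it stays worker-private here.
print(consistent_values(3, my_label=1, public_sum=2, target=1))  # {0, 1}
```

A value counts as worker-private exactly when the returned set has more than one element. Even with three workers, some sums (for example, all labels equal to 1) still determine every label.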
Proposed Method: Overview
■ Worker-Private Latent Class Protocol
  □ Model: Latent class model (same as above)
  □ Secure inference:
    • E-step: The requester and workers estimate $\{y_i\}$ by secure computation
    • M-step: Each worker updates $\alpha_j, \beta_j$ secretly
Propose a privacy-preserving inference algorithm for the LC model
[Figure: Workers keep their answers secret; through secure computation, the requester obtains the estimated true labels.]
Proposed Method: Building Block
■ Secure Sum Protocol (generalized Paillier cryptosystem [Damgård+,01])
Computes $\sum_j v_j$ when each worker $j$ holds a value $v_j$ secretly
  □ Additive homomorphic cryptosystem:
For plaintexts $v_1, v_2 \in \mathbb{Z}_n$, the product of ciphertexts $\mathrm{Enc}(v_1) \cdot \mathrm{Enc}(v_2)$ is an encryption of $v_1 + v_2$ (mod $n$)
  □ Protocol:
    1) Each worker $j$ computes $\mathrm{Enc}(v_j)$, and the parties compute $\mathrm{Enc}(\sum_j v_j)$ by multiplying the ciphertexts
    2) The parties decrypt $\mathrm{Enc}(\sum_j v_j)$ using distributed secret keys
Secure sum allows us to compute the sum without privacy invasion
Lemma: After executing the protocol, no party learns anything beyond its initial knowledge and the sum.
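The additive homomorphism behind the secure sum can be demonstrated with a toy Paillier cryptosystem in Python. This sketch makes strong simplifications: it uses tiny fixed primes, Python's non-cryptographic `random` module, and a single decryption key held by one party, whereas the slides use the generalized Paillier cryptosystem [Damgård+,01] with distributed secret keys. The function names are hypothetical, and the code is not secure.

```python
import math
import random


def keygen(p: int, q: int):
    """Toy Paillier key generation with generator g = n + 1."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # with g = n + 1, L(g^lam mod n^2) = lam mod n
    return n, (lam, mu)


def encrypt(n: int, m: int, rng: random.Random) -> int:
    """Enc(m) = g^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2


def decrypt(n: int, sk, c: int) -> int:
    """Dec(c) = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) // n."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n


def secure_sum(values, n: int, sk, rng: random.Random) -> int:
    """Each worker encrypts its value; the product of ciphertexts encrypts the sum."""
    n2 = n * n
    acc = 1  # 1 = g^0 * 1^n, a (non-random) encryption of 0
    for v in values:
        acc = acc * encrypt(n, v % n, rng) % n2
    return decrypt(n, sk, acc)


n, sk = keygen(1009, 1013)
rng = random.Random(0)
print(secure_sum([3, 5, 7], n, sk, rng))  # 15
```

In the protocol on the slide, decryption is instead carried out jointly with distributed key shares, so no single party can decrypt an individual worker's ciphertext. `pow(x, -1, n)` requires Python 3.8 or later.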
Proposed Method: Algorithm
■ Worker-Private Latent Class Protocol
  □ Parameters: $\{\mu_i\}$, $p$, $\{\alpha_j\}$, $\{\beta_j\}$
    • $\mu_i = \Pr[y_i = 1 \mid \text{Data}]$, $p = \Pr[y_i = 1]$
    • $\alpha_j = \Pr[y_{ij} = y_i \mid y_i = 1]$, $\beta_j = \Pr[y_{ij} = y_i \mid y_i = 0]$
Incorporate workers into the computation to preserve worker privacy
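                   Instance 1   Instance 2   Instance 3   Parameters
True labels            μ1           μ2           μ3        p
Worker 1 labels         1            0            1        α1, β1 (abilities)
Worker 2 labels         1            0            0        α2, β2 (abilities)
Worker 3 labels         0            0            0        α3, β3 (abilities)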
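Public: the true labels $\{\mu_i\}$ and $p$. Private values of each worker $j$: its labels $\{y_{ij}\}$ and abilities $\alpha_j, \beta_j$.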
Proposed Method: Algorithm
■ Worker-Private Latent Class Protocol
  □ E-step: The parties update the true labels $\{\mu_i\}$ using secure sum
(The E-step amounts to a weighted majority vote of the crowd labels.)
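One standard way to write the binary latent class E-step makes the weighted majority vote explicit: $\log\frac{\mu_i}{1-\mu_i} = \log\frac{p}{1-p} + \sum_{j} \left[ y_{ij}\log\frac{\alpha_j}{1-\beta_j} + (1-y_{ij})\log\frac{1-\alpha_j}{\beta_j} \right]$, where the sum runs over the workers who labeled instance $i$. Each term depends only on worker $j$'s private label and abilities, so it is the kind of per-worker value that a secure sum can aggregate, once converted to an integer as round(L v_j) (the conversion used in the experiments). The Python sketch below assumes this formulation. The function names are hypothetical, and a plain `sum` stands in for the secure sum protocol.

```python
import math
from typing import List, Optional, Sequence


def worker_contribution(y: Optional[int], alpha: float, beta: float) -> float:
    """Worker j's private term in the log-odds of mu_i (0 if j did not label i)."""
    if y is None:
        return 0.0
    if y == 1:
        return math.log(alpha / (1.0 - beta))
    return math.log((1.0 - alpha) / beta)


def e_step_by_sum(labels: Sequence[Sequence[Optional[int]]], p: float,
                  alpha: List[float], beta: List[float]) -> List[float]:
    """mu_i = sigmoid(log(p / (1 - p)) + sum_j contribution_ij)."""
    mu = []
    for row in labels:
        # Each term would be computed privately by worker j; in the protocol the
        # sum would come from secure sum, and here a plain sum stands in for it.
        s = sum(worker_contribution(y, a, b) for y, a, b in zip(row, alpha, beta))
        logit = math.log(p / (1.0 - p)) + s
        mu.append(1.0 / (1.0 + math.exp(-logit)))
    return mu


print(e_step_by_sum([[1, 1, 0]], 0.5, [0.9, 0.8, 0.6], [0.9, 0.7, 0.6]))  # approximately [0.9412] (= 16/17)
```

The weights $\log\frac{\alpha_j}{1-\beta_j}$ and $\log\frac{1-\alpha_j}{\beta_j}$ give reliable workers larger votes. Because the contributions can be negative, they need an integer encoding that supports negative values before they enter the secure sum.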
Proposed Method: Algorithm
■ Worker-Private Latent Class Protocol
  □ M-step: Each worker independently updates its abilities $\alpha_j, \beta_j$
(Each worker updates its abilities by checking the agreement between its own labels and the estimated true labels.)
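Because $\{\mu_i\}$ is public, each worker can run its own M-step without any interaction. The following minimal Python sketch assumes the standard latent class updates $\alpha_j = \sum_i \mu_i y_{ij} / \sum_i \mu_i$ and $\beta_j = \sum_i (1-\mu_i)(1-y_{ij}) / \sum_i (1-\mu_i)$, taken over the instances that worker $j$ labeled. The clipping constant and the function names are illustrative.

```python
from typing import Optional, Sequence, Tuple


def clip(x: float, eps: float = 1e-6) -> float:
    """Keep a probability away from 0 and 1 (illustrative safeguard)."""
    return min(max(x, eps), 1.0 - eps)


def local_m_step(my_labels: Sequence[Optional[int]],
                 mu: Sequence[float]) -> Tuple[float, float]:
    """Worker j's update of (alpha_j, beta_j) from its own labels and the public mu.

    Only worker j runs this, so its labels and abilities never leave the worker.
    """
    pos = neg = agree_pos = agree_neg = 0.0
    for y, m in zip(my_labels, mu):
        if y is None:  # worker j did not label this instance
            continue
        pos += m
        neg += 1.0 - m
        agree_pos += m * y                # agreement on instances believed positive
        agree_neg += (1.0 - m) * (1 - y)  # agreement on instances believed negative
    alpha = clip(agree_pos / pos) if pos > 0 else 0.5
    beta = clip(agree_neg / neg) if neg > 0 else 0.5
    return alpha, beta


def public_prior(mu: Sequence[float]) -> float:
    """p can be recomputed by anyone from the public mu."""
    return sum(mu) / len(mu)


alpha_j, beta_j = local_m_step([1, 0, None, 1], [0.9, 0.2, 0.5, 0.6])
print(round(alpha_j, 3), round(beta_j, 3))  # 0.882 0.615
```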
Proposed Method: Security Analysis
■ Conditions
  □ #(workers) ≥ 3
  □ For each instance, there exists at least one worker who does not label the instance.
Making true labels public does not invade worker privacy
Theorem: Under these conditions, after executing the protocol, each worker's labels and abilities remain worker-private.
Experiments: Overview
■ Drawbacks of secure computation
  1) Approximation:
    • The secure sum protocol works only on integers
    • An approximation parameter $L$ converts each value as $v_j \to \mathrm{round}(L v_j)$
  2) Computation time:
    • Cryptographic (and communication) overhead
■ Data Set
  □ Duchenne Data Set [Whitehill+,09]
    • Task: Judge whether a smile is fake or not
    • #(workers) = 20, #(instances) = 159
Evaluate the two drawbacks of introducing secure computation
[Figure: Example images, cited from [Whitehill+,09]; illustration of multiplying worker j's value by a large number $L$ before rounding]
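The conversion $v_j \to \mathrm{round}(L v_j)$ can be sketched as fixed-point encoding into $\mathbb{Z}_n$. The wrap-around representation of negative numbers and the decoding rule below are assumptions (a common convention) rather than details stated on the slides. They require the magnitude of the encoded total to stay below $n/2$. The modulus, values, and function names are hypothetical, and a plain sum stands in for the secure sum.

```python
N = 1009 * 1013  # hypothetical toy modulus n; real Paillier moduli are far larger


def encode(v: float, L: int, n: int) -> int:
    """Fixed-point encode a real value into Z_n as round(L * v) mod n."""
    return round(L * v) % n  # negative values wrap around to n - |round(L * v)|


def decode_sum(s: int, L: int, n: int) -> float:
    """Read s in Z_n as a signed integer (values above n // 2 are negative), then undo L."""
    if s > n // 2:
        s -= n
    return s / L


def approx_sum(values, L: int, n: int) -> float:
    """What the requester would recover if secure sum added the encoded values."""
    s = sum(encode(v, L, n) for v in values) % n  # plain sum stands in for secure sum
    return decode_sum(s, L, n)


values = [0.123, -0.456, 0.789]  # hypothetical per-worker real-valued terms
exact = sum(values)
for L in (10, 100, 1000):
    rel_err = abs(approx_sum(values, L, N) - exact) / abs(exact)
    print(f"L = {L}: relative error {rel_err:.2e}")
```

Larger $L$ shrinks the rounding error but enlarges the encoded integers, so $L$ trades accuracy against the headroom left in $\mathbb{Z}_n$.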
Experiments: (1) Approximation Accuracy
■ Relative Errors of Estimated Parameters
  □ Compare the estimated model parameters with and without secure computation
  □ The approximation parameter $L$ can control the errors arbitrarily
  □ Note: The accuracy of the estimated true labels was the same as with the original method
Estimation errors can be controlled by the approximation parameter $L$
[Figure: Relative errors of the estimated parameters vs. the approximation parameter $L$]
Experiments: (2) Computation Time
■ Cryptographic Overhead
  □ Key generation
  □ One iteration of the algorithm (encryption & decryption)
0.8 sec on the real data set (#(workers) = 20, #(instances) = 159, #(iterations) = 15)
The additional computation time on a real data set was less than a second
[Figure: Computation time vs. #(workers)]
Conclusion
■ Contributions of Our Work
  □ Notion of worker privacy
    • Workers' sensitive information can leak from their answers
  □ WPLC protocol
    • Introduces secure computation into the LC method
    • Security is theoretically guaranteed
  □ Experiments
    • Accuracy can be controlled by a hyperparameter
    • Computation time is tolerable
We proposed the notion of worker privacy