Preserving Worker Privacy in Crowdsourcing

Hiroshi Kajino (1), Hiromi Arai (2), Hisashi Kashima (3)
1. The University of Tokyo, 2. RIKEN, 3. Kyoto University

18/09/14, ECML/PKDD 2014


Outline

■ Introduction & Existing Work
  □ Crowdsourcing: Outsourcing to unspecified people
  □ Quality control: Quality of results is variable
■ Proposed Problem Setting
  □ Worker privacy: Sensitive info of workers can be inferred
  □ Worker-private quality control problem
■ Proposed Method
  □ Existing quality control method + secure computation
■ Experiments
  □ Accuracy: Validate approximation in secure computation
  □ Computation time: Validate computational overhead

Propose & address a worker privacy problem in crowdsourcing


Research Target

Crowdsourcing is a method to outsource tasks to unspecified workers

■ Crowdsourcing
  □ Pros: Easy to use at low cost
    • Industry: Reduces financial/time costs for outsourcing
    • Academia: Trigger of new AI research areas (human computation)
  □ Cons: Quality issues, privacy issues, etc.

[Figure: the requester submits instances to the worker (1), and the worker returns answers (2); example task image from http://www.captcha.net/]

Existing Work

■ Quality of answers depends on abilities of workers
  □ Collecting labels from multiple workers is necessary
■ Quality control problem (in a labeling task)
  □ Input: Crowd labels $\{y_{ij} \in \{0,1\} \mid i = 1,\dots,I,\ j = 1,\dots,J\}$
  □ Output: Estimated true labels $\{y_i \in \{0,1\} \mid i = 1,\dots,I\}$

Estimate ground truth labels by aggregating multiple workers' answers

Task example: Label whether an image contains a bird or not (1 = Bird, 0 = Not Bird)

[Figure: crowd labels indexed by instance i and worker j, with the unknown ground truth labels marked "?"]

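Existing Work

■ Latent Class Method [Dawid & Skene, 1979]
  □ Model: Latent class model
    • $p = \Pr[y_i = 1]$: Prob. of true label = 1
    • $\alpha_j = \Pr[y_{ij} = y_i \mid y_i = 1]$
    • $\beta_j = \Pr[y_{ij} = y_i \mid y_i = 0]$
    • $I, J$: #(Instance), #(Worker)
  □ Inference: Given $\{y_{ij}\}$, estimate $\{y_i\}$, $\{\alpha_j, \beta_j\}$, $p$
    • E-step: Estimate $\{y_i\}$, fixing $\{\alpha_j, \beta_j\}$, $p$
    • M-step: Estimate $\{\alpha_j, \beta_j\}$, $p$, fixing $\{y_i\}$

Estimate consensus labels by inferring worker models

[Figure: graphical model of the latent class model with variables $y_i$, $y_{ij}$, $p$, $\alpha_j$, $\beta_j$ and plates over $I$ instances and $J$ workers; $\alpha_j$ and $\beta_j$ are the abilities of worker $j$]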
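The E-step/M-step alternation of the latent class method can be sketched in a few lines of Python. This is a minimal, non-private illustration rather than the authors' implementation: the function name `latent_class_em`, the majority-vote initialisation, the clipping constant `eps`, and the assumption that every worker labels every instance are all choices made for this sketch.

```python
import math

def latent_class_em(Y, n_iter=50, eps=1e-9):
    """Binary latent class EM. Y[i][j] in {0, 1} is worker j's label for instance i.

    Assumption for this sketch: every worker labels every instance.
    Returns (mu, alpha, beta, p), where mu[i] approximates Pr[y_i = 1 | data].
    """
    I, J = len(Y), len(Y[0])

    def clip(x):
        # Keep probabilities away from 0 and 1 so the logarithms stay finite.
        return min(max(x, eps), 1.0 - eps)

    # Initialise mu with the fraction of workers voting 1 (soft majority vote).
    mu = [sum(row) / J for row in Y]
    for _ in range(n_iter):
        # M-step: update p and the worker abilities, fixing mu.
        s1 = sum(mu)
        s0 = I - s1
        p = clip(s1 / I)
        alpha = [clip(sum(mu[i] * Y[i][j] for i in range(I)) / (s1 + eps))
                 for j in range(J)]
        beta = [clip(sum((1 - mu[i]) * (1 - Y[i][j]) for i in range(I)) / (s0 + eps))
                for j in range(J)]
        # E-step: update mu, fixing p, alpha, beta (log-odds form).
        for i in range(I):
            logit = math.log(p / (1 - p))
            for j in range(J):
                if Y[i][j] == 1:
                    logit += math.log(alpha[j] / (1 - beta[j]))
                else:
                    logit += math.log((1 - alpha[j]) / beta[j])
            logit = max(-30.0, min(30.0, logit))  # avoid overflow in exp
            mu[i] = 1.0 / (1.0 + math.exp(-logit))
    return mu, alpha, beta, p

# Toy data: workers 0 and 1 always agree, worker 2 is noisier.
Y_demo = [[1, 1, 0], [1, 1, 1], [0, 0, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
mu_demo, alpha_demo, beta_demo, p_demo = latent_class_em(Y_demo)
```

Rounding `mu_demo` gives the estimated true labels, and `alpha_demo`, `beta_demo` are the inferred abilities, which is exactly the information the later slides aim to keep private.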


Worker Privacy Issue

■ Sensitive information in answers
  □ Location
    • AED4 collects locations of AEDs in a map
    • Movement history of a worker is revealed
  □ Personal Information in Questionnaire Task
    • Interests of workers, personal information (quasi-identifiers)
    • Joining with other data sets can identify anonymous workers
  □ Ability
    • Quality control methods reveal the ability of a worker
    • This can demotivate workers from joining volunteer-based crowdsourcing

Simply passing answers to the requester can invade worker privacy

Our Problem Setting

■ Worker-Private Quality Control Problem
  □ Input: Crowd labels $\{y_{ij} \mid i = 1,\dots,I,\ j = 1,\dots,J\}$
  □ Output: Estimated true labels $\{y_i \mid i = 1,\dots,I\}$
  □ Subject to: Labels and abilities are kept worker-private

Definition: Worker $j$'s value $v_j$ is worker-private if others cannot determine $v_j$ uniquely
(cf. a similar definition can be found in query auditing)

We propose a worker-private quality control problem


Proposed Method: Overview

■ Worker-Private Latent Class Protocol
  □ Model: Latent class model (same as the previous one)
  □ Secure Inference:
    • E-step: Requester & workers estimate $\{y_i\}$ by secure computation
    • M-step: Each worker updates $\alpha_j$, $\beta_j$ secretly

Propose a privacy-preserving inference algorithm for the LC model

[Figure: workers and the requester interact through secure computation (marked "New!"); workers keep their answers secret, and the requester obtains the true answers]

Proposed Method: Building Block

■ Secure Sum Protocol (Generalized Paillier cryptosystem [Damgård+,01])
  Compute $\sum_j v_j$ when each worker $j$ holds a value $v_j$ secretly
  □ Additive Homomorphic Cryptosystem:
    For plaintexts $v_1, v_2 \in \mathbb{Z}_n$ and ciphertexts $\mathrm{Enc}(v_1), \mathrm{Enc}(v_2)$, $\mathrm{Enc}(v_1 + v_2) = \mathrm{Enc}(v_1) \cdot \mathrm{Enc}(v_2)$ holds
  □ Protocol:
    1) Each worker $j$ computes $\mathrm{Enc}(v_j)$, and parties compute $\mathrm{Enc}(\sum_j v_j)$
    2) Parties decrypt $\mathrm{Enc}(\sum_j v_j)$ using distributed secret keys

Secure sum allows us to compute the sum without privacy invasion

Lemma: After executing the protocol, no party learns anything other than its initial knowledge and the sum.
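The additive homomorphism behind the secure sum can be illustrated with a toy version of the basic Paillier scheme. This sketch makes several simplifying assumptions that differ from the slides: it uses the standard (not generalized) Paillier cryptosystem, tiny fixed primes that offer no security, and a single key holder instead of distributed secret keys. The function names `keygen`, `encrypt`, `decrypt`, and `secure_sum` are hypothetical.

```python
import math
import random

def keygen(p=1009, q=1013):
    """Toy Paillier keys from two small primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse; valid because the generator is n + 1
    return n, (lam, mu)

def encrypt(n, m):
    """Enc(m) = (n + 1)^m * r^n mod n^2 for a random r coprime to n."""
    n2 = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, sk, c):
    """Recover m from c using L(u) = (u - 1) // n."""
    lam, mu = sk
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def secure_sum(n, sk, values):
    """Each worker encrypts its own integer; only the product is decrypted."""
    n2 = n * n
    c_total = 1
    for v in values:  # in a real protocol each worker runs encrypt() locally
        c_total = (c_total * encrypt(n, v)) % n2  # Enc(a) * Enc(b) = Enc(a + b)
    # Assumption: one party holds sk here; the slides use distributed keys.
    return decrypt(n, sk, c_total)

n_demo, sk_demo = keygen()
total_demo = secure_sum(n_demo, sk_demo, [3, 5, 7])
```

The individual ciphertexts are randomised, so multiplying them reveals nothing about a single `v_j`; only the decrypted product exposes the sum, which must stay below `n` to avoid wrap-around.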

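Proposed Method: Algorithm

■ Worker-Private Latent Class Protocol
  □ Parameters: $\{\mu_i\}$, $p$, $\{\alpha_j\}$, $\{\beta_j\}$
    • $\mu_i = \Pr[y_i = 1 \mid \text{Data}]$, $p = \Pr[y_i = 1]$
    • $\alpha_j = \Pr[y_{ij} = y_i \mid y_i = 1]$, $\beta_j = \Pr[y_{ij} = y_i \mid y_i = 0]$

Incorporate workers into computation to preserve worker privacy

[Figure: example with three instances and three workers. True labels: $\mu_1, \mu_2, \mu_3, p$. Worker 1: labels 1 0 1, abilities $\alpha_1, \beta_1$. Worker 2: labels 1 0 0, abilities $\alpha_2, \beta_2$. Worker 3: labels 0 0 0, abilities $\alpha_3, \beta_3$.]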

Proposed Method: Algorithm

■ Worker-Private Latent Class Protocol
  □ Public: true labels $\{\mu_i\}$ and $p$
  □ Private values of each worker: crowd labels $\{y_{ij}\}$ and abilities $\alpha_j$, $\beta_j$

Incorporate workers into computation to preserve worker privacy

Proposed Method: Algorithm

■ Worker-Private Latent Class Protocol
  □ Parameters: $\{\mu_i\}$, $p$, $\{\alpha_j\}$, $\{\beta_j\}$
  □ E-Step: Parties update true labels using secure sum (a weighted majority vote of crowd labels)

Incorporate workers into computation to preserve worker privacy
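One way to see why a secure sum suffices for the E-step is to write the posterior of the latent class model in log-odds form. The derivation below follows from the model definitions on the earlier slides, assuming workers answer independently given the true label and that worker $j$ labeled instance $i$ (otherwise $v_{ij} = 0$); the symbol $v_{ij}$ is introduced here for illustration, and the exact quantities exchanged in the authors' protocol may differ.

```latex
\log\frac{\mu_i}{1-\mu_i}
  = \log\frac{p}{1-p} + \sum_{j=1}^{J} v_{ij},
\qquad
v_{ij} = y_{ij}\,\log\frac{\alpha_j}{1-\beta_j}
       + (1-y_{ij})\,\log\frac{1-\alpha_j}{\beta_j}
```

Each term $v_{ij}$ depends only on worker $j$'s own label and abilities, so the update is a weighted vote in which only the sum over workers needs to be revealed.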

Proposed Method: Algorithm

■ Worker-Private Latent Class Protocol
  □ Parameters: $\{\mu_i\}$, $p$, $\{\alpha_j\}$, $\{\beta_j\}$
  □ M-Step: Each worker independently updates abilities by checking agreement between its own labels and the true labels

Incorporate workers into computation to preserve worker privacy
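For reference, the standard EM updates of the binary latent class model show why the M-step needs no communication of private values. This is a sketch under the assumption that the sums run over the instances worker $j$ labeled; the slides do not spell out the update formulas.

```latex
\alpha_j = \frac{\sum_i \mu_i\, y_{ij}}{\sum_i \mu_i},
\qquad
\beta_j = \frac{\sum_i (1-\mu_i)(1-y_{ij})}{\sum_i (1-\mu_i)},
\qquad
p = \frac{1}{I}\sum_{i=1}^{I} \mu_i
```

Worker $j$ can evaluate $\alpha_j$ and $\beta_j$ from the public $\{\mu_i\}$ and its own labels alone, and $p$ depends only on public values.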

Proposed Method: Security Analysis

■ Conditions
  □ #(workers) ≥ 3
  □ For each instance, there exists at least one worker who does not give a label to the instance.

Theorem: After executing the protocol, each worker's labels and abilities are kept worker-private.

Making true labels public does not invade worker privacy


Experiments: Overview

■ Cons of secure computation
  1) Approximation:
    • Secure sum protocol works only on integers
    • Use an approximation parameter $L$ (a large number) to convert worker $j$'s value $v_j$ as $v_j \to \mathrm{round}(L v_j)$
  2) Computation Time:
    • Cryptographic (& communication) overhead
■ Data Set
  □ Duchenne Data Set [Whitehill+,09]
    • Judge whether a smile is fake or not
    • #(workers)=20, #(instances)=159

Evaluate two drawbacks of introducing secure computation

[Figure: example images from the data set, cited from [Whitehill+,09]]
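The integer conversion can be sketched as follows. The helper names `to_fixed` and `approx_sum` are hypothetical, and the sketch sums plain integers rather than ciphertexts; it only illustrates how the rounding error shrinks as $L$ grows.

```python
def to_fixed(v, L):
    """Convert a real value to an integer with approximation parameter L."""
    return round(L * v)

def approx_sum(values, L):
    """Sum the integer encodings (as the secure sum would) and rescale by L."""
    return sum(to_fixed(v, L) for v in values) / L

values_demo = [0.123456, 0.654321, 0.333333]
exact_demo = sum(values_demo)
# Each rounding is off by at most 0.5, so the total error is at most J / (2 L).
errors_demo = {L: abs(approx_sum(values_demo, L) - exact_demo)
               for L in (10, 1000, 10**6)}
```

With $J$ summands the absolute error is bounded by $J / (2L)$, which is why a large $L$ keeps the secure result close to the non-secure one.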

Experiments: (1) Approximation Accuracy

■ Relative Errors of Estimated Parameters
  □ Compare estimated model parameters with and without secure computation
  □ The approximation parameter $L$ can control errors arbitrarily
  □ Note: Accuracy of the true labels was the same as the original

Estimation errors can be handled by the approximation parameter $L$

[Figure: relative errors of estimated parameters versus approximation parameter $L$]

Experiments: (2) Computation Time

■ Cryptographic Overhead
  □ Key generation
  □ One iteration of the algorithm (encryption & decryption)

0.8 sec on the real data set (#(workers)=20, #(instances)=159, #(iterations)=15)

Additional computation time on a real data set was less than a second

[Figure: computation time versus #(workers)]

Conclusion

■ Contributions of Our Work
  □ Notion of worker privacy
    • Workers' sensitive information can leak from their answers
  □ WPLC protocol
    • Introducing secure computation into the LC method
    • Security is theoretically guaranteed
  □ Experiments
    • Accuracy can be controlled by a hyperparameter
    • Computation time is tolerable

We proposed the notion of worker privacy

Questions?