Crowdsourcing with Multi-Dimensional Trust
Xiangyang Liu1, He He2, and John S. Baras1
1Institute for Systems Research and
Department of Electrical and Computer Engineering
University of Maryland, College Park, MD
2Department of Computer Science, University of Maryland, College Park, MD
Crowdsourcing Background
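[Figure: clients upload tasks to a crowdsourcing assignment engine, which distributes them to workers ranging from malicious workers and Amazon Turkers to more reliable workers and pure experts; trust evaluation and true label inference return estimated answers to the clients.]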
Motivation
• Tasks on crowdsourcing markets such as Amazon Mechanical Turk often require knowledge in widely ranging domains.
• Workers have different levels of reliability in different domains.
Goal: design an algorithm that jointly evaluates workers' trust values in each domain and, at the same time, estimates the true labels for classification crowdsourcing tasks.
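[Figure: a task pool spanning three domains (politics, sports, fashion), answered by three workers whose reliability across the three domains is [good, bad, bad], [bad, good, bad], and [bad, bad, good], respectively.]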
Notations
Domain distribution for question i
Domain for question i
True label for question i; takes values in {0, 1}
Trust vector for worker j
Answer given by worker j to question i; takes values in {0, 1}
Hyperparameter of the Dirichlet prior on the domain distribution
Parameter of the Beta prior on workers' trust
Probability that question i is associated with the lth domain
Trust value of worker j in domain l; takes values in [0, 1]
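To make the notation concrete, here is a minimal Python sketch of one plausible generative process behind these variables. It rests on assumptions the poster does not spell out: the true label is drawn uniformly from {0, 1}, each trust value is drawn independently from a Beta(beta_a, beta_b) prior, and worker j answers question i correctly with probability equal to its trust in the question's domain. The function and parameter names are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng):
    # Dirichlet sample via independent Gamma draws, normalized to sum to 1.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def generate(num_questions, num_workers, alpha, beta_a, beta_b, seed=0):
    """Hypothetical generative sketch of a multi-dimensional trust model."""
    rng = random.Random(seed)
    num_domains = len(alpha)
    # Assumption: trust of worker j in domain l ~ Beta(beta_a, beta_b), independently.
    trust = [[rng.betavariate(beta_a, beta_b) for _ in range(num_domains)]
             for _ in range(num_workers)]
    questions = []
    for _ in range(num_questions):
        theta = sample_dirichlet(alpha, rng)                    # domain distribution
        z = rng.choices(range(num_domains), weights=theta)[0]   # domain of the question
        y = rng.randint(0, 1)                                   # true label (uniform, assumed)
        answers = []
        for j in range(num_workers):
            # Assumption: answer is correct with probability trust[j][z].
            correct = rng.random() < trust[j][z]
            answers.append(y if correct else 1 - y)
        questions.append({"theta": theta, "domain": z, "label": y, "answers": answers})
    return trust, questions
```

Sampling from this sketch with a few workers whose trust vectors differ by domain reproduces the motivating picture of workers who are reliable in one domain and unreliable in others.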
Probabilistic Graphical Model: No Features
Compute the posterior probabilities of trust and true labels.
Inference and Estimation
Obtain the approximate posterior distributions by maximizing the lower bound of the log-likelihood.
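For reference, the lower bound being maximized has the generic variational form below. The symbols are assumptions for illustration: L collects the observed answers, H collects the hidden variables (domain distributions, domains, true labels, and trust vectors), and q is any approximate posterior over H. The poster's model-specific expansion of this bound is not reproduced here.

```latex
\log p(\mathbf{L}) \;\ge\; \mathbb{E}_{q}\!\left[\log p(\mathbf{L}, \mathbf{H})\right]
  - \mathbb{E}_{q}\!\left[\log q(\mathbf{H})\right] \;=:\; \mathcal{L}(q),
\qquad
\log p(\mathbf{L}) - \mathcal{L}(q)
  = \mathrm{KL}\!\left(q(\mathbf{H}) \,\|\, p(\mathbf{H} \mid \mathbf{L})\right) \;\ge\; 0 .
```

Because log p(L) does not depend on q, maximizing the bound over a restricted family of q (for example a mean-field factorization, a common choice) is the same as minimizing the KL divergence to the true posterior within that family.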
We then update the trust values and the true labels.
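The poster's update equations did not survive extraction. As a stand-in, the following is a minimal EM-style (MAP) sketch of how trust and true labels can be updated in alternation; it is not the authors' variational updates. Assumptions: every worker answers every question, each question's domain probabilities theta[i] are known and held fixed, the true label has a uniform prior, worker j is correct in domain k with probability trust[j][k], and trust has a Beta(beta_a, beta_b) prior. Names are hypothetical.

```python
def em_multidim(answers, theta, num_iters=20, beta_a=2.0, beta_b=2.0, init_trust=0.7):
    """EM-style (MAP) sketch of per-domain trust and true-label estimation.

    answers[i][j] in {0, 1}: answer of worker j to question i
    theta[i][k]: domain probabilities of question i (assumed known, held fixed)
    Returns (trust[j][k], post_y[i] = q(y_i = 1)).
    """
    n, m, num_domains = len(answers), len(answers[0]), len(theta[0])
    # Start every worker slightly better than random to break the label-flip symmetry.
    trust = [[init_trust] * num_domains for _ in range(m)]
    post_y = [0.5] * n
    for _ in range(num_iters):
        # E-step: joint posterior q(y_i, z_i) under the current trust values.
        resp = []
        for i in range(n):
            joint = {}
            for y in (0, 1):
                for k in range(num_domains):
                    p = 0.5 * theta[i][k]  # uniform prior on the true label (assumption)
                    for j in range(m):
                        w = trust[j][k]
                        p *= w if answers[i][j] == y else 1.0 - w
                    joint[(y, k)] = p
            total = sum(joint.values())
            joint = {key: v / total for key, v in joint.items()}
            resp.append(joint)
            post_y[i] = sum(joint[(1, k)] for k in range(num_domains))
        # M-step: MAP trust under a Beta(beta_a, beta_b) prior; with beta_a, beta_b > 1
        # the estimate stays strictly inside (0, 1).
        for j in range(m):
            for k in range(num_domains):
                hits = sum(resp[i][(answers[i][j], k)] for i in range(n))
                mass = sum(resp[i][(0, k)] + resp[i][(1, k)] for i in range(n))
                trust[j][k] = (hits + beta_a - 1.0) / (mass + beta_a + beta_b - 2.0)
    return trust, post_y
```

The key design point is that a worker's agreement with the inferred label is credited only to the domain the question is believed to belong to, so a single worker can end up with high trust in one domain and low trust in another.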
Probabilistic Graphical Model With Features
Compute the posterior probabilities of trust and true labels.
Inference and Estimation
Obtain the approximate posterior distributions by maximizing the lower bound of the log-likelihood.
E-step: given the current model parameter estimate, obtain the approximate posterior q.
M-step: given the current posterior q, compute a new model parameter estimate by maximizing the lower bound.
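One common way to use features in the M-step, borrowed from learning-from-crowds models rather than confirmed by this poster, is to model p(y_i = 1 | x_i) as a logistic function of the question's feature vector and refit its weights to the soft labels q(y_i = 1) produced by the E-step. A minimal sketch under that assumption, with hypothetical names, using plain gradient ascent with a small L2 penalty:

```python
import math


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def m_step_feature_weights(features, soft_labels, lr=0.1, num_steps=500, l2=1e-3):
    """Fit logistic-regression weights v to soft labels mu_i = q(y_i = 1).

    Maximizes sum_i [mu_i log s(v.x_i) + (1 - mu_i) log(1 - s(v.x_i))] - (l2 / 2) ||v||^2
    by gradient ascent; include a constant 1.0 feature if a bias term is wanted.
    """
    d = len(features[0])
    v = [0.0] * d
    for _ in range(num_steps):
        grad = [-l2 * vk for vk in v]  # gradient of the L2 penalty term
        for x, mu in zip(features, soft_labels):
            p = sigmoid(sum(vk * xk for vk, xk in zip(v, x)))
            for k in range(d):
                grad[k] += (mu - p) * x[k]  # soft-label cross-entropy gradient
        v = [vk + lr * gk for vk, gk in zip(v, grad)]
    return v
```

Using soft rather than thresholded labels keeps the E-step's uncertainty: a question whose label posterior is near 0.5 pulls only weakly on the feature weights.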
Probabilistic Graphical Model With Topic Models
Multi-dimensional trust crowdsourcing topic model (MDTC)
Inference and Estimation
Alternately update the approximate posterior distributions of the different hidden variables.
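The topic-model variant couples two sources of evidence about a question's domain: its text and the workers' answers. The sketch below shows one such coordinate update for q(z_i = k), under simplifying assumptions not stated on the poster: all words of question i come from its single domain (a mixture-of-unigrams simplification rather than per-word topic assignments), the true label is marginalized with a fixed prior, and the other factors (theta_i, phi, trust) are held at their current estimates. Names are hypothetical.

```python
import math


def update_question_domain(word_ids, answers_i, theta_i, phi, trust, prior_y1=0.5):
    """One coordinate update of q(z_i = k) for a single question (hypothetical sketch).

    word_ids: word indices in question i's text
    answers_i: list of (worker j, answer in {0, 1}) pairs
    theta_i: current domain proportions for question i (all entries > 0)
    phi[k][w]: probability of word w under domain k
    trust[j][k]: current trust of worker j in domain k, strictly inside (0, 1)
    """
    log_scores = []
    for k in range(len(theta_i)):
        s = math.log(theta_i[k])
        s += sum(math.log(phi[k][w]) for w in word_ids)  # text evidence
        like = 0.0
        for y, p_y in ((0, 1.0 - prior_y1), (1, prior_y1)):
            prod = p_y
            for j, a in answers_i:
                prod *= trust[j][k] if a == y else 1.0 - trust[j][k]
            like += prod
        s += math.log(like)  # answer evidence, with the true label marginalized out
        log_scores.append(s)
    # Normalize in log space for numerical stability.
    mx = max(log_scores)
    unnorm = [math.exp(s - mx) for s in log_scores]
    total = sum(unnorm)
    return [u / total for u in unnorm]
```

Note how disagreement between two workers who are both highly trusted in a domain lowers the posterior probability of that domain, so answers alone can shift the topic estimate even when the text is uninformative.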
Experiments
Worker Type Domain 0 Domain 1
Type 1 0.5 0.5
Type 2 0.95 0.5
Type 3 0.5 0.95
Type 4 0.95 0.95
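A minimal sketch of how synthetic answers could be drawn from these worker types, assuming each table entry is the probability that a worker of that type answers a question from that domain correctly (the poster does not state how the simulation was run). Names are hypothetical.

```python
import random

# Per-domain accuracy of each worker type, taken from the table above
# (assumed to be P(answer == true label) for questions in that domain).
WORKER_TYPES = {
    1: (0.5, 0.5),
    2: (0.95, 0.5),
    3: (0.5, 0.95),
    4: (0.95, 0.95),
}


def simulate_answers(labels, domains, worker_types, seed=0):
    """Return answers[i][j] for question i (label, domain 0/1) and worker j of the given type."""
    rng = random.Random(seed)
    answers = []
    for y, d in zip(labels, domains):
        row = []
        for t in worker_types:
            acc = WORKER_TYPES[t][d]
            row.append(y if rng.random() < acc else 1 - y)
        answers.append(row)
    return answers
```

For example, simulate_answers(labels, domains, [1, 2, 2, 4]) would give one Type 1 worker, two Type 2 workers, and one Type 4 worker.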
Pima MV SDC MDFC MDC
(1,2,2,1) 0.098 0.040 0.009 N/A
(2,2,2,1) 0.103 0.042 0.009 N/A
(3,2,2,1) 0.150 0.042 0.008 N/A
(1,2,2,1)NF 0.098 0.040 N/A 0.039
(2,2,2,1)NF 0.103 0.042 N/A 0.043
(3,2,2,1)NF 0.150 0.042 N/A 0.041
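The MV column is the majority-vote baseline. A minimal sketch of that baseline and of a per-question error rate follows; the tie-breaking rule is an assumption (ties go to a fixed label here), and the poster does not say which rule was used or that the table entries are error rates.

```python
from collections import Counter


def majority_vote(answers_row, tie_break=1):
    """Majority label among binary answers; ties return tie_break (assumed rule)."""
    counts = Counter(answers_row)
    if counts[1] > counts[0]:
        return 1
    if counts[0] > counts[1]:
        return 0
    return tie_break


def error_rate(predictions, labels):
    """Fraction of questions whose predicted label differs from the true label."""
    wrong = sum(p != y for p, y in zip(predictions, labels))
    return wrong / len(labels)
```

Majority vote treats every worker as equally reliable in every domain, which is exactly the assumption the multi-dimensional models relax.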
Experiments
Scientific Text MV MDC MDTC
T4 0.181 0.095 0.044
T6 0.160 0.089 0.037
T8 0.141 0.082 0.034
T10 0.125 0.074 0.032
T12 0.116 0.069 0.032
T14 0.100 0.064 0.032
We tested the model on 1,000 scientific text sentences annotated by five workers. Each worker judges whether a given sentence contains contradicting statements. Each sentence comes with its text along with the labels provided by the five experts.
We simulate D workers in total: worker j answers questions from topic j perfectly and answers questions from all other topics close to randomly.
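A minimal sketch of this simulation, assuming each question carries a single topic index and that "close to randomly" means a fixed off-topic accuracy slightly above 0.5; the value 0.55 and all names are hypothetical.

```python
import random


def simulate_topic_experts(topics, labels, num_topics, off_topic_acc=0.55, seed=0):
    """Worker j is perfect on topic j and near-random (off_topic_acc) on every other topic."""
    rng = random.Random(seed)
    answers = []
    for z, y in zip(topics, labels):
        row = []
        for j in range(num_topics):
            acc = 1.0 if j == z else off_topic_acc  # hypothetical off-topic accuracy
            row.append(y if rng.random() < acc else 1 - y)
        answers.append(row)
    return answers
```

In this setup every question has exactly one perfect worker, so a method that identifies each worker's domain of expertise can in principle recover the labels, while majority vote is dominated by the near-random majority.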
Experiments
To show that MDTC can recover workers' trust in each domain, we plot the mean trust value of each of the eight workers in each of the eight domains.
Conclusions
• Formulated a probabilistic graphical model with multi-dimensional characteristics and provided a novel inference method based on variational inference (MDC).
• The model is flexible and easily extended to incorporate feature values (MDFC).
• Extended MDC with topic discovery based on the questions' text descriptions and derived an analytical solution to the collection variational inference (MDTC).
Thank you