Crowdsourcing with Multi-Dimensional Trust

Xiangyang Liu¹, He He², and John S. Baras¹

¹ Institute for Systems Research and Department of Electrical and Computer Engineering, University of Maryland, College Park, MD
² Department of Computer Science, University of Maryland, College Park, MD




Crowdsourcing Background

[Figure: crowdsourcing pipeline. Labels: clients; upload tasks; crowdsourcing assignment engine; Amazon Turkers; malicious workers; more reliable workers; pure experts; trust evaluation; true label inference; estimated answers.]


Motivation

• Tasks on crowdsourcing markets like Amazon Mechanical Turk often require knowledge in wide-ranging domains.
• Workers have different levels of reliability in different domains.

Goal: design an algorithm that jointly evaluates workers’ trust values in each domain and estimates the true labels for classification crowdsourcing tasks.

[Figure: a task spanning three domains (politics, sports, fashion) and three workers with per-domain trust vectors [good, bad, bad], [bad, good, bad], and [bad, bad, good].]


Notations

Domain distribution for question i

Domain of question i

True label of question i; takes values in {0, 1}

Trust vector of worker j

Answer given by worker j to question i; takes values in {0, 1}

Hyperparameter of the Dirichlet prior on the domain distribution

Parameter of the Beta prior on workers’ trust

Probability that question i is associated with the l-th domain

Trust value of worker j in domain l; takes values in [0, 1]


Probabilistic Graphical Model: No Feature

Compute posterior probability for trust and true label.
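The quantities listed under Notations suggest a generative process, which can be sketched as below. This is a minimal illustration under stated assumptions, not the model drawn on the slide: the function names, the uniform prior on the true label, the default prior values, and the rule that a worker answers correctly with probability equal to their trust in the question's domain are all hypothetical.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def generate(num_questions, num_workers, num_domains,
             alpha=1.0, a=2.0, b=1.0, seed=0):
    """Sample trust values, domains, true labels, and answers.

    Assumed process (hypothetical, not taken from the slides): an answer
    equals the true label with probability equal to the worker's trust
    in the question's domain; otherwise the label is flipped.
    """
    rng = random.Random(seed)
    # Trust of worker j in domain l, drawn from a Beta(a, b) prior.
    trust = [[rng.betavariate(a, b) for _ in range(num_domains)]
             for _ in range(num_workers)]
    questions = []
    for _ in range(num_questions):
        # Domain distribution of the question, from a symmetric Dirichlet.
        theta = sample_dirichlet([alpha] * num_domains, rng)
        domain = rng.choices(range(num_domains), weights=theta)[0]
        truth = rng.randint(0, 1)  # assumed uniform prior on the label
        answers = [truth if rng.random() < trust[j][domain] else 1 - truth
                   for j in range(num_workers)]
        questions.append({"theta": theta, "domain": domain,
                          "truth": truth, "answers": answers})
    return trust, questions
```

Calling `generate(50, 4, 3)` returns a 4-by-3 trust matrix and 50 question records; posterior inference then runs in the opposite direction, from the answers back to trust and true labels.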


Inference and Estimation

Obtain the approximate posterior distributions by maximizing the lower bound of the log likelihood:
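For orientation, the generic form of such a lower bound is given here. It is the standard variational bound rather than the slide's model-specific expression, and the symbols are assumptions: $L$ denotes the observed answers, $H$ all hidden variables (domains, true labels, trust values), and $q$ the approximate posterior.

```latex
\log p(L)
  \;\ge\; \mathbb{E}_{q(H)}\big[\log p(L, H)\big]
        - \mathbb{E}_{q(H)}\big[\log q(H)\big]
  \;=\; \mathcal{L}(q),
\qquad
\log p(L) - \mathcal{L}(q) \;=\; \mathrm{KL}\big(q(H)\,\|\,p(H \mid L)\big) \;\ge\; 0 .
```

Because the gap is a KL divergence, maximizing $\mathcal{L}(q)$ over a tractable family of $q$ brings $q$ as close as that family allows to the exact posterior.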

We update the trust and true labels as below:
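The flavor of these alternating updates can be illustrated with a simplified sketch. It is not the slide's variational update: it assumes each question's domain is known, uses a uniform prior on the true label, and replaces the full posterior over trust with its Beta posterior mean. The function name `infer` and all parameter names are hypothetical.

```python
import math


def infer(answers, domains, num_workers, num_domains,
          a=2.0, b=1.0, iters=20):
    """Alternate between label posteriors and per-domain trust estimates.

    answers: dict mapping (question i, worker j) to an answer in {0, 1}.
    domains: list with the (assumed known) domain index of each question.
    Returns (mu, trust): mu[i] approximates P(true label of i is 1) and
    trust[j][l] is the posterior-mean trust of worker j in domain l.
    """
    n = len(domains)
    trust = [[a / (a + b)] * num_domains for _ in range(num_workers)]
    mu = [0.5] * n
    for _ in range(iters):
        # Label step: sum each worker's log-odds vote for label 1.
        log_odds = [0.0] * n
        for (i, j), ans in answers.items():
            t = trust[j][domains[i]]
            w = math.log(t / (1.0 - t))
            log_odds[i] += w if ans == 1 else -w
        # Clamp before exponentiating to avoid overflow.
        mu = [1.0 / (1.0 + math.exp(-max(-30.0, min(30.0, lo))))
              for lo in log_odds]
        # Trust step: Beta posterior mean from expected correct counts.
        correct = [[0.0] * num_domains for _ in range(num_workers)]
        total = [[0] * num_domains for _ in range(num_workers)]
        for (i, j), ans in answers.items():
            d = domains[i]
            correct[j][d] += mu[i] if ans == 1 else 1.0 - mu[i]
            total[j][d] += 1
        trust = [[(a + correct[j][d]) / (a + b + total[j][d])
                  for d in range(num_domains)]
                 for j in range(num_workers)]
    return mu, trust
```

Keeping a separate trust estimate per domain is what lets a worker's vote count heavily on questions from a domain they know and lightly elsewhere.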


Probabilistic Graphical Model With Features

Compute posterior probability for trust and true label.


Inference and Estimation

Obtain the approximate posterior distributions by maximizing the lower bound of the log likelihood:

E-step: given the current model parameter estimates, obtain the approximate posterior q.

M-step: given the current posterior q, compute the new model parameter estimates by maximizing the lower bound.
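Written generically, the two steps are coordinate ascent on a single objective. The notation below is assumed for illustration and is not taken from the slide: $\Theta$ stands for the model parameters, $L$ for the observed answers, $H$ for the hidden variables, and $t$ for the iteration index.

```latex
\mathcal{L}(q, \Theta)
  = \mathbb{E}_{q(H)}\big[\log p(L, H \mid \Theta)\big]
  - \mathbb{E}_{q(H)}\big[\log q(H)\big]

\text{E-step:}\quad q^{(t+1)} = \arg\max_{q}\; \mathcal{L}\big(q, \Theta^{(t)}\big)
\qquad
\text{M-step:}\quad \Theta^{(t+1)} = \arg\max_{\Theta}\; \mathcal{L}\big(q^{(t+1)}, \Theta\big)
```

Each step can only increase $\mathcal{L}$ or leave it unchanged, so the bound is non-decreasing across iterations.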


Probabilistic Graphical Model With Topic Models

[Figure: graphical model with two parts, labeled "Multi-dimension trust crowdsourcing" and "Topic model".]


Inference and Estimation

Alternately update the approximate posterior distributions of the different hidden variables:


Experiments

Worker Type   Domain 0   Domain 1
Type 1        0.5        0.5
Type 2        0.95       0.5
Type 3        0.5        0.95
Type 4        0.95       0.95

Pima           MV      SDC     MDFC    MDC
(1,2,2,1)      0.098   0.040   0.009   N/A
(2,2,2,1)      0.103   0.042   0.009   N/A
(3,2,2,1)      0.150   0.042   0.008   N/A
(1,2,2,1)NF    0.098   0.040   N/A     0.039
(2,2,2,1)NF    0.103   0.042   N/A     0.043
(3,2,2,1)NF    0.150   0.042   N/A     0.041
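If MV in the table denotes majority voting and the entries are error rates (both are assumptions; the slides do not expand the abbreviation or name the metric), the baseline can be sketched as follows. The function names and the tie-breaking rule are hypothetical.

```python
def majority_vote(votes):
    """Return the label chosen by most workers; ties go to 0 (assumed)."""
    ones = sum(votes)
    return 1 if 2 * ones > len(votes) else 0


def error_rate(all_votes, truth):
    """Fraction of questions whose majority label differs from the truth."""
    wrong = sum(majority_vote(v) != t for v, t in zip(all_votes, truth))
    return wrong / len(truth)
```

Majority voting weights every worker equally on every question, so it cannot exploit the fact that a worker may be reliable in one domain and close to random in another.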


Experiments

Scientific Text   MV      MDC     MDTC
T4                0.181   0.095   0.044
T6                0.160   0.089   0.037
T8                0.141   0.082   0.034
T10               0.125   0.074   0.032
T12               0.116   0.069   0.032
T14               0.100   0.064   0.032

We tested the model on 1,000 scientific text sentences, each annotated by five workers. Each worker answers whether a given sentence contains contradicting statements. Each sentence comes with its text and the labels provided by the five experts.

We simulate D workers in total, where worker j answers questions from topic j perfectly and answers questions from all other topics nearly at random.
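The simulated workers can be sketched as below. The name `simulate_topic_experts` and the parameter `p_other` (the probability of a correct answer outside the worker's own topic) are hypothetical; the slides say only that such answers are close to random, so the default of 0.55 is an assumption.

```python
import random


def simulate_topic_experts(topics, truth, num_topics, p_other=0.55, seed=0):
    """Simulate one expert worker per topic.

    topics[i] is the topic index of question i and truth[i] its true
    label in {0, 1}. Worker j is always correct on topic j and correct
    with probability p_other (assumed value) on every other topic.
    Returns answers[j][i] for worker j on question i.
    """
    rng = random.Random(seed)
    answers = []
    for j in range(num_topics):
        row = []
        for topic, z in zip(topics, truth):
            if topic == j:
                row.append(z)  # perfect on own topic
            else:
                row.append(z if rng.random() < p_other else 1 - z)
        answers.append(row)
    return answers
```

With this setup, a method that recovers per-topic trust should assign each worker high trust in exactly one topic and trust near 0.5 in the others.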


Experiments

To show that MDTC can recover workers’ trust in each domain, we plot the mean trust value of eight workers in each of the eight domains.


Conclusions

• We formulated a probabilistic graphical model with multi-dimensional characteristics and provided a novel inference method based on variational inference. (MDC)

• The model is flexible and easily extensible to incorporate feature values. (MDFC)

• We extended MDC with topic discovery based on questions’ text descriptions and derived an analytical solution to the collapsed variational inference. (MDTC)


Thank you