ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing
Techniques for Large-Scale Entity Linking, by Gianluca Demartini,
Djellel Eddine Difallah, and Philippe Cudré-Mauroux, eXascale Infolab,
U. of Fribourg, Switzerland, {firstname.lastname}@unifr.ch
Pick-A-Crowd: Tell Me What You Like, and I'll Tell You What to Do, by
Djellel Eddine Difallah, Gianluca Demartini, and Philippe
Cudré-Mauroux, eXascale Infolab, U. of Fribourg, Switzerland
Presented by: Muhammad Nuruddin, student ID: 2961230, email address:
[email protected], Internet Technologies and Information
Systems (ITIS), M.Sc. 4th semester, Leibniz Universität Hannover
Course details: Advanced Methods of Information Retrieval, by Dr.
Elena Demidova, Leibniz Universität Hannover
Presentation on the papers
Slide 2
Entity Linking
Entity linking algorithm (probabilistic reasoning based).
Entity linking: a suggested way to automate the construction of the Semantic Web.
Slide 3
Example: Wikipedia provides annotated pages, with entity mentions such as Military, Germany, Pacific Ocean, Historical Incidence, and France linked to their own pages.
Slide 4
Crowdsourcing: obtaining services, ideas, or content by soliciting
contributions from a large group of people, especially from an
online community. Examples:
- Wikipedia = wiki + encyclopedia = quick + encyclopedia
- IMDb top movie chart
- AMT (Amazon Mechanical Turk)
Slide 5
Paper 1: ZenCrowd: Leveraging Probabilistic Reasoning and
Crowdsourcing Techniques for Large-Scale Entity Linking
Combines an entity linking algorithm (probabilistic reasoning based)
with crowdsourcing, yielding an improvement between 4% and 35%.
Slide 6
Current techniques of Entity Linking
Entity linking is known to be extremely challenging, since parsing
and disambiguating natural language text is still very difficult for
machines. The current matching techniques:
Algorithmic matching: mostly based on probabilistic reasoning (e.g.,
TF-IDF based). Not as reliable as manual human matching.
Manual matching: fully reliable, but costly and time consuming. For
example, the New York Times (NYT) employs a whole team whose sole
responsibility is to manually create links from news articles to NYT
identifiers.
This paper represents a step towards bridging the gap between those
two classes.
Slide 7
System Architecture
The results of algorithmic matching are stored in a probabilistic
network. The Decision Engine decides:
1. If a result has a very high probability value, it is directly linked to the entity.
2. If a result has a very low confidence value, it is discarded and ignored.
3. Promising but uncertain results are passed to the Micro-Task Manager to crowdsource the problem and reach a decision (see the sketch below).
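A minimal sketch of this three-way rule in Python, assuming hypothetical thresholds tau_high and tau_low (the slide does not give the exact confidence bounds ZenCrowd uses):

def decide(candidate_links, tau_high=0.95, tau_low=0.05):
    """Split algorithmic matches into accepted, rejected, and crowdsourced links.

    candidate_links: list of (link, probability) pairs from the algorithmic matchers.
    tau_high / tau_low are hypothetical thresholds, not the paper's values.
    """
    accepted, rejected, to_crowd = [], [], []
    for link, p in candidate_links:
        if p >= tau_high:          # very high probability: link directly
            accepted.append(link)
        elif p <= tau_low:         # very low confidence: discard
            rejected.append(link)
        else:                      # promising but uncertain: send to the crowd
            to_crowd.append(link)
    return accepted, rejected, to_crowd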
Slide 8
System Architecture
After getting votes from the crowdsourcing platform, all the
information gathered from both the algorithmic matchers and the crowd
is fed into a scalable probabilistic store and used by the decision
engine to process all entities accordingly. Let's have a look at the
decision engine's mechanism for reaching a decision.
Slide 9
Example scenario
Sentence from an HTML page (doc. 1): "After the UNC workshop, Jordan gave a tutorial on nonparametric Bayesian methods."
Candidate entities for "Jordan" in the LOD cloud: the country Jordan, the Jordan River, and a Berkeley professor.
The candidate links l1, l2, l3 each have a prior probability pl_j computed from the algorithmic matches.
Workers w1 and w2 judge the links via clicks c11 ... c23; each worker has a reliability factor p(w) modelling whether the worker is good or bad.
Slide 10
Decision Engine uses a Factor Graph
A factor graph can deal with a complicated global problem by viewing
it as a factorization of several local functions.
l1, l2, l3: the three candidate entities for a link.
pl_j(): prior probability of l_j computed from the algorithmic matches.
w1, w2: the two workers employed to check the relevance of l1, l2, l3.
pw1(), pw2(): the reliability priors of workers w1 and w2.
lf_i(): linking factor, connects l_i to its related clicks (e.g., c11) and workers (e.g., w1).
sa_1-2(): same-as factor, used when the entities have a SameAs link in the LOD cloud.
Slide 11
Equations used for linking-factor calculation in the factor graph.
Slide 12
Reaching a Decision
A posterior probability is computed for all the links by running
probabilistic inference on the network. Links with a posterior
probability > 0.5 are considered to be correct.
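As an illustration of the > 0.5 decision rule, here is a simplified Bayesian stand-in for the factor-graph inference, assuming fixed worker reliabilities and ignoring the SameAs factors (so it is not the paper's exact model):

def posterior(prior, votes, threshold=0.5):
    """Illustrative stand-in for the factor-graph inference.

    prior: pl_j, the probability of link l_j from the algorithmic matchers.
    votes: list of (vote, reliability) pairs; vote is True when the worker
           clicked 'correct' for this link, reliability is that worker's p(w).
    Returns (posterior probability, decision under the 0.5 rule).
    """
    p_true, p_false = prior, 1.0 - prior
    for vote, r in votes:
        p_true *= r if vote else (1.0 - r)
        p_false *= (1.0 - r) if vote else r
    post = p_true / (p_true + p_false)
    return post, post > threshold

# Example: prior 0.4, two workers with reliability 0.8 both vote 'correct'
print(posterior(0.4, [(True, 0.8), (True, 0.8)]))  # posterior ~0.914 -> accepted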
Slide 13
Updating Priors
As entity-linking decisions are reached, the workers' profiles get
updated: from the results, each worker's accuracy can be calculated,
giving the reliability factors of w1 and w2.
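A minimal sketch of such an update, assuming the reliability prior p(w) is simply re-estimated as the fraction of a worker's votes that agreed with the final decisions (the paper's exact update may differ):

def update_reliability(history):
    """Recompute a worker's reliability prior p(w) from resolved links.

    history: list of booleans, True when the worker's vote agreed with the
    final decision reached by the decision engine.
    """
    if not history:
        return 0.5            # uninformative prior for a new worker
    return sum(history) / len(history)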
Slide 14
EXPERIMENTS
Experimental setup:
The collection consists of 25 English news articles from CNN.com,
NYTimes.com, washingtonpost.com, timesofindia.indiatimes.com, and
swissinfo.com.
489 entities were extracted using the Stanford parser.
Crowdsourcing was performed using Amazon Mechanical Turk, with 80
distinct workers.
Precision, recall, and accuracy were measured.
Slide 15
Comparison of three matching techniques
Slide 16
Observations
A hybrid model (based on both automated matching and manual human
experts) for entity linking.
4% to 35% improvement over a manually optimized agreement-voting
approach.
On average, a 14% improvement over the best automated system.
In both cases the improvement is statistically significant (t-test,
p < 0.05).
Manual work makes the total annotation process significantly slower,
so there are some questions about the time-quality tradeoff.
They classified workers into {Good, Bad} manually and calculated each
worker's reliability P(w), but did not mention any relation between
these two factors.
Slide 17
End of the presentation of the first paper
Slide 18
Paper 2: Pick-A-Crowd: Tell Me What You Like, and I'll Tell You
What to Do
This paper is about a different crowdsourcing approach based on a
push methodology. This push methodology yields better results (a 29%
relative improvement) than the usual pull strategies.
Figure: traditional crowdsourcing pull strategy, where any worker can
pull any task.
Slide 19
Example of the traditional approach: any worker can pull any task [1].
So what's wrong with this? It does not take a worker's field of
expertise into account, yet not all workers are a good fit for all
tasks; this matters especially for tasks requiring background
knowledge. "I had no idea what to answer to most questions..." was a
comment of a worker from AMT (Amazon Mechanical Turk).
[1] https://requester.mturk.com/images/graphic_process.png?1403199990
Slide 20
So how are they going to improve it? The system ranks/orders the
workers according to the type of work and the workers' skills, and
pushes the work to the most suitable workers. First, a user model is
constructed for each worker in the crowd in order to assign HITs
(Human Intelligence Tasks) to the most suitable available workers.
The user model/profile is built from the worker's social network
usage and fields of interest.
Slide 21
So how does this system rank/order the workers? With a recommender
system: assigning HITs to workers is similar to the task performed by
recommender systems. The recommender system matches HITs (Human
Intelligence Tasks) to the profiles of human workers (i.e., users)
that describe their interests and skills. The system then generates a
ranking of the candidate workers who can best perform the work.
Slide 22
System Overview
Slide 23
Workflow of the system
Calculate work difficulty: every piece of work is different from the
others; the HIT Difficulty Assessor takes each HIT and determines a
complexity score for it.
Assess worker skill: the system creates a worker profile from the
worker's liked pages and previous work experience.
Calculate the reward for the work: as every piece of work is
different and every worker's ability differs from task to task, the
rewards for different works and workers differ; the system calculates
rewards considering these factors.
Assign the work to the top-k suitable candidates: the recommender
system finds the k most suitable candidates and assigns (pushes) the
work only to these k workers (a minimal sketch of this step follows
below).
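A minimal sketch of the final push step, assuming a hypothetical similarity() function standing in for whichever assignment model is used:

def assign_hit(hit, workers, similarity, k=5):
    """Push a HIT to the top-k most suitable workers.

    hit: the task description (e.g., its text or linked entities).
    workers: dict mapping worker id -> profile (liked pages / entities).
    similarity: hypothetical profile-to-task similarity function (text,
                LOD entities, or categories), sketched on later slides.
    """
    ranked = sorted(workers, key=lambda w: similarity(workers[w], hit), reverse=True)
    return ranked[:k]          # only these k workers are offered the HIT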
Slide 24
Calculating work difficulty: 3 different possible algorithms
1. Text comparison: compare the textual description of the task with
the skill description of each worker and assess the difficulty.
2. LOD (Linked Open Data) entity based: each Facebook page liked by
the workers can be linked to its respective LOD entities. Then the
set of entities related to the HIT and the set of entities
representing the interests of the crowd can be directly compared.
The task is classified as difficult when the entities involved in
the task differ heavily from the entities liked by the crowd (see the
sketch after this list).
3. Machine learning based: a classifier is trained on previously
completed tasks, their descriptions, and their result accuracy. The
description of a new task is given as a test vector to the classifier.
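A minimal sketch of option 2, assuming the difficulty is taken as one minus the Jaccard overlap between the HIT's entities and the crowd's liked entities (the overlap measure is an assumption; the paper only says the two sets are compared):

def lod_difficulty(task_entities, crowd_entities):
    """LOD-entity-based difficulty estimate (option 2 above), as a sketch.

    task_entities: set of LOD entities related to the HIT.
    crowd_entities: set of LOD entities linked from the crowd's liked pages.
    """
    if not task_entities:
        return 0.0
    overlap = len(task_entities & crowd_entities) / len(task_entities | crowd_entities)
    return 1.0 - overlap       # 1.0 = no overlap with crowd interests = difficult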
Slide 25
4 possible ways of reward estimation. Input: a monetary budget B and
a HIT h_i.
1. Rewarding the same amount of money for each task of the same type.
2. Taking into account the difficulty d() of the HIT h (sketched
below).
3. Computing a reward based on both the specific HIT and the skill of
the worker who performs it.
4. Game-theoretic approaches to compute the optimal reward for paid
crowdsourcing incentives in the presence of workers who collude in
order to game the system.
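A minimal sketch of option 2, assuming the budget B is simply split proportionally to d(h_i) (proportional allocation is an illustrative assumption, not necessarily the paper's scheme):

def difficulty_based_rewards(budget, difficulties):
    """Split a monetary budget B across HITs proportionally to their
    difficulty d(h_i).

    difficulties: dict mapping HIT id -> difficulty score d(h_i) > 0.
    """
    total = sum(difficulties.values())
    return {h: budget * d / total for h, d in difficulties.items()}

# Example: a budget of 10 over three HITs of increasing difficulty
print(difficulty_based_rewards(10.0, {"h1": 0.2, "h2": 0.3, "h3": 0.5}))
# {'h1': 2.0, 'h2': 3.0, 'h3': 5.0}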
Slide 26
Worker Profile Selector
This module uses the similarity measure that is used for matching
workers to tasks. The entities included in the workers' profiles can
be considered, and the Facebook categories of their liked pages also
play a significant role. A generic similarity-measurement equation
uses:
A = set of candidate answers for task h_i
sim() = similarity between the worker profile and the task
description (a sketch follows below).
3 assignment models for HIT (Human Intelligence Task) assignment.
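The equation itself did not survive extraction; one plausible reading, sketched here as an assumption, is to score a worker by summing sim() over the candidate answers A of task h_i:

def worker_score(profile, candidate_answers, sim):
    """Assumed aggregation: sum the similarity between the worker profile
    and each candidate answer a in A of task h_i. The slide's exact
    formula is not available, so this is only an illustration.
    """
    return sum(sim(profile, a) for a in candidate_answers)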
Slide 27
HIT ASSIGNMENT MODELS
Category-based Assignment Model: tasks are assigned according to
Facebook pages or page categories (e.g., Entertainment -> Movie); the
requester specifies the category of the task.
Expert Profiling Assignment Model: the scoring function is based on a
voting model; the voting model considers the number of pages related
to the task, the number of pages the worker liked, and how many are
common.
Semantic-Based Assignment Model: answers and liked pages are linked
to entities, and the underlying graph structure is used to measure
the distance (similarity); example: SPARQL queries over the entity
graph.
Slide 28
Example: Expert Finding Voting Model
Figure: an example of the Expert Finding Voting Model. The final
ranking identifies worker A as the top worker, as he likes the most
pages related to the query.
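A minimal sketch of the voting model shown in the figure, with hypothetical page names; each liked page related to the task counts as one vote for its worker:

from collections import Counter

def voting_model_ranking(liked_pages, relevant_pages):
    """Rank workers by how many of their liked pages are related to the task.

    liked_pages: dict mapping worker id -> set of liked Facebook pages.
    relevant_pages: set of pages related to the task/query.
    """
    votes = Counter({w: len(pages & relevant_pages) for w, pages in liked_pages.items()})
    return votes.most_common()     # worker with the most related liked pages first

# Example mirroring the figure: worker A likes more task-related pages than B
print(voting_model_ranking(
    {"A": {"Messi", "FC Barcelona", "Champions League"}, "B": {"Messi", "Cooking"}},
    {"Messi", "FC Barcelona", "Champions League", "Real Madrid"},
))  # [('A', 3), ('B', 1)]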
Slide 29
Summary of the system: instead of any worker pulling any task, the
HIT Assigner assigns (pushes) tasks to suitable workers.
Slide 30
Experimental Evaluation
Experimental setting: 170 workers were recruited via Amazon
Mechanical Turk; overall, the workers have more than 12K distinct
liked Facebook pages.
Task categories: actors, soccer players, anime characters, movie
actors, movie scenes, music bands, and questions related to cricket;
50 images per category.
Precision and recall were measured over majority votes obtained from
3 or 5 workers.
Slide 31
Figure: crowd performance on the cricket task. Square points indicate
the 5 workers selected by the proposed system; the best worker
performs at 0.9 precision and 0.9 recall.
Slide 32
Figure: OpenTurk worker accuracy vs. the number of relevant pages a
worker likes. Observation: the more relevant pages in the worker
profile (e.g., >30), the higher the accuracy.
Slide 33
Table: average accuracy for different HIT assignment models,
assigning each HIT to 3 and 5 workers.
AMT: Amazon Mechanical Turk.
Category-based: comparison based on the category of the liked pages
and the category of the task.
En. type 3/5: entity types in the DBpedia knowledge base, assigning
each HIT to 3 and 5 workers.
Voting Model t_i: voting model based on page text relevant to the task.
Voting Model A_i: voting model based on similarity to all possible
answers.
1-step: considers directly related entities within one step in the
graph.
Results are based on 320 questions. Voting Model t_i achieves a 29%
relative improvement over the best accuracy obtained by the AMT model.
Slide 34
Observations
The push approach may lead to longer task completion times, so
real-time annotation is not possible. But in most cases, obtaining
high-quality answers is more important than getting real-time results.