
ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking


Conference talk at WWW 2012, April 2012, Lyon, France.


Page 1: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux
eXascale Infolab, University of Fribourg, Switzerland

Page 2: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

Motivation

• Linked Open Data (LOD)
• Linking entities from text to LOD
  – Rich snippets
  – Entity-centric search
• Linking
  – Algorithmic
  – Manual (NYT articles)

[Diagram: HTML+RDFa pages connected to the LOD Cloud]


Page 3: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking


[Figure: candidate entity links for a news story — http://dbpedia.org/resource/Facebook and http://dbpedia.org/resource/Instagram, an owl:sameAs link to an equivalent Instagram entity in another dataset, and further entities Google and Android]

HTML:
<p>Facebook is not waiting for its initial public offering to make its first big purchase.</p><p>In its largest acquisition to date, the social network has purchased Instagram, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</p>

RDFa enrichment:
<p><span about="http://dbpedia.org/resource/Facebook"><cite property="rdfs:label">Facebook</cite> is not waiting for its initial public offering to make its first big purchase.</span></p><p><span about="http://dbpedia.org/resource/Instagram">In its largest acquisition to date, the social network has purchased <cite property="rdfs:label">Instagram</cite>, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</span></p>

Page 4: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

Crowdsourcing

• Exploit human intelligence to solve
  – Tasks simple for humans, complex for machines
  – With a large number of humans (the Crowd)
  – Small problems: micro-tasks (Amazon MTurk)
• Examples
  – Wikipedia, image tagging
• Incentives
  – Financial, fun, visibility


Page 5: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

ZenCrowd

• Combine both algorithmic and manual linking
• Automate manual linking via crowdsourcing
• Dynamically assess human workers with a probabilistic reasoning framework


[Diagram: the Crowd on one side, algorithmic Machines on the other, combined by ZenCrowd]

Page 6: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

Outline

• Related approaches
• ZenCrowd: system architecture
• Probabilistic model to combine automatic matching and crowdsourcing results
• Experimental evaluation


Page 7: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

Related Approaches

• Entity linking
• Ad-hoc object retrieval
  – Keyword queries
  – Looking for a specific entity
  – IR indexing and ranking over RDF data
• Crowdsourcing
  – Training set, tagging, annotation
  – IR evaluation
  – CrowdDB


Page 8: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

ZenCrowd Architecture

[Architecture diagram: HTML pages enter ZenCrowd, where entity extractors feed algorithmic matchers (backed by an LOD index over the LOD Open Data Cloud) and a micro-task manager that posts micro matching tasks to a crowdsourcing platform; worker decisions flow into a probabilistic network and decision engine, whose output is HTML+RDFa pages]
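To make the data flow concrete, here is an illustrative outline in Python. Every function name below is an assumption of this sketch, not ZenCrowd's actual API, and each stage is stubbed so the example runs end to end.

# Illustrative outline of the ZenCrowd flow; all names and stubs are
# assumptions made for this sketch, not the system's real interfaces.
def extract_entities(page):
    """Entity extractors: find entity mentions in an HTML page."""
    return ["Facebook", "Instagram"]

def algorithmic_match(mention):
    """Algorithmic matchers: query the LOD index for candidate URIs."""
    return ["http://dbpedia.org/resource/" + mention]

def crowdsource(mention, candidates):
    """Micro-task manager: collect worker votes per candidate URI."""
    return {uri: 1 for uri in candidates}

def decide(candidates, votes):
    """Probabilistic network + decision engine: keep supported links."""
    return [uri for uri in candidates if votes.get(uri, 0) > 0]

def zencrowd_pipeline(page):
    enriched = []
    for mention in extract_entities(page):
        candidates = algorithmic_match(mention)
        votes = crowdsource(mention, candidates)
        enriched.append((mention, decide(candidates, votes)))
    return enriched

print(zencrowd_pipeline("<html>...</html>"))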


Page 9: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

Algorithmic Matching

• Inverted index over LOD entities
  – DBPedia, Freebase, Geonames, NYT
• TF-IDF (IR ranking function)
• Top-ranked URIs linked to entities in docs
• Threshold on the ranking function, or top N (see the sketch below)
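A rough sketch of this matcher, assuming a toy three-entity label index in place of the real index over roughly 40M LOD entities; all names are illustrative.

# TF-IDF ranking over entity labels, with a score threshold and a
# top-N cutoff, as a stand-in for the inverted index over LOD entities.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

entities = {
    "http://dbpedia.org/resource/Facebook": "Facebook social networking service",
    "http://dbpedia.org/resource/Instagram": "Instagram photo sharing application",
    "http://dbpedia.org/resource/Google": "Google web search engine company",
}

uris = list(entities)
vectorizer = TfidfVectorizer()
label_matrix = vectorizer.fit_transform(entities.values())

def candidate_links(mention, top_n=3, threshold=0.1):
    """Return the top-N candidate URIs scoring above the threshold."""
    scores = cosine_similarity(vectorizer.transform([mention]), label_matrix)[0]
    ranked = sorted(zip(uris, scores), key=lambda pair: pair[1], reverse=True)
    return [(uri, score) for uri, score in ranked[:top_n] if score >= threshold]

print(candidate_links("Instagram photo app"))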


Page 10: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

Entity Factor Graphs

• Graph components
  – Workers, links, clicks
  – Prior probabilities
  – Link factors
  – Constraints
• Probabilistic inference
  – Select all links with posterior probability > τ

[Factor graph with 2 workers, 6 clicks, and 3 candidate links: worker nodes w1, w2 with worker priors pw1(), pw2(); candidate-link nodes l1–l3 with link priors pl1()–pl3() and link factors lf1()–lf3(); observed click variables c11–c23 connecting workers to links; a SameAs constraint sa1-2() between l1 and l2, and a dataset unicity constraint u2-3() between l2 and l3]
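To make the inference step concrete, here is a heavily simplified stand-in: it drops the SameAs and unicity constraints, treats workers as independent, and scores a single link by Bayes' rule. All names and numbers are illustrative, not the paper's actual inference.

# Posterior that one link is valid, given each worker's click and an
# estimated reliability; links with posterior > tau would be selected.
def link_posterior(prior, votes):
    """votes: (worker_reliability, clicked_valid) pairs for one link."""
    p_valid, p_invalid = prior, 1.0 - prior
    for reliability, clicked_valid in votes:
        # A worker of reliability r answers correctly with probability r.
        p_valid *= reliability if clicked_valid else 1.0 - reliability
        p_invalid *= (1.0 - reliability) if clicked_valid else reliability
    return p_valid / (p_valid + p_invalid)

tau = 0.5
votes = [(0.9, True), (0.6, False), (0.8, True)]  # illustrative clicks
if link_posterior(prior=0.5, votes=votes) > tau:
    print("select link")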


Page 11: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

Entity Factor Graphs

• Training phase
  – Initialize worker priors
  – with k matches on known answers
• Updating worker priors
  – Use link decisions as new observations
  – Compute new worker probabilities
• Identify (and discard) unreliable workers (see the sketch below)
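A minimal sketch of this loop, assuming a simple pseudo-count estimate of worker reliability; the actual system recomputes worker probabilities through the factor graph, and the thresholds here are invented.

# Worker priors as pseudo-counts, updated with each finalized decision.
from dataclasses import dataclass

@dataclass
class Worker:
    correct: int = 1   # pseudo-counts initialized from k known-answer matches
    total: int = 2

    @property
    def reliability(self):
        return self.correct / self.total

def update_prior(worker, agreed_with_decision):
    """Treat each finalized link decision as a new observation."""
    worker.total += 1
    if agreed_with_decision:
        worker.correct += 1

def is_unreliable(worker, threshold=0.5, min_tasks=5):
    """Flag workers whose estimated reliability stays below the threshold."""
    return worker.total >= min_tasks and worker.reliability < threshold

w = Worker()
for agreed in [True, True, False, True]:
    update_prior(w, agreed)
print(w.reliability, is_unreliable(w))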


Page 12: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

Experimental Evaluation

• Datasets
  – 25 news articles from
    • CNN.com (global news)
    • NYTimes.com (global news)
    • WashingtonPost.com (US local news)
    • Timesofindia.indiatimes.com (India local news)
    • Swissinfo.com (Switzerland local news)
  – 40M entities (Freebase, DBPedia, Geonames, NYT)
• 80 workers on Amazon MTurk, $0.01 per entity
• Standard evaluation measures: Precision, Recall, Accuracy (definitions sketched below)
• Gold standard: editorial selection
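Purely for reference, a tiny helper with the standard definitions of the three measures; nothing here is ZenCrowd-specific, and the counts are made up.

# Precision, recall, and accuracy from per-entity confusion counts.
def precision_recall_accuracy(tp, fp, fn, tn):
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return precision, recall, accuracy

print(precision_recall_accuracy(tp=70, fp=30, fn=15, tn=35))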


Page 13: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

The micro-task


Page 14: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

Experimental Evaluation

• Automatic approach: P/R trade-off

0"0.1"0.2"0.3"0.4"0.5"0.6"0.7"0.8"0.9"1"

1" 2" 3" 4" 5"

Precision)/)Re

call)

Top)N)Results)

P"R"


Page 15: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

Experimental Evaluation

• Entity linking with crowdsourcing and agreement vote (at least 2 out of 5 workers select the same URI), as sketched below
  Top-1 precision: 0.70
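A minimal sketch of the agreement-vote baseline, assuming five recorded picks per entity; the 2-of-5 threshold matches the slide, and the prefixed URIs are illustrative.

# Accept every URI picked by at least min_votes of the workers.
from collections import Counter

def agreement_vote(worker_picks, min_votes=2):
    """worker_picks: the URI each worker selected for one entity."""
    counts = Counter(worker_picks)
    return [uri for uri, n in counts.items() if n >= min_votes]

picks = ["dbpedia:Instagram", "dbpedia:Instagram", "dbpedia:Facebook",
         "dbpedia:Instagram", "dbpedia:Facebook"]
print(agreement_vote(picks))  # ['dbpedia:Instagram', 'dbpedia:Facebook']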

0"0.1"0.2"0.3"0.4"0.5"0.6"0.7"0.8"0.9"1"

0" 0.1"0.2"0.3"0.4"0.5"0.6"0.7"0.8"0.9"

Precision)/)Re

call)

Matching)Probability)Threshold)

P"R"

0"0.1"0.2"0.3"0.4"0.5"0.6"0.7"0.8"0.9"1"

1" 2" 3" 4" 5"

Precision)/)Re

call)

Top)N)Results)

P"R"

Figure 5: Performance results (Precision, Recall) for the automatic approach.

The URIs with at least 2 votes are selected as valid links (we tried various thresholds and manually picked 2 in the end, since it leads to the highest precision scores while keeping good recall values for our experiments). We report on the performance of this crowdsourcing technique in Table 2. The values are averaged over all linkable entities for different document types and worker communities.

Table 2: Performance results for crowdsourcing with agreement vote over linkable entities.

            US Workers          Indian Workers
            P     R     A       P     R     A
GL News     0.79  0.85  0.77    0.60  0.80  0.60
US News     0.52  0.61  0.54    0.50  0.74  0.47
IN News     0.62  0.76  0.65    0.64  0.86  0.63
SW News     0.69  0.82  0.69    0.50  0.69  0.56
All News    0.74  0.82  0.73    0.57  0.78  0.59

The first question we examine is whether there is a difference in reliability between the various populations of workers. In Figure 6 we show the performance for tasks performed by workers located in the USA and in India (each point corresponds to the average precision and recall over all entities in one document). On average, we observe that tasks performed by workers located in the USA lead to higher precision values. As we can see in Table 2, Indian workers obtain higher precision and recall on local Indian news as compared to US workers. The biggest difference in terms of accuracy between the two communities can be observed on the global interest news.

0"0.1"0.2"0.3"0.4"0.5"0.6"0.7"0.8"0.9"1"

0" 0.2" 0.4" 0.6" 0.8" 1"

Recall&

Precision&

US"India"

Figure 6: Per document task e↵ectiveness.

A second question we examine is how the textual context given for an entity influences the worker performance. In Figure 7, we compare the tasks for which only the entity label is given to those for which a context consisting of all the sentences containing the entity is shown to the worker (snippets). Surprisingly, we could not observe a significant difference in effectiveness caused by the different textual contexts given to the workers. Thus, we focus on only one type of context for the remaining experiments (we always give the snippet context).

0"0.1"0.2"0.3"0.4"0.5"0.6"0.7"0.8"0.9"1"

1" 2" 3" 4" 5" 6" 7" 8" 9" 10"

Precision)

Document)

Simple"

Snippet"

Figure 7: Crowdsourcing results with two di↵erenttextual contexts

Entity Linking with ZenCrowd. We now focus on the performance of the probabilistic inference network as proposed in this paper. We consider the method described in Section 4, with an initial training phase consisting of 5 entities, and a second, continuous training phase consisting of 5% of the other entities being offered to the workers (i.e., the workers are given a task whose solution is known by the system every 20 tasks on average). In order to reduce the number of tasks having little influence on the final results, a simple technique of blacklisting bad workers is used. A bad worker (who can be considered a spammer) is a worker who randomly and rapidly clicks on the links, hence generating noise in our system. In our experiments, we consider that 3 consecutive bad answers in the training phase are enough to identify the worker as a spammer and to blacklist him/her. We report the average results of ZenCrowd when exploiting the training phase, constraints, and blacklisting in Table 3. As we can observe, precision and accuracy values are higher in all cases when compared to the agreement-vote approach.

Table 3: Performance results for crowdsourcing with ZenCrowd over linkable entities.

            US Workers          Indian Workers
            P     R     A       P     R     A
GL News     0.84  0.87  0.90    0.67  0.64  0.78
US News     0.64  0.68  0.78    0.55  0.63  0.71
IN News     0.84  0.82  0.89    0.75  0.77  0.80
SW News     0.72  0.80  0.85    0.61  0.62  0.73
All News    0.80  0.81  0.88    0.64  0.62  0.76

Finally, we compare ZenCrowd to the state-of-the-art crowdsourcing approach (using the optimal agreement vote) and to our best automatic approach on a per-task basis in Figure 8. The comparison is given for each document in the test collection. We observe that in most cases the human intelligence contribution improves the precision of the automatic approach. We also observe that ZenCrowd dominates the other two approaches in most cases.


Page 16: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

Experimental Evaluation

• Worker community and textual context

0"0.1"0.2"0.3"0.4"0.5"0.6"0.7"0.8"0.9"1"

0" 0.2" 0.4" 0.6" 0.8" 1"

Recall&

Precision&

US"India"

0"0.1"0.2"0.3"0.4"0.5"0.6"0.7"0.8"0.9"1"

1" 2" 3" 4" 5" 6" 7" 8" 9" 10"Precision)

Document)

Simple"

Snippet"


Page 17: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

Experimental Evaluation

• Worker selection

[Plots: per-worker precision vs. number of tasks completed for US and Indian workers, with the top US worker highlighted; and overall precision (roughly 0.6 to 0.8) vs. the number K of top workers selected]


Page 18: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

Experimental Evaluation

• Entity linking with ZenCrowd
  – Training with first 5 entities + 5% afterwards
  – 3 consecutive bad answers lead to blacklisting (sketched below)
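A minimal sketch of that blacklisting rule, assuming training answers arrive as correct/incorrect booleans; the function name and encoding are assumptions of this sketch.

# Three consecutive wrong answers on known-answer (training) tasks
# flag the worker as a spammer.
def should_blacklist(training_answers, max_consecutive_bad=3):
    streak = 0
    for correct in training_answers:
        streak = 0 if correct else streak + 1
        if streak >= max_consecutive_bad:
            return True
    return False

print(should_blacklist([True, False, False, False]))  # True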

0"0.1"0.2"0.3"0.4"0.5"0.6"0.7"0.8"0.9"1"

0" 0.1"0.2"0.3"0.4"0.5"0.6"0.7"0.8"0.9"

Precision)/)Re

call)

Matching)Probability)Threshold)

P"R"

0"0.1"0.2"0.3"0.4"0.5"0.6"0.7"0.8"0.9"1"

1" 2" 3" 4" 5"

Precision)/)Re

call)

Top)N)Results)

P"R"

Figure 5: Performance results (Precision, Recall) forthe automatic approach.

the URIs with at least 2 votes are selected as valid links(we tried various thresholds and manually picked 2 in theend since it leads to the highest precision scores while keep-ing good recall values for our experiments). We report onthe performance of this crowdsourcing technique in Table 2.The values are averaged over all linkable entities for di↵erentdocument types and worker communities.

Table 2: Performance results for crowdsourcing withagreement vote over linkable entities.

US Workers Indian WorkersP R A P R A

GL News 0.79 0.85 0.77 0.60 0.80 0.60US News 0.52 0.61 0.54 0.50 0.74 0.47IN News 0.62 0.76 0.65 0.64 0.86 0.63SW News 0.69 0.82 0.69 0.50 0.69 0.56All News 0.74 0.82 0.73 0.57 0.78 0.59

The first question we examine is whether there is a di↵er-ence in reliability between the various populations of work-ers. In Figure 6 we show the performance for tasks per-formed by workers located in USA and India (each pointcorresponds to the average precision and recall over all en-tities in one document). On average, we observe that tasksperformed by workers located in the USA lead to higherprecision values. As we can see in Table 2, Indian workersobtain higher precision and recall on local Indian news ascompared to US workers. The biggest di↵erence in terms ofaccuracy between the two communities can be observed onthe global interest news.

0"0.1"0.2"0.3"0.4"0.5"0.6"0.7"0.8"0.9"1"

0" 0.2" 0.4" 0.6" 0.8" 1"

Recall&

Precision&

US"India"

Figure 6: Per document task e↵ectiveness.

A second question we examine is how the textual contextgiven for an entity influences the worker performance. InFigure 7, we compare the tasks for which only the entitylabel is given to those for which a context consisting of all

the sentences containing the entity are shown to the worker(snippets). Surprisingly, we could not observe a significantdi↵erence in e↵ectiveness caused by the di↵erent textual con-texts given to the workers. Thus, we focus on only one typeof context for the remaining experiments (we always givethe snippet context).

0"0.1"0.2"0.3"0.4"0.5"0.6"0.7"0.8"0.9"1"

1" 2" 3" 4" 5" 6" 7" 8" 9" 10"

Precision)

Document)

Simple"

Snippet"

Figure 7: Crowdsourcing results with two di↵erenttextual contexts

Entity Linking with ZenCrowd.We now focus on the performance of the probabilistic in-

ference network as proposed in this paper. We consider themethod described in Section 4, with an initial training phaseconsisting of 5 entities, and a second, continuous trainingphase, consisting of 5% of the other entities being o↵ered tothe workers (i.e., the workers are given a task whose solutionis known by the system every 20 tasks on average).In order to reduce the number of tasks having little influ-

ence in the final results, a simple technique of blacklistingof bad workers is used. A bad worker (who can be consid-ered as a spammer) is a worker who randomly and rapidlyclicks on the links, hence generating noise in our system.In our experiments, we consider that 3 consecutive bad an-swers in the training phase is enough to identify the workeras a spammer and to blacklist him/her. We report the aver-age results of ZenCrowd when exploiting the training phase,constraints, and blacklisting in Table 3. As we can observe,precision and accuracy values are higher in all cases whencompared to the agreement vote approach.

Table 3: Performance results for crowdsourcing withZenCrowd over linkable entities.

US Workers Indian WorkersP R A P R A

GL News 0.84 0.87 0.90 0.67 0.64 0.78US News 0.64 0.68 0.78 0.55 0.63 0.71IN News 0.84 0.82 0.89 0.75 0.77 0.80SW News 0.72 0.80 0.85 0.61 0.62 0.73All News 0.80 0.81 0.88 0.64 0.62 0.76

Finally, we compare ZenCrowd to the state of the artcrowdsourcing approach (using the optimal agreement vote)and our best automatic approach on a per-task basis in Fig-ure 8. The comparison is given for each document in thetest collection. We observe that in most cases the humanintelligence contribution improves the precision of the auto-matic approach. We also observe that ZenCrowd dominates


Page 19: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

Comparing 3 matching techniques

• ZenCrowd best for 75% of documents


0"

0.1"

0.2"

0.3"

0.4"

0.5"

0.6"

0.7"

0.8"

0.9"

1"

1" 2" 3" 4" 5" 6" 7" 8" 9" 10"11"12"13"14"15"16"17"18"19"20"21"22"23"24"25"

Precision)

)

Document)

Agr."Vote"

ZenCrowd"

Top"1"

Simple  Crowdsourcing  ZenCrowd  AutomaFc  

Page 20: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

Lessons Learnt

• Crowdsourcing + probabilistic reasoning works!
• But
  – Different worker communities perform differently
  – Many low-quality workers
  – No differences with different textual contexts
  – Completion time may vary (based on reward)


Page 21: ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking

Conclusions

• ZenCrowd: probabilistic reasoning over automatic and crowdsourcing methods for entity linking

• Standard crowdsourcing improves 6% over the automatic approach
• ZenCrowd: 4% to 35% improvement over standard crowdsourcing
• ZenCrowd: 14% average improvement over automatic approaches

• Next steps
  – Long-term worker behavior analysis
  – More efficient and effective linking by LOD dataset pre-selection

http://diuf.unifr.ch/xi/zencrowd/
