45
Hybrid Humanmachine Systems Lecture 5 Gianluca Demar8ni University of Sheffield

Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Hybrid  Human-­‐machine  Systems  

Lecture  5  Gianluca  Demar8ni    

University  of  Sheffield  

Page 2: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Background  

•  Hybrid  systems  – Combining  the  scalability  of  machines  and  the  quality  of  human  intelligence  

2  

Page 3: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Hybrid  Systems:  Key  Issues  

•  The  role  of  machine  (i.e.,  algorithm)  and  humans  – use  only  humans?  both?  who’s  doing  what?  

•  Quality  control  •  Op#miza#on:  What  to  crowdsource  •  Scalability:  How  much  to  crowdsource  

3  

Page 4: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Thinking  About  Hybrid  Systems  

Algorithms  

Machines  

People  

search  

Watson/IBM  

4  

Page 5: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Example:  Hybrid  Image  Search  

5  

Yan,  Kumar,  Ganesan,  CrowdSearch:  Exploi8ng  Crowds  for  Accurate  Real-­‐8me  Image  Search  on  Mobile  Phones,  Mobisys  2010.    

Page 6: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Not  sure  

Example:  Hybrid  Data  Integra8on  

paper conf Data integration VLDB-01

Data mining SIGMOD-02

title author email OLAP Mike mike@a

Social media Jane jane@b

l   Generate  plausible  matches  –  paper  =  8tle,  paper  =  author,  paper  =  email,  paper  =  venue  –  conf  =  8tle,  conf  =  author,  conf  =  email,  conf  =  venue  

l  Ask  users  to  verify    

paper conf Data integration VLDB-01

Data mining SIGMOD-02

title author email venue OLAP Mike mike@a ICDE-02

Social media Jane jane@b PODS-05

Does  aaribute  paper  match  aaribute  author?    

No  Yes  

McCann,  Shen,  Doan:  Matching  Schemas  in  Online  Communi8es.  ICDE,  2008   6  

Page 7: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Example:  Hybrid  Query  Processing    

7  

Use  the  crowd  to  answer  DB-­‐hard  queries    Where  to  use  the  crowd:  •  Find  missing  data  •  Make  subjec#ve  

comparisons  •  Recognize  paJerns  

But  not:  •  Anything  the  computer  

already  does  well    Disk 2

Disk 1

Parser

Optimizer

Stat

istics

CrowdSQL Results

Executor

Files Access Methods

UI Template Manager

Form Editor

UI Creation

HIT Manager

Met

aDat

a

Turker Relationship Manager

M.  Franklin,  D.  Kossmann,  T.  Kraska,  S.  Ramesh  and  R.  Xin  .    CrowdDB:  Answering  Queries  with  Crowdsourcing,  SIGMOD  2011    

Page 8: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Crowdsourced  Data  Management  Applica8ons  

•  Rela8onal  –  informa8on  extrac8on  –  schema  matching  –  en8ty  resolu8on  –  building  structured  KBs  –  sor8ng  –  top-­‐k  –  ...  

•  Beyond  rela8onal  –  graph  search  –  classifica8on  –  mobile  image  search  –  social  media  analysis  –  ques8on  answering  –  NLP    –  text  summariza8on  –  sen8ment  analysis  –  seman8c  wikis  –  ...  

8

Page 9: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Qurk  (MIT)  

•  Goal:  crowd-­‐source  comparisons,  missing  data  •  Basis:  SQL3  +  UDF  

– UDF  encapsulate  crowd  input  – special  template  language  for  crowd  UDFs  – specify  UI,  quality  control,  …  (possibly  opt.  hints)  

9  

Page 10: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Qurk  Example  [Markus  et  al.  CrowdCrowd  2011]  

•  Task:  Find  all  women  in  a  “people”  database  •  Schema  

CREATE TABLE people( name varchar(256), photo blob );

•  Query  SELECT name FROM people p WHERE isFemale(p);

10  

Page 11: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Qurk  Example  [Markus  et  al.  CrowdCrowd  2011]  

•  Task:  Find  all  women  in  a  “people”  database  •  Schema  

CREATE TABLE people( name varchar(256), photo blob );

•  Query  SELECT name FROM people p WHERE isFemale(p);

TASK  isFemale(tuple)  TYPE:  Filter    Ques#on:    “is  %s  Female”,                                                  tuple[“photo”]    YesText:        “Yes”    NoText:          “No”  

11  

Page 12: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Qurk  Example  [Markus  et  al.  CrowdCrowd  2011]  

TASK  isFemale(tuple)  TYPE:  Filter    Ques#on:    “is  %s  Female”,    

                                                                   tuple[“photo”]    YesText:        “Yes”    NoText:          “No”  

12  

Page 13: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

The  magic  is  in  the  Templates  

•  Templates  generate  UIs  for  different  kinds  of  crowd-­‐sourcing  tasks  – filters:    Yes  /  No  ques8ons  –  joins:      comparisons  between  two  tuples  (equality)  – order  by:  comparisions  between  two  tuples  (gt?)  – genera8ve:  crowd-­‐source  aaribute  values  

•  Templates  also  specify  quality  control;  e.g.,  COMBINER:    MajorityVote  

13  

Page 14: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Crowdsourcing  DB  Systems  

•  Fundamentally  new  way  of  tackling  data  management  issues  using  large  networks  of  anonymous  users  

•  At  this  point,  first  interes8ng  systems  and  results,  but  s8ll  more  ques8ons  than  answers  –  Hot  research  topic  

•  Unique,  unexpected  issues  –  “My  database  hates  me”  

14  

Page 15: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Problem:  Populate  Infoboxes  

15

Page 16: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Solu8on:  IE  Using  Machine  +  Human  

16

Infobox

Born July 21, 1899 Nationality American

Hemingway was an American author ...

Verify with crowd

Train a “nationality” extractor

The American readers ...

Apply extractor to new pages to extract nationalities.

Page 17: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Recrui8ng  Model:  Search  Adver8sing  

•  Interrupt  user  in  the  middle  of  a  primary  task  –  searching  for  informa8on  on  Ray  Bradbury  

•  Ask  if  user  is  willing  to  contribute  

•  Evaluate  different  UIs:  pop-­‐up,  highlight,  icon  –  in  terms  of  intrusiveness  and  willingness  to  contribute  

17

“ray bradbury”

Page 18: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Popup  Interface   18  

Page 19: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Highlight  Interface   19  

Page 20: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Other  examples  of  hybrid  systems  

Page 21: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

hap://dbpedia.org/resource/Facebook  

hap://dbpedia.org/resource/Instagram  

tase:Instagram  owl:sameAs  

Google  

Android  

<p>Facebook  is  not  wai8ng  for  its  ini8al  public  offering  to  make  its  first  big  purchase.</p><p>In  its  largest  acquisi8on  to  date,  the  social  network  has  purchased  Instagram,  the  popular  photo-­‐sharing  applica8on,  for  about  $1  billion  in  cash  and  stock,  the  company  said  Monday.</p>  

<p><span  about="hap://dbpedia.org/resource/Facebook"><cite  property=”rdfs:label">Facebook</cite>  is  not  wai8ng  for  its  ini8al  public  offering  to  make  its  first  big  purchase.</span></p><p><span  about="hap://dbpedia.org/resource/Instagram">In  its  largest  acquisi8on  to  date,  the  social  network  has  purchased  <cite  property=”rdfs:label">Instagram</cite>  ,  the  popular  photo-­‐sharing  applica8on,  for  about  $1  billion  in  cash  and  stock,  the  company  said  Monday.</span></p>  

RDFa  enrichment  

HTML:  

21  

Page 22: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

ZenCrowd  

•  Combine  both  algorithmic  and  manual  linking  •  Automate  manual  linking  via  crowdsourcing  •  Dynamically  assess  human  workers  with  a  probabilis8c  reasoning  framework  

22  

Crowd  

Algorithms  Machines  

Page 23: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

ZenCrowd  Architecture  

Micro Matching

Tasks

HTMLPages

HTML+ RDFaPages

LOD Open Data Cloud

CrowdsourcingPlatform

ZenCrowdEntity

Extractors

LOD Index Get Entity

Input Output

Probabilistic Network

Decision Engine

Micr

o-Ta

sk M

anag

er

Workers Decisions

AlgorithmicMatchers

23  

Gianluca  Demar8ni,  Djellel  Eddine  Difallah,  and  Philippe  Cudré-­‐Mauroux.  ZenCrowd:  Leveraging  Probabilis8c  Reasoning  and  Crowdsourcing  Techniques  for  Large-­‐Scale  En8ty  Linking.  In:  21st  Interna8onal  Conference  on  World  Wide  Web  (WWW  2012).  

Page 24: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

En8ty  Factor  Graphs  

•  Graph  components  – Workers,  links,  clicks  – Prior  probabili8es  – Link  Factors  – Constraints  

•  Probabilis8c  Inference  – Select  all  links  with  posterior  prob  >τ  

w1 w2

l1 l2

pw1( ) pw2( )

lf1( ) lf2( )

pl1( ) pl2( )

l3

lf3( )

pl3( )

c11 c22c12c21 c13 c23

u2-3( )sa1-2( )

2  workers,  6  clicks,  3  candidate  links  

Link  priors  

Worker  priors  

Observed  variables  

Link  factors  

SameAs  constraints  

Dataset  Unicity  constraints  

24  

Page 25: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Experimental  Evalua8on  •  Datasets  

–  25  news  ar8cles  from  •  CNN.com  (Global  news)  •  NYTimes.com  (Global  news)  •  Washington-­‐post.com  (US  local  news)  •  Timesofindia.india8mes.com  (India  news)  •  Swissinfo.com  (Switzerland  local  news)  

–  40M  en88es  (Freebase,  DBPedia,  Geonames,  NYT)  

25  

Page 26: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Worker  Selec8on  

26  

Top$US$Worker$

0$

0.5$

1$

0$ 250$ 500$

Worker&P

recision

&

Number&of&Tasks&

US$Workers$

IN$Workers$

0.6$0.62$0.64$0.66$0.68$0.7$0.72$0.74$0.76$0.78$0.8$

1$ 2$ 3$ 4$ 5$ 6$ 7$ 8$ 9$Precision)

Top)K)workers)

Page 27: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Lessons  Learnt  

•  Crowdsourcing  +  Prob  reasoning  works!  •  But  

– Different  worker  communi8es  perform  differently  – Many  low  quality  workers  – Comple8on  8me  may  vary  (based  on  reward)  

•  Need  to  find  the  right  workers  for  your  task  (see  WWW13  paper)  

27  

Page 28: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

ZenCrowd  Summary  

•  ZenCrowd:  Probabilis8c  reasoning  over  automa8c  and  crowdsourcing  methods  for  en8ty  linking  

•  Standard  crowdsourcing  improves  6%  over  automa8c  •  4%  -­‐  35%  improvement  over  standard  crowdsourcing  •  14%  average  improvement  over  automa8c  approaches  

•  Follow  up-­‐work  (VLDBJ):  –  Also  used  for  instance  matching  across  datasets  –  3-­‐way  blocking  with  the  crowd  

hap://exascale.info/zencrowd/  

28  

Page 29: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

ZenCrowd  Architecture  

Gianluca  Demar8ni,  Djellel  Eddine  Difallah,  and  Philippe  Cudré-­‐Mauroux.  ZenCrowd:  Leveraging  Probabilis#c  Reasoning  and  Crowdsourcing  Techniques  for  Large-­‐Scale  En#ty  Linking.  In:  21st  Interna8onal  Conference  on  World  Wide  Web  (WWW  2012)  

Micro Matching

Tasks

HTMLPages

LOD Open Data Cloud

CrowdsourcingPlatform

ZenCrowdEntity

Extractors

LOD Index

Output

Probabilistic Network

Decision Engine

AlgorithmicLinkers

Micr

o-Ta

sk M

anag

er

Workers Decisions

AlgorithmicMatchers

InputDataset Pair

Graph DB 1

2

3

HTML + RDFa Pages

New Matchings<owl:sameAs>

Indexing

Input

29  

Page 30: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Blocking  for  Instance  Matching  

•  Find  the  instances  about  the  same  real-­‐world  en8ty  within  two  datasets  

•  Avoid  Comparison  of  all  possible  pairs  – Step  1:  cluster  similar  items  using  a  cheap  similarity  measure  

– Step  2:  n*n  comparison  within  the  clusters  with  an  expensive  measure  

30  

Page 31: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

3-­‐steps  Blocking  with  the  Crowd  •  Crowdsourcing  as  the  most  expensive  similarity  measure  

e

c1

c2

c3

e

c1

c2

c3

e

c1

c2

c3

e

c1

c2

c3

e

c1

c2

c3

e

c1

c2

c3

e

c1

c2

c3

e

c1

c2

c3

e c3

e c1

e c1

e c3

e c2

e c1

e

c1

c2

c3

e

c1

c2

c3

Micro Matching

Tasks

Step 1:Inverted Index Candidates

Matches with HIgh Confidance

e c3

e c1

e c1

e c3

e c2

e c1

e c2

e c1

Probabilistic network

SchemaBased

ConfidenceComputation

Step 2: Crowdsourcing non confidant matches

Step 3: Results Combination

31  

Page 32: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

CrowdQ  –  Crowd-­‐powered  Query  Understanding  

Page 33: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

birthdate  of  the  mayor  of  the  capital  city  of  italy  

33  

Page 34: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

capital  city  of  italy  

34  

Page 35: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

mayor  of  rome  

35  

Page 36: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

birthdate  of  ignazio  marino  

36  

Page 37: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Mo8va8on  

•  Web  Search  Engines  can  answer  simple  factual  queries  directly  on  the  result  page  

•  Users  with  complex  informa8on  needs  are  o�en  unsa8sfied  

•  Purely  automa8c  techniques  are  not  enough  •  We  want  to  solve  it  with  Crowdsourcing!  

37  

Page 38: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

CrowdQ  

•  CrowdQ  is  the  first  system  that  uses  crowdsourcing  to  – Understand  the  intended  meaning  – Build  a  structured  query  template  – Answer  the  query  over  Linked  Open  Data  

38  

Gianluca  Demar8ni,  Beth  Trushkowsky,  Tim  Kraska,  and  Michael  Franklin.  CrowdQ:  Crowdsourced  Query  Understanding.  In:  6th  Biennial  Conference  on  Innova8ve  Data  Systems  Research  (CIDR  2013).  

Page 39: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

39  

Page 40: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

User

Keyword QueryOn#line'Complex'Query

ProcessingComplex

query classifier

CrowdsourcingPlatform

Vetrical selection,

Unstructured Search, ...

POS + NER tagging

Query Template Index

Crowd Manager

N

Y

Queries Templ +Answer Types

StructuredLOD Search

Result Joiner

Template Generation

SERP

t1t2t3

Off#line'Complex'QueryDecomposition

Structured Query

Query Logquery

N

Answ

erCo

mpo

sitio

n

LOD Open Data Cloud

Match with existingquery templates

CrowdQ  Architecture  

40  

Off-­‐line:  query  template  genera8on  with  the  help  of  the  crowd  On-­‐line:  query  template  matching  using  NLP  and  search  over  open  data  

Page 41: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Hybrid  Human-­‐Machine  Pipeline  

41  

Q=  birthdate  of  actors  of  forrest  gump  

Query  annota8on   Noun   Noun   Named  en8ty  

Verifica8on  

En8ty  Rela8ons  

Is  forrest  gump  this  en8ty  in  the  query?  

Which  is  the  rela8on  between:  actors  and  forrest  gump   starring  

Schema  element   Starring                          <dbpedia-­‐owl:starring>    

Verifica8on   Is  the  rela8on  between:  Indiana  Jones  –  Harrison  Ford  Back  to  the  Future  –  Michael  J.  Fox  of  the  same  type  as  Forrest  Gump  -­‐  actors        

Page 42: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Structured  query  genera8on  

SELECT  ?y  ?x  WHERE  {  ?y  <dbpedia-­‐owl:birthdate>  ?x  .  

     ?z  <dbpedia-­‐owl:starring>  ?y  .        ?z  <rdfs:label>  ‘Forrest  Gump’  }  

42  

Results  from  BTC09:  

Q=  birthdate  of  actors  of  forrest  gump  MOVIE  

MOVIE  

Page 43: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Overview  of  hybrid  systems  

43  

Page 44: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Overview  of  hybrid  systems  

•  Balance  between  systems  that  use  the  human  component  as  pre-­‐processing  or  post-­‐processing  of  data  (11  vs  13)    

•  Mostly  monetary  reward  •  Majority  of  systems  perform  batch  data  processing  rather  than  real-­‐8me  jobs    

•  In  2014  we  can  observe  a  decreased  number  of  hybrid  human-­‐machine  systems  being  propose  :  focus  on  solving  core  problems  rather  than  building  new  systems  

44  

Page 45: Hybrid’Human,machine’Systems’Leveraging+Probabilis#c+Reasoning+and+Crowdsourcing+Techniques+for+LargeCScale+ En#ty+Linking.’In:’21stInternaonal’Conference’on’World’Wide’Web’(WWW’2012)’

Summary  

•  Crowdsourcing  big  data  can  make  you  go  bankrupt!  -­‐>  hybrid  systems  

•  When  ask  a  human,  when  trust  the  machine  •  Hybrid  (human  in  the  loop)  

– Pre-­‐processing:  training  data  for  ML  – Post-­‐processing:  based  on  confidence  scores  – Mix:  ac8ve  learning  

Gianluca  Demar8ni.  Hybrid  Human-­‐Machine  Informa#on  Systems:  Challenges  and  Opportuni#es.  In:  Computer  Networks,  Special  Issue  on  Crowdsourcing,  Elsevier.     45