Towards Constraint-based Explanations for Answers and Non-Answers

Boris Glavic (Illinois Institute of Technology), Sean Riddle (Athenahealth Corporation), Sven Köhler (University of California, Davis), Bertram Ludäscher (University of Illinois Urbana-Champaign)

2015 TaPP - Towards Constraint-based Explanations for Answers and Non-Answers



Outline

① Introduction
② Approach
③ Explanations
④ Generalized Explanations
⑤ Computing Explanations with Datalog
⑥ Conclusions and Future Work

Overview

•  Introduce a unified framework for generalizing explanations for answers and non-answers
•  Why/why-not question Q(t)
   •  Why is tuple t (not) in the result of query Q?
•  Explanation
   •  Provenance for the answer/non-answer
•  Generalization
   •  Use an ontology to summarize and generalize explanations
•  Computing generalized explanations for UCQs (unions of conjunctive queries)
   •  Use Datalog

Train-Example

•  2hop(X,Y) :- Train(X,Z), Train(Z,Y).
•  Why can't I reach Berlin from Chicago?
•  Why-not 2hop(Chicago,Berlin)

Train relation:

From             To
New York         Washington DC
Washington DC    New York
New York         Chicago
Chicago          New York
…                …
Berlin           Munich
Munich           Berlin
…                …

[Figure: map showing Seattle, Chicago, Washington DC, New York, Paris, Berlin, and Munich; the Atlantic Ocean separates the US and European cities]
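As a rough illustration of the question being asked, the 2hop rule can be evaluated as a self-join over the Train relation. The sketch below is in Python rather than Datalog, uses only the rows visible in the table, and the names `TRAIN` and `two_hop` are made up for this example.

```python
# Train relation from the example table (only the rows shown; the "…" rows are omitted).
TRAIN = {
    ("New York", "Washington DC"),
    ("Washington DC", "New York"),
    ("New York", "Chicago"),
    ("Chicago", "New York"),
    ("Berlin", "Munich"),
    ("Munich", "Berlin"),
}

def two_hop(train):
    """Evaluate 2hop(X,Y) :- Train(X,Z), Train(Z,Y). as a self-join on Z."""
    return {(x, y) for (x, z1) in train for (z2, y) in train if z1 == z2}

result = two_hop(TRAIN)
print(("Chicago", "Washington DC") in result)  # reachable via New York
print(("Chicago", "Berlin") in result)         # the why-not question: not in the result
```

On this partial instance, no intermediate city connects Chicago to Berlin, which is what the why-not question asks about.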

Train-Example Explanations

•  2hop(X,Y) :- Train(X,Z), Train(Z,Y).
•  Missing train connections explain why Chicago and Berlin are not connected
   •  E.g., if only there were a train line between New York and Berlin: Train(New York, Berlin)!

[Figure: the same city map as in the train example]

Why-not Approaches

•  Two categories of data-based explanations for missing answers
•  1) Enumerate all failed rule derivations and why they failed (missing tuples)
   •  Provenance games
•  2) One set of missing tuples that fulfills an optimality criterion
   •  e.g., minimal side-effect on query result
   •  e.g., Artemis, …

Why-not Approaches

•  1) Enumerate all failed rule derivations and why they failed (missing tuples)
   •  Exhaustive explanation
   •  Potentially very large explanations
      •  Train(Chicago,Munich), Train(Munich,Berlin)
      •  Train(Chicago,Seattle), Train(Seattle,Berlin)
      •  …
•  2) One set of missing tuples that fulfills an optimality criterion
   •  Concise explanation that is optimal in a sense
   •  Optimality criterion is not always a good fit or effective
      •  Consider reach (transitive closure)
      •  Adding any train connection between USA and Europe has the same effect on the query result

Uniform Treatment of Why/Why-not

•  Provenance and missing-answer approaches have mostly been treated independently
•  Observation:
   •  For provenance models that support query languages with “full” negation
   •  Why and why-not are both provenance computations!
•  Q(X) :- Train(chicago,X).
•  Why-not Q(New York)?
   •  Equivalent to why Q'(New York)?
   •  Q'(X) :- adom(X), not Q(X)
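The reduction from why-not to why can be sketched in a few lines of Python. The instance below is hypothetical, and `adom`, `q`, and `q_prime` are illustrative names mirroring the rules on the slide, not part of any existing system.

```python
# Hypothetical instance; the relation contents are an assumption for illustration.
TRAIN = {
    ("chicago", "seattle"), ("seattle", "chicago"),
    ("berlin", "munich"), ("munich", "berlin"),
}

def adom(train):
    """Active domain: every constant that appears in the instance."""
    return {c for edge in train for c in edge}

def q(train):
    """Q(X) :- Train(chicago,X)."""
    return {y for (x, y) in train if x == "chicago"}

def q_prime(train):
    """Q'(X) :- adom(X), not Q(X)."""
    return adom(train) - q(train)

# berlin is a non-answer of Q exactly because it is an answer of Q'.
print("berlin" in q_prime(TRAIN))
```

Asking why-not Q(berlin) then amounts to asking why Q'(berlin), so one provenance computation can serve both kinds of question.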

② Approach

Unary Train-Example

•  Q(X) :- Train(chicago,X).
•  Why-not Q(berlin)
•  Explanation: Train(chicago,berlin)
•  Consider an available ontology!
•  More general: Train(chicago,GermanCity)

[Figure: city map (Seattle, Chicago, Washington DC, New York, Paris, Berlin, Munich; Atlantic Ocean) next to an ontology of city concepts: ACity, NACity, EuropeanCity, USCity, IllinoisCity, WashingtonCity, NYStateCity, DCCity, GermanCity, FrenchCity, over the instances chicago, seattle, newyork, washington_dc, berlin, munich, paris, lyon, dijon]

Unary Train-Example

•  Q(X) :- Train(chicago,X).
•  Why-not Q(berlin)
•  Explanation: Train(chicago,berlin)
•  Consider an available ontology!
•  Generalized explanation:
   •  Train(chicago,GermanCity)
•  Most general explanation:
   •  Train(chicago,EuropeanCity)

Our Approach

•  Explanations for why/why-not questions
   •  over UCQ queries
   •  Successful/failed rule derivations
•  Utilize available ontology
   •  Expressed as inclusion dependencies
   •  “mapped” to instance
      •  E.g., city(name,country)
      •  GermanCity(X) :- city(X,germany).
•  Generalized explanations
   •  Use concepts to describe subsets of an explanation
•  Most general explanation
   •  Pareto-optimal

Related Work - Generalization

•  ten Cate et al. High-Level Why-Not Explanations using Ontologies [PODS ’15]
   •  Also uses ontologies for generalization
   •  We summarize provenance instead of query results!
   •  Only for why-not, but the extension to why is trivial
•  Other summarization techniques using ontologies
   •  Data X-ray
   •  Datalog-S (Datalog with subsumption)

③ Explanations

Rule derivations

•  What causes a tuple to be or not be in the result of a query Q?
•  Tuple in result: there exists at least one successful rule derivation which justifies its existence
   •  Existential check
•  Tuple not in result: all rule derivations that would justify its existence have failed
   •  Universal check
•  Rule derivation
   •  Replace rule variables with constants from the instance
   •  Successful: body is fulfilled

Basic Explanations

•  A basic explanation for question Q(t)
   •  Why: successful derivations with Q(t) as head
   •  Why-not: failed rule derivations
      •  Replace successful goals with placeholder T
      •  Different ways to fail

2hop(Chicago,Munich) :- Train(Chicago,New York), Train(New York,Munich).
2hop(Chicago,Munich) :- Train(Chicago,Berlin), Train(Berlin,Munich).
2hop(Chicago,Munich) :- Train(Chicago,Paris), Train(Paris,Munich).

[Figure: city map with Seattle, Chicago, Washington DC, New York, Paris, Berlin, and Munich]
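The three derivations above fail in different ways, and a small Python sketch can make that concrete. It assumes only the Train rows visible in the earlier table, and restricts the bindings for Z to the three cities used on the slide. `failed_derivations` is a hypothetical helper written for this example.

```python
TRAIN = {
    ("New York", "Washington DC"), ("Washington DC", "New York"),
    ("New York", "Chicago"), ("Chicago", "New York"),
    ("Berlin", "Munich"), ("Munich", "Berlin"),
}
# Candidate bindings for Z; restricted to three cities to mirror the slide.
CANDIDATES = ["New York", "Berlin", "Paris"]

def failed_derivations(x, y, train, zs):
    """Return (z, goal1, goal2) for each failed derivation of 2hop(x,y).

    A goal that holds in the instance is replaced by the placeholder "T";
    a goal that fails is kept as the missing tuple.
    """
    out = []
    for z in zs:
        g1, g2 = (x, z), (z, y)
        if g1 in train and g2 in train:
            continue  # successful derivation, not part of a why-not explanation
        out.append((z,
                    "T" if g1 in train else f"Train({x},{z})",
                    "T" if g2 in train else f"Train({z},{y})"))
    return out

for z, g1, g2 in failed_derivations("Chicago", "Munich", TRAIN, CANDIDATES):
    print(f"Z={z}: {g1}, {g2}")
```

On this instance the binding Z=New York fails only on the second goal, Z=Berlin only on the first, and Z=Paris on both.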

Explanations Example

•  Why 2hop(Paris,Munich)?

2hop(Paris,Munich) :- Train(Paris,Berlin), Train(Berlin,Munich).

④ Generalized Explanations

Generalized Explanation

•  Generalized Explanations
   •  Rule derivations with concepts
•  Generalizes user question
   •  generalize a head variable

2hop(Chicago,Berlin) generalizes to 2hop(USCity,EuropeanCity)

•  Summarizes provenance of (non-)answer
   •  generalize any rule variable

2hop(New York,Seattle) :- Train(New York,Chicago), Train(Chicago,Seattle).
2hop(New York,Seattle) :- Train(New York,USCity), Train(USCity,Seattle).

Generalized Explanation Def.

•  For user question Q(t) and rule r
   •  r(C1,…,Cn)

① (C1,…,Cn) subsumes the user question
② headvars(C1,…,Cn) only covers existing/missing tuples
③ For every tuple t' covered by headvars(C1,…,Cn), all rule derivations for t' that are covered are explanations for t'

Recap Generalization Example

•  r: Q(X) :- Train(chicago,X).
•  Why-not Q(berlin)
•  Explanation: r(berlin)
•  Generalized explanation:
   •  r(GermanCity)

Most General Explanation

•  Domination relationship
   •  r(C1,…,Cn) dominates r(D1,…,Dn)
   •  if for all i: Ci subsumes Di
   •  and there exists i: Ci strictly subsumes Di
•  Most general explanation
   •  Not dominated by any other explanation
•  Example most general explanation:
   •  r(EuropeanCity)
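The domination relationship is a componentwise comparison of concept tuples, so the most general explanations are the Pareto-optimal ones. The Python sketch below illustrates this under the assumption that the ontology is a tree with the parent links listed in `PARENT`. Those links are an assumed reading of the figure, and all function names are illustrative.

```python
# Assumed parent links for the part of the ontology used here; the exact edges
# of the hierarchy in the figure are an assumption.
PARENT = {
    "berlin": "GermanCity", "munich": "GermanCity",
    "paris": "FrenchCity",
    "GermanCity": "EuropeanCity", "FrenchCity": "EuropeanCity",
}

def ancestors(c):
    """All concepts that strictly subsume c."""
    out = []
    while c in PARENT:
        c = PARENT[c]
        out.append(c)
    return out

def subsumes_eq(general, specific):
    """Non-strict subsumption: equal, or general is an ancestor of specific."""
    return general == specific or general in ancestors(specific)

def dominates(cs, ds):
    """r(cs) dominates r(ds): subsumption in every position, strict in at least one."""
    return (len(cs) == len(ds)
            and all(subsumes_eq(c, d) for c, d in zip(cs, ds))
            and any(c != d for c, d in zip(cs, ds)))

def most_general(explanations):
    """Keep the explanations that no other explanation dominates."""
    return [e for e in explanations
            if not any(dominates(o, e) for o in explanations)]

expls = [("berlin",), ("GermanCity",), ("EuropeanCity",)]
print(most_general(expls))
```

With several positions, two tuples can be incomparable, for instance ("GermanCity", "paris") and ("berlin", "FrenchCity"), so more than one most general explanation may exist.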

⑤ Computing Explanations with Datalog

Datalog Implementation

① Rules for checking subsumption and domination of concept tuples
② Rules for successful and failed rule derivations
   •  Return variable bindings
③ Rules that model explanations, generalization, and most general explanations

①  Modeling Subsumption

•  Basic concepts and concepts
isBasicConcept(X) :- Train(X,Y).
isConcept(X) :- isBasicConcept(X).
isConcept(EuropeanCity).

•  Subsumption (inclusion dependencies)
subsumes(GermanCity,EuropeanCity).
subsumes(X,GermanCity) :- city(X,germany).

•  Transitive closure
subsumes(X,Y) :- subsumes(X,Z), subsumes(Z,Y).

•  Non-strict version
subsumesEqual(X,X) :- isConcept(X).
subsumesEqual(X,Y) :- subsumes(X,Y).
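To see what the subsumption rules compute, the following Python sketch evaluates them with a naive fixpoint loop. The `CITY` facts are hypothetical, and the reflexive case of `subsumes_equal` is simplified to hold for any value rather than only for declared concepts.

```python
# Hypothetical city(name, country) facts; used only for illustration.
CITY = {("berlin", "germany"), ("munich", "germany"), ("paris", "france")}

def subsumes_closure(city):
    # Base facts: subsumes(GermanCity,EuropeanCity). and
    # subsumes(X,GermanCity) :- city(X,germany).
    facts = {("GermanCity", "EuropeanCity")}
    facts |= {(x, "GermanCity") for (x, country) in city if country == "germany"}
    # Naive fixpoint for subsumes(X,Y) :- subsumes(X,Z), subsumes(Z,Y).
    while True:
        new = {(x, y) for (x, z1) in facts for (z2, y) in facts if z1 == z2}
        if new <= facts:
            return facts
        facts |= new

SUBSUMES = subsumes_closure(CITY)

def subsumes_equal(x, y):
    """Non-strict version: reflexive (simplified), plus the strict relation."""
    return x == y or (x, y) in SUBSUMES

print(("berlin", "EuropeanCity") in SUBSUMES)
```

Note the argument order used on the slide: subsumes(X,Y) reads "X is subsumed by Y", so the more general concept comes second.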

②  Capture Rule Derivations

•  Rule r1: 2hop(X,Y) :- Train(X,Z), Train(Z,Y).
•  Success and failure rules
r1_success(X,Y,Z) :- Train(X,Z), Train(Z,Y).
r1_fail(X,Y,Z) :- isBasicConcept(X), isBasicConcept(Y), isBasicConcept(Z), not r1_success(X,Y,Z).

•  More general:
r1(X,Y,Z,true,false) :- isBasicConcept(Y), Train(X,Z), not Train(Z,Y).
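A Python sketch of the same idea: enumerate every binding of (X,Y,Z) over the basic concepts and record, per goal, whether it holds. The Train facts are hypothetical, and `r1_status` is an illustrative stand-in for the "more general" rule that carries one Boolean per goal.

```python
from itertools import product

# Hypothetical Train facts for illustration.
TRAIN = {("chicago", "newyork"), ("newyork", "chicago"), ("berlin", "munich")}

# isBasicConcept(X) :- Train(X,Y).  (only first-position constants, as in the rule)
BASIC = {x for (x, _) in TRAIN}

def r1_status(train, basic):
    """Map each binding (X,Y,Z) over basic concepts to (goal1_ok, goal2_ok)."""
    return {(x, y, z): ((x, z) in train, (z, y) in train)
            for x, y, z in product(basic, repeat=3)}

STATUS = r1_status(TRAIN, BASIC)
r1_success = {b for b, (g1, g2) in STATUS.items() if g1 and g2}
r1_fail = set(STATUS) - r1_success

# The binding below corresponds to r1(X,Y,Z,true,false): first goal holds, second fails.
print(STATUS[("chicago", "berlin", "newyork")])
```

Keeping the per-goal status, rather than a single success flag, is what allows a failed derivation to say which goals are missing.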

③  Model Generalization

•  Explanation for Q(X) :- Train(chicago,X).
expl_r1_success(C1,B1) :- subsumesEqual(B1,C1), r1_success(B1), not has_r1_fail(C1).

•  User question: Q(B1)
•  Explanation: Q(C1) :- Train(chicago, C1).
•  Q(B1) exists and is justified by r1: r1_success(B1)
•  r1 succeeds for all B in C1: not has_r1_fail(C1)


③  Model Generalization

•  Domination
dominated_r1_success(C1,B1) :- expl_r1_success(C1,B1), expl_r1_success(D1,B1), subsumes(C1,D1).

•  Most general explanation
most_gen_r1_success(C1,B1) :- expl_r1_success(C1,B1), not dominated_r1_success(C1,B1).

•  Why question
why(C1) :- most_gen_r1_success(C1,seattle).
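The three rules can be traced end to end with a small Python sketch for the question why Q(seattle). Several things here are assumptions and not taken from the slides: the Train facts, the parent links in `PARENT`, and the definition of `has_r1_fail`, which is read as "some basic concept covered by C1 has no successful derivation".

```python
# Assumptions: Train facts, ontology edges, and has_r1_fail are illustrative.
TRAIN = {("chicago", "seattle"), ("chicago", "newyork")}
PARENT = {
    "seattle": "WashingtonCity", "chicago": "IllinoisCity",
    "newyork": "NYStateCity", "washington_dc": "DCCity",
    "WashingtonCity": "USCity", "IllinoisCity": "USCity",
    "NYStateCity": "USCity", "DCCity": "USCity",
}
BASIC = {"seattle", "chicago", "newyork", "washington_dc"}
CONCEPTS = BASIC | set(PARENT.values())

def subsumes(x, y):
    """Strict: y is a proper ancestor of x (same argument order as the slides)."""
    while x in PARENT:
        x = PARENT[x]
        if x == y:
            return True
    return False

def subsumes_equal(x, y):
    return x == y or subsumes(x, y)

def r1_success(b):
    """r1: Q(X) :- Train(chicago,X)."""
    return ("chicago", b) in TRAIN

def has_r1_fail(c):
    # Assumed meaning: some basic concept covered by c has no successful derivation.
    return any(subsumes_equal(b, c) and not r1_success(b) for b in BASIC)

def expl(b1):
    """expl_r1_success(C1,B1): concepts covering b1 whose members all succeed."""
    return {c for c in CONCEPTS
            if subsumes_equal(b1, c) and r1_success(b1) and not has_r1_fail(c)}

def most_general(b1):
    """most_gen_r1_success(C1,B1): explanations not dominated by another one."""
    e = expl(b1)
    return {c for c in e if not any(subsumes(c, d) for d in e)}

print(most_general("seattle"))
```

Under these assumptions USCity is rejected because it also covers cities that chicago does not reach, so the generalization stops one level below it.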

⑥ Conclusions and Future Work

Conclusions

•  Unified framework for generalizing provenance-based explanations for why and why-not questions

•  Uses ontology expressed as inclusion dependencies (Datalog rules) for summarizing explanations

•  Uses Datalog to find most general explanations (Pareto-optimal)

Future Work I

•  Extend ideas to other types of constraints
   •  E.g., denial constraints
      –  German cities have at most 10M inhabitants
         :- city(X,germany,Z), Z > 10,000,000
•  Query returns countries with very large cities
   Q(Y) :- city(X,Y,Z), Z > 15,000,000
•  Why-not Q(germany)?
   –  Constraint describes set of (missing) data
   –  Can be answered without looking at data
•  Semantic query optimization?
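The point that the why-not question can be answered from the constraint alone can be illustrated with a tiny Python check. This is only a sketch of the reasoning for this one example, with the denial constraint encoded as a per-country upper bound. `MAX_POPULATION`, `QUERY_THRESHOLD`, and `why_not_by_constraint` are hypothetical names.

```python
# Denial constraint  :- city(X,germany,Z), Z > 10,000,000  encoded as an upper bound.
MAX_POPULATION = {"germany": 10_000_000}
QUERY_THRESHOLD = 15_000_000   # Q(Y) :- city(X,Y,Z), Z > 15,000,000

def why_not_by_constraint(country):
    """Return True if the constraints alone rule out Q(country).

    Q(country) needs some Z > QUERY_THRESHOLD, while the constraint forces
    Z <= bound; both cannot hold when bound <= QUERY_THRESHOLD.
    """
    bound = MAX_POPULATION.get(country)
    return bound is not None and bound <= QUERY_THRESHOLD

print(why_not_by_constraint("germany"))
```

No city tuple is inspected: the answer follows from comparing the two thresholds, which is the connection to semantic query optimization hinted at above.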

Future Work II

•  Alternative definitions of explanation or generalization
   –  Our generalized explanations are sound, but not complete
   –  Complete version: concepts cover at least the explanation
   –  Sound and complete version: concepts cover the explanation exactly
•  Queries as ontology concepts
   –  As introduced by ten Cate et al.

Future Work III

•  Extension for FO queries
   –  Generalization of provenance game graphs
   –  Need to generalize interactions of rules
•  Implementation
   –  Integrate with our provenance game engine
      •  Powered by GProM!
      •  Negation - not yet
      •  Generalization rules - not yet

[Figure: GProM architecture. Labeled components: User, Query Log, Parser, Datalog Parser, Provenance Game Rewriter, Datalog Translator, SQL Code Generator, Backend Database. Example inputs shown: SELECT * FROM ... and the Datalog program below.]

Input program shown in the figure:

Q(X) :- R(X,Y).
Why(Q(1)).

Rewritten rules shown in the figure:

move((((((('notREL_' || 'R_LOST') || '(') || 1) || ',') || V0) || ')'),(((((('REL_' || 'R_WON') || '(') || 1) || ',') || V0) || ')')) :- RR_WON_+(1,V0).
move((((((('REL_' || 'R_WON') || '(') || 1) || ',') || V0) || ')'),(((((('EDB_' || 'R_LOST') || '(') || 1) || ',') || V0) || ')')) :- RR_WON_+(1,V0).
move((((('REL_' || 'Q_WON') || '(') || 1) || ')'),(((((('RULE_' || '0_LOST') || '(') || 1) || ',') || Y) || ')')) :- r0_WON_+(1,Y).
move((((((('RULE_' || '0_LOST') || '(') || 1) || ',') || Y) || ')'),(((((('GOAL_' || '0_0_WON') || '(') || 1) || ',') || Y) || ')')) :- r0_WON_+(1,Y).
move((((((('GOAL_' || '0_0_WON') || '(') || 1) || ',') || Y) || ')'),(((((('notREL_' || 'R_LOST') || '(') || 1) || ',') || Y) || ')')) :- r0_WON_+(1,Y).
move((((((('notREL_' || 'R_LOST') || '(') || 1) || ',') || Y) || ')'),(((((('REL_' || 'R_WON') || '(') || 1) || ',') || Y) || ')')) :- r0_WON_+(1,Y).
r0_WON_+(1,Y) :- r0_WON_+_nonlinked(1,Y).
RR_WON_+_nonlinked(1,V0) :- R(1,V0).
RQ_WON_+_nonlinked(1) :- r0_WON_+_nonlinked(1,Y).
RR_WON_+(1,V1) :- +r0_WON_+(1,V1),RR_WON_+_nonlinked(1,V1).
r0_WON_+_nonlinked(1,Y) :- +RR_WON_+_nonlinked(1,Y).

Questions?

•  Boris
   –  http://cs.iit.edu/~dbgroup/index.html
•  Bertram
   –  https://www.lis.illinois.edu/people/faculty/ludaesch

Relationship to (Constraint) Provenance Games

[Figure: provenance game graph for TwoHop(Chicago,Berlin). Nodes are rule instantiations such as r7(Chicago,Munich,Munich,Berlin), goal nodes g17 and g27, and Train / ¬Train tuples over the cities Berlin, Washington DC, New York, Chicago, and Munich.]

[Figure: the corresponding constraint provenance game. Nodes carry constraints over variables instead of constants, e.g. R1: x = CHI, y = BER, z = MUN and Train: x2 ≠ NYC, x1 = WDC.]