1
Consistent Query Answering Under Inclusion Dependencies
Authors: Loreto Bravo and Leopoldo Bertossi, Carleton University, Canada
Presented by: Zhijun Lin
Advisors: Dr. Hector Hernandez
Dr. Yuanlin Zhang
2
Integrity Constraints
Integrity constraints (ICs) describe the valid instances of a database.
For example:
- “Every student has a unique ID number.”
- “Students can enroll only in offered courses.”
- “No employee can have a higher salary than their manager.”
3
Inconsistent databases
Inconsistent database: a database that violates its given integrity constraints.
Some reasons for a database to be inconsistent:
- The DBMS does not enforce all ICs.
- Data is integrated from different databases.
- New constraints are imposed on a pre-existing database.
- Soft or user constraints.
4
Inconsistent databases
In several cases we do not want to repair the database to restore consistency:
- no permission,
- too expensive,
- the inconsistency is temporary.
How can we obtain consistent query answers from an inconsistent database?
5
Example
A database instance r:

Student
Name    Grade
John    90
John    80
Smith   70

IC (functional dependency): $Name \rightarrow Grade$.
6
Example
If only deletions and insertions of whole tuples are allowed, there are two ways to repair the database with minimal changes.

Repair 1:
Student
Name    Grade
John    90
Smith   70

Repair 2:
Student
Name    Grade
John    80
Smith   70

Note that Student(Smith, 70) persists in both repairs, whereas Student(John, 90) does not.
7
Repair
A repair of a database instance r is a database instance r’ that
- is over the same database schema and domain,
- satisfies the ICs,
- differs from r by a minimal set of changes (insertions or deletions of tuples) with respect to set inclusion.
8
Consistent Query Answer
A tuple (a1,…,an) is a consistent query answer to a query Q(x1,…,xn) in a database r if it is an answer to Q in every repair of r.
9
Example
Student(Smith, 70) is a consistent answer.
Student(John, 90) is not a consistent answer, since it is missing from the second repair.
For the query asking for the students with a higher grade than Smith, John should be a consistent answer, because his grade is higher than Smith's in both repairs.

Repair 1:
Student
Name    Grade
John    90
Smith   70

Repair 2:
Student
Name    Grade
John    80
Smith   70
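The definitions of repair and consistent answer can be checked by brute force on this small instance. The following is only a sketch under stated assumptions: the helper names (`satisfies_fd`, `repairs`, `consistent_answers`) are made up for illustration, and it relies on the fact that for a functional dependency every repair is a maximal consistent subset of r (only deletions are needed).

```python
from itertools import combinations

# Student(Name, Grade) instance r from the example.
r = {("John", 90), ("John", 80), ("Smith", 70)}

def satisfies_fd(instance):
    """Name -> Grade: no name may appear with two different grades."""
    grades = {}
    for name, grade in instance:
        if grades.setdefault(name, grade) != grade:
            return False
    return True

def repairs(instance):
    """Maximal consistent subsets of the instance (deletion-only repairs)."""
    consistent = [set(s)
                  for k in range(len(instance) + 1)
                  for s in combinations(sorted(instance), k)
                  if satisfies_fd(s)]
    return [s for s in consistent if not any(s < t for t in consistent)]

def consistent_answers(instance, query):
    """Answers that the query returns in every repair."""
    return set.intersection(*(query(rep) for rep in repairs(instance)))

def all_tuples(inst):
    """Query: Student(x, y)."""
    return set(inst)

def better_than_smith(inst):
    """Query: names of students with a higher grade than Smith."""
    smith = [g for n, g in inst if n == "Smith"]
    return {n for n, g in inst if any(g > s for s in smith)}

print(consistent_answers(r, all_tuples))
print(consistent_answers(r, better_than_smith))
```

The first query keeps only the Smith tuple, while the second keeps John, even though neither John tuple is itself a consistent answer.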
10
Classes of ICs
Consider two important classes of ICs:
- Universal integrity constraints (UICs)
- Referential integrity constraints (RICs), also known as inclusion dependencies (INDs).
11
UIC and RIC
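A universal integrity constraint has the form

$$\forall \bar{x}_1, \ldots, \bar{x}_n \left[ \bigwedge_{i=1}^{m} P_i(\bar{x}_i) \rightarrow \bigvee_{i=m+1}^{n} P_i(\bar{x}_i) \vee \varphi(\bar{x}_1, \ldots, \bar{x}_n) \right] \qquad (2)$$

A referential integrity constraint has the form

$$\forall \bar{x}_1 \left[ P(\bar{x}_1) \rightarrow \exists \bar{x}_3\, Q(\bar{x}_2, \bar{x}_3) \right] \qquad (3)$$

where $\bar{x}_1, \bar{x}_2, \bar{x}_3$ are sequences of distinct variables, with $\bar{x}_2$ contained in $\bar{x}_1$, and P, Q are database relations.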
12
Example of UIC
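The functional dependency “Emp: $id \rightarrow dept$” can be expressed as:

$$\forall id\, dept_1\, dept_2\, (Emp(id, dept_1) \wedge Emp(id, dept_2) \rightarrow dept_1 = dept_2),$$

which is equivalent to:

$$\forall id\, dept_1\, dept_2\, (\neg Emp(id, dept_1) \vee \neg Emp(id, dept_2) \vee dept_1 = dept_2).$$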
13
Example of RIC
Consider the database schema {Emp(id, dept), People(id, name)}. To represent the IND “$Emp[id] \subseteq People[id]$”, which says that employees are people, we use the RIC:

$$\forall id\, dept\, (Emp(id, dept) \rightarrow \exists name\, People(id, name)),$$

which is equivalent to:

$$\forall id\, dept\, \exists name\, [Emp(id, dept) \rightarrow People(id, name)].$$
14
Special treatment of null-value
A UIC holds if it is satisfied by the non-null values.
{Student(john,90), Student(john,null)} satisfies the UIC $Name \rightarrow Grade$.
A RIC is satisfied considering only non-null values for the universally quantified variables and any value (including null) for the existentially quantified variables.
{Emp(777,CS), People(777,null)} satisfies the RIC $Emp[id] \subseteq People[id]$, and so do {Emp(555,null)} and {Emp(null,cs)}.
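The null-aware satisfaction rules above can be sketched as two small checks. This is an illustrative reading of the slide, not the authors' code: `NULL`, `satisfies_fd`, and `satisfies_ric` are hypothetical names, and null is modelled by Python's `None`.

```python
NULL = None  # assumption: Python's None stands in for the null value

def satisfies_fd(student):
    """UIC Name -> Grade, checked only on tuples without nulls."""
    seen = {}
    for name, grade in student:
        if name is NULL or grade is NULL:
            continue  # tuples with a null never cause a violation
        if seen.setdefault(name, grade) != grade:
            return False
    return True

def satisfies_ric(emp, people):
    """RIC Emp[id] subset of People[id].

    Emp tuples containing a null are exempt (the universally quantified
    variables range over non-null values); the witness in People may carry
    any name, null included.
    """
    people_ids = {pid for pid, _name in people}
    return all(eid in people_ids
               for eid, dept in emp
               if eid is not NULL and dept is not NULL)

print(satisfies_fd({("john", 90), ("john", NULL)}))
print(satisfies_ric({(777, "CS")}, {(777, NULL)}))
print(satisfies_ric({(555, "CS")}, set()))
```

The last call is the only violation: a fully non-null Emp tuple with no matching People tuple.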
15
Example
Given database: D = {emp(john,cs), emp(mary,ee), dept(ee), salary(mary,2000)}
UIC: $emp(X,Y) \rightarrow dept(Y)$
RIC: $emp(X,Y) \rightarrow \exists Z\, salary(X,Z)$

Repair  New database instance                              Changes
1       {emp(john,cs), emp(mary,ee), dept(cs), dept(ee),   insert salary(john,null), dept(cs)
         salary(john,null), salary(mary,2000)}
2       {emp(mary,ee), dept(ee), salary(mary,2000)}        delete emp(john,cs)
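The two repairs of this example can be reproduced by enumerating candidate instances and keeping those whose set of changes is minimal under set inclusion. The sketch below makes two assumptions that are not on the slide: the RIC is read as requiring a salary tuple for the employee X (which is what the listed repairs do), and the only insertions considered are the hand-picked facts in `CANDIDATES`, with null as the existential witness.

```python
from itertools import combinations

D = {("emp", "john", "cs"), ("emp", "mary", "ee"),
     ("dept", "ee"), ("salary", "mary", "2000")}

# Assumed candidate insertions: one witness per violated constraint.
CANDIDATES = {("dept", "cs"), ("salary", "john", "null")}

def consistent(db):
    """Check the UIC and the RIC of the example on a set of facts."""
    for fact in db:
        if fact[0] != "emp":
            continue
        _, x, y = fact
        if ("dept", y) not in db:  # UIC: emp(X,Y) -> dept(Y)
            return False
        if not any(f[0] == "salary" and f[1] == x for f in db):  # RIC
            return False
    return True

def repairs(db, candidates):
    """Consistent instances whose symmetric difference with db is minimal."""
    universe = sorted(db | candidates)
    found = []
    for k in range(len(universe) + 1):
        for subset in combinations(universe, k):
            inst = set(subset)
            if consistent(inst):
                found.append((inst, inst ^ db))  # (instance, changes)
    return [(inst, delta) for inst, delta in found
            if not any(other < delta for _, other in found)]

for inst, delta in repairs(D, CANDIDATES):
    print(sorted(delta))
```

Each printed list is the change set of one repair: either the deletion of emp(john,cs) or the joint insertion of dept(cs) and salary(john,null).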
16
Use ASP to compute repairs
1. dom(john). dom(mary). dom(cs). dom(ee). dom(2000).
   % dom(a) for all constants a != null.
2. emp(john,cs,td). emp(mary,ee,td). salary(mary,2000,td). dept(ee,td).
   % td denotes a database fact.
3. emp(A,B,t1) :- emp(A,B,td), dom(A), dom(B).
   emp(A,B,t1) :- emp(A,B,ta), dom(A), dom(B).
   % t1 denotes true or becomes true.
   % ta denotes advised to be true.
   % (Also for salary and dept.)
17
Use ASP to compute repairs
4. emp(A,B,fa) v dept(B,ta) :- emp(A,B,t1), not dept(B,td), dom(A), dom(B).
   emp(A,B,fa) v dept(B,ta) :- emp(A,B,t1), dept(B,fa), dom(A), dom(B).
   % fa denotes advised to be false.
   % Repair rules for the UIC: emp(X,Y) → dept(Y)
18
Use ASP to compute repairs
5. emp(A,B,fa) v salary(A,null,ta) :- emp(A,B,t1), not aux(A), not salary(A,null,td), dom(A), dom(B).
   aux(A) :- salary(A,Z,td), not salary(A,Z,fa), dom(A), dom(Z).
   aux(A) :- salary(A,Z,ta), dom(A), dom(Z).
   % Repair rules for the RIC: emp(X,Y) → ∃Z salary(X,Z)
   % aux(A) means salary(A,Z) is in the final database for some Z.
19
Use ASP to compute repairs
6. emp(A,B,t2) :- emp(A,B,ta).
   emp(A,B,t2) :- emp(A,B,td), not emp(A,B,fa).
   % t2 denotes true in the repair. (Also for dept and salary.)
7. % A tuple cannot be both deleted and inserted.
   :- emp(A,B,ta), emp(A,B,fa).
   % (Also for dept and salary.)
20
Consistent Query Answering
For the query ?emp(X,Y), add the rule
ans(X,Y) :- emp(X,Y,t2).
to the repair program. If ans(a,b) appears in all stable models, then emp(a,b) is a consistent query answer.
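Reading off consistent answers then amounts to intersecting the ans atoms of all stable models. A minimal sketch, assuming the ans atoms have already been extracted from the solver output; the two sets below are typed in by hand from the two repairs of the emp/dept/salary example, and `consistent_answers` is a hypothetical helper name.

```python
# ans(X,Y) atoms of each stable model, written by hand for illustration.
stable_models = [
    {("john", "cs"), ("mary", "ee")},  # repair 1: dept(cs), salary(john,null) inserted
    {("mary", "ee")},                  # repair 2: emp(john,cs) deleted
]

def consistent_answers(models):
    """Keep only the answers that appear in every stable model."""
    if not models:
        return set()
    return set.intersection(*models)

print(consistent_answers(stable_models))
```

Only emp(mary,ee) survives, because emp(john,cs) is deleted in one of the repairs.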
21
How does it work
The basic idea behind the repair program:
If there is a possible violation of the ICs, the program lists the possible repair actions (insertion or deletion of tuples) as a disjunction.
Since ASP produces answer sets that are minimal with respect to set inclusion, the changes should also be minimal with respect to set inclusion, matching the definition of repair.
22
Problem
Now consider D = {p(a,a)} and the RIC $p(X,Y) \rightarrow \exists Z\, p(Y,Z)$. Clearly D satisfies the RIC.
But the repair program generates a redundant repair, one that deletes p(a,a).
23
Grounded program
dom(a).
p(a,a,td).
p(a,a,t1) :- p(a,a,td), dom(a).
p(a,a,fa) v p(a,null,ta) :- p(a,a,t1), not aux(a), not p(a,null,td), dom(a).
aux(a) :- p(a,a,td), not p(a,a,fa), dom(a).
aux(a) :- p(a,a,ta), dom(a).
% p(a,a,fa) justifies itself: assuming it makes aux(a) false,
% which in turn fires the disjunctive rule that derives p(a,a,fa).
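The self-justification can be observed by enumerating the stable models of this ground program directly. The sketch below is a naive checker written for illustration (it is not how DLV or any real solver works): for each candidate set it builds the Gelfond-Lifschitz reduct and tests that the candidate is a minimal model of it. The function names and the rule encoding are assumptions of this example.

```python
from itertools import combinations

def rule(head, pos=(), neg=()):
    """A ground rule: head atoms (disjunction), positive body, negated body."""
    return (frozenset(head), frozenset(pos), frozenset(neg))

def powerset(atoms):
    atoms = sorted(atoms)
    for k in range(len(atoms) + 1):
        for combo in combinations(atoms, k):
            yield frozenset(combo)

def is_model(m, positive_rules):
    """Every rule whose body holds in m has some head atom in m."""
    return all(not pos <= m or bool(head & m) for head, pos in positive_rules)

def stable_models(rules):
    atoms = set().union(*(h | p | n for h, p, n in rules))
    result = []
    for m in powerset(atoms):
        # Reduct: drop rules blocked by m, then drop the remaining "not" literals.
        reduct = [(h, p) for h, p, n in rules if not (n & m)]
        if is_model(m, reduct) and not any(
                is_model(s, reduct) for s in powerset(m) if s != m):
            result.append(m)
    return result

PROGRAM = [
    rule(["dom(a)"]),
    rule(["p(a,a,td)"]),
    rule(["p(a,a,t1)"], ["p(a,a,td)", "dom(a)"]),
    rule(["p(a,a,fa)", "p(a,null,ta)"], ["p(a,a,t1)", "dom(a)"],
         ["aux(a)", "p(a,null,td)"]),
    rule(["aux(a)"], ["p(a,a,td)", "dom(a)"], ["p(a,a,fa)"]),
    rule(["aux(a)"], ["p(a,a,ta)", "dom(a)"]),
]

for model in stable_models(PROGRAM):
    print(sorted(model))
```

Two stable models come out: the intended one, containing aux(a) and no changes, and a second one containing p(a,a,fa), which corresponds to the redundant deletion of p(a,a).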
24
Other example with Circular justification
RICs: $p(X,Y) \rightarrow \exists Z\, q(Y,Z)$ and $q(X,Y) \rightarrow \exists Z\, p(Y,Z)$; D = {p(a,b), q(b,a)}
Program:
p(a,b,td). q(b,a,td).
p(a,b,fa) v q(b,null,ta) :- p(a,b,t1), not aux1(b).
q(b,a,fa) v p(a,null,ta) :- q(b,a,t1), not aux2(a).
aux1(b) :- q(b,a,td), not q(b,a,fa).
aux2(a) :- p(a,b,td), not p(a,b,fa).
25
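Cyclic / Acyclic RICs
- A set of RICs is said to be acyclic if there is no cycle in the directed graph whose vertices correspond to the relations in the schema and whose edges go from P to R for each RIC $P(\bar{X}_1) \rightarrow \exists Z\, R(\bar{X}_2, Z)$. Otherwise it is cyclic.
- Examples of cyclic RICs:
  1. $\forall X Y\, (p(X,Y) \rightarrow \exists Z\, q(Y,Z))$ and $\forall X Y\, (q(X,Y) \rightarrow \exists Z\, p(Y,Z))$.
  2. $\forall X Y\, (p(X,Y) \rightarrow \exists Z\, p(Y,Z))$.
[Figure: dependency graphs of the two examples: p and q with an edge in each direction; p with a self-loop.]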
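Whether a set of RICs is cyclic can be decided with an ordinary depth-first search on the dependency graph. A minimal sketch, assuming each RIC is given simply as a pair (P, R) of relation names; `is_cyclic` is a hypothetical helper, not part of the repair program.

```python
def is_cyclic(rics):
    """rics: iterable of (P, R) pairs, one per RIC P(X1) -> exists Z R(X2, Z)."""
    graph = {}
    for src, dst in rics:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())

    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    color = dict.fromkeys(graph, WHITE)

    def visit(node):
        color[node] = GREY
        for nxt in graph[node]:
            if color[nxt] == GREY:  # back edge: a cycle through nxt
                return True
            if color[nxt] == WHITE and visit(nxt):
                return True
        color[node] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

print(is_cyclic([("p", "q"), ("q", "p")]))
print(is_cyclic([("p", "p")]))
print(is_cyclic([("emp", "salary")]))
```

The first two inputs are the cyclic examples listed above; the third is the RIC of the emp/salary example, which is acyclic.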
26
Problem
The problem we showed earlier happens only for cyclic RICs.
The authors concluded that their repair program generates exactly the repairs for UICs and acyclic RICs. When cyclic RICs are present, the program produces a superset of the set of repairs.
Can we fix the repair program to make it work for cyclic RICs?
27
New repair program
Our solution is to add constraints that prevent redundant changes.
Suppose we have the cyclic RIC set {p, q}, and in the old program p(A,B,fa) and q(B,A,fa) justify each other.
The repair rules for these RICs in the old program look like:
p(A,B,fa) v q(B,null,ta) :- G1.   % r1
q(A,B,fa) v p(B,null,ta) :- G2.   % r2
Assume there is another repair rule (not for a cyclic RIC) that involves p(A,B,fa):
p(A,B,fa) v H :- G3.   % r3
28
New repair program
First we rewrite r1-r3 to:
p(A,B,fa) v q(B,null,ta) :- G1, not other_p_fa(1,A,B).
p_fa(1,A,B) :- p(A,B,fa), G1, not other_p_fa(1,A,B).
q(A,B,fa) v p(B,null,ta) :- G2, not other_q_fa(1,A,B).
q_fa(1,A,B) :- q(A,B,fa), G2, not other_q_fa(1,A,B).
p(A,B,fa) v H :- G3.
p_fa(0,A,B) :- p(A,B,fa), G3.
29
New repair program
Add the following rules:
1. Suppose we have only one cyclic RIC set.
   type(1). % repair rules for cyclic RIC violations.
   type(0). % repair rules for other IC violations.
2. other_p_fa(X,A,B) :- p_fa(Y,A,B), X != Y, type(X), type(Y).
   other_q_fa(X,A,B) :- q_fa(Y,A,B), X != Y, type(X), type(Y).
3. Deny circular justification:
   :- p_fa(1,A,B), q_fa(1,B,A).
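The effect of the added constraint can be checked on the ground program for D = {p(a,b), q(b,a)}. The sketch below is an illustration under simplifying assumptions, not the full rewritten program: database facts are used directly in rule bodies (as in the circular-justification example), and the other_p_fa / other_q_fa atoms are left out because this instance has no type-0 repair rules that could derive them. The naive stable-model checker is written for this example only.

```python
from itertools import combinations

def rule(head, pos=(), neg=()):
    """A ground rule; an empty head encodes a constraint."""
    return (frozenset(head), frozenset(pos), frozenset(neg))

def powerset(atoms):
    atoms = sorted(atoms)
    for k in range(len(atoms) + 1):
        for combo in combinations(atoms, k):
            yield frozenset(combo)

def is_model(m, positive_rules):
    return all(not pos <= m or bool(head & m) for head, pos in positive_rules)

def stable_models(rules):
    atoms = set().union(*(h | p | n for h, p, n in rules))
    result = []
    for m in powerset(atoms):
        # Reduct: drop rules blocked by m, then drop the remaining "not" literals.
        reduct = [(h, p) for h, p, n in rules if not (n & m)]
        if is_model(m, reduct) and not any(
                is_model(s, reduct) for s in powerset(m) if s != m):
            result.append(m)
    return result

OLD = [
    rule(["p(a,b,td)"]),
    rule(["q(b,a,td)"]),
    rule(["p(a,b,fa)", "q(b,null,ta)"], ["p(a,b,td)"], ["aux1(b)"]),
    rule(["q(b,a,fa)", "p(a,null,ta)"], ["q(b,a,td)"], ["aux2(a)"]),
    rule(["aux1(b)"], ["q(b,a,td)"], ["q(b,a,fa)"]),
    rule(["aux2(a)"], ["p(a,b,td)"], ["p(a,b,fa)"]),
]

NEW = OLD + [
    # Record that a deletion was produced by a cyclic-RIC repair rule (type 1).
    rule(["p_fa(1,a,b)"], ["p(a,b,fa)", "p(a,b,td)"], ["aux1(b)"]),
    rule(["q_fa(1,b,a)"], ["q(b,a,fa)", "q(b,a,td)"], ["aux2(a)"]),
    # Deny circular justification.
    rule([], ["p_fa(1,a,b)", "q_fa(1,b,a)"]),
]

print(len(stable_models(OLD)), len(stable_models(NEW)))
```

On this instance the old rules admit two stable models, one of which deletes both p(a,b) and q(b,a), while the version with the constraint keeps only the model with no changes.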
30
An interesting observation
The main idea of our new program is to avoid circular justification. Our method can also be used in the translation from ASP to SAT.
Consider the program P = { a :- b. b :- a. }. Its completion, Comp(P) = { $a \leftrightarrow b$, $b \leftrightarrow a$ }, has two models, {} and {a,b}, while P has one answer set, {}.
We want to prevent circular justification between a and b.
31
ASP to SAT
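Rewrite P to P’:
a :- b, not a2.   % a2: some other rule makes a true.
a1 :- a, b, not a2.
b :- a, not b2.
b1 :- b, a, not b2.
:- a1, b1.
Now Comp(P’) = { $a \leftrightarrow (b \wedge \neg a2)$, $a1 \leftrightarrow (a \wedge b \wedge \neg a2)$, $b \leftrightarrow (a \wedge \neg b2)$, $b1 \leftrightarrow (b \wedge a \wedge \neg b2)$, $\neg(a1 \wedge b1)$, $\neg a2$, $\neg b2$ },
which has only one model, {}.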
32
ASP to SAT
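Suppose we add the fact a to program P. The rewritten P’ then gains the rules { a. a2 :- a. }, and
Comp(P’) = { $a$, $a2 \leftrightarrow a$, $a1 \leftrightarrow (a \wedge b \wedge \neg a2)$, $b \leftrightarrow (a \wedge \neg b2)$, $b1 \leftrightarrow (b \wedge a \wedge \neg b2)$, $\neg(a1 \wedge b1)$, $\neg b2$ }.
It now has the single model {a, b, a2, b1}, which corresponds to the answer set of P, {a, b}.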
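The model counts claimed for the three completions can be verified by a truth-table enumeration. This is a plain brute-force check written for illustration; the function names are hypothetical, and each formula is transcribed by hand from the completions above.

```python
from itertools import product

def models(atoms, formula):
    """All assignments over `atoms` that satisfy `formula`, as sets of true atoms."""
    found = []
    for values in product([False, True], repeat=len(atoms)):
        v = dict(zip(atoms, values))
        if formula(v):
            found.append({x for x in atoms if v[x]})
    return found

def comp_p(v):
    """Comp(P) for P = { a :- b.  b :- a. }"""
    return v["a"] == v["b"] and v["b"] == v["a"]

def comp_p_rewritten(v):
    """Comp(P') for the rewritten program without the fact a."""
    return (v["a"] == (v["b"] and not v["a2"])
            and v["a1"] == (v["a"] and v["b"] and not v["a2"])
            and v["b"] == (v["a"] and not v["b2"])
            and v["b1"] == (v["b"] and v["a"] and not v["b2"])
            and not (v["a1"] and v["b1"])
            and not v["a2"] and not v["b2"])

def comp_p_rewritten_with_fact(v):
    """Comp(P') after adding the fact a and the rule a2 :- a."""
    return (v["a"]
            and v["a2"] == v["a"]
            and v["a1"] == (v["a"] and v["b"] and not v["a2"])
            and v["b"] == (v["a"] and not v["b2"])
            and v["b1"] == (v["b"] and v["a"] and not v["b2"])
            and not (v["a1"] and v["b1"])
            and not v["b2"])

ATOMS = ["a", "b", "a1", "b1", "a2", "b2"]
print(models(["a", "b"], comp_p))
print(models(ATOMS, comp_p_rewritten))
print(models(ATOMS, comp_p_rewritten_with_fact))
```

The enumeration confirms that Comp(P) has the extra model {a, b}, that the rewritten completion has only the empty model, and that adding the fact a leaves the single model {a, b, a2, b1}.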
33
THE END