Consistent Query Answers in Inconsistent Databases
Jan Chomicki
University at Buffalo
http://www.cse.buffalo.edu/~chomicki
Joint work with Marcelo Arenas and Leo Bertossi, with contributions by Roger
He, Vijay Raghavan and Jeremy Spinrad.
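Integrity constraints
Integrity constraints describe valid database instances.
Examples:
$(\forall x)(\forall y)(\forall y')(\neg Student(x,y) \lor \neg Student(x,y') \lor y = y')$
$(\forall x)(\forall y)(\exists z)(\neg Advising(x,y) \lor Student(x,z))$
Clausal form:
$(Q_1 x_1) \cdots (Q_n x_n) \bigvee_i P_i$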
where each $Q_j$ is a quantifier and each $P_i$ is a literal of the form $r(\bar{x})$ (positive
literal) or $\neg r(\bar{x})$ (negative literal). Literals can refer to database relations or
built-in predicates. The arguments of each database literal are variables only, and these
variables are mutually distinct.
Classes of integrity constraints
Universal constraints: only universal quantifiers.
Functional dependencies (FDs) $A \rightarrow B$: single-relation universal constraints
with exactly two database literals, both negative, whose built-in literals
are (typed) equalities.
Inclusion dependencies $P[A] \subseteq Q[B]$:
$(\forall \bar{x})(\forall \bar{z})(\exists \bar{y})(P(\bar{z},\bar{x}) \Rightarrow Q(\bar{x},\bar{y}))$.
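Both kinds of constraints can be checked directly on a small instance. The Python sketch below is illustrative only: the helper names `satisfies_fd` and `satisfies_ind`, and the encoding of a relation as a set of tuples with attributes addressed by column position, are assumptions rather than part of the talk. The second example constraint on the previous slide is the inclusion dependency $Advising[Name] \subseteq Student[Name]$.

```python
from itertools import combinations

# Hypothetical encoding: a relation is a set of tuples; attributes are column positions.

def satisfies_fd(rel, lhs, rhs):
    """FD lhs -> rhs: no two tuples agree on all lhs columns but differ on some rhs column."""
    for t1, t2 in combinations(rel, 2):
        if all(t1[i] == t2[i] for i in lhs) and any(t1[j] != t2[j] for j in rhs):
            return False
    return True

def satisfies_ind(p, p_cols, q, q_cols):
    """Inclusion dependency P[p_cols] ⊆ Q[q_cols]: every projected P-tuple occurs in Q's projection."""
    q_proj = {tuple(t[j] for j in q_cols) for t in q}
    return all(tuple(t[i] for i in p_cols) in q_proj for t in p)

student = {("Jones", "Buffalo"), ("Jones", "Amherst"), ("Green", "Albany")}
advising = {("Jones", "Floyd"), ("Green", "Floyd")}

print(satisfies_fd(student, [0], [1]))             # False: Jones has two addresses
print(satisfies_ind(advising, [0], student, [0]))  # True: every advisee is a student
```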
Integrity maintenance
Checking:
• after every update
• after every transaction
Only those updates (transactions) for which the checks succeed are
committed, so the database never violates its integrity constraints.
Variations:
• more flexible reactions to integrity violations: repairs, update propagation
• triggers
Inconsistent databases
There are situations when we want or need to live with inconsistent data in a
database (data that violates given integrity constraints):
• the consistency of the database will be restored by executing further
transactions
• integration of heterogeneous databases with duplicate information
• inconsistency with respect to "soft" integrity constraints (those that we hope to see
satisfied but do not or cannot check)
• denormalized relations in a data warehouse
• legacy data on which we want to impose semantic constraints
• it is impossible or undesirable to repair the database to restore consistency:
  - no permission
  - inconsistent information can be useful
  - restoring consistency can be a complex and nondeterministic process
Consistent query answers
How can we distinguish between reliable and unreliable information in an
inconsistent database?
Repair:
• a database that satisfies the integrity constraints
• whose difference from the given database is minimal (the set of inserted/deleted
tuples is minimal under set inclusion)
Typically, a given inconsistent database has more than one repair.
A tuple $(a_1,\ldots,a_n)$ is a consistent query answer to a query $Q(x_1,\ldots,x_n)$
in a database $r$ if it is an element of the result of $Q$
in every repair of $r$.
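The definition can be executed by brute force on tiny instances. A minimal Python sketch, restricted to functional dependencies: inserting tuples can never remove an FD violation, so repairs only delete tuples and are exactly the maximal consistent subsets of the instance. The function names are hypothetical, and the running time is exponential in the number of tuples, so this only illustrates the definition; it is not a practical method.

```python
from itertools import combinations

def violates(t1, t2, fds):
    """True if the pair (t1, t2) violates some FD lhs -> rhs (columns by position)."""
    return any(all(t1[i] == t2[i] for i in lhs) and any(t1[j] != t2[j] for j in rhs)
               for lhs, rhs in fds)

def is_consistent(tuples, fds):
    return not any(violates(t1, t2, fds) for t1, t2 in combinations(tuples, 2))

def repairs(rel, fds):
    """All maximal consistent subsets of rel (exponential: tiny instances only)."""
    ordered = sorted(rel)
    consistent = [frozenset(s) for k in range(len(ordered) + 1)
                  for s in combinations(ordered, k) if is_consistent(s, fds)]
    return [s for s in consistent if not any(s < other for other in consistent)]

def consistent_answers(rel, fds, query):
    """Tuples that the query returns in every repair of rel."""
    return set.intersection(*(set(query(rep)) for rep in repairs(rel, fds)))

student = {("Jones", "Buffalo"), ("Jones", "Amherst"), ("Green", "Albany")}
name_to_address = [([0], [1])]

print(consistent_answers(student, name_to_address, lambda r: r))  # {('Green', 'Albany')}
```

Passing a projection such as `lambda r: {(t[0],) for t in r}` as the query returns both names, since each name survives in every repair.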
Student
Name  | Address
Jones | Buffalo
Jones | Amherst
Green | Albany

Advising
Name  | Advisor
Jones | Floyd
Green | Floyd

The integrity constraints are the functional dependencies $Name \rightarrow Address$
and $Name \rightarrow Advisor$.
The repairs:

Repair 1:
Student
Name  | Address
Jones | Buffalo
Green | Albany
Advising
Name  | Advisor
Jones | Floyd
Green | Floyd

Repair 2:
Student
Name  | Address
Jones | Amherst
Green | Albany
Advising
Name  | Advisor
Jones | Floyd
Green | Floyd
Consistent query answers
SELECT * FROM Student
⇒
Name  | Address
Green | Albany

SELECT * FROM Advising
⇒
Name  | Advisor
Jones | Floyd
Green | Floyd

SELECT Name FROM Student
⇒
Name
Jones
Green

The consistency of an answer serves as an indication of its quality and reliability.
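The SQL queries above can be replayed against each repair with an embedded SQL engine, and the results intersected. A sketch using Python's standard `sqlite3` module, assuming only the Student relation matters (Advising has no violations, so it is identical in both repairs). The two repairs are hard-coded from the previous slide, and the helper names are illustrative.

```python
import sqlite3

# The two repairs of Student under Name -> Address (hard-coded from the example).
STUDENT_REPAIRS = [
    [("Jones", "Buffalo"), ("Green", "Albany")],
    [("Jones", "Amherst"), ("Green", "Albany")],
]

def run_in_repair(rows, sql):
    """Load one repair into an in-memory database and return the query result as a set."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE Student (Name TEXT, Address TEXT)")
        con.executemany("INSERT INTO Student VALUES (?, ?)", rows)
        return set(con.execute(sql).fetchall())
    finally:
        con.close()

def consistent_sql_answers(sql):
    """Rows returned by the query in every repair."""
    return set.intersection(*(run_in_repair(rows, sql) for rows in STUDENT_REPAIRS))

print(consistent_sql_answers("SELECT * FROM Student"))  # {('Green', 'Albany')}
print(consistent_sql_answers("SELECT Name FROM Student"))
```

The second call returns both `('Jones',)` and `('Green',)`, matching the slide. Enumerating repairs in this way does not scale, which motivates the techniques discussed later.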
Inclusion dependency $Employee[Name] \subseteq Taxpayer[Name]$.
Instance:
Employee
Name   | Salary
Scott  | 60M
Steven | 30M
Taxpayer
Name
Scott

Repairs:

Repair 1:
Employee
Name   | Salary
Scott  | 60M
Taxpayer
Name
Scott

Repair 2:
Employee
Name   | Salary
Scott  | 60M
Steven | 30M
Taxpayer
Name
Scott
Steven
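For this single inclusion dependency, each violating name can be fixed in exactly two minimal ways: delete its Employee tuples, or insert the name into Taxpayer. A Python sketch of that case analysis. It assumes that Taxpayer has the single attribute Name, so the inserted tuple is fully determined, and that no other constraints interact with these changes. The function name is hypothetical, and this is not a general repair algorithm for inclusion dependencies.

```python
from itertools import product

def ind_repairs(employee, taxpayer):
    """Repairs for Employee[Name] ⊆ Taxpayer[Name], where employee is a set of
    (name, salary) tuples and taxpayer a set of names (Taxpayer has one column).
    Each violating name is fixed by deleting its Employee tuples or by inserting it
    into Taxpayer; every combination of these choices is one repair."""
    missing = sorted({name for name, _salary in employee} - set(taxpayer))
    result = []
    for choice in product(("delete", "insert"), repeat=len(missing)):
        emp, tax = set(employee), set(taxpayer)
        for name, action in zip(missing, choice):
            if action == "delete":
                emp = {t for t in emp if t[0] != name}
            else:
                tax.add(name)
        result.append((emp, tax))
    return result

employee = {("Scott", "60M"), ("Steven", "30M")}
taxpayer = {"Scott"}
for emp, tax in ind_repairs(employee, taxpayer):
    print(sorted(emp), sorted(tax))
# [('Scott', '60M')] ['Scott']
# [('Scott', '60M'), ('Steven', '30M')] ['Scott', 'Steven']
```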
Aggregation
SQL aggregation operators: MIN, MAX, COUNT, SUM, AVG.
SELECT COUNT(*)
FROM Student
WHERE Address = 'Buffalo'
⇒ ???
There is a different answer in every repair. Thus:
SELECT COUNT(*)
FROM Student
WHERE Address = 'Buffalo'
⇒ [0, 1]
A consistent query answer for an aggregation query is an optimal range of
values, represented as a pair
[greatest lower bound, least upper bound].
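Under a single key dependency such as Name → Address, a repair keeps exactly one tuple for each name, so the repairs can be enumerated as a Cartesian product of per-name choices. The sketch below (hypothetical helper names, exponential in the number of conflicting names) evaluates an aggregate in each repair and returns the range [greatest lower bound, least upper bound] as a pair.

```python
from collections import defaultdict
from itertools import product

def key_repairs(rel, key=0):
    """Repairs under a single key dependency: keep exactly one tuple per key value."""
    groups = defaultdict(list)
    for t in sorted(set(rel)):
        groups[t[key]].append(t)
    for choice in product(*groups.values()):
        yield set(choice)

def aggregate_range(rel, aggregate, key=0):
    """(greatest lower bound, least upper bound) of an aggregate over all repairs."""
    values = [aggregate(rep) for rep in key_repairs(rel, key)]
    return min(values), max(values)

student = {("Jones", "Buffalo"), ("Jones", "Amherst"), ("Green", "Albany")}

def count_buffalo(rep):
    # SELECT COUNT(*) FROM Student WHERE Address = 'Buffalo'
    return sum(1 for _name, address in rep if address == "Buffalo")

print(aggregate_range(student, count_buffalo))  # (0, 1)
```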
Another example of aggregation.
Employee, with $Name \rightarrow Salary$:
Name   | Salary
Scott  | 68M
Scott  | 60M
Steven | 30M
SELECT SUM(Salary)
FROM Employee
⇒ [90M, 98M]
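This range needs no enumeration of repairs. Under the single key dependency Name → Salary, each repair picks one salary per name independently, so the greatest lower bound is the sum of the per-name minima (60 + 30 = 90) and the least upper bound is the sum of the per-name maxima (68 + 30 = 98). A Python sketch with a hypothetical function name. It assumes salaries are written as plain integers (the "M" suffix dropped) and treats the relation as a set of tuples.

```python
from collections import defaultdict

def sum_range_single_key(rel):
    """(glb, lub) of SUM(value) under the key dependency name -> value.
    Every repair keeps exactly one tuple per name, chosen independently, so the
    bounds are the sums of the per-name minima and maxima."""
    values_by_name = defaultdict(list)
    for name, value in set(rel):
        values_by_name[name].append(value)
    glb = sum(min(vs) for vs in values_by_name.values())
    lub = sum(max(vs) for vs in values_by_name.values())
    return glb, lub

employee = {("Scott", 68), ("Scott", 60), ("Steven", 30)}
print(sum_range_single_key(employee))  # (90, 98)
```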
Computing consistent query answers
Query transformation: given a query $Q$ and a set of integrity constraints,
construct a query $Q'$ such that for every database instance $r$,
the set of answers to $Q'$ in $r$ = the set of consistent answers to $Q$ in $r$.
Representing all repairs: given a set of integrity constraints and a database
instance $r$:
1. construct a space-efficient representation of all repairs of $r$
2. use this representation to answer queries.
Specifying repairs as logic programs.
There are too many repairs to evaluate the query in each of them.
A  | B
a1 | b1
a1 | b1'
a2 | b2
a2 | b2'
...
an | bn
an | bn'
Under the functional dependency $A \rightarrow B$, this instance has $2^n$
repairs.
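The count follows from the structure of the conflict graph. Under $A \rightarrow B$, the tuples with the same A-value form a clique, and a repair keeps exactly one tuple from each clique, so the number of repairs is the product of the clique sizes, here $2 \cdot 2 \cdots 2 = 2^n$. A Python sketch of that count, with a hypothetical function name, assuming A is a key of the relation.

```python
from collections import Counter
from math import prod

def count_key_repairs(rel, key=0):
    """Number of repairs under a single key dependency: one tuple is kept per key value,
    so the count is the product of the group sizes (duplicates removed first)."""
    group_sizes = Counter(t[key] for t in set(rel))
    return prod(group_sizes.values())

n = 10
instance = ([(f"a{i}", f"b{i}") for i in range(1, n + 1)]
            + [(f"a{i}", f"b{i}'") for i in range(1, n + 1)])
print(count_key_repairs(instance))  # 1024, i.e. 2**10
```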
Query transformation [PODS'99]
First-order queries are transformed using semantic query optimization
techniques.
Residues:
• associated with single literals $p(\bar{x})$ or $\neg p(\bar{x})$ (only one of each for every
database relation $p$)
• for each literal $p(\bar{x})$ and each constraint containing $\neg p(\bar{x})$ in its clausal
form (possibly after variable renaming), obtain a local residue by removing
$\neg p(\bar{x})$ and the quantifiers for $\bar{x}$ from the (renamed) constraint
• for each literal $\neg p(\bar{x})$ and each constraint containing $p(\bar{x})$ in its clausal
form (possibly after variable renaming), obtain a local residue by removing
$p(\bar{x})$ and the quantifiers for $\bar{x}$ from the (renamed) constraint
• for each literal, compute the global residue as the conjunction of local
residues (possibly after normalizing variables)
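The functional dependency
$(\forall x)(\forall y)(\forall z)(\neg Student(x,y) \lor \neg Student(x,z) \lor y = z)$
produces for $Student(x,y)$ the following local and global residue:
$(\forall z)(\neg Student(x,z) \lor y = z)$
The integrity constraints
$(\forall x)(\neg p(x) \lor r(x))$, $(\forall x)(\neg q(x) \lor r(x))$
produce the following global residues:
Literal     | Residue
$p(x)$      | $r(x)$
$q(x)$      | $r(x)$
$\neg r(x)$ | $\neg p(x) \land \neg q(x)$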
Constructing the transformed query
Given a first-order query $Q$:
Literal expansion: for every literal, construct an expanded version as the
conjunction of this literal and its global residue.
Iteration: the expansion step is iterated by replacing the literals in the residue
by their expanded versions, until no changes occur.
Query expansion: replace the literals in the query by their final expanded
versions.
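The functional dependency
$(\forall x)(\forall y)(\forall z)(\neg Student(x,y) \lor \neg Student(x,z) \lor y = z)$
transforms the query $Student(x,y)$ into
$Student(x,y) \land (\forall z)(\neg Student(x,z) \lor y = z)$
For the integrity constraints
$(\forall x)(\neg p(x) \lor r(x))$, $(\forall x)(\neg r(x) \lor s(x))$:
Literal     | Residue     | First expansion             | Second (final) expansion
$r(x)$      | $s(x)$      | $r(x) \land s(x)$           | $r(x) \land s(x)$
$p(x)$      | $r(x)$      | $p(x) \land r(x)$           | $p(x) \land r(x) \land s(x)$
$\neg r(x)$ | $\neg p(x)$ | $\neg r(x) \land \neg p(x)$ | $\neg r(x) \land \neg p(x)$
$\neg s(x)$ | $\neg r(x)$ | $\neg s(x) \land \neg r(x)$ | $\neg s(x) \land \neg r(x) \land \neg p(x)$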
Properties of the transformation
The original query $Q$, the transformed query $Q'$.
Soundness:
• "Are all the answers to $Q'$ also consistent answers to $Q$?"
• satisfied for any universal constraints and a large class of first-order queries
Completeness:
• "Are all consistent answers to $Q$ also answers to $Q'$?"
• satisfied for binary universal constraints and queries that are conjunctions of
literals
Termination:
• "Does the transformation terminate?"
• satisfied for acyclic sets of constraints (among others)
Corollary: For functional dependencies, query transformation terminates and
the transformed query is sound and complete.
Theoretical limits of query transformation
Data complexity: complexity as a function of the number of tuples in the
database.
Fact (well known). First-order queries can be evaluated in $AC^0$.
Observation. If the transformation terminates, then the transformed query
can be evaluated in $AC^0$ as well.
If we can show that the data complexity of determining whether an answer to a
query is consistent is NP-hard, then query transformation is impossible.
Result. Determining whether an answer is consistent is NP-hard for an
existentially quantified query and a set of two functional dependencies.
Representing all repairs [ICDT'01]
Not as a formula (as in belief revision) but as a graph.
A set of functional dependencies $F$, a database instance $r$.
Conflict graph:
• nodes: tuples in $r$
• edges: there is an edge $(t_1, t_2)$ if there is a functional dependency $A \rightarrow B \in F$
such that $t_1[A] = t_2[A]$ and $t_1[B] \neq t_2[B]$
• maximal independent sets: repairs
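Student
Name  | Address
Jones | Buffalo
Jones | Amherst
Green | Albany
⇒ conflict graph with nodes (Jones, Buffalo), (Green, Albany) and (Jones, Amherst),
and a single edge between (Jones, Buffalo) and (Jones, Amherst); (Green, Albany) is isolated.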
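The conflict graph and its maximal independent sets can be computed directly for small instances. A Python sketch with hypothetical function names. Edges follow the definition above. Maximal independent sets are found with the Bron-Kerbosch algorithm applied to the complement graph, whose maximal cliques are exactly the maximal independent sets of the conflict graph. Bron-Kerbosch is a standard technique swapped in here, not one described in the talk.

```python
from itertools import combinations

def conflict_graph(rel, fds):
    """Nodes are tuples; (t1, t2) is an edge if some FD lhs -> rhs (column positions)
    has t1[lhs] = t2[lhs] and t1[rhs] != t2[rhs]."""
    nodes = sorted(set(rel))
    edges = {frozenset((t1, t2)) for t1, t2 in combinations(nodes, 2)
             if any(all(t1[i] == t2[i] for i in lhs) and any(t1[j] != t2[j] for j in rhs)
                    for lhs, rhs in fds)}
    return nodes, edges

def maximal_independent_sets(nodes, edges):
    """Bron-Kerbosch (no pivoting) on the complement graph; its maximal cliques are
    the maximal independent sets of the conflict graph, i.e. the repairs."""
    non_adjacent = {v: {u for u in nodes if u != v and frozenset((u, v)) not in edges}
                    for v in nodes}
    found = []

    def bron_kerbosch(r, p, x):
        if not p and not x:
            found.append(frozenset(r))
            return
        for v in list(p):
            bron_kerbosch(r | {v}, p & non_adjacent[v], x & non_adjacent[v])
            p = p - {v}
            x = x | {v}

    bron_kerbosch(set(), set(nodes), set())
    return found

student = {("Jones", "Buffalo"), ("Jones", "Amherst"), ("Green", "Albany")}
nodes, edges = conflict_graph(student, [([0], [1])])
print(len(edges))                                   # 1
print(len(maximal_independent_sets(nodes, edges)))  # 2
```

With the dependencies `[([0], [1]), ([1], [0])]`, the same functions also handle an instance with two dependencies, such as $A \rightarrow B$ together with $B \rightarrow A$.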
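Another conflict graph
Functional dependencies $A \rightarrow B$ and $B \rightarrow A$.
Instance:
A  | B
a1 | b1
a1 | b2
a2 | b2
a2 | b1
Conflict graph: a 4-cycle on the nodes (a1, b1), (a1, b2), (a2, b2), (a2, b1).
$A \rightarrow B$ gives the edges {(a1, b1), (a1, b2)} and {(a2, b1), (a2, b2)};
$B \rightarrow A$ gives the edges {(a1, b1), (a2, b1)} and {(a1, b2), (a2, b2)}.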
Aggregation queries
In simple cases, queries can be transformed to compute optimal result ranges
for consistent query answers.
StudentCourse, with $Name \rightarrow Address$:
Name | Address | Course
SELECT COUNT(*)
FROM StudentCourse
⇒
WITH S(Name, Addr, C) AS
  (SELECT Name, Address, COUNT(*)
   FROM StudentCourse
   GROUP BY Name, Address),
T(Name, CMin, CMax) AS
  (SELECT Name, MIN(C), MAX(C)
   FROM S
   GROUP BY Name)
SELECT SUM(CMin), SUM(CMax)
FROM T;
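The transformed query can be checked against the repair semantics on a small instance. In a repair of StudentCourse under Name → Address, each name keeps all its tuples for one chosen address, so COUNT(*) ranges between the sum of the per-name minimum group counts and the sum of the per-name maximum group counts. A Python sketch using the standard `sqlite3` module. The StudentCourse rows are a made-up example, not data from the talk, and the helper names are hypothetical.

```python
import sqlite3
from collections import defaultdict
from itertools import product

# Made-up StudentCourse instance violating Name -> Address (Jones has two addresses).
ROWS = [("Jones", "Buffalo", "C1"), ("Jones", "Buffalo", "C2"),
        ("Jones", "Amherst", "C3"), ("Green", "Albany", "C1")]

# The transformed query from the slide (trailing semicolon omitted).
TRANSFORMED = """
WITH S(Name, Addr, C) AS
  (SELECT Name, Address, COUNT(*) FROM StudentCourse GROUP BY Name, Address),
T(Name, CMin, CMax) AS
  (SELECT Name, MIN(C), MAX(C) FROM S GROUP BY Name)
SELECT SUM(CMin), SUM(CMax) FROM T
"""

def transformed_range(rows):
    con = sqlite3.connect(":memory:")
    try:
        con.execute("CREATE TABLE StudentCourse (Name TEXT, Address TEXT, Course TEXT)")
        con.executemany("INSERT INTO StudentCourse VALUES (?, ?, ?)", rows)
        return con.execute(TRANSFORMED).fetchone()
    finally:
        con.close()

def brute_force_range(rows):
    """Each repair keeps, for every name, all tuples with one chosen address."""
    per_name = defaultdict(lambda: defaultdict(int))
    for name, address, _course in rows:
        per_name[name][address] += 1
    counts = [sum(choice)
              for choice in product(*(list(a.values()) for a in per_name.values()))]
    return min(counts), max(counts)

print(transformed_range(ROWS), brute_force_range(ROWS))  # (2, 3) (2, 3)
```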
Data complexity of aggregation queries
(glb = greatest lower bound, lub = least upper bound, |F| = number of functional dependencies)
Aggregate | glb, |F| = 1 | glb, |F| ≥ 2 | lub, |F| = 1 | lub, |F| ≥ 2
MIN(A)    | PTIME        | PTIME        | PTIME        | NP-complete
MAX(A)    | PTIME        | NP-complete  | PTIME        | PTIME
COUNT(*)  | PTIME        | NP-complete  | PTIME        | NP-complete
COUNT(A)  | NP-complete  | NP-complete  | NP-complete  | NP-complete
SUM(A)    | PTIME        | NP-complete  | PTIME        | NP-complete
AVG(A)    | PTIME^a      | NP-complete  | PTIME        | NP-complete
How to reduce the computational cost?
Boyce-Codd Normal Form
Boyce-Codd Normal Form (BCNF): every functional dependency is a key
dependency.
BCNF imposes restrictions on the conflict graph that improve tractability.
Property: For one dependency in BCNF, the conflict graph is a union of
disjoint cliques.
BCNF and COUNT(*) queries
Two functional dependencies.
Indirect approach:
• the conflict graph is claw-free and perfect
• finding maximum independent sets in such graphs can be done in PTIME
($O(n^{5.5})$)
Direct approach:
• bipartite clique graph:
  - nodes: cliques in the 1-dependency conflict graphs
  - edges: nonempty clique intersections
• finding a maximum matching in the clique graph
• overall complexity $O(\max(e, n\sqrt{n}))$.
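For a relation R(A, B) with the two key dependencies A → B and B → A, two distinct tuples conflict exactly when they share an A-value or a B-value. A repair is therefore a matching in the bipartite graph whose nodes are the A-cliques and B-cliques and whose edges are the tuples, and the least upper bound of COUNT(*) is the size of a maximum matching. The Python sketch below assumes this two-attribute schema. It uses the simple augmenting-path method (Kuhn's algorithm), not the faster algorithm behind the $O(\max(e, n\sqrt{n}))$ bound, and the function name is hypothetical.

```python
def max_count_two_keys(rel):
    """Least upper bound of COUNT(*) for R(A, B) under A -> B and B -> A:
    the size of a maximum matching between A-values and B-values (tuples are edges),
    computed with simple augmenting paths (Kuhn's algorithm)."""
    neighbours = {}
    for a, b in sorted(set(rel)):
        neighbours.setdefault(a, []).append(b)
    matched_to = {}  # B-value -> A-value it is currently matched with

    def augment(a, visited):
        for b in neighbours[a]:
            if b in visited:
                continue
            visited.add(b)
            if b not in matched_to or augment(matched_to[b], visited):
                matched_to[b] = a
                return True
        return False

    return sum(augment(a, set()) for a in neighbours)

instance = {("a1", "b1"), ("a1", "b2"), ("a2", "b2"), ("a2", "b1")}
print(max_count_two_keys(instance))  # 2
```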
Specifying repairs as logic programs [FQAS'00]
Logic programs with:
• negation in the body and the head
• disjunction
• exceptions (can be eliminated)
Scope:
• arbitrary universal constraints, inclusion dependencies
• arbitrary first-order queries
• queries can be "modalized" and nested
Related work
Belief revision:
• revising a database with integrity constraints
• the revised theory changes with each database update
• emphasis on semantics, not computation
• inference of ground literals using theorem-proving techniques
Disjunctive information:
• using disjunctions to represent conflicts related to functional dependencies
• constructing a single disjunctive instance
• query languages: representation-specific, relational algebra or calculus
Reasoning with inconsistency:
• avoiding triviality and contradiction
• various logics: paraconsistent, minimal, ...
• inference ≠ provability: to capture the nonmonotonicity of consistent query
answers (Bry)
Future work
Broadening scope:
• SQL
• recursive queries
• classes of integrity constraints
• preferences
New paradigms:
• query reformulation in information integration
• data cleaning
• semistructured data
Algorithms and computational complexity:
• efficient algorithms for query processing in special cases
• lower bounds
• approximation
Papers:
1. M. Arenas, L. Bertossi, J. Chomicki, "Consistent Query Answers in
Inconsistent Databases," ACM Symposium on Principles of Database
Systems (PODS), Philadelphia, May 1999.
2. M. Arenas, L. Bertossi, J. Chomicki, "Specifying and Querying Database
Repairs using Logic Programs with Exceptions," International Symposium
on Flexible Query Answering Systems (FQAS), Warsaw, Poland, October
2000.
3. M. Arenas, L. Bertossi, J. Chomicki, "Scalar Aggregation in
FD-Inconsistent Databases," International Conference on Database Theory
(ICDT), London, UK, January 2001.