
Consistent Query Answers in Inconsistent Databases

Jan Chomicki

University at Buffalo

http://www.cse.buffalo.edu/~chomicki

Joint work with Marcelo Arenas and Leo Bertossi, with contributions by Roger He, Vijay Raghavan and Jeremy Spinrad.


Integrity constraints

Integrity constraints describe valid database instances.

Examples:

$(\forall x)(\forall y)(\forall y')(\neg Student(x,y) \lor \neg Student(x,y') \lor y = y')$

$(\forall x)(\forall y)(\exists z)(\neg Advising(x,y) \lor Student(x,z))$

Clausal form:

$(Q_1 x_1) \cdots (Q_n x_n) \bigvee_i P_i$

where each $Q_j$ is a quantifier and each $P_i$ is a literal of the form $r(\bar{x})$ (positive literal) or $\neg r(\bar{x})$ (negative literal). Literals can refer to database relations or built-in predicates. Each database literal contains only variables that are mutually distinct.


Classes of integrity constraints

Universal constraints: only universal quantifiers.

Functional dependencies (FDs) $A \to B$: single-relation universal constraints with exactly two database literals that are negative and whose built-in literals are (typed) equalities.

Inclusion dependencies $P[A] \subseteq Q[B]$:

$(\forall \bar{x})(\forall \bar{z})(\exists \bar{y})(P(\bar{z},\bar{x}) \Rightarrow Q(\bar{x},\bar{y}))$.


Integrity maintenance

Checking:
• after every update
• after every transaction

Only those updates (transactions) for which the checks succeed are committed. So a database never violates integrity constraints.

Variations:
• more flexible reactions to integrity violations: repairs, update propagation
• triggers


Inconsistent databases

There are situations when we want/need to live with inconsistent data in a database (data that violates given integrity constraints):
• the consistency of the database will be restored by executing further transactions
• integration of heterogeneous databases with duplicate information
• inconsistency with respect to "soft" integrity constraints (those that we hope to see satisfied but do not/cannot check)
• denormalized relations in a data warehouse
• legacy data on which we want to impose semantic constraints
• it is impossible/undesirable to repair the database to restore consistency:
  - no permission
  - inconsistent information can be useful
  - restoring consistency can be a complex and nondeterministic process


Consistent query answers

How to distinguish between reliable and unreliable information in an inconsistent database?

Repair:
• a database that satisfies the integrity constraints
• difference from the given database is minimal (the set of inserted/deleted tuples is minimal under set inclusion)

Typically, there is more than one repair of a given inconsistent database.

A tuple $(a_1, \ldots, a_n)$ is a consistent query answer to a query $Q(x_1, \ldots, x_n)$ in a database $r$ if it is an element of the result of $Q$ in every repair of $r$.
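The two definitions above can be made concrete with a brute-force sketch. It assumes a small Student(Name, Address) instance and the single functional dependency Name -> Address; the helper names (`satisfies_fd`, `repairs`, `consistent_answers`) are hypothetical and introduced only for illustration. The enumeration is exponential, so this is not a practical algorithm.

```python
from itertools import combinations

# Assumed instance: Student(Name, Address) with the FD Name -> Address.
student = {("Jones", "Buffalo"), ("Jones", "Amherst"), ("Green", "Albany")}

def satisfies_fd(rows):
    """True if no two rows agree on Name but differ on Address."""
    seen = {}
    for name, address in rows:
        if seen.setdefault(name, address) != address:
            return False
    return True

def repairs(rows):
    """Enumerate repairs by brute force (exponential; small inputs only).

    For functional dependencies, inserting tuples never removes a violation,
    so the repairs are exactly the maximal consistent subsets of the instance.
    """
    rows = list(rows)
    consistent = [
        frozenset(subset)
        for k in range(len(rows) + 1)
        for subset in combinations(rows, k)
        if satisfies_fd(subset)
    ]
    return [s for s in consistent if not any(s < t for t in consistent)]

def consistent_answers(query, rows):
    """Intersect the query result over every repair."""
    results = [set(query(rep)) for rep in repairs(rows)]
    return set.intersection(*results)

# SELECT * FROM Student
all_rows = consistent_answers(lambda r: r, student)
# SELECT Name FROM Student
names = consistent_answers(lambda r: {(name,) for name, _ in r}, student)

print(all_rows)       # {('Green', 'Albany')}
print(sorted(names))  # [('Green',), ('Jones',)]
```

Only the Green tuple survives in every repair, while both names do, because each repair keeps one of the two Jones tuples.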


Student
Name | Address
Jones | Buffalo
Jones | Amherst
Green | Albany

Advising
Name | Advisor
Jones | Floyd
Green | Floyd

The integrity constraints are functional dependencies: $Name \to Address$ and $Name \to Advisor$.

The repairs:

Repair 1:

Student
Name | Address
Jones | Buffalo
Green | Albany

Advising
Name | Advisor
Jones | Floyd
Green | Floyd

Repair 2:

Student
Name | Address
Jones | Amherst
Green | Albany

Advising
Name | Advisor
Jones | Floyd
Green | Floyd


Consistent query answers

SELECT *
FROM Student

⇒

Name | Address
Green | Albany

SELECT *
FROM Advising

⇒

Name | Advisor
Jones | Floyd
Green | Floyd

SELECT Name
FROM Student

⇒

Name
Jones
Green

The consistency of an answer serves as an indication of its quality and reliability.


Inclusion dependency $Employee[Name] \subseteq Taxpayer[Name]$.

Instance:

Employee
Name | Salary
Scott | 60M
Steven | 30M

Taxpayer
Name
Scott

Repairs:

Repair 1:

Employee
Name | Salary
Scott | 60M

Taxpayer
Name
Scott

Repair 2:

Employee
Name | Salary
Scott | 60M
Steven | 30M

Taxpayer
Name
Scott
Steven


Aggregation

SQL aggregation operators: MIN, MAX, COUNT, SUM, AVG.

SELECT COUNT(*)
FROM Student
WHERE Address = 'Buffalo'

⇒ ???

A different answer in every repair. Thus:

SELECT COUNT(*)
FROM Student
WHERE Address = 'Buffalo'

⇒ [0, 1]

A consistent query answer for aggregation queries is an optimal range of values, represented as a pair [greatest lower bound, least upper bound].


Another example of aggregation.

Employee, with the functional dependency $Name \to Salary$:

Name | Salary
Scott | 68M
Scott | 60M
Steven | 30M

SELECT SUM(Salary)
FROM Employee

⇒ [90M, 98M]
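The range in this example can be reproduced with a short sketch. It assumes a single functional dependency Name -> Salary, so that every repair keeps exactly one salary per name; the function name `sum_range` is hypothetical, and salaries are written as plain integers in millions.

```python
from collections import defaultdict

# Employee(Name, Salary) instance from the slide, salaries in millions.
employee = [("Scott", 68), ("Scott", 60), ("Steven", 30)]

def sum_range(rows):
    """Return (glb, lub) of SUM(Salary) over all repairs.

    Assumes the only constraint is Name -> Salary: a repair keeps one
    salary per name, so the extremes are reached by picking the smallest
    (or largest) salary for every name independently.
    """
    salaries = defaultdict(set)
    for name, salary in rows:
        salaries[name].add(salary)
    low = sum(min(values) for values in salaries.values())
    high = sum(max(values) for values in salaries.values())
    return low, high

print(sum_range(employee))  # (90, 98)
```

The per-name choice is independent only because there is a single dependency; with two or more dependencies the choices interact and this shortcut is not valid in general.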


Computing consistent query answers

Query transformation: given a query $Q$ and a set of integrity constraints, construct a query $Q'$ such that for every database instance $r$

the set of answers to $Q'$ in $r$ = the set of consistent answers to $Q$ in $r$.

Representing all repairs: given a set of integrity constraints and a database instance $r$:
1. construct a space-efficient representation of all repairs of $r$
2. use this representation to answer queries.

Specifying repairs as logic programs.


There are too many repairs to evaluate the query in each of them.

A | B
$a_1$ | $b_1$
$a_1$ | $b'_1$
$a_2$ | $b_2$
$a_2$ | $b'_2$
$\cdots$ | $\cdots$
$a_n$ | $b_n$
$a_n$ | $b'_n$

Under the functional dependency $A \to B$, this instance has $2^n$ repairs.
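The count can be checked with a small sketch. Under the single dependency A -> B, a repair picks one B-value for each A-value, so the number of repairs is the product of the number of distinct B-values per A-value. The names `count_repairs` and `slide_instance` are hypothetical.

```python
from collections import defaultdict
from math import prod

def count_repairs(rows):
    """Number of repairs of r(A, B) under the single FD A -> B."""
    b_values = defaultdict(set)
    for a, b in rows:
        b_values[a].add(b)
    # One independent choice of B-value per A-value.
    return prod(len(bs) for bs in b_values.values())

def slide_instance(n):
    """Build the instance above: each a_i paired with b_i and b_i'."""
    return [(f"a{i}", b) for i in range(1, n + 1) for b in (f"b{i}", f"b{i}'")]

for n in (1, 2, 10, 20):
    print(n, count_repairs(slide_instance(n)))  # second column is 2**n
```

An instance with only 2n tuples therefore already has exponentially many repairs, which is why evaluating the query repair by repair does not scale.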


Query transformation [PODS'99]

First-order queries transformed using semantic query optimization techniques.

Residues:
• associated with single literals $p(\bar{x})$ or $\neg p(\bar{x})$ (only one of each for every database relation $p$)
• for each literal $p(\bar{x})$ and each constraint containing $\neg p(\bar{x})$ in its clausal form (possibly after variable renaming), obtain a local residue by removing $\neg p(\bar{x})$ and the quantifiers for $\bar{x}$ from the (renamed) constraint
• for each literal $\neg p(\bar{x})$ and each constraint containing $p(\bar{x})$ in its clausal form (possibly after variable renaming), obtain a local residue by removing $p(\bar{x})$ and the quantifiers for $\bar{x}$ from the (renamed) constraint
• for each literal, compute the global residue as the conjunction of local residues (possibly after normalizing variables)


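The functional dependency

$(\forall x)(\forall y)(\forall z)(\neg Student(x,y) \lor \neg Student(x,z) \lor y = z)$

produces for $Student(x,y)$ the following local and global residue

$(\forall z)(\neg Student(x,z) \lor y = z)$

The integrity constraints

$(\forall x)(\neg p(x) \lor r(x))$; $(\forall x)(\neg q(x) \lor r(x))$

produce the following global residues

Literal | Residue
$p(x)$ | $r(x)$
$q(x)$ | $r(x)$
$\neg r(x)$ | $\neg p(x) \land \neg q(x)$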


Constructing the transformed query

Given a first-order query $Q$.

Literal expansion: for every literal, construct an expanded version as the conjunction of this literal and its global residue.

Iteration: the expansion step is iterated by replacing the literals in the residue by their expanded versions, until no changes occur.

Query expansion: replace the literals in the query by their final expanded versions.


The functional dependency

$(\forall x)(\forall y)(\forall z)(\neg Student(x,y) \lor \neg Student(x,z) \lor y = z)$

transforms the query $Student(x,y)$ into

$Student(x,y) \land (\forall z)(\neg Student(x,z) \lor y = z)$
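As a rough check, the transformed query can be evaluated directly on the inconsistent Student instance used in the earlier example, without computing any repair. The function name `transformed_query` is hypothetical; the set comprehension is a literal reading of the formula above.

```python
# Student(Name, Address) instance from the earlier example; it violates
# the functional dependency Name -> Address for Jones.
student = {("Jones", "Buffalo"), ("Jones", "Amherst"), ("Green", "Albany")}

def transformed_query(rows):
    """Evaluate Student(x, y) AND (forall z)(NOT Student(x, z) OR y = z)."""
    return {
        (x, y)
        for (x, y) in rows
        # The residue: every tuple with the same x must carry the same y.
        if all(y == z for (x2, z) in rows if x2 == x)
    }

print(transformed_query(student))  # {('Green', 'Albany')}
```

The result coincides with the consistent answer to SELECT * FROM Student shown earlier, obtained here by a single pass over the original instance.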

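For the integrity constraints

$(\forall x)(\neg p(x) \lor r(x))$; $(\forall x)(\neg r(x) \lor s(x))$

Literal | Residue | First expansion | Second (final) expansion
$r(x)$ | $s(x)$ | $r(x) \land s(x)$ | $r(x) \land s(x)$
$p(x)$ | $r(x)$ | $p(x) \land r(x)$ | $p(x) \land r(x) \land s(x)$
$\neg r(x)$ | $\neg p(x)$ | $\neg r(x) \land \neg p(x)$ | $\neg r(x) \land \neg p(x)$
$\neg s(x)$ | $\neg r(x)$ | $\neg s(x) \land \neg r(x)$ | $\neg s(x) \land \neg r(x) \land \neg p(x)$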
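The final column of the table can be recomputed with a simplified sketch. It covers only the special case shown here: every constraint is a binary clause over unary literals that share the variable x, so each local residue is a single literal and the expansion is a conjunction of literals. Literals are written as strings with `~` for negation; this encoding and the function names are assumptions made for illustration, not the general procedure.

```python
def negate(lit):
    """Complement of a literal written as 'p' or '~p'."""
    return lit[1:] if lit.startswith("~") else "~" + lit

def global_residues(clauses):
    """Map each literal to what remains of every clause containing its complement."""
    residues = {}
    for clause in clauses:
        for lit in clause:
            rest = [other for other in clause if other != lit]
            residues.setdefault(negate(lit), []).extend(rest)
    return residues

def expand(literal, residues):
    """Iterate literal expansion until no new literal is added."""
    expanded = [literal]
    frontier = [literal]
    while frontier:
        current = frontier.pop(0)
        for lit in residues.get(current, []):
            if lit not in expanded:
                expanded.append(lit)
                frontier.append(lit)
    return expanded

# (forall x)(~p(x) v r(x)) and (forall x)(~r(x) v s(x)), variable omitted.
constraints = [("~p", "r"), ("~r", "s")]
residues = global_residues(constraints)
for literal in ("r", "p", "~r", "~s"):
    print(literal, "=>", " ^ ".join(expand(literal, residues)))
```

For `p` this prints `p ^ r ^ s` and for `~s` it prints `~s ^ ~r ^ ~p`, matching the final expansions in the table. The loop terminates here because the constraint set is acyclic and a literal is never added twice.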


Properties of the transformation

The original query $Q$, the transformed query $Q'$.

Soundness: "Are all the answers to $Q'$ also consistent answers to $Q$?"
Satisfied for any universal constraints and a large class of first-order queries.

Completeness: "Are all consistent answers to $Q$ also answers to $Q'$?"
Satisfied for binary universal constraints and queries that are conjunctions of literals.

Termination: "Does the transformation terminate?"
Satisfied for acyclic sets of constraints (among others).

Corollary: For functional dependencies, query transformation terminates and the transformed query is sound and complete.


Theoretical limits of query transformation

Data complexity: complexity as a function of the number of tuples in the database.

Fact (well known). First-order queries can be evaluated in $AC^0$.

Observation. If the transformation terminates, then the transformed query can be evaluated in $AC^0$ as well.

If we can show that the data complexity of determining whether an answer to a query is consistent is NP-hard, then query transformation is impossible.

Result. Determining whether an answer is consistent is NP-hard for an existentially quantified query and a set of two functional dependencies.


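Representing all repairs [ICDT'01]

Not as a formula (as in belief revision) but as a graph.

A set of functional dependencies $F$, a database instance $r$.

Conflict graph:
• nodes: tuples in $r$
• edges: there is an edge $(t_1, t_2)$ if there is a functional dependency $A \to B \in F$ such that $t_1[A] = t_2[A]$ and $t_1[B] \neq t_2[B]$.
• maximal independent sets: repairs

Student
Name | Address
Jones | Buffalo
Jones | Amherst
Green | Albany

⇒

[Figure: conflict graph with nodes (Jones, Buffalo), (Green, Albany) and (Jones, Amherst). By the edge definition above, the only edge joins (Jones, Buffalo) and (Jones, Amherst).]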


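Another conflict graph

Functional dependencies $A \to B$ and $B \to A$.

Instance:

A | B
$a_1$ | $b_1$
$a_1$ | $b_2$
$a_2$ | $b_2$
$a_2$ | $b_1$

Conflict graph:

[Figure: nodes $(a_1,b_1)$, $(a_1,b_2)$, $(a_2,b_1)$, $(a_2,b_2)$. Two tuples conflict exactly when they share one component and differ on the other, so the edges form a 4-cycle.]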
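The conflict graph for this instance, and the repairs as its maximal independent sets, can be computed with a brute-force sketch. Functional dependencies are encoded as pairs of column indexes, which is an assumption of this sketch, as are the function names. The enumeration of independent sets is exponential and only meant for tiny instances.

```python
from itertools import combinations

# Instance from the slide; column 0 is A, column 1 is B.
rows = [("a1", "b1"), ("a1", "b2"), ("a2", "b2"), ("a2", "b1")]
# FDs as (lhs column, rhs column): A -> B and B -> A.
fds = [(0, 1), (1, 0)]

def conflict_edges(rows, fds):
    """Edges join tuples that agree on some FD's left side but not its right side."""
    edges = set()
    for t1, t2 in combinations(rows, 2):
        if any(t1[lhs] == t2[lhs] and t1[rhs] != t2[rhs] for lhs, rhs in fds):
            edges.add(frozenset((t1, t2)))
    return edges

def maximal_independent_sets(rows, edges):
    """Brute-force enumeration of maximal independent sets, i.e. the repairs."""
    independent = [
        frozenset(subset)
        for k in range(len(rows) + 1)
        for subset in combinations(rows, k)
        if not any(frozenset(pair) in edges for pair in combinations(subset, 2))
    ]
    return [s for s in independent if not any(s < t for t in independent)]

edges = conflict_edges(rows, fds)
repairs = maximal_independent_sets(rows, edges)
print(len(edges))  # 4
for repair in repairs:
    print(sorted(repair))
```

The two repairs are the two diagonals of the cycle: {(a1, b1), (a2, b2)} and {(a1, b2), (a2, b1)}.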


Aggregation queries

In simple cases, queries can be transformed to compute optimal result ranges for consistent query answers.

StudentCourse(Name, Address, Course), with the functional dependency $Name \to Address$.

SELECT COUNT(*)
FROM StudentCourse

⇒

WITH S(Name, Addr, C) AS
  (SELECT Name, Address, COUNT(*)
   FROM StudentCourse
   GROUP BY Name, Address),
T(Name, CMin, CMax) AS
  (SELECT Name, MIN(C), MAX(C)
   FROM S
   GROUP BY Name)
SELECT SUM(CMin), SUM(CMax)
FROM T;
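The rewritten query can be tried out with Python's built-in `sqlite3` module. The sample rows and course names below are invented for illustration; only the schema and the query come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE StudentCourse (Name TEXT, Address TEXT, Course TEXT)")
# Hypothetical data: Jones has two conflicting addresses.
conn.executemany(
    "INSERT INTO StudentCourse VALUES (?, ?, ?)",
    [
        ("Jones", "Buffalo", "CS1"),
        ("Jones", "Buffalo", "CS2"),
        ("Jones", "Amherst", "CS3"),
        ("Green", "Albany", "CS1"),
    ],
)

RANGE_QUERY = """
WITH S(Name, Addr, C) AS
  (SELECT Name, Address, COUNT(*)
   FROM StudentCourse
   GROUP BY Name, Address),
T(Name, CMin, CMax) AS
  (SELECT Name, MIN(C), MAX(C)
   FROM S
   GROUP BY Name)
SELECT SUM(CMin), SUM(CMax)
FROM T
"""

# One row: (greatest lower bound, least upper bound) of COUNT(*).
count_range = conn.execute(RANGE_QUERY).fetchone()
print(count_range)  # (2, 3)
conn.close()
```

A repair keeps, for each name, all tuples sharing one address. For Jones that is either the two Buffalo tuples or the single Amherst tuple, plus one tuple for Green, which gives counts of 3 or 2.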


Data complexity of aggregation queries

Aggregate | greatest lower bound, $\lvert F \rvert = 1$ | greatest lower bound, $\lvert F \rvert \geq 2$ | least upper bound, $\lvert F \rvert = 1$ | least upper bound, $\lvert F \rvert \geq 2$
MIN(A) | PTIME | PTIME | PTIME | NP-complete
MAX(A) | PTIME | NP-complete | PTIME | PTIME
COUNT(*) | PTIME | NP-complete | PTIME | NP-complete
COUNT(A) | NP-complete | NP-complete | NP-complete | NP-complete
SUM(A) | PTIME | NP-complete | PTIME | NP-complete
AVG(A) | PTIME | NP-complete | PTIME | NP-complete

How to reduce the computational cost?


Boyce-Codd Normal Form

Boyce-Codd Normal Form (BCNF): Every functional dependency is a key dependency.

BCNF produces restrictions on the conflict graph that improve tractability.

Property: For one dependency in BCNF, the conflict graph is a union of disjoint cliques.


BCNF and COUNT(*) queries

Two functional dependencies.

Indirect approach:
• conflict graph is claw-free and perfect
• finding maximum independent sets in such graphs can be done in PTIME ($O(n^{5.5})$)

Direct approach:
• bipartite clique graph:
  - nodes: cliques in 1-dependency conflict graphs
  - edges: nonempty clique intersections
• finding a maximum matching in the clique graph
• overall complexity $O(\max(e, n\sqrt{n}))$.


Specifying repairs as logic programs [FQAS'00]

Logic programs with:
• negation in the body and the head
• disjunction
• exceptions (can be eliminated)

Scope:
• arbitrary universal constraints, inclusion dependencies
• arbitrary first-order queries
• queries can be "modalized" and nested


Related work

Belief revision:
• revising database with integrity constraints
• revised theory changes with each database update
• emphasis on semantics, not computation
• inference of ground literals using theorem proving techniques

Disjunctive information:
• using disjunctions to represent conflicts related to functional dependencies
• constructing a single disjunctive instance
• query languages: representation-specific, relational algebra or calculus


Reasoning with inconsistency:
• avoiding triviality and contradiction
• various logics: paraconsistent, minimal, ...
• inference $\neq$ provability: to capture nonmonotonicity of consistent query answers (Bry)


Future work

Broadening scope:
• SQL
• recursive queries
• classes of integrity constraints
• preferences

New paradigms:
• query reformulation in information integration
• data cleaning
• semistructured data


Algorithms and computational complexity:
• efficient algorithms for query processing in special cases
• lower bounds
• approximation

Papers:

1. M. Arenas, L. Bertossi, J. Chomicki, "Consistent Query Answers in Inconsistent Databases," ACM Symposium on Principles of Database Systems (PODS), Philadelphia, May 1999.

2. M. Arenas, L. Bertossi, J. Chomicki, "Specifying and Querying Database Repairs using Logic Programs with Exceptions," International Symposium on Flexible Query Answering Systems (FQAS), Warsaw, Poland, October 2000.

3. M. Arenas, L. Bertossi, J. Chomicki, "Scalar Aggregation in FD-Inconsistent Databases," International Conference on Database Theory (ICDT), London, UK, January 2001.
