Incremental Validation of XML Databases

Yannis Papakonstantinou

Victor Vianu

Computer Science & Eng, UCSD

Incremental Validation of XML Databases:

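[Figure: a system receives updates to an XML database of n nodes and revalidates it incrementally: in O(log n) per update against a Document Type Definition (DTD), and in O(log² n) per update against an XML Schema / XQuery type.]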

XML As Labeled Ordered Trees

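[Figure: a labeled ordered tree with root cars and children used and new. Each has car children, and each car has an optional year child and a model child. Leaf values shown: 92, Civic, 96, Acura, Civic, Maxima, 03.]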

Document Type Definitions (DTDs): Abstraction & Example

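root : cars
cars → used new
used → car*
new → car*
car → (year | ε) model

[Figure: the cars tree, with root cars, children used and new, and car nodes whose children are an optional year followed by a model. Leaf values shown: 92, Civic, 96, Acura, Civic, Maxima, 03.]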
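As a minimal sketch of what "satisfying the DTD" means for the example above, the following checks every node's sequence of child labels against the content model of its label. The tuple-based tree representation, the `label;` string encoding, and the names `node` and `satisfies` are assumptions made for illustration; text values such as 92 or Civic are left out.

```python
import re

# Content models from the slide. Hypothetical encoding: each child label is
# written as "<label>;" so ordinary regular expressions can be matched
# against the sequence of child labels.
DTD = {
    "cars": r"used;new;",
    "used": r"(car;)*",
    "new": r"(car;)*",
    "car": r"(year;)?model;",  # (year | ε) model
    "year": r"",               # no element children
    "model": r"",
}
ROOT = "cars"


def node(label, *children):
    """Build a labeled ordered tree node as (label, [children])."""
    return (label, list(children))


def satisfies(tree, dtd=DTD, root=ROOT):
    """Return True if the root label is right and every node's child word
    matches the content model of that node's label."""
    def ok(n):
        label, children = n
        if label not in dtd:
            return False
        word = "".join(child[0] + ";" for child in children)
        if re.fullmatch(dtd[label], word) is None:
            return False
        return all(ok(child) for child in children)

    return tree[0] == root and ok(tree)


example = node(
    "cars",
    node("used", node("car", node("year"), node("model"))),
    node("new", node("car", node("model"))),
)
print(satisfies(example))  # True
```

This revalidates the whole tree from scratch, which is the baseline that incremental validation aims to avoid.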

Tree Satisfying DTD, General Case

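[Figure: a node with children 1, 2, …, i-1, i, i+1, …, k-1, k. The sequence of child labels must match the regular expression r that the DTD assigns to the node's label, and the same condition holds at every child.]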

XML Schemas/XQuery Types as Specialized DTDs

root : carsT
carsT → usedT newT
usedT → carU*
newT → carN*
carU → yearT modelT
carN → (yearT | ε) modelT

LABEL TYPES
car {carU, carN}
cars {carsT}
used {usedT}
…

[Figure: the cars tree with each node annotated by its possible types: carsT at cars, usedT at used, newT at new, carU and/or carN at the car nodes, yearT at year, modelT at model.]
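The type rules above can be exercised with a small bottom-up computation of the set of types each node can take. This is a deliberately naive sketch: it tries every combination of child types, which is exponential and is not the algorithm of the talk. The `type;` string encoding and the names `TYPES`, `types`, and `is_valid` are assumptions for illustration.

```python
import re
from itertools import product

# type -> (label, content model over type names, each written "<type>;")
TYPES = {
    "carsT": ("cars", r"usedT;newT;"),
    "usedT": ("used", r"(carU;)*"),
    "newT": ("new", r"(carN;)*"),
    "carU": ("car", r"yearT;modelT;"),
    "carN": ("car", r"(yearT;)?modelT;"),  # (yearT | ε) modelT
    "yearT": ("year", r""),
    "modelT": ("model", r""),
}
ROOT_TYPE = "carsT"


def types(n):
    """Return the set of types that node n = (label, [children]) can take."""
    label, children = n
    child_types = [types(c) for c in children]
    result = set()
    for t, (lab, model) in TYPES.items():
        if lab != label:
            continue
        # Naive search: some choice of one type per child must match the model.
        for combo in product(*child_types):
            if re.fullmatch(model, "".join(x + ";" for x in combo)):
                result.add(t)
                break
    return result


def is_valid(tree):
    return ROOT_TYPE in types(tree)


used_car = ("car", [("year", []), ("model", [])])
new_car = ("car", [("model", [])])
print(types(used_car))  # contains both carU and carN
print(types(new_car))   # contains only carN
```

A car with both year and model can be typed as either carU or carN, while a car with only a model can only be carN, which is why a node may carry more than one candidate type.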

Tree Automata Specialized DTDs

[Figure: the cars tree with each node annotated by the set of types it may take: carsT at cars, usedT at used, newT at new, {carU, carN} or {carN} at the car nodes, and yearT and modelT at the year and model nodes.]

Incremental Validation Problem Statement

For each valid tree T, maintain an auxiliary structure A(T) so that, given a series of update commands, one can

• efficiently decide if the updated tree T' is valid

• efficiently update A(T) and T

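Types of Updates: Node Renaming u(v, σ)

[Figure: renaming changes the label of node v to σ. The siblings of v (children 1, 2, …, i-1, i+1, …, k of its parent) and the children of v (1, 2, …, k) keep their labels and order.]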

Types of Updates: Deletion d(v)

[Figure: deletion d(v) removes node v, the i-th of k children of its parent, leaving the siblings 1, 2, …, i-1, i+1, …, k-1, k.]

Types of Updates: Insertion

[Figure: insert_after(v_{i-1}, σ_i) adds a new node i with label σ_i immediately after sibling v_{i-1}, so that it sits between v_{i-1} and v_{i+1}.]

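Validating a Renaming u(i, σ) on a Regular String of N: Take One

[Figure: an automaton run over positions 1, 2, …, i-1, i, i+1, …, n-1, n. Pre(i-1) is computed forward from the start state q0 over positions 1 … i-1; Post(i+1) is computed backward from the final state qF over positions n … i+1.]

Validation of one update in O(1) given precomputed Pre and Post

u(i, σ) requires recomputation of Pre(i), Pre(i+1), … and of Post(i), Post(i-1), …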
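A minimal sketch of the "take one" idea, assuming Pre holds the states reachable from q0 on a prefix and Post holds the states from which a final state is reachable on a suffix. The toy automaton for (ab)*, the 0-based indexing, and all function names are assumptions made for illustration.

```python
# Toy NFA for (ab)*: state 0 is both start and final.
STATES = {0, 1}
DELTA = {(0, "a"): {1}, (1, "b"): {0}}
START, FINAL = 0, {0}


def step(states, sym):
    """States reachable from `states` by reading one symbol."""
    return {q2 for q in states for q2 in DELTA.get((q, sym), ())}


def pre_sets(word):
    """pre[i] = states reachable from START after reading word[:i]."""
    pre = [{START}]
    for sym in word:
        pre.append(step(pre[-1], sym))
    return pre


def post_sets(word):
    """post[i] = states from which a final state is reachable on word[i:]."""
    n = len(word)
    post = [set() for _ in range(n + 1)]
    post[n] = set(FINAL)
    for i in range(n - 1, -1, -1):
        post[i] = {q for q in STATES if step({q}, word[i]) & post[i + 1]}
    return post


def rename_is_valid(pre, post, i, sym):
    """Check a renaming of 0-based position i to sym without rescanning the
    word: the new symbol must connect pre[i] to post[i + 1]."""
    return bool(step(pre[i], sym) & post[i + 1])


word = "abab"
pre, post = pre_sets(word), post_sets(word)
print(rename_is_valid(pre, post, 1, "a"))  # False: "aaab" is not in (ab)*
print(rename_is_valid(pre, post, 1, "b"))  # True: the word is unchanged
```

The check itself does not depend on the word length, but after the renaming is applied every `pre` entry to the right and every `post` entry to the left of the position may be stale, which is the drawback the slide points out.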

Transition Relation Definition

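[Figure: a string with positions 1, 2, …, i, …, m, m+1, …, j, …, n-1, n; a run enters position i in state q and leaves position j in state q'.]

T_{i,j} = { (q, q') | reading positions i, i+1, …, j can take the automaton from state q to state q' }

T_{i,j} = T_{i,m} ∘ T_{m+1,j}   (i ≤ m < j)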
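The definition and the composition rule can be tried out directly. The sketch below represents a transition relation as a Python set of state pairs and computes T_{i,j} by splitting at a midpoint; the toy automaton for (ab)* and the function names are assumptions for illustration.

```python
# Toy NFA for (ab)*: state 0 is both start and final.
DELTA = {(0, "a"): {1}, (1, "b"): {0}}


def leaf_relation(sym):
    """T_{i,i} for a position holding `sym`: all (q, q') with q' in δ(q, sym)."""
    return {(q, q2) for (q, s), targets in DELTA.items() if s == sym for q2 in targets}


def compose(r1, r2):
    """Relational composition: (p, r) whenever (p, q) in r1 and (q, r) in r2."""
    return {(p, r) for (p, q) in r1 for (q2, r) in r2 if q == q2}


def relation(word, i, j):
    """T_{i,j} for 1-based, inclusive positions i..j, using
    T_{i,j} = T_{i,m} ∘ T_{m+1,j} with m the midpoint."""
    if i == j:
        return leaf_relation(word[i - 1])
    m = (i + j) // 2
    return compose(relation(word, i, m), relation(word, m + 1, j))


word = "abab"
print(relation(word, 1, 4))  # {(0, 0)}
print(relation(word, 2, 3))  # {(1, 1)}
```

Because composition is associative, any split point m gives the same T_{i,j}; the midpoint is chosen only to keep the recursion balanced.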

Transition Relation Trees

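[Figure: balanced binary tree of transition relations over positions 1 to 8; each node is the composition of its two children. Levels from root to leaves:]

T_{1,8}
T_{1,4}  T_{5,8}
T_{1,2}  T_{3,4}  T_{5,6}  T_{7,8}
T_{1,1}  T_{2,2}  T_{3,3}  T_{4,4}  T_{5,5}  T_{6,6}  T_{7,7}  T_{8,8}
1  2  3  4  5  6  7  8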

Maintenance of the Structure and Validation in O(log n)

[Figure: the transition relation tree over positions 1 to 8, with the path from the leaf of position 6 to the root highlighted.]

u(6, σ): recompute T_{6,6}, T_{5,6}, T_{5,8}, T_{1,8}

If (q0, qF) ∈ T_{1,8} then valid
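A sketch of such a structure as an array-backed balanced binary tree: a renaming rewrites one leaf and recomputes only the relations on the path to the root, then validity is read off the root relation. The class name, the padding with identity relations up to a power of two, and the toy automaton for (ab)* are assumptions for illustration; insertions and deletions are not handled here.

```python
# Toy NFA for (ab)*: state 0 is both start and final.
STATES = {0, 1}
DELTA = {(0, "a"): {1}, (1, "b"): {0}}
START, FINAL = 0, {0}
IDENTITY = frozenset((q, q) for q in STATES)  # neutral element for padding


def leaf_relation(sym):
    return {(q, q2) for (q, s), targets in DELTA.items() if s == sym for q2 in targets}


def compose(r1, r2):
    return {(p, r) for (p, q) in r1 for (q2, r) in r2 if q == q2}


class TransitionRelationTree:
    def __init__(self, word):
        self.size = 1
        while self.size < len(word):
            self.size *= 2
        # tree[1] is the root; tree[size + k] is the leaf for position k + 1.
        self.tree = [IDENTITY] * (2 * self.size)
        for k, sym in enumerate(word):
            self.tree[self.size + k] = leaf_relation(sym)
        for k in range(self.size - 1, 0, -1):
            self.tree[k] = compose(self.tree[2 * k], self.tree[2 * k + 1])

    def rename(self, i, sym):
        """Apply u(i, sym) for a 1-based position i; touches O(log n) nodes."""
        k = self.size + i - 1
        self.tree[k] = leaf_relation(sym)
        k //= 2
        while k >= 1:
            self.tree[k] = compose(self.tree[2 * k], self.tree[2 * k + 1])
            k //= 2

    def is_valid(self):
        """Valid iff (q0, qF) is in the root relation for some final qF."""
        return any((START, f) in self.tree[1] for f in FINAL)


t = TransitionRelationTree("abababab")
print(t.is_valid())  # True
t.rename(6, "a")     # word becomes "ababaaab"
print(t.is_valid())  # False
```

Each composition costs time that depends only on the number of automaton states, so one renaming costs O(log n) compositions in the length n of the string.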

Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions

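[Figure: a 2-3 tree over positions 1, 2, 3, 5, 6, 7, 9. The leaves hold T1, T2, T3, T5, T6, T7, T9; the internal nodes hold Ta, Tb, Tc.]

Ta = T1 ∘ T2

If (q0, qF) ∈ Ta ∘ Tb ∘ Tc then valid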

Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions

[Figure: position 8 is inserted between 7 and 9. The leaves now hold T1, T2, T3, T5, T6, T7, T8, T9; the internal nodes are still Ta, Tb, Tc.]

Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions

[Figure: position 4 is inserted between 3 and 5. Leaf groups shown under the internal nodes Ta, Tb, Tc: T1 T2; T3 T5 T6; T7 T8 T9.]

Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions

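[Figure: T4 joins its siblings, giving a group T3 T4 T5 T6 of four leaves, one more than a 2-3 tree node may hold. The other groups are T1 T2 and T7 T8 T9; the internal nodes are Ta, Tb, Tc.]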

Transition B-Trees (2-3 Trees) for O(log n) Insertions and Deletions

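[Figure: the overfull group T3 T4 T5 T6 is split, so the internal level becomes Ta, Td, Te, Tc over the leaves T1 T2, T3 T4 T5 T6, T7 T8 T9. A new level with nodes Tf and Tg appears above it.]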

Auxiliary Structures for Incremental DTD Validation

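[Figure: a node with content model r and children 1, 2, …, i-1, i, i+1, …, k-1, k. The renaming u(v_i, σ) affects two checks: the parent's child sequence against r, and the child sequence 1, 2, …, k of v_i against the content model of σ.]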

Specialized DTD Incremental Validation: Take One

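[Figure: a node with children labeled a1, …, a(i-1), ai, a(i+1), …, ak, where child v_i has children b1, …, b(k-1), bk. Each node stores its set of possible types, e.g. types(v_i) = {τ_{i,1}, …, τ_{i,n}}. After u(v_i, σ), types(v_i) is recomputed, and then types() of each ancestor in turn, up to the root.]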

Inefficient for Deep Trees: Apply Divide-and-Conquer in Vertical Direction

Turn the specialized DTD into an NFA that validates a vertical line

"Fuse" vertical and horizontal directions using a binary tree and split work in both

Tree Satisfying Specialized DTD transformed into Binary Tree Accepted By Tree Automaton

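[Figure: an unranked tree with nodes a, b, c, d, e, f, g, h, i, j, k (left) and its encoding as a binary tree over the same nodes (right), in which missing children are filled with # leaves.]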
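The slide does not spell out the encoding. One standard choice that is consistent with the figure (11 labeled nodes and 12 # leaves) is the first-child / next-sibling encoding, sketched here under that assumption. The tuple representations and the names `encode` and `count_pads` are hypothetical.

```python
PAD = "#"  # padding leaf standing for "no first child" or "no next sibling"


def encode(forest):
    """Encode an ordered list of unranked trees, each (label, [children]),
    as a binary tree (label, left, right): left encodes the node's children,
    right encodes the node's following siblings."""
    if not forest:
        return PAD
    (label, children), rest = forest[0], forest[1:]
    return (label, encode(children), encode(rest))


def count_pads(binary):
    """Count the # leaves in an encoded tree."""
    if binary == PAD:
        return 1
    _, left, right = binary
    return count_pads(left) + count_pads(right)


tree = ("a", [("b", []), ("c", [])])
print(encode([tree]))
# ('a', ('b', '#', ('c', '#', '#')), '#')
```

Under this encoding a tree with n labeled nodes always yields n + 1 padding leaves, and every labeled node has exactly two children, which is what lets a ranked (binary) tree automaton run on it.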

Designate Lines in Binary Trees

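[Figure: the rules for designating lines, given as subtree-size comparisons of the forms Size(·) > 2 Size(·) and Size(·) > 4 Size(·); the subtrees being compared appear only as drawings.]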

Example Line Structure

[Figure: the binary encoding of the example tree with nodes a to k and # leaves (left), and the same tree redrawn with its nodes grouped into lines (right).]

From Tree Automaton to Validating Lines with NFA

[Figure: the lines of the example tree over the nodes a, b, c, d, e, f, g, h, i, j, k, before any annotation.]

From Tree Automaton to Validating Lines with NFA

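[Figure: the same lines, where a node that has a child line hanging off it is paired with the transition relation of that line: (b, Tc), (d, Tj), (f, Tg). Each line can then be validated as a string by an NFA.]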

Incremental Validation of the Line Structure in O(log² |T|)

[Figure: the line structure with annotated nodes (b, Tc), (d, Tj), (f, Tg) after a new node m is inserted after k.]

Insert m after k

Number of updated lines < 1 + log |T|

Cost of a line update: O(log |T|)
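The O(log² |T|) bound in the slide title follows from multiplying the two quantities above. Spelled out as a short derivation, where the constant c is introduced here only for illustration:

```latex
\text{cost of one update}
  \;\le\; \underbrace{(1 + \log |T|)}_{\text{number of updated lines}}
  \cdot \underbrace{c \,\log |T|}_{\text{cost per line update}}
  \;=\; O(\log^2 |T|)
```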

Validating Insertions and Deletions: the Non-Line-Preserving Case

[Figure: an insertion that changes the line structure.]

Key Complexity Results

Given m updates on a tree of size n, incrementally validate:

• DTDs in O(m log n). Given alphabet Σ and maximum regular expression size d: O(m |Σ| d² log d log n). Data structure of size O(d² n).

• Specialized DTDs in O(m log² n). Given set of types Σ': O(m |Σ'|² d² (log d + log |Σ'|) log² n). Data structure of size O(|Σ'|² d² log² n).

• Lower complexity for 1-unambiguous regular expressions.

Ongoing and Future Work (with Andrey Balmin)

• Incorporate transition relation trees in the B-tree structure

• Exploit "locality"

  - Experimental evaluation on a set of 65 DTDs: in 96% of type definitions an update may only affect transition relations of length < 4

  - Common case much more efficient than worst case

  - Detect the property and employ algorithms that do not build transition relation trees (trt's) in such cases

• Optimization over multiple updates

• More complex updates & edit operations
