XQuery Implementation in a Relational Database System

Shankar Pal, Istvan Cseri, Oliver Seeliger, Michael Rys, Gideon Schaller, Wei Yu, Dragan Tomic, Adrian Baras, Brandon Berg, Denis Churin, Eugene Kogan

SQL Server, Microsoft Corp.


VLDB 2005 - Sep 1 S. Pal et al. 2

Overview
- Background
  - XML support in SQL Server 2005
  - OrdPath labeling of XML nodes
  - XML indexes: PATH, VALUE, PROPERTY
- Main topic: XQuery compilation
  - Architecture
  - XML operators
  - Mapping XML operators to relational+ operators
- Conclusions


Background: XML Support in SQL Server 2005

    CREATE TABLE DOCS (ID int PRIMARY KEY, XDOC xml)

- XML is stored in an internal, binary form ("blob")
- Optionally typed by a collection of XML schemas, which are used for storage and query optimizations
- 3 of the 5 methods on the XML data type:
  - query(): returns an XML type
  - value(): returns a scalar value
  - exist(): checks conditions on XML nodes
- XML indexing
- More information at http://msdn.microsoft.com/xml


Background: XQuery Embedded in SQL

Retrieve the section titles from <BOOK>, wrapped in new <topic> elements:

    SELECT ID, XDOC.query('
        for $s in /BOOK/SECTION
        return <topic>{ data($s/TITLE) }</topic>
    ')
    FROM DOCS
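Python's standard-library ElementTree can approximate the query above outside SQL Server, which makes the iteration and construction steps concrete. This is a hypothetical sketch and does not show how query() is actually evaluated. The function name `topics` is illustrative, the document is treated as untyped, and data() is approximated by the text of each TITLE child.

```python
import xml.etree.ElementTree as ET


def topics(xdoc: str) -> list:
    """Rough analog of: for $s in /BOOK/SECTION return <topic>{data($s/TITLE)}</topic>"""
    root = ET.fromstring(xdoc)
    if root.tag != "BOOK":                 # /BOOK must match the document element
        return []
    result = []
    for s in root.findall("SECTION"):      # $s is bound to each /BOOK/SECTION in turn
        topic = ET.Element("topic")        # new element constructor
        # data($s/TITLE) on untyped XML: the text of each TITLE child,
        # with adjacent atomic values separated by a single space
        topic.text = " ".join(t.text or "" for t in s.findall("TITLE"))
        result.append(ET.tostring(topic, encoding="unicode"))
    return result


doc = ("<BOOK><SECTION><TITLE>Bad Bugs</TITLE></SECTION>"
       "<SECTION><TITLE>Tree frogs</TITLE></SECTION></BOOK>")
print(topics(doc))  # ['<topic>Bad Bugs</topic>', '<topic>Tree frogs</topic>']
```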


Background: Supported XQuery Features
- XQuery clauses "for", "where", "return" and "order by"
- XPath axes: child, descendant, parent, attribute, self and descendant-or-self
- Functions: numeric, string, Boolean, nodes, context, sequences, aggregate, constructor, data accessor
- SQL Server extension functions to access SQL variable and column data within XQuery
- Numeric operators (+, -, *, div, mod)
- Value comparison operators (eq, ne, lt, gt, le, ge)
- General comparison operators (=, !=, <, >, <=, >=)
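The two comparison families behave differently on sequences. The minimal sketch below shows that difference. Python lists stand in for XQuery sequences, the function names are made up, and atomization and the casting of untyped values are omitted. A general comparison is existentially quantified over both operands. A value comparison accepts only empty or singleton operands.

```python
import operator


def general_compare(left, right, op):
    """General comparison (=, !=, <, ...): true if ANY pair of items satisfies op."""
    return any(op(a, b) for a in left for b in right)


def value_compare(left, right, op):
    """Value comparison (eq, ne, lt, ...): operands must be empty or singletons."""
    if len(left) > 1 or len(right) > 1:
        raise TypeError("value comparison needs singleton operands")
    if not left or not right:
        return []                      # an empty operand yields the empty sequence
    return op(left[0], right[0])


prices = [5, 12, 30]
print(general_compare(prices, [12], operator.eq))   # True: some item equals 12
print(value_compare([12], [12], operator.eq))       # True
```

Passing `prices` to `value_compare` raises an error. This matches the "non-singleton in eq" static error listed later in the talk.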


Background: ORDPATH Labels of Nodes [SIGMOD04]

[Figure: ORDPATH-labeled tree. BOOK (1) has attribute @ISBN (1.1) and children Section (1.3) and Section (1.5); Section 1.3 contains Title (1.3.1) and Figure (1.3.3); Section 1.5 contains Title (1.5.1) and Figure (1.5.3).]

- node1 precedes node2 in document order: ORDPATH(node1) < ORDPATH(node2)
- node1 is an ancestor of node2: ORDPATH(node1) is a prefix of ORDPATH(node2)
- Node 1.3 and its descendants satisfy ORDPATH(1.3) ≤ id < Descendant_Limit(1.3) = 1.4
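The sketch below models labels as tuples of integers, so Python's tuple ordering plays the role of ORDPATH comparison. This is a simplification. Real ORDPATH labels are compressed bit strings compared byte by byte. The sketch also assumes only odd components, with no "careted" insertions. All function names are hypothetical.

```python
def parse(label: str) -> tuple:
    """'1.3.1' -> (1, 3, 1)"""
    return tuple(int(part) for part in label.split("."))


def precedes(a: tuple, b: tuple) -> bool:
    # Document order is the lexicographic order of the label components
    return a < b


def is_ancestor(a: tuple, b: tuple) -> bool:
    # a is a proper ancestor of b iff a is a proper prefix of b
    return len(a) < len(b) and b[:len(a)] == a


def descendant_limit(a: tuple) -> tuple:
    # Increment the last component: 1.3 -> 1.4, which sorts after every 1.3.x
    return a[:-1] + (a[-1] + 1,)


def in_subtree(a: tuple, x: tuple) -> bool:
    # ORDPATH(a) <= x < Descendant_Limit(a): a itself plus all its descendants
    return a <= x < descendant_limit(a)


labels = [parse(s) for s in ["1.5.3", "1", "1.3.1", "1.1", "1.5", "1.3", "1.3.3", "1.5.1"]]
print(sorted(labels))                                          # document order
print([x for x in sorted(labels) if in_subtree((1, 3), x)])    # [(1, 3), (1, 3, 1), (1, 3, 3)]
```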


Background: Indexing an XML Column [VLDB 2004]

Primary XML index on an XML column:
- Creates a B+ tree on the data model content of the XML nodes
- Adds a column Path_ID holding the reversed, encoded path from each XML node to the root of the XML tree
- Uses the OrdPath labeling scheme for XML nodes, which captures both the relative order of nodes and the document hierarchy


Background: XML Example

    INSERT INTO myTable VALUES (7, '<Book xmlns="myns" ISBN="1-55860-3612">
        <Section><Title>Bad Bugs</Title></Section>
        <Section>
            <Title> Tree frogs </Title><Figure>…</Figure>
        </Section>
    </Book>')


Background: Primary XML Index Entries

ID  ORDPATH  TAG          NODETYPE       VALUE         PATH_ID
7   1        1 (Book)     10 (ns:bT)     NULL          #1
7   1.1      2 (ISBN)     2 (xs:string)  '1-55860-…'   #2#1
7   1.3      3 (Section)  11 (ns:sT)     NULL          #3#1
7   1.3.1    4 (Title)    2 (xs:string)  'Bad Bugs'    #4#3#1
7   1.3.3    5 (Figure)   12 (ns:fT)     NULL          #5#3#1
7   1.5      3 (Section)  11 (ns:sT)     NULL          #3#1
7   1.5.1    4 (Title)    2 (xs:string)  'Tree frogs'  #4#3#1
7   1.5.3    5 (Figure)   12 (ns:fT)     NULL          #5#3#1

- Clustering key: ID, ORDPATH
- Encodings of tags and types are stored in system metadata
- Additional details are not shown
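The following is a rough sketch of how rows like those above could be derived from a document. A pre-order walk assigns odd ORDPATH components, with attributes labelled first, which reproduces the 1.1 label of ISBN. PATH_ID is the reversed list of tag codes. Several parts are assumptions:

- The tag-code dictionary stands in for the system metadata.
- The function names are hypothetical.
- NODETYPE is omitted.
- The sample XML drops the namespace declaration so that tags stay unqualified.

```python
import xml.etree.ElementTree as ET


def path_id(tag_codes: list) -> str:
    # Reversed, encoded path: the node's own tag code first, the root's last
    return "".join(f"#{code}" for code in tag_codes)


def shred(doc_id: int, xml_text: str, tag_ids: dict) -> list:
    """Return (ID, ORDPATH, TAG, VALUE, PATH_ID) rows in document order."""
    rows = []

    def visit(elem, ordpath, ancestors):
        codes = [tag_ids[elem.tag]] + ancestors
        # Elements with element children get VALUE NULL; leaves keep their trimmed text
        value = None if len(elem) else (elem.text or "").strip() or None
        rows.append((doc_id, ordpath, elem.tag, value, path_id(codes)))
        child = 1                                   # odd components: 1, 3, 5, ...
        for name, val in elem.attrib.items():       # attributes are labelled first
            attr = "@" + name
            rows.append((doc_id, ordpath + (child,), attr, val,
                         path_id([tag_ids[attr]] + codes)))
            child += 2
        for sub in elem:                            # then child elements, in order
            visit(sub, ordpath + (child,), codes)
            child += 2

    visit(ET.fromstring(xml_text), (1,), [])
    return rows


# Hypothetical tag codes, matching the numbers in the TAG column above
TAGS = {"Book": 1, "@ISBN": 2, "Section": 3, "Title": 4, "Figure": 5}
xml_text = ('<Book ISBN="1-55860-3612"><Section><Title>Bad Bugs</Title><Figure/></Section>'
            '<Section><Title>Tree frogs</Title><Figure/></Section></Book>')
for row in shred(7, xml_text, TAGS):
    print(row)
```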


Background: Secondary XML Indexes
- Speed up different classes of commonly occurring queries
- Statistics are created on the key columns of the primary and secondary XML indexes and are used for cost-based selection of secondary XML indexes

Index     Speeds up             Key columns
PATH      path-based queries    PATH_ID, VALUE, ID, ORDPATH
VALUE     value-based queries   VALUE, PATH_ID, ID, ORDPATH
PROPERTY  object properties     ID, PATH_ID, VALUE, ORDPATH
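Each secondary index holds the same set of entries sorted on a different key order, so each supports a different seek. The sketch below models an index as a sorted list of key tuples and a seek as a binary search on a key prefix. It is an in-memory illustration with hypothetical names (`build`, `seek`, `nullable`) and does not model SQL Server's B+ tree.

```python
from bisect import bisect_left

# Primary XML index rows (ID, ORDPATH, VALUE, PATH_ID); None stands for NULL
ROWS = [
    (7, (1,), None, "#1"),
    (7, (1, 1), "1-55860-3612", "#2#1"),
    (7, (1, 3), None, "#3#1"),
    (7, (1, 3, 1), "Bad Bugs", "#4#3#1"),
    (7, (1, 3, 3), None, "#5#3#1"),
    (7, (1, 5), None, "#3#1"),
    (7, (1, 5, 1), "Tree frogs", "#4#3#1"),
    (7, (1, 5, 3), None, "#5#3#1"),
]


def nullable(v):
    # Make NULL sort before every string so mixed keys stay comparable
    return (0, "") if v is None else (1, v)


# Key column order of each secondary XML index, as listed on the slide
KEYS = {
    "PATH": lambda r: (r[3], nullable(r[2]), r[0], r[1]),
    "VALUE": lambda r: (nullable(r[2]), r[3], r[0], r[1]),
    "PROPERTY": lambda r: (r[0], r[3], nullable(r[2]), r[1]),
}


def build(index: str) -> list:
    # A sorted list of key tuples stands in for the B+ tree
    return sorted(KEYS[index](r) for r in ROWS)


def seek(keys: list, prefix: tuple) -> list:
    """Return all keys whose leading columns equal prefix (an index seek)."""
    i = bisect_left(keys, prefix)       # a shorter tuple sorts before its extensions
    hits = []
    while i < len(keys) and keys[i][:len(prefix)] == prefix:
        hits.append(keys[i])
        i += 1
    return hits


print([k[3] for k in seek(build("PATH"), ("#3#1",))])   # ORDPATHs of /Book/Section
```

The PATH index answers "which nodes lie on this path". The VALUE index answers "which nodes have this value". The PROPERTY index answers "what does this document hold on this path". Each question becomes a prefix of its index's key order.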


Background: Handling Types
- If the XML column is typed, values are stored in the XML blob and in the XML indexes with the appropriate types
- Untyped XML: values are stored as strings and converted to the appropriate types for operations
- SQL-typed values are stored in the primary XML index:
  - Most SQL types are compatible with XQuery types (e.g. integer), so value comparisons on XML index columns suffice
  - Some types (e.g. xs:dateTime) are stored in an internal format and processed specially


XQuery Processing Architecture

XQuery expression
  → XQuery Compiler
  → XML algebra tree (XmlOp operators)
  → XML Operator Mapper
  → Relational operator tree (relational+ operators)
  → Relational Query Processor

XQuery Compiler:
- Parses the XQuery expression
- Checks static type correctness and adds type annotations
- Applies static optimizations: path collapsing, rewrites using XML schemas

XML Operator Mapper:
- Recursively traverses the XML algebra tree
- Converts each XmlOp into a relational+ operator sub-tree
- The mapping depends on whether a primary XML index exists
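The mapper's recursive structure can be sketched as a function over a toy XmlOp tree that returns the rel+ plan as text. The sketch makes several simplifications, and the data class and function names are hypothetical:

- Only XmlOp_Path and XmlOp_Apply are modelled.
- Operator spellings follow the later slides.
- Subtree assembly is omitted.

The point is that the same XmlOp_Path maps to XML_Reader when there is no primary XML index, and to a PATH_ID selection over the index when one exists.

```python
from dataclasses import dataclass, field


@dataclass
class XmlOp:
    kind: str                                  # "Path" or "Apply" in this sketch
    arg: str = ""                              # e.g. a simple path for "Path"
    children: list = field(default_factory=list)


def reversed_path(path: str) -> str:
    # "/BOOK/SECTION" -> "#SECTION#BOOK", the form stored in PATH_ID
    return "".join("#" + step for step in reversed(path.strip("/").split("/")))


def map_op(op: XmlOp, has_pxi: bool) -> str:
    """Recursively map one XmlOp (and its inputs) to a rel+ sub-tree, written as text."""
    if op.kind == "Path":
        if has_pxi:                            # rows from the primary XML index
            return f"Select(GET(PXI), Path_ID = {reversed_path(op.arg)})"
        return f'XML_Reader(XDOC, "{op.arg}")'  # no index: parse the XML blob
    if op.kind == "Apply":
        inputs = ", ".join(map_op(c, has_pxi) for c in op.children)
        return f"Apply({inputs})"
    raise NotImplementedError(f"no mapping for XmlOp_{op.kind} in this sketch")


def compile_plan(op: XmlOp, has_pxi: bool) -> str:
    # Results are turned back into XML at the root of the plan
    return f"XML_Serialize({map_op(op, has_pxi)})"


query = XmlOp("Path", "/BOOK/SECTION")
print(compile_plan(query, has_pxi=False))  # XML_Serialize(XML_Reader(XDOC, "/BOOK/SECTION"))
print(compile_plan(query, has_pxi=True))   # XML_Serialize(Select(GET(PXI), Path_ID = #SECTION#BOOK))
```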


Examples of XML Operators
- XmlOp_Select. In: list of items, condition. Out: items satisfying the condition
- XmlOp_Path. In: simple paths, no predicates. Optional: path context, used to collapse paths. Out: eligible XML nodes
- XmlOp_Apply. In: two item lists. Out: one item list. Used for variable binding in a "for" expression
- XmlOp_Construct. In: sub-nodes for element construction, otherwise a value. Out: constructed node


XML Operator Mapping: Overview

[Figure: an XQUERY on the XML column (table keyed by PK) is mapped to a REL+ tree over the Primary XML Index (PK, OrdPath) and the PATH, VALUE and PROPERTY indexes. Special handling for SELECT * | XDOC.]


New Operators
- Some produce N rows from M (≠ N) rows:
  - XML_Reader: streaming, pull-model XML parser
  - XML_Serializer: serializes the query result as XML
- Some are for efficiency:
  - Contains: evaluates the XQuery contains() function
  - TextAdd: evaluates the XQuery string() function
  - Data: evaluates the XQuery data() function
- Some are for specific needs:
  - Check: validates XML during insertion or modification
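XML_Serialize turns M node rows into one XML value. Below is a minimal sketch of that reassembly. It assumes rows of (ORDPATH, TAG, VALUE) with odd-only ORDPATH components and marks attribute rows with a leading "@". Both conventions and the function name are assumptions of this sketch.

```python
import xml.etree.ElementTree as ET


def xml_serialize(rows) -> str:
    """Reassemble (ORDPATH, TAG, VALUE) rows into XML text.

    Assumes every ORDPATH component is odd, so a node's parent label is its own
    label without the last component. Rows whose parent is absent start a new
    top-level item, so a sequence of subtrees serializes as a sequence.
    """
    nodes, roots = {}, []
    for ordpath, tag, value in sorted(rows):        # ORDPATH order = document order
        parent = nodes.get(ordpath[:-1])
        if tag.startswith("@"):                      # attribute row: attach to its element
            parent.set(tag[1:], value)
            continue
        elem = ET.Element(tag) if parent is None else ET.SubElement(parent, tag)
        elem.text = value                            # None leaves the element empty
        nodes[ordpath] = elem
        if parent is None:
            roots.append(elem)
    return "".join(ET.tostring(r, encoding="unicode") for r in roots)


rows = [
    ((1, 3, 1), "Title", "Bad Bugs"), ((1, 3), "Section", None),
    ((1, 5), "Section", None), ((1, 5, 1), "Title", "Tree frogs"),
    ((1,), "Book", None), ((1, 1), "@ISBN", "1-55860-3612"),
]
print(xml_serialize(rows))
```

Because the rows are sorted by ORDPATH first, the input can arrive in any order. When the input is already in ORDPATH order, that sort could be skipped.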


XML Operator Mapping

The mapping falls into the following categories:
- Mapping of XPath expressions
- Mapping of XQuery expressions
- Mapping of XQuery built-in functions


Non-indexed XML, Full Path
- XML_Reader produces the subtrees of <SECTION> as node table rows, which contain OrdPath but no PK or PATH_ID
- XML_Serialize reassembles those rows into the XML data type to output the result

XML operator tree:

    XmlOp_Path PATH = "/BOOK/SECTION"

Rel+ operator tree:

    XML_Serialize
      XML_Reader (XDOC, "/BOOK/SECTION")
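ElementTree's iterparse reports start and end events while parsing, which approximates XML_Reader's streaming behaviour. The sketch below is a stand-in built on several assumptions and is not the SQL Server operator:

- It handles only simple absolute child paths.
- It labels nodes with odd ORDPATH components and reserves slots for attributes without emitting them.
- It emits each row when its element closes, so rows arrive children-first; sorting by ORDPATH restores document order.

```python
import io
import xml.etree.ElementTree as ET


def xml_reader(xdoc: str, path: str):
    """Stream (ORDPATH, TAG, VALUE) rows for the subtrees matched by a simple
    absolute child path such as "/BOOK/SECTION" (no predicates, no //)."""
    target = path.strip("/").split("/")
    stack = []            # open elements: [ordpath, next child component, tag]
    match_depth = None    # stack depth of the matched subtree root, while inside one
    source = io.BytesIO(xdoc.encode("utf-8"))
    for event, elem in ET.iterparse(source, events=("start", "end")):
        if event == "start":
            if stack:
                parent = stack[-1]
                ordpath = parent[0] + (parent[1],)
                parent[1] += 2                    # odd components: 1, 3, 5, ...
            else:
                ordpath = (1,)
            # Reserve the first odd slots for attributes, which are labelled first
            stack.append([ordpath, 1 + 2 * len(elem.attrib), elem.tag])
            if match_depth is None and [e[2] for e in stack] == target:
                match_depth = len(stack)
        else:                                     # "end": the element is fully parsed
            ordpath, _, tag = stack.pop()
            if match_depth is not None:
                value = None if len(elem) else (elem.text or "").strip() or None
                yield ordpath, tag, value
                if len(stack) < match_depth:
                    match_depth = None            # the matched subtree root just closed
            elem.clear()                          # release parsed content as we go


doc = ('<BOOK ISBN="x"><SECTION><TITLE>Bad Bugs</TITLE></SECTION><NOTE/>'
       '<SECTION><TITLE>Tree frogs</TITLE><FIGURE/></SECTION></BOOK>')
for row in sorted(xml_reader(doc, "/BOOK/SECTION")):
    print(row)
```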


Sample Query Execution Using the Primary XML Index

ID  ORDPATH  TAG          NODETYPE       VALUE         PATH_ID
7   1        1 (Book)     10 (ns:bT)     NULL          #1
7   1.1      2 (ISBN)     2 (xs:string)  '1-55860-…'   #2#1
7   1.3      3 (Section)  11 (ns:sT)     NULL          #3#1
7   1.3.1    4 (Title)    2 (xs:string)  'Bad Bugs'    #4#3#1
7   1.3.3    5 (Figure)   12 (ns:fT)     NULL          #5#3#1
7   1.5      3 (Section)  11 (ns:sT)     NULL          #3#1
7   1.5.1    4 (Title)    2 (xs:string)  'Tree frogs'  #4#3#1
7   1.5.3    5 (Figure)   12 (ns:fT)     NULL          #5#3#1

- Clustering key: ID, ORDPATH
- /Book/Section is mapped to PATH_ID #3#1 by the XML Op Mapper, which selects the rows with ORDPATH 1.3 and 1.5


Indexed XML, Full Path
- XmlOp_Path is mapped to a SELECT over GET(PXI), i.e. rows from the primary XML index, matching PATH_ID
- Not shown: JOIN with the base table on PK

Rel+ operator tree (DL = Descendant_Limit):

    XML_Serialize
      Apply
        Select ($b) [Path_ID = #SECTION#BOOK]
          GET(PXI)
        Select [$b.OrdP ≤ OrdP < DL($b)]  (assemble subtree)
          GET(PXI)
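The plan above can be simulated on an in-memory copy of the primary XML index. First select the $b rows by PATH_ID. Then fetch each subtree with a range scan from ORDPATH($b) up to DL($b) on the ORDPATH-sorted rows. This sketch covers a single document, so the ID part of the clustering key is dropped. PATH_ID uses plain tag names, and all names are hypothetical.

```python
from bisect import bisect_left

# Primary XML index for one document, clustered on ORDPATH: (ORDPATH, TAG, PATH_ID)
PXI = [
    ((1,), "BOOK", "#BOOK"),
    ((1, 1), "@ISBN", "#@ISBN#BOOK"),
    ((1, 3), "SECTION", "#SECTION#BOOK"),
    ((1, 3, 1), "TITLE", "#TITLE#SECTION#BOOK"),
    ((1, 3, 3), "FIGURE", "#FIGURE#SECTION#BOOK"),
    ((1, 5), "SECTION", "#SECTION#BOOK"),
    ((1, 5, 1), "TITLE", "#TITLE#SECTION#BOOK"),
    ((1, 5, 3), "FIGURE", "#FIGURE#SECTION#BOOK"),
]
ORDPATHS = [row[0] for row in PXI]     # the clustering key, in index order


def descendant_limit(op: tuple) -> tuple:
    # DL: bump the last component, e.g. 1.3 -> 1.4
    return op[:-1] + (op[-1] + 1,)


def full_path(path_id: str) -> list:
    """Apply: for each $b with the given PATH_ID, range-scan its subtree."""
    subtrees = []
    for b in (row for row in PXI if row[2] == path_id):      # Select ($b)
        lo = bisect_left(ORDPATHS, b[0])                      # $b.OrdP <= OrdP
        hi = bisect_left(ORDPATHS, descendant_limit(b[0]))    # OrdP < DL($b)
        subtrees.append(PXI[lo:hi])                           # rows to assemble
    return subtrees


for subtree in full_path("#SECTION#BOOK"):
    print([row[1] for row in subtree])   # ['SECTION', 'TITLE', 'FIGURE'], printed twice
```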


XML Index: PATH

PATH_ID   VALUE         ID  ORDPATH
#1        NULL          7   1
#2#1      '1-55860-…'   7   1.1
#3#1      NULL          7   1.3
#3#1      NULL          7   1.5
#4#3#1    'Bad Bugs'    7   1.3.1
#4#3#1    'Tree frogs'  7   1.5.1
#5#3#1    NULL          7   1.3.3
#5#3#1    NULL          7   1.5.3

- Speeds up path evaluations, e.g. /Book/Section → #3#1


Indexed XML, Imprecise Paths
- /BOOK/SECTION//TITLE is matched using the LIKE operator on Path_ID

Rel+ operator tree:

    XML_Serialize
      Apply
        Select ($s) [Path_ID LIKE #TITLE%#SECTION#BOOK]
          GET(PXI)
        (assemble subtree of <TITLE>)
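The sketch below turns a path containing // into the LIKE pattern shown above and includes a small LIKE evaluator built on Python's `re`. The function names are hypothetical, and the sketch uses textual tag names. With such names, `#TITLE%` would also accept a tag that merely starts with TITLE, so an encoding would need to keep tag boundaries unambiguous.

```python
import re


def path_to_like(path: str) -> str:
    """Turn a simple path using / and // into a LIKE pattern on the reversed PATH_ID."""
    steps, descendant = [], False
    for part in path.split("/")[1:]:
        if part == "":                    # empty step: this came from "//"
            descendant = True
            continue
        steps.append((part, descendant))
        descendant = False
    pattern = ""
    for name, is_descendant in reversed(steps):     # reversed: leaf first, root last
        pattern += "#" + name
        if is_descendant:
            pattern += "%"                 # any chain of intermediate ancestors
    return pattern


def like(value: str, pattern: str) -> bool:
    """SQL LIKE with the % and _ wildcards (no ESCAPE clause)."""
    regex = "".join(".*" if c == "%" else "." if c == "_" else re.escape(c) for c in pattern)
    return re.fullmatch(regex, value, flags=re.DOTALL) is not None


pattern = path_to_like("/BOOK/SECTION//TITLE")
print(pattern)                                          # #TITLE%#SECTION#BOOK
print(like("#TITLE#PARA#SECTION#BOOK", pattern))        # True
```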


Predicate Evaluation: /BOOK[@ISBN = "12"]
- The search value is compared with the VALUE column in the PXI
- The collapsed path /BOOK/@ISBN induces index seeks and reduces the intermediate result size
- The parent check Par($b) is done using OrdPath
- Value conversion might be needed

Rel+ operator tree:

    XML_Serialize
      Apply
        Apply
          Select ($b) [Path_ID = #BOOK]
            GET(PXI)
          Select [Path_ID = #@ISBN#BOOK & VALUE = "12" & Par($b)]
            GET(PXI)
        (assemble subtree of <BOOK>)
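The predicate plan can be sketched over index rows from several documents. The parent check compares ORDPATHs and assumes odd-only components. ORDPATHs restart in every document, so this sketch also compares the document ID; that extra check is an assumption of the sketch. Values are compared as untyped strings, so no conversion is shown. All names are hypothetical.

```python
# Primary XML index rows from several documents: (ID, ORDPATH, VALUE, PATH_ID)
PXI = [
    (7, (1,), None, "#BOOK"),
    (7, (1, 1), "12", "#@ISBN#BOOK"),
    (8, (1,), None, "#BOOK"),
    (8, (1, 1), "34", "#@ISBN#BOOK"),
    (9, (1,), None, "#BOOK"),
]


def par(b, r) -> bool:
    """Par($b): r is a child (or attribute) of $b. Same document, and $b's ORDPATH
    is r's ORDPATH minus the last component (all components assumed odd)."""
    return r[0] == b[0] and r[1][:-1] == b[1]


def books_with_isbn(isbn: str) -> list:
    result = []
    for b in (r for r in PXI if r[3] == "#BOOK"):                 # Select ($b)
        # Collapsed path /BOOK/@ISBN: PATH_ID and VALUE checked together, plus Par($b)
        if any(r[3] == "#@ISBN#BOOK" and r[2] == isbn and par(b, r) for r in PXI):
            result.append(b)
    return result


print([b[0] for b in books_with_isbn("12")])   # [7]
```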


Ordinal Predicate: /BOOK[n]
- Adds a ranking column to the rows for <BOOK> elements and retrieves the nth <BOOK> node
- Special optimizations: [1] becomes TOP 1 ascending; [last()] becomes TOP 1 descending
- Avoids sorting when the input is already sorted, e.g. in XML_Serializer
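The ordinal optimizations can be sketched with Python built-ins:

- The general case ranks rows by ORDPATH.
- [1] and [last()] need only a single min or max pass.
- When the input already arrives in ORDPATH order, the function simply skips rows.

Function names and the sample rows are hypothetical.

```python
from itertools import islice

# <BOOK> rows as (ORDPATH, name), deliberately not in document order
books = [((1, 5), "B3"), ((1, 1), "B1"), ((1, 3), "B2")]


def nth(rows, n: int):
    """/BOOK[n]: rank rows by ORDPATH (1-based) and return the nth, or None."""
    ranked = sorted(rows, key=lambda r: r[0])
    return ranked[n - 1] if 1 <= n <= len(ranked) else None


def first(rows):
    # [1] as TOP 1 ascending: one pass, no full sort
    return min(rows, key=lambda r: r[0], default=None)


def last(rows):
    # [last()] as TOP 1 descending
    return max(rows, key=lambda r: r[0], default=None)


def nth_presorted(rows_in_order, n: int):
    # Input already in ORDPATH order (e.g. from an index scan): skip n - 1 rows
    return next(islice(rows_in_order, n - 1, None), None) if n >= 1 else None


print(nth(books, 2), first(books), last(books))
```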


Error Handling
- Static type errors are raised at compilation time if an expression could fail at runtime due to a type-safety violation, e.g.:
  - addition of a string to an integer
  - querying a non-existent node name in typed XML
  - a non-singleton operand in "eq"
- Some of these can be fixed using an explicit cast or an ordinal specification
- Dynamic errors are converted to the empty sequence, which yields the correct result in predicates without negations
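The Python sketch below gives one reading of the last point. If a failed cast becomes the empty sequence, a general comparison on it is false, so the node is correctly filtered out. Under a negation, that same false turns into true and the node slips through. The names are hypothetical, and this illustrates the semantics only; it is not SQL Server code.

```python
def cast_int(text: str) -> list:
    """Cast to an integer; a dynamic error becomes the empty sequence."""
    try:
        return [int(text)]
    except ValueError:
        return []


def ge(seq: list, n: int) -> bool:
    # General comparison: true if some item qualifies; an empty sequence gives false
    return any(x >= n for x in seq)


values = ["5", "2", "n/a"]
kept = [v for v in values if ge(cast_int(v), 3)]              # "n/a" is filtered out
kept_negated = [v for v in values if not ge(cast_int(v), 3)]  # "n/a" now slips through
print(kept, kept_negated)   # ['5'] ['2', 'n/a']
```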


"for" Iterator

    for $s in /BOOK//SECTION
    where $s/@num >= 3
    return $s/TITLE

- The XML operator for "for" is XmlOp_Apply, which maps to APPLY
- APPLY binds $s, iterates over the <SECTION> elements and determines their <TITLE> children
- Nested "for" expressions and a "for" with multiple bindings turn into nested APPLY operators; each APPLY binds a different variable

Rel+ operator tree:

    XML_Serialize
      Apply
        Apply ($s)
          Select ($s) [Path_ID LIKE #SECTION%#BOOK]
            GET(PXI)
          Exists
            Select [Path_ID LIKE #@num#SECTION%#BOOK & VALUE >= 3 & Par($s)]
              GET(PXI)
        Select [Path_ID LIKE #TITLE#SECTION%#BOOK & Par($s)]
          GET(PXI)
      (assemble <SECTION>)
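The nested-loop behaviour of APPLY for this query can be sketched over in-memory index rows. The outer loop binds $s to each SECTION row found by the LIKE pattern. An existence check implements the where clause. The parent check finds the TITLE children. The sample rows are hypothetical, including a PART element that makes // matter, and so are all names. VALUE is converted to an integer for the >= comparison.

```python
import re

# Primary XML index rows for one document: (ORDPATH, VALUE, PATH_ID)
PXI = [
    ((1,), None, "#BOOK"),
    ((1, 1), None, "#SECTION#BOOK"),
    ((1, 1, 1), "1", "#@num#SECTION#BOOK"),
    ((1, 1, 3), "Intro", "#TITLE#SECTION#BOOK"),
    ((1, 3), None, "#PART#BOOK"),
    ((1, 3, 1), None, "#SECTION#PART#BOOK"),
    ((1, 3, 1, 1), "4", "#@num#SECTION#PART#BOOK"),
    ((1, 3, 1, 3), "Tree frogs", "#TITLE#SECTION#PART#BOOK"),
]


def like(value: str, pattern: str) -> bool:
    # SQL LIKE restricted to the % wildcard
    regex = ".*".join(re.escape(part) for part in pattern.split("%"))
    return re.fullmatch(regex, value) is not None


def children(s, pattern: str) -> list:
    # Par($s) plus a PATH_ID filter: direct children/attributes of $s
    return [r for r in PXI if r[0][:-1] == s[0] and like(r[2], pattern)]


def for_where_return() -> list:
    """for $s in /BOOK//SECTION where $s/@num >= 3 return $s/TITLE"""
    out = []
    for s in (r for r in PXI if like(r[2], "#SECTION%#BOOK")):        # Apply ($s) binds $s
        nums = children(s, "#@num#SECTION%#BOOK")
        if any(int(n[1]) >= 3 for n in nums):                         # where: Exists(...)
            out.extend(children(s, "#TITLE#SECTION%#BOOK"))           # return $s/TITLE
    return out


print([r[1] for r in for_where_return()])   # ['Tree frogs']
```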


XQuery "order by" and "where"

Order by:
- Sorts rows based on the order-by expression
- Adds a ranking column to these rows
- Converts the ranking column into OrdPath values, which yield the new order of the rows and fit the rest of the query processing framework

Where:
- Becomes a SELECT on the input sequence, which filters the rows satisfying the specified condition
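Below is a minimal sketch of the two clauses, assuming items are plain Python values. The where clause filters. The order by clause sorts, ranks, and relabels each item with an ORDPATH-style label. The rank-to-label encoding (rank r becomes the single odd component 2r - 1) is an assumption chosen so that later ORDPATH-ordered processing reproduces the new order. It is not taken from the paper.

```python
def where(items: list, condition) -> list:
    # "where" becomes a SELECT: keep only the rows satisfying the condition
    return [item for item in items if condition(item)]


def order_by(items: list, key) -> list:
    """Sort by the order-by key, rank the rows, and turn each rank into an
    ORDPATH-style label (hypothetical encoding: rank r -> (2r - 1,))."""
    ranked = sorted(items, key=key)
    return [((2 * r - 1,), item) for r, item in enumerate(ranked, start=1)]


titles = ["Tree frogs", "Bad Bugs", "Ants"]
labelled = order_by(where(titles, lambda t: t != "Ants"), key=str.lower)
print(labelled)   # [((1,), 'Bad Bugs'), ((3,), 'Tree frogs')]
```

Once relabelled, downstream operators that sort or merge by ORDPATH need no knowledge of the order-by key.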


XQuery "return"
- Returns node sequences in document order, using OrdPath values and the XML_Serialize operator
- New element and sequence constructions: constructed and existing nodes are merged into a single sequence (SWITCH_UNION)


XQuery Functions & Operators
- Built-in functions and operators are mapped to relational functions and operators where possible, e.g. fn:count() → count()
- Intrinsics provide additional support for XQuery types, functions and operators that cannot be mapped directly


Optimizations
- Exploiting ordered sets: sorting information (OrdPath) is made available to further relational operators; XML_Serialize is an example
- Using static type information: eliminates CONVERT() in operations and allows range scans on the VALUE index
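The effect of static typing on the VALUE index can be sketched with two versions of the same entries. Typed integers support a range scan by binary search. Untyped strings sort lexicographically ("12" before "5"), so every entry has to be converted and tested. The data and function names are hypothetical.

```python
from bisect import bisect_left, bisect_right

# VALUE index entries (VALUE, ORDPATH) for one path, once typed and once untyped
typed = sorted([(5, (1, 1)), (12, (1, 3)), (30, (1, 5)), (7, (1, 7))])
untyped = sorted([("5", (1, 1)), ("12", (1, 3)), ("30", (1, 5)), ("7", (1, 7))])


def range_scan(index: list, lo, hi) -> list:
    """lo <= VALUE <= hi via two binary searches. Valid only because typed keys are
    stored in numeric order; a B+ tree would search its key column directly."""
    keys = [value for value, _ in index]
    return index[bisect_left(keys, lo):bisect_right(keys, hi)]


def convert_scan(index: list, lo: int, hi: int) -> list:
    """Untyped keys sort as strings, so each row is CONVERTed and tested."""
    return [entry for entry in index if lo <= int(entry[0]) <= hi]


print(range_scan(typed, 6, 20))      # [(7, (1, 7)), (12, (1, 3))]
print(convert_scan(untyped, 6, 20))  # [('12', (1, 3)), ('7', (1, 7))]
```

Running `range_scan` on the untyped entries with string bounds "6" and "20" returns an empty list, which shows why the range scan depends on the static type.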


Conclusions
- Built up the infrastructure for a query processing framework
  - Other XQuery features (such as "let" and typeswitch) and a data modification language can be implemented on it
- Fits into the relational query processing framework
  - XQuery features can be implemented using relational+ operators
- Optimizations pose the biggest challenges
  - More cost-based optimizations can be done, e.g. an enhanced costing model (such as the choice of PXI) and matching materialized views


Thank you!