
Module 8 General remarks about XQuery


Page 1: Module 8 General remarks about XQuery

Module 8

General remarks about XQuery

Page 2: Module 8 General remarks about XQuery

04/22/23 2

Plan for today

• The semantics of XQuery and the optimization
• XQuery and static typing
• XQueryX -- the XML syntax of XQuery
• Current limitations of XQuery
• XQuery usage scenarios
• XQuery as a full (declarative) programming language
• XQuery vs. SQL
• XQuery vs. other programming languages
• Why XQuery has a good chance of success
• Plan for the rest of the class

Page 3: Module 8 General remarks about XQuery

XQuery expressions

XQuery Expr := Constants | Variable | FunctionCalls | PathExpr |
               ComparisonExpr | ArithmeticExpr | LogicExpr |
               FLWRExpr | ConditionalExpr | QuantifiedExpr |
               TypeSwitchExpr | InstanceofExpr | CastExpr |
               UnionExpr | IntersectExceptExpr |
               ConstructorExpr | ValidateExpr

Expressions can be nested with full generality!

Functional programming heritage.
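A small sketch of what "nested with full generality" can look like in practice. The inline <books> data and its element names are made up for illustration, and the query is written in XQuery 1.0 Recommendation syntax rather than the older draft syntax of the customer query shown on the following slides. It nests path, quantified and conditional expressions inside a FLWR expression, which in turn sits inside an element constructor.

```xquery
(: Hypothetical inline data; element and attribute names are illustrative only. :)
let $books :=
  <books>
    <book year="1999"><title>A</title><price>30</price></book>
    <book year="2003"><title>B</title><price>45</price></book>
  </books>
return
  (: constructor containing a FLWR expression :)
  <summary count="{ count($books/book) }">{
    for $b in $books/book
    (: quantified expression over a path expression :)
    where some $p in $b/price satisfies xs:decimal($p) gt 25
    return
      (: conditional expression choosing between two constructors :)
      if ($b/@year = 1999)
      then <old>{ string($b/title) }</old>
      else <recent>{ string($b/title) }</recent>
  }</summary>
```

Any of the sub-expressions could be replaced by another expression of the grammar above, which is the point of the slide.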

Page 4: Module 8 General remarks about XQuery

A fraction of a real customer XQuery

Page 5: Module 8 General remarks about XQuery

let $wlc := document("tests/ebsample/data/ebSample.xml")
let $ctrlPackage := "foo.pkg"
let $wfPath := "test"

let $tp-list :=
  for $tp in $wlc/wlc/trading-partner
  return
    <trading-partner
      name="{$tp/@name}"
      business-id="{$tp/party-identifier/@business-id}"
      description="{$tp/@description}"
      notes="{$tp/@notes}"
      type="{$tp/@type}"
      email="{$tp/@email}"
      phone="{$tp/@phone}"
      fax="{$tp/@fax}"
      username="{$tp/@user-name}"

Page 6: Module 8 General remarks about XQuery

{ for $tp-ad in $tp/address
  return $tp-ad }
{ for $eps in $wlc/extended-property-set
  where $tp/@extended-property-set-name eq $eps/@name
  return $eps }
{ for $client-cert in $tp/client-certificate
  return
    <client-certificate name="{$client-cert/@name}" >
    </client-certificate> }

Page 7: Module 8 General remarks about XQuery

{ for $server-cert in $tp/server-certificate
  return
    <server-certificate name="{$server-cert/@name}" >
    </server-certificate> }
{ for $sig-cert in $tp/signature-certificate
  return
    <signature-certificate name="{$sig-cert/@name}" >
    </signature-certificate> }
{ for $enc-cert in $tp/encryption-certificate
  return
    <encryption-certificate name="{$enc-cert/@name}" >
    </encryption-certificate> }

Page 8: Module 8 General remarks about XQuery

{ for $eb-dc in $tp/delivery-channel
  for $eb-de in $tp/document-exchange
  for $eb-tp in $tp/transport
  where $eb-dc/@document-exchange-name eq $eb-de/@name
    and $eb-dc/@transport-name eq $eb-tp/@name
    and $eb-de/@business-protocol-name eq "ebXML"
  return
    <ebxml-binding
      name="{$eb-dc/@name}"
      business-protocol-name="{$eb-de/@business-protocol-name}"
      business-protocol-version="{$eb-de/@protocol-version}"
      is-signature-required="{$eb-dc/@nonrepudiation-of-origin}"
      is-receipt-signature-required="{$eb-dc/@nonrepudiation-of-receipt}"
      signature-certificate-name="{$eb-de/EBXML-binding/@signature-certificate-n}"
      delivery-semantics="{$eb-de/EBXML-binding/@delivery-semantics}"
      { if (xf:empty($eb-de/EBXML-binding/@ttl))
        then ()
        else attribute persist-duration
               {concat(($eb-de/EBXML-binding/@ttl div 1000), " seconds")}
      }

Page 9: Module 8 General remarks about XQuery

{ if (xf:empty($eb-de/EBXML-binding/@retries))
  then ()
  else $eb-de/EBXML-binding/@retries }
{ if (xf:empty($eb-de/EBXML-binding/@retry-interval))
  then ()
  else attribute retry-interval
         {concat(($eb-de/EBXML-binding/@retry-interval div 1000), " seconds")} }

<transport
  protocol="{$eb-tp/@protocol}"
  protocol-version="{$eb-tp/@protocol-version}"
  endpoint="{$eb-tp/endpoint[1]/@uri}" >
{

Page 10: Module 8 General remarks about XQuery

for $ca in $wlc/wlc/collaboration-agreement
for $p1 in $ca/party[1]
for $p2 in $ca/party[2]
for $tp1 in $wlc/wlc/trading-partner
for $tp2 in $wlc/wlc/trading-partner
where $p1/@delivery-channel-name eq $eb-dc/@name
  and $tp1/@name eq $p1/@trading-partner-name
  and $tp2/@name eq $p2/@trading-partner-name
   or $p2/@delivery-channel-name eq $eb-dc/@name
  and $tp1/@name eq $p1/@trading-partner-name
  and $tp2/@name eq $p2/@trading-partner-name

Page 11: Module 8 General remarks about XQuery

return
  if ($p1/@trading-partner-name=$tp/@name)
  then
    <authentication
      client-partner-name="{$tp2/@name}"
      client-certificate-name="{$tp2/client-certificate/@name}"
      client-authentication="{
        if (xf:empty($tp2/client-certificate))
        then "NONE"
        else "SSL_CERT_MUTUAL" }"
      server-certificate-name="{
        if ($tp1/@type="REMOTE")
        then $tp1/server-certificate/@name
        else "" }"
      server-authentication="{
        if ($eb-tp/@protocol="http")
        then "NONE"
        else "SSL_CERT" }"

Page 12: Module 8 General remarks about XQuery

    >
    </authentication>
  else
    <authentication
      client-partner-name="{$tp1/@name}"
      client-certificate-name="{$tp1/client-certificate/@name}"
      client-authentication="{
        if (xf:empty($tp1/client-certificate))
        then "NONE"
        else "SSL_CERT_MUTUAL" }"
      server-certificate-name="{
        if ($tp2/@type="REMOTE")
        then $tp2/server-certificate/@name
        else "" }"
      server-authentication="{
        if ($eb-tp/@protocol="http")
        then "NONE"
        else "SSL_CERT" }"
    >
    </authentication>

Page 13: Module 8 General remarks about XQuery

}
</transport>
</ebxml-binding>
}
{-- RosettaNet Binding --}
{ for $eb-dc in $tp/delivery-channel
  for $eb-de in $tp/document-exchange
  for $eb-tp in $tp/transport
  where $eb-dc/@document-exchange-name eq $eb-de/@name
    and $eb-dc/@transport-name eq $eb-tp/@name
    and $eb-de/@business-protocol-name eq "RosettaNet"
  return
    <rosettanet-binding
      name="{$eb-dc/@name}"
      business-protocol-name="{$eb-de/@business-protocol-name}"
      business-protocol-version="{$eb-de/@protocol-version}"

Page 14: Module 8 General remarks about XQuery

      is-signature-required="{$eb-dc/@nonrepudiation-of-origin}"
      is-receipt-signature-required="{$eb-dc/@nonrepudiation-of-receipt}"
      signature-certificate-name="{$eb-de/RosettaNet-binding/@signature-certificate-name}"
      encryption-certificate-name="{$eb-de/RosettaNet-binding/@encryption-certificate-name}"
      cipher-algorithm="{$eb-de/RosettaNet-binding/@cipher-algorithm}"
      encryption-level="{
        if ($eb-de/RosettaNet-binding/@encryption-level = 0)
        then "NONE"
        else if ($eb-de/RosettaNet-binding/@encryption-level = 1)
        then "PAYLOAD"
        else "ENTIRE_PAYLOAD" }"
      {-- process-timeout="{$eb-de/RosettaNet-binding/@time-out}" --}
    >
    { if (xf:empty($eb-de/RosettaNet-binding/@retries))
      then ()
      else $eb-de/RosettaNet-binding/@retries }

Page 15: Module 8 General remarks about XQuery

{ if (xf:empty($eb-de/RosettaNet-binding/@retry-interval))
  then ()
  else attribute retry-interval
         {concat(($eb-de/RosettaNet-binding/@retry-interval div 1000), " seconds")} }
{ if (xf:empty($eb-de/RosettaNet-binding/@time-out))
  then ()
  else attribute process-timeout
         {concat(($eb-de/RosettaNet-binding/@time-out div 1000), " seconds")}
}
<transport
  protocol="{$eb-tp/@protocol}"
  protocol-version="{$eb-tp/@protocol-version}"
  endpoint="{$eb-tp/endpoint[1]/@uri}" >
{

Page 16: Module 8 General remarks about XQuery

for $ca in $wlc/wlc/collaboration-agreement
for $p1 in $ca/party[1]
for $p2 in $ca/party[2]
for $tp1 in $wlc/wlc/trading-partner
for $tp2 in $wlc/wlc/trading-partner
where $p1/@delivery-channel-name eq $eb-dc/@name
  and $tp1/@name eq $p1/@trading-partner-name
  and $tp2/@name eq $p2/@trading-partner-name
   or $p2/@delivery-channel-name eq $eb-dc/@name
  and $tp1/@name eq $p1/@trading-partner-name
  and $tp2/@name eq $p2/@trading-partner-name

return
  if ($p1/@trading-partner-name=$tp/@name)
  then
    <authentication

Page 17: Module 8 General remarks about XQuery

      client-partner-name="{$tp2/@name}"
      client-certificate-name="{$tp2/client-certificate/@name}"
      client-authentication="{
        if (xf:empty($tp2/client-certificate))
        then "NONE"
        else "SSL_CERT_MUTUAL" }"
      server-certificate-name="{
        if ($tp1/@type="REMOTE")
        then $tp1/server-certificate/@name
        else "" }"
      server-authentication="{
        if ($eb-tp/@protocol="http")
        then "NONE"
        else "SSL_CERT" }"
    >
    </authentication>

Page 18: Module 8 General remarks about XQuery

  else
    <authentication
      client-partner-name="{$tp1/@name}"
      client-certificate-name="{$tp1/client-certificate/@name}"
      client-authentication="{
        if (xf:empty($tp1/client-certificate))
        then "NONE"
        else "SSL_CERT_MUTUAL" }"
      server-certificate-name="{
        if ($tp2/@type="REMOTE")
        then $tp2/server-certificate/@name
        else "" }"
      server-authentication="{
        if ($eb-tp/@protocol="http")
        then "NONE"
        else "SSL_CERT" }"
    >
    </authentication>

Page 19: Module 8 General remarks about XQuery

}
</transport>
</rosettanet-binding>
}

</trading-partner>

let $sv :=
  for $cd in $wlc/wlc/conversation-definition
  for $role in $cd/role
  where xf:not(xf:empty($role/@wlpi-template) or $role/@wlpi-template="")
    and $cd/@business-protocol-name="ebXML"
     or $cd/@business-protocol-name="RosettaNet"
  return
    <servicePair>
      <service
        name="{xf:concat($wfPath, $role/@wlpi-template, '.jpd')}"
        description="{$role/@description}"
        note="{$role/@note}"
        service-type="WORKFLOW"
        business-protocol="{xf:upper-case($cd/@business-protocol-name)}" >

Page 20: Module 8 General remarks about XQuery


. . . (60 % more to come)

Page 21: Module 8 General remarks about XQuery

XQuery is not (only) a query language

• Declarative programming language
• General purpose XML to XML transformation engine
• Designed for optimizability
  • Primary goal in the design of the language
  • Impact on the semantics of the language

Page 22: Module 8 General remarks about XQuery

The XQuery semantics and the optimization (1)

• Trade-off between optimizability (on one side) and complexity, non-determinism and expressive power (on the other side)
  • Query languages are more optimizable but pay a price on the other side
  • Imperative languages lack optimizability, but their semantics is simpler, deterministic, and richer
• How can we achieve better performance?
  1. Allow sub-computations to be executed in a different order
     • Parallelization, rescheduling
  2. Make it possible to use various data access paths
  3. Allow lazy evaluation
  4. Allow streaming/pipelining between operations (no materialization of intermediate results)
  5. Allow various evaluation algorithms for the same logical operation

Page 23: Module 8 General remarks about XQuery

The XQuery semantics and the optimization (2)

• Allow sub-computations to be executed in a different order (e.g. parallelization, rescheduling)
  • XQuery: no real side-effects, errors are non-deterministic
  • (1, 2, 3, 1 div 0)[1] can return either 1 or an error
• Allow lazy evaluation
• Possible to use various data access paths
  • XQuery: "and" and "or" are commutative, short-circuiting operations; errors are again non-deterministic
  • (1 eq 2) and (1 div 0 eq 2) can return either false or an error
• Allow streaming/pipelining between operations (no materialization of intermediate results)
• Allow various evaluation algorithms for the same logical operation
  • XQuery: unordered { expr }
  • (unordered { (1,2,3,4) })[1] => 1, 2, 3 or 4 can be the result
  • (unordered { //book[@year=1999]/title })[1]
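One practical consequence, sketched here under XQuery 1.0 semantics: because an engine may evaluate the operands of "and" in either order, a guard written with "and" does not reliably protect a division. A conditional expression does, since dynamic errors in the branch that is not selected are not raised. The divisor values below are made up for illustration.

```xquery
for $d in (0, 5)
return
  (: Not reliable: ($d ne 0) and (10 div $d gt 1)
     may yield false or raise a division-by-zero error when $d is 0,
     depending on which operand the engine evaluates first. :)

  (: Reliable: the then-branch is only evaluated when the test is true. :)
  if ($d ne 0)
  then (10 div $d gt 1)
  else false()
```

With these two inputs the guarded form yields false for 0 and true for 5 on any conforming engine, whereas the commented-out form leaves the outcome for 0 up to the implementation.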

Page 24: Module 8 General remarks about XQuery

XQuery type system

• XQuery has a powerful (and complex!) type system
• XQuery types are imported from XML Schemas
• Every XML data model instance has a dynamic type
• Every XQuery expression has a static type
• Pessimistic static type inference
  • However, most implementations have an optimistic static type inference
  • (1, "foobar")[1] + 1 => static typing error under pessimistic inference, no error under optimistic inference
  • Optional feature, few implement it; Galax is the one that implements it correctly
• The goal of the type system is to:
  1. detect statically errors in the queries
  2. infer the type (and/or the shape/schema) of the result of valid queries
     • Types and schemas are not the same thing!
  3. ensure statically that the result of a given query is of a given (expected) type if the input dataset is guaranteed to be of a given type
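A minimal sketch of how the example above can be made acceptable to a pessimistic static type checker, assuming XQuery 1.0 with the static typing feature. The "treat as" expression narrows the static type of the operand and defers the check to run time.

```xquery
(: Rejected under pessimistic static typing: the static type of the
   filtered sequence still includes xs:string, and adding an integer
   to a string is a type error.
   (1, "foobar")[1] + 1
:)

(: Accepted: "treat as" asserts the static type xs:integer; a dynamic
   error would be raised only if the item were not an integer at run time. :)
((1, "foobar")[1] treat as xs:integer) + 1
```

Here the first item is the integer 1, so the run-time check succeeds and the query evaluates to 2.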

Page 25: Module 8 General remarks about XQuery

XQuery type system components

• Atomic types
  • xs:untypedAtomic
  • All 19 primitive XML Schema types
  • All user defined atomic types
• Empty, None
• Type constructors (simplification!)
  • Elements: element name {type}
  • Attributes: attribute name {type}
  • Alternation: type1 | type2
  • Sequence: type1, type2
  • Repetition: type*
  • Interleaved product: type1 & type2
• type1 intersect type2 ?
• type1 subtype of type2 ?
• type1 equals type2 ?
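The constructors above are written in the notation of the formal type system. Inside a query, types surface through the SequenceType syntax, which is a related but different notation. The sketch below assumes XQuery 1.0; the function name local:titles and the <book> elements are made up for illustration.

```xquery
(: element(book)* : zero or more book elements (element type plus repetition).
   xs:string*     : zero or more atomic strings. :)
declare function local:titles($books as element(book)*) as xs:string*
{
  for $t in $books/title
  return string($t)
};

(
  local:titles((<book><title>A</title></book>,
                <book><title>B</title></book>)),

  (: xs:integer is derived from xs:decimal, so both items match xs:decimal. :)
  (1, 2.5) instance of xs:decimal+,

  (: The empty sequence matches any type with the "?" occurrence indicator. :)
  () instance of xs:integer?
)
```

The two "instance of" tests are dynamic subtype checks; the static questions listed on the slide (intersection, subtyping, equality of types) are what a static type checker has to decide without looking at the data.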

Page 26: Module 8 General remarks about XQuery

XQueryX: the XML syntax for XQuery

• Most XML languages (schema, programming, forms, etc.) have an XML syntax
• "Normal" XQuery doesn't
  • It has been designed for human programmers
• XQueryX: an alternative, XML-based syntax for XQuery
  • The parsed abstract syntax tree in XML
  • Has an XML Schema for an XQuery program
• Go to the XQueryX specification...

Page 27: Module 8 General remarks about XQuery

XQueryX: the advantages

• XQuery programs are also data
  • Can be stored, queried, updated, and processed in the same way, with the same languages, as the rest of the data (remember Lisp?)
• Code becomes data
  • Automatic code generation, rewriting
  • We can blend data with code, and with schemas -- all have an XML syntax
• Blurs the distinction between data, metadata, and code

Page 28: Module 8 General remarks about XQuery

Mistakes and limitations of XQuery 1.0

• Mistakes: the name!
• Limitations, missing functionality in XQuery 1.0
  • Dynamic namespace generation
  • Better support for group-by and outer-joins
  • Support for references
  • Continuous queries, window queries
  • Better integration with XSLT
  • Integration with Web Services
  • Assertions
  • Error handling: try/catch
  • Scripting extensions
    • Variable assignment
    • Sequential evaluation mode for side-effects
    • Blocks
  • eval(XQueryX-fragment)
  • Integration with Semantic Search and ontologies
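To illustrate the group-by limitation, here is a sketch of the usual XQuery 1.0 workaround: iterate over the distinct grouping keys and re-select the members of each group with a predicate. The <sales> data and its attribute names are invented for the example.

```xquery
(: Hypothetical inline data. :)
let $sales :=
  <sales>
    <sale region="east" amount="10"/>
    <sale region="west" amount="5"/>
    <sale region="east" amount="7"/>
  </sales>
(: One iteration per distinct region: this plays the role of GROUP BY. :)
for $r in distinct-values($sales/sale/@region)
(: The predicate re-scans the input to collect the members of the group. :)
let $group := $sales/sale[@region = $r]
order by $r
return
  <region name="{ $r }"
          count="{ count($group) }"
          total="{ sum($group/@amount) }"/>
```

The query works, but the grouping intent is hidden in a self-join on the key, which is harder both to read and to optimize than a dedicated grouping construct.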

Page 29: Module 8 General remarks about XQuery

XQuery Use Case Scenarios (1)

• XML transformation language in Web Services
  • Large and very complex queries
  • Input message + external data sources
  • Small and medium size data sets
  • Transient and streaming data (no indexes)
  • With or without schema validation
• XML message brokers
  • Simple path expressions, single input message
  • Small data sets
  • Transient and streaming data (no indexes)
  • Mostly non schema validated data
• Semantic data verification
  • Mostly messages
  • Potentially complex (but small) queries

Mid-tier

Mid-tier

Mid-tier, server, client

Page 30: Module 8 General remarks about XQuery

XQuery Usage Scenarios (2)

• Data integration
  • Complex but smaller queries (FLWORs, aggregates, constructors)
  • Large, persistent, external data repositories
  • Dynamic data (via Web Services invocations)
• Large volumes of blended relational and XML data
  • Structured data with unstructured/semistructured extensions
  • Complex queries
  • Read/write data
• Large volumes of XML logs and archives
  • Web services, RFIDs, etc.
  • Complex queries (statistics, analytics)
  • Mostly read only
• Large content repositories
  • Large volume of data (books, manuals, etc.)
  • With or without schema validation
  • Full text essential, update required

Mid-tier, server, client

Database server

Content server

Database server

Page 31: Module 8 General remarks about XQuery

XQuery Usage Scenarios (3)

• Large volumes of distributed textual data
  • XML search engines
  • High volume of data sources
  • Full text, semantic search crucial
• RSS filtering and aggregation
  • High number of input data channels
  • Data is pushed, not pulled
  • Structure of the data very simple, each item bounded size
  • Aggregators using mostly full-text search
• XML data transformation and integration on mobile devices
  • Small XML messages
  • Transformation or aggregation queries
  • Caching is important
  • Streaming very important

Web

Web

Mobile devices

Page 32: Module 8 General remarks about XQuery

XQuery usage scenarios (4)

• Content re-purposing
  • E.g. customized books and articles
  • E.g. enterprise customized engineering documentation (product requirements, specs, etc.)
• Streamline automatic processing
  • E.g. the creation of the W3C specifications
  • From the same XML document we automatically generate the XQuery, XPath 2.0, and Function Libraries specifications, plus the JavaCC code that implements the XQuery parser, plus the tests that correctly test the grammar. All of those are XQuery views of the same XML document!
• (Ajax-style) dynamic Web pages
  • XQuery is a better way to manipulate the XML of Web pages than JavaScript
• Re-programming the Web / scripting the Web / mashups

Page 33: Module 8 General remarks about XQuery

Criteria for XQuery usages

1. Type of queries (e.g. simple, complex, construction-intensive, full text search intensive)
2. Volume of queries
3. Native XML or virtual XML views of other forms of data
4. XML Schema validated data or not
5. Volume of data per query
6. Number of data sources
7. Transient data vs. persistent data
8. Transacted vs. non-transacted data
9. Push data vs. pull data
10. Typed vs. untyped data
11. Read only data vs. updatable data
12. Distributed vs. centralized data sets
13. Data compressed/encrypted or not
14. Target architectures
15. Customer expectation

Each scenario requires different processing techniques.

Page 34: Module 8 General remarks about XQuery

XQuery vs. SQL: beyond the tree vs. table

[Figure: two diagrams, one labeled SQL and one labeled XQuery, each relating persistent data, transacted data, and declarative processing]

"XQuery: the XML replacement for SQL?"
No, it is more likely that in the long term it will be the declarative replacement for imperative programming languages like Java or C#.

Page 35: Module 8 General remarks about XQuery

Making XQuery a full XML scripting language (1)

• XQuery is Turing complete, yet "incomplete"
  • Users need to write application logic on their data
  • The killer advantages of XML are erased by Java, JavaScript, or C#
• Huge pressure to integrate native XML processing with existing programming languages:
  • C#, ECMAScript, Python, PHP extensions, etc.

[Figure: diagram with the labels JavaScript, XML, XQuery, XML scripting extensions]

Page 36: Module 8 General remarks about XQuery

Making XQuery a full scripting language (2)

• Users are already using XQuery as a scripting language!
• Major missing pieces in XQuery:
  • Order of evaluation has to be deterministic
  • Visible updates/side-effects
  • Variable assignment
  • Error handling

[Figure: application stack with the layers DB storage (supports XML), Application logic (Java/C#), Communication (XML), Client (XHTML, scripts); XQuery is marked twice on the diagram]

Page 37: Module 8 General remarks about XQuery

Why will XQuery be successful?

Let's look back at why XML is successful.

Page 38: Module 8 General remarks about XQuery

Reasons for the overwhelming success of XML

• XML is a general data representation format
• XML is human readable
• XML is machine readable
• XML is internationalized (UNICODE)
• XML is platform independent
• XML is vendor independent
• XML is endorsed by the World Wide Web Consortium
• XML is not a new technology (SGML, HTML)
• XML is not only a data representation format, it's a full infrastructure of technologies

Page 39: Module 8 General remarks about XQuery

REAL reason for the overwhelming success

• Helps companies to cut costs in information exchange
  1. Avoids the cost of building custom parsers
  2. Good quality, low cost parsing software becomes a commodity
  3. Minimizes the cost of training
  4. Avoids the cost associated with schemas (the evil of all evils :)
• ... sometimes at the expense of increased hardware cost due to (bad) parsing performance...
• But that's OK as long as it can be parallelized on cheap machines

Page 40: Module 8 General remarks about XQuery

The cost of schemas

• Methodology we teach the database students:
  • Gather requirements from the application domain
  • Design (and agree on) a schema
  • Write the code (queries + application)
  • Populate the database
  • Execute the code
• Agreeing on schemas is the most expensive step in software engineering
  • Plus it prohibits evolution and customization
• The current information management technology (Java, SQL) doesn't allow us to apply the previous steps in another order, nor to bypass the schema design
• XML does.
• Semi-structured data is the new black in the IT industry :-)
  • GoogleBase, etc.

Page 41: Module 8 General remarks about XQuery

Processing XML data

• Huge amount of XML information, and rapidly growing
• We need to "process" it
  • Store it efficiently
  • Verify the correctness
  • Filter, search, select
  • Transform, normalize, reshape
  • Join, aggregate
  • Create new data
  • Update the data
  • Take actions based on the existing data
• XQuery has been designed as a solution to the XML processing problem
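Several of the processing tasks listed above (filter, join, aggregate, reshape, create new data) fit in one short query. The sketch below uses XQuery 1.0 syntax; the <orders> and <customers> documents and all of their attribute names are invented for the example.

```xquery
(: Hypothetical inline data standing in for two input documents. :)
let $orders :=
  <orders>
    <order id="1" cust="c1" total="20"/>
    <order id="2" cust="c2" total="5"/>
    <order id="3" cust="c1" total="12"/>
  </orders>
let $customers :=
  <customers>
    <customer id="c1" name="Ann"/>
    <customer id="c2" name="Bob"/>
  </customers>
return
  (: Create new data: the result is a freshly constructed report. :)
  <report>{
    for $c in $customers/customer
    (: Join: match the orders of each customer on the customer id. :)
    let $o := $orders/order[@cust = $c/@id]
    (: Filter on an aggregate of the joined orders. :)
    where sum($o/@total) gt 10
    order by string($c/@name)
    (: Reshape: one summary element per remaining customer. :)
    return
      <customer name="{ $c/@name }"
                orders="{ count($o) }"
                total="{ sum($o/@total) }"/>
  }</report>
```

Nothing in the query says how the join is executed or whether the inputs are indexed or streamed; those choices are left to the engine.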

Page 42: Module 8 General remarks about XQuery

Alternative solutions to XML processing

• Java, C + APIs (e.g. DOM, SAX, JSR170)
• Perl, PHP, JavaScript + APIs
• XLinq (MSFT C# extension)
• Code generators
• SQL/XML
• XSLT
• See the SIGMOD'06 tutorial on "XML programming techniques"

Page 43: Module 8 General remarks about XQuery

Why XQuery will be successful

• XQuery helps companies cut costs on information processing
  • Avoids the cost of building custom XML processors
  • Improves productivity
  • Good quality, low cost XML processing software (will) become a commodity
  • (Will) minimize the cost of training
  • (Partially) avoids the cost associated with schemas
  • (Will) guarantee best performance

Page 44: Module 8 General remarks about XQuery

XQuery and the productivity

• Manipulates only XML
  • Dealing with two type systems (e.g. Java integer vs. XML integer) is extremely tedious
• Handles all XML correctly
  • Typed and untyped, all corner cases of XML (e.g. NS)
• Declarative
  • Smaller amount of code to write
  • Fewer decisions to make as a programmer
    • Streaming or not, indexes, parallelization
  • Possible to generate automatically

Page 45: Module 8 General remarks about XQuery

XQuery performance

• Lots of folklore in industry
  • "Everything related to XML has to be slow"
  • No, writing manually optimized C# or Java over SAX isn't the answer -- it is not robust to evolutions!!
• Unfortunately:
  • We don't have benchmarks yet
  • We don't have good XML processing literature yet
• Situation now:
  • In DB, XQuery is executed with the same engines as SQL
  • Good new engines, and improving fast (BEA, Saxon, eXist, Berkeley DB, etc.)
• XQuery has better chances for good performance than any of the alternatives

XQuery automatic optimization

- Feasible (done in many implementations) in XQuery:
  - Automatic data partitioning, clustering and placement
  - Automatic use of secondary access structures (indexes)
  - Automatic decision about streaming vs. materialization
  - Automatic caching
  - Parallelization of code
  - Program decomposition
  - Program shipping vs. data shipping
  - Rewriting based on assertions
  - Detecting code (in)dependence
  - Rescheduling/reordering of the code
- Impossible (or much harder) in XLinq, Perl, JavaScript, etc.
- Global dataflow required
  - What operations are executed on each data item
  - What data items will be processed by each operation
  - Declarativity of XQuery helps
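
One item from the list above, automatic use of indexes, can be sketched in a few lines. This is a toy model in Python under stated assumptions, not how any real XQuery engine is implemented; `Store`, `build_index` and `select_eq` are hypothetical names. The caller only states the request ("rows where key equals value"), so the engine is free to pick the access path.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Toy data store: a list of rows plus optional secondary indexes."""
    rows: list
    indexes: dict = field(default_factory=dict)  # key name -> {value: [rows]}

    def build_index(self, key):
        index = {}
        for row in self.rows:
            index.setdefault(row[key], []).append(row)
        self.indexes[key] = index


def select_eq(store, key, value):
    """Answer 'rows where key == value' and report which plan was chosen."""
    if key in store.indexes:
        # An index exists: use it, without any change to the caller's request.
        return "index-lookup", store.indexes[key].get(value, [])
    # No index: fall back to scanning every row.
    return "full-scan", [row for row in store.rows if row[key] == value]


books = Store(rows=[
    {"title": "A", "year": 2004},
    {"title": "B", "year": 2006},
    {"title": "C", "year": 2004},
])

plan_before, result_before = select_eq(books, "year", 2004)
books.build_index("year")
plan_after, result_after = select_eq(books, "year", 2004)
print(plan_before, plan_after, result_before == result_after)
```

The request is identical before and after the index is built; only the plan changes. With hand-written traversal code the access path is baked into the program, which is why this kind of choice is harder to automate there.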

Frequent criticism of XQuery

- "The performance of XQuery engines isn't acceptable"
  - Why is the alternative any better!?
  - For XQuery, there is hope. For XLinq, PHP, little.
  - Take ideas from both database optimization and programming language compilation, plus innovate
  - Lots of fun research to be done!
- "It will never perform as well as if we write the application in Java + SAX"
  - Maybe true today, not sure in the near future
  - Optimizing a single XML application vs. optimizing an XQuery(P) engine (i.e. all XML applications)
- "There are no libraries"
  - Let's build some...
  - We will not need the same libraries as in Java or C#
    - Different level of abstraction
    - The target applications are different
- "XQuery is too complicated"
  - !?

Frequent criticism (2)

- "Programmers do not know how to program declaratively"
  - What about SQL!?
  - You are the generation who will decide this.
- "This would require users to learn a new language"
  - Smooth transition, easy integration of pieces written in other languages (thanks WS!)

How bad is the bleeding edge?

- Yes, XQuery is new
- All solutions in XML processing are on the bleeding edge at this point
  - XLinq is worse in fact
  - XQuery is much older
- XQuery is subject to open public scrutiny, which ensures better quality

Potential impact of XQuery on Web X.0 architectures

- Web 2.0: so much marketing, very little technical substance
- However, we all know that we've reached the limits of Web 1.0
  - User experience -- the Web is becoming annoying
    - Too static, no customization, no push data
  - System builder experience -- building the Web is really expensive, and hard
- We need something new. What?

Potential impact of XQuery on Web X.0 architectures (cont.)

- Imagine the following scenario:
  - XQuery becomes a full programming language, integrated with Web Services (XQueryP)
  - Good implementations of XQueryP become available in open source, and commodity
  - Databases will implement XQueryP
  - XML repositories will support an HTTP-based simple query protocol (OpenSearch-style, but adapted to XML and XQuery)
  - XQueryP plug-ins in browsers (Ajax ++)
- What will happen to:
  - SQL, Java!?
  - Perl, PHP, JavaScript!?
  - Client-server!?
  - Thin clients!?