Module 8 General remarks about XQuery. Plan for today. The semantics of XQuery and the optimization XQuery and the static typing XQueryX -- the XML syntax of XQuery Current limitations of XQuery XQuery usage scenarios XQuery as a full (declarative) programming language XQuery vs. SQL - PowerPoint PPT Presentation
Module 8
General remarks about XQuery
04/22/23 2

Plan for today
• The semantics of XQuery and the optimization
• XQuery and the static typing
• XQueryX -- the XML syntax of XQuery
• Current limitations of XQuery
• XQuery usage scenarios
• XQuery as a full (declarative) programming language
• XQuery vs. SQL
• XQuery vs. other programming languages
• Why XQuery has big chances for success
• Plan for the rest of the class

XQuery expressions
XQuery Expr := Constants | Variable | FunctionCalls | PathExpr |
               ComparisonExpr | ArithmeticExpr | LogicExpr |
               FLWRExpr | ConditionalExpr | QuantifiedExpr |
               TypeSwitchExpr | InstanceofExpr | CastExpr |
               UnionExpr | IntersectExceptExpr |
               ConstructorExpr | ValidateExpr
• Expressions can be nested with full generality!
• Functional programming heritage.

A fraction of a real customer XQuery

let $wlc := document("tests/ebsample/data/ebSample.xml")
let $ctrlPackage := "foo.pkg"
let $wfPath := "test"
let $tp-list :=
  for $tp in $wlc/wlc/trading-partner
  return
    <trading-partner
      name="{$tp/@name}"
      business-id="{$tp/party-identifier/@business-id}"
      description="{$tp/@description}"
      notes="{$tp/@notes}"
      type="{$tp/@type}"
      email="{$tp/@email}"
      phone="{$tp/@phone}"
      fax="{$tp/@fax}"
      username="{$tp/@user-name}">
      { for $tp-ad in $tp/address return $tp-ad }
      { for $eps in $wlc/extended-property-set
        where $tp/@extended-property-set-name eq $eps/@name
        return $eps }
      { for $client-cert in $tp/client-certificate
        return <client-certificate name="{$client-cert/@name}" > </client-certificate> }
      { for $server-cert in $tp/server-certificate
        return <server-certificate name="{$server-cert/@name}" > </server-certificate> }
      { for $sig-cert in $tp/signature-certificate
        return <signature-certificate name="{$sig-cert/@name}" > </signature-certificate> }
      { for $enc-cert in $tp/encryption-certificate
        return <encryption-certificate name="{$enc-cert/@name}" > </encryption-certificate> }
      { for $eb-dc in $tp/delivery-channel
        for $eb-de in $tp/document-exchange
        for $eb-tp in $tp/transport
        where $eb-dc/@document-exchange-name eq $eb-de/@name
          and $eb-dc/@transport-name eq $eb-tp/@name
          and $eb-de/@business-protocol-name eq "ebXML"
        return
          <ebxml-binding
            name="{$eb-dc/@name}"
            business-protocol-name="{$eb-de/@business-protocol-name}"
            business-protocol-version="{$eb-de/@protocol-version}"
            is-signature-required="{$eb-dc/@nonrepudiation-of-origin}"
            is-receipt-signature-required="{$eb-dc/@nonrepudiation-of-receipt}"
            signature-certificate-name="{$eb-de/EBXML-binding/@signature-certificate-n}"
            delivery-semantics="{$eb-de/EBXML-binding/@delivery-semantics}">
            { if (xf:empty($eb-de/EBXML-binding/@ttl)) then ()
              else attribute persist-duration
                {concat(($eb-de/EBXML-binding/@ttl div 1000), " seconds")}
            }
            { if (xf:empty($eb-de/EBXML-binding/@retries)) then ()
              else $eb-de/EBXML-binding/@retries }
            { if (xf:empty($eb-de/EBXML-binding/@retry-interval)) then ()
              else attribute retry-interval
                {concat(($eb-de/EBXML-binding/@retry-interval div 1000), " seconds")} }
            <transport
              protocol="{$eb-tp/@protocol}"
              protocol-version="{$eb-tp/@protocol-version}"
              endpoint="{$eb-tp/endpoint[1]/@uri}" >
              {
                for $ca in $wlc/wlc/collaboration-agreement
                for $p1 in $ca/party[1]
                for $p2 in $ca/party[2]
                for $tp1 in $wlc/wlc/trading-partner
                for $tp2 in $wlc/wlc/trading-partner
                where $p1/@delivery-channel-name eq $eb-dc/@name
                    and $tp1/@name eq $p1/@trading-partner-name
                    and $tp2/@name eq $p2/@trading-partner-name
                  or $p2/@delivery-channel-name eq $eb-dc/@name
                    and $tp1/@name eq $p1/@trading-partner-name
                    and $tp2/@name eq $p2/@trading-partner-name
                return
                  if ($p1/@trading-partner-name=$tp/@name) then
                    <authentication
                      client-partner-name="{$tp2/@name}"
                      client-certificate-name="{$tp2/client-certificate/@name}"
                      client-authentication="{ if(xf:empty($tp2/client-certificate)) then "NONE" else "SSL_CERT_MUTUAL" }"
                      server-certificate-name="{ if($tp1/@type="REMOTE") then $tp1/server-certificate/@name else "" }"
                      server-authentication="{ if($eb-tp/@protocol="http") then "NONE" else "SSL_CERT" }"
                      > </authentication>
                  else
                    <authentication
                      client-partner-name="{$tp1/@name}"
                      client-certificate-name="{$tp1/client-certificate/@name}"
                      client-authentication="{ if(xf:empty($tp1/client-certificate)) then "NONE" else "SSL_CERT_MUTUAL" }"
                      server-certificate-name="{ if($tp2/@type="REMOTE") then $tp2/server-certificate/@name else "" }"
                      server-authentication="{ if($eb-tp/@protocol="http") then "NONE" else "SSL_CERT" }"
                      > </authentication>
              }
            </transport>
          </ebxml-binding> }
      {-- RosettaNet Binding --}
      { for $eb-dc in $tp/delivery-channel
        for $eb-de in $tp/document-exchange
        for $eb-tp in $tp/transport
        where $eb-dc/@document-exchange-name eq $eb-de/@name
          and $eb-dc/@transport-name eq $eb-tp/@name
          and $eb-de/@business-protocol-name eq "RosettaNet"
        return
          <rosettanet-binding
            name="{$eb-dc/@name}"
            business-protocol-name="{$eb-de/@business-protocol-name}"
            business-protocol-version="{$eb-de/@protocol-version}"
            is-signature-required="{$eb-dc/@nonrepudiation-of-origin}"
            is-receipt-signature-required="{$eb-dc/@nonrepudiation-of-receipt}"
            signature-certificate-name="{$eb-de/RosettaNet-binding/@signature-certificate-name}"
            encryption-certificate-name="{$eb-de/RosettaNet-binding/@encryption-certificate-name}"
            cipher-algorithm="{$eb-de/RosettaNet-binding/@cipher-algorithm}"
            encryption-level="{ if ($eb-de/RosettaNet-binding/@encryption-level = 0) then "NONE"
                                else if($eb-de/RosettaNet-binding/@encryption-level = 1) then "PAYLOAD"
                                else "ENTIRE_PAYLOAD" }"
            {-- process-timeout="{$eb-de/RosettaNet-binding/@time-out}" --}
            >
            { if( xf:empty($eb-de/RosettaNet-binding/@retries)) then ()
              else $eb-de/RosettaNet-binding/@retries }
            { if(xf:empty($eb-de/RosettaNet-binding/@retry-interval)) then ()
              else attribute retry-interval
                {concat(($eb-de/RosettaNet-binding/@retry-interval div 1000), " seconds")} }
            { if(xf:empty($eb-de/RosettaNet-binding/@time-out)) then ()
              else attribute process-timeout
                {concat(($eb-de/RosettaNet-binding/@time-out div 1000), " seconds")}
            }
            <transport
              protocol="{$eb-tp/@protocol}"
              protocol-version="{$eb-tp/@protocol-version}"
              endpoint="{$eb-tp/endpoint[1]/@uri}" >
              {
                for $ca in $wlc/wlc/collaboration-agreement
                for $p1 in $ca/party[1]
                for $p2 in $ca/party[2]
                for $tp1 in $wlc/wlc/trading-partner
                for $tp2 in $wlc/wlc/trading-partner
                where $p1/@delivery-channel-name eq $eb-dc/@name
                    and $tp1/@name eq $p1/@trading-partner-name
                    and $tp2/@name eq $p2/@trading-partner-name
                  or $p2/@delivery-channel-name eq $eb-dc/@name
                    and $tp1/@name eq $p1/@trading-partner-name
                    and $tp2/@name eq $p2/@trading-partner-name
                return
                  if ($p1/@trading-partner-name=$tp/@name) then
                    <authentication
                      client-partner-name="{$tp2/@name}"
                      client-certificate-name="{$tp2/client-certificate/@name}"
                      client-authentication="{ if(xf:empty($tp2/client-certificate)) then "NONE" else "SSL_CERT_MUTUAL" }"
                      server-certificate-name="{ if($tp1/@type="REMOTE") then $tp1/server-certificate/@name else "" }"
                      server-authentication="{ if($eb-tp/@protocol="http") then "NONE" else "SSL_CERT" }"
                      > </authentication>
                  else
                    <authentication
                      client-partner-name="{$tp1/@name}"
                      client-certificate-name="{$tp1/client-certificate/@name}"
                      client-authentication="{ if(xf:empty($tp1/client-certificate)) then "NONE" else "SSL_CERT_MUTUAL" }"
                      server-certificate-name="{ if($tp2/@type="REMOTE") then $tp2/server-certificate/@name else "" }"
                      server-authentication="{ if($eb-tp/@protocol="http") then "NONE" else "SSL_CERT" }"
                      > </authentication>
              }
            </transport>
          </rosettanet-binding> }
    </trading-partner>
let $sv :=
  for $cd in $wlc/wlc/conversation-definition
  for $role in $cd/role
  where xf:not(xf:empty($role/@wlpi-template) or $role/@wlpi-template="")
      and $cd/@business-protocol-name="ebXML"
    or $cd/@business-protocol-name="RosettaNet"
  return
    <servicePair>
      <service
        name="{xf:concat($wfPath, $role/@wlpi-template, '.jpd')}"
        description="{$role/@description}"
        note="{$role/@note}"
        service-type="WORKFLOW"
        business-protocol="{xf:upper-case($cd/@business-protocol-name)}"
        >
... (60% more to come)

XQuery is not (only) a query language
• Declarative programming language
• General purpose XML to XML transformation engine
• Designed for optimizability. Primary goal in the design of the language.
  • Impact on the semantics of the language.

The XQuery semantics and the optimization (1)
• Trade-off between optimizability (on one side) and complexity, non-determinism and expressive power (on the other side)
  • Query languages are more optimizable but pay a price on the other side
  • Imperative languages lack optimizability, but their semantics is simpler, deterministic, and richer
• How can we achieve better performance?
  1. Allow sub-computations to be executed in a different order (parallelization, rescheduling)
  2. Make it possible to use various data access paths
  3. Allow lazy evaluation
  4. Allow streaming/pipelining between operations (no materialization of intermediate results)
  5. Allow various evaluation algorithms for the same logical operation
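
Point 4 (streaming/pipelining without materialization) can be illustrated outside XQuery. The following is a minimal Python sketch, not XQuery engine code; the `source` and `pipeline` helpers and the `pulled` log are hypothetical names introduced only for this illustration. Two operations are chained as generators, so asking for the first three results reads only a handful of items from a million-item input.

```python
from itertools import islice

pulled = []  # hypothetical log of the items actually read from the input


def source(n):
    """Simulated input stream of n items; records every item that is pulled."""
    for i in range(n):
        pulled.append(i)
        yield i


def pipeline(items):
    """Filter, then map, as chained generators: no intermediate list is built."""
    evens = (x for x in items if x % 2 == 0)
    return (x * x for x in evens)


first_three = list(islice(pipeline(source(1_000_000)), 3))
print(first_three)   # [0, 4, 16]
print(len(pulled))   # 5: only items 0..4 were ever read
```

An engine that materialized the filtered list first would have read all one million items before producing the first result.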

The XQuery semantics and the optimization (2)
• Allow sub-computations to be executed in a different order (e.g. parallelization, rescheduling)
  • XQuery: no real side-effects, errors are non-deterministic
    • (1, 2, 3, 1 div 0)[1] can return either 1 or an error
• Allow lazy evaluation; make it possible to use various data access paths
  • XQuery: "and" and "or" are commutative, short-circuiting operations; errors are again non-deterministic
    • (1 eq 2) and (1 div 0 eq 2) can return either false or an error
• Allow streaming/pipelining between operations (no materialization of intermediate results)
• Allow various evaluation algorithms for the same logical operation
  • XQuery: unordered { expr }
    • (unordered { (1,2,3,4) })[1] => 1, 2, 3 or 4 can be the result
    • (unordered { //book[@year=1999]/title })[1]
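
Why the same expression may legitimately yield either a value or an error can be seen by modelling two evaluation strategies. This is a rough Python sketch under stated assumptions: each sequence item is modelled as a zero-argument function (a thunk), and `first_lazy`, `first_eager` and `outcome` are hypothetical helpers, not part of any XQuery implementation.

```python
def first_lazy(thunks):
    """Return the first item, evaluating nothing beyond it."""
    for thunk in thunks:
        return thunk()
    raise IndexError("empty sequence")


def first_eager(thunks):
    """Materialize every item, then return the first one."""
    values = [thunk() for thunk in thunks]
    return values[0]


def outcome(compute):
    """Run a computation and report either its value or the string 'error'."""
    try:
        return compute()
    except ZeroDivisionError:
        return "error"


# Models (1, 2, 3, 1 div 0)[1]
sequence = [lambda: 1, lambda: 2, lambda: 3, lambda: 1 // 0]
lazy_result = outcome(lambda: first_lazy(sequence))
eager_result = outcome(lambda: first_eager(sequence))

# Models (1 eq 2) and (1 div 0 eq 2), with either operand evaluated first
left = lambda: 1 == 2
right = lambda: 1 // 0 == 2
left_first = outcome(lambda: left() and right())
right_first = outcome(lambda: right() and left())

print(lazy_result, eager_result)   # 1 error
print(left_first, right_first)     # False error
```

Because the language permits both outcomes, an optimizer is free to pick whichever strategy is cheaper for the data at hand.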

XQuery type system
• XQuery has a powerful (and complex!) type system
• XQuery types are imported from XML Schemas
• Every XML data model instance has a dynamic type
• Every XQuery expression has a static type
• Pessimistic static type inference
  • However, most implementations use optimistic static type inference
    • (1, “foobar”)[1] + 1 => static type error under pessimistic typing, no error under optimistic typing
  • Static typing is an optional feature; few implement it, Galax being the one that implements it correctly
• The goal of the type system is:
  1. detect errors in the queries statically
  2. infer the type (and/or the shape/schema) of the result of valid queries
     #### Types and schemas are not the same thing !!!!
  3. ensure statically that the result of a given query is of a given (expected) type if the input dataset is guaranteed to be of a given type
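
The difference between pessimistic and optimistic inference on `(1, "foobar")[1] + 1` can be sketched as follows. This is a deliberately simplified Python model, not the rules of the XQuery Formal Semantics: a static type is assumed to be a plain set of atomic type names, `+` is assumed to accept numeric operands only, and all function names are hypothetical.

```python
NUMERIC = frozenset({"xs:integer", "xs:decimal", "xs:float", "xs:double"})


def static_type_of_item(member_types):
    """Static type of seq[i] when the position is not analysed:
    the union of the types of all members of the sequence."""
    return frozenset(member_types)


def pessimistic_accepts(operand_type):
    """Accept '+' only if every alternative in the union is numeric."""
    return len(operand_type) > 0 and operand_type <= NUMERIC


def optimistic_accepts(operand_type):
    """Accept '+' as soon as at least one alternative is numeric."""
    return len(operand_type & NUMERIC) > 0


# Models the operand (1, "foobar")[1]
operand = static_type_of_item(["xs:integer", "xs:string"])
print(pessimistic_accepts(operand))  # False: reported as a static type error
print(optimistic_accepts(operand))   # True: accepted, checked again at run time
```

The pessimistic checker rejects a query that would in fact succeed, which is the price paid for guaranteeing the absence of run-time type errors.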

XQuery type system components
• Atomic types
  • xs:untypedAtomic
  • All 19 primitive XML Schema types
  • All user defined atomic types
• Empty, None
• Type constructors (simplification!)
  • Elements: element name {type}
  • Attributes: attribute name {type}
  • Alternation: type1 | type2
  • Sequence: type1, type2
  • Repetition: type*
  • Interleaved product: type1 & type2
    • type1 intersect type2 ?
    • type1 subtype of type2 ?
    • type1 equals type2 ?

XQueryX: the XML syntax for XQuery
• Most XML languages (schema, programming, forms, etc.) have an XML syntax
• “Normal” XQuery doesn’t
  • It has been designed for human programmers
• XQueryX: an alternative, XML-based syntax for XQuery
  • The parsed abstract syntax tree in XML
  • Has an XML Schema for an XQuery program
• Go to the XQueryX specification…

XQueryX: the advantages
• XQuery programs are also data
  • They can be stored, queried, updated and processed in the same way, with the same languages, as the rest of the data (remember Lisp?)
• Code becomes data
  • Automatic code generation, rewriting
  • We can blend data with code, and with schemas -- all have an XML syntax
• Blurs the distinction between data, metadata, code
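
The "code becomes data" idea can be tried on a toy scale. The sketch below is Python using the standard `xml.etree.ElementTree` module; the `<add>`, `<mul>` and `<int>` elements are a hypothetical mini-vocabulary invented for this illustration and are not the XQueryX schema. A program stored as XML is first queried like any other document, then evaluated, then rewritten by constant folding.

```python
import xml.etree.ElementTree as ET

# A tiny "program" in a hypothetical XML syntax: 1 + (2 * 3)
PROGRAM = """
<add>
  <int value="1"/>
  <mul>
    <int value="2"/>
    <int value="3"/>
  </mul>
</add>
"""


def evaluate(node):
    """Interpret the XML tree as an arithmetic expression."""
    if node.tag == "int":
        return int(node.get("value"))
    operands = [evaluate(child) for child in node]
    if node.tag == "add":
        return sum(operands)
    if node.tag == "mul":
        result = 1
        for value in operands:
            result *= value
        return result
    raise ValueError("unknown element: " + node.tag)


def fold_constants(node):
    """Rewrite the program: replace each operator whose operands
    are all literals by a single literal holding its value."""
    children = [fold_constants(child) for child in node]
    rebuilt = ET.Element(node.tag, dict(node.attrib))
    rebuilt.extend(children)
    if node.tag in ("add", "mul") and all(c.tag == "int" for c in children):
        return ET.Element("int", {"value": str(evaluate(rebuilt))})
    return rebuilt


tree = ET.fromstring(PROGRAM)
literal_count = len(tree.findall(".//int"))  # query the code as data
value = evaluate(tree)
folded = fold_constants(tree)
print(literal_count, value, folded.tag, folded.get("value"))  # 3 7 int 7
```

The same tree is handled by a query (`findall`), an interpreter and a rewriter, which is the kind of uniform treatment an XML syntax for programs makes possible.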

Mistakes and limitations of XQuery 1.0
• Mistakes: the name!
• Limitations, missing functionality in XQuery 1.0
  • Dynamic namespace generation
  • Better support for group-by and outer-joins
  • Support for references
  • Continuous queries, window queries
  • Better integration with XSLT
  • Integration with Web Services
  • Assertions
  • Error handling: try/catch
  • Scripting extensions
    • Variable assignment
    • Sequential evaluation mode for side-effects
    • Blocks
  • eval(XQueryX-fragment)
  • Integration with Semantic Search and ontologies

XQuery Use Case Scenarios (1)
• XML transformation language in Web Services
  • Large and very complex queries
  • Input message + external data sources
  • Small and medium size data sets
  • Transient and streaming data (no indexes)
  • With or without schema validation
• XML message brokers
  • Simple path expressions, single input message
  • Small data sets
  • Transient and streaming data (no indexes)
  • Mostly non schema validated data
• Semantic data verification
  • Mostly messages
  • Potentially complex (but small) queries
Deployment tiers labelled on the slide, in order: Mid-tier; Mid-tier; Mid-tier, server, client

XQuery Usage Scenarios (2)
• Data Integration
  • Complex but smaller queries (FLWORs, aggregates, constructors)
  • Large, persistent, external data repositories
  • Dynamic data (via Web Services invocations)
• Large volumes of blended relational and XML data
  • Structured data with unstructured/semistructured extensions
  • Complex queries
  • Read/write data
• Large volumes of XML logs and archives
  • Web services, RFIDs, etc.
  • Complex queries (statistics, analytics)
  • Mostly read only
• Large content repositories
  • Large volume of data (books, manuals, etc.)
  • With or without schema validation
  • Full text essential, update required
Deployment tiers labelled on the slide, as extracted: Mid-tier, server, client; Database server; Content server; Database server

XQuery Usage Scenarios (3)
• Large volumes of distributed textual data
  • XML search engines
  • High volume of data sources
  • Full text, semantic search crucial
• RSS filtering and aggregation
  • High number of input data channels
  • Data is pushed, not pulled
  • Structure of the data very simple, each item bounded size
  • Aggregators using mostly full-text search
• XML data transformation and integration on mobile devices
  • Small XML messages
  • Transformation or aggregation queries
  • Caching is important
  • Streaming very important
Deployment tiers labelled on the slide, in order: Web; Web; Mobile devices

XQuery usage scenarios (4)
• Content re-purposing
  • E.g. customized books and articles
  • E.g. enterprise customized engineering documentation (product requirements, specs, etc.)
• Streamline automatic processing
  • E.g. the creation of the W3C specifications
    • From the same XML document we automatically generate the XQuery, XPath 2.0 and Function Libraries specifications, plus the JavaCC code that implements the XQuery parser, plus the tests that correctly test the grammar. All those are XQuery views of the same XML document!
• (Ajax-style) dynamic Web pages
  • XQuery is a better way to manipulate the XML of Web pages than JavaScript
• Re-programming the Web / scripting the Web / mashups

Criteria for XQuery usages
1. Type of queries (e.g. simple, complex, construction-intensive, full text search intensive)
2. Volume of queries
3. Native XML or virtual XML views of other forms of data
4. XML Schema validated data or not
5. Volume of data per query
6. Number of data sources
7. Transient data vs. persistent data
8. Transacted vs. non-transacted data
9. Push data vs. pull data
10. Typed vs. untyped data
11. Read only data vs. updatable data
12. Distributed vs. centralized data sets
13. Data compressed/encrypted or not
14. Target architectures
15. Customer expectation
Each scenario requires different processing techniques.

XQuery vs. SQL: beyond the tree vs. table
[Figure: two diagrams over the areas "Persistent data", "Transacted data" and "Declarative processing", one positioning SQL and one positioning XQuery]
“XQuery: the XML replacement for SQL?” No, it is more likely that in the long term it will be the declarative replacement for imperative programming languages like Java or C#.

Making XQuery a full XML scripting language (1)
• XQuery is Turing complete, yet “incomplete”
  • Users need to write application logic on their data
  • The killer advantages of XML are erased by Java, JavaScript, or C#
• Huge pressure to integrate native XML processing with existing programming languages:
  • C#, EcmaScript, Python, PHP extensions, etc.
[Figure: diagram with the labels "JavaScript", "XML", "XQuery" and "XML scripting extensions"]

Making XQuery a full scripting language (2)
• Users are already using XQuery as a scripting language!
• Major missing pieces in XQuery:
  • Order of evaluation has to be deterministic
  • Visible updates/side-effects
  • Variable assignment
  • Error handling
[Figure: application stack with the layers "DB storage (supports XML)", "Application logic (Java/C#)", "Communication (XML)" and "Client (XHTML, scripts)", annotated with "XQuery"]

Why will XQuery be successful?
• Let’s look back at why XML is successful.

Reasons for the overwhelming success of XML
• XML is a general data representation format
• XML is human readable
• XML is machine readable
• XML is internationalized (UNICODE)
• XML is platform independent
• XML is vendor independent
• XML is endorsed by the World Wide Web Consortium
• XML is not a new technology (SGML, HTML)
• XML is not only a data representation format, it’s a full infrastructure of technologies

REAL reason for the overwhelming success
• Helps companies to cut costs in information exchange
  1. Avoids the cost of building custom parsers
  2. Good quality, low cost parsing software becomes a commodity
  3. Minimizes the cost of training
  4. Avoids the cost associated with schemas (the evil of all evils :)
• … sometimes at the expense of increased hardware cost due to (bad) parsing performance…
  • But that’s OK as long as it can be parallelized on cheap machines

The cost of schemas
• Methodology we teach the database students:
  • Gather requirements from the application domain
  • Design (and agree on) a schema
  • Write the code (queries + application)
  • Populate the database
  • Execute the code
• Agreeing on schemas is the most expensive step in software engineering
  • Plus, it prohibits evolution and customization
• The current information management technology (Java, SQL) doesn’t allow us to apply the previous steps in another order, nor to bypass the schema design
  • XML does.
  • Semi-structured data is the new black in the IT industry :-)
    • GoogleBase, etc.

Processing XML data
• Huge amount of XML information, and rapidly growing
• We need to “process” it
  • Store it efficiently
  • Verify the correctness
  • Filter, search, select
  • Transform, normalize, reshape
  • Join, aggregate
  • Create new data
  • Update the data
  • Take actions based on the existing data
• XQuery has been designed as a solution to the XML processing problem

Alternative solutions to XML processing
• Java, C + APIs (e.g. DOM, SAX, JSR170)
• Perl, PHP, JavaScript + APIs
• XLinq (MSFT C# extension)
• Code generators
• SQL/XML
• XSLT
• See the SIGMOD’06 tutorial on “XML programming techniques”

Why XQuery will be successful
• XQuery helps companies cut costs on information processing
  • Avoids the cost of building custom XML processors
  • Improves productivity
  • Good quality, low cost XML processing software (will) become a commodity
  • (Will) minimize the cost of training
  • (Partially) avoids the cost associated with schemas
  • (Will) guarantee best performance

XQuery and the productivity
• Manipulates only XML
  • Dealing with two type systems (e.g. Java integer vs. XML integer) is extremely tedious
• Handles all XML correctly
  • Typed and untyped, all corner cases of XML (e.g. namespaces)
• Declarative
  • Smaller amount of code to write
  • Fewer decisions to make as a programmer
    • Streaming or not, indexes, parallelization
• Possible to generate automatically

XQuery performance
• Lots of folklore in industry
  • “Everything related to XML has to be slow”
  • No, writing manually optimized C# or Java over SAX isn’t the answer -- it is not robust to evolutions !!
• Unfortunately:
  • We don’t have benchmarks yet
  • We don’t have good XML processing literature yet
• Situation now:
  • In DB, XQuery is executed with the same engines as SQL
  • Good new engines, and improving fast (BEA, Saxon, eXist, Berkeley DB, etc.)
• XQuery has better chances for good performance than any of the alternatives

XQuery automatic optimization
• Feasible (done in many implementations) in XQuery:
  • Automatic data partitioning, clustering and placement
  • Automatic use of secondary access structures (indexes)
  • Automatic decision about streaming vs. materialization
  • Automatic caching
  • Parallelization of code
  • Program decomposition
  • Program shipping vs. data shipping
  • Rewriting based on assertions
  • Detecting code (in)dependence
  • Rescheduling/reordering of the code
• Impossible (or much harder) in XLinq, Perl, JavaScript, etc.
  • Global dataflow required
    • What operations are executed on each data item
    • What data items will be processed by each operation
  • Declarativity of XQuery helps

Frequent criticism of XQuery
• “The performance of XQuery engines isn’t acceptable”
  • Why is the alternative any better !?
  • For XQuery, there is hope. For XLinq, PHP, little.
  • Take ideas from both database optimization and programming languages compilation, plus innovate
  • Lots of fun research to be done !!!
• “It will never perform as well as if we write the application in Java + SAX”
  • Maybe true today, not sure in the near future
  • Optimizing a single XML application vs. optimizing an XQuery(P) engine (i.e. all XML applications)
• “There are no libraries”
  • Let’s build some…
  • We will not need the same libraries as in Java or C#
    • Different level of abstraction
    • The target applications are different
• “XQuery is too complicated”
  • !?

Frequent criticism (2)
• “Programmers do not know how to program declaratively”
  • What about SQL !?
  • You are the generation who will decide this.
• “This would require users to learn a new language”
  • Smooth transition, easy integration of pieces written in other languages (thanks, Web Services!)

How bad is the bleeding edge?
• Yes, XQuery is new
• All solutions in XML processing are on the bleeding edge at this point
  • XLinq is worse in fact
  • XQuery is much older
  • XQuery is subject to open public scrutiny, which ensures better quality

Potential impact of XQuery on Web X.0 architectures
• Web 2.0: so much marketing, very little technical substance
• However, we all know that we’ve reached the limits of Web 1.0
  • User experience -- the Web is becoming annoying
    • Too static, no customization, no push data
  • System builder experience -- building the Web is really expensive, and hard
• We need something new. What?

Potential impact of XQuery on Web X.0 architectures (cont.)
• Imagine the following scenario:
  • XQuery becomes a full programming language, integrated with Web Services (XQueryP)
  • Good implementations of XQueryP become available in open source, and commodity
  • Databases will implement XQueryP
  • XML repositories will support an HTTP-based simple query protocol (OpenSearch-style, but adapted to XML and XQuery)
  • XQueryP plug-ins in browsers (Ajax ++)
• What will happen to:
  • SQL, Java !?
  • Perl, PHP, JavaScript !?
  • Client-server !?
  • Thin clients !?