
Extreme Java
G22.3033-007

Session 3 - Sub-Topic 6
XML Information Retrieval

Dr. Jean-Claude Franchitti

New York University
Computer Science Department
Courant Institute of Mathematical Sciences

Agenda
  Applications of XML to Database Technology
  XML Query Languages
    XPath
    XML Queries
  XQuery: A Query Language for XML
  XML Query Engines
  XML Object Persistence
  Advanced XQuery Concepts
  Presentation Oriented Publishing (POP) Frameworks


XML-Based Retrieval Development
  XML Software Development Methodology
    Language + Stepwise Process + Tools
    Rational Unified Process (RUP) vs. “XML Unified Process”
  XML Application Development Infrastructure
    Metadata Management (e.g., XMI)
    XML Query Engine (3rd party software)
    XML Tools (e.g., XML Editors)
  XML Applications Involved in the Rendering Phase:
    Application(s) of XML
    XML-based applications/services (markup language mediators)
      MOM, POP, Persistence service
    Application Infrastructure Frameworks

XML Data Retrieval Patterns
  XML Data Retrieval Operations
    Access
    Query
    Manipulate
  Multiple XML Data Sources Integration
  XML Message Filtering
  DBMS Data Views
  Database System Interfacing


Part I

Applications of XML to Database Technology

Towards XML Application Services
  Processing
    DOM Extensions
    Binding Extensions
    Component Frameworks (reusable component models)
    Model-Based Automation (MDA)
  Rendering
    DOM 2.1.0, SAX 2.0, JAXP 1.1 & TraX, XSL-FO 1.0
    Component Frameworks
  Querying
    XQuery 1.0, XSLT 1.1/2.0, XPath 1.0/2.0
  Security (signatures, encryption/decryption, etc.)
  Etc.


Retrieval Software Development
  Languages (XML-QL, YaTL, XQL, etc.)
    Data Model + Operations + Syntax
  Process (“XUP”)
  Frameworks
    Custom Engine
      e.g., XQEngine
    Translation to SQL
      e.g., DB2XML, Oracle’s XML I/F
    Translation to OQL
  XML Query Infrastructure
    XPath Processors: Saxon 6.1, Xalan-J 2.1.0
    XQuery Processors
    SQLX

XML Query History
  SGML Query Facilities
  Ad-hoc Approach to Query Languages
  02/98: XQL Proposal
  08/98: XML-QL Submission
  12/98: W3C QL’98 Workshop
    Candidate Requirements for XML Query
    Database Desiderata for an XML Query Language
  11/99: XPath Recommendation


W3C XML Query WG
  07/99: WG Proposal
  09/99: WG Official Inception
  Today:
    30 W3C Member Companies
    11 Meetings, 60+ Telcons
    Heartbeat every Three Months
    Proposed Recommendation(s)
  Goals:
    XML Query Data Model for XML Documents
    Query Operators for XML Query Data Model
    Query Language Based on XML Query Operators

W3C’s Related Standards
  XML Query Specifications:
    XML Query Requirements (02/16/01 - orig. 01/00)
    XML Query Use Cases (06/08/01 - orig. 08/00)
    XQuery 1.0 and XPath 2.0 Data Model (06/07/01 - orig. 05/00)
    XQuery 1.0 Formal Semantics (06/07/01 - orig. 12/00)
    XQuery 1.0: An XML Query Language (06/07/01)
    XML Syntax for XQuery 1.0 (XQueryX) (06/07/01)
  XPath 2.0 Specifications
    XPath Requirements Version 2.0 (02/15/01)
    XQuery 1.0 and XPath 2.0 Data Model (06/07/01)


Related XML Technologies
  XPath
  XSL
  XPointer
  XML Schema
  XML Infoset
  WAI
  Internationalization
  IETF DASL
    Distributed Authoring Searching and Locating
    http://www.ics.uci.edu/pub/ietf/dasl/

Properties of RDBMS Queries
  Pattern + Filter + Construction clause
  Construction clause may have ordering subclauses
  Queries may perform joins across multiple input sets
  Queries may generate intermediate variables or path expressions


Mapping XML to an RDBMS
  SQL-like queries that return XML documents
    e.g., Microsoft IIS + SQL Server
    e.g., Oracle Database Server
  Broad spectrum of possible mappings
  Hierarchical vs. limited RDBMS tree structure

JDBC Refresher
  See section 6.2 of XML and Java textbook
  Importing JDBC Package
  Loading a JDBC Driver
  Connecting to a Database
  Submitting a Query


XML Embedded in SQL (SQLX)

SQL Embedded in XML
  See Section 6.3 of XML and Java textbook
  Front-end to RDBMS that provides XML-based Input/Output
  Translates XML query into sequence of JDBC calls, and converts the result to a DOM structure which is returned
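
The last step above, converting query results to a DOM structure, can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: a list of maps stands in for the rows a JDBC ResultSet would yield, and the class name RowsToDom and the element names resultset and row are hypothetical, not taken from the textbook or from any product.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

class RowsToDom {
    // Builds <resultset><row><column>value</column>...</row>...</resultset>.
    // Element names are illustrative assumptions; column names must be valid XML names.
    static Document toDom(List<Map<String, String>> rows) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
        Element root = doc.createElement("resultset");
        doc.appendChild(root);
        for (Map<String, String> row : rows) {
            Element rowElem = doc.createElement("row");
            root.appendChild(rowElem);
            for (Map.Entry<String, String> column : row.entrySet()) {
                Element colElem = doc.createElement(column.getKey());
                colElem.setTextContent(column.getValue());
                rowElem.appendChild(colElem);
            }
        }
        return doc;
    }

    // Hypothetical sample data standing in for rows read through JDBC.
    static List<Map<String, String>> sampleRows() {
        List<Map<String, String>> rows = new ArrayList<>();
        Map<String, String> first = new LinkedHashMap<>();
        first.put("partno", "P1");
        first.put("price", "10.50");
        rows.add(first);
        Map<String, String> second = new LinkedHashMap<>();
        second.put("partno", "P2");
        second.put("price", "7.25");
        rows.add(second);
        return rows;
    }

    public static void main(String[] args) {
        Document doc = toDom(sampleRows());
        System.out.println(doc.getElementsByTagName("row").getLength() + " rows converted");
    }
}
```

A real front-end would read the rows from a ResultSet and take the element names from the query or a mapping; the sketch fixes them to keep the conversion step visible.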


Part II

XML Query


XML Query Requirements (Part I)
  General:
    Declarative Language
    Readable XML Syntax
    Protocol Independence
    Standard Error Conditions
    Support for Future Updates
  Data Model
    Based on XML Infosets
    Namespace Aware
    Support for XML Schema Data Types
    Support Inter/Intra Document References

XML Query Requirements (Part II)
  Query Functionality:
    Operators on All Data Types
    Text Operators Across Element Boundaries
    Hierarchies and Sequences
    Combination of Data from Various Locations
    Aggregation and Sorting
    Combination of Operators (Queries as Operands)
    Support NULL values
    Preservation of Structure/Identity
    Operations on Names/Schemas
    Extensibility & Closure


XML Query Use Cases
  Approach
    Description, DTD/Schema, Input, Queries, Results
  Existing Use Cases
    XMP (examples)
    TREE (queries that preserve hierarchy)
    SEQ (queries based on sequence)
    R (relational data access)
    TEXT (text search)
    NS (namespace-based queries)
    PARTS (recursive parts queries)
    REF (queries based on references)

XML Query Data Model
  Information Presented to a Query Processor
  Augmented Infoset:
    XML Schema Data Types (PSVI)
    Document Collections
    References
  Node-Labeled Tree Constructor Model with Node Identity
  Infoset Mapping to Query Data Model is Defined as Part of the Specification


XML Query Data Model (continued)
  Nodes
    Node = DocNode | ElemNode | AttrNode | ValueNode | NSNode | PINode | CommentNode | InfoItemNode
  XML Schema Primitive Types
    string, boolean, ID, IDREF, decimal, etc.
  Collections
    list [T], set {T}, bag {|T|}, disjoint/union (T1 | T2), tuple (T1, …, Tn)
  References
    ref(T)

XML Query Algebra
  Defines Static and Dynamic Semantics
  Static Semantics are Type Inference Rules
    Relate Algebra Expressions to Types
  Dynamic Semantics are Value Inference Rules
    Relate Algebra Expressions to Values
  Issues:
    Algebra Type System Alignment with XML Schema
    Operators on Schema Simple Types not Defined
    Lexical Representation of Schema Simple Types not Defined


Constructors
  Construct Values in XML Query Data Model
    attrNode : (Ref(QNameValue), Ref(ValueNode)) -> AttrNode
    ValueNode = QNameValue | StringValue | DecimalValue | ...
    qnameValue : (uriReference | null, string, Ref(Def_QName)) -> QNameValue
    decimalValue : (decimal, Ref(Def_decimal)) -> DecimalValue

  <part price="10.50"/>
  <xsd:attribute name="price" type="xsd:decimal"/>

  attrNode(ref(qnameValue(null, "price", Ref(Def_QName))),
           ref(decimalValue(10.50, Ref(Def_decimal))))

Accessors
  Deconstruct Values in XML Query Data Model
    name : AttrNode -> Ref(QNameValue)
    value : AttrNode -> Ref(ValueNode)
    type : AttrNode -> Ref(ElemNode)

  <xsd:attribute name="price" type="xsd:decimal"/>
  <part price="10.50"/>

  name(A1) = ref(qnameValue(null, "price"))
  value(A1) = ref(decimalValue(10.50, Ref(Def_decimal)))
  type(A1) = <!-- data model representation of simple type decimal -->
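
To make the split between constructing and deconstructing values concrete, here is a rough Java analogue of the price attribute above. It is a sketch only: the class names mirror the data model names, but the Ref(...) wrappers, the Def_* type references, and the type accessor are deliberately left out, and nothing here comes from a real XQuery implementation.

```java
import java.math.BigDecimal;

class DataModelSketch {
    // ValueNode = QNameValue | DecimalValue | ... (only two variants are sketched)
    interface ValueNode {}

    static final class QNameValue implements ValueNode {
        final String uri;    // null when the name is in no namespace
        final String local;
        QNameValue(String uri, String local) {
            this.uri = uri;
            this.local = local;
        }
    }

    static final class DecimalValue implements ValueNode {
        final BigDecimal value;
        DecimalValue(BigDecimal value) {
            this.value = value;
        }
    }

    static final class AttrNode {
        private final QNameValue name;
        private final ValueNode value;

        // Plays the role of the attrNode(...) constructor function.
        AttrNode(QNameValue name, ValueNode value) {
            this.name = name;
            this.value = value;
        }

        // Play the roles of the name(...) and value(...) accessor functions.
        QNameValue name() { return name; }
        ValueNode value() { return value; }
    }

    // Corresponds to <part price="10.50"/> with price declared as xsd:decimal.
    static AttrNode priceAttribute() {
        return new AttrNode(new QNameValue(null, "price"),
                            new DecimalValue(new BigDecimal("10.50")));
    }

    public static void main(String[] args) {
        AttrNode a1 = priceAttribute();
        DecimalValue v = (DecimalValue) a1.value();
        System.out.println(a1.name().local + " = " + v.value);
    }
}
```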


Part III

XML Query Languages

XQuery
  Functional Language
    Query Represented as an Expression
    Expressions can be Nested without Restriction
  Input/Output of an XQuery are Instances of the XML Query Data Model
  Based on OQL, SQL, XML-QL, XPath
  Readable XML Syntax


XQuery Expressions
  Path Expressions
  Element Constructors
  FLWR Expressions
  Expressions with Operators/Functions
  Conditional Expressions
  Quantified Expressions
  List Constructors
  Expressions to Test/Modify Datatypes

XQuery Path Expressions
  Abbreviated XPath 1.0 Syntax
    Find figure(s) with caption “Tree Frogs” in second chapter of “zoo.xml”
      document("zoo.xml")/chapter[2]//figure[caption = "Tree Frogs"]
  Extensions
    Dereference Operator
    Range Predicate
    Document Formats
    Find captions of figures referenced by <figref> elements in “Frogs” chapter of “zoo.xml”
      document("zoo.xml")/chapter[title = "Frogs"]//figref/@refid->fig/caption
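
The two path queries above can be approximated with the XPath 1.0 engine that ships with the JDK (javax.xml.xpath). The sketch below makes several assumptions: the inline document is a made-up stand-in for zoo.xml with a hypothetical <zoo> root element, figures carry an id attribute, and the -> dereference operator, which XPath 1.0 lacks, is replaced by a predicate that joins figure/@id against figref/@refid.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ZooPathDemo {
    // Hypothetical stand-in for zoo.xml; the structure is an assumption.
    static final String ZOO_XML =
        "<zoo>"
      + "<chapter><title>Toads</title>"
      + "<figure id='f1'><caption>Tree Frogs</caption></figure></chapter>"
      + "<chapter><title>Frogs</title><section>"
      + "<figure id='f2'><caption>Tree Frogs</caption></figure>"
      + "<figure id='f3'><caption>Bull Frogs</caption></figure>"
      + "<para>See <figref refid='f3'/></para>"
      + "</section></chapter>"
      + "</zoo>";

    // Evaluates an XPath 1.0 expression and returns the string value of each selected node.
    static List<String> select(String expr) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, new InputSource(new StringReader(ZOO_XML)),
                          XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Ids of figures captioned "Tree Frogs" in the second chapter.
    static List<String> treeFrogFigureIds() {
        return select("/zoo/chapter[2]//figure[caption = 'Tree Frogs']/@id");
    }

    // Captions of figures referenced from the "Frogs" chapter (join instead of ->).
    static List<String> referencedCaptions() {
        return select("//figure[@id = /zoo/chapter[title = 'Frogs']//figref/@refid]/caption");
    }

    public static void main(String[] args) {
        System.out.println(treeFrogFigureIds());
        System.out.println(referencedCaptions());
    }
}
```

The join predicate relies on XPath 1.0 node-set comparison, which is true when any pair of nodes from the two sets has equal string values.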


XQuery Element Constructor
  Start/End Tag + Enclosed List of Expressions
  Generate an element with a computed name that contains nested elements:

    <$tagname>
      <description> $d </description>
      <price> $p </price>
    </$tagname>

XQuery For Let Where Return (FLWR)
  FOR and LET Clause
    Generate a List of Tuples that Preserves Doc Order
  WHERE Clause
    Applies a Predicate to Eliminate Some Tuples
  RETURN Clause
    Executed on Resulting Tuples -> Ordered Output List
  Syntax:
    FOR var IN expr WHERE expr RETURN expr
    LET var := expr


FLWR Sample Expressions
  List titles of books published by Morgan Kaufmann in 1998
    FOR $b IN document("bib.xml")//book
    WHERE $b/publisher = "Morgan Kaufmann"
      AND $b/year = "1998"
    RETURN $b/title
  List each publisher and the average price of its books
    FOR $p IN distinct(document("bib.xml")//publisher)
    LET $a := avg(document("bib.xml")/book[publisher = $p]/price)
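
For readers more at home in Java than in XQuery, the two FLWR samples map onto ordinary loops over a DOM tree: FOR becomes iteration, WHERE a filter, LET a local computation, and RETURN the collected output. The sketch below assumes a small made-up stand-in for bib.xml (titles, publishers, and prices are invented), and the class and method names are hypothetical.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FlwrDemo {
    // Hypothetical stand-in for bib.xml.
    static final String BIB_XML = "<bib>"
      + "<book><title>Book A</title><publisher>Morgan Kaufmann</publisher>"
      + "<year>1998</year><price>40</price></book>"
      + "<book><title>Book B</title><publisher>Morgan Kaufmann</publisher>"
      + "<year>1999</year><price>60</price></book>"
      + "<book><title>Book C</title><publisher>Addison-Wesley</publisher>"
      + "<year>1998</year><price>30</price></book>"
      + "</bib>";

    // All <book> elements in document order.
    static List<Element> books() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(BIB_XML)));
            NodeList nodes = doc.getElementsByTagName("book");
            List<Element> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add((Element) nodes.item(i));
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    // Text of the first descendant element with the given name.
    static String child(Element e, String name) {
        return e.getElementsByTagName(name).item(0).getTextContent();
    }

    static List<String> titles(String publisher, String year) {
        List<String> result = new ArrayList<>();
        for (Element b : books()) {                          // FOR $b IN ...//book
            if (child(b, "publisher").equals(publisher)
                    && child(b, "year").equals(year)) {      // WHERE ... AND ...
                result.add(child(b, "title"));               // RETURN $b/title
            }
        }
        return result;
    }

    static Map<String, Double> averagePriceByPublisher() {
        List<Element> all = books();
        Map<String, Double> result = new LinkedHashMap<>();
        for (Element b : all) {
            String p = child(b, "publisher");
            if (result.containsKey(p)) {
                continue;                                    // distinct(...): visit each publisher once
            }
            double sum = 0;
            int count = 0;
            for (Element other : all) {                      // book[publisher = $p]/price
                if (child(other, "publisher").equals(p)) {
                    sum += Double.parseDouble(child(other, "price"));
                    count++;
                }
            }
            result.put(p, sum / count);                      // LET $a := avg(...)
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(titles("Morgan Kaufmann", "1998"));
        System.out.println(averagePriceByPublisher());
    }
}
```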

XQuery Operators and Functions
  Infix/Prefix Operators
    e.g., Infix Operators BEFORE and AFTER
  Parenthesized Expressions
  Arithmetic/Logical Operators
  Collection Operators
    e.g., UNION, INTERSECT, EXCEPT
  Functions Can Be Defined in XQuery


Sample Operators and Functions
  Find max depth of “partlist.xml”

    NAMESPACE xsd = "http://www.w3.org/2001/03/XMLSchema-datatypes"

    FUNCTION depth(ELEMENT $e) RETURNS xsd:integer
    {
      IF empty($e/*) THEN 1
      ELSE max(depth($e/*)) + 1
    }

    depth(document("partlist.xml"))
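
The recursive depth function translates almost line for line into Java over a DOM tree. In this sketch the XML is passed in as a string rather than read from partlist.xml, the recursion starts at the document element, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

class DepthDemo {
    // An element with no child elements has depth 1; otherwise the depth is
    // one more than the deepest child element.
    static int depth(Element e) {
        int deepestChild = 0;
        for (Node c = e.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c instanceof Element) {              // $e/* selects child elements only
                deepestChild = Math.max(deepestChild, depth((Element) c));
            }
        }
        return deepestChild + 1;                     // a leaf yields 0 + 1 = 1
    }

    // Parses the given XML text and returns the depth of its document element.
    static int depthOf(String xml) {
        try {
            return depth(DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)))
                .getDocumentElement());
        } catch (Exception ex) {
            throw new IllegalArgumentException(ex);
        }
    }

    public static void main(String[] args) {
        System.out.println(depthOf("<partlist><part><part/></part><part/></partlist>"));
    }
}
```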

XQuery Conditional Expressions

    FOR $h IN //holding
    RETURN
      <holding>
        $h/title
        IF $h/@type = "Journal" THEN $h/editor
        ELSE $h/author
      </holding> SORTBY (title)


XQuery Quantified Expressions
  Example 1:
    FOR $b IN //book
    WHERE SOME $p IN $b//para SATISFIES
      contains($p, "sailing")
      AND contains($p, "windsurfing")
    RETURN $b/title
  Example 2:
    FOR $b IN //book
    WHERE EVERY $p IN $b//para SATISFIES
      contains($p, "sailing")
    RETURN $b/title
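
SOME and EVERY correspond closely to the anyMatch and allMatch operations of Java streams, which may help when reading the two examples. The sketch below uses invented data (a map from book title to paragraph texts) in place of //book and $b//para; all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class QuantifiedDemo {
    // Hypothetical data: book title -> text of each paragraph in that book.
    static Map<String, List<String>> books() {
        Map<String, List<String>> m = new LinkedHashMap<>();
        m.put("Water Sports", Arrays.asList("sailing and windsurfing", "rowing"));
        m.put("Sailing Only", Arrays.asList("sailing basics", "advanced sailing"));
        m.put("Chess", Arrays.asList("openings"));
        return m;
    }

    // Example 1: SOME $p IN $b//para SATISFIES contains(...) AND contains(...)
    static List<String> someParaMentionsBoth() {
        List<String> titles = new ArrayList<>();
        for (Map.Entry<String, List<String>> b : books().entrySet()) {
            boolean some = b.getValue().stream()
                .anyMatch(p -> p.contains("sailing") && p.contains("windsurfing"));
            if (some) {
                titles.add(b.getKey());
            }
        }
        return titles;
    }

    // Example 2: EVERY $p IN $b//para SATISFIES contains($p, "sailing")
    static List<String> everyParaMentionsSailing() {
        List<String> titles = new ArrayList<>();
        for (Map.Entry<String, List<String>> b : books().entrySet()) {
            boolean every = b.getValue().stream().allMatch(p -> p.contains("sailing"));
            if (every) {
                titles.add(b.getKey());
            }
        }
        return titles;
    }

    public static void main(String[] args) {
        System.out.println(someParaMentionsBoth());
        System.out.println(everyParaMentionsSailing());
    }
}
```

Note that allMatch returns true for a book with no paragraphs at all, so such a book would also appear in the second result.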

XQuery List Constructors
  List encloses zero or more expressions in square brackets, separated by commas
  List of member variables: [$x, $y, $z]
  Empty list: [ ]


XQuery Operators on Data Types
  INSTANCEOF (instance, type)
  CAST
    Convert value from one datatype to another
  TREAT
    Causes the query processor to treat an expression as if its datatype were a subtype of its static type

XQuery Outstanding Issues
  Integration with XPath 2.0
  Alignment of XQuery and XML Query Algebra Syntax
  Internationalization
    e.g., Collation Sequences for Sorting, Strings ops
  XML Query Syntax
  Operators and Functions TBD


Part IV

XML Query Engines and Advanced Concepts

Various Approaches
  XQEngine
    Full-text search engine for XML
    Java APIs available
    W3C XQuery Specification Support
  DB2XML
    Standalone tool (with GUI or command line)
    Servlet to dynamically generate XML documents
    DB2XML API
  Oracle XML Developer Kit (XDK)
  Microsoft SQL Server support for XML


XML Object Persistence
  Started as SODL and XMOP
    Simple Object Definition Language
    XML Metadata Object Persistence
  XML and JavaBeans integration (e.g., BML, Coins, etc.)
  XML and EJB integration
    See XML Development with Java 2 (chapter 8)
  XML serialization for Java (e.g., Koala, etc.)
  SOAP - XML-RPC protocol

Advanced XQuery Concepts
  Mainstream XQuery Engines
    Software AG’s QuiP
    H. Katz XQEngine
  Experiment with Complex Queries and QuiP


POP Frameworks
  Client-Side POP
    IE5
  Server-Side POP
    Cocoon & XSP
    Rocket
    CPAN’s Perl Framework


Part V

Conclusions


Summary
  Applying XML to Database Technology allows database data to be viewed as an XML document
  XML Query is based on a well-defined Data Model and Algebra
  Various syntaxes are possible for an XML Query Language
  XML Query Engines are infrastructure components that support XML Query

Summary (continued)
  Binding approaches are currently implemented between XML and JavaBeans/EJBs
  Software AG’s QuiP implements complex query processing as per XQuery 1.0
  Server-side POP is the approach of choice for XML processing