By Intan, Chan & Lina February, 2003 XML Databases



Contents

1. Introduction
2. XML Databases
3. XML-Enabled Databases
4. Native XML Databases
5. XML Database Products, Benchmarks and Cost Issues
6. XML Database Applications
7. Future Trends
8. Conclusion

1. Introduction

What is XML?

• XML (eXtensible Markup Language) is an open standard for describing data from the W3C (World Wide Web Consortium)

• used for defining data elements on a Web page and business-to-business documents

• uses a tag structure similar to that of HTML

• HTML uses predefined tags, but XML allows tags to be defined by the developer of the page

1. Introduction

Data-centric documents

• are documents that use XML as a data transport

• designed for machine consumption

• characterised by fairly regular structure, fairly consistent organisation of detail and fine-grained data, with little or no mixed content

• examples are sales orders, flight schedules, scientific data, and stock quotes

1. Introduction

Document-centric documents

• designed for human consumption

• characterised by less regular or irregular structure, larger grained data and highly mixed content

• examples are books, email, advertisements, and almost any hand-written XHTML document
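
One structural signal that separates the two kinds of document is mixed content. The sketch below, in Python with the standard-library ElementTree parser, checks whether any element mixes child elements with text. The function name `has_mixed_content` and both sample documents are illustrative assumptions, not taken from the slides.

```python
import xml.etree.ElementTree as ET

def has_mixed_content(element):
    """Return True if any element holds both child elements and non-whitespace text."""
    for node in element.iter():
        children = list(node)
        if not children:
            continue
        own_text = (node.text or "").strip()
        tail_text = any((child.tail or "").strip() for child in children)
        if own_text or tail_text:
            return True
    return False

# Hypothetical data-centric document: regular structure, no mixed content.
order = ET.fromstring(
    "<order><customer>ACME</customer><item><price>9.95</price></item></order>"
)
# Hypothetical document-centric fragment: text interleaved with markup.
note = ET.fromstring("<p>Please read the <em>whole</em> manual before use.</p>")

print(has_mixed_content(order))
print(has_mixed_content(note))
```

A real classifier would look at more than this one property; the slides also mention regularity of structure and granularity of data.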

1. Introduction

Data, Documents and Databases

• the distinction between data-centric and document-centric documents is not always clear

• characterising documents as data-centric or document-centric helps to decide what kind of database to use

• data-centric documents are generally stored in a traditional database, such as a relational, object-oriented, or hierarchical database

• document-centric documents are generally stored in a native XML database or a content management system

2. XML Databases

• XML & Database: two very different concepts driven by two very different communities with different expectations and requirements.

• Yet, an increasing demand for consistent and reliable methods to manage XML data suggests the marriage of the two.

2. XML Databases

• Is XML a database? An XML document is a database only in the strictest sense of the term: it is a collection of data.

• XML supports some of the facilities commonly found in databases, such as storage, schemas, query languages and programming interfaces.

2. XML Databases

• It may be possible to use an XML document as a database, but only in a scenario with a small volume of data, few users, and modest performance requirements.

• It will not function satisfactorily in a production environment that has many users, strict data integrity requirements, and a need for good performance.

2. XML Databases

• An XML document database (or more generally an XML database, since every XML database must manage documents) can be defined to be a collection of XML documents and their parts, maintained by a system having capabilities to manage and control the collection itself and the information represented by that collection.

2. XML Databases

• XML databases are schema agnostic.

• Capable of managing XML data in a way that supports extensibility and granular access simultaneously.

• Ideal for information that is likely to change unpredictably.

• Unique and targeted at solving new and different problems.

2. XML Databases

• XML databases manage active data that is being shared between legacy systems, partners, and web services.

• The management process can be automated, audited, and dynamically improved.

• XML’s inherent flexibility and extensibility make it easy to design and build an infrastructure for business information interoperability that is designed for change.

2. XML Databases

• Further demands:
  • Closely related W3C specifications that extend the capabilities specified in XML 1.0 should be accommodated.
  • XML database systems should include Internet resource management.
  • An SGML document was always associated with a DTD, and the DTD could be used in many different ways to support the data management.

2. XML Databases

Benefits of XML Databases

• High performance: designed to handle very large data volumes quickly, drawing on technologies built for fast execution.

• Data independence: XML databases inherit the benefits of XML, which is easy to use, flexible and extensible.

• Quick access and high-speed retrieval: provide fast access to any type of stored data, either from a single resource or from a distributed system across a network.

2. XML Databases

Benefits of XML Databases (cont’d)

• Manage and access all types of data: they even allow storage of, and access to, audio, video or other files, and handle several levels of nested objects.

• Support for major application servers: with proper API services, an XML database can play the role of a content store.

• Reduce business production costs: supporting the automation of business processes from order through delivery lowers production cost significantly.

2. XML Databases

Data Models

• Modeling document collections as well as enterprises: the data model must support the description of the documents themselves. The W3C has defined abstract structures for XML documents in four different specifications, namely the Infoset model, the XPath data model, the DOM model, and the XQuery 1.0 and XPath 2.0 data model. These describe the documents that are often used to encode enterprise data.

2. XML Databases

Data Models

• Conceptual model for documents: the conceptual model incorporates not only all the objects and relationships, but also all the document components that are to be made available to any XML application.

2. XML Databases

Data Models

• Well-defined equivalence: the W3C has proposed that Canonical XML be used to compare two documents for equivalence. Another possible solution is to define document equivalence in terms of a model that includes all document features, after which equivalence can be specified by applying document equivalence to the results of application-specific transformations.
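
As a rough illustration of comparing documents by canonical form, the following Python sketch uses the standard library's `xml.etree.ElementTree.canonicalize` (available from Python 3.8). That function implements Canonical XML 2.0, a later version than the one current when these slides were written. The helper name `canonically_equal` and the sample documents are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

def canonically_equal(xml_a: str, xml_b: str) -> bool:
    """Compare two XML documents by their canonical (C14N 2.0) serialisations."""
    return ET.canonicalize(xml_a) == ET.canonicalize(xml_b)

# Same information, different surface syntax: attribute order, quoting,
# extra whitespace inside the tag, and empty-element notation all differ.
doc_a = '<order id="7" status="open"><item/></order>'
doc_b = "<order status='open'   id='7'><item></item></order>"
# A genuinely different document: the id attribute has another value.
doc_c = '<order id="8" status="open"><item/></order>'

print(canonically_equal(doc_a, doc_b))
print(canonically_equal(doc_a, doc_c))
```

Canonicalisation only removes syntactic variation. Whether whitespace between elements or element order should matter is an application-level decision, which is where the second approach on the slide comes in.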

2. XML Databases

Query Languages

• Three kinds of query language are currently in use (Bourret, 2003):
  • Template-Based Query Language
  • SQL-Based Query Language
  • XML Query Language

2. XML Databases

Template-based Query Language

• most common query language that returns XML from relational databases

• no predefined mapping between the document and the database

• SELECT statements are embedded in a template and the data transfer software processes the results
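
The template idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `<SelectStmt>` element name, the `<Results>` and `<Row>` wrappers, the `Flights` table and the function `expand_template` are all hypothetical, and SQLite stands in for whatever relational database a real product would use.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical template: ordinary XML with an embedded SELECT statement.
TEMPLATE = """<FlightInfo>
  <Introduction>Flights with available seats:</Introduction>
  <SelectStmt>SELECT Airline, FltNumber FROM Flights ORDER BY FltNumber</SelectStmt>
</FlightInfo>"""

def expand_template(template_xml, connection):
    """Replace each <SelectStmt> with a <Results> element holding one <Row> per result row."""
    root = ET.fromstring(template_xml)
    for parent in list(root.iter()):
        for index, child in enumerate(list(parent)):
            if child.tag != "SelectStmt":
                continue
            cursor = connection.execute(child.text)
            columns = [description[0] for description in cursor.description]
            results = ET.Element("Results")
            results.tail = child.tail  # keep the template's whitespace
            for record in cursor:
                row = ET.SubElement(results, "Row")
                for name, value in zip(columns, record):
                    ET.SubElement(row, name).text = str(value)
            parent[index] = results  # swap the statement for its results
    return root

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Flights (Airline TEXT, FltNumber INTEGER)")
connection.executemany(
    "INSERT INTO Flights VALUES (?, ?)", [("ACME", 123), ("Baz Air", 456)]
)
expanded = expand_template(TEMPLATE, connection)
print(ET.tostring(expanded, encoding="unicode"))
```

The template itself decides where results appear, so no mapping between document and database schema has to be declared in advance.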

2. XML Databases

SQL-based Query Language

• uses modified SELECT statements, the results of which are transformed to XML

• a number of proprietary SQL-based languages are currently available

• simplest of these SQL-based languages uses nested SELECT statements, which are transformed directly to nested XML
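
The proprietary languages themselves are not reproduced here. As a rough sketch of the effect, the Python below emulates a nested SELECT by running an inner query for each row of an outer query and nesting the output accordingly. The `Orders` and `Items` tables, the element names and the function `orders_to_nested_xml` are assumptions for illustration, with SQLite as a stand-in database.

```python
import sqlite3
import xml.etree.ElementTree as ET

def orders_to_nested_xml(connection):
    """Build nested XML by running an inner query for each row of an outer query."""
    root = ET.Element("Orders")
    outer_rows = connection.execute(
        "SELECT SONumber, Customer FROM Orders ORDER BY SONumber"
    ).fetchall()
    for so_number, customer in outer_rows:
        order = ET.SubElement(root, "Order", SONumber=str(so_number))
        ET.SubElement(order, "Customer").text = customer
        # Inner SELECT: the items of this order become child elements.
        inner_rows = connection.execute(
            "SELECT Number, Part FROM Items WHERE SONumber = ? ORDER BY Number",
            (so_number,),
        )
        for number, part in inner_rows:
            item = ET.SubElement(order, "Item", Number=str(number))
            ET.SubElement(item, "Part").text = part
    return root

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Orders (SONumber INTEGER, Customer TEXT)")
connection.execute("CREATE TABLE Items (SONumber INTEGER, Number INTEGER, Part TEXT)")
connection.executemany(
    "INSERT INTO Orders VALUES (?, ?)", [(1, "ACME"), (2, "Initech")]
)
connection.executemany(
    "INSERT INTO Items VALUES (?, ?, ?)",
    [(1, 1, "A-10"), (1, 2, "B-20"), (2, 1, "C-30")],
)
nested = orders_to_nested_xml(connection)
print(ET.tostring(nested, encoding="unicode"))
```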

2. XML Databases

XML Query Language

• XML Query Language (XQL) was designed by Microsoft, Texcel and WebMethods specifically to query XML documents

• XML query languages can be used over any XML document, unlike the previous two kinds, which can be used only with relational databases

• To use them with relational databases, the data in the database must be modeled as XML, thereby allowing queries over virtual XML documents
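
To give a feel for path-based querying over an arbitrary XML document, here is a minimal Python sketch using the XPath subset supported by the standard-library ElementTree module. It is not XQL or XQuery, and the sample sales order is an assumption made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; it could equally be a virtual view over relational data.
doc = ET.fromstring("""<SalesOrder number="1234">
  <Customer>ACME</Customer>
  <Item number="1"><Part>A-10</Part><Price>9.95</Price></Item>
  <Item number="2"><Part>B-20</Part><Price>13.27</Price></Item>
</SalesOrder>""")

# Select every Price element that is a child of an Item.
prices = [price.text for price in doc.findall("./Item/Price")]

# Select the Part of the Item whose number attribute equals 2.
second_part = doc.find("./Item[@number='2']/Part").text

print(prices)
print(second_part)
```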

3. XML-Enabled Databases

XML-Enabled Database Concept

• Using a BLOB (Binary Large Object) to store XML documents, which preserves document extensibility

Weakness: no support for node-level access, updates, or structure-dependent queries such as XPath and XQuery.

• Mapping XML documents to tables in relational databases or to objects in object-oriented databases

Weakness: no support for extensibility or for important features such as round-tripping.

3. XML-Enabled Databases

Mapping Document Schemas to Database Schemas

• To transfer data between XML documents and a database, it is necessary to map the XML document schema to the database schema

• Two types of mapping are used to map an XML document schema to the database schema (Bourret, 2003):
  • Table-based Mapping
  • Object-Relational Mapping

3. XML-Enabled Databases

Table-based Mapping

• used by many of the middleware products that transfer data between an XML document and a relational database

• documents that use table-based mappings often include table and column metadata

• useful for serialising relational data, such as when transferring data between two relational databases

3. XML-Enabled Databases

Table-based Mapping (cont’d)

<database>
   <table>
      <row>
         <column1>...</column1>
         <column2>...</column2>
         ...
      </row>
      <row>
         ...
      </row>
      ...
   </table>
   <table>
      ...
   </table>
   ...
</database>
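
A table-based mapping is simple enough to sketch directly. The Python below serialises relational tables into the layout shown above, using SQLite as a stand-in database. The function name `tables_to_xml`, the `name` attribute on `<table>` and the `Customers` table are assumptions added for the example; the slides do not say how table names are carried.

```python
import sqlite3
import xml.etree.ElementTree as ET

def tables_to_xml(connection, table_names):
    """Serialise each table as <table>, each record as <row>, each column as an element."""
    database = ET.Element("database")
    for table_name in table_names:
        # Assumption: the table name travels as an attribute (table metadata).
        table = ET.SubElement(database, "table", name=table_name)
        # Table names cannot be bound as SQL parameters, so they must come
        # from a trusted list supplied by the caller.
        cursor = connection.execute(f'SELECT * FROM "{table_name}" ORDER BY rowid')
        columns = [description[0] for description in cursor.description]
        for record in cursor:
            row = ET.SubElement(table, "row")
            for column, value in zip(columns, record):
                ET.SubElement(row, column).text = "" if value is None else str(value)
    return database

connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE Customers (Number INTEGER, Name TEXT)")
connection.executemany(
    "INSERT INTO Customers VALUES (?, ?)", [(1, "ACME"), (2, "Initech")]
)
document = tables_to_xml(connection, ["Customers"])
print(ET.tostring(document, encoding="unicode"))
```

The mapping works only for documents that already have this flat shape, which is why it suits transfers between two relational databases better than arbitrary XML.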

3. XML-Enabled Databases

Object-Relational Mapping

• used by all XML-enabled relational databases, and some middleware products

• models the data in the XML document as a tree of objects that are specific to the data in the document

• model is then mapped to relational databases using traditional object-relational mapping techniques or SQL 3 object views

3. XML-Enabled Databases

Object-Relational Mapping (cont’d)

Sales Order
   Customer
   Item
      Price
   Item
      Price
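
One way to read the tree above is that the sales order and its items become rows in two related tables. The Python sketch below walks a hypothetical sales order document and stores it in SQLite. The table layout, the column names, the `number` attributes and the function `store_sales_order` are assumptions for illustration, not the mapping of any particular product.

```python
import sqlite3
import xml.etree.ElementTree as ET

ORDER_XML = """<SalesOrder number="1234">
  <Customer>ACME</Customer>
  <Item number="1"><Price>9.95</Price></Item>
  <Item number="2"><Price>13.27</Price></Item>
</SalesOrder>"""

def store_sales_order(connection, order_xml):
    """Map the SalesOrder object to one row and each Item object to a child row."""
    order = ET.fromstring(order_xml)
    so_number = int(order.get("number"))
    connection.execute(
        "INSERT INTO SalesOrders (Number, Customer) VALUES (?, ?)",
        (so_number, order.findtext("Customer")),
    )
    # Each Item carries the order number as a foreign key to its parent.
    connection.executemany(
        "INSERT INTO Items (SONumber, Number, Price) VALUES (?, ?, ?)",
        [
            (so_number, int(item.get("number")), float(item.findtext("Price")))
            for item in order.findall("Item")
        ],
    )

connection = sqlite3.connect(":memory:")
connection.execute(
    "CREATE TABLE SalesOrders (Number INTEGER PRIMARY KEY, Customer TEXT)"
)
connection.execute(
    "CREATE TABLE Items ("
    "SONumber INTEGER REFERENCES SalesOrders(Number), Number INTEGER, Price REAL)"
)
store_sales_order(connection, ORDER_XML)
items = connection.execute(
    "SELECT Number, Price FROM Items ORDER BY Number"
).fetchall()
print(items)
```

Anything the object model does not capture, such as comments or the order of sibling elements, is lost in this mapping, which is the round-tripping weakness noted earlier.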

4. Native XML Databases

Native XML Database Concept

• designed especially to store XML documents

• A native XML database defines a (logical) model for an XML document, and stores and retrieves documents according to that model.

4. Native XML Databases

Native XML Database Concept

• Database management features
  • transaction management
  • security
  • multi-user access
  • interface APIs

4. Native XML Databases

Text-based Native XML Database

• Stores XML documents as text, either as
  • a BLOB in a relational database, or
  • a proprietary text format

• Outperforms other approaches when retrieving and returning data along a predefined path

4. Native XML Databases

Model-based Native XML Database

• Internal object model

• Performance is similar to that of text-based native XML databases

4. Native XML Databases

Features of Native XML Databases

• Data Definition

• Support the notion of collections, similar to a table in a relational database or a directory in a file system

• Allow schema-independent XML documents to be stored

• Risk of lower data integrity

4. Native XML Databases

Features of Native XML Databases

• Data Manipulation
  • Query languages: XPath and XQL
  • XPath lacks grouping, sorting, cross-document joins, and support for data types
  • XSLT can be used to work around some of these limitations
  • A more database-oriented language: XQuery
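
The gap is easy to see in practice. In the Python sketch below, a path expression (ElementTree's XPath subset) selects the nodes, but sorting by a typed value and grouping both have to be written in the host language. The catalog document and variable names are assumptions for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

catalog = ET.fromstring("""<Catalog>
  <Item category="tools"><Name>Hammer</Name><Price>12.50</Price></Item>
  <Item category="toys"><Name>Kite</Name><Price>7.25</Price></Item>
  <Item category="tools"><Name>Wrench</Name><Price>9.00</Price></Item>
</Catalog>""")

# The path expression only selects nodes, in document order.
items = catalog.findall("./Item")

# Sorting needs a data type (here a float), so it happens outside the path language.
names_by_price = [
    item.findtext("Name")
    for item in sorted(items, key=lambda item: float(item.findtext("Price")))
]

# Grouping and aggregation are likewise done in host-language code.
totals = defaultdict(float)
for item in items:
    totals[item.get("category")] += float(item.findtext("Price"))

print(names_by_price)
print(dict(totals))
```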

4. Native XML Databases

• Data Manipulation
  • Updates and Deletes
    • a real area of weakness for current native XML databases (NXDs)
    • XML:DB XUpdate from the XML:DB initiative

• Indexes

• Management Tools
  • programmatic API, ODBC-like interface

• Round-Tripping
  • get the same document back again

• External Entity
  • how to handle external entities?
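
To make the "Updates and Deletes" point concrete, the Python sketch below shows what node-level update and delete operations amount to on an in-memory document. It is not XUpdate and does not use any native XML database API; the helper names `update_text` and `remove_children` and the sample order are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def update_text(root, path, new_text):
    """Set the text of every element matched by path; return how many were changed."""
    matches = root.findall(path)
    for element in matches:
        element.text = new_text
    return len(matches)

def remove_children(root, parent_path, child_path):
    """Remove the children matching child_path under each parent; return the count."""
    removed = 0
    for parent in root.findall(parent_path):
        # findall returns a list, so removing while looping is safe.
        for child in parent.findall(child_path):
            parent.remove(child)
            removed += 1
    return removed

order = ET.fromstring("""<SalesOrder>
  <Item number="1"><Price>9.95</Price></Item>
  <Item number="2"><Price>13.27</Price></Item>
</SalesOrder>""")

changed = update_text(order, "./Item[@number='1']/Price", "10.95")
removed = remove_children(order, ".", "Item[@number='2']")
print(changed, removed)
print(ET.tostring(order, encoding="unicode"))
```

A database without node-level updates would instead have to retrieve the whole document, change it outside the database, and store it again.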

4. Native XML Databases

Differences between Native XML Databases & Relational Databases

• Relational databases are built on Codd’s well-established relational theory

• XML is still immature by comparison

• Relational databases are the best for long term storage of the durable data at the back end

• XML databases sit in the middle tier and manage active data between systems

5. XML Database Products

• Middleware
• XML-Enabled Databases
• Native XML Databases
• XML Servers
• Content Management Systems
• Discontinued Products
• Related products: XML Query Engines and XML Data Binding

5. XML Database Products

• What to choose?
  • If your goal is to store and retrieve data-centric documents, the right choice might be an XML-enabled database, middleware or an XML server.
  • If it is to store and retrieve document-centric documents, a native XML database or content management system might be appropriate.

5. Benchmarks

• Ten challenges have to be met:
  • Bulk loading
  • Reconstruction
  • Path traversals
  • Casting
  • Missing elements
  • Ordered access
  • References
  • Joins
  • Construction of large results
  • Containment, full-text search

5. Benchmarks

• Infrastructure and total cost of ownership
  • e.g. access protocols, result representations, responsiveness versus completeness, the expressiveness of the query language, and data throughput

• An XML database API enables a common access mechanism to XML databases.

5. Cost Issues

• Comparison of products available in the market.

• The total cost of ownership:
  • Installation effort
  • Generality support
  • Consistency support
  • Preparation effort
  • Training
  • Interaction paradigm
  • Updates

6. XML Database Applications

• Key applications include web services, B2B document exchange, and e-commerce, which typically require online and often interactive processing.

• They also include all information-rich scenarios: corporate information portals, membership databases, product catalogs, parts databases, patient information tracking, etc.

7. Future Trends

• The XML:DB initiative is working on benchmarking for the XML database industry, with the aim of making it the standard toolset used by IT departments worldwide.

• Conforming to the XML:DB API, some developers are also working on graphical query tools.

7. Future Trends

• Better solutions for query optimization in the web context, compressing XML data and guaranteeing transparent access to compressed data through existing APIs.

• New XML-related languages are being created, such as the XML Update Language (XUpdate) and the Simple XML Manipulation Language.

• A potential future project is XML Access Control.

8. Conclusion

• XML is changing the way that data and documents are represented, exchanged and integrated among heterogeneous computing systems

• it is also inducing and facilitating the convergence of the World Wide Web, the Internet and database research communities

• it is expected that XML databases will be extensively used in numerous domains and applications in the near future