CSCI5333 DBMS

CSCI5333 DBMS. Chapter 26 XML and Internet Databases


Chapter 26

XML and Internet Databases

Outline

Structured, Semistructured, & Unstructured Data

XML Hierarchical Data Model

XML Document, DTD, & XML Schema

XML Documents & Databases

XML Querying

Structured vs Semistructured Data

Structured data: e.g., information stored in databases; all records have the same format as defined in the relational schema.

Semistructured data: the data may have a certain structure, but not all the information collected will have an identical structure.

FIGURE 26.1

Representing semistructured data as a graph.

FIGURE 26.2 Part of an HTML document representing unstructured data

(cf. the company database schema)

XML Hierarchical (Tree) Data Model

Problems with HTML documents:

– Difficult for programs to interpret automatically, because HTML documents do not include schema information about the type of data they contain

– Inappropriate as intermediate Web documents to be exchanged among various computer sites

Solution: XML documents

Two main structuring concepts: elements and attributes

Compare with HTML: in XML, tag names are defined to describe the meaning of the data elements, rather than to describe how the text is to be displayed.
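The two structuring concepts can be tried out with Python's standard xml.etree.ElementTree module. This is only a small sketch: the element names follow Figure 26.3, while the "id" attribute and the sample values are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Complex element: it contains other elements.
# The "id" attribute is hypothetical, added only to show an attribute.
project = ET.Element("project", attrib={"id": "p1"})

# Simple elements: they contain only text. The values are made up.
ET.SubElement(project, "Name").text = "ProductX"
ET.SubElement(project, "Number").text = "1"

# Serialize the tree back to XML text.
xml_text = ET.tostring(project, encoding="unicode")
print(xml_text)
```

The tag names say what the data means (a project, its name, its number); nothing in them says how the text should be displayed.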

FIGURE 26.3 A complex XML element called <projects>.

Correction: <project>

Complex elements: <projects>, <project>, <Worker>

Simple elements: <Name>, <Number>, <SSN>, …

standalone="yes": the document is schemaless

XML Documents, DTD, and XML Schema

A well-formed XML document is one that satisfies a few conditions:

– Starts with an XML declaration (version, …)

– Follows the tree model

– Has a single root element

– The matching start and end tags of each element lie within the tags of its parent element

– Is syntactically correct
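A quick way to see these conditions in action is to hand candidate documents to a parser. The sketch below uses Python's standard xml.etree.ElementTree; the helper name is_well_formed and the sample strings are assumptions for illustration. Note that this parser checks nesting, the single root, and syntax, but does not insist on the XML declaration.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Properly nested, single root element.
good = "<projects><project><Name>ProductX</Name></project></projects>"

# <Name> is closed after its parent <project>: tags are not properly nested.
bad_nesting = "<projects><project><Name>ProductX</project></Name></projects>"

# Two top-level elements: no single root.
two_roots = "<project/><project/>"

for sample in (good, bad_nesting, two_roots):
    print(is_well_formed(sample))
```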

XML Documents, DTD, and XML Schema

A valid XML document is well formed, and in addition the element names used in the start and end tag pairs must follow the structure specified in a separate XML DTD (Document Type Definition) file or XML schema file.

Figure 26.4: a sample XML DTD called projects

Occurrence indicators after an element name:

– * zero or more

– + one or more

– ? zero or one

– otherwise: exactly once

Data type, given in parentheses: (#PCDATA) means parsed character data
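The occurrence indicators amount to a minimum and maximum count for a child element. The following is a minimal sketch of that reading in Python; the table OCCURRENCE and the function allowed are hypothetical names, not part of any DTD processor.

```python
# Map each DTD occurrence indicator to (min, max); None means no upper bound.
OCCURRENCE = {
    "*": (0, None),  # zero or more
    "+": (1, None),  # one or more
    "?": (0, 1),     # zero or one
    "": (1, 1),      # no indicator: exactly once
}

def allowed(indicator: str, count: int) -> bool:
    """Check whether `count` occurrences satisfy the given indicator."""
    lo, hi = OCCURRENCE[indicator]
    return count >= lo and (hi is None or count <= hi)

print(allowed("+", 0))  # an element marked + must occur at least once
print(allowed("?", 1))
```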

FIGURE 26.4 An XML DTD file called projects

To use the DTD file:

(1) Store the DTD file in the same file system as the XML document

(2) Begin the XML document with:

<?xml version="1.0" standalone="no"?>

<!DOCTYPE projects SYSTEM "proj.dtd">

DTD Limitations

1) Data types in DTD are not very general

2) Has its own special syntax and thus requires specialized processors

3) Elements are always forced to follow the ordering specified in the DTD, so unordered elements are not permitted.

Solution: XML Schema

FIGURE 26.5 An XML schema file called company

Schema namespace

The root element is company; it is also an unnamed complex element.

• “Department”, “Employee”, etc. must be named types.

• The selector “employeeDependent” is an attribute of “Employee”, of type “Dependent”.

• The field “dependentName” in “Dependent” must be unique.

FIGURE 26.5 (continued) An XML schema file called company.

<xsd:unique …> specifies a key constraint for a non-primary-key element.

<xsd:key> specifies a primary key.

<xsd:keyref> specifies a foreign key; <xsd:selector> refers to the referencing element type; <xsd:field> refers to the referencing attribute.
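What the key and keyref constraints check can be approximated by hand. The sketch below is not a schema validator: it uses Python's standard xml.etree.ElementTree, and the sample document, its element names, and the helper functions are all assumptions made for illustration. The selector picks the elements a constraint applies to, and the field picks the value inside each of them.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample data, loosely modeled on a company document.
doc = ET.fromstring("""
<company>
  <department><departmentNumber>5</departmentNumber></department>
  <department><departmentNumber>4</departmentNumber></department>
  <employee>
    <employeeSSN>123456789</employeeSSN>
    <employeeDepartmentNumber>5</employeeDepartmentNumber>
  </employee>
  <employee>
    <employeeSSN>333445555</employeeSSN>
    <employeeDepartmentNumber>7</employeeDepartmentNumber>
  </employee>
</company>
""")

def field_values(root, selector, field):
    """Collect the text of `field` under every element matched by `selector`."""
    return [e.findtext(field) for e in root.findall(selector)]

def is_key(values):
    """Key-style check: every value is present and no value repeats."""
    return None not in values and len(values) == len(set(values))

def dangling_refs(referencing, referenced):
    """Keyref-style check: referencing values that match no key value."""
    known = set(referenced)
    return [v for v in referencing if v not in known]

dept_numbers = field_values(doc, "department", "departmentNumber")
emp_dept_refs = field_values(doc, "employee", "employeeDepartmentNumber")

print(is_key(dept_numbers))
print(dangling_refs(emp_dept_refs, dept_numbers))
```

In this sample the department numbers form a key, while the second employee refers to department 7, which does not exist, so a keyref constraint would be violated.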

FIGURE 26.5 (continued) An XML schema file called company

Exercise: Define the element “projectWorker” in the type “Project” as an embedded sub-element.

Answer:

<xsd:element name="projectWorker" minOccurs="1" maxOccurs="unbounded">
  <xsd:complexType>
    <xsd:sequence>
      <xsd:element name="SSN" type="xsd:string" />
      <xsd:element name="hours" type="xsd:float" />
    </xsd:sequence>
  </xsd:complexType>
</xsd:element>

FIGURE 26.5 (continued) An XML schema file called company

XML Documents and Databases

Approaches to Storing XML Documents

Extracting XML Documents from Relational Databases

Breaking Cycles to Convert Graphs into Trees

Other Steps for Extracting XML Documents from Databases

FIGURE 26.6

An ER schema diagram for a simplified UNIVERSITY database.

FIGURE 26.7 Subset of the UNIVERSITY database schema needed for XML document extraction.

FIGURE 26.8 Hierarchical (tree) view with COURSE as the root.
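Extracting such a tree from relational data can be sketched as a parent query followed by a child query per parent row. The example below is only an illustration using Python's standard sqlite3 and xml.etree.ElementTree modules; the table names, column names, sample rows, and the function course_to_xml are all assumptions, not the UNIVERSITY schema from the figures.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical two-table database held in memory.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE course (course_no TEXT PRIMARY KEY, name TEXT);
CREATE TABLE section (section_no INTEGER PRIMARY KEY, course_no TEXT, year INTEGER);
INSERT INTO course VALUES ('CSCI5333', 'DBMS');
INSERT INTO section VALUES (1, 'CSCI5333', 2005);
INSERT INTO section VALUES (2, 'CSCI5333', 2006);
""")

def course_to_xml(conn, course_no):
    """Build a <course> element with its sections nested as sub-elements."""
    row = conn.execute(
        "SELECT course_no, name FROM course WHERE course_no = ?", (course_no,)
    ).fetchone()
    course = ET.Element("course")
    ET.SubElement(course, "number").text = row[0]
    ET.SubElement(course, "name").text = row[1]
    # The foreign key section.course_no becomes parent/child nesting in the tree.
    sections = conn.execute(
        "SELECT section_no, year FROM section WHERE course_no = ? ORDER BY section_no",
        (course_no,),
    )
    for section_no, year in sections:
        section = ET.SubElement(course, "section")
        ET.SubElement(section, "number").text = str(section_no)
        ET.SubElement(section, "year").text = str(year)
    return course

course = course_to_xml(conn, "CSCI5333")
print(ET.tostring(course, encoding="unicode"))
```

Choosing a different root (for example the section or the student) would mean starting from a different table and nesting the others beneath it.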

FIGURE 26.9

XML schema document with COURSE as the root.

FIGURE 26.10 Hierarchical (tree) view with STUDENT as the root.

FIGURE 26.11

XML schema document with STUDENT as the root.

FIGURE 26.12

Hierarchical (tree) view with SECTION as the root.

FIGURE 26.13 Converting a graph with cycles into a hierarchical (tree) structure.
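One simple way to turn a graph with cycles into a tree is to unfold it from a chosen root, copying a node once for every path that reaches it and refusing to revisit a node already on the current path. The sketch below illustrates that idea in Python; it is a simplified assumption and not necessarily the exact procedure shown in the figure, and the function name and sample graph are hypothetical.

```python
def graph_to_tree(graph, root):
    """Unfold `graph` (an adjacency dict) into a nested (node, children) tree.

    A node reachable along several paths is copied once per path. A node that
    already appears on the current path is not expanded again, which is what
    breaks the cycles.
    """
    def expand(node, ancestors):
        on_path = ancestors | {node}
        children = [expand(n, on_path)
                    for n in graph.get(node, [])
                    if n not in on_path]
        return (node, children)

    return expand(root, frozenset())

# Hypothetical graph with a cycle A -> B -> C -> A, plus a shortcut A -> C.
graph = {"A": ["B", "C"], "B": ["C"], "C": ["A"]}
tree = graph_to_tree(graph, "A")
print(tree)
```

In the resulting tree, C appears twice (once under B and once directly under A), and the edge from C back to A is dropped.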

XML Query

XPath: Specifying Path Expressions in XML

XQuery: Specifying Queries in XML

FIGURE 26.14 Some examples of XPath expressions on XML documents that follow the XML schema file COMPANY in Figure 26.5.
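Path expressions of this kind can be tried with Python's standard xml.etree.ElementTree, which supports only a limited subset of XPath. The sample document and its element names below are assumptions for illustration, not a copy of the figure.

```python
import xml.etree.ElementTree as ET

# Hypothetical company document.
company = ET.fromstring("""
<company>
  <employee>
    <employeeName>Smith</employeeName>
    <employeeSalary>30000</employeeSalary>
  </employee>
  <employee>
    <employeeName>Wong</employeeName>
    <employeeSalary>40000</employeeSalary>
  </employee>
  <department><departmentName>Research</departmentName></department>
</company>
""")

# Child path, relative to the root element <company>.
names = [e.text for e in company.findall("employee/employeeName")]

# Descendant search: every <departmentName> at any depth.
dept_names = [e.text for e in company.findall(".//departmentName")]

# Equality predicate on a child's text (part of ElementTree's XPath subset).
wong = company.findall("employee[employeeName='Wong']")

# ElementTree has no numeric comparison predicates, so this filter is
# written in Python instead of inside the path expression.
high_paid = [e.findtext("employeeName")
             for e in company.findall("employee")
             if int(e.findtext("employeeSalary")) > 35000]

print(names, dept_names, high_paid)
```

A full XPath processor would express the salary condition as a predicate inside the path itself.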

FIGURE 26.15 Some examples of XQuery queries on XML documents that follow the XML schema file COMPANY in Figure 26.5.
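Python's standard library has no XQuery processor, so the following is only an analogue: it mirrors the for / where / order by / return shape of an XQuery expression using ordinary Python over an ElementTree document. The sample data, element names, and the "result" wrapper element are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical company document.
company = ET.fromstring(
    "<company>"
    "<employee><employeeName>Smith</employeeName>"
    "<employeeSalary>30000</employeeSalary></employee>"
    "<employee><employeeName>Wong</employeeName>"
    "<employeeSalary>40000</employeeSalary></employee>"
    "<employee><employeeName>Borg</employeeName>"
    "<employeeSalary>55000</employeeSalary></employee>"
    "</company>"
)

result = ET.Element("result")

# "for" each employee, "where" the salary exceeds 35000
matching = [e for e in company.findall("employee")
            if int(e.findtext("employeeSalary")) > 35000]

# "order by" name, then "return" a new <name> element per match
for e in sorted(matching, key=lambda emp: emp.findtext("employeeName")):
    ET.SubElement(result, "name").text = e.findtext("employeeName")

print(ET.tostring(result, encoding="unicode"))
```

Unlike a path expression, the query builds new elements for its output rather than only selecting existing ones.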

Summary

XML documents

XML & databases