The Semistructured-Data Model
Programming Languages for XML
Spring 2011
Instructor: Hassan Khosravi
11.2
Semistructured Data
Another data model, based on trees.
Self-describing:
The data implicitly carries information about what its schema is.
May only carry the names of attributes (so possibly untyped), and
has a lower degree of organization than the data in a relational
database.
May have no associated schema (i.e. may be schema-less)
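As a rough illustration of "self-describing", the sketch below stores two records that each carry their own field names and do not share the same fields. The records, field names, and the helper `field_names` are hypothetical, made up for this example only.

```python
# Hypothetical records: each one carries its own field names, and the
# two records do not have the same set of fields (no fixed schema).
records = [
    {"name": "Alice", "phone": "555-0100"},
    {"name": "Bob", "address": {"city": "Vancouver", "street": "Main St"}},
]


def field_names(record, prefix=""):
    """Collect the (possibly nested) field names that a record carries."""
    names = []
    for key, value in record.items():
        path = prefix + key
        names.append(path)
        if isinstance(value, dict):
            # Nested structure: descend and record the path to each field.
            names.extend(field_names(value, path + "/"))
    return names


# The "schema" of each record is recovered from the data itself.
schemas = [field_names(r) for r in records]
```

Here the structure is read off the data rather than declared in advance, which is the sense in which the data describes itself.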
Motivation:
flexible representation of data.
sharing of documents among systems and databases.
Information integration
– E.g. want to “merge” or query two databases.
Data exchange
– E.g. two enterprises (such as a buyer and a seller) may want to
exchange data
11.3
Semistructured Data representation
11.4
                  Relational             Semistructured
Structure         Tables                 Hierarchical tree, graph
Schema            Fixed in advance       Flexible, self-describing
Queries           Simple, nice language  Less so
Ordering          None (has ORDER BY)    Implied
Implementation    Mature and native      Add-on
11.5
Comparison with Relational Data
Inefficient: tags, which in effect represent schema information, are
repeated
Access: data is structured hierarchically.
Better than relational tuples as a data-exchange format
Unlike relational tuples, semistructured data is self-documenting
due to presence of tags
Flexible, non-rigid format: tags can be added
Allows nested structures
Wide acceptance, not only in database systems, but also in
browsers, tools, and applications
11.6
Flexibility in Schema
11.7
XML
XML: Extensible Markup Language
A W3C standard (Recommendation) adopted in 1998
While HTML uses tags for formatting (e.g., “italic”), XML uses tags for
semantics (e.g., indicating “this is an address” or “this is a title”).
Key idea: create tag sets for a domain (e.g., genomics), and translate
all data into properly tagged XML documents.
There are two different modes of use of XML:
Well-Formed XML allows you to invent your own tags.
No predefined schema
Valid XML conforms to a certain DTD.
The DTD describes allowable tags and their nesting.
But still reasonably flexible – e.g. may allow optional or missing
fields
11.8
Well-Formed XML
Begins with a declaration that it is XML
Has a single root element that encloses the entire body of the document
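A minimal sketch of checking well-formedness, assuming Python's standard `xml.etree.ElementTree` parser is an acceptable stand-in for "an XML parser". The helper `is_well_formed` and the sample documents are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the parser accepts `text` as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# One root element, properly nested tags.
good = ('<?xml version="1.0"?>'
        '<bookstore><book><title>A First Course</title></book></bookstore>')

# Tags closed in the wrong order.
bad_nesting = ('<?xml version="1.0"?>'
               '<bookstore><book><title>A First Course</book></title></bookstore>')

# Two top-level elements instead of a single root.
two_roots = '<?xml version="1.0"?><book/><book/>'
```

Calling `is_well_formed(good)` returns `True`, while the other two documents are rejected. Note that this checks only well-formedness and says nothing about validity against a DTD.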
11.9
Well-Formed XML
Valid XML
11.10
Valid XML: Document Type Definition (DTD)
Grammar-like language for specifying elements, attributes,
nesting, ordering, number of occurrences
Special attribute types ID and IDREF
Example
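The original example for this slide is not preserved. As a stand-in, here is a hypothetical bookstore document with an internal DTD that uses `ID` and `IDREF` attributes. The element names, attribute names, and data are all invented for illustration. Python's standard parser reads the DTD but does not validate against it, so the sketch checks one constraint by hand (every `IDREF` points at an existing `ID`).

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the DTD says a bookstore holds zero or more books,
# each book has a title, one or more authors, and an optional remark.
doc = """<?xml version="1.0"?>
<!DOCTYPE bookstore [
  <!ELEMENT bookstore (book*)>
  <!ELEMENT book (title, author+, remark?)>
  <!ATTLIST book ISBN    ID    #REQUIRED
                 price   CDATA #REQUIRED
                 follows IDREF #IMPLIED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT author (#PCDATA)>
  <!ELEMENT remark (#PCDATA)>
]>
<bookstore>
  <book ISBN="b1" price="45">
    <title>Volume One</title>
    <author>A. Writer</author>
  </book>
  <book ISBN="b2" price="60" follows="b1">
    <title>Volume Two</title>
    <author>A. Writer</author>
    <author>B. Coauthor</author>
  </book>
</bookstore>"""

# The stdlib parser only checks well-formedness; it does NOT enforce the DTD.
root = ET.fromstring(doc)

# Manual check of a single DTD constraint: each IDREF must match some ID.
ids = {book.get("ISBN") for book in root.iter("book")}
dangling = [book.get("follows") for book in root.iter("book")
            if book.get("follows") is not None and book.get("follows") not in ids]
```

A validating parser would check all of the declarations, not only the `IDREF` targets; the hand-written check here covers just that one rule.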
11.11
QUERYING SEMISTRUCTURED DATA
11.12
Querying XML
Not nearly as mature as querying relational data
Newer
No underlying theory comparable to that of the relational model
Sequence of development
XPath – path expressions + conditions
XQuery – XPath + full-featured query language
11.13
XPath
Think of XML as a tree
path expressions + conditions
11.14
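XPath Syntax
/ : root element (also separates the steps of a path)
book : name of an element, here “book”
* : used in place of a name, matches any element
@ISBN : attribute, here “ISBN”
// : matches descendants at any depth
[@price < 50] : condition
[N] : Nth matching child, e.g. author[2]
Axes (13 in total, to navigate around the tree), e.g.
parent::
following-sibling::
descendant::
self::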
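A sketch of these path expressions, assuming Python's standard `xml.etree.ElementTree`, which implements only a subset of XPath: it has no comparison predicates such as `[@price < 50]` and no named axes. The price condition is therefore applied in Python. The bookstore data is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical data; `doc` is the root element <bookstore>.
doc = ET.fromstring("""<bookstore>
  <book ISBN="b1" price="45">
    <title>Volume One</title>
    <author>A. Writer</author>
  </book>
  <book ISBN="b2" price="60">
    <title>Volume Two</title>
    <author>A. Writer</author>
    <author>B. Coauthor</author>
  </book>
</bookstore>""")

# Element names separated by "/": titles of all books.
titles = [t.text for t in doc.findall("./book/title")]

# "//": authors at any depth below the root.
all_authors = [a.text for a in doc.findall(".//author")]

# "*": every child of the first book, whatever its name.
children_of_first = [c.tag for c in doc.findall("./book[1]/*")]

# "[N]": the second author of each book that has one.
second_authors = [a.text for a in doc.findall("./book/author[2]")]

# Attribute equality is supported as a predicate.
by_isbn = [b.find("title").text for b in doc.findall("./book[@ISBN='b2']")]

# "[@price < 50]" is NOT supported by ElementTree, so filter in Python.
cheap = [b.get("ISBN") for b in doc.findall("./book")
         if float(b.get("price")) < 50]
```

A full XPath engine would accept `/bookstore/book[@price < 50]` directly and would also support axes such as `parent::` and `following-sibling::`.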
11.15
XPath Demo
Example
11.16
XQuery
11.17
XQuery Demo
Example