2
Simple Querying
• A search engine matches the word (or words) of a query against the words each document contains.
• It returns the Web documents that contain the word.
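A minimal sketch of this kind of keyword lookup, assuming a tiny in-memory corpus. The URLs, texts, and the function names `build_index` and `search` are hypothetical and only illustrate the idea of an inverted index; they are not taken from any particular search engine.

```python
from collections import defaultdict

# Hypothetical toy corpus; the URLs and texts are made up for illustration.
documents = {
    "www.a.b.c/page1": "database systems and query languages",
    "www.a.b.c/page2": "matlab for numerical computing",
    "www.d.e.f/page3": "query languages for xml data",
}

def build_index(docs):
    """Map each word to the set of documents that contain it."""
    index = defaultdict(set)
    for url, text in docs.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result

index = build_index(documents)
print(sorted(search(index, "query languages")))
```

The lookup only tests whether words occur in a document; it knows nothing about the structure of the data, which is the limitation the following slides address.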
3
Querying structured data
• Data on the Web may be structured (e.g. a book catalog).
• “Structure” here means a schema.
• The schema may not be rigid (semi-structured data).
• More complex queries may be executed.
4
CGI
• Advantage: uses the existing DBMS (e.g. relational).
• Disadvantage: integrating data from different Web sources is problematic.
5
XML (Extensible Markup Language)
• A subset of SGML
• Benefits:
– Arbitrary extension of a document’s tags and attributes.
– Support for documents with complex structure.
– Validation of document structure (with respect to an optional Document Type Definition).
6
Example of XML data
<book year="1995">
<title> Database Systems </title>
<author><name> Date </name></author>
<publisher> Addison-Wesley </publisher>
</book>
<book year="1998">
<publisher> The Math Works </publisher>
<title> MATLAB </title>
</book>
7
Example of Document Type Definition (DTD)
<!ELEMENT book (title, author?, publisher)>
<!ATTLIST book year CDATA #IMPLIED>
<!ELEMENT author (name)>
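As a rough illustration of what validating against the content model `(title, author?, publisher)` means, the sketch below checks the order of each book's children with a regular expression. Python's standard library does not validate DTDs, so this is a hand-written stand-in rather than real DTD validation; the `<bib>` wrapper element and the name `matches_book_model` are assumptions made for the example.

```python
import re
import xml.etree.ElementTree as ET

# The two book elements from the XML example, wrapped in a hypothetical
# <bib> root so that they form one well-formed document.
DATA = """
<bib>
  <book year="1995">
    <title> Database Systems </title>
    <author><name> Date </name></author>
    <publisher> Addison-Wesley </publisher>
  </book>
  <book year="1998">
    <publisher> The Math Works </publisher>
    <title> MATLAB </title>
  </book>
</bib>
"""

# Content model (title, author?, publisher) as a regex over child tag names.
BOOK_MODEL = re.compile(r"title,(author,)?publisher,")

def matches_book_model(book):
    """Check the order of a book's children against the content model."""
    child_tags = "".join(child.tag + "," for child in book)
    return BOOK_MODEL.fullmatch(child_tags) is not None

root = ET.fromstring(DATA)
for book in root.findall("book"):
    print(book.get("year"), matches_book_model(book))
```

Under these assumptions the 1995 book satisfies the model, while the 1998 book does not, because its publisher precedes its title. Data that deviates from a fixed schema in this way is what the semi-structured model is meant to accommodate.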
8
Semi-structured Data Model
• Non-rigid schema
• Object Exchange Model (OEM)
• Data represented by a graph.
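One possible way to picture the graph representation is as a list of labeled edges. The sketch below derives such edges from the book data; the `<bib>` wrapper, the integer node ids, and the function `to_edges` are assumptions for illustration, and attributes such as `year` are left out to keep it short.

```python
import itertools
import xml.etree.ElementTree as ET

# Book data from the XML example, wrapped in a hypothetical <bib> root.
DATA = """
<bib>
  <book year="1995">
    <title> Database Systems </title>
    <author><name> Date </name></author>
    <publisher> Addison-Wesley </publisher>
  </book>
  <book year="1998">
    <publisher> The Math Works </publisher>
    <title> MATLAB </title>
  </book>
</bib>
"""

def to_edges(root):
    """Return (parent_id, label, child) triples; leaves carry text values."""
    counter = itertools.count(1)
    edges = []

    def visit(elem, node_id):
        for child in elem:
            if len(child) == 0:
                # Leaf element: the edge points directly at the text value.
                edges.append((node_id, child.tag, (child.text or "").strip()))
            else:
                child_id = next(counter)
                edges.append((node_id, child.tag, child_id))
                visit(child, child_id)

    visit(root, 0)
    return edges

edges = to_edges(ET.fromstring(DATA))
for edge in edges:
    print(edge)
```

Because each node simply lists whatever outgoing edges it has, the two books can carry different sets of labels without any schema change.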
10
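[Figure: graph representation of the XML data. The first book node (year="1995") has edges title → Database Systems, author → name → Date, and publisher → Addison-Wesley. The second book node (year="1998") has edges publisher → The Math Works and title → MATLAB.]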
11
Example of XML data
<book year="1995" id="o100">
<title> Database Systems </title>
<author><name> Date </name></author>
<publisher> Addison-Wesley </publisher>
</book>
<book year="1998" related="o100">
<publisher> The Math Works </publisher>
<title> MATLAB </title>
</book>
12
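[Figure: graph representation of the XML data. The first book node (year="1995") has edges title → Database Systems, author → name → Date, and publisher → Addison-Wesley. The second book node (year="1998") has edges publisher → The Math Works and title → MATLAB, plus a related edge pointing to the first book node.]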
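The related edge turns the tree into a graph: an attribute value refers to the id of another element. A minimal sketch of following such a reference, assuming a hypothetical `<bib>` wrapper and an illustrative helper named `related_title`:

```python
import xml.etree.ElementTree as ET

# Book data with id and related attributes, wrapped in a hypothetical
# <bib> root so that it forms one well-formed document.
DATA = """
<bib>
  <book year="1995" id="o100">
    <title> Database Systems </title>
    <author><name> Date </name></author>
    <publisher> Addison-Wesley </publisher>
  </book>
  <book year="1998" related="o100">
    <publisher> The Math Works </publisher>
    <title> MATLAB </title>
  </book>
</bib>
"""

root = ET.fromstring(DATA)

# Index every element that declares an id so references can be resolved.
by_id = {e.get("id"): e for e in root.iter() if e.get("id") is not None}

def related_title(book):
    """Follow the related reference of a book and return the target's title."""
    target = by_id.get(book.get("related"))
    if target is None:
        return None
    return (target.findtext("title") or "").strip()

for book in root.findall("book"):
    print(book.get("year"), "->", related_title(book))
```

Here the 1998 book resolves to the title of the 1995 book, and the 1995 book, which has no related attribute, resolves to nothing.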
13
XML-QL
• Extracts data from large XML documents.
• Integrates XML data from multiple sources.
• Translates XML data between different DTDs.
• Processes a request by:
– sending queries to XML sources, or
– transporting large amounts of XML data to clients.
14
Example of XML-QL
WHERE
<book>
<publisher> Addison-Wesley </>
<title> $t </title>
</book> IN "www.a.b.c/books.xml"
CONSTRUCT <result><title> $t </></>
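To make the semantics of the query concrete, the sketch below does by hand what the WHERE and CONSTRUCT clauses describe: it binds `$t` to the title of each book published by Addison-Wesley and builds one `<result>` element per binding. It is not an XML-QL implementation; the inline document stands in for the remote source, and the `<bib>` wrapper and the function name `query` are assumptions.

```python
import xml.etree.ElementTree as ET

# Stand-in for the remote books.xml source; <bib> is a hypothetical root.
BOOKS_XML = """
<bib>
  <book year="1995">
    <title> Database Systems </title>
    <author><name> Date </name></author>
    <publisher> Addison-Wesley </publisher>
  </book>
  <book year="1998">
    <publisher> The Math Works </publisher>
    <title> MATLAB </title>
  </book>
</bib>
"""

def query(root, publisher):
    """Build one <result><title>...</title></result> per matching title."""
    results = []
    for book in root.iter("book"):
        # WHERE: the book has a publisher with the requested text.
        if (book.findtext("publisher") or "").strip() != publisher:
            continue
        # Each title of a matching book is one binding of $t.
        for title in book.findall("title"):
            result = ET.Element("result")
            ET.SubElement(result, "title").text = title.text
            results.append(result)
    return results

results = query(ET.fromstring(BOOKS_XML), "Addison-Wesley")
for result in results:
    print(ET.tostring(result, encoding="unicode"))
```

With this data only the 1995 book matches, so a single result element holding the title Database Systems is constructed.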
17
WHERE
<book> <publisher> Addison-Wesley </>
<author> $a1 </>
</> IN "www.a.b.c/books1.xml",
<book> <publisher>
<name> The Math Works </>
</>
<author> $a2 </>
</> IN "www.d.e.f/books2.xml",
$a1 = $a2
CONSTRUCT <author> $a1 </>
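The query above is a join across two sources on the condition `$a1 = $a2`. A hand-written sketch of the same idea follows; the two inline documents, their author names, the `<bib>` wrappers, and the helper `authors` are all hypothetical and only stand in for books1.xml and books2.xml.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the two remote sources.
BOOKS1 = """
<bib>
  <book><publisher> Addison-Wesley </publisher><author> Smith </author></book>
  <book><publisher> Addison-Wesley </publisher><author> Jones </author></book>
</bib>
"""
BOOKS2 = """
<bib>
  <book><publisher><name> The Math Works </name></publisher><author> Jones </author></book>
  <book><publisher><name> Other Press </name></publisher><author> Smith </author></book>
</bib>
"""

def authors(root, publisher_path, publisher):
    """Collect author texts of books whose publisher matches."""
    found = set()
    for book in root.iter("book"):
        if (book.findtext(publisher_path) or "").strip() == publisher:
            for author in book.findall("author"):
                found.add((author.text or "").strip())
    return found

a1 = authors(ET.fromstring(BOOKS1), "publisher", "Addison-Wesley")
a2 = authors(ET.fromstring(BOOKS2), "publisher/name", "The Math Works")

# The join condition $a1 = $a2 becomes a set intersection.
joined = sorted(a1 & a2)
print(joined)
```

Note that the two sources structure the publisher differently (plain text in one, a name subelement in the other), which is why each pattern uses its own path.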
18
Regular Path Expressions
• Permitted wherever XML permits an element.
• Provide:
– alternation ( | )
– concatenation ( . )
– Kleene-star operators ( * )
19
Example of a regular path expression
WHERE
<part+.(subpart | component.piece)>
$r
</> IN "www.a.b.c/parts.xml"
CONSTRUCT <result> $r </>
20
<part><subpart> $r </></>
<part><part><component><piece>$r</></></></>
<part><part><subpart> $r </></></>
...
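One way to see how the path expression `part+.(subpart | component.piece)` selects elements is to write each element's tag path as a string and match it with an ordinary regular expression. This is only a sketch of the idea, not how an XML-QL processor works; the parts document, its `<parts>` root, and the helper `walk` are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical parts document; <parts> is an assumed root element.
PARTS_XML = """
<parts>
  <part>
    <subpart>A</subpart>
    <part>
      <component><piece>B</piece></component>
      <subpart>C</subpart>
    </part>
    <component><other>D</other></component>
  </part>
</parts>
"""

# part+.(subpart | component.piece) written over slash-separated tag paths.
PATH = re.compile(r"(part/)+(subpart|component/piece)")

def walk(elem, path=()):
    """Yield (tag_path, element) for every descendant of elem."""
    for child in elem:
        child_path = path + (child.tag,)
        yield "/".join(child_path), child
        yield from walk(child, child_path)

root = ET.fromstring(PARTS_XML)
matches = [e.text for p, e in walk(root) if PATH.fullmatch(p)]
print(matches)
```

In this document the paths part/subpart, part/part/component/piece, and part/part/subpart match, while part/component/other does not.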
21
XQL
• Is designed specifically for XML documents.
• Provides a simple syntax (patterns modeled after directory notation).
• Expressed in strings that can be embedded in programs, scripts, and XML or HTML attributes.
22
The Result of XQL Query
• Depends on the implementation. One of the following:
– An XML document.
– A tree that can be fed back into XQL.
– A different type of structure (e.g. a set of pointers to nodes).
23
Search Context
• Is the set of nodes against which a query operates.
• A query can use the “root context” or the “current context”:
• / uses the “root context”
• ./ uses the “current context” explicitly
24
Example of an XQL query
./book[@style = /bookstore/@specialty]
book[@style = /bookstore/@specialty]
Both queries find all book elements whose style attribute equals the specialty attribute of the bookstore element at the root of the XML document. The first names the current context explicitly; the second uses it implicitly.
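The comparison in this query can be imitated in a few lines of Python. The bookstore document below is hypothetical, and the sketch assumes that the current context is the bookstore element itself, so that book refers to its children.

```python
import xml.etree.ElementTree as ET

# Hypothetical bookstore document used only for illustration.
BOOKSTORE_XML = """
<bookstore specialty="novel">
  <book style="novel"><title>A</title></book>
  <book style="textbook"><title>B</title></book>
  <book style="novel"><title>C</title></book>
</bookstore>
"""

root = ET.fromstring(BOOKSTORE_XML)

# /bookstore/@specialty: evaluated against the root context.
specialty = root.get("specialty")

# book[@style = ...]: evaluated against the current context,
# assumed here to be the bookstore element.
selected = [b for b in root.findall("book") if b.get("style") == specialty]
titles = [b.findtext("title") for b in selected]
print(titles)
```

The right-hand side of the comparison is resolved from the root regardless of where the query is evaluated, whereas the set of candidate book elements depends on the current context.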
25
Additional examples
author[lastname = 'Bob']
Find all author elements that have a lastname subelement with the value Bob.
author[. = 'Bob']
Find all author elements whose value is Bob.
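Python's `xml.etree.ElementTree` supports a limited XPath subset with predicates of a similar shape, which can serve as a rough analogue of these two filters (the `[.='text']` form needs Python 3.7 or later). The authors document is hypothetical, and ElementTree compares the complete text without trimming whitespace, so the values are written without padding.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; <authors> is an assumed root element.
AUTHORS_XML = """
<authors>
  <author><firstname>Alice</firstname><lastname>Bob</lastname></author>
  <author>Bob</author>
  <author><lastname>Smith</lastname></author>
</authors>
"""

root = ET.fromstring(AUTHORS_XML)

# Analogue of author[lastname = 'Bob']: a child named lastname equals Bob.
by_lastname = root.findall("author[lastname='Bob']")

# Analogue of author[. = 'Bob']: the author's own text content equals Bob.
by_value = root.findall("author[.='Bob']")

print(len(by_lastname), len(by_value))
```

The first filter tests a subelement, the second tests the element itself, so they select different author elements in this document.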
26
Regular path expressions in XQL
• bookstore//title
Find all title elements, one or more levels
deep in the bookstore.
• bookstore/*/title
Find all title elements that are grandchildren
of bookstore elements.
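The difference between the two operators can be tried out with the XPath subset in Python's `xml.etree.ElementTree`, where `.//title` and `*/title` play roles similar to `bookstore//title` and `bookstore/*/title` when evaluated from the bookstore element. The document below is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical bookstore document with titles at several depths.
BOOKSTORE_XML = """
<bookstore>
  <title>Store</title>
  <book><title>T1</title></book>
  <magazine>
    <title>T2</title>
    <article><title>T3</title></article>
  </magazine>
</bookstore>
"""

bookstore = ET.fromstring(BOOKSTORE_XML)

# Analogue of bookstore//title: titles at any depth below bookstore.
any_depth = [t.text for t in bookstore.findall(".//title")]

# Analogue of bookstore/*/title: titles that are grandchildren only.
grandchildren = [t.text for t in bookstore.findall("*/title")]

print(any_depth)
print(grandchildren)
```

The wildcard stands for exactly one intermediate element, so it skips both the direct child title and the more deeply nested one.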
27
Indices in XQL
<x>
<y> Text1 </y>
<y> Text2 </y>
</x>
<x>
<y> Text3 </y>
<y> Text4 </y>
</x>
x/y[0]
Text1,Text3
(x/y)[3]
Text4
x[1]/y[0]
Text3
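The three index queries can be reproduced with ordinary zero-based Python list indexing, which makes the effect of the parentheses visible. The `<doc>` wrapper is an assumption so that the two x elements form one document; ElementTree's own position predicates are one-based, so plain lists are used instead.

```python
import xml.etree.ElementTree as ET

# The x/y data from the slide inside a hypothetical <doc> root.
DOC_XML = """
<doc>
  <x><y>Text1</y><y>Text2</y></x>
  <x><y>Text3</y><y>Text4</y></x>
</doc>
"""

root = ET.fromstring(DOC_XML)
xs = root.findall("x")

# x/y[0]: the first y within each x.
first_of_each = [x.findall("y")[0].text for x in xs]

# (x/y)[3]: the fourth element of the combined list of all y elements.
fourth_overall = root.findall("x/y")[3].text

# x[1]/y[0]: the first y of the second x.
first_of_second = xs[1].findall("y")[0].text

print(first_of_each, fourth_overall, first_of_second)
```

Without parentheses the index applies per parent element; with parentheses it applies to the whole result set.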