1

New Ways of Querying the Web

by

Eliahu Brodsky

and

Alina Blizhovsky

2

Simple Querying

• A search engine matches the word (or words) of a query against the words that documents contain.

• It returns the Web documents that contain the word.
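
As a rough illustration of word-based lookup, the sketch below builds a tiny inverted index and returns the documents that contain every query word. The document collection, the URLs, and the function names are hypothetical, and real search engines do far more than this.

```python
from collections import defaultdict

# Hypothetical document collection: URL -> page text.
docs = {
    "a.example/1": "database systems by date",
    "a.example/2": "matlab guide from the math works",
    "a.example/3": "querying database systems on the web",
}

def build_index(documents):
    """Map each word to the set of documents that contain it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def search(index, query):
    """Return the documents that contain every word of the query."""
    words = query.lower().split()
    if not words:
        return set()
    hits = [index.get(word, set()) for word in words]
    return set.intersection(*hits)

index = build_index(docs)
print(sorted(search(index, "database")))
```

Such a lookup knows nothing about the structure of a document, which is the limitation that the structured approaches address.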

3

Querying structured data

• Data on the Web may be structured (e.g. a book catalog).

• “Structure” means that the data follows a schema.

• The schema need not be rigid (semi-structured data).

• Structure allows more complex queries to be executed.

4

CGI

• Advantage: uses the existing DBMS (e.g. relational).

• Disadvantage: integrating data from different Web sources is problematic.

5

XML (Extensible Markup Language)

• A subset of SGML

• Benefits
  – Arbitrary extension of a document’s tags and attributes.
  – Support for documents with complex structure.
  – Validation of document structure (with respect to an optional Document Type Definition).

6

Example of XML data

<book year="1995">
  <title> Database Systems </title>
  <author><name> Date </name></author>
  <publisher> Addison-Wesley </publisher>
</book>

<book year="1998">
  <publisher> The Math Works </publisher>
  <title> MATLAB </title>
</book>

7

Example of Document Type Definition (DTD)

<!ELEMENT book (title,author?,publisher)>
<!ATTLIST book year CDATA>
<!ELEMENT author (name)>
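
To make the content model concrete, here is a minimal sketch that checks the children of each element against the two ELEMENT declarations. It is not a validating parser (Python's `xml.etree.ElementTree` does not validate against a DTD). The `<catalog>` wrapper, the regular-expression encoding of the content models, and the function name `conforms` are assumptions made for illustration.

```python
import re
import xml.etree.ElementTree as ET

# The two book elements from the XML example, wrapped in a
# hypothetical <catalog> root so that they parse as one document.
DATA = """<catalog>
<book year="1995">
  <title> Database Systems </title>
  <author><name> Date </name></author>
  <publisher> Addison-Wesley </publisher>
</book>
<book year="1998">
  <publisher> The Math Works </publisher>
  <title> MATLAB </title>
</book>
</catalog>"""

# Content models rewritten as regular expressions over a
# comma-terminated list of child tags:
#   (title,author?,publisher) and (name)
CONTENT_MODELS = {
    "book": re.compile(r"title,(author,)?publisher,"),
    "author": re.compile(r"name,"),
}

def conforms(element):
    """Check the children of one element against its content model."""
    model = CONTENT_MODELS.get(element.tag)
    if model is None:
        return True  # no declaration in this sketch: accept anything
    children = "".join(child.tag + "," for child in element)
    return model.fullmatch(children) is not None

root = ET.fromstring(DATA)
results = [(book.get("year"), conforms(book)) for book in root.iter("book")]
print(results)
```

Because a comma-separated content model is ordered, this strict reading accepts the 1995 book and rejects the 1998 book, whose publisher precedes its title. A non-rigid schema relaxes exactly this kind of constraint.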

8

Semi-structured Data Model

• Non-rigid schema

• Object Exchange Model (OEM)

• Data represented by a graph.
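
One way to picture the graph representation is to turn an XML document into a list of labeled edges between object identifiers. The sketch below is an assumption-laden illustration, not the OEM definition: the `<catalog>` wrapper, the integer object ids, the function name `to_graph`, and the choice to model attributes as edges prefixed with `@` are all hypothetical.

```python
import itertools
import xml.etree.ElementTree as ET

# Book data wrapped in a hypothetical <catalog> root.
DATA = """<catalog>
<book year="1995">
  <title> Database Systems </title>
  <author><name> Date </name></author>
  <publisher> Addison-Wesley </publisher>
</book>
<book year="1998">
  <publisher> The Math Works </publisher>
  <title> MATLAB </title>
</book>
</catalog>"""

def to_graph(root):
    """Return (edges, values): labeled edges between object ids, and
    the text or attribute value stored at leaf objects."""
    ids = itertools.count(1)
    edges, values = [], {}

    def visit(element, oid):
        # Assumption: each attribute becomes an edge labeled "@name".
        for name, value in element.attrib.items():
            child = next(ids)
            edges.append((oid, "@" + name, child))
            values[child] = value
        for sub in element:
            child = next(ids)
            edges.append((oid, sub.tag, child))
            visit(sub, child)
        if len(element) == 0 and element.text and element.text.strip():
            values[oid] = element.text.strip()

    visit(root, 0)  # object 0 is the root
    return edges, values

edges, values = to_graph(ET.fromstring(DATA))
for source, label, target in edges:
    print(source, label, target, values.get(target, ""))
```

No schema is consulted while building the graph, so the two books can have different children in different orders.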

9

Example of XML data

<book year="1995">
  <title> Database Systems </title>
  <author><name> Date </name></author>
  <publisher> Addison-Wesley </publisher>
</book>

<book year="1998">
  <publisher> The Math Works </publisher>
  <title> MATLAB </title>
</book>

10

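[Figure: the XML data drawn as a graph. Two book nodes carry the attributes year="1995" and year="1998". The first book has edges title (Database Systems), author, which leads to name (Date), and publisher (Addison-Wesley). The second book has edges publisher (The Math Works) and title (MATLAB).]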

11

Example of XML data

<book year="1995" id="o100">
  <title> Database Systems </title>
  <author><name> Date </name></author>
  <publisher> Addison-Wesley </publisher>
</book>

<book year="1998" related="o100">
  <publisher> The Math Works </publisher>
  <title> MATLAB </title>
</book>

12

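[Figure: the XML data drawn as a graph. Two book nodes carry the attributes year="1995" and year="1998". The first book has edges title (Database Systems), author, which leads to name (Date), and publisher (Addison-Wesley). The second book has edges publisher (The Math Works) and title (MATLAB). An additional edge labeled related links the 1998 book to the 1995 book.]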

13

XML-QL

• Extracts data from large XML documents.

• Integrates XML data from multiple sources.

• Translates XML data between different DTDs.

• Processes a request by
  – sending queries to XML sources, or by
  – transporting large amounts of XML data to clients.

14

Example of XML-QL

WHERE
  <book>
    <publisher> Addison-Wesley </>
    <title> $t </title>
  </book> IN "www.a.b.c/books.xml"
CONSTRUCT <result><title> $t </></>

15

Example of XML data

<book year="1995" id="o100">
  <title> Database Systems </title>
  <author><name> Date </name></author>
  <publisher> Addison-Wesley </publisher>
</book>

<book year="1998" related="o100">
  <publisher> The Math Works </publisher>
  <title> MATLAB </title>
</book>

16

Result of the query

<result>

<title> Database Systems </title>

</result>
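
The effect of this query can be imitated in a general-purpose language. The sketch below is not an XML-QL engine: it hard-codes the pattern (a book with a given publisher and a title), uses an inline string in place of the remote source, and wraps the books in a hypothetical `<catalog>` root. The function name `titles_by_publisher` is likewise an assumption.

```python
import xml.etree.ElementTree as ET

# Inline stand-in for the queried source, wrapped in a hypothetical root.
BOOKS = """<catalog>
<book year="1995" id="o100">
  <title> Database Systems </title>
  <author><name> Date </name></author>
  <publisher> Addison-Wesley </publisher>
</book>
<book year="1998" related="o100">
  <publisher> The Math Works </publisher>
  <title> MATLAB </title>
</book>
</catalog>"""

def titles_by_publisher(xml_text, publisher):
    """WHERE part: match books by publisher and bind each title.
    CONSTRUCT part: build one <result> element per binding."""
    results = []
    for book in ET.fromstring(xml_text).iter("book"):
        publishers = [(p.text or "").strip() for p in book.findall("publisher")]
        if publisher not in publishers:
            continue
        for title in book.findall("title"):  # each title binds $t once
            result = ET.Element("result")
            ET.SubElement(result, "title").text = title.text
            results.append(result)
    return results

output = [ET.tostring(r, encoding="unicode")
          for r in titles_by_publisher(BOOKS, "Addison-Wesley")]
print("\n".join(output))
```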

17

WHERE
  <book>
    <publisher> Addison-Wesley </>
    <author> $a1 </>
  </> IN "www.a.b.c/books1.xml",
  <book>
    <publisher>
      <name> The Math Works </>
    </>
    <author> $a2 </>
  </> IN "www.d.e.f/books2.xml",
  $a1 = $a2
CONSTRUCT <author> $a1 </>
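
This query joins two sources on the author: it keeps authors who appear in an Addison-Wesley book in the first source and in a The Math Works book in the second. A minimal sketch of the same join follows. Both documents, their author names, and the helper `authors` are hypothetical, and the sketch compares authors as plain text and returns a set, which simplifies what a real XML-QL processor would do.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-ins for the two sources. In the second source the
# publisher name sits in a nested <name> element, as in the query.
BOOKS1 = """<catalog>
<book><publisher>Addison-Wesley</publisher><author>Smith</author></book>
<book><publisher>Addison-Wesley</publisher><author>Jones</author></book>
</catalog>"""

BOOKS2 = """<catalog>
<book><publisher><name>The Math Works</name></publisher><author>Jones</author></book>
<book><publisher><name>Other Press</name></publisher><author>Smith</author></book>
</catalog>"""

def authors(xml_text, publisher_path, publisher):
    """Collect the authors of all books whose publisher matches."""
    found = set()
    for book in ET.fromstring(xml_text).iter("book"):
        names = {(p.text or "").strip() for p in book.findall(publisher_path)}
        if publisher in names:
            found.update((a.text or "").strip() for a in book.findall("author"))
    return found

a1 = authors(BOOKS1, "publisher", "Addison-Wesley")        # bindings of $a1
a2 = authors(BOOKS2, "publisher/name", "The Math Works")   # bindings of $a2
joined = sorted(a1 & a2)                                   # $a1 = $a2
print(joined)
```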

18

Regular Path Expressions

• Permitted wherever XML permits an element.

• Provide:
  – alternation ( | )
  – concatenation ( . )
  – Kleene-star operators ( * )

19

Example of a regular path expression

WHERE
  <part+.(subpart | component.piece)>
    $r
  </> IN "www.a.b.c/parts.xml"
CONSTRUCT <result> $r </>

20

The regular path expression matches patterns such as:

<part><subpart> $r </></>
<part><part><component><piece> $r </></></></>
<part><part><subpart> $r </></></>
...
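
A regular path expression can be evaluated by spelling out the tag path of every node and testing it against an ordinary regular expression. The sketch below does this for part+.(subpart | component.piece). The sample document, the use of "/" as the path separator, the function name `match_paths`, and the decision to anchor paths at the document root are assumptions for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical parts document.
PARTS = """<part>
  <subpart>s1</subpart>
  <part>
    <component><piece>p1</piece><other>x</other></component>
    <subpart>s2</subpart>
  </part>
  <component><subpart>s3</subpart></component>
</part>"""

# part+.(subpart | component.piece), written over "/"-joined tag paths.
PATH = re.compile(r"(part/)+(subpart|component/piece)")

def match_paths(root, pattern):
    """Return the text of every node whose path from the root matches."""
    hits = []

    def visit(element, path):
        path = path + [element.tag]
        if pattern.fullmatch("/".join(path)):
            hits.append((element.text or "").strip())
        for child in element:
            visit(child, path)

    visit(root, [])
    return hits

matches = match_paths(ET.fromstring(PARTS), PATH)
print(matches)
```

The subpart under part/component is not returned, because the expression only allows a subpart directly below a chain of part elements.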

21

XQL

• Is designed specifically for XML documents.

• Provides a simple syntax (patterns modeled after directory notation).

• Expressed in strings that can be embedded in programs, scripts, and XML or HTML attributes.

22

The Result of XQL Query

• Depends on implementation. One of the following:
  – XML document.
  – A tree that can be fed back into XQL.
  – A different type of structure (e.g. a set of pointers to nodes).

23

Search Context

• Is the set of nodes against which a query operates.

• The “root context” and the “current context”:

• / uses the “root context”

• ./ uses the “current context” explicitly

24

Example of an XQL query

./book[@style = /bookstore/@specialty]

book[@style = /bookstore/@specialty]

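Find all books where the value of the style attribute of the book is equal to the value of the specialty attribute of the bookstore element at the root of the XML document. The two forms are equivalent: a query without a prefix is evaluated against the current context.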
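
The comparison of two attribute values can be spelled out by hand. The sketch below assumes that the current context is the bookstore element, and it uses a hypothetical bookstore document and function name. Python's `xml.etree.ElementTree` supports only a limited XPath subset that cannot compare an attribute against another path, so the filter is written in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical bookstore document.
BOOKSTORE = """<bookstore specialty="novel">
  <book style="novel"><title>A</title></book>
  <book style="textbook"><title>B</title></book>
  <book style="novel"><title>C</title></book>
</bookstore>"""

def books_matching_specialty(root):
    """Emulate book[@style = /bookstore/@specialty], with the
    bookstore element as the current context."""
    specialty = root.get("specialty")  # /bookstore/@specialty
    if specialty is None:
        return []
    return [book for book in root.findall("./book")
            if book.get("style") == specialty]

root = ET.fromstring(BOOKSTORE)
titles = [book.findtext("title") for book in books_matching_specialty(root)]
print(titles)
```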

25

Additional examples

author[lastname = 'Bob']

Find all author elements whose lastname subelement is Bob.

author[. = 'Bob']

Find all author elements whose value is Bob.
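
Similar filters exist in the limited XPath subset of Python's `xml.etree.ElementTree`, which makes it a convenient way to try out the idea even though it is not an XQL implementation. The `[.='text']` form needs Python 3.7 or later. The sample document is hypothetical and avoids whitespace inside the elements, because ElementTree compares the complete text content exactly.

```python
import xml.etree.ElementTree as ET

# Hypothetical author list.
AUTHORS = """<authors>
  <author><firstname>Alice</firstname><lastname>Bob</lastname></author>
  <author>Bob</author>
  <author><firstname>Bob</firstname><lastname>Smith</lastname></author>
</authors>"""

root = ET.fromstring(AUTHORS)

# Authors that have a lastname child whose text is "Bob".
by_lastname = root.findall("author[lastname='Bob']")

# Authors whose own complete text content is "Bob".
by_value = root.findall("author[.='Bob']")

print(len(by_lastname), len(by_value))
```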

26

Regular path expressions in XQL

• bookstore//title

Find all title elements, one or more levels deep in the bookstore.

• bookstore/*/title

Find all title elements that are grandchildren of bookstore elements.
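
The difference between the two operators can be tried out with the XPath subset of Python's `xml.etree.ElementTree`, which uses the same `//` and `*` notation. The bookstore document below is hypothetical, and the paths are written relative to the bookstore element.

```python
import xml.etree.ElementTree as ET

# Hypothetical bookstore with titles at three different depths.
BOOKSTORE = """<bookstore>
  <title>Store</title>
  <book><title>T1</title></book>
  <section><book><title>T2</title></book></section>
</bookstore>"""

root = ET.fromstring(BOOKSTORE)  # root is the bookstore element

# bookstore//title: titles at any depth below bookstore.
all_titles = [t.text for t in root.findall(".//title")]

# bookstore/*/title: titles exactly two levels below bookstore.
grandchild_titles = [t.text for t in root.findall("./*/title")]

print(all_titles)
print(grandchild_titles)
```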

27

Indices in XQL

<x>
  <y> Text1 </y>
  <y> Text2 </y>
</x>

<x>
  <y> Text3 </y>
  <y> Text4 </y>
</x>

x/y[0] returns Text1, Text3

(x/y)[3] returns Text4

x[1]/y[0] returns Text3
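
The three queries differ in where the index is applied, which is easy to see when they are spelled out with zero-based Python lists. The `<root>` wrapper and the variable names are assumptions. Note that the predicates of `xml.etree.ElementTree` itself count positions from 1, so the sketch indexes lists instead of using them.

```python
import xml.etree.ElementTree as ET

# The two x elements from the example, wrapped in a hypothetical root.
DATA = """<root>
<x><y>Text1</y><y>Text2</y></x>
<x><y>Text3</y><y>Text4</y></x>
</root>"""

root = ET.fromstring(DATA)
xs = root.findall("x")

# x/y[0]: the first y within each x.
first_y_of_each_x = [x.findall("y")[0].text for x in xs if x.findall("y")]

# (x/y)[3]: collect every y of every x first, then take the fourth.
all_ys = [y for x in xs for y in x.findall("y")]
fourth_y_overall = all_ys[3].text

# x[1]/y[0]: the first y of the second x.
first_y_of_second_x = xs[1].findall("y")[0].text

print(first_y_of_each_x, fourth_y_overall, first_y_of_second_x)
```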

28

XML-QL vs. XQL

• XQL can easily be embedded in programs, scripts, and XML or HTML attributes.

• XQL assumes that the user views an XML document as a graph.

• XML-QL supports the construction of new, complex XML documents.

• XML-QL queries use XML-like patterns.