
Web Data Management

WebOQL

OVERVIEW

• The data model supports abstractions for modeling record-based data, structured documents and hypertexts.
• WebOQL supports querying small databases represented as documents (such as catalogs), restructuring single pages (for example, converting a large page into smaller pages), and restructuring sets of pages (for example, creating an index page containing a hyperlink to each of them and adding to each page a hyperlink to the index page).
• It supports restructuring the content of a web site in order to show the same content in another view.

Data Model

The WebOQL data model introduces the hypertree: a tree-based data model representing a structured document containing hyperlinks.

Hypertrees are ordered arc-labeled trees with two kinds of arcs: internal and external.

Internal arcs represent structured objects.

External arcs represent references (links). They cannot have descendants, and their records must contain a ‘URL’ field.
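The two kinds of arcs can be sketched in Python as follows. This is only an illustration, not WebOQL's implementation: the `Arc` class, its field names and the `http://example.org/...` URL are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Arc:
    """One labeled arc; the label is a record (field name -> value)."""
    label: dict
    subtree: list = field(default_factory=list)  # ordered list of child Arcs
    external: bool = False

    def __post_init__(self):
        # External arcs model links: they need a URL and have no descendants.
        if self.external:
            if "URL" not in self.label:
                raise ValueError("external arc needs a 'URL' field")
            if self.subtree:
                raise ValueError("external arc cannot have descendants")


# A hypertree is represented here as the ordered list of arcs leaving its root.
# The URL below is a hypothetical placeholder.
link = Arc({"Label": "home page", "URL": "http://example.org/index.html"}, external=True)
student = Arc({"Name": "moshe", "Sem": 5}, [link])
tree = [Arc({"Group": "students"}, [student])]
```

Keeping the children in a list preserves their order, and putting the check in the constructor makes it impossible to build an external arc that has descendants.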

Data Model

Example (hypertree figure; indentation shows nesting):

[Group: students]
    [Name: moshe, Sem: 5]
        [Label: moshe home page, URL: www…/index.html]
    [Name: arik, Sem: 8]
        [Label: arik home page, URL: www…/index.html]
[Group: professors]
    [Name: oded, Seniority: 8]
        [Label: seminar in www, URL: www…/s.html]
        [Label: databases, URL: www…/index.html]

Data Model

Hypertrees are a useful data structure because they support three important abstractions:

• Collections
• Nesting
• Ordering

The notion of reference, which is very important to the structure of the web, is captured through the distinction between internal and external arcs.

Because the nodes have no type, a tree can hold heterogeneous records on its arcs.

Data Abstractions

WEB: a pair (t, F), where t is a hypertree (the schema) and F : URLs → Hypertrees is the browsing function.

PAGE: F(u), where u is a URL.

Tree operators

Definitions:

Tails: the tails of a tree t are the trees obtained by chopping prefixes off t.

Simple trees: the simple trees of a tree t are the trees composed of one arc that stems from the root of t together with its subtree.

Subtrees: the subtrees of t are the trees at the ends of the arcs that stem from the root of t.
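The three definitions can be illustrated with a small Python sketch. The representation is an assumption made for the example: a tree is an ordered list of arcs, and each arc is a `(label, subtree)` pair. The sample tree has three arcs from its root, labeled 1, 2 and 3.

```python
# A tree is an ordered list of arcs; each arc is a (label, subtree) pair.
t = [
    ({"Label": 1}, [({"A": 1}, []), ({"A": 2}, [])]),
    ({"Label": 2}, [({"B": 1}, [])]),
    ({"Label": 3}, []),
]


def simple_trees(tree):
    """Each arc from the root together with its subtree, as a one-arc tree."""
    return [[arc] for arc in tree]


def subtrees(tree):
    """The trees found at the end of each arc that leaves the root."""
    return [sub for _label, sub in tree]


def tails(tree):
    """The trees left after chopping off 0, 1, 2, ... leading arcs."""
    return [tree[i:] for i in range(len(tree))]
```

With this representation a tail is just a list suffix, which is why ordering matters in the data model.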

Tree t (indentation shows nesting):

[Label:1]
    [A:1] [A:2]
[Label:2]
    [B:1]
[Label:3]

Simple trees of t:

[Label:1] with subtree [A:1] [A:2]
[Label:2] with subtree [B:1]
[Label:3]

Subtrees of t:

[A:1] [A:2]
[B:1]
null

Tails of t (t!), obtained by chopping prefixes:

[Label:1] [Label:2] [Label:3], with subtrees [A:1] [A:2] under [Label:1] and [B:1] under [Label:2]
[Label:2] [Label:3], with subtree [B:1] under [Label:2]
[Label:3]

Tree operators

Concatenate: Tree1 + Tree2

Connects two trees by their roots.

(Figure: example trees t1 and t2 and their concatenation t1 + t2. Arc labels appearing in the figure: [label1: a], [label1: a1], [label1: a2], [label1: b], [label1: c], [label1: c1], [label1: c2].)

Tree operators

Hang: [ Arc1 / Tree1 ]

Hangs the tree from a new arc.

(Figure: t1 is a tree with arcs [label1: a1] and [label1: a2]. The result of [ label1: a / t1 ] is a tree with a single arc [label1: a] that has t1 as its subtree.)

Tree operators

Prime: Tree’

Returns the first subtree of the argument.

(Figure: t1 has two arcs from its root, [label1: a] and [label1: b]; the subtree under [label1: a] has arcs [label1: a1] and [label1: a2]. t1’ is that subtree.)

Tree operators

Head: Tree & [x]

Returns the first x simple trees of the argument. If x is not specified, only the first simple tree is returned.

(Figure: t1 has two arcs from its root, [label1: a] and [label1: b]; the subtree under [label1: a] has arcs [label1: a1] and [label1: a2]. t1& is the arc [label1: a] together with its subtree.)
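The operators Concatenate, Hang, Prime and Head can be sketched in Python on the same kind of structure. As before, representing a tree as an ordered list of `(label, subtree)` arcs is an assumption made for illustration, and returning an empty tree from `prime` on an empty argument is a choice made here, not something the slides specify.

```python
def concat(t1, t2):
    """t1 + t2: join two trees at their roots, keeping arc order."""
    return t1 + t2


def hang(label, tree):
    """[label / tree]: a one-arc tree with `tree` hanging below the new arc."""
    return [(label, tree)]


def prime(tree):
    """tree': the first subtree of the argument (empty tree if there is none)."""
    return tree[0][1] if tree else []


def head(tree, x=1):
    """tree & x: the first x simple trees; only the first one by default."""
    return tree[:x]


# Sample tree: arcs a and b from the root, with a1 and a2 below a.
t1 = [
    ({"label1": "a"}, [({"label1": "a1"}, []), ({"label1": "a2"}, [])]),
    ({"label1": "b"}, []),
]
```

For instance, `prime(t1)` yields the two-arc tree below `a`, while `head(t1)` keeps the arc `a` itself together with that subtree.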

(Figure: example trees q4, q5, q6 and q7, together with the results of q4’, q5&, q5&2 and q5!.)

HANG

[Label: “papers from smith”, Format: “ps.Z” / q1]

(Figure: the result is an arc [Label: Papers from smith, Format: ps.Z] whose subtree has arcs [Title: Are…, Url: http://www…] and [Title: Recent…, Url: http://…].)

HANG + concatenate

[Tag: “UL” /
    [Tag: “LI”, Text: “First Child”] +
    [Tag: “LI”, Text: “Second Child”] +
    [Tag: “LI”, Text: “Third Child”]] +
[Url: “http://a.b.c.”, Label: “Click Here”]

(Figure: the result has an arc [Tag: UL] with three [Tag: LI] arcs below it, the first labeled [Tag: LI, Text: First Child], followed by the arc [Url: “http://a.b.c.”, Label: “Click Here”].)

Tree operators

Peek: Arc.field

Extracts a field from an arc’s label; for example, Example.Group can have the value ‘students’. If the field does not exist, the value ‘null’ is returned.

IsField: Arc?field

Tests for the presence of a field in an arc’s label; for example, Example?Group evaluates to true, while Example?Name evaluates to false.
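Peek and IsField map naturally onto dictionary lookups if an arc label is modeled as a Python dict. That modeling, and the use of `None` to stand in for WebOQL's null, are assumptions of this sketch.

```python
def peek(arc_label, field_name):
    """Arc.field: the field's value, or None (standing in for null) if absent."""
    return arc_label.get(field_name)


def is_field(arc_label, field_name):
    """Arc?field: True when the label record has the field."""
    return field_name in arc_label


# Label of the first arc in the running example.
example = {"Group": "students"}
```

Here `peek(example, "Group")` gives `"students"`, and `is_field(example, "Name")` is false because the record has no Name field.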

Definitions

• Page: a hypertree that has an associated URL identifying it.
• Web: a collection of interrelated pages.
• Each external arc of a page is a link in the web.
• Schema: a web can optionally have a distinguished page that provides an entry point to the web.

(Figure: a web of three pages, http://a.b.c./one.html, http://a.b.c./two.html and http://a.b.c./three.html.)

• No schema: one must know the URL of one or more pages.

Web

(Figure: a web with a schema and the pages http://a.b.c./one.html, http://a.b.c./two.html, http://a.b.c./three.html and http://a.b.c./four.html; the labels “WebOQL query” and “New page” indicate a page added by a query.)

HTML source:

<UL>
<LI> First Child
<LI> Second Child
<LI> Third Child
</UL>
<A HREF="http://a.b.c."> Click Here </A>

Equivalent tree expression:

[Tag: “UL” /
    [Tag: “LI”, Text: “First Child”] +
    [Tag: “LI”, Text: “Second Child”] +
    [Tag: “LI”, Text: “Third Child”]] +
[Url: “http://a.b.c.”, Label: “Click Here”]

(Figure: tree representing an HTML document consisting of a list and a hyperlink. The arc [Tag: UL] has below it the arcs [Tag: LI, Text: First Child], [Tag: LI, Text: Second Child] and [Tag: LI, Text: Third Child]; it is followed by the external arc [Url: http://a.b.c., Label: Click here].)

• Trees are ordered.
• Arcs are labeled with records rather than atomic values.

Paper Database: CS papers

(Figure: the CS papers hypertree. Arcs from the root are labeled by research group, arcs below a group describe papers, and external arcs below a paper link to its full version and abstract. Arc labels as they appear in the figure:)

[group: DBMS]
[group: ProgLang]
[group: Card]
[Title: Recent, Authors: Smith, Publications: Tech]
[Title: Are……, Authors: Smith, Publications: ACM]
[Label: Full Papers, Url: www…]
[Label: Abstract, Url: www…]

SELECT - FROM - WHERE

This familiar query language construct is the main construct of WebOQL queries.

Select [y.Label, y.URL]        (the query to evaluate)
From x in example, y in x!     (definition of variables)
Where x.Seniority = 8          (a boolean condition)

SELECT - FROM - WHERE

For each instantiation of the variables in the from clause, the condition in the where clause is checked; if it is true, the query in the select clause is evaluated and its value is appended to the result.

Result:

[Label: seminar in www, URL: www…/s.html]
[Label: databases, URL: www…/index.html]
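The evaluation rule amounts to nested loops over the variable bindings. The Python sketch below illustrates it under several assumptions: trees are ordered lists of `(label, subtree)` arcs, the inner variable ranges over the arcs directly below the outer one, and the second professor and all `http://example.org/...` URLs are hypothetical data added so that the where clause has something to filter out.

```python
# A tree is an ordered list of (label, subtree) arcs.
professors = [
    ({"Name": "oded", "Seniority": 8}, [
        ({"Label": "seminar in www", "URL": "http://example.org/s.html"}, []),
        ({"Label": "databases", "URL": "http://example.org/index.html"}, []),
    ]),
    # Hypothetical second record, not taken from the slides.
    ({"Name": "dana", "Seniority": 3}, [
        ({"Label": "dana home page", "URL": "http://example.org/dana.html"}, []),
    ]),
]


def select_labels_and_urls(tree, seniority):
    """select [y.Label, y.URL] from x in tree, y in x' where x.Seniority = seniority"""
    result = []
    for x_label, x_subtree in tree:           # x ranges over the arcs of tree
        for y_label, _ in x_subtree:          # y ranges over the arcs below x
            if x_label.get("Seniority") == seniority:     # where clause
                # select clause: build a one-arc tree and append it to the result
                result.append(({"Label": y_label.get("Label"),
                                "URL": y_label.get("URL")}, []))
    return result
```

Calling `select_labels_and_urls(professors, 8)` returns the two link records below the first professor, in document order.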

Select [y.title, y.publication]
From x in csPapers, y in x’

Missing data: when a paper has no Publication field, the value is undefined.

• Compute a listing of the papers’ publication data grouped by title.

Select [x.Title /
    Select [z.Publication]
    From y in csPapers, z in y’
    Where x.title = z.title]
From w in csPapers, x in w’

• Schema: a distinguished hypertree.
• Browsing function: maps strings (URLs) to hypertrees. It defines a graph in which the nodes are pages and there is an arc from node a to node b if the content of the page at node a contains an external arc whose Url attribute is the URL of the page at node b.
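The graph induced by a browsing function can be computed with a simple traversal. In this Python sketch the browsing function is modeled as a dict from URL to tree, a tree is an ordered list of `(label, subtree)` arcs, and an arc is treated as external when its label has a `Url` field. All of these are assumptions for illustration, and the three pages are made up.

```python
def external_urls(tree):
    """Collect the Url of every external arc, in document order."""
    urls = []
    for label, subtree in tree:
        if "Url" in label:            # assumption: a Url field marks an external arc
            urls.append(label["Url"])
        urls.extend(external_urls(subtree))
    return urls


def link_graph(browse, start_urls):
    """Edges (a, b) where page a contains an external arc pointing to page b."""
    edges, seen, todo = set(), set(), list(start_urls)
    while todo:
        a = todo.pop()
        if a in seen or a not in browse:
            continue
        seen.add(a)
        for b in external_urls(browse[a]):
            edges.add((a, b))
            todo.append(b)
    return edges


# Hypothetical three-page web; the dict plays the role of the browsing function F.
F = {
    "http://a.b.c./one.html": [
        ({"Tag": "UL"}, [({"Url": "http://a.b.c./two.html", "Label": "two"}, [])]),
    ],
    "http://a.b.c./two.html": [
        ({"Url": "http://a.b.c./three.html", "Label": "three"}, []),
    ],
    "http://a.b.c./three.html": [({"Tag": "P", "Text": "end"}, [])],
}
```

Starting from a known URL (or from the schema's links), `link_graph` follows external arcs until no new pages are reachable.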

• Analogy with relational databases:
• Hypertrees correspond to relations
• Webs correspond to databases
• The schema of a web corresponds to the catalog of a database

• Select [x.Tag]
  From x in browse(“http://www.cs.toronto.edu”)

Result: [Tag: head] [Tag: body]

• A select-from-where (SFW) query creates a web.
• Select the titles and URLs of papers authored by Smith:

Select [y.Title, y’.URL] as schema
From x in csPapers, y in x’
Where y.authors ~ “smith”

Queries

• Create a web page with URL “Group Names” whose content is the list of group names (assume that there is no such page in the current web).

Select [x.Group] as “Group Names”
From x in csPapers

Queries

• Create several pages, one for each research group (using the group name as the URL). Each page contains the publications of the corresponding group.

Select x’ as x.Group
From x in csPapers
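The effect of these two queries can be mimicked in Python by building a dict from URL to page content. This is a sketch under stated assumptions: a tree is an ordered list of `(label, subtree)` arcs, the resulting pages are modeled as a plain dict, and the tiny `cs_papers` tree is a simplified stand-in for the database in the slides.

```python
# Simplified stand-in for the csPapers tree.
cs_papers = [
    ({"Group": "Card Punching"}, [({"Title": "Are Magnetic Media Better?"}, [])]),
    ({"Group": "Programming Languages"}, [({"Title": "Cobol in AI"}, [])]),
]


def pages_by_group(tree):
    """select x' as x.Group from x in tree: one page per group, keyed by URL."""
    pages = {}
    for label, subtree in tree:           # x ranges over the arcs of tree
        url = label["Group"]              # the as-target evaluates to a string
        pages.setdefault(url, []).extend(subtree)   # page content is x'
    return pages


def group_names_page(tree):
    """select [x.Group] as "Group Names" from x in tree: a single page."""
    return {"Group Names": [({"Group": label["Group"]}, []) for label, _ in tree]}
```

When the as-target is a constant string every result lands on the same page, and when it depends on x each distinct value produces its own page.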

Data Model

• Records as labels on arcs
• Internal and external arcs

(Figure: hypertree of an HTML page. Arc labels as they appear in the figure:)

[Label: Theatres Online, Url: http://www…, Base: http://www…, Text: This page contains...]
[Tag: UL, Text: one of the…]
[Label: Sports Zone, Url: http://www…, Base: http://www…, Text: Sports Zone…]
[Tag: XYZ, Text: One of the…]
[Tag: XYZ, Text: If you are…]
[Tag: XYZ, Text: …]
[Tag: H1, Text: City Overview…]
[Tag: LI, Text: One of the…]
[Tag: LI, Text: If you are interested…]
[Tag: LI, Text: All the hotels…]
[Tag: XYZ, Text: Contains…]
[Label: All the Hotels, Url: http://www…, Base: http://www…, Text: These are all…]

Query: list elements containing “ticket”

doc := “http://www.citynet.com/overview.html”;

[Tag: “UL” /
    Select y
    From y in doc!’
    Where y’.text ~ “ticket”]

(Figure: result of the query. Arc labels as they appear in the figure:)

[Tag: LI] [Tag: LI]
[Label: Theatres Online, Url: http://www…, Base: http://www…, Text: This page contains...]
[Tag: UL]
[Label: Sports Zone, Url: http://www…, Base: http://www…, Text: Sports Zone…]
[Tag: XYZ, Text: One of the…]
[Tag: XYZ, Text: If you are…]
[Tag: XYZ, Text: …]

Web restructuring

Using these tree operators, we have shown how a tree can be restructured.

To restructure a web we need a function that maps one web to another. The new web has some hypertree as its schema, while its browsing function is an extension of the old web’s browsing function: it also targets URLs that were not previously targeted.

In WebOQL this is done with the as clause.

Web restructuring

In general, the select clause of a WebOQL query has the form:

Select q1 as s1, q2 as s2, …, qn as sn

Each si can be either the keyword schema or a string query.

An as clause whose target is schema defines the schema of the web.

Example: [Title: y.Group] as schema

Result: [Title: students] [Title: professors]

Web restructuring

An as clause whose target evaluates to a string defines a page, and the string is treated as the page’s URL.

Example: [x.Name] as y.Group

Result: a page with URL students containing [Name: moshe] [Name: arik]

Web restructuring

After a web is created there are two possibilities: either query it further (restructure it) or return it to the host application.

If we want to return the web to the host application so that it can be shown in a browser, the pages must be formatted in an HTML-compliant way. This is done by restructuring the web using HTML tags as labels.

Document restructuring

Web documents are a perfect example of semi-structured data, since they do not have a fixed schema and can have various irregularities. In an HTML document most tags may appear any number of times or not at all.

WebOQL uses a wrapper that creates an abstract syntax tree (AST) from an arbitrary HTML document. This is possible because the markup tags of HTML reflect the logical relationships between the various information items.

Example: <UL><LI> item 1. </LI><LI> item 2. </LI><LI> item 3. </LI></UL>

• Generate a web consisting of a page for each research group, containing the title and authors of all its publications, and an index page that lists all the groups and provides links to their pages.

newWeb := Select unique [Name: x.Group, Url: x.Group] as schema,
                 [y.Title, y.Authors] as x.Group
          From x in csPapers, y in x’

(Figure: the web produced by the query.)

“as schema”:

[Name: …, Url: ..]
[Name: Prog. Lang, Url: Prog.Lang..]
[Name: Card Punching, Url: Card Punching]

“as x.Group”:

Card Punching
[Titles: Recent…, Authors: Smith]
[Titles: Are…, Authors: Smith]

Prog. Lang.
[Titles: Cobol…, Authors: James J]
[Titles: Assembly Lan, Authors: John,..]

NewerWeb := newWeb |
    select [Tag: “H3”, Text: y.Title] +
           [Tag: “BR”, Text: y.Publication] +
           [Tag: “BR”, Text: y.Authors] +
           [Tag: “P”]
        as x.Name
    from x in schema, y in x.Name
    |
    select [Tag: “H2”, Text: “Publications of the” * x.Name * “ Group”] +
           x.Name +
           [Tag: “A”, Label: “To Index”, Url: “http://a.b.c/Index of Projects.html”]
        as “http://a.b.c/” * x.Name * “.html”
    from x in schema
    |
    select [Url: “http://a.b.c/Index of Projects.html”] as schema,
           [Tag: “H2”, Text: “Index of Projects”] +
           [Tag: “UL” /
               select [Tag: “LI” /
                   [Tag: “A”, Label: x.Name,
                    Url: “http://a.b.c/” * x.name * “.html”]]
               from x in schema
           ] as “http://a.b.c/Index of Projects.html”

Index page:

<H2> Index of Projects </H2>
<UL>
<LI> <A HREF="http://a.b.c./cardpunching.html"> Card Punching </A> </LI>
<LI> <A HREF="http://a.b.c./programminglanguages.html"> Programming Languages </A> </LI>
<LI> …..
</UL>

Group pages:

<H2> Publications of the Card Punching group </H2>
<H3> Recent Discoveries in Card Punching </H3>
<BR> Technical Report TROIS
<BR> Peter Smith, John Brown
<P>
<H3> Are Magnetic Media Better? </H3>
<BR> ACM TOCP Vol 3 No. (1942) pp.2337
<BR> Peter Smith, John Brown
<P>
<A HREF="http://a.b.c./IndexnProject.html"> To index </A>

Document restructuring

Navigation patterns:

In the examples seen so far, the variables used in the queries ranged over the simple trees of the tree being queried. On the WWW, however, variables may range over several linked subtrees whose structure is not fully known to us.

select [x.text]
from x in “someone’s.html”
via ^*[Tag = “H2”]

^ is a record predicate that is true for every internal arc.

[Tag = “H2”] is a record predicate that is true for every arc that has an ‘H2’ tag.

Document restructuring

Navigation patterns:

select [x.text]
from x in “someone’s.html”
via >*[not(Tag = “H2”)]

> is a record predicate that is true for every external arc.

[not(Tag = “H2”)] is a record predicate that is true for every arc that does not have an ‘H2’ tag.

Document restructuring

Navigation patterns:

When a navigation pattern is omitted, the query is treated as if it had a navigation pattern that always evaluates to true.

Variables are instantiated in left-to-right depth-first or breadth-first order. Depth-first is the default; to use breadth-first, the keyword viabfs is used instead of via.

Navigation Pattern

[not (Tag = “A”)]* : a path of any length composed of arcs that do not have a Tag attribute with value “A”.

[Tag = “LI”] [Tag = “A”] : a path of length 2.

^*> : all paths in a tree that lead from the root to an external arc.

Select [x.url]
From x in “http://a.b.c./index.html”
Via [not (tag = “Table”)]*>

This returns all the external arcs in the document at the given URL that do not occur within a table.
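Patterns of the shape `step* last`, such as `^*>` or `[not (Tag = "Table")]*>`, can be evaluated by a tree traversal. The Python sketch below handles only that shape, not the full pattern language. Its assumptions: a tree is an ordered list of `(label, subtree)` arcs, an arc counts as external when its label has a `Url` field, links to other pages are not followed, and the sample document is made up.

```python
from collections import deque


def is_external(label):
    return "Url" in label      # assumption: a Url field marks an external arc


def is_internal(label):
    return not is_external(label)


def not_table(label):
    return label.get("Tag") != "Table"


def navigate(tree, step, last, breadth_first=False):
    """Labels of arcs reached by a path matching step* last.

    `step` must hold on every arc before the final one and `last` on the
    final arc. Left-to-right depth-first by default (like via);
    breadth_first=True gives the viabfs order.
    """
    matches = []
    if breadth_first:
        queue = deque([tree])
        while queue:
            for label, subtree in queue.popleft():
                if last(label):
                    matches.append(label)
                if step(label):
                    queue.append(subtree)
    else:
        def visit(current):
            for label, subtree in current:
                if last(label):
                    matches.append(label)
                if step(label):
                    visit(subtree)
        visit(tree)
    return matches


# Hypothetical document: one link in a list, one in a table, one at top level.
doc = [
    ({"Tag": "UL"}, [
        ({"Tag": "LI"}, [({"Url": "http://a.b.c./one.html", "Label": "one"}, [])]),
    ]),
    ({"Tag": "Table"}, [({"Url": "http://a.b.c./two.html", "Label": "two"}, [])]),
    ({"Url": "http://a.b.c./three.html", "Label": "three"}, []),
]
```

`navigate(doc, not_table, is_external)` skips the link inside the table, while `navigate(doc, is_internal, is_external)` corresponds to `^*>` and finds all three links. Switching to breadth-first changes only the order in which the matches are produced.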

Select [x.url, x.text]
From x in “http://a.b.c./root.html”
Via (^*[Labeled “Next”]>)*

What will this query produce?

(Figure: fragment of large.html. Arc labels as they appear in the figure:)

[Tag: H3, Text: Price…] [Tag: H3, Text: Price…]
[Tag: UL]
[Tag: UL]
[Tag: LI]
[Tag: LI] [Tag: LI]

Select X!&
From X in “http://a.b.c./large.html” via ^*[Tag = “H3”]
Where X!.Tag = “UL” and X.Text ~ “Price”

(Figure: tree of cardpunching.html. Arc labels as they appear in the figure:)

[Tag: H2, Text: Publications of the]
[Tag: H3, Text: ]
[Tag: BR, Text: ] [Tag: BR, Text: y] [Tag: P, Text: ]
[Tag: P, Text: ]
[Label: To index, Url: , Base: http://a.b.c./cardpunching.html, Text: indexofprojects]
[Tag: H3, Text: ]
[Tag: BR, Text: ]
[Tag: BR, Text: y]

Query:

[Tag: “OL” /
    Select [Tag: “LI” / X&3]
    From X in “http://a.b.c./cardpunching.html”!
    Where X.tag = “H3”]

Tree generated by the query (arc labels as they appear in the figure):

[Tag: OL]
[Tag: LI] [Tag: LI] [Tag: H3]

[Tag: “OL” /
    Select [Tag: “LI” /
        Select y
        From y in X while not y.Tag = “p”]
    From X in “http://a.b.c.//IrregularDoc.html”!
    Where X.tag = “H3”
]

Project web:

select [x.proj name, x.proj descr] as “projects”,
       [x.emp name, x.emp phone] as “people”,
       [x.proj name] as “x.proj name”,
       [x.emp name] as “x.emp name”
From x in “SQLDb. Select proj name, emp name, emp phone, proj descr from proj, emp, worksin
           where Emp.id = worksIn.empid and proj.id = worksIn.projId;”

Generate a web containing a page for each project, a page for each person, and two index pages listing all the projects and all the people. A person’s page contains pointers to the projects in which he or she is involved, and a project page contains pointers to the pages of the people involved in it.

(Figure: hypertree of the CS papers page. Arc labels as they appear in the figure:)

[Tag: H1, Text: Publications of Research…]
[Tag: H2, Text: Card Punching…]
[Tag: H2, Text: Programming…]
[Tag: H2, Text: Databases…]
[Tag: UL, Text: Recent…]
[Tag: UL, Text: Cobol in AI Sam James…]
[Tag: UL, Text: …]
[Tag: LI, Text: Recent…]
[Tag: LI, Text: Are Magnetic…]
[Tag: LI, Text: Cobol in…]
[Tag: LI, Text: Assembly for…]
[Tag: CITE, Text: Are Magnetic…]
[Tag: XYZ, Text: Are Magnetic]
[Tag: BR, Text: ]
[Tag: B, Text: Peter Smith…]
[Tag: BR, Text: ACM TOCP Vol. 3 No. (1942) pp 23-37]
[Tag: BR, Text: ]
[Label: Abstract, Url: http://www…/abstr2.html, Base: http://www…/cspapers.html, Text: Are Magnetic Media…]
[Tag: BR, Text: ]
[Label: Full Version, Url: http://www…/paper2.ps.z, Base: http://www…/cspapers.html, Text: 1k098k79…]

Retrieve the titles and authors of each paper:

Select [Title: y’’.Text, Authors: y’’!!.text]
From x in “http://www.a.b.c./paper.html”, y in x’
Where x.Tag = “UL”

x ranges over simple trees and y over the elements under UL.

Select [title: y’’.Text,
        authors: y’’!!.text,
        Publications: y’’!3.Text,
        ps-url: y’!4.url,
        abstract-url: y’!!.url]
    as “pubsdb: insert”
From X in “http://www.a.b.c./paper.html”,
     y in X!’
Where X.tag = “H2”

(Figure: hypertree of a technical reports page. Arc labels as they appear in the figure:)

[Tag: H1, Text: Reports in …]
[Tag: HR, Text: ]
[Tag: H2, Text: David Rice]
[Tag: CITE, Text: Indexing]
[Label: Indexing Sound, Url: http://www…/pl.ps.gz, Base: http://www…./trs.html, Text: ;sd..sGhj&9870….]
[Tag: XYZ, Text: CS-TR-0327..]
[Tag: BR, Text: ]
[Label: Abstract Available Online, Url: http://www…/pl.html, Base: http://www…./trs.html, Text: Indexing Sound….]
[Tag: P, Text: ]
[Tag: CITE, Text: Efficient]
[Label: Efficient Clustering…., Url: http://www…/p2.ps.gz, Base: http://www…./trs.html, Text: .fHjs*9))fujs…….]
[Tag: XYZ, Text: CS-TR-0029..]
[Tag: BR, Text: ]
[Tag: P, Text: ]
[Label: Temporal Constraints, Url: http://www…/p3.ps.gz, Base: http://www…./trs.html, Text: ;+-9ivm27&813nd….]
[Tag: XYZ, Text: CS-TR-0120..]
[Tag: BR, Text: ]
[Tag: HR, Text: ]
[Tag: H2, Text: John Smith] …

Select [title: Y.text,
        author: X.text,
        publications: Y!!.Text,
        PS-Url: Y’.Url,
        abstract-url: Y!4.Url] as “PubsDb: insert”
From X in “http://www.x.y.z./papers.html”,
     Y in X! while not (Y.Tag = “HR”)
Where X.Tag = “H2” and Y.Tag = “CITE”

Figure 5.6 Instantiation of Variables in Query 4

(The figure marks the arcs bound to the variables X, y, y’ and y’’. Arc labels as they appear in the figure:)

[Tag: H1, Text: Publications of Research…]
[Tag: H2, Text: Card Punching…]
[Tag: H2, Text: Programming…]
[Tag: H2, Text: Databases…]
[Tag: UL, Text: Recent…]
[Tag: UL, Text: Cobol in AI Sam James…]
[Tag: UL, Text: …]
[Tag: LI, Text: Recent…]
[Tag: LI, Text: Are Magnetic…]
[Tag: LI, Text: Cobol in…]
[Tag: LI, Text: Assembly for…]
[Tag: CITE, Text: Recent…]
[Tag: XYZ, Text: Recent….]
[Tag: BR, Text: ]
[Tag: B, Text: Peter Brown…]
[Tag: BR, Text: Technical…….]
[Tag: BR, Text: ]
[Label: Abstract, Url: http://www…/abstrl.html, Base: http://www…/cspapers.html, Text: It is company…]
[Tag: BR, Text: ]
[Label: Full Version, Url: http://www…/paperl.ps.z, Base: http://www…/cspapers.html, Text: #hH6YiaP….]

Query 4:

csPapers := select [Group: X.Text /
                select [Title: y’’.Text,
                        Authors: y’’!!.Text,
                        Publication: y’’!3.Text /
                    [Label: “Abstract”, Url: y’!!.Url] +
                    [Label: “Full Version”, Url: y’!4.Url]
                ]
                from y in X!’
            ]
            from X in “http://www.a.b.c./papers.html”
            where X.Tag = “H2”

Architecture

(Figure: system architecture. Components: API, Query Engine, Wrapper Manager, several Wrappers, DBMS, File System, and Web 1 … Web k. Labeled flows between components: query, URL, tree, web.)

• Each node corresponds either to a subdocument enclosed in an occurrence of a paired tag (for example, the root node corresponds to the subdocument enclosed between <html> and </html>), or to a subdocument enclosed between an occurrence of a non-paired tag and the tag that follows it.

• Arcs leading to nodes that correspond to an <a> tag whose associated URL uses the http protocol are external. All other arcs are internal.

• The incoming arc of a node carries the attributes of the subdocument represented by that node.

• Internal arcs are labeled with a record containing two fields: Tag and Text.

• Tag is the HTML tag corresponding to the subtree that is the destination of the arc.

• The value of Text depends on whether Tag is paired or non-paired.

• If Tag is paired, the value of Text is the text enclosed between <Tag> and </Tag>, excluding markup.

• If Tag is non-paired, the value of Text is the text between <Tag> and the tag that comes after it in the document.

• External arcs are labeled with a record containing four fields: Label, Url, Base and Text.

• Label is the label of the hyperlink, that is, the text enclosed between the <a href=…> and </a> tags; Url is the value of the href attribute; Base is the URL of the document being processed; and Text is the text of the referred document, excluding markup.

• A dummy tag named <xyz> is used to enclose pieces of text that are not explicitly tagged.

• The rules are applied recursively to the text inside occurrences of paired tags.
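A much-simplified version of these wrapper rules can be sketched with Python's standard `html.parser` module. This is not the WebOQL wrapper. It assumes well-formed input in which every tag is paired and anchors contain only text. It also omits the <xyz> dummy tag and the Text field of external arcs (which would require fetching the referred document). The `Wrapper` class and `wrap` function are names invented for the example.

```python
from html.parser import HTMLParser


class Wrapper(HTMLParser):
    """Builds a tree of (label, subtree) arcs from fully paired HTML."""

    def __init__(self, base):
        super().__init__()
        self.base = base                    # URL of the document being processed
        self.root = []
        self.stack = [(None, self.root)]    # (label of open element, its subtree)

    def handle_starttag(self, tag, attrs):
        href = dict(attrs).get("href") or ""
        if tag == "a" and href.startswith("http:"):
            label = {"Label": "", "Url": href, "Base": self.base}   # external arc
        else:
            label = {"Tag": tag.upper(), "Text": ""}                # internal arc
        subtree = []
        self.stack[-1][1].append((label, subtree))
        self.stack.append((label, subtree))

    def handle_endtag(self, tag):
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if not text:
            return
        # The text counts toward every enclosing element, markup excluded.
        for label, _ in self.stack[1:]:
            key = "Label" if "Url" in label else "Text"
            label[key] = (label[key] + " " + text).strip()


def wrap(html, base):
    """Parse `html` and return the tree for the page whose URL is `base`."""
    parser = Wrapper(base)
    parser.feed(html)
    parser.close()
    return parser.root
```

For example, `wrap('<ul><li>First Child</li></ul><a href="http://a.b.c.">Click Here</a>', "http://a.b.c./index.html")` yields an internal UL arc with one LI arc below it, followed by an external arc labeled "Click Here". Handling non-paired tags such as <BR> would need extra logic to close an element when the next tag starts.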

HTML source of the CS papers page:

<HTML>
<H1> Publications of Research Groups at CS Dept </H1>
<H2> Card Punching </H2>
<UL>
<LI>
<CITE> Recent Advances in Card Punching <BR>
<B> Peter Smith, John Brown </B> <BR>
Technical Report TR015 </CITE> <BR>
<A HREF="http://../abstract.html"> Abstract </A> <BR>
<A HREF="http://../paper.ps.Z"> Full version </A>
</LI>
<LI>
<CITE> Are Magnetic Media Better? <BR>
<B> Peter Smith, John Brown, Tom </B> <BR>
ACM TOCP Vol. 3, No. , pp </CITE> <BR>
<A HREF="http://../abst2.html"> Abstract </A> <BR>
<A HREF="http://../paper2.ps.Z"> Full version </A>
</LI>
</UL>
<H2> Programming lang </H2>