View
215
Download
1
Embed Size (px)
Citation preview
PatManQL: A language to manipulate patterns and data in hierarchical catalogs
Panagiotis Bouros, Theodore Dalamagas, Timos Sellis, Manolis Terrovitis
Knowledge and Database Systems LabSchool of Electrical and Computer Engineering
National Technical University of Athens{pbour,dalamag,timos,mter}@dblab.ece.ntua.gr
PatManQL 3
Introduction
Huge volumes of data on the Web Hierarchical structures and catalogs Paths → knowledge artifacts
Represent group of data Conceptual clustering of raw data based
on common properties Semantic guides
Example: Portal catalogs
PatManQL 4
Introductionroot
cameras& lenses
lenses
point & shoot negative
filters film
35mmSLR
APSprinters
cameras
UV
PLCloseUp
digital
slide
b&w
brand model ppm
hp 3820 12
hp 7350 17
hp 6122 20brand model price
Canon EOS-3 990
Nikon N65 300
Pentax ZX-M 350
root
general
filterstripods
B&H
bags
printers
cameras
digitalphotographyphoto
memorycards
scanners
filmscanners
flatbedscanners
35mmsystems
SLRcameras
lenses
otherformats
APSmedium
adorama
... ... ...
... ... ...
brand cam_model
Canon 50 EOS-3
Canon 80 EOS-3
400
450
... ... ... ...
pricefocald
Sigma 28 N65 150
(a)
(b)
Paths → alternative pattern versions for the same group of data
Example: searching for lenses /cameras & lenses/lenses (adorama) /photo/35mm systems/lenses (B&H)
PatManQL 5
Introductionroot
cameras& lenses
lenses
point & shoot negative
filters film
35mmSLR
APSprinters
cameras
UV
PLCloseUp
digital
slide
b&w
brand model ppm
hp 3820 12
hp 7350 17
hp 6122 20brand model price
Canon EOS-3 990
Nikon N65 300
Pentax ZX-M 350
root
general
filterstripods
B&H
bags
printers
cameras
digitalphotographyphoto
memorycards
scanners
filmscanners
flatbedscanners
35mmsystems
SLRcameras
lenses
otherformats
APSmedium
adorama
... ... ...
... ... ...
brand cam_model
Canon 50 EOS-3
Canon 80 EOS-3
400
450
... ... ... ...
pricefocald
Sigma 28 N65 150
(a)
(b)
Paths → complex pattern Example: searching for integrated photo
systems /cameras & lenses/35mm SLR
(adorama) /photo/35mm systems/lenses (B&H)
PatManQL 6
Contribution
A model to represent paths as knowledge artifacts
The PatManQL language: Operators to manipulate path-like
patterns Relational operators for data
A prototype
PatManQL 7
Catalog Schema
A tree with: a root (⊗) a set of non-leaf nodes () a set of resource items as leaves (□)
Data: instances (records) of resource item Resource item: Relation R(a1, a2, …,
an), where a1, a2, … attributes
PatManQL 8
Catalog SchemaX
cameras& lenses
lenses
point & shootnegative
filters film
35mmSLR
APSprinters
cameras
UV
PL
digital
slide
2 10SLR cameras1 Digital printers
brandmodelprice
Canon EOS-3 990
Nikon N65 205
Pentax ZX-M 148.50
hp 3820 12
hp 7350 17
hp 6122 20
4 87 95
brandmodelppm
Hierarchy
Resource items
Data
Cata
log
sch
em
a
PatManQL 9
Tree-Structure Relations (TSRs) Combining catalog schemas with common
resource item Tree-Structure Relation (AND/OR-like
graph): One resource item Paths organized in OR components
OR component: group of one or more paths (AND group)
OR components are alternative ways to access the common resource item
Paths = patterns
PatManQL 10
Tree-Structure Relations (TSRs)
X
35mmSLR
photo
35mmsystems
(a)model
price
brand
SLR cameras
X
35mmSLR
photo
lenses
(b)
model
price
brand
SLR systems
photocamera &lenses
cameras
photo
35mmsystemsbodies
lens_id
OR #1OR #2
OR #2OR #1
PatManQL 11
Operators
Select (σ) σ<attribute condition><path condition> (TSR)
attribute condition: {=, ≠, <} path condition: {=, ≠, , }
Filters instances of resource items and OR components
PatManQL 12
Select example'Select all non Pentax cameras with price greater than200Euros, having "/photo/35mm systems" in their paths':
σ<brand !="Pentax", price > 200><"/photo/35mm systems" $_>(SLR systems)
(a) SLR systems (b) SLR systems
brand model price
Canon EOS-3 990
Nikon N65 205
Pentax ZX-M 148.5
... ... ...
X
35mmSLR
photo
lenses
photo
photo
35mmsystems
bodies
X
photo
35mmsystems
lens_id
1
2
3
...
brand model price
Canon EOS-3 990
Nikon N65 205
... ... ...
lens_id
1
2
...
PatManQL 13
Operators
Project (π) π<attribute list><variable list> (TSR)
attribute list: {attribute} variable list: {$i (path variable),
#i (OR variable)} Keeps attributes of resource item and
paths of each OR component or OR components on the whole
PatManQL 14
Project example'Cameras with only the model and lens_id attributes andthe rightmost component':
π<model, lens_id><#2>(SLR systems)
(a) SLR systems (b) SLR systems
brand model price
Canon EOS-3 990
Nikon N65 205
Pentax ZX-M 148.5
... ... ...
X
35mmSLR
photo
lenses
photo
photo
35mmsystemsbodies
X
photo
35mmsystems
lens_id
1
2
2
...
model
EOS-3
N65
ZX-M
... ... ...
lens_id
1
2
2
PatManQL 15
Operators
Cartesian product (X) (ΤSR1) Χ (TSR2) Combine instances of resources and
OR components
PatManQL 16
Cartesian product example(SLR systems) X (Lenses)
(a) SLR systems (b) Lenses
cbrand cmodel cprice
Canon EOS-3 990
Nikon N65 205
Pentax ZX-M 148.5
... ... ...
X
35mmSLR
photo
lenses
photo
photo
35mmsystemsbodies
X
camera &lenses
lenses
clensid
1
1
2
...
lensid
1
2
...
lprice
200
100
...
lbrand
SigmaTamron
...
(c) SLR systems
cbrand cmodel cprice
Canon EOS-3 990
Pentax ZX-M 148.5
... ... ...
X
35mmSLR
photo
lenses
photo
photo
35mmsystems
bodies
camera &lenses
lenses
clensid
1
2
...
lensid
1
...
lprice
200
...
lbrand
Sigma
...
Canon EOS-3 990 1
Nikon N65 205 1
Nikon N65 205 1
Pentax ZX-M 148.5 2
1 200Sigma
1 200Sigma
2 100Tamron
2 100Tamron
2 100Tamron
camera &lenses
lenses
X =
PatManQL 17
Operators Union (U)
(TSR) U (TSR) Union of instances and all OR components
Intersection (∩) (TSR) ∩ (TSR) Intersection of instances and all OR components
Difference (–) (ΤSR) – (TSR) Instances of the first TSR not present in the
second one and all OR components of the first TSR
PatManQL 18
Union example(SLR systems) U (SLR systems)
(a) SLR systems
cbrand cmodel cprice
Canon EOS-3 990
Nikon N65 205
Pentax ZX-M 148.5
... ... ...
X
35mmSLR
photo
lenses
photo
bodies
clensid
1
1
2
...
(c)(b) SLR systems
Canon EOS-3 990
Nikon FM2 800
Pentax ZX-M 148.5
... ... ...
X
photo
35mmsystems
1
1
2
...
SLR systems
Canon EOS-3 990
Nikon N65 205
Pentax ZX-M 148.5
... ... ...
X
35mmSLR
photo
lenses
photo
photo
35mmsystemsbodies
1
1
2
...
Nikon FM2 800 1
U =
cbrand cmodel cprice clensid cbrand cmodel cprice clensid
PatManQL 19
Prototype
Interpreter Query Execution Engine Storage mechanism
XML files MySQL RDBMS
All-edges-in-one-table storage approach
Graphical Interface
PatManQL 20
Related work Pattern management (PANDA project) (S. Rizzi et al.) Inductive databases framework (Tomasz Imielinski et
al.) DMQL (Jiawei Han et al.), MINE RULE(R.Meo et al.)
Descriptive rules Tree algebras
TAX (H. V. Jagadish et al.) Selecting – reconstructing bulk XML data
YAT (V. Christophides et al.) Tuple-based, not tree-based
PatManQL 21
Conclusion
A model to represent paths as knowledge artifacts (patterns) Catalog schema Tree-Structure Relations (TSRs)
The PatManQL language: Operators to manipulate paths as
patterns and data A prototype system
PatManQL 24
Tree-Structure Relations (TSRs)
X
35mmSLR
photo
35mmsystems
(a)model
price
brand
SLR cameras
X
35mmSLR
photo
lenses
(b)
model
price
brand
SLR systems
photocamera &lenses
cameras
photo
35mmsystemsbodies
lens_id
$1
$1$1
$1
$2
PatManQL 25
Storage mechanismX
35mmSLR
photo
lenses
model
price
brand
SLR systems
photo
photo
35mmsystemsbodies
lens_id
XML file<tsr name="SLR systems">
<or><and>/photo/35mm
SLR/bodies</and><and>/photo/lenses</and>
</or><or>
<and>/photo/35mm systems</and></or><item>
<attribute name="brand" type="…"/><attribute name="model" type="…"/>…<tuple>…</tuple>…
</item></tsr>
PatManQL 26
Storage mechanismX
35mmSLR
photo
lenses
model
price
brand
SLR systems
photo
photo
35mmsystemsbodies
lens_id
Database
brand model price lens_id
… … … …
tid orid andid path
1 1 1 /photo/35mm SLR/bodies
1 1 2 /photo/lenses
1 2 1 /photo/35mm systems
tid name file
1 SLR systems portal.xml
PatManQL 27
Catalog Schemas examplesroot
cameras& lenses
lenses
point & shoot negative
filters film
35mmSLR
APSprinters
cameras
UV
PLCloseUp
digital
slide
b&w
brand model ppm
hp 3820 12
hp 7350 17
hp 6122 20brand model price
Canon EOS-3 990
Nikon N65 300
Pentax ZX-M 350
root
general
filterstripods
B&H
bags
printers
cameras
digitalphotographyphoto
memorycards
scanners
filmscanners
flatbedscanners
35mmsystems
SLRcameras
lenses
otherformats
APSmedium
adorama
... ... ...
... ... ...
brand cam_model
Canon 50 EOS-3
Canon 80 EOS-3
400
450
... ... ... ...
pricefocald
Sigma 28 N65 150
(a)
(b)
PatManQL 28
Catalog Schema Manipulation SLR integrated systems from X – fig. (a) SLR cameras from Adorama – fig. (b) Lenses from B&H – fig. (c) Scenario for X:
New lenses out in the market Lenses provided by B&H,
that fit in Canon bodies provided by Adorama Above SLR systems not present in her stock
PatManQL 29
Catalog Schema Manipulation
(b)
SLR cameras
X
camera &lenses
35mmSLR
cbrand cmodel cprice
Canon EOS-3 990
Nikon N65 300
Pentax ZX-M 350
... ... ...
lbrand cam_model
Canon 50 EOS-3
Canon 80 EOS-3
400
450
... ... ... ...
lpricefocald
Sigma 28 N65 150
Lenses
X
35mmsystems
photo
lenses
(c)
SLR systems
X
photo
35mmsystems
cbrand cmodel cprice
Canon EOS-3 990
Nikon N65 205
... ... ...
lmodel
100
340
...
(a)
lmodel
100
110
340
lprice
390
160
...
PatManQL 30
Catalog Schema Manipulation Systems with Canon bodies from Adorama and lenses
from B&H – fig. (d): q1 = π<cbrand,cmodel,lmodel><>
(σ<cmodel=cam_model, cbrand="Canon"><>
((SLR cameras) X (lenses))) Systems with Canon bodies from Adorama and lenses
from B&H which are not in X's catalog – fig. (e): q2 = (q1) – π<cbrand,cmodel,lmodel><>(SLR cameras)
Lenses only without the appropriate camera bodies – fig. (f): π<lmodel><$2>(q2)
PatManQL 31
Catalog Schema Manipulation
(d)X
camera &lenses
35mmSLR
35mmsystems
photo
lenses
SLR systems
cmodel lmodel
EOS-3
EOS-3
... ...
100
110
(e)X
camera &lenses
35mmSLR
35mmsystems
photo
lenses
SLR systems
cmodel lmodel
EOS-3
... ...
110
(f)X
35mmsystems
photo
lenses
Lensescbrand
Canon
Canon
...
cbrand
Canon
...lmodel
...
110
PatManQL 32
Prototype Architecture
Database
<root><tsr></tsr>
</root>
XML file
Query ExecutionEngine(QE)
tsr>
Interpreter
Database Manager(DM)
XML File Manager(XFM)
Graphic Result Interface(GRI)
PatManQL 33
XML File Manager (XFM)
OR compontents,paths and resource
items retrieval
XML file
OR compontents,paths and resource
items storage
TSR
TSR
Interpreter
Graphic ResultInterface
PatManQL 34
Database Manager (DM)
OR compontents,paths and resource
items retrieval
Database
OR compontets,paths and resource
items storage
TSR
TSR
Interpreter
Graphic ResultInterface
PatManQL 35
Query Execution Engine (QE)
TSR
Paths constructionand OR
componentscreation
Resource itemconstruction
Attributes andresords
TSRParameters list
Interpreter Interpreter
PatManQL 36
Interpreter
Parser
Print errormessage
Parametercollection
XML FileManager
DatabaseManager
TSR
Queryparameters Query Execution
Engine
DatabaseManager
Graphic ResultInterface
TSRQuery
Error message