1

Modeling and Language Support for the management of PBMS

Manolis Terrovitis, Panos Vassiliadis

Spiros Skiadopoulos, Elisa Bertino

Barbara Catania, Anna Maddalena

2

Outline

• Introduction
• Modeling of data and patterns
• Query operators
• Summary and future work

3

Motivation

• Huge amounts of data are produced.
• Interesting knowledge has to be detected and extracted.
• Knowledge extraction techniques (e.g., data mining) are not sufficient:
  • huge amounts of results (clusters, association rules, decision trees, etc.)
  • arbitrary modeling of results

4

Motivation (cont.)

We need to be able to manipulate the discovered knowledge!

The basic requirements:
• A generic and homogeneous model for patterns
• Well-defined query operators
• Efficient storage

5

Patterns and the PBMS [Rizzi et al., ER 2003]

Patterns are compact, semantically rich representations of raw data, e.g., clusters, association rules, decision trees.

Pattern Base Management System (PBMS):
• Patterns are treated as first-class citizens
• Pattern-based queries
• Approximate mapping between patterns and raw data

6

Contributions

We formally define the logical foundations for pattern management

We present a pattern specification language

We introduce queries and query operators

7

Outline

• Introduction
• Modeling of data and patterns
• Query operators
• Summary and future work

8

PBMS architecture

• Pattern Space: pattern types, pattern classes, patterns
• Intermediate Results
• Data Space

[Figure: PBMS architecture. Data mining and pattern recognition algorithms produce patterns from the Data Space (DB1, DB2) into the Pattern Space, where patterns are instances of pattern types and members of pattern classes; intermediate mappings connect the two spaces.]
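The three levels of the pattern space can be sketched as plain data classes. This is a minimal illustration under our own assumptions: the class names, fields and the rule that a class only accepts instances of its own type are hypothetical choices, not the PBMS definitions themselves.

```python
from dataclasses import dataclass, field

# Hypothetical, minimal model of the PBMS pattern space; all names are illustrative.

@dataclass
class PatternType:
    name: str               # e.g. "Disk"
    structure_schema: dict  # structure attribute -> type
    data_schema: dict       # data attribute -> type

@dataclass
class Pattern:
    pid: int
    ptype: PatternType      # "instance of" link to the pattern type
    structure: dict         # concrete structure values
    data: str               # name of the source relation in the data space

@dataclass
class PatternClass:
    name: str
    ptype: PatternType
    members: list = field(default_factory=list)  # "member of" links

    def add(self, p: Pattern) -> None:
        # In this sketch a class only collects instances of its own pattern type.
        if p.ptype is not self.ptype:
            raise TypeError(f"pattern {p.pid} is not an instance of {self.ptype.name}")
        self.members.append(p)

disk = PatternType(
    "Disk",
    structure_schema={"CENTER": {"X": "real", "Y": "real"}, "RAD": "real"},
    data_schema={"X": "real", "Y": "real"},
)
emp_clusters = PatternClass("EmpClusters", disk)
emp_clusters.add(Pattern(337, disk, {"CENTER": {"X": 21, "Y": 1200}, "RAD": 12}, "EMP"))
```

Keeping the "instance of" link on the pattern and the "member of" links on the class mirrors the two arrows of the figure; a real PBMS would also store the data-space mappings.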

9

The patterns

Patterns hold information about:
• the data source
• the structure of the pattern
• the relation between the structure and the source, expressed as an approximate logical formula

10

Pattern - Cluster Example

Pid 337

Structure [CENTER: [X: 21, Y: 1200], RAD: 12 ]

Data EMP: {[Age, Salary]}

Formula $(t.Age - 21)^2 + (t.Salary - 1200)^2 \le 12^2$, where $t \in EMP$
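To make the formula concrete, here is a small sketch that evaluates the predicate of pattern 337 over a relation. The EMP tuples below are made up purely for illustration; only the centre (21, 1200) and radius 12 come from the slide.

```python
# Pattern 337 from the slide: a disk in the (Age, Salary) plane.
CENTER = {"X": 21, "Y": 1200}
RAD = 12

def f_337(t: dict) -> bool:
    """Formula of pattern 337: (t.Age - 21)^2 + (t.Salary - 1200)^2 <= 12^2."""
    return (t["Age"] - CENTER["X"]) ** 2 + (t["Salary"] - CENTER["Y"]) ** 2 <= RAD ** 2

# Hypothetical EMP tuples, used only to exercise the formula.
EMP = [
    {"Age": 21, "Salary": 1200},  # the centre itself: 0 <= 144
    {"Age": 25, "Salary": 1210},  # 16 + 100 = 116 <= 144
    {"Age": 30, "Salary": 1200},  # 81 <= 144
    {"Age": 21, "Salary": 1213},  # 169 > 144
    {"Age": 40, "Salary": 3000},  # far outside the disk
]

# The tuples of EMP that the formula relates to pattern 337.
covered = [t for t in EMP if f_337(t)]
```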

11

Pattern type - example

Name Disk

Structure Schema [CENTER: [X:real, Y: real], RAD: real ]

Data Schema REL: {[X: real, Y: real]}

Formula Schema $(t.X - CENTER.X)^2 + (t.Y - CENTER.Y)^2 \le RAD^2$, where $t \in REL$
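A pattern of type Disk can be viewed as the type's formula schema with the structure values plugged in and the data schema (X, Y) bound to concrete source attributes. The sketch below assumes a hypothetical `attr_map` that binds X to Age and Y to Salary, which turns the Disk formula schema into the formula of pattern 337.

```python
from dataclasses import dataclass
from typing import Callable, Dict

def disk_formula(structure: dict, t: dict) -> bool:
    """Formula schema of type Disk: (t.X - CENTER.X)^2 + (t.Y - CENTER.Y)^2 <= RAD^2."""
    c, r = structure["CENTER"], structure["RAD"]
    return (t["X"] - c["X"]) ** 2 + (t["Y"] - c["Y"]) ** 2 <= r ** 2

@dataclass
class Pattern:
    pid: int
    structure: dict
    data: str                    # source relation, e.g. "EMP"
    attr_map: Dict[str, str]     # hypothetical: data-schema attribute -> source attribute
    formula_schema: Callable[[dict, dict], bool]

    def formula(self, t: dict) -> bool:
        # Rename the source tuple to the type's data schema, then evaluate
        # the formula schema with this pattern's structure values.
        renamed = {x: t[src] for x, src in self.attr_map.items()}
        return self.formula_schema(self.structure, renamed)

p337 = Pattern(
    pid=337,
    structure={"CENTER": {"X": 21, "Y": 1200}, "RAD": 12},
    data="EMP",
    attr_map={"X": "Age", "Y": "Salary"},
    formula_schema=disk_formula,
)
```

For example, `p337.formula({"Age": 25, "Salary": 1210})` evaluates to `True`, since 16 + 100 ≤ 144. Separating the schema from the bindings lets every Disk pattern share one formula definition.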

12

The formula

An intensional description of the pattern-data relation

Pros: efficiency, more intuitive results

Cons: reduced accuracy (the relation is only approximated)

13

Intensional vs. Extensional

[Figure: example data points plotted by Age (horizontal axis) and Salary (vertical axis)]
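The trade-off can be illustrated by comparing an extensional mapping (the explicit set of tuple ids a pattern stands for) with the intensional one recomputed from the formula. All tuples and the stored mapping below are invented for illustration; the point is only that the two sets may disagree in both directions.

```python
# Hypothetical source relation: tuple id -> (Age, Salary).
EMP = {
    1: (21, 1200),
    2: (25, 1210),
    3: (30, 1200),
    4: (21, 1213),
    5: (22, 1195),
}

# Made-up extensional mapping, e.g. the cluster assignment kept by the mining run.
extensional = {1, 2, 4, 5}

def formula(age, salary, cx=21, cy=1200, rad=12):
    # Disk formula of pattern 337.
    return (age - cx) ** 2 + (salary - cy) ** 2 <= rad ** 2

# Intensional mapping: every tuple that satisfies the formula.
intensional = {tid for tid, (age, sal) in EMP.items() if formula(age, sal)}

false_positives = intensional - extensional  # described by the formula, not in the cluster
false_negatives = extensional - intensional  # in the cluster, missed by the formula
```

Here the formula keeps tuples 1, 2, 3 and 5: it wrongly includes tuple 3 and misses tuple 4. That is the accuracy cost paid for not storing the mapping explicitly.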

14

The formula (cont.)

The formula is a predicate:

$f_p(x, y)$, where $x \in \text{Source}$ and $y \in \text{Structure}$

• Expressiveness: functions and predicates
• Safety: range restriction

Queries employing the formula are n-depth domain independent.

15

Outline

• Introduction
• Modeling of data and patterns
• Query operators
• Summary and future work

16

Query Operators

Query operator classes:
• Database operators
• Pattern base operators
• Crossover database operators
• Crossover pattern base operators

17

Crossover Operators

[Figure: the components of a pattern (PID, data, structure, formula) linking the Pattern Space to the Data Space; two links are marked Exact and one Approximation]

Exact evaluation, via the intermediate mappings

Approximate evaluation, via the formula
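The two evaluation strategies can be contrasted on a drill-through query. In this sketch, which rests on our own assumptions, each pattern optionally stores an intermediate mapping (an exact set of tuple ids). Exact evaluation unions those mappings, while approximate evaluation scans the source and applies the formula. The data and the mapping are hypothetical.

```python
# Hypothetical data space: tuple id -> (Age, Salary).
EMP = {1: (21, 1200), 2: (25, 1210), 3: (30, 1200), 4: (21, 1213), 5: (60, 900)}

# Hypothetical pattern space: structure plus a stored intermediate mapping.
patterns = {
    337: {"center": (21, 1200), "rad": 12, "mapping": {1, 2, 4}},
}

def in_disk(t, center, rad):
    # Disk formula: squared distance from the centre <= rad^2.
    return (t[0] - center[0]) ** 2 + (t[1] - center[1]) ** 2 <= rad ** 2

def drill_through_exact(pids):
    """Exact evaluation: union of the stored intermediate mappings."""
    return set().union(*(patterns[p]["mapping"] for p in pids))

def drill_through_approx(pids):
    """Approximate evaluation: source tuples satisfying at least one pattern's formula."""
    return {tid for tid, t in EMP.items()
            if any(in_disk(t, patterns[p]["center"], patterns[p]["rad"]) for p in pids)}
```

With these values, `drill_through_exact([337])` returns `{1, 2, 4}` and `drill_through_approx([337])` returns `{1, 2, 3}`. The exact answer needs the stored mapping. The approximate answer needs only the formula and a scan of the source.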

18

Crossover Operators

Database:
• Drill-Through: Which data are represented by these patterns?
• Data-Covering: Which data from this dataset can be represented by this pattern?

Pattern Base:
• Pattern-Covering: Which of these patterns represent this dataset?
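A formula-based (approximate) reading of Data-Covering and Pattern-Covering might look like the sketch below. The semantics are our assumption, not the formal definitions: data covering keeps the tuples of a dataset that satisfy one pattern's formula, and pattern covering keeps the patterns whose formula holds for every tuple of the dataset.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (Age, Salary)

def in_disk(t: Point, center: Point, rad: float) -> bool:
    # Disk formula: squared distance from the centre <= rad^2.
    return (t[0] - center[0]) ** 2 + (t[1] - center[1]) ** 2 <= rad ** 2

def data_covering(pattern: dict, dataset: List[Point]) -> List[Point]:
    """Tuples of `dataset` that the pattern's formula represents."""
    return [t for t in dataset if in_disk(t, pattern["center"], pattern["rad"])]

def pattern_covering(patterns: Dict[int, dict], dataset: List[Point]) -> List[int]:
    """Assumed semantics: ids of patterns whose formula holds for every tuple of `dataset`."""
    return [pid for pid, p in patterns.items()
            if all(in_disk(t, p["center"], p["rad"]) for t in dataset)]

# Hypothetical patterns and dataset.
patterns = {
    337: {"center": (21, 1200), "rad": 12},
    338: {"center": (40, 3000), "rad": 50},
}
D = [(21, 1200), (25, 1210)]
```

For instance, `data_covering(patterns[337], D + [(21, 1213)])` drops the last tuple, and `pattern_covering(patterns, D)` returns `[337]`. Under this reading an empty dataset is vacuously covered by every pattern, which a real operator definition would need to address.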

19

Query Example

[Figure: disk patterns p and q in the Age/Salary plane]

Drill-through({ p | p intersects q })
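The example query composes a pattern-base selection with a crossover operator. The sketch below assumes disk patterns and approximate (formula-based) drill-through. Two closed disks intersect exactly when the distance between their centres is at most the sum of their radii. All disks and tuples are invented for illustration.

```python
import math

# Hypothetical disk patterns in the (Age, Salary) plane: (center_x, center_y, radius).
disks = {
    "p1": (21, 1200, 12),
    "p2": (35, 1200, 5),
    "p3": (60, 900, 10),
}
q = (30, 1200, 6)

# Hypothetical source tuples (Age, Salary).
EMP = [(21, 1200), (25, 1210), (34, 1201), (60, 905), (45, 1500)]

def intersects(a, b):
    """Closed disks intersect iff the distance between centres <= sum of radii."""
    return math.dist(a[:2], b[:2]) <= a[2] + b[2]

def covers(d, t):
    # Disk formula of pattern d applied to tuple t.
    return (t[0] - d[0]) ** 2 + (t[1] - d[1]) ** 2 <= d[2] ** 2

# { p | p intersects q }
selected = {name: d for name, d in disks.items() if intersects(d, q)}

# Drill-through, evaluated approximately via the formulas of the selected patterns.
result = [t for t in EMP if any(covers(d, t) for d in selected.values())]
```

Here p1 and p2 intersect q while p3 does not, so the drill-through returns the three tuples covered by p1 or p2. `math.dist` requires Python 3.8 or later.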

20

Outline

• Introduction
• Modeling of data and patterns
• Query operators
• Summary and future work

21

Summary

Formal specification of basic PBMS concepts

Investigation of the representation of the pattern-data relation

Formal definition of query operators

22

Future Work

• Query language
• Generic similarity measures
• Efficient implementation of intermediate mappings
• Statistical measures for the patterns