

Towards OLAP Query Reformulation in P2P Data Warehousing

Matteo Golfarelli, Wilma Penzo, Stefano Rizzi, Elisa Turricchia
(University of Bologna - Italy)

Federica Mandreoli
(University of Modena and Reggio Emilia - Italy)

DOLAP 2010 - Toronto, Oct. 30 2010

Summary

• Motivating scenario
• Envisioned architecture
• Mapping language
• A reformulation framework
• Summary and future work


From BI 1.0 to BI 2.0

• Business intelligence (BI) transformed the role of computer science in companies from a technology for storing data into a discipline for detecting key business factors in a timely manner and effectively solving strategic decisional problems

• In today's changeable and unpredictable market scenarios, the needs of decision makers are rapidly evolving

• To meet the new, more sophisticated user needs, a new generation of BI systems (BI 2.0) has been emerging


Issues in BI 2.0

• Pervasive BI
• On-demand BI
• Real-time BI
• Situational BI
• Collaborative BI
• BI as a service
• ...


Motivating scenario

• Companies today see cooperation as one of the major means for increasing flexibility and innovating, so as to survive in today's uncertain and changing market

• Companies need strategic information about the outer world, for instance about trading partners and related business areas

• It is estimated that more than 80% of waste in inter-company and supply-chain processes is due to a lack of communication between the companies involved


Motivating scenario

• A collaborative context where multiple partner companies organize and coordinate themselves to share opportunities, respecting their own autonomy but pursuing a common goal
  - Traditional stand-alone BI systems are not sufficient for decision making
  - Users need to transparently and uniformly access information scattered across several heterogeneous BI platforms
  - Information must be found through a semantic process and integrated on the fly


Envisioned architecture

• Only a few works in the literature focus on strategies for data warehouse integration and federation

• Problems related to data heterogeneity are usually solved by ETL processes that load data into a single multidimensional repository...
  - ...but a centralized architecture is hardly feasible in the context of a Business Intelligence Network (BIN)

• Peer Data Management Systems (PDMSs) have been proposed as architectures to support sharing of operational data across networks of peers while guaranteeing peers' autonomy, based on interlinked semantic mappings that mediate between the heterogeneous schemata exposed by peers


Envisioned architecture

• Business Intelligence Network (BIN): a dynamic, collaborative network of peers, each hosting a local, autonomous BI platform
  1. Each peer relies on a local multidimensional schema that represents the peer's view of the business, and it offers monitoring and decision support functionalities to the other peers
  2. Users transparently access business information distributed over the network in a pervasive and personalized fashion
  3. Access is secure, depending on the access control and privacy policies adopted by each peer
  4. Participants are collaborative, though to different degrees
  5. Inclination to collaborate does not reduce the autonomy of participants, who are not subject to a shared schema
  6. A BIN is decentralized and scalable because the number of participants, the complexity of business models, and the workload can change


Envisioned architecture

[Figure: Business Intelligence Network architecture. Peers (peer 1, ..., peer i, peer j, ..., peer N) are shown with the following components: local BI platform, user interface, multidimensional database, semantic mappings, query forwarding, query reformulation, query dispatching, query result reconciliation.]
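The components named in the figure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class `Peer`, its methods, and the representation of a query as a plain string are all hypothetical, semantic mappings are reduced to a rewriting function per neighbor, and result reconciliation is reduced to concatenation, which is far simpler than what the slides envision.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set


@dataclass
class Peer:
    """Hypothetical model of one peer in a BIN (names are illustrative)."""
    name: str
    # Stand-in for the local BI platform: answers a query on local data.
    answer_locally: Callable[[str], List[dict]]
    # Stand-in for semantic mappings: one query-rewriting function per neighbor.
    mappings: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    neighbors: Dict[str, "Peer"] = field(default_factory=dict)

    def link(self, other: "Peer", reformulate: Callable[[str], str]) -> None:
        self.neighbors[other.name] = other
        self.mappings[other.name] = reformulate

    def execute(self, query: str, visited: Optional[Set[str]] = None) -> List[dict]:
        visited = set() if visited is None else visited
        visited.add(self.name)
        results = list(self.answer_locally(query))
        for name, neighbor in self.neighbors.items():
            if name in visited:
                continue  # avoid cycles in the peer network
            reformulated = self.mappings[name](query)  # query reformulation
            results.extend(neighbor.execute(reformulated, visited))  # forwarding
        return results  # reconciliation reduced to concatenation (assumption)


rome = Peer("Rome", lambda q: [{"peer": "Rome", "query": q}])
florence = Peer("Florence", lambda q: [{"peer": "Florence", "query": q}])
rome.link(florence, lambda q: q.replace("cost", "totStayCost+totExamCost"))
answers = rome.execute("SUM(cost)")
```

Here each peer simply echoes the query it received, so `answers` shows which formulation reached which peer.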


Research issues

• Query reformulation on peers
• Query routing strategies to forward queries to the most promising peers only
• Advanced approaches for security
• Mechanisms for controlling data provenance and quality
• Data lineage to help users understand how data have been transformed
• Object fusion techniques to integrate returned data


Mapping language

• Handling the asymmetry between dimensions and measures

• Specifying the relationship between two attributes of different multidimensional schemata in terms of their granularity

• Considering aggregation operators to avoid the risk of inconsistent query reformulations

• Also expressing mappings at the instance level to transcode data


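• Health-care example

[Figure: two multidimensional schemata, listed here with their measures and dimension attributes]

HOSPITALIZATION @Rome
  measures: cost, durationOfStay
  dimension attributes: date (week, month, year); ward (LHD); patient (birthDate, gender, segment, city, region); disease; organ

ADMISSIONS @Florence
  measures: totStayCost, totExamCost, totLength, numAdmissions
  dimension attributes: date (month, year); ward (unit); diagnosis (category); patientCity (patientNation); patientBirthYear; patientGender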


Mapping language

• Health-care example

[Figure: the HOSPITALIZATION (@Rome) and ADMISSIONS (@Florence) schemata linked by semantic mappings labelled same, roll-up, drill-down, and equi-level (two occurrences)]

• Mappings can be annotated with encoding functions:
  - Rome.cost = Florence.totStayCost + Florence.totExamCost
  - Rome.disease = substring(Florence.diagnosis, 1, 20)
  - Rome.organ = substring(Florence.diagnosis, 21, 40)
  - Rome.region = regionOf(Florence.patientCity)
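The encoding functions can be read as a row-level transcoding from Florence values to Rome values. The sketch below illustrates that reading in Python under several assumptions that the slides do not state: `substring(x, a, b)` is taken to use 1-based, inclusive start and end positions; trailing padding is stripped; and `regionOf` is modeled by a small hypothetical lookup table (`REGION_OF`). The function `to_rome` and the sample row are illustrative only.

```python
from typing import Dict


def substring(text: str, start: int, end: int) -> str:
    """1-based, inclusive slice (an assumed reading of substring(x, 1, 20))."""
    return text[start - 1:end]


# Hypothetical city-to-region lookup standing in for regionOf.
REGION_OF: Dict[str, str] = {
    "Florence": "Tuscany",
    "Pisa": "Tuscany",
    "Bologna": "Emilia-Romagna",
}


def region_of(city: str) -> str:
    return REGION_OF[city]


def to_rome(adm: dict) -> dict:
    """Apply the four encoding functions to one Florence row."""
    diagnosis = adm["diagnosis"]
    return {
        "cost": adm["totStayCost"] + adm["totExamCost"],
        # Stripping the padding is an assumption about the fixed-width encoding.
        "disease": substring(diagnosis, 1, 20).strip(),
        "organ": substring(diagnosis, 21, 40).strip(),
        "region": region_of(adm["patientCity"]),
    }


row = {
    "diagnosis": "pneumonia".ljust(20) + "lung".ljust(20),
    "totStayCost": 900.0,
    "totExamCost": 100.0,
    "patientCity": "Pisa",
}
rome_row = to_rome(row)
```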


Query Reformulation Framework

• Reformulation takes as input an OLAP query on a target schema t and the mappings between t and the schema of one of its neighbors, the source schema s, and it outputs an OLAP query that refers only to s

• In our approach, queries are reformulated by means of the underlying relational (star) schema


• To translate semantic mappings we use a logical formalism called source-to-target tuple generating dependencies (s-t tgds)

[Figure: reformulation framework. The multidimensional schemata at peer i and peer j each undergo schema translation; the semantic mappings between them undergo mapping translation into s-t tgds; the OLAP query undergoes query translation into a relational query.]


Example: Schema translation

[Figure: the HOSPITALIZATION (@Rome) and ADMISSIONS (@Florence) multidimensional schemata, each translated into a relational star schema]

Star schema @Rome:
HospFT(organ, disease, date, ward, patient, cost, durationOfStay)
OrganDT(organ)
DiseaseDT(disease)
DateDT(date, week, month, year)
WardDT(ward, LHD)
PatientDT(patient, birthDate, city, region, segment, gender)

Star schema @Florence:
AdmFT(diagnosis, date, ward, patientCity, patientBirthYear, patientGender, totStayCost, totExamCost, totLength, numAdmissions)
DiagnosisDT(diagnosis, category)
DateDT(date, month, year)
WardDT(ward, unit)
PatientCityDT(patientCity, patientNation)
PatientBirthYearDT(patientBirthYear)
PatientGenderDT(patientGender)
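As a concrete reading of the two star schemata, the sketch below declares them as SQLite tables from Python. The table and column names come from the slide; the column types, primary keys, and foreign keys are assumptions, since the slide gives attribute names only. The helpers `build` and `columns` are hypothetical. Each peer gets its own in-memory database because both schemata contain tables named DateDT and WardDT.

```python
import sqlite3

# Column types and key constraints are assumptions; the slide lists names only.
ROME_DDL = """
CREATE TABLE OrganDT(organ TEXT PRIMARY KEY);
CREATE TABLE DiseaseDT(disease TEXT PRIMARY KEY);
CREATE TABLE DateDT(date TEXT PRIMARY KEY, week TEXT, month TEXT, year INTEGER);
CREATE TABLE WardDT(ward TEXT PRIMARY KEY, LHD TEXT);
CREATE TABLE PatientDT(patient TEXT PRIMARY KEY, birthDate TEXT, city TEXT,
                       region TEXT, segment TEXT, gender TEXT);
CREATE TABLE HospFT(
    organ TEXT REFERENCES OrganDT(organ),
    disease TEXT REFERENCES DiseaseDT(disease),
    date TEXT REFERENCES DateDT(date),
    ward TEXT REFERENCES WardDT(ward),
    patient TEXT REFERENCES PatientDT(patient),
    cost REAL, durationOfStay INTEGER);
"""

FLORENCE_DDL = """
CREATE TABLE DiagnosisDT(diagnosis TEXT PRIMARY KEY, category TEXT);
CREATE TABLE DateDT(date TEXT PRIMARY KEY, month TEXT, year INTEGER);
CREATE TABLE WardDT(ward TEXT PRIMARY KEY, unit TEXT);
CREATE TABLE PatientCityDT(patientCity TEXT PRIMARY KEY, patientNation TEXT);
CREATE TABLE PatientBirthYearDT(patientBirthYear INTEGER PRIMARY KEY);
CREATE TABLE PatientGenderDT(patientGender TEXT PRIMARY KEY);
CREATE TABLE AdmFT(
    diagnosis TEXT REFERENCES DiagnosisDT(diagnosis),
    date TEXT REFERENCES DateDT(date),
    ward TEXT REFERENCES WardDT(ward),
    patientCity TEXT REFERENCES PatientCityDT(patientCity),
    patientBirthYear INTEGER REFERENCES PatientBirthYearDT(patientBirthYear),
    patientGender TEXT REFERENCES PatientGenderDT(patientGender),
    totStayCost REAL, totExamCost REAL, totLength INTEGER, numAdmissions INTEGER);
"""


def build(ddl: str) -> sqlite3.Connection:
    """Create an in-memory database holding one peer's star schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(ddl)
    return conn


def columns(conn: sqlite3.Connection, table: str) -> list:
    """Return the column names of a table, in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


rome = build(ROME_DDL)
florence = build(FLORENCE_DDL)
```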


Example: Query translation

• Total hospitalization costs by region and year

$\pi_{\text{region},\,\text{year},\,\mathrm{SUM}(\text{cost})}(\text{HospFT} \bowtie \text{DateDT} \bowtie \text{PatientDT})$

q(R,Y,SUM(C)) ← HospFT(_,_,D,_,P,C,_), DateDT(D,_,_,Y), PatientDT(P,_,_,R,_,_)
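The relational query above corresponds to a join of the fact table with two dimension tables followed by a grouped sum. A minimal sketch in Python with SQLite follows; the SQL rendering, the sample rows, and the reduction of each table to the attributes the query touches are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Only the attributes used by the query are kept (a simplification).
conn.executescript("""
CREATE TABLE DateDT(date TEXT PRIMARY KEY, year INTEGER);
CREATE TABLE PatientDT(patient TEXT PRIMARY KEY, region TEXT);
CREATE TABLE HospFT(date TEXT, patient TEXT, cost REAL);
""")
# Illustrative sample data, not taken from the slides.
conn.executemany("INSERT INTO DateDT VALUES (?, ?)", [
    ("2010-01-05", 2010), ("2010-01-06", 2010), ("2009-12-30", 2009),
])
conn.executemany("INSERT INTO PatientDT VALUES (?, ?)", [
    ("p1", "Lazio"), ("p2", "Lombardy"),
])
conn.executemany("INSERT INTO HospFT VALUES (?, ?, ?)", [
    ("2010-01-05", "p1", 100.0),
    ("2010-01-06", "p1", 50.0),
    ("2010-01-05", "p2", 70.0),
    ("2009-12-30", "p2", 30.0),
])

# Join the fact table with the two dimension tables, then group and sum.
QUERY = """
SELECT p.region, d.year, SUM(f.cost)
FROM HospFT AS f
JOIN DateDT AS d ON f.date = d.date
JOIN PatientDT AS p ON f.patient = p.patient
GROUP BY p.region, d.year
ORDER BY p.region, d.year
"""
totals = conn.execute(QUERY).fetchall()
```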


Example: Mapping translation

• The same mapping between measure cost (@Rome) and measures totStayCost and totExamCost (@Florence) translates into an s-t tgd:

$\forall S, E, C\ \big(\text{AdmFT}(\_, \ldots, S, E, \_, \_),\ C = S + E \rightarrow \text{HospFT}(\_, \ldots, C, \_)\big)$


Example: Reformulation

• The group-by is reformulated using the roll-up mapping from region to patientCity, while measure cost is derived using the same mapping

$\pi_{\text{year},\,\text{regionOf}(\text{patientCity}),\,\mathrm{SUM}(\text{totStayCost}+\text{totExamCost})}(\text{AdmFT} \bowtie \text{DateDT} \bowtie \text{PatientCityDT})$
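The reformulated expression can be tried out on the Florence side with SQLite, registering `regionOf` as a user-defined SQL function. This is a sketch under assumptions: the lookup table `REGION_OF`, the sample rows, and the reduced table definitions are hypothetical, and the SQL text is one possible rendering of the algebra expression rather than the output of the authors' framework.

```python
import sqlite3

# Hypothetical city-to-region lookup standing in for the regionOf encoding function.
REGION_OF = {"Florence": "Tuscany", "Pisa": "Tuscany", "Bologna": "Emilia-Romagna"}

conn = sqlite3.connect(":memory:")
# Expose regionOf to SQL as a one-argument user-defined function.
conn.create_function("regionOf", 1, REGION_OF.get)

# Only the attributes used by the query are kept (a simplification).
conn.executescript("""
CREATE TABLE DateDT(date TEXT PRIMARY KEY, year INTEGER);
CREATE TABLE PatientCityDT(patientCity TEXT PRIMARY KEY, patientNation TEXT);
CREATE TABLE AdmFT(date TEXT, patientCity TEXT, totStayCost REAL, totExamCost REAL);
""")
# Illustrative sample data, not taken from the slides.
conn.executemany("INSERT INTO DateDT VALUES (?, ?)", [
    ("2010-01-05", 2010), ("2010-01-06", 2010),
])
conn.executemany("INSERT INTO PatientCityDT VALUES (?, ?)", [
    ("Florence", "Italy"), ("Pisa", "Italy"), ("Bologna", "Italy"),
])
conn.executemany("INSERT INTO AdmFT VALUES (?, ?, ?, ?)", [
    ("2010-01-05", "Florence", 80.0, 20.0),
    ("2010-01-06", "Pisa", 40.0, 10.0),
    ("2010-01-05", "Bologna", 60.0, 10.0),
])

# Group by year and by the region derived from patientCity;
# cost is rebuilt as totStayCost + totExamCost.
REFORMULATED = """
SELECT d.year, regionOf(f.patientCity), SUM(f.totStayCost + f.totExamCost)
FROM AdmFT AS f
JOIN DateDT AS d ON f.date = d.date
JOIN PatientCityDT AS c ON f.patientCity = c.patientCity
GROUP BY d.year, regionOf(f.patientCity)
ORDER BY d.year, regionOf(f.patientCity)
"""
totals = conn.execute(REFORMULATED).fetchall()
```

Florence and Pisa fall into the same region in the lookup table, so their costs end up in one group even though Florence stores no region attribute.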


Summary

• We have outlined a peer-to-peer architecture for supporting distributed and collaborative decision-making scenarios

• We have shown how an OLAP query formulated on one peer can be reformulated on a different peer, based on a set of inter-peer semantic mappings


Future work

• Adopting MDX as the language for expressing multidimensional queries and closing the reformulation query language

• Devising multidimensional-aware object fusion techniques for integrating data returned by different peers

• Finding smart algorithms for routing queries to the most "promising" peers in the BI network


Some advertising...


Thank you for your attention

Questions?