Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
Towards OLAP Query Reformulation in P2P Data Warehousing
Matteo GolfarelliWilma PenzoStefano RizziElisa Turricchia
(University of Bologna - Italy)Federica Mandreoli
(University of Modena and Reggio Emilia - Italy)
2DOLAP 2010 - Toronto, Oct. 30 2010
Summary
p Motivating scenariop Envisioned architecturep Mapping languagep A reformulation frameworkp Summary and future work
3DOLAP 2010 - Toronto, Oct. 30 2010
From BI 1.0 to BI 2.0
p Business intelligence (BI) transformed the role of computer science in companies from a technology for storing data into a discipline for timely detecting key business factors and effectively solving strategic decisional problems
p In the current changeable and unpredictable market scenarios, the needs of decision makers are rapidly evolving
p To meet the new, more sophisticated user needs, a new generation of BI systems (BI 2.0) has been emerging
4DOLAP 2010 - Toronto, Oct. 30 2010
Issues in BI 2.0
p Pervasive BIp On-demand BIp Real-time BIp Situational BIp Collaborative BIp BI as a servicep ....
5DOLAP 2010 - Toronto, Oct. 30 2010
Motivating scenario
p Cooperation is seen today by companies as one of the major means for increasing flexibility and innovating so as to survive in today uncertain and changing market
p Companies need strategic information about the outer world, for instance about trading partners and related business areas
p It is estimated that above 80% of waste in inter-company and supply-chain processes is due to a lack of communication between the companies involved
6DOLAP 2010 - Toronto, Oct. 30 2010
Motivating scenario
p A collaborative context where multiple partner companies organize and coordinate themselves to share opportunities, respecting their own autonomy but pursuing a common goal3 Traditional stand-alone BI systems are not
sufficient for decision making3 Users need to transparently and uniformly access
information scattered across several heterogeneous BI platforms3 Information must be found through a semantic
process and integrated on the fly
7DOLAP 2010 - Toronto, Oct. 30 2010
Envisioned architecture
p Only a few works in the literature are focused on strategies for data warehouse integration and federation
p Problems related to data heterogeneity are usually solved by ETL processes that load data into a single multidimensional repository...3 ...but a centralized architecture is hardly feasible in the
context of a BINp Peer Data Management Systems (PDMSs) have
been proposed as architectures to support sharing of operational data across networks of peers while guaranteeing peers’ autonomy, based on interlinked semantic mappings that mediate between the heterogeneous schemata exposed by peers
8DOLAP 2010 - Toronto, Oct. 30 2010
Envisioned architecture
p Business Intelligence Network (BIN):a dynamic, collaborative network of peers, each hosting a local, autonomous BI platform1. Each peer relies on a local multidimensional schema that
represents the peer's view of the business, and it offers monitoring and decision support functionalities to the other peers
2. Users transparently access business information distributed over the network in a pervasive and personalized fashion
3. Access is secure, depending on the access control and privacy policies adopted by each peer
4. Participants are collaborative, even if with different grades5. Inclination to collaboration does not reduce autonomy of
participants, who are not subject to a shared schema6. A BIN is decentralized and scalable because the number of
participants, the complexity of business models, and the workload can change
9DOLAP 2010 - Toronto, Oct. 30 2010
Envisioned architecture
peer i
queryforwarding
queryreformulation
query resultreconcil iation
query dispatchingpeer N
peer 1
BusinessIntell igence
Networklocal BI platform
user interface
multidimensional database
semanticmappings
peer j
queryforwarding
queryreformulation
query resultreconcil iation
query dispatching
local BI platform
user interface
multidimensional database
semanticmappings
10DOLAP 2010 - Toronto, Oct. 30 2010
Envisioned architecture
peer i
queryforwarding
queryreformulation
query resultreconcil iation
query dispatchingpeer N
peer 1
BusinessIntell igence
Networklocal BI platform
user interface
multidimensional database
semanticmappings
peer j
queryforwarding
queryreformulation
query resultreconcil iation
query dispatching
local BI platform
user interface
multidimensional database
semanticmappings
11DOLAP 2010 - Toronto, Oct. 30 2010
Envisioned architecture
peer i
queryforwarding
queryreformulation
query resultreconcil iation
query dispatchingpeer N
peer 1
BusinessIntell igence
Networklocal BI platform
user interface
multidimensional database
semanticmappings
peer j
queryforwarding
queryreformulation
query resultreconcil iation
query dispatching
local BI platform
user interface
multidimensional database
semanticmappings
12DOLAP 2010 - Toronto, Oct. 30 2010
Envisioned architecture
peer i
queryforwarding
queryreformulation
query resultreconcil iation
query dispatchingpeer N
peer 1
BusinessIntell igence
Networklocal BI platform
user interface
multidimensional database
semanticmappings
peer j
queryforwarding
queryreformulation
query resultreconcil iation
query dispatching
local BI platform
user interface
multidimensional database
semanticmappings
13DOLAP 2010 - Toronto, Oct. 30 2010
Envisioned architecture
peer i
queryforwarding
queryreformulation
query resultreconcil iation
query dispatchingpeer N
peer 1
BusinessIntell igence
Networklocal BI platform
user interface
multidimensional database
semanticmappings
peer j
queryforwarding
queryreformulation
query resultreconcil iation
query dispatching
local BI platform
user interface
multidimensional database
semanticmappings
14DOLAP 2010 - Toronto, Oct. 30 2010
Research issues
p Query reformulation on peersp Query routing strategies to forward queries to the most
promising peers onlyp Advanced approaches for securityp Mechanisms for controlling data provenance and
qualityp Data lineage to help users understand how data have
been transformedp Object fusion techniques to integrate returned data
15DOLAP 2010 - Toronto, Oct. 30 2010
Mapping language
p Handling the asymmetry between dimensions and measures
p Specifying the relationship between two attributes of different multidimensional schemata in terms of their granularity
p Considering aggregation operators to avoid the risk of inconsistent query reformulations
p Expressing also mappings at the instance level to transcode data
16DOLAP 2010 - Toronto, Oct. 30 2010
HOSPITALIZATION
costdurationOfStay
ward
unit
patientbirthDate
gender
segment
city region
diagnosiscategory
ADMISSIONS
totStayCosttotExamCost
totLengthnumAdmissions
ward
LHD
patientBirthYear
patientGender
patientCity patientNation
datemonthyear
disease
@Rome
@Florence
datemonthyear
week
organ
Mapping languagep Health-care example
17DOLAP 2010 - Toronto, Oct. 30 2010
Mapping language
HOSPITALIZATION
costdurationOfStay
ward
unit
patientbirthDate
gender
segment
city region
category
ADMISSIONS
totStayCosttotExamCost
totLengthnumAdmissions
ward
LHD
patientBirthYear
patientGender
patientCity patientNation
monthyear
disease
@Rome
datemonthyear
week
organ
@Florence
date
diagnosis
same
roll-up
equi-level
equi-level
p Health-care example
drill-down
18DOLAP 2010 - Toronto, Oct. 30 2010
Mapping language
HOSPITALIZATION
costdurationOfStay
ward
unit
patientbirthDate
gender
segment
city region
category
ADMISSIONS
totStayCosttotExamCost
totLengthnumAdmissions
ward
LHD
patientBirthYear
patientGender
patientCity patientNation
monthyear
disease
@Rome
datemonthyear
week
organ
@Florence
date
diagnosis
same
roll-up
equi-level
equi-level
p Health-care examplemappings can be annotated with
encoding functions
drill-down
Rome.cost = Florence.totStayCost+Florence.totExamCost
19DOLAP 2010 - Toronto, Oct. 30 2010
Mapping language
HOSPITALIZATION
costdurationOfStay
ward
unit
patientbirthDate
gender
segment
city region
category
ADMISSIONS
totStayCosttotExamCost
totLengthnumAdmissions
ward
LHD
patientBirthYear
patientGender
patientCity patientNation
monthyear
disease
@Rome
datemonthyear
week
organ
@Florence
date
diagnosis
same
roll-up
equi-level
equi-level
p Health-care examplemappings can be annotated with
encoding functions
drill-down
Rome.disease = substring(Florence.diagnosis, 1, 20)Rome.organ = substring(Florence.diagnosis, 21, 40)
20DOLAP 2010 - Toronto, Oct. 30 2010
Mapping language
HOSPITALIZATION
costdurationOfStay
ward
unit
patientbirthDate
gender
segment
city region
category
ADMISSIONS
totStayCosttotExamCost
totLengthnumAdmissions
ward
LHD
patientBirthYear
patientGender
patientCity patientNation
monthyear
disease
@Rome
datemonthyear
week
organ
@Florence
date
diagnosis
same
roll-up
equi-level
equi-level
p Health-care examplemappings can be annotated with
encoding functions
drill-downRome.region = regionOf(Florence.patientCity)
21DOLAP 2010 - Toronto, Oct. 30 2010
Query Reformulation Framework
p Reformulation takes in input an OLAP query on a target schema t and the mappings between t and the schema of one of its neighbors, the source schema s, and it outputs an OLAP query that refers only to s
p In our approach, queries are reformulated by means of the underlying relational (star) schema
22DOLAP 2010 - Toronto, Oct. 30 2010
Md-Schema@peeri Md-Schema@peerj
Semantic Mappings
Mappingtranslation
Schematranslation
Schematranslation
s-t tgds
OLAP Query
Relational Query
Querytranslation
Query Reformulation Framework
p To translate semantic mappings we use a logical formalism called source-to-target tuple generating dependencies
23DOLAP 2010 - Toronto, Oct. 30 2010
Example: Schema translation
ward
unit
diagnosiscategory
ADMISSIONS
totStayCosttotExamCost
totLengthnumAdmissions
patientBirthYear
patientGender
patientCity patientNation
datemonthyear
@Florence
HOSPITALIZATION
costdurationOfStay
patientbirthDate
gender
segment
city region
ward
LHD
disease
@Rome
datemonthyear
week
organ
HospFT(organ,disease,date,ward,patient,cost,durationOfStay)OrganDT(organ)DiseaseDT(disease)DateDT(date,week,month,year)WardDT(ward,LHD)PatientDT(patient,birthDate,city,region,segment,gender)
AdmFT(diagnosis,date,ward,patientCity,patientBirthYear,patientGender,totStayCost,totExamCost,totLength,numAdmissions)
DiagnosisDT(diagnosis,category)DateDT(date,month,year)WardDT(ward,unit)PatientCityDT(patientCity,patientNation)PatientBirthYearDT(patientBirthYear)PatientGenderDT(patientGender)
24DOLAP 2010 - Toronto, Oct. 30 2010
Example: Query translation
p Total hospitalization costs for region and yearπregion,year,SUM(cost) (HospFT DateDT PatientDT)
q(R,Y,SUM(C)) ←HospFT(_,_,D,_,P,C,_),DateDT(D,_,_,Y),PatientDT(P,_,_,R,_,_))
HOSPITALIZATION
costdurationOfStay
patientbirthDate
gender
segment
city region
ward
LHD
disease
@Rome
datemonthyear
week
organ
25DOLAP 2010 - Toronto, Oct. 30 2010
Example: Mapping translation
∀S,E,C (AdmFT(_,...,S,E,_,_), C=S+E→HospFT(_,...,C,_)
HOSPITALIZATION
costdurationOfStay
ward
unit
patientbirthDate
gender
segment
city region
category
ADMISSIONS
totStayCosttotExamCost
totLengthnumAdmissions
ward
LHD
patientBirthYear
patientGender
patientCity patientNation
monthyear
disease
@Rome
datemonthyear
week
organ
@Florence
date
diagnosis
same
26DOLAP 2010 - Toronto, Oct. 30 2010
Example: Reformulation
p The group-by is reformulated using the roll-upmapping from region to patientCity, while measure cost is derived using the samemapping
πyear,regionOf(patientCity),SUM(totStayCost+totExamCost)(AdmFT DateDT PatientCityDT)
ward
unit
diagnosiscategory
ADMISSIONS
totStayCosttotExamCost
totLengthnumAdmissions
patientBirthYear
patientGender
patientCity patientNation
datemonthyear
@Florence
27DOLAP 2010 - Toronto, Oct. 30 2010
Summary
pWe have outlined a peer-to-peer architecture for supporting distributed and collaborative decision-making scenarios
pWe have shown how an OLAP query formulated on one peer can be reformulated on a different peer, based on a set of inter-peer semantic mappings
28DOLAP 2010 - Toronto, Oct. 30 2010
Future work
p Adopting MDX as the language for expressing multidimensional queries and closing the reformulation query language
p Devising multidimensional-aware object fusion techniques for integrating data returned by different peers
p Finding smart algorithms for routing queries to the most “promising” peers in the BI network
29DOLAP 2010 - Toronto, Oct. 30 2010
Some advertising...
30DOLAP 2010 - Toronto, Oct. 30 2010
Thank you for you attention
Questions?