View
231
Download
4
Category
Preview:
Citation preview
1
DATA WAREHOUSE DESIGNDATA WAREHOUSE DESIGN
ICDE 2001 TutorialICDE 2001 Tutorial
Stefano Rizzi, Matteo Golfarelli
DEIS - University of Bologna, Italy
2
MotivationMotivationBuilding a data warehouse for an enterprise is a huge and complextask, which requires an accurate planning aimed at devisingsatisfactory answers to organizational and architectural questions.Despite the pushing demand for working solutions coming fromenterprises and the wide offer of advanced technologies fromproducers, few attempts towards devising a specific methodology fordata warehouse design have been made. On the other hand, thestatistic reports related to DW project failures state that a majorcause lies in the absence of a global view of the design process: inother terms, in the absence of a design methodology.
SummarySummary� Introduction to Data Warehousing� Conceptual design of Data Warehouses� Workload-based logical design for ROLAP� Indexes for physical design
3
IntroductionIntroductionto Data Warehousingto Data Warehousing
Stefano Rizzi
4
� Information systems are rooted in the relationshipbetween information, decision and control.
� An IS should collectcollect and classifyclassify the information, bymeans of integratedintegrated and suitablesuitable procedures, inorder to produce in timein time and at the right levelsright levels thesynthesis to be used to support the decisionalprocess, as well as to administrate and globallycontrol the enterprise activity.
Information Systems: profile and roleInformation Systems: profile and role
5
Manufacturing
system
Information
systemInformation
Finished product
Information as a resourceInformation as a resource� Information is an increasing value resource,
required from managers to schedule and monitoreffectively the enterprise activities.
� Information is the first matter which is transformedby information systems like unfinished products aretransformed by manufacturing systems.
6
Amount
Value Strategic directions
Reports
Selected information
Primary information sources
Value of informationValue of information
� Information is an enterprise resource like capital, firstmatters, plants and people; thus, it has a cost.
� Hence, understanding the value of information isimportant.
7
Different kinds of information systemsDifferent kinds of information systems
SalesSales and andmarketingmarketing
ManufacturingManufacturing FinanceFinance AccountingAccounting HumanHumanresourcesresources
Operatio
nal
Operatio
nal OperationalOperationalmanagersmanagers
TPSKnowle
dge
Knowledge
KnowledgeKnowledge and anddata data workersworkers
OASKWS
Managem
ent
Managem
entMiddleMiddlemanagersmanagers
MISDSS
Strate
gic
Strate
gic SeniorSeniormanagersmanagers
ESS
8
The The ““Data WarehouseData Warehouse”” phenomenon phenomenon
�� Usual complaints:Usual complaints:
�We have tons of data but we cannot accessthem!�How can people playing the same roleproduce substantially different results?�We want to slice and dice data in anypossible way!�Show me only what is important!�Everyone knows some data are incorrect...
�(R. Kimball, The Data Warehouse Toolkit)
9
Data WarehousingData Warehousing� A collection of technologies and tools supporting the
knowledge worker (executive, manager, analyst) inanalysing data aimed at decision making and atimproving the knowledge assets of the enterprise.
Data WarehouseAt the core of the architecture of modern information systems,it is a data repository:
�Oriented to subjects�Integrated and consistent�Representing temporal evolution�Non volatile
The data warehouse is regularly refreshed, permanently growing,The data warehouse is regularly refreshed, permanently growing,logically centralised and easily accessed by users, essentially read-onlylogically centralised and easily accessed by users, essentially read-only
10
External dataOperational data (relational, legacy)
ReportingtoolsAnalysis tools
(OLAP)
WarehouseWarehouseSummarySummarydatadata
AccessAccess
Data mining
Data WarehouseData Warehouse
What-Ifanalysis
ETL tools
11
Data Data MartsMarts
Data Data WarehouseWarehouse
Data mart
ClientClientmanagementmanagement
GeographicalGeographicalregionsregions
SupplierSuppliermanagementmanagement
MarketingMarketingFinanceFinance
Replication and broadcasting
12
Subject Subject vsvs Process Process
reservations
charge
Medicalreports
admissions
Emphasis on applications
patient
region
consumption
Emphasis on subjects
13
Integration and consistencyIntegration and consistency
DB
DW
Externaldata
Text files
Schema IntegrationExtraction
TransformationCleaning
ValidationFilteringLoading
wrappers mediators
loaders
14
Temporal evolutionTemporal evolution
OLTPDW
Restricted historical content, Often time is not includedin keys,Data are updated
Rich historical content,Time is included in keys,Snapshots cannot beupdated
Current values Snapshot
15
Non-volatilityNon-volatility
OLTP
insert delete
updateDW
load
Huge data volumes:from 20 GBs to some TBs
in a few years
� In a DW, no advanced techniques for transaction managementare required (differently from OLTP systems)
� Key issues are the query throughput and the resilience
access
16
DWDW vs vs. OLTP. OLTP
• 90% ad hoc queries
• Mostly read access• Hundreds users• Denormalised• Supports historical
versions• Optimised for accesses
involving mostdatabase
• Based on summarydata
• 90% predefinedtransactions
• Read/write access• Thousands users• Normalised• Does not support historical
versions• Optimised for accesses
involving a small databasefraction
• Based on elemental data
17
ROLAP (Relational OLAP)ROLAP (Relational OLAP)
� Intermediate level server between a relational back- end serverand the front-end client
� Specialised middleware� Generation of SQL multi-statements for the back-end server� Query scheduling
MOLAP (Multidimensional OLAP)MOLAP (Multidimensional OLAP)
� Direct support of multi-dimensional views� Special data structures (e.g., multi-dimensional arrays)� Compression techniques� Intelligent disk/memory caching� Pre-computation� Complex analysis
18
The technological progressThe technological progress
datadata
knowledgeknowledge
1970 1980 1990 2000
Statistics Statistics &&reportingreporting
DataDataWarehousingWarehousing
OLAPOLAP
DataDataMiningMining
PatternPatternWarehousingWarehousing
Ref
inem
ent
Source:InformationDiscovery
19
The Data The Data Warehouse Warehouse MarketMarket
0
500
1000
1500
2000
2500
3000
3500
4000
4500
1998 1999 2000 2001 2002
RDBMS
OLAP
0
5 0
100
150
200
250
300
350
400
1998 1999 2000 2001 2002
Data Marts
ETL
Data Quality
Metadata
Source: Shilakes, Tylman -Enterprise Information Portals
20
The DW life-cycleThe DW life-cycle
Objective definition andplanning
Clearly determine the scopes,define the borders, estimatedimensions, choose the approach todesign, evaluate the benefits
Infrastructure design Choose the technologies and thetools, analyse the architecturalsolutions, solve the managementproblems
Design and implementationof applications Add iteratively new data marts
and applications to the warehouse
21
BibliographyBibliography� R. Barquin, S. Edelstein. Planning and Designing the Data Warehouse. Prentice Hall
(1996).
� S. Chaudhuri, U. Dayal. An overview of data warehousing and OLAP technology.SIGMOD Record 26,1 (1997).
� G. Colliat. OLAP, relational and multidimensional database systems. SIGMOD Record25, 3 (1996).
� M. Demarest. The politics of data warehousing.Http://www.hevanet.com/demarest/marc/dwpol.html
� U.M. Fayyad, G. Piatetsky-Shapiro, P. Smyth. Data mining and knowledge discoveryin databases: an overview. Comm. of the ACM 39, 11 (1996).
� W.H. Inmon. Building the data warehouse. John Wiley & Sons (1996).� S. Kelly. Data Warehousing in Action. John Wiley & Sons (1997).
� R. Kimball. The data warehouse toolkit. John Wiley & Sons (1996).
� R. Kimball, L. Reeves, M. Ross, W. Thornthwaite. The data Warehouse LifecycleToolkit. John Wiley & Sons (1998).
� C. Shilakes, J. Tylman. Enterprise Information Portals.Http://www.sagemaker.com/company/downloads/eip/indepth.pdf
� P. Vassiliadis. Gulliver inthe land of data warehousing: practical experiences andobservations of a researcher. Proc. DMDW’2000 (2000).
� J. Widom. Research Problems in Data Warehousing. Proc. CIKM (1995).
22
Conceptual modellingConceptual modellingfor Data Warehousingfor Data Warehousing
Stefano Rizzi
23
Why a new conceptual model?Why a new conceptual model?
� While it is universally recognised that a DW leans on amultidimensional model, there is no agreement on theapproach to conceptual modelling.
� On the other hand, an accurate conceptual design isthe necessary foundation for building a “good”information system.
� The Entity/Relationship model is widespread in theenterprises, but….
"Entity relation data models [...] cannot be understoodby users and they cannot be navigated usefully by DBMS
software. Entity relation models cannot be used as thebasis for enterprise data warehouses.” (Kimball, 96)
24
SalesSales
Stor
eSt
ore
ProductProductTim
eTim
e
The multidimensional data modelThe multidimensional data modelNumber of Cokecans sold atBIGSTORES inLondon on 10/10/99
Number of Pepsicans sold at allBIGSTORES on10/10/99
Number of Fantacans globally sold
25
Basic Basic terminologyterminology
� Fact (cube, target). It is a focus of interest for the decision-making process; typically, it models an event occurring in theenterprise world (sales, shipments, purchases). It is essential fora fact to have some dynamic aspects, i.e., to evolve somehowacross time.
� Measures (attributes, variables, metrics, properties). They arecontinuously valued (typically numerical) attributes which describea fact from different points of view. For instance, each sale ismeasured by its revenue.
� Dimensions. They are discrete attributes which determine theminimum granularity adopted to represent facts. Typicaldimensions for the sale fact are product, store and date.
� Hierarchies (dimensions). They contain dimensionattributes (levels, parameters) connected in a tree-likestructure by many-to-one relationships (functional dependencies).
26
DW DW modellingmodelling in the in the literatureliterature
Gyssens, Lakshmanan 97
Agrawal et al. 95
Li, Wang 96
Cabibbo, Torlone 98Datta, Thomas 97
Vassiliadis 98
Tryfona et al. 99
Hüsemann et al. 00
Sapia et al. 98
Franconi, Sattler 99
Golfarelli et al. 98
27
LOGICALLOGICAL
CONCEPTUALCONCEPTUAL
DW DW modellingmodelling in the in the literatureliterature
Gyssens, Lakshmanan 97
Agrawal et al. 95
Li, Wang 96
Cabibbo, Torlone 98Datta, Thomas 97
Vassiliadis 98
Tryfona et al. 99
Hüsemann et al. 00
Sapia et al. 98
Franconi, Sattler 99
Golfarelli et al. 98
28
GRAPHICALGRAPHICAL
FORMALFORMAL
DW DW modellingmodelling in the in the literatureliterature
Gyssens, Lakshmanan 97
Agrawal et al. 95
Li, Wang 96
Cabibbo, Torlone 98Datta, Thomas 97
Vassiliadis 98
Tryfona et al. 99
Hüsemann et al. 00
Sapia et al. 98
Franconi, Sattler 99
Golfarelli et al. 98
29
ALGEBRAALGEBRA
DW DW modellingmodelling in the in the literatureliterature
Gyssens, Lakshmanan 97
Agrawal et al. 95
Li, Wang 96
Cabibbo, Torlone 98Datta, Thomas 97
Vassiliadis 98
Tryfona et al. 99
Hüsemann et al. 00
Sapia et al. 98
Franconi, Sattler 99
Golfarelli et al. 98
30
DESIGNDESIGN
DW DW modellingmodelling in the in the literatureliterature
Gyssens, Lakshmanan 97
Agrawal et al. 95
Li, Wang 96
Cabibbo, Torlone 98Datta, Thomas 97
Vassiliadis 98
Tryfona et al. 99
Hüsemann et al. 00
Sapia et al. 98
Franconi, Sattler 99
Golfarelli et al. 98
31
Conceptual modelsConceptual models
� Sapia, Blaschka, Höfling, Dinter (1998)
dimension level
roll-up relationship
fact relationship
attribute
32
Conceptual models (2)Conceptual models (2)
� Franconi, Sattler (1999)
dimensiontarget
property
level
aggregated entity
33
Conceptual models (3)Conceptual models (3)
� Hüsemann, Lechtenbörger, Vossen (2000)
dimension
dimensionlevel
measure property attribute
optional property attribute
optional
aggregation path
fact
34
The Dimensional Fact ModelThe Dimensional Fact Model
The Dimensional Fact ModelDimensional Fact Model (DFM) is a graphicalconceptual model for DWs, aimed to:� Effectively support conceptual design;� Provide an environment where user queries can be formulated
intuitively;� Enable communication between the designer and the final user
in order to refine requirement specification;� Supply a stable platform for logical design;� Provide an expressive and non-ambiguous documentation.
The DFM is independent of the target logical model(multidimensional or relational)
35
� Three levels of conceptual documentation are provided:� Fact scheme: represents a fact of interest and the associated
measures, dimensions and hierarchies.� Data Mart scheme: summarizes the fact schemes which
constitute each data mart and emphasize the feasibleconnections between them.
� Data Warehouse scheme: shows the different data martsemphasizing their overlaps, the different profiles of the usersaccessing them, and the operational sources which feedthem.
The Dimensional Fact Model (2)The Dimensional Fact Model (2)
� Each documentation level is integrated by glossarieswhich explain the names adopted within the schemes,define a connection between the DW data and theoperational sources, express data volumes.
� Data mart schemes are associated to the workloadspecification.
36
hierarchy
Fact schemesFact schemes
A fact expresses a many-to-many relationship between its dimensions
state
SALE
category
type
quarter month
store
storecity
county
sales manager
year
sale district
date
holidayday of week
marketinggroup
department
brand
qty soldrevenueunit priceno. of customers
brand city
product
week
dimensionattribute
measure
fact
dimension
37
address
non-dimensionattribute
phone
manager
diet
manager
promotion
price reduction
cost
end datebegin date
ad type
optionality
state
SALE
category
type
quarter month
store
storecity
county
sales manager
year
sale district
date
holidayday of week
marketinggroup
department
brand
qty soldrevenueunit priceno. of customers
brand city
product
week
Fact schemes (2)Fact schemes (2)� A non-dimension attribute contains additional information
about a dimension attribute, and is typically connected toit by a one-to-one relationship.It cannot be usedfor aggregation.
� Some links betweenattributes canbe optional.
38
Fact schemes (3)Fact schemes (3)
� Convergence� Cross-dimension attributes� Additivity,
non-additivity,non-aggregability
� Overlap
begin date
end date
store state
diet
marketinggroup
brand city
store county
store city
SALE
product
qty soldrevenueunit priceno. of customers
category
type
department
brand
store
promotion
ad type
price reduction
fiscalweek
fiscalquarter
fiscalmonth
fiscalyear
date
week
day of week
quarter monthyear
manager
sale district
phone
address
V.A.T.
non-aggregabilitycross-dimension
attribute
convergence
39
The SHIPMENTS fact schemeThe SHIPMENTS fact scheme
marketinggroup
brand city
store state
store city
warehousestate
warehouse city
SHIPMENTTO STORES
product
qty shippedshipping cost
category
type
warehouse
department
brand
store
mode
typecarrier
fiscalweek
fiscalquarter
fiscalmonth
fiscalyear
date
week
day of week
quarter monthyear
FACT SCHEME: SHIPMENT TO STORES
40
The INVENTORY fact schemeThe INVENTORY fact scheme
marketinggroup
brand city
warehousenation
warehouse city
INVENTORY
product
level
category
type
warehouse
department
brand
fiscalweek
fiscalquarter
fiscalmonth
fiscalyear
date
week
day of week
quarter monthyear
FACT SCHEME: INVENTORY
units per pallet
package type
package size
weight
AVG,MIN
41
The The ““supply chainsupply chain””
date
product
factoryMANUFACTURING
date
component
factoryCOMPONENTINVENTORY
date
product
factory
packagetype
PACKAGING
date
product
warehouseWAREHOUSEINVENTORY
PRODUCTION OFCOMPONENTS
date
component
factory date
component
to factoryCOMPONENTDELIVERY
from factory
date
product
factory
warehouse
SHIPMENT TOWAREHOUSE
mode
date
product
storeSALES
promotion
date
product
store
warehouse
SHIPMENT TOSTORES
mode
42
GlossariesGlossaries
name description domain card. queryproduct products 5000brand brands 800brand city Where brands are manufactured cities 50type (pasta, soft drink, …) pr. types 200category (food, clothing, music,…) pr. categories 10department Deps. managing categories deps. 5marketing group Responsible for product types groups 20
select prodName,brandName, cityName,…from PRODUCTS P,BRANDS B, CITIES C,…where P.brandId =B.brandIdand B.cityId = C.cityIdand . . . . . . . . . . .
stores stores 100store city cities 80store state states 5
select storeName,cityName,stateName from STORESS,CITIES Cwhere S.cityId = C.cityId
.................... .................... ................. ......... . . . . . . . . . . . . .
ATTRIBUTE GLOSSARY: SHIPMENT TO STORES
name description type queryqty shipped Quantity of each product being
shippedINTEGER select SUM(PS.qty)
from PRODUCTS P,SHIP S,PRODSHIPPS,…where P.prodId = PS.prodIdand PS.shipId = S.shipIdand . . . . . . . . . . . . .group by P.prodId,S.date, . . .
shipping cost Cost of the shipment MONEY . . . . . . . . . . . . .
MEASURE GLOSSARY: SHIPMENT TO STORES (sparsity = 0.01)
refresh frequency: 1 per week; refresh technique: periodic complete
43
Data mart schemesData mart schemes
� The data mart scheme is used to summarize the factschemes which constitute the data mart and to showdrill-across connections between them.
� It is a graph whose nodes are elemental andoverlapped fact schemes; the arcs are directed toeach overlapped scheme from its componentschemes, which in turn may be overlapped.
MANUFACTURING
COMPONENTINVENTORY
PACKAGING
WAREHOUSEINVENTORY
PRODUCTION OFCOMPONENTS
COMPONENTDELIVERY
SHIPMENT TOWAREHOUSE
SALESHIPMENT TOSTORES
DATA MART SCHEME: SUPPLY CHAIN
PRODUCTION AND DELIVERY
DELIVERY AND INVENTORY
MANUFACTURING AND PACKAGING
SHIPMENT AND SALE
DISTRIBUTIONCYCLE
PRODUCT CYCLE
44
The workloadThe workload
� In principle, the workload for a data mart is dynamicand unpredictable.
� In some commercial tools, the actual workload ismonitored while the DW is operating and the logicaland physical schemes are dynamically tuned.
� We claim that a core workload can, and should, bedetermined a priori:� The user typically knows in advance which kind of data
analysis (s)he will carry out more often for decisional orstatistical purposes;
� A substantial amount of queries are aimed at extractingsummary data to fill standard reports.
45
The workload (2)The workload (2)
marketinggroup
brand city
store state
store city
warehousestate
warehouse city
SHIPMENTTO STORES
product
qty shippedshipping cost
category
type
warehouse
department
brand
store
mode
typecarrier
fiscalweek
fiscalquarter
fiscalmonth
fiscalyear
date
week
day of week
quarter monthyear
FACT SCHEME: SHIPMENT TO STORES
46
Data warehouse schemesData warehouse schemes
� At the highest abstraction level, the data warehousescheme shows the different data marts emphasizingthe fact schemes duplicated on two or more of them,the different profiles of the users accessing them, andthe operational sources which feed them.
DEMANDCHAIN
SUPPLYCHAIN
SALES
RENOVATION
personnelmanager
administrativemanager
saleexecutive
buyer
claims
incentives
personneldatabase
purchases
restorationworks
PERSONNEL
SALES
productdatabase
orders
data mart
user
fact scheme
operational db
file transfer
manual input
47
Stefano Rizzi
Conceptual designConceptual designof Data Warehousesof Data Warehouses
48
Designing the DWDesigning the DW
� Within a successful approach to DW design, top-downand bottom-up strategies should be mixed.
� When planning a DW, a bottom-up approach should befollowed.
� One data mart at a time is identified and prototyped.
� Each data mart is designed in a top-down fashion bybuilding a conceptual scheme for each fact of interest.
49
Data Mart prototypingData Mart prototypingPrototype first the data mart which:
� plays the most strategic role for the enterprise;� can convince the final users of the potential benefits;� leans on available and consistent data sources.
DM1
Source 1
DM2
DM3
Source 2
DM4
DM5
Source 3
50
Reference architectureReference architecture
Reconciled data
heterogeneous operational dbs
DW
Problem of designingthe reconciled data(integration ofheterogeneous sources)
51
Methodological frameworkMethodological framework
analysis of theoperational db
requirementspecification
conceptualdesign
workloadrefinement
logicaldesign
physicaldesign
final user
designer
db administrator
DWs are based on a pre-existing information system
52
Methodological framework Methodological framework (2)(2)
LogicalLogicalSchemeScheme
LOGICALDESIGN
WorkloadTargetlogicalmodel
PhysicalPhysicalSchemeScheme
PHYSICALDESIGN
Workload TargetDBMS
E/R E/R SchemeScheme
chiave negozio negozio città regione indirizzo resp. vendite
N1 …. …. …. …… ………
N2
chiave tempo chiave negozio chiave_prodotto quant venduta incasso num_clienti
T1 N1 P1 10 10000002
T1 N1 P2 8 12000008
T1 N2 P5 15 15000005
… ….. …… …….
RelationalRelationalSchemeScheme
ConceptualConceptualSchemeScheme
CONCEPTUALDESIGN
Facts
Preliminaryworkload
53
Conceptual design of the data martConceptual design of the data mart
� Design is based on the documentation of theunderlying operational information system:� E/R schemes� Relational schemes
Golfarelli, Maio, Rizzi 98; Cabibbo, Torlone 98;Moody, Kortink 00; Hüsemann, Lechtenbörger, Vossen 00
� Steps:� Find facts� For each fact:
• Navigate functional dependencies• Drop useless attributes• Define dimensions and measures
54
Finding factsFinding facts
� Within an E/R scheme, a fact is represented by either anentity F or an n-ary relationship between entities E1...En
� Within a relational scheme, a fact is represented by arelation F.
The entities and relationships representing frequentlyupdated archives are good candidates to define facts;those representing nearly-static archives are not.
55
Navigating functional dependenciesNavigating functional dependencies
� Build a tree in which each vertex corresponds to anattribute of the scheme;
� The root corresponds to the identifier (key) of F;� For each vertex v, the corresponding attribute
functionally determines all the attributes correspondingto the descendants of v.
56
Example (from the E/R scheme):Example (from the E/R scheme):
TYPE
PRODUCT
CATEGORY
STORE CITY
(1,1)(0,N)
(1,1) (1,N)
(1,1) (0,N) (1,1) (1,N)
(0,N)
date
qty
unit price
PURCHASETICKET
(1,N)
type category
product
salesmanager
ticket number store city
of
sale
of
in in
weight
diet(0,1)
address
MARKETING GROUP
(1,1)
(1,N)
marketinggroup
for
manager
DEPARTM.
(1,1)
(1,N)
department
for
manager
phone
COUNTY
(1,1)
(1,N)
county
of
STATE
(1,1)
(1,N)
state
of
BRAND
brand
(1,1)(1,N)
(1,1) (1,N)of
WAREHOUSE
(1,N)
(1,N)
fromwarehouse
SALEDISTRICT
district no.
(1,1) (1,N)in
(1,1)
(1,N)
of
address producedin
size
57
district no
unit price
qty
ticketnumber
date
store
salesmanager
city state
product
brand
typecategory
addressdiet
city
weightdept.
manager
mark. grp.manager
phone
county
district no+state
size
city
state
countysale
Example (from the E/R scheme):Example (from the E/R scheme):
58
Dropping useless attributesDropping useless attributes� Some attributes in the tree may be uninteresting for
the DW. In order to drop useless levels of detail, it ispossible to apply the following operators:�� PruningPruning: delete a vertex and its subtree.
ticketnumber
date
store
salesmanager
city state
address
ticketnumber
date
store
salesmanager
address
address
date
store
salesmanager
�� GraftingGrafting: delete a vertex and move its subtree. It isuseful when an attribute is not interesting but theattributes it determines must be preserved.
59
Defining dimensionsDefining dimensions
� The choice of dimensions determines the factgranularitygranularity.
� Dimensions must be chosen among the root childrenin the attribute tree.
� Time should always be a dimension.
unit price
qty
date
store
salesmanager
city state
product
brand
typecategory
addressdiet
city
weightdept.
manager
mark. grp.manager
phone
district no+state
city countysale
60
Defining measuresDefining measures� Measures must be chosen among the children of the root.� Typically, measures are computed either by counting the
number of instances of F, or by summing (averaging, …)expressions which involve numerical attributes.
� An attribute cannot be both a measure and a dimension.� A fact may have no measures.
unit price
qty
date
store
salesmanager
city state
product
brand
typecategory
addressdiet
city
weightdept.
manager
mark. grp.manager
phone
district no+state
city countysale
61
GranularityGranularity
� Defining the granularity of data is a primary issue indetermining performance. Granularity depends on thequeries users are interested in, and represents atrade-off between query response time and detail ofinformation to be stored.� It may be worth adopting a finer granularity than that
required by users, provided that this does not slow downthe system too much.
� Constrained by the maximum time frame for loading.
� Choosing granularity includes defining the refreshinterval.� Issues to be considered:
• Availability of operational data• Workload characteristics• The total time period to be analysed
62
WWANANDDa CASE a CASE tool fortool for data data warehousewarehouse design design
� A design methodology is almost useless, if no CASE tool tosupport it is provided.� Acquire the relational db scheme via ODBC
� Carry out conceptual design
� Define the workload
� Calculate data volume
� Carry out logical design
� Create the documentation (including loading/feeding queries)
63
Bibliography Bibliography (1)(1)� K. Aberer, K. Hemm. A methodology for building a data warehouse in
a scientific environment. Proc. 1st Int. Conf. on Cooperative Inf.Systems, Brussels (1996).
� R. Agrawal, A. Gupta, S. Sarawagi Modeling multidimensionaldatabases. IBM Research Report, IBM Almaden Research Center(1995).
� M. Blaschka et al. Finding your way through multidimensional datamodels. Proc. DEXA’98 (1998).
� L. Cabibbo, R. Torlone. A logical approach to multidimensionaldatabases. EDBT 98 (1998).
� A. Datta, H. Thomas. A conceptual model and algebra for on-lineanalytical processing in data warehouses. Proc. WITS’97 (1997).
� E. Franconi, U. Sattler. A data warehouse conceptual model formultidimensional aggregation. Proc. DMDW’99 (1999).
� M. Golfarelli , D. Maio, S. Rizzi The Dimensional Fact Model: aconceptual model for data warehouses. Int. Jour. of Cooperative Inf.Systems 7, 2&3 (1998).
� M. Golfarelli, S. Rizzi. Designing the data warehouse: key steps andcrucial issues. Jour. of Computer Science and InformationManagement 2, 3 (1999).
64
Bibliography Bibliography (2)(2)� M. Gyssens, L.V.S. Lakshmanan. A foundation for multi-dimensional
databases. Proc. 23rd VLDB, Athens, Greece (1997).� B. Hüsemann , J. Lechtenbörger, G. Vossen. Conceptual data
warehouse design. Proc. DMDW’00 (2000).� R. Kimball. The data warehouse toolkit. John Wiley & Sons (1996).� D. Moody, M. Kortink. From enterprise models to dimensional models:
a methodology for data warehouse and data mart design. Proc.DMDW’00 (2000).
� T. Bach Pedersen, C. Jensen. Multidimensional data modelling forcomplex data. Proc. 15th ICDE, Sydney (1999).
� C. Sapia et al. Extending the E/R model for the multidimensionalparadigm. Proc. ER’98 (1998).
� N. Tryfona, F. Busborg, J. Christiansen. starER: A Conceptual Modelfor Data Warehouse Design. Proc. DOLAP’99 (1999).
� P. Vassiliadis. Modeling multidimensional databases, cubes and cubeoperations. Proc. 10th SSDBM Conf., Capri, Italy (1998).
Recommended