
    ACTA Journal, September 1997: Building a Data Warehouse Using Oracle OLAP Tools

    Building a Data Warehouse Using Oracle OLAP Tools

    Satish Mahajan

    Center of Expertise, Worldwide Customer Support

    Oracle Corporation

    September 1997

    Technical Report


    TABLE OF CONTENTS

    INTRODUCTION
    1.0 MANAGEMENT COLLABORATION
    2.0 KNOW THE END USERS
    2.1 Prototype Development
    3.0 PRE-DEVELOPMENT ACTIVITIES
    3.1 End User Requirements Analysis
    3.2 Capacity Planning
    4.0 DEVELOP SYSTEM ARCHITECTURE
    5.0 DESIGN AND DEVELOPMENT
    5.1 Data Management: Oracle Side
    5.2 Data Management: Express Side
    6.0 DESIGN AND DEVELOPMENT: PROGRAM MANAGEMENT
    6.1 Oracle Design Goals
    6.2 Program Modules
    6.3 Program Management: Express Side
    GLOSSARY

    Copyright Oracle Corporation 1997. All rights reserved. Printed in the U.S.A.

    Author: Satish Mahajan

    History: This paper was originally presented at Oracle Open World, Australia, in 1996.

    Oracle is a registered trademark of Oracle Corporation. Oracle7 is a trademark of Oracle Corporation.

    All other product or company names are used for identification purposes only, and may be trademarks of their respective owners.

    NO PART OF THIS DOCUMENT MAY BE REPRODUCED, IN ANY FORM, WITHOUT THE PERMISSION OF THE AUTHORS. THIS STIPULATION ALSO APPLIES TO ORACLE EMPLOYEES.


    INTRODUCTION

    This paper describes how to develop a large data warehouse, specifically using the Oracle Express family of tools for Online Analytical Processing (OLAP). It assumes that the reader is familiar with basic data warehouse and multi-dimensional database concepts. Definitions of the most commonly used terms are provided in a glossary at the end of the paper.

    Our methodology, listed below, is generally accepted within the industry and the literature.

    1. Management collaboration: put a management sponsor, technical and project management, technical resources, and the project scope and schedule in place.

    2. Know the end users: gather end user requirements and develop a rapid prototype.

    3. Pre-development activities: user requirements analysis, capacity planning, volume sizing and confirmation of requirements.

    4. Develop system architecture: hardware and software layouts.

    5. Design and development: data management on the Oracle side and on the Express side.

    6. Design and development (continued): design goals, and program management on the Oracle side and on the Express side.

    The rest of this paper provides a checklist approach for each of the above steps. Specifically, from an Oracle perspective, we are interested in solutions to problems in the areas of design, development and implementation.

    1.0 MANAGEMENT COLLABORATION

    1.1 Having a management sponsor for a data warehouse project is a mandatory requirement. It is important for the management contact to know the following:

    - The scope of the project is generally very large, so the costs of the proposed system's hardware, software, manpower and maintenance should be budgeted.

    - Communication with peer executives should set the right expectations about the project. It is important to build a background understanding of the basic concepts of data warehousing and OLAP technology. It is also necessary to prepare a return-on-investment analysis so that the project effort is unanimously backed by management.

    - The management sponsor should find a project manager who has a broad understanding of the business and of the business goals of the proposed system, in addition to experience managing large projects.

    1.2 The project manager should select a technical architect with broad experience in developing large systems using the Oracle database and tools. Familiarity with the Oracle Express product family and general multi-dimensional database concepts is also required.

    1.3 Technical resources should be added incrementally during development; keeping the team small often helps create a solid foundation for the project. The initial team can consist of one technical person each on the Oracle and Express sides, in addition to the system architect. A detailed project schedule and additional resources can be sought after the prototype is done.


    2.0 KNOW THE END USERS

    Because data warehousing and OLAP are relatively new concepts, it is a good idea to develop a small prototype, which can then be used to gather overall end user requirements.

    2.1 Prototype Development

    2.1.1 After talking to one or two key end users, select two simple analysis measures with about three dimensions (preferably the same dimensions, with time being one of them) from the same operational data source for the prototype.

    2.1.2 The prototype can be created on a desktop PC with a small amount of data, enough to represent about three levels of the time dimension. If the prototype PC is on the corporate network, real data can be loaded into a Personal Oracle database over the network. Because it uses such a live data load, the prototype makes a strong impression on the end users.

    2.1.3 Personal Express Server and Oracle Sales Analyzer are the obvious choices on the Express side. Personal Express Server can communicate with the Personal Oracle database over an ODBC link. Because Oracle Sales Analyzer is a canned application, it is very easy to create graph and table objects for the selected measures.

    2.1.4 On the Oracle side, a fact table needs to be created for each measure, along with the corresponding dimension tables.

    2.1.5 On the Express side, dimensions and data cubes corresponding to each measure need to be defined so that these objects can be populated with data values from the Oracle side objects. Aggregating data values along all the dimensions is straightforward in Express Server, because a direct command to perform the rollup is available and requires no programming effort.
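    Conceptually, rolling a measure up along all of its dimensions adds every base-level cell into each of its ancestor cells in every hierarchy. The Python sketch below illustrates that idea on a prototype-sized cube; the hierarchies, member names and the use of plain dictionaries are assumptions for illustration and do not describe how Express Server stores or aggregates data.

```python
from collections import defaultdict
from itertools import product

# Hypothetical parent links for each hierarchy; None marks the top level.
TIME_PARENT = {"1997-W01": "1997-01", "1997-W02": "1997-01",
               "1997-01": "1997-Q1", "1997-Q1": None}
PRODUCT_PARENT = {"SKU-1": "Widgets", "SKU-2": "Widgets", "Widgets": None}

def ancestors_or_self(member, parent):
    """The member followed by each of its ancestors up to the top level."""
    chain = []
    while member is not None:
        chain.append(member)
        member = parent[member]
    return chain

def rollup(base_cells, parents):
    """Add every base cell into all combinations of its members' ancestor levels."""
    cube = defaultdict(float)
    for members, value in base_cells.items():
        chains = [ancestors_or_self(m, p) for m, p in zip(members, parents)]
        for target in product(*chains):     # one target cell per level combination
            cube[target] += value
    return dict(cube)

base = {("1997-W01", "SKU-1"): 10.0,
        ("1997-W02", "SKU-1"): 5.0,
        ("1997-W02", "SKU-2"): 2.0}
cube = rollup(base, [TIME_PARENT, PRODUCT_PARENT])
print(cube[("1997-Q1", "Widgets")])   # 17.0
print(cube[("1997-01", "SKU-1")])     # 15.0
```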

    2.2 User Requirements Gathering

    A visual presentation through a prototype demonstration is a very effective way of gathering end user requirements for data analysis and reporting. An analyst with a good understanding of the business is required to carry out this task.

    3.0 PRE-DEVELOPMENT ACTIVITIES

    3.1 End User Requirements Analysis

    3.1.1 The requirements gathering phase is followed by analysis of the requirements, which can be organized around the following factors:

    - business measures of interest to the end users
    - all dimensions of each measure
    - hierarchies and levels in each dimension
    - the time span over which the data trend is observed for each measure
    - end user response time requirements for on-line data analysis
    - data refresh cycle requirements
    - all operational data sources from which data must be gathered


    3.1.2 It is also important to decide which measures are needed for on-line data analysis as opposed to reporting. OLAP tools are most effective for on-line data analysis, although they provide some reporting capabilities. An important criterion for the distinction is that an on-line measure can answer all the end user questions about the given subject matter through simple user actions such as mouse clicks, instead of waiting for other reports to come out. This way, end users can correlate different trends and exceptions, rank best and worst scenarios on-line, and do much more analysis without interrupting their train of thought.

    3.1.3 Once all the measures are in perspective, the next task is to prioritize them. Generally, reporting measures can be implemented with relational query tools, since response time is not a major concern for them. On-line measures therefore take precedence over reporting measures in a heavy analysis environment.

    Another angle for prioritization is the group of users that will use a given measure. Overall, end users can be divided into at least two categories: those doing typical analysis and those doing research-type analysis. It is a good idea to implement a few measures for each category of users at every delivery.
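    The two prioritization rules above (on-line measures ahead of reporting measures, and a few measures for each user category at every delivery) can be combined in a small planning routine. This Python sketch assumes hypothetical measure names, a simple on-line/reporting flag, two user categories and one measure per category per delivery; it illustrates the rules rather than prescribing a tool.

```python
from dataclasses import dataclass

@dataclass
class Measure:
    name: str
    online: bool          # True: on-line analysis; False: reporting
    user_category: str    # e.g. "typical" or "research"

def plan_deliveries(measures, per_category=1):
    """Schedule on-line measures, taking a few per user category in each delivery.

    Reporting measures are returned separately, for relational query tools.
    """
    queues = {}
    for m in measures:
        if m.online:
            queues.setdefault(m.user_category, []).append(m.name)
    deliveries = []
    while any(queues.values()):
        batch = []
        for category in sorted(queues):
            batch.extend(queues[category][:per_category])
            queues[category] = queues[category][per_category:]
        deliveries.append(batch)
    reporting = [m.name for m in measures if not m.online]
    return deliveries, reporting

measures = [
    Measure("sales volume", True, "typical"),
    Measure("margin trend", True, "research"),
    Measure("returns", True, "typical"),
    Measure("monthly listing", False, "typical"),
]
deliveries, reporting = plan_deliveries(measures)
print(deliveries)   # [['margin trend', 'sales volume'], ['returns']]
print(reporting)    # ['monthly listing']
```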

    All these steps of requirements analysis are very important because they lay the foundation for the overall system design and its successful implementation.

    3.2 Capacity Planning

    Before starting actual system development, it is imperative to plan the capacity of the proposed system, which helps determine the delivery and implementation plan. Capacity planning considers:

    - processes carried out on the system (for CPU and memory sizing)
    - data volume sizing (for disk requirements)

    3.2.1 A major part of capacity planning applies to the server side, where the Oracle and Express databases reside. On the client side, a standard desktop PC or laptop running Windows 95 or Windows NT is assumed, and Oracle Sales Analyzer or Express Objects applications run as standard Windows applications. As product directions for the front-end tools change, these products are becoming web enabled, so the front end will be able to run from any web browser. The focus of planning will then shift to the networking environment.

    3.2.2 On the server side, many processes determine the size of the system in terms of CPUs, memory and disks. The volume sizing step discussed in the next section gives pointers for the disk requirement. For CPU and memory requirements, it is a good idea to carry out actual benchmarks with multiple configurations.

    Important processes to consider on the Oracle side are:

    - the load rate of raw data, since it has a bearing on the refresh cycle
    - representative programs that will be run as part of dimension cleanup (the scrub process) and fact table creation
    - representative queries that will be run as part of the rollup process on the Oracle side

    Important processes on the Express side are:

    - data load times and space requirements for dimensions in multiple databases
    - data load times and space requirements with conjoints of the same dimensions
    - data load times for loading data cubes
    - CPU and memory requirements with multiple users doing a variety of analytical tasks
    - drill-through times for analysis on virtual cubes
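    As a starting point for such benchmarks, the load rate can be measured on a sample and extrapolated to the expected refresh volume. The Python sketch below times a batch insert into an in-memory SQLite database, purely as a stand-in for the target Oracle database; the sample size, the 20 million row refresh volume and the 8-hour window are hypothetical, and real figures must come from benchmarks on the candidate hardware configurations.

```python
import sqlite3
import time

def measure_load_rate(sample_rows=50_000):
    """Insert a synthetic sample and return the observed load rate in rows per second."""
    conn = sqlite3.connect(":memory:")          # stand-in for the real target database
    conn.execute("CREATE TABLE raw_feed (id INTEGER, product TEXT, amount REAL)")
    rows = [(i, f"SKU-{i % 100}", float(i % 7)) for i in range(sample_rows)]
    start = time.perf_counter()
    with conn:                                   # one commit for the whole batch
        conn.executemany("INSERT INTO raw_feed VALUES (?, ?, ?)", rows)
    elapsed = time.perf_counter() - start
    conn.close()
    return sample_rows / max(elapsed, 1e-9)

def fits_refresh_window(expected_rows, rows_per_sec, window_hours):
    """Estimated load hours for one refresh cycle, and whether they fit the window."""
    hours = expected_rows / rows_per_sec / 3600
    return hours, hours <= window_hours

rate = measure_load_rate()
hours, ok = fits_refresh_window(expected_rows=20_000_000, rows_per_sec=rate, window_hours=8)
print(f"{rate:,.0f} rows/s, about {hours:.2f} h, fits window: {ok}")
```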


    3.2.3. Volume Sizing

    Volume sizing helps determine the disk capacity needed for the proposed warehouse system. The basic components of volume sizing are:

    - operating system level requirements (swap, mirroring, backup, etc.)
    - Oracle database system level requirements (system, rollback, temporary, redo logs, etc.)
    - a staging area for data feeds from operational data sources
    - space required for data processing (data loading, data cleaning, rollup)
    - Express database space required for the system, dimensions, physical data cubes, etc.

    It is also necessary to look at the measures from a rollup viewpoint. Depending on the quality of the data, data volume sometimes shrinks as summarization is carried out, but if the data is sparse, data volume explodes at higher levels of summarization. This sparsity effect is more pronounced for measures with more dimensions and more levels in each dimension. It may then become necessary to revisit the end user requirements analysis and break a single measure with many dimensions into multiple measures with fewer dimensions. This sacrifices some analytical capability but turns an infeasible measure into a feasible one.
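    This effect can be checked with minor testing before committing to a design. The Python sketch below counts the non-empty cells a measure would have at every combination of hierarchy levels; the three-level hierarchies and the three scattered base cells are hypothetical, chosen to show how sparse data can produce far more aggregate cells than base cells.

```python
from itertools import product

# Hypothetical hierarchies: base member -> its member at each level (lowest first).
PRODUCT = {"A1": ("A1", "A", "ALL"), "A2": ("A2", "A", "ALL"), "B1": ("B1", "B", "ALL")}
STORE = {"N1": ("N1", "North", "ALL"), "N2": ("N2", "North", "ALL"), "S1": ("S1", "South", "ALL")}
WEEK = {"W1": ("W1", "M1", "ALL"), "W2": ("W2", "M1", "ALL"), "W5": ("W5", "M2", "ALL")}

def cells_per_level_combination(base_cells, hierarchies, n_levels=3):
    """Count distinct non-empty cells for every combination of hierarchy levels."""
    counts = {}
    for levels in product(range(n_levels), repeat=len(hierarchies)):
        projected = {tuple(h[m][lvl] for h, m, lvl in zip(hierarchies, cell, levels))
                     for cell in base_cells}
        counts[levels] = len(projected)
    return counts

# Three sparse base cells that share no product, store or week.
base = [("A1", "N1", "W1"), ("B1", "S1", "W2"), ("A2", "N2", "W5")]
counts = cells_per_level_combination(base, [PRODUCT, STORE, WEEK])
base_count = counts[(0, 0, 0)]
aggregate_count = sum(counts.values()) - base_count
print(base_count, aggregate_count)   # 3 72
```

    With dense data, where many base cells share the same parents, each higher level would instead hold fewer cells than the level below it.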

    This volume sizing exercise also helps decide which measures will become physical cubes and which will be virtual cubes in Express Server. Volume sizing is relatively easy and accurate, because data is mostly represented in numeric format when relational objects are converted into hierarchies for the multi-dimensional world. Also, the bytes required to represent numbers, the levels in each dimension and the possible data explosion or shrinkage at each level are either known or can be deduced with minor testing.
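    Because these inputs are mostly known numbers, a first-cut estimate is simple arithmetic: the number of possible cells (the product of the dimension sizes, counting members at all levels), times the expected fraction of non-empty cells (one minus the sparsity), times the bytes per value. The Python sketch below applies that arithmetic and marks each measure as a physical or virtual cube against a space budget. The 8-byte value size, dimension sizes, densities, measure names and 2 GB budget are hypothetical, and overheads such as indexes and page space are ignored.

```python
import math

def cube_bytes(dimension_sizes, density, bytes_per_value=8):
    """Estimate storage for one measure: possible cells x fraction non-empty x value size."""
    cells = math.prod(dimension_sizes)      # members per dimension, all levels included
    non_empty = round(cells * density)
    return non_empty * bytes_per_value

def classify(measures, budget_bytes):
    """Mark each measure 'physical' if its estimate fits the budget, otherwise 'virtual'."""
    return {name: "physical" if cube_bytes(sizes, density) <= budget_bytes else "virtual"
            for name, (sizes, density) in measures.items()}

# Hypothetical measures: members per dimension (all levels) and expected density.
measures = {
    "sales volume": ([120, 2_000, 500], 0.05),                       # time, product, store
    "customer satisfaction counts": ([120, 2_000, 500, 50_000], 0.001),
}
print(cube_bytes(*measures["sales volume"]))   # 48000000
print(classify(measures, budget_bytes=2 * 1024**3))
# {'sales volume': 'physical', 'customer satisfaction counts': 'virtual'}
```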

    Once the volume sizing is done, it is important to reconcile the end user requirements analysis with any modifications made to the measures because of volume constraints. It is a good idea to get end user confirmation of the final measures and their dimensionality.


    4.0 DEVELOP SYSTEM ARCHITECTURE

    Figure 1: Typical System Architecture

    Figure 1 shows a typical client-server solution in an open systems environment using Oracle's relational and Express family products. At the hardware level, a single powerful server at the back end (usually UNIX) serves many clients at the front end (usually PCs and laptops). With changing product directions, the communication link between the client and server nodes will become the corporate intranet or the Internet, and client machines will work more like thin clients. The overall solution remains valid and can be upgraded to this new scenario.

    At the software level, the system architecture is a three-tier solution: the Oracle database and the Express database are part of the server side, and the Express applications running on Windows are part of the client side.

    If the load on the server side keeps growing, the Oracle and Express databases can be installed on separate machines that communicate through SQL*NET. This provides scalability as well as a fall-back mechanism if one of the servers goes down.

    This solution is also portable, since Oracle and Express Server software is available on the majority of hardware platforms and the client-server communication is compatible with the Distributed Computing Environment (DCE).

    [Figure 1 diagram. Client side: desktop PCs and laptops (with docking station) running Express Objects V.2.0 / Sales Analyzer applications, which use a remote API to access data from Express Server based on DCE services; PC DCE services are bundled as part of the Express Objects software. Communication: Ethernet 10BaseT or CDDI. Server side: a UNIX system running Oracle Server V.7.3 and Express Server V.5.0 with RAA/RAM, holding Oracle Server data and Express Server data, with DCE service support from the UNIX operating system and a network connecting other operational systems.]


    5.0 DESIGN AND DEVELOPMENT

    This phase applies to two main areas, data management and program management, and activities in these areas can be carried out concurrently.

    5.1 Data Management: Oracle Side

    5.1.1. Protocols With Data Source Owners

    The user requirements analysis identifies all the objects needed from each data source (operational system) in the organization. It is important to form a protocol with the owners of each data source covering the following points:

    - name of the data source
    - management contact details
    - technical contact details
    - machine access details
    - database access details
    - best time for data transfers
    - historical data details
    - history, frequency and details of structure changes in the data

    5.1.2. Data Object Details

    For every data source, it is necessary to identify all the objects needed during each refresh cycle. The following points are important in this area:

    - object details
    - object type: static or dynamic
    - data transfer method
    - object size
    - growth at every refresh cycle
    - relationship with object(s) in other data sources

    At this point, it is important to understand how each object will move through the various modules on the Oracle and Express sides until it is finally represented as either a data cube or a dimension. Details about this, and all the points mentioned above, become part of the metadata for the system.
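    The protocol and object details above map naturally onto metadata records. The Python sketch below holds them in data classes and projects a source's total size over a number of refresh cycles, assuming a constant growth per cycle; the field names, the example source and every figure are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class DataObject:
    name: str
    object_type: str            # "static" (e.g. code tables) or "dynamic" (transactions)
    transfer_method: str        # e.g. "ftp + SQL*LOADER", "create table as select"
    size_mb: float
    growth_mb_per_cycle: float
    related_objects: list = field(default_factory=list)

    def projected_size_mb(self, cycles):
        """Size after a number of refresh cycles, assuming constant growth per cycle."""
        return self.size_mb + cycles * self.growth_mb_per_cycle

@dataclass
class DataSourceProtocol:
    name: str
    management_contacts: list
    technical_contacts: list
    machine_access: str
    database_access: str
    best_transfer_time: str
    historical_data: str
    structure_change_history: str
    objects: list = field(default_factory=list)

    def total_projected_mb(self, cycles):
        return sum(o.projected_size_mb(cycles) for o in self.objects)

orders = DataSourceProtocol(
    name="order entry", management_contacts=["(manager)"], technical_contacts=["(DBA)"],
    machine_access="(host details)", database_access="(account details)",
    best_transfer_time="Sunday 02:00", historical_data="24 months on-line",
    structure_change_history="none recorded",
    objects=[DataObject("ORDER_LINES", "dynamic", "create table as select", 500.0, 20.0),
             DataObject("PRODUCT_CODES", "static", "export/import", 5.0, 0.0)],
)
print(orders.total_projected_mb(cycles=52))   # 1545.0
```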


    5.2 Data Management: Express Side

    From the volume sizing calculations in the capacity planning stage, it should be clear which objects should be loaded into the Express database. The following points are important for data management on the Express side:

    - Since data loading into an Express database is currently single-threaded, a single physical database increases the overall refresh cycle time for the system.

    - Creating multiple physical databases increases availability, as seen by the end users, in case of a single database failure. However, it also increases load time, because all the dimension objects must be loaded into every database.

    - Conjoint dimensions guarantee a 100% dense object and can thus eliminate sparsity issues. However, the effort of maintaining a conjoint for changing dimensions grows non-linearly as the number of values in the conjoint increases.

    - Loading and maintaining small cubes in the Express database, while keeping the remaining data in the Oracle database for reach-through, gives a feasible solution in many situations.
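    A conjoint can be pictured as a single dimension whose members are only the combinations that actually occur. The Python sketch below builds such a member list from hypothetical sales records and compares its size with the full cross product of the underlying dimensions; it illustrates the density argument only and is not Express's own conjoint implementation.

```python
from itertools import product

products = ["P1", "P2", "P3", "P4"]
stores = ["S1", "S2", "S3", "S4", "S5"]
# Hypothetical sales records: only a few product/store combinations ever occur.
sales = [("P1", "S1", 10), ("P1", "S2", 4), ("P3", "S5", 7), ("P1", "S1", 3)]

# Full cross product: one cell for every product/store pair, most of them empty.
full_cells = list(product(products, stores))

# Conjoint: one member per combination that actually has data, in first-seen order.
conjoint = list(dict.fromkeys((p, s) for p, s, _ in sales))

# A measure dimensioned by the conjoint has a value for every member (100% dense).
measure = {member: 0 for member in conjoint}
for p, s, qty in sales:
    measure[(p, s)] += qty

density_full = len(conjoint) / len(full_cells)
print(len(full_cells), len(conjoint), density_full)   # 20 3 0.15
print(measure)   # {('P1', 'S1'): 13, ('P1', 'S2'): 4, ('P3', 'S5'): 7}
```

    Every new product/store pair that appears in a later refresh adds a conjoint member, and a hierarchy change can force the member list to be rebuilt, which is where the maintenance cost mentioned above comes from.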

    6.0 DESIGN AND DEVELOPMENT: PROGRAM MANAGEMENT

    6.1 Oracle Design Goals

    The following goals are important for program design:

    - Scalability: once the basic system is designed and developed, it should be extensible to new data sources as new analytical capabilities are added to the system.

    - Maintenance: changes in object structures and inconsistent data values should be detected automatically by the system and communicated to the concerned data administrators. Monitoring of regular data growth patterns, warnings and alerts about space management, regular backup procedures, etc. should be automated so that system and data administrators can invest their time in system enhancements.

    - Recoverability: in case of a failure,
      - a complete system rebuild should be avoided except for catastrophic failures;
      - parts of the system should be made available to the end users incrementally, according to their preference;
      - the system should resume processing from the point where it failed and still produce consistent results at the end of the processing cycle.
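    The resume-from-failure goal can be approached by recording each completed step in a process-state table and skipping completed steps on restart. The Python sketch below uses an in-memory SQLite database as a stand-in for the metadata repository; the step names follow the program modules described in the next section, while the table layout, the run_cycle function and the simulated failure are hypothetical.

```python
import sqlite3

STEPS = ["extraction", "load", "scrub", "base_fact", "rollup"]

def run_cycle(conn, cycle_id, actions):
    """Run one refresh cycle, skipping steps already recorded as done for it."""
    conn.execute("""CREATE TABLE IF NOT EXISTS process_state (
                        cycle_id INTEGER, step TEXT, status TEXT,
                        PRIMARY KEY (cycle_id, step))""")
    done = {row[0] for row in conn.execute(
        "SELECT step FROM process_state WHERE cycle_id = ? AND status = 'done'",
        (cycle_id,))}
    executed = []
    for step in STEPS:
        if step in done:
            continue                    # finished in an earlier attempt
        actions[step]()                 # an exception stops the cycle here
        with conn:                      # commit the completion record immediately
            conn.execute("INSERT OR REPLACE INTO process_state VALUES (?, ?, 'done')",
                         (cycle_id, step))
        executed.append(step)
    return executed

# Simulate a failure in the scrub step, then a restart after the problem is fixed.
conn = sqlite3.connect(":memory:")
failing = {"scrub"}

def make_action(step):
    def action():
        if step in failing:
            raise RuntimeError(f"{step} failed")
    return action

actions = {step: make_action(step) for step in STEPS}
try:
    run_cycle(conn, 1, actions)
except RuntimeError as err:
    print("first attempt:", err)        # first attempt: scrub failed
failing.clear()
resumed = run_cycle(conn, 1, actions)
print(resumed)                          # ['scrub', 'base_fact', 'rollup']
```

    For the restarted cycle to produce consistent results, each step must either commit atomically or be safe to rerun, since a step that fails halfway is executed again from its start.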


    6.2 Program Modules

    With the above design goals in mind, program modules as shown in Figure 2 can be established.

    [Figure 2 diagram. Oracle side processes: operational data sources feed the Extraction, Load, Scrub, Create Base Fact and Rollup jobs, which talk via SQL and PL/SQL blocks and communicate via OEM, supported by a Metadata Setup Module, Global Error & State Processing, and Metadata (Warehouse + OEM + RAA). Express side processes: Express Server databases with the DBA, Security, Bridge and Presentation Data modules, talking via DCE/RPC to the external world interface, the Sales Analyzer / Express Objects front-end application.]


    A brief discussion of each module follows:

    - Extraction: this module extracts raw data by connecting to all the data sources. Various extraction methods, such as ftp and SQL*LOADER, import/export, create table as select, and read-only snapshots, should be provided in this module. At the start, the program should also check the availability of the required resources, such as connection establishment and space. It should also be capable of extracting a variety of data formats, such as Oracle tables, flat file dumps and Excel spreadsheets.

    - Load: this module selects only the data that will be converted into a dimension or into a fact table measure of interest. It also tracks any structural changes in the extracted objects. Some structural changes, such as the deletion of a column from a flat file, cannot be tracked automatically and require proactive notification to the warehouse administrator so that appropriate changes can be made in the extraction routines.

    - Scrub: all the measures selected for on-line analysis can be presented with either shared dimensions or data-source-specific dimensions. Shared dimensions provide more analytical capabilities, such as comparing two measures from two different data sources. To share dimensions, all the data coming from different data sources must conform to the master shared dimension values, and any anomalies must be reported accurately. All these data cleanup activities are done in this module. Dimensions with multiple hierarchies are also created in this module. Generally, dimensions are created from code tables in operational systems, which are static in nature.

    - Base Fact Creation: every measure can be represented as a fact table surrounded by the dimension tables created in the scrub module. Such fact tables, with values at the lowest levels of all the dimensions, are created in this module and are called base fact tables. If many measures have the same dimensionality, all of them can be represented in a single base fact table. Generally, base fact tables are created from transactional tables in operational systems, which are dynamic in nature. Objects are represented as star schemas in this module.

    - Rollup: once the base fact table for each measure is ready, it must be rolled up to every level of all the dimensional hierarchies associated with that measure. This is the most important step in warehouse processing. As data values are summarized at higher levels of the hierarchies, data volume can either shrink or explode, depending on the data sparsity of a given measure. This step is implemented with a typical warehousing query: a GROUP BY clause over UNION ALL tables. The UNION ALL tables generally represent a partition view created on a shared dimension (most commonly time), with a constraint on each member table. The constraint represents the granularity of the UNION ALL tables (generally one week or one month for the time dimension). Recent additions to Oracle Server, such as hash joins, bitmap indexes and the parallel-aware optimizer, all help execute such queries very efficiently. Rollup mostly needs to be carried out on the refresh cycle data, but if dimensional hierarchies change, rollup generally has to work on a large amount of data stored in the warehouse. This processing, and the problem of finding only the data affected by a particular hierarchy change, pose major challenges.

    - Metadata: as all the above modules go into action, the data needed for their own automation and smooth functioning has to be kept in a separate repository. It is a good idea to create all the repository objects in a separate schema. The objects are initially created and maintained by a separate application that is accessible only to the warehouse administrator. During warehouse operations, the various modules may modify repository objects to maintain process states, statistics, etc.
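    The Base Fact Creation and Rollup modules described above can be sketched together: dimension tables, weekly base fact member tables that each carry a constraint on their date range, a UNION ALL view over those tables, and a GROUP BY rollup to the month and category levels. SQLite (through Python's sqlite3 module) stands in for Oracle here and simply evaluates the view; all table names, dates and hierarchies are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Dimension tables from the scrub module: lowest level plus one parent level.
    CREATE TABLE time_dim    (day TEXT PRIMARY KEY, month TEXT);
    CREATE TABLE product_dim (sku TEXT PRIMARY KEY, category TEXT);

    -- Transactional source table (dynamic data from an operational system).
    CREATE TABLE order_lines (day TEXT, sku TEXT, qty INTEGER, revenue REAL);

    -- Base fact member tables, one per week; the CHECK constraint states the granularity.
    CREATE TABLE fact_w36 (day TEXT CHECK (day BETWEEN '1997-09-01' AND '1997-09-07'),
                           sku TEXT, units INTEGER, revenue REAL);
    CREATE TABLE fact_w37 (day TEXT CHECK (day BETWEEN '1997-09-08' AND '1997-09-14'),
                           sku TEXT, units INTEGER, revenue REAL);

    -- Partition-view style UNION ALL over the member tables.
    CREATE VIEW sales_fact AS
        SELECT day, sku, units, revenue FROM fact_w36
        UNION ALL
        SELECT day, sku, units, revenue FROM fact_w37;

    INSERT INTO time_dim VALUES ('1997-09-02', '1997-09'), ('1997-09-09', '1997-09');
    INSERT INTO product_dim VALUES ('SKU-1', 'Widgets'), ('SKU-2', 'Widgets'), ('SKU-3', 'Gadgets');
    INSERT INTO order_lines VALUES ('1997-09-02', 'SKU-1', 3, 30.0), ('1997-09-02', 'SKU-1', 1, 10.0),
        ('1997-09-02', 'SKU-3', 1, 25.0), ('1997-09-09', 'SKU-2', 6, 60.0);
""")

def load_week(table, first_day, last_day):
    """Base fact creation: aggregate one week of transactions to the lowest levels."""
    with conn:
        conn.execute(f"""
            INSERT INTO {table} (day, sku, units, revenue)
            SELECT o.day, o.sku, SUM(o.qty), SUM(o.revenue)
            FROM order_lines o
            JOIN time_dim t    ON t.day = o.day     -- keep only conformed dimension keys
            JOIN product_dim p ON p.sku = o.sku
            WHERE o.day BETWEEN ? AND ?
            GROUP BY o.day, o.sku""", (first_day, last_day))

load_week("fact_w36", "1997-09-01", "1997-09-07")
load_week("fact_w37", "1997-09-08", "1997-09-14")

# Rollup: summarize the base facts at the month x category level.
rollup = conn.execute("""
    SELECT t.month, p.category, SUM(f.units), SUM(f.revenue)
    FROM sales_fact f
    JOIN time_dim t    ON t.day = f.day
    JOIN product_dim p ON p.sku = f.sku
    GROUP BY t.month, p.category
    ORDER BY t.month, p.category""").fetchall()
print(rollup)   # [('1997-09', 'Gadgets', 1, 25.0), ('1997-09', 'Widgets', 10, 100.0)]
```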

    The choice of programming language can become an issue in developing the system, since a variety of tasks are carried out at both the database and the operating system level. OraTcl is a language that combines the Tcl shell with access to Oracle. It is shareware from which Oracle SQL scripts and anonymous PL/SQL blocks can be called. Apart from handling all the procedural constructs, OraTcl can also execute C programs.

    Oracle Enterprise Manager (OEM) can be a great vehicle for implementing warehouse systems. Apart from regular database management, OEM can call and execute Tcl programs. It has all the scheduling capabilities needed in an operational warehouse environment, and its event handling capabilities can process errors, alerts and warnings during job execution.
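    Alerts of this kind often start from checks such as the Load module's tracking of structural changes. For relational sources, that check can compare the column list recorded in the metadata with the column list read from the database catalog at each extraction; on Oracle this catalog information is available in data dictionary views such as USER_TAB_COLUMNS. The Python sketch below uses SQLite's PRAGMA table_info as a stand-in and simply returns alert messages rather than raising OEM events; the table and column names are hypothetical. A headerless flat file offers no column names to compare, which is why a dropped column there still needs proactive notification.

```python
import sqlite3

def actual_columns(conn, table):
    """Column names of a table, in order, read from SQLite's catalog."""
    # The table name comes from trusted metadata, so formatting it into the PRAGMA is acceptable here.
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def structure_alerts(table, expected, actual):
    """Compare the metadata column list with the live one and describe any differences."""
    alerts = [f"{table}: column {c} missing" for c in expected if c not in actual]
    alerts += [f"{table}: new column {c}" for c in actual if c not in expected]
    return alerts

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE order_lines (order_id INTEGER, sku TEXT, qty INTEGER, discount REAL)")
expected = ["order_id", "sku", "qty", "amount"]     # as recorded in the metadata repository
alerts = structure_alerts("order_lines", expected, actual_columns(conn, "order_lines"))
for alert in alerts:
    print(alert)
# order_lines: column amount missing
# order_lines: new column discount
```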


    Besides their very high costs, the metadata management and other warehouse tools on the market concentrate on data feeds into warehouses but lack serious functionality in the areas of data scrubbing and rollup. The warehouse metadata interface can be developed using the Tcl toolkit (Tcl/Tk), which acts as an OraTcl extension and can be merged with the basic OEM interface. The OEM tool can thus be used for both regular operational tasks and metadata management.
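    As an example of scrubbing functionality built in-house, the Scrub module's conformance check can be sketched as a lookup of each source code, through a per-source mapping table, against the master shared dimension, with unmatched codes collected as anomalies for the data administrator. The master values, mappings and records in this Python sketch are hypothetical.

```python
# Hypothetical master shared product dimension and per-source code mappings.
MASTER_PRODUCTS = {"WIDGET-STD", "WIDGET-PRO", "GADGET"}
SOURCE_MAPPINGS = {
    "orders": {"W1": "WIDGET-STD", "W2": "WIDGET-PRO", "G": "GADGET"},
    "billing": {"wid_std": "WIDGET-STD", "gadget": "GADGET"},
}

def conform(source, records):
    """Translate source product codes to master values; collect the rest as anomalies."""
    mapping = SOURCE_MAPPINGS.get(source, {})
    clean, anomalies = [], []
    for code, amount in records:
        master = mapping.get(code)              # None when the source code is unmapped
        if master in MASTER_PRODUCTS:
            clean.append((master, amount))
        else:
            anomalies.append((source, code, "not conformed to the master product dimension"))
    return clean, anomalies

clean, anomalies = conform("billing", [("wid_std", 120.0), ("gizmo", 15.0), ("gadget", 40.0)])
print(clean)       # [('WIDGET-STD', 120.0), ('GADGET', 40.0)]
print(anomalies)   # [('billing', 'gizmo', 'not conformed to the master product dimension')]
```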

    6.3 Program Management: Express Side

    6.3.1. Express Design Goals

    - User response times: response times for the various analytical tasks should be known up front, so that the multi-dimensional database can be designed to optimize those tasks.

    - Data loading from fact tables: the physical design of the database should allow measures and changed dimensions to be loaded into the Express database within an acceptable refresh cycle time. Loading times for simple dimension value changes, and for the more time-consuming hierarchical changes, should be acceptable to the end users.

    6.3.2. Program Modules

    With the above design goals in mind, program modules as shown in Figure 2 can be established.

    A brief discussion of each module follows:

    - DBA: this module defines all the dimensions, measures and supporting data structures in the Express database. Express 4GL scripts are used to create and maintain these objects. Since the Express front-end application tools are distributed in nature, all the data objects used by the applications are physically present on the server side.

    - Security: because the data used for strategic analysis is used by different groups, it is necessary to provide security features in the end user interface so that each functional group of users can see only the data in its own area.

    - Bridge: this is the configuration and setup of the new tools in the Express family called Relational Access Administrator (RAA) and Relational Access Manager (RAM). They work from their own repositories in the Oracle and Express databases, and manage the data load from the Oracle database (relational format) to the Express database (multi-dimensional format) as well as on-line reach-through to relational data that is not loaded in multi-dimensional format.

    - Presentation: there are two main tool choices in this area: Oracle Sales Analyzer (OSA) and Oracle Express Objects (OEO). OSA is a canned read-only application with predefined objects and many customization options. It can get the overall application ready in a short time and can satisfy the majority of analytical requirements. OEO is a development environment in which customized read-write applications with user-defined objects can be developed. Both tools run as distributed applications with the Express database.
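    The interplay of the Security and Bridge modules can be pictured as a cell lookup that first checks the user's functional group and then serves the cell from the loaded multi-dimensional data or reaches through to the relational side. This Python sketch is only an analogy, not the RAA/RAM or Express security interfaces: a dictionary plays the Express database, an in-memory SQLite database plays the Oracle database, and all groups, stores and figures are hypothetical.

```python
import sqlite3

relational = sqlite3.connect(":memory:")   # stand-in for the Oracle side
relational.executescript("""
    CREATE TABLE sales_fact (month TEXT, store TEXT, units INTEGER);
    INSERT INTO sales_fact VALUES ('1997-09', 'S1', 40), ('1997-09', 'S1', 2), ('1997-09', 'S2', 7);
""")

# Small cube loaded into the multi-dimensional store (stand-in for Express).
loaded_cube = {("1997-09", "S1"): 42}

# Security: the stores each functional group may see.
GROUP_STORES = {"north_sales": {"S1"}, "finance": {"S1", "S2"}}

def get_cell(group, month, store):
    """Return (value, source) for one cell, enforcing group security first."""
    if store not in GROUP_STORES.get(group, set()):
        raise PermissionError(f"group {group!r} may not view store {store!r}")
    if (month, store) in loaded_cube:
        return loaded_cube[(month, store)], "loaded"
    (units,) = relational.execute(
        "SELECT COALESCE(SUM(units), 0) FROM sales_fact WHERE month = ? AND store = ?",
        (month, store)).fetchone()
    return units, "reach-through"

print(get_cell("north_sales", "1997-09", "S1"))   # (42, 'loaded')
print(get_cell("finance", "1997-09", "S2"))       # (7, 'reach-through')
try:
    get_cell("north_sales", "1997-09", "S2")
except PermissionError as err:
    print(err)                                     # group 'north_sales' may not view store 'S2'
```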


    GLOSSARY

    Multi-dimensional Database: a database in which data objects are represented as business measurements that are measured across multiple dimensions. Generally, this type of database is used to perform historical data analysis.

    OLAP: On-Line Analytical Processing (OLAP) is a type of information processing application used for multi-dimensional analysis, such as trending, exceptions, patterns and ranking. OLAP tools working against a multi-dimensional database provide this functionality.

    Measure (Data Cube): a fact of business interest that can be measured across multiple other business objects. The measurements are generally summarizations, rankings, averages, counts, etc. Data cubes are either physically present in the multi-dimensional database (physical cubes) or exist in structural form with no data (virtual cubes). When end users access a virtual cube, the data is reached through from the relational side. Examples of measures are sales volume and customer satisfaction counts.

    Dimension: dimensions are the business objects about which business measures are maintained. Examples of dimensions are time, product and customer.

    Dimension Levels and Hierarchies: dimensions are stored in a hierarchical form with multiple levels. One dimension can have multiple hierarchies, and each hierarchy can have multiple levels. For example, a time dimension can have levels such as weeks, months, quarters and years; it can also have two hierarchies, such as calendar year and fiscal year.

    Sparsity: a data cube for certain measures may not have measured values for all possible combinations of the cube's dimensions. Such combinations are represented as no-value cells in the data cube. The ratio of no-value cells to the total number of cells is called sparsity. A data cube with more dimensions can have higher sparsity and may require a large amount of disk space during rollup.

    Conjoint Dimension: if a combination of multiple dimensions produces a sparse cube, then only the valid combinations of these dimension values (those that do not produce no-value cells) can be stored in a common dimension called a conjoint. Conjoint dimensions create 100% dense data cubes, saving disk space, but they can be cumbersome to maintain, especially with changing hierarchies and large numbers of values.

    Star Schema: a multi-dimensional data model can be represented as a central fact table connected to multiple dimension tables around it. This structure resembles a star and is called a star schema.

    DCE: Distributed Computing Environment (DCE) is a standard used to write distributed applications in open systems so that the applications can be used in heterogeneous operating system and network environments.

    Metadata: data about data, generally stored in a separate repository. It is mandatory to keep this data consistent so that overall warehouse system operations run smoothly. Generally, metadata is a set of complex structures that keep information about data sources, process states, error handling, data statistics, etc.
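    To make the Dimension Levels and Hierarchies entry concrete, a time dimension's calendar and fiscal hierarchies can share the same month members and differ only in how months roll up to quarters and years. The Python sketch below assumes a hypothetical fiscal year that starts in July and is named after the calendar year in which it ends; actual fiscal conventions vary by organization.

```python
def calendar_parents(year, month):
    """Parents of a month in the calendar hierarchy: (quarter, year)."""
    quarter = (month - 1) // 3 + 1
    return f"{year}-Q{quarter}", str(year)

def fiscal_parents(year, month, start_month=7):
    """Parents of a month in a fiscal hierarchy whose year starts in start_month.

    The fiscal year is named after the calendar year in which it ends.
    """
    offset = (month - start_month) % 12            # months since the fiscal year began
    ends_next_year = start_month > 1 and month >= start_month
    fiscal_year = year + 1 if ends_next_year else year
    quarter = offset // 3 + 1
    return f"FY{fiscal_year}-Q{quarter}", f"FY{fiscal_year}"

print(calendar_parents(1997, 9))   # ('1997-Q3', '1997')
print(fiscal_parents(1997, 9))     # ('FY1998-Q1', 'FY1998')
print(fiscal_parents(1998, 3))     # ('FY1998-Q3', 'FY1998')
```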
