NLS/IITB/DWH 1
Data Warehouse : Design and Lifecycle
N. L. Sarda
Professor, IIT Bombay
Outline
• Introduction
• Warehouse structure
• A case study
• Lifecycle for development
• Dimensional analysis
• Technical architecture
• Conclusions
Introduction
• A data warehouse (DW) is a single, complete and consistent store of data, drawn from different sources, for understanding and analyzing the business
• Contains historical data
• Typical decision support requires data to be correlated and aggregated in an interactive manner
• The warehouse facilitates browsing, navigating, aggregating and visualizing related data to understand performance, problems, customer preferences, trends, etc.
Introduction...
• Conventional MIS/reporting applications lacked interactivity and flexibility
• Warehouse data organized by important business subjects (customer, product, etc…)
Warehouse Structure
• Organized to facilitate ease of access and aggregation
• The warehouse structure is decomposed into dimensions and facts
– Dimensions are like ‘independent variables’; they represent entities for analysis
– A fact represents business data and relates to a set of dimensions
– E.g., customer, time and type of account are dimensions, and balances are facts
Warehouse Structure...
• The complex network of business entities and their relationships, as depicted in an operational DB (using, say, the ER model), is difficult to navigate and analyze
• A ‘2-level’ structure, the ‘star schema’, is used instead: a fact is at the center and the dimensions form the ‘spokes’
• Data is not stored in ‘normalized’ form
Star Schema
• Contains one fact table and, for each dimension, one dimension table
[Figure: star schema with the fact table (date, custno, prodno, cityname, ...) at the center and the Time, Prod, Cust and City dimension tables around it]
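A minimal sketch of the star layout, using Python's built-in sqlite3 module. The table names, the integer `*_key` columns and the sample rows are hypothetical (the slide's fact table carries date, custno, prodno, cityname instead); what matters is the shape: one fact table joined directly to one table per dimension, with dimension attributes used for constraints and grouping.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per dimension (hypothetical columns)
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, full_date TEXT, month TEXT, year INTEGER);
CREATE TABLE prod_dim (prod_key INTEGER PRIMARY KEY, prodno TEXT, category TEXT);
CREATE TABLE cust_dim (cust_key INTEGER PRIMARY KEY, custno TEXT, occupation TEXT);
CREATE TABLE city_dim (city_key INTEGER PRIMARY KEY, cityname TEXT, state TEXT);
-- The fact table at the center: dimension keys plus numeric facts
CREATE TABLE sales_fact (
    time_key INTEGER REFERENCES time_dim,
    cust_key INTEGER REFERENCES cust_dim,
    prod_key INTEGER REFERENCES prod_dim,
    city_key INTEGER REFERENCES city_dim,
    amount   REAL
);
INSERT INTO time_dim VALUES (1, '2024-01-15', 'Jan', 2024), (2, '2024-02-10', 'Feb', 2024);
INSERT INTO prod_dim VALUES (1, 'P1', 'Books'), (2, 'P2', 'Music');
INSERT INTO cust_dim VALUES (1, 'C1', 'Teacher');
INSERT INTO city_dim VALUES (1, 'Pune', 'Maharashtra');
INSERT INTO sales_fact VALUES (1, 1, 1, 1, 100.0), (1, 1, 2, 1, 50.0), (2, 1, 1, 1, 70.0);
""")

# A typical star join: constrain and group on dimension attributes, aggregate the fact
rows = conn.execute("""
    SELECT t.month, p.category, SUM(f.amount)
    FROM sales_fact f
    JOIN time_dim t ON f.time_key = t.time_key
    JOIN prod_dim p ON f.prod_key = p.prod_key
    WHERE t.year = 2024
    GROUP BY t.month, p.category
    ORDER BY t.month, p.category
""").fetchall()
print(rows)
```

Every query has the same two-level shape (fact joined to dimensions), which is what makes the structure easy to navigate compared with a full ER model.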
Dimensions
• Stored as a database table
• Contains many descriptive attributes for analysis
• Small and slowly changing data
• Data is often groupable for analysis
– Customers by age, occupation, income level
– Time by weeks, months, years
– Branches as rural, suburban or by size
• Thus, dimension data is viewable as a hierarchy
• For analysis, dimension data is joined with facts
Dimensions...
• Joins are very frequent, so efficient access to dimensions (through multiple indexes) and efficient join computation are required
• Dimension attributes are heavily used in constraints and GROUP BY
Facts
• Contain business activity data
• May be at the detailed level or the status level; called transaction-oriented or snapshot-oriented, respectively
• Granularity must be decided: every sale, or total sales of a day?
• Often contain numeric attributes for aggregation (additive, semi-additive, …)
• Also contain the keys of the dimension tables
Snowflake Schema
• Hierarchies are not captured explicitly in a star schema
• A snowflake schema represents the hierarchy directly
• Saves on storage but requires more joins
Snowflake Schema
• Represent dimensional hierarchy directly by normalizing tables.
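[Figure: snowflake schema with the fact table (date, custno, prodno, cityname, ...) linked to the Time, Prod, Cust and City dimension tables, and City further normalized into a Region table]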
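A small sketch of what snowflaking costs at query time, again with sqlite3 and hypothetical names: the City dimension is normalized so that the region lives in its own table, and rolling sales up to region now needs one join more than a star schema would, where region would simply be a column of the city table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Snowflaked City dimension: region moved out into its own table (hypothetical names)
CREATE TABLE region_dim (region_key INTEGER PRIMARY KEY, region_name TEXT);
CREATE TABLE city_dim (city_key INTEGER PRIMARY KEY, cityname TEXT,
                       region_key INTEGER REFERENCES region_dim);
CREATE TABLE sales_fact (city_key INTEGER REFERENCES city_dim, amount REAL);
INSERT INTO region_dim VALUES (1, 'West'), (2, 'South');
INSERT INTO city_dim VALUES (1, 'Mumbai', 1), (2, 'Pune', 1), (3, 'Chennai', 2);
INSERT INTO sales_fact VALUES (1, 100.0), (2, 40.0), (3, 60.0);
""")

# Fact -> city -> region: the hierarchy is walked through an extra join
rows = conn.execute("""
    SELECT r.region_name, SUM(f.amount)
    FROM sales_fact f
    JOIN city_dim c ON f.city_key = c.city_key
    JOIN region_dim r ON c.region_key = r.region_key
    GROUP BY r.region_name
    ORDER BY r.region_name
""").fetchall()
print(rows)
```

The region name is stored once per region rather than once per city, which is the storage saving; the extra join is the price.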
Fact Constellation
• Fact constellation
– Multiple fact tables that share many dimension tables
– E.g., in the hotel industry, the Booking and Checkout fact tables may share many dimension tables
[Figure: fact constellation with the Booking and Checkout fact tables and the Hotels, Travel Agents, Promotion, Room Type and Customer dimension tables]
Data Mart
• A subset of warehouse for use by individuals or departments
• Contents may be differently structured; may contain limited history; may be coarser / aggregated
• Lightens the load on the central warehouse
• Users primarily use marts with OLAP tools for analysis and decision support
• Refreshed periodically from the central warehouse
Aggregates
• An aggregate is a fact table representing a summarization of base-level fact table data
• Aggregates are pre-calculated summaries stored in the data warehouse to improve query performance
• Aggregates can speed up queries by a factor of 100 or even 1000
• The IS owners of a data warehouse should exhaust the potential for using aggregates before investing in new hardware
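A minimal sketch of an aggregate fact table, assuming a hypothetical sales fact at the grain of one row per sale: the aggregate is built once with CREATE TABLE ... AS SELECT and then answers the same monthly question from fewer rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (day TEXT, month TEXT, prodno TEXT, amount REAL);
INSERT INTO sales_fact VALUES
  ('2024-01-01', '2024-01', 'P1', 10.0),
  ('2024-01-02', '2024-01', 'P1', 15.0),
  ('2024-01-02', '2024-01', 'P2', 5.0),
  ('2024-02-01', '2024-02', 'P1', 20.0);

-- Pre-calculated aggregate: one row per (month, product) instead of one per sale
CREATE TABLE sales_by_month AS
  SELECT month, prodno, SUM(amount) AS amount, COUNT(*) AS sale_count
  FROM sales_fact
  GROUP BY month, prodno;
""")

# The same question answered from the base table and from the aggregate
query = "SELECT month, SUM(amount) FROM {} GROUP BY month ORDER BY month"
from_base = conn.execute(query.format("sales_fact")).fetchall()
from_aggregate = conn.execute(query.format("sales_by_month")).fetchall()
aggregate_rows = conn.execute("SELECT COUNT(*) FROM sales_by_month").fetchone()[0]
```

With four sales the saving is trivial; the speedups quoted above come from base tables with millions of rows collapsing into aggregates thousands of times smaller.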
Warehouse Architecture
• Building a single organization-wide WH that integrates all data from legacy systems is a very challenging task
• Data marts are subject- or department-wise and easier to build
• Multiple data marts must be relatable and interoperable across departments or business areas
• Kimball proposes a DW with a ‘bus architecture’: an architecture phase, followed by construction of data marts independently and asynchronously
WH Architecture ...
• As marts come online, they fit with each other properly
• This approach is natural in most cases, as extraction of data for WH building is often source-wise and needs to be done independently
Conformed Dimensions and Facts
• Goal is to produce a master suite of conformed dimensions and to standardize facts
• The resulting dimensions and facts form the ‘bus’
• A conformed dimension means the same thing with every fact table (e.g., customer, time, geography)
• It may contain data brought together from many sources
• Without conformed dimensions, a WH cannot function as a whole
WH Architecture ...
• Getting conformed dimensions represents about 80% of the up-front architecture effort
• The rest goes to conformed facts, which ensure the same terminology across data marts so that ‘drill across’ can be done (e.g., price, profit)
• This ensures the same units and meaning, and the same time durations and geographies, across marts
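A minimal sketch of ‘drill across’, assuming two hypothetical marts (sales and inventory) that share a conformed product dimension. Each fact table is queried separately and summarized to the conformed attribute (category); the two answer sets are then merged on that attribute. The naive query that joins the fact tables directly is included for contrast.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_dim (product_key INTEGER PRIMARY KEY, category TEXT);  -- conformed
CREATE TABLE sales_fact (product_key INTEGER, revenue REAL);
CREATE TABLE inventory_fact (product_key INTEGER, units_on_hand INTEGER);
INSERT INTO product_dim VALUES (1, 'Books'), (2, 'Music');
INSERT INTO sales_fact VALUES (1, 100.0), (1, 50.0), (2, 30.0);
INSERT INTO inventory_fact VALUES (1, 12), (2, 7), (2, 3);
""")

def by_category(fact_table, measure):
    # Table and column names are trusted constants here, so string formatting is safe
    sql = (f"SELECT d.category, SUM(f.{measure}) FROM {fact_table} f "
           "JOIN product_dim d ON f.product_key = d.product_key GROUP BY d.category")
    return dict(conn.execute(sql).fetchall())

sales = by_category("sales_fact", "revenue")
stock = by_category("inventory_fact", "units_on_hand")
# Drill across: merge the two result sets on the shared (conformed) row headers
report = {cat: (sales.get(cat), stock.get(cat)) for cat in sorted(sales.keys() | stock.keys())}

# For contrast: joining the fact tables directly double-counts Music revenue,
# because the single Music sale row matches both Music inventory rows
naive = conn.execute("""
    SELECT d.category, SUM(s.revenue)
    FROM sales_fact s
    JOIN inventory_fact i ON s.product_key = i.product_key
    JOIN product_dim d ON s.product_key = d.product_key
    GROUP BY d.category ORDER BY d.category
""").fetchall()
```

The merge only works because both marts use the same category values with the same meaning, which is exactly what conforming the dimension guarantees.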
WH Architecture ...
• Advantages of conformed dimensions
– A single dimension table can be used against multiple fact tables in the same WH
– User interfaces and data content are consistent wherever the dimension is used
– There is a consistent interpretation of attributes and rollups across marts
– A new data mart can be created such that it co-exists with the others
• Use of conformed dimensions must be supported at the highest executive level
Financial Services : A Case Study
• A bank offers various products/services like saving/checking accounts, mortgage loans, personal loans, TD, credit cards, etc…
• Purpose: track various accounts, customer profiles, etc., for marketing and offering new services
• Requirements:
– Get an end-of-month summary of accounts for the last 5 years
– A valid snapshot as of yesterday for the current month (with full details)
– Ability to group accounts in various ways and compare balances
– Demographic behavior
Case Study ...
• Each account type has some unique attributes (requiring customized dimension and facts for each)
• Old data (accounts and customers) may be incomplete or even represented differently
• The warehouse data may come from multiple sources:
– Loan processing system (customer, loan, dues, payment)
– Fixed deposit system (customer, TD, …)
– Front-office system (customer, account, transaction, …)
– Credit-card system (customer, transactions, interest, …)
Case Study ...
• Must plan extraction, correlation, consistent representation,…
• Let us consider a possible warehouse design for the indicated requirements
• Core fact table: balance in each account and number of transactions; grain: month
• Dimensions: account, household, branch, product, status, time
• Account and household are separate dimensions: a family may have many accounts, and household definitions change
Case Study ...
• The product dimension permits a hierarchy and the definition of product-specific attributes; it is separate because it changes
• Status: active or not, closed, etc., with reasons
• The account dimension contains the customer’s data; for historical reasons, the customer-to-account relationship is not well maintained
The household data warehouse
• Household Facts: account_key, household_key, branch_key, product_key, status_key, time_key, primary_balance, transaction_count
• Account: account_key, primary_name, secondary_name, account_address, account_city, account_state, account_zip, date_opened, primary_age, primary_sex, primary_marital
• Household: household_key, household_head_name, household_address, household_city, household_state, household_zip, household_income, household_type
• Product: product_key, product_description, type, category
• Time: time_key, month, year, fiscal_quarter
• Status: status_key, status_description, status_reason, new_account_flag, closed_account_flag
• Branch: branch_key, branch_name, branch_address, branch_city, branch_state, branch_zip, branch_type
Case Study ...
• Balance is semi-additive: it cannot be added across time
• Products are highly heterogeneous: different attributes characterize different accounts (balance, deposit options, interest rate, overdraft limit, …)
• These cannot all be combined in one dimension, as many are not applicable to all products
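A small sketch of what "semi-additive" means for the monthly balance fact, using hypothetical snapshot rows: summing balances across accounts within one month is meaningful, but summing one account's month-end balances across time is not, so an average or the period-end value is used instead.

```python
from collections import defaultdict

# Hypothetical month-end balance snapshot rows: (account, month, balance)
snapshots = [
    ("A1", "2024-01", 1000.0), ("A1", "2024-02", 1200.0), ("A1", "2024-03", 800.0),
    ("A2", "2024-01", 500.0),  ("A2", "2024-02", 500.0),  ("A2", "2024-03", 700.0),
]

# Additive across accounts: total balance held in each month
total_by_month = defaultdict(float)
for account, month, balance in snapshots:
    total_by_month[month] += balance

# Not additive across time: adding A1's three month-ends (3000.0) means nothing,
# so summarize time with an average or with the latest (closing) value instead
def average_balance(account):
    values = [b for a, _, b in snapshots if a == account]
    return sum(values) / len(values)

def closing_balance(account):
    # "YYYY-MM" strings sort chronologically, so max() finds the latest month
    return max((m, b) for a, m, b in snapshots if a == account)[1]
```

A query tool that knows which facts are semi-additive can apply the right rule automatically instead of a blind SUM.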
Case Study ...
• Solution: create many fact tables, customized for each product, plus one core fact table with a product dimension having the common attributes; this leads to 100% replication, but facilitates clarifications, browsing, etc., and avoids joining customized and core facts
• When many facts are to be stored together, go for periodic (e.g., monthly) snapshots
Case Study ...
• Transaction-grained fact tables usually have a single fact (e.g., amount) that is directly involved in the transaction; we need a transaction dimension to describe these amounts
• In a transaction-grained fact table, we do not need customized fact tables per product; instead, we create customized dimension tables
Data Warehouse Life Cycle
[Figure: Project Planning → Business Requirement Definition → three parallel tracks: Technical Architecture Design → Product Selection & Installation; Dimensional Modeling → Physical Design → Data Staging Design & Development; End-User Application Specification → End-User Application Development. The tracks converge in Deployment → Maintenance & Growth, with Project Management spanning all phases]
Life Cycle Phases
• Project planning
– The life cycle begins with project planning, which addresses the scoping of the project
– It focuses on resource and skill-level requirements, staffing, project task assignments, and duration
• Business requirements definition
– The success of the project depends on a sound understanding of the business users and their requirements
– Data warehouse designers must understand the key factors driving the business requirements and translate them into design considerations
Phases ...
• Dimensional modeling
– The dimensional model is built by combining data analysis with the earlier understanding of business requirements (represented as a matrix)
– This step identifies the fact table grain, the associated dimensions, attributes and hierarchical drill paths, and the facts
• Physical design
– The primary elements of this phase are defining the naming standards and setting up the database environment
– It focuses on defining the physical structures necessary to support the logical database design
Phases ...
• Data staging design and development
– The data staging process has three major steps
– Extraction
• It exposes data quality issues within the operational system
– Transformation
• Consists of data restructuring and type conversions (e.g., from the EBCDIC character set to ASCII)
– Load
• Load the prepared data into the target tables
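A minimal sketch of the three staging steps, assuming a hypothetical fixed-width customer extract (4-character custno, 10-character name, 6-digit balance) encoded in EBCDIC code page 037, which Python supports as the `cp037` codec. The source rows are encoded in the script only to simulate a mainframe file; a real extract would be read from the source system.

```python
import sqlite3

# Extract (simulated): hypothetical fixed-width records in EBCDIC code page 037
source_rows = [("C001", "ALICE", 150), ("C002", "BOB", 75)]
extracted = [f"{c}{n:<10}{b:06d}".encode("cp037") for c, n, b in source_rows]

def transform(record: bytes) -> tuple:
    text = record.decode("cp037")   # character-set conversion: EBCDIC bytes -> text
    custno = text[0:4]
    name = text[4:14].rstrip()      # restructuring: strip fixed-width padding
    balance = int(text[14:20])      # type conversion: digit string -> integer
    return custno, name, balance

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer_stage (custno TEXT, name TEXT, balance INTEGER)")
# Load: insert the prepared rows into the target table
conn.executemany("INSERT INTO customer_stage VALUES (?, ?, ?)",
                 [transform(r) for r in extracted])
loaded = conn.execute("SELECT * FROM customer_stage ORDER BY custno").fetchall()
```

In EBCDIC the letter "C" is byte 0xC3 rather than ASCII 0x43, which is why the decode step must name the source code page explicitly.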
Phases ...
• Technical architecture design
– Specifies the tools and techniques we will need to make the DW happen
• Product selection and installation
– Covers architectural components such as hardware platforms, DBMS, and data staging tools
• End user application specification
– The application specification describes the report template, user-driven parameters, and required calculations
• End user application development
Phases ...
• Deployment
– The convergence of technology, data, and end user applications, accessible from the business user’s desktop
– Business user education integrating all aspects of the convergence must be developed and delivered
• Maintenance and growth
– Data warehouse acceptance and performance metrics should be measured over time, and the maintenance plan should include a communication strategy
– Prioritization processes must be established to deal with user demands for evolution and growth
Phases ...
• Project management
– Project management ensures that the business dimensional life cycle activities remain on track and synchronized
– These activities occur throughout the life cycle
– It focuses on monitoring the project status, issue tracking, and change control to preserve scope
– It includes the development of a comprehensive project communication plan that addresses both the business and information system organization
• Use a good project management tool
Life Cycle : summary
• Project planning
• Business requirements definition
• Data track
– Dimensional modeling
– Physical design
– Data staging design and development
• Technology track
– Technical architecture design
– Product selection and installation
Life Cycle...
• Application track
– End user application specification
– End user application development
• Deployment
• Maintenance and growth
• Project management
Assess Your Readiness
• Strong business management sponsors
• Compelling business motivation
• IS/business partnership
• Current analytic culture
• Feasibility
Core Project Team
• Business system analyst
• Data modeler
• Data warehouse database administrator
• Data staging system designer
• End user application developers
• Data warehouse educator
Special Teams
• Technical/security architect
• Technical support specialists
• Data staging programmer
• Data administrator
• Data warehouse quality assurance analyst
Develop the Project Plan
• Integrated and detailed
• Resources
• Original estimated effort
• Start date
• Original estimated completion date
• Current estimated completion date
• Status
• Effort to complete
• Dependencies
• Late flags
Develop Communication Plan
• To manage expectations at all levels
• Within the project team: share scope, plans, status
• Face-to-face communication with sponsors
• Business user community: inform them what is there for them (capabilities, limitations, timeframes)
• Communication with other interested parties
– Executive management
– IS organization - to enable integration with existing and proposed systems
– Organization at large
Collecting Requirements
[Figure: Data Warehouse Life Cycle phases: Business Requirements, Project Planning & Management, Dimensional Modeling, Technical Architecture Design, Physical Design, Data Staging Design, End-User Application Specification, Deployment Planning, Maintenance and Growth]
Collecting Requirements...
• Interviews/write-ups
• Requirements findings document
– Project overview
– review of business objectives
– analytic and information requirements
– preliminary source systems analysis
– Preliminary success criteria
• Prepare and publish the requirements
• Agree on the next step after collecting requirements
• Facilitation for conforming and prioritization
Collecting Data about Existing Systems
• Understanding the candidate data sources
• Source data ownership
• Data providers
• Detailed criteria for selecting the data sources
– Data accessibility
– Longevity of the feed
– Data accuracy
– Project scheduling
• Customer matching and house-holding
• Browsing and data content
• Mapping data from source to target
Designing the Data Warehouse / Data Marts
• Identifying marts and dimensions
• Identify marts based on facts likely to be used together, as a mart is a kind of subject area or application (divide-and-conquer strategy)
• Often based on a single business process or a single source
• 10 to 30 marts are common for a large organization
• Build a matrix of marts versus dimensions
Designing a Fact
• Choose a data mart : start with single source data marts
• Define fact grain based on the basic business facts stored in legacy systems
• Choose dimensions and match them with granularity of facts
• Combine as many facts as possible within the context of the defined granularity
Detailed Design Tips
• Labels which name data marts, dimensions and attributes should be chosen carefully to refer to corresponding business entities
• An attribute (in a dimension) is not replicated, but a fact may be present in many fact tables
• If a dimension occurs multiple times (eg, time), it is playing multiple roles; name them uniquely
• A single field in the underlying source data can have one or more logical columns associated with it (eg, product having code, description, etc)
• Every fact should have a default aggregation rule so that it is not aggregated wrongly
Data Modeling Tool
• The advantages of a data modeling tool are that it
– Integrates the data warehouse model with other corporate data models
– Helps assure consistency in naming
– Creates good documentation
– Generates physical schema
– Provides a reasonably intuitive user interface for entering comments about objects
Dimensional Modeling
• Strengths of dimensional modeling
– It is a predictable, standard framework
– It makes user interfaces more understandable and processing more efficient
– The predictable framework of a dimensional model allows both database systems and end user query tools to make strong assumptions about the data that aid presentation and performance
– It is gracefully extensible to accommodate unexpected new data elements and new design decisions
– There are a number of standard approaches for handling common modeling situations in the business world
Dimension Attributes
• The quality of the data warehouse is measured by the quality of the dimension attributes
• The user interface responses and final reports are restricted to the precise contents of the dimension table attributes
• Properties
– Verbose, descriptive, complete
– Quality assured, indexed
– Equally available, documented
Time Dimension
• Every data warehouse fact table is a time series of some observations
• We always seem to have one or more time dimensions in our fact table designs
• Provides useful hierarchies : week, month, quarter, year, etc
• Represents calendar with many useful attributes like day of week, day of month, week#, day#, quarter, weekday-flag, last-day-of-month-flag, holiday flag, etc.
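A minimal sketch of generating a time dimension with the calendar attributes listed above, one row per day. The attribute names, the sequential `time_key` and the holiday set passed in are hypothetical; week numbers follow the ISO calendar via `date.isocalendar()`.

```python
from datetime import date, timedelta

DAY_NAMES = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")

def build_time_dimension(start: date, end: date, holidays=frozenset()):
    """Generate one calendar row per day from start to end (inclusive)."""
    rows = []
    day, time_key = start, 1
    while day <= end:
        rows.append({
            "time_key": time_key,                    # meaningless sequential key
            "date": day,
            "day_of_week": DAY_NAMES[day.weekday()],
            "day_of_month": day.day,
            "week_number": day.isocalendar()[1],     # ISO week number
            "day_number": day.timetuple().tm_yday,   # day of the year
            "month": day.month,
            "quarter": (day.month - 1) // 3 + 1,
            "year": day.year,
            "weekday_flag": day.weekday() < 5,
            "last_day_of_month_flag": (day + timedelta(days=1)).month != day.month,
            "holiday_flag": day in holidays,
        })
        day += timedelta(days=1)
        time_key += 1
    return rows

# Hypothetical holiday list for illustration
calendar_rows = build_time_dimension(date(2024, 1, 1), date(2024, 3, 31),
                                     holidays={date(2024, 1, 26)})
```

Precomputing flags such as last-day-of-month lets queries filter on a simple column instead of embedding date arithmetic in SQL.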
Slowly Changing Dimensions
• The product key or customer key does not change, but the description of the product or customer does
• The data warehouse has three options for handling such changes
– Overwrite the dimension record with the new values, thereby losing history
• Used whenever the old value of the attribute has no significance
• Corrections of errors fall into this category
Slowly Changing Dimensions...
– Create a new additional dimension record using a new value of the surrogate key
• Is the primary technique for accurately tracking a change in an attribute within a dimension
• Requires the use of a surrogate key
• Used when a true physical change to the dimension entity has taken place
– Create an “old” field in the dimension record to store the immediate previous attribute value
• It is used when a change is tentative
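A minimal sketch of the three options on an in-memory customer dimension (often called Type 1, Type 2 and Type 3 changes). The `CustomerRow` fields, the helper functions and the sample values are hypothetical; a real warehouse would apply the same logic in the staging process against database tables.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class CustomerRow:
    surrogate_key: int                     # warehouse key; a new value per new version
    customer_id: str                       # production key; never changes
    city: str
    previous_city: Optional[str] = None    # used only by the "old field" option
    current: bool = True

def current_row(rows: List[CustomerRow], customer_id: str) -> CustomerRow:
    return next(r for r in rows if r.customer_id == customer_id and r.current)

def overwrite(rows, customer_id, new_city):
    """Option 1: overwrite in place; history is lost."""
    current_row(rows, customer_id).city = new_city

def add_new_record(rows, customer_id, new_city, new_key):
    """Option 2: retire the current record and add a new one with a new surrogate key."""
    current_row(rows, customer_id).current = False
    rows.append(CustomerRow(new_key, customer_id, new_city))

def keep_old_value(rows, customer_id, new_city):
    """Option 3: keep the immediately previous value in an 'old' field."""
    row = current_row(rows, customer_id)
    row.previous_city, row.city = row.city, new_city

dim = [CustomerRow(1, "C42", "Pnue")]
overwrite(dim, "C42", "Pune")              # correcting a data-entry error
add_new_record(dim, "C42", "Mumbai", 2)    # the customer actually moved
keep_old_value(dim, "C42", "Delhi")        # a tentative change
```

Only option 2 lets old facts keep pointing at the version that was true when they occurred, which is why it depends on surrogate keys rather than the unchanging production key.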
Time Stamping the Changes
• The design of slowly changing dimension may be established by adding begin and end time stamps and a transaction description in each instance of a dimension record
• This design allows very precise time slicing of the dimension by itself
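A small sketch of time slicing a dimension whose records carry begin and end time stamps plus a change description. The product rows are hypothetical, and the far-future end date marking the current version is an assumed convention, not something the slides prescribe.

```python
from datetime import date

OPEN_END = date(9999, 12, 31)   # assumed sentinel meaning "still current"

# Hypothetical versions of one product, each stamped with begin/end dates and a change description
product_history = [
    {"key": 11, "product_id": "P7", "package": "Box",
     "begin": date(2023, 1, 1), "end": date(2023, 6, 30), "change": "initial load"},
    {"key": 12, "product_id": "P7", "package": "Pouch",
     "begin": date(2023, 7, 1), "end": date(2024, 2, 29), "change": "package type changed"},
    {"key": 13, "product_id": "P7", "package": "Bottle",
     "begin": date(2024, 3, 1), "end": OPEN_END, "change": "package type changed"},
]

def as_of(history, product_id, when):
    """Time-slice the dimension: return the version in effect on `when`, or None."""
    for row in history:
        if row["product_id"] == product_id and row["begin"] <= when <= row["end"]:
            return row
    return None
```

Because the intervals do not overlap, at most one version matches any date, so the dimension can answer "what did this product look like on day X" without touching the fact table.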
Large Dimensions
• Data warehouses that store extremely granular data may require some extremely large dimensions
• To support large dimensions, we must choose indexing technologies and data design approaches that
– Support rapid browsing of the unconstrained dimension, especially for low-cardinality attributes
– Support efficient browsing of cross-constrained values in the dimension table
– Find and suppress duplicate entries in the dimension
Foreign Key, Primary Key, Surrogate Key
• All dimensional tables have single keys, which, by definition, are primary keys
• All data warehouse keys must be meaningless surrogate keys; you must not use the original production keys
• A four-byte integer makes a good surrogate key
• Use surrogate date keys
• Avoid smart keys
• Avoid production keys
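A minimal sketch of surrogate key assignment during staging. `SurrogateKeyMap` is a hypothetical helper: it hands out meaningless sequential integers, remembers which production key each one stands for, and refuses to exceed the signed four-byte range.

```python
from itertools import count

class SurrogateKeyMap:
    """Hypothetical staging helper: assigns meaningless integer keys to production keys."""

    MAX_KEY = 2**31 - 1   # largest signed four-byte integer

    def __init__(self):
        self._counter = count(1)
        self._keys = {}

    def key_for(self, source_system: str, production_key: str) -> int:
        natural = (source_system, production_key)
        if natural not in self._keys:
            key = next(self._counter)
            if key > self.MAX_KEY:
                raise OverflowError("four-byte surrogate key space exhausted")
            self._keys[natural] = key
        return self._keys[natural]

keys = SurrogateKeyMap()
k_loan = keys.key_for("loan_system", "0042")
k_card = keys.key_for("card_system", "0042")   # same production key, different source
k_again = keys.key_for("loan_system", "0042")  # a repeat lookup returns the same key
```

Keying the map by (source system, production key) keeps identical codes from different legacy systems apart; deciding that two records are really the same customer would be a separate matching step.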
Heterogeneous Product Schemas
• Multiple fact tables are needed when a business has heterogeneous products
• The global view needs a single core fact table crossing all lines of business, whereas a local view focuses on a specific product
• There are many attributes and facts which apply only to a specific product; a single fact table is not feasible
• Create customized fact and (product) dimension tables for each product, and build a core fact table with attributes that make sense across all lines of business; this allows creating a single portfolio (of products) for each customer
Transaction Schema
• Every data mart needs two separate models
– Transaction version
– Periodic snapshot version
• A ‘rolling’ snapshot containing averages across time
• Snapshots allow us to quickly measure the status of the enterprise
• The transaction schema
– Low-level transactions in the organization make for a good dimensional framework
– The fact record for an individual transaction frequently contains only a single value
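A minimal sketch of how the two versions relate, assuming hypothetical transaction-grain facts for two accounts that start at a zero balance: the periodic snapshot is derived from the transactions as one row per account per month (balance and transaction count), including months with no activity.

```python
from collections import defaultdict

# Hypothetical transaction-grain facts: (account, 'YYYY-MM-DD', amount); deposits > 0, withdrawals < 0
transactions = [
    ("A1", "2024-01-05", 500.0), ("A1", "2024-01-20", -200.0),
    ("A1", "2024-02-03", 100.0), ("A2", "2024-02-10", 50.0),
]

def monthly_snapshots(txns, months):
    """Derive month-end snapshot rows (account, month, balance, transaction_count).

    Assumes every account starts at a zero balance and `months` is in calendar order.
    """
    accounts = sorted({acct for acct, _, _ in txns})
    ordered = sorted(txns, key=lambda t: t[1])
    balance = defaultdict(float)
    snapshot = []
    for month in months:
        count = defaultdict(int)
        for acct, day, amount in ordered:
            if day[:7] == month:
                balance[acct] += amount
                count[acct] += 1
        # one row per account per month, even for months with no activity
        snapshot.extend((acct, month, balance[acct], count[acct]) for acct in accounts)
    return snapshot

rows = monthly_snapshots(transactions, ["2024-01", "2024-02"])
```

The transaction table holds a single amount per row, while the snapshot can carry many facts per row and answers "status at month end" without replaying history.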
Transaction Schema..
• The transaction-based WH is commonly used in
– Time of day analysis
– Queue analysis
– Fraud detection
– Basket analysis
– Current status
Factless Fact Tables
• Useful to describe events and their coverage
• An event fact table records the occurrence of an event; it has only a flag and dimension keys (e.g., student attendance)
• A coverage fact table is frequently needed when a primary fact table in a dimensional data warehouse is sparse; e.g., the primary fact table will not show items that were on promotion but did not sell; the coverage table, containing only dimension keys, lists all items on promotion
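A small sketch of both kinds of factless fact table using Python collections; all keys are hypothetical. The event table is analyzed by counting rows, and the "promoted but not sold" question is answered by subtracting the sparse sales fact from the coverage table.

```python
from collections import Counter

# Event factless fact table: one row per (student, course, day) attendance event, keys only
attendance = [
    ("S01", "DBMS", "2024-01-08"),
    ("S02", "DBMS", "2024-01-08"),
    ("S01", "DBMS", "2024-01-09"),
]
# Counting rows is the only "measure"
classes_attended = Counter(student for student, _, _ in attendance)

# Coverage factless fact table: every (product, store, day) that was on promotion
promotion_coverage = {("P1", "S1", "D1"), ("P2", "S1", "D1"), ("P3", "S1", "D1")}
# Sparse primary fact table: rows exist only where something actually sold
sales_fact = {("P1", "S1", "D1"): 12, ("P3", "S1", "D1"): 4}

# Promoted items that did not sell = coverage minus sales
promoted_not_sold = sorted(promotion_coverage - set(sales_fact))
```

In SQL the same question is typically a NOT EXISTS (or EXCEPT) between the coverage table and the sales fact table.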
Facts of Different Granularity
• The dimensional model gains power as the individual fact records become more and more atomic
• At the lowest level of individual transactions, the design is most powerful because
– More of the descriptive attributes have single values
– The design withstands surprise in the form of new facts, new dimensions, or new attributes within existing dimensions
– More expressiveness at the lowest levels of granularity
Technical Architecture
[Figure: The Back Room (source and operational systems feeding the data staging area through data staging services, with a metadata catalog) and The Front Room (presentation servers holding dimensional data marts with only aggregated data and dimensional data marts including atomic data, served through query services to desktop data access tools, standard reporting tools and application models, with a metadata catalog). Key: data element, service element]
The Technical Architecture...
• Data staging services
– Extract
– Transformation
– Load
– Job control
• Query services
– Warehouse browsing
– Access and security
– Query management
– Standard reporting
– Activity monitor
The technical architecture describes the flow of data from the source systems to the decision makers
Metadata Catalog
• It is an integral part of the overall architecture
• It contains information that describes the warehouse and plays an active role in its creation, use, and maintenance
• Contains source system metadata (data and processes), data staging metadata (dimensions, transformations, aggregations), DBMS metadata (tables, indexes, stored procedures), and front-room metadata (users, applications)
Technical Architecture Features
• Metadata driven
– Metadata provides flexibility by buffering the various components of the system from each other
– The metadata catalog provides parameters and information that allow the applications to perform their tasks
• Flexible services layers
– The data staging services and data query services add to the flexibility of the architecture
Back Room : Data Staging Area
• It is the construction site for the warehouse
• The central role of the staging area is to evolve the source system of record for all downstream DSS and reporting environments
• Data staging data models
– The data models can be designed for performance and ease of development
– Third normal form models often appear in the data staging area because the source systems are duplicated there
Data Staging Area...
• Atomic data marts hold the lowest level of detail necessary to meet most of the high-value business requirements
– Atomic data mart storage should be relational rather than OLAP because of the extreme level of detail, the number of dimensions, and the size
– The atomic data mart data model is built around the dimensional model, not an ER model
Transformation Services
• Transformation is the process of turning data from the source systems into something presentable to the end users and valuable to the business
• Different transformation services:
– Integration
– Slowly changing dimension maintenance
– Referential integrity checking
– Data type conversion
– Aggregation
– Data content audit
– Pre- and post-step exits
Front Room Architecture
• It is the public face of the warehouse: what the business users see and work with day-to-day
• The presentation servers are machines on which the data warehouse data is organized for direct querying by the end users and report writers
• The major types of activities here:
– Warehouse or metadata browsing
– Access and security
– Activity monitoring
– Query management
– Standard reporting
Warehouse Browsing
• Using the browsing tools to find and access the information needed by the user
• The warehouse browser should be dynamically linked to the metadata catalog
• It should be able to pull the definition and derivations of the various data elements and to show a set of standard reports
• Browsing tools
– Visual Basic
– Microsoft Access, etc
Access and Security Services
• Access and security services facilitate a user’s connection to the database
• They rely on authorization and authentication services, where the user is identified and access rights are determined or access is refused
• The level of authentication depends on how sensitive the data is
Activity Monitoring Services
• Captures information about the use of the data warehouse
• The capabilities are:
– Performance
– User support
– Marketing
– Planning
Query Management Services
• Query management services are the set of capabilities that manage the execution of the query, and return of the result set to the desktop
• The major query management services are:
– Query reformulation
– Query re-targeting and multi-pass SQL
– Aggregate awareness
– Query governing
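A minimal sketch of aggregate awareness (aggregate navigation): given the attributes a query groups and filters by, pick the smallest table that can still answer it. The catalog of table names, attributes and row counts is hypothetical; in practice this logic lives in the query tool, middleware or RDBMS rather than in user code.

```python
# Hypothetical catalog: table name -> (attributes it can be grouped/filtered by, row count)
CATALOG = {
    "sales_fact": ({"day", "month", "year", "product", "category", "store", "region"}, 50_000_000),
    "sales_month_category": ({"month", "year", "category"}, 4_000),
    "sales_month_region": ({"month", "year", "region"}, 1_200),
}

def choose_table(query_attributes):
    """Return the smallest table whose attributes cover every GROUP BY and WHERE
    attribute of the query. The base fact table always qualifies."""
    needed = set(query_attributes)
    candidates = [(rows, name) for name, (attrs, rows) in CATALOG.items() if needed <= attrs]
    return min(candidates)[1]
```

For example, a yearly sales-by-category query is re-targeted to `sales_month_category`, while a query by category and region needs both attributes together and falls back to the base fact table.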
Standard Reporting Services
• Standard reporting is the ability to create fixed-format reports requiring limited user interaction, with regular execution schedules
• Requirements for standard reporting tools are:
– Report development environment
– Report execution server
– Time- and event-based scheduling of report execution
– Iterative execution
– Flexible report definition
– Flexible report delivery
– Report library with browsing capability
Back Room infrastructure factors
• Infrastructure for the data warehouse includes the hardware, network, and lower-level functions, such as security etc…
• The database server is the biggest hardware platform decision for most data warehouse projects
Back Room Infrastructure Factors...
• The major factors in determining requirements for the server platforms are:
– Data size
• Most data warehouse/data mart projects tend to start out with no more than 200 GB
• Data warehouses of less than 100 GB are considered small, those from 100 GB to 500 GB typical, and those with more than 500 GB large
– Volatility
• Measures the dynamic nature of the database, including how often the database will be updated and how much data is replaced each time
Back Room Infrastructure Factors...
– Number of users
• How active the users are, how many are active concurrently, and their geographical distribution etc. are important factors in selecting a platform
– Number of business processes
• It increases the complexity of the data warehouse
• May call for separate hardware platforms for each business process
– Nature of use
• Depends on the front-end tools and the types of queries, which have implications for platform selection
Technical Factors
• Platforms
– NT servers for medium-sized warehouses
• NT is a cost-effective platform for smaller warehouses or data marts
– Open system servers
• Open system (Unix) servers are the primary platform for most medium-sized or larger warehouses
• If the data warehouse is based on a Unix environment, the warehouse team will need to know the administrative tools and basic Unix commands and utilities to be able to develop and manage the warehouse
Technical Factors...
• Disks
– Disk drives can have a major impact on the performance, flexibility, and scalability of the warehouse platform
• Memory
– More memory is better for data warehousing
– Transaction requests are small and typically don’t need much memory; decision support queries require more memory and involve large tables
– If a table can fit in memory, performance can improve 10 to 100 times
Technical Factors...
• Database platform
– Data warehouses are implemented using mainframe-based database products
– Some data warehouses are implemented using specialized multidimensional database products called MOLAP (multidimensional online analytical processing) engines
– MOLAP engines came about in response to three main user requirements: simple data access, cross-tab-style reports, and fast response time
– The significant benefit of using a MOLAP engine is end user query performance
Physical Design
• In the physical design, the data warehouse team is required to estimate the warehouse’s size
• In data warehouses, the size of dimension tables is insignificant compared to the size of the fact tables and the size of the indexes on the fact tables
Initial Sizing Estimates...
• Preliminary sizing estimates include
– Estimate row length
– Estimate number of rows
– Count and sizes of indexes
– Temp space
– space for metadata tables
– Considerable space for aggregate tables
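A rough sizing sketch that walks through the items above as arithmetic. Every ratio here (index, aggregate and temp space as fractions of base data, plus a flat metadata allowance) is a placeholder assumption to be replaced with figures for your own DBMS and design; dimension tables are ignored because they are insignificant next to the fact table, and per-row DBMS overhead is not modeled.

```python
def estimate_storage_gb(row_bytes, row_count, index_ratio=1.0, aggregate_ratio=1.0,
                        temp_ratio=0.5, metadata_gb=1.0):
    """Rough fact-table storage estimate in decimal GB (1 GB = 10**9 bytes).

    The ratios are placeholder assumptions, not recommended values.
    """
    base = row_bytes * row_count / 1e9
    parts = {
        "base": base,
        "indexes": base * index_ratio,
        "aggregates": base * aggregate_ratio,
        "temp": base * temp_ratio,
        "metadata": metadata_gb,
    }
    parts["total"] = sum(parts.values())
    return parts

# Hypothetical fact table: 8 four-byte keys + 4 eight-byte numeric facts = 64 bytes per row
row_bytes = 8 * 4 + 4 * 8
# 3 years of days x 200 stores x 2,000 products sold per store per day
row_count = 3 * 365 * 200 * 2_000
estimate = estimate_storage_gb(row_bytes, row_count)
```

Here the base data comes to about 28 GB, but indexes, aggregates and temp space roughly triple it to about 99 GB, which is why the fact table and its indexes dominate the estimate.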
Indexes and Query Strategies
• To develop an index plan, it’s important to understand how the RDBMS’s query optimizer and indexes work
– The B-tree index
– The bitmapped index
– The hash index
– Other index types
– Star schema optimization
• Indexing the fact tables, Dimension tables, and indexing for loads
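A toy illustration of why bitmapped indexes suit low-cardinality warehouse columns: each distinct value gets one bit vector (here a Python int, bit i set when row i has that value), and a multi-column constraint becomes a bitwise AND or OR. The columns and values are hypothetical, and real RDBMS bitmap indexes add compression and storage management not shown here.

```python
class BitmapIndex:
    """Toy bitmap index: one bit vector (a Python int) per distinct column value.

    Bit i is set in the bitmap for value v if row i has value v.
    """
    def __init__(self, column_values):
        self.bitmaps = {}
        for row_id, value in enumerate(column_values):
            self.bitmaps[value] = self.bitmaps.get(value, 0) | (1 << row_id)

    def equals(self, value) -> int:
        return self.bitmaps.get(value, 0)

def row_ids(bitmap: int):
    """Decode a bitmap into the list of matching row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]

# Hypothetical low-cardinality dimension columns
region = ["West", "South", "West", "North", "West"]
status = ["active", "active", "closed", "active", "active"]
region_idx, status_idx = BitmapIndex(region), BitmapIndex(status)

# WHERE region = 'West' AND status = 'active'  ->  bitwise AND of two bitmaps
matches = row_ids(region_idx.equals("West") & status_idx.equals("active"))
# WHERE region IN ('North', 'South')  ->  bitwise OR
either = row_ids(region_idx.equals("North") | region_idx.equals("South"))
```

With few distinct values there are few bitmaps, and combining constraints costs one machine-level bit operation per word of rows, in contrast to a B-tree, which pays off mainly for high-cardinality, selective columns.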
End User Application
[Figure: customer types (ad hoc power users, push-button knowledge workers, standard report consumers) arranged by nature of use (strategic to operational), with information interfaces (desktop tools for do-it-yourself queries, end user applications, operational reporting environment) linked by migration paths; value labels: reporting/analysis examples, assured reference points, low effort, current business view, flexible]
End User Application Template
• It provides the layout and structure of a report that is driven by a set of parameters
• This approach allows users to generate a number of similarly structured reports from a single template
• Through the drill-down capabilities, a user could produce reports on other attributes; this action results in changing the actual template structure
• Many data access tools provide this functionality transparently
Typical Analysis Cycle
• How’s business?
• What are the trends?
• What’s unusual?
• What is driving those exceptions?
• What if…?
• Make a business decision
• Implement the decision
The Desktop Installation Readiness
• The back room architecture and infrastructure will be established long before deployment as it is needed for development activities
• The technology residing on the user’s desktop is the last piece that must be put in place prior to deployment
The Desktop Installation Readiness...
• Checklist of activities that should occur well before deployment
– Determine the client configuration requirements
– Determine LAN addresses
– Conduct a physical audit
– Complete the contract and procurement process
– Acquire user logons and security approval
– Test installation procedures on a variety of machines
– Schedule the installation
– Install the desktop hardware and/or software
– Complete installation testing
End User Education Strategy
• A robust education strategy for business end users is a prerequisite for data warehouse success
• Integrate and tailor education content
• Education for business users must address three key aspects of the data warehouse
– Data content
– End user application
– The data access tool
The End User Education Strategy…
• Data education content
– Provide an overview of structures, hierarchies, business rules, and definitions
– Before deployment, identify, document, and communicate this information to the business users
– Factors causing discrepancy between data from the warehouse and previously reported information are :
• The data warehouse information is incorrect
• The warehouse information has a different or new business definition or meaning
• The previously reported information was incorrect
An End User Support Strategy
• The user support strategies vary by organization and culture, based largely on the expectations of senior business management
• Determine the support organization structure
– A centralized team of support resources handles the more global data warehouse maintenance and responsibilities
– The team typically serves as a second line of defense and provides a pool of advanced application development resources
An End User Support Strategy...
• Establish support communication and feedback
– At a minimum, communication with your users should consist of general information and status updates
– Success stories can help motivate users
• Provide support documentation
• Create a warehouse web site