CH1 Data Warehouse Design

  • 8/12/2019 CH1 Data Warehouse Design

    Copyright 2005, Oracle. All rights reserved.

    Data Warehouse Design

    Objectives

    After completing this lesson, you should be able to do the following:

    - Differentiate online transaction processing (OLTP) and data warehousing design techniques
    - Describe effective data warehouse design
    - Identify data warehousing schemas
    - Explain implementation models
    - List data warehousing objects

    Characteristics of a Data Warehouse

    - A data warehouse is a database designed for querying, reporting, and analysis.
    - A data warehouse contains historical data derived from transaction data.
    - Data warehouses separate the analysis workload from the transaction workload.
    - A data warehouse is primarily an analytical tool.

    Comparing OLTP and Data Warehouses

                                   OLTP                  Data Warehouse
    Data accessed by queries       Comparatively lower   Large amount
    Joins                          Many                  Some
    Duplicated data                Normalized DBMS       Denormalized DBMS
    Derived data and aggregates    Rare                  Common

    Data Warehouse Architectures

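    [Figure: Basic data warehouse. Operational systems and flat files supply raw data to the warehouse, which also holds metadata and materialized views; users access the warehouse for analysis, reporting, and data mining.]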

    Data Warehouse Architectures

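    [Figure: Data warehouse with a staging area. Data from operational systems and flat files passes through a staging area before it is loaded into the warehouse (raw data, metadata, materialized views), which serves analysis, reporting, and data mining.]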
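    The staging-area architecture implies an extract, transform, load sequence. The following Python sketch illustrates that sequence on a tiny scale under stated assumptions: the flat-file contents, the column names (`cust_id`, `amount`, `sale_date`), and the cleaning rule are hypothetical, and an in-memory SQLite database stands in for the warehouse. It is not an Oracle tool or API.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file extract; column names and values are illustrative only.
FLAT_FILE = """cust_id,amount,sale_date
101, 25.50 ,2005-03-14
102,,2005-03-15
103,40.00,2005-04-02
"""


def extract(text):
    """Extract: copy raw rows from the flat file into the staging area."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(staged):
    """Transform: trim whitespace, convert types, and reject rows with no amount."""
    clean = []
    for row in staged:
        amount = row["amount"].strip()
        if not amount:
            continue  # a real pipeline would log or quarantine rejected rows
        clean.append((int(row["cust_id"]), float(amount), row["sale_date"]))
    return clean


def load(conn, rows):
    """Load: insert the cleaned rows into the warehouse table."""
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (cust_id INTEGER, amount REAL, sale_date TEXT)")
staging_area = extract(FLAT_FILE)
load(conn, transform(staging_area))
print(conn.execute("SELECT COUNT(*), SUM(amount) FROM sales").fetchone())  # (2, 65.5)
```

    Keeping the raw rows in a separate staging structure means a failed or rejected transformation never touches the warehouse tables.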

    Data Warehouse Architectures

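    [Figure: Data warehouse with a staging area and data marts. Data from operational systems and flat files passes through a staging area into the warehouse (raw data, metadata, materialized views), which feeds Sales, Purchasing, and Inventory data marts used for analysis, reporting, and data mining.]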

    Data Warehouse Design

    Key data warehouse design considerations:

    - Identify the specific data content.
    - Recognize the critical relationships within and between groups of data.
    - Define the system environment supporting your data warehouse.
    - Identify the required data transformations.
    - Determine how frequently the data must be refreshed.

    Logical Design

    - A logical design is conceptual and abstract.
    - Entity-relationship (ER) modeling is useful in identifying logical information requirements:
      - An entity represents a chunk of data.
      - The properties of entities are known as attributes.
      - The links between entities are known as relationships.
    - Dimensional modeling is a specialized type of ER modeling useful in data warehouse design.

    Oracle Warehouse Builder

    - Oracle Database provides tools to implement the extraction, transformation, and loading (ETL) process.
    - Oracle Warehouse Builder is a tool that helps with this process.
    - Oracle Warehouse Builder generates the following types of code:
      - SQL data definition language (DDL) scripts
      - PL/SQL programs
      - SQL*Loader control files
      - XML Process Definition Language (XPDL)
      - ABAP code (used to extract data from SAP systems)

    Data Warehousing Schemas

    Objects can be arranged in data warehousing schema models in a variety of ways:

    - Star schema
    - Snowflake schema
    - Third normal form (3NF) schema
    - Hybrid schemas

    The source data model and user requirements should steer the data warehouse schema.

    Implementing the logical model may require changes to adapt it to your physical system.

    Schema Characteristics

    - Star schema
      - Characterized by one or more large fact tables and a number of much smaller dimension tables
      - Each dimension table is joined to the fact table with a primary key to foreign key join
    - Snowflake schema
      - Dimension data is grouped into multiple tables instead of one large table
      - The larger number of dimension tables requires more foreign key joins
    - Third normal form (3NF) schema
      - A classical relational database model that minimizes data redundancy through normalization
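    To make the join difference concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for an Oracle database. The tables and columns (`customers`, `cities`, `customers_sf`, `cust_state`) are hypothetical and simplified: the star version keeps city and state in one dimension table, while the snowflake version moves them into a separate `cities` table, so the same state-level total needs one more join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Star schema: one denormalized CUSTOMERS dimension joined directly to the fact table.
CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, cust_city TEXT, cust_state TEXT);
CREATE TABLE sales (cust_id INTEGER REFERENCES customers(cust_id), amount REAL);

-- Snowflake schema: city and state data moved into their own table.
CREATE TABLE cities (city_id INTEGER PRIMARY KEY, city TEXT, state TEXT);
CREATE TABLE customers_sf (cust_id INTEGER PRIMARY KEY,
                           city_id INTEGER REFERENCES cities(city_id));

INSERT INTO customers VALUES (1, 'Austin', 'TX'), (2, 'Dallas', 'TX'), (3, 'Reno', 'NV');
INSERT INTO cities VALUES (10, 'Austin', 'TX'), (20, 'Dallas', 'TX'), (30, 'Reno', 'NV');
INSERT INTO customers_sf VALUES (1, 10), (2, 20), (3, 30);
INSERT INTO sales VALUES (1, 100.0), (2, 50.0), (3, 70.0), (1, 30.0);
""")

# Star: a single primary key to foreign key join reaches the state attribute.
star = conn.execute("""
    SELECT c.cust_state, SUM(s.amount)
    FROM sales s JOIN customers c ON s.cust_id = c.cust_id
    GROUP BY c.cust_state ORDER BY c.cust_state""").fetchall()

# Snowflake: the same question needs an extra join through CITIES.
snowflake = conn.execute("""
    SELECT ci.state, SUM(s.amount)
    FROM sales s
    JOIN customers_sf c ON s.cust_id = c.cust_id
    JOIN cities ci ON c.city_id = ci.city_id
    GROUP BY ci.state ORDER BY ci.state""").fetchall()

print(star)       # [('NV', 70.0), ('TX', 180.0)]
print(snowflake)  # [('NV', 70.0), ('TX', 180.0)]
```

    Both queries return the same totals; the snowflake form trades less redundancy in the dimension for an additional join.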

    Data Warehousing Objects

    - Fact tables: Fact tables are the large tables that store business measurements.
    - Dimension tables: A dimension is a structure composed of one or more hierarchies that categorizes data.
    - Unique identifiers: Each unique identifier identifies exactly one distinct record in a dimension table.
    - Relationships: Relationships guarantee the integrity of business information.

    Fact Tables

    - A fact table must be defined for each star schema.
    - Fact tables are the large tables that store business measurements.
    - A fact table contains either detail-level or aggregated facts.
    - A fact table usually contains facts with the same level of aggregation.
    - The primary key of a fact table is usually a composite key made up of all its foreign keys.
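    Assuming a grain of one row per customer, product, and day, the following hypothetical sketch (SQLite through Python's `sqlite3`, not Oracle DDL) declares the fact table's primary key as the composite of its dimension foreign keys and shows that a second row for the same key combination is rejected. The table and column names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical fact table at one grain: one row per customer, product, and day.
# The composite primary key is made up of the three dimension foreign keys.
conn.execute("""
    CREATE TABLE sales_fact (
        cust_id  INTEGER NOT NULL,
        prod_id  INTEGER NOT NULL,
        time_id  TEXT    NOT NULL,
        quantity INTEGER,
        amount   REAL,
        PRIMARY KEY (cust_id, prod_id, time_id)
    )""")
conn.execute("INSERT INTO sales_fact VALUES (1, 7, '2005-03-14', 2, 19.98)")

try:
    # A second row for the same customer, product, and day breaks the declared grain.
    conn.execute("INSERT INTO sales_fact VALUES (1, 7, '2005-03-14', 1, 9.99)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```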

    Dimensions and Hierarchies

    - A dimension is a structure composed of one or more hierarchies that categorizes data.
    - Dimensional attributes help to describe the dimensional value.
    - Dimension data is collected at the lowest level of detail and aggregated into higher-level totals.
    - Hierarchies are structures that use ordered levels to organize data.
    - In a hierarchy, each level is connected to the levels above and below it.

    [Figure: CUSTOMERS dimension hierarchy, by level from top to bottom: REGION, SUBREGION, COUNTRY, STATE, CITY, CUSTOMER.]
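    Aggregating from the lowest level up the hierarchy can be sketched as repeated roll-ups along child-to-parent links. The member names, the parent mapping, and the sales figures below are invented for illustration; only the level order (CUSTOMER, CITY, STATE, COUNTRY, SUBREGION, REGION) comes from the figure.

```python
from collections import defaultdict

# Levels of the CUSTOMERS hierarchy, lowest first.
LEVELS = ["customer", "city", "state", "country", "subregion", "region"]

# Hypothetical members: child -> parent at the next level up.
PARENT = {
    "Ann": "Austin", "Bob": "Dallas", "Cy": "Reno",
    "Austin": "TX", "Dallas": "TX", "Reno": "NV",
    "TX": "USA", "NV": "USA",
    "USA": "Northern America",
    "Northern America": "Americas",
}

# Facts are collected at the lowest level of detail (customer).
sales_by_customer = {"Ann": 130.0, "Bob": 50.0, "Cy": 70.0}


def roll_up(totals):
    """Aggregate totals one level up by following child -> parent links."""
    parent_totals = defaultdict(float)
    for member, amount in totals.items():
        parent_totals[PARENT[member]] += amount
    return dict(parent_totals)


totals = sales_by_customer
for level in LEVELS[1:]:
    totals = roll_up(totals)
    print(level, totals)
# The last line printed is: region {'Americas': 250.0}
```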

    Dimensions and Hierarchies

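    [Figure: Star schema. The SALES fact table (cust_id, prod_id) is surrounded by the dimension tables TIMES, CHANNELS, CUSTOMERS, PRODUCTS, and PROMOTIONS. CUSTOMERS (#cust_id, cust_last_name, cust_city, cust_state_province) and PRODUCTS (#prod_id) mark unique identifiers with #; callouts label a hierarchy, a unique identifier, and a relationship.]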

    Physical Design

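    [Figure: Logical design compared with physical design. Logical: entities, relationships, attributes, unique identifiers. Physical, stored in tablespaces: tables, integrity constraints (primary key, foreign key, not null), columns, indexes, materialized views, dimensions. For example, entities become tables and attributes become columns.]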

    Data Warehouse Physical Structures

    Tables and partitioned tables:

    - Partitioned tables enable you to split large data volumes into smaller, more manageable pieces.
    - Expect performance benefits from:
      - Partition pruning
      - Intelligent parallel processing
    - Compressed tables offer scaleup opportunities for read-only operations.
    - Table compression saves disk space.
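    As a simplified illustration of why compression pays off on repetitive warehouse data, the sketch below stores each distinct value once per block in a symbol table and replaces repeated values with small references. This is an assumption-laden model of duplicate-value elimination, not Oracle's on-disk block format; the block contents are hypothetical.

```python
def compress_block(rows):
    """Store each distinct value once in a symbol table; rows keep only references."""
    symbols = []   # distinct values, stored once per block
    index = {}     # value -> position in the symbol table
    encoded = []
    for row in rows:
        refs = []
        for value in row:
            if value not in index:
                index[value] = len(symbols)
                symbols.append(value)
            refs.append(index[value])
        encoded.append(tuple(refs))
    return symbols, encoded


def decompress_block(symbols, encoded):
    """Rebuild the original rows from the symbol table and references."""
    return [tuple(symbols[i] for i in refs) for refs in encoded]


# Hypothetical block: low-cardinality columns repeat heavily in warehouse data.
block = [("2005-03-14", "Direct Sales", "TX")] * 3 + [("2005-03-14", "Internet", "NV")]
symbols, encoded = compress_block(block)
stored_values = sum(len(row) for row in block)
print(len(symbols), "distinct values stored instead of", stored_values)
# 5 distinct values stored instead of 12
print(decompress_block(symbols, encoded) == block)  # True
```

    Reads only need a lookup into the symbol table, which is one reason compression suits read-mostly data.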

    Data Warehouse Physical Structures

    - Views:
      - Are tailored presentations of data contained in one or more tables or views
      - Do not store data; only the view definition is stored, in the data dictionary
    - Materialized views:
      - Are query results that have been stored in advance
      - Like indexes, are used transparently and improve performance
    - Integrity constraints:
      - Are used in data warehouses for query rewrite
    - Dimensions:
      - Are containers of logical relationships and do not store data in the database
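    SQLite has no materialized views or query rewrite, so the following Python sketch only imitates the idea: a summary table is computed once in advance, and a hypothetical `monthly_total` function answers the query from the summary instead of scanning the detail rows. In Oracle, a materialized view created with `ENABLE QUERY REWRITE` lets the optimizer make this substitution transparently; the table, column, and function names here are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales (prod_id INTEGER, month TEXT, amount REAL);
INSERT INTO sales VALUES (1, '2005-03', 10.0), (1, '2005-03', 5.0),
                         (2, '2005-03', 7.0), (1, '2005-04', 4.0);
-- Stand-in for a materialized view: the aggregate is computed once and stored.
CREATE TABLE sales_by_month_mv AS
    SELECT month, SUM(amount) AS total
    FROM sales GROUP BY month;
""")


def monthly_total(month, use_rewrite=True):
    """Answer a monthly-total query, optionally 'rewritten' to read the summary."""
    if use_rewrite:
        sql = "SELECT total FROM sales_by_month_mv WHERE month = ?"  # reads 1 stored row
    else:
        sql = "SELECT SUM(amount) FROM sales WHERE month = ?"        # scans detail rows
    return conn.execute(sql, (month,)).fetchone()[0]


print(monthly_total("2005-03"), monthly_total("2005-03", use_rewrite=False))  # 22.0 22.0
```

    The summary must be refreshed when the detail table changes; in this sketch nothing keeps the two in sync.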

    Managing Large Volumes of Data

    Work smarter in your data warehouse:

    - Partitioning
    - Bitmap indexes and star transformation
    - Data compression
    - Query rewrite

    Work harder in your data warehouse:

    - Parallelism for all operations:
      - DBA tasks, such as loading, index creation, table creation, data modification, and backup and recovery
      - End-user operations, such as queries
    - Unbounded scalability: Real Application Clusters

    I/O Performance in Data Warehouses

    - I/O is typically the primary determinant of data warehouse performance.
    - Data warehouse storage configurations should be sized by I/O bandwidth, not by storage capacity.
    - Every component of the I/O subsystem should provide enough bandwidth:
      - Disks
      - I/O channels
      - I/O adapters
    - In data warehouses, maximizing sequential I/O throughput is critical.
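    Sizing for bandwidth rather than capacity can be shown with simple arithmetic. All figures below (target throughput, per-disk throughput, per-disk capacity, data volume) are hypothetical assumptions, not Oracle recommendations; the point is only that the two sizing methods can give very different disk counts.

```python
import math


def disks_needed(target_mb_per_s, per_disk_mb_per_s, capacity_gb, per_disk_gb):
    """Return (disks for bandwidth, disks for capacity, disks to buy)."""
    for_bandwidth = math.ceil(target_mb_per_s / per_disk_mb_per_s)
    for_capacity = math.ceil(capacity_gb / per_disk_gb)
    return for_bandwidth, for_capacity, max(for_bandwidth, for_capacity)


# Hypothetical figures: 2,000 MB/s of scan throughput over 1,000 GB of data,
# with disks that each sustain 25 MB/s and hold 146 GB.
print(disks_needed(2000, 25, 1000, 146))  # (80, 7, 80)
```

    With these assumed numbers, capacity alone would suggest 7 disks, while the bandwidth target needs 80; channels and adapters would have to be checked against the same target.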

    Performance of Sequential I/Os

    - In data warehouses, drive arrays generally see large random I/Os (1 MB) spread across the devices. This is known as a multiuser sequential workload.
    - The host operating system, device drivers, or storage array may fracture large I/Os into smaller I/Os.
    - Default Linux configurations commonly fracture large I/Os into smaller ones (up to 32 KB). This level of I/O fracturing can have a disastrous effect on total throughput.
    - Implementing query rewrite helps minimize the number of I/O requests.
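    A rough model shows why fracturing hurts. Assuming (hypothetically) that every physical request pays a fixed 5 ms overhead and that the device transfers data at 100 MB/s, splitting each 1 MB I/O into 32 KB pieces multiplies the request count by 32. Real devices, drivers, and arrays behave less simply, so treat the numbers as illustrative only.

```python
import math


def fractured_requests(io_kb, max_transfer_kb):
    """Physical requests issued when one I/O is split into pieces of at most max_transfer_kb."""
    return math.ceil(io_kb / max_transfer_kb)


def stream_mb_per_s(io_kb, max_transfer_kb, overhead_ms, transfer_mb_per_s):
    """Throughput of one stream if every request pays a fixed overhead (simplified model)."""
    requests = fractured_requests(io_kb, max_transfer_kb)
    io_mb = io_kb / 1024
    seconds = requests * overhead_ms / 1000 + io_mb / transfer_mb_per_s
    return io_mb / seconds


# Assumed device: 5 ms overhead per request, 100 MB/s transfer rate.
print(fractured_requests(1024, 32))                   # 32
print(round(stream_mb_per_s(1024, 1024, 5, 100), 1))  # 66.7
print(round(stream_mb_per_s(1024, 32, 5, 100), 1))    # 5.9
```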

    Minimizing I/O Requests

    Partition pruning:

    - Only the relevant partitions are accessed.
    - The optimizer knows or finds the relevant partitions:
      - Static pruning uses values known in advance.
      - Dynamic pruning uses internal recursive SQL to find the relevant partitions.
    - Partition pruning can provide order-of-magnitude performance gains.

    Example query:

        SELECT SUM(sales_amount)
        FROM   sales
        WHERE  sales_date >= DATE '2005-03-01'
        AND    sales_date <  DATE '2005-06-01';

    [Figure: The SALES table range-partitioned by month, 2005-JAN through 2005-JUN; for the query above, only the 2005-MAR, 2005-APR, and 2005-MAY partitions are accessed.]
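    Static partition pruning amounts to comparing the query's predicate range with each partition's bounds. The following sketch assumes monthly range partitions with half-open bounds `[start, end)`, matching the figure's partition names; the `prune` function is hypothetical and says nothing about how the Oracle optimizer is implemented.

```python
from datetime import date

# Hypothetical monthly range partitions: name -> (inclusive start, exclusive end).
PARTITIONS = {
    "2005-JAN": (date(2005, 1, 1), date(2005, 2, 1)),
    "2005-FEB": (date(2005, 2, 1), date(2005, 3, 1)),
    "2005-MAR": (date(2005, 3, 1), date(2005, 4, 1)),
    "2005-APR": (date(2005, 4, 1), date(2005, 5, 1)),
    "2005-MAY": (date(2005, 5, 1), date(2005, 6, 1)),
    "2005-JUN": (date(2005, 6, 1), date(2005, 7, 1)),
}


def prune(lo, hi):
    """Keep only partitions whose range [start, end) overlaps the predicate range [lo, hi)."""
    return [name for name, (start, end) in PARTITIONS.items()
            if start < hi and lo < end]


# sales_date >= 2005-03-01 AND sales_date < 2005-06-01
print(prune(date(2005, 3, 1), date(2005, 6, 1)))
# ['2005-MAR', '2005-APR', '2005-MAY']
```

    Only three of the six partitions would be read; the other half of the table is never touched.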

    Minimizing I/O Requests

    Bitmap indexes:

    - Bitmap indexes are usually 3 to 20 times smaller than B-tree indexes.
    - They are ideal for set-based operations.
    - Star transformation uses bitmap indexes to identify base table records of interest.
    - Full table access is replaced with bitmap index access.
    - Bitmap indexes minimize I/O.
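    The core of a bitmap index can be sketched with Python integers used as bit vectors: one bitmap per distinct value, with bit i set when row i has that value. Combining predicates then becomes bitwise arithmetic, which is why bitmap indexes suit set-based operations. The columns and values are hypothetical, and real bitmap indexes are compressed and stored very differently.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an integer bitmap: bit i is set if row i has that value."""
    bitmaps = defaultdict(int)
    for row_id, value in enumerate(values):
        bitmaps[value] |= 1 << row_id
    return dict(bitmaps)


def row_ids(bitmap):
    """Decode a bitmap into the list of matching row ids."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical low-cardinality fact-table columns, one entry per row.
channel = ["Internet", "Direct", "Internet", "Partners", "Internet", "Direct"]
state   = ["TX",       "TX",     "NV",       "TX",       "TX",       "NV"]

channel_idx = build_bitmap_index(channel)
state_idx = build_bitmap_index(state)

# WHERE channel = 'Internet' AND state = 'TX' becomes a bitwise AND of two bitmaps;
# an OR condition would use | in the same way.
match = channel_idx["Internet"] & state_idx["TX"]
print(row_ids(match))  # [0, 4]
```

    Only rows 0 and 4 would then be fetched from the table, instead of scanning all six.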

    I/O Scalability

    Parallel execution:

    - Reduces response time for data-intensive operations on large databases
    - Benefits systems with the following characteristics:
      - Multiprocessors, clusters, or massively parallel systems
      - Sufficient I/O bandwidth
      - Sufficient memory to support memory-intensive processes such as sorts, hashing, and I/O buffers

    [Figure: Parallel execution. A coordinator dispatches work to query servers: scanners read the data on disk and pass rows to sorters (aggregators), each handling one range of the sort (Q1 to Q4).]
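    The scanner and sorter pattern in the figure can be sketched as follows: scanners read chunks of the table in parallel, rows are redistributed by value range to one sorter per range (Q1 to Q4), and the coordinator concatenates the sorted ranges. The range boundaries and data are hypothetical, and Python threads only illustrate the data flow; because of the global interpreter lock they would not speed up CPU-bound sorting.

```python
from concurrent.futures import ThreadPoolExecutor


def scan(chunk):
    """Scanner: read one chunk of the table (here, just copy its rows)."""
    return list(chunk)


def range_of(value, bounds):
    """Index of the sorter responsible for this value (bounds are ascending upper limits)."""
    for i, upper in enumerate(bounds):
        if value < upper:
            return i
    return len(bounds)


def parallel_sort(table, degree, bounds):
    chunks = [table[i::degree] for i in range(degree)]
    with ThreadPoolExecutor(max_workers=degree) as pool:
        scanned = list(pool.map(scan, chunks))             # scanners work in parallel
        buckets = [[] for _ in range(len(bounds) + 1)]
        for rows in scanned:                               # redistribute rows by range
            for value in rows:
                buckets[range_of(value, bounds)].append(value)
        sorted_ranges = list(pool.map(sorted, buckets))    # one sorter per range
    # Coordinator: ranges are already ordered, so concatenating them gives the result.
    return [v for part in sorted_ranges for v in part]


data = [42, 7, 93, 15, 61, 28, 77, 3, 50, 88, 34, 66]
print(parallel_sort(data, degree=4, bounds=[25, 50, 75]))
# [3, 7, 15, 28, 34, 42, 50, 61, 66, 77, 88, 93]
```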

    I/O Scalability

    Automatic Storage Management (ASM)

    Configuring storage for a database depends on many variables:

    - Which data to put on which disk
    - Logical unit number (LUN) configurations
    - Database types and workloads: data warehouse, OLTP, decision support system (DSS)
    - Trade-offs between available options

    ASM provides solutions to storage issues encountered in data warehouses.

    I/O Scalability

    Automatic Storage Management: Overview

    - Portable and high-performance cluster file system
    - Manages Oracle database files
    - Spreads data across disks to balance load
    - Provides integrated mirroring across disks
    - Solves many storage management challenges

    [Figure: I/O stack. Application and Database run above the Operating system; ASM takes the place of the separate File system and Volume manager layers.]

    I/O Scalability

    ASM benefits:

    - Stripes files rather than logical volumes
    - Online disk reconfiguration and dynamic rebalancing
    - Provides redundancy on a file basis
    - Automatic database file management
    - Graphical management interface based on Enterprise Manager (EM)
    - Eliminates hot spots and manual I/O tuning
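    Striping a file across all disks of a disk group, and respreading it after disks are added, can be sketched with round-robin placement of fixed-size extents. Round-robin placement is a simplification chosen for this example; the disk names and extent counts are hypothetical, and the `rebalance` function simply recomputes the whole layout rather than modeling how ASM decides which extents to move.

```python
def stripe(num_extents, disks):
    """Place extents 0..num_extents-1 of one file round-robin across the disks."""
    layout = {disk: [] for disk in disks}
    for extent in range(num_extents):
        layout[disks[extent % len(disks)]].append(extent)
    return layout


def rebalance(layout, disks):
    """Recompute an even layout after disks are added or dropped (simplified)."""
    num_extents = sum(len(extents) for extents in layout.values())
    return stripe(num_extents, disks)


group = ["disk1", "disk2", "disk3", "disk4"]
layout = stripe(8, group)
print(layout)  # {'disk1': [0, 4], 'disk2': [1, 5], 'disk3': [2, 6], 'disk4': [3, 7]}

# Online reconfiguration: add two disks, then spread the same 8 extents over six disks.
layout = rebalance(layout, group + ["disk5", "disk6"])
counts = {disk: len(extents) for disk, extents in layout.items()}
print(counts)  # {'disk1': 2, 'disk2': 2, 'disk3': 1, 'disk4': 1, 'disk5': 1, 'disk6': 1}
```

    Because every file is spread over every disk, no single disk becomes a hot spot for one file's I/O.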

    I/O Scalability

    Real Application Clusters

    - Real Application Clusters (RAC) provides linear scalability and availability for data warehouses.
    - RAC provides redundancy so that if a node goes down, the other nodes continue to execute.
    - RAC nodes can share all work equally or perform dedicated tasks such as ETL or query processing.

    Typical Data Warehouse Cluster

    [Figure: Typical data warehouse cluster. Four nodes, each with four 2 GHz CPUs; 1 Gigabit Ethernet interconnects; two 16-port switches; sixteen storage arrays, each with 10-20 disks.]

    Parallel Execution with RAC

    Execution slaves have node affinity with the execution coordinator but will expand to other nodes if needed.

    [Figure: An execution coordinator and parallel execution servers running across Node 1 to Node 4, all attached to shared disks.]

    Summary

    In this lesson, you should have learned how to:

    - Differentiate OLTP and data warehousing design techniques
    - Describe effective data warehouse design
    - Identify data warehousing schemas
    - Explain implementation models
    - List data warehousing objects