  • Data Warehousing with Oracle Database 10g

  • Boris Gurov, Senior Sales Consultant, Oracle ECE Ltd., Branch Bulgaria

  • Agenda

    • Oracle Warehouse Builder
    • Oracle Database 10g for Data Warehousing
      – Ensure a well-tuned I/O subsystem
      – Find a schema balance
      – Init.ora settings
    • Summary & Close

  • Oracle Warehouse Builder

    • Data structure design
    • Capture source definitions
    • ETL design
    • Generate code
    • Deploy
    • Extract and transform data
    • Validate design

  • Source Support

    • Relational:
      – Oracle
      – IBM DB2
      – SQL Server
      – Sybase
      – etc. including ODBC
    • Files
    • Applications
      – SAP

  • Target Support

    • Oracle 9.2 (tables and queues)
    • Oracle 10g (tables and queues)
    • Flat files (Oracle database is transformation engine)

  • Design

    • Data structures:
      – Dimensional
      – Relational
    • ETL processes:
      – Data flows
      – Process flows
    • End user access

  • Dimensional Design

  • ETL Design (Data Flows)

  • ETL Design (Process Flows)

  • Code Generation

    • Data Definition Language (DDL)
    • OLAP metadata creation
    • Optimized PL/SQL code
    • SQL*Loader control files
    • ABAP code
    • XML Process Definition Language (XPDL)

  • Data Quality in OWB

    • Data Quality functionalities are integrated into ETL processes

    • Disciplined approach to Data Quality, not an afterthought

    • Data Quality is modeled, executed and audited just like any other transformation

    • Currently consists of
      – Name and Address Cleansing
      – Match-Merge

  • Name and Address Cleansing

    • Transformations are:
      – Parsing
      – Standardization
      – Correction
      – Augmentation

  • Oracle Warehouse Builder

    • Enterprise Business Intelligence integration design tool that manages the full lifecycle of data and metadata for Oracle database(s)

  • Data Warehousing Applications

    • Even after decades of innovation, a computer ‘still’ consists of three main components
      – CPU provides the computing power
      – Memory stores the transient data for computational operations
      – Disks (I/O) store the persistent information
    • Getting the best performance means finding the right balance of all these components and using them optimally
      – Size your system appropriately
      – Design your database appropriately
      – Use the database appropriately

  • I/O – The ‘disk dilemma’: Evolution of a 180 GB database

    Single-disk characteristics by year, followed by the number of disks a 180 GB DB system needs and the aggregate rates those disks deliver:

    • 1993: 1 GB disk, 2 MB/sec, 50 IO/sec, 20 ms seek, 3,600 rpm. 180 disks: 360 MB/sec, 9,000 IO/sec
    • 1996: 4 GB disk, 6 MB/sec, 70 IO/sec, 14 ms seek, 7,200 rpm. 50 disks: 300 MB/sec, 3,500 IO/sec
    • 2000: 72 GB disk, 40 MB/sec, 160 IO/sec, 6 ms seek, 10,000 rpm. 3 disks: 120 MB/sec, 480 IO/sec
    • 2002: 180 GB disk, 30 MB/sec, 120 IO/sec, 8 ms seek, 7,500 rpm. 1 disk: 30 MB/sec, 120 IO/sec

    Note: equivalent I/O Bus is necessary
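
The aggregate figures on this slide follow from simple arithmetic: the number of disks needed to hold the database, multiplied by the per-disk rates. A minimal Python sketch using the per-disk values from the slide; the function name `aggregate` is illustrative, and the disk count uses a ceiling, so 1996 comes out as 45 disks where the slide rounds to 50.

```python
import math

# Per-disk characteristics from the slide: year -> (capacity GB, MB/sec, IO/sec)
DISKS = {
    1993: (1, 2, 50),
    1996: (4, 6, 70),
    2000: (72, 40, 160),
    2002: (180, 30, 120),
}

def aggregate(db_size_gb, capacity_gb, mb_per_sec, io_per_sec):
    """Return (disk count, total MB/sec, total IO/sec) for a database of db_size_gb."""
    n_disks = math.ceil(db_size_gb / capacity_gb)
    return n_disks, n_disks * mb_per_sec, n_disks * io_per_sec

for year, spec in DISKS.items():
    n, mb, io = aggregate(180, *spec)
    print(f"{year}: {n} disks, {mb} MB/sec, {io} IO/sec")
```

The same 180 GB goes from 360 MB/sec of aggregate bandwidth on 1993 disks to 30 MB/sec on a single 2002 disk, which is the dilemma the slide describes.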

  • I/O – Unlimited Scalability: Parallel Execution

    • Use parallelism to enable single process scalability
    • Unrestricted parallelism
      – No data layout requirement or restriction (shared nothing systems)
      – All operations can be parallelized

    [Figure: work is dispatched to parallel scan processes and parallel sort processes (sort A-K, sort L-S, sort T-Z)]

  • I/O – Parallel Execution

    [Figure: concurrent operations at different degrees of parallelism (DOP): 2 at DOP 2, 4 at DOP 4, and 16 at DOP 8]

    • I/O bandwidth requirement increases with single process parallelism and multi-user concurrency

    • Plan for your system’s expected I/O throughput based on average concurrent users and parallelism

  • I/O – Unlimited Scalability: Sizing Guidelines

    • Oracle can read 300+ MB/sec per GHz/CPU power
    • Direct Read, multi-block IO, e.g., parallel full table scan
    • An ‘average’ DW system should plan for a minimum of 100 MB/sec per GHz/CPU
    • Typical mixture of IO and CPU intensive operations
    • Ball park number, adjust accordingly
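
As a rough illustration of the sizing guideline, the sketch below turns a CPU configuration into a target I/O throughput and a disk count. It assumes "per GHz/CPU" means per GHz of clock speed for each CPU; the server configuration, the 30 MB/sec per-disk rate, and the function names are hypothetical.

```python
import math

def planned_throughput(cpus, ghz_per_cpu, mb_per_sec_per_ghz=100):
    """Target I/O throughput in MB/sec for the given CPU configuration."""
    return cpus * ghz_per_cpu * mb_per_sec_per_ghz

def disks_for_throughput(target_mb_per_sec, disk_mb_per_sec):
    """Disks needed to sustain the target, ignoring bus and controller limits."""
    return math.ceil(target_mb_per_sec / disk_mb_per_sec)

# Hypothetical server: 4 CPUs at 2 GHz, disks delivering 30 MB/sec each
minimum = planned_throughput(4, 2)        # guideline minimum: 100 MB/sec per GHz
peak = planned_throughput(4, 2, 300)      # scan-heavy upper figure: 300 MB/sec per GHz
disks = disks_for_throughput(minimum, 30)
print(minimum, peak, disks)
```

For this hypothetical server the guideline minimum is 800 MB/sec, which already calls for 27 such disks regardless of how little capacity the data needs.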

  • I/O – Minimize Requests: Sample Oracle Functionality

    • Partition Pruning
      – Only touch the relevant data
    • Star transformation
      – Bitmap index access instead of full data access
    • Materialized Views
      – ‘Index’ your business questions, not only the data
    • Table Compression
      – Store data more efficiently
    • Prioritize requests accordingly

  • I/O – Partition Pruning

    [Figure: Sales table with monthly partitions 04-Jan, 04-Feb, 04-Mar, 04-Apr, 04-May, 04-Jun]

    SELECT sum(sales_amount) FROM sales
    WHERE sales_date BETWEEN '01-MAR-2004' AND '31-MAY-2004';

    • Only relevant partitions will be accessed
      – Static pruning with known values in advance
      – Dynamic pruning uses internal recursive SQL to find the relevant partitions
    • Minimizes I/O operations
      – Provides massive performance gains
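
The idea behind static pruning can be sketched outside the database: given the boundaries of each range partition, only those that overlap the predicate's date range need to be read. This is a conceptual Python model, not Oracle's implementation; the partition boundaries and the `prune` function are assumptions made for illustration.

```python
from datetime import date

MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]

# Hypothetical monthly range partitions of SALES:
# name -> (first day of month, first day of next month), upper bound exclusive
PARTITIONS = {
    f"04-{name}": (date(2004, m, 1), date(2004, m + 1, 1))
    for m, name in enumerate(MONTHS, start=1)
}

def prune(partitions, low, high):
    """Names of partitions whose date range overlaps the inclusive range [low, high]."""
    return [
        name
        for name, (start, end) in partitions.items()
        if start <= high and low < end
    ]

touched = prune(PARTITIONS, date(2004, 3, 1), date(2004, 5, 31))
print(touched)
```

For the slide's predicate (1 March to 31 May 2004) only the 04-Mar, 04-Apr, and 04-May partitions are selected; the other three are never scanned.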

  • I/O – Bitmap Indexing: Bitmap Indexes

    • Bitmap indexes are usually 3 to 20 times smaller than B-tree indexes
    • Patented compressed storage
    • Ideal for set-based operations
    • Star transformation uses bitmap indexes to identify base table records of interest
    • Replaces full table access with bitmap index access
    • Minimizes necessary I/O
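
Why bitmaps suit set-based operations can be shown with a small model: each distinct column value maps to a bit string with one bit per row, and predicates combine with bitwise AND and OR before any row is fetched. The sketch below uses Python integers as uncompressed bitmaps; it does not model Oracle's compressed storage or the full star transformation, and the sample rows and function names are made up.

```python
def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int whose bit i is set when row i has that value."""
    index = {}
    for i, row in enumerate(rows):
        index[row[column]] = index.get(row[column], 0) | (1 << i)
    return index

def matching_rows(bitmap):
    """Yield the row positions whose bit is set in `bitmap`."""
    position = 0
    while bitmap:
        if bitmap & 1:
            yield position
        bitmap >>= 1
        position += 1

# Hypothetical fact rows with two dimension-key columns
rows = [
    {"region": "West", "quarter": "Q1"},
    {"region": "East", "quarter": "Q1"},
    {"region": "South", "quarter": "Q2"},
    {"region": "West", "quarter": "Q2"},
    {"region": "South", "quarter": "Q1"},
]
by_region = build_bitmap_index(rows, "region")
by_quarter = build_bitmap_index(rows, "quarter")

# region IN ('West', 'South') AND quarter = 'Q1', resolved on bitmaps only
hits = (by_region["West"] | by_region["South"]) & by_quarter["Q1"]
print(list(matching_rows(hits)))
```

Only the rows left in the combined bitmap (positions 0 and 4 here) would then be read from the base table.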

  • I/O – Materialized Views

    [Figure: the query "What were the sales in the West and South regions for the past three quarters?" is redirected by query rewrite from the detail data to the materialized view "Monthly Sales by Region". Other materialized views shown: "Average Sales by Region" and "Quarterly Sales by Product".]

    A simple rollup from month to quarter provides a large performance gain with minimal I/O.
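
The month-to-quarter rollup can be pictured as aggregating an already small summary instead of rescanning the detail rows. A conceptual Python sketch, with made-up figures standing in for a "Monthly Sales by Region" summary; it illustrates the rollup arithmetic only, not Oracle's query rewrite mechanism.

```python
from collections import defaultdict

# Hypothetical monthly summary: (region, year, month) -> sales amount
monthly_sales = {
    ("West", 2004, 1): 10,
    ("West", 2004, 2): 20,
    ("West", 2004, 3): 30,
    ("West", 2004, 4): 5,
    ("South", 2004, 1): 7,
}

def rollup_to_quarter(monthly):
    """Aggregate a monthly summary to (region, year, quarter) totals."""
    quarterly = defaultdict(int)
    for (region, year, month), amount in monthly.items():
        quarter = (month - 1) // 3 + 1  # months 1-3 -> Q1, 4-6 -> Q2, ...
        quarterly[(region, year, quarter)] += amount
    return dict(quarterly)

quarterly_sales = rollup_to_quarter(monthly_sales)
print(quarterly_sales)
```

A quarterly question is answered from a handful of summary rows per region, however many detail rows sit underneath.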

  • I/O – Table Compression

    • Tables can be compressed
      – Compression can also be specified at the partition level
      – Indexes are not compressed
    • Typical compression ratios range from 3:1 to 5:1
      – Compression is dependent upon the actual data
      – Compression algorithm based on removing data redundancy
    • Key benefit is cost savings
      – Save TBs of storage without compromising performance or functionality

  • I/O – Prioritize Resources: Database Resource Manager

    • Protect the system proactively
    • Maximum number of concurrent operations
    • Maximum degree of parallelism for a given priority group
    • Subset of Database Resource Manager functionality

    [Figure: the resource scheduler sorts work into High, Medium, and Low Priority groups. Workloads shown: Sales Analysis, AdHoc Reports, ETL Jobs. Limits shown: 20 users at DOP 10, 200 users at DOP 4, 5 users at DOP 6.]

  • Schema – which way to go? Star Schema

    • Innovative use of bitmap indexes and bitmap join indexes
    • Support for Complex Star Schemas
      – Large numbers of dimensions
      – Multiple fact tables
      – Snowflake schemas
    • Sophisticated partition pruning
    • Parallel execution

  • Schema – which way to go?

    [Figure: CUSTOMER_ORDERS and CUSTOMER_ORDER_PRODUCTS, each with monthly range partitions (Jan, Feb, Mar, Apr, ...); the Jan partitions are further split into hash sub-partitions Hash 1 to Hash 4]

    Both tables are partitioned by composite range-hash. The tables are partitioned on ORDER_DATE for the range dimension and CUSTOMER_ID for the hash dimension.

    Suppose a query examines the orders and products for January and February. First, Oracle can do partition-pruning with the range partitions.

    Second, Oracle will do a partition-wise join on the range partitions.

    Third, Oracle will do a partition-wise join on the hash sub-partitions.
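
The three steps can be modelled in a few lines: prune by month, then join each (month, hash bucket) partition of one table only against the partition with the same key in the other. This is a conceptual Python sketch, not Oracle's executor; the row layout, the modulo hash, and the function names are assumptions, and each partition pair could in principle be handled by a separate parallel process.

```python
from collections import defaultdict

N_HASH = 4  # hash sub-partitions per range partition ("Hash 1" to "Hash 4" on the slide)

def partition(rows):
    """Group rows by (order month, hash bucket of customer_id)."""
    parts = defaultdict(list)
    for row in rows:
        parts[(row["month"], row["customer_id"] % N_HASH)].append(row)
    return parts

def partition_wise_join(orders, products, months):
    """Join orders to order products, partition pair by partition pair."""
    o_parts, p_parts = partition(orders), partition(products)
    joined = []
    for key, o_rows in o_parts.items():
        if key[0] not in months:  # step 1: prune on the range dimension
            continue
        p_rows = p_parts.get(key, [])  # steps 2 and 3: matching partition only
        for o in o_rows:
            for p in p_rows:
                if o["order_id"] == p["order_id"]:
                    joined.append((o["order_id"], o["customer_id"], p["product"]))
    return sorted(joined)

# Hypothetical sample data; both tables carry the partitioning columns
orders = [
    {"order_id": 1, "customer_id": 10, "month": "Jan"},
    {"order_id": 2, "customer_id": 11, "month": "Feb"},
    {"order_id": 3, "customer_id": 10, "month": "Mar"},
]
products = [
    {"order_id": 1, "customer_id": 10, "month": "Jan", "product": "A"},
    {"order_id": 1, "customer_id": 10, "month": "Jan", "product": "B"},
    {"order_id": 2, "customer_id": 11, "month": "Feb", "product": "C"},
    {"order_id": 3, "customer_id": 10, "month": "Mar", "product": "D"},
]
result = partition_wise_join(orders, products, {"Jan", "Feb"})
print(result)
```

Because both tables are partitioned on the same keys, no row ever has to be compared with a row from a different partition, which keeps each unit of join work small and independent.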

  • Schema – which way to go? Oracle’s functionality

    • Star schema
      – Range-partition fact tables by time
      – Bitmap indexes on dimension-key columns of fact table
      – ‘Star transformation’ for end-user queries
    • 3NF or normalized schema
      – Composite range-hash partitioning on large tables
      – ‘Partition-wise’ joins and parallel execution are key performance enablers for joining large tables
    • Hybrid environments
      – Use both dogmas concurrently in the same system without affecting each other

    Choose what fits your needs best! Oracle provides optimizations for any kind of setup.

  • Init.ora Settings: Lessons learned from History

    • Do not de-tune Oracle
      – Very often, our performance engineers get improvements just by removing parameters
      – Results of de-tuning can be poor optimizer plans, wasted memory, and serialization points
    • Trust Oracle
      – Don’t try to second-guess the software
      – With the exception of buffer and subject area related parameters, the system defaults are usually optimum

  • Init.ora – less is more: Basic Rules

    • Ensure that data warehouse relevant parameters are set
    • Not all parameters are enabled by default in database releases prior to Oracle 10g
    • Size and set buffer and memory related parameters
    • Two parameters are enough
    • Do not touch other parameters unless necessary

  • Init.ora – less is more

    [Figure: memory usage (MB, axis 0 to 6,000) versus number of users (1, 2, 4, 6, 8, 10, 12, 16, 20) for three settings: 7.5 MB, 15 MB, and Auto]

  • Summary

    • Data Warehousing
      – ‘just a special kind of application’
    • Size for I/O throughput
      – not for disk capacity
    • Design according to your needs using the appropriate model, not the other way around
    • Init.ora settings
      – Less is more

  • For More Information

    oracle.com/datawarehousing

    otn.oracle.com/datawarehousing