Best Practices for a Data Warehouse on Oracle Database 11g Page 2

NOTE:

EXECUTIVE SUMMARY

INTRODUCTION

BALANCED CONFIGURATION

[Figure: balanced configuration. Labels recoverable from the diagram: eight host bus adapters (four pairs labelled HBA1 and HBA2), two Fibre Channel switches (FC-Switch1, FC-Switch2), and eight disk arrays (DiskArray 1 through DiskArray 8).]

Interconnect

Disk Layout

Tips for a balanced system
• Total throughput = number of cores x 200 MB/s
• Use 1 HBA port per CPU
• Use 1 disk controller per HBA port
• Maximum of 10 physical disks per controller
• Use more, smaller drives (146 GB or 300 GB)
• Use a minimum of 4 GB of memory per core
• Use RAID 1 with ASM
• Interconnect bandwidth = I/O subsystem bandwidth
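As a worked illustration of these rules of thumb, consider a hypothetical server with 16 cores. The core count is an assumption chosen only to make the arithmetic concrete; it is not a figure from this paper. Reading the 200 MB figure as a per-second, per-core rate, the sizing works out roughly as follows:

```latex
\begin{align*}
\text{Total throughput} &= 16\ \text{cores} \times 200\ \text{MB/s} = 3200\ \text{MB/s} \\
\text{Minimum memory} &= 16\ \text{cores} \times 4\ \text{GB} = 64\ \text{GB} \\
\text{Interconnect bandwidth} &= \text{I/O subsystem bandwidth} = 3200\ \text{MB/s}
\end{align*}
```

The HBA ports, disk controllers and disks would then be chosen so that together they can sustain the same 3200 MB/s, with no single component becoming the bottleneck.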

[Figure: disk layout. Labels recoverable from the diagram: sixteen physical disks (Disk1 through Disk16), eight LUNs (LUN A1 through LUN A8), an ASM stripe numbered 1 through 8 across the LUNs, and a single ASM diskgroup containing LUN A1 through LUN A8.]

LOGICAL MODEL

PHYSICAL MODEL

Staging layer

Efficient Data Loading

External Tables

-- External table over a comma-delimited flat file in the ADMIN directory
CREATE TABLE ext_tab_for_sales_data (
  Price     NUMBER(6),
  Quantity  NUMBER(6),
  Time_id   DATE,
  Cust_id   NUMBER(12),
  Prod_id   NUMBER(12))
ORGANIZATION EXTERNAL
( TYPE oracle_loader
  DEFAULT DIRECTORY admin
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY newline
    BADFILE 'ulcase1.bad'
    LOGFILE 'ulcase1.log'
    FIELDS TERMINATED BY ","
    ( Price     INTEGER EXTERNAL(6),
      Quantity  INTEGER EXTERNAL(6),
      Time_id   DATE)
  )
  LOCATION ('sales_data_for_january.dat'))
REJECT LIMIT UNLIMITED;

Insert into Sales partition(p2)

Select * From ext_tab_for_sales_data;

Direct Path Load

Insert /*+ APPEND */ into Sales partition(p2)

Select * From ext_tab_for_sales_data;

ALTER SESSION ENABLE PARALLEL DML;

Insert /*+ APPEND */ into Sales partition(p2)

Select * from ext_tab_for_sales_data;

Partition exchange loads

[Figure: partition exchange load. The Sales table is range partitioned by day, with partitions May 18th 2008 through May 24th 2008. The DBA builds a standalone Tmp_sales table and exchanges it into the Sales table.]

1. Create external table for flat files

2. Use CTAS command to create non-partitioned table TMP_SALES

3. Create indexes

4. Gather statistics

5. Alter table Sales exchange partition May_24_2008 with table tmp_sales

Sales table now has all the data

Alter table Sales exchange partition p2 with

table tmp_sales including indexes without

validation;
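Steps 2 through 5 of a partition exchange load might look roughly like the sketch below. It assumes the external table ext_tab_for_sales_data defined earlier, a SALES table whose columns match it, and a partition named may_24_2008 as in the figure. The index name tmp_sales_cust_bix and the choice of indexed column are hypothetical; in practice the indexes on TMP_SALES need to mirror the local indexes on SALES for INCLUDING INDEXES to work.

```sql
-- Sketch only: index name, indexed column and statistics options are assumptions.

-- Step 2: CTAS a non-partitioned table from the external table
CREATE TABLE tmp_sales PARALLEL NOLOGGING AS
SELECT * FROM ext_tab_for_sales_data;

-- Step 3: create indexes equivalent to the local indexes on SALES
CREATE BITMAP INDEX tmp_sales_cust_bix ON tmp_sales (cust_id);

-- Step 4: gather statistics on the standalone table
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(ownname => USER, tabname => 'TMP_SALES');
END;
/

-- Step 5: swap the loaded table into the target partition
ALTER TABLE sales EXCHANGE PARTITION may_24_2008 WITH TABLE tmp_sales
  INCLUDING INDEXES WITHOUT VALIDATION;
```

The exchange itself is a data dictionary operation, so the new data becomes visible in SALES without being physically moved.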

Data Compression

Foundation layer - Third Normal Form

Tips for the Staging Layer
• Use external tables
• Load using parallel DML statements, CTAS or IAS
• Use data compression
• Consider range partitioning fact tables to enable partition exchange loads

Optimizing 3NF

Partitioning for manageability

Alter table <table_name> drop partition <part_name>

Partitioning for easier data access

Select sum(sales_amount)
From SALES
Where sales_date between
to_date('05/20/2008','MM/DD/YYYY')
And
to_date('05/23/2008','MM/DD/YYYY');

Q: What were the total sales for the weekend of May 20 - 22 2008?

Only the 3 relevant partitions are accessed

[Figure: the Sales table, range partitioned by day, with partitions May 18th 2008 through May 24th 2008.]
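One way to check that pruning is taking place is to look at the Pstart and Pstop columns of the execution plan. The sketch below assumes a SALES table range partitioned on sales_date, as in the figure; the exact partition numbers reported depend on how the table was actually created.

```sql
-- Sketch only: assumes SALES is range partitioned on sales_date.
EXPLAIN PLAN FOR
SELECT SUM(sales_amount)
FROM   sales
WHERE  sales_date BETWEEN TO_DATE('05/20/2008', 'MM/DD/YYYY')
                      AND TO_DATE('05/23/2008', 'MM/DD/YYYY');

-- Pstart and Pstop show the first and last partition the query will touch
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

If Pstart and Pstop span every partition in the table, the predicate is not being used for pruning and is worth revisiting.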

Partitioning for join performance

Select sum(sales_amount)

From

SALES s, CUSTOMER c

Where s.cust_id = c.cust_id;

Both tables have the same degree of parallelism and are partitioned the same way on the join column (cust_id)

[Figure: full partition-wise join. Sales and Customer are each shown with a range partition (May 18th 2008) divided into four sub-partitions (Sub part 1 through Sub part 4).]

A large join is divided into multiple smaller joins; each joins a pair of partitions, and the smaller joins run in parallel

| ID | Operation             | Name      | Pstart | Pstop | TQ    | PQ Distrib |
|  0 | SELECT STATEMENT      |           |        |       |       |            |
|  1 | PX COORDINATOR        |           |        |       |       |            |
|  2 | PX SEND QC (RANDOM)   | :TQ10001  |        |       | Q1,01 | QC (RAND)  |
|  3 | SORT GROUP BY         |           |        |       | Q1,01 |            |
|  4 | PX RECEIVE            |           |        |       | Q1,01 |            |
|  5 | PX SEND HASH          | :TQ10000  |        |       | Q1,00 | HASH       |
|  6 | SORT GROUP BY         |           |        |       | Q1,00 |            |
|  7 | PX PARTITION HASH ALL |           |      1 |   128 | Q1,00 |            |
|  8 | HASH JOIN             |           |        |       | Q1,00 |            |
|  9 | TABLE ACCESS FULL     | Customers |      1 |   128 | Q1,00 |            |
| 10 | TABLE ACCESS FULL     | Sales     |      1 |   128 | Q1,00 |            |

Partition Hash All above the join & single PQ set indicate partition-wise join

Select sum(sales_amount)

From

SALES s, CUSTOMER c

Where s.cust_id = c.cust_id;

Only the Sales table is hash partitioned on the cust_id column

[Figure: partial partition-wise join. Sales is shown with a range partition (May 18th 2008) divided into four sub-partitions (Sub part 1 through Sub part 4); the Customer rows are shown regrouped into matching sets Sub part 1 through Sub part 4.]

Rows from customer are dynamically redistributed on the join key cust_id to enable partition-wise join

Access layer - Star Schema

Tips for the Foundation Layer
• Partition fact tables to get partition pruning
• Sub-partition by hash to achieve partition-wise joins
• Use parallel execution to allow multiple processes to work on large queries
• Number of sub-partitions needs to be a power of 2 and should be greater than or equal to the degree of parallelism (DOP)
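A composite range-hash table that follows these tips could be declared along the lines below. The table name sales_sketch, the column list, the two partitions shown, and the figures of 16 sub-partitions with a DOP of 8 are all illustrative assumptions, chosen so that the sub-partition count is a power of 2 and at least as large as the DOP.

```sql
-- Sketch only: names, columns and sizes are hypothetical.
CREATE TABLE sales_sketch (
  sales_date    DATE,
  cust_id       NUMBER(12),
  sales_amount  NUMBER(10,2))
PARTITION BY RANGE (sales_date)          -- enables partition pruning on date
SUBPARTITION BY HASH (cust_id)           -- enables partition-wise joins on cust_id
SUBPARTITIONS 16                         -- power of 2, >= DOP
( PARTITION may_18_2008 VALUES LESS THAN (TO_DATE('05/19/2008', 'MM/DD/YYYY')),
  PARTITION may_19_2008 VALUES LESS THAN (TO_DATE('05/20/2008', 'MM/DD/YYYY')) )
PARALLEL 8;
```

For a full partition-wise join, the table on the other side of the join would need to be hash partitioned on cust_id with the same number of partitions.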

Select SUM(s.quantity_sold) total, p.product, t.month
From Sales s, Customers c, Products p, Times t
Where s.cust_id = c.cust_id
And s.prod_id = p.prod_id
And s.time_id = t.time_id
And c.cust_city = 'BOSTON'
And p.product = 'UMBRELLA'
And t.month = 'MAY'
And t.year = 2008
Group by p.product, t.month;

Q: What was the total number of umbrellas sold in Boston during the month of May 2008?

Optimizing Star Queries

Select SUM(s.quantity_sold), p.product
From Sales s, Customers c, Products p, Times t
Where s.cust_id = c.cust_id
And s.prod_id = p.prod_id
And s.time_id = t.time_id
And c.cust_city = 'BOSTON'
And p.product = 'UMBRELLA'
And t.month = 'MAY'
And t.year = 2008
Group by p.product;

Select SUM(quantity_sold)
From Sales s
Where s.cust_id IN
(Select c.cust_id From Customers c
Where c.cust_city = 'BOSTON')
And s.prod_id IN
(Select p.prod_id From Products p
Where p.product = 'UMBRELLA')
And s.time_id IN
(Select t.time_id From Times t
Where t.month = 'MAY' And t.year = 2008);

Step 1: Oracle rewrites / transforms the query to retrieve only the necessary rows from the fact table using bitmap indexes on foreign key columns

Step 2: Oracle joins the rows from the fact table to the dimension tables

Tips for the Access Layer
• Partition fact tables to get partition pruning
• Create bitmap indexes on all FK columns
• Set STAR_TRANSFORMATION_ENABLED to true
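The last two tips could be applied roughly as follows. The sketch assumes a partitioned SALES fact table with foreign key columns cust_id, prod_id and time_id, as in the star query above. The index names sales_prod_bix and sales_time_bix appear in the execution plan in this paper; sales_cust_bix is a hypothetical name following the same pattern.

```sql
-- Sketch only: assumes SALES is partitioned, so the bitmap indexes are LOCAL.
CREATE BITMAP INDEX sales_cust_bix ON sales (cust_id) LOCAL NOLOGGING;
CREATE BITMAP INDEX sales_prod_bix ON sales (prod_id) LOCAL NOLOGGING;
CREATE BITMAP INDEX sales_time_bix ON sales (time_id) LOCAL NOLOGGING;

-- Allow the optimizer to consider star transformation in this session
ALTER SESSION SET star_transformation_enabled = TRUE;
```

Setting the parameter only makes the transformation available; the optimizer still decides on cost whether to use it for a given query.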

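| ID | Operation                          | Name           | Rows  | Pstart | Pstop |
|  0 | SELECT STATEMENT                   |                |     1 |        |       |
|  1 | SORT GROUP BY NOSORT               |                |     1 |        |       |
|  2 | HASH JOIN                          |                |     3 |        |       |
|  3 | TABLE ACCESS FULL                  | PRODUCTS       |     2 |        |       |
|  4 | HASH JOIN                          |                |     1 |        |       |
|  5 | TABLE ACCESS FULL                  | TIMES          |     1 |        |       |
|  6 | PARTITION RANGE SUBQUERY           |                | 44144 |      1 |    16 |
|  7 | TABLE ACCESS BY LOCAL INDEX ROWID  | SALES          | 44144 |      1 |    16 |
|  8 | BITMAP CONVERSION TO ROWIDS        |                |       |        |       |
|  9 | BITMAP AND                         |                |       |        |       |
| 10 | BITMAP MERGE                       |                |       |        |       |
| 11 | BITMAP KEY ITERATION               |                |       |        |       |
| 12 | BUFFER SORT                        |                |       |        |       |
| 13 | TABLE ACCESS FULL                  | TIMES          |     1 |        |       |
| 14 | BITMAP INDEX RANGE SCAN            | SALES_TIME_BIX |       |      1 |    16 |
| 15 | BITMAP MERGE                       |                |       |        |       |
| 16 | BITMAP KEY ITERATION               |                |       |        |       |
| 17 | BUFFER SORT                        |                |       |        |       |
| 18 | TABLE ACCESS FULL                  | CUSTOMERS      |     1 |        |       |
| 19 | BITMAP INDEX RANGE SCAN            | SALES_TIME_BIX |       |      1 |    16 |
| 20 | BITMAP MERGE                       |                |       |        |       |
| 21 | BITMAP KEY ITERATION               |                |       |        |       |
| 22 | BUFFER SORT                        |                |       |        |       |
| 23 | TABLE ACCESS FULL                  | PRODUCTS       |     2 |        |       |
| 24 | BITMAP INDEX RANGE SCAN            | SALES_PROD_BIX |       |      1 |    16 |

The figure marks two regions of this plan as Phase 1 and Phase 2.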

SYSTEM MANAGEMENT

Workload Management

[Figure: parallel execution. Labels recoverable from the diagram: Messages, QC connection, Parallel server connection, PX COORDINATOR.]

The QC (query coordinator) is the user session that initiates the parallel SQL statement; it distributes the work to the parallel servers.

Parallel servers are individual sessions that perform work in parallel. They are allocated from a pool of globally available parallel server processes and assigned to a given operation.

Parallel servers communicate among themselves and with the QC using messages that are passed via memory buffers in the shared pool.

| ID | Operation           | Name      | TQ    | IN-OUT | PQ Dist   |
|  0 | SELECT STATEMENT    |           |       |        |           |
|  1 | PX COORDINATOR      |           |       |        |           |
|  2 | PX SEND QC (RANDOM) |           | Q1,01 | P->S   |           |
|  3 | HASH JOIN           |           | Q1,01 | PCWP   |           |
|  4 | PX RECEIVE          |           | Q1,01 | PCWP   |           |
|  5 | PX SEND BROADCAST   |           | Q1,01 | P->P   | BROADCAST |
|  6 | PX BLOCK ITERATOR   |           | Q1,01 | PCWP   |           |
|  7 | TABLE ACCESS FULL   | CUSTOMERS | Q1,01 | PCWP   |           |
|  8 | PX BLOCK ITERATOR   |           | Q1,01 | PCWP   |           |
|  9 | TABLE ACCESS FULL   | SALES     | Q1,01 | PCWP   |           |

Figure annotations: Query Coordinator; Parallel Servers do the majority of the work.

parallel_max_servers, parallel_min_servers

Whether or not to use cross instance parallel execution in RAC

instance_groups, parallel_instance_group

Using Instance Groups to control Parallel Execution in RAC

instance_groups, parallel_instance_group

Excerpt from the init.ora file

sid[1].INSTANCE_GROUPS=ETL
sid[2].INSTANCE_GROUPS=ETL
sid[3].INSTANCE_GROUPS=ADHOC
sid[4].INSTANCE_GROUPS=ADHOC
sid[1].PARALLEL_INSTANCE_GROUP=ETL
sid[2].PARALLEL_INSTANCE_GROUP=ETL
sid[3].PARALLEL_INSTANCE_GROUP=ADHOC

[Figure: the instances are split into two groups, one labelled ETL and one labelled Ad-Hoc queries.]

Using services to control Parallel Execution in RAC

srvctl

srvctl add service -d database_name -s ETL -r sid1,sid2

srvctl add service -d database_name -s ADHOC -r sid3,sid4

Workload Monitoring

GV$SQL_MONITOR

Resource Manager

Optimizer Statistics Management

INCREMENTAL, TRUE, DBMS_STATS, GRANULARITY, AUTO

SQL> exec dbms_stats.set_table_prefs('SH', 'SALES', 'INCREMENTAL', 'TRUE');

SQL> exec dbms_stats.gather_table_stats(

[Figure: incremental global statistics. The Sales table partitions May 18th 2008 through May 23rd 2008 each have a synopsis (S1 through S6) stored in the Sysaux tablespace, from which the global statistic is derived.]

1. Partition level stats are gathered & synopsis created

2. Global stats generated by aggregating partition synopses
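The two steps above might be driven from PL/SQL roughly as shown below. The gather_table_stats call in this paper is truncated, so the arguments used here are an assumption about a plausible invocation rather than the original text: the SH.SALES table comes from the set_table_prefs call above, and the sample size and granularity settings are the ones incremental maintenance is generally expected to need.

```sql
-- Sketch only: the gather_table_stats arguments are assumed, not taken from the paper.
BEGIN
  -- Ask Oracle to maintain global statistics incrementally for SH.SALES
  DBMS_STATS.SET_TABLE_PREFS('SH', 'SALES', 'INCREMENTAL', 'TRUE');

  -- Gather statistics; partition synopses are created or refreshed as needed
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SH',
    tabname          => 'SALES',
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    granularity      => 'AUTO');
END;
/

-- Confirm the preference that is now in effect for the table
SELECT DBMS_STATS.GET_PREFS('INCREMENTAL', 'SH', 'SALES') FROM dual;
```

After a new partition is loaded, rerunning the gather step should only need to scan the changed partition, since the global statistics are rebuilt from the stored synopses.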

Frequency of statistics collection

Initialization Parameter

Memory allocation

shared_pool_size

parallel_min_servers, shared_pool_size, sga_target, memory_target

pga_aggregate_target, parallel_max_servers

Controlling Parallel Execution

parallel_execution_message_size

parallel_min_servers

parallel_max_servers

cpu_count * parallel_threads_per_cpu

parallel_adaptive_multi_user

Enabling efficient IO throughput

db_block_size

db_file_multiblock_read_count

disk_asynch_io
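Before changing any of the parameters named in this section, it can help to see what the instance is currently using and whether each value is still the default. A minimal sketch, assuming a session with access to V$PARAMETER; the list of names is simply the set of parameters mentioned in this section plus STAR_TRANSFORMATION_ENABLED from the Access Layer tips:

```sql
-- Sketch only: lists current values for the parameters discussed in this section.
SELECT name, value, isdefault
FROM   v$parameter
WHERE  name IN ('memory_target',
                'sga_target',
                'shared_pool_size',
                'pga_aggregate_target',
                'parallel_execution_message_size',
                'parallel_min_servers',
                'parallel_max_servers',
                'parallel_threads_per_cpu',
                'parallel_adaptive_multi_user',
                'cpu_count',
                'db_block_size',
                'db_file_multiblock_read_count',
                'disk_asynch_io',
                'star_transformation_enabled')
ORDER  BY name;
```

Rows with ISDEFAULT = 'FALSE' are the ones that have been set explicitly, which makes them the natural place to start when reviewing a configuration.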

Star Query

CONCLUSION

Tips for System Management
• Use Parallel Execution where appropriate
• Take hourly AWR or Statspack reports
• Use EM to do real-time system monitoring
• Use Resource Manager to ensure necessary users get high priority on the system
• Always have accurate Optimizer statistics
• Use INCREMENTAL statistic maintenance or copy_stats to keep large partitioned fact tables up to date in a timely manner
• Set only the initialization parameters that you need to

Data Warehouse Best Practices for Oracle Database 11g

September 2008

Author: Maria Colgan

Contributing Authors: Doug Cackett, George Spears, and Andrew Bond

Oracle Corporation

Copyright © 2008, Oracle. All rights reserved.

This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.