Best Practices for a Data Warehouse on Oracle Database 11g Page 5
BALANCED CONFIGURATION
[Figure: balanced configuration. Four pairs of host bus adapters (HBA1, HBA2) connect through two Fibre Channel switches (FC-Switch1, FC-Switch2) to eight disk arrays (DiskArray 1 to DiskArray 8).]
Disk Layout
Tips for a balanced system
Total throughput = number of cores x 200 MB/s
Use 1 HBA port per CPU
Use 1 disk controller per HBA port
Use a maximum of 10 physical disks per controller
Use more, smaller drives (146 GB or 300 GB)
Use a minimum of 4 GB of memory per core
Use RAID 1 with ASM
Interconnect bandwidth = I/O subsystem bandwidth
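The sizing rules in the tips above are simple arithmetic, so they can be sketched as a small calculator. This is a minimal sketch, not a sizing tool: the function and field names are hypothetical, "CPU" is assumed to mean a socket that holds one or more cores, and the 200 MB figure is read as MB per second per core.

```python
from dataclasses import dataclass

# Constants taken from the tips; everything else here is illustrative.
MB_PER_SEC_PER_CORE = 200       # assumed to mean MB per second per core
MAX_DISKS_PER_CONTROLLER = 10
MIN_MEMORY_GB_PER_CORE = 4


@dataclass
class BalancedSizing:
    throughput_mb_per_sec: int
    hba_ports: int
    disk_controllers: int
    max_physical_disks: int
    min_memory_gb: int
    interconnect_mb_per_sec: int


def size_balanced_system(cpus: int, cores_per_cpu: int) -> BalancedSizing:
    """Apply the balanced-system tips to a hypothetical server."""
    if cpus < 1 or cores_per_cpu < 1:
        raise ValueError("cpus and cores_per_cpu must be positive")
    cores = cpus * cores_per_cpu
    throughput = cores * MB_PER_SEC_PER_CORE
    hba_ports = cpus                  # 1 HBA port per CPU (socket assumed)
    controllers = hba_ports           # 1 disk controller per HBA port
    return BalancedSizing(
        throughput_mb_per_sec=throughput,
        hba_ports=hba_ports,
        disk_controllers=controllers,
        max_physical_disks=controllers * MAX_DISKS_PER_CONTROLLER,
        min_memory_gb=cores * MIN_MEMORY_GB_PER_CORE,
        interconnect_mb_per_sec=throughput,  # interconnect matches I/O bandwidth
    )


# Hypothetical server: 4 CPUs with 2 cores each.
example = size_balanced_system(cpus=4, cores_per_cpu=2)
print(example)
```

If "CPU" in the tips is meant as a core rather than a socket, pass the core count as `cpus` with `cores_per_cpu=1`.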
[Figure: disk layout. Sixteen physical disks (Disk1 to Disk16) and eight LUNs (LUN A1 to LUN A8); the ASM stripe runs across LUNs 1 to 8, and LUN A1 to LUN A8 together form the ASM disk group.]
Efficient Data Loading
External Tables
-- External table over the flat file sales_data_for_january.dat
CREATE TABLE ext_tab_for_sales_data (
  Price    NUMBER(6),
  Quantity NUMBER(6),
  Time_id  DATE,
  Cust_id  NUMBER(12),
  Prod_id  NUMBER(12))
ORGANIZATION EXTERNAL
( TYPE oracle_loader
  DEFAULT DIRECTORY admin
  ACCESS PARAMETERS
  ( RECORDS DELIMITED BY newline
    BADFILE 'ulcase1.bad'
    LOGFILE 'ulcase1.log'
    FIELDS TERMINATED BY ","
    ( Price    INTEGER EXTERNAL(6),
      Quantity INTEGER EXTERNAL(6),
      Time_id  DATE ) )
  LOCATION ('sales_data_for_january.dat') )
REJECT LIMIT UNLIMITED;
Insert into Sales partition(p2)
Select * From ext_tab_for_sales_data;
Direct Path Load
Insert /*+ APPEND */ into Sales partition(p2)
Select * From ext_tab_for_sales_data;
ALTER SESSION ENABLE PARALLEL DML;
Insert /*+ APPEND */ into Sales partition(p2)
Select * from ext_tab_for_sales_data;
Partition exchange loads
[Figure: partition exchange load. The DBA loads new data into the non-partitioned table Tmp_sales and exchanges it into the Sales table, which has daily partitions for May 18th 2008 through May 24th 2008.]
1. Create external table for flat files
2. Use CTAS command to create non-partitioned table TMP_SALES
3. Create indexes
4. Gather statistics
5. Alter table Sales exchange partition May_24_2008 with table tmp_sales
Sales table now has all the data
Alter table Sales exchange partition p2 with
table tmp_sales including indexes without
validation;
Data Compression
Foundation layer - Third Normal Form
Tips for the Staging Layer
Use external tables
Load using parallel DML statements, CTAS (create table as select), or IAS (insert as select)
Use data compression
Consider range partitioning the fact table to enable partition exchange loads
Optimizing 3NF
Partitioning for manageability
Alter table <table_name> drop partition <part_name>
Partitioning for easier data access
[Figure: partition pruning on the Sales table, which has daily partitions for May 18th 2008 through May 24th 2008]
Q: What was the total sales for the weekend of May 20 - 22 2008?
Select sum(sales_amount)
From SALES
Where sales_date between
to_date('05/20/2008','MM/DD/YYYY')
And
to_date('05/23/2008','MM/DD/YYYY');
Only the 3 relevant partitions are accessed
Partitioning for join performance
Select sum(sales_amount)
From
SALES s, CUSTOMER c
Where s.cust_id = c.cust_id;
Both tables have the same degree of parallelism and are partitioned the same way on the join column (cust_id)
[Figure: the Sales and Customer tables, each shown with range partition May 18th 2008 divided into Sub part 1 to Sub part 4]
A large join is divided into multiple smaller joins, each of which joins a pair of partitions; the smaller joins run in parallel
ID | Operation             | Name      | Pstart | Pstop | TQ    | PQ Distrib
0  | SELECT STATEMENT      |           |        |       |       |
1  | PX COORDINATOR        |           |        |       |       |
2  | PX SEND QC (RANDOM)   | :TQ10001  |        |       | Q1,01 | QC (RAND)
3  | SORT GROUP BY         |           |        |       | Q1,01 |
4  | PX RECEIVE            |           |        |       | Q1,01 |
5  | PX SEND HASH          | :TQ10000  |        |       | Q1,00 | HASH
6  | SORT GROUP BY         |           |        |       | Q1,00 |
7  | PX PARTITION HASH ALL |           | 1      | 128   | Q1,00 |
8  | HASH JOIN             |           |        |       | Q1,00 |
9  | TABLE ACCESS FULL     | Customers | 1      | 128   | Q1,00 |
10 | TABLE ACCESS FULL     | Sales     | 1      | 128   | Q1,00 |
PX PARTITION HASH ALL above the join and a single PQ set indicate a partition-wise join
Select sum(sales_amount)
From
SALES s, CUSTOMER c
Where s.cust_id = c.cust_id;
Only the Sales table is hash partitioned on the cust_id column
[Figure: the Sales table, shown with range partition May 18th 2008 divided into Sub part 1 to Sub part 4, and the Customer table, shown as Sub part 1 to Sub part 4]
Rows from Customer are dynamically redistributed on the join key cust_id to enable a partition-wise join
Access layer - Star Schema
Tips for the Foundation Layer
• Partition fact tables to get partition pruning
• Sub-partition by hash to achieve partition-wise joins
• Use parallel execution to allow multiple processes to work on large queries
• The number of sub-partitions should be a power of 2 and greater than or equal to the DOP (degree of parallelism)
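The last tip reduces to picking the smallest power of 2 that is not below the DOP. A minimal sketch of that rule follows; the function names are hypothetical and the DOP values are illustrative only.

```python
def is_power_of_two(n: int) -> bool:
    """True when n is 1, 2, 4, 8, ..."""
    return n > 0 and (n & (n - 1)) == 0


def recommended_subpartitions(dop: int) -> int:
    """Smallest power of 2 that is greater than or equal to the DOP."""
    if dop < 1:
        raise ValueError("DOP must be at least 1")
    count = 1
    while count < dop:
        count *= 2
    return count


def satisfies_tip(subpartitions: int, dop: int) -> bool:
    """Check an existing sub-partition count against the tip."""
    return is_power_of_two(subpartitions) and subpartitions >= dop


# Illustrative DOP values.
for dop in (8, 12, 48):
    print(dop, recommended_subpartitions(dop))
```

Any larger power of 2 also satisfies the tip; the function only returns the smallest one.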
Select SUM(s.quantity_sold) total, p.product, t.month
From Sales s, Customers c, Products p, Times t
Where s.cust_id = c.cust_id
And s.prod_id = p.prod_id
And s.time_id = t.time_id
And c.cust_city = 'BOSTON'
And p.product = 'UMBRELLA'
And t.month = 'MAY'
And t.year = 2008
Group by p.product, t.month;
Q: What was the total number of umbrellas sold in Boston during the month of May 2008?
Optimizing Star Queries
Select SUM(s.quantity_sold), p.product
From Sales s, Customers c, Products p, Times t
Where s.cust_id = c.cust_id
And s.prod_id = p.prod_id
And s.time_id = t.time_id
And c.cust_city = 'BOSTON'
And p.product = 'UMBRELLA'
And t.month = 'MAY'
And t.year = 2008
Group by p.product;
Select SUM(quantity_sold)
From Sales s
Where s.cust_id IN
(Select c.cust_id From Customers c
Where c.cust_city = 'BOSTON')
And s.prod_id IN
(Select p.prod_id From Products p
Where p.product = 'UMBRELLA')
And s.time_id IN
(Select t.time_id From Times t
Where t.month = 'MAY' And t.year = 2008);
Step 1: Oracle rewrites (transforms) the query to retrieve only the necessary rows from the fact table, using bitmap indexes on the foreign key columns
Step 2: Oracle joins the rows from the fact table to the dimension tables
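Step 1 can be pictured with a toy model: one bitmap per foreign-key value, merged per dimension and then ANDed across dimensions. The sketch below is only an illustration of that idea, not Oracle's implementation; the table contents, key values, and function names are all made up, and Python integers stand in for bitmaps.

```python
def build_bitmap_index(rows, column):
    """Map each distinct value of `column` to an int whose bit i is set
    when fact row i holds that value."""
    index = {}
    for row_number, row in enumerate(rows):
        key = row[column]
        index[key] = index.get(key, 0) | (1 << row_number)
    return index


def bitmap_merge(index, keys):
    """OR together the bitmaps of every key returned by a dimension filter."""
    merged = 0
    for key in keys:
        merged |= index.get(key, 0)
    return merged


def bitmap_to_row_numbers(bitmap):
    """Convert a bitmap back to the list of fact row numbers it marks."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical fact rows and dimension filter results.
sales = [
    {"cust_id": 1, "prod_id": 10, "time_id": 100, "quantity_sold": 2},
    {"cust_id": 2, "prod_id": 10, "time_id": 100, "quantity_sold": 5},
    {"cust_id": 1, "prod_id": 11, "time_id": 100, "quantity_sold": 7},
    {"cust_id": 1, "prod_id": 10, "time_id": 101, "quantity_sold": 3},
    {"cust_id": 3, "prod_id": 10, "time_id": 200, "quantity_sold": 4},
]
boston_customers = {1, 3}      # cust_id values where cust_city = 'BOSTON'
umbrella_products = {10}       # prod_id values where product = 'UMBRELLA'
may_2008_times = {100, 101}    # time_id values for May 2008

indexes = {
    column: build_bitmap_index(sales, column)
    for column in ("cust_id", "prod_id", "time_id")
}

# AND the merged bitmaps so only rows passing all three filters remain.
matching = (
    bitmap_merge(indexes["cust_id"], boston_customers)
    & bitmap_merge(indexes["prod_id"], umbrella_products)
    & bitmap_merge(indexes["time_id"], may_2008_times)
)
matching_rows = bitmap_to_row_numbers(matching)
total_quantity = sum(sales[i]["quantity_sold"] for i in matching_rows)
print(matching_rows, total_quantity)
```

Only the rows that survive the AND are read from the fact table, which is why the transformation pays off when the dimension filters are selective.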
Tips for the Access Layer
• Partition fact tables to get partition pruning
• Create bitmap indexes on all FK columns
• Set STAR_TRANSFORMATION_ENABLED to TRUE
ID | Operation                          | Name           | Rows  | Pstart | Pstop
0  | SELECT STATEMENT                   |                | 1     |        |
1  | SORT GROUP BY NOSORT               |                | 1     |        |
2  | HASH JOIN                          |                | 3     |        |
3  | TABLE ACCESS FULL                  | PRODUCTS       | 2     |        |
4  | HASH JOIN                          |                | 1     |        |
5  | TABLE ACCESS FULL                  | TIMES          | 1     |        |
6  | PARTITION RANGE SUBQUERY           |                | 44144 | 1      | 16
7  | TABLE ACCESS BY LOCAL INDEX ROWID  | SALES          | 44144 | 1      | 16
8  | BITMAP CONVERSION TO ROWIDS        |                |       |        |
9  | BITMAP AND                         |                |       |        |
10 | BITMAP MERGE                       |                |       |        |
11 | BITMAP KEY ITERATION               |                |       |        |
12 | BUFFER SORT                        |                |       |        |
13 | TABLE ACCESS FULL                  | TIMES          | 1     |        |
14 | BITMAP INDEX RANGE SCAN            | SALES_TIME_BIX |       | 1      | 16
15 | BITMAP MERGE                       |                |       |        |
16 | BITMAP KEY ITERATION               |                |       |        |
17 | BUFFER SORT                        |                |       |        |
18 | TABLE ACCESS FULL                  | CUSTOMERS      | 1     |        |
19 | BITMAP INDEX RANGE SCAN            | SALES_TIME_BIX |       | 1      | 16
20 | BITMAP MERGE                       |                |       |        |
21 | BITMAP KEY ITERATION               |                |       |        |
22 | BUFFER SORT                        |                |       |        |
23 | TABLE ACCESS FULL                  | PRODUCTS       | 2     |        |
24 | BITMAP INDEX RANGE SCAN            | SALES_PROD_BIX |       | 1      | 16
[Figure labels on the plan: Phase 1, Phase 2]
SYSTEM MANAGEMENT
Workload Management
[Figure: the query coordinator (QC) and the parallel servers, showing the QC connection, the parallel server connections, and the messages passed between them]
The QC is the user session that initiates the parallel SQL statement; it distributes the work to the parallel servers.
Parallel servers are individual sessions that perform work in parallel. They are allocated from a pool of globally available parallel server processes and assigned to a given operation.
Parallel servers communicate among themselves and with the QC using messages that are passed via memory buffers in the shared pool.
ID | Operation           | Name      | TQ    | IN-OUT | PQ Dist
0  | SELECT STATEMENT    |           |       |        |
1  | PX COORDINATOR      |           |       |        |
2  | PX SEND QC (RANDOM) |           | Q1,01 | P->S   |
3  | HASH JOIN           |           | Q1,01 | PCWP   |
4  | PX RECEIVE          |           | Q1,01 | PCWP   |
5  | PX SEND BROADCAST   |           | Q1,01 | P->P   | BROADCAST
6  | PX BLOCK ITERATOR   |           | Q1,01 | PCWP   |
7  | TABLE ACCESS FULL   | CUSTOMERS | Q1,01 | PCWP   |
8  | PX BLOCK ITERATOR   |           | Q1,01 | PCWP   |
9  | TABLE ACCESS FULL   | SALES     | Q1,01 | PCWP   |
[Figure labels on the plan: Query Coordinator (PX COORDINATOR); Parallel Servers do the majority of the work (the operations below it)]
parallel_max_servers, parallel_min_servers
Whether or not to use cross instance parallel execution in RAC
instance_groups, parallel_instance_group
Using Instance Groups to control Parallel Execution in RAC
instance_groups, parallel_instance_group, instance_groups
Extract from the init.ora file
sid[1].INSTANCE_GROUPS=ETL
sid[2].INSTANCE_GROUPS=ETL
sid[3].INSTANCE_GROUPS=ADHOC
sid[4].INSTANCE_GROUPS=ADHOC
sid[1].PARALLEL_INSTANCE_GROUP=ETL
sid[2].PARALLEL_INSTANCE_GROUP=ETL
sid[3].PARALLEL_INSTANCE_GROUP=ADHOC
[Figure labels: ETL, Ad-Hoc queries]
Using services to control Parallel Execution in RAC
srvctl
srvctl add service -d database_name -s ETL -r sid1,sid2
srvctl add service -d database_name -s ADHOC -r sid3,sid4
Workload Monitoring
Optimizer Statistics Management
INCREMENTAL, TRUE, DBMS_STATS, GRANULARITY, AUTO
SQL> exec dbms_stats.set_table_prefs('SH', 'SALES', 'INCREMENTAL', 'TRUE');
SQL> exec dbms_stats.gather_table_stats(
[Figure: incremental statistics on the Sales table. The daily partitions May 18th 2008 through May 23rd 2008 each have a synopsis (S1 to S6) stored in the Sysaux tablespace, from which the global statistic is derived.]
1. Partition-level statistics are gathered and a synopsis is created for each partition
2. Global statistics are generated by aggregating the partition synopses
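The two steps above can be illustrated with a toy model in which each partition keeps a synopsis of the distinct values it contains, and the global number of distinct values comes from merging synopses rather than rescanning every partition. This is a simplified sketch: a plain Python set stands in for the synopsis, and the class, method, and partition names are hypothetical.

```python
class IncrementalColumnStats:
    """Toy model of incremental statistics for a single column."""

    def __init__(self):
        # partition name -> synopsis (here simply the set of distinct values)
        self.synopses = {}

    def gather_partition(self, partition_name, values):
        """Step 1: scan one partition and store (or refresh) its synopsis."""
        self.synopses[partition_name] = set(values)

    def partition_distinct_count(self, partition_name):
        """Partition-level number of distinct values."""
        return len(self.synopses[partition_name])

    def global_distinct_count(self):
        """Step 2: merge the stored synopses; no partition is rescanned."""
        return len(set().union(*self.synopses.values()))


# Hypothetical cust_id values in two existing daily partitions.
stats = IncrementalColumnStats()
stats.gather_partition("may_18_2008", [1, 2, 3, 3])
stats.gather_partition("may_19_2008", [2, 3, 4])
ndv_before = stats.global_distinct_count()

# A new day arrives: only the new partition is scanned.
stats.gather_partition("may_20_2008", [4, 5])
ndv_after = stats.global_distinct_count()
print(ndv_before, ndv_after)
```

The point of the model is the cost profile: refreshing global statistics after a load touches only the changed partition plus the stored synopses.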
Frequency of statistics collection
Initialization Parameter
Memory allocation
shared_pool_size
parallel_min_servers, shared_pool_size, sga_target, memory_target
pga_aggregate_target, parallel_max_servers
Controlling Parallel Execution
parallel_execution_message_size
parallel_min_servers
parallel_max_servers
cpu_count * parallel_threads_per_cpu
parallel_adaptive_multi_user
Enabling efficient IO throughput
db_block_size
db_file_multiblock_read_count
disk_async_io
Star Query
CONCLUSION
Tips for System Management
• Use Parallel Execution where appropriate
• Take hourly AWR or Statspack reports
• Use EM to do real-time system monitoring
• Use Resource Manager to ensure necessary users get high priority on the system
• Always have accurate Optimizer statistics
• Use INCREMENTAL statistics maintenance or copy_stats to keep statistics on large partitioned fact tables up to date in a timely manner
• Set only the initialization parameters that you need to
Data Warehouse Best Practices for Oracle Database 11g
September 2008
Author: Maria Colgan
Contributing Authors: Doug Cackett, George Spears, and Andrew Bond
Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.
Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
oracle.com
Copyright © 2008, Oracle. All rights reserved.
This document is provided for information purposes only and the
contents hereof are subject to change without notice.
This document is not warranted to be error-free, nor subject to any
other warranties or conditions, whether expressed orally or implied
in law, including implied warranties and conditions of merchantability
or fitness for a particular purpose. We specifically disclaim any
liability with respect to this document and no contractual obligations
are formed either directly or indirectly by this document. This document
may not be reproduced or transmitted in any form or by any means,
electronic or mechanical, for any purpose, without our prior written permission.
Oracle is a registered trademark of Oracle Corporation and/or its affiliates.
Other names may be trademarks of their respective owners.