“You Can Do It” Datawarehouse:
Beginner to Advanced In Two Hours
by Arup Nanda
Manager – Database Systems
Starwood Hotels & Resorts International
White Plains, NY
Objectives
• Exploring DW Techniques in Oracle
• Case Study
• Oracle 10G Additions
Datawarehouse
[Diagram: customers Cust1 through Cust10 spread across databases DB1 through DB6, with Cust11 marked "?".]
A Real Life Case
• Claims Datawarehouse
• Several Customers/Sources
• Several Quarters
• Data Volume Was High
• Irregular Frequency
• Data Often Comes Late
• Near Real Time Requirements
Problem of Irregular Data
[Diagram: detail tables and summary tables, with customer CUST2 highlighted; the summary tables are refreshed with DBMS_MVIEW.REFRESH (…).]
Problems
• Incoming Data Irregular
• Summary Tables Need Refreshing
• Quarters Added Continuously
• Archival Requirements Vary Across Customers
• Quick Retrieval of Archival Needed
Problems contd.
• Summary on Summary Tables as Materialized Views
• Need Refresh Whenever New Data Arrives
• Or When Data is Purged/Reinstated
• Customers Added and Deleted Frequently
Objective
• To Minimize Downtime for Refreshes
  – Incrementally Refresh
  – Partitioning Techniques
• To Add Customers Easily
• To Add Quarters Easily
• To Archive Off and Purge Easily and Atomically
• To Restore Archives Quickly
Objective contd.
• To Have an ETL Setup for Easy Addition of Objects Such As Tables, Indexes, Mat Views
• Use Only Available Oracle and Unix Tools
  – PL/SQL
  – Unix Shell Scripts
  – SQL*Plus
Design
• Varying Dimensions
  – Customer
  – Quarter
• Composite Partitioning
  – Range (for Quarters)
  – List (for Customers)
• Local Indexes
Partitioning
• Partitioned on CLAIM_DATE
  – RANGE
  – Partitions named YyyQq
  – Storage Clauses Not Defined
• Subpartitioned on CUST_NAME
  – LIST
  – Named YyyQq_CustName, e.g. Y03Q3_CUST1
Indexing
• All Indexes Local
CREATE INDEX IN_CLAIM_SUM_01
ON SUMTAB1 (COL1, COL2)
LOCAL …
• No UNIQUE or GLOBAL Indexes
Storage
Each subpartition, of index or table, is kept in a separate tablespace named in the format
Y<Year>Q<Qtr>_<CustName>_DATA
e.g.
Y02Q2_CUST1_DATA
Y02Q2_CUST2_DATA
Y03Q3_CUST1_DATA
[Diagram: grid of Customers by Quarter. The table subpartition for Cust3, Y03 Q3 is in tablespace Y03Q3_CUST3_DATA; the matching index subpartition is in tablespace Y03Q3_CUST3_INDX.]
Tablespace
create tablespace y03q3_cust1_data
datafile '/oradata/y03q3_cust1_data_01.dbf'
size 500m
autoextend on next 500m
extent management local
segment space management auto;
Table DDL
CREATE TABLE TAB1 ( … )
PARTITION BY RANGE (CLAIM_DATE)
SUBPARTITION BY LIST (CUST_NAME)
(
  PARTITION Y03Q1 VALUES LESS THAN (TO_DATE('2003/04/01','YYYY/MM/DD'))
  (
    SUBPARTITION Y03Q1_CUST1 VALUES ('CUST1') TABLESPACE Y03Q1_CUST1_DATA,
    SUBPARTITION Y03Q1_CUST2 VALUES ('CUST2') TABLESPACE Y03Q1_CUST2_DATA,
    … and so on for all subpartitions …
    SUBPARTITION Y03Q1_DEF VALUES (DEFAULT) TABLESPACE USER_DATA
  ),
  PARTITION Y03Q2 VALUES LESS THAN (TO_DATE('2003/07/01','YYYY/MM/DD'))
  (
    SUBPARTITION Y03Q2_CUST1 VALUES ('CUST1') TABLESPACE Y03Q2_CUST1_DATA,
    SUBPARTITION Y03Q2_CUST2 VALUES ('CUST2') TABLESPACE Y03Q2_CUST2_DATA,
    … and so on for all subpartitions …
    SUBPARTITION Y03Q2_DEF VALUES (DEFAULT) TABLESPACE USER_DATA
  ),
  … and so on for all the partitions …
  PARTITION DEF VALUES LESS THAN (MAXVALUE)
  (
    SUBPARTITION DEF_CUST1 VALUES ('CUST1') TABLESPACE USER_DATA,
    SUBPARTITION DEF_CUST2 VALUES ('CUST2') TABLESPACE USER_DATA,
    … and so on for all subpartitions …
    SUBPARTITION DEF_DEF VALUES (DEFAULT) TABLESPACE USER_DATA
  )
)
Index DDL
CREATE INDEX IN_TAB1_01 ON TAB1 (COL1)
NOLOGGING
LOCAL
(
  PARTITION Y03Q1
  (
    SUBPARTITION Y03Q1_CUST1 TABLESPACE Y03Q1_CUST1_INDX,
    SUBPARTITION Y03Q1_CUST2 TABLESPACE Y03Q1_CUST2_INDX,
    … and so on for all subpartitions …
    SUBPARTITION Y03Q1_DEF TABLESPACE USER_DATA
  ),
  PARTITION Y03Q2
  (
    SUBPARTITION Y03Q2_CUST1 TABLESPACE Y03Q2_CUST1_INDX,
    SUBPARTITION Y03Q2_CUST2 TABLESPACE Y03Q2_CUST2_INDX,
    … and so on for all subpartitions …
    SUBPARTITION Y03Q2_DEF TABLESPACE USER_DATA
  ),
  … and so on for all the partitions …
  PARTITION DEF
  (
    SUBPARTITION DEF_CUST1 TABLESPACE USER_DATA,
    SUBPARTITION DEF_CUST2 TABLESPACE USER_DATA,
    … and so on for all subpartitions …
    SUBPARTITION DEF_DEF TABLESPACE USER_DATA
  )
)
Creating DDLs
[Diagram: the DDL to create a table is assembled from two pieces.]
Static Part: create table tab1 (………)
Variable Part: partition y03q1 (subpartition y03q1_cust1 tablespace …)
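The variable part lends itself to generation by a small PL/SQL block. The following is only a sketch under stated assumptions: the quarter and customer lists are hard-coded, hypothetical inputs, and the output follows the index DDL layout shown earlier (subpartition name plus an _INDX tablespace). It is not the deck's actual generator.

```sql
SET SERVEROUTPUT ON

DECLARE
  -- Hypothetical inputs: the quarters and customers to generate clauses for.
  TYPE name_list IS TABLE OF VARCHAR2(30);
  quarters  name_list := name_list('Y03Q1', 'Y03Q2');
  customers name_list := name_list('CUST1', 'CUST2');
BEGIN
  FOR q IN 1 .. quarters.COUNT LOOP
    DBMS_OUTPUT.PUT_LINE('partition ' || quarters(q) || ' (');
    -- One subpartition clause per customer, named YyyQq_CustName.
    FOR c IN 1 .. customers.COUNT LOOP
      DBMS_OUTPUT.PUT_LINE(
        '  subpartition ' || quarters(q) || '_' || customers(c) ||
        ' tablespace '    || quarters(q) || '_' || customers(c) || '_INDX,');
    END LOOP;
    -- The default subpartition goes last and ends the list without a comma.
    DBMS_OUTPUT.PUT_LINE(
      '  subpartition ' || quarters(q) || '_DEF tablespace USER_DATA');
    -- Separate partitions with a comma, except after the last one.
    DBMS_OUTPUT.PUT_LINE(
      CASE WHEN q < quarters.COUNT THEN '),' ELSE ')' END);
  END LOOP;
END;
/
```

Spooling this output from SQL*Plus and concatenating it after the static part would give a complete statement; in practice the two lists would more likely come from a driving table than from literals.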
Constraints
Constraints defined as DISABLE NOVALIDATE RELY
ALTER TABLE … ADD CONSTRAINT … RELY DISABLE NOVALIDATE;
Constraint
• VALIDATE/NOVALIDATE
  – Table TAB1 (Column: STATUS)
  – Current Values A, I, F
  – Check Constraint: STATUS IN ('A','I')
• ENABLE/DISABLE
  – New Value 'F'
• RELY
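The three settings can be tried one at a time on the STATUS example. This is a sketch: the constraint name ck_tab1_status is hypothetical, and TAB1 is assumed to already hold rows with STATUS values A, I and F.

```sql
-- VALIDATE vs NOVALIDATE: existing rows include 'F', so a validated
-- constraint could not be created. NOVALIDATE skips the check of old rows;
-- ENABLE still checks every new or updated row.
ALTER TABLE tab1
  ADD CONSTRAINT ck_tab1_status CHECK (status IN ('A', 'I'))
  ENABLE NOVALIDATE;

-- ENABLE vs DISABLE: once disabled, new rows are no longer checked,
-- so a new row with STATUS = 'F' is accepted.
ALTER TABLE tab1
  MODIFY CONSTRAINT ck_tab1_status DISABLE NOVALIDATE;

-- RELY: Oracle does not enforce the rule, but the optimizer is told
-- it may treat the constraint as true.
ALTER TABLE tab1
  MODIFY CONSTRAINT ck_tab1_status RELY;
```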
RELY
Reasons
• To Include Relation Information in the Metadata
• To Enable Query Rewrite
Summary Tab and View
Summary Table (On DW)
CUST_NAME
CLAIM_DATE
PROVIDER_ID
NUM_CLAIMS
NUM_LINES
View (On Source)
SELECT
'CUST1' AS CUST_NAME,
CLAIM_DATE,
PROVIDER_ID,
COUNT(DISTINCT CLAIM_ID) AS NUM_CLAIMS,
COUNT(*) AS NUM_LINES
FROM ….
GROUP BY …
Casting
SELECT CAST (CUST_NAME AS VARCHAR2(20)) AS CUST_NAME
FROM <viewname>
CAST (column_name AS datatype (precision))
[Diagram: for customer Cust1 and quarter Q1, a view owned by the customer schema on database cust1 is read with the filter "where CLAIM_DATE is in that quarter" into a temporary table on the DW. The temporary table is massaged, indexed and analyzed, alongside the TABLE and INDEX of the summary table.]
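[Diagram: on the DW, the old TABLE and INDEX subpartitions are swapped with the temporary table and its index, which were loaded from the view on the customer database.]
ALTER TABLE … EXCHANGE SUBPARTITION subpartname
WITH TABLE temptable
INCLUDING INDEXES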
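With names filled in, the swap might look like the sketch below. SUMMTAB1, Y03Q1_CUST1 and T1_Y03Q1_CUST1 follow the naming used elsewhere in the deck but are assumptions here, and WITHOUT VALIDATION is an optional clause the slide does not mention.

```sql
-- Swap the freshly loaded temp table into the subpartition.
-- INCLUDING INDEXES also swaps the matching local index subpartitions,
-- so the temp table needs indexes equivalent to the table's local indexes.
ALTER TABLE summtab1
  EXCHANGE SUBPARTITION y03q1_cust1
  WITH TABLE t1_y03q1_cust1
  INCLUDING INDEXES
  WITHOUT VALIDATION;   -- assumption: rows were already filtered on load

-- Quick check that the new rows are now visible in the subpartition.
SELECT COUNT(*) FROM summtab1 SUBPARTITION (y03q1_cust1);
```

After the exchange the temp table holds the old rows, so it can be dropped or kept as a fallback copy.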
Technique
• Not Using DBMS_MVIEW.REFRESH
• MV is always STALE
Temp Table
CREATE TABLE T1_Y03Q1_CUST1
TABLESPACE Y03Q1_CUST1_DATA
PARALLEL 8 NOLOGGING AS
SELECT …
FROM CUST1.VIEW1@DB1
WHERE CLAIM_DATE >= add_months(trunc(to_date('03','RR'),'YYYY'), 3*(to_number('1')-1))
and batch_date < last_day(add_months(trunc(to_date('03','RR'),'YYYY'), 3*(to_number('1')) - 1)) + 1
Script
CREATE TABLE T1_Y&&YY.Q&&Q._&&CUST
TABLESPACE Y&&YY.Q&&Q._&&CUST._DATA
PARALLEL 8 NOLOGGING AS
SELECT …
FROM &&CUST..VIEW1@&&DBLINK
WHERE CLAIM_DATE >= ADD_MONTHS(TRUNC(TO_DATE('&&YY','RR'),'YYYY'), 3*(TO_NUMBER('&&Q')-1))
AND BATCH_DATE < LAST_DAY(ADD_MONTHS(TRUNC(TO_DATE('&&YY','RR'),'YYYY'), 3*(TO_NUMBER('&&Q')) - 1)) + 1
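The two date expressions are easier to follow when evaluated on their own. The query below is a worked check for YY = 03 and Q = 1, using the literal values in place of the substitution variables.

```sql
-- Lower bound: first day of the year plus 3*(Q-1) months.
-- Upper bound: last day of the quarter's third month, plus one day.
SELECT TO_CHAR(
         ADD_MONTHS(TRUNC(TO_DATE('03', 'RR'), 'YYYY'),
                    3 * (TO_NUMBER('1') - 1)),
         'YYYY/MM/DD') AS quarter_start,
       TO_CHAR(
         LAST_DAY(ADD_MONTHS(TRUNC(TO_DATE('03', 'RR'), 'YYYY'),
                             3 * TO_NUMBER('1') - 1)) + 1,
         'YYYY/MM/DD') AS quarter_end
FROM   dual;
```

Assuming the session date is in a year for which the RR format maps 03 to 2003, this returns 2003/01/01 and 2003/04/01. The second value is the same bound used for partition Y03Q1 in the table DDL, so the filter selects exactly the rows that belong in that partition.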
External Table
Reason
• Source is a non-Oracle DB, e.g. DB2
• Source is External, no DB Link Allowed
Fixed Format vs. Delimited
Fixed Format
• Faster, Easier
• More Space
Delimited
• Less Space
• Slower, Slightly More Complex
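A delimited external table could be declared along the following lines. Everything named here is hypothetical: the directory path, the file name, the three columns and the date mask are assumptions chosen to resemble the claims summary, not details from the deck.

```sql
-- Directory object pointing at the folder where the source drops its files.
CREATE OR REPLACE DIRECTORY ext_dir AS '/data/incoming';

-- Comma-delimited flat file exposed as a read-only table.
CREATE TABLE ext_claims_cust1 (
  claim_id     NUMBER,
  claim_date   DATE,
  provider_id  NUMBER
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY ext_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    MISSING FIELD VALUES ARE NULL
    (
      claim_id,
      claim_date  CHAR(10) DATE_FORMAT DATE MASK "YYYY/MM/DD",
      provider_id
    )
  )
  LOCATION ('claims_cust1.csv')
)
REJECT LIMIT UNLIMITED;
```

Once defined, the external table can stand in for the remote view in the CREATE TABLE … AS SELECT that builds the temporary table.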
Massaging
• Removing NOT NULL Constraints
• Making Datatypes Consistent
  – The CAST operation converts NUMBER(m,n) to NUMBER
  – cast(col1 as number(10,2)) as col1_m
  – COL1 NUMBER(5,2)
  – COL1_M NUMBER
Analyzing
• Using DBMS_STATS.GATHER_TABLE_STATS
• PARALLEL Degree
dbms_stats.gather_table_stats (
  ownname          => 'DWOWNER',
  tabname          => '&&TABNAME',
  estimate_percent => dbms_stats.auto_sample_size,
  method_opt       => 'FOR ALL INDEXED COLUMNS SIZE AUTO',
  degree           => dbms_stats.default_degree,
  cascade          => TRUE );
Mat Views
MVs Created as Tables
CREATE TABLE MV_SUMMTAB1
Storage clauses just like the underlying table
CREATE MATERIALIZED VIEW MV_SUMMTAB1
ON PREBUILT TABLE
AS SELECT ……
http://www.proligence.com/painless_alter.pdf
Query Rewrite
Table SUM_CLAIMS
PROVIDER_ID, STATE, TYPE, TOT_AMT
Table MV_SUM_CLAIMS
PROVIDER_ID, STATE, SUM(TOT_AMT) TOT_AMT
GROUP BY PROVIDER_ID, STATE
SELECT SUM(TOT_AMT) FROM SUM_CLAIMS
is rewritten as
SELECT SUM(TOT_AMT) FROM MV_SUM_CLAIMS
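Put together with the prebuilt-table approach, the SUM_CLAIMS example might be set up as sketched below. The column names come from the slide; WITH REDUCED PRECISION and ENABLE QUERY REWRITE are standard clauses chosen here as assumptions, since the slide does not show the full statement.

```sql
-- Step 1: build the summary as an ordinary table.
CREATE TABLE mv_sum_claims AS
SELECT provider_id, state, SUM(tot_amt) AS tot_amt
FROM   sum_claims
GROUP  BY provider_id, state;

-- Step 2: register a materialized view of the same name over that table.
-- No data is copied; Oracle only records the defining query.
CREATE MATERIALIZED VIEW mv_sum_claims
ON PREBUILT TABLE WITH REDUCED PRECISION
ENABLE QUERY REWRITE
AS
SELECT provider_id, state, SUM(tot_amt) AS tot_amt
FROM   sum_claims
GROUP  BY provider_id, state;

-- A query against the detail table that the optimizer may now answer
-- by rolling up the much smaller MV_SUM_CLAIMS.
SELECT SUM(tot_amt) FROM sum_claims;
```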
Query Rewrite
Init.ora Parameters
query_rewrite_enabled='TRUE'
query_rewrite_integrity='STALE_TOLERATED'
ENFORCED – Rewrite only if guaranteed
TRUSTED – Rewrite using constraints only if RELY
STALE_TOLERATED – Rewrite even if the MV is stale
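The same two parameters can also be set for a single session, which is handy for testing before init.ora is changed. A minimal sketch for SQL*Plus:

```sql
-- Session-level equivalents of the init.ora settings.
ALTER SESSION SET query_rewrite_enabled = TRUE;
ALTER SESSION SET query_rewrite_integrity = STALE_TOLERATED;

-- SQL*Plus command: lists every parameter whose name contains the string.
SHOW PARAMETER query_rewrite
```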
Checking QR
dbms_mview.explain_rewrite ('select cust_name, count(*) from summtab1 group by cust_name');
select message from rewrite_table;
QSM-01033: query rewritten with materialized view, MV_SUMMTAB1
QSM-01101: rollup(s) took place on mv, MV_SUMMTAB1
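A complete check might look like the sketch below. It assumes REWRITE_TABLE has already been created in the current schema with the utlxrw.sql script shipped under the Oracle home; the statement id qr_check_1 is a hypothetical label used to pick out this run.

```sql
-- One-time setup, run from SQL*Plus:  @?/rdbms/admin/utlxrw.sql

BEGIN
  DBMS_MVIEW.EXPLAIN_REWRITE(
    query        => 'select cust_name, count(*) from summtab1 group by cust_name',
    mv           => 'MV_SUMMTAB1',
    statement_id => 'qr_check_1');
END;
/

-- Messages explain whether the rewrite happened and, if not, why.
SELECT message
FROM   rewrite_table
WHERE  statement_id = 'qr_check_1'
ORDER  BY sequence;
```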
Design …
MV_* subpartitions are in the same tablespaces as their parents' subpartitions.
Subparts of MV_SUMMTAB1_0? are in the same TS as those of SUMMTAB1
Subparts of MV_SUMMTAB2_0? are in the same TS as those of SUMMTAB2
[Diagram: grid of Customer name by Quarter; the matching subpartitions of PARENT, MV1 and MV2 sit together in one tablespace (TableSpace1, TableSpace2).]
MV and Parents
• Partition Pruning
• Partition-wise Joins
• Partition Independence
Adding Quarters/Customers
• Partition
  – Default Partition – VALUES LESS THAN (MAXVALUE)
• Subpartition
  – Default Subpartition – VALUES (DEFAULT)
[Diagram: grid of subpartitions, customers Cust1, Cust2, Cust3 and DEF against quarters Qtr1, Qtr2, Qtr3 and DEF.]
Adding a customer: Cust4 is carved out of the DEF subpartition
alter table … split subpartition
Adding a quarter: Qtr4 is carved out of the DEF partition
alter table … split partition
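Spelled out against the TAB1 layout shown earlier, the two splits could look like this sketch. CUST4, Y03Q4 and the tablespace Y03Q1_CUST4_DATA are hypothetical additions that follow the deck's naming convention.

```sql
-- New customer: carve CUST4 out of the default subpartition of one quarter.
-- The statement would be repeated for each quarter partition.
ALTER TABLE tab1
  SPLIT SUBPARTITION y03q1_def VALUES ('CUST4')
  INTO (SUBPARTITION y03q1_cust4 TABLESPACE y03q1_cust4_data,
        SUBPARTITION y03q1_def);

-- New quarter: carve Y03Q4 out of the default (MAXVALUE) partition.
ALTER TABLE tab1
  SPLIT PARTITION def AT (TO_DATE('2004/01/01', 'YYYY/MM/DD'))
  INTO (PARTITION y03q4, PARTITION def);
```

The subpartitions of the new quarter may not end up with the names or tablespaces the convention calls for, so their names and placement are worth checking after the split.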
Backup/Restore
• Backup
  – ALTER TABLESPACE <TSName> READ ONLY
  – Copy the files to tape/CD.
• Restore
  – Copy the file back into the directory
  – RECOVER TABLESPACE <TSName>
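Before copying, it helps to confirm the tablespace really is read only and to list the files that belong to it. A sketch using the data dictionary, with a hypothetical tablespace name:

```sql
ALTER TABLESPACE y03q1_cust1_data READ ONLY;

-- STATUS should now show READ ONLY.
SELECT tablespace_name, status
FROM   dba_tablespaces
WHERE  tablespace_name = 'Y03Q1_CUST1_DATA';

-- These are the files to copy to tape/CD.
SELECT file_name
FROM   dba_data_files
WHERE  tablespace_name = 'Y03Q1_CUST1_DATA';
```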
Archival/Purge
[Diagram: a table with subpartitions SP1 to SP4; subpartition SP4 is exchanged with a standalone table, Table4.]
Archival/Purge
CREATE TABLE S1_Y<yy>Q<q>_<CustName>
TABLESPACE Y<yy>Q<q>_<CustName>_<TSType>
AS SELECT * FROM SUMMTAB1 WHERE 1=2
/
CREATE INDEXES, CONSTRAINTS, etc.
/
ALTER TABLE SUMMTAB1 EXCHANGE SUBPARTITION Y<yy>Q<q>_<CustName>
WITH TABLE S1_Y<yy>Q<q>_<CustName>
INCLUDING INDEXES
/
Check TTS
ALTER TABLESPACE Y<yy>Q<q>_<CustName>_<TSType> READ ONLY;
DBMS_TTS.TRANSPORT_SET_CHECK ('<DataTS>,<IndexTS>');
SELECT * FROM TRANSPORT_SET_VIOLATIONS;
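As a concrete sketch, the check for one customer and quarter could be run as below. The tablespace names are hypothetical; the procedure takes the whole set as one comma-separated string, and incl_constraints is an optional argument set here as an assumption.

```sql
BEGIN
  DBMS_TTS.TRANSPORT_SET_CHECK(
    ts_list          => 'Y03Q1_CUST1_DATA,Y03Q1_CUST1_INDX',
    incl_constraints => TRUE);
END;
/

-- No rows returned means the set is self-contained and can be transported.
SELECT * FROM transport_set_violations;
```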
Transport TS
Export Parameter File
TRANSPORT_TABLESPACE=Y
TTS_FULL_CHECK=Y
FILE='<FileLocation>/exp<TS>.dmp'
TABLESPACES=(<DataTS>, <IndexTS>)
Copy the exp.dmp and Datafiles to tape/CD.
Purge
Drop Subpartition
Drop the Tablespace
DROP TABLESPACE <TSName> INCLUDING CONTENTS AND DATAFILES;
Restore
• ALTER TABLE SPLIT SUBPARTITION <DefaultSP>
• Copy Datafiles & Export Dump Files from CD/Tape
• Import Parameter File
TRANSPORT_TABLESPACE=Y
TABLESPACES=(<DataTS>,<IndexTS>)
DATAFILES=(…)
Minimizing Refresh Unit
• Months, Instead of Quarters, Refreshed at a Time
• Last Quarter Split into a Subpartition per Month
• Naming Convention
  – YyyQqMmm
  – Y03Q3M09
• Merge Subpartition
Merging Subpartitions
• Index Subpartitions Created in User's Default Tablespace
• Subpartition Template
ALTER TABLE SUMTAB1 SET SUBPARTITION TEMPLATE
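Whether index subpartitions landed in the wrong tablespace after a merge can be checked in the data dictionary, and a stray one moved with a rebuild. A sketch, assuming the index IN_CLAIM_SUM_01 from the Indexing slide and hypothetical subpartition and tablespace names:

```sql
-- Where did each index subpartition end up, and is it usable?
SELECT index_name, subpartition_name, tablespace_name, status
FROM   user_ind_subpartitions
WHERE  index_name = 'IN_CLAIM_SUM_01'
ORDER  BY subpartition_name;

-- Move a stray index subpartition into the tablespace the convention expects.
ALTER INDEX in_claim_sum_01
  REBUILD SUBPARTITION y03q3_cust1
  TABLESPACE y03q3_cust1_indx;
```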
Resumable Statement
• When?
  – Running Large Report Jobs
  – Creating Large Indexes
• ALTER SESSION ENABLE RESUMABLE NAME 'Job1';
• View DBA_RESUMABLE
  – NAME – Name specified in ALTER SESSION
  – COORD_SESSION_ID – Coord Session in PQ
  – SQL_TEXT – The text of the SQL
  – STATUS – RUNNING, SUSPENDED, ABORTED, ABORTING, TIMEOUT
  – ERROR_NUMBER/ERROR_MSG
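A typical session might be wrapped as sketched below. The TIMEOUT value of 7200 seconds is an assumption, and the session user needs the RESUMABLE system privilege.

```sql
-- Suspend, rather than fail, on space errors for up to two hours.
ALTER SESSION ENABLE RESUMABLE TIMEOUT 7200 NAME 'Job1';

-- … run the large index build or report here …

-- From another session: see whether the job is waiting, and on what.
SELECT name, status, error_number, error_msg, sql_text
FROM   dba_resumable
WHERE  name = 'Job1';

-- Back in the job's session, once the work is done.
ALTER SESSION DISABLE RESUMABLE;
```

If the status shows SUSPENDED, fixing the underlying problem (for example, adding a datafile) lets the statement continue from where it stopped.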
Objectives Revisited
• To Minimize Downtime for Refreshes
  – Incrementally Refresh
  – Partitioning Techniques
• To Add Customers Easily
• To Add Quarters Easily
• To Archive Off and Purge Easily and Atomically
• To Restore Archives Quickly
Oracle 10G
• Transportable Tablespaces Can Be Reinstated on a Different Operating System
  – Can be used for Restoring to a Different OS
• Tablespaces Can Be Renamed
  – Restoring Tablespace of the Same Name
• Multiple Temporary Tablespaces
  – For Large Index Creation, Sorting, etc.
Oracle 10G contd.
• Partition Change Tracking Support for List Partitioning
• Query Rewrites Can Use Multiple MVs
• OEM Shows All Partitioning Features
• Data Pump
  – Export/Import on Steroids
  – Parallel Operation
Oracle 10G contd.
• External Table Download
  – A Utility to Create File from Table Data
CREATE TABLE …
ORGANIZATION EXTERNAL
AS SELECT * FROM <a query>
  – Platform Independent File
  – Can Be Used In External Tables
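In 10g this unload is done with the ORACLE_DATAPUMP access driver. A sketch, assuming a directory object ext_dir already exists; the table name, file name and filter are hypothetical.

```sql
-- Writes the query result to a dump file in the ext_dir directory.
CREATE TABLE claims_unload_cust1
ORGANIZATION EXTERNAL (
  TYPE ORACLE_DATAPUMP
  DEFAULT DIRECTORY ext_dir
  LOCATION ('claims_y03q1_cust1.dmp')
)
AS
SELECT *
FROM   summtab1
WHERE  cust_name = 'CUST1';
```

The resulting file can be copied to another database and read there through an external table of the same type that points at it.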