Prligence Empowering Intelligence A Multi-Source Time-Variant Datawarehouse Case Study Session#...

Preview:

Citation preview

pr ligence Empowering Intelligence

AMulti-Source Time-Variant

DatawarehouseCase Study

Session# 36605

byArup Nanda

Proligence, Inc.Norwalk, CT

pr ligence Empowering Intelligence

Objectives

• Exploring DW Techniques in Oracle• Case Study• Oracle 10G Additions

pr ligence Empowering Intelligence

Datawarehouse

Cust1

Cust2

Cust3

Cust4

Cust10

Cust9

Cust8

Cust7

Cust11 ?

Cust5 Cust6

DB3 DB6DB4 DB5

DB1 DB2

pr ligence Empowering Intelligence

A Real Life Case

• Claims Datawarehouse• Several Customers/Sources• Several Quarters• Data Volume Was High• Irregular Frequency• Data Comes Often Late• Near Real Time Requirements

pr ligence Empowering Intelligence

DetailTable

DetailTable

SummaryTable

SummaryTable

DetailTable

DetailTable

CUST2

Problem of Irregular Data

DBMS_MVIEW.REFRESH (…)

pr ligence Empowering Intelligence

Problems

• Incoming Data Irregular• Summary Tables Need Refreshing• Quarters Added Continuously• Archival Requirements Vary Across Customers• Quick Retrieval of Archival Needed

pr ligence Empowering Intelligence

Problems contd.

• Summary on Summary Tables as Materialized Views

• Need Refresh Whenever New Data Arrives• Or When Data is Purged/Reinstated• Customers Added and Deleted Frequently

pr ligence Empowering Intelligence

Objective

• To Minimize Downtime for Refreshes– Incrementally Refresh– Partitioning Techniques

• To Add Customers Easily• To Add Quarters Easily• To Archive Off and Purge Easily and Atomically• To Restore Archives Quickly

pr ligence Empowering Intelligence

Objective contd.

• To have an ETL Setup for Easy Addition of Objects Such As Tables, Indexes, Mat Views.

• Use Only Available Oracle and Unix Tools– PL/SQL– Unix Shell Scripts– SQL*Plus

pr ligence Empowering Intelligence

Design

• Varying Dimensions – – Customer – Quarter

• Composite Partitioning– Range (for Quarters)– List (for Customers)

• Local Indexes

pr ligence Empowering Intelligence

Partitioning

• Partitioned on CLAIM_DATE– RANGE– Partitioned named YyyQq– Storage Clauses Not Defined

• Supartitioned on CUST_NAME– LIST– Named YyyQq_CustName, e.g. Y03Q3_CUST1

pr ligence Empowering Intelligence

Indexing

• All Indexes LocalCREATE INDEX IN_CLAIM_SUM_01

LOCAL

ON SUMTAB1 (COL1, COL2)…

• No Indexes UNIQUE and GLOBAL

pr ligence Empowering Intelligence

Storage

Each Subpartition – of Index or Table is kept in separate tablespaces named in the format

Y<Year>Q<Qtr>_<CustName>_DATAe.g. Y02Q2_CUST1_DATA Y02Q2_CUST2_DATA

Y03Q3_CUST1_DATA

pr ligence Empowering Intelligence

Quarter

Cust

omer

s Cust3Y03 Q3

Table

In Tablespace Y03Q3_CUST3_DATA

Cust3Y03 Q3

Index

In Tablespace Y03Q3_CUST3_INDX

pr ligence Empowering Intelligence

Tablespace

create tablespace y03q3_cust1_data

datafile ‘/oradata/y03q3_cust1_data_01.dbf’

size 500m

autoextend on next 500m

extent management local

segment space management auto

pr ligence Empowering Intelligence

Table DDLCREATE TABLE TAB1( … )PARTITION BY RANGE (CLAIM_DATE)SUBPARTITION BY LIST (CUST_NAME) ( PARTITION Y03Q1 VALUES LESS THAN (TO_DATE(‘2003/04/01’,’YYYY/MM/DD’)), ( SUBPARTITION Y03Q1_CUST1 VALUES (‘CUST1’) TABLESPACE Y03Q1_CUST1_DATA, SUBPARTITION Y03Q1_CUST2 VALUES (‘CUST2’) TABLESPACE Y03Q1_CUST2_DATA, … and so on for all subpartitions … SUBPARTITION Y03Q1_DEF VALUES (DEFAULT) TABLESPACE USER_DATA ), PARTITION Y03Q2 VALUES LESS THAN (TO_DATE(‘2003/07/01’,’YYYY/MM/DD’)), ( SUBPARTITION Y03Q2_CUST1 VALUES (‘CUST1’) TABLESPACE Y03Q2_CUST1_DATA, SUBPARTITION Y03Q2_CUST2 VALUES (‘CUST2’) TABLESPACE Y03Q2_CUST2_DATA, … and so on for all subpartitions … SUBPARTITION Y03Q2_DEF VALUES (DEFAULT) TABLESPACE USER_DATA ), … and so on for all the partitions … PARTITION DEF VALUES LESS THAN (MAXVALUE), ( SUBPARTITION DEF_CUST1 VALUES (‘CUST1’) TABLESPACE USER_DATA, SUBPARTITION DEF_CUST2 VALUES (‘CUST2’) TABLESPACE USER_DATA, … and so on for all subpartitions … SUBPARTITION DEF_DEF VALUES (DEFAULT) TABLESPACE USER_DATA ))

pr ligence Empowering Intelligence

Index DDLCREATE INDEX IN_TAB1_01 ON TAB1 (COL1)LOCAL NOLOGGING( PARTITION Y03Q1 ( SUBPARTITION Y03Q1_CUST1 TABLESPACE Y03Q1_CUST1_INDX, SUBPARTITION Y03Q1_CUST2 TABLESPACE Y03Q1_CUST2_INDX, … and so on for all subpartitions … SUBPARTITION Y03Q1_DEF TABLESPACE USER_DATA ), PARTITION Y03Q2 ( SUBPARTITION Y03Q2_CUST1 TABLESPACE Y03Q2_CUST1_INDX, SUBPARTITION Y03Q2_CUST2 TABLESPACE Y03Q2_CUST2_INDX, … and so on for all subpartitions … SUBPARTITION Y03Q2_DEF TABLESPACE USER_DATA ), … and so on for all the partitions … PARTITION DEF ( SUBPARTITION DEF_CUST1 TABLESPACE USER_DATA, SUBPARTITION DEF_CUST2 TABLESPACE USER_DATA, … and so on for all subpartitions … SUBPARTITION DEF_DEF TABLESPACE USER_DATA ))

pr ligence Empowering Intelligence

Creating DDLs

StaticPart

StaticPart

VariablePart

VariablePart

create table tab1(………)

partition y03q1 (subpartition y03q1_cust1 tablespace …)

DDL toCreateTable

DDL toCreateTable

pr ligence Empowering Intelligence

Constraints

Constraints defined asDISABLE NOVALIDATE RELY

ALTER TABLE … ADD CONSTRAINT …RELY DISABLE NOVALIDATE;

pr ligence Empowering Intelligence

Constraint

• VALIDATE/NOVALIDATE– Table TAB1 (Column: STATUS)– Current Values A, I, F– Check Constraint: STATUS IN (‘A’,’I’)

• ENABLE/DISABLE– New Value ‘F’

• RELY

pr ligence Empowering Intelligence

RELY

Reasons• To Include Relation Information to the

Metadata• To Enable Query Rewrite

pr ligence Empowering Intelligence

Summary Tab and View

Summary Table

CUST_NAMECLAIM_DATEPROVIDER_IDNUM_CLAIMSNUM_LINES

ViewSELECT‘CUST1’ AS CUST_NAME,CLAIM_DATE,PROVIDER_ID,COUNT(DISTINCT CLAIM_ID) AS NUM_CLAIMS,COUNT(*) AS NUM_LINESFROM ….GROUP BY …

On Source

On DW

pr ligence Empowering Intelligence

Casting

SELECT CAST (CUST_NAME AS VARCHAR2(20))AS CUST_NAMEFROM <viewname>

CAST (column_name AS datatype (precision))

pr ligence Empowering Intelligence

cust1

SummaryTable

SummaryTable

DW

ViewView

Owned byCust Schema

TemporaryTable

Index ofTemporary

Table

Massaging

Analyzing

Filter:Where CLAIM_DATE

is in that quarter

INDEX

TABLE

For Customer Cust1 and Quarter Q1

pr ligence Empowering Intelligence

cust

DW

ViewView

Old Sub Partition

Old Sub Partition

INDEX

TABLE

ALTER TABLE … EXCHANGE SUBPARTITION subpartnameWITH TEMPTABLEINCLUDING INDEXES

pr ligence Empowering Intelligence

Technique

• Not Using DBMS_MVIEW.REFRESH• MV is always STALE

pr ligence Empowering Intelligence

Temp TableCREATE TABLE T1_Y03Q1_CUST1TABLESPACE Y03Q1_CUST1_DATAPARALLEL 8 NOLOGGING ASSELECT …FROM CUST1.VIEW1@DB1WHERE CLAIM_DATE >=add_months(trunc(to_date(‘03','RR'),'YYYY'), 3*(to_number(‘1')-1))and batch_date < last_day(add_months(trunc( to_date(‘03','RR'),'YYYY'), 3*(to_number(‘1')) - 1 )) + 1

pr ligence Empowering Intelligence

ScriptCREATE TABLE T1_Y&&YY.Q&&Q._&&CUSTTABLESPACE Y&&YY.Q&&Q._&&CUST._DATAPARALLEL 8 NOLOGGING ASSELECT …FROM &&CUST..VIEW1@&&DBLINKWHERE CLAIM_DATE >=ADD_MONTHS(TRUNC(TO_DATE('&&YY','RR'),'YYYY'), 3*(TO_NUMBER('&&Q')-1))AND BATCH_DATE < LAST_DAY(ADD_MONTHS(TRUNC( TO_DATE('&&YY','RR'),'YYYY'), 3*(TO_NUMBER('&&Q')) -1 )) + 1

pr ligence Empowering Intelligence

External Table

ReasonSource is a non-Oracle DB, e.g. DB2Source is External, no DB Link Allowed

Fixed Format –vs- DelimitedFixed Format

Faster, EasierMore Space

DelimitedLess SpaceSlower, Slightly More Complex

pr ligence Empowering Intelligence

Massaging

• Removing NOT NULL Constraints• Making Datatypes Consistent

– The CAST operation converts NUMBER(m,n) to NUMBER

– cast(col1 as number(10,2)) as col1_m

– COL1 NUMBER(5,2)– COL1_M NUMBER

pr ligence Empowering Intelligence

Analyzing• Using DBMS_STATS.GATHER_TABLE_STATS• PARALLEL Degree

dbms_stats.gather_table_stats ( ownname => ‘DWOWNER', tabname => '&&TABNAME', estimate_percent => dbms_stats.auto_sample_size, method_opt => 'FOR ALL INDEXED COLUMNS SIZE AUTO', degree => dbms_stats.default_degree, cascade => TRUE );

pr ligence Empowering Intelligence

Mat Views

MVs Created as TablesCREATE TABLE MV_SUMMTAB1

Storage clauses just like the underlying tableCREATE MATERIALIZED VIEW

MV_SUMMTAB1

ON PREBUILT TABLE

AS SELECT ……

http://www.proligence.com/painless_alter.pdf

pr ligence Empowering Intelligence

Query Rewrite

Table SUM_CLAIMS PROVIDER_ID, STATE, TYPE, TOT_AMT

Table MV_SUM_CLAIMS PROVIDER_ID, STATE, SUM(TOT_AMT) TOT_AMTGROUP BY PROVIDER_ID, STATE

SELECT SUM(TOT_AMT) FROM SUM_CLAIMSSELECT SUM(TOT_AMT) FROM

MV_SUM_CLAIMS

pr ligence Empowering Intelligence

Query Rewrite

Init.ora Parametersquery_rewrite_enabled='TRUE'query_rewrite_integrity='STALE_TOLERATED‘

ENFORCED – Rewrite only if guaranteedTRUSTED – Uses only if RELYSTALE_TOLERATED – Even if not RELY

pr ligence Empowering Intelligence

Checking QR

dbms_mview.explain_rewrite (‘select cust_name, count(*) from summtab1 group by

cust_name’ );

select message from rewrite_table;

QSM-01033: query rewritten with materialized view, MV_SUMMTAB1

QSM-01101: rollup(s) took place on mv, MV_SUMMTAB1

pr ligence Empowering Intelligence

Design …

MV_* subpartitions are on the same tablespace as the parents.

Subparts of MV_SUMMTAB1_0? are in the same TS as SUMMTAB1

Subparts of MV_SUMMTAB2_0? in SUMMTAB2

pr ligence Empowering Intelligence

Custom

er na

me Quarter

PARENT

MV1

MV2

TableSpace1 TableSpace2

pr ligence Empowering Intelligence

MV and Parents

• Partition Pruning• Partition-wise Joins• Partition Independence

pr ligence Empowering Intelligence

Adding Quarters/Customers

• Partition– Default Partition – VALUES LESS THAN

(MAXVALUE)• Subpartition

– Default Subpartition – VALUES (DEFAULT)

pr ligence Empowering Intelligence

Qtr1 Qtr2 Qtr3 DEF

Cust1

Cust2

Cust3

DEF

pr ligence Empowering Intelligence

Qtr1 Qtr2 Qtr3 DEF

Cust1

Cust2

Cust3

DEF

pr ligence Empowering Intelligence

Qtr1 Qtr2 Qtr3 DEF

Cust1

Cust2

Cust3

DEF

Cust4

alter table … split subpartition

pr ligence Empowering Intelligence

Qtr1 Qtr2 Qtr3 DEF

Cust1

Cust2

Cust3

DEF

pr ligence Empowering Intelligence

Qtr1 Qtr2 Qtr3 Qtr4

Cust1

Cust2

Cust3

DEF

DEF

alter table … split partition

pr ligence Empowering Intelligence

Backup/Restore

• Backup– ALTER TABLESPACE <TSName> READ ONLY– Copy the files to tape/CD.

• Restore– Copy the file back into the directory– ALTER TABLESPACE <TSName> RECOVER

pr ligence Empowering Intelligence

Archival/Purge

SP1 SP2 SP3Table

SP4

SP1 SP2 SP3Table

Table4

SP1 SP2 SP3 SP4Table Table4

Table4

pr ligence Empowering Intelligence

Archival/PurgeCREATE TABLE S1_Y<yy>Q<q>_<CustName>TABLESPACE Y<yy>Q<q>_<CustName>_<TSType> AS SELECT * FROM SUMMTAB1 WHERE 1=2/CREATE INDEXES, CONSTRAINTS, etc./ALTER TABLE SUMMTAB1 EXCHANGE SUBPARTITION Y<yy>Q<q>_<CustName> WITH TABLE Y<yy>Q<q>_<CustName> INCLUDING INDEXES/

pr ligence Empowering Intelligence

Check TTS

ALTER TABLESPACE Y<yy>Q<q>_<CustName>_<TSType> READ ONLY;

DBMS_TTS.TRANSPORT_SET_CHECK ( <DataTS>,<IndexTS>) ;

SELECT * FROM TRANSPORT_SET_VIOLATIONS;

pr ligence Empowering Intelligence

Transport TS

Export Parameter FileTRANSPORT_TABLESPACE=y TTS_FULLCHECK=YFILE=‘<FileLocation>/exp<TS>.dmp’TABLESPACES=(<DataTS>, <IndexTS>)

Copy the exp.dmp and Datafiles to tape/CD.

pr ligence Empowering Intelligence

Purge

Drop SubpartitionDrop the Tablespace

DROP TABLESPACE <TSName> INCLUDING CONTENTS AND DATAFILES;

pr ligence Empowering Intelligence

Restore

• ALTER TABLE SPLIT SUBPARTITION <DefaultSP>

• Copy Datafiles & Export Dump Files from CD/Tape

• Import Parameter FileTRANSPORT_TABLESPACES=YTABLESPACES=(<DataTS>,<IndexTS>)DATAFILES=(…)

pr ligence Empowering Intelligence

Minimizing Refresh Unit

• Months – instead of quarters refreshed at a time.• Last Quarter Split into a Subpartition per Month• Naming Convention

– YyyQqMmm– Y03Q3M09

• Merge Subpartition

pr ligence Empowering Intelligence

Merging Subpartitions

• Index Subpartitions Created in User’s Default Tablespace

• Subpartition TemplateALTER TABLE SUMTAB1 ADD SUBPARTITION

TEMPLATE

pr ligence Empowering Intelligence

Resumable Statement• When?

– Running Large Report Jobs– Creating Large Indexes

• ALTER SESSION ENABLE RESUMABLE NAME ‘Job1’;• View DBA_RESUMABLE

– NAME – Name specified in ALTER SESSION– COORD_SESSION_ID – Coord Session in PQ– SQL_TEXT – The text of the SQL– STATUS - RUNNING, SUSPENDED, ABORTED, ABORTING,

TIMEOUT – ERROR_NUMBER/ERROR_MSG

pr ligence Empowering Intelligence

Objectives Revisited

• To Minimize Downtime for Refreshes– Incrementally Refresh– Partitioning Techniques

• To Add Customers Easily• To Add Quarters Easily• To Archive Off and Purge Easily and Atomically• To Restore Archives Quickly

pr ligence Empowering Intelligence

Oracle 10G

• Transportable Tablespaces Can Be Reinstated At a Different Operating System– Can be used for Restoring to a Different OS

• Tablespaces Can Be Renamed– Restoring Tablespace of the Same Name

• Multiple Temporary Tablespace– For Large Index Creation, Sorting, etc.

pr ligence Empowering Intelligence

Oracle 10G contd.

• Partition Change Tracking Support for List Partitioning

• Query Rewrites Can Use Multiple MVs• OEM Shows All Partitioning Features• Data Pump

– Export/Import on Steroids– Parallel Operation

pr ligence Empowering Intelligence

Oracle 10G contd.

• External Table Download– A Utility to Create File from Table Data

CREATE TABLE …

ORGANIZATION EXTERNAL

AS SELECT * FROM <a query>

– Platform Independent File– Can Be Used In External Tables

pr ligence Empowering Intelligence

Thank You!

Please Submit ReviewsSession# 36605

www.proligence.cowww.proligence.comm

Recommended