Best Partitioning

  • Get the best out of Oracle Partitioning

    Yasin Mohammed Technology [email protected]

    Nirmal Grewal Technology Sales [email protected]

  • Agenda

    Partitioning in a nutshell

    Getting optimal pruning

    Partition exchange loading

    Partitioning and unusable indexes

    Efficient statistics management

    Q&A

  • The Concept of Partitioning

    Simple Yet Powerful

    Large Table

    Difficult to Manage

    Partition

    Divide and Conquer

    Easier to Manage

    Improve Performance

    Composite Partition

    Better Performance

    More flexibility to match business needs

    Transparent to applications

  • What is Oracle Partitioning?

    It is

    Powerful functionality to logically partition objects into smaller pieces

    Only driven by business requirements

    Partitioning for Performance, Manageability, and Availability

    It is not

    Just a way to physically divide, or clump, any large data set into smaller buckets

    An enabling prerequisite to support a specific hardware/software design

    Hash mandatory for shared nothing systems

  • Physical versus Logical Partitioning

    Shared Nothing Architecture

    Physical Partitioning

    Fundamental system setup requirement

    Node owns piece of DB

    Enables parallelism

    Number of partitions is equivalent to min. parallelism

    Always needs HASH distribution

    Equally sized partitions per node required for proper load balancing

    [Diagram: three separate DB pieces, one per node]

  • Physical versus Logical Partitioning

    Shared Everything Architecture - Oracle

    Logical Partitioning

    Is not subject to any constraints

    SMP, MPP, Cluster, Grid: it does not matter

    Purely based on the business requirement

    Availability, Manageability, Performance

    Beneficial for every environment

    Provides the most comprehensive functionality

    [Diagram: a single shared DB]

  • Partition Pruning

    Sales Table, one partition per day: May 18th 2008, May 19th 2008, May 20th 2008, May 21st 2008, May 22nd 2008, May 23rd 2008, May 24th 2008

    Q: What were the total sales for May 20 - 22, 2008?

    Select sum(sales_amount)
    From SALES
    Where sales_date >= to_date('05/20/2008','MM/DD/YYYY')
    And sales_date < to_date('05/23/2008','MM/DD/YYYY');

    Only the 3 relevant partitions are accessed
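
    A minimal sketch of a table layout that would allow this kind of pruning. The daily partitions mirror the Sales Table diagram; the column list, the partition names other than May_24_2008, and the half-open date predicate are assumptions made for illustration.

```sql
-- Hypothetical SALES table, range-partitioned by day on sales_date.
-- Each partition holds rows strictly below its upper bound.
CREATE TABLE sales (
  sales_id     NUMBER,
  sales_date   DATE NOT NULL,
  sales_amount NUMBER(10,2)
)
PARTITION BY RANGE (sales_date) (
  PARTITION may_18_2008 VALUES LESS THAN (TO_DATE('05/19/2008','MM/DD/YYYY')),
  PARTITION may_19_2008 VALUES LESS THAN (TO_DATE('05/20/2008','MM/DD/YYYY')),
  PARTITION may_20_2008 VALUES LESS THAN (TO_DATE('05/21/2008','MM/DD/YYYY')),
  PARTITION may_21_2008 VALUES LESS THAN (TO_DATE('05/22/2008','MM/DD/YYYY')),
  PARTITION may_22_2008 VALUES LESS THAN (TO_DATE('05/23/2008','MM/DD/YYYY')),
  PARTITION may_23_2008 VALUES LESS THAN (TO_DATE('05/24/2008','MM/DD/YYYY')),
  PARTITION may_24_2008 VALUES LESS THAN (TO_DATE('05/25/2008','MM/DD/YYYY'))
);

-- A half-open range on the partition key covers May 20, 21 and 22 only,
-- so the optimizer can skip the other four partitions.
SELECT SUM(sales_amount)
FROM   sales
WHERE  sales_date >= TO_DATE('05/20/2008','MM/DD/YYYY')
AND    sales_date <  TO_DATE('05/23/2008','MM/DD/YYYY');
```

    Note that a closed BETWEEN ending at midnight on May 23 would also touch the May 23 partition, which is why the sketch uses a strict upper bound.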

  • Partition Pruning

    Works for simple and complex SQL statements

    Support for every data access

    Transparent to any application

    No extra coding required

    Two flavors of pruning

    Static pruning at compile time

    Dynamic pruning at runtime

    Complementary to Exadata Storage Server

    Partitioning prunes logically through partition elimination

    Exadata prunes physically through storage indexes

    Further data reduction through filtering and projection

  • Static Partition Pruning

    Relevant partitions are known at compile time

    Look for actual values in the PSTART/PSTOP columns in the plan

    Optimizer has the most accurate information for the SQL statement

    SELECT sum(amount_sold) FROM sales
    WHERE time_id
    BETWEEN TO_DATE('01-MAR-2004','DD-MON-YYYY') AND TO_DATE('31-MAY-2004','DD-MON-YYYY');

    Sales partitions: 04-Jan, 04-Feb, 04-Mar, 04-Apr, 04-May, 04-Jun
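
    One way to check for static pruning is to explain the statement and read the Pstart/Pstop columns. The sketch below assumes a SALES table range-partitioned by month on time_id and the default PLAN_TABLE; the 'BASIC +PARTITION' format string only trims the output to the operations plus the partition columns.

```sql
-- Explain the statement without executing it.
EXPLAIN PLAN FOR
SELECT SUM(amount_sold)
FROM   sales
WHERE  time_id BETWEEN TO_DATE('01-MAR-2004','DD-MON-YYYY')
                   AND TO_DATE('31-MAY-2004','DD-MON-YYYY');

-- Show the plan for the last explained statement.
-- With static pruning, Pstart and Pstop contain partition numbers.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, NULL, 'BASIC +PARTITION'));
```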

  • Static Pruning

    Sample plan

  • Dynamic Partition Pruning

    Advanced pruning mechanism for complex queries

    Recursive statement evaluates the relevant partitions at runtime

    Look for the word KEY in the PSTART/PSTOP columns in the plan

    SELECT sum(amount_sold)
    FROM sales s, times t
    WHERE t.time_id = s.time_id
    AND t.calendar_month_desc IN
    ('MAR-2004', 'APR-2004', 'MAY-2004');

    [Diagram: Time table joined to Sales partitions 04-Jan, 04-Feb, 04-Mar, 04-Apr, 04-May, 04-Jun]
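
    A hedged sketch of how the KEY marker can be looked for in practice, assuming SALES is partitioned on time_id and joined to a TIMES dimension as on the slide. The month literals are copied from the slide and may not match the format stored in a real TIMES table.

```sql
-- The filter is on the dimension table, so the matching partitions of SALES
-- are only known once the TIMES rows have been evaluated at runtime.
EXPLAIN PLAN FOR
SELECT SUM(s.amount_sold)
FROM   sales s, times t
WHERE  t.time_id = s.time_id
AND    t.calendar_month_desc IN ('MAR-2004', 'APR-2004', 'MAY-2004');

-- With dynamic pruning, Pstart/Pstop show a marker instead of numbers.
-- Depending on the plan chosen this may be KEY, KEY(SQ) for subquery pruning,
-- or a :BF... name for Bloom filter pruning.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY(NULL, NULL, 'BASIC +PARTITION'));
```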

  • Sample explain plan output

    Dynamic Partition Pruning

    Nested Loop

    Sample plan

  • Sample plan

    Dynamic Partition Pruning

    Subquery pruning

  • Sample plan

    Dynamic Partition Pruning

    Bloom filter pruning

  • Enhanced Pruning Capabilities

    Oracle Database 11g Release 2

    Extended modeling capabilities for better data placement and pruning

    Support for virtual columns as primary and foreign key for Reference Partitioning

    Enhanced optimizer support for Partitioning

    AND pruning

    Intelligent multi-branch execution plan with unusable index partitions

  • AND Pruning

    All predicates on the partition key will be used for pruning

    Dynamic and static predicates will now be used in combination

    A.k.a. multi-predicate pruning

    Example:

    Star transformation with pruning predicate on both the FACT table and a dimension

    FROM sales s, times t
    WHERE s.time_id = t.time_id ..
    AND t.fiscal_year in (2000,1999)                  -- dynamic pruning
    AND s.time_id
    between TO_DATE('01-JAN-1999','DD-MON-YYYY')
    and TO_DATE('01-JAN-2000','DD-MON-YYYY')          -- static pruning

  • AND Pruning

    Sample plan

  • Ensuring Partition Pruning

    Don't use functions on partition key filter predicates
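
    To illustrate the point, here is a small sketch contrasting a predicate that wraps the partition key in a function with an equivalent range predicate on the bare column. It assumes a SALES table partitioned on a DATE column time_id; the month chosen is arbitrary.

```sql
-- Function applied to the partition key: the optimizer typically cannot map
-- the predicate to partition boundaries, so all partitions are scanned.
SELECT SUM(amount_sold)
FROM   sales
WHERE  TO_CHAR(time_id, 'YYYY-MM') = '2004-03';

-- Same question asked on the bare partition key: the range can be compared
-- with the partition bounds, so only the matching partition(s) are read.
SELECT SUM(amount_sold)
FROM   sales
WHERE  time_id >= TO_DATE('01-MAR-2004','DD-MON-YYYY')
AND    time_id <  TO_DATE('01-APR-2004','DD-MON-YYYY');
```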

  • Partition Exchange Loading

    Sales Table, one partition per day: May 18th 2008, May 19th 2008, May 20th 2008, May 21st 2008, May 22nd 2008, May 23rd 2008, May 24th 2008

    DBA steps:

    1. Create external table for flat files

    2. Use CTAS command to create non-partitioned table TMP_SALES

    3. Create indexes on TMP_SALES

    4. Alter table Sales exchange partition May_24_2008 with table tmp_sales

    5. Collect stats

    Sales table now has all the data
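
    The five steps could look roughly like the sketch below. Everything except the names SALES, TMP_SALES and May_24_2008 is an assumption: the directory object load_dir, the file name, the column list, the index name, and the premise that SALES already has a matching local index on sales_date (needed for INCLUDING INDEXES to succeed).

```sql
-- 1. External table over the flat file (hypothetical directory and file).
CREATE TABLE sales_ext (
  sales_id     NUMBER,
  sales_date   DATE,
  sales_amount NUMBER(10,2)
)
ORGANIZATION EXTERNAL (
  TYPE ORACLE_LOADER
  DEFAULT DIRECTORY load_dir
  ACCESS PARAMETERS (
    RECORDS DELIMITED BY NEWLINE
    FIELDS TERMINATED BY ','
    (sales_id,
     sales_date CHAR(10) DATE_FORMAT DATE MASK "MM/DD/YYYY",
     sales_amount)
  )
  LOCATION ('sales_may_24_2008.csv')
)
REJECT LIMIT UNLIMITED;

-- 2. CTAS into a non-partitioned staging table.
CREATE TABLE tmp_sales NOLOGGING AS
SELECT * FROM sales_ext;

-- 3. Index the staging table to match the local index on SALES.
CREATE INDEX tmp_sales_date_idx ON tmp_sales (sales_date);

-- 4. Swap the staging table with the empty target partition.
--    This is a data dictionary operation; no rows are moved.
ALTER TABLE sales
  EXCHANGE PARTITION may_24_2008 WITH TABLE tmp_sales
  INCLUDING INDEXES
  WITHOUT VALIDATION;

-- 5. Gather statistics for the newly loaded partition.
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'SALES', partname => 'MAY_24_2008');
```

    WITHOUT VALIDATION skips the check that every row belongs in the partition, so it is only appropriate when the load file is known to contain data for that day alone.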

  • Unusable Indexes

    Unusable index partitions are commonly used in environments with fast load requirements

    Saves the time for index maintenance at data insertion

    Unusable index segments do not consume any space (11.2)

    Unusable indexes are ignored by the optimizer

    SKIP_UNUSABLE_INDEXES = [TRUE | FALSE]

    Partitioned indexes can be used by the optimizer even if some partitions are unusable

    Prior to 11.2, static pruning and access of only usable index partitions were mandatory

    With 11.2, intelligent rewrite of queries using UNION ALL
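
    A minimal sketch of the load pattern described above, assuming a local index named sales_date_idx on SALES whose index partitions carry the same names as the table partitions. Both names are hypothetical.

```sql
-- Mark one index partition unusable before a bulk load,
-- so inserts into that partition skip index maintenance.
ALTER INDEX sales_date_idx MODIFY PARTITION may_24_2008 UNUSABLE;

-- Let statements in this session ignore unusable indexes
-- instead of raising an error.
ALTER SESSION SET skip_unusable_indexes = TRUE;

-- ... bulk load into partition may_24_2008 happens here ...

-- Rebuild only the affected index partition after the load.
ALTER INDEX sales_date_idx REBUILD PARTITION may_24_2008;

-- Check which index partitions are currently usable.
SELECT partition_name, status
FROM   user_ind_partitions
WHERE  index_name = 'SALES_DATE_IDX'
ORDER  BY partition_position;
```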

  • Intelligent Multi-Branch Execution

    Intelligent UNION ALL expansion in the presence of partially unusable indexes

    Transparent internal rewrite

    Usable index partitions will be used

    Full partition access for unusable index partitions
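
    The internal rewrite is transparent, but its effect can be pictured with a hand-written equivalent. The sketch below is only an illustration of the idea, not the optimizer's actual transformation. It assumes a hypothetical cust_id column with a local index whose partition for may_24_2008 is unusable while the one for may_23_2008 is usable.

```sql
-- Conceptual picture of multi-branch execution, written by hand.
SELECT SUM(sales_amount)
FROM (
  -- Branch 1: partition with a usable index partition, eligible for index access.
  SELECT sales_amount
  FROM   sales PARTITION (may_23_2008)
  WHERE  cust_id = 42
  UNION ALL
  -- Branch 2: partition whose index partition is unusable, read by full partition scan.
  SELECT sales_amount
  FROM   sales PARTITION (may_24_2008)
  WHERE  cust_id = 42
);
```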

  • Multi-Branch Execution

    Sample plan

  • Statistics Gathering

    You must gather optimizer statistics

    Using dynamic sampling is not an adequate solution

    Statistics on global and partition level recommended

    Run all queries against empty tables to populate column usage

    This helps identify which columns automatically get histograms created on them

    Optimizer statistics should be gathered after the data has been loaded but before any indexes are created

    Oracle will automatically gather statistics for indexes as they are being created

  • Efficient Statistics Management

    Use AUTO_SAMPLE_SIZE

    The only setting that enables new efficient statistics collection

    Hash based algorithm, scanning the whole table

    Speed of sampling, accuracy of compute

    Enable incremental global statistics collection

    Avoids scan of all partitions after changing single partitions

    Prior to 11.1, scan of all partitions necessary for global stats

    Managed on per table level

    Static setting
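
    As a sketch, a gather call that spells out the two settings discussed here. SH.SALES is the sample table used elsewhere in this deck; both parameter values shown are the defaults in 11g, so passing them explicitly is for illustration only.

```sql
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => 'SH',
    tabname          => 'SALES',
    -- Let Oracle choose the sample size (hash-based algorithm in 11g).
    estimate_percent => DBMS_STATS.AUTO_SAMPLE_SIZE,
    -- Gather at the levels Oracle deems necessary for a partitioned table.
    granularity      => 'AUTO');
END;
/
```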

  • Incremental Global Statistics

    Sales Table partitions: May 18th 2008, May 19th 2008, May 20th 2008, May 21st 2008, May 22nd 2008, May 23rd 2008

    1. Partition level stats are gathered & synopsis created (stored in the Sysaux Tablespace)

    2. Global stats generated by aggregating partition synopsis

  • Incremental Global Statistics Contd

    Sales Table partitions: May 18th 2008, May 19th 2008, May 20th 2008, May 21st 2008, May 22nd 2008, May 23rd 2008, May 24th 2008 (new)

    3. A new partition (May 24th 2008) is added to the table & data is loaded

    4. Gather partition statistics for new partition

    5. Retrieve synopsis for each of the other partitions from Sysaux

    6. Global stats generated by aggregating the original partition synopsis with the new one

  • Steps necessary to gather accurate statistics

    Turn on incremental feature for the table

    EXEC DBMS_STATS.SET_TABLE_PREFS('SH','SALES','INCREMENTAL','TRUE');

    After load gather table statistics using GATHER_TABLE_STATS

    No need to specify parameters

    EXEC DBMS_STATS.GATHER_TABLE_STATS('SH','SALES');

    The command will collect statistics for partitions and update the global statistics based on the partition level statistics and synopsis

    Possible to set incremental to true for all tables

    Only works for already existing tables

    EXEC DBMS_STATS.SET_GLOBAL_PREFS('INCREMENTAL','TRUE');

    Partition Advisor SQL Access Advisor
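
    A short, hedged sketch for checking that the incremental preference took effect and that only the changed partition was re-analyzed. It assumes the SH.SALES table from the commands above and a user with access to the ALL_TAB_STATISTICS view. In 11g, incremental maintenance also relies on PUBLISH being TRUE and on the default AUTO_SAMPLE_SIZE and AUTO granularity settings.

```sql
-- Confirm the table-level preference.
SELECT DBMS_STATS.GET_PREFS('INCREMENTAL', 'SH', 'SALES') AS incremental
FROM   dual;

-- After a load and GATHER_TABLE_STATS, compare LAST_ANALYZED per partition:
-- untouched partitions should keep their earlier timestamp.
SELECT partition_name, num_rows, last_analyzed
FROM   all_tab_statistics
WHERE  owner = 'SH'
AND    table_name = 'SALES'
ORDER  BY partition_position;
```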

  • Summary

    Partitioning in a nutshell

    Getting optimal pruning

    Partition exchange loading

    Partitioning and unusable indexes

    Efficient statistics management

    Demo (Performance & availability)

  • Partitioning Demonstration

    Date 16/03/2010

    Data Partitioning provides significant Service Benefits

  • Scenario Partitioning for Reliability

    Two interactive scenarios will demonstrate:

    Query performance between Partitioned vs Non-Partitioned data.

    Query resilience against unanticipated events affecting data availability.

  • Demo Data Overview : Sales Information

    Below tables hold same sales information, with different storage structures

    Size: 5,513,058 sales entries

    Table: SALES_p1

    Partitioning Scheme used: Initially Partitioned into yearly, half-yearly, and quarterly periods

    Further Partitioned by country regions

    Table: SALES_nop1

    For comparison purposes, a similar non-partitioned table is created.

  • Demo Data Overview : Customer Information

    Below tables hold same customer information, with different storage structures

    Size: 832,500 customer entries

    Table: CUSTOMER_p1

    Partitioning Scheme used: Partitioned by country regions

    Table: CUSTOMER_nop1

    For comparison purposes, a similar table is created as non-partitioned.

  • Demo Distribution of Customers across Countries

    [Bar chart: COUNT(C.CUST_ID) per country, axis from 0 to 300,000]

    Countries shown: Brazil, Denmark, Poland, South Africa, China, United Kingdom, New Zealand, Saudi Arabia, United States of America, Germany, Spain, France, Australia, Canada, Singapore, Argentina, Italy, Japan, Turkey

  • Demonstration Infrastructure

    Equipment: Amazon Cloud-based Virtual Machine Image (AMI)

    2 Core, 1.7 GB Memory

    Oracle Linux OS (OEL 5)

    Software: Configuration of Public Amazon AMI, Oracle 11g Enterprise Edition (v11.1.0.7)

    One disk, /u02 (dev8-2), dedicated to Oracle storage I/O for benchmark accuracy.

    Oracle Sample Data (i.e. SH repository) installed and extended for demo purposes.

  • Scenario 1 Key Performance Benefits

    Areas highlighted in red show the resulting overhead of accessing normal table structures.

    Areas highlighted in green reveal the reduced overhead of accessing partitions instead of the whole table.

    Note: The graph details were generated by sar directives from a VMWare image installed on a notebook. The Demo will use an Amazon AMI.

  • Scenario 2 Key Availability Benefits

    Non Partitioned Table

    For a particular date range, i.e. 1999 - 2000, relevant non-partitioned tables are not reachable, resulting in an error.

    Database files that hold data involved in the queries below have been accidentally removed!

    Partitioned Table

    As expected, for a particular date range, i.e. 1999 - 2000, data in a partitioned table is not reachable, resulting in an error.

    Note: sales_nop2 data resides in the example tablespace, which in turn references datafile example01.dbf.

  • Scenario 2 Key Availability Benefits (Cont.)

    Partitioned Table

    For a more current date range, i.e. 2000 - 2001, data is now reachable as a result of using partitions to better isolate against data disruptions.

    related to database file: example01_01.dbf

    related to missing file: example02_01.dbf

    related to database file: example03_01.dbf

    Non Partitioned Table

    For a more current date range, i.e. 2000 - 2001, data is still not reachable, resulting in an error.

  • Demo Summary

    Performance improvement - Scenario 1

    Up to 3 times faster than traditional methods of data retrieval.

    Availability Enhancement - Scenario 2

    Limits detrimental effects of data access failures.

    Data Partitioning provides significant Service Benefits

    In General

    Improves system scalability and manageability.

    Adds a data-level layer of protection to any High Availability strategy.

  • Next Steps

    Upcoming Webinars

    More coming soon!!!

    http://otn.oracle.com/database

    Follow OracleDirect ANZ on Twitter at

    http://www.twitter.com/OracleDirectANZ

    Or our NEW!!! blog http://blogs.oracle.com/techtalk
