Page 1: Scaling Data

Copyright © 2003, SAS Institute Inc. All rights reserved. SAS is a registered trademark or trademark of SAS Institute Inc. in the USA and other countries. ® indicates USA registration. Other brand and product names are registered trademarks or trademarks of their respective companies.

Scaling SAS® Data Access to Oracle® RDBMS

Howard Plemmons, SAS Institute Inc.
Andrew Holdsworth, Oracle Corporation

Page 2: Scaling Data

Scaling

What is Scaling?

Page 3: Scaling Data

Scaling

“To remove the scales of a fish”

“To climb up by means of a scaling ladder”

“To reach the highest point”

Data

Page 4: Scaling Data

Scaling Data

Why Scale to Data

Page 5: Scaling Data

Scaling Data

SAS tools, SAS/ACCESS®

SAS Procedures and Processes

Oracle tools

Oracle Procedures and Processes

Page 6: Scaling Data

Intelligence Value Chain

Page 7: Scaling Data

Intelligence Value Chain: Silver into Gold

Page 8: Scaling Data

SAS System 9

Page 9: Scaling Data

SAS V8 vs. SAS System 9

FEATURE               SAS V8   SAS System 9
Libname Engine        x        x
Procedure Interface   x        x
Fast Load             x        x
Threaded Interface             x

Page 10: Scaling Data

SAS V8 I/O Model

Page 11: Scaling Data

Threaded Interface SAS 9

Page 12: Scaling Data

SAS Procedures

proc sort

proc summary

proc dmine

proc reg; proc dmreg

proc means

proc loess; proc dmdb

proc glm

proc robustreg

Page 13: Scaling Data

SAS/ACCESS® Engines

ORACLE

DB2

Informix

ODBC

Sybase

Teradata

Page 14: Scaling Data

Libname and SAS Procedure Controls

dbslice=("where", "where", …)

dbsliceparm=(ALL, …)

dbsliceparm default: (THREADED_APPS,2)

options sastrace=',,t';

procedure controls – CPU count

Page 15: Scaling Data

Options In Action - DBSLICEPARM

-dbsliceparm none

option dbsliceparm=

libname x oracle user=scott pass=tiger dbsliceparm=(threaded_apps,2);

proc print data=x.oratab (dbsliceparm=(all,4)); run;
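The slide does not say how DBSLICEPARM chooses its slices. As an illustration only, one generic way to split a read across N connections is to assign rows by the remainder of an integer key. The Python sketch below is not SAS code; `mod_slices` and `slice_index` are hypothetical helpers, and the scheme assumes a non-negative integer column.

```python
from typing import List


def mod_slices(column: str, n_threads: int) -> List[str]:
    """Build n_threads WHERE-clause strings that split rows by MOD on an integer column."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return [f"MOD({column}, {n_threads}) = {i}" for i in range(n_threads)]


def slice_index(value: int, n_threads: int) -> int:
    """Slice that a non-negative integer key falls into under the scheme above."""
    return value % n_threads


# Four slices for a hypothetical integer column named x.
clauses = mod_slices("x", 4)
for clause in clauses:
    print(clause)
```

Every non-negative key lands in exactly one slice, so no row is read twice or skipped.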

Page 16: Scaling Data

Options In Action - DBSLICE

libname x oracle user=scott pass=tiger;

proc print data=x.oratab (dbslice=("where x<100", "where x>=100")); run;
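Since each slice is a separate WHERE clause, the clauses together should return every row exactly once. The Python sketch below (an illustration, not SAS code) models the two predicates from this slide and checks a few sample values of `x`. The sample rows, and the use of `None` to stand in for SQL NULL, are assumptions made for the example.

```python
from typing import Callable, List, Optional

Row = Optional[int]  # value of column x; None stands in for SQL NULL


def lt_100(x: Row) -> bool:
    # A SQL comparison with NULL is not true, so a NULL row fails the predicate.
    return x is not None and x < 100


def ge_100(x: Row) -> bool:
    return x is not None and x >= 100


def slice_counts(rows: List[Row], slices: List[Callable[[Row], bool]]) -> List[int]:
    """For each row, count how many slices would return it (ideal: exactly 1)."""
    return [sum(1 for pred in slices if pred(x)) for x in rows]


rows = [5, 99, 100, 250, None]
counts = slice_counts(rows, [lt_100, ge_100])
missed = [x for x, c in zip(rows, counts) if c == 0]
duplicated = [x for x, c in zip(rows, counts) if c > 1]
print("missed:", missed, "duplicated:", duplicated)
```

In this model the two slices never overlap, but a row whose `x` is NULL matches neither. If the column is nullable, a third slice testing for NULL would close the gap.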

Page 17: Scaling Data

Options In Action – CPUCOUNT, THREADS

CPUCOUNT=

THREADS | NOTHREADS

Page 18: Scaling Data

Process

Libname controls

Procedure controls

Execution

Page 19: Scaling Data

Scalability – SAS 9 Threaded speedup in PROC REG

[Chart: achieved speedup plotted against linear scalability]

Run on 12-way Unix Box

Page 20: Scaling Data

Scalability – SAS 9 Threaded speedup in PROC SORT

Run on 8-way Unix Box

Tests run in memory cache

Page 21: Scaling Data

What Does This Mean - access

393000 Rows

No Threads - baseline

Two Threads (DBSLICE) – 31%

Six Threads (DBSLICEPARM) – 54%

Run on 10-way Unix Box

Tests run in memory cache
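The slide does not state what the percentages measure. If they are read as reductions in elapsed time relative to the unthreaded baseline (an assumption), they can be converted to speedup factors and compared with linear scaling. The Python sketch below does that arithmetic; the function names are hypothetical.

```python
def speedup_from_time_reduction(reduction: float) -> float:
    """Convert a fractional cut in elapsed time into a speedup factor."""
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in [0, 1)")
    return 1.0 / (1.0 - reduction)


def parallel_efficiency(speedup: float, threads: int) -> float:
    """Speedup divided by thread count; 1.0 would be linear scaling."""
    return speedup / threads


# Figures from the slide, interpreted as elapsed-time reductions (assumption).
for label, threads, reduction in [("DBSLICE", 2, 0.31), ("DBSLICEPARM", 6, 0.54)]:
    s = speedup_from_time_reduction(reduction)
    print(f"{label}: {s:.2f}x speedup, {parallel_efficiency(s, threads):.0%} efficiency")
```

Under that reading, two threads give about a 1.45x speedup and six threads about 2.17x, so the gain per added thread falls as the thread count rises.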

Page 22: Scaling Data

Scaling Data

Data Volumes

Data ACCESS

Data Organization

Scaling using Oracle - Andrew

Page 23: Scaling Data

Scaling with Oracle

The Star Query

Use of Parallelism

Use of the Direct Path

Use of Specialist Indexes

Use of Analytical Functions

Use of Materialized Views

Use of The Oracle9i Optimizer

Page 24: Scaling Data

The Star Query

Fact

Product

Time

Geography

Customer

Page 25: Scaling Data

Star Queries

The star query is a very common data warehouse (DW) technique. It is highly optimized in Oracle and can be tuned to suit the type of query. In summary, the more that is known about the query composition, the higher the level of optimization possible.

Page 26: Scaling Data

Star Query Optimization

The optimization is a three-step process:

1. Apply query predicates to the dimension tables to generate lists of foreign keys into the fact table.

2. Query the fact table using a series of single-column bitmap indexes on the foreign keys.

3. Having resolved the query within the fact table, complete the query by joining back to the dimension tables where needed and rolling the query up.
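The three steps can be sketched in Python, with sets of row ids standing in for bitmaps. This is a toy model of the idea, not Oracle's implementation. The tables, keys, and amounts are invented for the example.

```python
from typing import Dict, List, Set, Tuple

# Dimension tables: key -> attribute (all data here is invented for illustration).
product = {1: "widget", 2: "gadget", 3: "widget"}
geography = {10: "NC", 20: "CA"}

# Fact rows: (product_key, geography_key, sales_amount).
fact: List[Tuple[int, int, float]] = [
    (1, 10, 5.0), (2, 10, 7.0), (3, 20, 2.0), (1, 20, 4.0), (3, 10, 1.0),
]


def build_bitmap_index(column: int) -> Dict[int, Set[int]]:
    """Map each foreign-key value to the set of fact row ids holding it."""
    index: Dict[int, Set[int]] = {}
    for row_id, row in enumerate(fact):
        index.setdefault(row[column], set()).add(row_id)
    return index


product_idx = build_bitmap_index(0)
geography_idx = build_bitmap_index(1)

# Step 1: apply predicates to the dimensions to get foreign-key lists.
product_keys = [k for k, name in product.items() if name == "widget"]
geography_keys = [k for k, state in geography.items() if state == "NC"]

# Step 2: OR the bitmaps within each dimension, then AND across dimensions.
product_rows = set().union(*(product_idx.get(k, set()) for k in product_keys))
geography_rows = set().union(*(geography_idx.get(k, set()) for k in geography_keys))
matching_rows = sorted(product_rows & geography_rows)

# Step 3: join back to the dimensions and roll up.
total_by_state: Dict[str, float] = {}
for row_id in matching_rows:
    _, geo_key, amount = fact[row_id]
    state = geography[geo_key]
    total_by_state[state] = total_by_state.get(state, 0.0) + amount

print(matching_rows, total_by_state)
```

Only the fact rows that survive the intersection in step 2 are ever read in step 3, which is where the saving comes from on a large fact table.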

Page 27: Scaling Data

Star Queries

To enable star queries, the DBA should do the following:

1. Build single-column bitmap indexes on each foreign key in the fact table

2. Build indexes on the dimension tables for query predicates

3. Build indexes on the dimension tables to assist in the join back and roll up process

4. Generate statistics for the schema

5. Set the parameter STAR_TRANSFORMATION_ENABLED=TRUE

Page 28: Scaling Data

Use of Parallelism

Use multiple CPUs to execute a single query as well as multiple concurrent queries

Execute table scans, index probes, and index scans in parallel

Execute Joins and Sorts in parallel

Execute DML in parallel

Parallelism can be configured manually or automatically
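The split-and-combine structure behind a parallel scan with aggregation can be sketched in Python. This is an illustration of the pattern only, not of Oracle's parallel execution; `partial_sum` and `parallel_mean` are hypothetical names. Python threads will not speed up this CPU-bound toy, but the structure is the same.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def partial_sum(chunk: Sequence[float]) -> Tuple[float, int]:
    """Work done by one worker: sum and row count for its share of the rows."""
    return sum(chunk), len(chunk)


def parallel_mean(values: Sequence[float], workers: int) -> float:
    """Split the scan into up to `workers` chunks, aggregate each, then combine."""
    if not values:
        raise ValueError("values must not be empty")
    size = -(-len(values) // workers)  # ceiling division
    chunks: List[Sequence[float]] = [
        values[i:i + size] for i in range(0, len(values), size)
    ]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    total = sum(s for s, _ in partials)
    count = sum(n for _, n in partials)
    return total / count


print(parallel_mean(list(range(1, 101)), workers=4))
```

Each worker returns a small partial result, and only those partials are combined at the end, so the final step stays cheap however many rows were scanned.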

Page 29: Scaling Data

Use of Partitioning

Partitioning was originally designed to allow management of large database objects. Partitioning the data can also bring performance gains through the following:

• Partition pruning

• Join optimizations

Partitioning can be done by the following methods:

• Range, e.g. date or key ranges

• List, e.g. discrete values such as State

• Hash, to achieve equal-size partitions

Two types of partitioning can be combined on one table (composite partitioning)
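Partition pruning can be sketched in Python: given range partitions on a date column, only the partitions whose range overlaps the query's date range need to be scanned. The partition names and bounds below are invented for the example, and `prune` is a hypothetical helper, not an Oracle API.

```python
from datetime import date
from typing import Dict, List, Tuple

# Range partitions keyed by name: [start, end) date bounds (invented example data).
partitions: Dict[str, Tuple[date, date]] = {
    "sales_2002_q3": (date(2002, 7, 1), date(2002, 10, 1)),
    "sales_2002_q4": (date(2002, 10, 1), date(2003, 1, 1)),
    "sales_2003_q1": (date(2003, 1, 1), date(2003, 4, 1)),
}


def prune(query_start: date, query_end: date) -> List[str]:
    """Return only the partitions whose range overlaps [query_start, query_end)."""
    return [
        name
        for name, (start, end) in partitions.items()
        if start < query_end and query_start < end
    ]


# A query for December 2002 needs to scan one partition out of three.
print(prune(date(2002, 12, 1), date(2003, 1, 1)))
```

The decision uses only the partition bounds, so the excluded partitions are never touched.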

Page 30: Scaling Data

Use of The Direct Path

Bypass the conventional transaction layer to insert and copy data within the database

SQL*Loader is currently used by SAS

Other options include:

• Insert with the /*+ append */ hint

• Create Table as Select with NOLOGGING

These constructs can be used to transform vast amounts of data rapidly in parallel

Page 31: Scaling Data

Specialist Indexes

B-Tree Indexes

Bitmap Indexes, including bitmap join indexes

Function-Based Indexes

Page 32: Scaling Data

Analytical Functions

Oracle has embraced the ANSI OLAP extensions to SQL

These permit faster response times on queries that would require multiple passes of the data with conventional SQL

This allows grouped results and functionality such as moving averages
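A moving average shows why a single pass suffices when the computation is expressed over a window of rows. The Python sketch below is an illustration, not Oracle code. It computes the average of the current value and up to two preceding values, roughly what a windowed `AVG(...) OVER (ORDER BY ... ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)` expresses in SQL. The input values are invented.

```python
from collections import deque
from typing import Iterable, List


def moving_average(values: Iterable[float], window: int) -> List[float]:
    """Average of the current row and up to window-1 preceding rows, in one pass."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent: deque = deque()
    running_total = 0.0
    result: List[float] = []
    for value in values:
        recent.append(value)
        running_total += value
        if len(recent) > window:
            # Drop the row that just slid out of the window.
            running_total -= recent.popleft()
        result.append(running_total / len(recent))
    return result


print(moving_average([10, 20, 30, 40], window=3))
```

Each input row is read once and the running total is adjusted in place, so there is no need for a self-join or a second scan.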

Page 33: Scaling Data

Materialized Views

Materialized views allow automatic use of summary tables without the user having to rewrite the query

Well-designed materialized views are small in size and can increase performance by orders of magnitude.

Materialized views are in fact Oracle tables and can use all other features to improve performance
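The idea of answering a query from a summary without the caller changing the question can be sketched in Python. This is a toy model, not Oracle's query rewrite mechanism; the data and the `build_summary` and `total_for_state` helpers are invented for the example.

```python
from typing import Dict, List, Tuple

# Detail rows: (state, amount). Invented data for illustration.
detail: List[Tuple[str, float]] = [("NC", 5.0), ("CA", 2.0), ("NC", 1.0), ("CA", 4.0)]


def build_summary(rows: List[Tuple[str, float]]) -> Dict[str, float]:
    """Precompute totals per state once, like refreshing a summary table."""
    summary: Dict[str, float] = {}
    for state, amount in rows:
        summary[state] = summary.get(state, 0.0) + amount
    return summary


summary_by_state = build_summary(detail)


def total_for_state(state: str) -> Tuple[float, str]:
    """Answer 'total for a state', using the summary when it can serve the query."""
    if state in summary_by_state:
        return summary_by_state[state], "summary"
    # Fall back to scanning the detail rows.
    return sum(a for s, a in detail if s == state), "detail scan"


print(total_for_state("NC"))
```

The caller always asks the same question; whether the answer comes from the small summary or the full detail is decided behind the interface.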

Page 34: Scaling Data

Oracle9i Optimizer

When upgrading between Oracle releases, the optimizer behavior will change

The optimizer is tested with over 400,000 SQL statements

• Where plans change between releases, the actual query is run to test for degradation

• Slower plans are corrected

It is still important to have good, representative statistics

The DBMS_STATS package allows parallel generation and migration of schema statistics

Page 35: Scaling Data

Oracle9i Optimizer

Some common Optimizer problems seen with Oracle9i

• Bad or incomplete statistics

• init.ora parameters influencing the optimizer

• SQL written for the rule-based optimizer (RBO)

Page 36: Scaling Data

Summary

Oracle and SAS provide techniques for scaling to larger databases by optimizing both query performance and fetch performance.

These techniques are simple to adopt and allow huge productivity improvements.

We have identified some core technologies here; however, this is only a partial picture of what SAS and Oracle can do together.

Page 37: Scaling Data

About the Speakers

Howard Plemmons
Senior Software Manager
SAS Institute Inc.
SAS Circle
Cary, NC
Phone: 919-531-7779
E-mail: [email protected]

Andrew Holdsworth
Director
Oracle Corp.
500 Oracle Pkwy,
Redwood Shores, CA 94065
Phone: 650-506-2938
E-mail: [email protected]

Page 38: Scaling Data

Other SUGI Papers/Presentations

• PC File Data Objects Directly from UNIX – 8:00am Tuesday

• SAS/ACCESS and use of Metadata – Rm 619 @ 2:30

• Lessons in Scalability – SAS Presents – 3:20 Tuesday

• Data Warehousing section - performance

Page 39: Scaling Data

Scaling SAS Data ACCESS to ORACLE RDBMS
