Copyright © 2013, SAS Institute Inc. All rights reserved. This information is confidential and covered under the terms of any SAS agreements as executed by customer and SAS Institute Inc. SEPTEMBER 10, 2013 Gary T. Ciampa SAS ® Solutions OnDemand Advanced Analytics Lab NESUG 2013 BIG DATA, FAST PROCESSING SPEEDS


SEPTEMBER 10, 2013

Gary T. Ciampa

SAS® Solutions OnDemand Advanced

Analytics Lab

NESUG 2013

BIG DATA, FAST PROCESSING SPEEDS


OVERVIEW AND AGENDA

• Big data introduction

• SAS language performance tuning

• SAS system facilities

• SQL, MACRO and DATA STEP examples

• Case study - SAS Revenue Optimization Solution

• History and tuning techniques

• High Performance Revenue Optimization – GRID environment

• SAS emerging big data technologies


BIG DATA INTRODUCTION

• Wiki Knows All: … is a collection of data sets so large and complex that it becomes difficult to process using on-hand database management tools or traditional data processing applications.

• Forrester: … software and/or hardware solutions that allow firms to discover, evaluate, optimize, and deploy predictive models by analyzing big data sources to improve business performance or mitigate risk.

• Gartner: … technology is the management of high-volume, high-velocity and high-variety information assets that demand cost-effective and innovative forms of information processing for enhanced insight and decision making.


… the management of high-volume, high-velocity and high-variety assets that demand cost-effective and innovative forms of processing for enhanced insight and decision making


BIG DATA ACCORDING TO SAS

• Incorporates concepts of IDC dimensions

Volume – transactions, streaming, sensors, …

Variety – database, warehouse, text, email, metered, OLAP, stocks, etc…

Velocity – how fast the data is produced and processed (near real-time)

• SAS considers additional dimensions

Variability - in velocity and variety of the data (peaks and valleys, seasonal)

Complexity - handling disparate sources to cleanse, transform, correlate and establish relationships and hierarchies

• SAS Big Data Starting Point: http://www.sas.com/big-data


APPROACHES TO PROCESSING BIG DATA

• Bigger, Faster, More Powerful is Better

Increase CPU processor speed and count

Increase MEMORY capability or speed

Faster Networks and Network Devices

High-speed disk arrays, or, direct memory disk arrays

• Parallel Processing

Multi-threading capabilities, distributed processing within or across nodes

Segmented data along with distributed processing

• Viable, but not always feasible within constraints (time, resources and dollars)


SAS SYSTEM FACILITIES

• SAS command line options, AUTOEXEC and CONFIG processing

Customizes the SAS execution environment

Settings can affect performance significantly

Settings may have unexpected or unintended consequences

Set on command line, configuration or within the program

SAS Companion for <OS> (Windows, UNIX, z/OS)

Bonus Options

• VERBOSE option – emits options and configuration details

• RTRACE option – emits list of resources that are read, loaded


SAS SYSTEM AND HOST OPTIONS

• System Options, SAS Files

BUFNO, BUFSIZE, OBS

IBUFNO, IBUFSIZE (index processing)

• System Administration, Memory

MEMSIZE, SORTSIZE, SUMSIZE

• System Administration, Performance

CPUCOUNT, THREADS

• System Options for Macros

MLOGIC, MPRINT, SYMBOLGEN (everyone has their favorites)

• NOTE: Use the *correct* SAS Companion for the target OS


SAS SYSTEM FACILITIES

• SAS option STIMER or FULLSTIMER

System performance statistics: CPU time, memory, and real (elapsed) time

Subtle differences depending on the OS

• SAS option MSGLEVEL – level of detail for messages to SAS log

• SAS option OBS – last observation or record to process

• ARM and PERF macro facility

Default or custom performance metrics at the programmer's discretion

PROC or DATA STEP statistics

User controlled START and STOP semantics across segments of SAS code

Discrete log and format to include macros to process and report on metrics


SAMPLE OPTIONS STATEMENTS & LOG

options obs=max fullstimer;

data work.sort500k;
  set sgf2013.sort_500000;
run;

NOTE: DATA statement used (Total process time):
      real time           1.66 seconds
      user cpu time       0.12 seconds
      system cpu time     0.34 seconds
      memory              356.15k
      OS Memory           10424.00k
      Timestamp           04/25/2013 03:16:21 PM

options obs=10;

data work.sort500k;
  set sgf2013.sort_500000;
run;

NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      user cpu time       0.00 seconds
      system cpu time     0.03 seconds
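
When many steps are timed with FULLSTIMER, it can help to pull the numbers out of the log rather than read them by eye. The following is a minimal sketch in Python (not SAS, chosen only so it can be run anywhere) that parses the two notes shown above. The function name `parse_steps` is hypothetical, and the pattern assumes the simple "N.NN seconds" form; longer steps that log times as minutes:seconds would need a broader pattern.

```python
import re

# The two FULLSTIMER notes transcribed from the sample log above.
LOG = """\
NOTE: DATA statement used (Total process time):
      real time           1.66 seconds
      user cpu time       0.12 seconds
      system cpu time     0.34 seconds
NOTE: DATA statement used (Total process time):
      real time           0.03 seconds
      user cpu time       0.00 seconds
      system cpu time     0.03 seconds
"""

# Matches lines such as "real time 1.66 seconds" (seconds-only form assumed).
METRIC = re.compile(
    r"^\s*(real time|user cpu time|system cpu time)\s+([\d.]+) seconds", re.M
)

def parse_steps(log_text):
    """Return one dict of timings (in seconds) per NOTE block."""
    steps = []
    for block in log_text.split("NOTE:")[1:]:
        steps.append({name: float(value) for name, value in METRIC.findall(block)})
    return steps

steps = parse_steps(LOG)
for step in steps:
    cpu = step["user cpu time"] + step["system cpu time"]
    print(f"real={step['real time']:.2f}s  cpu={cpu:.2f}s")
```

Comparing real time against the sum of user and system CPU time is a quick way to see whether a step is waiting on I/O rather than computing.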


SAMPLE ARM / PERF MACRO EXECUTION

%let _armexec=1;

%perfinit(applname="Glm_Appl_1");

%perfstrt(txnname="Glm_Txn1");

/* … do some work … */

%perfstop;

%perfstrt(txnname="Glm_Txn2");

ods exclude all;

proc GLM data=one; model y = x1; by by; quit;

ods select all;

%perfstop;


SAMPLE ARM / PERF MACRO EXECUTION

…lines deleted…

G,1682537590.504000,2,2,Glm_Txn1, CPU ,IO_CNT ,MEMORY INFO ,THREAD

S,1682537590.426000,2,1,1,1.060806,1.341608,327491731,7266304,7532544,6,6

P,1682537590.504000,2,1,1,1.123207,1.357208,0,335645285,7266304,7532544,6,6

…lines deleted…

G,1682537590.504000,2,2,Glm_Txn2, CPU ,IO_CNT ,MEMORY INFO ,THREAD

S,1682537590.504000,2,2,2,1.123207,1.357208,335674088,7266304,7532544,6,6

P,1682537591.845000,2,2,2,1.653610,1.575610,0,340575257,11984896,11984896,6,6

SAS 9.3 Interface to Application Response Measurement (http://support.sas.com)
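
The S (start) and P (stop) records above can be paired to get per-transaction elapsed and CPU time. The sketch below, in Python rather than SAS, does that for the four records shown. The field layout (record type, timestamp, application id, class id, transaction id, user CPU, system CPU) is an assumption inferred from the sample, and `transaction_times` is a hypothetical helper, not part of the ARM or PERF macros.

```python
# S (start) and P (stop) records transcribed from the sample ARM log above.
ARM_LOG = """\
S,1682537590.426000,2,1,1,1.060806,1.341608,327491731,7266304,7532544,6,6
P,1682537590.504000,2,1,1,1.123207,1.357208,0,335645285,7266304,7532544,6,6
S,1682537590.504000,2,2,2,1.123207,1.357208,335674088,7266304,7532544,6,6
P,1682537591.845000,2,2,2,1.653610,1.575610,0,340575257,11984896,11984896,6,6
"""

def transaction_times(log_text):
    """Pair S/P records by (app id, class id, txn id) and return the deltas.

    Assumed layout: type, timestamp, app id, class id, txn id,
    user CPU seconds, system CPU seconds, then other counters.
    """
    starts, results = {}, {}
    for line in log_text.splitlines():
        fields = line.split(",")
        kind, key = fields[0], tuple(fields[2:5])
        stamp, user, system = (float(f) for f in (fields[1], fields[5], fields[6]))
        if kind == "S":
            starts[key] = (stamp, user, system)
        elif kind == "P" and key in starts:
            s_stamp, s_user, s_system = starts.pop(key)
            results[key] = {
                # Timestamps carry millisecond detail, so round to 3 places.
                "elapsed": round(stamp - s_stamp, 3),
                "cpu": round((user - s_user) + (system - s_system), 6),
            }
    return results

times = transaction_times(ARM_LOG)
for key, values in times.items():
    print(key, values)
```

Under that assumed layout, the second transaction (the PROC GLM step) accounts for most of the elapsed time in this sample.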


OVERVIEW: ENVIRONMENT AND INTRODUCTION

• Sample Environment

• RHEL Linux 5.6, Intel Xeon 2.67 GHz, 32 cores, 256 MB; SAS 9.3

• Oracle Table, 44 columns, 10 million records

• SAS Language Reference (cost, benefit and considerations)

• Understanding SAS Indexes

• Understanding Integrity Constraints

• Use EXISTS (0:04.6) rather than IN (0:05.2).

• For example,

select * from table_a a

where exists (select * from orders o

where a.prod_id=o.prod_id);
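
Before swapping IN for EXISTS it is worth confirming that both forms return the same rows. The sketch below checks this with Python's built-in sqlite3 module rather than PROC SQL, purely so it can be run anywhere; the table contents are invented for illustration and the timings quoted above will not carry over to SQLite.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table table_a (prod_id integer, name text);
    create table orders  (prod_id integer, qty integer);
    insert into table_a values (1, 'widget'), (2, 'gadget'), (3, 'gizmo');
    insert into orders  values (1, 10), (1, 5), (3, 7);
""")

# Correlated EXISTS: keeps a row of table_a when at least one order matches.
exists_rows = con.execute("""
    select a.prod_id, a.name from table_a a
    where exists (select 1 from orders o where a.prod_id = o.prod_id)
    order by a.prod_id
""").fetchall()

# IN with a subquery: same filter expressed as set membership.
in_rows = con.execute("""
    select a.prod_id, a.name from table_a a
    where a.prod_id in (select o.prod_id from orders o)
    order by a.prod_id
""").fetchall()

print(exists_rows)
print(in_rows)
```

Both queries keep products 1 and 3 and drop product 2, which has no orders. One caveat to keep in mind: the two forms can diverge for NOT IN versus NOT EXISTS when the subquery column contains nulls.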


INDEXES: USING INDEXES FOR PERFORMANCE OPTIMIZATION

• INDEX Considerations (TANSTAAFL)

• Data file size: small tables are better suited to sequential processing

• Change rate of the data and choice of key variables (for example, NAME versus GENDER)

• Generally useful when subsetting the data; retrieving 25% or less of the rows is typical

• Sort by key variables, ordered data improves index behavior

• Some operators, conditions are not optimized with an INDEX

• Arithmetic, variable-to-variable, sounds-like operator

• CONTAINS, IS NULL or IS MISSING, TRIM, SUBSTR*

• where amount != 0;   0:28.0 (minutes:seconds.tenths)

• where amount > 0;    0:26.0 (minutes:seconds.tenths)


PROC SQL: OPTIMIZING PROC SQL

• HAVING versus WHERE

• HAVING operates on all rows returned, not a subset

• Use HAVING on summary operations, after a restricted WHERE step

• Order statements, filter or select rows before grouping

• select state
    from order
    group by state
    having state = 'nc';

• 01:50

• select state
    from order
    where state = 'nc'
    group by state;

• 01:31
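
The two statements above are interchangeable only because the condition is on the grouping column. The sketch below uses Python's sqlite3 module (not PROC SQL) with an invented `orders` table, named that way because ORDER is a reserved word in SQLite, to show that the results match and where HAVING is still required.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table orders (state text, amount real);
    insert into orders values ('nc', 10), ('nc', 20), ('va', 5), ('ny', 7);
""")

# Filter after grouping: every state is grouped, then all but 'nc' are discarded.
having_rows = con.execute(
    "select state, count(*) from orders group by state having state = 'nc'"
).fetchall()

# Filter before grouping: only 'nc' rows ever reach the GROUP BY.
where_rows = con.execute(
    "select state, count(*) from orders where state = 'nc' group by state"
).fetchall()

# HAVING is still needed for conditions on summary values, after a restrictive WHERE.
summary_rows = con.execute("""
    select state, sum(amount) from orders
    where state in ('nc', 'va')
    group by state
    having sum(amount) > 15
""").fetchall()

print(having_rows, where_rows, summary_rows)
```

The WHERE form gives the same answer while grouping fewer rows, which is consistent with the shorter time reported on the slide.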


PROC SQL: OPTIMIZING PROC SQL

• Nested (sub-)queries

• Minimize nested queries with a small number of tables

• SUBQUERY versus JOIN

• select ename
    from employees emp
    where exists (select price from prices
                  where prices.prod_id = emp.prod_id and prices.class = 'j');

• > 05:00 minutes (terminated with prejudice)

• select ename
    from prices pr, employees emp
    where pr.prod_id = emp.prod_id and pr.class = 'j';

• 01:40 (minutes:seconds)


PROC SQL: OPTIMIZING PROC SQL

• TABLE order

• Order of tables within the SQL statement impacts performance

• List the tables in the FROM clause from left to right in descending order of row count, largest first

• SQL processing scans the last table listed, and merges all of the rows

• Assuming TAB1 has 20,000 rows, TAB2 has 10 rows

• select count (*) from tab2, tab1

• 0.61

• select count (*) from tab1, tab2

• 0.52


PROC SQL: OPTIMIZING PROC SQL

• EXISTS versus DISTINCT for table join

• select distinct date, name
    from sales s, employee emp
    where s.prod_id = emp.prod_id;

• > 7 minutes

• select date, name
    from sales s
    where exists (select 'x' from employee emp
                  where emp.prod_id = s.prod_id);

• 0:11 (11 seconds, including post distinct step)

• SAS 9.3 SQL Procedure User's Guide
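
The "post distinct step" matters because the EXISTS form only avoids the row multiplication caused by the join; it does not remove duplicates already present in the outer table. The sketch below illustrates this with Python's sqlite3 module rather than PROC SQL. The rows are invented, and the column is called `sale_date` here (an assumption made to avoid clashing with SQL date keywords).

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table sales    (sale_date text, name text, prod_id integer);
    create table employee (emp_id integer, prod_id integer);
    insert into sales values
        ('2013-09-01', 'ann', 1), ('2013-09-01', 'ann', 1), ('2013-09-02', 'bob', 2);
    insert into employee values (100, 1), (101, 1), (102, 3);
""")

# Plain join: each matching sales row is repeated once per matching employee.
join_rows = con.execute("""
    select s.sale_date, s.name from sales s, employee emp
    where s.prod_id = emp.prod_id
""").fetchall()

# DISTINCT over the join has to de-duplicate the multiplied result.
distinct_join = con.execute("""
    select distinct s.sale_date, s.name from sales s, employee emp
    where s.prod_id = emp.prod_id
""").fetchall()

# EXISTS keeps each qualifying sales row exactly once (a semi-join).
exists_rows = con.execute("""
    select s.sale_date, s.name from sales s
    where exists (select 'x' from employee emp where emp.prod_id = s.prod_id)
""").fetchall()

# Post distinct step, applied to the much smaller EXISTS result.
exists_distinct = sorted(set(exists_rows))

print(len(join_rows), len(exists_rows), distinct_join, exists_distinct)
```

In this toy data the join produces four rows and EXISTS produces two, and both routes end at the same single distinct row; the saving comes from de-duplicating the smaller intermediate result.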


SAS MACRO: OPTIONS AND CONSIDERATIONS

• Use MLOGIC, MPRINT & SYMBOLGEN – development phase

• Do NOT use MLOGIC, MPRINT & SYMBOLGEN – production

• Stored Compiled Macro Facility

• Permanent SAS catalog

• Protect intellectual property

• Both AUTOCALL and SESSION macros are available

• Override compiled macros with session instances or AUTOCALL semantics

• Minimize nesting macro definitions


SAS MACRO: NESTING MACRO INSTANCE

• Avoid nesting macros where possible

• %macro m1;

%macro m2; /* nested macro */

%mend m2;

%mend m1;

• 02.81

• %macro m1;

<macro 1 code goes here>

%mend m1;

%macro m2;

<macro 2 code goes here>

%mend m2;

• 02.45


SAS DATA STEP: A FEW EXAMPLES TO CONSIDER

• Missing values may perturb performance

• “.” is propagated across all calculations

• total=t4+(x*b)+c*(abc);

• 01:03 (63 seconds)

• total=(x*b)+c*(abc) + t4;

• 00:59

• Superior practice, check for “.” before expression

• if <operand> ne . then do; <expression>; end;


SAS DATA STEP: A FEW EXAMPLES TO CONSIDER

• PROC FORMAT: User defined formats associated with variables

• Details in the Base SAS 9.3 Procedures Guide

• Reference the format throughout the code, simplifies logic and support

• if educ = 0 then neweduc="< 3 yrs old";

else if educ=1 then neweduc="no school";

else if educ=2 then neweduc="nursery school";

• 10:54

• proc format; value educf
    0="< 3 yrs old" 1="no school" 2="nursery school";
  run;

… neweduc=put(educ,educf.); …

• 10:32


SAS DATA STEP: A FEW EXAMPLES TO CONSIDER

• Using the IN operator, versus OR conditions

• OR operator checks all the conditions

• IN operator matches first occurrence

• if x=8 or x=9 or x=23 or x=45 then do; end;

• 01:04

• if x in (8,9,23,45) then do; end;

• 00:58


SAS USER FEEDBACK: “IN” VERSUS “OR” VALIDATION

• Thanks to Bruce Gilsen at Federal Reserve for independent validation

• Bruce’s Optimization Validation

• 1,000,000 OBS, 100 VARIABLES with RANGE VALUES 1 to 100

• Independent DATA STEP, using IN versus OR

• IN 8.15 / 7.88 Seconds (REAL / CPU)

• OR 21.75 / 21.73 Seconds (REAL / CPU)

data two;
  set one;
  array vall (*) v1-v100;
  drop i;
  do i = 1 to 100;
    if vall(i) in (1 2 3 4 5 6 7 8 9 10 … 99)
      then vall(i) = vall(i) + 100;
  end;
run;

data two;
  set one;
  array vall (*) v1-v100;
  drop i;
  do i = 1 to 100;
    if vall(i) = 1 or vall(i) = 2 or vall(i) = 3 or vall(i) = 4
      … vall(i) = 99 then vall(i) = vall(i) + 1000;
  end;
run;


CASE STUDY - SAS REVENUE OPTIMIZATION SOLUTION

• Big Data Introduction

• SAS Language Performance Tuning

• SAS System Facilities

• SQL, MACRO and DATA STEP examples

• Case Study - SAS Revenue Optimization Solution

• History and Tuning Techniques

• High Performance Revenue Optimization – GRID Environment

• SAS Emerging Big Data Technologies


SOLUTIONS ONDEMAND ADVANCED ANALYTICS LAB

• Over a petabyte of data, 400+ customers

• Customer Profiles

Variety of industry sectors, private as well as public

Multi-tier deployments, client, mid-tier, analytic tier and RDBMS

Daily and Weekly ETL feed requirements

• PROD, QA, DEV environments and data synchronization

• Disparate analytic processing (batch) schedules

• Backup and restore processing that minimizes performance impacts

• 99.5% uptime service level agreements


CASE STUDY: SAS REVENUE OPTIMIZATION SOLUTION

• Problem Statement: 33 hours of processing time for one batch component using 30% of projected data. Linear projection: approximately 110 hours, or about 4 ½ days, of processing time.

• Requirement to fit batch into a 40 hour window

• AIX 6.1+, Power7, 64 Bit attached to EMC SAN Arrays

• 7 CPUs, SMT=4, 128GB RAM, 3700 IOPS, CPU 45%

• Approximately 1.2 TB of DATA, target 1.6 TB primary warehouse

• Focus on the most significant issues and then repeat as new issues arise
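
The linear projection in the problem statement is simple arithmetic, and working it through also gives the size of the gap to the 40 hour window. A short Python sketch follows; the variable names are illustrative, and the speedup figure is derived here rather than taken from the slides.

```python
observed_hours = 33.0   # batch time measured on the partial data
data_fraction = 0.30    # share of the projected data used in that run
window_hours = 40.0     # required batch window

# Linear scaling: runtime is assumed proportional to data volume.
projected_hours = observed_hours / data_fraction
projected_days = projected_hours / 24

# How much faster the batch must run to fit the window at full volume.
required_speedup = projected_hours / window_hours

print(f"projected: {projected_hours:.0f} hours ({projected_days:.1f} days)")
print(f"required speedup: {required_speedup:.2f}x")
```

Under the linear assumption the batch would need to run roughly 2.75 times faster to fit the window, which is why the work focused on the largest hotspots first.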


• SAS WORK volume

• Eight-way stripe with eight paths

• Warehouse

• Fixed Tier 1 EMC storage; 80 x 100GB disk arrays

• Moved support directories off of volume


Weekly Performance

• Parallel Executions: 16 processes to 54 processes

• IO/SEC: 8.5K to 15.3K

• CPU Idle Time: 42% to 13%

• Weekly Batch Time: 60 hours to 43 hours

• GEO_PRODS: 67 Million to 92 Million
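
The batch time and the GEO_PRODS count changed together, so the raw hours understate the improvement. The Python sketch below normalizes by volume. It assumes the first figure in each pair is before tuning and the second is after, which the slide does not state explicitly, and the helper name `throughput` is hypothetical.

```python
# Assumption: first value of each pair is before tuning, second is after.
before = {"batch_hours": 60.0, "geo_prods_millions": 67.0, "io_per_sec": 8500.0}
after = {"batch_hours": 43.0, "geo_prods_millions": 92.0, "io_per_sec": 15300.0}

def throughput(run):
    """GEO_PRODS (millions) processed per batch hour."""
    return run["geo_prods_millions"] / run["batch_hours"]

# Reduction in weekly batch time, as a percentage of the original.
time_saved_pct = 100 * (before["batch_hours"] - after["batch_hours"]) / before["batch_hours"]

# Volume-normalized improvement and the change in I/O rate.
throughput_gain = throughput(after) / throughput(before)
io_gain = after["io_per_sec"] / before["io_per_sec"]

print(f"batch time reduced by {time_saved_pct:.1f}%")
print(f"throughput gain: {throughput_gain:.2f}x, I/O rate gain: {io_gain:.2f}x")
```

Read this way, the batch got about 28% shorter while handling more GEO_PRODS, so throughput per hour nearly doubled, in line with the higher I/O rate and lower CPU idle time.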


SAS GRID: SAS REVENUE OPTIMIZATION SOLUTION

• Initial RO Versions used SAS/Connect parallel processing

• Single host deployments with concurrent analytics

• Flat data warehouse structure, non-partitioned SAS tables

• SAS High Performance Revenue Optimization Enhancements

• SAS TK GRID architecture distributed processing across grid nodes

• SAS data partitions distributed across grid nodes

• ETL processes, daily and weekly to distribute data across partitions

• Grid Captain to manage the processing and analytics across grid nodes


SAS GRID: SAS REVENUE OPTIMIZATION NON GRID


SAS GRID: SAS HIGH PERFORMANCE REVENUE OPTIMIZATION


SAS GRID & EMERGING TECHNOLOGIES

• SAS Grid Manager: distributed SAS processing

Scheduling, Workload Balancing, High Availability & Management

• SAS In-Database: queries, aggregations, analytics within DBMS

9.2M3: DB2, EDW & Oracle; 9.3 Netezza

• HADOOP

Scalable, fault tolerant, distributed file system

SAS integration includes access, analysis and management

• SAS In Memory Analytics

Distributed, descriptive, inferential to visualization analytics

• Visual Analytics and Visual Analytics HPA


SAS TECHNICAL SUPPORT


SAS BIG-DATA HOME PAGE


SUMMARY CONSIDERATIONS – PERFORMANCE IMPROVEMENT IS A CONTINUAL PROCESS

Focus on the most severe hotspots within SAS program and operating environment

Use INDEX where appropriate

Exploit SAS OPTIONS tuning

Consider SAS Grid Products

Evaluate SAS Visual Analytics and Visual Analytics HPA


www.SAS.com


SAS SOLUTION ON DEMAND

ADVANCED ANALYTICS LAB
