Data Diagnostics in SAS Enterprise Guide -...

Copyright © 2015, SAS Institute Inc. All rights reserved.

DATA DIAGNOSTICS IN SAS® ENTERPRISE GUIDE™

AGENDA

How to…
• impute missing data
• describe data (descriptive statistics)
• graph the data
• detect and deal with outliers
• assess normality
• transform variables in order to meet assumptions (transformations)
• sample (for modeling purposes)

Q&A

INTRODUCING ENTERPRISE GUIDE: SIMPLE AS 1, 2, 3

To work with SAS Enterprise Guide, you:
1. Create a project
2. Add data to the project
3. Run tasks against the data

SCENARIO

• Company sells Outdoor and Sports items
• Obtained a list of Customers with valid email addresses
• Compiled a data table with information so we can run analytics and build a predictive model

LET'S TAKE A LOOK AT THE DATA

CUSTOMER DATA

PRODUCT ORDER DETAIL DATA - TRANSACTIONAL

WE HAVE BUILT OUR ANALYTICAL DATA MART

REPLACING MISSING VALUES

1. SAS Enterprise Guide – Query Builder – Computed Column – Replace Values

2. SAS Code
• PROC STDIZE - documentation
• PROC DATASETS - documentation
• PROC HPIMPUTE - documentation
• SAS/STAT PROC MI - documentation

3. SAS Enterprise Miner – Impute Node
• Class variables – count, default constant value, distribution, tree, tree surrogate
• Target variables – count, default constant value, distribution
• Interval variables – mean, median, midrange, distribution, tree, tree surrogate, mid-minimum spacing, Tukey's Biweight, Huber, Andrew's Wave, default constant

REPLACING MISSING VALUES: PROC STDIZE OR PROC DATASETS

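/* REPONLY: replace only the missing values, here with 0 */
PROC STDIZE out=dataprep.analytics_table reponly missing=0;
run;

or

/* Clears all formats and informats on WORK.ZEROS */
PROC DATASETS lib=work;
  MODIFY zeros;
  FORMAT _all_;
  INFORMAT _all_;
RUN;
QUIT;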

REPLACING MISSING VALUES: HPIMPUTE PROCEDURE

proc hpimpute data=sampsio.hmeq out=out1;
  input mortdue value clage debtinc;
  impute mortdue / value = 70000;
  impute value / method = mean;
  impute clage / method = random;
  impute debtinc / method = pmedian;
run;

Available in SAS 9.4

HPIMPUTE Procedure Documentation
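
The arithmetic behind constant, mean, and median replacement can be sketched outside SAS. The following is a minimal, language-neutral illustration in Python, not the HPIMPUTE implementation: the function name `impute` and the sample column are hypothetical, missing values are represented as `None`, and the median here is the exact sample median rather than the pseudomedian that `method = pmedian` requests.

```python
from statistics import mean, median


def impute(values, method="mean", constant=0):
    """Return a copy of values with None replaced by the chosen fill value."""
    observed = [v for v in values if v is not None]
    if method == "mean":
        fill = mean(observed)
    elif method == "median":
        fill = median(observed)
    elif method == "constant":
        fill = constant
    else:
        raise ValueError(f"unknown method: {method}")
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries
total_spent = [120.0, None, 80.0, 400.0, None]
print(impute(total_spent, "mean"))
print(impute(total_spent, "median"))
print(impute(total_spent, "constant", constant=0))
```

Mean replacement keeps the column mean unchanged but shrinks its variance, which is one reason the Impute node offers distribution-based and tree-based alternatives.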

MISSING VALUE DEMO

DESCRIPTIVE STATISTICS & GRAPHS

• Characterize Data
• One-way Frequencies
• Distributions
• Reports
• Bar Charts
• Box Plots
• Scatter Plots
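
As a rough picture of what the summary and one-way frequency tasks report, here is a small Python sketch. It is an assumption-laden illustration, not the output layout of the Enterprise Guide tasks: the helper names `describe` and `one_way_freq` and the sample values are hypothetical.

```python
from collections import Counter
from statistics import mean, median, stdev


def describe(values):
    """Basic descriptive statistics for a numeric column."""
    return {
        "n": len(values),
        "mean": mean(values),
        "median": median(values),
        "std": stdev(values),  # sample standard deviation (n - 1 denominator)
        "min": min(values),
        "max": max(values),
    }


def one_way_freq(categories):
    """(level, count, percent) per category, most frequent first."""
    counts = Counter(categories)
    total = len(categories)
    return [(level, n, 100 * n / total) for level, n in counts.most_common()]


# Hypothetical data
print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
print(one_way_freq(["F", "M", "F", "F"]))
```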

DESCRIPTIVE STATISTICS & GRAPHS DEMO

ASSESS NORMALITY

Tasks > Describe > Distribution Analysis

Graphs
• Histograms
• Q-Q Plot
• Kernel Density Plot

4 Tests
• Shapiro-Wilk
• Kolmogorov-Smirnov (K-S)
• Cramér-von Mises
• Anderson-Darling

• Testing Normality of Data using SAS
• Guido's Guide to PROC Univariate: A tutorial for SAS Users
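
To make two of the items above concrete, the sketch below computes the Kolmogorov-Smirnov D statistic against a normal distribution fitted with the sample mean and standard deviation, and the coordinates of a normal Q-Q plot. It is an illustration in Python under stated assumptions, not what the Distribution Analysis task runs: the function names are hypothetical, no p-value is computed, and the plotting position (i - 0.5)/n is one common convention that may differ from the one SAS uses.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic_normal(values):
    """Kolmogorov-Smirnov D against a normal with the sample mean and std."""
    xs = sorted(values)
    n = len(xs)
    fitted = NormalDist(mean(xs), stdev(xs))
    d = 0.0
    for i, x in enumerate(xs, start=1):
        cdf = fitted.cdf(x)
        # The empirical CDF steps from (i - 1)/n to i/n at x; check both sides.
        d = max(d, i / n - cdf, cdf - (i - 1) / n)
    return d


def qq_points(values):
    """(theoretical normal quantile, observed value) pairs for a Q-Q plot."""
    xs = sorted(values)
    n = len(xs)
    std_normal = NormalDist()
    return [(std_normal.inv_cdf((i - 0.5) / n), x)
            for i, x in enumerate(xs, start=1)]


# Hypothetical samples: one roughly symmetric, one strongly right-skewed
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
skewed = [1, 1, 1, 1, 2, 2, 3, 5, 10, 50]
print(ks_statistic_normal(symmetric), ks_statistic_normal(skewed))
```

A larger D means the empirical distribution sits farther from the fitted normal curve; on a Q-Q plot the same departure shows up as points bending away from a straight line.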

ASSESS NORMALITY DEMO

TRANSFORM VARIABLES

TRANSFORMATIONS FOR NORMALITY

• Log
• Square Root
• Cube Root
• Reciprocal
• Square Transformation
• Many more…

TRANSFORMING VARIABLES

• TotalSpent – Log Transformation
• Age – Recode to categorical

• Transforming Variables for Normality and Linearity
• Before Logistic Modeling – A Toolkit for Identifying and Transforming Relevant Predictors
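
The two transformations named for TotalSpent and Age can be sketched as follows. This is a Python illustration only, not the Query Builder expressions used in the demo: the function names, the sample spending values, and the age cut points are all hypothetical.

```python
import math
from statistics import mean, pstdev


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    m, s = mean(values), pstdev(values)
    return mean([((v - m) / s) ** 3 for v in values])


def log_transform(values):
    """Natural log of each value; every value must be strictly positive."""
    return [math.log(v) for v in values]


def recode_age(age):
    """Map an age in years to a band. The cut points are illustrative only."""
    if age < 30:
        return "Under 30"
    if age < 50:
        return "30-49"
    return "50+"


# Hypothetical right-skewed spending column
total_spent = [10, 20, 30, 50, 1000]
print(skewness(total_spent), skewness(log_transform(total_spent)))
print([recode_age(a) for a in (25, 30, 65)])
```

The log pulls in the long right tail, so the skewness of the transformed column is smaller than that of the raw column. Zero or negative amounts need a shift or a different transformation before taking logs.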

COMPUTED COLUMNS ‘ADVANCED EXPRESSION’

COMPUTED COLUMNS ‘RECODED’

TRANSFORM VARIABLES DEMO

DETECT AND DEAL WITH OUTLIERS

WHAT IS AN OUTLIER?

Outliers are observations that have extreme values relative to other observations observed under the same conditions.

Sources:
• Data Entry Errors
• Implausible Values
• Rare Events

WHY DETECT AND DEAL WITH OUTLIERS?

• Bias or distortion of estimates
• Inflated sums of squares
• Distortion of p-values
• Faulty conclusions

DETECT OUTLIERS

• Graphs - Box Plots, Distributions, Scatter Plots
• Univariate Statistics
• Regression
  • Cook's D
  • RSTUDENT statistic
  • DFFITS statistic
  • DFBETAS

Introduction to Building a Linear Regression Model
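
For the graphical route, a box plot conventionally flags points that lie more than 1.5 interquartile ranges beyond the quartiles. The Python sketch below shows that rule; it is an illustration under assumptions, not the deck's method: the function name `iqr_outliers` is hypothetical, and `statistics.quantiles` uses its default quartile definition, which may not match the SAS default.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR], the usual box-plot fences."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]


# Hypothetical column with one extreme value
print(iqr_outliers([10, 11, 12, 12, 13, 13, 14, 15, 100]))
```

The regression diagnostics listed above (Cook's D, RSTUDENT, DFFITS, DFBETAS) need a fitted model and are not covered by this sketch.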

DEAL WITH OUTLIERS

Several Approaches
• Deleting
• Capping/Flooring Approach
• Sigma Approach
• Exponential Smoothing Approach
• Mahalanobis Distance Approach
• Robust-Reg Approach

• Selecting the Appropriate Outlier Treatment for Common Industry Applications
• A SAS Application to Identify and Evaluate Outliers
• Robust Regression and Outlier Detection with the RobustReg Procedure
• Robust Outlier Identification using SAS
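
Two of the approaches listed above, capping/flooring and the sigma approach, reduce to simple arithmetic. The sketch below is a Python illustration with hypothetical function names and data; the choice of bounds (fixed limits, percentiles, or mean plus or minus k standard deviations) is an assumption left to the analyst.

```python
from statistics import mean, stdev


def cap_and_floor(values, lower, upper):
    """Clamp each value into [lower, upper]."""
    return [min(max(v, lower), upper) for v in values]


def sigma_bounds(values, k=3.0):
    """Bounds at the mean plus or minus k sample standard deviations."""
    m, s = mean(values), stdev(values)
    return m - k * s, m + k * s


# Hypothetical column: cap at bounds derived from the data itself
spend = [10, 11, 12, 12, 13, 13, 14, 15, 100]
low, high = sigma_bounds(spend, k=2.0)
print(cap_and_floor(spend, low, high))
```

Because the mean and standard deviation are themselves pulled by extreme values, sigma bounds computed on contaminated data can be too wide, which is part of the motivation for the robust approaches in the list.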

DETECT AND DEAL WITH OUTLIERS DEMO

SAMPLING

WHY SAMPLE?

• Smaller Data
  • exploratory analysis
  • cost
  • speed/performance
• Oversample rare events
• To get to population of interest
• Other Statistical Reasons
  • Validation or test of models
  • Adequate representation of the population

TYPES OF SAMPLING

• Simple Random Sampling (SRS)
• Stratified Sampling
• Proportional Sampling
• Other types
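
The first two sampling types can be sketched in a few lines. This is a Python illustration under assumptions, not the Enterprise Guide sampling task: the function names, the `buyer` flag, and the generated rows are hypothetical, and the stratified version draws the same fraction from every stratum (proportional allocation) with at least one row per stratum.

```python
import random
from collections import defaultdict


def simple_random_sample(rows, n, seed=None):
    """Draw n rows without replacement, each row equally likely."""
    rng = random.Random(seed)
    return rng.sample(rows, n)


def stratified_sample(rows, key, fraction, seed=None):
    """Draw the same fraction, without replacement, from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical customer table in which 10% of rows are buyers
customers = [{"id": i, "buyer": i % 10 == 0} for i in range(100)]
srs = simple_random_sample(customers, 20, seed=1)
strat = stratified_sample(customers, lambda r: r["buyer"], 0.1, seed=1)
print(len(srs), len(strat), sum(r["buyer"] for r in strat))
```

Stratifying on the target guarantees that the rare class appears in the sample in its population proportion. Oversampling rare events would instead use a larger fraction for that stratum.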

SAMPLING DEMO

RESOURCES

RESOURCES: ENTERPRISE GUIDE

Enterprise Guide
• Interactive Tour
• SAS Talks
• Enterprise Guide Public Courses

Enterprise Guide for SAS Programmer
• New Goodies for the SAS® Programmer in SAS® Enterprise Guide® 4.3
• SAS® Enterprise Guide® for Programmers

ADDITIONAL SUPPORT: ENTERPRISE GUIDE TUTORIALS

• View Free Tutorials
• http://support.sas.com/training/resources/
  » SAS Enterprise Guide Tutorial
  » Getting Started with SAS Enterprise Guide
  » SAS Enterprise Guide Tutorial for Statistics

FURTHER TRAINING FROM SAS EDUCATION

• Enterprise Guide 1: Query and Reporting
• Enterprise Guide 2: Advanced Tasks and Querying
• Enterprise Guide for Experienced SAS Programmers
• Data Preparation for Data Mining

support.sas.com/training

QUESTIONS?

Thank you for your time and attention!
