Copyright © 2015, SAS Institute Inc. All rights reserved.
DATA DIAGNOSTICS IN SAS® ENTERPRISE GUIDE™
AGENDA
How to…
• impute missing data
• describe data (descriptive statistics)
• graph the data
• detect and deal with outliers
• assess normality
• transform variables in order to meet assumptions (transformations)
• sample (for modeling purposes)
Q&A
INTRODUCING ENTERPRISE GUIDE SIMPLE AS 1,2,3
To work with SAS Enterprise Guide, you:
1. Create a project
2. Add data to the project
3. Run tasks against the data
SCENARIO
• Company sells Outdoor and Sports items
• Obtained a list of Customers with valid email addresses
• Compiled a data table with information so we can run analytics and build a predictive model
LET'S TAKE A LOOK AT THE DATA
CUSTOMER DATA
PRODUCT ORDER DETAIL DATA - TRANSACTIONAL
WE HAVE BUILT OUR ANALYTICAL DATA MART
REPLACING MISSING VALUES
1. SAS Enterprise Guide – Query Builder – Computed Column – Replace Values
2. SAS Code
• PROC STDIZE - documentation
• PROC DATASETS - documentation
• PROC HPIMPUTE - documentation
• SAS/STAT PROC MI - documentation
3. SAS Enterprise Miner – Impute Node
• Class variables – count, default constant value, distribution, tree, tree surrogate
• Target variables – count, default constant value, distribution
• Interval variables – mean, median, midrange, distribution, tree, tree surrogate, mid-minimum spacing, Tukey's Biweight, Huber, Andrews' Wave, default constant
REPLACING MISSING VALUES PROC STDIZE OR PROC DATASETS
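/* Replace missing numeric values with 0. REPONLY replaces missing values without standardizing. */
/* No DATA= option is given, so the most recently created data set is used as input. */
PROC STDIZE out=dataprep.analytics_table reponly missing=0;
RUN;

or

/* Remove all formats and informats from the data set WORK.ZEROS. */
PROC DATASETS lib=work;
  MODIFY zeros;
  FORMAT _all_;
  INFORMAT _all_;
RUN;
QUIT;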
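A variation on the PROC STDIZE approach is to replace missing values with a per-variable statistic rather than a constant. The sketch below is only illustrative: it uses the SASHELP.HEART sample table as a stand-in for the customer data, and the output name work.heart_imputed is a made-up name, not something from the presentation.

```sas
/* Sketch: replace missing values with each variable's median.         */
/* SASHELP.HEART is a stand-in for the analytics table (assumption).   */
proc stdize data=sashelp.heart out=work.heart_imputed
            reponly method=median;
  var cholesterol weight height;
run;

/* Check that no missing values remain in the imputed columns. */
proc means data=work.heart_imputed n nmiss;
  var cholesterol weight height;
run;
```

With REPONLY, non-missing values are left unchanged, so the step acts purely as an imputation.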
REPLACING MISSING VALUES HPIMPUTE PROCEDURE
proc hpimpute data=sampsio.hmeq out=out1;
  input mortdue value clage debtinc;
  impute mortdue / value = 70000;
  impute value / method = mean;
  impute clage / method = random;
  impute debtinc / method = pmedian;
run;

Available in SAS 9.4
HPIMPUTE Procedure Documentation
MISSING VALUE DEMO
DESCRIPTIVE STATISTICS & GRAPHS
• Characterize Data
• One-way Frequencies
• Distributions
• Reports
• Bar Charts
• Box Plots
• Scatter Plots
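The Enterprise Guide tasks listed above generate SAS code behind the scenes. A rough hand-written equivalent might look like the sketch below. It assumes the SASHELP.CARS sample table as a stand-in for the customer data, and the choice of variables is purely illustrative.

```sas
/* Sketch: summary statistics, including a count of missing values. */
proc means data=sashelp.cars n nmiss mean median std min max;
  var msrp horsepower weight;
run;

/* One-way frequencies for categorical columns. */
proc freq data=sashelp.cars;
  tables type origin / missing;
run;

/* Box plot of a numeric variable by category. */
proc sgplot data=sashelp.cars;
  vbox msrp / category=type;
run;

/* Scatter plot of two numeric variables. */
proc sgplot data=sashelp.cars;
  scatter x=horsepower y=msrp;
run;
```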
DESCRIPTIVE STATISTICS & GRAPHS DEMO
ASSESS NORMALITY
Tasks > Describe > Distribution Analysis
Graphs
• Histograms
• Q-Q Plot
• Kernel Density Plot
ASSESS NORMALITY
Tasks > Describe > Distribution Analysis
4 Tests
• Shapiro-Wilk
• Kolmogorov-Smirnov (K-S)
• Cramér-von Mises
• Anderson-Darling
Testing Normality of Data using SAS
Guido's Guide to PROC UNIVARIATE: A Tutorial for SAS Users
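The graphs and tests above can also be requested directly from PROC UNIVARIATE. The sketch below is a minimal example that assumes SASHELP.CARS and its MSRP column as a stand-in for the variable being assessed.

```sas
/* Sketch: normality tests and diagnostic plots for one variable.     */
/* The NORMAL option requests the Shapiro-Wilk, Kolmogorov-Smirnov,   */
/* Cramer-von Mises, and Anderson-Darling tests.                      */
proc univariate data=sashelp.cars normal;
  var msrp;
  /* Histogram with a fitted normal curve and a kernel density overlay. */
  histogram msrp / normal kernel;
  /* Q-Q plot against a normal distribution with estimated parameters. */
  qqplot msrp / normal(mu=est sigma=est);
run;
```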
ASSESS NORMALITY DEMO
TRANSFORM VARIABLES
TRANSFORMATIONS FOR NORMALITY
• Log
• Square Root
• Cube Root
• Reciprocal
• Square Transformation
• Many more…
TRANSFORMING VARIABLES
• TotalSpent – Log Transformation
• Age – Recode to categorical
Transforming Variables for Normality and Linearity
Before Logistic Modeling – A Toolkit for Identifying and Transforming Relevant Predictors
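The two transformations named above could be written in a DATA step roughly as follows. This is a sketch under stated assumptions: the small input table, its values, the CustomerID column, the output names, and the age cutoffs are all invented for illustration. Only the variable names TotalSpent and Age come from the slide.

```sas
/* Hypothetical sample rows standing in for the customer table. */
data work.customers;
  input CustomerID TotalSpent Age;
  datalines;
1 120.50 23
2 0 41
3 3899.99 67
4 45.00 .
;
run;

data work.customers_transformed;
  set work.customers;

  /* The log is undefined for zero or negative amounts, so those rows */
  /* (and rows with a missing amount) keep LogTotalSpent missing.     */
  if TotalSpent > 0 then LogTotalSpent = log(TotalSpent);

  /* Recode Age into categories; the cutoffs are illustrative only. */
  length AgeGroup $8;
  if missing(Age) then AgeGroup = 'Unknown';
  else if Age < 30 then AgeGroup = 'Under 30';
  else if Age < 50 then AgeGroup = '30 to 49';
  else AgeGroup = '50 plus';
run;
```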
COMPUTED COLUMNS ‘ADVANCED EXPRESSION’
COMPUTED COLUMNS ‘RECODED’
TRANSFORM VARIABLES DEMO
DETECT AND DEAL WITH OUTLIERS
WHAT IS AN OUTLIER
Outliers are observations that have extreme values relative to other observations observed under the same conditions.
Sources:
• Data Entry Errors
• Implausible Values
• Rare Events
WHY DETECT AND DEAL WITH OUTLIERS
• Bias or distortion of estimates
• Inflated sums of squares
• Distortion of p-values
• Faulty conclusions
DETECT OUTLIERS
• Graphs - Box Plots, Distributions, Scatter Plots
• Univariate Statistics
• Regression
  • Cook's D
  • RSTUDENT statistic
  • DFFITS statistic
  • DFBETAS
Introduction to Building a Linear Regression Model
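The regression diagnostics listed above are available from PROC REG. The sketch below assumes the SASHELP.CLASS sample table and an arbitrary model. The output data set and variable names are made up, and the cutoff of 2 on the studentized residual is a common rule of thumb rather than something stated in the presentation.

```sas
/* Sketch: request influence diagnostics from a linear regression.  */
/* The INFLUENCE option prints RSTUDENT, DFFITS, and DFBETAS.       */
proc reg data=sashelp.class;
  model weight = height age / influence;
  /* Save Cook's D, RSTUDENT, and DFFITS for each observation. */
  output out=work.reg_diag
         cookd=cooks_d rstudent=rstud dffits=dffits_val;
run;
quit;

/* Keep observations whose studentized residual exceeds 2 in absolute value. */
data work.flagged;
  set work.reg_diag;
  if abs(rstud) > 2;
run;
```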
DEAL WITH OUTLIERS
Several Approaches
• Deleting
• Capping/Flooring Approach
• Sigma Approach
• Exponential Smoothing Approach
• Mahalanobis Distance Approach
• Robust-Reg Approach
Selecting the Appropriate Outlier Treatment for Common Industry Applications
A SAS Application to Identify and Evaluate Outliers
Robust Regression and Outlier Detection with the RobustReg Procedure
Robust Outlier Identification using SAS
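As one illustration of the capping/flooring approach, values can be limited to a chosen pair of percentiles. The sketch below assumes the 1st and 99th percentiles as limits and SASHELP.CARS as a stand-in table. Both choices, and the output names, are assumptions made for the example rather than recommendations from the presentation.

```sas
/* Sketch: compute the 1st and 99th percentiles of MSRP.           */
/* PCTLPRE=p with PCTLPTS=1 99 creates the variables p1 and p99.   */
proc univariate data=sashelp.cars noprint;
  var msrp;
  output out=work.limits pctlpts=1 99 pctlpre=p;
run;

data work.capped;
  /* Read the one-row limits table once; its values are retained. */
  if _n_ = 1 then set work.limits;
  set sashelp.cars;
  /* Floor at p1 and cap at p99; missing values stay missing. */
  if not missing(msrp) then msrp_capped = min(max(msrp, p1), p99);
  drop p1 p99;
run;
```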
DETECT AND DEAL WITH OUTLIERS DEMO
SAMPLING
WHY SAMPLE?
• Smaller Data
  • exploratory analysis
  • cost
  • speed/performance
• Oversample rare events
• To get to population of interest
• Other Statistical Reasons
  • Validation or test of models
  • Adequate representation of the population
TYPES OF SAMPLING
• Simple Random Sampling (SRS)
• Stratified Sampling
• Proportional Sampling
• Other types
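Simple random and stratified samples can be drawn with PROC SURVEYSELECT. The sketch below assumes SASHELP.CARS as a stand-in table, a 10% sampling rate, and Origin as the stratification variable. The seed value and output names are arbitrary.

```sas
/* Sketch: simple random sample of 10% of the rows. */
proc surveyselect data=sashelp.cars out=work.srs_sample
                  method=srs samprate=0.10 seed=12345;
run;

/* Stratified sampling requires the input sorted by the strata variable. */
proc sort data=sashelp.cars out=work.cars_sorted;
  by origin;
run;

/* Sample 10% within each stratum, which keeps the strata roughly */
/* in proportion to their share of the full table.                */
proc surveyselect data=work.cars_sorted out=work.strat_sample
                  method=srs samprate=0.10 seed=12345;
  strata origin;
run;
```

Fixing SEED= makes the sample reproducible across runs.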
SAMPLING DEMO
RESOURCES
RESOURCES ENTERPRISE GUIDE
Enterprise Guide
• Interactive Tour
• SAS Talks
• Enterprise Guide Public Courses
Enterprise Guide for SAS Programmer
• New Goodies for the SAS® Programmer in SAS® Enterprise Guide® 4.3
• SAS® Enterprise Guide® for Programmers
ADDITIONAL SUPPORT ENTERPRISE GUIDE TUTORIALS
• View Free Tutorials
• http://support.sas.com/training/resources/
  » SAS Enterprise Guide Tutorial
  » Getting Started with SAS Enterprise Guide
  » SAS Enterprise Guide Tutorial for Statistics
FURTHER TRAINING FROM SAS EDUCATION
• Enterprise Guide 1: Query and Reporting
• Enterprise Guide 2: Advanced Tasks and Querying
• Enterprise Guide for Experienced SAS Programmers
• Data Preparation for Data Mining
support.sas.com/training
QUESTIONS?
Thank you for your time and attention!