31
SAS Macros are the SAS Macros are the Cure for Quality Cure for Quality Control Pains Control Pains Gary McQuown Data and Analytic Solutions

SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

Embed Size (px)

Citation preview

Page 1: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

SAS Macros are the Cure for SAS Macros are the Cure for Quality Control PainsQuality Control Pains

Gary McQuown

Data and Analytic Solutions

Page 2: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

Rants and Raves of a Rants and Raves of a SAS ProgrammerSAS Programmer

Page 3: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

PurposePurpose

I. Quality Control

II. SAS Macros for Quality Control

III. Sources of SAS Macros and QC Code

Page 4: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

I. Quality ControlI. Quality Control

An ongoing effort for validation, improvement and facilitation of the data related process to insure that data meets the business needs.

Page 5: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

Quality ControlQuality Control

“Quality control means you can have what you need, how you need it, when you need it.” E. Demming

Page 6: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

Why Practice QC?Why Practice QC?

It Saves Time

It Saves Money

It Makes Money

Ignorance is not Bliss

Page 7: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

How Data Goes BadHow Data Goes Bad

“Bad Genes” .. Poor design and collection

“Adoption” … Someone Else’s Design

“Child Abuse” ... Poorly Nurtured

“Terrible Teens” ... Growing Pains

Page 8: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

The QC ProcessThe QC Process

1. Define Requirements

2. Identify Data Issues

3. Analyze Options

4. Improve Data Quality

• Document every step and repeat

Page 9: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

Define RequirementsDefine Requirements

What do you need?

Requires an understanding of the business process, the data, the operating system and the users.

Documentation, business specs and “experts”.

Page 10: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

Devil’s AdvocateDevil’s Advocate

What is correct for one task / group may be incorrect for another.

What is correct now may be incorrect later.

What is correct now ... may not be able to be repeated.

Page 11: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

Identify Data IssuesIdentify Data Issues

AccuracyCompletenessConsistencyTimelinessUniquenessValidity

Page 12: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

G = Good F = Fair B = Bad

Variable Pre

sen

t

Co

nfo

rmin

g

Scr

ub

?

Acc

ura

te

Co

mp

lete

Co

nsi

sten

cy

Tim

elin

ess

Un

iqu

enes

s

Val

idit

y

NAME_LAST 99% 88% 11% G G G G G G

NAME_FIRST 86% 78% 8% G G F G F F

GENDER 63% 59% 4% G G G G G G

TELEPHONE 100% 6% 94% B B G B B B

AGE 57% 55% 2% F G G F G F

SPECIALTY 76% 72% 5% G F G G G G

EDUCATION 100% 100% 0% G G G G B B

G=GOOD F = FAIR B = BAD

EXCEPTION AND PERCENTAGE REPORT (EAP)

Page 13: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

Analyze OptionsAnalyze Options

What do you need?

What do you have?

What changes need to be made?

Will you break anything along the way?

Page 14: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

Improve Data QualityImprove Data Quality Selective Processing

Clean Existing Values

Correcting Existing Values

Delete “bad” data

Add additional data

• Document original and new values.

Page 15: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

DocumentationDocumentation

Design Process ... business specs“As You Go” ... in the code, log, emailInput and Output files (Freqs & Means)Modifications .... “as per xxx “, email Exceptions (Errors and Issues)User’s ManualElizabeth Axelrod ... Big ‘D’

“Just Shoot Them”

Page 16: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

General SuggestionsGeneral Suggestions

“Drive Out Fear” Early Intervention Obtain “Buy In” from all parties Keep it “Simple” ... use macros Be consistent … use macros Monitor results Document everything, every time

Page 17: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

II. SAS MacrosII. SAS Macros

Macros allow you to use, re-use and share “object-oriented” code.

QC is very redundant .... the same or similar process performed on each data set, each variable and each process.

Page 18: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

RealityReality

People are:

Ignorant Forgetful Busy Lazy Don’t Care

Page 19: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

Why MacrosWhy Macros

Minimal Effort

Parameters

Available (FREE)

Page 20: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

FREQOUT

Produces Frequencies for multiple variables

% FREQOUT

(data= /* input dataset name */,

out= freqout /* output data set name ,

vars= /* list of variables */,

by = /* list of by variables */,

fmtassign = /* var fmt var fmt */,

debugging = NO /* YES or NO */

Author: Ian Whitlock

Location: www.lexjansen.com and sconsig.com

Page 21: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

%EAP_RPT (DSN=, LIBIN= , LIBOUT=, _VARS= , _FMTS=); DSN = Name of input SAS data set LIBIN= SAS library of input data set LIBOUT= SAS library of output data set _VARS= list of character variables to review .. paired with _FMTS _FMTS= list of formats to apply ... paired with _VARS

Example: %EAP_RPT(_VARS = AGE INCOME EDUCATION ,

_FMTS = AGE INC EDU , LIBIN = PROJ_IN , LIBOUT = PROJ_OUT , DSN = STUDY_1);

EAP_RPT

Page 22: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

DATA CLEANING

TIP00128a - Cleansing Macro, Data Scrubbing routine (see tip 00128 for more)

  %cleanse(schlib=work, schema=, strlen=50,

var=, target=target, replace=replace, case=nocase); Author: Charles Patridge

Version: 2.1 (sug. by Ian Whitlock) Location: www.sconsig.com

Page 23: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

REMOVE OUTLIERS %outlier ( data = _SAS_dataset_name_, out = _SAS_output_dataset_name var = _variable_to_screen pass = _number_of_passes except = _exception_report_data_set_, mult = _multiplier_of_standard_deviations_) The %OUTLIER macro completes outlier screens based on statistical values of a numeric variable in a SAS data set. It is set up to remove any outlier records that are within a given number of Standard Deviations from the mean, and will run that screen a given number of times. For example, a "3-Pass-2" outlier screen will remove any values outside 3 standard deviations from the mean, and will run that outlier screen twice. The given numbers can be any integer.Author: Unknown Location: www.spikeware.com

Page 24: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

CONT_COMPARE

Compares two data sets, list all variables and reports potential issues:

1) Fields in Both2) Type3) Length

%cont_compare (dsn1, dsn2)

Page 25: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

KEEPDBLS: Documents Duplicates TIP000367- KeepDbls %MACRO KeepDbls (SourceDs =_LAST_, TargetDs =, Overwrit =N, IdList =, Where =); Moves duplicate observations to another file.

Author: Jim GroeneveldLocation: www.sconsig.com

Page 26: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

CK_MISSING

Evaluates variables in regards to missing and non missing status.

Default= _numeric_ missing. _character_ $missing.

Parms:

DSN = libname and name of data set. Default is the last read/created.

PATH= path to directory where QC info is stored.

VAR = list of variables to b evaluated.

FMT = format statment.

%ck_missing( dsn=mylib.recentfile,

var=UPB FICO1 FICO2 FICO3 CHANNEL,

fmt=UPB upb. FICO1 FICO2 FICO3 fico. CHANNEL $chnl. );

Page 27: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

LOG FILTER: Examines and Reports on SAS Log

Log Filter checks your log for errors, warnings, and other "interesting" messages. It then displays what it finds in its summary window. Double-click on a row and it'll reposition the log window to display the message in context (if it's an external log file, it'll open it in a viewer window and position it for you).

Author: Ratcliffe Location: http://ratcliffe.co.uk/rest_logfilt.htm

Page 28: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

MK_FORMATS

Create a format from a SAS data set.

Parms:

DSN = SAS data set

START =Unique key value ie. SSN

LABEL =Value to be associated with start ie. Full Name with SSN

FMTNAME =Name of Format (sans ".")

TYPE = C or N for Character or Numeric

LIBRARY = Libname of Format Library (default =work)

OTHER = Value to supply for missing (default =OTHER)

Page 29: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

III. Sources of SAS Macros III. Sources of SAS Macros and QC Code and QC Code

www.sas.com (examples)

www.lexjansen.com (proceeding)www.sconsig.com

www.ratcliffe.co.uk

www.statetechservices.com

www.spikeware.com

Page 30: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

More SourcesMore Sources

www.mcw.edu/pcor/rsparapa/sasmacro.html www.math.yorku.ca/scs/friendly.html www.stat.ncsu.edu/sas/samples/index.htmlwww.dasconsultants.com SAS-L

Books By Users:Ron Cody’s Data CleaningNumerous books on Macros .... “By Example”

Page 31: SAS Macros are the Cure for Quality Control Pains Gary McQuown Data and Analytic Solutions

Questions ?Questions ?

Gary McQuown

[email protected]

www.DASconsultants.com