Chapter 9 Producing Descriptive Statistics PROC MEANS; Summarize descriptive statistics for continuous numeric variables. PROC FREQ; Summarize frequency tables for discrete numeric variables or categorical variables.


Chapter 9

Producing Descriptive Statistics

PROC MEANS; Summarize descriptive statistics for continuous numeric variables.

PROC FREQ; Summarize frequency tables for discrete numeric variables or categorical variables.

Objectives

• Compute statistical summaries such as mean, median, std, min, max, and so on for continuous numeric variables
• Control the number of decimal places used to report the summary statistics
• Understand the difference between the PROC MEANS and PROC SUMMARY procedures
• Create a one-way frequency table
• Create two-way and n-way cross-frequency tables

PROC MEANS Output

Salary by Job Code

The MEANS Procedure

Analysis Variable : Salary

Job        N
Code      Obs    N        Mean     Std Dev     Minimum     Maximum
-------------------------------------------------------------------
FLTAT1     14   14    25642.86     2951.07    21000.00    30000.00
FLTAT2     18   18    35111.11     1906.30    32000.00    38000.00
FLTAT3     12   12    44250.00     2301.19    41000.00    48000.00
PILOT1      8    8    69500.00     2976.10    65000.00    73000.00
PILOT2      9    9    80111.11     3756.48    75000.00    86000.00
PILOT3      8    8    99875.00     7623.98    92000.00   112000.00
-------------------------------------------------------------------

Calculating Summary Statistics for Numeric Variables

The MEANS procedure displays simple descriptive statistics for the numeric variables in a SAS data set.

General form of a simple PROC MEANS step:

PROC MEANS DATA=SAS-data-set;
RUN;

Example:

proc means data=mylib.crew;
   title 'Salary Analysis';
run;

Calculating Summary Statistics

Salary Analysis

The MEANS Procedure

Variable      N        Mean     Std Dev     Minimum     Maximum
----------------------------------------------------------------
HireDate     69     9812.78     1615.44     7318.00    12690.00
Salary       69    52144.93    25521.78    21000.00   112000.00
----------------------------------------------------------------

NOTE: PROC MEANS computes summary statistics for any numeric variable we ask for. However, the statistics are not meaningful for some variables, such as HireDate, which is a date stored as a number.

Calculating Summary Statistics

By default, PROC MEANS
– analyzes every numeric variable in the SAS data set
– prints the statistics N, MEAN, STD, MIN, and MAX
– excludes missing values before calculating statistics.

Specifying Summary Statistics to Be Computed

To specify the summary statistics to be computed, add them to the PROC MEANS statement as options:

PROC MEANS data = mylib.crew mean median range std ;
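As a minimal sketch of a complete step built on the statement above, the following assumes that the libref mylib is already assigned and that the crew data set contains the numeric variable Salary. The VAR statement and the title text are added here only for illustration.

```sas
/* Sketch: request specific statistics instead of the defaults.      */
/* Assumes mylib is an assigned libref and crew has a Salary column. */
proc means data=mylib.crew mean median range std;
   var Salary;                          /* analyze Salary only       */
   title 'Selected Statistics for Salary';
run;
```

When statistic keywords are listed, only those statistics appear in the report, so N, MIN, and MAX are not printed unless they are listed too.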

Limiting Decimal Places

By default, PROC MEANS uses the BEST. format to display values in the report, so a statistic can be shown with many decimal places, such as 52.000000.

To limit the report to k decimal places:

PROC MEANS DATA = mylib.crew MAXDEC=k ;

MAXDEC=2 results in two decimal places in the report.
MAXDEC=0 results in no decimal places in the report.

Selecting Variables

The VAR statement restricts the variables processed by PROC MEANS.

General form of the VAR statement:

VAR SAS-variable(s);

Selecting Variables

Mylib.crew

HireDate   LastName    FirstName    Location   Phone  EmpID   JobCode  Salary
07NOV1992  BEAUMONT    SALLY T.     LONDON     1132   E00525  PILOT1    72000
12MAY1985  BERGAMASCO  CHRISTOPHER  CARY       1151   E02466  FLTAT3    41000
04AUG1988  BETHEA      BARBARA ANN  FRANKFURT  1163   E00802  PILOT2    81000

proc means data=Mylib.crew;
   var Salary;
   title 'Salary Analysis';
run;

Salary Analysis

The MEANS Procedure

Analysis Variable : Salary

 N        Mean     Std Dev     Minimum     Maximum
---------------------------------------------------
69    52144.93    25521.78    21000.00   112000.00
---------------------------------------------------

Grouping Observations Using the CLASS Statement

The CLASS statement in the MEANS procedure groups the observations of the SAS data set for analysis.

General form of the CLASS statement:

CLASS SAS-variable(s);

Grouping Observations

Mylib.crew

HireDate   LastName    FirstName    Location   Phone  EmpID   JobCode  Salary
07NOV1992  BEAUMONT    SALLY T.     LONDON     1132   E00525  PILOT1    72000
12MAY1985  BERGAMASCO  CHRISTOPHER  CARY       1151   E02466  FLTAT3    41000
04AUG1988  BETHEA      BARBARA ANN  FRANKFURT  1163   E00802  PILOT2    81000

proc means data=mylib.crew maxdec=2;
   var Salary;
   class JobCode;
   title 'Salary by Job Code';
run;

NOTE: The MAXDEC= option controls the number of decimal places displayed in the output.

Grouping Observations Using the CLASS Statement

Salary by Job Code

The MEANS Procedure

Analysis Variable : Salary

Job        N
Code      Obs    N        Mean     Std Dev     Minimum     Maximum
-------------------------------------------------------------------
FLTAT1     14   14    25642.86     2951.07    21000.00    30000.00
FLTAT2     18   18    35111.11     1906.30    32000.00    38000.00
FLTAT3     12   12    44250.00     2301.19    41000.00    48000.00
PILOT1      8    8    69500.00     2976.10    65000.00    73000.00
PILOT2      9    9    80111.11     3756.48    75000.00    86000.00
PILOT3      8    8    99875.00     7623.98    92000.00   112000.00
-------------------------------------------------------------------

The summary is displayed based on the order of the categories of the CLASS variable. Variables in the CLASS statement can be character or numeric.

Results due to CLASS Statement

• The summary is displayed based on the order of the categories of the CLASS variable.
• Variables in the CLASS statement can be character or numeric. Do not use a continuous numeric variable in the CLASS statement, because each distinct value would become its own group.
• If there are two or more variables in the CLASS statement, their order in the CLASS statement determines the order in the output report.
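The effect of listing two CLASS variables can be sketched as follows. This is an illustration only; it assumes that the libref mylib is assigned and that Location and JobCode in the crew data set are suitable grouping variables.

```sas
/* Sketch: two CLASS variables. Groups are formed for each          */
/* combination of Location and JobCode, with Location listed first. */
proc means data=mylib.crew maxdec=2;
   var Salary;
   class Location JobCode;
   title 'Salary by Location and Job Code';
run;
```

Swapping the two names in the CLASS statement changes which variable forms the outer grouping in the report.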

PROC MEANS Using the BY Statement

PROC MEANS;
   VAR variable-list;
   BY variable;
RUN;

• When using the BY statement, the data set must first be sorted by the variables in the BY statement (in ascending order by default) using PROC SORT.
• The results are displayed as separate tables, one for each category of the variable in the BY statement.
• If there are two or more variables in the BY statement, their order determines the order of the tables displayed in the report.
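The sort-then-summarize pattern described here might look like the sketch below. It assumes the libref mylib is assigned; the output data set name work.crew_sorted is a hypothetical name chosen for this example so that the original data set is left unchanged.

```sas
/* Step 1: sort by the BY variable. OUT= writes the sorted copy to  */
/* a temporary data set (hypothetical name) instead of overwriting. */
proc sort data=mylib.crew out=work.crew_sorted;
   by JobCode;
run;

/* Step 2: summarize Salary separately for each JobCode value. */
proc means data=work.crew_sorted maxdec=2;
   var Salary;
   by JobCode;
   title 'Salary by Job Code (BY Groups)';
run;
```

Unlike the CLASS version, which produces a single table, this report contains one table per job code.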

Exercise

Write a program to read the diabetes data set and use PROC MEANS to produce summary statistics.
• Produce summary statistics for the variables Age, Height, and Weight. Run the program and see the results.
• Produce the summary statistics N, mean, median, max, min, std, and range, and set the number of decimal places to two with MAXDEC=2. Run the program and see the results.
• Add the CLASS statement to produce summary results for each sex. Run the program and see the results.
• Practice using the BY statement for each sex. Before you add the BY SEX statement, make sure you sort the data by SEX. Run the program and see the results.
• Add a WHERE statement to the program to select cases with AGE > 30. Run the program and see the results.

Create a Data Set of Summary Statistics in PROC MEANS

On many occasions, we may want to create a SAS data set consisting of the summary statistics calculated by PROC MEANS.

OUTPUT OUT=SAS-data-set summary-keyword(s)=variable-name(s);

NOTE: The summary keywords are MEAN, MIN, MAX, RANGE, STD, etc.

The variable names are the names you want to give to each summary statistic for each variable.

Create Summary Data Set using PROC MEANS

Example:

PROC MEANS data=mylib.crew;
   VAR Hiredate salary;
   OUTPUT OUT=mylib.discrip
          mean=avghiredate avgsalary
          median=medhiredate medsalary;
RUN;
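To inspect what the OUTPUT statement saved, the new data set can be printed. The sketch below assumes the PROC MEANS step above has already run and created mylib.discrip; the title text is only illustrative.

```sas
/* Sketch: list the summary data set created by OUTPUT OUT=. */
proc print data=mylib.discrip;
   title 'Summary Statistics Saved by PROC MEANS';
run;
```

Besides the four named statistics, the listing also shows the automatic variables _TYPE_ and _FREQ_ that PROC MEANS adds to its output data sets.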


Exercise

Revise the following program to do the following task: use the OUTPUT OUT= statement to save the summary statistics mean, median, and std to a SAS data set dia_summary, then print this data set to see what's in it.

PROC MEANS data=mylib.diabetes maxdec=2;
   var age height weight;
   class sex;
run;

PROC SUMMARY Procedure

The PROC SUMMARY procedure uses the same syntax as PROC MEANS.

PROC SUMMARY does not produce a report by default. To produce the report, add the PRINT option:

PROC SUMMARY data = sasdataset PRINT;

When do we use PROC SUMMARY?

If you only want to produce and save the summary to a SAS data set, you can use PROC SUMMARY. Alternatively, you can use the NOPRINT option in PROC MEANS.
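The two alternatives can be compared side by side in a short sketch. It assumes the libref mylib is assigned; the output data set names and the variable names AvgSalary and StdSalary are hypothetical names introduced for this example.

```sas
/* Option 1: PROC SUMMARY prints nothing by default and only        */
/* writes the output data set (hypothetical name salary_summary).   */
proc summary data=mylib.crew;
   var Salary;
   class JobCode;
   output out=work.salary_summary mean=AvgSalary std=StdSalary;
run;

/* Option 2: the same request in PROC MEANS, with NOPRINT           */
/* suppressing the printed report.                                  */
proc means data=mylib.crew noprint;
   var Salary;
   class JobCode;
   output out=work.salary_summary2 mean=AvgSalary std=StdSalary;
run;
```

Both steps should leave equivalent data sets, which can then be listed with PROC PRINT.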

PROC FREQ Procedure: Objectives

– Generate simple descriptive statistics using the MEANS procedure.
– Group observations of a SAS data set for analysis using the CLASS statement in the MEANS procedure.
– Create one-way and two-way frequency tables using the FREQ procedure.
– Restrict the variables processed by the FREQ procedure.

PROC FREQ Output

Distribution of Job Code Values

The FREQ Procedure

Job                               Cumulative    Cumulative
Code      Frequency    Percent     Frequency       Percent
-----------------------------------------------------------
FLTAT1           14      20.29            14         20.29
FLTAT2           18      26.09            32         46.38
FLTAT3           12      17.39            44         63.77
PILOT1            8      11.59            52         75.36
PILOT2            9      13.04            61         88.41
PILOT3            8      11.59            69        100.00

Goal Report 1

International Airlines wants to know how many employees are in each job code.

Distribution of Job Code Values

The FREQ Procedure

Job                               Cumulative    Cumulative
Code      Frequency    Percent     Frequency       Percent
-----------------------------------------------------------
FLTAT1           14      20.29            14         20.29
FLTAT2           18      26.09            32         46.38
FLTAT3           12      17.39            44         63.77
PILOT1            8      11.59            52         75.36
PILOT2            9      13.04            61         88.41
PILOT3            8      11.59            69        100.00

Goal Report 2

Categorize job code and salary values to determine how many employees fall into each group.

Salary Distribution by Job Codes

The FREQ Procedure

Table of JobCode by Salary

JobCode           Salary

Frequency        |
Percent          |
Row Pct          |
Col Pct          |Less tha|25,000 t|More tha|  Total
                 |n 25,000|o 50,000|n 50,000|
-----------------+--------+--------+--------+
Flight Attendant |      5 |     39 |      0 |     44
                 |   7.25 |  56.52 |   0.00 |  63.77
                 |  11.36 |  88.64 |   0.00 |
                 | 100.00 | 100.00 |   0.00 |
-----------------+--------+--------+--------+
Pilot            |      0 |      0 |     25 |     25
                 |   0.00 |   0.00 |  36.23 |  36.23
                 |   0.00 |   0.00 | 100.00 |
                 |   0.00 |   0.00 | 100.00 |
-----------------+--------+--------+--------+
Total                   5       39       25       69
                     7.25    56.52    36.23   100.00

Creating a Frequency Report

• PROC FREQ displays frequency counts of the data values in a SAS data set.
• General form of a simple PROC FREQ step:

PROC FREQ DATA=SAS-data-set;
RUN;

Example:

proc freq data=mylib.crew;
run;

Creating a Frequency Report

• By default, PROC FREQ
– analyzes every variable in the SAS data set
– displays each distinct data value
– calculates the number of observations in which each data value appears (and the corresponding percentage)
– indicates for each variable how many observations have missing values.

Default Frequency Reports

proc freq data=mylib.crew;
run;

mylib.crew

HireDate   LastName    FirstName    Location   Phone  EmpID   JobCode  Salary
07NOV1992  BEAUMONT    SALLY T.     LONDON     1132   E00525  PILOT1    72000
12MAY1985  BERGAMASCO  CHRISTOPHER  CARY       1151   E02466  FLTAT3    41000
04AUG1988  BETHEA      BARBARA ANN  FRANKFURT  1163   E00802  PILOT2    81000

[Figure: the step produces a separate "Distribution of" table for every variable: HireDate, LastName, FirstName, Location, Phone, EmpID, JobCode, and Salary.]

One-Way Frequency Report

• Use the TABLES statement to limit the variables included in the frequency counts. These are typically variables that have a limited number of distinct values.
• General form of a PROC FREQ step with a TABLES statement:

PROC FREQ DATA=SAS-data-set;
   TABLES SAS-variables / NOCUM;
RUN;

The NOCUM option in the TABLES statement suppresses the cumulative frequency and cumulative percentage columns.
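A concrete instance of this general form, sketched under the assumption that the libref mylib is assigned and the crew data set contains JobCode (the title text is illustrative):

```sas
/* Sketch: one-way table for JobCode only. NOCUM drops the          */
/* Cumulative Frequency and Cumulative Percent columns.             */
proc freq data=mylib.crew;
   tables JobCode / nocum;
   title 'Job Code Counts Without Cumulative Columns';
run;
```

The resulting table keeps only the Frequency and Percent columns for each job code.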

Creating a Frequency Report

proc freq data=mylib.crew;
   tables JobCode;
   title 'Distribution of Job Code Values';
run;

Distribution of Job Code Values

The FREQ Procedure

Job                               Cumulative    Cumulative
Code      Frequency    Percent     Frequency       Percent
-----------------------------------------------------------
FLTAT1           14      20.29            14         20.29
FLTAT2           18      26.09            32         46.38
FLTAT3           12      17.39            44         63.77
PILOT1            8      11.59            52         75.36
PILOT2            9      13.04            61         88.41
PILOT3            8      11.59            69        100.00

Using PROC FORMAT to Redefine Categories of Values in the TABLES Statement

International Airlines wants to use formats to categorize the flight crew by job code.

Stored values             Formatted values
FLTAT1, FLTAT2, FLTAT3    Flight Attendant
PILOT1, PILOT2, PILOT3    Pilot

Analyzing Categories of Values

proc format;
   value $codefmt 'FLTAT1'-'FLTAT3'='Flight Attendant'
                  'PILOT1'-'PILOT3'='Pilot';
run;

proc freq data = mylib.crew;
   format JobCode $codefmt.;
   tables JobCode;
run;

NOTE: The original data values for JobCode are not changed. They are still FLTAT1, FLTAT2, and so on.

Analyzing Categories of Values

Distribution of Job Code Values

The FREQ Procedure

                                            Cumulative    Cumulative
JobCode             Frequency    Percent     Frequency       Percent
---------------------------------------------------------------------
Flight Attendant           44      63.77            44         63.77
Pilot                      25      36.23            69        100.00

Crosstabular Frequency Reports

• A two-way, or crosstabular, frequency report analyzes all possible combinations of the distinct values of two variables.
• The asterisk (*) operator in the TABLES statement is used to cross variables.
• General form of the FREQ procedure to create a crosstabular report:

PROC FREQ DATA=SAS-data-set;
   TABLES variable1*variable2;
RUN;

variable1 defines the rows and variable2 defines the columns.

Crosstabular Frequency Reports

proc format;
   value $codefmt 'FLTAT1'-'FLTAT3'='Flight Attendant'
                  'PILOT1'-'PILOT3'='Pilot';
   value money low-<25000 ='Less than 25,000'
               25000-50000='25,000 to 50,000'
               50000<-high='More than 50,000';
run;

proc freq data=mylib.crew;
   tables JobCode*Salary;
   format JobCode $codefmt. Salary money.;
   title 'Salary Distribution by Job Codes';
run;

Crosstabular Frequency Reports

Salary Distribution by Job Codes

The FREQ Procedure

Table of JobCode by Salary

JobCode           Salary

Frequency        |
Percent          |
Row Pct          |
Col Pct          |Less tha|25,000 t|More tha|  Total
                 |n 25,000|o 50,000|n 50,000|
-----------------+--------+--------+--------+
Flight Attendant |      5 |     39 |      0 |     44
                 |   7.25 |  56.52 |   0.00 |  63.77
                 |  11.36 |  88.64 |   0.00 |
                 | 100.00 | 100.00 |   0.00 |
-----------------+--------+--------+--------+
Pilot            |      0 |      0 |     25 |     25
                 |   0.00 |   0.00 |  36.23 |  36.23
                 |   0.00 |   0.00 | 100.00 |
                 |   0.00 |   0.00 | 100.00 |
-----------------+--------+--------+--------+
Total                   5       39       25       69
                     7.25    56.52    36.23   100.00

Additional Syntax for the TABLES Statement in PROC FREQ

Syntax                   Equivalent to
tables A*(B C);          tables A*B A*C;
tables (A B)*(C D);      tables A*C B*C A*D B*D;
tables (A B C)*D;        tables A*D B*D C*D;
tables A--C;             tables A B C;
tables (A--C)*D;         tables A*D B*D C*D;

A--C refers to all variables from A through C in the order they are stored in the data set.

TABLES A*B*C; produces separate two-way tables of B*C for each value of A.
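A three-way request on the crew data might be sketched as follows. It assumes the libref mylib is assigned and that the $codefmt and money formats shown earlier in the chapter have already been created with PROC FORMAT; the choice of Location as the first variable is only an illustration.

```sas
/* Sketch: n-way table. One JobCode-by-Salary table is produced     */
/* for each value of Location.                                      */
/* Assumes the $codefmt. and money. formats are already defined.    */
proc freq data=mylib.crew;
   tables Location*JobCode*Salary;
   format JobCode $codefmt. Salary money.;
   title 'Salary Distribution by Job Code Within Location';
run;
```

The last two variables in the request always form the rows and columns; any variables before them define the strata.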

To Suppress Some Columns in the PROC FREQ Summary Report

PROC FREQ;
   TABLES var1*var2 / <options>;

Option for suppressing cell frequency: NOFREQ
Option for suppressing cell percent: NOPERCENT
Option for suppressing row percent: NOROW
Option for suppressing column percent: NOCOL
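For example, the crosstabular report from earlier in the chapter could be trimmed to counts and overall percentages. This sketch assumes the libref mylib is assigned and that the $codefmt and money formats have already been defined.

```sas
/* Sketch: keep Frequency and Percent in each cell, but suppress    */
/* the row and column percentages.                                  */
/* Assumes the $codefmt. and money. formats are already defined.    */
proc freq data=mylib.crew;
   tables JobCode*Salary / norow nocol;
   format JobCode $codefmt. Salary money.;
   title 'Salary Distribution by Job Codes';
run;
```

Several suppression options can be combined after the slash, separated by spaces.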

Additional Uses of PROC FREQ

In addition to reporting tables, PROC FREQ can also conduct many statistical tests and compute measures for analyzing categorical data, such as the chi-square test, the Cochran-Mantel-Haenszel test, Fisher's exact test, the kappa coefficient, risk, the odds ratio, and so on.

These are beyond the scope of this programming course.

Exercise

The Diabetes data set consists of Sex, Age, Height, Weight, Pulse, FastGluc, and PostGluc for 20 patients. Revise the following program by using the PROC FREQ procedure to perform the following tasks:

1. Use an IF statement to create the AGE_G variable: if AGE > 45, then AGE_G = 'Senior'; otherwise AGE_G = 'Young'. Create one-way tables for the variables SEX, AGE_G, and Pulse using the user-defined format. Run the program and see the results.
2. Create the crosstabular table sex*(Age_G Pulse); make sure the user-defined format is applied to the Pulse variable. Run the program and see the results.
3. Suppress the row percent and column percent. Run the program and see the results.

proc format;
   value pulft LOW-70 = 'Low'
               71-High = 'High';
run;

data diab;
   set mylib.diabetes;
run;