SAS intro
ADVANCED DATA ANALYSIS: AN OVERVIEW
WHAT DOES ADVANCED DATA ANALYSIS INVOLVE?

- Data Acquisition
- Data Management
  - Data manipulation to get the data into the form you need for analysis
  - Data cleaning to identify errors, outliers, etc. in the data
- Exploratory Data Analysis
  - Data visualization: producing graphical representations to see relationships hidden in the data
  - Data summarization
- Data Analysis: a wide range of techniques is available for analyzing data. For example:
  - Regression-based techniques and diagnostics
  - Principal components / common factor analysis
  - Structural equation modeling
  - Time series methods (unit root testing, cointegration tests, VAR, etc.)
- Reporting
Programming Approaches

- Graphical user interface / point-and-click approach
  - Example: EViews
  - Pros: ease of use
  - Cons: limited flexibility, narrow area of specialization
  - Suitability: first courses in econometrics
- Pre-programmed and user-written routines
  - Example: Stata
  - Pros: greater flexibility, greater degree of specialization
  - Cons: data management is good, not great
  - Suitability: applied econometrics courses
- R language
  - Pros: great data management, strong analytics, availability of new packages, and it is free
  - Cons: learning curve; the GUI is still evolving (RStudio)
- SAS
  - Pros: industry standard (the corporate sector values SAS skills very highly), data management that is second to none, strong analytics (especially business analytics), a mature GUI, and it is now free for academic use
  - Cons: learning curve; expensive (not our concern any more)
Components of Base SAS Software
I. Data Management Facility

SAS organizes data into a rectangular form called a SAS data set, in which each column is a variable and each row is an observation.

[Figure: a SAS data set, with a column labeled "Variable" and a row labeled "Observation"]

Data sets are created by writing code in the SAS programming language, and they can be modified by programming statements. A common use of data sets is to provide input to computational procedures.
Creating a SAS data set by reading raw data

- DATA statement: tells SAS to begin building a data set.
- INPUT statement: specifies the fields to be read and the variables to be created from them.
- DATALINES statement: indicates that lines of data follow.
- Semicolon: marks the end of the in-stream raw data.
- RUN statement: marks the end of the DATA step.
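The five elements listed above can be seen together in a minimal DATA step. This is only a sketch: the data values and the IdNumber and Name variables are made up for illustration, while Weight_club, StartWeight, EndWeight, and Loss follow the names used elsewhere in these slides.

```sas
/* DATA statement: begin building the data set weight_club */
data weight_club;
   /* INPUT statement: read four fields; the $ marks Name as character */
   input IdNumber Name $ StartWeight EndWeight;
   /* Create a new variable from the fields that were read */
   Loss = StartWeight - EndWeight;
   /* DATALINES statement: lines of raw data follow (illustrative values) */
   datalines;
1023 David 189 165
1049 Amelia 145 124
1219 Alan 210 192
;
/* The lone semicolon above ends the in-stream data; RUN ends the DATA step */
run;
```

Submitting this should create a temporary data set with three observations and five variables, which can then be passed to a procedure.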
Other ways of creating a SAS data set

- Reading data stored in an external file (ASCII, CSV, tab-delimited, etc.)
- Importing data from other applications
  - Spreadsheets (e.g. Excel)
  - SPSS
  - dBASE
  - Other DBMS (e.g. Oracle)
- Reading data from one or more other SAS data sets
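A rough sketch of the three routes listed above follows. The file paths, file layouts, and data set names are hypothetical, and the Excel import assumes that the SAS/ACCESS Interface to PC Files is licensed on your installation.

```sas
/* 1. Read an external comma-delimited file (hypothetical path and layout) */
data club_from_csv;
   /* dlm=',' sets the delimiter, dsd handles quoted and missing fields,
      firstobs=2 skips a header row */
   infile 'C:\data\club.csv' dlm=',' dsd firstobs=2;
   input IdNumber Name $ StartWeight EndWeight;
run;

/* 2. Import a spreadsheet from another application (hypothetical path) */
proc import datafile='C:\data\club.xlsx'
            out=club_from_excel
            dbms=xlsx
            replace;
run;

/* 3. Read from another SAS data set with the SET statement */
data club_copy;
   set club_from_csv;
run;
```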
II. Programming Language

Elements of the SAS Language

- Statements: Data Weight_club; or Run;
- Expressions: X + Y; or Age
Rules for SAS Statements

- SAS statements end with a semicolon.
  Example: Data Weight_club;
- SAS statements can be entered in lowercase, uppercase, or a mixture of the two.
  Example: DATA WEIGHT_Club; or data weight_club;
- SAS statements can begin in any column of a line, and several statements can be written on the same line.
  Example: Data Weight_club; Set clubdata;
- You can begin a SAS statement on one line and continue it on another line, but you cannot split a word between two lines.
- Words in SAS statements are separated by blanks or by special characters, such as the equal sign and the minus sign.
  Example: Loss = StartWeight - EndWeight;
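The statement rules above can be tried out with a short program. This is an illustrative sketch: clubdata is created here with made-up values so that the example is self-contained.

```sas
/* Hypothetical input data set with illustrative values */
data clubdata;
   input StartWeight EndWeight;
   datalines;
189 165
145 124
;
run;

/* Mixed case is accepted, and two statements can share one line */
DATA Weight_Club; set clubdata;
   /* One statement continued on a second line; the break falls
      between words, never inside a word */
   Loss = StartWeight
          - EndWeight;
RUN;
```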
Rules for SAS Names

SAS names are used for SAS data set names, variable names, and other items.
A SAS name can contain from one to 32 characters.
The first character must be a letter or an underscore (_).
Subsequent characters must be letters, numbers, or underscores.
Blanks cannot appear in SAS names.
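As a small illustration of the naming rules above, the sketch below uses names that satisfy them and lists, in a comment, some that do not. All names here are invented for the example.

```sas
data _club_2024;            /* valid: begins with an underscore */
   Start_Weight1 = 189;     /* valid: letters, an underscore, and a digit */
   end_weight    = 165;     /* valid: case does not matter */
run;

/* Not valid as SAS names under the rules above:
     2024club      begins with a digit
     weight club   contains a blank
     weight-club   contains a character that is not a letter,
                   number, or underscore                        */
```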
III. Data Analysis and Reporting Utilities

DATA step programming is a very powerful tool for data analysis. SAS also has a library of built-in programs known as SAS procedures. A SAS procedure is referred to as a PROC.

Example: PROC PRINT

    proc print data=weight_club;
       title 'Weight Club Data';
    run;

- proc print calls the procedure.
- data= is a SAS keyword followed by the name of the SAS data set that supplies data to the procedure.
- The TITLE statement makes SAS print the title on the output.
PROC PRINT produces the following output.

[Figure: PROC PRINT output for the Weight_club data set]

Consider another SAS data set named CLASS, containing the Height, Weight, and Age of students.

[Figure: the CLASS data set]
Another example

SAS PROC REG can be used to run a regression of the variable Weight on the variable Height. The SAS code is shown below:

    proc reg data=Class;
       model Weight = Height;
    run;
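To try the regression without typing in any data, one option is the SASHELP.CLASS sample data set that ships with SAS, which contains Height, Weight, and Age among its variables. Treating it as a stand-in for the CLASS data set on these slides is an assumption; the values may differ from those shown in the original figure.

```sas
/* Copy the shipped sample data into a temporary data set named class
   (assumed stand-in for the CLASS data set on the slides) */
data class;
   set sashelp.class;
run;

/* Regress Weight on Height */
proc reg data=class;
   model Weight = Height;
run;
quit;   /* PROC REG is interactive; QUIT ends the procedure */
```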
[Figure: output of PROC REG]
Output Produced by the SAS System

Traditional Output

- SAS data set
- SAS log
- Listing or report
- Other files, such as graphs
- Files used in other databases, such as Oracle
Output from the Output Delivery System (ODS)

The Output Delivery System (ODS) enables you to produce output in a variety of formats, such as

- an HTML file
- a traditional SAS listing (monospace)
- a PostScript file
- an RTF file (for use with Microsoft Word)
- an output data set
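A brief sketch of how some of these destinations might be requested follows. The output file names and the reg_estimates data set name are hypothetical, and SASHELP.CLASS is used only so that the example is self-contained.

```sas
/* HTML file: open the destination, run the procedure, close it */
ods html file='reg_output.html';      /* hypothetical file name */
proc reg data=sashelp.class;
   model Weight = Height;
run;
quit;
ods html close;

/* RTF file for use with Microsoft Word */
ods rtf file='class_listing.rtf';     /* hypothetical file name */
proc print data=sashelp.class;
run;
ods rtf close;

/* Output data set: capture the PROC REG parameter estimates table */
ods output ParameterEstimates=reg_estimates;
proc reg data=sashelp.class;
   model Weight = Height;
run;
quit;
```

Each file destination is opened before the procedure runs and closed afterward; the ODS OUTPUT statement must likewise come before the procedure whose table it captures.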
[Figure: example of ODS HTML output from PROC REG]
Running Programs in the SAS Windowing Environment

Shortcut Keys

F5      Editor
F6      Log
F7      Output
F8      Run
Ctrl-E  Clear Window
Editing a Program in the Program Editor Window

[Figure: the Program Editor window, annotated "Write program here"]

Remember: the F7 key brings up the Output window!