7/25/2019 Notes for SAS Programming Fall2009
1/88
Notes for SAS programming
Econ424
Fall 2009
2/88
Why SAS?
Able to process large data sets
Easy to cope with multiple variables
Able to track all the operations on the data sets
Generate systematic output
- Summary statistics
- Graphs
- Regression results
Most government agencies and private sectors use SAS
3/88
Where to find SAS?
The MS Windows version of SAS is only available on campus
- class website, "computer resources", "On-campus computer lab"
- lists computer lab locations, hours and software
You can access SAS remotely via glue.umd.edu, but you cannot use the interactive windows
- use a secured telnet (e.g. ssh) to remotely log in to glue.umd.edu with your directory ID and password
- type "tap sas" to tell the system that you want to access SAS
- use any text editor (say pico) to edit your sas program (say myprog.sas). You can also create the text-only .sas file on your PC and (securely) ftp it into glue.
- In glue, typing "sas myprog.sas &" will send the sas program to run in the background.
- The computer will automatically generate myprog.log to tell you how each command runs in sas. If your program produces any output, the output will be automatically saved in myprog.lst. All in the same directory as your .sas file.
4/88
Roadmap
Thinking in "SAS"
Basic rules
Read in data
Data cleaning commands
Summary statistics
Combine two or more datasets
Hypothesis testing
Regression
5/88
Thinking in "SAS"
What is a program?
- Algorithm, recipe, set of instructions
How is programming done in SAS?
- SAS is like programming in any language:
  Step by step instructions
  Can create your own routines to process data
  Most instructions are set up in a logical manner
- SAS is NOT like other languages:
  Some syntax is peculiar to SAS
  Written specifically for statistics so it isn't all-purpose
  Canned processes that you cannot edit nor can you see the code
6/88
Thinking in "SAS"
Creating a program
- What is your problem? take pro
7/88
Basic rules (1) - organize files
.sas - program file
.log - notes, errors, warnings
.lst - output
.sas7bdat - data file
library - a cabinet to put data in
- Default: Work library
  temporary, erased after you close the session
- Permanent library
  libname mylib "m:\";
  mylib.mydata
  = a sas data file named "mydata" in library "mylib"
run and recall .sas
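A minimal sketch of the difference between the temporary Work library and a permanent library. The folder "m:\" and the data set name demo are assumptions for illustration only; point libname at any folder you can write to.

```sas
/* Assumed folder: replace "m:\" with a directory you can write to. */
libname mylib "m:\";

/* One-level name: stored in the temporary WORK library,   */
/* erased when the SAS session ends.                       */
data demo;
  x = 1;
run;

/* Two-level name: stored permanently as demo.sas7bdat     */
/* in the folder that mylib points to.                     */
data mylib.demo;
  set work.demo;
run;
```

In a later session, the same libname statement followed by "set mylib.demo;" recalls the saved file.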
8/88
Basic rules (2) -- program
every command ends with ;
format does not matter
if x=1 then y=1; else y=2;
is the same as
if x=1 then y=1;
else y=2;
case insensitive
comment
* this is comment;
/* this is comment */;
9/88
Basic rule (3) - variable
Type
- numeric (default length 8 bytes; . stands for missing value)
- character ($; default length 8 bytes; blank stands for missing)
Variable names
- <=32 characters if SAS 9.0 or above
- <=8 characters if SAS 8 or below
- case insensitive
Must start with letter or "_"
Valid: _name, my_name, zip5, u_and_me
Invalid: -name, my-name, 5zip, per%, u&me, my@w, my$sign
10/88
Basic rules (4) - data step
Data step
DATA newdata;
set pro
- DATA newdata: create a new data set called "newdata" in the temporary library
- set: use the data set called "pro
- for each observation (obs 1, obs 2, ..., obs n): define "fracuninsured", define "percentuninsured", then output to data "newdata"
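The flow above can be sketched as follows. The input data set states and its three toy observations are hypothetical; only the names newdata, fracuninsured and percentuninsured come from the slide, and the formulas are an assumption about how they are defined.

```sas
/* Hypothetical input data: state, total population, number uninsured. */
data states;
  input state $ totalpop uninsured;
  datalines;
AA 1000 150
BB 2000 100
CC 4000 900
;
run;

/* SAS reads one observation from states, runs the statements below, */
/* writes the observation to newdata, then moves to the next one.    */
data newdata;
  set states;
  fracuninsured    = uninsured / totalpop;   /* assumed definition */
  percentuninsured = 100 * fracuninsured;    /* assumed definition */
run;

proc print data=newdata;
  var state fracuninsured percentuninsured;
run;
```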
11/88
Basic rules (5) - proc step
PROC step
PROC PRINT data=newdata;
var fracuninsured percentuninsured;
title "print out new data";
run;
- PRINT: action
- data=newdata: data source
- run; signals the end of the PROC step; it could be omitted if this is followed by a DATA or PROC step
12/88
Read in data (1) - table editor
Tools - table editor
choose an existing library or define a new library
rename variable
save as
13/88
Read in data (2) - by program
Data format
- Datalines (enter the data in the program)
- Existing data: text, comma or tab delimited
- Existing data: text, fixed width
- From an existing excel file
14/88
The following several slides won't be covered in class. But you are welcome to use them by yourselves.
15/88
Read in data (2) -- datalines
data testdata1;
infile datalines;
input id height weight gender $ age;
datalines;
1 68 144 M 23
2 78 . M 34
3 62 99 F 37
;
/* you only need one semicolon at the end of all data lines, but the semicolon must stand alone in one line */
proc contents data=testdata1; run;
proc print data=testdata1; run;
Note: no ; until you finish all the data lines
16/88
Read in data (3) - more datalines
/* read in data in fixed columns */
data testdata1;
infile datalines;
input id 1 height 2-3 weight 4-6 gender $ 7 age 8-9;
datalines;
168144M23
278   M34
36299 F37
;
17/88
Read in data (4) - more datalines
data testdata1;
infile datalines;
input id : 1. height : 2. weight : 3. gender : $1. age : 2.;
datalines;
1 68 144 M 23
2 78 . M 34
3 62 99 F 37
;
18/88
Read in data (4) - more datalines
*alternatively;
data testdata1;
infile datalines;
informat id 1. height 2. weight 3. gender $1. age 2.;
input id height weight gender age;
datalines;
1 68 144 M 23
2 78 . M 34
3 62 99 F 37
;
19/88
Read in data (5) - .csv
data testdata1;
infile datalines dlm=',' dsd missover;
/* what if you do not have dsd and/or missover */
input id height weight gender $ age ;
datalines;
1, 68, 144, M, 23
2, 78, , M, 34
3, 62, 99, F, 37
;
/* what if you forget to type 23 */
run;
20/88
Read in data (6) - .csv file
/* save the pro
21/88
Read in data (7) - from excel
filename myexcel "M:\pro
22/88
Read in data (7) - from excel
Be careful
SAS will read the first line as variable names, and assume the raw data start from the second row.
SAS assigns numeric and character type automatically. Sometimes it does make mistakes.
23/88
Data cleaning (1) - if then
Format:
IF condition THEN action;
ELSE IF condition THEN action;
ELSE action;
Note:
(1) the if-then-else can be nested as many times as you want
(2) if you need multiple actions instead of one action, use "DO; action1; action2; END;"
24/88
Data cleaning (1) - if then
= or EQ means equals
~= or NE means not equal
> or GT means greater than
< or LT means less than
>= or GE means greater than or equal
<= or LE means less than or equal
in means subset
- if gender in ('M', 'F') then ..;
Multiple conditions: AND (&), OR (|)
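A minimal sketch combining IF-THEN-ELSE, DO/END blocks, the IN operator and both spellings of a comparison (GE, LT). The data set people and the variables senior and agegrp are hypothetical names used only for illustration.

```sas
/* Hypothetical toy data. */
data people;
  input id gender $ age;
  datalines;
1 M 23
2 F 37
3 M 65
;
run;

data people2;
  set people;
  length agegrp $ 6;               /* long enough for 'middle' */
  /* multiple conditions joined by AND; IN tests membership in a list */
  if gender in ('M', 'F') and age ge 65 then do;
    senior = 1;                    /* two actions in one branch need DO; ... END; */
    agegrp = 'older';
  end;
  else if age lt 30 then do;
    senior = 0;
    agegrp = 'young';
  end;
  else do;
    senior = 0;
    agegrp = 'middle';
  end;
run;

proc print data=people2; run;
```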
25/88
Data cleaning (1) - if then
*reading in program of pro[...]5 THEN ..;
(2) a missing value is always counted as smaller than any number (the smallest negative), so fracuninsured=. will satisfy the condition fracuninsured<0.15. If you want to ignore the missing obs, set the condition as 0<=fracuninsured<0.15.
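A small sketch of the missing-value point above. The data set demo and the flag names naive and safe are hypothetical; the comparison behaviour is standard SAS.

```sas
/* Hypothetical toy data: BB has a missing fracuninsured. */
data demo;
  input state $ fracuninsured;
  datalines;
AA 0.12
BB .
CC 0.20
;
run;

data flagged;
  set demo;
  /* A comparison returns 1 (true) or 0 (false).                     */
  naive = (fracuninsured < 0.15);       /* true for AA and also for BB */
  safe  = (0 <= fracuninsured < 0.15);  /* true for AA only            */
run;

proc print data=flagged; run;
```

The missing observation passes the naive condition because a missing numeric value compares as smaller than every number; the lower bound in the second condition screens it out.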
26/88
Data cleaning (1) - if then
* Multiple actions in each branch;
data pro[...]5 AND uninsured>1000000 THEN DO;
uninsuredgrp=0; uninsuredpop='over 1 million';
END;
ELSE DO;
uninsuredgrp=1; uninsuredpop='less than 1 million';
END;
run;
proc print data=pro
27/88
Data cleaning (1) - if then
*Use if commands to choose a subsample;
data pro
28/88
Data cleaning (1) - exercise
still use pro[...] if fracuninsured<0.1 (low)
2 if 0.1<=fracuninsured<0.15 (mid-low)
3 if 0.15<=fracuninsured<0.2 (mid-high)
4 if fracuninsured>=0.2 (high).
29/88
Data cleaning (1) - exercise answer
data pro[...] then newgrp=1;
else if fracuninsured<0.15 then newgrp=2;
else if fracuninsured<0.2 then newgrp=3;
else newgrp=4;
run;
proc contents data=pro
30/88
Save data
* Save in sas format;
libname mylib "M:\";
data mylib.pro
31/88
Pages 31-34 are optional material for data cleaning.
They are not required, but you may find them useful in the future.
We skip them in the regular class.
32/88
Data cleaning (2) - convert variable type
Numeric to character:
age1=put(age, 2.);
age and age1 have the same contents but different types (put takes a numeric format such as 2. when the source variable is numeric)
Character to numeric:
age2=input(age1, 1.);
now age and age2 are both numeric, but age2 is chopped at the first digit
Take a sub string of a character
age3=substr(age1,2,1);
now age3 is a sub string of age1, starting from the second digit of age1 (the meaning of "2") and having one digit in total (the meaning of "1").
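A minimal sketch of the three conversions with one worked value. The data set name convert and the starting value age=23 are assumptions for illustration.

```sas
data convert;
  age  = 23;                    /* numeric                                     */
  age1 = put(age, 2.);          /* character '23'                              */
  age2 = input(age1, 1.);       /* numeric 2: informat 1. reads only 1 column  */
  age3 = substr(age1, 2, 1);    /* character '3': start at position 2, length 1 */
run;

proc contents data=convert; run;   /* shows which variables are Num and which are Char */
proc print data=convert; run;
```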
33/88
Data cleaning (2) - example
* we want to convert studid 012345678 to 012-345-678;
data testdata2;
infile datalines;
input studid : 9. studname : $1.;
datalines;
012345678 A
135792468 B
009876543 C
;
proc print; run;
data testdata2;
set testdata2;
if studid < 1E7 then studid1= '00'||compress(put(studid, 9.));
else if 1E7 <= studid < 1E8 then studid1='0'||compress(put(studid, 9.));
else studid1= put(studid, 9.);
studid2=substr(studid1,1,3)||'-'||substr(studid1,4,3)||'-'||substr(studid1,7,3);
proc print; run;
34/88
Data cleaning (2) - exercise
You have the following data, variables in sequence are SSN score1 score2 score3 score4 score5:
123-45-6789 100 98 96 95 92
344-56-7234 69 79 82 65 88
898-23-1234 80 80 82 86 92
Calculate the average and standard deviation of the five scores for each individual. Use if-then command to find out who has the highest average score and report his SSN without dashes.
35/88
Data summary roadmap
Proc contents - variable definitions
Proc print - raw data
Proc format - make your print look nicer
Proc sort - sort the data
Proc means - basic summary statistics
Proc univariate - detailed summary stat
Proc freq - frequency count
Proc chart - histogram
proc plot - scatter plot
36/88
proc format (1)
*Continue the data cleaning exercise on page 29;
data pro[...]then newgrp=1;
else if fracuninsured<0.15 then newgrp=2;
else if fracuninsured<0.2 then newgrp=3;
else newgrp=4;
run;
Note: defining "group" as a numeric variable will save space
37/88
proc format (2)
proc format;
value newgroup 1='low'
2='mid-low'
3='mid-high'
4='high';
run;
proc print data=pro
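A sketch of how a user-defined format is attached to a numeric variable at print time. The data set demo and its three rows are hypothetical; the format name newgroup and its labels are the ones defined above.

```sas
proc format;
  value newgroup 1='low'
                 2='mid-low'
                 3='mid-high'
                 4='high';
run;

/* Hypothetical toy data with a numeric group code. */
data demo;
  input state $ newgrp;
  datalines;
AA 1
BB 3
CC 4
;
run;

/* The stored values stay numeric; only the printed labels change. */
proc print data=demo;
  format newgrp newgroup.;
run;
```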
38/88
proc sort
proc sort data=pro
39/88
proc means and proc univariate
proc means data=pro
40/88
Notes on proc means and proc univariate
* if you do not use the class or by command, the statistics are based on the full sample. If you use class or by var, the statistics are based on the subsample defined by each value of var;
* You can use class or by in proc means. In SAS 9, proc univariate also accepts by and class (with at most two class variables);
* whenever you use "by var", the data set should be sorted by var beforehand;
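A minimal sketch of the sorting requirement. The data set scores and its variables grp and score are hypothetical.

```sas
/* Hypothetical toy data, deliberately unsorted by grp. */
data scores;
  input grp $ score;
  datalines;
b 80
a 90
b 70
a 100
;
run;

/* CLASS does not require the data to be sorted. */
proc means data=scores;
  class grp;
  var score;
run;

/* BY requires sorting by the same variable first. */
proc sort data=scores;
  by grp;
run;

proc means data=scores;
  by grp;
  var score;
run;
```

Running the BY version without the proc sort step stops with an error in the log saying the BY variables are not properly sorted.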
41/88
proc means and proc univariate allow multiple groups
data pro
42/88
proc freq
* Remember we already generated a variable called newgrp to indicate categories of fraction uninsured and a variable called popgrp to indicate categories of population size;
proc freq data=pro
43/88
proc chart - histogram for categorical variables
proc chart data=pro
44/88
proc chart - histogram for continuous variable
proc chart data=pro
45/88
proc plot - scatter plot
proc plot data=pro
46/88
scatter plot is less informative for categorical variables
proc plot data=pro
47/88
fancy proc means
proc means data=pro
mean = avguninsured avgfracuninsured;
run;
proc print data=summary1; run;
48/88
The following page may be useful in practice, but I am not going to cover it in class.
49/88
some summary stat. in proc print
* Assume we have already defined newgrp and popgrp in pro[...];
by popgrp;
sum totalpop;
var totalpop insured uninsured fracuninsured;
run;
50/88
How to handle multiple data sets?
Add more observations to an existing data set, where the new observations follow the same data structure as the old ones: append
Add more variables to an existing data set, where the new variables refer to the same sub
51/88
merge and append
pro[...] high
summary1:
newgrp  popgrp  avguninsured  avgfracuninsured
1       high    7500000       0.073
merged:
year  state  totalpop  ...  fracuninsured  newgrp  popgrp  avguninsured  avgfracuninsured
2009  MA     6420947   ...  0.054          1       high    7500000       0.073
appended:
year  state  totalpop  ...  fracuninsured  newgrp  popgrp  avguninsured  avgfracuninsured
2009  MA     6420947   ...  0.054          1       high    .             .
.            .         ...  .              1       high    7500000       0.073
52/88
merge two datasets
proc sort data=pro[...];
by newgrp popgrp;
run;
data merged;
merge pro[...];
run;
Note: what if this line is "if one=1 OR two=1;"?
53/88
Keep track of matched and unmatched records
data allrecords;
merge pro[...];
run;
proc freq data=allrecords;
tables myone*mytwo;
run;
Note: SAS will drop variables "one" and "two" automatically at the end of the DATA step. If you want to keep them, you can copy them into new variables "myone" and "mytwo".
54/88
be careful about merge
always put the merged data into a new data set
must sort by the key variables before merge
ok for one-to-one, multi-to-one, one-to-multi, but no good for multi-to-multi
be careful of what records you want to keep, and what records you want to delete
what if variable x appears in both datasets, but x is not in the "by" statement?
- after the merge x takes the value defined in the last dataset of the "merge" statement
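A self-contained sketch of a sorted merge that tracks matched and unmatched records. The data sets left and right and the variables id, x and y are hypothetical; the in= flags one and two and the copies myone and mytwo follow the naming used on the earlier slides.

```sas
/* Hypothetical toy data: id 2 appears in both, id 1 and id 3 in one only. */
data left;
  input id x;
  datalines;
1 10
2 20
;
run;

data right;
  input id y;
  datalines;
2 200
3 300
;
run;

/* Both inputs must be sorted by the key variable. */
proc sort data=left;  by id; run;
proc sort data=right; by id; run;

/* Put the result in a new data set. */
data allrecords;
  merge left (in=one) right (in=two);
  by id;
  myone = one;            /* in= variables are dropped, so copy them */
  mytwo = two;
  if one=1 | two=1;       /* keep matched and unmatched records      */
run;

proc freq data=allrecords;
  tables myone*mytwo;
run;
```

Changing the subsetting line to "if one=1 & two=1;" keeps only the records found in both data sets.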
55/88
append
data appended;
set pro
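A minimal sketch of appending: listing two data sets in one set statement stacks their observations. The data sets old and extra are hypothetical.

```sas
/* Hypothetical toy data with the same structure. */
data old;
  input id x;
  datalines;
1 10
2 20
;
run;

data extra;
  input id x;
  datalines;
3 30
;
run;

/* All observations of old come first, then all observations of extra. */
data appended;
  set old extra;
run;

proc print data=appended; run;
```

If a variable exists in only one of the inputs, it is set to missing for observations that come from the other input.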
56/88
Class example of merge and append: reshape and summarize
Source format of Pro[...]3 0.0536
1257622 0.07 . 1267409 0.075 ..
57/88
Main data issues
From long to wide, variable names are different except for the key variable (state)
We can generate average fracuninsured per state either before or after the reshape, but the merge code will be different depending on when we compute the average fracuninsured
58/88
Step 1 to reshape:
generate a sub-sample for each year, so that we have:
subsample2009
subsample2008
...
subsample2003
59/88
focus on 2009
data subsample2009;
set pro
60/88
Same for 2008
data subsample2008;
set pro
61/88
Step 2 to reshape: merge each year's subsample by state
proc sort data=subsample2009; by state; run;
* sort subsample2008 through subsample2004 by state in the same way;
proc sort data=subsample2003; by state; run;
data reshaped;
merge subsample2009 subsample2008 subsample2007
subsample2006 subsample2005 subsample2004
subsample2003;
by state;
run;
proc print data=reshaped; run;
62/88
Generate average fracuninsured by state
proc means data=pro
63/88
Merge the data file "avgperstate" back to reshaped
proc sort data=reshaped;
by state;
run;
proc sort data=avgperstate;
by state;
run;
data reshaped_withavg;
merge reshaped (in=one) avgperstate (in=two);
by state;
myone=one;
mytwo=two;
if one=1 | two=1;
run;
proc print data=reshaped_withavg; run;
64/88
Check observations in reshaped_withavg
proc freq data=reshaped_withavg;
tables myone*mytwo;
run;
65/88
mean comparison: two groups
Example: does the average fracuninsured differ between 2008 and 2009?
H0: mean of fracuninsured2008 = mean of fracuninsured2009.
H1: mean of fracuninsured2008 not equal to mean of fracuninsured2009.
This is a two-tail mean-comparison test between the 2008 sample and the 2009 sample.
The test result will be different if
(1) we treat 2008 and 2009 as two independent samples; or
(2) we treat 2008 and 2009 as matched pairs (matched by state).
66/88
mean comparison: two groups
SAS performs mean comparison in a regression framework.
H0: mean of fracuninsured2008 = mean of fracuninsured2009.
H1: mean of fracuninsured2008 not equal to mean of fracuninsured2009.
Step 1: focus on the subsample that has 2008 and 2009 data only.
Step 2: create a binary variable dummy2009=1 if year=2009, 0 if year=2008.
Step 3: depends on whether 2008 and 2009 are independent samples or matched pairs.
If independent samples, regress fracuninsured as:
fracuninsured = a + b* dummy2009 + error
If matched pairs, regress fracuninsured as:
fracuninsured = a + b * dummy2009
+ c1* dummy_AL + c2* dummy_AK
+ ... + c51*dummy_WY + error
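The two regressions above can be sketched as follows. The six observations and the state codes AA, BB, CC are made-up numbers for illustration, and using a class statement in proc glm to generate the state dummies is one possible way to do it, not necessarily the code used in class.

```sas
/* Hypothetical toy data: three states observed in 2008 and 2009. */
data subsample0809;
  input state $ year fracuninsured;
  dummy2009 = (year = 2009);      /* 1 if year=2009, 0 if year=2008 */
  datalines;
AA 2008 0.10
AA 2009 0.12
BB 2008 0.15
BB 2009 0.16
CC 2008 0.08
CC 2009 0.11
;
run;

/* Independent samples: the t-test on dummy2009 compares the two means. */
proc reg data=subsample0809;
  model fracuninsured = dummy2009;
run;
quit;

/* Matched pairs: state enters as a class variable, so proc glm   */
/* builds the state dummies and drops one to avoid collinearity.  */
proc glm data=subsample0809;
  class state;
  model fracuninsured = dummy2009 state / solution;
run;
quit;
```

In both outputs the coefficient on dummy2009 estimates the 2009 minus 2008 difference; only its standard error changes between the two treatments.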
67/88
mean comparison: two groups as two independent samples
* Focus on 2008 and 2009 data only;
Data subsample0809;
Set pro
68/88
mean comparison: two groups as matched pairs (matched by state)
* Focus on 2008 and 2009 data only;
data subsample0809;
set pro
69/88
mean comparison: more than two groups
Example: does the average fracuninsured differ between any two years in our data? (here year = 2003, 2004, ..., 2009)
The test result will be different if
(1) we treat year t and year t' as two independent samples; or
(2) we treat year t and year t' as matched pairs (matched by state).
70/88
mean comparison: every year as an independent sample
* Treat every year as an independent sample;
* Now we need to use the whole data of pro
71/88
mean comparison: every year as matched pairs (matched by state)
* Treat every year as matched by state;
* First define dummies for each year;
data pro[...]; else dummy2004=0;
if year=2005 then dummy2005=1; else dummy2005=0;
* same for 2006, 2007 and 2008;
if year=2009 then dummy2009=1; else dummy2009=0;
run;
proc glm data=pro
72/88
notes on mean comparison
1. The logic of mean comparison is the same as in Excel
2. Be careful about one-tail and two-tail tests.
The standard SAS output of coefficient t-stat and p-value is based on a two-tail test of H0: coeff=0, but could be used for a one-tail test if we compare alpha vs. p-value/2 instead of p-value (provided the coefficient has the sign stated in the alternative hypothesis).
3. Comparison across more than two groups (as independent samples)
H0: all groups have the same mean
- F-test of whole regression
OR
H0: group x and group y have the same mean
- waller or lsd statistics
4. Comparison across more than two groups (as matched pairs) requires specific tests on regression coefficients.
H0: 2003 = 2004
- test the coefficient of dummy2004=0 because 2003 is set as the benchmark
H0: 2004=2005
- test the coefficient of dummy2004 = coefficient of dummy2005.
73/88
Explicit hypothesis tests in proc reg and proc glm
* Treating each year as independent samples;
Proc reg data=pro[...] 0 0 0 -1 0;
Contrast 'test 2008 vs. 2009'
year 0 0 0 0 1 -1;
Run;
Test fracuninsured2003=fracuninsured2008;
Test fracuninsured2008=fracuninsured2009;
74/88
In class exercise for mean comparison
Main question:
compare fracuninsured in west, midatlantic and everywhere else, where
west = CA, WA, OR
midatlantic = DE, DC, MD, VA
step 0: define west, midatlantic and everywhere else
exercise 1: compare west and midatlantic
(1a). as two independent samples
(1b). consider the match by year
exercise 2: compare west, midatlantic, and everywhere else
(2a). as three independent samples;
(2b). consider the match by year;
75/88
regression in SAS
Question: how does fracuninsured vary by total population of a state?
* model: fracuninsured=a+b*totalpop+error;
proc reg data=pro
76/88
A comprehensive example
A review of
1. read in data
2. summary statistics
3. mean comparison
4. regression
reg-cityreg-simple.sas in N:\share
77/88
A Case Study of Los Angeles Restaurants
Nov. 16-18, 1997: CBS 2 News "Behind the Kitchen Door"
January 16, 1998: LA county inspectors start issuing hygiene grade cards
A grade if score of 90 to 100
B grade if score of 80 to 89
C grade if score of 70 to 79
score below 70: actual score shown
Grade cards are prominently displayed in restaurant windows
Score not shown on grade cards
78/88
79/88
Take the idea to data
better information -> better quality
better information: regulation (by county, by city)
better quality: hygiene scores
Research Question:
Does better information lead to better hygiene quality?
80/88
Data complications (blue font indicates our final choices)
Unit of analysis:
individual restaurant? city? zipcode? census tract?
Unit of time:
each inspection? per month? per quarter? per year?
Define information:
county regulation? city regulation? the date of passing the regulation? days since passing the regulation? % of days under regulation?
Define quality:
average hygiene score? the number of A restaurants? % of A restaurants?
81/88
How to test the idea?
Regression:
quality = α + β*information + error
+ something else?
Something else could be:
year trend, seasonality, city specific effects, ...
82/88
real test
reg-cityreg-simple.sas in N:\share
83/88
Questions
How many observations in the sample?
- log of the first data step, or output from proc contents
How many variables in the sample? How many are numerical, how many are characters?
- Output from proc contents
What percentage of restaurants have A quality in a typical city-month?
- Output from proc means, on per_A
84/88
Questions
What is the difference between cityreg and ctyreg? We know county regulation came earlier than city regulation, is that reflected in our data?
- Yes, cityreg<=ctyreg in every observation
- We can check this in proc means for cityreg and ctyreg, or add a proc print to eyeball each obs
What is the difference between cityreg and citymper? What is the mean of cityreg? What is the mean of citymper? Are they consistent with their definitions?
- The unit of cityreg is # of days, so it should be a non-negative integer
- The unit of citymper is % of days, so it should be a real number between 0 and 1
- To check this, we can add a proc means for cityreg and citymper
85/88
Questions
Economic theories suggest quality be higher after the regulation if regulation gives consumers better information. Is that true?
- The summary statistics reported in proc means (class citym_g or ctym_g) show the average percentage of A restaurants in different regulation environments.
- Rigorous mean comparison tests are done in proc glm (with waller or lsd options).
86/88
Questions
Summary statistics often reflect many economic factors, not only the one in our mind. That is why we need regressions.
Does more regulation lead to higher quality?
- is the coefficient of city regulation positive and significantly different from zero? (proc reg)
- is the coefficient of county regulation positive and significantly different from zero? (proc reg)
- Do we omit other sensible explanations for quality changes? What are they? (proc glm, year, quarter, city)
87/88
Count duplicates (not required)
http://support.sas.com/ctx/samples/index.
88/88
course evaluation
University wide:
- www.CourseEvalUM.umd.edu
TTclass in particular: (password plstt)
- www.surveyshare.com/survey/take/?sid=81087