Notes for SAS Programming Fall2009


  • 7/25/2019 Notes for SAS Programming Fall2009

    1/88

    Notes for SAS programming

    Econ424

    Fall 2009

    Why SAS?

    Able to process large data sets
    Easy to cope with multiple variables
    Able to track all the operations on the data sets
    Generate systematic output
    - Summary statistics
    - Graphs
    - Regression results
    Most government agencies and private sectors use SAS

    Where to find SAS?

    The MS Windows version of SAS is only available on campus
    - class website, "computer resources", "On-campus computer lab"
    - list computer lab location, hours and software

    You can access SAS remotely via glue.umd.edu, but you cannot use the interactive windows
    - use a secured telnet (e.g. ssh) to remotely login glue.umd.edu with your directory ID and password
    - type "tap sas" to tell the system that you want to access SAS
    - use any text editor (say pico) to edit your sas program (say myprog.sas). You can also create the text-only .sas file in your PC and (securely) ftp it into glue.
    - In glue, typing "sas myprog.sas &" will send the sas program to run in the background.
    - The computer will automatically generate myprog.log to tell you how each command runs in sas. If your program produces any output, the output will be automatically saved in myprog.lst. All in the same directory as your .sas file.

    Roadmap

    Thinking in "SAS"
    Basic rules
    Read in data
    Data cleaning commands
    Summary statistics
    Combine two or more datasets
    Hypothesis testing
    Regression

    Thinking in "SAS"

    What is a program?
    - Algorithm, recipe, set of instructions
    How is programming done in SAS?
    - SAS is like programming in any language:
        Step by step instructions
        Can create your own routines to process data
        Most instructions are set up in a logical manner
    - SAS is NOT like other languages:
        Some syntax is peculiar to SAS
        Written specifically for statistics so it isn't all-purpose
        Canned processes that you cannot edit nor can you see the code

    Thinking in "SAS"

    Creating a program
    - What is your problem? take pro

    Basic rules (1) - organize files

    .sas - program file
    .log - notes, errors, warnings
    .lst - output
    .sas7bdat - data file
    library - a cabinet to put data in
    - Default: Work library
        temporary, erased after you close the session
    - Permanent library
        libname mylib "m:\";
        mylib.mydata
        = a sas data file named "mydata" in library "mylib"
    run and recall .sas
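A minimal sketch of the temporary and permanent libraries in use. The folder `m:\` and the data set name `mydata` are only illustrative; point the libname at a folder that exists and is writable on your machine.

```sas
/* Assumption: m:\ is a writable folder; replace it with your own path */
libname mylib "m:\";

/* One-level name: stored in the temporary WORK library,
   erased when the SAS session closes */
data mydata;
    x = 1;
run;

/* Two-level name: stored permanently as mydata.sas7bdat in m:\ */
data mylib.mydata;
    set work.mydata;
run;
```

After the session ends, `work.mydata` is gone, while `mylib.mydata` can be read again in a later session once the same libname statement is submitted.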

    Basic rules (2) -- program

    every command ends with ;
    format does not matter
        if x=1 then y=1; else y=2;
      is the same as
        if x=1 then y=1;
        else y=2;
    case insensitive
    comment
        * this is comment;
        /* this is comment */;

    Basic rule (3) - variable

    Type
    - numeric (default, 8 digit, . stands for missing value)
    - character ($, default 8 digit, blank stands for missing)
    Variable names
    - <=32 characters if SAS 9.0 or above
    - <=8 characters if SAS 8 or below
    - case insensitive
    Must start with letter or "_"
        valid: _name, my_name, zip5, u_and_me
        not valid: -name, my-name, 5zip, per%, u&me, my@w, my$sign

    Basic rules (4) - data step

    Data step
    DATA newdata;
    set pro

    [Diagram: the input data set is read one observation at a time (obs 1, obs 2, ..., obs n); for each observation the step defines "fracuninsured" and "percentuninsured" and outputs the result to the data set "newdata" (obs 1, ..., obs n).]

    DATA newdata: create a new data set called "newdata" in the temporary library
    set: use the data set called "pro
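The flow described above can be sketched as follows. The input data set `olddata`, its variables `totalpop` and `uninsured`, and all numbers are made up for illustration; they are not the course data.

```sas
/* Hypothetical input data; the numbers are invented */
data olddata;
    input state $ totalpop uninsured;
    datalines;
AA 1000000 120000
BB 2500000 200000
;
run;

/* The DATA step reads olddata one observation at a time,
   computes the new variables, and writes the row to newdata */
data newdata;
    set olddata;
    fracuninsured = uninsured / totalpop;     /* fraction between 0 and 1 */
    percentuninsured = 100 * fracuninsured;   /* same quantity in percent */
run;
```

Because `newdata` has a one-level name, it lands in the temporary WORK library.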

    Basic rules (5) - proc step

    PROC step
    PROC PRINT data=newdata;
    var fracuninsured percentuninsured;
    title "print out new data";
    run;

    PRINT: Action
    data=newdata: Data source
    run: Signal the end of PROC step, could be ignored if this is followed by a Data or Proc step

    Read in data (1) - table editor

    Tools - table editor
    choose an existing library or define a new library
    rename variable
    save as

    Read in data (2) - by program

    Data format
    - Datalines (enter the data in the program)
    - Existing data (text, comma or tab delimited)
    - Existing data (text, fixed width)
    - From an existing excel file

    The following several slides won't be covered in class. But you are welcome to use them by yourselves.

    Read in data (2) -- datalines

    data testdata1;
    infile datalines;
    input id height weight gender $ age;
    datalines;
    1 68 144 M 23
    2 78 . M 34
    3 62 99 F 37
    ;
    /* you only need one semicolon at the end of all data lines, but the semicolon must stand alone in one line */
    proc contents data=testdata1; run;
    proc print data=testdata1; run;

    No ; until you finish all the data lines

    Read in data (3) - more datalines

    /* read in data in fixed columns (the data lines must start in column 1 of the program) */
    data testdata1;
    infile datalines;
    input id 1 height 2-3 weight 4-6 gender $7 age 8-9;
    datalines;
    168144M23
    278   M34
    36299 F37
    ;

    Read in data (4) - more datalines

    data testdata1;
    infile datalines;
    input id : 1. height : 2. weight : 3. gender : $1. age : 2.;
    datalines;
    1 68 144 M 23
    2 78 . M 34
    3 62 99 F 37
    ;

    Read in data (4) - more datalines

    * alternatively;
    data testdata1;
    infile datalines;
    informat id 1. height 2. weight 3. gender $1. age 2.;
    input id height weight gender age;
    datalines;
    1 68 144 M 23
    2 78 . M 34
    3 62 99 F 37
    ;

    Read in data (5) - .csv

    data testdata1;
    infile datalines dlm=',' dsd missover;
    /* what if you do not have dsd and/or missover */
    input id height weight gender $ age;
    datalines;
    1, 68, 144, M, 23
    2, 78, , M, 34
    3, 62, 99, F, 37
    ;
    /* what if you forget to type 23 */
    run;

    Read in data (6) - .csv file

    /* save the pro

    Read in data (7) - from excel

    filename myexcel "M:\pro

    Read in data (7) - from excel

    Be careful
    SAS will read the first line as variable names, and assume the raw data start from the second row.
    SAS assigns numeric and character type automatically. Sometimes it does make mistakes.

    Data cleaning (1) - if then

    Format:
    IF condition THEN action;
    ELSE IF condition THEN action;
    ELSE action;

    Note:
    (1) the if-then-else can be nested as many times as you want
    (2) if you need multiple actions instead of one action, use "DO; action1; action2; END;"

    Data cleaning (1) - if then

    = or EQ means equals
    ^= or NE means not equal
    > or GT means greater than
    < or LT means less than
    >= or GE means greater than or equal
    <= or LE means less than or equal
    in means subset
    - if gender in ('M', 'F') then ..;
    Multiple conditions: AND (&), OR (|)

    Data cleaning (1) - if then

    * reading in program of pro5 THEN ..;

    (2) missing value is always counted as the smallest negative, so fracuninsured=. will satisfy the condition fracuninsured<0.15. If you want to ignore the missing obs, set the condition as 0<=fracuninsured<0.15.
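The point about missing values can be checked with a small sketch. The data set `demo` and the flag names are made up for illustration.

```sas
data demo;
    input fracuninsured;
    /* A missing value (.) compares as smaller than any number,
       so this condition is true for the missing observation too */
    if fracuninsured < 0.15 then flag_naive = 1;
    else flag_naive = 0;
    /* Adding a lower bound keeps missing values out of the group */
    if 0 <= fracuninsured < 0.15 then flag_safe = 1;
    else flag_safe = 0;
    datalines;
0.10
.
0.20
;
run;

proc print data=demo; run;
```

For the second observation (missing), `flag_naive` is 1 while `flag_safe` is 0, which is the difference the note warns about.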

    Data cleaning (1) - if then

    * Multiple actions in each branch;
    data pro5 AND uninsured>1000000 THEN DO;
    uninsuredgrp=0; uninsuredpop='over 1 million';
    END;
    ELSE DO;
    uninsuredgrp=.; uninsuredpop='less than 1 million';
    END;
    run;
    proc print data=pro

    Data cleaning (1) - if then

    * Use if commands to choose a subsample;
    data pro

    Data cleaning (1) - exercise

    still use pro if fracuninsured<0.1 (low)
    2 if 0.1<=fracuninsured<0.15 (mid-low)
    3 if 0.15<=fracuninsured<0.2 (mid-high)
    4 if fracuninsured>=0.2 (high).

    Data cleaning (1) - exercise answer

    data pro then newgrp=1;
    else if fracuninsured<0.15 then newgrp=2;
    else if fracuninsured<0.2 then newgrp=3;
    else newgrp=4;
    run;
    proc contents data=pro
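A self-contained sketch of the grouping logic. The data set `grouped` and its values are invented for illustration, and the explicit missing-value branch is an addition, not part of the answer shown in class.

```sas
data grouped;
    input state $ fracuninsured;
    /* Handle missing first; otherwise it would fall into group 1,
       because a missing value compares as smaller than 0.1 */
    if fracuninsured = . then newgrp = .;
    else if fracuninsured < 0.1 then newgrp = 1;    /* low */
    else if fracuninsured < 0.15 then newgrp = 2;   /* mid-low */
    else if fracuninsured < 0.2 then newgrp = 3;    /* mid-high */
    else newgrp = 4;                                /* high */
    datalines;
AA 0.08
BB 0.12
CC 0.17
DD 0.23
EE .
;
run;

proc contents data=grouped; run;
proc print data=grouped; run;
```

Each ELSE IF only needs the upper bound because the earlier branches have already removed the smaller values.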

    Save data

    * Save in sas format;
    libname mylib "M:\";
    data mylib.pro

    Pages 31-34 are optional material for data cleaning.
    They are not required, but you may find them useful in the future.
    We skip them in the regular class.

    Data cleaning (2)

    - convert variable type
    Numeric to character:
        age1=put(age, 2.);
      age and age1 have the same contents but different types (age is numeric, age1 is character)
    Character to numeric:
        age2=input(age1, 1.);
      now age and age2 are both numeric, but age2 is chopped at the first digit
    Take a sub string of a character
        age3=substr(age1,2,1);
      now age3 is a sub string of age1, starting from the second digit of age1 (the meaning of "2") and having one digit in total (the meaning of "1").
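A short sketch of the three conversions on one made-up value. The data set name `convert` and the value 23 are illustrative; the numeric format `2.` is used in PUT because the source variable is numeric.

```sas
data convert;
    age = 23;                     /* numeric */
    age1 = put(age, 2.);          /* numeric to character: "23" */
    age2 = input(age1, 1.);       /* reads only the first character: 2 */
    age3 = substr(age1, 2, 1);    /* second character of age1: "3" */
run;

proc contents data=convert; run;  /* shows which variables are Num or Char */
proc print data=convert; run;
```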

    Data cleaning (2) - example

    * we want to convert studid 012345678 to 012-345-678;
    data testdata2;
    infile datalines;
    input studid : 9. studname : $1.;
    datalines;
    012345678 A
    135792468 B
    009876543 C
    ;
    proc print; run;
    data testdata2;
    set testdata2;
    if studid < 1E7 then studid1 = '00'||compress(put(studid, 9.));
    else if 1E7 <= studid < 1E8 then studid1 = '0'||compress(put(studid, 9.));
    else studid1 = put(studid, 9.);
    studid2 = substr(studid1,1,3)||'-'||substr(studid1,4,3)||'-'||substr(studid1,7,3);
    proc print; run;

    Data cleaning (2) - exercise

    You have the following data, variables in sequence are SSN, score1 score2 score3 score4 score5:
    123-45-6789 100 98 96 95 92
    344-56-7234 69 79 82 65 88
    898-23-1234 80 80 82 86 92
    Calculate the average and standard deviation of the five scores for each individual. Use if-then command to find out who has the highest average score and report his SSN without dashes.

    Data summary roadmap

    Proc contents - variable definitions
    Proc print - raw data
    Proc format - make your print look nicer
    Proc sort - sort the data
    Proc means - basic summary statistics
    Proc univariate - detailed summary stat
    Proc freq - frequency count
    Proc chart - histogram
    proc plot - scatter plot

    proc format (1)

    * Continue the data cleaning exercise on page 29;
    data pro then newgrp=1;
    else if fracuninsured<0.15 then newgrp=2;
    else if fracuninsured<0.2 then newgrp=3;
    else newgrp=4;
    run;

    Defining "group" as a numeric variable will save space

    proc format (2)

    proc format;
    value newgroup 1='low'
                   2='mid-low'
                   3='mid-high'
                   4='high';
    run;
    proc print data=pro
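A sketch of how a user-defined format is attached when printing. The data set `grouped` and its values are made up; the format name `newgroup` follows the slide.

```sas
proc format;
    value newgroup 1 = 'low'
                   2 = 'mid-low'
                   3 = 'mid-high'
                   4 = 'high';
run;

/* Hypothetical data with a numeric group code */
data grouped;
    input state $ newgrp;
    datalines;
AA 1
BB 3
CC 4
;
run;

proc print data=grouped;
    format newgrp newgroup.;   /* show labels; the stored values stay numeric */
run;
```

The FORMAT statement inside PROC PRINT only affects this printout; the variable `newgrp` itself remains 1, 3, 4 in the data set.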

    proc sort

    proc sort data=pro

    proc means and proc univariate

    proc means data=pro

    Notes on proc means and proc univariate

    * If you do not use the class or by command, the statistics are based on the full sample. If you use class or by var, the statistics are based on the subsample defined by each value of var.
    * You can use class or by in proc means, but only by in proc univariate;
    * Whenever you use "by var", the data set should be sorted by var beforehand;
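The notes above can be sketched as follows. The data set `demo` and its values are invented for illustration.

```sas
data demo;
    input state $ newgrp fracuninsured;
    datalines;
AA 1 0.08
BB 2 0.12
CC 1 0.09
DD 2 0.14
;
run;

/* CLASS: subsample statistics, no sorting needed */
proc means data=demo;
    class newgrp;
    var fracuninsured;
run;

/* BY: the data must be sorted by the BY variable first */
proc sort data=demo;
    by newgrp;
run;

proc means data=demo;
    by newgrp;
    var fracuninsured;
run;

proc univariate data=demo;
    by newgrp;
    var fracuninsured;
run;
```

Dropping both the CLASS and BY statements gives statistics for the full sample.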

    proc means and proc univariate allow multiple groups

    data pro

    proc freq

    * Remember we already generated a variable called newgrp to indicate categories of fraction uninsured and a variable called popgrp to indicate categories of population size;
    proc freq data=pro

    proc chart - histogram for categorical variables

    proc chart data=pro

    proc chart - histogram for continuous variable

    proc chart data=pro

    proc plot - scatter plot

    proc plot data=pro

    scatter plot is less informative for categorical variables

    proc plot data=pro

    fancy proc means

    proc means data=pro
    mean = avguninsured avgfracuninsured;
    run;
    proc print data=summary1; run;
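One way to produce a data set like `summary1` is the OUTPUT statement of PROC MEANS. This is a sketch under assumptions: the input data set `demo`, its values, and the NOPRINT and NWAY options are choices made here, not taken from the slide.

```sas
/* Hypothetical data; popgrp is a character group label */
data demo;
    input state $ newgrp popgrp $ uninsured fracuninsured;
    datalines;
AA 1 low 80000 0.08
BB 1 low 90000 0.09
CC 2 high 1200000 0.12
DD 2 high 1500000 0.14
;
run;

proc means data=demo noprint nway;
    class newgrp popgrp;
    var uninsured fracuninsured;
    /* mean= names the output variables in the same order as VAR */
    output out=summary1 mean=avguninsured avgfracuninsured;
run;

proc print data=summary1; run;
```

NWAY keeps only the rows for each combination of newgrp and popgrp; without it the output also contains overall and one-way averages.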

    The following page may be useful in practice, but I am not going to cover it in class.

    some summary stat. in proc print

    * Assume we have already defined newgrp and popgrp in pro;
    by popgrp;
    sum totalpop;
    var totalpop insured uninsured fracuninsured;
    run;

    How to handle multiple data sets?

    Add more observations to an existing data and the new observations follow the same data structure as the old one: append
    Add more variables to an existing data and the new variables refer to the same sub

    merge and append

    pro high

    summary1:
    newgrp  popgrp  avguninsured  avgfracuninsured
    1       high    7500000       0.073

    merged:
    year  state  totalpop  fracuninsured  newgrp  popgrp  avguninsured  avgfracuninsured
    2009  MA     6420947   0.054          1       high    7500000       0.073

    appended:
    year  state  totalpop  fracuninsured  newgrp  popgrp  avguninsured  avgfracuninsured
    2009  MA     6420947   0.054          1       high    .             .
    .            .         .              1       high    7500000       0.073

    merge two datasets

    proc sort data=pro;
    by newgrp popgrp;
    run;
    data merged;
    merge pro;
    run;

    What if this line is "if one=1 OR two=1;"?

    Keep track of matched and unmatched records

    data allrecords;
    merge pro;
    run;
    proc freq data=allrecords;
    tables myone*mytwo;
    run;

    SAS will drop variables "one" and "two" automatically at the end of the DATA step. If you want to keep them, you can copy them into new variables "myone" and "mytwo"
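A self-contained sketch of a merge that tracks matched and unmatched records with IN= flags. The data sets `main` and `summary` and all values are made up for illustration.

```sas
data main;
    input state $ newgrp fracuninsured;
    datalines;
AA 1 0.08
BB 2 0.12
CC 3 0.17
;
run;

data summary;
    input newgrp avgfracuninsured;
    datalines;
1 0.07
2 0.13
4 0.25
;
run;

/* Both inputs must be sorted by the key before the merge */
proc sort data=main; by newgrp; run;
proc sort data=summary; by newgrp; run;

data allrecords;
    merge main (in=one) summary (in=two);
    by newgrp;
    myone = one;   /* copy the temporary flags so they are kept */
    mytwo = two;
    /* adding "if one=1 and two=1;" here would keep matched records only */
run;

proc freq data=allrecords;
    tables myone*mytwo;
run;
```

In this made-up data, groups 1 and 2 appear in both inputs, group 3 only in `main`, and group 4 only in `summary`, so the cross-tabulation shows all three cases.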

    be careful about merge

    always put the merged data into a new data set
    must sort by the key variables before merge
    ok for one-to-one, multi-to-one, one-to-multi, but no good for multi-to-multi
    be careful of what records you want to keep, and what records you want to delete
    what if variable x appears in both datasets, but x is not in the "by" statement?
    - after the merge x takes the value defined in the last dataset of the "merge" statement

    append

    data appended;
    set pro

    Class example of merge and append: reshape and summarize

    Source format of Pro
    [Table not recoverable from the extraction; surviving values: 0.0536, 1257622, 0.07, 1267409, 0.075]

    Main data issues

    From long to wide, variable names are different except for the key variable (state)
    We can generate average fracuninsured per state either before or after the reshape, but the merge code will be different depending on when we compute the average fracuninsured

    Step 1 to reshape: generate a sub-sample for each year, so that we have:

    subsample2009
    subsample2008
    ...
    subsample2003

    focus on 2009

    data subsample2009;
    set pro

    Same for 2008

    data subsample2008;
    set pro

    Step 2 to reshape: merge each year's subsample by state

    proc sort data=subsample2009; by state; run;
    ...
    proc sort data=subsample2003; by state; run;
    data reshaped;
    merge subsample2009 subsample2008 subsample2007
          subsample2006 subsample2005 subsample2004
          subsample2003;
    by state;
    run;
    proc print data=reshaped; run;
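The two reshape steps can be sketched on a tiny example with two years. The long data set `long`, its values, and the use of RENAME and DROP statements are assumptions made for illustration.

```sas
/* Hypothetical long-format data: one row per state and year */
data long;
    input state $ year fracuninsured;
    datalines;
AA 2008 0.10
AA 2009 0.11
BB 2008 0.15
BB 2009 0.14
;
run;

/* Step 1: one subsample per year, with a year-specific variable name */
data subsample2009;
    set long;
    if year = 2009;
    rename fracuninsured = fracuninsured2009;
    drop year;
run;

data subsample2008;
    set long;
    if year = 2008;
    rename fracuninsured = fracuninsured2008;
    drop year;
run;

/* Step 2: sort each subsample and merge by the key variable */
proc sort data=subsample2009; by state; run;
proc sort data=subsample2008; by state; run;

data reshaped;
    merge subsample2009 subsample2008;
    by state;
run;

proc print data=reshaped; run;
```

The result has one row per state with `fracuninsured2009` and `fracuninsured2008` as separate columns. Renaming matters: without it, the merge would overwrite `fracuninsured` with the value from the last data set listed.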


    Generate average fracuninsured by state

    proc means data=pro

    Merge the data file "avgperstate" back to reshaped

    proc sort data=reshaped;
    by state;
    run;
    proc sort data=avgperstate;
    by state;
    run;
    data reshaped_withavg;
    merge reshaped (in=one) avgperstate (in=two);
    by state;
    myone=one;
    mytwo=two;
    if one=1 | two=1;
    run;
    proc print data=reshaped_withavg; run;

    Check observations in reshaped_withavg

    proc freq data=reshaped_withavg;
    tables myone*mytwo;
    run;

    mean comparison: two groups

    Example: does the average fracuninsured differ between 2008 and 2009?
    H0: mean of fracuninsured2008 = mean of fracuninsured2009.
    H1: mean of fracuninsured2008 not equal to mean of fracuninsured2009.
    This is a two-tail mean-comparison test between the 2008 sample and the 2009 sample.
    The test result will be different if
    (1) we treat 2008 and 2009 as two independent samples; or
    (2) we treat 2008 and 2009 as matched pairs (matched by state).

    mean comparison: two groups

    SAS performs mean comparison in a regression framework.
    H0: mean of fracuninsured2008 = mean of fracuninsured2009.
    H1: mean of fracuninsured2008 not equal to mean of fracuninsured2009.
    Step 1: focus on the subsample that has 2008 and 2009 data only.
    Step 2: create a binary variable dummy2009=1 if year=2009, 0 if year=2008.
    Step 3: depends on whether 2008 and 2009 are independent samples or matched pairs.
    If independent samples, regress fracuninsured as:
        fracuninsured = a + b*dummy2009 + error
    If matched pairs, regress fracuninsured as:
        fracuninsured = a + b*dummy2009
                        + c1*dummy_AL + c2*dummy_AK
                        + ... + c51*dummy_WY + error
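The three steps can be sketched as follows. The data set `sub0809` and its values are made up, and using a CLASS statement in proc glm in place of hand-made state dummies is a choice made here for brevity, not necessarily the code used in class.

```sas
/* Steps 1 and 2: hypothetical 2008-2009 subsample with a year dummy */
data sub0809;
    input state $ year fracuninsured;
    dummy2009 = (year = 2009);   /* 1 if 2009, 0 if 2008 */
    datalines;
AA 2008 0.10
AA 2009 0.12
BB 2008 0.15
BB 2009 0.16
CC 2008 0.08
CC 2009 0.11
;
run;

/* Step 3a: independent samples; the t-test on dummy2009 tests H0 */
proc reg data=sub0809;
    model fracuninsured = dummy2009;
run;
quit;

/* Step 3b: matched pairs; CLASS state adds one dummy per state */
proc glm data=sub0809;
    class state;
    model fracuninsured = dummy2009 state / solution;
run;
quit;
```

In both regressions the coefficient on `dummy2009` estimates the 2009 minus 2008 difference in means; only its standard error changes between the two treatments.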

    mean comparison: two groups as two independent samples

    * Focus on 2008 and 2009 data only;
    Data subsample0809;
    Set pro

    mean comparison: two groups as matched pairs (matched by state)

    * Focus on 2008 and 2009 data only;
    data subsample0809;
    set pro

    mean comparison: more than two groups

    Example: does the average fracuninsured differ between any two years in our data? (here year = 2003, 2004, ... 2009)
    The test result will be different if
    (1) we treat year t and year t' as two independent samples; or
    (2) we treat year t and year t' as matched pairs (matched by state).

    mean comparison: every year as an independent sample

    * Treat every year as an independent sample;
    * Now we need to use the whole data of pro

    mean comparison: every year as matched pairs (matched by state)

    * Treat every year as matched by state;
    * First define dummies for each year;
    data pro; else dummy2004=0;
    if year=2005 then dummy2005=1; else dummy2005=0;
    ...
    if year=2009 then dummy2009=1; else dummy2009=0;
    run;
    proc glm data=pro

    notes on mean comparison

    1. The logic of mean comparison is the same as in Excel
    2. Be careful about one-tail and two-tail tests.
       The standard SAS output of coefficient t-stat and p-value is based on a two-tail test of H0: coeff=0, but could be used for a one-tail test if we compare alpha vs. p-value/2 instead of p-value (and the estimated coefficient has the sign stated in the alternative hypothesis).
    3. Comparison across more than two groups (as independent samples)
       H0: all groups have the same mean
       - F-test of whole regression
       OR
       H0: group x and group y have the same mean
       - waller or lsd statistics
    4. Comparison across more than two groups (as matched pairs) requires specific test on regression coefficients.
       H0: 2003 = 2004
       - test the coefficient of dummy2004=0 because 2003 is set as the benchmark
       H0: 2004 = 2005
       - test the coefficient of dummy2004 = coefficient of dummy2005.
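Notes 3 and 4 can be sketched with proc glm. The data set `allyears` and its values are invented. One caveat: with a CLASS statement, proc glm takes the last level (here 2009) as the reference group in the SOLUTION output, which differs from the 2003 benchmark used with hand-made dummies.

```sas
/* Hypothetical data: three states observed in three years */
data allyears;
    input state $ year fracuninsured;
    datalines;
AA 2007 0.10
BB 2007 0.15
CC 2007 0.08
AA 2008 0.11
BB 2008 0.16
CC 2008 0.09
AA 2009 0.12
BB 2009 0.18
CC 2009 0.11
;
run;

/* Independent samples: overall F-test plus pairwise comparisons */
proc glm data=allyears;
    class year;
    model fracuninsured = year;
    means year / lsd waller;
run;
quit;

/* Matched by state: add state as a second class variable */
proc glm data=allyears;
    class year state;
    model fracuninsured = year state / solution;
run;
quit;
```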

    Explicit hypothesis tests in proc reg and proc glm

    * Treating each year as independent samples;
    Proc reg data=pro 0 0 0 -1 0;
    Contrast 'test 2008 vs. 2009' year 0 0 0 0 1 -1;
    Run;
    Test fracuninsured2003=fracuninsured2008;
    Test fracuninsured2008=fracuninsured2009;

    In class exercise for mean comparison

    Main question:
    compare fracuninsured in west, midatlantic and everywhere else, where
        west = CA, WA, OR
        midatlantic = DE, DC, MD, VA
    step 0: define west, midatlantic and everywhere else
    exercise 1: compare west and midatlantic
        (1a). as two independent samples
        (1b). consider the match by year
    exercise 2: compare west, midatlantic, and everywhere else
        (2a). as three independent samples;
        (2b). consider the match by year;

    regression in SAS

    Question: how does fracuninsured vary by total population of a state?
    * model: fracuninsured=a+b*totalpop+error;
    proc reg data=pro
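A minimal sketch of this regression on made-up data. The data set `demo` and its values are invented for illustration.

```sas
data demo;
    input state $ totalpop fracuninsured;
    datalines;
AA 1000000 0.08
BB 2500000 0.12
CC 6000000 0.10
DD 12000000 0.17
EE 20000000 0.15
;
run;

/* model: fracuninsured = a + b*totalpop + error */
proc reg data=demo;
    model fracuninsured = totalpop;
run;
quit;
```

The output reports the intercept (a) and the coefficient on `totalpop` (b) with their t-statistics and two-tail p-values.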

    A comprehensive example

    A review of
    1. read in data
    2. summary statistics
    3. mean comparison
    4. regression

    reg-cityreg-simple.sas in N:\share

    A Case Study of Los Angeles Restaurants

    Nov. 16-18, 1997: CBS 2 News "Behind the Kitchen Door"
    January 16, 1998: LA county inspectors start issuing hygiene grade cards
        A grade if score of 90 to 100
        B grade if score of 80 to 89
        C grade if score of 70 to 79
        score below 70: actual score shown
    Grade cards are prominently displayed in restaurant windows
    Score not shown on grade cards


    Take the idea to data

    [Diagram: better information (measured by regulation, by county and by city) leads to better quality (measured by hygiene scores)]

    Research Question:
    Does better information lead to better hygiene quality?

    Data complications (blue font indicates our final choices)

    Unit of analysis:
        individual restaurant? city? zipcode? census tract?
    Unit of time:
        each inspection? per month? per quarter? per year?
    Define information:
        county regulation? city regulation? the date of passing the regulation?
        days since passing the regulation? % of days under regulation?
    Define quality:
        average hygiene score? the number of A restaurants? % of A restaurants?

    How to test the idea?

    Regression:
        quality = α + β*information + error
                  + something else?
    Something else could be:
        year trend, seasonality, city specific effects, ...

    real test

    reg-cityreg-simple.sas in N:\share

    Questions

    How many observations in the sample?
    - log of the first data step, or output from proc contents
    How many variables in the sample? How many are numerical, how many are characters?
    - Output from proc contents
    What percentage of restaurants have A quality in a typical city-month?
    - Output from proc means, on per_A

    Questions

    What is the difference between cityreg and ctyreg? We know county regulation came earlier than city regulation, is that reflected in our data?
    - Yes, cityreg<=ctyreg in every observation
    - We can check this in proc means for cityreg and ctyreg, or add a proc print to eyeball each obs
    What is the difference between cityreg and citymper? What is the mean of cityreg? What is the mean of citymper? Are they consistent with their definitions?
    - The unit of cityreg is # of days, so it should be a non-negative integer
    - The unit of citymper is % of days, so it should be a real number between 0 and 1
    - To check this, we can add a proc means for cityreg and citymper

    Questions

    Economic theories suggest quality be higher after the regulation if regulation gives consumers better information. Is that true?
    - The summary statistics reported in proc means (class citym_g or ctym_g) show the average percentage of A restaurants in different regulation environments.
    - Rigorous mean comparison tests are done in proc glm (with waller or lsd options).

    Questions

    Summary statistics often reflect many economic factors, not only the one in our mind. That is why we need regressions.
    Does more regulation lead to higher quality?
    - is the coefficient of city regulation positive and significantly different from zero? (proc reg)
    - is the coefficient of county regulation positive and significantly different from zero? (proc reg)
    - Do we omit other sensible explanations for quality changes? What are they? (proc glm, year, quarter, city)

    Count duplicates (not required)

    http://support.sas.com/ctx/samples/index.

    course evaluation

    University wide:
    - www.CourseEvalUM.umd.edu
    TTclass in particular: (password plstt)
    - www.surveyshare.com/survey/take/?sid=81087