
Stata: Getting Starting and Being Productive with VA Data


Page 1: Stata: Getting Starting  and Being Productive with VA Data

Stata: Getting Starting and Being Productive with VA Data

Give me six hours to chop down a tree and I will spend the first four sharpening the axe.
--Abraham Lincoln

Todd Wagner
June 2007

Page 2: Stata: Getting Starting  and Being Productive with VA Data

Outline

Getting data into Stata
Editing in Stata
How does Stata handle data
Stata notation and help
Using Stata and basic Stata commands

Page 3: Stata: Getting Starting  and Being Productive with VA Data

Transferring Data

Stat/Transfer or DBMS/Copy work
Stat/Transfer often seeks to optimize the Stata dataset by default
  – If transferring data with SCRSSN, FORCE Stat/Transfer to transfer SCRSSN as double precision

Page 4: Stata: Getting Starting  and Being Productive with VA Data

Stat/Transfer

CLICK ON DOUBLE
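A minimal sketch of why the double setting matters, assuming a made-up nine-digit identifier (not a real SCRSSN) and a hypothetical variable name scrssn_f. A float keeps only about seven significant digits, so a nine-digit ID stored as float is silently rounded:

```stata
* Made-up nine-digit ID; scrssn_f is a hypothetical name for the float copy
clear
set obs 1
generate double scrssn = 123456789
generate float scrssn_f = scrssn
format scrssn scrssn_f %12.0f
* scrssn lists as 123456789, scrssn_f as 123456792
list
* After a real transfer, check the storage type of the ID variable
describe scrssn
```

If describe reports float for the transferred ID, the transfer should be repeated with the double option selected.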

Page 5: Stata: Getting Starting  and Being Productive with VA Data

Editing in Stata

Any ASCII text editor will work
Stata has a built-in text editor, but it is limited
I recommend using another text editor
http://fmwww.bc.edu/repec/bocode/t/textEditors.html

Page 6: Stata: Getting Starting  and Being Productive with VA Data

Handling Data

SAS processes one record at a time
Stata processes all the records at the same time
  – Loops are commonly used in SAS
  – Loops are very rarely used in Stata
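As a rough illustration of working without loops, the sketch below uses a toy dataset with a hypothetical variable cost and an arbitrary cutoff of 10000. A single generate statement is evaluated for every record:

```stata
* Toy data; cost and the 10000 cutoff are assumptions for illustration
clear
input cost
500
12000
.
end
* One statement applies to all observations; no loop over records is needed
generate highcost = cost > 10000 if !missing(cost)
list
```

The if qualifier leaves highcost missing where cost is missing, rather than coding it as 1.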

Page 7: Stata: Getting Starting  and Being Productive with VA Data

Loading Data into Memory

Stata reads the data into memory
  – set mem 100m (before you load the data)
You must have enough memory for your dataset
With large datasets:
  – drop unnecessary variables
  – Use the compress command (but don't compress SCRSSN)
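The steps above might look like the following in a do-file. This is a sketch only: mydata, notes1 and notes2 are hypothetical names, and set mem applies to the Stata releases of this period (from Stata 12 onward memory is managed automatically).

```stata
* Hypothetical dataset and variable names throughout
clear
set mem 100m
use mydata
* Drop variables that are not needed for the analysis
drop notes1 notes2
* Build a list of every variable except scrssn, then compress only those
ds scrssn, not
compress `r(varlist)'
```

Here ds with the not option returns all variables other than scrssn in r(varlist), which keeps the identifier out of the compress step as the slide advises.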

Page 8: Stata: Getting Starting  and Being Productive with VA Data

Stata Abbreviations

Many Stata commands can be abbreviated, often to the first three letters
  – regress income education female
could be written
  – reg income education female
Variable names can also be abbreviated if the abbreviation is unique
  – reg inc educ fem

Page 9: Stata: Getting Starting  and Being Productive with VA Data

Stata Help

Stata's built-in help is great
  – help <command>
Stata manuals are great because they review theory

Page 10: Stata: Getting Starting  and Being Productive with VA Data

Stata and the Web

Stata is "web aware"
Check for updates periodically
  – update all
You can search for user-written programs
  – findit output
  – findit outreg (click to install)

Page 11: Stata: Getting Starting  and Being Productive with VA Data

Stata in Windows

Page Up scrolls through the previous commands
There is a graphical user interface (menus) if you forget a command
We have Stata on rocky and tasha: no graphical capabilities, no menus, and loss of some shortcuts

Page 12: Stata: Getting Starting  and Being Productive with VA Data

Using Stata

Create batch files called ".do" files
I work interactively
  – Run Stata and create do file as I go
  – I can then use the do file as needed
Debugging code and exploratory data analysis is very fast in Stata
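A do-file built up this way could have a skeleton like the one below. The file names analysis.do, analysis.log and mydata are hypothetical; the commands themselves are standard Stata.

```stata
* analysis.do (hypothetical file name)
* Close any log left open by an earlier run, then start a fresh text log
capture log close
log using analysis.log, replace text
set more off
use mydata, clear
summarize
log close
```

Typing do analysis at the Stata prompt reruns every step and rewrites the log.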

Page 13: Stata: Getting Starting  and Being Productive with VA Data

sysdir, ls and cd

Stata recognizes some Unix commands, such as ls and cd
sysdir provides a listing of Stata's system directories

sysdir
   STATA:    C:\Program Files\Stata9\
   UPDATES:  C:\Program Files\Stata9\ado\updates\
   BASE:     C:\Program Files\Stata9\ado\base\
   SITE:     C:\Program Files\Stata9\ado\site\
   PLUS:     c:\ado\stbplus\
   PERSONAL: c:\ado\personal\
   OLDPLACE: c:\ado\

Page 14: Stata: Getting Starting  and Being Productive with VA Data

Delimiters

SAS recognizes ";" as a delimiter
Stata recognizes the carriage return
  – Always add a carriage return after your last command
You can change delimiters to ;
  #delimit ;
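A short sketch of switching delimiters inside a do-file (#delimit is not available at the interactive prompt). The variables gender, age and educ are borrowed from the summary slide later in the deck; the age restriction is only an illustration.

```stata
* With ; as the delimiter a command can span several lines
#delimit ;
summarize gender age educ
    if age >= 50 ;
* Switch back to the carriage return as the delimiter
#delimit cr
summarize gender age educ
```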

Page 15: Stata: Getting Starting  and Being Productive with VA Data

Missing Data

Stata and SAS both use "." as missing
Stata implicitly values a missing as a very large number
SAS implicitly values a missing as a very small number
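One practical consequence, sketched on toy data with a hypothetical age variable: because Stata treats missing as larger than any number, a "greater than" condition also picks up the missing values unless they are excluded explicitly.

```stata
* Toy data: two known ages and one missing
clear
input age
55
70
.
end
* Counts 2: the 70-year-old and the missing value
count if age > 65
* Counts 1: missing values are excluded explicitly
count if age > 65 & !missing(age)
```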

Page 16: Stata: Getting Starting  and Being Productive with VA Data

Generating and Recoding Variables

In SAS you type
  quality=0;
  If VA=1 then quality=1;
In Stata you type
  gen quality=0
  recode quality 0=1 if VA==1
or
  replace quality=1 if VA==1

Page 17: Stata: Getting Starting  and Being Productive with VA Data

Boolean Logic

Stata is picky about Boolean logic
  gen y=x if a==b           (must use two ==)
  gen y=x if a>b & b>10     (must use &)
  gen y=x if a<=b           (< or > must be before =)

Page 18: Stata: Getting Starting  and Being Productive with VA Data

Creating Dummy Variables

Goal: create dummy variable for each DRG
  gen drgnum1=drg==1
or
  tab drg, gen(drgnum)
This second command automatically creates dummy variables

Page 19: Stata: Getting Starting  and Being Productive with VA Data

Drop

drop <varnames>     (drops variables)
drop if X==1        (drops observations where X is 1)

Page 20: Stata: Getting Starting  and Being Productive with VA Data

egen Commands

You want to generate total costs for a medical center
In SAS this is done by proc summary
In Stata, you can type
  collapse (sum) cost, by(sta3n)
or
  sort sta3n
  by sta3n: egen sumcost=total(cost)
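The two approaches leave the data in different shapes, which the toy example below tries to show. The station numbers and costs are made up. egen adds the station total to every record, while collapse replaces the data with one record per station.

```stata
* Made-up station numbers and costs
clear
input sta3n cost
402 100
402 250
405 75
end
* egen keeps all three records; sumcost is 350, 350 and 75
bysort sta3n: egen sumcost = total(cost)
list
* collapse leaves one record per station with cost summed: 350 and 75
collapse (sum) cost, by(sta3n)
list
```

Because collapse discards the record-level data, wrapping it in preserve and restore is a common way to get the original data back afterward.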

Page 21: Stata: Getting Starting  and Being Productive with VA Data

ICD-9 Codes

Stata has capabilities to handle ICD-9 diagnosis and procedure codes
You can
  – check to see if codes are valid
  – generate identifiers based on codes or ranges of codes
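A sketch of what these two tasks could look like with Stata's icd9 and icd9p commands. The variables dx1 and proc1 and the new variable diabetes are hypothetical, and the 250 range is just one example of a code family; help icd9 gives the exact options for a given release.

```stata
* dx1 (diagnosis) and proc1 (procedure) are hypothetical string variables
* Report whether dx1 holds only validly formatted diagnosis codes
icd9 check dx1
* Flag records whose diagnosis falls in the 250.xx range
icd9 generate diabetes = dx1, range(250*)
* The same check for procedure codes uses icd9p
icd9p check proc1
```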

Page 22: Stata: Getting Starting  and Being Productive with VA Data

Dates

Dates are stored as in SAS (days elapsed since January 1, 1960), with similar date functions
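For instance, Stata's mdy() function returns a date as the number of days since January 1, 1960, the same origin SAS uses for its date values. The dates below are arbitrary examples:

```stata
* Day zero of the date scale; displays 0
display mdy(1, 1, 1960)
* The same kind of number shown with a date format; displays 01jun2007
display %td mdy(6, 1, 2007)
* Dates are plain numbers, so a length of stay is a subtraction; displays 29
display mdy(6, 30, 2007) - mdy(6, 1, 2007)
```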

Page 23: Stata: Getting Starting  and Being Productive with VA Data

Combining Data

Merge
  – this automatically creates a variable called _merge
  – _merge==1 obs. from master data
  – _merge==2 obs. from only one using dataset
  – _merge==3 obs. from at least two datasets, master or using

  merge scrssn admitday disday using data_y

Append (stacking data)
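A fuller sketch of the merge on this slide, assuming a hypothetical master file data_x and a hypothetical file data_z to stack. With the syntax shown on the slide, both datasets need to be sorted on the key variables before merging.

```stata
* data_x and data_z are hypothetical names; data_y is from the slide
* Sort the using dataset on the keys and save it
use data_y, clear
sort scrssn admitday disday
save data_y, replace
* Sort the master dataset on the same keys, then merge
use data_x, clear
sort scrssn admitday disday
merge scrssn admitday disday using data_y
* Check how many records matched before going further
tab _merge
* Stacking records from another file with the same variables
append using data_z
```

In Stata 11 and later the merge line would be written as merge 1:1 scrssn admitday disday using data_y, assuming the keys identify records uniquely in both files; that form does not need the explicit sorts.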

Page 24: Stata: Getting Starting  and Being Productive with VA Data

Explicit Subscripting

Identify the most recent encounter in an encounter database
  gsort id -date          (ascending sort by ID and reverse by date)
  by id : gen n=_n        (record counter from 1 to N per person)
  by id : gen N=_N        (total number of records per person)
  gen select=n==1
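An alternative sketch that reaches the same flag in one line, shown on toy data with made-up id and date values. Sorting ascending by date within id puts the most recent encounter last, where _n equals _N.

```stata
* Toy encounter data: person 1 has two encounters, person 2 has one
clear
input id date
1 100
1 200
2 150
end
* select is 1 for the latest record of each person (rows 1/200 and 2/150)
bysort id (date): generate select = _n == _N
list
```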

Page 25: Stata: Getting Starting  and Being Productive with VA Data

Using Stata

Page 26: Stata: Getting Starting  and Being Productive with VA Data

Stata Interface in Windows

Page 27: Stata: Getting Starting  and Being Productive with VA Data

Set, Clear and More

set: sets system parameters
  – Need to set memory size to open a database
    set mem 100m
clear erases data from memory
When output is >1 page, you are asked to continue (set more off)

Page 28: Stata: Getting Starting  and Being Productive with VA Data

Summarizing Data

. sum gender age educ

    Variable |   Obs       Mean   Std. Dev.   Min   Max
-------------+------------------------------------------
      gender |  4085   1.496206   .5000468      1     2
         age |  4085    64.5601   9.451724     50    94
        educ |  4085   4.398286   1.662883      1     9

sum <varlist>, d provides more details on each variable
tabstat provides summary info, including totals

Page 29: Stata: Getting Starting  and Being Productive with VA Data

Tabulating Data

. tab gender

     gender |      Freq.     Percent        Cum.
------------+-----------------------------------
          1 |      2,058       50.38       50.38
          2 |      2,027       49.62      100.00
------------+-----------------------------------
      Total |      4,085      100.00

. table gender

----------------------
   gender |      Freq.
----------+-----------
        1 |      2,058
        2 |      2,027
----------------------

Page 30: Stata: Getting Starting  and Being Productive with VA Data

Tabulating Data

tab gender age
too many values
r(134);

tab age gender

           |        gender
       age |         1          2 |     Total
-----------+----------------------+----------
        50 |        49         69 |       118
        51 |        72         71 |       143
         … |
        94 |         1          0 |         1
-----------+----------------------+----------
     Total |     2,058      2,027 |     4,085

Page 31: Stata: Getting Starting  and Being Productive with VA Data

Tabstat

. tabstat age, by (gender)

  gender |      mean
---------+----------
       1 |  64.77454
       2 |  64.34238
---------+----------
   Total |   64.5601
--------------------

. table gender, c(mean age)

----------------------
   gender |  mean(age)
----------+-----------
        1 |   64.77454
        2 |   64.34238
----------------------

Page 32: Stata: Getting Starting  and Being Productive with VA Data

Graphing

Diagnostic graphics
Presenting results

[Figure: density of wtp by stage, with one panel each for stage 1 through stage 5]

Page 33: Stata: Getting Starting  and Being Productive with VA Data

Basic Analytical Functions

OLS (reg)
Logistic, probit, count data (e.g., CLAD)
Multinomials
GLM/HLM
Duration models
Semi and non-parametric models

Page 34: Stata: Getting Starting  and Being Productive with VA Data

Output

Linear regression                         Number of obs =    1306
                                          F( 21,  1284) =   10.88
                                          Prob > F      =  0.0000
                                          R-squared     =  0.1398
                                          Root MSE      =  90.367

             |              Robust
         wtp |      Coef.   Std. Err.       t    P>|t|    [95% Conf. Interval]
-------------+----------------------------------------------------------------
       ethn1 |   1.990048   8.742036     0.23   0.820    -15.16019   19.14029
       Ethn2 |  -25.74654   11.69993    -2.20   0.028    -48.69961  -2.793467
       ethn3 |  -35.59552   11.98309    -2.97   0.003     -59.1041  -12.08694
       ethn4 |  -3.244168   11.16836    -0.29   0.771    -25.15441   18.66607
     english |  -11.44402   9.699576    -1.18   0.238    -30.47277   7.584741
      lifeus |   37.34419   13.86037     2.69   0.007     10.15274   64.53564
     age1999 |  -.6272524   .3097408    -2.03   0.043    -1.234906  -.0195987
      income |   .8068256   .1714309     4.71   0.000     .4705102   1.143141
      incmis |   14.07434   9.404149     1.50   0.135    -4.374848   32.52352
       _cons |   111.3607   24.13083     4.61   0.000     64.02051   158.7009

Page 35: Stata: Getting Starting  and Being Productive with VA Data

Outreg

Outputs data to a delimited file
Delimited file can be read into Excel
Very flexible
Creates publishable tables
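A minimal sketch of the workflow, assuming outreg has been installed via findit outreg and that a dataset with the variables from the regression slide is already loaded. The covariates are a subset chosen for illustration and the file name wtp_table is hypothetical; outreg is user-written, so options and output format depend on the installed version.

```stata
* Assumes outreg is installed and the data are in memory
* Fit a model, using a subset of the covariates on the output slide
regress wtp income age1999 english, robust
* Write the estimates just fitted to a file; wtp_table is a hypothetical name
outreg using wtp_table, replace
```

help outreg lists the options for standard errors, decimal places and appending further models to the same table.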