View
7
Download
0
Category
Preview:
Citation preview
Tutorial on STATA
Adrian Rohit Dass Ins3tute of Health Policy, Management, and
Evalua3on Canadian Centre for Health Economics
September 18, 2015
Outline
• Why use STATA? • Reading/Cleaning data • Regression Analysis • Post-‐es3ma3on Diagnos3c Checks • Advanced Topics in STATA • STATA Resources
Learning Curves of Various SoPware Packages
Source: hSps://sites.google.com/a/nyu.edu/sta3s3cal-‐soPware-‐guide/summary
Why STATA? • Feature rich soPware for analyzing various types of data:
– Cross-‐sec3onal data: A collec3on of individuals (persons, countries, etc.) in one 3me period. Quite common form of micro-‐data, like surveys.
– Time Series Data: Many points in 3me, but for one individual en3ty. Usually in aggregated form, like rates or percentages.
– Panel Data: Combina3on of cross-‐sec3onal and 3me series data. Ex: survey of the same individuals over many years, or aggregate data on murder rates for each province in Canada over many years.
• Stata is par3cularly useful in analyzing Panel Data. Wide range of features to handle the problems faced with this data.
Reading/Cleaning data
STATA Basics • Largely menu driven, so fairly easy to work with
• Prior programming experience is not required, but can be helpful (especially with .do files)
• Case sensi3ve, so be careful: I.e. o regress y x results will result in a successful OLS es3ma3on (if everything else is right)
o Regress y x results will in an error message
Command Window
Variables Window
Review Window Results Window
Star3ng a Log File • This should be your first step when using Stata (APer double-‐
clicking on the Stata icon, that is) • File à Log à Begin:
• Stata will prompt you to name the file. Pick a crea3ve name (E.g: logfile1), then click ok
• At this point, Stata will record everything you do (impor3ng data, running commands, etc). It will even store your regression output, which is handy to look at later
Impor3ng Data into Stata • File à Import à Choose appropriate op3on:
• .csv (Comma Separated) is a common op3on, but .xls (MicrosoP Excel Format) and other formats are compa3ble too
Impor3ng Data into STATA (MicrosoP Excel (.xls)
Make sure “Import first row as variable names” is checked, then click ok
Star3ng off
To look at your data, type browse
Type describe to obtain some useful informa3on about your dataset:
Note: dataset has changed since last savedSorted by: Age str2 %9s AgeGender byte %10.0g Genderweight int %10.0g weightheight byte %10.0g heightobs byte %10.0g obs variable name type format label variable label storage display value size: 140 vars: 5 obs: 20 Contains data
Black text is for numeric variables
Blue text is labeled numeric variables
Red text is for character variables (called string variables in Stata)
Convert Character variable to Numeric
Make use of Stata’s destring command: destring [varlist] , {generate(newvarlist)|replace} [destring_op3ons] Eg: destring age, replace ignore(NA)
Sor3ng the Observa3ons and Variables
§ Sor3ng changes the order in which the observa3ons appear. We can sort numbers, leSers, etc.
-‐ Example: sort x § Ordering changes the order variables in dataset appear.
-‐ Example: order x y z
Changing Exis3ng variables: rename
§ Command: rename -‐ changes the name of an exis3ng variable
§ Example, rename variable ‘ZGMFX10A’ as ‘height’ rename ZGMFX10A height
Working with Labels label give descrip3ons to variables or data sets
§ To label the dataset in memory: • label data “Na3onal Popula3on Health Survey” § To label a variable: • label var healthstat “Self-‐Reported Health Status”
§ To label different numeric values the variable may take: • label define vlhealthstat 1 “Excellent” 2 “Very Good” 3
“Good” 4 “Fair” 5 “Poor” • label values healthstat vlhealthstat
Obtaining basic summary sta3s3cs • Summarize command: Use to obtain basic summary sta3s3cs of 1 or more
variables (mean, standard devia3on, min, max, etc.) summarize [varlist] [if] [in] [weight] [, op3ons]
• Correlate command: Creates a matrix of correla3on or covariance coefficients for 2 or more variables
correlate [varlist] [if] [in] [weight] [, correlate_op3ons]
height 20 10.35 2.207046 5 15 weight 20 169.4 16.32692 140 205 Variable Obs Mean Std. Dev. Min Max
. summarize weight height
weight 0.8620 1.0000 height 1.0000 height weight
(obs=20). correlate height weight
tabulate
§ command: tabulate -‐ Calculates and displays frequencies for one or two variables § Syntax: -‐ tabulate varname [if] [in] [weight] [, op3ons]
More detailed descrip3ves • Use tabstat command tabstat varlist [if] [in] [weight] [, op3ons]
• The example above calculates the sum of the variable, but you could specify other sta3s3cs as well (min, max, skewness, kurtosis, etc.). If you don’t specify a par3cular sta3s3c at the end, then tabstat will generate the mean
Changing Exis3ng variables: replace
§ Command ‘replace’ changes the contents of an exis3ng variable
§ Syntax: replace oldvar = exp [if exp] [in range] § replace is most useful in the following cases: ⁻ Crea3ng binary and categorical variables ⁻ Fixing the missing values
Ex: Replace responses coded as “no response” (-‐1 in this case) with missing values replace variable = . if variable == -‐1
Crea3ng a new variable: generate § command: generate
§ Syntax: -‐ generate newvar = exp [if exp] [in range]
§ Example: -‐ generate age_sq=age*age
§ Notes: Can type generate or gen for short
Create a Binary Variable
§ To create a binary variable (0 / 1): -‐ Generate a variable equal to 0 for all observa3ons -‐ Replace it to be 1 for selected observa3ons
§ Example, create a binary variable for people with income over $80,000:
gen highinc=0 replace highinc=1 if hh_inc>=80000
Exploring Missing Values
• Missing values are given by “.” in STATA • To count the number of missing values in a variable, use user-‐wriSen command tabmiss – To install, type findit tabmiss in command window – To use, type tabmiss varname
• Important Note: you can use “findit” to install other user wriSen commands, as well as help files for commands in STATA
Saving data
§ If you’ve imported data into STATA from a spreadsheet, text file, etc., you may want to save it as a STATA dataset.
§ From STATA menu, go File à Save (will give you an op3on to replace the data if it already exists)
Graphing/Ploung Data
• Plain Text Plot plot yvar1 [yvar2 [yvar3]] xvar [if exp] [in range] [, columns(#) encode hlines(#) lines(#) vlines(#) ] • Graphics Plot (generates an image file) [graph] twoway plot [if] [in] [, twoway_opHons]
Graph Examples • Two-‐way scaSer plot twoway scaIer xvar yvar • Two-‐way line plot twoway line xvar yvar • Two-‐way scaSer plot with linear predic3on from regression of x on y
twoway (scaIer xvar yvar) (lfit xvar yvar) • Two-‐way scaSer plot with linear predic3on from regression of x on y with 95% CI
twoway (scaIer xvar yvar) (lfitci xvar yvar)
Regression Analysis
Fiung a Linear Model To The Data General nota3on: regress depvar [indepvars] [if] [in] [weight] [, opHons] Where: Y is our dependent variable X is our independent variable(s) Note: You may type “reg” instead of “regress” Determining which variables are what is usually determined by theory Research Ques6on: Is there a rela6onship between weight and height?
Fiung a Linear Model To The Data
Stata Output:
Follows nota3on (reg Y X)
β2
β1
Fiung a Linear Model To The Data (Graphical Representa3on)
Yhati – Es3mated (or predicted) value of Y based on the regression coefficients Yi – Actual Value of Y ei – Residual (Difference between es3mated Y and actual) B1 – Constant term B2 – Slope of line
Post Es3ma3on
Post Es3ma3on
• Obtaining residuals predict residuals, residuals NB: The “residuals” aPer predict is just the name you want to give to the residuals. You can change this if you want to • Obtaining fiSed values predict fiSedvalues, xb
Heteroscedas3city tes3ng § OLS regression assumes homoskedas3city for valid hypothesis tes3ng. We can test for this aPer running a regression
§ Examine residual paSern from the residual plot
rvfplot, yline(0) § Formal test estat heIest
RVF Plot
Formal Test for Heteroskedas3city
Reject the null (no heteroskedas3city) in favour of the alterna3ve (there is heteroskedas3city of some form).
Linearity tes3ng
§ OLS normally assumes a linear rela3onship between the Y and X’s. We can test for this aPer a regression:
§ Command: acprplot var, lowess
ACPRPLOT Stata 20
4060
8010
0Au
gmen
ted
com
pone
nt p
lus
resi
dual
5 10 15height
Tes3ng for mul3collinearity
OLS regression assump3on: independent variables are not too strongly collinear Detec3on: • Correla3on matrix correlate varlist (before regression) • Variance Infla3on Factor vif (aPer regression)
Specifica3on tes3ng
• To see if there is omiSed variables from the model, or if our model is miss-‐specified
• Syntax: estat ovtest
Prob > F = 0.0010 F(3, 44) = 6.45 Ho: model has no omitted variablesRamsey RESET test using powers of the fitted values of crime
. estat ovtest
Tes3ng Normality of Residuals § We assume that the errors are normally distributed for hypothesis tes3ng. We can use the residuals to test this assump3on.
§ Command predict r, residuals kdensity r, normal
Kernal Density Plot of Residuals 0
.02
.04
.06
.08
Den
sity
-20 -10 0 10 20Residuals
Kernel density estimateNormal density
kernel = epanechnikov, bandwidth = 2.7935
Kernel density estimate
Parameter Hypothesis Tes3ng
• Test whether a parameter equal zero -‐ testparm abc -‐ test (abc) • Test both parameters equal zero -‐ test (var1 var2) • Test if coefficients on two variables are equal -‐ test (var1 = var2)
Storing Es3ma3on Results • STATA can store the results of your regression via the es3mates command:
es3mates store name • This can be very useful in analyzing regression results aPer running mul3ple models
• To list mul3ple results side-‐by-‐side, type es3mates table name1 name2…name5, etc.
• To export results from STATA to excel, word, or LaTeX, use user-‐wriSen command esSab:
hSp://repec.org/bocode/e/estout/esSab.html
Advanced Topics in
STATA
Regression commands for other types of outcome variables
• Binary outcomes: probit or logit (help probit; help probit postes3ma3on) • Ordered discrete outcomes: oprobit (help oprobit; help oprobit postes3ma3on) • Categorical outcomes: mlogit (help mlogit; help mlogit postes3ma3on)
Panel Data Econometrics
• Pooled Linear Regress regress depvar [indepvars] [if] [in] [weight] [, opHons] • Random Effects xtreg depvar [indepvars] [if] [in] [, re RE_opHons] • Fixed Effects xtreg depvar [indepvars] [if] [in] [weight] , fe [FE_opHons]
Working With Do-‐Files
MoHvaHon Why bother? 1) We can ovoid tediously running the same set
of commands over and over again… 2) Creates a document lis3ng all the commands
we’ve run in plain text form 3) Increases our produc3vity with STATA!
How to get to do file editor:
• File à New à Do-‐file • Or “Do-‐file Editor” buSon at top (depending on which version of STATA you have)
Inputs commands here
Press to execute
STATA Resources
STATA Online Resources
• STATA manuals are freely downloadable from the above site
hSp://www.stata-‐press.com/manuals/documenta3on-‐set/ • Typing help [topic] in the command window is also useful, but the online manuals generally contain more detail/examples
STATA Online Resources
UCLA Ins3tute for Digital Research and Educa3on • List of topics and STATA resources can be found here:
hSp://www.ats.ucla.edu/stat/stata/webbooks/reg/default.htm
Other STATA Resources
• Jones, A.M., Rice, N., d’Uva, T.B., Balia, S. 2013. Applied Health Economics -‐ Second Edi3on, Routledge Advanced Texts in Economics and Finance. Taylor & Francis
• Cameron, A.C., Trivedi, P.K. 2010. Microeconometrics Using Stata – Revised Edi3on, Stata Press books.
• Allison, P.D. 2009. Fixed Effects Regression Models, Quan3ta3ve Applica3ons in the Social Sciences. SAGE Publica3ons.
Useful sites to find and download Canadian data
• Ontario Data Documenta3on, Extrac3on Service and Infrastructure (ODESI) website:
hSp://search2.odesi.ca/
• Compu3ng in the Humani3es and Social Sciences (CHASS) at U of T
hSp://www.chass.utoronto.ca
Thanks for Listening
Good luck with STATA!
Recommended