8/3/2019 Basic Manual - Stata 10.0
1/34
Introduction to STATA 10.0 Training Manual Page 1
Introduction to STATA 10.0
Basic Training Manual
STAGES IN STATA 10.0
1. STATA WINDOWS
2. LOADING YOUR DATA
3. CREATING A DO FILE AND LOG FILE
4. VIEWING YOUR DATA
5. SOME BASIC COMMANDS
INTRODUCTION
BASIC SYSTEM OF STATA 10.0
STATA WINDOWS
A basic overview of the Stata interface is necessary to use this manual effectively. Open
Stata, which you have already installed on your computer. Figure 1 shows the Stata
interface that should appear.
Within the main Stata window are four smaller windows:
1. Results
2. Command
3. Review
4. Variables
The Results window is where Stata prints all of its output. The Command window is where
you type the commands that Stata will then execute. The Review window is where all
previous commands are stored. This is particularly useful for rerunning or slightly
modifying a previous command: you simply click the desired command in the list instead of
retyping it or copying and pasting it each time. Finally, the Variables window lists the
variables stored in Stata for quick reference. When you click on a variable name, Stata
adds that name to the text in the Command window so that you can avoid typing it.
LOADING YOUR DATA
INPUT FILE
SELECT INPUT FILE AND TYPE OF FILE
PROVIDE A FILE NAME AND SAVE THE DATA IN STATA FORMAT (WHICHEVER FORMAT YOU PREFER)
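The menu steps above also have command equivalents. The following is a minimal sketch rather than the manual's own example: it assumes the auto dataset that ships with Stata, and the file names auto_sample.csv and auto_sample.dta are hypothetical. It first writes a small comma-separated file so that there is something to import, then reads it with insheet and saves it in Stata format.

```stata
* Sketch only: the dataset and both file names are assumptions for illustration.
sysuse auto, clear
* Write a small comma-separated text file to act as the input file
outsheet make price mpg using "auto_sample.csv", comma replace
* Read the text file into memory, replacing the data currently loaded
insheet using "auto_sample.csv", comma clear
describe
* Save the imported data in Stata (.dta) format, then reopen it
save "auto_sample.dta", replace
use "auto_sample.dta", clear
```

With your own data, only the insheet, save, and use lines are needed, with the file names replaced by yours.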
CREATING DO FILE
OPEN LOG FILE
CLOSE LOG FILE
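Opening and closing a log can also be done from the Command window or inside a do file. The lines below are a minimal sketch: the log file name session1.log and the use of the auto dataset are assumptions for illustration. Saved in a do file (the doedit command opens the Do-file Editor), they can be rerun as a unit.

```stata
* Sketch only: the log file name is an assumption for illustration.
* Close any log that is already open; capture suppresses the error if none is
capture log close
* Open a plain-text log, overwriting an earlier file of the same name
log using "session1.log", text replace
sysuse auto, clear
summarize price mpg
* Close the log so that the file is complete on disk
log close
```

Everything printed in the Results window between log using and log close is written to the log file.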
VIEWING YOUR DATA EDITOR
VIEW DATA BROWSER
SOME BASIC COMMANDS
LIST
ED
BROWSE
SUM
TAB
REGRESSION
SET TIME
SET PANEL DATA
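The commands named in the headings above can be tried in one short session. This is a sketch under stated assumptions: it uses the auto dataset that ships with Stata, it takes SET TIME and SET PANEL DATA to mean the tsset and xtset commands, and the counter t and the toy panel (id, year) are made up for illustration.

```stata
* Sketch only: the auto dataset and the toy panel below are assumptions.
sysuse auto, clear
* LIST: print the first five observations of three variables
list make price mpg in 1/5
* ED and BROWSE open the Data Editor and Data Browser windows;
* they are commented out here so that the sketch can run unattended
* edit
* browse
* SUM and TAB: summary statistics and a one-way frequency table
summarize price mpg
tabulate foreign
* REGRESSION: ordinary least squares of price on mpg and weight
regress price mpg weight
* SET TIME: declare a time index (a made-up counter here)
generate t = _n
tsset t
* SET PANEL DATA: build a toy panel of 2 units over 3 years and declare it
clear
set obs 6
generate id = ceil(_n/3)
bysort id: generate year = 2000 + _n
xtset id year
```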
GRAPHS
DATA LABELING AND DATA MANIPULATION
REGRESSION
GLS: Generalized Least-Squares and Violations of Assumptions
The properties of Ordinary Least-Squares (OLS) regressions are sensitive to the underlying
assumptions: normality, homoskedasticity, and independence. However, these assumptions are
frequently violated in real-world data. Thus, in order to judge the validity of an OLS regression, we
may want to test whether our residuals are normally distributed, homoskedastic, and not
autocorrelated. This section will help you learn how to test whether a model violates the OLS
assumptions and how to adjust the model for these violations. The model in which we generalize our
assumptions regarding the variance-covariance matrix and residual distribution is called Generalized
Least-Squares (GLS), and it can overcome these violations of the OLS assumptions.
Non-Normality
Stata has several commands for testing normality. We will use these two:
pnorm varname         Draws a standardized normal probability plot, which compares the actual
                      observations with a diagonal line corresponding to normality.
sktest var1...vark    Performs a skewness and kurtosis test for normality for each variable listed.
Testing for Normality Tutorial:
Step 1: Perform a multiple regression by typing:
regress utilities age age2 ageunitsqft unitsqft
Step 2: Store the residuals from our regression in a variable named resids by typing:
predict resids, residuals
Step 3: Use the pnorm command on the residuals to assess normality. Type:
pnorm resids
Step 4: Perform a numerical test for normality on the residuals by typing:
sktest resids
Step 5: Access the help files on pnorm and sktest by typing:
help pnorm
help sktest
With the DataFerrett dataset, these tests indicate that our residuals are not normally distributed.
One way to address this is to estimate the model under a different distributional assumption. The
glm command for Generalized Linear Models allows for this flexibility. See section 5.4 for more
information on the glm command and specifying other residual distributions for linear estimation.
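The normality checks and the glm alternative can be tried on data that ships with Stata. This is a minimal sketch, not the manual's DataFerrett example: the auto dataset, the regressors, and the choice of a gamma family with a log link are all assumptions made for illustration.

```stata
* Sketch only: auto stands in for the DataFerrett data, and the gamma family
* with a log link is an assumed choice, not a recommendation from the manual.
sysuse auto, clear
regress price mpg weight
predict resids, residuals
* Graphical check and numerical test of normality of the residuals
pnorm resids
sktest resids
* Re-estimate under a different distributional assumption
glm price mpg weight, family(gamma) link(log)
```

Which family and link are appropriate depends on the dependent variable; help glm lists the available choices.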
Heteroskedasticity
To test for heteroskedasticity, Stata provides the estat hettest command, which is run after regress.
If we reject the null hypothesis of homoskedasticity (constant variance), we can correct for
heteroskedasticity within our estimation using the vwls command, which performs a variance-weighted
least-squares estimation of our model.
Testing for and Correcting Heteroskedasticity Tutorial:
Step 1: Perform a multiple regression by typing:
reg utilities age age2 ageunitsqft unitsqft
Step 2: Perform the Breusch-Pagan test for heteroskedasticity by typing:
estat hettest
Step 3: If we reject the null hypothesis of homoskedasticity, perform the variance-weighted least-
squares estimation of our model by typing:
vwls utilities age age2 ageunitsqft unitsqft
NOTE: The vwls command also has the sd(varname) option, where varname is a variable containing an
estimate of the conditional standard deviation of the dependent variable. If you do not specify the
sd() option, vwls treats the independent variables as defining groups and computes the mean and
standard deviation of the dependent variable within each group, so each combination of their values
must occur more than once in the data. For more information on this option and other options for the
vwls command, type:
help vwls
Upon running this test, it is clear that our assumption of constant variance, or homoskedasticity, has
been violated in our DataFerrett dataset. So, we have adjusted our model for heteroskedasticity
using variance-weighted least-squares estimation.
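The test and the correction can be run end to end on data that ships with Stata. This is a minimal sketch under stated assumptions: the auto dataset stands in for the DataFerrett data, and the conditional standard deviation passed to sd() is estimated from the OLS residuals separately for domestic and foreign cars, which assumes that the error variance differs only between those two groups. The names r and sd_r are hypothetical.

```stata
* Sketch only: auto stands in for the DataFerrett data. Assuming that the error
* variance differs only between domestic and foreign cars is for illustration.
sysuse auto, clear
regress price mpg weight
* Breusch-Pagan test; the null hypothesis is constant variance
estat hettest
* Estimate the conditional standard deviation from the residuals within each group
predict r, residuals
egen sd_r = sd(r), by(foreign)
* Variance-weighted least squares using the supplied standard deviations
vwls price mpg weight, sd(sd_r)
```

Supplying sd() this way lets vwls run with continuous regressors, where the grouping it would otherwise rely on is usually not available.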
Autocorrelation
In order to test our data for autocorrelation, we must first have a time variable. Once we have a
variable that indexes our time-series data in terms of some unit of time, we can use the tsset
command to declare it as our time index by typing:
tsset timevariable
After performing the regression using the regress command, we can obtain a Durbin-Watson d-
statistic for our estimated model by typing:
estat dwatson
Another option within Stata is Durbin's alternative test, which can be obtained by typing:
estat durbinalt
If our data show evidence of autocorrelation, we can correct for it by re-estimating our model
with the Prais-Winsten command, prais, using the Cochrane-Orcutt transformation (the corc option).
This can be done by typing:
prais depvar [indepvar1 . . . indepvark], corc
where depvar is the dependent variable in our model and indepvar1 . . . indepvark are the
independent variables in our model.
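The whole sequence, from declaring the time index to the corrected estimate, can be tried on simulated data. This is a sketch, not part of the original tutorial: every variable name, the seed, and the AR(1) coefficient of 0.7 are assumptions chosen only so that the example has autocorrelated errors to detect.

```stata
* Sketch only: the data are simulated and every name and number is an assumption.
clear
set obs 100
set seed 12345
generate t = _n
tsset t
generate x = invnormal(uniform())
* Build AR(1) errors: replace works through the observations in order,
* so e[_n-1] is already the updated value when observation _n is computed
generate e = invnormal(uniform())
replace e = 0.7*e[_n-1] + invnormal(uniform()) in 2/l
generate y = 1 + 2*x + e
regress y x
* Durbin-Watson d-statistic and Durbin's alternative test
estat dwatson
estat durbinalt
* Re-estimate with the Cochrane-Orcutt transformation
prais y x, corc
```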
Hypothesis Testing
After performing regressions, we may want to test linear hypotheses. Using a Wald statistic, we can
test various hypotheses with our slope parameter estimates using the test command. This command
has several forms. This tutorial will help you learn how to use some of these forms.
test Command Tutorial:
Step 1: After loading the DataFerrett dataset, perform the following regression:
reg utilities age age2 ageunitsqft unitsqft lotsize
Step 2: With the regression output in the Results window, we would like to test several hypotheses.
To test whether a square foot of house space contributes as much as a square foot of lot space to the
utility costs, type:
test unitsqft = lotsize
Step 3: To test whether our two statistically insignificant variables age and lotsize are jointly
significant in our model, type:
test age lotsize
Step 4: To test whether the parameter value of the slope coefficient on unitsqft is equal to one,
type:
test unitsqft = 1
Step 5: To test whether the coefficient on unitsqft is four times the coefficient on lotsize, due to
the fact that lots usually only need watering and not the other three utilities, type:
test unitsqft = 4*lotsize
Step 6: Finally, to learn more about the options associated with hypothesis testing with the test
command, type:
help test
Many more tests are available in Stata, depending on the project and data that you are working on.
As you become more familiar with the Stata syntax, you will be able to take better advantage of all
of the other options and commands that Stata has to offer.
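The same forms of the test command can be tried on data that ships with Stata. This is a minimal sketch: the auto dataset and the regressors are assumptions standing in for the DataFerrett variables, and the hypotheses are chosen only to show the syntax, not because they are economically meaningful.

```stata
* Sketch only: auto stands in for the DataFerrett data; the hypotheses are illustrative.
sysuse auto, clear
regress price mpg weight length
* Equality of two coefficients
test mpg = weight
* Joint test that both coefficients are zero
test mpg length
* A coefficient equal to a specific value
test weight = 1
* One coefficient equal to four times another
test weight = 4*length
```

Each call prints an F statistic and its p-value; a small p-value is evidence against the stated restriction.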
Other Regressions - Logit, Probit, & LAD
Stata has many commands for regressions and other estimation procedures. Logit, probit, and least
absolute deviation (LAD) are just a few of the other models that Stata can estimate. For a complete
listing of estimation commands, type:
help estimation commands
In this chapter we will only attempt to explain the logit, probit, and LAD models in order to help you
familiarize yourself with the syntax of regression models. You will then be prepared to use the more
advanced procedures as you learn more econometric theory.
Logit
The logit or logistic regression can easily be performed in Stata by using the logit command:
logit depvar [indepvar1...indepvark] [, options]
In the above command, depvar is the binary (success or failure) dependent variable,
indepvar1...indepvark are the independent variables of the model, and options are the additional
options specified by the user. Many of the options available with the regress command are also
available for the logit command, such as robust and noconstant for using the robust variance-covariance
matrix and eliminating the constant term, respectively.
After estimation, you can obtain predictions and other results with the predict and estat commands, as
outlined in 3.1 for the regress command.
Probit
The probit regression can be performed in Stata by using the probit command:
probit depvar [indepvar1...indepvark] [, options]
In the above command, depvar is the binary (success or failure) dependent variable,
indepvar1...indepvark are the independent variables of the model, and options are the additional
options specified by the user. Many of the options available with the regress command are also
available for the probit command, such as robust and noconstant for using the robust
variance-covariance matrix and eliminating the constant term, respectively.
After estimation, you can obtain predictions and other results with the predict and estat commands, as
outlined in 3.1 for the regress command.
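The syntax above can be tried on data that ships with Stata. This is a minimal sketch: the auto dataset, the regressors, and the variable names p_logit and p_probit are assumptions for illustration. Because the chapter title mentions LAD but no LAD syntax appears here, the last line uses qreg, which fits a median (least absolute deviation) regression by default; treat it as an assumed stand-in for the manual's missing example.

```stata
* Sketch only: auto and all variable names are assumptions for illustration.
sysuse auto, clear
* Logit: foreign is a 0/1 variable
logit foreign mpg weight
* With no options, predict stores the predicted probability of success
predict p_logit
* Probit with the same regressors and the robust variance-covariance matrix
probit foreign mpg weight, robust
predict p_probit
* LAD: qreg fits a median regression by default
qreg price mpg weight
```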
Graphics - Tables, Charts, etc.
One of the advantages of Stata over some other statistical packages is its graphics output capabilities.
This chapter explains several commands used to create graphics in Stata.
The table, list, and summarize Commands: Tables
Simple tables are the basic format in which Stata prints output to the screen. For simple tables that
you can copy and paste from the Results window, the following three commands are very useful:
table varname
list var1...vark
summarize var1...vark
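As a quick illustration of these three commands, the sketch below uses the auto dataset that ships with Stata; the dataset and the variables chosen are assumptions, not part of the original manual.

```stata
* Sketch only: the auto dataset and variable choices are assumptions.
sysuse auto, clear
* Frequency table of the repair record
table rep78
* First five observations of selected variables
list make price mpg in 1/5
* Observations, mean, standard deviation, minimum, and maximum
summarize price mpg
```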
The plot Command: Simple Text Plots
The plot command in Stata is useful to print simple scatter plots in order to see a general relationship
between variables. The basic form of the plot command with yvar on the y-axis and xvar on the
x-axis is as follows:
plot yvar xvar
The histogram Command: Histograms
The histogram command is useful for creating histograms to see various characteristics of our
datasets. The basic form for the histogram command is as follows:
histogram varname [, bin(n)]
histogram Command Tutorial:
Step 1: Open the DataFerrett dataset by clicking File on the toolbar and then Open... in the menu
and selecting the file where you saved the data.
Step 2: Generate a histogram of the variable residuals by typing:
histogram residuals
Step 3: Draw a normal curve on top of the histogram of residuals using the normal option. Type:
histogram residuals, normal
Step 4: Use the shortened version of histogram, hist, to generate a histogram of utilities by typing:
hist utilities
Step 5: Generate two histograms of the variable age, one with 10 bins and the other with 50 bins,
by typing:
hist age, bin(10)
hist age, bin(50)
Step 6: See the help file for histogram by typing:
help histogram
The histogram command is very useful for creating stylish and functional histograms to visualize the
distribution of any dataset.
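The tutorial above can be reproduced without the DataFerrett file. This is a minimal sketch: the auto dataset and the regression used to create the residuals variable are assumptions for illustration.

```stata
* Sketch only: auto stands in for the DataFerrett data.
sysuse auto, clear
* Create a residuals variable to plot
regress price mpg weight
predict residuals, residuals
* Histogram of the residuals with a normal curve drawn on top
histogram residuals, normal
* Same variable with two different bin counts
hist mpg, bin(10)
hist mpg, bin(50)
```

Each histogram command replaces the graph currently shown in the Graph window.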
The graph Command: Charts, Graphs, and Plots
In Stata the graph command has many options. For documentation on the available options, consult
the Stata help file for graph by typing:
help graph
We will cover a few basic examples in this tutorial.
graph Command Tutorial:
Step 1: Open the DataFerrett dataset by clicking File on the toolbar and then Open... in the menu
and selecting the file where you saved the data.
Step 2: Create a scatter plot of utilities on predictedvalues. Type:
graph twoway scatter utilities predictedvalues
Step 3: Create a scatter plot of utilities on unitsqft. Typing graph is optional. Type:
twoway scatter utilities unitsqft
Step 4: Create a scatter plot of utilities on lotsize. Typing twoway is also optional. Type:
scatter utilities lotsize
As mentioned before, the graph command has many options. Among these are a variety of twoway
(comparison between two variables), bar, box, dot, and pie plots.
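A few of these plot types can be tried on data that ships with Stata. This is a minimal sketch: the auto dataset, the regression, and the variable name predictedvalues (created here with predict) are assumptions for illustration.

```stata
* Sketch only: auto stands in for the DataFerrett data.
sysuse auto, clear
* Create fitted values to plot against the dependent variable
regress price mpg weight
predict predictedvalues, xb
* Scatter plot of the dependent variable on the fitted values
twoway scatter price predictedvalues
* Box plot and bar chart of price for domestic and foreign cars
graph box price, over(foreign)
graph bar (mean) price, over(foreign)
```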
REFERENCE
Denton, Lynn, and Jody Kelly. Designing, Writing, & Producing Computer Documentation. Ed. Jay
Ranade. McGraw-Hill, Inc. 1992.
Heaton, Christopher. STATA: An Overview for Economics 388 Students. Economics 588 Final
Project. Brigham Young University, n.d.
Holtz, Herman. The Complete Guide to Writing Readable User Manuals. Homewood, IL: Dow
Jones-Irwin. 1988.
StataCorp LP. Stata 9.0 Help File. Stata/SE 9.0 for Windows. College Station, TX: StataCorp LP,
2005.