Basic Manual - Stata 10.0

  • 8/3/2019 Basic Manual - Stata 10.0

    1/34

    Introduction to STATA 10.0 Training Manual Page 1

    Introduction to STATA 10.0

    Basic Training Manual

    STAGES IN STATA 10.0

    1. STATA WINDOWS

    2. LOADING YOUR DATA

    3. CREATING A DO FILE AND LOG FILE

    4. VIEWING YOUR DATA

    5. SOME BASIC COMMANDS

    INTRODUCTION

    BASIC SYSTEM OF STATA 10.0

    STATA WINDOWS

    A basic overview of the Stata interface is necessary to use this manual effectively. Open Stata, which you have already installed on your computer. Figure 1 shows the Stata interface that should appear.

    Within the main Stata window are four smaller windows:

    1. Results

    2. Command

    3. Review

    4. Variables

    The Results window is where Stata displays the output of every command. The Command window is where you type the commands that Stata will execute. The Review window stores all previous commands. This is particularly useful for rerunning or slightly modifying a previous command: click the desired command in the list instead of retyping or copying and pasting it. Finally, the Variables window lists the variables stored in Stata for quick reference. When you click on a variable name, Stata adds that name to the text in the Command window, so you can avoid typing it.

    LOADING YOUR DATA

    INPUT FILE

    SELECT INPUT FILE AND TYPE OF FILE

    PROVIDE A FILE NAME AND SAVE IT IN STATA FORMAT (WHICHEVER FORMAT YOU PREFER)

    CREATING DO FILE

    OPEN LOG FILE

    CLOSE LOG FILE

    VIEWING YOUR DATA EDITOR

    VIEW DATA BROWSER

    SOME BASIC COMMANDS

    LIST

    ED

    BROWSE

    SUM

    TAB

    REGRESSION

    SET TIME

    SET PANEL DATA

    GRAPHS

    DATA LABELING AND DATA MANIPULATION

    REGRESSION

    GLS: Generalized Least-Squares and Violations of Assumptions

    The properties of Ordinary Least-Squares (OLS) regression are sensitive to the underlying assumptions: normality, homoskedasticity, and independence. However, these assumptions are frequently violated in the real world. Thus, to judge the validity of an OLS regression, we may want to test whether our residuals are normally distributed, homoskedastic, and not autocorrelated. This section shows how to test whether a model violates the OLS assumptions and how to adjust the model for these violations. The model in which we generalize our assumptions regarding the variance-covariance matrix and residual distribution is called Generalized Least-Squares (GLS), and it can overcome these violations of the OLS assumptions.

    Non-Normality

    In order to test for normality, Stata has several commands. We will use these two:

    pnorm varname        Draws a standardized normal probability plot, comparing the actual observations with a diagonal line corresponding to normality.

    sktest var1...vark   Performs a skewness and kurtosis test for normality for each variable listed.

    Testing for Normality Tutorial:

    Step 1: Perform a multiple regression by typing:

    regress utilities age age2 ageunitsqft unitsqft

    Step 2: Store the residuals from our regression in a variable named resids by typing:

    predict resids, residuals

    Step 3: Use the pnorm command on the residuals to assess normality. Type:

    pnorm resids

    Step 4: Perform a numerical test for normality on the residuals by typing:

    sktest resids

    Step 5: Access the help files on pnorm and sktest by typing:

    help pnorm

    help sktest

    With the DataFerrett dataset, our residuals are not normally distributed. One way to address this is to assume a different distribution for the residuals. The glm command for Generalized Linear Models allows for this flexibility. See section 5.4 for more information on the glm command and specifying other residual distributions for linear estimation.
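
The normality checks above can also be tried without the DataFerrett data. The following is a minimal sketch that assumes the auto example dataset shipped with Stata; the variables price, mpg, and weight belong to that dataset and are not part of this manual's data.

```stata
* Assumption: use Stata's bundled auto dataset instead of the DataFerrett data
sysuse auto, clear

* Fit a regression and store its residuals in a new variable
regress price mpg weight
predict resids, residuals

* Graphical check: standardized normal probability plot of the residuals
pnorm resids

* Numerical check: skewness and kurtosis test for normality
sktest resids
```

Points that stray from the diagonal in the pnorm graph, or a small p-value from sktest, would suggest that the residuals are not normally distributed.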

    Heteroskedasticity

    To test for heteroskedasticity, Stata provides the estat hettest command, which is run after a regression. If we reject the null hypothesis of homoskedasticity (constant variance), we can correct for heteroskedasticity within our estimation using the vwls command, which performs a variance-weighted least-squares estimation of our model.

    Testing for and Correcting Heteroskedasticity Tutorial:

    Step 1: Perform a multiple regression by typing:

    reg utilities age age2 ageunitsqft unitsqft

    Step 2: Perform the Breusch-Pagan test for heteroskedasticity by typing:

    estat hettest

    Step 3: If we rejected the null hypothesis of homoskedasticity, perform the variance-weighted least-squares estimation of our model by typing:

    vwls utilities age age2 ageunitsqft unitsqft

    NOTE: The vwls command also has the sd(varname) option, where varname is a variable containing an estimate of the conditional standard deviation of the dependent variable. If you do not specify the sd() option, Stata groups the observations by the values of the independent variables and estimates the conditional standard deviation within each group, so each group needs more than one observation. For more information on this option and other options for the vwls command, type:

    help vwls

    Upon running this test, it is clear that our assumption of constant variance, or homoskedasticity, has been violated in our DataFerrett dataset. So, we have adjusted our model for heteroskedasticity using variance-weighted least-squares estimation.
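
The steps above can be sketched on Stata's bundled auto dataset. This is only an illustration under stated assumptions: the variable sd_price is a hypothetical name, and treating the conditional standard deviation of price as varying only by car origin (foreign) is an assumption made to give sd() something to work with, not a recommendation.

```stata
* Assumption: use Stata's bundled auto dataset instead of the DataFerrett data
sysuse auto, clear

* Fit the model by OLS and run the Breusch-Pagan test
regress price mpg weight
estat hettest

* Assumption: the standard deviation of price differs only by car origin;
* sd_price is a hypothetical variable holding that group-level estimate
egen sd_price = sd(price), by(foreign)

* Variance-weighted least squares using the supplied standard deviations
vwls price mpg weight, sd(sd_price)
```

Supplying sd() avoids the grouping that vwls would otherwise need, which rarely works when the independent variables are continuous.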

    Autocorrelation

    To test our data for autocorrelation, we must first have a time variable. Once we have a variable that indexes our time-series data in terms of some unit of time, we can use the tsset command to declare it as our time index by typing:

    tsset timevariable

    After performing the regression using the regress command, we can obtain a Durbin-Watson d-statistic for our estimated model by typing:

    estat dwatson

    Another option within Stata is Durbin's alternative test for serial correlation, which can be obtained by typing:

    estat durbinalt

    If our data show evidence of autocorrelation, we can correct for it by re-estimating our model with the Prais-Winsten command, prais, using the Cochrane-Orcutt transformation. This can be done by typing:

    prais depvar [indepvar1 ... indepvark], corc

    where depvar is the dependent variable in our model and indepvar1 ... indepvark are the independent variables in our model.
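
The autocorrelation workflow above can be sketched as follows. The example assumes the sp500 dataset shipped with Stata (variables close and volume); the index t is a hypothetical variable created here because trading dates contain gaps, and the regression itself is chosen only to show the syntax.

```stata
* Assumption: use Stata's bundled sp500 dataset as example time-series data
sysuse sp500, clear

* Assumption: index observations by their order, since calendar dates have gaps
generate t = _n
tsset t

* Fit the model by OLS, then test the residuals for serial correlation
regress close volume
estat dwatson
estat durbinalt

* Re-estimate with the Cochrane-Orcutt transformation
prais close volume, corc
```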

    Hypothesis Testing

    After performing regressions, we may want to test linear hypotheses. Using a Wald statistic, we can test various hypotheses about our slope parameter estimates with the test command. This command has several forms. This tutorial will help you learn how to use some of them.

    test Command Tutorial:

    Step 1: After loading the DataFerrett dataset, perform the following regression:

    reg utilities age age2 ageunitsqft unitsqft lotsize

    Step 2: With the regression output in the Results window, we would like to test several hypotheses. To test whether a square foot of house space contributes as much as a square foot of lot space to the utility costs, type:

    test unitsqft = lotsize

    Step 3: To test whether our two statistically insignificant variables, age and lotsize, are jointly significant in our model, type:

    test age lotsize

    Step 4: To test whether the slope coefficient on unitsqft is equal to one, type:

    test unitsqft = 1

    Step 5: To test whether the coefficient on unitsqft is four times the coefficient on lotsize, on the reasoning that lots usually need only watering and not the other three utilities, type:

    test unitsqft = 4*lotsize

    Step 6: Finally, to learn more about the options associated with hypothesis testing with the test command, type:

    help test

    Many more tests are available in Stata, depending on the project and data that you are working on. As you become more familiar with the Stata syntax, you will be able to take better advantage of the other options and commands that Stata has to offer.
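
The forms of the test command used in this tutorial can be tried on Stata's bundled auto dataset. This is a minimal sketch; the variables and the hypotheses are chosen only to show the syntax and carry no economic meaning.

```stata
* Assumption: use Stata's bundled auto dataset instead of the DataFerrett data
sysuse auto, clear
regress price mpg weight length

* Equality of two coefficients
test weight = length

* Joint significance of two coefficients
test mpg length

* A coefficient equal to a given constant
test weight = 1

* One coefficient equal to four times another
test weight = 4*length
```

In the last form, the multiplier goes on the side of the coefficient that is hypothesized to be smaller, because the expression states an equality between the two sides.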

    Other Regressions - Logit, Probit, & LAD

    Stata has many commands for regressions and other estimation procedures. Logit, probit, and least absolute deviation (LAD) are just a few of the other models that Stata can estimate. For a complete listing of estimation commands, type:

    help estimation commands

    In this chapter we will only attempt to explain the logit, probit, and LAD models in order to help you familiarize yourself with the syntax of regression models. You will then be prepared to use the more advanced procedures as you learn more econometric theory.

    Logit

    The logit, or logistic, regression can be performed in Stata by using the logit command:

    logit depvar [indepvar1...indepvark] [, options]

    In the above command, depvar is the binary (success or failure) dependent variable, indepvar1...indepvark are the independent variables of the model, and options are the additional options specified by the user. Many of the options available with the regress command are also available for the logit command, such as robust and noconstant, which use the robust variance-covariance matrix and eliminate the constant term, respectively.

    You can access the regression output by using the predict and estat commands, as outlined in 3.1 for the regress command.
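
As an illustration of the logit syntax, the following sketch assumes Stata's bundled auto dataset, in which foreign is a 0/1 variable. The name p_foreign is a hypothetical variable introduced here to hold the predicted probabilities.

```stata
* Assumption: use Stata's bundled auto dataset; foreign is coded 0/1
sysuse auto, clear

* Logit of car origin on mileage and weight
logit foreign mpg weight

* Store the predicted probability that foreign equals 1
predict p_foreign, pr
summarize p_foreign
```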

    Probit

    The probit regression can be performed in Stata by using the probit command:

    probit depvar [indepvar1...indepvark] [, options]

    In the above command, depvar is the binary (success or failure) dependent variable, indepvar1...indepvark are the independent variables of the model, and options are the additional options specified by the user. Many of the options available with the regress command are also available for the probit command, such as robust and noconstant, which use the robust variance-covariance matrix and eliminate the constant term, respectively.

    You can access the regression output by using the predict and estat commands, as outlined in 3.1 for the regress command.
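
A corresponding probit sketch, again assuming Stata's bundled auto dataset, shows the robust option together with one estat subcommand available after probit.

```stata
* Assumption: use Stata's bundled auto dataset; foreign is coded 0/1
sysuse auto, clear

* Probit of car origin on mileage and weight with robust standard errors
probit foreign mpg weight, robust

* Table of correctly and incorrectly classified observations
estat classification
```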

    Graphics - Tables, Charts, etc.

    One of the advantages of Stata over some other statistical packages is its graphics output capabilities. This chapter explains several commands used to create graphics in Stata.

    7.1 The table, list, and summarize Commands: Tables

    Simple tables are the basic format in which Stata writes output to the screen. For simple tables that you can copy and paste from the Results window, the following three commands are very useful:

    table varname

    list var1...vark

    summarize var1...vark
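
The three commands above might be used as follows. This sketch assumes Stata's bundled auto dataset; the variable names come from that dataset.

```stata
* Assumption: use Stata's bundled auto dataset
sysuse auto, clear

* Frequency table of a categorical variable
table foreign

* List selected variables for the first five observations
list make price mpg in 1/5

* Summary statistics for several variables
summarize price mpg weight
```

The in 1/5 qualifier keeps the listing short; without it, list prints every observation.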

    The plot Command: Simple Text Plots

    The plot command in Stata is useful for printing simple text scatter plots that show the general relationship between two variables. The basic form of the plot command, with yvar on the y-axis and xvar on the x-axis, is as follows:

    plot yvar xvar

    The histogram Command: Histograms

    The histogram command is useful for creating histograms to see various characteristics of our datasets. The basic form of the histogram command is as follows:

    histogram varname [, bin(n)]

    histogram Command Tutorial:

    Step 1: Open the DataFerrett dataset by clicking File on the toolbar and then Open... in the menu and selecting the file where you saved the data.

    Step 2: Generate a histogram of the variable residuals by typing:

    histogram residuals

    Step 3: Draw a normal curve on top of the histogram of residuals using the normal option. Type:

    histogram residuals, normal

    Step 4: Use the shortened version of histogram, hist, to generate a histogram of utilities by typing:

    hist utilities

    Step 5: Generate two histograms of the variable age, one with 10 bins and the other with 50 bins, by typing:

    hist age, bin(10)

    hist age, bin(50)

    Step 6: See the help file for histogram by typing:

    help histogram

    The histogram command is very useful for creating stylish and functional histograms to visualize the distribution of any dataset.
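
The histogram options from this tutorial can be sketched on Stata's bundled auto dataset, which is assumed here in place of the DataFerrett data.

```stata
* Assumption: use Stata's bundled auto dataset
sysuse auto, clear

* Basic histogram of one variable
histogram price

* The same histogram with a normal density curve drawn on top
histogram price, normal

* Abbreviated command with an explicit number of bins
hist mpg, bin(10)
```

Each call replaces the previous graph in the Graph window, so run the commands one at a time to inspect each result.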

    The graph Command: Charts, Graphs, and Plots

    In Stata the graph command has many options. For documentation on the available options, consult the Stata help file for graph by typing:

    help graph

    We will cover a few basic examples in this tutorial.

    graph Command Tutorial:

    Step 1: Open the DataFerrett dataset by clicking File on the toolbar and then Open... in the menu and selecting the file where you saved the data.

    Step 2: Create a scatter plot of utilities against predictedvalues. Type:

    graph twoway scatter utilities predictedvalues

    Step 3: Create a scatter plot of utilities against unitsqft. Typing graph is optional. Type:

    twoway scatter utilities unitsqft

    Step 4: Create a scatter plot of utilities against lotsize. Typing twoway is also optional. Type:

    scatter utilities lotsize

    As mentioned before, the graph command has many options. Among these are a variety of twoway (comparison between two variables), bar, box, dot, and pie plots.
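
The graph commands in this tutorial can be sketched on Stata's bundled auto dataset, which is assumed here in place of the DataFerrett data. The variable yhat is a hypothetical name for the fitted values, standing in for predictedvalues.

```stata
* Assumption: use Stata's bundled auto dataset
sysuse auto, clear

* Fit a regression and store the fitted values in yhat (hypothetical name)
regress price weight
predict yhat

* Full form: actual values against fitted values
graph twoway scatter price yhat

* Shorter forms: graph and twoway may both be omitted
twoway scatter price weight
scatter price mpg

* One of the other plot types: box plot of price by car origin
graph box price, over(foreign)
```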
