Getting Started with Stata 2/11/2010 Tom Tomberlin Nealia Khan Learning Technologies Center Harvard Graduate School of Education


Getting Started with Stata

2/11/2010

Tom Tomberlin

Nealia Khan

Learning Technologies Center

Harvard Graduate School of Education


Agenda

I. Overview of Stata

II. Getting Started

III. ‘Do’ files

IV. Basic data cleaning

V. Basic data management

VI. Beginning analysis

VII. Special topics (time permitting)


Overview

Why use Stata?

• Availability

• Can self-program, or use menus

• Cutting-edge statistical methods (including user-defined functions)

• Publication-quality graphics


Stats and Graphics


Getting Started

• A word about programming in and using Stata

• Stata is case sensitive, so Myvar is different from myvar

• All commands in Stata are lower-case

• “and” = &, “or” = |, “not” = !

• Assignment is “=”, equality testing is “==”
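
The rules above can be sketched in a few commands. This is only an illustration: it assumes the auto example dataset that ships with Stata (loaded with sysuse), which is not the workshop data, and the new variable names are made up.

```stata
* Load the example dataset shipped with Stata (not the workshop data)
sysuse auto, clear

* "=" assigns a value; "==" tests equality; "&" means "and"
generate cheap_foreign = (price < 5000 & foreign == 1)

* "|" means "or"
count if price < 5000 | mpg > 30

* "!" means "not" (foreign is coded 0/1 in this dataset)
list make price if !foreign in 1/5

* Names are case sensitive: Price is a new variable, distinct from price
generate Price = price * 2
```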


Windows in Stata


Getting Started

• Opening Stata

• Opening data:

– Stata-formatted data: “use” command

– Comma-separated values: “insheet using”

– Tab-delimited values: “insheet using”

– Flat files: create a dictionary
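
As a rough sketch, the first three cases might look like the following. The file names are hypothetical placeholders, not files from the workshop. Note that insheet is the command taught in this workshop; newer Stata releases also offer import delimited for the same job.

```stata
* Stata-format data (.dta); the file name is hypothetical
use "F:\workshops\mydata.dta", clear

* Comma-separated text file (hypothetical name)
insheet using "F:\workshops\mydata.csv", comma clear

* Tab-delimited text file (hypothetical name)
insheet using "F:\workshops\mydata.txt", tab clear
```

The clear option drops whatever data are currently in memory before the new file is loaded.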


Apply Your Knowledge

• Exercise 1:

• Open Stata

• Using the insheet command, open the comma-separated data file located at F:\workshops\SATdata.csv

(HINT: all Stata commands must be written in lower case. Don’t forget to put pathnames in quotes!)


Examining Data

• Look at your data – did our data import correctly?

– How are our data measured?

– What kinds of variables do we have?

• How would we describe the distribution of our data?

– Graphs: histograms, scatterplots

– Charts/tables: frequency tables, cross-tabs


Looking at Data

• There are several ways to look at our data in Stata

– Editor

– Browser

– Stata commands: codebook, des (short for describe), tables of frequency and distribution, graphs of distribution
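
A minimal sketch of these ways of looking at data, assuming the auto example dataset that ships with Stata rather than the workshop file:

```stata
* Example dataset shipped with Stata (an assumption for illustration)
sysuse auto, clear

* Open the Data Browser (read-only view); edit opens the Data Editor instead
browse

* Variable names, storage types, and labels
describe

* Detailed summary of selected variables: range, missing values, value labels
codebook mpg foreign
```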


Examining Data

• Let’s look at how the variable ‘csat’ is distributed

– hist csat

– tab csat
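
Here hist and tab are abbreviations of histogram and tabulate. Written out in full, and assuming the workshop SAT data (with the csat variable) are already in memory, the same steps might look like this; the summarize line is an optional extra.

```stata
* Assumes the workshop SAT data are already loaded, so csat exists
* Histogram with counts on the vertical axis
histogram csat, frequency

* Frequency table of every distinct csat value
tabulate csat

* Mean, percentiles, skewness, and kurtosis
summarize csat, detail
```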


Do files

What are do-files?

A do-file is essentially a script: a list of all the commands that you wish to run and the settings that you would like to apply.

– Why use them?

Replication, collaboration, audit trail, help

– How to create and run one


Do-files

• Creating and running a do-file
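
A do-file is just a plain text file of commands. The sketch below is a hypothetical example.do; the file names are made up, and it uses the auto example dataset that ships with Stata rather than the workshop data.

```stata
* example.do: a hypothetical do-file
* Close any log left open by an earlier run (no error if none is open)
capture log close

* Clear the data in memory and stop output from pausing at each screen
clear
set more off

* Keep a plain-text record of the commands and their output
log using "example.log", replace text

sysuse auto
describe
summarize price mpg

log close
```

One way to run it is to type do example.do in the Command window, provided the file is in the current working directory; it can also be run from the Do-file Editor.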


Do files

– EXERCISE 2: Create a simple do-file from the commands that you have already entered.

(HINT: you must clear the data in memory before opening a new dataset.)


Basic Data Cleaning

– Labeling

To label a variable: label var varname "label"

To label values:

label define labelname 1 "high" 0 "low"

label val varname labelname

– Renaming: ren varname1 varname2

– Recoding: recode varname oldvalue=newvalue

– Generating a new variable: gen newvarname=somevalue

– Replacing values of an already generated variable: replace newvarname=somevalue
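
The cleaning commands above can be sketched on the auto example dataset that ships with Stata. The dataset choice, the new variable names, and the label name heavy_lbl are assumptions for illustration only.

```stata
sysuse auto, clear

* Label a variable
label variable mpg "Miles per gallon"

* Label values: create a 0/1 variable, define the labels, then attach them
generate heavy = (weight > 3000)
label define heavy_lbl 0 "light" 1 "heavy"
label values heavy heavy_lbl

* Rename a variable
rename rep78 repair_record

* Recode: collapse five categories into three (missing values are left as is)
recode repair_record (1 2 = 1) (3 = 2) (4 5 = 3)

* Generate a new variable, then replace its values
generate price_k = price / 1000
replace price_k = round(price_k, 0.1)
```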


Basic Data Management

• Subsetting

– keep

– drop

– if

• Merging

– merge

– must sort both files by the linkage variable!

– ex: merge linkage_var using "F:\workshops\newfile"
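
A hedged sketch of subsetting and merging, best run as a do-file. It assumes the auto example dataset that ships with Stata and builds a temporary second file to merge in; the linkage variable here is make, and the macro name weights is made up.

```stata
sysuse auto, clear

* Subsetting variables
keep make price mpg foreign
drop mpg

* Subsetting observations with if
keep if foreign == 1
drop if price > 10000

* Merging: first build a second file to merge in (temporary, for illustration)
sysuse auto, clear
keep make weight
sort make
tempfile weights
save "`weights'"

* The file in memory must also be sorted by the linkage variable
sysuse auto, clear
keep make price
sort make

* Same syntax as on the slide; _merge records where each row came from
merge make using "`weights'"
tabulate _merge
```

Stata 11 and later also accept the form merge 1:1 make using "`weights'", which does not require sorting first; the syntax shown on the slide still works there but prints a note that it is the old syntax.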


Basic Data Cleaning

• EXERCISE 3:

• Generate a dichotomous variable called hi_score from the csat variable, where 1 indicates a score greater than 922 and 0 indicates a score less than or equal to 922.

• Label it as 0=low and 1=high.


Beginning Analysis

• Univariate analysis

– summarize

– histogram

– table

• Bivariate analysis

– tabulate

– pwcorr

– ttest
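
As an illustration, each of these commands might be used as follows. The sketch assumes the auto example dataset that ships with Stata, not the workshop data, and the options shown are just common choices.

```stata
sysuse auto, clear

* Univariate: summary statistics, a histogram, and a one-way table
summarize price, detail
histogram mpg
table foreign

* Bivariate: two-way table with a chi-squared test
tabulate foreign rep78, chi2

* Pairwise correlations with significance levels
pwcorr price mpg weight, sig

* Two-sample t test of mpg across the two groups of foreign
ttest mpg, by(foreign)
```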


Apply Your Knowledge

EXERCISE 4:

• Generate a histogram of the expense variable.

• Generate a two-way table to see whether the distribution of expense differs across the values of your newly created hi_score variable.

• If you have time, see if there is a significant correlation between SAT scores and the average amount of money that each state spends on education.


Beginning Analysis

• Multivariate models

– Linear regression: regress depvar indepvar1 indepvar2 … indepvarN

– Logistic regression: logit depvar indepvar1 indepvar2 … indepvarN
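
A minimal sketch of both models, again assuming the auto example dataset that ships with Stata; the choice of variables is only for illustration.

```stata
sysuse auto, clear

* Linear regression: continuous outcome (price) on two predictors
regress price mpg weight

* Logistic regression: binary 0/1 outcome (foreign) on the same predictors
logit foreign mpg weight

* Same model, reported as odds ratios instead of coefficients
logit foreign mpg weight, or
```

The dependent variable always comes first, followed by the independent variables.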


Apply Your Knowledge

• Exercise 5:

Generate two scatterplots: one to look at the relationship between expense and csat, and one to look at expense and hi_score.

Depending on your assessment of the relationship (linear or not), run the appropriate regression to test for the relative effect of expense on either csat or hi_score.


Thanks

Questions?

Gutman Library, room 323a&b

[email protected]

http://www.isites.harvard.edu/icb/icb.do?keyword=ltc