Empirical Methods in Trade: Analyzing Trade Costs …b. Stata windows c. Organization of the...

Preview:

Citation preview

Empirical Methods in Trade: Analyzing Trade Costs and Trade Facilitation

June 2015

Bangkok, Thailand

Cosimo Beverelli Simon Neumueller

(ERSD/WTO) (ERSD/WTO)

1

Content

2

a. Resources

b. Stata windows

c. Organization of the “Bangkok_June_2015\Stata” folder

d. The “directory_definition” do file

e. Datasets used in this introduction to Stata

f. Do files

g. Importing data into Stata

h. Basic commands

i. Merging datasets

j. Macros

k. Loops

l. Graphics

a. Resources

3

1. Stata help and Stata manual

2. A variety of books covering Stata exist

Web resources:

1. Germán Rodríguez’s webpage • Data management, graphics and programming

2. UCLA IDRES’ webpage • Very comprehensive , covering all sorts of topics (data management,

analysis,…) with several examples

• FAQ

3. Statalist • Typically accessed via a google search

b. Stata windows

4

Type commands here

Type commands here

List of variables and labels

Type commands here Some properties of the variables

List of variables and labels

Type commands here Some properties of the variables

List of variables and labels Actions taken in the session

c. Organization of the “Bangkok_June_2015\Stata” folder

6

• The “Bangkok_June_2015\Stata” folder contains the following sub-folders: • data

• do_files

• results

• Do not change the name of the folder or of the sub-folders

• The first thing to do is to define your own working directory (see slide c.)

d. The “directory_definition” do file

7

• To make things easy for all of us, we define a “path” for the working directory that can be easily changed, and will be changed only once and for all

• When you open Stata, the first thing to do is to open the “directory_definition” do-file in the do-file editor

• Click here:

• …or Ctrl+9

• The “directory_definition” looks like this and you have to change it to your own path

e. Datasets and do files used in this introduction to Stata

8

• To apply some of the Stata commands described in this presentation, we will use two datasets:

• WDI.csv – a very small subset of the World Development indicators

• WB_ES.xls – derived from the World Bank Enterprise Surveys

• You can find the datasets in the “data” directory: "$BKK\data\Introduction_Stata"

• There are also 4 do files

• 01_data_intro_stata

• 02_descriptive

• 03_reshape_merge

• 04_loops

• You can find the do files in the directory: "$BKK\do_files\Introduction_Stata“

f. Do files

9

• A do file is a set of Stata commands typed in a plain text file

• When you work with STATA, always use do files

• E.g. one do file for creating your master dataset and one do file for regressions

• Do files can also be used to set globals and directories or to run a series of different do files after each other

Typical commands at beginning of each do file:

clear all /* removes all data in the current Stata session*/

set more off, perm /* prevents Stata to pause while running a do file */

capture log close /* closes a log file */

cd “directory” /* sets the directory, e.g.. “$BKK\data*/

log using “filename”, replace /* useful for long do files, allows printing */

use “dataset.dta”, replace /* open dataset in Stata format (.dta); “ ”

are not necessarily needed */

Typical commands at beginning of each do file:

clear all /* removes all data in the current Stata session*/

set more off, perm /* prevents Stata to pause while running a do file */

capture log close /* closes a log file */

cd “directory” /* sets the directory, e.g.. “$BKK\data*/

log using “filename”, replace /* useful for long do files, allows printing */

use “dataset.dta”, replace /* open dataset in Stata format (.dta); “ ”

are not necessarily needed */

Notes

• “*” treats everything after it in a line as a comment

• “/* text */” will make Stata treat “text” as a comment (“text” can span over several lines)

g. Importing data into Stata

11

• insheet using filename.csv, clear

• Typically used for text files that are either comma or tab-separated

• import excel using filename.xls, sheet(“Sheet1”) first clear

• Reads excel files directly into Stata

• Allows to specify variables, cell range and worksheet to import

• Copy paste is strongly discouraged. Watch out. The accuracy of copied numbers depends on:

o How data are formatted in excel, i.e. how many digits are shown

o Your settings in Stata (use set type double before copying)

To save the dataset (in Stata format): • save filename, replace

h. Basic commands

12

• Identify missing values (represented by . (dot) or empty cell) • inspect varlist

• codebook varlist

• Identify duplicate observations • duplicates (report/drop/tag/list) varlist

• Identify number of unique values

• unique varlist

• Browse the dataset

• browse varlist

Other basic commands

• describe

• generate

• destring/tostring

• replace

• rename

• Alternative: renvars

• keep

• drop

• The list can go on…what is important is to keep in mind that, in case of doubt, you can always use the “help”:

• help command

List of useful operators commonly used in expressions

Arithmetic Logical Relational

+ add ! not (also ~) == equal

- subtract | or != not equal (also ~=)

* multiply & and < less than

/ divide <= less than or equal

^ raise to power > greater than

+ string concatenation >= greater than or equal

Commands for descriptive statistics

• summarize varlist

• tabulate var1 var 2

• table rowvar (colvar), content()

• tabstat varlist, statistics() by()

The egen command

• Often used command to create new variables

• Commonly used egen functions (refer to WB_ES dataset):

• bysort cou sector: egen sales_sec=total(sales), missing

• bysort cou sector: egen sales_sec=mean(sales)

• egen exp_tot=rowtotal(exp_intermediate exp_final)

• egen id_cluster=group(cou sector)

• egen cou_sec=concat(cou sector)

• See 02_descriptive.do

• Further functions include: max, min, count, tag,…

String functions

• generate newvar =function()

• Some useful functions are:

• abbrev() –> shortens the string the number of indicated characters

• length() –> returns the length of the string, i.e. number of characters

• subinstr() –> allows to replace or delete particular substrings

• substr() –> allows to extract substrings based on its position

• upper (lower) –> Changes the entire string to upper-case (lower- case) strings

• trim() –> removes leading and trailing blanks of the string

The collapse command

• collapse (mean) varlist (sum) varlist, by(varlist)

• Creates an aggregate dataset by e.g. averaging or summing variables across the dimension identified in by()

• All observations not included in the command are dropped

• Useful in analysis when moving to a higher level of aggregation, e.g. aggregating trade flows from HS 6-digit to HS 2-digit

• Useful for calculating descriptive statistics before exporting them to excel using export excel

The collapse command

• collapse (mean) varlist (sum) varlist, by(varlist)

• Creates an aggregate dataset by e.g. averaging or summing variables across the dimension identified in by()

• All observations not included in the command are dropped

• Useful in analysis when moving to a higher level of aggregation, e.g. aggregating trade flows from HS 6-digit to HS 2-digit

• Useful for calculating descriptive statistics before exporting them to excel using export excel

• If you do not want to collapse, duplicates drop after bysort (): egen gives the same results as collapse. Example (see 02_descriptive.do): • collapse (mean) sales (sum) dummy_exp, by(cou sector)

is equivalent to:

• bysort cou sector: egen avg_sales = mean(sales)

• bysort cou sector: egen number_exporters =

total(dummy_exp)

• keep cou sector avg_sales number_exporters

• duplicates drop

The reshape command

• reshape wide (long) ‘stub’, i(var) j(var) options

• Reshapes dataset from long to wide format and vice versa

• Data dimensions such as country, year or sector are normally put in long format

• ‘stub’ are variables in reshape wide and stubs of variables in reshape long

• i(var) are identifying dimensions; j(var) dimension to change

• Exercise: Open WDI.dta and reshape it first long and then wide (see 03_reshape_merge.do)

i j stub

1 1 4.1

1 2 4.5

2 1 3.3

2 2 3.0

i stub1 stub2

1 4.1 4.5

2 3.3 3.0

Long

Wide

Reshape

i. Merging datasets

20

• To merge datasets you can use joinby or merge

• joinby varlist using filename, unmatched(both)

• The command forms all pairwise combination for varlist

• unmatched can keep unmatched observations from the master dataset, the using dataset or both (see generated variable _merge)

• Exercise: Open WB_ES.dta and merge it with WDI.dta (see 03_reshape_merge.do)

j. Macros

21

• Macros are names associated with some text • The commands global and local assign strings to global and local macro names

• global mname [=exp | :extended_fcn | [`]"[string]"['] ] • Global macros, once defined, are available anywhere in Stata

• Evaluate it using $mname

• local lclname [=exp | :extended_fcn | [`]"[string]"['] ]

• Simplest example: local c USA JPN

• Evaluate it using `lclname'

• Local macros work only within the do file in which they are defined

• Globals and locals have a variety of uses

• To define the directories for this class, i.e. directory_definition.do

• They are used in loops (see next slides)

• A set of explanatory variables can be grouped under one macro name

k. Loops

22

• See Stata help and Germán Rodríguez’s webpage

• Two main commands: foreach and forvalues

• foreach loops through strings of text, forvalues loops through numbers

• Syntax (3 alternatives): foreach item in a-list-of-things (e.g. a b d) {

commands referring `item‘ }

foreach varname of varlist list-of-variables {

commands referring to `varname‘ }

forvalues number = first(step)last {

commands referring to `number' }

23

Examples for loops in WB_ES.dta

Ex1 foreach k in USA JPN { /* Loop over any_list */

egen sales_`k'=total(sales) if cou=="`k'"

}

Ex2 vallist cou, local(c) /* vallist shows values and creates

local */

foreach k of local c { /* Loop over a local macro */

capture drop sales_`k'

egen sales_`k'=total(sales) if cou=="`k'"

}

Ex3 forvalues k=1(1)3 { /* Loop over sector codes. The

range can be defined in different ways*/

egen total_`k'=total(sales) if sector==`k'

}

• foreach can also be used to loop over variables and numbers • foreach k of var varlist; foreach k of num numlist

l. Graphics

24

• Useful links: official Stata or UCLA IDRES

• Histograms and bar graphs: • histogram var

• graph bar (stat) var, over(var)

25

More graphics

• Distribution plots: • histogram var, frequency kdensity

• twoway kdensity var

• twoway (kdensity var1 if var2==“”) (kdensity var1 if var2

==“”), by(var)

26

More graphics

• Scatter plots: • scatter yvar xvar

• graph twoway (scatter yvar xvar) (lfit yvar xvar)

27

More graphics

• Scatter plots: • scatter yvar xvar

• graph twoway (scatter yvar xvar) (lfit yvar xvar)

28

More graphics

• Line plots: • line yvar year

• graph twoway (line yvar year if…) (line yvar year if …)

Recommended