
SOCY7709: Quantitative Data Management

Instructor: Natasha Sarkisian

Automating Your Work: Macros and Loops

Macros

Macros associate some text or number(s) with a short label, and they serve many purposes. Macros can be local, in which case they work only within your current do-file or program (ado-file), or global, existing outside of the current do-file until you either delete them or exit Stata. For now, we will deal with local macros, as those are used in the vast majority of cases.

Local macros are defined as follows:

If they are to contain a string: local localname "text"

If they are to contain a number: local localname = exp

And we can then refer to them as `localname' in our code.

Global macros are defined:

If they are to contain a string: global globalname "text"

(Quotation marks here are optional, but they really help clarity of code.)

If they are to contain a number: global globalname = exp

And we can then refer to them as $globalname in our code.
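The difference between the two reference styles can be sketched as follows. This is a minimal illustration only; the macro names projvars and controls are made up for this sketch and do not come from the course files.

```stata
// Hypothetical macro names, used only for illustration
global projvars "income prestg80"
local controls "age sex educ"

// A global is referenced with $, a local with `'
display "global holds: $projvars"
display "local holds: `controls'"

// Globals persist after the do-file ends, so drop them when done
macro drop projvars
display "after dropping: $projvars"
```

Run the whole block in one go from the do-file editor: a local defined in one run is no longer available in the next one, which is exactly the scoping described above.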

Example 1: Using macros to display results

Do-file: local newlocal "hello"

local numlocal=42

display "The text says `newlocal' and the number is `numlocal'"

Output: . local newlocal "hello"

. local numlocal=42

. display "The text says `newlocal' and the number is `numlocal'"

The text says hello and the number is 42

The same approach lets you check whether a macro contains what you think it does, using display: display "localname: `localname'"

Example 2: Using macros to run analyses

Do file: //defining locals

local outcome "hrs1"

local ses "income prestg80"

local demog "age sex educ"

local options "beta noheader"

//running regression

reg `outcome' `ses' `demog', `options'

Output: . //defining locals

. local outcome "hrs1"

. local ses "income prestg80"

. local demog "age sex educ"

. local options "beta noheader"

. //running regression

. reg `outcome' `ses' `demog', `options'

------------------------------------------------------------------------------

hrs1 | Coef. Std. Err. t P>|t| Beta

-------------+----------------------------------------------------------------

income | 1.165911 .1878649 6.21 0.000 .1507902

prestg80 | .1192887 .0281875 4.23 0.000 .1161999

age | -.0923105 .0268899 -3.43 0.001 -.0815599

sex | -6.246038 .6761722 -9.24 0.000 -.2172943

educ | -.0262734 .1419402 -0.19 0.853 -.0050456

_cons | 36.96576 2.798094 13.21 0.000 .

------------------------------------------------------------------------------

By using macros here, you eliminate the possibility that you omit one variable accidentally from

a group of variables in some of your models, and you make it possible to make changes to a

whole bunch of commands all at once – you may have hundreds of regression models, but you

can change them all by changing the macro definitions in the beginning of your do file.

Setting options with macros can be particularly useful when there are many of them and the same

options apply to a number of commands – e.g., a series of regressions or graphs.
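As a rough sketch of that idea, the block below runs two models that share the same macro definitions. It assumes the auto dataset that ships with Stata rather than the course data, and the groupings of variables are arbitrary.

```stata
// Illustration with the auto dataset shipped with Stata (not the course data)
sysuse auto, clear

// Variable sets and options are defined once
local outcome "price"
local size "weight length"
local other "mpg headroom"
local options "beta noheader"

// Two models that share the same definitions
reg `outcome' `size', `options'
reg `outcome' `size' `other', `options'
```

Adding a variable to size, or an option to options, changes both models at once.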

Alternatively:

Do file: //defining variable sets

local outcome "hrs1"

local ses "income prestg80"

local demog "age sex educ"

local options "beta noheader"

//defining models

local rhs1 "`ses' `demog'"

//running regression

reg `outcome' `rhs1', `options'

Output: . //defining variable sets

. local outcome "hrs1"

. local ses "income prestg80"

. local demog "age sex educ"

. local options "beta noheader"

. //defining models

. local rhs1 "`ses' `demog'"

. //running regression

. reg `outcome' `rhs1', `options'

------------------------------------------------------------------------------

hrs1 | Coef. Std. Err. t P>|t| Beta

-------------+----------------------------------------------------------------

income | 1.165911 .1878649 6.21 0.000 .1507902

prestg80 | .1192887 .0281875 4.23 0.000 .1161999

age | -.0923105 .0268899 -3.43 0.001 -.0815599

sex | -6.246038 .6761722 -9.24 0.000 -.2172943

educ | -.0262734 .1419402 -0.19 0.853 -.0050456

_cons | 36.96576 2.798094 13.21 0.000 .

------------------------------------------------------------------------------

Example 3: Circular macro definitions

If the list of variables is too long, we can assign it in multiple steps by referring to the macro

itself:

Do-file: local demog "age sex educ"

local demog "`demog' i.marital hompop"

sum `demog'

Output: . local demog "age sex educ"

. local demog "`demog' i.marital hompop"

. sum `demog'

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

age | 2751 46.28281 17.37049 18 89

sex | 2765 1.555877 .4969578 1 2

educ | 2753 13.36397 2.973924 0 20

|

marital |

widowed | 2765 .0893309 .2852724 0 1

-------------+--------------------------------------------------------

divorced | 2765 .1609403 .3675424 0 1

separated | 2765 .0347197 .1831021 0 1

never mar.. | 2765 .2560579 .4365331 0 1

|

hompop | 2765 2.325497 1.336758 1 10

Example 4: Defining macros using saved results

After we run most commands, some results are stored as temporary matrices and scalars; you can

see what is stored using, for example, return list or ereturn list. E.g. after the regression we

ran: . ereturn list

scalars:

e(N) = 1652

e(df_m) = 5

e(df_r) = 1646

e(F) = 33.32831624048627

e(r2) = .0919330014993301

e(rmse) = 13.71922015861433

e(mss) = 31364.77878255548

e(rss) = 309805.1848978319

e(r2_a) = .0891745962790972

e(ll) = -6667.334929783322

e(ll_0) = -6746.99198777109

e(rank) = 6

macros:

e(cmdline) : "regress hrs1 income prestg80 age sex educ, beta noheader"

e(title) : "Linear regression"

e(marginsok) : "XB default"

e(vce) : "ols"

e(depvar) : "hrs1"

e(cmd) : "regress"

e(properties) : "b V"

e(predict) : "regres_p"

e(model) : "ols"

e(estat_cmd) : "regress_estat"

matrices:

e(b) : 1 x 6

e(V) : 6 x 6

functions:

e(sample)

Commands come in different classes, and the class determines whether the results are stored in return or ereturn; there are a few more classes, but those are rarer.

r-class: general commands such as summarize. Results are returned in r(), can be located using return list, and generally must be used before executing more commands.

e-class: estimation commands such as regress, logistic, etc., that fit statistical models. Such estimation results stay around until the next model is fit. Results are returned in e() and can be located using ereturn list.

s-class: programming commands that assist in parsing. These commands are relatively rare. Results are returned in s() and can be located using sreturn list.

n-class: commands that do not store results at all, e.g., generate and replace.

c-class: not commands per se, but the values of system parameters and settings, along with certain constants like pi, are stored as c() and can be located using creturn list.

Saved results usually need to be used before other commands are executed, so storing them in macros can be handy.
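A small sketch of why this matters for r-class results, again assuming the auto dataset shipped with Stata; the local names pmean and pn are made up for the illustration.

```stata
sysuse auto, clear

summarize price
// Copy r() results right away, before another r-class command overwrites them
local pmean = r(mean)
local pn = r(N)

// After this command, r(mean) refers to mpg, not price
summarize mpg

display "price: mean `pmean' based on `pn' cars"
display "mpg: mean " r(mean)
```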

Do-file: //defining variable sets

local outcome "hrs1"

local ses "income prestg80"

local demog "age sex educ"

local options "beta noheader"

//defining models

local rhs1 "`ses' `demog'"

//running regression

reg `outcome' `rhs1', `options'

//saving results

local size=e(N)

local r2=e(r2)

//displaying results

display "sample size is `size' and R-squared is `r2'"

Last portion of output, here with a format typed inside the quotation marks: . //displaying results

. display "sample size is `size' and R-squared is %3.2f`e(r2)'"

sample size is 1652 and R-squared is %3.2f.0919330014993301

A format placed inside the quotation marks is printed as literal text. If we want to format the displayed results better, we change the last line of this do-file so that the format specification sits outside the quoted string: display "sample size is `size' and R-squared is " %4.3f `e(r2)'

Last portion of output: . //displaying results

. display "sample size is `size' and R-squared is " %4.3f `e(r2)'

sample size is 1652 and R-squared is 0.092
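An alternative that may be convenient when the formatted value is needed more than once is to store it in a local already formatted, using the display extended macro function. The sketch below assumes the auto dataset shipped with Stata; the local name r2f is made up.

```stata
sysuse auto, clear
reg price weight mpg, noheader

// Extended macro function: store the already formatted number as text
local r2f : display %4.3f e(r2)

display "sample size is `e(N)' and R-squared is `r2f'"
```

Because r2f holds text rather than a full-precision number, it is best used for display and labels, not for further calculations.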

Loops

Loops help automate repetitive operations, from the simplest to very complex.

“Old” syntax examples

To define simple loops, we can use the “old” for syntax. Stata has phased it out and no longer provides help for it, but I do find it useful when exploring. If you want to see documentation for it, someone saved it here: http://www.eui.eu/Personal/Franklin/old-style%20for.pdf

Looping through numbers: for num

. for num 1/5: reg hrs1 ibX.marital

-> reg hrs1 ib1.marital

Source | SS df MS Number of obs = 1729

-------------+------------------------------ F( 4, 1724) = 3.64

Model | 3092.33822 4 773.084556 Prob > F = 0.0059

Residual | 366411.487 1724 212.535665 R-squared = 0.0084

-------------+------------------------------ Adj R-squared = 0.0061

Total | 369503.825 1728 213.833232 Root MSE = 14.579

--------------------------------------------------------------------------------

hrs1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------------+----------------------------------------------------------------

marital |

widowed | -7.723317 2.103863 -3.67 0.000 -11.84971 -3.596924

divorced | -.4496047 .9896131 -0.45 0.650 -2.390573 1.491364

separated | -1.897114 1.865238 -1.02 0.309 -5.555481 1.761253

never married | -.0034797 .8290799 -0.00 0.997 -1.629588 1.622629

|

_cons | 42.15469 .5087968 82.85 0.000 41.15677 43.15261

--------------------------------------------------------------------------------

-> reg hrs1 ib2.marital

Source | SS df MS Number of obs = 1729

-------------+------------------------------ F( 4, 1724) = 3.64

Model | 3092.33822 4 773.084556 Prob > F = 0.0059

Residual | 366411.487 1724 212.535665 R-squared = 0.0084

-------------+------------------------------ Adj R-squared = 0.0061

Total | 369503.825 1728 213.833232 Root MSE = 14.579

--------------------------------------------------------------------------------

hrs1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------------+----------------------------------------------------------------

marital |

married | 7.723317 2.103863 3.67 0.000 3.596924 11.84971

divorced | 7.273712 2.210843 3.29 0.001 2.937496 11.60993

separated | 5.826203 2.718015 2.14 0.032 .4952498 11.15716

never married | 7.719837 2.143797 3.60 0.000 3.51512 11.92455

|

_cons | 34.43137 2.041413 16.87 0.000 30.42747 38.43528

--------------------------------------------------------------------------------

-> reg hrs1 ib3.marital

Source | SS df MS Number of obs = 1729

-------------+------------------------------ F( 4, 1724) = 3.64

Model | 3092.33822 4 773.084556 Prob > F = 0.0059

Residual | 366411.487 1724 212.535665 R-squared = 0.0084

-------------+------------------------------ Adj R-squared = 0.0061

Total | 369503.825 1728 213.833232 Root MSE = 14.579

--------------------------------------------------------------------------------

hrs1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------------+----------------------------------------------------------------

marital |

married | .4496047 .9896131 0.45 0.650 -1.491364 2.390573

widowed | -7.273712 2.210843 -3.29 0.001 -11.60993 -2.937496

separated | -1.447509 1.985119 -0.73 0.466 -5.341004 2.445986

never married | .4461249 1.071895 0.42 0.677 -1.656227 2.548477

|

_cons | 41.70508 .8487991 49.13 0.000 40.0403 43.36987

--------------------------------------------------------------------------------

-> reg hrs1 ib4.marital

Source | SS df MS Number of obs = 1729

-------------+------------------------------ F( 4, 1724) = 3.64

Model | 3092.33822 4 773.084556 Prob > F = 0.0059

Residual | 366411.487 1724 212.535665 R-squared = 0.0084

-------------+------------------------------ Adj R-squared = 0.0061

Total | 369503.825 1728 213.833232 Root MSE = 14.579

--------------------------------------------------------------------------------

hrs1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------------+----------------------------------------------------------------

marital |

married | 1.897114 1.865238 1.02 0.309 -1.761253 5.555481

widowed | -5.826203 2.718015 -2.14 0.032 -11.15716 -.4952498

divorced | 1.447509 1.985119 0.73 0.466 -2.445986 5.341004

never married | 1.893634 1.910167 0.99 0.322 -1.852854 5.640122

|

_cons | 40.25758 1.794502 22.43 0.000 36.73795 43.77721

--------------------------------------------------------------------------------

-> reg hrs1 ib5.marital

Source | SS df MS Number of obs = 1729

-------------+------------------------------ F( 4, 1724) = 3.64

Model | 3092.33822 4 773.084556 Prob > F = 0.0059

Residual | 366411.487 1724 212.535665 R-squared = 0.0084

-------------+------------------------------ Adj R-squared = 0.0061

Total | 369503.825 1728 213.833232 Root MSE = 14.579

------------------------------------------------------------------------------

hrs1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]

-------------+----------------------------------------------------------------

marital |

married | .0034797 .8290799 0.00 0.997 -1.622629 1.629588

widowed | -7.719837 2.143797 -3.60 0.000 -11.92455 -3.51512

divorced | -.4461249 1.071895 -0.42 0.677 -2.548477 1.656227

separated | -1.893634 1.910167 -0.99 0.322 -5.640122 1.852854

|

_cons | 42.15121 .6545986 64.39 0.000 40.86732 43.4351

------------------------------------------------------------------------------

Looping through variables: for var

. for var natspac-natsci: gen X_tm=(X==3) if X<.

-> gen natspac_tm=(natspac==3) if natspac<.

(1496 missing values generated)

-> gen natenvir_tm=(natenvir==3) if natenvir<.

(1443 missing values generated)

-> gen natheal_tm=(natheal==3) if natheal<.

(1429 missing values generated)

-> gen natcity_tm=(natcity==3) if natcity<.

(1526 missing values generated)

-> gen natcrime_tm=(natcrime==3) if natcrime<.

(1444 missing values generated)

-> gen natdrug_tm=(natdrug==3) if natdrug<.

(1455 missing values generated)

-> gen nateduc_tm=(nateduc==3) if nateduc<.

(1422 missing values generated)

-> gen natrace_tm=(natrace==3) if natrace<.

(1506 missing values generated)

-> gen natarms_tm=(natarms==3) if natarms<.

(1441 missing values generated)

-> gen nataid_tm=(nataid==3) if nataid<.

(1452 missing values generated)

-> gen natfare_tm=(natfare==3) if natfare<.

(1451 missing values generated)

-> gen natroad_tm=(natroad==3) if natroad<.

(101 missing values generated)

-> gen natsoc_tm=(natsoc==3) if natsoc<.

(104 missing values generated)

-> gen natmass_tm=(natmass==3) if natmass<.

(167 missing values generated)

-> gen natpark_tm=(natpark==3) if natpark<.

(80 missing values generated)

-> gen natchld_tm=(natchld==3) if natchld<.

(172 missing values generated)

-> gen natsci_tm=(natsci==3) if natsci<.

(1499 missing values generated)

Looping through new variables: for new

. set seed 1234

. for new rand1-rand10: gen X=uniform()

-> gen rand1=uniform()

-> gen rand2=uniform()

-> gen rand3=uniform()

-> gen rand4=uniform()

-> gen rand5=uniform()

-> gen rand6=uniform()

-> gen rand7=uniform()

-> gen rand8=uniform()

-> gen rand9=uniform()

-> gen rand10=uniform()

. sum rand*

Variable | Obs Mean Std. Dev. Min Max

-------------+--------------------------------------------------------

rand1 | 2765 .5062428 .2896212 .0007097 .999229

rand2 | 2765 .4989947 .2878202 .0000677 .9998645

rand3 | 2765 .4980761 .2882185 .0000445 .9992853

rand4 | 2765 .4949973 .2872546 .0002033 .9999771

rand5 | 2765 .5025668 .2882805 .0006171 .9996551

-------------+--------------------------------------------------------

rand6 | 2765 .5052714 .290094 .0003625 .9999846

rand7 | 2765 .4947994 .2873185 .0004361 .9994423

rand8 | 2765 .497452 .2894505 .0005534 .9998585

rand9 | 2765 .4962796 .2903314 .0001721 .999858

rand10 | 2765 .4992551 .2909377 .0007322 .9999549

Multiple embedded loops:

. for X in var hrs1 prestg80 educ: for Y in num 1/5: reg X ibY.marital

-> for Y in num 1/5: reg hrs1 ibY.marital

-> reg hrs1 ib1.marital

Source | SS df MS Number of obs = 1729

-------------+------------------------------ F( 4, 1724) = 3.64

Model | 3092.33822 4 773.084556 Prob > F = 0.0059

Residual | 366411.487 1724 212.535665 R-squared = 0.0084

-------------+------------------------------ Adj R-squared = 0.0061

Total | 369503.825 1728 213.833232 Root MSE = 14.579

--------------------------------------------------------------------------------

hrs1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------------+----------------------------------------------------------------

marital |

widowed | -7.723317 2.103863 -3.67 0.000 -11.84971 -3.596924

divorced | -.4496047 .9896131 -0.45 0.650 -2.390573 1.491364

separated | -1.897114 1.865238 -1.02 0.309 -5.555481 1.761253

never married | -.0034797 .8290799 -0.00 0.997 -1.629588 1.622629

|

_cons | 42.15469 .5087968 82.85 0.000 41.15677 43.15261

--------------------------------------------------------------------------------

-> reg hrs1 ib2.marital

Source | SS df MS Number of obs = 1729

-------------+------------------------------ F( 4, 1724) = 3.64

Model | 3092.33822 4 773.084556 Prob > F = 0.0059

Residual | 366411.487 1724 212.535665 R-squared = 0.0084

-------------+------------------------------ Adj R-squared = 0.0061

Total | 369503.825 1728 213.833232 Root MSE = 14.579

--------------------------------------------------------------------------------

hrs1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]

---------------+----------------------------------------------------------------

marital |

married | 7.723317 2.103863 3.67 0.000 3.596924 11.84971

divorced | 7.273712 2.210843 3.29 0.001 2.937496 11.60993

separated | 5.826203 2.718015 2.14 0.032 .4952498 11.15716

never married | 7.719837 2.143797 3.60 0.000 3.51512 11.92455

|

_cons | 34.43137 2.041413 16.87 0.000 30.42747 38.43528

--------------------------------------------------------------------------------

[output omitted]

Looping through a list that is neither variable names nor numbers:

. for X in any relate gender old mar: for Y in num 1/8: tab XY

-> for Y in num 1/8: tab relateY

-> tab relate1

relationship of 1st |

person to household head | Freq. Percent Cum.

--------------------------+-----------------------------------

head of household | 2,765 100.00 100.00

--------------------------+-----------------------------------

Total | 2,765 100.00

-> tab relate2

relationship of 2nd |

person to household head | Freq. Percent Cum.

--------------------------+-----------------------------------

spouse | 1,214 64.71 64.71

child | 328 17.48 82.20

son or daughter-in-law | 13 0.69 82.89

grand or great-grandchild | 7 0.37 83.26

parent or parent-in-law | 11 0.59 83.85

other relative | 44 2.35 86.19

non-relative | 259 13.81 100.00

--------------------------+-----------------------------------

Total | 1,876 100.00

[output omitted]

New syntax for loops

New syntax for loops uses commands foreach and forvalues and relies on macros to build loops.

Looping through numbers:

Do-file versions of the same loop we ran above with for num:

foreach nummar in 1 2 3 4 5 {
    reg hrs1 ib`nummar'.marital
}

or

foreach nummar of numlist 1/5 {
    reg hrs1 ib`nummar'.marital
}

or

forvalues nummar=1/5 {
    reg hrs1 ib`nummar'.marital
}

If we want to use a different step when going through numbers, e.g., 2, we can specify the range with a step in parentheses:

forvalues nummar=1(2)5 {
    reg hrs1 ib`nummar'.marital
}
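To see which values a stepped range actually visits before attaching a real command to it, one option is to display the loop macro first. This is only a sketch; the macro name i is arbitrary and no dataset is needed.

```stata
// Every second number from 1 to 5
forvalues i = 1(2)5 {
    display "forvalues visits `i'"
}

// The same range written as a numlist for foreach
foreach i of numlist 1(2)5 {
    display "foreach visits `i'"
}

// A negative step counts down
forvalues i = 5(-1)1 {
    display "counting down: `i'"
}
```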

Looping through variables

When looping through variables, you can either specify foreach localname in var1 var2 var3… and list all variables individually (foreach then treats them as generic list elements, without checking that they are variables) or, if you want to use variable ranges as allowed in a typical varlist, specify “of varlist”.

Do-file: foreach natvar of varlist natspac-natsci {

gen `natvar'tl=(`natvar'==1) if `natvar'<.

}

Output: . foreach natvar of varlist natspac-natsci {

2. gen `natvar'tl=(`natvar'==1) if `natvar'<.

3. }

(1496 missing values generated)

(1443 missing values generated)

(1429 missing values generated)

(1526 missing values generated)

(1444 missing values generated)

(1455 missing values generated)

(1422 missing values generated)

(1506 missing values generated)

(1441 missing values generated)

(1452 missing values generated)

(1451 missing values generated)

(101 missing values generated)

(104 missing values generated)

(167 missing values generated)

(80 missing values generated)

(172 missing values generated)

(1499 missing values generated)
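The contrast between the two forms can be sketched with the auto dataset shipped with Stata (an assumption of this illustration, not the course data). The loops only display the names, so nothing in the data is changed.

```stata
sysuse auto, clear

// "in": elements are taken literally, so each name is typed out
foreach v in price mpg weight {
    display "in list: `v'"
}

// "of varlist": the range is expanded to every variable from price to weight
foreach v of varlist price-weight {
    display "of varlist: `v'"
}
```

With of varlist, Stata also checks that the variables exist before the loop starts, so a misspelled name stops the loop up front rather than partway through.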

Looping through new variables

Do-file creating random number variables similar to what we did above:

set seed 1234
foreach new of newlist rnd1-rnd10 {
    gen `new'=uniform()
}
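A self-contained variant that can be tried in an empty session is sketched below. It uses runiform(), the current name of the function that older code calls uniform(); the number of observations and the variable names u1-u3 are arbitrary choices for the illustration.

```stata
// Start from an empty dataset with 100 observations
clear
set obs 100
set seed 1234

// newlist checks that u1, u2, u3 are valid names not already in use
foreach v of newlist u1-u3 {
    gen `v' = runiform()
}

sum u1-u3
```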

Multiple embedded loops

Double loop for regressing different outcomes on marital status with different omitted categories:

foreach var_y in hrs1 prestg80 educ {
    forvalues cat=1/5 {
        reg `var_y' ib`cat'.marital
    }
}

We can further generalize this by setting dependent and independent variables using macros and

by determining the number of categories automatically:

//defining macros
local dv "hrs1 prestg80 educ"
local iv "marital"
local controls "sex age"

//counting the categories of the independent variable
tab `iv'
local numcat=r(r)

foreach var_y in `dv' {
    forvalues cat=1/`numcat' {
        reg `var_y' ib`cat'.`iv' `controls'
    }
}
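Counting categories this way works when they are coded 1, 2, 3 and so on without gaps. When the codes may have gaps, one possible alternative is to collect the actual values with levelsof and loop over them. The sketch below assumes the auto dataset shipped with Stata, with rep78 standing in for the categorical variable; the macro name cats is made up.

```stata
sysuse auto, clear
local iv "rep78"

// Store the distinct observed values of the variable in a local
levelsof `iv', local(cats)

// Use each observed value in turn as the omitted category
foreach cat of local cats {
    reg price ib`cat'.`iv', noheader
}
```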

Double loop for tabulating characteristics of up to 8 household members; we will also add a couple more commands within the loops, to get a codebook and clone each variable:

foreach stem in relate gender old mar {
    forvalues pers=1/8 {
        tab `stem'`pers'
        codebook `stem'`pers'
        clonevar `stem'`pers'_new=`stem'`pers'
    }
}

Old and new syntax correspondence

Elements to loop through    Old syntax    New syntax

Numbers                     for num …     forvalues macroname=…
                                          foreach macroname of numlist …

Variables                   for var …     foreach macroname of varlist …

New variables               for new …     foreach macroname of newlist …

Any text                    for any …     foreach macroname in …