2014 Nordic and Baltic Stata Users Group Meeting
Working sideways in Stata
Jakob Hjort, Data Manager, MPH
Department of Cardiology, Aarhus University Hospital
DK-8200 Aarhus, Denmark
The rectangular dataset
"It is not the data we want, it's the essence of data"
[Diagram: the rectangular dataset, labelled "Data management", "Statistics" and "results"]
The rectangular dataset - transpose?
    use "family.dta", clear
    * Dataset with: fam_name, inc_mother & inc_father

    mata
        st_view(x=0, ., ("inc_mother", "inc_father"))
        income = colsum(x')'
        st_addvar("long", "inc_household")
        st_store(., "inc_household", income)
    end

    list fam_name inc_mother inc_father inc_household
The rectangular dataset – subset in matrix using mata?
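As a rough illustration of the Mata route, the sketch below runs the same idea on a small made-up dataset. The three families and their incomes are hypothetical values invented for the example. It also swaps in Mata's rowsum() for the transposed colsum() used on the slide, on the assumption that a row-wise sum is all that is needed here.

```stata
* Hypothetical toy data standing in for family.dta
clear
input str10 fam_name long inc_mother long inc_father
"Hansen"  300000 280000
"Jensen"  250000 310000
"Nielsen" 410000      0
end

mata:
    // View onto the two income columns (no copy of the data is made)
    st_view(x=., ., ("inc_mother", "inc_father"))
    // Row-wise sum; stands in for colsum(x')' from the slide
    income = rowsum(x)
    // Create the new variable and fill it in one step
    st_store(., st_addvar("long", "inc_household"), income)
end

list fam_name inc_mother inc_father inc_household
```

With no missing values in the toy data, the result equals inc_mother + inc_father in every row, so no transposing is needed to work sideways here.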
The direct approach
    generate [type] newvar=exp [if] [in]
Data management example (variables Weight and Height, new variable BMI):
    generate BMI=Weight/Height^2
The direct approach
    egen [type] newvar=fcn(arguments) [if] [in] [, options]
Functions: rowtotal, rowmin, rowmax, rowfirst, rowlast, rowmean, rowmedian, rowmiss, rownonmiss, rowpctile, rowsd, concat, anycount, anymatch, anyvalue, count, diff, fill, group, iqr, kurt, max, mdev, mean, median, min, mode, mtr, pc, pctile, rank, sd, seq, skew, std, tag, total
Data management example (variables incJan, incFeb, incMar, incApr, incMay, incJun, incJul, …; new variable income):
    egen income=rowtotal(inc*)
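A minimal sketch of how rowtotal() differs from adding the variables with generate, using three hypothetical monthly income variables and made-up values. The point it illustrates is that rowtotal() treats missing values as zero, while a plain sum becomes missing as soon as one term is missing.

```stata
* Hypothetical toy data: three months, some values missing
clear
input incJan incFeb incMar
100 200 300
150   . 250
  .   .   .
end

* Row-wise total: missing values count as zero
egen income = rowtotal(incJan-incMar)

* Plain arithmetic: any missing term makes the result missing
generate income_sum = incJan + incFeb + incMar

list
```

Whether a row with only missing values should end up as 0 or as missing is a design decision. The missing option of rowtotal() returns missing for such rows.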
Looking under the skirts – just for inspiration
    viewsource _growmin.ado
shows the source of the rowmin() function of egen:

    program define _growmin
        version 6, missing
        gettoken type 0 : 0
        gettoken g 0 : 0
        gettoken eqs 0 : 0

        syntax varlist [if] [in] [, BY(string)]
        if `"`by'"' != "" {
            _egennoby rowmin() `"`by'"'
        }

        tempvar touse
        mark `touse' `if' `in'
        quietly {
            gen `type' `g' = .                          // 1. Initialize target variable
            tokenize `varlist'                          // 2. Prepare the variable-list
            while "`1'"!="" {                           // 3. Looping
                replace `g' = cond(`1' < `g',`1',`g')   // 4. In-the-loop commands
                mac shift
            }
        }
    end

1. Initialize target variable
2. Prepare the variable-list
3. Looping
4. In-the-loop commands
Prepare the variable-list

Full specification of each and every variable: fine with 12, but what about hundreds? The list is stored in `vars'.
    . local vars incJan incFeb incMar incApr incMay incJun ///
                 incJul incAug incSep incOct incNov incDec

Variables can be specified with wildcards. The expanded list is stored in `vars'. (unab means unabbreviate; however, the command itself can't be un-abbreviated.)
    . unab vars: inc*
    . unab vars: incJan-incDec

Variables can be specified with wildcards. The list is stored in `r(varlist)'. Nice feature: the expanded list is shown for inspection.
    . ds inc*
    . ds incJan-incDec
    incJan  incFeb  incMar  incApr  incMay  incJun  incJul  incAug  incSep  incOct  incNov  incDec
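The sketch below shows what the wildcard and range forms expand to on a small made-up dataset. The variable named income is a hypothetical addition, included on purpose to show that inc* also picks up any other variable whose name starts with "inc", whereas a range such as incJan-incMar does not.

```stata
* Hypothetical toy data: three monthly variables plus a total
clear
set obs 3
foreach m in Jan Feb Mar {
    generate inc`m' = 1
}
generate income = 3   // name also starts with "inc"

* Wildcard: expands to every variable matching inc*
unab vars : inc*
display "`vars'"

* Range: expands to the variables from incJan to incMar in dataset order
unab months : incJan-incMar
display "`months'"

* ds lists the matches on screen and leaves them in r(varlist)
ds inc*
local fromds `r(varlist)'
display "`fromds'"
```

Copying r(varlist) into a local straight away is a precaution, since a later command that returns its own r() results would replace it.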
Looping
"foreach" is the quickest and the most transparent loop command

    foreach lvar in incJan incFeb {
        // do stuff with "`lvar'"
    }

    unab vars: inc*
    foreach lvar in `vars' {
        // do stuff with "`lvar'"
    }

    ds inc*
    foreach lvar in `r(varlist)' {
        // do stuff with "`lvar'"
    }
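As one possible way of filling in the "do stuff" placeholder, the sketch below counts missing values per variable on hypothetical toy data. It uses the "foreach ... of varlist" form, which expands the variable list itself and so combines the list-preparation and looping steps. The local nmiss is a name made up for this example.

```stata
* Hypothetical toy data
clear
input incJan incFeb incMar
100 200 300
150   . 250
  .   .   .
end

local nmiss = 0
foreach lvar of varlist incJan-incMar {
    * count leaves the number of matching observations in r(N)
    quietly count if missing(`lvar')
    display "`lvar': " r(N) " missing"
    local nmiss = `nmiss' + r(N)
}
display "Missing values in total: `nmiss'"
```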
Typing the macro quotes (hold Alt and press the digits on the numeric keypad):
    Alt + 096 = `   (left single-quote)
    Alt + 039 = '   (right single-quote)
In the loop

    generate minimum=.
    unab vars: inc*
    foreach lvar in `vars' {
        replace minimum = cond(`lvar' < minimum,`lvar',minimum)
    }

    generate minimum=.
    unab vars: inc*
    foreach lvar in `vars' {
        replace minimum = `lvar' if `lvar'<minimum
    }

! The version below does not give the row-wise minimum: the if command (unlike the if qualifier) evaluates its expression once, using the first observation only.

    generate minimum=.
    unab vars: inc*
    foreach lvar in `vars' {
        if `lvar'<minimum {
            replace minimum = `lvar'
        }
    }
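A small sketch contrasting the if qualifier with the if command inside the loop, run on hypothetical toy data and checked against egen's rowmin(). The variable names wrong and check are made up for the illustration.

```stata
* Hypothetical toy data
clear
input incJan incFeb incMar
100 200 300
150   .  50
  .   .   .
end

unab vars : incJan-incMar

* if qualifier: the condition is evaluated for every observation
generate minimum = .
foreach lvar in `vars' {
    replace minimum = `lvar' if `lvar' < minimum
}

* if command: the condition is evaluated once, on observation 1 only
generate wrong = .
foreach lvar in `vars' {
    if `lvar' < wrong {
        replace wrong = `lvar'
    }
}

* The qualifier version agrees with egen's rowmin()
egen check = rowmin(incJan-incMar)
assert minimum == check
list
```

In the second observation minimum is 50, while wrong stays at 150. Only the first loop pass satisfies the condition in observation 1, so incJan is copied to every row and never updated afterwards.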
Some of the Danish participants, who might know "the DREAM database", will probably be able to see how these approaches can be useful when working with this fantastic but difficult construction.
Thank you very much