Extracting Metadata from Stata Datasets
Suzanna Vidmar and Luke Stevens
Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute
Data sharing and storage
• To enable data sharing, the data should be stored in a format that does not require a particular version of a particular statistical package
• At the conclusion of a study, data should be stored in a retrievable format, and not one that may become obsolete
• The safest retrievable format is to have the data stored in CSV or text files
• Stata's export delimited command writes data from a Stata dataset to a text file
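As a minimal sketch of the export step, the lines below write a dataset to CSV. The auto dataset that ships with Stata and the output file names are assumptions chosen only for illustration.

```stata
* Load an example dataset (auto.dta ships with Stata; used here for illustration only)
sysuse auto, clear

* Write the data to a comma-separated text file; value labels are written as text by default
export delimited using "auto.csv", replace

* With nolabel, the underlying numeric codes are written instead of the label text
export delimited using "auto_codes.csv", nolabel replace
```

Either way the CSV holds only one representation of a labelled variable, which is one reason a separate data dictionary is needed.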
But what do the data mean?
Without a description of the data, the data file is of limited use
Metadata
• Metadata is data that describes other data
• My focus is on variable-level metadata, also known as a data dictionary
• Examples of variable-level metadata are data types, variable labels and value labels
Metadata is a love note to the future
Extracting the data dictionary from Stata
filename.CSV
But wait, there’s more!
Data and metadata can be imported into data capture software such as REDCap
Features of REDCap
• Secure, web-based application for research databases and surveys
• Very easy to use
• Audit trail
• User permission controls
• Data quality measures
• Data export to statistical software
• Generate summary reports and letters
https://projectredcap.org/
Building a REDCap database
• As with all data capture software, data entry forms can be developed within REDCap
• A REDCap database can also be built by uploading an external data dictionary
metadatacsv.ado
Example using metadatacsv.ado
example.dta
dict_example.csv
Directory and file name
describe, replace
local fullpath: char _dta[d_filename]
• di "`fullpath'"
• C:\Users\suzanna.vidmar\Documents\Suzanna\Metadata\example.dta
mata: st_local("fullname", pathbasename("`fullpath'"))
• di "`fullname'"
• example.dta
local length=strpos("`fullname'",".")-1
• di "`length'"
• 7
local filestub=substr("`fullname'",1,`length')
• di "`filestub'"
• example
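One detail of the steps above is that strpos() returns the position of the first period, so a file name containing more than one period is cut short. The sketch below compares that approach with Mata's pathrmsuffix(), which is intended to strip only the final suffix. The path and the macro name filestub2 are hypothetical, and this alternative is not part of the original slides.

```stata
* Hypothetical path with two periods in the file name (illustration only)
local fullpath "C:/data/trial.v2.dta"

* Approach from the slides: keep everything before the FIRST period
mata: st_local("fullname", pathbasename("`fullpath'"))
local length = strpos("`fullname'", ".") - 1
local filestub = substr("`fullname'", 1, `length')
display "`filestub'"

* Alternative sketch: remove only the final suffix with Mata's pathrmsuffix()
mata: st_local("filestub2", pathrmsuffix(pathbasename("`fullpath'")))
display "`filestub2'"
```

For a name such as example.dta the two approaches give the same stub, so the difference matters only when file names contain extra periods.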
Saving data dictionary
export delimited "dict_`filestub'.csv", replace
Saves the data file: dict_example.csv
describe, replace
• describe usually produces a written report
• When the replace option is specified, instead of a report the data in memory are replaced with a dataset containing the information that would have been presented in the report. The new dataset has an observation for each variable in the original data.
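To see what this looks like in practice, here is a small sketch using the auto dataset that ships with Stata (an assumption made for illustration; any dataset in memory would do). The clear option is added so the replacement goes ahead even if the data in memory have unsaved changes.

```stata
* Example dataset shipped with Stata (illustration only)
sysuse auto, clear

* Replace the data in memory with one observation per original variable
describe, replace clear

* Inspect the variable-level metadata for the first few variables
list name type format vallab varlab in 1/5, abbreviate(12)
```

Each row of the resulting dataset describes one variable: its name, storage type, display format, attached value label and variable label.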
describe
describe, replace
uselabel
Creates a dataset containing value-label information
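A minimal sketch of uselabel, again assuming the auto dataset shipped with Stata purely for illustration:

```stata
* Example dataset shipped with Stata; it defines one value label, origin
sysuse auto, clear

* Replace the data in memory with one observation per value-label mapping
uselabel, clear

* lname = label name, value = numeric code, label = text attached to that code
list lname value label
```

The variable names lname, value and label are the ones the later slides loop over.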
Extracting value label names
gen recnum=_n
• recnum contains the number of the current observation
levelsof lname, local(levels)
`"coblab"' `"genderlab"' `"noyes"'
• These are stored in the local macro `levels'
Creating the contents of each value label
foreach x of local levels {
    local fullab
    qui su recnum if lname=="`x'"
    local j=r(min)
    local k=r(max)
    forval i=`j'/`k' {
        local val=value[`i']
        local lab=label[`i']
        local fullab `fullab' `val', `lab' |
    }
    local lenlab=strlen("`fullab'")-2
    local fullab=substr("`fullab'",1,`lenlab')
}
Example with coblab
forval i=`j'/`k' {
    local val=value[`i']
    local lab=label[`i']
    local fullab `fullab' `val', `lab' |
}
Contents of `fullab' after each pass through the loop:
`i'=1   -1, Missing |
`i'=2   -1, Missing | 1, Australia |
`i'=3   -1, Missing | 1, Australia | 2, United Kingdom |
`i'=4   -1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam |
`i'=5   -1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam | 4, China |
`i'=6   -1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam | 4, China | 5, Singapore |
`i'=7   -1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam | 4, China | 5, Singapore | 6, New Zealand |
Example with coblab
foreach x of local levels {
    …
    forval i=`j'/`k' {
        local val=value[`i']
        local lab=label[`i']
        local fullab `fullab' `val', `lab' |
    }
    local lenlab=strlen("`fullab'")-2
    local fullab=substr("`fullab'",1,`lenlab')
}
Before trimming: -1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam | 4, China | 5, Singapore | 6, New Zealand |
After trimming: -1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam | 4, China | 5, Singapore | 6, New Zealand
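The loop can be tried end to end on a small example. The sketch below assumes the auto dataset shipped with Stata, which has a single value label (origin), in place of the example.dta used in the slides. The final display line is added here only to show the result.

```stata
* Example dataset shipped with Stata (illustration only)
sysuse auto, clear

* One observation per value-label mapping, then number the observations
uselabel, clear
gen recnum = _n
levelsof lname, local(levels)

foreach x of local levels {
    * Reset the accumulator for each value-label name
    local fullab
    * Find the first and last observation belonging to this label
    quietly summarize recnum if lname == "`x'"
    local j = r(min)
    local k = r(max)
    forvalues i = `j'/`k' {
        local val = value[`i']
        local lab = label[`i']
        * Append "code, text |" to the running string
        local fullab `fullab' `val', `lab' |
    }
    * Drop the trailing " |" (two characters)
    local lenlab = strlen("`fullab'") - 2
    local fullab = substr("`fullab'", 1, `lenlab')
    display "`x': `fullab'"
}
```

Like the slide code, this relies on the rows for each label name being contiguous in the uselabel output, so that r(min) and r(max) bracket exactly one label.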
Allowing for extremely long strings
tempname mem
file write `mem' "`x'" _tab "`fullab'" _newline
• file allows for extremely long string values, up to 2 billion characters
• With postfile the limit is 2045 characters
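The slide shows only the file write line. A file handle also has to be opened before writing and closed afterwards, which might look like the sketch below. The file name labels.txt, the header row and the two macro values are hypothetical stand-ins for one pass of the loop.

```stata
* Hypothetical values standing in for one pass of the loop
local x "noyes"
local fullab "0, No | 1, Yes"

* Temporary name for the file handle
tempname mem

* Open a text file for writing (labels.txt is an illustrative name)
file open `mem' using "labels.txt", write text replace

* Header row, then one tab-separated row per value label
file write `mem' "lname" _tab "choices" _newline
file write `mem' "`x'" _tab "`fullab'" _newline

file close `mem'

* Show what was written
type "labels.txt"
```

If a label's text could itself contain double quotes, compound quotes around the macro would be a safer choice in the file write line.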
One week after submitting my abstract for this meeting…
Beaten to the punch
Seth Lirette et al
Alfred Russel Wallace
metadatacsv.ado
The redcapture command
redcapture syntax
redcapture varlist, file(string) form(string)
    [text(varlist) dropdown(varlist) radio(varlist) header(string)
    validate(varlist) validtype(validtypes) validmin(minlist) validmax(maxlist)
    matrix1(varlist) matrix2(varlist) matrix3(varlist) matrix4(varlist) matrix5(varlist)
    matrix6(varlist) matrix7(varlist) matrix8(varlist) matrix9(varlist) matrix10(varlist)]
First, some background on
REDCap field types
REDCap validations for text fields
Capturing categorical data in REDCap
Example Stata dataset
Example script
redcapture *, file(example) form(example_form) header(Example) ///
    text(id age sex bdate sbp dbp comment) ///
    dropdown(consented race) ///
    radio(happy1 happy2 happy3) ///
    validate(id bdate dbp comment) ///
    validtype(ssn date_ymd integer alpha_only) ///
    validmin(none 1/1/1900 20 none) ///
    validmax(none 12/31/2014 200 none) ///
    matrix1(happy1 happy2 happy3)
file(), form() and header()
• Metadata are saved in example.csv. This is the data dictionary that will be uploaded to REDCap.
• The form/instrument name in REDCap is example_form
• Its header is "Example"
dropdown() and radio()
• These are for categorical variables. They must be numeric with value labels attached.
text()
• These are text fields
• All variables in the validate() option must be declared as text fields
validate() and validtype()
• id is a social security number
• bdate is a date field in YMD format
• dbp is an integer
• comment is a string
validmin() and validmax()
• To omit range checks for any or all of the validation variables, "none" should be entered into the corresponding location
• These are soft checks
matrix1()
• Radio fields with a common set of response options can be grouped in a matrix
• See the "Matrix of fields" slide
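Because dropdown and radio variables must be numeric with value labels attached, a string variable needs converting before it is passed to redcapture. The sketch below shows two standard Stata ways of doing that. The variable names and data are hypothetical, and the slides do not say whether redcapture treats the two results differently.

```stata
* Hypothetical data: consent recorded as a string
clear
input str3 consented_str
"yes"
"no"
"yes"
end

* Option 1: choose the coding explicitly, then define and attach a value label
gen byte consented = (consented_str == "yes")
label define noyes 0 "No" 1 "Yes"
label values consented noyes

* Option 2: let encode create the numeric variable and its value label
* (codes are assigned in alphabetical order of the strings, starting at 1)
encode consented_str, gen(consented2)

* Both variables are now numeric with a value label attached
describe consented consented2
```

Option 1 keeps control over which code maps to which category, which matters if the codes are meant to match an existing data dictionary.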
Matrix of fields
Data dictionary
The redcapture command created this data dictionary
… which can be uploaded into REDCap
In conclusion …
1. Ensure data will be retrievable 10 or 20 years from now
2. Ensure the next generation of researchers will be able to understand currently archived data
How?
By storing both data and metadata in text files
Stata's export delimited command and the redcapture command facilitate this
Data and metadata can be uploaded to data capture software such as REDCap