Extracting Metadata from Stata Datasets
Suzanna Vidmar and Luke Stevens
Clinical Epidemiology and Biostatistics Unit, Murdoch Children's Research Institute

Extracting Metadata from Stata Datasets · 2017-10-11 · Metadata • Metadata is data that describes other data • My focus is on variable-level meta data, also known as a data



Data sharing and storage

• To enable data sharing, the data should be stored in a format that does not require a particular version of a particular statistical package
• At the conclusion of a study, data should be stored in a retrievable format, and not one that may become obsolete
• The safest retrievable format is to have the data stored in CSV or text files
• Stata's export delimited command writes data from a Stata dataset to a text file
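The talk's tool is Stata's `export delimited`. The minimal Python sketch below shows that the resulting file needs nothing more than plain-text handling, using only the standard `csv` module. It is not Stata's command. The variable names, the values and the helper `write_csv` are hypothetical.

```python
import csv
import io

# Hypothetical records; in Stata these would be observations in memory.
rows = [
    {"id": 1, "sex": 1, "cob": 2},
    {"id": 2, "sex": 2, "cob": -1},
]

def write_csv(records, fieldnames, handle):
    """Write records as comma-separated text with a header row (illustrative helper)."""
    writer = csv.DictWriter(handle, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)

buf = io.StringIO()
write_csv(rows, ["id", "sex", "cob"], buf)
print(buf.getvalue(), end="")
```

The file records the codes 1, 2 and -1 but not what they stand for. Filling that gap is the job of the data dictionary.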


But what do the data mean?

Without a description of the data, the data file is of limited use


Metadata

• Metadata is data that describes other data
• My focus is on variable-level metadata, also known as a data dictionary
• Examples of variable-level metadata are data types, variable labels and value labels

Metadata is a love note to the future


Extracting the data dictionary from Stata

filename.CSV


But wait, there’s more!

Data and metadata can be imported into data capture software such as REDCap


Features of REDCap

• Secure, web-based application for research databases and surveys
• Very easy to use
• Audit trail
• User permission controls
• Data quality measures
• Data export to statistical software
• Generate summary reports and letters

https://projectredcap.org/


Building a REDCap database

• As with all data capture software, data entry forms can be developed within REDCap
• A REDCap database can also be built by uploading an external data dictionary
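The Python sketch below makes "uploading an external data dictionary" concrete by writing a two-field dictionary as CSV. The 18 column headers follow REDCap's data dictionary template as commonly distributed. Treat them as an assumption and compare them with the template downloaded from your own REDCap project. The field `cob`, its labels and the helper `dictionary_csv` are hypothetical. Only the form name `example_form` comes from this talk.

```python
import csv
import io

# Assumed REDCap data dictionary headers; check against your server's template.
COLUMNS = [
    "Variable / Field Name", "Form Name", "Section Header", "Field Type",
    "Field Label", "Choices, Calculations, OR Slider Labels", "Field Note",
    "Text Validation Type OR Show Slider Number", "Text Validation Min",
    "Text Validation Max", "Identifier?", "Branching Logic (Show field only if...)",
    "Required Field?", "Custom Alignment", "Question Number (surveys only)",
    "Matrix Group Name", "Matrix Ranking?", "Field Annotation",
]

def dictionary_csv(fields):
    """Render field dicts (keyed by column header) as CSV; missing cells are left empty."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=COLUMNS, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(fields)
    return buf.getvalue()

# Hypothetical fields: a free-text ID and a dropdown built from a value label.
fields = [
    {"Variable / Field Name": "id", "Form Name": "example_form",
     "Field Type": "text", "Field Label": "Participant ID"},
    {"Variable / Field Name": "cob", "Form Name": "example_form",
     "Field Type": "dropdown", "Field Label": "Country of birth",
     "Choices, Calculations, OR Slider Labels": "1, Australia | 2, United Kingdom"},
]
text = dictionary_csv(fields)
print(text, end="")
```

The `csv` module quotes any header or cell that contains a comma, such as the choices column, so the file reads back cleanly.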


metadatacsv.ado


Example using metadatacsv.ado

example.dta

dict_example.csv

Directory and file name

describe, replace
local fullpath : char _dta[d_filename]
display "`fullpath'"    // C:\Users\suzanna.vidmar\Documents\Suzanna\Metadata\example.dta
mata: st_local("fullname", pathbasename("`fullpath'"))
display "`fullname'"    // example.dta
local length = strpos("`fullname'", ".") - 1
display "`length'"      // 7
local filestub = substr("`fullname'", 1, `length')
display "`filestub'"    // example
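The same three steps can be sketched in Python for readers who script outside Stata: take the base name, keep the text before the first ".", and add the `dict_` prefix. `ntpath` is used so that the Windows path from the example splits correctly on any operating system. `dict_filename` is a hypothetical helper, not part of metadatacsv.ado.

```python
import ntpath

def dict_filename(fullpath: str) -> str:
    """Base name, text before the first '.', then 'dict_' prefix and '.csv' suffix."""
    fullname = ntpath.basename(fullpath)   # plays the role of Mata's pathbasename()
    length = fullname.find(".")            # 0-based, so it already equals strpos() - 1
    filestub = fullname[:length] if length >= 0 else fullname
    return f"dict_{filestub}.csv"

path = r"C:\Users\suzanna.vidmar\Documents\Suzanna\Metadata\example.dta"
print(dict_filename(path))
```

Like `strpos()`, `str.find()` stops at the first ".", so `my.data.dta` becomes `my`. `os.path.splitext` would instead give `my.data`. Returning the whole name when there is no "." is a choice made here. The Stata code above does not handle that case explicitly.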


Saving data dictionary

export delimited "dict_`filestub'.csv", replace

Saves the data dictionary as dict_example.csv


describe, replace

• describe usually produces a written report
• When the replace option is specified, the data in memory are replaced with a dataset containing the information that would have been presented in the report. The new dataset has one observation for each variable in the original data.
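Below is a rough Python analogue of what `describe, replace` leaves in memory: one record per variable of the original data. The column names `position`, `name`, `type`, `varlab` and `vallab` are chosen to echo Stata's but are an assumption here. The helper `describe_replace` and the three variables are hypothetical. Only the value-label names `genderlab` and `coblab` come from the example.

```python
# Hypothetical variables: (name, storage type, variable label, value-label name)
variables = [
    ("id", "int", "Participant ID", ""),
    ("sex", "byte", "Sex", "genderlab"),
    ("cob", "byte", "Country of birth", "coblab"),
]

def describe_replace(vars_in):
    """Return one record per variable, numbered by position (a sketch, not Stata's output)."""
    return [
        {"position": pos, "name": name, "type": vtype, "varlab": varlab, "vallab": vallab}
        for pos, (name, vtype, varlab, vallab) in enumerate(vars_in, start=1)
    ]

table = describe_replace(variables)
for record in table:
    print(record)
```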


describe

describe, replace


uselabel

Creates a dataset containing value-label information
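`uselabel` turns value labels into ordinary data, with one observation per labelled value. The Python sketch below does the same reshaping, from a nested dictionary to (lname, value, label) rows. The label contents for `noyes` and `genderlab` are hypothetical, and so is the helper `uselabel_rows`. Sorting is done here only to make the order predictable.

```python
# Hypothetical value-label definitions: label name -> {value: text}
value_labels = {
    "noyes": {0: "No", 1: "Yes"},
    "genderlab": {1: "Male", 2: "Female"},
}

def uselabel_rows(labels):
    """Flatten labels into one (lname, value, label) row per labelled value."""
    rows = []
    for lname in sorted(labels):
        for value in sorted(labels[lname]):
            rows.append((lname, value, labels[lname][value]))
    return rows

rows = uselabel_rows(value_labels)
for row in rows:
    print(row)
```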


Extracting value label names

gen recnum = _n
• recnum contains the number of the current observation
levelsof lname, local(levels)
`"coblab"' `"genderlab"' `"noyes"'
• These are stored in the local macro `levels'


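Creating the contents of each value label

foreach x of local levels {
    // reset the accumulator for this value label
    local fullab
    // first and last record numbers for this label
    qui su recnum if lname == "`x'"
    local j = r(min)
    local k = r(max)
    forval i = `j'/`k' {
        local val = value[`i']
        local lab = label[`i']
        // append "value, label |"
        local fullab `fullab' `val', `lab' |
    }
    // drop the trailing " |" (2 characters)
    local lenlab = strlen("`fullab'") - 2
    local fullab = substr("`fullab'", 1, `lenlab')
}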
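The loop above concatenates "value, label |" pairs and then trims the final " |". The Python sketch below applies the same idea, grouping (lname, value, label) rows with `itertools.groupby`. The `coblab` values are taken from the coblab example in this talk. The `noyes` rows and the helper `label_strings` are hypothetical.

```python
from itertools import groupby

# (lname, value, label) rows, already ordered by label name
rows = [
    ("coblab", -1, "Missing"),
    ("coblab", 1, "Australia"),
    ("coblab", 2, "United Kingdom"),
    ("coblab", 3, "Vietnam"),
    ("coblab", 4, "China"),
    ("coblab", 5, "Singapore"),
    ("coblab", 6, "New Zealand"),
    ("noyes", 0, "No"),
    ("noyes", 1, "Yes"),
]

def label_strings(rows):
    """Join each label's (value, text) pairs as 'value, text | value, text'."""
    out = {}
    # groupby only merges consecutive rows, just as the Stata loop relies on min/max recnum
    for lname, group in groupby(rows, key=lambda r: r[0]):
        out[lname] = " | ".join(f"{value}, {text}" for _, value, text in group)
    return out

result = label_strings(rows)
print(result["coblab"])
```

Because `join` inserts the separator only between pairs, there is no trailing " |" to strip.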


Example with coblab

forval i = `j'/`k' {
    local val = value[`i']
    local lab = label[`i']
    local fullab `fullab' `val', `lab' |
}

Contents of `fullab' after each pass:
`i' = 1: -1, Missing |
`i' = 2: -1, Missing | 1, Australia |
`i' = 3: -1, Missing | 1, Australia | 2, United Kingdom |
`i' = 4: -1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam |
`i' = 5: -1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam | 4, China |
`i' = 6: -1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam | 4, China | 5, Singapore |
`i' = 7: -1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam | 4, China | 5, Singapore | 6, New Zealand |


Example with coblab

foreach x of local levels {
    forval i = `j'/`k' {
        local val = value[`i']
        local lab = label[`i']
        local fullab `fullab' `val', `lab' |
    }
    local lenlab = strlen("`fullab'") - 2
    local fullab = substr("`fullab'", 1, `lenlab')
}

Before trimming: -1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam | 4, China | 5, Singapore | 6, New Zealand |

After trimming: -1, Missing | 1, Australia | 2, United Kingdom | 3, Vietnam | 4, China | 5, Singapore | 6, New Zealand


Allowing for extremely long strings

tempname mem
file write `mem' "`x'" _tab "`fullab'" _newline

• file allows for extremely long string values, up to 2 billion characters
• With postfile the limit is 2045 characters
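In Python, strings have no fixed length cap comparable to the 2045-character postfile limit; they are bounded by available memory. That means the tab-separated output can be sketched directly. `write_dictionary` is a hypothetical helper that mirrors the `file write` line above: label name, tab, contents, newline. The label contents are illustrative.

```python
import io

def write_dictionary(labels, handle):
    """Write one 'name<TAB>contents' line per value label."""
    for lname, contents in labels.items():
        handle.write(f"{lname}\t{contents}\n")

buf = io.StringIO()
write_dictionary({"coblab": "-1, Missing | 1, Australia", "noyes": "0, No | 1, Yes"}, buf)
print(buf.getvalue(), end="")
```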


One week after submitting my abstract for this meeting…


Beaten to the punch

Seth Lirette et al.

Alfred Russel Wallace


metadatacsv.ado


The redcapture command


redcapture syntax

redcapture varlist, file(string) form(string)
    [text(varlist) dropdown(varlist) radio(varlist) header(string)
    validate(varlist) validtype(validtypes) validmin(minlist) validmax(maxlist)
    matrix1(varlist) matrix2(varlist) matrix3(varlist) matrix4(varlist) matrix5(varlist)
    matrix6(varlist) matrix7(varlist) matrix8(varlist) matrix9(varlist) matrix10(varlist)]


First, some background on


REDCap field types


REDCap validations for text fields


Capturing categorical data in REDCap


Example Stata dataset


Example script

redcapture *, file(example) form(example_form) header(Example) ///
    text(id age sex bdate sbp dbp comment) ///
    dropdown(consented race) ///
    radio(happy1 happy2 happy3) ///
    validate(id bdate dbp comment) ///
    validtype(ssn date_ymd integer alpha_only) ///
    validmin(none 1/1/1900 20 none) ///
    validmax(none 12/31/2014 200 none) ///
    matrix1(happy1 happy2 happy3)

•  Metadata are saved in example.csv. This is the data dictionary that will be uploaded to REDCap.
•  The form/instrument name in REDCap is example_form
•  Its header is "Example"

•  dropdown() and radio() are for categorical variables. They must be numeric with value labels attached.
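Why must categorical variables carry value labels? In a REDCap dictionary, the choices for a dropdown or radio field are written as "value, label | value, label", so without a label there is nothing to write. The sketch below is a hypothetical Python check, not redcapture's code. The REDCap column headers are assumed from its dictionary template. Pairing `consented` with the `noyes` label, and that label's contents, are assumptions for illustration.

```python
CHOICES_COLUMN = "Choices, Calculations, OR Slider Labels"  # assumed REDCap header

def categorical_field(name, field_type, is_numeric, vallab, label_contents):
    """Build dictionary cells for a dropdown or radio field, or raise if that is impossible."""
    if field_type not in ("dropdown", "radio"):
        raise ValueError(f"{name}: expected dropdown or radio, got {field_type}")
    if not is_numeric:
        raise TypeError(f"{name}: categorical fields must be numeric")
    if not vallab or vallab not in label_contents:
        raise ValueError(f"{name}: no value label attached")
    return {
        "Variable / Field Name": name,
        "Field Type": field_type,
        CHOICES_COLUMN: label_contents[vallab],
    }

labels = {"noyes": "0, No | 1, Yes"}  # hypothetical label contents
row = categorical_field("consented", "dropdown", True, "noyes", labels)
print(row)
```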

•  The variables in text() are text fields
•  All variables in the validate() option must be declared as text fields

•  The validtype() entries match the validate() variables in order:
•  id is a social security number
•  bdate is a date field in YMD format
•  dbp is an integer
•  comment is a string of letters only

•  validmin() and validmax() set range checks. To omit range checks for any or all of the validation variables, enter "none" in the corresponding position
•  These are soft checks
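The validate(), validtype(), validmin() and validmax() lists are positional, with "none" as a placeholder. The hypothetical Python sketch below shows that pairing, turning "none" into an empty cell. It uses the values from the example script and leaves the date limits exactly as written. The helper `validation_cells` is illustrative and does not reproduce redcapture's internals.

```python
def validation_cells(varnames, validtypes, mins, maxs):
    """Pair each validated variable with its type and limits; 'none' becomes an empty cell."""
    if not (len(varnames) == len(validtypes) == len(mins) == len(maxs)):
        raise ValueError("each option needs one entry per variable in validate()")

    def cell(value):
        return "" if value == "none" else value

    return {
        v: (t, cell(lo), cell(hi))
        for v, t, lo, hi in zip(varnames, validtypes, mins, maxs)
    }

cells = validation_cells(
    "id bdate dbp comment".split(),
    "ssn date_ymd integer alpha_only".split(),
    "none 1/1/1900 20 none".split(),
    "none 12/31/2014 200 none".split(),
)
for name, spec in cells.items():
    print(name, spec)
```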

•  Radio fields with a common set of response options can be grouped in a matrix, using matrix1() to matrix10()
•  See next slide
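A matrix only makes sense when every field in it offers the same response options. The hypothetical Python sketch below checks that and then gives each field the same group name. Using the option name "matrix1" as REDCap's Matrix Group Name is an assumption, and so are the happy1 to happy3 response options.

```python
def assign_matrix_group(group_name, fields, choices):
    """Give every field in the group one matrix group name after checking shared options."""
    options = {choices[f] for f in fields}
    if len(options) != 1:
        raise ValueError(f"{group_name}: fields do not share one set of response options")
    return {f: group_name for f in fields}

# Hypothetical response options for the three radio fields in the example script
choices = {
    "happy1": "1, Happy | 2, Neutral | 3, Unhappy",
    "happy2": "1, Happy | 2, Neutral | 3, Unhappy",
    "happy3": "1, Happy | 2, Neutral | 3, Unhappy",
}
groups = assign_matrix_group("matrix1", ["happy1", "happy2", "happy3"], choices)
print(groups)
```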


Matrix of fields


Data dictionary

The redcapture command created this data dictionary

…which can be uploaded into REDCap


In conclusion …

1.  Ensure data will be retrievable 10 or 20 years from now
2.  Ensure the next generation of researchers will be able to understand currently archived data

How? By storing both data and metadata in text files
•  Stata's export delimited and redcapture commands facilitate this
•  Data and metadata can be uploaded to data capture software such as REDCap