DATA MANAGEMENT
Using EpiData and SPSS
References
Public domain (pdf) book on data management: Bennett, et al. (2001). Data Management for Surveys and Trials. A Practical Primer Using EpiData. The EpiData Documentation Project. http://www.epidata.dk/downloads/dmepidata.pdf
EpiData Association Website: http://www.epidata.dk/
Importing raw data into SPSS: http://www.ats.ucla.edu/stat/spss/modules/input.htm
Data Management
• Planning data needs
• Data collection
• Data entry and control
• Validation and checking
• Data cleaning and variable transformation
• Data backup and storage
• System documentation
• Other
Types of Data Base Management Systems (DBMSs)
• Spreadsheets (e.g., Excel, SPSS Data Editor)
  • Prone to error, data corruption, & mismanagement
  • Lack data controls, limited programmability
  • Suitable only for small and didactic projects
  • Also good for last-step data cleaning
• Commercial DBMS programs (e.g., Oracle, Access)
  • Limited data control, good programmability
  • Slow & expensive
  • Powerful and widely available
• Public domain programs (e.g., EpiData, Epi Info)
  • Controlled data entry, good programmability
  • Suitable for research and field use
We will use two platforms:
• EpiData
  • controlled data entry
  • data documentation
  • export (“write”) data
• SPSS
  • import (“read”) data
  • analysis
  • reporting
What is EpiData?
• EpiData is a computer program (small in size, 1.2 MB) for simple or programmed data entry and data documentation
• It is highly reliable
• It runs on Windows computers
  • Runs on Macs and Linux with emulator software (only)
• Interface
  • pull down menus
  • work bar
History of EpiInfo & EpiData
• 1976–1995: EpiInfo (DOS program) created by CDC (in wake of swine flu epidemic)
  • Small, fast, reliable, 100,000+ users worldwide
• 1995–2000: DOS dies slow painful death
• 2000: CDC releases EpiInfo2000
  • Based on Microsoft Jet (Access) data engine
  • Large, slow, unreliable (resembled EpiInfo in name only)
• 2001: Loyal EpiInfo user group decides it needs real “EpiInfo for Windows”
  • Creates open source public domain program
  • Calls program “EpiData”
Goal: Create & Maintain Error-Free Datasets
• Two types of data errors
  • Measurement error (i.e., information bias) – discussed last couple of weeks
  • Processing errors = errors that occur during data handling – discussed this week
• Examples of data processing errors
  • Transpositions (91 instead of 19)
  • Copying errors (O instead of 0)
  • Additional processing errors described on p. 18.2
Avoiding Data Processing Errors
• Manual checks (e.g., handwriting legibility)
• Range and consistency checks* (e.g., do not allow hysterectomy dates for men)
• Double entry and validation*
  • Operator 1 enters data
  • Operator 2 enters data in separate file
  • Check files for inconsistencies
• Screening during analysis (e.g., look for outliers)
* covered in lab
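
The double entry and validation step above can be sketched in a few lines of Python. This is only an illustration of the idea, not how EpiData implements its own validation: the CSV layout, the `id` key field, the sample values, and the function name `compare_double_entry` are all assumptions made for the example.

```python
import csv
import io


def compare_double_entry(first_csv, second_csv, key="id"):
    """Return (key, field, value_1, value_2) for every disagreement."""
    first = {row[key]: row for row in csv.DictReader(io.StringIO(first_csv))}
    second = {row[key]: row for row in csv.DictReader(io.StringIO(second_csv))}
    mismatches = []
    for record_id in sorted(first.keys() | second.keys()):
        row_1, row_2 = first.get(record_id), second.get(record_id)
        if row_1 is None or row_2 is None:
            # The record was entered by only one of the two operators.
            mismatches.append((record_id, "<record>",
                               row_1 is not None, row_2 is not None))
            continue
        for field in row_1:
            if row_1[field] != row_2.get(field):
                mismatches.append(
                    (record_id, field, row_1[field], row_2.get(field)))
    return mismatches


# Hypothetical entries: operator 2 transposed 19 into 91 for record 2.
OPERATOR_1 = "id,age,sex\n1,45,1\n2,19,2\n"
OPERATOR_2 = "id,age,sex\n1,45,1\n2,91,2\n"

for mismatch in compare_double_entry(OPERATOR_1, OPERATOR_2):
    print(mismatch)  # ('2', 'age', '19', '91')
```

Each reported mismatch is then resolved by going back to the original paper form.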
Controlled Data Entry
• Criteria for accepting & rejecting data
• Types of data controls
  • Range checks (e.g., restrict AGE to reasonable range)
  • Value labels (e.g., SEX: 1 = male, 2 = female)
  • Jumps (e.g., if “male,” jump to Q8)
  • Consistency checks (e.g., if “sex = male,” do not allow “hysterectomy = yes”)
  • Must enters
  • etc.
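
To make the data controls above concrete, here is a minimal Python sketch of must-enter, range, value-label, and consistency checks applied to one record. It is not EpiData .CHK syntax. The field names, the 0 to 120 age range, and the function name `check_record` are assumptions for illustration; only the SEX coding (1 = male, 2 = female) comes from the slide.

```python
SEX_LABELS = {1: "male", 2: "female"}  # value labels from the slide
AGE_RANGE = (0, 120)                   # assumed "reasonable range"


def check_record(record):
    """Return a list of problems for one record (empty list = accept)."""
    problems = []

    # Must-enter check: these fields may not be left blank.
    for field in ("age", "sex"):
        if record.get(field) is None:
            problems.append(f"{field}: must be entered")

    # Range check on AGE.
    age = record.get("age")
    if age is not None and not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
        problems.append("age: outside 0-120")

    # Value-label check: only coded values are legal for SEX.
    sex = record.get("sex")
    if sex is not None and sex not in SEX_LABELS:
        problems.append("sex: not a legal code")

    # Consistency check: no hysterectomy for males.
    if sex == 1 and record.get("hysterectomy") == "yes":
        problems.append("hysterectomy: not allowed when sex = male")

    return problems


print(check_record({"age": 45, "sex": 2, "hysterectomy": "yes"}))  # []
print(check_record({"age": 45, "sex": 1, "hysterectomy": "yes"}))
```

A data entry program applies rules like these at the keyboard, so that a rejected value never reaches the data file.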
Data Processing Steps
1. File naming conventions
2. Variable types and names
3. QES (questionnaire) development
4. Convert .QES file to .REC (record) file
5. Add .CHK file
6. Enter data in REC file
7. Validate data (double entry procedure)
8. Document data (codebook)
9. Export data to SPSS
10. Import data into SPSS
Filenaming and File Management
• c:\path\filename.ext
• A web address is a good example of a filename, e.g., http://www2.sjsu.edu/faculty/gerstman/StatPrimer/data.ppt
• Some systems are case sensitive (Unix)
  • Others are not (Windows)
• Always be aware of
  • Physical location (local, removable, network)
  • Path (folders and subfolders)
  • Filename (proper)
  • Extension
• Demo Windows Network Explorer: right-click Start Bar > Explore
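
As a small illustration of the parts of a filename listed above, Python's standard `pathlib` module can split the slide's generic example `c:\path\filename.ext` into location, path, filename, and extension. It also shows the case-sensitivity difference between Windows-style and Unix-style names. The file name `demo.rec` in the comparison is just a sample.

```python
from pathlib import PurePosixPath, PureWindowsPath

path = PureWindowsPath(r"c:\path\filename.ext")

print(path.drive)   # c:        (physical location)
print(path.parent)  # c:\path   (path: folders and subfolders)
print(path.stem)    # filename  (filename proper)
print(path.suffix)  # .ext      (extension)

# Windows-style names compare without regard to case; Unix-style names do not.
print(PureWindowsPath("DEMO.REC") == PureWindowsPath("demo.rec"))  # True
print(PurePosixPath("DEMO.REC") == PurePosixPath("demo.rec"))      # False
```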
File extensions you should know

Extension   Software program
.qes        EpiInfo/EpiData questionnaire
.rec        EpiInfo/EpiData records (data)
.chk        EpiInfo/EpiData check (controls & labels)
.not        EpiData notes (data documentation)
.sav        SPSS permanent data file
.sps        SPSS syntax file (program)
.txt        Generic (flat) text data
.htm        Web Browser
.doc        Microsoft Word
.xls        Microsoft Excel
Selected EpiData Variable Types

Variable Type         Examples
Text                  _   <A >
Numeric               #   ##.#
Date                  <mm/dd/yyyy>   <dd/mm/yyyy>
Auto ID               <IDNUM>
Soundex (sanitized)   <S >
EpiData Variable Names
• Variable name based on text that occurs before variable type indicator code
• EpiData variable naming defaults vary depending on installation
• Create variable names exactly as specified
• To be safe, denote variable names in {curly brackets}
• For example, to create a two-byte numeric variable called age, use the question:
What is your {age}? ##
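
The naming convention in this example can be illustrated with a short Python sketch that pulls the name in {curly brackets} and the width of the `#` field out of a questionnaire line. This is not EpiData's own parser. The function name `parse_numeric_question` and the regular expressions are assumptions, and the sketch handles only numeric fields.

```python
import re

NAME_PATTERN = re.compile(r"\{(\w+)\}")        # text inside {curly brackets}
FIELD_PATTERN = re.compile(r"(#+)(?:\.(#+))?")  # e.g. ## or ##.#


def parse_numeric_question(line):
    """Return the variable name and numeric field width found in one line."""
    name_match = NAME_PATTERN.search(line)
    field_match = FIELD_PATTERN.search(line)
    if not name_match or not field_match:
        return None
    whole = field_match.group(1)
    decimals = field_match.group(2) or ""
    return {
        "name": name_match.group(1),
        "digits": len(whole),
        "decimals": len(decimals),
    }


print(parse_numeric_question("What is your {age}? ##"))
# {'name': 'age', 'digits': 2, 'decimals': 0}
```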
Demo / Work Along
• Create QES file [demo.qes]
• Convert QES to REC [demo.rec]
• Create CHK file [demo.chk]
• Create double entry file [demo2.rec]
• Enter data
• Validate data

Fname    Lname    DOB         SEX   DEATHAGE
John     Snow     3/15/1813   1     45
George   Orwell   6/25/1903   1     46
We will stop here and pick up the second part of the lecture next week
“Stay tuned”
Codebooks
• Contain info that helps users decipher data file content and structure
• Includes:
  • Filename(s)
  • File location(s)
  • Variable names
  • Coding schemes
  • Units
  • Anything else you think might be useful
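
As a rough sketch of what a codebook records, the items listed above can be held in a small Python structure and printed as plain text. This is not the output format of EpiData's codebook generator. The dictionary layout, the function name `format_codebook`, and the "years" unit for DEATHAGE are assumptions; the file name and variables are taken from the demo data set.

```python
CODEBOOK = {
    "filename": "demo.rec",
    "variables": [
        {"name": "SEX", "type": "numeric", "codes": {1: "male", 2: "female"}},
        {"name": "DEATHAGE", "type": "numeric", "units": "years"},  # assumed unit
    ],
}


def format_codebook(codebook):
    """Render the codebook as plain text, one line per variable."""
    lines = [f"File: {codebook['filename']}"]
    for var in codebook["variables"]:
        parts = [var["name"], var["type"]]
        if "units" in var:
            parts.append(f"units={var['units']}")
        if "codes" in var:
            parts.append(", ".join(f"{k}={v}" for k, v in var["codes"].items()))
        lines.append("  " + " | ".join(parts))
    return "\n".join(lines)


print(format_codebook(CODEBOOK))
```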
EpiData codebook generators
File Structure Codebook
Full codebook contains descriptive statistics (demo)
Full Codebook
Notice descriptive statistics
Conversion of Data File
• Requires common intermediate file format
• Examples of common intermediate files
  • .TXT = plain text
  • .DBF = dBase program
  • .XLS = Excel
• Steps
  • Export .REC file → .TXT file
  • Import .TXT file into SPSS
  • Save permanent SAV file
Current Export Formats Supported by EpiData
Plain (“raw”) TXT data
• plain ASCII data format
• no column demarcations
• no variable names
• no labels
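
Because a raw TXT file has no column demarcations or variable names, whoever reads it must supply the column layout separately (for example, from the codebook or the syntax file). A minimal Python sketch of that idea follows; the layout, the variable names, and the two sample lines are hypothetical.

```python
# Hypothetical layout: (variable name, start column, end column), 0-based,
# end exclusive. In practice this comes from the codebook or syntax file.
LAYOUT = [("id", 0, 3), ("age", 3, 5), ("sex", 5, 6)]


def parse_fixed_width(line, layout=LAYOUT):
    """Slice one raw line into named fields using the column layout."""
    return {name: line[start:end].strip() for name, start, end in layout}


RAW = "001451\n002462\n"  # made-up raw data with no delimiters
records = [parse_fixed_width(line) for line in RAW.splitlines()]

for record in records:
    print(record)
# {'id': '001', 'age': '45', 'sex': '1'}
# {'id': '002', 'age': '46', 'sex': '2'}
```

Without the layout, the string 001451 is uninterpretable, which is why the raw data file travels together with its documentation.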
TXT file with codebook: tox-samp.txt, tox-samp.not
SPSS Data Export / Import
[Diagram: REC file exported to TXT (raw data) and SPS (syntax) files, which SPSS reads to create the SAV file]
Top of tox-samp.sps
Lines beginning with * are comments (ignored by command interpreter)
Next set of commands show file location and structure via SPSS command syntax
Bottom part of tox-samp.sps file
Labels being imported into SPSS
Delete * if you want this command to run
Opening the SPS (command) file
Running the SPS file
Ethics of Data Keeping
• Confidentiality (sanitized files – free of identifiers)
• Beneficence
• Equipoise
• Informed consent (To what extent?)
• Oversight (IRB)