DATA MANAGEMENT
Using EpiData and SPSS

References

• Public domain (PDF) book on data management: Bennett, et al. (2001). Data Management for Surveys and Trials: A Practical Primer Using EpiData. The EpiData Documentation Project. http://www.epidata.dk/downloads/dmepidata.pdf

• EpiData Association website: http://www.epidata.dk/

• Importing raw data into SPSS: http://www.ats.ucla.edu/stat/spss/modules/input.htm

Data Management
• Planning data needs
• Data collection
• Data entry and control
• Validation and checking
• Data cleaning and variable transformation
• Data backup and storage
• System documentation
• Other

Types of Database Management Systems (DBMSs)

• Spreadsheets (e.g., Excel, SPSS Data Editor)
  • Prone to error, data corruption, and mismanagement
  • Lack data controls; limited programmability
  • Suitable only for small and didactic projects
  • Also good for last-step data cleaning

• Commercial DBMS programs (e.g., Oracle, Access)
  • Limited data control; good programmability
  • Slow and expensive
  • Powerful and widely available

• Public domain programs (e.g., EpiData, Epi Info)
  • Controlled data entry; good programmability
  • Suitable for research and field use

We will use two platforms:

• EpiData
  • controlled data entry
  • data documentation
  • export (“write”) data

• SPSS
  • import (“read”) data
  • analysis
  • reporting

What is EpiData?

• EpiData is a small computer program (about 1.2 MB) for simple or programmed data entry and data documentation
• It is highly reliable
• It runs on Windows computers
• It runs on Macs and Linux only with emulator software
• Interface
  • pull-down menus
  • work bar

History of EpiInfo & EpiData

• 1976–1995: EpiInfo (DOS program) created by CDC (in the wake of the swine flu epidemic)
  • Small, fast, reliable; 100,000+ users worldwide

• 1995–2000: DOS dies a slow, painful death

• 2000: CDC releases EpiInfo2000
  • Based on the Microsoft Jet (Access) data engine
  • Large, slow, unreliable (resembled EpiInfo in name only)

• 2001: Loyal EpiInfo user group decides it needs a real “EpiInfo for Windows”
  • Creates an open source, public domain program
  • Calls the program “EpiData”

Goal: Create & Maintain Error-Free Datasets

• Two types of data errors
  • Measurement error (i.e., information bias): discussed in the last couple of weeks
  • Processing errors: errors that occur during data handling; discussed this week

• Examples of data processing errors
  • Transpositions (91 instead of 19)
  • Copying errors (O instead of 0)
  • Additional processing errors are described on p. 18.2

Avoiding Data Processing Errors

• Manual checks (e.g., handwriting legibility)

• Range and consistency checks* (e.g., do not allow hysterectomy dates for men)

• Double entry and validation*
  • Operator 1 enters data
  • Operator 2 enters data in a separate file
  • Check files for inconsistencies

• Screening during analysis (e.g., look for outliers)

* covered in lab
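Screening for outliers during analysis can be partly automated. The following is a minimal Python sketch, assuming the common 1.5 × IQR rule (Tukey's fences) as the outlier criterion; the function name `flag_outliers` and the sample ages are hypothetical, and any flagged value still has to be checked against the original form.

```python
from statistics import quantiles

def flag_outliers(values, k=1.5):
    """Return values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)  # quartile cut points ('exclusive' method by default)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical ages; 270 looks like a keying error for 27
ages = [23, 25, 31, 28, 35, 30, 270]
print(flag_outliers(ages))  # [270]
```

The rule only flags suspicious values; deciding whether 270 is an error or a real observation is a judgment made with the source documents in hand.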

Controlled Data Entry
• Criteria for accepting & rejecting data
• Types of data controls
  • Range checks (e.g., restrict AGE to a reasonable range)
  • Value labels (e.g., SEX: 1 = male, 2 = female)
  • Jumps (e.g., if “male,” jump to Q8)
  • Consistency checks (e.g., if “sex = male,” do not allow “hysterectomy = yes”)
  • Must enters
  • etc.
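EpiData stores these controls in the .CHK file. The sketch below does not use EpiData's check-file syntax; it is a hypothetical Python illustration of the same accept/reject logic for one record, assuming field names `ID`, `AGE`, `SEX`, and `HYSTER` (1 = yes, 2 = no) and an assumed plausible age range of 0 to 110. Jumps are omitted because they control the order of entry rather than the validity of a value.

```python
# Hypothetical value labels; legal codes are the keys
SEX_LABELS = {1: "male", 2: "female"}
YES_NO = {1: "yes", 2: "no"}
MUST_ENTER = ("ID", "AGE", "SEX")  # hypothetical must-enter fields

def check_record(rec):
    """Return a list of error messages for one record (dict of field -> value)."""
    errors = []
    # Must-enter fields: a missing value is rejected
    for field in MUST_ENTER:
        if rec.get(field) is None:
            errors.append(f"{field} must be entered")
    # Range check on AGE (assumed range)
    age = rec.get("AGE")
    if age is not None and not 0 <= age <= 110:
        errors.append(f"AGE {age} out of range 0-110")
    # Value labels double as the list of legal codes
    sex, hyster = rec.get("SEX"), rec.get("HYSTER")
    if sex is not None and sex not in SEX_LABELS:
        errors.append(f"SEX {sex} is not a legal code")
    if hyster is not None and hyster not in YES_NO:
        errors.append(f"HYSTER {hyster} is not a legal code")
    # Consistency check: a male cannot have had a hysterectomy
    if sex == 1 and hyster == 1:
        errors.append("SEX = male is inconsistent with HYSTER = yes")
    return errors

print(check_record({"ID": 1, "AGE": 45, "SEX": 1, "HYSTER": 2}))  # []
print(check_record({"ID": 2, "AGE": 450, "SEX": 1, "HYSTER": 1}))
```

The second call reports both the out-of-range age and the male/hysterectomy inconsistency, which is the kind of record controlled data entry would refuse at the keyboard.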

Data Processing Steps
1. File naming conventions
2. Variable types and names
3. QES (questionnaire) development
4. Convert .QES file to .REC (record) file
5. Add .CHK file
6. Enter data in .REC file
7. Validate data (double entry procedure)
8. Document data (codebook)
9. Export data to SPSS
10. Import data into SPSS

Filenaming and File Management

• c:\path\filename.ext
• A web address is a good example of a filename, e.g.,
  http://www2.sjsu.edu/faculty/gerstman/StatPrimer/data.ppt

• Some systems are case sensitive (Unix)
• Others are not (Windows)

• Always be aware of
  • Physical location (local, removable, network)
  • Path (folders and subfolders)
  • Filename (proper)
  • Extension

• Demo Windows Network Explorer: right-click Start bar > Explore
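The parts of a filename listed above can be pulled apart with Python's standard `pathlib` module (a general-purpose tool, not part of EpiData or SPSS). The "pure" path classes only manipulate names and never touch the disk, so this sketch runs on any operating system; the example paths are illustrative.

```python
from pathlib import PurePosixPath, PureWindowsPath

p = PureWindowsPath(r"c:\path\filename.ext")
print(p.drive)   # c:
print(p.parent)  # c:\path
print(p.stem)    # filename
print(p.suffix)  # .ext

# Windows paths compare case-insensitively; Unix (POSIX) paths do not
print(PureWindowsPath(r"C:\DATA\DEMO.REC") == PureWindowsPath(r"c:\data\demo.rec"))  # True
print(PurePosixPath("/data/DEMO.REC") == PurePosixPath("/data/demo.rec"))            # False
```

The last two lines show why a file that opens fine on Windows can go "missing" when the same project is moved to a Unix server with different capitalization.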

File extensions you should know

Extension   Software program
.qes        EpiInfo/EpiData questionnaire
.rec        EpiInfo/EpiData records (data)
.chk        EpiInfo/EpiData check (controls & labels)
.not        EpiData notes (data documentation)
.sav        SPSS permanent data file
.sps        SPSS syntax file (program)
.txt        Generic (flat) text data
.htm        Web browser
.doc        Microsoft Word
.xls        Microsoft Excel
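Because Windows ignores case, `DEMO.REC` and `demo.rec` are the same kind of file. A small Python sketch that identifies the program for a file from the extension table above; the `describe` helper is hypothetical, and it lowercases the extension before the lookup.

```python
from pathlib import Path

# Extensions from the table above (keys stored in lowercase)
FILE_TYPES = {
    ".qes": "EpiInfo/EpiData questionnaire",
    ".rec": "EpiInfo/EpiData records (data)",
    ".chk": "EpiInfo/EpiData check (controls & labels)",
    ".not": "EpiData notes (data documentation)",
    ".sav": "SPSS permanent data file",
    ".sps": "SPSS syntax file (program)",
    ".txt": "Generic (flat) text data",
    ".htm": "Web browser",
    ".doc": "Microsoft Word",
    ".xls": "Microsoft Excel",
}

def describe(filename: str) -> str:
    """Return the program associated with a file's extension, ignoring case."""
    ext = Path(filename).suffix.lower()
    return FILE_TYPES.get(ext, "unknown")

print(describe("DEMO.REC"))      # EpiInfo/EpiData records (data)
print(describe("tox-samp.sps"))  # SPSS syntax file (program)
```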

Selected EpiData Variable Types

Variable type         Examples
Text                  ____   <A    >
Numeric               #   ##.#
Date                  <mm/dd/yyyy>   <dd/mm/yyyy>
Auto ID               <IDNUM>
Soundex (sanitized)   <S    >
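A Soundex field stores a sound-based code rather than the name as typed, which is why the table lists it as sanitized. The Python sketch below implements the standard American Soundex algorithm (first letter plus three digits); EpiData's own implementation may differ in edge cases, so treat it as an illustration of the idea rather than a reference for EpiData's output.

```python
# Letter-to-digit groups of American Soundex; vowels, y, h and w get no digit
CODES = {ch: digit
         for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                                ("l", "4"), ("mn", "5"), ("r", "6"))
         for ch in letters}

def soundex(name: str) -> str:
    """Return the 4-character American Soundex code for a name."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    code = [letters[0].upper()]        # keep the first letter itself
    prev = CODES.get(letters[0], "")   # its digit still suppresses an immediate repeat
    for ch in letters[1:]:
        digit = CODES.get(ch, "")
        if digit and digit != prev:
            code.append(digit)
            if len(code) == 4:
                break
        if ch not in "hw":             # h and w do not separate equal digits; vowels do
            prev = digit
    return "".join(code).ljust(4, "0")  # pad short codes with zeros

print(soundex("Snow"), soundex("Orwell"))  # S500 O640
```

Similar-sounding spellings share a code (Robert and Rupert both give R163), which helps with record matching while keeping the full name out of the data file.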

EpiData Variable Names

• The variable name is based on the text that occurs before the variable type indicator code

• EpiData's variable-naming defaults vary depending on the installation

• Create variable names exactly as specified; to be safe, denote variable names in {curly brackets}

• For example, to create a two-byte numeric variable called age, use the question:

What is your {age}? ##
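To make the naming rule concrete, here is a hypothetical Python sketch that reads one questionnaire line and reports the variable name in curly brackets and the width of a numeric field written with `#`. It only handles the simple case shown above (one `{name}` and one numeric field per line) and is not how EpiData itself parses .QES files.

```python
import re

NAME_RE = re.compile(r"\{(\w+)\}")         # {name} marks the variable name
NUMERIC_RE = re.compile(r"#+(?:\.#+)?")    # run of # with an optional decimal part

def parse_qes_line(line: str):
    """Return (variable name, field width) for a simple numeric QES line, or None."""
    name = NAME_RE.search(line)
    field = NUMERIC_RE.search(line)
    if not (name and field):
        return None
    return name.group(1), len(field.group(0))

print(parse_qes_line("What is your {age}? ##"))  # ('age', 2)
```

Without the braces, the name would have to be guessed from the question text, which is exactly where installation-dependent defaults can produce surprises.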

Demo / Work Along
• Create QES file [demo.qes]
• Convert QES to REC [demo.rec]
• Create CHK file [demo.chk]
• Create double entry file [demo2.rec]
• Enter data
• Validate data

Fname    Lname    DOB        SEX   DEATHAGE
John     Snow     3/15/1813  1     45
George   Orwell   6/25/1903  1     46
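EpiData has its own double-entry validation for comparing demo.rec with demo2.rec. The underlying idea can be sketched in Python: match the two operators' records on a key and report every field that disagrees. The records reuse the demo table with a hypothetical transposition (54 instead of 45) typed by operator 2; `compare_entries` is a hypothetical helper, and matching on `Lname` is an assumption made for brevity, since a real study would match on an ID field.

```python
entry1 = [  # operator 1
    {"Fname": "John", "Lname": "Snow", "DOB": "3/15/1813", "SEX": 1, "DEATHAGE": 45},
    {"Fname": "George", "Lname": "Orwell", "DOB": "6/25/1903", "SEX": 1, "DEATHAGE": 46},
]
entry2 = [  # operator 2 (hypothetical transposition in DEATHAGE)
    {"Fname": "John", "Lname": "Snow", "DOB": "3/15/1813", "SEX": 1, "DEATHAGE": 54},
    {"Fname": "George", "Lname": "Orwell", "DOB": "6/25/1903", "SEX": 1, "DEATHAGE": 46},
]

def compare_entries(first, second, key):
    """Return (key value, field, entry 1 value, entry 2 value) for each disagreement."""
    second_by_key = {rec[key]: rec for rec in second}
    problems = []
    for rec1 in first:
        rec2 = second_by_key.get(rec1[key])
        if rec2 is None:
            problems.append((rec1[key], "<missing in second entry>", None, None))
            continue
        for field, value in rec1.items():
            if rec2.get(field) != value:
                problems.append((rec1[key], field, value, rec2.get(field)))
    # Records typed only by operator 2
    first_keys = {rec[key] for rec in first}
    for extra in sorted(second_by_key.keys() - first_keys):
        problems.append((extra, "<missing in first entry>", None, None))
    return problems

for problem in compare_entries(entry1, entry2, key="Lname"):
    print(problem)
# Prints: ('Snow', 'DEATHAGE', 45, 54)
```

Each reported disagreement is resolved by going back to the paper form, not by picking one operator's value.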

We will stop here and pick up the second part of the lecture next week.

“Stay tuned”

Codebooks

• Contain information that helps users decipher data file content and structure

• Include:
  • Filename(s)
  • File location(s)
  • Variable names
  • Coding schemes
  • Units
  • Anything else you think might be useful

EpiData codebook generators

File Structure Codebook

Full codebook contains descriptive statistics (demo)

Full Codebook

Notice the descriptive statistics
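EpiData produces its codebooks from menus. To show what a full codebook summarizes, here is a minimal Python sketch that lists, for each variable, its label, the number of non-missing and missing values, and either a labeled frequency table (coded variables) or min/max/mean (numeric variables). The `codebook` function, the label dictionaries, and the output layout are assumptions, not EpiData's format.

```python
from collections import Counter
from statistics import mean

def codebook(records, var_labels, value_labels):
    """Return codebook lines with descriptive statistics for each variable."""
    lines = []
    for name in records[0]:
        values = [r[name] for r in records if r.get(name) is not None]
        lines.append(f"{name}: {var_labels.get(name, '')}")
        lines.append(f"  n = {len(values)}, missing = {len(records) - len(values)}")
        if name in value_labels:
            # Coded variable: frequency of each code with its value label
            for code, count in sorted(Counter(values).items()):
                lines.append(f"  {code} = {value_labels[name].get(code, '?')}: {count}")
        elif values and all(isinstance(v, (int, float)) for v in values):
            # Numeric variable: range and mean
            lines.append(f"  min = {min(values)}, max = {max(values)}, mean = {mean(values):.1f}")
    return lines

demo = [
    {"Lname": "Snow", "SEX": 1, "DEATHAGE": 45},
    {"Lname": "Orwell", "SEX": 1, "DEATHAGE": 46},
]
VAR_LABELS = {"Lname": "Last name", "SEX": "Sex", "DEATHAGE": "Age at death (years)"}
VALUE_LABELS = {"SEX": {1: "male", 2: "female"}}
print("\n".join(codebook(demo, VAR_LABELS, VALUE_LABELS)))
```

Treating SEX as a coded variable (because it has value labels) avoids reporting a meaningless "mean sex" even though it is stored as a number.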

Conversion of Data File

• Requires a common intermediate file format

• Examples of common intermediate files
  • .TXT = plain text
  • .DBF = dBase program
  • .XLS = Excel

• Steps
  • Export .REC file to .TXT file
  • Import .TXT file into SPSS
  • Save permanent .SAV file
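The export/import round trip through an intermediate text file can be sketched with Python's standard `csv` module. A comma-delimited .TXT layout is assumed here as the intermediate format, and an in-memory buffer stands in for the file so the sketch runs anywhere; the point it shows is that plain text keeps the values but not their types.

```python
import csv
import io

records = [
    {"Fname": "John", "Lname": "Snow", "DOB": "3/15/1813", "SEX": 1, "DEATHAGE": 45},
    {"Fname": "George", "Lname": "Orwell", "DOB": "6/25/1903", "SEX": 1, "DEATHAGE": 46},
]

# Export: header row plus one comma-delimited row per record
buffer = io.StringIO()  # stands in for open("demo.txt", "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
writer.writeheader()
writer.writerows(records)
text = buffer.getvalue()
print(text)

# Import: every field comes back as a string, so numeric types must be restored
imported = list(csv.DictReader(io.StringIO(text)))
for rec in imported:
    rec["SEX"] = int(rec["SEX"])
    rec["DEATHAGE"] = int(rec["DEATHAGE"])
print(imported == records)  # True
```

The type-restoring loop is the job that an accompanying syntax or codebook file does in practice: the text file alone does not say which columns are numbers, dates, or labeled codes.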

Current Export Formats Supported by EpiData

Plain (“raw”) TXT data

• plain ASCII data format
• no column demarcations
• no variable names
• no labels

TXT file with codebook: tox-samp.txt (data) and tox-samp.not (codebook)
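Because plain TXT data has no column demarcations, the column positions recorded in the codebook are the only way to split each line back into variables. A Python sketch under an assumed, hypothetical column layout for the demo records (the actual layout of tox-samp.txt is not reproduced here); columns are 1-based and inclusive, as a codebook would list them.

```python
# Hypothetical layout: (variable, first column, last column), 1-based and inclusive
LAYOUT = [("FNAME", 1, 10), ("LNAME", 11, 20), ("DOB", 21, 30),
          ("SEX", 31, 31), ("DEATHAGE", 32, 33)]
NUMERIC = {"SEX", "DEATHAGE"}

def read_fixed(lines):
    """Split fixed-column text lines into records using LAYOUT."""
    records = []
    for line in lines:
        rec = {}
        for name, first, last in LAYOUT:
            raw = line[first - 1:last].strip()  # 1-based inclusive columns -> Python slice
            if name in NUMERIC:
                rec[name] = int(raw) if raw else None  # blank numeric field = missing
            else:
                rec[name] = raw
        records.append(rec)
    return records

# Two raw lines as they might appear in the TXT file (no separators, no names)
raw_lines = [
    f"{'John':<10}{'Snow':<10}03/15/1813145",
    f"{'George':<10}{'Orwell':<10}06/25/1903146",
]
for rec in read_fixed(raw_lines):
    print(rec)
```

Read without the layout, `03/15/1813145` is just a string of characters; the codebook is what says that `1` is SEX and `45` is DEATHAGE.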

SPSS Data Export / Import

[Diagram: REC → TXT (raw data) + SPS (syntax) → SAV]

Top of tox-samp.sps

Lines beginning with * are comments (ignored by the command interpreter)

The next set of commands shows the file location and structure via SPSS command syntax

Bottom part of tox-samp.sps file

Labels being imported into SPSS

Delete the * if you want this command to run
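The .sps file itself appears only as screenshots, so its exact contents are not reproduced here. As a hedged illustration of the same structure (file location and columns, then labels, then a commented-out save), this Python sketch builds comparable SPSS syntax for the hypothetical demo layout; `spss_syntax`, the layout, and the labels are assumptions, and labels containing apostrophes would need extra quoting.

```python
# Hypothetical layout: (variable, first column, last column, SPSS format or None for numeric)
LAYOUT = [("FNAME", 1, 10, "A"), ("LNAME", 11, 20, "A"), ("DOB", 21, 30, "ADATE"),
          ("SEX", 31, 31, None), ("DEATHAGE", 32, 33, None)]
VAR_LABELS = {"SEX": "Sex", "DEATHAGE": "Age at death (years)"}
VALUE_LABELS = {"SEX": {1: "male", 2: "female"}}

def spss_syntax(txt_file, sav_file):
    """Build SPSS syntax that reads a fixed-column TXT file and labels its variables."""
    cols = []
    for name, first, last, fmt in LAYOUT:
        span = f"{first}" if first == last else f"{first}-{last}"
        cols.append(f"{name} {span}" + (f" ({fmt})" if fmt else ""))
    lines = [
        "* Read raw data: file location and column structure.",
        f"DATA LIST FILE='{txt_file}' FIXED /",
        "  " + " ".join(cols) + ".",
        "VARIABLE LABELS " + " /".join(f"{v} '{lab}'" for v, lab in VAR_LABELS.items()) + ".",
    ]
    for var, codes in VALUE_LABELS.items():
        pairs = " ".join(f"{code} '{label}'" for code, label in codes.items())
        lines.append(f"VALUE LABELS {var} {pairs}.")
    lines.append("EXECUTE.")
    # Commented out in SPSS: delete the leading * to save a permanent .sav file
    lines.append(f"* SAVE OUTFILE='{sav_file}'.")
    return "\n".join(lines)

print(spss_syntax(r"c:\path\demo.txt", r"c:\path\demo.sav"))
```

Opening the printed text in an SPSS syntax window and running it would follow the same open-and-run steps as for tox-samp.sps.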

Opening the SPS (command) file

Running the SPS file

Ethics of Data Keeping

• Confidentiality (sanitized files, free of identifiers)

• Beneficence
• Equipoise
• Informed consent (to what extent?)
• Oversight (IRB)
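One practical way to keep analysis files free of identifiers is to split each record into a sanitized analysis record and a separate key that links a study ID back to the person. A minimal Python sketch on the demo records; which fields count as identifiers (`Fname`, `Lname`, `DOB`) and the `sanitize` helper are assumptions, and the key file would need to be stored separately and securely.

```python
IDENTIFIERS = {"Fname", "Lname", "DOB"}  # hypothetical list of direct identifiers

def sanitize(records, identifiers=IDENTIFIERS):
    """Split records into a sanitized analysis file and a separate identifier key."""
    analysis, key = [], []
    for number, rec in enumerate(records, start=1):
        analysis.append({"ID": number, **{k: v for k, v in rec.items() if k not in identifiers}})
        key.append({"ID": number, **{k: v for k, v in rec.items() if k in identifiers}})
    return analysis, key

people = [
    {"Fname": "John", "Lname": "Snow", "DOB": "3/15/1813", "SEX": 1, "DEATHAGE": 45},
    {"Fname": "George", "Lname": "Orwell", "DOB": "6/25/1903", "SEX": 1, "DEATHAGE": 46},
]
analysis, key = sanitize(people)
print(analysis)  # [{'ID': 1, 'SEX': 1, 'DEATHAGE': 45}, {'ID': 2, 'SEX': 1, 'DEATHAGE': 46}]
```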