
Data Quality Control

by Naila Baig Ansari
Research Fellow, Dept of Community Health Sciences
The Aga Khan University, Karachi, Pakistan

Who am I?

Education:
– MSc (Epidemiology), The Aga Khan University, 2001. Thesis: Care and feeding practices and their association with stunting among young children residing in Karachi's squatter settlements
– BBA (Management), The College of William and Mary, Williamsburg, VA, USA, 1989

Research interests: nutritional and behavioral epidemiology, methodological issues in dietary assessment methods, household food security and gender-related issues, care and feeding practices, data management and questionnaire design

Learning Objectives

To know the steps necessary for ensuring quality assurance and control of data at various stages of a study

To understand the difference between pilot testing and pre-testing

To understand the importance of designing data collection instruments

To understand how data can be managed using an audit trail and the various techniques that can be used to inspect your dataset after it has been entered

Performance Objectives

Know the difference between quality assurance and quality control and ways to ensure them

Know the objectives of a pilot test and a pre-test

Understand how data collection instruments should be designed and coded

Be able to manage data using an audit trail

Be able to inspect datasets for errors and rectify them

Data Quality Control

• Quality Assurance
– Activities to ensure the quality of data before data collection

• Quality Control
– Monitoring and maintaining the quality of data during the conduct of the study

• Data Management
– Handling and processing of data throughout the study

Steps in Quality Assurance

1. Specify the study hypothesis

2. Specify the general design to test the study hypothesis; develop an overall study protocol

3. Choose or prepare specific instruments

4. Develop procedures for data collection and processing; develop operations manuals

5. Train staff; certify staff

6. Using certified staff, pretest and pilot-test the data collection and processing instruments and procedures

Quality Assurance: Standardization of Procedures

• Why is standardization important?
– To achieve the highest possible level of uniformity and standardization of data collection procedures across the entire study population

• Preparation of a written manual of operations
– Detailed descriptions of exactly how the procedures specific to each data collection instrument are to be carried out (e.g., blood pressure measurement)
– Q by Q (question-by-question) instructions for interviews

Quality Assurance: Training of Staff

• Aim: make each staff member thoroughly familiar with the procedures under his/her responsibility

• Training culminates in certification of the staff member to perform a specific procedure

Quality Assurance: Pretesting and Pilot Testing

• Pretesting
– Involves assessing specific procedures on a sample in order to detect major flaws

• Pilot Testing
– A formal rehearsal of study procedures
– Attempts to reproduce the whole flow of operations in a sample as similar as possible to the study participants

Pretesting and Pilot Testing Results

• Pretesting of the questionnaire is used to assess:

– flow of questions,

– presence of sensitive questions,

– appropriateness of the categorization of variables,

– clarity of the Q by Q instructions to the interviewer

• Pilot testing

– In addition to the above, the flow of the whole process

Quality Assurance: Data Management

• Designing data collection
– Layout, questions to ask, sequence of questions, phrasing of questions, response categories, skip patterns
– Collect and record “raw”, not processed, information (e.g., age)
– Codebook: the link between the questionnaire and the data entered in the computer

Code Book Example

Variable   QNo   Meaning            Codes                      Format
Q1Id       Q1    Quest. no          1-750                      C 3
Q2Sex      Q2    Respondent's sex   1 male, 2 female           N 1.0
Q3Child    Q3    No of children     99 no response             N 2.0
Q4Wt       Q4    Weight in kg       999 not recorded           N 3.1
Q5roof     Q5    Roof type          1 RCC, 2 Cement sheet,     N 2.0
                                    3 Tin sheet, 4 Thatched,
                                    Other (specify)

Quality Assurance: Use of a Code Book

• Variable names
– Up to 8 characters, a-z and 0-9; must start with a letter
– A combination of question number and description (e.g., q3age)

• Meaning
– A short text description of the meaning of the variable
– SPSS can incorporate this information as variable labels and display it in the output
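As a minimal sketch of how this looks in an SPSS syntax file, the entries from the sample code book above could be declared as follows (an illustration, not the actual study files):

    * Attach descriptive labels from the code book to the variables.
    VARIABLE LABELS
      q2sex   'Respondent''s sex'
      q3child 'Number of children'
      q4wt    'Weight in kg'.
    * Attach text labels to the numerical codes.
    VALUE LABELS
      q2sex 1 'male' 2 'female'.

Note the doubled apostrophe in 'Respondent''s sex': this is how SPSS includes a quote character inside a quoted label.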

Quality Assurance: Use of a Code Book

• Codes
– Try to use numerical codes

• Pre-decide codes for no response and missing values:

– Question could not be asked or is not applicable (e.g., pregnancy outcome)

– Question was asked but the respondent did not reply (e.g., salary)

– Respondent replied “don’t know”
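These pre-decided codes can then be registered in the analysis software so they are kept out of calculations. A minimal SPSS sketch, using the missing-value codes from the sample code book:

    * 99 = no response, 999 = not recorded, as defined in the code book.
    MISSING VALUES q3child (99).
    MISSING VALUES q4wt (999).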

Quality Control

• Observation of procedures and the performance of staff members to identify obvious protocol deviations

• Strategies include:

– Over-the-shoulder observation of staff

– Taping all interviews and reviewing a random sample

– Ongoing field supervision

– Field editing by the interviewer as well as the field supervisor

– Office editing, which includes coding

– Log book maintenance

– Statistical assessment of trends over time in the performance of each observer/interviewer/technician (see the sketch below)
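For the last strategy, a simple SPSS sketch that summarizes a measurement by interviewer and month, so drifting or deviant interviewers stand out (the variables interviewer and month are hypothetical):

    * Mean recorded weight per interviewer per month; trends or
    * outliers flag interviewers who may need retraining.
    MEANS TABLES=q4wt BY interviewer BY month
      /CELLS=MEAN COUNT STDDEV.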

Data Management: Audit Trail

• The researcher should be able to trace each piece of information back to the original document:

– An ID included in the original documents and in the dataset

– All corrections must be documented and explained

– All modifications to the dataset must be documented by command files

– Each analysis must be documented by a command file

• The purpose of the audit trail is to:

– protect yourself against mistakes, errors, waste of time and loss of information

– enable external audit (revision)

Data Management: Handling of Data

• Entering data

– Use a professional data entry program such as EpiData

• Preparations

– Complete the codebook

– Examine questionnaires for obvious inconsistencies and skip patterns

Data Management: Handling of Data

• Error prevention:

– Set up a data entry form resembling your questionnaire

– Define valid values before entering data

– Use double data entry by two different operators; compare the contents to get a list of discrepancies (EpiInfo)

– Correct errors in both files and run a new comparison

First Inspection of Data: Error Finding

• Add variable and value labels to your data using a syntax command

• Searching for errors (illustrated in the sketch after this list):

– Make printouts of a codebook generated from the data, an overview of variables, and simple frequency tables of appropriate variables

– Compare the generated codebook with the original codebook and check that the label information is correct

– Inspect the generated summary/frequency tables for illegal or improbable minimum and maximum values of variables, and for inconsistencies (e.g., a 250-year age, a pregnant male, a 23-year-old woman with a 19-year-old son)

• Calculate the error rate:

– Randomly select 10%, or at least 40, of your questionnaires and re-enter them into a new file
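A minimal SPSS sketch of such an inspection run, using variables from the sample code book:

    * Frequency tables reveal illegal codes in categorical variables.
    FREQUENCIES VARIABLES=q2sex q3child q5roof.
    * Minimum and maximum expose improbable values such as a 250-year age.
    DESCRIPTIVES VARIABLES=q4wt
      /STATISTICS=MEAN MIN MAX.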

Correction of Errors: Documentation

• If errors are discovered

– Make the corrections in a command file (an SPSS syntax file); this provides full documentation of the changes made to the dataset (see the sketch below)

• If errors are discovered when comparing files after double data entry

– You can make corrections directly in the data entered, provided you end this step with a comparison of the two files entered and corrected
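A sketch of such a documented correction in an SPSS syntax file (the ID and values here are hypothetical; note that the questionnaire number is a character variable in the sample code book, hence the quotes):

    * Correction: for questionnaire ID 104 the weight was entered
    * as 650; the source document shows 65.0 kg.
    IF (q1id = '104') q4wt = 65.0.
    EXECUTE.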

Correction of Errors: Documentation

• Split the process into distinct and well-defined steps, and make sure your documentation from one step to the next is consistent

• Archive

– Once you have a “clean”, documented version of your primary data, save one copy in a safe place and do your work on another copy

Analysis

• Make sure you use the right dataset

– It is recommended to create command files for analysis that start with the command reading the dataset (see the sketch below)

• Beware of the late discovery of errors and inconsistencies
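A sketch of the opening of such an analysis command file (the file path is hypothetical):

    * Always begin by reading the archived, cleaned dataset, so
    * every analysis is tied to a known version of the data.
    GET FILE='C:\study\survey_clean.sav'.
    FREQUENCIES VARIABLES=q2sex q3child.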

Backing Up vs Archiving

• Backing up

– An everyday activity

– Its purpose is to enable you to restore your data and documents in case of destruction or loss of data

– Covers not only datasets, but also the command files modifying your data and written documents such as the protocol, log book and other documenting information

• Archiving

– Takes place once or a few times during the life of the project

– Its purpose is to preserve your data and documents for a more distant future, maybe even to allow other researchers access to the information