SADC Course in Statistics Types and Sources of Errors in Statistical Data




Types of Errors

• In general, there are two types of errors:

a. non-sampling errors and

b. sampling errors.

• It is important for a researcher to be aware of these errors, in particular non-sampling errors, so that they can be either minimised or eliminated from the data collected.


Non-sampling errors

– These are errors that arise during the course of all data collection activities.

– In summary, they have the following characteristics:

• they exist in both sample survey and census data.

• they are difficult to measure.


Sources of non-sampling errors

Non-sampling errors arise from:

• defects in the sampling frame.

• failure to identify the target population.

• non-response.

• responses given by respondents.

• data processing and

• reporting, among others.


Defects in the sampling frame

• These result in coverage errors.

• Coverage errors occur when there is an omission, duplication or wrongful inclusion of units in the sampling frame.

• Omissions are referred to as ‘under coverage’ while duplications and wrongful inclusions are called ‘over coverage’.

• These errors are caused by sampling frames that are inaccurate, incomplete, inadequate, out of date or that contain duplicates.

• Coverage errors may also occur in field operations, that is, when an enumerator misses several households or persons during the interviewing process.
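The three kinds of frame defect described above can be illustrated with a short check. This is only a minimal sketch: it assumes that a reference list of the target population is available for comparison, which is often not the case in practice, and the household identifiers and the function name `coverage_report` are hypothetical.

```python
from collections import Counter

def coverage_report(frame_ids, target_ids):
    """Compare a sampling frame with a reference list of the target population."""
    target = set(target_ids)
    counts = Counter(frame_ids)
    return {
        # Units in the target population that the frame omits (under coverage).
        "omitted": sorted(target - set(counts)),
        # Units listed more than once in the frame (over coverage).
        "duplicated": sorted(u for u, c in counts.items() if c > 1),
        # Units in the frame that are not in the target population (over coverage).
        "wrongly_included": sorted(set(counts) - target),
    }

# Hypothetical household identifiers, for illustration only.
frame = ["H01", "H02", "H02", "H04", "H09"]
target = ["H01", "H02", "H03", "H04", "H05"]

report = coverage_report(frame, target)
for kind, units in report.items():
    print(kind, units)
```

In this made-up example, H03 and H05 are omitted, H02 is duplicated and H09 is wrongly included.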


Failure to Identify Target Population

• This occurs when the target population is not clearly defined because imprecise definitions or concepts are used, or when the survey population does not reflect the target population due to an inadequate sampling frame and poor coverage rules.


Response

• Response errors result from data that have been requested, provided, received or recorded incorrectly.

• They may occur as a result of inefficiencies with the questionnaire, the interviewer, the respondent or the survey process.


a. Poor questionnaire design

• The content and wording of the questionnaire may be misleading and the layout of the questionnaire may make it difficult to accurately record responses.

• As a rule, questions in a questionnaire should not be loaded, double-barrelled, misleading or ambiguous, and should be directly relevant to the objectives of the survey.

• It is essential to pilot test questionnaires to identify questionnaire flow and question wording problems, and allow sufficient time for improvements to be made to the questionnaire.


• The questionnaire should then be re-tested to ensure changes made do not introduce other problems.


b. Interviewer bias

• An interviewer may influence the way a respondent answers survey questions.

• To prevent this, interviewers must be trained to remain neutral throughout the interviewing process and must pay close attention to the way they ask each question.


c. Respondent errors

• These arise through the respondent providing inaccurate or wrong information.

• They occur because of memory biases or respondents giving inaccurate or false information when they believe that they are protecting their personal interests or integrity.

• They can also arise from the way the respondent interprets the questionnaire and the wording of the answer that the respondent gives.

• Careful questionnaire design and effective questionnaire testing can overcome these problems to some extent.


d. Problems with the survey process

• Errors can also occur because of problems with the actual survey process, such as using proxy responses (that is, taking answers from someone other than the respondent) or lacking control over the survey procedure.


Non-Response

• Non-response results when data is not collected from respondents.

• The proportion of these non-respondents in the sample is called the non-response rate.

• Non-response can be either total or partial.

• Total non-response or unit non-response can arise if a respondent cannot be contacted (because the sampling frame is incomplete or out of date), is not at home, is unable to respond because of language difficulties or illness, or refuses outright to answer any questions, or if the dwelling unit is vacant.

• Other respondents may indicate that they simply don't have the time to complete the interview or survey form.


• When conducting surveys it is important to document information on why a respondent has not responded.

• Partial non-response or item non-response can occur when a respondent replies to some but not all questions of the survey.

• This can arise due to memory problems, inadequate information or an inability to answer a particular question/section of the questionnaire.

• A respondent may refuse to answer if:

a. they find questions particularly sensitive, or

b. they have been asked too many questions.


• To reduce non-response, the following approaches can be used:

– care should be taken in questionnaire design through the use of simple questions.

– pilot testing of the questionnaire.

– explaining survey purposes and uses.

– assuring confidentiality of responses.

– public awareness activities including discussions with key organisations and interest groups, news releases, media interviews and articles.
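As an illustration of the unit and item non-response described in this section, both rates can be computed with a short sketch like the one below. The survey data, the field names (`age`, `income`) and the convention of recording an unanswered question as `None` are assumptions made for the example, not part of the course material.

```python
def unit_nonresponse_rate(sample_size, responding_units):
    """Share of sampled units from which no data were collected."""
    return (sample_size - responding_units) / sample_size

def item_nonresponse_rates(responses, questions):
    """For each question, the share of responding units that left it unanswered."""
    n = len(responses)
    return {
        q: sum(1 for r in responses if r.get(q) is None) / n
        for q in questions
    }

# Hypothetical survey: 5 households sampled, 4 responded.
# None marks a question the respondent did not answer.
responses = [
    {"age": 34, "income": 1200},
    {"age": 51, "income": None},
    {"age": 27, "income": 900},
    {"age": 45, "income": None},
]

unit_rate = unit_nonresponse_rate(sample_size=5, responding_units=len(responses))
item_rates = item_nonresponse_rates(responses, ["age", "income"])

print(f"unit non-response rate: {unit_rate:.0%}")
for question, rate in item_rates.items():
    print(f"item non-response for {question}: {rate:.0%}")
```

Here one of the five sampled households did not respond at all (20% unit non-response), and half of the responding households skipped the income question, which may suggest that the question is sensitive.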


Processing

• Processing errors occur at various stages of data processing such as data cleaning, data capture and editing.

• Data cleaning involves carrying out preliminary checks before entering the data into the processing system.

• Coder bias is usually a result of poor training or incomplete instructions, variability in coder performance and data entry errors.


• Inadequate checking and quality management at this stage can introduce data loss (where data is not entered into the system) and data duplication (where the same data is entered into the system more than once) thus introducing errors in data.

• To minimise these errors, processing staff should be given adequate training, instructions and realistic workloads.
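A simple quality check for data loss and data duplication is to compare the questionnaires that were collected with the records that were entered. The sketch below assumes that each questionnaire carries a unique serial number; the numbers and the function name `entry_check` are hypothetical.

```python
from collections import Counter

def entry_check(collected_ids, entered_ids):
    """Return questionnaires never entered (data loss) and those entered
    more than once (data duplication)."""
    counts = Counter(entered_ids)
    lost = sorted(set(collected_ids) - set(counts))
    duplicated = sorted(i for i, c in counts.items() if c > 1)
    return lost, duplicated

# Hypothetical questionnaire serial numbers.
collected = [101, 102, 103, 104, 105]
entered = [101, 102, 102, 104, 105, 105]

lost, duplicated = entry_check(collected, entered)
print("not entered:", lost)
print("entered more than once:", duplicated)
```

A check of this kind only finds missing and repeated records. It does not detect values that were keyed in incorrectly, which would need other controls such as double entry or range checks.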


Time Period Bias

• This occurs when a survey is conducted during an unrepresentative time period.

• Survey timing is thus important and failure to recognise this introduces errors in data.


Analysis and Estimation

• Analysis errors include any errors that occur when the wrong analytical tools are used or when preliminary results are used instead of the final ones.

• Errors that occur during the publication of the data results are also considered as analysis errors.

• Estimation errors occur when inappropriate or inaccurate weights are used in the estimation procedure thus introducing errors to the data.

• They also occur when wrong estimators are selected by the analyst.
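The effect of inappropriate weights can be seen in a small worked example. The sketch assumes a hypothetical population of 800 urban and 200 rural households, with three households sampled from each group, so that each sampled household has a design weight equal to the number of population households it represents. All figures are invented for illustration.

```python
def weighted_mean(values, weights):
    """Weighted estimate of a population mean."""
    return sum(w * y for y, w in zip(values, weights)) / sum(weights)

# Hypothetical sample: 3 urban households followed by 3 rural households.
values = [4, 5, 6, 8, 9, 10]

# Design weights: population households represented by each sampled household
# (800 urban / 3 sampled, 200 rural / 3 sampled).
design_weights = [800 / 3] * 3 + [200 / 3] * 3

# Inappropriate weights: treating every sampled household as equally
# representative, although rural households were over-sampled.
equal_weights = [1] * 6

correct = weighted_mean(values, design_weights)
incorrect = weighted_mean(values, equal_weights)

print(f"estimate with design weights: {correct:.2f}")
print(f"estimate with equal weights:  {incorrect:.2f}")
```

With the design weights the estimate is 5.8, while equal weights give 7.0, because the over-sampled rural households pull the unweighted mean upwards.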


Reducing non-sampling errors

• Non-sampling errors can be minimised by adopting any of the following approaches:

– using an up-to-date and accurate sampling frame.

– careful selection of the time the survey is conducted.

– planning for follow-up of non-respondents.

– careful questionnaire design.

– providing thorough training and periodic retraining of interviewers and processing staff.

– designing good systems to capture errors that occur during the process of collecting data, sometimes called Data Quality Assurance Systems.


Sampling error

• Sampling error refers to the difference between the estimate derived from a sample survey and the 'true' value that would result if a census of the whole population were taken under the same conditions.

• These are errors that arise because data has been collected from a part, rather than the whole of the population.

• Because of this, sampling errors are restricted to sample surveys, unlike non-sampling errors, which can occur in both sample survey and census data.


• There are no sampling errors in a census because the calculations are based on the entire population.

• They are measurable from the sample data in the case of probability sampling.

• Sampling errors will be discussed in more detail in more advanced modules of the training programme.
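To illustrate what "measurable from the sample data" means, the sketch below estimates the standard error of a sample mean. It assumes simple random sampling and ignores the finite population correction. The sample values are invented, and other probability designs would need different formulas.

```python
import math
import statistics

def standard_error_of_mean(sample):
    """Estimated standard error of the sample mean under simple random
    sampling, ignoring the finite population correction."""
    return statistics.stdev(sample) / math.sqrt(len(sample))

# Hypothetical sample of eight observations.
sample = [12, 15, 11, 14, 13, 16, 10, 13]

estimate = statistics.mean(sample)
se = standard_error_of_mean(sample)

print(f"estimate = {estimate:.2f}, standard error = {se:.2f}")
```

The standard error is computed entirely from the sample itself. No comparable calculation is available for most non-sampling errors.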


Factors Affecting Sampling Error

Sampling error is affected by a number of factors including:

a. sample size.

• In general, larger sample sizes decrease the sampling error; however, this decrease is not directly proportional.

• As a rough rule of thumb, you need to increase the sample size fourfold to halve the sampling error, but bear in mind that non-sampling errors are likely to increase with large samples.

b. the sampling fraction.

• This is of lesser influence, but as the sample size increases as a fraction of the population, the sampling error should decrease.


c. the variability within the population.

• More variable populations give rise to larger errors, as the samples or the estimates calculated from different samples are more likely to have greater variation.

• The effect of variability within the population can be reduced by the use of stratification, which allows some of the variability in the population to be explained.

d. sample design.

• An efficient sampling design will help in reducing sampling error.
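The effects of sample size, sampling fraction and population variability can be sketched with the usual formula for the standard error of a mean under simple random sampling without replacement. The function name `se_mean` and all input values are assumptions chosen for illustration, and `sd` is taken to be the population standard deviation computed with an N - 1 divisor.

```python
import math

def se_mean(sd, n, N=None):
    """Standard error of a sample mean under simple random sampling.

    sd: population standard deviation
    n:  sample size
    N:  population size; if given, the finite population correction is applied
    """
    se = sd / math.sqrt(n)
    if N is not None:
        se *= math.sqrt(1 - n / N)  # finite population correction
    return se

# a. Sample size: a fourfold increase in n halves the standard error.
se_100 = se_mean(sd=10, n=100)
se_400 = se_mean(sd=10, n=400)

# b. Sampling fraction: the same n taken from a smaller population.
se_large_pop = se_mean(sd=10, n=100, N=10000)  # sampling fraction 1%
se_small_pop = se_mean(sd=10, n=100, N=200)    # sampling fraction 50%

# c. Variability: doubling the standard deviation doubles the standard error.
se_variable = se_mean(sd=20, n=100)

print(se_100, se_400)
print(round(se_large_pop, 3), round(se_small_pop, 3))
print(se_variable)
```

The sampling fraction has little effect while it is small (1% in the first case), which is consistent with the remark above that it is of lesser influence.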


Characteristics of the sampling error

• generally decreases in magnitude as the sample size increases (but not proportionally).

• depends on the variability of the characteristic of interest in the population.

• can be accounted for and reduced by an appropriate sample plan.

• can be measured and controlled in probability sample surveys.
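One way in which a sample plan can reduce sampling error is stratification. The sketch below compares the variance of the estimated mean under simple random sampling with the variance under stratified sampling with proportional allocation. It is a simplified illustration: the tiny population is invented, sampling is treated as with replacement (no finite population correction), and the function names are hypothetical.

```python
import statistics

def variance_srs(strata, n):
    """Variance of the sample mean under simple random sampling
    (with replacement) from the whole population."""
    everyone = [y for values in strata.values() for y in values]
    return statistics.pvariance(everyone) / n

def variance_stratified(strata, n):
    """Variance of the stratified mean with proportional allocation
    (with replacement within strata): only within-stratum variability counts."""
    total = sum(len(values) for values in strata.values())
    within = sum(
        len(values) / total * statistics.pvariance(values)
        for values in strata.values()
    )
    return within / n

# Hypothetical population split into two very different strata.
strata = {
    "urban": [10, 12, 14, 12],
    "rural": [30, 32, 34, 32],
}

v_srs = variance_srs(strata, n=4)
v_strat = variance_stratified(strata, n=4)

print(f"variance under simple random sampling: {v_srs}")
print(f"variance under stratified sampling:    {v_strat}")
```

Because almost all of the variability in this example lies between the two strata rather than within them, stratification removes most of the sampling variance (0.5 compared with 25.5).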


Reducing sampling error

If sampling principles are applied carefully within the constraints of available resources, sampling error can be kept to a minimum.


Sources

– http://www.nss.gov.au/nss/home.nsf/SurveyDesignDoc/4354A8928428F834CA2571AB002479CE?OpenDocument

– http://www.statcan.ca/english/edu/power/ch6/nonsampling/nonsampling.htm

– http://www.statcan.ca/english/edu/power/ch6/sampling/sampling.htm
