
VALIDITY AND RELIABILITY

For the statistical consultant working with social science researchers, the estimation of reliability and validity is a frequently encountered task. Measurement issues differ in the social sciences in that they concern the quantification of abstract, intangible, and unobservable constructs. In many instances, then, the meaning of the quantities obtained is only inferred.

Let us begin with a general description of the paradigm we are dealing with. Most concepts in the behavioral sciences have meaning within the context of the theory of which they are a part. Each concept therefore has an operational definition governed by the overarching theory. If a concept is involved in testing hypotheses to support the theory, it has to be measured. So the first decision the researcher faces is: how shall the concept be measured? That is, what type of measure should be used? At a very broad level the type of measure can be observational, self-report, interview, and so on. These types ultimately take the shape of a more specific form: observation of ongoing activity or of videotaped events; self-report measures such as questionnaires, which can be open-ended or closed-ended, or Likert-type scales; and interviews, which may be structured, semi-structured, or unstructured, and open-ended or closed-ended. Needless to say, each type of measure has specific issues that must be addressed to make the measurement meaningful, accurate, and efficient.

Another important consideration is the population for which the measure is intended. This decision depends not so much on the theoretical paradigm as on the immediate research question at hand.


A third point that needs mentioning is the purpose of the scale or measure. What does the researcher want to do with the measure? Is it developed for a specific study, or with the anticipation of extensive use with similar populations?

Once some of these decisions are made and a measure is developed, which is a careful and tedious process, the relevant questions to raise are: how do we know that we are indeed measuring what we want to measure, given that the construct we are measuring is abstract? And can we be sure that, if we repeated the measurement, we would get the same result? The first question is related to validity, the second to reliability. Validity and reliability are two important characteristics of behavioral measures and are referred to as psychometric properties.

It is important to bear in mind that validity and reliability are not all-or-none issues but matters of degree.

Validity:

Very simply, validity is the extent to which a test measures what it is supposed to measure. The question of validity is raised in the context of the three points made above: the form of the test, the purpose of the test, and the population for whom it is intended. Therefore, we cannot ask the general question "Is this a valid test?" The questions to ask are "How valid is this test for the decision that I need to make?" or "How valid is the interpretation I propose for the test?" We can divide the types of validity into logical and empirical.

Content Validity:

When we want to find out whether the entire content of the behavior/construct/area is represented in the test, we compare the test tasks with the content of the behavior. This is a logical method, not an empirical one. For example, if we want to test knowledge of American geography, it is not fair to have most questions limited to the geography of New England.


Face Validity:

Basically, face validity refers to the degree to which a test appears to measure what it purports to measure.

Criterion-Oriented or Predictive Validity:

When you expect future performance based on the scores currently obtained on the measure, correlate the obtained scores with the later performance. The later performance is called the criterion and the current score is the predictor. This is an empirical check on the value of the test, a criterion-oriented or predictive validation.
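
As a minimal sketch of how such a coefficient might be computed, the Python snippet below correlates made-up current test scores with a made-up later criterion (the data and variable names are purely illustrative, not from any real study):

import numpy as np
from scipy import stats

# Hypothetical data: test scores obtained now and supervisor ratings of
# job performance collected a year later for the same ten examinees.
test_scores = np.array([52, 47, 63, 58, 71, 44, 66, 59, 49, 61])
criterion = np.array([3.1, 2.8, 3.9, 3.4, 4.2, 2.5, 4.0, 3.3, 2.9, 3.7])

# The predictive-validity coefficient is simply the product-moment
# correlation between predictor (current score) and criterion.
r, p = stats.pearsonr(test_scores, criterion)
print(f"validity coefficient r = {r:.2f}")

A concurrent-validity coefficient is obtained the same way; the only difference is that both measures are collected at the same time.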

Concurrent Validity:

Concurrent validity is the degree to which the scores on a test are related to the scores on another, already established test administered at the same time, or to some other valid criterion available at the same time. For example, when a new, simple test is to be used in place of an old, cumbersome one that is considered useful, measurements are obtained on both at the same time. Logically, predictive and concurrent validation are the same; the term concurrent validation is used to indicate that no time elapses between measures.

Construct Validity:

Construct validity is the degree to which a test measures an intended hypothetical construct. Psychologists often assess or measure abstract attributes, or constructs. The process of validating the interpretations about that construct, as indicated by the test score, is construct validation. This can be done experimentally. For example, suppose we want to validate a measure of anxiety and we hypothesize that anxiety increases when subjects are under the threat of an electric shock; then the threat of an electric shock should increase anxiety scores (note: not all construct validation is this dramatic!).
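
One hedged sketch of how that experimental check could be analyzed is a simple two-group comparison; the anxiety scores below are invented for illustration, and the t-test is just one of several ways the comparison might be made:

import numpy as np
from scipy import stats

# Hypothetical anxiety scores: one group measured under threat of an
# electric shock, the other under neutral conditions.
threat = np.array([28, 31, 25, 34, 30, 27, 33, 29])
no_threat = np.array([19, 22, 18, 24, 21, 17, 23, 20])

# If the instrument really measures anxiety, the threat group should
# score reliably higher.
t, p = stats.ttest_ind(threat, no_threat)
print(f"t = {t:.2f}, p = {p:.4f}")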


A correlation coefficient is a statistical summary of the relation between two variables. It is the most common way of reporting the answer to such questions as the following: Does this test predict performance on the job? Do these two tests measure the same thing? Do the ranks of these people today agree with their ranks a year ago? (Depending on the form of the data, this is a rank correlation or a product-moment correlation.)
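
A brief illustration of the two coefficients just mentioned, again with invented scores for the same eight people measured a year apart:

import numpy as np
from scipy import stats

# Hypothetical scores for the same eight people measured a year apart.
last_year = np.array([88, 75, 92, 67, 81, 70, 95, 78])
this_year = np.array([85, 72, 90, 70, 79, 68, 96, 74])

# Product-moment (Pearson) correlation uses the scores themselves;
# rank (Spearman) correlation uses only their rank order.
r_pm, _ = stats.pearsonr(last_year, this_year)
r_rank, _ = stats.spearmanr(last_year, this_year)
print(f"product-moment r = {r_pm:.2f}, rank correlation = {r_rank:.2f}")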

According to Cronbach, to the question "What is a good validity coefficient?" the only sensible answer is "the best you can get," and it is unusual for a validity coefficient to rise above 0.60, though that is far from perfect prediction.

All in all, we always need to keep the contextual questions in mind: What is the test going to be used for? How expensive is it in terms of time, energy, and money? What implications do we intend to draw from the test scores?

Reliability:

Research requires dependable measurement (Nunnally). Measurements are reliable to the extent that they are repeatable and that any random influence which tends to make measurements differ from occasion to occasion or circumstance to circumstance is a source of measurement error (Gay). Reliability is the degree to which a test consistently measures whatever it measures. Errors of measurement that affect reliability are random errors; errors of measurement that affect validity are systematic or constant errors.

Test-retest, equivalent-forms, and split-half reliability are all determined through correlation.


Test-retest Reliability:

Test-retest reliability is the degree to which scores are consistent over time. It indicates score variation that occurs from testing session to testing session as a result of errors of measurement. Problems: memory, maturation, learning.

Equivalent-Forms or Alternate-Forms Reliability:

Two tests are constructed that are identical in every way except for the actual items included. This approach is used when it is likely that test takers will recall responses made during the first session and when alternate forms are available. Correlate the two sets of scores; the obtained coefficient is called the coefficient of stability or coefficient of equivalence. Problem: the difficulty of constructing two forms that are essentially equivalent.

Both of the above approaches require two administrations.

Split-Half Reliability:

Split-half reliability requires only one administration and is especially appropriate when the test is very long. The most common way to split the test in two is the odd-even strategy. Since longer tests tend to be more reliable, and since the split-half coefficient represents the reliability of a test only half as long as the actual test, a correction, the Spearman-Brown prophecy formula, must be applied to the coefficient. Split-half reliability is a form of internal consistency reliability.
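
A minimal sketch of the odd-even split and the Spearman-Brown step follows; the item responses are simulated for illustration, and the correction applied is the standard full-length formula r_full = 2 r_half / (1 + r_half):

import numpy as np
from scipy import stats

# Hypothetical 0/1 responses of 30 examinees to a 20-item test, generated
# so that people with higher "ability" pass more items.
rng = np.random.default_rng(0)
ability = rng.normal(size=(30, 1))
items = (rng.random((30, 20)) < 1 / (1 + np.exp(-ability))).astype(int)

# Odd-even split: total score on the odd items and on the even items.
odd_half = items[:, 0::2].sum(axis=1)
even_half = items[:, 1::2].sum(axis=1)

# Correlate the two half-test scores, then apply the Spearman-Brown
# prophecy formula to estimate the reliability of the full-length test.
r_half, _ = stats.pearsonr(odd_half, even_half)
r_full = 2 * r_half / (1 + r_half)
print(f"half-test r = {r_half:.2f}, Spearman-Brown corrected = {r_full:.2f}")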

Rationale Equivalence Reliability:

Rationale equivalence reliability is not established through correlation but rather estimates internal consistency by determining how all items on a test relate to all other items and to the total test.


Internal Consistency Reliability:

Internal consistency reliability is determined by how all items on the test relate to all other items. The Kuder-Richardson estimate of reliability is essentially equivalent to the average of the split-half reliabilities computed for all possible halves.
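
As a sketch, and assuming dichotomously scored (0/1) items, the Kuder-Richardson formula 20 can be computed directly from an item-response matrix. The function below implements the standard KR-20 formula, KR20 = k/(k-1) * (1 - sum(p*q) / variance of total scores), on a small invented data set:

import numpy as np

def kr20(items: np.ndarray) -> float:
    """Kuder-Richardson formula 20 for a matrix of 0/1 item scores
    (rows are examinees, columns are items)."""
    k = items.shape[1]                         # number of items
    p = items.mean(axis=0)                     # proportion passing each item
    q = 1 - p                                  # proportion failing each item
    total_var = items.sum(axis=1).var(ddof=1)  # variance of total test scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_var)

# Hypothetical responses of six examinees to five dichotomous items.
responses = np.array([
    [1, 1, 1, 0, 1],
    [1, 0, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 1, 0, 0, 1],
    [1, 1, 1, 1, 0],
])
print(f"KR-20 = {kr20(responses):.2f}")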

Standard Error of Measurement:

Reliability can also be expressed in terms of the standard error of measurement. It is an estimate of how often you can expect errors of a given size.
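
Although the text above does not give the formula, the standard textbook expression is SEM = SD * sqrt(1 - r), where SD is the standard deviation of the test scores and r is the reliability coefficient. A quick sketch with hypothetical numbers:

import numpy as np

def standard_error_of_measurement(sd: float, reliability: float) -> float:
    """Textbook formula: SEM = SD * sqrt(1 - r)."""
    return sd * np.sqrt(1 - reliability)

# Hypothetical test with a standard deviation of 10 points and a
# reliability coefficient of 0.91.
print(f"SEM = {standard_error_of_measurement(10.0, 0.91):.1f} points")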

REFERENCES

Berk, R. (1979). Generalizability of behavioral observations: A clarification of interobserver agreement and interobserver reliability. American Journal of Mental Deficiency, 83(5), 460-472.

Carmines, E., & Zeller, R. (1979). Reliability and validity assessment. Beverly Hills, CA: Sage Publications.

Cronbach, L. (1990). Essentials of psychological testing. New York: Harper & Row.

Gay, L. (1987). Educational research: Competencies for analysis and application. Columbus, OH: Merrill Publishing Co.

Guilford, J. (1954). Psychometric methods. New York: McGraw-Hill.

Nunnally, J. (1978). Psychometric theory. New York: McGraw-Hill.

Winer, B., Brown, D., & Michels, K. (1991). Statistical principles in experimental design (3rd ed.). New York: McGraw-Hill.
