Norm-Referenced Measurements
By
Shadia Abd Elkader
Norm-referenced: a metric based on comparing an individual's performance with that of a specific group.
A norm-referenced test is a type of test, assessment, or evaluation in which the tested individual is compared to a sample of his or her peers (referred to as a "normative sample"). The term "normative assessment" refers to the process of comparing one test-taker to his or her peers.
Norm-referenced measures are designed to compare students (i.e., to spread student scores along a bell curve, with some students performing very well, most performing at an average level, and a few performing poorly).
Tests that set goals for students based on the average student's performance are norm-referenced tests.
The SAT, Graduate Record Examination (GRE), and Wechsler Intelligence Scale for Children (WISC) compare individual student performance to the performance of a normative sample. Test-takers cannot "fail" a norm-referenced test, as each test-taker receives a score that compares the individual to others who have taken the test, usually expressed as a percentile.
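Because a norm-referenced score only says where a test-taker falls within the normative sample, the core computation is a percentile rank. The sketch below is a minimal illustration, assuming the common convention of counting the scores below the raw score plus half of any tied scores (other conventions exist); the normative sample is made up.

```python
from bisect import bisect_left, bisect_right


def percentile_rank(score, norm_sample):
    """Percent of the normative sample scoring below `score`,
    with tied scores counted as half (one common convention)."""
    ordered = sorted(norm_sample)
    below = bisect_left(ordered, score)           # scores strictly lower
    ties = bisect_right(ordered, score) - below   # scores exactly equal
    return 100.0 * (below + 0.5 * ties) / len(ordered)


# Hypothetical normative sample of ten raw scores
norm = [12, 15, 15, 18, 20, 21, 23, 25, 27, 30]
print(percentile_rank(21, norm))  # 55.0
```

A raw score of 21 beats five of the ten normative scores and ties one, so it sits at the 55th percentile regardless of how many items were answered correctly.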
Norm-referenced tests aim to produce an overall distribution of student scores that has a 'normal' (bell-shaped) distribution curve.
An obvious disadvantage of norm-referenced tests is that they cannot measure the progress of the population as a whole, only where individuals fall within the whole. Thus, only measurement against a fixed goal can gauge the success of an educational reform program that seeks to raise the achievement of all students against new standards designed to assess skills beyond choosing among multiple choices.
However, while this is attractive in theory, in practice the bar has often been moved in the face of excessive failure rates, and improvement sometimes occurs simply because of familiarity with and teaching to the same test.
Designing Norm-Referenced Measures
1. Selecting a conceptual model
2. Developing the objectives of the measure
3. Developing a blueprint
4. Constructing the measure
Definitions
Scientific method: Set of procedures for creating and answering questions
Assumption in psychology: behavior is lawful, determined, and understandable.
The research design should provide empirical, objective, systematic, and controlled observations that can be replicated.
Goals: Describe, Explain, Predict, Control
Definitions
Applied research: solves an existing problem
Basic research: obtains knowledge
Theory: a set of statements that organizes a body of knowledge (Kuhn vs. Popper)
Model: explains an underlying process
Research Process
Research question
Hypothesis: a formally stated expectation of behavior; testable, falsifiable, rational, parsimonious
  Null: no relationship
  Alternative: relationship
Operational definition: translating a construct into measurable observations
Variable: an entity that takes on different values
Attribute: a value on a variable
Definitions
Construct – hypothetical, non-observable entity
Validity: the degree to which legitimate inferences can be made from the operationalization of the theoretical construct.
  Conclusion: establishing a relationship
  Internal: causal (how it works)
  Construct: legitimacy of inferences to the construct
  External: generalization to different subjects and settings
Converging operations: different procedures producing the same results (e.g., schizophrenia: refrigerator mother or dopamine).
Big Picture
Setting Objectives
Stating the purpose of the study is the first step in designing a tool. If a conceptual model for the study already exists, stating and determining objectives becomes much easier.
▪ These objectives should be derived from, and be consistent with, the conceptual model appropriate to the topic under study.
The conceptual framework defines, as specifically as possible, the relevant domain of content to be assessed by the measure, and specifies the type of behavior the subject will exhibit to demonstrate that the purpose of the measure has been met.
Objectives should be stated using the following approach:
(1) a description of the respondent;
(2) a delineation of the kind of behavior the respondent will exhibit to demonstrate accomplishment of the objective; and
(3) a statement of the kind of content to which the behavior relates.
This approach to objective explication is quite useful within a norm-referenced measurement context because it results in an outline of content and a list of behaviors that can then be readily used in blueprinting.
The use of taxonomies in explicating and measuring objectives provides several advantages:
A critical aspect of any behavioral objective is the word selected to indicate expected behavior.
A behavioral term by definition is one that is observable and measurable (i.e., behavior refers to any action on the part of an individual that can be seen, felt, or heard by another person).
Cognitive and affective objectives, although they are concerned with thinking and feeling, which are not directly observable, are inferred from psychomotor or behavioral acts. In reality, the same behavioral term can be seen, felt, or heard differently by different people. Because it is impossible to measure every action inherent in a given behavior, different people frequently define the critical behavior to be observed under a given objective quite differently.
When taxonomies are employed, action verbs and critical behaviors to be observed are specified, hence decreasing the possibility that the behaviors will be interpreted differently and increasing the probability that the resulting measure will be reliable and valid.
A measurement must match the level of respondent performance stated in the behavioral objective; that is, a performance verb at the application level of the cognitive taxonomy must be assessed by a cognitive item requiring the same level of performance.
▪ Any discrepancy between the stated objective and the performance required by the instrument or measurement device will result in decreased reliability and validity of the measurement process
For example, suppose the objective for the measurement is to ascertain the ability of practicing nurses to apply gerontological content in their work with aging clients (application level of Bloom's taxonomy), but the measure constructed to assess the objective simply requires a statement, in their own words, of some principles important to the care of the gerontological patient (comprehension level of the taxonomy). The outcomes of the measurement are then not valid, in that the tool does not measure what is intended.
Blueprinting
The next step is to develop a blueprint to establish the specific scope and emphasis of the measure.
For example, consider a blueprint for a measure to assess a patient's compliance with a discharge plan. The four major content areas to be assessed appear as column headings across the top of the table; the critical behaviors to be measured are listed on the left-hand side of the table as row headings.
Blueprint for a measure to assess a patient's compliance with a discharge plan

Objectives | General health knowledge | Medications | Nutrition | Daily activity | Total
Ascertain patient knowledge of the contents of the discharge plan | 3 | 5 | 5 | 5 | 18
Determine patient attitudes toward the contents of the discharge plan | 2 | 2 | 2 | 2 | 8
Evaluate patient compliance with the contents of the discharge plan | 4 | 10 | 10 | 10 | 34
Total | 9 | 17 | 17 | 17 | 60
Each intersection or cell thus represents a particular content-objective pairing, and values in each cell reflect the actual number of each type of item to be included on the measure.
Hence, the table shows that three items will be constructed to assess the content-objective pairing "patient knowledge of the contents of the discharge plan / general health knowledge."
The scope of the measure is defined by the cells, which reflect the domain of items to be measured; the emphasis of the measure, that is, the relative importance of each content-behavior pairing, is ascertained by examining the numbers in the cells.
From the blueprint, one can readily tell the topics about which questions will be asked, the types of critical behaviors subjects will be required to demonstrate, and what is relatively important and unimportant to the constructor.
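Reading emphasis off a blueprint is simple arithmetic on the cell counts. A minimal Python sketch, using the cell counts from the discharge-plan blueprint table; the shortened row labels are illustrative only.

```python
# Cell counts from the discharge-plan blueprint (rows = objectives)
CONTENT_AREAS = ["General health knowledge", "Medications", "Nutrition", "Daily activity"]
BLUEPRINT = {
    "Knowledge": [3, 5, 5, 5],
    "Attitudes": [2, 2, 2, 2],
    "Compliance": [4, 10, 10, 10],
}


def blueprint_emphasis(blueprint, areas):
    """Return the total item count plus the percentage of items
    allotted to each objective (row) and each content area (column)."""
    total = sum(sum(cells) for cells in blueprint.values())
    by_objective = {obj: 100 * sum(cells) / total for obj, cells in blueprint.items()}
    by_area = {
        area: 100 * sum(cells[i] for cells in blueprint.values()) / total
        for i, area in enumerate(areas)
    }
    return total, by_objective, by_area


total, by_objective, by_area = blueprint_emphasis(BLUEPRINT, CONTENT_AREAS)
print(total)                                  # 60
print(round(by_objective["Compliance"], 1))   # 56.7
```

Here more than half of the 60 items target compliance, which is exactly the kind of emphasis content experts would be asked to judge.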
Given the blueprint, the number (or percentage) of items prescribed for each cell would then be constructed. Content validity could then be assessed by presenting content experts with the blueprint and the test and having them judge:
(1) the adequacy of the measure as reflected in the blueprint, that is, whether or not the domain is adequately represented, to ascertain that the most appropriate elements are being assessed;
(2) the fairness of the measure, that is, whether it gives some subjects an unfair advantage over others;
(3) the fit of the method to the blueprint from which it was derived.
Constructing the Measure
The type of measure to be employed is a function of the conceptual model and subsequent operational definition of key variables to be measured.
Every measure is composed of three components:
(1) directions for administration;
(2) a set of items;
(3) directions for obtaining and interpreting scores.
Administration
Considerations to be made in preparing instructions for the administration of a measure:
1. A description of who should administer the measure
• A statement of eligibility
• A list of essential characteristics
• A list of duties
2. Directions for those who administer the measure
• A statement of the purposes for the measure
• Amount of time needed for administration
• A statement reflecting the importance of adhering to directions
• Specifications for the physical environment
• A description of how material will be received and stored
• Specifications for maintaining security
• Provisions for supplementary materials needed
• Recommendations for response to subjects' questions
• Instructions for handling defective materials
• Procedures to follow when distributing the measure
• A schedule for administration
• Directions for collection of completed measures
• Specifications for the preparation of special reports (e.g., irregularity reports)
• Instructions for the delivery and/or preparation of completed measures for scoring
• Directions for the return or disposal of materials
3. Directions for respondents
• A statement regarding information to be given to subjects prior to the data collection session (e.g., materials to be brought along and procedures for how, when, and where data will be collected)
• Instructions regarding completion of the measure, including a request for cooperation, directions to be followed in completing each item type, and directions for when and how to record answers
4. Directions for users of results
• Suggestions for use of results
• Instructions for dissemination of results
The importance of providing this information as an essential component of any measure cannot be overemphasized.
Errors in administration are an important source of measurement error, and their probability of occurrence is greatly increased when directions for administration are not communicated clearly and explicitly in writing.
Standard error of measurement: an estimate of how often a researcher can expect errors of a given size on an instrument.
Characteristics of a Good Test
- The test is valid: Does the test measure what it is being used to measure?
- The test is reliable: Are scores consistent?
- The test is constructed to facilitate ease of taking and scoring
- The test is of an appropriate length
Reliability
Next to validity, reliability is the most important characteristic of assessment results.
Why?
1. It provides the consistency that makes validity possible.
2. It indicates the degree to which various kinds of generalizations are justifiable.
Reliability
Reliability: the consistency of measurement, i.e. how consistent test scores or other assessment results are from one measurement to another.
Standard Error of Measurement (SEM): the estimated amount of variation expected in a score.
Definitions of Reliability
Consistency of a measurement made with a particular test
Consistency of scores upon repeated measurement of the same individuals
Extent to which test produces the same results when used repeatedly under the same conditions
Reliability
Which is more reliable?
Reliability
e = error
Error variance is the variability that exists in a set of scores and is due to factors other than the one being assessed.
Systematic: errors that are consistent.
Random: errors that have no pattern.
Reliability
e = error
Positive error (i.e., raises the score):
Lucky guesses.
Items that give clues to the answer.
Cheating (students, aides, teachers).
Reliability
e = error score
Negative error (i.e., lowers the score):
Not following directions.
Mismarking items.
Room climate/atmosphere.
Hunger, fatigue, illness, “need to go potty”.
Assemblies, ball games, fire drills, etc.
Break-up of a relationship.
Reliability
Summation of Reliability:
1. Reliability refers to the results and not to the instrument itself.
2. Reliability is a necessary but not sufficient condition for validity.
3. The more reliable the assessment, the better.
Evaluation of a Measurement: Reliability and Validity
Reliability refers to the consistency of the measurement.
Validity refers to the accuracy of the measurement.
Which is the most consistent and accurate?
Can one be very consistent but inaccurate?
Would you describe someone as very accurate when he/she is very inconsistent?
Validity: Accuracy of the Measurement
Does the instrument measure the property that it intends to measure?
A measurement device is valid if it measures what it is supposed to measure.
Validity
Validity: appropriateness of test usage in measurement and inference.
Content: item representativeness
Criterion: relationship of the test score to another outcome or measure
  Concurrent: another, similar measure
  Predictive: an outcome in the future
Construct: the extent to which the test measures the theoretical construct
Conclusion Validity
The extent to which the research design is sufficiently precise or powerful to detect effects on the operationalized variable, should they exist.
Threats to Conclusion Validity
Low statistical power
Violated assumptions of statistical tests
Fishing and the error rate problem
Low reliability of measures
Poor reliability of treatment implementation
Random irrelevancies in the setting
Random heterogeneity of respondents
Descriptive: Central Tendency
Mean
Median
Mode
Variability: numbers that summarize the extent to which scores in a distribution differ
Variance
Standard deviation
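All five descriptive statistics are available in Python's standard `statistics` module. A minimal sketch with made-up scores; it assumes the sample (n − 1) variance, and `pvariance`/`pstdev` would be the choice if the scores are the entire population of interest.

```python
import statistics


def describe(scores):
    """Central tendency and variability for a list of scores
    (sample variance and SD, i.e., divided by n - 1)."""
    return {
        "mean": statistics.mean(scores),
        "median": statistics.median(scores),
        "mode": statistics.mode(scores),
        "variance": statistics.variance(scores),
        "sd": statistics.stdev(scores),
    }


# Hypothetical test scores
print(describe([70, 75, 80, 80, 85, 90, 95]))
```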
Z-score
$$z = \frac{\text{score} - \text{mean}}{\text{standard deviation}}$$
Converts scores to a standard metric. Mean = 0, SD = 1
Example: test score = 80, mean = 85, SD = 10, so z = (80 - 85)/10 = -0.5.
Standard scores:
T score = 10(z) + 50
SAT = 100(z) + 500
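Each standard-score scale is just a linear rescaling of z. A minimal sketch reproducing the worked example, assuming the traditional SAT-type scale with mean 500 and SD 100.

```python
def z_score(score, mean, sd):
    """Distance of a score from the mean in SD units."""
    return (score - mean) / sd


def t_score(z):
    """T-score scale: mean 50, SD 10."""
    return 10 * z + 50


def sat_scale(z):
    """SAT-type scale: mean 500, SD 100."""
    return 100 * z + 500


z = z_score(80, 85, 10)
print(z, t_score(z), sat_scale(z))  # -0.5 45.0 450.0
```

Because every scale is linear in z, a score half an SD below the mean stays half an SD below the mean on all of them.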
Correlation and Relationship Patterns
Positive
Negative
None
Curvilinear
Correlation and Regression: The General Linear Model
Formula for a straight line:
$$y = b_0 + b_1 x$$
where b0 = intercept, b1 = slope, y = outcome, and x = program.
Correlation Range: -1.0 to +1.0
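The correlation and the straight-line fit come from the same sums of deviations. A minimal least-squares sketch in plain Python; the program/outcome values are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation: covariation scaled to the range -1.0 to +1.0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def fit_line(x, y):
    """Least-squares intercept b0 and slope b1 for y = b0 + b1*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b1 = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    b0 = my - b1 * mx
    return b0, b1


program = [1, 2, 3, 4, 5]   # hypothetical x values
outcome = [2, 4, 5, 4, 5]   # hypothetical y values
print(pearson_r(program, outcome))  # about 0.775
print(fit_line(program, outcome))   # about (2.2, 0.6)
```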
Standard Error of Estimate
SEE demonstrates the accuracy of prediction
$$SEE = SD_y\sqrt{1 - r_{xy}^2}$$
Example (SD_y = 15, r_xy = .80):
$$SEE = 15\sqrt{1 - (.80)^2} = 15\sqrt{.36} = 15(.60) = 9$$
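The SEE formula translates directly into code. A minimal sketch reproducing the SD = 15, r = .80 example; the band around a predicted score assumes normally distributed prediction errors, and the predicted score of 110 is hypothetical.

```python
from math import sqrt


def standard_error_of_estimate(sd_y, r_xy):
    """SEE = SD_y * sqrt(1 - r_xy**2): the typical spread of actual
    scores around the scores predicted from the regression line."""
    return sd_y * sqrt(1 - r_xy ** 2)


see = standard_error_of_estimate(15, 0.80)
print(round(see, 2))  # 9.0

# Assuming normal errors, roughly 68% of actual scores fall
# within +/- 1 SEE of the predicted score.
predicted = 110  # hypothetical predicted score
print(round(predicted - see, 1), round(predicted + see, 1))
```

Note the two limits: with r = 0 the SEE equals the full SD (prediction adds nothing), and with r = 1 it drops to zero.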
Confidence Intervals
Provides an indicator of how certain we are in reporting scores
CI = obtained score ± z(SEM)
68% level: z = 1.00
85% level: z = 1.44
90% level: z = 1.65
95% level: z = 1.96
99% level: z = 2.58
$$SEM = SD\sqrt{1 - \text{reliability}}$$
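Putting the SEM formula and the z multipliers together gives a score band. A minimal sketch, assuming (as in the text) that the band is centered on the obtained score and that 95% uses z = 1.96; the score, SD, and reliability are hypothetical.

```python
from math import sqrt

# z multipliers for common confidence levels
Z_FOR_LEVEL = {68: 1.00, 85: 1.44, 90: 1.65, 95: 1.96, 99: 2.58}


def sem(sd, reliability):
    """Standard error of measurement: SD * sqrt(1 - reliability)."""
    return sd * sqrt(1 - reliability)


def confidence_interval(obtained, sd, reliability, level=95):
    """Obtained score +/- z * SEM at the chosen confidence level."""
    margin = Z_FOR_LEVEL[level] * sem(sd, reliability)
    return obtained - margin, obtained + margin


# Hypothetical: standard score 100 on a test with SD 15 and reliability .89
low, high = confidence_interval(100, 15, 0.89)
print(round(sem(15, 0.89), 2), round(low, 1), round(high, 1))  # 4.97 90.2 109.8
```

Raising the confidence level widens the band; raising the reliability narrows it.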
Reliability
Reliability: consistency of measurement
Reliability coefficient: a quantitative estimate of the degree of stability
Test-retest
Alternate form
Internal consistency: Cronbach's alpha, Spearman-Brown, Kuder-Richardson
Inter-rater
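Two of the internal-consistency formulas named above are short enough to sketch directly. Cronbach's alpha compares the sum of item variances with the variance of total scores; for items scored 0/1 the same computation corresponds to KR-20. Spearman-Brown projects reliability when test length changes. The 4 × 3 score matrix below is hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(responses):
    """Cronbach's alpha from a respondents-by-items score matrix:
    (k / (k - 1)) * (1 - sum of item variances / variance of totals)."""
    k = len(responses[0])                     # number of items
    items = list(zip(*responses))             # one tuple of scores per item
    item_var = sum(pvariance(item) for item in items)
    total_var = pvariance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - item_var / total_var)


def spearman_brown(r, length_factor):
    """Projected reliability when test length is multiplied by length_factor
    (e.g., 2 to step a split-half correlation up to full length)."""
    return length_factor * r / (1 + (length_factor - 1) * r)


# Hypothetical data: 4 respondents x 3 items
scores = [[2, 3, 3], [4, 4, 5], [3, 3, 4], [5, 5, 5]]
print(round(cronbach_alpha(scores), 3))   # 0.957
print(round(spearman_brown(0.70, 2), 3))  # 0.824
```

The Spearman-Brown line also illustrates why test length appears among the factors that influence reliability: doubling a test with r = .70 is projected to raise reliability to about .82.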
Factors that influence reliability
Test: test length, homogeneity of items, test-retest, score variability, guessing, sample size
Examinee/situation factors
Examinee / Situational Factors
1. Examinee characteristics (health, fatigue, motivation, etc.)
2. Understanding of directions, interpretation of directions, language, faking/malingering
3. Examiner factors: bias, rapport, complex directions, errors in scoring or administration, failure to provide a suitable environment
Discrepancy Procedure
Raw score difference
Standard score difference
Difference considering reliability
Difference from regression-based prediction
Standard Error of Difference
SED provides a determination of a significant difference between any two standard scores
Requires the same SD for both tests; best when it takes into account both the reliability and the correlation of the two tests.
When only reliability is known:
$$SED = SD\sqrt{2 - r_{xx} - r_{yy}}$$
General Procedure for Determining Score Discrepancy
1. Subtract ability from achievement score (don’t do it!)
2. Take reliability of tests into consideration (SED)
3. Take reliability and correlation of tests into consideration (SER)
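$$SED = SD\sqrt{2 - r_{xx} - r_{yy}}$$
Example: ability SS = 105 (r_xx = .89); reading SS = 85 (r_yy = .72)
$$SED = 15\sqrt{2 - .89 - .72} = 15\sqrt{.39} \approx 15(.624) \approx 9.37$$
9.37 = critical value for the difference at the 68% level.
Multiply by 2.58 (p < .01): 24.16, so a 25-point difference is needed.
Multiply by 1.96 (p < .05): 18.37, so a 19-point difference is needed.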
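The SED check is a one-line formula plus a comparison. A minimal sketch reproducing the ability/reading example; the helper names are hypothetical.

```python
from math import sqrt


def sed(sd, r_xx, r_yy):
    """Standard error of the difference when only reliabilities are known."""
    return sd * sqrt(2 - r_xx - r_yy)


def difference_is_significant(score_x, score_y, sd, r_xx, r_yy, z=1.96):
    """True if |score_x - score_y| exceeds z * SED."""
    return abs(score_x - score_y) > z * sed(sd, r_xx, r_yy)


# Ability SS = 105 (r_xx = .89) vs. reading SS = 85 (r_yy = .72)
print(round(sed(15, 0.89, 0.72), 2))                               # 9.37
print(difference_is_significant(105, 85, 15, 0.89, 0.72))          # True  (p < .05)
print(difference_is_significant(105, 85, 15, 0.89, 0.72, z=2.58))  # False (p < .01)
```

The 20-point raw difference clears the .05 threshold (about 18.4 points) but not the .01 threshold (about 24.2 points).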
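SE Residual: adds correlation
Suppose ability and achievement correlate r_xy = .55 (ability SS = 105, r_xx = .89; reading SS = 85, r_yy = .72).
1. Find predicted achievement from ability.
2. Convert the ability score to a z-score:
$$z = \frac{\text{raw} - M}{SD} = \frac{105 - 100}{15} = .33$$
3. Multiply by the correlation:
$$z_{predicted} = r_{xy}\, z_{ability} = .55(.33) = .1815$$
4. Convert to the standard-score scale (M = 100, SD = 15):
$$SS_{predicted} = 100 + (z_{predicted} \times 15) = 100 + (.1815 \times 15) \approx 103$$
5. Find the difference between the predicted and obtained scores:
$$SS_{predicted} - SS_{obtained} = 103 - 85 = 18 \text{ pts}$$
6. Calculate the SE of the residual, first finding the reliability of the residual:
$$r_{residual} = \frac{r_{yy} + r_{xx} r_{xy}^2 - 2 r_{xy}^2}{1 - r_{xy}^2} = \frac{.72 + (.89)(.55)^2 - 2(.55)^2}{1 - (.55)^2} = \frac{.72 + .269 - .605}{.6975} \approx .551$$
$$SER = SD\sqrt{1 - r_{xy}^2}\,\sqrt{1 - r_{residual}} = 15\sqrt{1 - .3025}\,\sqrt{1 - .551} \approx 15(.835)(.670) \approx 8.396$$
Multiply by 1.96 (p < .05): 16.45, so a 17-point difference is needed.
Multiply by 2.58 (p < .01): 21.66, so a 22-point difference is needed.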
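The six SE-residual steps chain together naturally. A minimal sketch of the same computation; `regression_discrepancy` is a hypothetical helper, and it keeps z unrounded (1/3 rather than .33), so the predicted score comes out as 102.75 instead of the rounded 103 and the discrepancy as 17.75 rather than 18.

```python
from math import sqrt


def regression_discrepancy(ability, achievement, r_xx, r_yy, r_xy, mean=100.0, sd=15.0):
    """Regression-based discrepancy.
    Returns (predicted achievement, predicted - obtained, r_residual, SER)."""
    z_ability = (ability - mean) / sd                     # convert to z
    z_predicted = r_xy * z_ability                        # multiply by correlation
    predicted = mean + z_predicted * sd                   # back to the SS scale
    discrepancy = predicted - achievement                 # predicted minus obtained
    r2 = r_xy ** 2
    r_residual = (r_yy + r_xx * r2 - 2 * r2) / (1 - r2)   # reliability of the residual
    ser = sd * sqrt(1 - r2) * sqrt(1 - r_residual)        # SE of the residual
    return predicted, discrepancy, r_residual, ser


pred, diff, r_res, ser = regression_discrepancy(105, 85, r_xx=0.89, r_yy=0.72, r_xy=0.55)
print(round(pred, 2), round(diff, 2), round(r_res, 3), round(ser, 2))
for z in (1.96, 2.58):
    print(z, round(z * ser, 2), diff > z * ser)
```

Even with the unrounded values, the conclusion matches the hand calculation: significant at p < .05 but not at p < .01.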
Conclusion?
Compare SED and SER methods
Predictive Utility
Valid positive (hit): predicted + and confirmed +
False positive (false alarm): predicted + but actual −
Valid negative (correct rejection): predicted − and actual −
False negative (miss): predicted − but actual +
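Judging predictive utility starts with sorting each prediction into one of these four cells. A minimal sketch; the (predicted, actual) pairs are hypothetical, and the overall proportion correct (valid positives plus valid negatives) is one simple summary among several.

```python
from collections import Counter


def outcome(predicted, actual):
    """Label one prediction with one of the four categories."""
    if predicted and actual:
        return "valid positive"
    if predicted and not actual:
        return "false positive"
    if not predicted and not actual:
        return "valid negative"
    return "false negative"


def predictive_utility(pairs):
    """Tally (predicted, actual) pairs and return the counts plus the
    proportion of valid positives and valid negatives."""
    counts = Counter(outcome(p, a) for p, a in pairs)
    correct = counts["valid positive"] + counts["valid negative"]
    return counts, correct / len(pairs)


# Hypothetical screening results: (predicted positive?, actually positive?)
pairs = [(True, True), (True, False), (False, False),
         (False, True), (True, True), (False, False)]
counts, proportion_correct = predictive_utility(pairs)
print(dict(counts), round(proportion_correct, 2))
```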
Decision Making
The Decision Matrix
Rows show what we conclude; columns show what is true in reality.

What we conclude | In reality: Null true / Alternative false (there is no real program effect; there is no difference, gain; our theory is wrong) | In reality: Null false / Alternative true (there is a real program effect; there is a difference, gain; our theory is correct)
Accept null / Reject alternative. We say: there is no real program effect; there is no difference, gain; our theory is wrong | 1 − α: THE CONFIDENCE LEVEL (correct) | β: TYPE II ERROR
Reject null / Accept alternative. We say: there is a real program effect; there is a difference, gain; our theory is correct | α: TYPE I ERROR | 1 − β: POWER (correct)

1 − α, THE CONFIDENCE LEVEL: the odds of saying there is no effect or gain when in fact there is none; the number of times out of 100, when there is no effect, that we'll say there is none.
β, TYPE II ERROR: the odds of saying there is no effect or gain when in fact there is one; the number of times out of 100, when there is an effect, that we'll say there is none.
α, TYPE I ERROR: the odds of saying there is an effect or gain when in fact there is none; the number of times out of 100, when there is no effect, that we'll say there is one.
1 − β, POWER: the odds of saying there is an effect or gain when in fact there is one; the number of times out of 100, when there is an effect, that we'll say there is one.

If you try to increase power, you increase the chance of winding up in the bottom row and of Type I error.
If you try to decrease Type I errors, you increase the chance of winding up in the top row and of Type II error.
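The trade-off between Type I error and power can be made concrete with a simple power calculation. A minimal sketch, assuming a two-sided one-sample z-test with known SD, where the effect size is the true mean difference in SD units; the test choice, effect size, and sample size are illustrative assumptions, not part of the slides.

```python
from math import sqrt
from statistics import NormalDist


def power_two_sided_z(effect_size, n, alpha=0.05):
    """Power (1 - beta) of a two-sided one-sample z-test.
    effect_size: true mean difference in SD units (SD assumed known)."""
    std = NormalDist()                       # standard normal
    z_crit = std.inv_cdf(1 - alpha / 2)      # cutoff that fixes Type I error at alpha
    shift = effect_size * sqrt(n)            # center of the test statistic if the alternative is true
    return (1 - std.cdf(z_crit - shift)) + std.cdf(-z_crit - shift)


for alpha in (0.10, 0.05, 0.01):
    power = power_two_sided_z(0.5, 32, alpha)
    print(f"alpha={alpha:.2f}  power={power:.3f}  beta={1 - power:.3f}")
```

Shrinking α moves the cutoff outward, so power falls and β rises; with no true effect (effect_size = 0), the "power" equals α, i.e., every rejection is a Type I error.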