The Evaluation of Teacher and School Effectiveness Using Growth Models and Value Added Modeling: Hope Versus Reality
Robert W. Lissitz, University of Maryland
Maryland Assessment Research Center for Education Success
http://marces.org/Completed.htm


Page 1: Robert W. Lissitz University of Maryland

The Evaluation of Teacher and School Effectiveness Using Growth Models and Value Added Modeling: Hope Versus Reality

Robert W. Lissitz, University of Maryland
Maryland Assessment Research Center for Education Success
http://marces.org/Completed.htm

Page 2: Robert W. Lissitz University of Maryland

THANK YOU

First, I want to thank…
• The creators of this symposium
• Burcu Kaniskan
• The State of Maryland
• MARCES: Laura Reiner, Yuan Zhang, Xiaoshu Zhu, and Dr. Bill Schafer; Drs. Xiaodong Hou and Ying Li; Yong Luo, Matt Griffin, Tiago Calico, and Christy Lewis

Page 3: Robert W. Lissitz University of Maryland

PREVIEW

• History of VAM

• Literature:

• Reliability

• Validity

• Application of VAM

• Direction of VAM in the future

• Applied viewpoint

• Psychometric viewpoint

Page 4: Robert W. Lissitz University of Maryland

INTRODUCTION AND HISTORY

• The federal government is asking psychometricians to help make decisions
• Race to the Top
• Earlier: No Child Left Behind (“Race to the Middle”)
• The government wants a system that will
• Pressure educational administrations to do the right thing
• Combat the teachers’ unions, which are perceived as obstacles


Page 5: Robert W. Lissitz University of Maryland

• Value-added modeling (VAM) is a system that we hope can determine the effectiveness of some mechanism
• Usually teachers or schools
• The most popular models include:
• Simple regression
• Recording transitions between performance levels in adjacent grades
• Mixed-effects or multilevel regression models
• Teacher or school as a level-2 effect

INTRODUCTION AND HISTORY: WHAT IS VAM?

Page 6: Robert W. Lissitz University of Maryland

• Results for each student are usually aggregated
• Provides summaries of every student for each teacher
• Attempts to show whether students associated with a teacher are performing above or below statistically expected values, or values associated with other teachers
• Usually normative in nature

INTRODUCTION AND HISTORY: WHAT IS VAM?
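The aggregation step above can be sketched in a few lines of Python; the teacher IDs, observed scores, and statistically expected scores below are purely illustrative (real VAM systems derive the expected scores from a statistical model):

```python
import statistics

# Hypothetical records: (teacher_id, observed_score, statistically_expected_score)
records = [
    ("T1", 410, 400), ("T1", 395, 405), ("T1", 430, 410),
    ("T2", 380, 400), ("T2", 390, 405),
]

def teacher_value_added(records):
    """Average each teacher's students' deviations from their expected scores."""
    by_teacher = {}
    for teacher, observed, expected in records:
        by_teacher.setdefault(teacher, []).append(observed - expected)
    return {t: statistics.mean(devs) for t, devs in by_teacher.items()}

effects = teacher_value_added(records)
# A positive value means a teacher's students performed above expectation on average
```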

Page 7: Robert W. Lissitz University of Maryland

• Investigated school effectiveness and reliability of indicators

• Findings:

• Some schools are better than others

• Differences in quality are inconsistent

• Across years

• Within schools across grade levels and subject areas

INTRODUCTION AND HISTORY: MANDEVILLE – LATE 1980s

Page 8: Robert W. Lissitz University of Maryland

1994: School effects
1995–1996: Teacher effects

• Model with two stages:
• Regression to control for “fairness variables”
• Gender, ethnicity, English proficiency, SES, etc.
• HLM to control for prior achievement, attendance, and school-level variables
• High-stakes decisions
• Bonuses
• Frequency of classroom observations

INTRODUCTION AND HISTORY: DALLAS – MID-1990s

Page 9: Robert W. Lissitz University of Maryland

• Sanders et al.
• “Layered” multiple regression model
• Effects of teachers and past teachers
• Multiple years of prior performance on several subject matter exams
• Used to covary out the effect of undesirable student characteristics on growth
• Complex interactions could not be statistically removed
• Effects may have different influence on students of different ability levels
• Probably not possible to eliminate statistically
• Future work might look at latent classes of students and teachers

INTRODUCTION AND HISTORY: TVAAS – MID-1990s

Page 10: Robert W. Lissitz University of Maryland

• Nonrandom assignment of students to teachers
• Effect not controlled by use of prior performance level
• Bias reduced by using multiple prior measures
• “Dynamic” interaction between students and teachers
• Association between teacher effectiveness and student characteristics
• VAM for high-stakes decisions is not applicable to all teachers
• Many teachers teach subjects that are not tested
• Memphis, TN – VAM does not apply to 70% of teachers

INTRODUCTION AND HISTORY: CHALLENGES AND CRITICISM

Page 11: Robert W. Lissitz University of Maryland

Think of the reliability of VAM as a generalizability problem.

Is teacher effectiveness justified as a main effect, or are teachers actually effective in some circumstances and ineffective in others?

If interactions exist, the problem for the principal changes from “who is ineffective?” to “are there conditions in which this teacher can be effective?”

RELIABILITY: GENERALIZABILITY

Page 12: Robert W. Lissitz University of Maryland

RELIABILITY

Mandeville (1988):
• School effectiveness estimates were stable in the 0.34 to 0.66 range of correlations
• Large differences across grade level and subject matter

McCaffrey (2009):
• Teacher effect estimates one year apart had correlations around 0.2 to 0.3
• Teaching itself may not be a stable phenomenon
• Variability may be due to actual performance changes from year to year; instability may be intractable

STABILITY OVER A ONE-YEAR PERIOD

Page 13: Robert W. Lissitz University of Maryland

Sass (2008) and Newton et al. (2010):
• Estimates of teacher effectiveness from test–retest assessments over a short time period
• Correlations in the range of 0.6
• For high-stakes testing, we usually require reliability greater than 0.8
• May still indicate a real phenomenon, but a modest one

RELIABILITY: STABILITY OVER A SHORT PERIOD OF TIME

Page 14: Robert W. Lissitz University of Maryland

Mandeville & Anderson (1987) and others (Rockoff, 2004; Newton et al., 2010):
• Stability fluctuates across grade and subject matter
• Limited stability found more often with math courses, less often with reading courses
• Does success depend on what class you are assigned rather than on your ability?
• Serious issues of fairness and comparability

RELIABILITY: STABILITY ACROSS GRADE AND SUBJECT

Page 15: Robert W. Lissitz University of Maryland

• The perception that an entire school is good or bad is very popular
• St. Louis, early 1990s
• Challenged an advisory committee to find a school that remained at the top 3 years in a row
• No system that reported back had even one
• Federal Blue Ribbon Schools
• “A winning school in one year was typically not at the top a year or two later”
• Bottom line: rankings or groupings of schools (e.g., quintiles) are not stable

RELIABILITY: STABILITY AT THE SCHOOL LEVEL

Page 16: Robert W. Lissitz University of Maryland

Sass (2008):
• Top quintile and bottom quintile seem the most stable
• Correlation of teacher effectiveness in those groups was 0.48 across comparable exams over a short time
• When the time between tests was extended to a year, the correlation dropped to 0.27

Papay (2011):
• Three different tests
• Rank-order correlations of teacher effectiveness across time ranged from 0.15 to 0.58 across different tests
• Test timing and measurement error have effects

RELIABILITY: STABILITY ACROSS TEST FORMS

Page 17: Robert W. Lissitz University of Maryland

Tekwe et al. (2004):
• Compared four regression models
• Unless models involve different variables, results tend to be similar

Dawes (1979):
• Linear composites seem to be pretty much the same regardless of how one gets the weights

Hill et al. (2011):
• Convergent validity problem

RELIABILITY: STABILITY ACROSS STATISTICAL MODELS

Page 18: Robert W. Lissitz University of Maryland

Newton et al. (2010):
• Students who are less advantaged, ESL, or on a lower track can have a negative impact on teacher effect estimates
• Multiple VAM models were tested
• Success in matching teacher characteristics to VAM outcomes was modest
• VAM could be used as a criterion to judge other variables, but its validity is questionable

RELIABILITY: STABILITY ACROSS CLASSROOMS

Page 19: Robert W. Lissitz University of Maryland

• Persistent effects (teacher consistency), non-persistent effects (inconsistency), and non-persistence due to sampling error (unknown)
• 30–60% of variation is due to sampling error
• In part due to the small numbers of students that serve as the basis of effectiveness estimates
• Regression to the mean
• Class sizes vary within a school or district
• Classrooms with fewer students tend toward the mean
• Bayes estimates in multilevel modeling also introduce bias that is a function of sample size
• Other occupations: lack of consistency is typical of complex professions – baseball players, stock investors…

RELIABILITY: SOURCES OF UNRELIABILITY

Page 20: Robert W. Lissitz University of Maryland

• Years of experience, advanced degrees, certification, licensure, school quality, etc. have low relationships (if any) to teacher effectiveness
• Weak relationship between effectiveness and an advanced degree
• Knowledge of mathematics is positively correlated with teaching mathematics effectively
• VAM estimates provide better measures of a teacher’s impact on student test scores than the measures on a teacher’s job application

VALIDITY: JOB APPLICATIONS AS PREDICTIVE MEASURES

Page 21: Robert W. Lissitz University of Maryland

Goe et al. (2008):
• Context for evaluation
• Teachers should be compared to other teachers who:
• Teach similar courses
• In the same grade
• In a similar context
• Are assessed by the same or a similar examination
• Probably necessary to establish validity

VALIDITY: TRIANGULATION OF MULTIPLE INDICATORS

Page 22: Robert W. Lissitz University of Maryland

• Ability is very likely correlated with growth and status
• Do gifted students learn at the same rate as others?
• Gifted students and their teachers have an advantage
• Interaction between student ability and teachers’ ability to be effective
• Mixture models are in development

VALIDITY: COMPARABILITY

Page 23: Robert W. Lissitz University of Maryland

Rubin (2004):
• Missing data are not missing at random
• Missing in a way that confounds results and complicates inferences
• We do not have a clear idea what our hypothesis is
• Multiple operational definitions of growth, but no developmental science for the phenomenon

VALIDITY: CAUSALITY, RESEARCH DESIGN, AND THEORY

Page 24: Robert W. Lissitz University of Maryland

• Without carefully controlled experiments, we cannot isolate teacher effects
• Students have multiple teachers
• Influence of prior performance and experience
• What do we even mean by a causal effect?
• How do teachers and schools impart their effect?
• How is it internalized by the student?
• Lord’s paradox
• ANCOVA does not lead to unambiguous interpretations
• Only experimental efforts will provide adequate results
• Eminent faculty member: teacher decision-making – unclear what is optimal

VALIDITY: CAUSALITY, RESEARCH DESIGN, AND THEORY

Page 25: Robert W. Lissitz University of Maryland

Are teachers the most important factor determining student achievement?

• Nye et al. (2004): 11% of variation in student gains explained by teacher effects
• Rockoff (2004): teacher effects 5.0–6.4%; school effects 2.7–6.1%; student fixed effects 59–68%

VALIDITY: WHY SHOULD WE CARE?

Page 26: Robert W. Lissitz University of Maryland

Importance of classroom context
• Kennedy (2010), etc.:
• Situational factors influence teacher success
• Time, materials, work assignments
• Controlling behavioral issues; mainstreaming only students who are willing and able to be non-disruptive
• Technical assistance with teaching (computers…)

A new teacher’s goal: maximize the context for learning

VALIDITY: WHY SHOULD WE CARE?

Page 27: Robert W. Lissitz University of Maryland

New paradigm – a different orientation toward the learning process
• The teacher optimizes the context of the classroom
• Adding to motivation
• Preventing disruption
• Providing opportunity for enhanced learning engagement
• Use of assistive teaching devices (computers) will change the teacher’s role
• Develop a learning science
• The current paradigm emphasizes external validity and immediate generality
• Instead, create a laboratory for education science

VALIDITY: WHY SHOULD WE CARE?

Page 28: Robert W. Lissitz University of Maryland

Fairness

• There is little evidence that VAM is ready for high-stakes use
• But…

Is it less fair than traditional personnel selection that focuses on advanced degrees and certificates, more credit hours, and working more years? Classroom observations?

VALIDITY: WHY SHOULD WE CARE?

Page 29: Robert W. Lissitz University of Maryland

• The MARCES Center has studied 11 of the simplest models that might be applied

• The full VAM report and the full text supporting this presentation can be accessed at

http://marces.org/Completed.htm

OUR STUDY: COMPARING MODELS USING REAL DATA

Page 30: Robert W. Lissitz University of Maryland

• We obtained 3 years of data on the same students, linked to their teachers

• Students were divided into four cohorts (N ≈ 5000 per cohort):

Cohort 1: 3rd, 4th, 5th grades    Cohort 3: 5th, 6th, 7th grades
Cohort 2: 4th, 5th, 6th grades    Cohort 4: 6th, 7th, 8th grades

• Math and reading data from the yearly spring state assessment (2008–2010)
• No vertical scale
• Horizontally equated from year to year
• The VAM models chosen for comparison do not require vertical scaling
• Nine models compare growth from the first to the second year
• Two models compare growth from the first and second to the third year

OUR STUDY: COMPARING MODELS USING REAL DATA

Page 31: Robert W. Lissitz University of Maryland

TABLE 2: Data used in our study

Variable  Label
Pre       Pretest
Post      Posttest
Sub1      Math subscore 1 (Algebra) or Reading subscore 1 (General Reading)
Sub2      Math subscore 2 (Geometry and Measurement) or Reading subscore 2 (Literary Reading)
Sub3      Math subscore 3 (Statistics and Probability) or Reading subscore 3 (Informational Reading)
Sub4      Math subscore 4 (Numbers and Computations)
Sub5      Math subscore 5 (Processes)
Gender    Male – 0; female – 1
SPED      Special Ed: no – 0; yes – 1
LEP       ELL code: no – 0; yes – 1
FARMs     Free and reduced meals: no – 0; yes – 1
ACC       Accommodated: no – 0; yes – 1
Indian    No – 0; yes – 1
Asian     No – 0; yes – 1
African   No – 0; yes – 1
White     No – 0; yes – 1
Hispanic  No – 0; yes – 1

(The model labels QRG1–TRUD that also appear on this slide are listed in full on the next page.)

Page 32: Robert W. Lissitz University of Maryland

OUR STUDY: MODELS

Variable Label

QRG1  Quantile regression with one prior year predictor

QRG2  Quantile regression with two prior years as predictors

ConD  Deciles conditional on prior deciles

ConZ  Z scores conditional on deciles of z-scores

OLS1  Ordinary least squares with one prior year predictor

OLS2  Ordinary least squares with two prior year predictors

OLSS  Ordinary least squares using spline scores

DIFS  Difference between spline scores

TRSG  Transition model with values reflecting both status and growth

TRUG  Transition model reflecting upward growth only

TRUD  Transition model reflecting upward and downward change

Page 33: Robert W. Lissitz University of Maryland

OUR STUDY: MODELS

BETEBENNER’S MODEL
• Used in Colorado
• Looks at the conditional percentile of each student’s performance in the second year, compared to other students who started at the same percentile the first year
• Aggregates the conditional percentiles of students exposed to the same teacher

• QRG1 (quantile regression with one predictor) uses the prior year to condition the percentile the next year
• QRG2 (quantile regression with two predictors) uses two prior years to condition the percentile the third year
• ConD (deciles conditional on deciles) is a simplification: it aggregates students into deciles one year and compares them to deciles the second year
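The ConD simplification can be sketched on simulated scores; the conditioning below compares each student's year-two decile to the median of same-decile peers, which is one reasonable reading of the slide's description, and all of the numbers are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
year1 = rng.normal(500, 50, 1000)                 # simulated year-1 scale scores
year2 = 0.8 * year1 + rng.normal(100, 30, 1000)   # simulated year-2 scale scores

def decile(scores):
    """Assign each score a decile from 1 (lowest) to 10 (highest) by rank."""
    ranks = scores.argsort().argsort()            # 0-based ranks
    return ranks * 10 // len(scores) + 1

d1, d2 = decile(year1), decile(year2)

# Growth conditional on the prior-year decile: compare each student's
# year-2 decile to peers who started in the same year-1 decile
growth = np.empty(len(year1))
for d in range(1, 11):
    peers = d1 == d
    growth[peers] = d2[peers] - np.median(d2[peers])
```

A teacher's ConD-style effectiveness estimate would then be the aggregate (e.g., mean) of `growth` over that teacher's students.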

Page 34: Robert W. Lissitz University of Maryland

OUR STUDY: MODELS

THUM’S MODEL (ConZ – z scores conditional on deciles)
• Similar to ConD, but looks at effect size
• Uses a z score to identify the student’s performance level compared to the average student the first year
• In the second year, compares the student’s z score to students who started at the same z position (within a decile) in the prior year
• Conditional z scores are aggregated for each teacher to provide a measure of effectiveness

Our simplification – z score conditional on prior deciles:
1. Rank order all students’ year-one scale scores and divide them into 10 deciles
2. Compute the mean of year-two scale scores for students within each decile
3. Compute deviation scores from the decile mean of year-two scale scores for students within each decile
4. Compute the pooled within-decile SD of year-two scale scores
5. Compute a growth z score for each student
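The five steps above can be sketched on simulated data (the scores are illustrative, and a simple population-form pooled SD is assumed):

```python
import numpy as np

rng = np.random.default_rng(1)
year1 = rng.normal(500, 50, 500)                  # simulated year-1 scale scores
year2 = 0.8 * year1 + rng.normal(100, 30, 500)    # simulated year-2 scale scores

# 1. Divide students into 10 deciles (0..9) by year-1 scale score
deciles = year1.argsort().argsort() * 10 // len(year1)

# 2-3. Deviation of each year-2 score from its decile's year-2 mean
deviations = np.empty_like(year2)
for d in range(10):
    mask = deciles == d
    deviations[mask] = year2[mask] - year2[mask].mean()

# 4. Pooled within-decile SD of year-2 scores (population form)
pooled_sd = np.sqrt(np.mean(deviations ** 2))

# 5. Growth z score for each student
z = deviations / pooled_sd
```

Averaging `z` over the students linked to each teacher gives the ConZ effectiveness measure.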

Page 35: Robert W. Lissitz University of Maryland

OUR STUDY: MODELS

ORDINARY LEAST SQUARES REGRESSION (OLS1, OLS2)
• Aggregates errors of prediction across teachers to see which teacher’s students tend to perform above or below prediction
• OLS1 – independent variable: first-year scale score; effectiveness measure: deviation from the expected scale score for year two
• OLS2 – independent variables: first two years’ scale scores; effectiveness measure: deviation from the expected scale score for year three
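A sketch of the OLS1 idea on simulated data (the scores, the fitted model, and the random teacher assignment are all illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
year1 = rng.normal(500, 50, n)                 # simulated year-1 scale scores
year2 = 0.8 * year1 + rng.normal(100, 30, n)   # simulated year-2 scale scores
teacher = rng.integers(0, 10, n)               # hypothetical teacher assignment

# Fit year2 = b0 + b1 * year1 by ordinary least squares
b1, b0 = np.polyfit(year1, year2, 1)
residuals = year2 - (b0 + b1 * year1)          # errors of prediction

# Aggregate prediction errors by teacher: a positive mean suggests a
# teacher's students tend to perform above prediction
effects = {t: residuals[teacher == t].mean() for t in range(10)}
```

OLS2 is the same idea with two prior-year scores as predictors and the third year as the outcome.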

Page 36: Robert W. Lissitz University of Maryland

OUR STUDY: MODELS

REGRESSION USING SPLINE SCORES (OLSS, DIFS)
• Calculated with scores that had been transformed by a spline function
• Gives relational meaning to points along the performance continuum across grades
• Builds a quasi-vertical scale without common items
• The transformation is matched to the cut scores for three proficiency levels: basic, proficient, advanced
• DIFS subtracts the spline-transformed score at year one from the transformed score at year two, as though they formed a true vertical scale
• OLSS applies ordinary least squares to the spline scale scores and looks at deviations from prediction
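One way to sketch the idea, assuming illustrative cut scores and using a piecewise-linear transform (`np.interp`) in place of the actual spline:

```python
import numpy as np

# Hypothetical basic/proficient/advanced cut scores in two adjacent grades
cuts_grade4 = [350, 400, 450]    # raw scale, year 1
cuts_grade5 = [370, 420, 480]    # raw scale, year 2
common = [1.0, 2.0, 3.0]         # shared "quasi-vertical" values at the cuts

def spline_score(score, cuts):
    """Map a raw score onto the common scale, anchoring the three cut scores."""
    return np.interp(score, cuts, common)

# DIFS: difference of transformed scores, as if on a true vertical scale
year1_score, year2_score = 410, 430
difs = spline_score(year2_score, cuts_grade5) - spline_score(year1_score, cuts_grade4)
```

OLSS would instead regress the transformed year-2 scores on the transformed year-1 scores and aggregate the residuals by teacher.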

Page 37: Robert W. Lissitz University of Maryland

OUR STUDY: MODELS

TRANSITION MODELS (TRSG, TRUG, TRUD)
• Used in Delaware and Arkansas
• Classify students into categories in year one (basic, proficient, advanced)
• Divide each category into three subcategories
• Observe the year-two category conditional on year-one performance
• A value matrix is associated with the transition from the level at year one to the level at year two
• Values represent the importance of each transition and are determined by educators

TRSG rewards students for maintaining previous status and for growth within and across performance levels; the reward increases with higher performance-level status.

Value Table for TRSG (rows: year-one level; columns: year-two level)
      B1   B2   B3   P1   P2   P3   A1   A2   A3
B1     9   11   13   15   17   19   21   23   25
B2     8   10   12   14   16   18   20   22   24
B3     7    9   11   13   15   17   19   21   23
P1     6    8   10   12   14   16   18   20   22
P2     5    7    9   11   13   15   17   19   21
P3     4    6    8   10   12   14   16   18   20
A1     3    5    7    9   11   13   15   17   19
A2     2    4    6    8   10   12   14   16   18
A3     1    3    5    7    9   11   13   15   17

TRUG rewards students only for growth; it does not punish regressing and does not distinguish much between amounts of growth.

Value Table for TRUG
      B1   B2   B3   P1   P2   P3   A1   A2   A3
B1     1    3    4    4    4    4    4    4    4
B2     0    1    3    4    4    4    4    4    4
B3     0    0    1    3    3    4    4    4    4
P1     0    0    0    1    2    3    4    4    4
P2     0    0    0    0    1    2    4    4    4
P3     0    0    0    0    0    1    3    3    4
A1     0    0    0    0    0    0    1    2    3
A2     0    0    0    0    0    0    0    1    2
A3     0    0    0    0    0    0    0    0    1

TRUD values reflect growth as well as decreased performance; it does not reward status.

Value Table for TRUD
      B1    B2    B3    P1    P2    P3    A1    A2    A3
B1     0   0.5    1    1.5    2    2.5    3    3.5    4
B2    -1    0    0.5    1    1.5    2    2.5    3     4
B3    -1   -1     0    0.5    1    1.5    2    2.5    3
P1    -2   -1    -1     0    0.5    1    1.5    2     3
P2    -2   -2    -1    -1     0    0.5    1    1.5    2
P3    -3   -2    -2    -1    -1     0    0.5    1     2
A1    -3   -3    -2    -2    -1    -1     0    0.5    1
A2    -4   -3    -3    -2    -2    -1    -1     0     1
A3    -4   -4    -3    -3    -2    -2    -1    -1     0
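The lookup-and-aggregate mechanics can be sketched directly from the TRUG value table; `teacher_score` is an illustrative helper name, not part of any state's system:

```python
LEVELS = ["B1", "B2", "B3", "P1", "P2", "P3", "A1", "A2", "A3"]

# TRUG value table: rows are year-1 levels, columns are year-2 levels
TRUG = [
    [1, 3, 4, 4, 4, 4, 4, 4, 4],
    [0, 1, 3, 4, 4, 4, 4, 4, 4],
    [0, 0, 1, 3, 3, 4, 4, 4, 4],
    [0, 0, 0, 1, 2, 3, 4, 4, 4],
    [0, 0, 0, 0, 1, 2, 4, 4, 4],
    [0, 0, 0, 0, 0, 1, 3, 3, 4],
    [0, 0, 0, 0, 0, 0, 1, 2, 3],
    [0, 0, 0, 0, 0, 0, 0, 1, 2],
    [0, 0, 0, 0, 0, 0, 0, 0, 1],
]

def transition_value(year1_level, year2_level, table=TRUG):
    """Look up the educator-assigned value for a year-1 to year-2 transition."""
    return table[LEVELS.index(year1_level)][LEVELS.index(year2_level)]

def teacher_score(transitions, table=TRUG):
    """Average transition value over a teacher's students."""
    values = [transition_value(a, b, table) for a, b in transitions]
    return sum(values) / len(values)
```

Swapping in the TRSG or TRUD matrix changes only the `table` argument, which is exactly why the choice of value table can change a teacher's ranking.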

Page 38: Robert W. Lissitz University of Maryland

• Each student had a growth calculation from years 1–2 and years 2–3
• Student growth scores from these models were factor analyzed, intercorrelated for years 1–2 and replicated for years 2–3
• One dimension accounts for the largest percentage of variance
• A great deal of noise in the results
• Over 80% of the variance is undefined by the first dimension
• Results of the factor analysis were the same for each pair of years, for each cohort, and for each content area

OUR STUDY: INTER-CORRELATION OF STUDENT GROWTH SCORES AND THEIR DIMENSIONALITY
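The dimensionality check can be sketched by eigen-decomposing the correlation matrix of simulated growth scores that share one weak common factor; the factor structure and noise level here are assumptions, not the study's data:

```python
import numpy as np

rng = np.random.default_rng(3)

# Nine hypothetical model-based growth scores sharing one weak common factor
common = rng.normal(size=1000)
growth = np.column_stack(
    [common + rng.normal(scale=2.0, size=1000) for _ in range(9)]
)

corr = np.corrcoef(growth, rowvar=False)
eigenvalues = np.linalg.eigvalsh(corr)[::-1]      # descending, as in a scree plot
share_first = eigenvalues[0] / eigenvalues.sum()  # variance explained by dimension 1
# When noise dominates, the first dimension explains well under half the variance
```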

Page 39: Robert W. Lissitz University of Maryland

Example: Scree Plot for Math 2008-2009, Cohort 1

OUR STUDY: INTER-CORRELATION OF STUDENT GROWTH SCORES AND THEIR DIMENSIONALITY

Page 40: Robert W. Lissitz University of Maryland

• Growth in reading tends to be slightly more correlated with SES and race than growth in math
• Correlations between TRSG and the pre- and posttests are the strongest among all the models
• Correlation between TRSG and the pretest: around 0.5
• Correlation between TRSG and the posttest: around 0.8
• Correlations otherwise…
• Between pretest and regression-based models: low
• Between pretest and transition-based models: medium
• Between posttest and regression-based models: higher
• Between posttest and transition-based models: lower

OUR STUDY: RELATION TO DEMOGRAPHIC VARIABLES AND PRE- AND POSTTEST SCORES

Page 41: Robert W. Lissitz University of Maryland

Year 2008-2009

OUR STUDY: THE CORRELATION BETWEEN GROWTH IN MATH AND GROWTH IN READING

                 Cohort 1  Cohort 2  Cohort 3  Cohort 4
Scale score 1        0.74      0.66      0.64      0.65
QRG1                 0.25      0.17      0.19      0.15
ConD                 0.25      0.18      0.19      0.16
ConZ                 0.27      0.19      0.23      0.17
OLS1                 0.26      0.18      0.23      0.16
OLSS                 0.25      0.19      0.20      0.14
DIFS                 0.20      0.14      0.15      0.08
TRSG                 0.42      0.30      0.36      0.32
TRUG                 0.13      0.03      0.11      0.07
TRUD                 0.16      0.07      0.14      0.08

Page 42: Robert W. Lissitz University of Maryland

Year 2009-2010

OUR STUDY: THE CORRELATION BETWEEN GROWTH IN MATH AND GROWTH IN READING

                 Cohort 1  Cohort 2  Cohort 3  Cohort 4
Scale score 1        0.69      0.67      0.68      0.65
QRG1                 0.19      0.22      0.18      0.16
QRG2                 0.18      0.21      0.18      0.14
ConD                 0.19      0.22      0.18      0.16
ConZ                 0.20      0.24      0.20      0.18
OLS1                 0.20      0.24      0.18      0.17
OLS2                 0.19      0.23      0.18      0.16
OLSS                 0.21      0.25      0.16      0.15
DIFS                 0.17      0.16      0.09      0.09
TRSG                 0.30      0.40      0.35      0.32
TRUG                 0.05      0.12      0.08      0.08
TRUD                 0.07      0.16      0.09      0.08

Page 43: Robert W. Lissitz University of Maryland

Math

OUR STUDY: THE CORRELATION BETWEEN THE TWO GROWTH PERIODS (YEARS 1-2 AND YEARS 2-3)

                 Cohort 1  Cohort 2  Cohort 3  Cohort 4
Scale score 2        0.84      0.83      0.86      0.87
QRG1                -0.30     -0.25     -0.29     -0.25
ConD                -0.24     -0.20     -0.24     -0.19
ConZ                -0.23     -0.20     -0.23     -0.18
OLS1                -0.31     -0.26     -0.32     -0.28
OLSS                -0.31     -0.24     -0.32     -0.28
DIFS                -0.45     -0.36     -0.44     -0.36
TRSG                 0.18      0.32      0.33      0.37
TRUG                -0.39     -0.28     -0.37     -0.29
TRUD                -0.48     -0.33     -0.46     -0.33

Page 44: Robert W. Lissitz University of Maryland

Reading

OUR STUDY: THE CORRELATION BETWEEN THE TWO GROWTH PERIODS (YEARS 1-2 AND YEARS 2-3)

                 Cohort 1  Cohort 2  Cohort 3  Cohort 4
Scale score 2        0.77      0.68      0.69      0.71
QRG1                -0.25     -0.26     -0.26     -0.27
ConD                -0.19     -0.22     -0.22     -0.21
ConZ                -0.20     -0.19     -0.18     -0.19
OLS1                -0.26     -0.26     -0.26     -0.26
OLSS                -0.27     -0.22     -0.25     -0.27
DIFS                -0.42     -0.47     -0.44     -0.50
TRSG                 0.12      0.02      0.03      0.03
TRUG                -0.36     -0.37     -0.38     -0.38
TRUD                -0.38     -0.45     -0.46     -0.46

Page 45: Robert W. Lissitz University of Maryland

Square Root of Intra-Class Correlations for Year 2008-2009

OUR STUDY: TEACHER EFFECTIVENESS AND TEACHER RELIABILITY

                              Math                                    Reading
               Cohort 1  Cohort 2  Cohort 3  Cohort 4   Cohort 1  Cohort 2  Cohort 3  Cohort 4
QRG1               0.48      0.44      0.39      0.44       0.42      0.37      0.28      0.26
ConD               0.47      0.44      0.38      0.43       0.43      0.37      0.28      0.28
ConZ               0.47      0.45      0.38      0.44       0.43      0.38      0.31      0.28
OLS1               0.48      0.45      0.39      0.45       0.42      0.37      0.31      0.27
OLSS               0.47      0.45      0.41      0.43       0.42      0.39      0.32      0.26
DIFS               0.44      0.41      0.38      0.40       0.37      0.34      0.29      0.23
TRSG               0.55      0.60      0.51      0.57       0.53      0.42      0.37      0.35
TRUG               0.41      0.36      0.33      0.35       0.35      0.34      0.31      0.23
TRUD               0.42      0.40      0.35      0.37       0.36      0.33      0.28      0.24
# of Teachers       292       262        96       120        268       107       122       122
Mean              19.48     21.13     57.99     48.26      21.29     51.79     45.39     47.18
SD                15.66     14.00     82.32     89.14      14.43     31.78     25.62     26.90
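The reliability indices in these tables are square roots of intra-class correlations. A minimal one-way ANOVA ICC sketch on simulated, balanced data (all sizes and variance components here are assumptions):

```python
import numpy as np

rng = np.random.default_rng(4)
n_teachers, n_students = 50, 20

# Hypothetical: each teacher has a true effect plus student-level noise
teacher_effect = rng.normal(scale=0.5, size=n_teachers)
scores = teacher_effect[:, None] + rng.normal(size=(n_teachers, n_students))

# One-way ANOVA intra-class correlation, ICC(1), for a balanced design
group_means = scores.mean(axis=1)
msb = n_students * group_means.var(ddof=1)                       # between-teacher MS
msw = ((scores - group_means[:, None]) ** 2).sum() / (n_teachers * (n_students - 1))
icc = (msb - msw) / (msb + (n_students - 1) * msw)

reliability_like = np.sqrt(max(icc, 0.0))   # the tables report the square root
```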

Page 46: Robert W. Lissitz University of Maryland

Square Root of Intra-Class Correlations for Year 2009-2010

OUR STUDY: TEACHER EFFECTIVENESS AND TEACHER RELIABILITY

                              Math                                    Reading
               Cohort 1  Cohort 2  Cohort 3  Cohort 4   Cohort 1  Cohort 2  Cohort 3  Cohort 4
QRG1               0.44      0.35      0.38      0.36       0.33      0.29      0.30      0.24
QRG2               0.42      0.35      0.38      0.38       0.32      0.27      0.25      0.22
ConD               0.43      0.36      0.37      0.36       0.34      0.29      0.31      0.26
ConZ               0.43      0.36      0.38      0.37       0.34      0.31      0.31      0.26
OLS1               0.44      0.38      0.38      0.37       0.34      0.30      0.31      0.25
OLS2               0.42      0.38      0.39      0.39       0.33      0.29      0.26      0.21
OLSS               0.44      0.40      0.38      0.36       0.34      0.28      0.28      0.25
DIFS               0.40      0.40      0.35      0.32       0.30      0.23      0.23      0.20
TRSG               0.59      0.50      0.59      0.57       0.42      0.34      0.34      0.34
TRUG               0.36      0.32      0.31      0.31       0.31      0.20      0.22      0.20
TRUD               0.40      0.36      0.32      0.31       0.32      0.22      0.23      0.19
# of Teachers       306       283        94       103        291        91        97        95
Mean              18.33     16.97     50.61     49.27      19.33     53.81     48.84     53.61
SD                 8.99      9.24     34.50     38.35      12.11     33.19     30.60     34.49

Page 47: Robert W. Lissitz University of Maryland

Year-to-Year Reliability of Teacher Effectiveness Between 2008-2009 and 2009-2010

OUR STUDY: TEACHER EFFECTIVENESS AND TEACHER RELIABILITY

                         Math                          Reading
               Grade 5  Grade 6  Grade 7    Grade 5  Grade 6  Grade 7
QRG1              0.42     0.73     0.50       0.28     0.51     0.61
ConD              0.44     0.73     0.52       0.31     0.53     0.63
ConZ              0.46     0.74     0.56       0.36     0.53     0.68
OLS1              0.47     0.75     0.55       0.34     0.49     0.67
OLSS              0.43     0.72     0.52       0.32     0.58     0.58
DIFS              0.42     0.65     0.50       0.13     0.08     0.30
TRSG              0.61     0.82     0.73       0.42     0.71     0.68
TRUG              0.36     0.58     0.53       0.22     0.01     0.34
TRUD              0.40     0.62     0.50       0.20     0.10     0.29
# of Teachers      177       69       82        185       57       55

Page 48: Robert W. Lissitz University of Maryland

Sq. root of School Intra-Class Correlation for Year 2008-2009

OUR STUDY: SCHOOL EFFECTIVENESS AND SCHOOL RELIABILITY

                              Math                                    Reading
               Cohort 1  Cohort 2  Cohort 3  Cohort 4   Cohort 1  Cohort 2  Cohort 3  Cohort 4
QRG1               0.37      0.36      0.28      0.34       0.31      0.27      0.21      0.18
ConD               0.37      0.36      0.27      0.34       0.32      0.27      0.20      0.19
ConZ               0.37      0.36      0.26      0.34       0.32      0.27      0.22      0.18
OLS1               0.38      0.35      0.26      0.34       0.31      0.27      0.23      0.18
OLSS               0.36      0.36      0.27      0.35       0.30      0.28      0.24      0.18
DIFS               0.36      0.34      0.26      0.24       0.26      0.25      0.27      0.16
TRSG               0.37      0.41      0.31      0.43       0.38      0.27      0.25      0.22
TRUG               0.31      0.29      0.24      0.19       0.25      0.22      0.28      0.16
TRUD               0.32      0.31      0.24      0.22       0.26      0.23      0.24      0.17
# of Schools        103       102        27        28        103       100        27        27
Mean              55.23     54.27    206.19    206.82      54.47     48.03    176.19    187.96
SD                15.61     13.91     76.78     91.74      10.38     12.53     30.27     28.97

Page 49: Robert W. Lissitz University of Maryland

Sq. root of School Intra-Class Correlation for Year 2009-2010

OUR STUDY: SCHOOL EFFECTIVENESS AND SCHOOL RELIABILITY

                              Math                                    Reading
               Cohort 1  Cohort 2  Cohort 3  Cohort 4   Cohort 1  Cohort 2  Cohort 3  Cohort 4
QRG1               0.35      0.26      0.28      0.25       0.24      0.19      0.19      0.16
QRG2               0.34      0.26      0.27      0.28       0.24      0.19      0.17      0.15
ConD               0.34      0.25      0.28      0.25       0.24      0.19      0.20      0.17
ConZ               0.35      0.26      0.28      0.26       0.24      0.20      0.20      0.17
OLS1               0.34      0.27      0.27      0.25       0.24      0.19      0.20      0.16
OLS2               0.34      0.28      0.27      0.28       0.25      0.20      0.18      0.14
OLSS               0.35      0.28      0.27      0.24       0.23      0.20      0.18      0.16
DIFS               0.33      0.28      0.22      0.20       0.21      0.19      0.14      0.14
TRSG               0.40      0.31      0.37      0.40       0.26      0.23      0.21      0.20
TRUG               0.26      0.24      0.19      0.21       0.21      0.17      0.13      0.14
TRUD               0.30      0.26      0.20      0.21       0.23      0.17      0.14      0.13
# of Schools        103        27        27        28        103        27        27        27
Mean              55.40    205.22    205.07    205.57      54.61    181.37    175.44    188.63
SD                15.17     73.19     76.85     93.65      15.48     73.50     82.69     90.40

Page 50: Robert W. Lissitz University of Maryland

Year to Year Reliability of School EffectivenessBetween 2008-2009 and 2009-2010

OUR STUDY
SCHOOL EFFECTIVENESS AND SCHOOL RELIABILITY

                    Math                         Reading
         Grade 5  Grade 6  Grade 7     Grade 5  Grade 6  Grade 7
QRG1       0.53     0.77     0.60        0.33     0.74     0.37
ConD       0.53     0.77     0.61        0.39     0.72     0.43
ConZ       0.55     0.76     0.61        0.41     0.72     0.43
OLS1       0.58     0.76     0.63        0.37     0.76     0.44
OLSS       0.56     0.78     0.62        0.46     0.79     0.28
DIFS       0.48     0.77     0.30        0.25     0.86    -0.15
TRSG       0.79     0.86     0.90        0.61     0.81     0.53
TRUG       0.52     0.75     0.23        0.31     0.88    -0.20
TRUD       0.53     0.73     0.31        0.30     0.89    -0.21

# of Schools  101     27       27          99       27       27
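Year-to-year reliability here is, in essence, the correlation between a school's estimated effect in one year and its estimate the next year. The hypothetical sketch below (every number is invented) shows why even a fairly stable true effect, measured with noise in small samples, produces correlations well below 1.0:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated effect estimates for the same 27 schools in two adjacent years:
# estimate = stable true school effect + year-specific estimation noise
n_schools = 27
true_effect = rng.normal(0.0, 1.0, n_schools)
year1 = true_effect + rng.normal(0.0, 0.6, n_schools)
year2 = true_effect + rng.normal(0.0, 0.6, n_schools)

# Year-to-year reliability: Pearson correlation of the two years' estimates.
# In expectation this is 1 / (1 + 0.6**2), about 0.74, but with only 27
# schools the realized value can wander considerably.
reliability = np.corrcoef(year1, year2)[0, 1]
print(round(float(reliability), 2))
```

With cells of only 27 schools, as in several columns of the table, sampling variability alone can move these correlations substantially from year to year.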

Page 51: Robert W. Lissitz University of Maryland

Levels of Effectiveness

2008-2009

OUR STUDY
COMPARISON BETWEEN SCHOOL AND TEACHER EFFECT

[Bar chart: effectiveness (y-axis, 0.00 to 0.60) for each model (QRG1, ConD, ConZ, OLS1, OLSS, DIFS, TRSG, TRUG, TRUD), with separate bars for School_Math, School_Reading, Teacher_Math, and Teacher_Reading]

Page 52: Robert W. Lissitz University of Maryland

Levels of Effectiveness

2009-2010

OUR STUDY
COMPARISON BETWEEN SCHOOL AND TEACHER EFFECT

[Bar chart: effectiveness (y-axis, 0.00 to 0.60) for each model (QRG1, QRG2, ConD, ConZ, OLS1, OLS2, OLSS, DIFS, TRSG, TRUG, TRUD), with separate bars for School_Math, School_Reading, Teacher_Math, and Teacher_Reading]

Page 53: Robert W. Lissitz University of Maryland

Math Cohort 1 in Year 2008-2009

OUR STUDY
METHODOLOGICAL ISSUES

Page 54: Robert W. Lissitz University of Maryland

THE MODEL YOU USE CAN MAKE A DIFFERENCE

• Deciding how to balance status against growth

• No standardization for the modeling of VAM

• Traditional qualitative approaches used by principals are not likely to be an improvement on VAM

• Using either approach for high-stakes testing and decision-making seems premature

• Combining two procedures that are not valid will not necessarily result in a valid system

OUR CONCLUSIONS

Page 55: Robert W. Lissitz University of Maryland

MORE SOPHISTICATED GROWTH MODELS
OUR CONCLUSIONS

• Would be nice to explore different models

• Example: a 4-level model
  • Many vertically scaled time points
  • Many subject-matter assessments
  • Nested within students (level 2)
  • Nested within teachers (level 3)
  • Nested within school context (level 4)

• Mixture and latent class models
  • Students and teachers as members of discrete groups that interact
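As a sketch of what the 4-level structure above implies, its data-generating equation can be simulated directly. Everything below (group sizes, variance components, the common growth slope) is an invented illustration, and actually estimating such a model would require mixed-model software (for example, lme4 in R or MixedLM in statsmodels):

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical sizes and SDs (all illustrative): 20 schools, 5 teachers per
# school, 25 students per teacher, 3 vertically scaled time points.
nS, nT, nK, nW = 20, 5, 25, 3
school_u = rng.normal(0.0, 4.0, nS)                   # level-4: school effects
teacher_u = rng.normal(0.0, 6.0, (nS, nT))            # level-3: teacher effects
student_u = rng.normal(0.0, 10.0, (nS, nT, nK))       # level-2: student effects
noise = rng.normal(0.0, 8.0, (nS, nT, nK, nW))        # level-1: occasion residual

# score_ijkt = 500 + 20*t + school_i + teacher_ij + student_ijk + e_ijkt
t = np.arange(nW)
scores = (500 + 20 * t
          + school_u[:, None, None, None]
          + teacher_u[:, :, None, None]
          + student_u[:, :, :, None]
          + noise)

# After removing the common growth trend, the remaining variance should be
# roughly the sum of the four components: 4**2 + 6**2 + 10**2 + 8**2 = 216.
detrended = scores - (500 + 20 * t)
print(round(float(detrended.var()), 1))
```

Even this toy version makes the attribution problem visible: the teacher and school components are small relative to the student and occasion components, which is consistent with the modest effects reported earlier in the talk.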

Page 56: Robert W. Lissitz University of Maryland

INTERACTIONS SHOULD BE MODELED

Why model teacher effects…
• as if all students react the same way?
• as if all teachers are the same over time?

OUR CONCLUSIONS

SCHOOL AND CLASSROOM CONTEXT EFFECTS
• Should be investigated as well
• Implications for how to create a learning science
• May add to the modest results for teachers and schools

Page 57: Robert W. Lissitz University of Maryland

CHANGE IN INSTRUCTION INVOLVING SUPPORTIVE TECHNOLOGY

• The transition (paradigm shift) may be closer than we think

• Cognitive, computer, econometric, and engineering scientists are beginning to study education

• The field can be expected to change as these researchers and their students change

• The nature of teachers and instructional decision-making

• Radical changes for the better are expected

OUR CONCLUSIONS

Page 58: Robert W. Lissitz University of Maryland

VAM FOR HIGH STAKES

• Right now, I do not encourage it

• It makes a difference which VAM model we implement

• Choose the model based on policy decisions that capture the goals and intent of the school system

• Factors not in the teacher’s control have an effect

OUR CONCLUSIONS

Page 59: Robert W. Lissitz University of Maryland

RELATING VAM TO WHAT TEACHERS ARE DOING
• Create causal models and explore them with experiments

OUR CONCLUSIONS

INTERESTED IN IMPLEMENTING A VAM?
• Read Finlay and Manavi (2008)
  • Practical political issues of using VAM in schools
  • Unions, federal government, special education advocates…
• Effective teaching requires good measurement; measuring it well is a great challenge and a worthy goal…

Page 60: Robert W. Lissitz University of Maryland

Questions?

Robert W. Lissitz
University of Maryland

Maryland Assessment Research Center for Education Success

Visit http://marces.org/Completed.htm to find references, the full text of this talk, our comparison of value-added models, and other projects.