Pieces of A Puzzle

Russell Gersten
RG Research Group & Professor Emeritus, University of Oregon

Evaluating Professional Development in Mathematics

Overview

1. Components of rigorous evaluation
2. Inherent difficulties on a small budget
3. Suggestions for creating pieces of the puzzle
4. Example of rigorous research on Teacher Study Groups in reading
5. Follow-up session: linkages to the evaluation of your Partnership

Typical Evaluation

Relies on self-report

Might include a teacher knowledge measure

Heavy reliance on teachers’ perceptions

Typically no control group

(One of the best examples: Evaluation of Eisenhower Professional Development by Mike Garet, Andy Porter, Bea Birman, Laura Desimone, et al.)

What Can Be Learned

Useful for purposes of formative evaluation (e.g. examples used, type of exercises, “market potential”)

Can provide insights into what teachers/districts value: coherence, practicality, hands-on examples

Can weed out ineffective approaches or components (e.g. no gain at all)

What Can’t Be Learned

No information on:

Whether teachers use the professional development content in their teaching

Whether teachers’ increased mathematics knowledge is reflected in their teaching

Long- or short-term impacts on student learning of mathematics

Whether merely giving out a teacher guide, for example, would be as useful or more useful

Problems

Self-reports on teaching practice tend not to be accurate (e.g. Ball, 1990; Cohen, 1990)

There is so much to learn, and without a comparison group we cannot answer any questions

Reminder: without classroom observations or student achievement data, we don’t know if we had any real impact

Towards Rigor

Do not rely only on self-report

Use a comparison group

Remember: a study can aggregate information over the course of several years

Rigorous Evaluation

Only two have been done in mathematics professional development

One is underway as we speak (AIR: Garet and colleagues)

Much more intricate than evaluation of an instructional approach or curriculum

Really necessary investment to understand what helps teachers/what helps groups of teachers (e.g. newly hired, elementary teachers in mathematics with little background)

Why are they necessary?

It is unclear how well many professional development approaches work

We work on folklore and on trying to do the opposite of what doesn’t seem to work

Important issues are never studied systematically

We don’t know much about effective mechanisms

Design

1. Control/comparison group of teachers not receiving this type of professional development
2. Adequate sample size (about 40-60 teachers per condition)
3. Random assignment to conditions
4. Focus
5. Often multiple sites
6. Possibly multi-year
7. Requires partnerships
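
As a rough illustration of random assignment to conditions when a study spans multiple sites, the sketch below assigns teachers to conditions separately within each site. The roster format, site names, condition labels, and the function name assign_within_sites are hypothetical and are not taken from any study described here. An actual design might randomize schools rather than individual teachers.

```python
import random
from collections import defaultdict

def assign_within_sites(teachers, conditions=("PD", "comparison"), seed=0):
    """Randomly assign teachers to conditions separately within each site.

    teachers: list of (teacher_id, site) pairs (hypothetical roster format).
    Returns a dict mapping teacher_id -> condition.
    """
    rng = random.Random(seed)  # fixed seed so the assignment can be audited
    by_site = defaultdict(list)
    for teacher_id, site in teachers:
        by_site[site].append(teacher_id)

    assignment = {}
    for site in sorted(by_site):
        ids = sorted(by_site[site])
        rng.shuffle(ids)
        # Pick the starting condition at random so that an odd-sized site
        # does not always give its extra teacher to the same condition.
        offset = rng.randrange(len(conditions))
        for i, teacher_id in enumerate(ids):
            assignment[teacher_id] = conditions[(i + offset) % len(conditions)]
    return assignment

# Hypothetical roster: 3 sites with 40 teachers each.
roster = [(f"T{site}{n:02d}", f"site{site}") for site in "ABC" for n in range(40)]
result = assign_within_sites(roster)
```

With this made-up roster, each site contributes 20 teachers per condition, giving 60 per condition overall, which sits at the upper end of the 40-60 range mentioned above.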

Measures

1. Measures of teacher knowledge gained
2. Measures of actual teaching practice
3. Measure of student learning
4. Possible addenda: attitudes, perceptions, concerns, documentation of the process of change, knowledge of conditions that enhance impact (e.g. teacher study groups, professional support, role of mathematics coach or specialist)

(3) Students sometimes remember only part of a rule. They might say, for instance, “two negatives make a positive.” For each operation listed, decide whether the statement “two negatives make a positive” sometimes works, always works, or never works. (Mark SOMETIMES, ALWAYS, NEVER, or I’M NOT SURE.)

                      Sometimes   Always   Never   I’m not
                      works       works    works   sure
----------------------------------------------------------------------
(a) Addition
(b) Subtraction
(c) Multiplication
(d) Division

Knowledge of Content and Teaching (A Sample Item)

To introduce the idea of grouping by tens and ones with young learners, which of the following materials or tools would be most appropriate? (Choose ONE.)

a. A number line
b. Plastic counting chips
c. Pennies and dimes
d. Straws and rubber bands
e. Any of these would be equally appropriate for introducing the idea of grouping by tens and ones.

Ball, D. & Hill, H. (2006). Knowledge of content and teaching: An example using materials for grouping. For a discussion of similar examples, see: Ball, D. L., Hill, H.C, & Bass, H. (2005). “Knowing mathematics for teaching: Who knows mathematics well enough to teach third grade, and how can we decide?” American Educator.

Project Goals

Use professional development to:

1. Improve quality of comprehension and vocabulary instruction in grade 1 Reading First classrooms.
2. Improve student reading outcomes.
3. Improve teaching practice (measured by observation).
4. Explore shifts in teachers’ sense of a professional culture.

Teacher Measures (Pre and Post)

Bryk et al. scales on School Professional Culture (pre and post)

Carlisle measure on efficacy and beliefs in terms of reading instruction

Teacher Measures: Post Only

Teacher Knowledge (Phelps & Schilling, 2005; tailored to comprehension and vocabulary for primary grades to ensure high IRT reliability)

Observed teaching practice (post) in comprehension and vocabulary

Classroom Observational Measure

Was developed by Instructional Research Group (Gersten, Dimino, Jayanthi) over a 12-month period

Goal: Measure teaching practices that experimental research suggests are most effective

Goal: Obtain information on nuances of instruction but do so using a reliable, low-moderate inference system

Two Major Approaches to Observational Research

1. Rating Scales (High to Moderate Level Inference)
2. Direct Measures of Observable Behaviors, Activities, Grouping Structures (Relatively Objective)

Potential Drawbacks

Rating Scales: Bias, halo effects

Direct Measures: Cost, lack of focus, clutter, what does it all mean? (e.g. number of minutes in small group instruction, number of minutes with decodables, number of vocabulary words taught, and number of praise statements)

Potential Advantages: Rating Scales

Can focus on relevant aspects of instruction

Can hit big picture issues

Often have higher correlations to growth in outcomes than more direct measures (Stoolmiller et al.; Gersten et al., 1986; Foorman & Schatschneider, 2003)

1. Was story grammar used?
2. Did teacher think aloud during comprehension?
3. Clarity of teacher models during decoding
4. Use of student-friendly definitions

Direct Observations: Potential Advantages

Objective

Can obtain data on number of minutes students read decodables, engagement during decodable instruction, etc.

Can tally number of think-alouds

Precision and lack of bias

Observational Measurement Issues

Quantity as a surrogate for quality

Are variables used in experimental intervention studies useful in classroom observations?

Creation of scales from direct observations

Sample Comprehension Items: Relatively Low Inference

Teacher models:

1. Make inferences, summarize and find main ideas
2. Retell, sequencing: “What is happening? What happened first?”

Interactive Items

Teacher asks:

1. Students questions requiring inference
2. Recall questions

Reliability Issues: Low Inference Measures

A. Low inference measures can have high inter-rater reliabilities. Training often needs to be extensive and costly.

B. Researchers (e.g. Reid and Patterson) found, as early as 1975, that it is often better to sacrifice inter-observer reliability for sophistication in coding of behaviors. Thus reliability is often only 80%.

C. Even then, can sacrifice NUANCE

D. Actual computation of reliabilities is problematic: can be inflated due to agreements on non-occurrence.
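
To make the point about agreements on non-occurrence concrete, the sketch below compares three ways of scoring agreement between two observers who code a rare behavior interval by interval. The data, the variable names, and the function agreement_stats are hypothetical. Occurrence-only agreement and Cohen's kappa are shown as common alternatives to overall percent agreement, not as the method used in the study described here.

```python
def agreement_stats(obs_a, obs_b):
    """Compare two observers' interval codes (1 = behavior seen, 0 = not seen)."""
    if len(obs_a) != len(obs_b) or not obs_a:
        raise ValueError("need two non-empty code lists of equal length")
    n = len(obs_a)
    both = sum(1 for a, b in zip(obs_a, obs_b) if a == 1 and b == 1)
    neither = sum(1 for a, b in zip(obs_a, obs_b) if a == 0 and b == 0)
    disagree = n - both - neither

    # Overall percent agreement counts shared non-occurrences as agreements.
    overall = (both + neither) / n
    # Occurrence-only agreement ignores intervals neither observer coded.
    scored = both + disagree
    occurrence = both / scored if scored else float("nan")

    # Cohen's kappa corrects overall agreement for chance agreement.
    p_a = sum(obs_a) / n
    p_b = sum(obs_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)
    kappa = (overall - expected) / (1 - expected) if expected < 1 else float("nan")
    return overall, occurrence, kappa

# Hypothetical data: 20 intervals, a rare behavior (e.g. a think-aloud).
observer_a = [1, 1, 0, 0] + [0] * 16
observer_b = [1, 0, 1, 0] + [0] * 16
overall, occurrence, kappa = agreement_stats(observer_a, observer_b)
```

In this made-up case the observers agree on 18 of 20 intervals (90% overall), yet they agree on only 1 of the 3 intervals in which either of them coded the behavior (about 33%), and kappa is about 0.44. The high overall figure comes almost entirely from intervals where nothing was coded.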
