
EVALUATION RESEARCH

Presented by: Mr. Resty C. Samosa, MAEd Biology

Evaluation Research

– as an analytical tool, evaluation research involves investigating a policy or program to obtain all information pertinent to the assessment of its performance, both process and result.

– evaluation, as a phase of the policy cycle, more generally refers to the reporting of such information back into the policy-making process.

– “the systematic assessment of the operation and/or the outcomes of a program or policy, compared to a set of explicit or implicit standards, as a means of contributing to the improvement of the program or policy”

Why is there research evaluation?

– provide an evidence base for strategy development,
– document funding practices and thereby establish transparency about taxpayers’ money,
– decide on the allocation of resources,
– support internal processes for learning about the research system and funding activities, which may result in the adaptation of funding programmes or research fields,
– demonstrate that research performing and research funding organizations are accountable and concerned with quality assurance,
– sharpen concepts: for example, what is understood by internationalization, interdisciplinarity, or the impact of science,
– establish a direct channel of communication with stakeholders, to communicate the impact and results of research funding to government, or to allow grantees (scientists) to articulate their opinions about the funding system, application procedures and research conditions (for example during site visits, interviews, surveys).

Characteristics and Principles of Evaluation

– Childers (1989, p. 250), in his article emphasizing the evaluation of programs, notes that evaluation research
1. is usually employed for decision making;
2. deals with research questions about a program;
3. takes place in the real world of the program; and
4. usually represents a compromise between pure and applied research.

– Wallace and Van Fleet (2001) comment that evaluation should be carefully planned, not occur by accident; have a purpose that is usually goal oriented; focus on determining the quality of a product or service; go beyond measurement; not be any larger than necessary; and reflect the situation in which it will occur.

Griffiths and King (1991) identify some principles for good evaluation:

1. Evaluation must have a purpose; it must not be an end in itself.
2. Without the potential for some action, there is no need to evaluate.
3. Evaluation must be more than descriptive; it must take into account relationships among operational performance, users, and organizations.
4. Evaluation should be a communication tool involving staff and users.
5. Evaluation should not be sporadic but ongoing.
6. Ongoing evaluation should provide a means for continual monitoring, diagnosis, and change.
7. Ongoing evaluation should be dynamic in nature, reflecting new knowledge and changes in the environment.

5 Basic Evaluation Questions

1) What will be assessed?

2) What measures/indicators will be used?

3) Who will be evaluated?

4) What data will be collected?

5) How will data be analyzed?

TYPES OF EVALUATION RESEARCH

Summative evaluation

Summative evaluation seeks to understand the outcomes or effects of something, for example where a test of children in school is used to assess the effectiveness of teaching or the deployment of a curriculum. The children in this case are not direct beneficiaries; they are simply objects that contain information that needs to be extracted.

Summative Evaluation Research

– Summative evaluations can assess areas such as:

A. Finance: Effect in terms of cost, savings, profit, and so on.

B. Impact: Broad effect, both positive and negative, including depth, spread, and time effects.

C. Outcomes: Whether desired or unwanted effects are achieved.

D. Secondary analysis: Analysis of existing data to derive additional information.

E. Meta-analysis: Integrating the results of multiple studies (a minimal sketch follows below).
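To make item E concrete, here is a minimal sketch of a fixed-effect (inverse-variance weighted) meta-analysis in Python; the study names, effect sizes, and variances are invented for illustration.

```python
import math

# Invented effect sizes (e.g., Cohen's d) and their variances
# from three hypothetical studies of the same program.
studies = [
    ("Study A", 0.40, 0.04),
    ("Study B", 0.25, 0.02),
    ("Study C", 0.55, 0.09),
]

# Fixed-effect meta-analysis: weight each study by the inverse of
# its variance, then pool the weighted effects.
weights = [1 / var for _, _, var in studies]
pooled = sum(w * d for (_, d, _), w in zip(studies, weights)) / sum(weights)
se = math.sqrt(1 / sum(weights))

print(f"Pooled effect = {pooled:.3f}, SE = {se:.3f}")
print(f"95% CI = [{pooled - 1.96 * se:.3f}, {pooled + 1.96 * se:.3f}]")
```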

Formative evaluation

– Formative evaluation is used to help strengthen or improve the person or thing being tested, for example where a test of children in school is used to shape teaching methods that will result in optimal learning.

Formative evaluation

Formative evaluations can assess areas such as:

A. Implementation: Monitoring the success of a process or project.

B. Needs: Looking at the type and level of need.

C. Potential: The ability to use information for formative purposes.

Topics Appropriate to Evaluation Research

– Evaluation research is appropriate whenever some social intervention occurs or is planned.

– Social intervention is an action taken within a social context for the purpose of producing some intended result.

– In its simplest sense, evaluation research is the process of determining whether a social intervention has produced the intended result.

– The topics appropriate for evaluation research are limitless.
– The questions appropriate for evaluation research are of great practical significance: jobs, programs, and investments, as well as values and beliefs.

What Will Be Evaluated?

Formative (aka Process) Evaluation:
– Done to help improve the project itself.
– Gather information on how the project worked.
– Data is collected about activities: what was done.

Summative (aka Outcome) Evaluation:
– Done to determine what results were achieved.
– Data is collected about outcomes (objectives; goals): what happened.

What Measures Will Be Used?

Formative Evaluation:
– Completion of planned activities
– Adherence to proposed timelines
– Meeting the budget

Summative Evaluation:
– Reaching a criterion
– Change in knowledge, attitude, skill, or behavior

Who will be evaluated?

Formative Evaluation: Those responsible for doing activities/delivering services and those participating in activities.
– Faculty
– Agency personnel
– Students

Summative Evaluation: Those who were expected to be impacted by activities.
– Students
– Clients

What data will be collected?

Formative Evaluation:
– Program records
– Observations
– Activity logs
– Satisfaction surveys

Summative Evaluation:
– Observations
– Interviews
– Tests
– Surveys/questionnaires

How will data be analyzed?

1) Qualitative analysis (more for formative)
   1. Self-reports
   2. Documentation
   3. Description
   4. Case study

2) Quantitative analysis (more for summative; see the sketch after this list)
   1. Group comparison
   2. Group change
   3. Individual change
   4. Comparison to population/reference
   5. Analysis of relationships
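As an illustration of the quantitative side, the sketch below runs a group comparison (an independent-samples t-test) on invented scores for a program group and a comparison group; a dependent t-test for group change appears later in the CROP example.

```python
from scipy import stats

# Invented outcome scores for an evaluated program group and a
# comparison group.
program_group = [78, 85, 90, 72, 88, 81, 79, 93]
comparison_group = [70, 74, 82, 68, 77, 80, 71, 75]

# Group comparison: independent-samples t-test.
t, p = stats.ttest_ind(program_group, comparison_group)
print(f"t = {t:.2f}, p = {p:.4f}")
```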

An Example: The Cosmic Ray Observatory Project (CROP)

Goal: Establish a statewide collaborative network of expert teachers fully capable of continuing the project locally.

Objectives:
1. Teachers will acquire knowledge about cosmic ray physics and skill in high energy research methods.
2. Teachers will exhibit increased self-efficacy for conducting CROP research and integrating CROP into their teaching.

Activity: High school physics teachers and students will attend a 3-4 week hands-on summer research experience on cosmic ray physics at UNL

Formative Evaluation

What activities were evaluated?
– The specific components of the Summer Research Experience

What measures were used?
– Completion of activities
– Participant satisfaction
– Participant evaluation of goal attainment
– Participant evaluation of activity effectiveness

Who was evaluated?
– Participants

What data was collected?
– Interviews
– Rating scales

How was data analyzed?
– Content analysis of interview responses
– Frequency and descriptive statistical analysis of rating scales.

Examples of Formative Measures: Interview Questions

What was the most effective part of the workshop? (response frequencies by coded category)
– Hands-on work with detectors: 6
– Information from classroom sessions: 4

Teacher Comments (by teacher with coded category(s) indicated):

– For me personal was the activities. The actual connecting and wiring and those things. I don’t sit and take lectures very well. That’s just me. [Hands on work with the detectors]

– Um, I think it was the classroom work. There was a good review for those of us that have had physics and it was a good introduction for those that didn’t. [Information from classroom sessions]

Examples of Formative Measures: Rating Scales

1. How effective do you think the workshop was in meeting its goals?

1 = Not Effective; 2 = Neither Effective nor Ineffective; 3 = Somewhat Effective; 4 = Effective; 5 = Very Effective

4. Indicate how USEFUL you think each of the following workshop components was, using the following scale.

1 = Very Unuseful; 2 = Unuseful; 3 = Somewhat Unuseful; 4 = Somewhat Useful; 5 = Useful; 6 = Very Useful

a. Classroom/lecture sessions on particle detectors and experimental techniques.

b. Lab work sessions refurbishing and preparing detectors.

Examples of Formative Measures: Rating Scale Results

How effective do you think the workshop was in meeting its goals?

Very Effective (5): 1; Effective (4): 3; Somewhat Effective (3): 1; Neither Effective nor Ineffective (2): 0; Not Effective (1): 0. M = 4.00, SD = .71

Usefulness ratings, from 1 (Very Unuseful) to 6 (Very Useful):

Component                                                  (1) (2) (3) (4) (5) (6)    M     SD
Classroom/lecture sessions on particle detectors
and experimental techniques                                 0   0   0   0   2   3   5.60   .55
Lab work sessions refurbishing and preparing detectors      0   0   0   1   3   1   5.00   .71
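The reported means and standard deviations can be reproduced from the response frequencies above. A minimal sketch (the sample standard deviation matches the reported values):

```python
import statistics

# Expand each rating value by its response frequency, then compute
# the mean and sample standard deviation reported on the slide.
def describe(freqs_by_value):
    scores = [value for value, n in freqs_by_value.items() for _ in range(n)]
    return statistics.mean(scores), statistics.stdev(scores)

# "How effective was the workshop?" -- 5-point scale frequencies.
m, sd = describe({5: 1, 4: 3, 3: 1, 2: 0, 1: 0})
print(f"M = {m:.2f}, SD = {sd:.2f}")  # M = 4.00, SD = 0.71
```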

Summative Evaluation

What outcomes were evaluated?
– Teachers’ increase in knowledge about cosmic ray physics and skill in high energy research methods
– Teachers’ change in self-efficacy for conducting CROP research and integrating CROP into their teaching

What measures were used?
– Knowledge gain
– Achieving a criterion level of knowledge/skill
– Increase in self-efficacy

Who was evaluated?
– Teachers

What data was collected?
– Pre- and post-workshop tests of cosmic ray physics and research
– Pre- and post-workshop self-efficacy ratings

How was data analyzed?
– Dependent t-tests of pre-post scores
– Comparing skill scores to criteria

Summative Evaluation: Knowledge Test Questions

1. The energy distribution of primary cosmic rays bombarding the earth has been measured by a number of experiments. In the space below, sketch a graph of the number of observed primary cosmic rays vs. cosmic ray energy, and describe the distribution in a sentence or two.

2. Explain how a scintillation counter works, i.e., write down the sequence of events from the passage of a charged particle through a scintillator to the generation of an electric signal in a photomultiplier tube.

3. Describe some characteristic differences between electromagnetic showers and hadronic showers created when particles impinge on a block of matter or a cosmic ray enters the atmosphere. Hint: think in terms of the type of particle which initiates the shower, the type of secondary particles in the shower, the shape of the shower, depth penetration of the shower particles, etc.

Summative Evaluation: Data Analysis

Table 9. Participants’ Pre- and Post-Test Mean Scores on Knowledge Tests

                   Pre-Test          Post-Test
            df     M      SD         M       SD       t        ES
Teachers    4      5.00   2.69       19.60   1.71     8.67*    6.64

Note. ES = effect size computed by Cohen’s d in averaged pre- and post-test SD units. Teachers, n = 5.
*p < .01.
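The reported effect size can be checked directly from the table: d = (19.60 − 5.00) / ((2.69 + 1.71) / 2) ≈ 6.64. Below is a minimal sketch of the analysis itself; the individual pre/post scores are invented stand-ins (the slide reports only summary statistics), and the effect size follows the table note’s definition of Cohen’s d in averaged pre- and post-test SD units.

```python
import statistics
from scipy import stats

# Invented pre/post knowledge scores for five teachers; the slide
# reports only the summary statistics, not the raw data.
pre = [3, 4, 5, 6, 7]
post = [18, 19, 20, 21, 20]

# Dependent (paired) t-test of pre-post change.
t, p = stats.ttest_rel(post, pre)

# Cohen's d in averaged pre- and post-test SD units, per the note.
d = (statistics.mean(post) - statistics.mean(pre)) / (
    (statistics.stdev(pre) + statistics.stdev(post)) / 2
)
print(f"t({len(pre) - 1}) = {t:.2f}, p = {p:.4f}, d = {d:.2f}")
```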

Summative Evaluation: Self-Efficacy Questions

Please rate how confident you are about each of the following from 0 (completely unconfident) to 100 (completely confident).

1. Your ability to set up and maintain the CROP research equipment at your school.
2. Your ability to conduct CROP research at your school.
3. Your ability to teach students at your school who haven’t attended the Summer Workshop how to conduct CROP research at your school.
4. Your ability to design your own research projects for your students utilizing the CROP research equipment.
5. Your ability to incorporate lessons and activities in high-energy physics into your classes.
6. Your ability to create “hands-on” projects and activities for students in your classes using the CROP research equipment.

Summative Evaluation: Data Analysis

Table 11. Participants’ Pre- and Post-Test Mean Self-Efficacy Scores

                              Pre-Test             Post-Test
                       df     M        SD          M        SD        t       ES
Conducting CROP        4      41.00    31.58       77.80    16.10     3.06*   1.54
  Activities
Integrating CROP       4      45.00    31.37       79.25    17.31     3.32*   1.41
  Into Classes
Utilizing Distance     4      56.67    17.48       70.33    13.35     4.08*   .89
  Education

Note. ES = effect size computed by Cohen’s d in averaged pre- and post-test SD units. Teachers, n = 5.
*p < .01.

Formative Evaluation Example

To obtain student reactions for the development of the campus-specific Web-based brief intervention versions, beta versions will be evaluated by recruiting a panel of students from each participating campus. These students will complete the intervention and provide verbal and written feedback on their reactions to the program and their suggestions for improvement. Adjustments to the program will be made based on student feedback.

Summative Evaluation Example

Students will complete the web-based brief alcohol intervention (pre-test). Approximately six weeks later, they will again complete the web-based brief alcohol intervention (post-test). Change will be determined by comparing post-test scores to pre-test scores using a Repeated Measures Analysis of Variance (ANOVA). Success will be determined by a statistically significant decrease in drinking and driving (Objective 1) and riding with a driver who has been drinking (Objective 2), with an effect size of at least a 10% pre- to post-test decrease for drunk driving and a 6% decrease for riding with a drinking driver.
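A minimal sketch of this pre/post analysis, assuming long-format data with one row per student per occasion; the column names and scores below are invented for illustration. With only two time points, the repeated measures ANOVA F statistic equals the square of the paired t statistic.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# Invented long-format data: one drinking-and-driving score per
# student per occasion (pre-test and post-test).
data = pd.DataFrame({
    "student": [1, 2, 3, 4, 5] * 2,
    "time": ["pre"] * 5 + ["post"] * 5,
    "score": [8, 6, 9, 7, 5, 6, 5, 7, 5, 4],
})

# Repeated measures ANOVA: does the score change from pre to post?
result = AnovaRM(data, depvar="score", subject="student", within=["time"]).fit()
print(result)
```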

Formulating the Problem: Issues of Measurement

– Problem: What is the purpose of the intervention to be evaluated?

– This question often produces vague results.
– A common problem is measuring the “unmeasurable.”
– Evaluation research is a matter of finding out whether something is there or not there, whether something happened or did not happen.

– To conduct evaluation research, we must be able to operationalize, observe, and measure.

What is the outcome, or the response variable?

– If a social program is intended to accomplish something, we must be able to measure that something.

– It is essential to achieve agreements on definitions in advance.

– In some cases, you may find that the definitions of a problem and a sufficient solution are set by law or by agency regulations; if so, you must be aware of such specifications and accommodate them.

– Whatever the agreed-upon definitions, you must also achieve agreement on how the measurements will be made.

– There may be several outcome measures, for instance surveys of attitudes and behaviors, existing statistics, use of other resources.

Measuring Experimental Contexts

– Measuring the dependent variable directly involved in the experimental program is only a beginning.

– It is often appropriate and important to measure those aspects of the context of an experiment researchers think might affect the experiment.

– For example, what is happening in the larger society beyond the experimental group, which may affect the experimental group.

Specifying Interventions

– Besides making measurements relevant to the outcomes of a program, researchers must measure the program intervention—the experimental stimulus.

– The experimental stimulus is the program intervention.

– If the research design includes an experimental and a control group, then handling the experimental stimulus is straightforward.

– Assigning a person to the experimental group is the same as scoring that person “yes” on the stimulus, and assigning “no” to the person in the control group.

– Considerations: who participates fully; who misses participation in the program periodically; who misses participation in the program a lot?

– Measures may need to be included to capture the level of participation.
– The problems may be more difficult than that.
– The factors to consider should be addressed thoroughly.

Specifying the Population

– It is important to define the population of possible subjects for whom the program is appropriate.

– Ideally, all or a sample of appropriate subjects will then be assigned to experimental and control groups as warranted by the study design.

– Beyond defining the relevant population, the researcher should make fairly precise measurements of the variables considered in the definition.

New versus Existing Measures

– If the study addresses something that’s never been measured before, the choice is easy—new measures.

– If the study addresses something that others have tried to measure, the researcher will need to evaluate the relative worth of various existing measurement devices in terms of his or her specific research situation and purpose.

– Of greater scientific significance, measures that have been used frequently by other researchers carry a body of possible comparisons that might be important to the current evaluation.

– Finally, measures with a long history of use usually have known degrees of validity and reliability, but newly created measures will require pretesting or will be used with considerable uncertainty.

– Advantages of creating measures: they can offer greater relevance and validity than existing measures.

– Advantages of using existing measures: creating good measures takes time and energy, both of which could be saved by adopting an existing technique.

Operationalizing Success/Failure

– Potentially one of the most taxing aspects of evaluation research is determining whether the program under review succeeded or failed. Definitions of “success” and “failure” can be rather difficult.

Cost-benefit analysis

– How much does the program cost in relation to what it returns in benefits?
– If the benefits outweigh the cost, keep the program going.
– If the reverse, “junk it.”
– Unfortunately, this is not an appropriate analysis to make if thinking only in terms of money.

– Researchers must take measurement quite seriously in evaluation research, carefully determining all the variables to be measured and getting appropriate measures for each.

– Such decisions are often not purely scientific ones.
– Evaluation researchers often must work out their measurement strategy with the people responsible for the program being evaluated.

– There is also a political aspect.

TYPES OF EVALUATION RESEARCH DESIGNS

Types of Evaluation Research Designs

– Evaluation research is not itself a method, but rather one application of social research methods. As such, it can involve any of several research designs. To be discussed:

– 1. Experimental designs

– 2. Quasi-experimental designs

– 3. Qualitative evaluations

– Experimental Designs

– Many of the experimental designs introduced in Chapter 8 can be used in evaluation research.

– Quasi-Experimental Designs: distinguished from “true” experiments primarily by the lack of random assignment of subjects to experimental and control groups. In evaluation research, it’s often impossible to achieve such an assignment of subjects.

– Rather than forgo evaluation altogether, there are some other possibilities:

– Time-Series Designs
– Nonequivalent Control Groups
– Multiple Time-Series Designs

– Nonequivalent Control Groups:

– Using an existing “control” group that appears similar to the experimental group, used when researchers cannot create experimental and control groups by random assignment from a common pool.

– A nonequivalent control group can provide a point of comparison even though it is not formally a part of the study.

– Multiple Time-Series Designs:

– Using more than one time-series analysis.

– This is an improved version of the nonequivalent control group design.

– This method is not as good as the one in which control groups are randomly assigned, but it is an improvement over assessing the experimental group’s performance without any comparison.

– Qualitative Evaluations

– Evaluations can be less structured and more qualitative.

– Sometimes important, often unexpected information is yielded from in-depth interviews.

Logic Model of Evaluation Research

Planning Evaluation

The Logic Model: a systematic linkage of project goals, objectives, activities, and outcomes.

Steps in Creating a Logic Model
1) Clarify what the goals of the project/program are.
2) Clarify what objectives the project should achieve.
3) Specify what program activities will occur.
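To make the goal-objective-activity linkage concrete, here is a minimal sketch of a logic model represented as a small data structure, populated with the CROP example discussed below; the class and field names are illustrative, not part of any standard.

```python
from dataclasses import dataclass, field

# A logic model systematically links goals, objectives, and activities.
@dataclass
class LogicModel:
    goal: str
    objectives: list = field(default_factory=list)
    activities: list = field(default_factory=list)

crop = LogicModel(
    goal=("Establish a statewide collaborative network of expert "
          "teachers fully capable of continuing the project locally."),
    objectives=[
        "Teachers will acquire knowledge about cosmic ray physics "
        "and skill in high energy research methods.",
        "Teachers will exhibit increased self-efficacy for conducting "
        "CROP research and integrating CROP into their teaching.",
    ],
    activities=[
        "High school physics teachers and students will attend a "
        "3-4 week hands-on summer research experience at UNL.",
    ],
)
print(crop.goal)
```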

Goal Clarification

High school physics teachers and students will attend a 3-4 week hands-on summer research experience on cosmic ray physics at UNL.

Is this a goal?

Goal Clarification

Establish a statewide collaborative network of expert teachers fully capable of continuing the project locally.

Developing Objectives

Goal: Establish a statewide collaborative network of expert teachers fully capable of continuing the project locally.

Objectives:
1. Teachers will acquire knowledge about cosmic ray physics and skill in high energy research methods.
2. Teachers will exhibit increased self-efficacy for conducting CROP research and integrating CROP into their teaching.

CROP Logic Model

Goal: Establish a statewide collaborative network of expert teachers fully capable of continuing the project locally.

Objectives:

1. Teachers will acquire knowledge about cosmic ray physics and skill in high energy research methods.

2. Teachers will exhibit increased self-efficacy for conducting CROP research and integrating CROP into their teaching.

Activity: High school physics teachers and students will attend a 3-4 week hands-on summer research experience on cosmic ray physics at UNL

Evaluating the Logic Model

– Goal–Objective Correspondence: Are objectives related to the overall goal?

– Goal–Activity Correspondence: Do anticipated activities adequately implement the goals?

– Activity–Objective Correspondence: Will program activities result in achieving objectives?

CROP Logic Model

Goal: Establish a statewide collaborative network of expert teachers fully capable of continuing the project locally.

Objectives:
1. Teachers will acquire knowledge about cosmic ray physics and skill in high energy research methods.
2. Teachers will exhibit increased self-efficacy for conducting CROP research and integrating CROP into their teaching.

Activity: High school physics teachers and students will attend a 3-4 week hands-on summer research experience on cosmic ray physics at UNL.

An Example

GOAL 1: Increase the availability of attractive student centered social activities located both on and off the NU campus.

Objective 1.1: Increase by 15% from baseline the number of students aware of campus and community entertainment options available to NU students.

Activity: Develop and maintain an interactive web site describing social and entertainment options for students.

Another Example

GOAL 7: Reduce high-risk alcohol marketing and promotion practices.

Objective 7.3: Reduce by 25% from baseline, the volume of alcohol advertisements in the Daily Nebraskan, The Reader and Ground Zero that mention high-risk marketing and promotion practices.

Activity: Work with the media to encourage at least 3 newspaper articles or television news stories in the Lincoln market each school year concerning high-risk marketing and promotion practices.

Logic Model Example

Goal 1: Create active, operational campus task forces at the 11 remaining state-funded institutions of higher education serving undergraduate populations.

Objective 1.1: Recruit support from upper administration at each institution to commit personnel to task force coordination and participation.
Methodology: Administrative luncheon presentations to institution chancellors and presidents on statewide initiatives hosted by University of Nebraska President James Milliken; follow-up identifying key contacts.
Completion Date: By November, 2005

Objective 1.2: Provide technical assistance and training to assist campuses in campus task force recruitment, organization and development.
Methodology: Drive-in workshop on coalition development; follow-up teleconferences with organizers, internet resources.
Completion Date: By December, 2005

Objective 1.3: Provide regular teleconferencing facilitation to allow interaction between task force coordinators at participating campuses.
Methodology: Monthly telephone conference of campus task force organizers; agenda that allows sharing of issues, problems, needs, and accomplishments.
Completion Date: Ongoing through January, 2007

Logic Model Worksheet (columns):

Goals | Activities | Objectives (Outcomes) | Indicators/Measures | Who Evaluated | Data Sources | Data Analysis

Issues about Evaluation Research

– The Social Context

– Evaluation research has a special propensity for running into problems:
– Logistical problems
– Ethical problems

Logistical Problems

– Problems associated with getting subjects to do what they are supposed to do, getting research instruments distributed and returned, and other seemingly simple tasks that can prove very challenging.

– The special, logistical problems of evaluation research grow out of the fact that it occurs within the context of real life.

– Although evaluation research is modeled after the experiment—which suggests that the researchers have control over what happens—it takes place within frequently uncontrollable daily life.

– Lack of control can create real dilemmas for the researchers.

Administrative control:

– The logistical details of an evaluation project often fall to program administrators.

– What happens when the experimental stimulus changes in the middle of the experiment due to unforeseen problems (e.g. escaping convicts; inconsistency of attendance, or replacing original subjects with substitutes)?

– Some of the data will reflect the original stimulus; other data will reflect the modification.

Ethical Issues

– Ethics and evaluation are intertwined in many ways.

– Sometimes the social interventions being evaluated raise ethical issues. They may involve political, ideological and ethical issues about the topic itself

– Maybe the experimental program is of great value to those participating in it.
– But what about the control group, which is not receiving help?

Use of Research Results

– Because the purpose of evaluation research is to determine the success or failure of social interventions, you might think it reasonable that a program would automatically be continued or terminated based on the results of the research.

– It’s not that simple.
– Other factors intrude on the assessment of evaluation research results, sometimes blatantly and sometimes subtly.

– There are three important reasons why the implications of evaluation research results are not always put into practice:

– The implications may not always be presented in a way that nonresearchers can understand.

– Evaluation results sometimes contradict deeply held beliefs

– Vested interests in the programs underway

Social Indicators Research

– Combining evaluation research with the analysis of existing data.

– A rapidly growing field in social research involves the development and monitoring of social indicators, aggregated statistics that reflect the social condition of a society or social subgroup.

– Researchers use indicators to monitor social life.
– It’s possible to use social indicators data for comparisons across groups, either at one point in time or across some period of time.
– Often doing both sheds the most light on the subject.
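For example, a minimal sketch of both kinds of comparison, using an invented indicator tracked for two subgroups over two years:

```python
import pandas as pd

# Invented social-indicator data: a hypothetical rate for two
# subgroups observed in two successive years.
data = pd.DataFrame({
    "group": ["A", "A", "B", "B"],
    "year": [2020, 2021, 2020, 2021],
    "rate": [12.4, 11.8, 9.1, 9.6],
})

# Rows compare groups at each point in time; columns compare each
# group with itself across time.
print(data.pivot(index="group", columns="year", values="rate"))
```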

– The use of social indicators is proceeding on two fronts:
– Researchers are developing ever more refined indicators, finding which indicators of a general variable are the most useful in monitoring social life.
– Research is being devoted to discovering the relationships among variables within whole societies.

– Evaluation research provides a means for us to learn right away whether a particular “tinkering” really makes things better.

– Social indicators allow us to make that determination on a broad scale; coupling them with computer simulation opens up the possibility of knowing how much we would like a particular intervention without having to experience its risks.

General References

– Frechtling, J. A. (2007). Logic modeling methods in program evaluation. San Francisco: Jossey-Bass/Wiley.

– Patton, M. Q. (2002). Qualitative research and evaluation methods. Thousand Oaks, CA: Sage.

“Don’t stop learning; if you stop learning, you’ll stop growing.”