The validity, credibility, utility, cost-effectiveness, and ethicality of research evaluation systems: A study of seventeen nations

Chris L. S. Coryn, PhD (ABD)
The Evaluation Center, Western Michigan University

• Scientists evaluate themselves…but are unable to finance themselves

Background and context

…large parts of scholarly research are publicly-funded and in most parts of the world government funding for research is tight, and it is only getting tighter, due to, among other things, current funding priorities and large budget deficits…under demands for greater accountability, diminished funding, and in the pursuit of general quality improvements, many countries have initiated systems for evaluating publicly-funded research at the national level…

…almost universally, governments want answers to the questions “now what?” “how much?” and “what if?” They also want to discriminate, or sort, the good from the bad, the worthwhile from the worthless, and the important from the trivial. However, as it turns out, in most nations there are weaknesses in their understanding and application of the logic of evaluation…

Background and context

…although there are vast differences in the way governments fund research around the world, and a diversity of approaches to evaluating publicly-funded research, almost all now share the common purpose of relating funding to performance…

…research evaluation has emerged as a “rapid growth industry”…there is an increasing emphasis on accountability, as well as on the effectiveness and efficiency of research…governments need such evaluations for different purposes: optimizing their research allocations at a time of budget stringencies; re-orienting their research support; rationalizing or downsizing research organizations; augmenting research productivity, etc. To this end, governments have developed or stimulated research evaluation activities in an attempt to get “more value for the money” they spend on research support…(OECD, 1997, p. 5).

Aims and objectives of the study

1. Identify a small set of relevant and demonstrable properties that characterize a ‘good’ research evaluation system/model: Good evaluations should be valid, credible, useful, cost-effective, and ethical…in other words, evaluations should be logically correct and produce justifiable conclusions, be believable or have reasonable grounds for being believable to relevant audiences, be useful or designed for use, be economical relative to the benefits they produce, and be conducted in an ethical, legal, professional, and otherwise appropriate manner…

2. Characterize each of the seventeen countries’ research evaluation models in terms of their validity, credibility, utility, cost-effectiveness, and ethicality—“what’s so?”…

3. Grade/rate, profile, and rank each of the models to assess their relative strengths and weaknesses—“so what?”…

4. Determine ways in which the evaluation of researchers and their research can be improved—at all levels…and for all purposes…

Countries included in the study

Australia, Belgium, the Czech Republic, Finland, France, Germany, Hong Kong, Hungary, Ireland, Japan, the Netherlands, New Zealand, Poland, Sweden, Taiwan, the United Kingdom, and the United States

What constitutes research?

• Research is ‘a truth-seeking activity which contributes to knowledge, aimed at describing or explaining the world, conducted and governed by those with a high level of proficiency or expertise…also, a particular instance or piece of research’

• Typologies of research (e.g., basic, applied, experimental development), such as those applied by the OECD, COSEPUP, and others are unnecessary and often irrelevant, especially for evaluating research. Research does not require this disaggregation in order to meet the definitional requirement of a description, explanation, or contribution to knowledge; it is either a contribution to knowledge or it is not…that is, asymmetrical…

Why ‘define’ research?

• Research is a cognitive activity, not an aesthetic one, and in many instances the creative arts, for example, are not obviously really cases of research…a fundamental problem in many countries…that is, how to deal with the creative arts and humanities, and in some cases the social sciences…

• A painter’s paper about their own exhibition ‘might’ qualify as research, whereas the painting does not; a critical paper on a musical composition ‘might’ be considered research, while the performance of the composition, and even the composition itself, would not…

The international research landscape

• See handout
− Research intensity
− Research budget
− Budget growth rate
− Researchers
− Research growth rate
− Researchers/labor force
− Researcher expenditure
− Publications
− Publication growth rate
− World publications
− Patents

International research evaluation systems’ primary purposes

− Accountability (i.e., summative evaluation)
− Resource allocation (i.e., a priori/ex ante/prospective and a posteriori/ex post/retrospective evaluation)
− Improvement (i.e., formative evaluation)
− Synthesis (i.e., ascriptive evaluation)
− Decision making of other types (i.e., selection, prioritization, and prediction)

International research evaluation systems’ basic units of assessment

− Research products
− Individual researchers
− Research groups
− Departments
− Institutions
− Disciplines
− Programs
− Policies

International research evaluation systems’ core methodologies

− Bibliometrics
− Cartography/mapping
− Case studies
− Expert panels (internal, external, mixed)
− Interviews and observations
− Comparative, prospective, and retrospective analysis
− Reports and site-visits
− Strategic plans
− Surveys

International research evaluation systems’ key indicators

− Impact (local, regional, national, international)
− FTE researchers
− FTE students
− Degrees awarded
− External research funding
− Esteem
− Research inputs, outputs, and processes
− Teaching

Methods

• Participants: international, multidisciplinary expert panels, that is, judges…

• Materials: Each judge was given very detailed accounts of each country’s policies, principles, and procedures for both funding and evaluating government-supported research (e.g., New Zealand’s PBRF, Hong Kong and the United Kingdom’s RAE) and scoring sheets (see handout)…

Methods

• Procedure and design: two-stage nonexperimental, retrospective, descriptive design aimed at calibration and consensus…
− Stage I: All experts independently rated three randomly assigned countries on each criterion; in preparation for Stage II, a discussion of the discrepancies in ratings was held to surface the reasoning implicit in each judge’s ratings and to work toward a shared set of parameters underlying the ratings (i.e., calibration)…following lengthy training…
− Stage II: All of the judges worked as teams by country, and each group made a decision to resolve the discrepancies from Stage I; they then agreed on a small set of reasons for putting a country into a rating category (i.e., consensus)…

Measurement

• Meta-dimensions: Validity, credibility, utility, cost-effectiveness, and ethicality…that is, latent constructs…

• Indicators: Five indicators on each of the five meta-dimensions…that is, measured or observed aspects of meta-dimensions…

• Scoring/rating: Each of the five indicators on each of the five meta-dimensions was scored by judges from 0-10. Only the endpoints (i.e., 0 and 10) were anchored, where 0 = absence of merit…the assumption underlying this scoring system was that scores on the indicators could be treated as a ratio unit of measurement…

Analytic approach

• Three general, overarching approaches…

1. Scoring
2. Profiling and synthesis
3. Human judgment analysis

Scoring

• Meta-dimension raw and weighted score calculation (see handout)

Meta-Dimension   Indicator   Raw Score                       Weight    Weighted Score
1. Validity                  ∑ (i, ii, iii, iv, v) = 0-50    20% (2)   0-100
                 i           0-10
                 ii          0-10
                 iii         0-10
                 iv          0-10
                 v           0-10

Scoring

Thus, if country X received a rating of 5 on indicator i, a rating of 7 on indicator ii, a rating of 6 on indicator iii, a rating of 5 on indicator iv, and a rating of 9 on indicator v, on the validity meta-dimension, for example, the raw score would be 32 and the weighted score would be 64, or 64%, as shown in Equations 1 and 2:

$$\text{Raw Score} = \sum (I_{i}, I_{ii}, I_{iii}, I_{iv}, I_{v}) \tag{1a}$$

$$= (5 + 7 + 6 + 5 + 9) = 32 \tag{1b}$$

$$\text{Weighted Score} = \text{Raw Score} \times \text{Weight} \tag{2a}$$

$$= 32 \times 2 = 64 \tag{2b}$$

$$\text{or} = \frac{\text{Raw Score}}{TPS} = \frac{32}{50} = 64\% \tag{2c}$$

where I is indicator, WS is weighted score, and TPS is total possible score on any given meta-dimension (i.e., 0-50). Equation 2c is stated on the unweighted scale (32 of a possible 50); on the weighted scale, WS = 64 of a possible 100 gives the same 64%.
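The worked example for the validity meta-dimension can be checked with a short Python sketch. The function and constant names here are illustrative assumptions, not part of the study; the ratings (5, 7, 6, 5, 9), the weight of 2, and the 0-10 indicator scale come from the slides.

```python
# Illustrative sketch only: names are hypothetical, values come from the worked example.
INDICATOR_MAX = 10   # each indicator is rated 0-10
N_INDICATORS = 5     # five indicators per meta-dimension
WEIGHT = 2           # the 20% weight, applied as a multiplier of 2


def raw_score(ratings):
    """Equation 1: sum of the five indicator ratings (range 0-50)."""
    if len(ratings) != N_INDICATORS:
        raise ValueError("expected exactly five indicator ratings")
    if any(not 0 <= r <= INDICATOR_MAX for r in ratings):
        raise ValueError("each rating must lie between 0 and 10")
    return sum(ratings)


def weighted_score(ratings, weight=WEIGHT):
    """Equation 2a: raw score multiplied by the weight (range 0-100)."""
    return raw_score(ratings) * weight


validity_ratings = [5, 7, 6, 5, 9]
rs = raw_score(validity_ratings)                 # 32
ws = weighted_score(validity_ratings)            # 64
pct = 100 * rs / (INDICATOR_MAX * N_INDICATORS)  # 64.0, raw score as a share of 50

print(rs, ws, pct)
```

Because the weight of 2 rescales a 0-50 raw score onto a 0-100 range, the weighted score and the percentage coincide numerically.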

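Scoring

From the five meta-dimension weighted scores it is then possible to compute either a total weighted score or an average total weighted score, as shown in Equations 3 and 4 (i.e., synthesis of the meta-dimensions using NWS).

$$\text{Total Weighted Score} = \sum (MD_1, MD_2, MD_3, MD_4, MD_5) = 0\text{-}500 \tag{3a}$$

$$\text{or} = \frac{TWS}{TPMDS} = 0\%\text{-}100\% \tag{3b}$$

$$\text{Average Total Weighted Score} = \bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i \tag{4a}$$

$$= \frac{1}{5} \sum (MD_1, MD_2, MD_3, MD_4, MD_5) \tag{4b}$$

$$= \frac{\sum (MD_1, MD_2, MD_3, MD_4, MD_5)}{5} = 0\text{-}100 \tag{4c}$$

$$\text{or} = \frac{ATWS}{TPMDS / 5} = 0\%\text{-}100\% \tag{4d}$$

where MD is meta-dimension, TWS is total weighted score, ATWS is average total weighted score, and TPMDS is total possible score across all meta-dimensions (i.e., 0-500). Because ATWS is already on a 0-100 scale, the percentage in Equation 4d divides by the per-meta-dimension maximum (TPMDS / 5 = 100) rather than by TPMDS itself.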
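The synthesis step in Equations 3 and 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and apart from the validity score of 64 from the earlier worked example, the meta-dimension scores are made-up placeholders, not results from the study.

```python
# Illustrative sketch only: function names and four of the five scores are hypothetical.
TPMDS = 500  # total possible score across the five meta-dimensions (5 x 100)


def total_weighted_score(md_scores):
    """Equation 3a: sum of the meta-dimension weighted scores (range 0-500)."""
    return sum(md_scores)


def average_total_weighted_score(md_scores):
    """Equation 4: mean of the meta-dimension weighted scores (range 0-100)."""
    return sum(md_scores) / len(md_scores)


# Only the validity value (64) comes from the worked example; the rest are placeholders.
md_scores = {
    "validity": 64,
    "credibility": 70,
    "utility": 58,
    "cost_effectiveness": 46,
    "ethicality": 82,
}

scores = list(md_scores.values())
tws = total_weighted_score(scores)           # 320
atws = average_total_weighted_score(scores)  # 64.0
tws_pct = 100 * tws / TPMDS                  # 64.0

print(tws, atws, tws_pct)
```

With equal weights across the five meta-dimensions, the average total weighted score and the total expressed as a percentage are the same number, which is why either can be used for ranking.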

Profiling and synthesis

• Profiling: Profiling refers to graphically exhibiting performance on relevant dimensions of merit…in some cases it makes ranking of two or more evaluands or evaluees possible…profiling is equally valuable for absolute performance profiling, where it can be used for formative, improvement purposes by identifying areas of poor or underperformance; that is, profiling can sometimes serve as a useful diagnostic tool…

• Synthesis: The synthesis operation is the process of amalgamating a set of ratings or performances on several dimensions, components, or criteria into an overall evaluative conclusion…the process of combining a set of ratings or performances on several subdimensions into a rating on one dimension…

Human judgment analysis: The Lens Model

• Used to assess the reliability and validity of judgments…along with, and to supplement, typical indices (e.g., scoring, interrater [e.g., Cohen’s κ], and internal consistency [e.g., Cronbach’s α] forms of reliability; face, content, and construct forms of validity)…

• The Lens Model represents the relationship between the judge and the objects as mediated by cues, whose relationship to the judge and object is probabilistic (see handout)…

• The Lens Model Equation (see handout) states that judgmental accuracy is determined by the degree to which the task is predictable…by knowledge of the properties of the task…and by cognitive control over the utilization of that knowledge…
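The handout is not reproduced here, so the following is an assumption about what it shows: the Lens Model Equation as it is commonly written in the judgment analysis literature. The symbol names are the conventional ones, not taken from the slides.

```latex
r_a = G \, R_e \, R_s + C \sqrt{1 - R_e^{2}} \, \sqrt{1 - R_s^{2}}
```

In this form, r_a is judgmental accuracy (achievement), R_e is the predictability of the task from the cues, G is the judge's knowledge of the task's properties, R_s is the judge's cognitive control (consistency in applying that knowledge), and C is the correlation between the residuals of the task and judge models. The first term matches the three determinants named in the bullet above.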

Problems facing national-level evaluation of research: Values duality

• Effectiveness versus efficiency
− System should be able to select the ‘best’
− System should cost little to operate

• Accountability versus autonomy
− System should make researchers accountable for use of public monies
− System should allow researchers a degree of independence

• Responsive versus inertial
− System should be responsive to new ideas or national needs
− System should impart stability (tension between tradition and originality)

• Meritocratic versus fair
− System should make judgments according to strict criteria of merit
− System should be inclusive (e.g., age, race, gender)

• Reliable versus valid
− System should measure research quality with little random or systematic error
− System should aim for validity; quantifiable criteria can often be assessed reliably, but they are often low in validity

Questions & Comments