

Toward Theory-Led Evaluation: The Experience of European Science, Technology, and Innovation Policies

Jordi Molas-Gallart
INGENIO (CSIC-UPV)

Andrew Davies
Tanaka Business School, Imperial College London

Abstract: This article reviews the literature and practice concerned with the evaluation of science, technology, and innovation (STI) policies and the way these relate to theories of the innovation process. Referring to the experience of the European Union (EU), the authors review the attempts to ensure that STI policy theory is informed by advances in the authors' understanding of the innovation process. They argue, however, that the practice of policy evaluation lags behind advances in innovation theory. Despite the efforts to promote theory-led evaluations of STI policies based on new theories of the systemic nature of innovation, evaluation practice in the EU continues to favor the development of methods implicitly based on outdated linear views of the innovation process. This article examines the reasons why this is the case and suggests that STI policy evaluation should nevertheless be supported by the evolving theoretical understanding of the innovation process.

Keywords: policy evaluation; science, technology, and innovation policy; innovation theory; Framework Programme evaluations

The evaluation of science, technology, and innovation (STI) policies is challenging because of the multiple goals of STI policy, the indirect and complex linkages between policy outputs and outcomes, and the serendipitous and long-term nature of policy impacts (Feller, 2002; Molas-Gallart, Salter, Patel, Scott, & Duran, 2002). This article reviews the literature and practice concerned with the evaluation of STI policies, focusing on the evaluation of European Union (EU) programs. During the past two decades, the evaluation of STI policies in the EU has been undertaken by a community of specialist academics and niche consultancies. Although some of the problems faced are common across different fields of evaluation, this community and the practices of STI policy evaluation have had limited exposure to other evaluation communities.

Jordi Molas-Gallart, Research Professor, INGENIO (CSIC-UPV), Camí de Vera s/n, Universitat Politècnica de València, 46027 València, Spain; e-mail: [email protected].

Authors' Note: The research that led to this article was conducted while both authors worked at SPRU, University of Sussex, as senior fellows. This article further develops arguments that the authors presented in a report commissioned by the Swedish Institute for Growth Policy Studies (Institutet för Tillväxtpolitiska Studier).

American Journal of Evaluation, Vol. 27 No. 1, March 2006, 64-82
DOI: 10.1177/1098214005281701
© 2006 American Evaluation Association


In this article, we focus our attention on the ex-post evaluation of the results of policy initiatives and not on the ex-ante assessment of project proposals submitted to an agency to obtain funding. The typical policy instrument is that of a program by which a public organization funds a set of research, technological development, or other innovation-oriented activities to be carried out by firms, research organizations, or both. Funding is commonly in the form of subsidies, covering up to 50% of the project's cost in the case of private beneficiaries or up to 100% in the case of public research organizations (e.g., higher education institutions). External, independent experts usually undertake the evaluation of these programs once they have been completed or are at an advanced stage and likely to have produced some results. Oftentimes, the agencies managing the programs do not have the breadth of capacities and expertise necessary to assess the results of a program and will require an independent view on the results of a specific initiative. For this, they rely on external consultants engaged after open or restricted requests for proposals. A community of specialist STI policy evaluators has developed to supply this need. Small specialized consultancies share this evaluation market with individual experts and groups from larger research organizations for whom policy evaluation is one among a broader portfolio of activities usually linked to the analysis of innovation and public policy.

The role of the STI policy evaluators is to determine what the policy outputs and outcomes have been, in other words, to link changes in the behavior of project participants and the impacts that such changes have generated to the specific policy initiatives under evaluation. As we will see below, policy makers are commonly interested in the impacts of STI policies on, for instance, productivity, competitiveness, and social welfare.

In the field of STI policy, both policy and program theories are strongly rooted in innovation theory. By policy theory, we understand a broad set of assumptions underlying generic policy goals and approaches (e.g., the belief that the support of basic research will promote academic excellence, economic competitiveness, and welfare in general). In contrast, following the conceptual distinction made among others by Leeuw (2003, p. 6), program theory refers to the rationale underpinning a particular policy initiative (a specific policy program), specifying its inputs and components, its expected outcomes, the assumed linkages between outcomes and policy inputs, and the underlying mechanisms that are responsible for these linkages. As the article will show, in our area, new policy approaches can be traced to changes in innovation theory. We will discuss how our understanding of the innovation process and the role that research and development (R&D) plays within it has influenced STI policies and how evaluation practice has tried to adapt to the changes. The role of the evaluator is to develop further the logical links between policy practices and their expected effects and to turn them into the theoretical support for a detailed evaluation study. In other words, the evaluator needs to develop a specific, detailed program theory from the policy theory that underpins the initiative under evaluation. This article argues, however, that unlike STI policy theory, the practice of policy evaluation continues to lag behind advances in innovation theory. Innovation theory has produced successive generations of more sophisticated conceptual models that seek to explain how scientific and technological research connects to market opportunities for innovation.

In this article, the term innovation refers to the entire set of processes linking R&D to the market introduction of a new product or service. Initial views postulating a linear relationship between scientific research, technological development, and eventually innovation and competitiveness informed the program logic of early STI policies. Currently, the new theories on the systemic nature of technological innovation provide a new STI policy theory, emphasizing the importance of the relationships between different actors in the innovation system and the complex connections between R&D activities and the whole of the innovation process in which they are inserted. The evaluation community has reacted by proposing new ways to adapt the evaluation of STI policies to the new policy theories. Yet such change is difficult to implement in practice because it involves a complete shift in the traditional approaches to STI policy evaluation and, therefore, in the way in which policy agencies contract for and use external evaluations. We can describe this situation as a lag between evaluation practice and STI policy theory.

This lag between evaluation practice and policy theory is not confined to the STI policy evaluation arena. Evaluation in many other fields needs to assess, and sometimes is required to measure, the more elusive downstream, societal impacts of policy programs. Our article is concerned with instances where evaluators are asked to appraise the impacts of policies whose effects are likely to emerge in the medium and long term and that are difficult to trace and attribute to the original policy measures. We are speaking here of complex environments where policy measures are only one among many other factors with a bearing on those variables that the policy community is trying to influence. In the case we are addressing, STI policy, complex theories of change have been developed to account for the intricate way in which different activities and factors can explain the innovation process. We discuss how these theories have proved difficult to use in the practice of evaluation, resulting in a gap between evaluation practice and STI policy theory. This gap is a result of the tension between complex theories of change and the simpler models required to deliver accountability-focused performance measurements.

The problem is relevant because in almost all areas of public policy, the trend toward results-based management in public administration has had the effect of focusing attention on the measurement of policy outcomes (rather than outputs). Such outcomes commonly take place downstream from policy action as policy outputs combine with other factors. In this context, observable changes in target areas are difficult to attribute to specific policy actions; in response, evaluation practitioners have proposed multifaceted approaches to the analysis of the contribution of policies to specific outcomes (Mayne, 2001). Similarly, as we will see, STI policy evaluation experts have also suggested multilevel evaluations, trying to capture the current understanding of the complex and systemic nature of innovation processes, in which STI policy is inserted. We will argue, however, that the assessments that these evaluations will yield will not satisfy the politically driven requirements for simple quantitative measures of program performance: mainly, the increasingly repeated request that evaluation be able to deliver a single figure demonstrating the economic returns derived from specific policy initiatives.

The article first sets the scene for the analysis of innovation policy evaluation by considering how models of innovation have produced insights about the functioning of innovation processes. These models have important implications for policy: They provide a rationale (or policy theory) for government intervention in support of innovation. The shift in the rationale for policy intervention should drive changes in evaluation theory and practice. In the second part of the article, we discuss recently proposed changes in the focus of STI policy evaluation. Evaluation experts have proposed new approaches based on a systems-based understanding of innovation. In general, they suggest a shift away from the evaluation of the direct outcomes of individual programs or projects toward attempts to assess the effects of policy on the performance of innovation systems as a whole. Yet using examples from the evaluation of EU STI policies, the section will show how other political trends are driving evaluation policy in a different direction. We identify a tension between political demands for accountability and outcome measurement and the complex results yielded by systemic approaches to evaluation. The article concludes that the role of the evaluator is central in steering evaluation practices toward the implementation of approaches informed by our current understanding of innovation processes.


Evolving Models of Innovation and the Rationale for Policy Intervention

Since the 1950s, numerous empirical studies have been conducted to develop a theory of innovation. These studies have produced successive generations of conceptual models based on increasingly sophisticated explanations of the process of innovation (von Hippel, 1988). Each model has furnished new empirical evidence pointing out the limitations of the previous generation and, in so doing, questioned the assumptions of previous approaches to innovation policy.

Policy makers seek to understand the factors that influence the sources, rate, and direction of innovation so that they can formulate policies to achieve economic and social benefits. A link can be found between the trends in STI policies and our theoretical understanding of the relationships between R&D, innovation, and wealth generation. This section reviews the main theoretical advances in the study of innovation and the changes in policy practice that followed. We show how the evolving theoretical understanding of the relationships between R&D, innovation, and wealth generation is linked to changes in policy approaches.

Linear Models of Innovation: Supply Push and Demand Pull

In the decades after World War II, science policy was based on the belief that scientific research would result in beneficial social and economic outcomes. This belief was laid out in the influential 1945 report Science the Endless Frontier (Bush, 1945). The report to the U.S. president, authored by Vannevar Bush, director of the Office of Scientific Research and Development, argued that scientific research was key to U.S. welfare and security and that essential new knowledge could only be obtained through basic scientific research. Government support to basic research and the development of scientific talent would translate into new products and technology, leading to social and economic improvements and enhanced military security. In other words, basic research forms the basis for new knowledge on which, in turn, applied research and technological development rest for the development of new products and processes. This view of the role of R&D provided the rationale underpinning U.S. and European research policies in the decades after World War II.

Such supply-push models of innovation were linear in that they saw the generation of new knowledge and technology as following sequential stages from basic research through applied research to technological development and exploitation in the market. The supply-push model—also known as "science push" and "technology push"—emphasized the role of original research and invention and downplayed or ignored market demand. The policy implications of the supply-push model are that spending money on R&D promotes innovation and stimulates economic growth. Large government expenditures on research are justified on the grounds that they expand the opportunities for innovation. The policy outcome is supply-side intervention providing public funds for basic, undirected research and applied research. These policies were popular in the 1950s and 1960s, when there was a great deal of optimism about the potential of research to meet social and economic objectives.

From the mid-1960s to the late 1970s, empirical studies criticized supply-push approaches for downplaying or ignoring the role of market forces in the innovation process. These studies argued that innovations are triggered in response to consumer and user demands (Schmookler, 1966). Such demand-pull models of innovation were also linear but stressed the role of the market in pulling technological innovation and directing R&D investments.


Nonlinear Models of Innovation

Coupling Models (Late 1970s to 1980s)

By the late 1970s, studies of innovation concluded that there was little evidence to support the view that demand-pull forces govern the innovation process. Furthermore, it was argued that demand-pull theories provide little help in formulating policies aimed at promoting R&D and innovation (Mowery & Rosenberg, 1979, p. 194). Several authors (Freeman & Soete, 1997; Kline, 1985; Kline & Rosenberg, 1986; Rothwell, 1992) saw innovation as a two-sided process involving complex interactions between supply and demand:

• The supply side includes new scientific or technical knowledge that is the result of original research activity conducted by various science and technology institutions (R&D laboratories and scientific and technical institutions).

• The demand side is derived from actual or perceived market demands for new technologies, products, services, or processes.

The efforts to develop a more comprehensive theory of innovation aim to account for the interaction, or coupling, of both elements (Freeman & Soete, 1997, p. 201). As depicted in Figure 1, the coupling model of innovation addresses the direct and indirect results of the innovation process arising from the continually changing interfaces between advancing science, technology, and the market (Rothwell, 1993). Unlike the previous linear models, the coupling model links decision making within the firm to basic research and to the marketplace. Coupling models treat innovation as a nonlinear, chain-linked process involving communication paths between functionally interacting and interdependent stages, with feedback loops between later and earlier stages (Kline, 1985; Kline & Rosenberg, 1986).

Innovation Systems Models (Early 1990s to Early 2000s)

During the 1990s, the coupling model of innovation was refined to take into account the more complex, distributed, and often random nature of the innovation process. New models were developed to account for the network of interactions linking science, technology, and the marketplace, as well as the risks and uncertainties associated with policy making. In the early 1990s, attempts to develop a conceptual model were conducted from two perspectives: networks of firms and national systems of innovation. First, in an attempt to refine the coupling model of innovation, Rothwell (1992) recognized that innovation occurs increasingly in networks and alliances of firms rather than in the individual company or R&D laboratory. The increasing number of corporate alliances, partnerships, R&D consortia, and joint ventures pointed to the importance of cooperation between systems integrators and their suppliers, customers, and collaborating competitors. By the 1990s, a new pattern of innovation was emerging based on a distributed learning process taking place within and between networked firms (Rothwell, 1992; von Hippel, 1988).

Second, in contrast with this emphasis on the firm, a growing body of literature emphasized the role of the national or regional environments, including local institutions, organizations, and culture, in shaping the innovation process (Lundvall, 1992). These national or regional systems-of-innovation approaches are concerned with the ways in which institutional setups influence the rate and direction of innovation and focus on the interactions and learning processes between the institutional actors involved. Because, for instance, users of knowledge are also knowledge producers, traditional distinctions between supply and demand break down.

These models have been developed to take into account nonlinear processes and feedback loops between different stages in the innovation system, underlining the complex links between R&D and innovation (Grupp, 2000). These nonlinear, nonsequential models of innovation can take into account problems arising at late stages of innovation, which can trigger basic research activities. In other words, research is an adjunct rather than a precondition for innovation and can relate to any stage in the innovation process.

More complex models of the innovation process are now adopting an innovation systems approach (Guy & Nauwelaars, 2003), informed by general systems theory, to capture the increasing complexity of innovation processes. Innovation systems comprise many actors (firms, governments, universities, public research institutes, users, etc.) and the relationships between them, underpinned by flows of information, finance, and power. Innovation systems theory calls for new systemic approaches to policy formulation, implementation, and evaluation. The next section discusses the way in which STI policy has adapted to this evolving understanding of innovation and the implications of this new policy rationale for evaluation practice.

Innovation Theories and Changing Approaches to STI Policy

As a broad policy theory, supply-push models supported the belief that basic research would yield public goods; there was no perceived need for the establishment of concrete, more specific missions for policy programs. Soon, however, the need to justify such investments in terms of identifiable returns became a policy concern. This concern arose as demand-pull models started to emerge. Within a demand-pull framework, R&D needs to fit with the needs of its users and beneficiaries; it cannot be taken for granted that basic research will automatically lead to societal benefits. Impact assessment tools were developed from the late 1960s to demonstrate the downstream benefits of publicly funded R&D. This is a difficult task as, even within a linear model, the impact of R&D is often long term and difficult to identify and trace. In the late 1960s and early 1970s, retrospective studies were launched in the United States to assess the dependence of innovations on scientific work. The main examples of such retrospective approaches were Project Hindsight and Project TRACES, the former sponsored by the U.S. Department of Defense and the latter funded by the National Science Foundation (see, e.g., Kostoff, 1997; Sherwin & Isenson, 1967).

[Figure 1: The Coupling Model of Innovation (Source: Rothwell, 1993, p. 21). The figure links idea generation, research, design and development, prototype production, manufacturing, and marketing and sales, connecting new needs and the needs of society and the marketplace with new technology and the state of the art in technology and production.]

Demand-pull models influenced new attitudes toward government policies and funding. If innovation emerged as a consequence of specific demands, supply-side policies to accelerate technological progress through funding of basic research could be misguided. A new social contract was needed to steer scientific activities toward areas in which they could contribute to addressing societal needs. Scientific research started to be seen as a long-term investment responding to societal needs, and therefore, scientific priorities needed to be defined in a strategic manner. This social contract between science and society required that the scientific community show that its activities would address issues of economic and social relevance. The policy response was a new approach focused on strategic, targeted, and problem-oriented research.

The need to link basic and fundamental research with societal needs also led to an increased emphasis on the role of universities as generators of growth and on university-industry relations. It was argued that the traditional way of handing off basic research in a sequential manner from universities and government laboratories into industry "simply does not work quickly or effectively because of divergent focus, and lack of proper incentives among several other problems" (Betz, 1994, p. 786). In response, initiatives like the U.S. National Science Foundation Industry/University Cooperative Research Projects Program, developed between the late 1970s and the early 1980s, allowed universities and industrialists to jointly identify new fundamental disciplinary approaches (Betz, 1994).

As an even more complex and systemic view of the role of R&D developed, policy continued to adapt accordingly. The linear approaches to STI policy do not sit well with the current understanding of the innovation process, with its feedback loops, interrelations, and inherent complexity. Not only is it no longer implicitly expected that any investment in basic research will naturally result in improved welfare, but the policy process itself should become more fluid and flexible to adapt to the complexity of the system of which it is part. During the 1980s and 1990s, different approaches to evaluation, such as technology foresight,1 technology assessment,2 and formative approaches, were being developed and used to aid STI policy formulation in a complex and distributed innovation system (Kuhlmann, 1999). STI policy formation in the EU and in many European countries has adopted several of these elements. Foresight exercises are, for instance, routinely conducted in several European countries, and European Commission (EC) officials follow these exercises closely. The formation of an EU STI policy is also based on a broadly based consultation, informed by the results of systematic evaluation exercises. Furthermore, many of its policy initiatives address issues such as the promotion of networks and international training that are important in any systemic understanding of the innovation process. Yet Kuhlmann (1999, p. 19) argues that by the late 1990s, there was still no blueprint for combining all these tools in a single strategic intelligence approach integrating the different stages of the policy process (formulation, agenda setting, decision, implementation, and evaluation) in an interactive manner. For Kuhlmann, what is needed is for STI policy to be defined through portfolios of actions informed by interactive systems of evaluation constantly feeding into the policy formulation process. In other words, for Kuhlmann, developing a formative approach to evaluation becomes a key element in building up a systems approach to policy formation.


How Evaluation Practice Responded to Changing STI Policy Theories

When compared with other policy fields, STI policy evaluation practice evolved relatively late. Recent reviews trace evaluation studies in the United States back to the late 1960s (Roessner, 2002) and in Europe to the 1970s (Luukkonen, 2002), although ex-ante assessment of R&D proposals, characterized mainly by peer review procedures, started much earlier (Rip, 2003). Noticeably, the concern to develop a systematic approach to policy evaluation developed relatively late when compared with other fields. By the late 1960s, evaluation practices and methods were already well established in fields such as the evaluation of education (Campbell & Julian, 1963) and social policies, and evaluation methodologies had already received substantial attention (Scriven, 1967). In contrast, in the early 1980s, the practice of STI policy evaluation was still in its infancy (Gibbons & Georghiou, 1987). There were few systematic guidelines as to how to evaluate major programs and few systematic evaluations of the effectiveness of innovation policies. Yet today, STI policy evaluation has become one of the most important areas of concern for STI policy makers. This interest can be traced to two interrelated factors: (a) the development of a new social contract between science and society discussed above and (b) the emergence of new public management approaches emphasizing the application to all public functions (including the funding of R&D activities) of management practices oriented to the control of outputs rather than simply monitoring processes and inputs.

Although the new social contract underpinning public funding of R&D activities is linked to a more complex view of the relationship between science and social and economic welfare, we argue here that developments in the management of public policies have pushed evaluation practice in another direction. The ascendancy of the so-called new public management approaches in the 1980s increased the pressure for evaluation to provide evidence of policy results and to develop output, rather than input, measures. It proposed the application of performance indicators, output controls, management by accounting, and increased competition and marketization of public sector activities (Boden, Gummett, Cox, & Barker, 1998). Increasing attention was paid to evaluations to assess and measure the performance of large R&D and innovation support programs. These evaluations focus on the development of measurement methods and techniques that do not refer to the more complex evolving STI policy theory. In the following sections, we will review the way in which evaluation experts are attempting to incorporate current policy theories into evaluation practice and how their proposals may enter into conflict with the political need for performance measurement. Theory-led evaluation may clash with the pragmatist search for new measurement methods and result in a conflict between method- and theory-led evaluation that has already been noted in other policy fields (Stame, 2004).

Changing Approaches to the Evaluation of STI Policy

Attempts at evaluating STI policy initiatives became widespread in the 1980s, when STI policy theory was driven by demand-pull theories of innovation. The concern about the use of research results and the alignment of STI programs with societal needs resulted in a new social contract between science and society, which in turn had implications for the evaluation of science and technology policies (see, e.g., Dalpe & Anderson, 1993; Jaffe, 1998). Strategically targeted R&D called for evaluation practices able to assess whether the specific policy goals had been achieved and thus support the decision-making process.


By the mid-1990s, analysts of STI evaluation were noting that the assessment of the linkages between the scientific community and potential users and beneficiaries had become a dominant element of STI initiatives and their evaluation (Dietz, 2003; Luukkonen, 1998). Yet at the same time, it was starting to become apparent that a monitoring and evaluation approach in line with the developing systemic approaches to policy development (i.e., policy initiatives rooted in a systemic policy theory) would be more complex. Different evaluation approaches are being proposed.

Formative Evaluation

As discussed above, Kuhlmann (1999) has aligned the implementation of a formative approach to STI policy evaluation with the development of a systemic approach to the policy formation process. In this view, different policy initiatives would no longer be considered and planned in isolation from each other but rather as part of a wider portfolio of actions addressing the performance of the system at different levels. STI policy would then comprise interlinked portfolios of initiatives, which would be informed by interactive systems of evaluation, constantly supporting the policy formulation and implementation processes. Such a systemic approach to policy formation stems from the heightened awareness of the complex relationships between knowledge creation and innovation, and the diversity of actors involved in these processes, and rests on a formative approach to evaluation, which becomes a key integral part of this policy model. On their own, formative approaches are not new in the field of STI policy evaluation. For instance, the evaluation of the U.K. Alvey program (1984 to 1990), a British initiative to support R&D in the information technology sector, used new combinations of evaluation techniques to develop a real-time evaluation approach and can be seen as an early example of formative STI policy evaluation. The Alvey evaluators argued that real-time evaluation had several advantages over traditional ex-post evaluation, significant among them the ability to feed back evaluation results to those responsible for directing the program (Hobday, 1988). Kuhlmann goes further and proposes a move away from an objective model of evaluation, in which independent evaluators produce evidence but no recommendations, and toward a model involving evaluators in learning exercises with all stakeholders and providing advice and recommendations as well as independent analysis. In this formative context, the evaluator becomes a facilitator rather than an external expert. The result would be a more flexible and experimental approach to policy formulation.

The move to formative evaluation has important methodological implications (Guy, 2003; Kuhlmann, 2003). Beyond the need to adopt real-time evaluation and pay less attention to impact assessments, which by their own nature require a delayed time frame to be carried out, formative evaluation needs to engage the stakeholders in charge of policy development and implementation in the evaluation process. Proponents of formative evaluation in the STI field argue that ex-post, hands-off evaluation by external experts is no longer an option. Evaluation has to become a process, part of policy implementation, by which programs are constantly assessed to improve the policy process.

Systemic Analysis

Arnold (2004) suggests a multilevel system of STI policy evaluation, encompassing traditional project and program evaluation, the systemic analysis of the overall health of innovation systems, and a subsystems analysis of bottlenecks, exploring the role of institutions and other actors. This systemic approach to evaluation would address institutional conditions, the connectivity of the system (cooperation and networking across institutional boundaries), and the system capabilities, from a knowledge, economic, and technical point of view (Arnold, 2004). Other experts are also placing greater emphasis on the evaluation of STI policy portfolios (Guy, 2003) and the linkages between evaluation and complex processes of policy formation.

Intelligent Benchmarking

A different response to cope with the new approaches to STI policy and their underlying systemic policy theories is the attempt to assess the effect of policies on the innovation system using so-called intelligent benchmarking approaches (Guy & Nauwelaars, 2003; O'Doherty & Arnold, 2003; Soete & Corpakis, 2003). Benchmarks aim to establish standards to measure and assess performance and to show where improvements are possible. They offer the promise of providing sets of measurements that may be easy to communicate while rooting evaluation practice in the current systemic understanding of the nature of innovation and technological change. Benchmarking proponents argue that it can be used to assess whether STI policies are having the expected downstream benefits on competitiveness and employment.

The EC is supporting the development of intelligent benchmarking. For instance, it has funded a high-level working group to study the benchmarking of national STI policies in relation to the Impact of RTD on Competitiveness and Employment (STRATA-ETAN Expert Working Group, 2002). This working group developed benchmarking frameworks for the performance (in terms of improved competitiveness and employment) of innovation systems and for the policies likely to affect this performance. The approach was informed by concepts furnished by the innovation systems literature. It defined four categories (social and human capital, research capacity, technological and innovation performance, and absorptive capacity) under which to group sets of indicators to help define the performance of an innovation system. The four categories can be related to each other by a model of an innovation system (Guy & Nauwelaars, 2003). The research concluded that "benchmarks of performance along these dimensions . . . are thus highly desirable as inputs to improved policy making" (STRATA-ETAN Expert Working Group, 2002, p. ix).

The Organization for Economic Co-operation and Development (2002) has also developed a similar benchmarking approach to examine industry-science relationships in national innovation systems. The approach includes a conceptual framework for assessing the relationship between industry and science based on three dimensions of a national system of innovation: (a) channels of interaction between organizations, such as companies, research institutes, and other bodies; (b) incentive structures; and (c) institutional arrangements.

The proponents of intelligent benchmarking are, however, aware of the practical difficulties of using benchmarking as a policy evaluation tool (Soete & Corpakis, 2003, p. 4). Benchmarking how policies affect the performance of innovation systems is complicated by three main obstacles:

• The complexity of innovation systems—in terms of the number of activities and actors involved—makes it extremely difficult to identify the causal links between R&D on the input side of such systems and competitiveness and employment on the output side.

• Even if adequate models were available to draw correlations between input and output indicators, it is very difficult to isolate and assess the impact of policy initiatives on system performance. The scale and direction of R&D is primarily driven by private sector decision making rather than public policy initiatives.

• The problems of complexity, causality, and attribution conspire against the formulation of generic policy lessons and prescriptions. The multidimensional complexity of innovation systems suggests that policy initiatives that work well in one context might not generate similar benefits in other contexts.

Analyzing the aggregate impact of multiple policy instruments is also difficult. Although it is possible to make crude correlations at the macrolevel between indicators such as government R&D expenditures and innovation system performance indicators, such quantitative calculations will tell us little about the links between policy and the observed impacts, the efficacy of particular policy mixes or individual instruments, or the specific policy levers that have to be pulled to improve overall system performance.
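To make this limitation concrete, the short sketch below computes such a crude macrolevel correlation for a handful of invented observations; the figures, variable names, and the use of Python here are purely illustrative assumptions on our part, not data or methods drawn from the studies discussed.

    # Illustrative only: invented figures, not real country data.
    import numpy as np

    # Hypothetical government R&D expenditure (% of GDP) for seven countries
    # and an equally hypothetical composite innovation performance index.
    gov_rd_share = np.array([0.9, 1.2, 1.5, 1.9, 2.3, 2.8, 3.1])
    performance_index = np.array([42.0, 47.0, 55.0, 60.0, 58.0, 71.0, 69.0])

    # A simple Pearson correlation between the two indicators.
    r = np.corrcoef(gov_rd_share, performance_index)[0, 1]
    print(f"Pearson correlation: {r:.2f}")

    # Even a strong correlation of this kind says nothing about which policy
    # instruments produced the observed performance, how private sector
    # decisions contributed, or whether the relationship is causal at all.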

Therefore, according to its proponents, benchmarking cannot provide a simple, univocal measure of the impact of a specific policy or a group of policies. It can only provide data to populate a complex analytical framework where different performance factors and policy instruments are combined and lead to results across a range of variables. It will seldom be possible to attribute a change in measured performance to a specific policy measure. Therefore, benchmarking appears more as a further tool to aid in policy definition than as an impact measurement approach to the evaluation of project and program performance. It must be noted, however, that in practice, the recent popularity of benchmarking does not adhere to this model. Benchmarks are now being used for broad comparisons of national innovative performance, with targets being set for a very small number of the most common indicators. As an extreme case, take for instance the EU targets for investment in R&D set during the 2002 meeting of the European Council in Barcelona. The Barcelona targets set the goal for all EU countries to spend 3% of their gross domestic product on R&D activities by 2010 (up from 1.9% in 2000) and to increase the level of business funding to two thirds of all R&D expenditure. This type of benchmarking has become very important in European STI policy but clearly falls far short of being part of the systemic approach to formative policy evaluation that its proponents support.

Evaluation in Practice: Implementing New Approaches to EU STI Policy Evaluation

Changing STI policy theories have set a challenge for evaluation practice. As the results of R&D activities are inserted in a complex system, recombined with multiple other sources of innovation, and themselves affected by the results of previous R&D, impact assessments for R&D projects and programs become very difficult to carry out. As discussed above, European evaluation experts have proposed different approaches to meet the challenge. Intelligent benchmarking, systemic evaluation, and formative approaches to evaluation take as their point of departure the need to deal with a systems theory of innovation. They can therefore be described as theory-led evaluation approaches. Yet these approaches are unlikely to yield unequivocal measures of program impact.

The accountability culture that is developing in public management is driving politicians and public officials to develop clear, unambiguous performance indicators focusing on output and impact measurements. For instance, it is common for evaluation experts involved in evaluations of the EC's Research and Technological Development Framework Programmes (see below) to be asked to provide more specific measures of project and program impact. This has encouraged a focus on method-led approaches to evaluation and, in particular, the development of quantitative methodologies able to yield numerical assessments of outcomes and impacts.

This section discusses the way in which the tension between method-led and evolving theory-led evaluations is being resolved in the case of the EC's long-standing efforts to assess consecutive Research and Technological Development Framework Programmes. The Framework Programme constitutes the most important set of initiatives to fund EU research and technological development activities. It aims to strengthen the scientific and technological basis as well as the competitiveness of European industry. The Programmes define the main research priorities and objectives, structured in a series of Specific Programmes, and establish a set of funding instruments and an associated budget to cover a period of 5 years. The 1st Framework Programme was launched in 1984; in 2005, the 6th Framework Programme is underway, and discussions for the definition of the 7th Programme are already well advanced. The EC's experience in the evaluation of the Framework Programmes is particularly relevant for the evolution of STI policy evaluation in Europe.3 A steady flow of STI evaluation work has developed around their assessment dating back two decades (Georghiou, 1995, 2003). For instance, between 1984 and 1994, more than 70 program evaluations and more than 40 supporting studies were conducted (Karatzas & Fayl, 1999). These evaluation activities have led to the consolidation of a sizeable European evaluation community.

Framework Programme evaluations have evolved toward increased institutionalization and emphasis on the contribution of external experts, performance assessment, and systems of continuous monitoring (Guy & Arnold, 1998). A 1990 report (Policy Research in Engineering Science and Technology [PREST], 1990) pointed out that in practice, the evaluation efforts had served mainly to reinforce and legitimize existing policies. The need to provide harder evidence of program performance led to the legislative requirement in the 4th Framework Programme to take a more systematic approach to evaluation and to the reorganization of the process in 1994. The new system set up an evaluation process, which is by and large still followed today, based on continuous monitoring conducted by program managers, supplemented by assessments carried out annually by expert panels focusing mainly on implementation and by 5-year assessments also conducted by expert panels. Oftentimes, the work of the panels is supported by evaluations of specific initiatives contracted out to expert evaluators. The 5-year assessment is particularly important: It is conducted halfway through the 5-year program and includes an ex-post evaluation of the previous program, plus a midterm review of the current one. In this way, the evaluation feeds into the management of the ongoing program and contributes toward the definition of the following one. This approach attempts to bring together evaluation tools for real-time monitoring and the evaluation of project results; the former is to be used to introduce improvements in program implementation and the latter to inform the allocation of budgets and the definition of instruments and R&D priorities in subsequent programs. In practice, the approach results in a system of rolling, overlapping evaluations conducted at different levels and feeding into the policy process also at different levels. Structuring a continuous evaluation process with policy definition represents an instance of systematic integration of policy definition with evaluation practice (see, for instance, Georghiou, Rigby, & Cameron, 2002; Kastrinos, 1994; Laredo, 1998; Luukkonen, 1998; PREST et al., 2002).

Yet having a structured, systematic process in place does not solve the problem of the approach to be taken by the expert panels and the evaluators conducting specific evaluation studies. In particular, politicians have found the results of the process unsatisfactory. For instance, in 1997, the European Parliament was openly critical of the results of the annual evaluation report, stating that the report did not present data of an evaluative nature that would help them scrutinize the results of each of the Specific Programmes constituting the Framework Programme. European parliamentarians asked for data on, among other issues, the contribution of the initiative to the strengthening of the scientific and technological bases of industry and its competitiveness and the impact of the research on the labor market, quality of life, and the environment (Guy & Arnold, 1998).

This is an example of how the officials at the EC and, through them, the community of STI policy evaluators are being asked to provide measurements of the program's economic and social impact. It is not a unique case. During 2003 and 2004, one of the authors was involved in an evaluation of the types of projects funded under the 5th Framework in the specific field of information technologies (Hawkins, Montalvo-Corral, van Audenhove, & Molas-Gallart, 2004). The request for proposals specified a case study methodology to identify instances of good practice that could be used in the definition of actions and the fine-tuning of management processes in forthcoming initiatives. In the initial phases of the study, we used cases to analyze the way in which research groups in international teams related with each other and how they interacted with project managers in different types of projects. Halfway through the project, however, our client started to require quantitative evidence of differentials in performance across instruments. The need for a simple measure to determine which types of actions were working well and which not so well or not at all became an objective of the project as it became clear that many of the examples of good practice we were finding were not contingent on the type of project or funding instrument. The need to show hard numerical evidence of performance was clearly important to the client in the process leading to the definition of R&D priorities in the next Framework Programme. This is a situation in which different research priorities, and therefore groups within the EC working in different fields, enter into political competition for control over the management and distribution of scarce funding resources.

Both of these examples show recent instances of situations in which program administrators find themselves under pressure to provide simple data reflecting program impact. The political pressure for output and impact measures can be linked to the diffusion and popularity of New Public Management approaches to public administration. New Public Management has emphasized the need to develop a culture of accountability in the public sector, revolving around the monitoring of policy achievements, usually in relation to quantitative targets. The requests of the European Parliament and EC officials are for clear, easy-to-interpret data derived from impact assessments. Moving toward impact measurements poses a substantial challenge. It is the complexity of the innovation process and our understanding of it, as reflected in the systemic theories of innovation discussed above, that makes it difficult to reduce evaluation to a measurement exercise. Besides, the complexity of the relationship between R&D, innovation, and eventually economic or social changes means that the impact of a research initiative may not be felt for years after it has been completed. Identifying and tracing these effects becomes more difficult the longer the time elapsed since the project finished. (For a practical example of how these problems emerge and the ways in which evaluation practice may address them, see, for instance, Shapira, 2003.) The problem facing the evaluator is to strike a balance between, on one hand, the increasing cost and diminishing policy relevance of delaying impact assessment and, on the other hand, the need to delay impact assessment to capture potential long-term impacts (Molas-Gallart, Tang, Sinclair, Morrow, & Martin, 1999). In this context, a simpler framework, implicitly or explicitly based on a linear model of innovation, becomes a requisite when quantitative impact assessments are to be carried out within considerable budget and time strictures.

Despite these pressures, we can find incipient examples of evaluations in which elements of a more complex, theory-led, systemic approach to evaluation are being applied to project or program evaluations. There have been evaluations of Specific Programmes and projects analyzing their effects on the behavior of the social agents involved in the innovation process (behavioral additionality). Luukkonen (1998), for instance, has argued that networking emerges as a major effect of the EC Framework Programme initiatives. Several evaluations have focused on determining the network effects of EC-funded R&D programs. This approach has been used to analyze the outcome of specific programs (Laredo, Kahane, Meyer, & Vinck, 1992) or more often to study the impact on participating countries taking part in European programs (Laredo, 1998). The formal techniques of social network analysis have also been used to this end (Sanz Menéndez, Fernández Carro, & García, 1999). Referring to different approaches, a study for the EC (Fahrenkrog, Polt, Rojo, Tubke, & Zinocker, 2002) suggests that strategic intelligence tools are now beginning to take hold and play a more prominent role in the EU policy-making process.
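As a rough illustration of what such a network-effect analysis can involve, the sketch below builds a co-participation network from a few invented projects and reports simple connectivity measures of the kind these studies examine; the project data, organization names, and the choice of Python with the networkx library are our own illustrative assumptions, not material from the evaluations cited above.

    # Illustrative only: invented projects and participating organizations.
    from itertools import combinations
    import networkx as nx

    # Hypothetical funded projects and the organizations taking part in each.
    projects = {
        "P1": ["UnivA", "FirmX", "InstB"],
        "P2": ["UnivA", "FirmY"],
        "P3": ["FirmX", "InstB", "UnivC"],
        "P4": ["UnivC", "FirmZ"],
    }

    # Build a co-participation graph: organizations are nodes, and an edge
    # links two organizations whenever they took part in the same project.
    g = nx.Graph()
    for partners in projects.values():
        for a, b in combinations(partners, 2):
            g.add_edge(a, b)

    # Simple connectivity measures of the kind network-effect studies report.
    print("density:", round(nx.density(g), 2))
    print("connected components:", nx.number_connected_components(g))
    print("degree centrality:",
          {org: round(c, 2) for org, c in nx.degree_centrality(g).items()})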

These examples, however, fall short of a widespread application of the complex, theory-led approaches to evaluation discussed above. It is then not surprising that experts in the evaluation community are expressing concerns about the way in which STI program evaluations are being carried out in practice. The report we have just mentioned (Fahrenkrog et al., 2002) also argues that the evaluation of research and technological development has tended to focus on projects and programs rather than the wider policy environment and that the practice of policy evaluation should move away from attempts to quantify policy impacts and toward efforts to facilitate learning from previous experiences and supporting decision making and policy strategy definition. Barré (1999) argues that, paradoxically, too narrow a focus on impact measurements creates a risk of reducing credibility and relevance for policy makers. He notes that most policy-making and evaluation activities undertaken in the EU have until recently been based on the traditional linear model and that the growing focus on impact measurement may result in an even narrower approach to evaluation. Pursuing impact measurements, evaluations assume a temporal sequence in which R&D is defined and done and from which measurable societal impacts follow. Measurement methods implicitly based on this linear model, Barré argues, will fail to account for the complex interactions between R&D and its applications. Recent efforts to develop evaluation tools and methods have been hampered, in Barré's view, by the difficulties in building a nonlinear model that can account for the complex chain of interactions and multiple actor dynamics involved in the innovation process while at the same time yielding unequivocal impact measures. Similarly, van Raan (2000) worries that evaluation practice continues to be primarily concerned with the short-term and direct impact of R&D programs rather than the broader implications of STI policy portfolios.

Conclusion: Narrowing the Gap Between Theory and Practice in European STI Policy Evaluation

There is, in principle, a strong rationale to support the use of theory-led approaches to evaluation that take into account the systemic nature of the relationship between science, innovation, and social and economic welfare. The systemic perspectives that underpin the proposed intelligent benchmarking and formative evaluation approaches described above cannot be operated in isolation from the rest of the policy process. The way in which they feed into policy definition and implementation is an integral part of the approach. Under a systemic view of the innovation process, policies have to be directed at improving the weakest link or node in the system and would benefit, it has been suggested, from a systemic evaluation involving bottleneck analysis (Arnold, 2004). From this perspective, successful STI policies are to be based on the construction of a policy portfolio rather than the separate application of individual policy instruments, and the role of evaluation changes from focusing on ex-post analysis of program or project performance to a broader, more diffuse role. The new approaches reviewed in this article emphasize the formative role of evaluation, contributing to learning and future policy definition, and entail a switch from quantifying specific and attributable policy impacts toward promoting learning to support the formulation of an integrated set of innovation policies (Fahrenkrog et al., 2002).

Yet the progress in this direction has been slow. After a lag of almost a decade, STI policy evaluation is starting to catch up with the latest thinking about how nonlinear and complex innovation processes work. Traditional project and program evaluations coexist with the incipient use of formative approaches, whereas policy makers and politicians, the final clients of policy evaluations, continue to be primarily concerned with the short-term and direct impact of STI policies. In particular, the political need for impact assessments, preferably quantitative, is, if anything, increasing. The need to justify R&D expenditures against other competing social needs and the diffusion of New Public Management approaches to public administration emphasizing accountability and results-based management drive the political need for evaluations that are able to measure the benefit of specific policy actions in terms of simple economic indicators. Initiatives need to be measured in terms of their effect on, say, productivity, growth, or employment. These outcome measures, required by evaluation clients, can more easily be delivered by evaluation frameworks based, implicitly or explicitly, on linear models of innovation, particularly when taking into account the limited resources that can, in practice, be allocated to evaluation. As Rip (2003) noted, "the evolving state of the art of R&D evaluation is co-produced in the forcefields of concerns and interests of principals commissioning R&D, the subjects of such evaluations, and the evaluation professionals" (p. 49). The problem is that as evaluation professionals begin to favor and actively promote theory-led evaluations, political realities often force evaluators to focus on the development of methods implicitly anchored in outdated linear views of the innovation process.

The tension between the systemic evaluation approaches proposed by evaluation experts and the impact measurements required by clients for accountability purposes is not going to be resolved easily. A systemic approach to evaluation is, in principle, capable of yielding impact measurements. Intelligent benchmarking, as discussed above, is one approach that combines a theory-led systemic approach to evaluation with a focus on measurement. It can provide a view of the innovation system and of where the main systemic failures and bottlenecks may lie but cannot establish a direct relationship between a policy measure and a specific outcome measurement of policy performance (like, for instance, the effect of a specific R&D program on aggregate productivity). One practical reason for this limitation is that, as we have already discussed, the impacts of STI initiatives are bound to be long term, whereas the political need for impact measurements arises almost immediately after the completion of a program. Quantifying such impacts through the use of complex modeling is theoretically feasible but in practice would require an intensive, complex, and long-term effort at data gathering and analysis. The main problem here is that we are trying to measure the exact extent to which specific outcomes can be attributed to policy measures, which are likely to play a relatively small role among the many other factors that will emerge in a systemic model. Such detailed attribution requires comprehensive modeling and measurements that are not currently available. Furthermore, because of the delayed impact of many STI policy outcomes, impact measurement cannot provide results that can be used in the short-term program evaluations required by policy makers. It is not surprising that impact measurements of STI policy initiatives carried out for evaluation purposes have, so far, been based on simpler linear models that can be operationalized more easily.4
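To make the attribution problem concrete, the following stylized simulation (in Python; all magnitudes and the naive difference-in-means estimator are invented for illustration and are not taken from any cited study) shows how a small program effect on an aggregate outcome such as productivity growth is swamped by variation from other factors when impacts are estimated over a short post-program window.

```python
# Stylized illustration (all numbers invented): why a small policy effect is hard
# to recover from aggregate outcome data dominated by other factors.
import random

random.seed(1)

TRUE_PROGRAM_EFFECT = 0.05   # hypothetical: program adds 0.05 points to annual productivity growth
OTHER_FACTORS_SD = 1.0       # hypothetical: year-to-year variation from everything else
YEARS = 5                    # short post-program window typical of evaluation deadlines

def observed_growth(with_program: bool) -> list:
    """Simulate annual productivity growth with and without the hypothetical program."""
    base = 1.5  # hypothetical trend growth in percentage points
    return [
        base + (TRUE_PROGRAM_EFFECT if with_program else 0.0) + random.gauss(0, OTHER_FACTORS_SD)
        for _ in range(YEARS)
    ]

# Naive "impact measurement": difference in mean growth over a short window.
estimates = []
for _ in range(1000):
    treated = observed_growth(True)
    counterfactual = observed_growth(False)
    estimates.append(sum(treated) / YEARS - sum(counterfactual) / YEARS)

mean_est = sum(estimates) / len(estimates)
spread = (sum((e - mean_est) ** 2 for e in estimates) / len(estimates)) ** 0.5
print(f"true effect: {TRUE_PROGRAM_EFFECT:.2f}")
print(f"mean estimate: {mean_est:.2f}, standard deviation across replications: {spread:.2f}")
# Under these assumptions the spread of the estimates is roughly an order of magnitude
# larger than the true effect, so a single short-window estimate says little about the
# program's actual contribution.
```

Under these invented assumptions, no single short-window estimate is informative about the program's contribution, which is one way of seeing why detailed attribution would require the comprehensive modeling and measurement discussed above.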

A systemic approach to evaluation, such as the one suggested by Arnold (2004), will yield a complex picture of the context of innovation, with an assessment of the role that a program or set of programs has played in modifying it. It may uncover system bottlenecks and the processes through which a specific initiative has had, or has failed to have, the desired impact. Currently, this does not satisfy the political need for clear, unequivocal, and simple measures of policy impact.

The generic lesson of this experience is that the role of the professional community of evaluators is central. When theory-led evaluations are not suited to the delivery of the simple performance measurements requested by policy makers, the evaluator faces a predicament. On the one hand, to seek short-term influence would require a focus on evaluations that are not based on the evaluator's best theoretical understanding of the policy impact process. On the other hand, to insist on the development of theory-led approaches is a more arduous and risky task, as the work of the evaluator may in the end prove irrelevant. Evaluations oriented to the generation of clear impact measurements based on simpler models may provide relatively easy answers to the accountability problem that the current political environment poses. By providing results that can be more easily translated into policy action, such evaluations can, for instance, provide a powerful tool to justify the continuation of a program or its cancellation. By contrast, whenever the answers provided by theory-led systemic evaluations are complex, they will require a sophisticated policy environment in which to operate if they are to have an effect, and the use made of the evaluation, if any, will be more difficult to determine.

Yet we need to be reminded that the channels through which evaluations can influence the policy process are diverse and complex (Henry & Mark, 2003). Although attempting to push systemic evaluations against the grain of the political need for accountability-oriented assessments may at first appear to risk irrelevance, without efforts to develop and, above all, implement new theory-led approaches to evaluation, there is a risk that policy definition will be informed by method-led evaluations based on discredited models of the policy impact process. Wherever there is policy interest in developing evidence-based policy processes, there remains scope for progress toward theory-led evaluations. In the area analyzed for this article, European STI policy evaluation, the proposals of the evaluation experts are at least being listened to by EC officials, who have supported different research initiatives to study and develop new approaches to evaluation. The results of these efforts have been broadly disseminated among the evaluation and STI policy communities and, as we have shown, have had an incipient but far-from-dominant effect on evaluation practice. Those in the evaluation community proposing a systemic, theory-led approach to STI policy evaluation are helped by the way in which the EC operates a structured evaluation process revolving around the tendering of evaluation work to external organizations and experts. The leading evaluation experts, who are repeatedly involved in evaluation projects through this system, are often committed to the implementation of more sophisticated, theory-led approaches to evaluation. Although their efforts may at times clash with requests to develop techniques that yield simpler impact measures, by developing and suggesting ways in which systemic evaluation can be organized and inserted into the policy process, the role of the evaluator becomes formative in the wider sense of the word.

Notes

1. Technology foresight refers to a body of techniques and practices used to investigate systematically the future of science and technology developments in order to identify areas of strategic R&D and emerging technologies (Office of Science and Technology, 1996).

2. Technology assessment attempts to assess the implications of adopting particular technological options and to describe the potential social and economic impacts of technologies on society (Meyer-Krahmer & Reiss, 1992).

3. A description of the way the Framework Programmes and their evaluations operate can be found in a comprehensive review of Framework Programme evaluations carried out by several European organizations (Policy Research in Engineering Science and Technology [PREST] et al., 2002).

4. See, for instance, the models recently developed by the EC-funded NO-REST project to assess the impact of standardization policies. This project represents a good example of a multilayered approach using different methodologies to build a comprehensive view of the innovation process and the effect of standardization on it. Yet the quantitative component of this initiative is based on comparatively simple and linear models (see http://www.no-rest.org).

References

Arnold, E. (2004). Evaluating research and innovation policy: A systems world needs systems evaluations. Research Evaluation, 13(1), 3-17.
Barré, R. (1999). Public research programmes: Socio-economic impact assessment and user needs. IPTS Report Special Issue: Evaluation and Research Activities, 20, 5-9.
Betz, F. (1994). Basic research and technology transfer. International Journal of Technology Management, 9(5-7), 784-796.
Boden, R., Gummett, P., Cox, D., & Barker, K. (1998). Men in white coats . . . men in grey suits: New Public Management and the funding of scientific research in the UK. Accounting, Auditing & Accountability Journal, 11(3), 267-291.
Bush, V. (1945). Science, the endless frontier (a report to the president by Vannevar Bush, director of the Office of Scientific Research and Development). Washington, DC: U.S. Government Printing Office.
Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research on teaching. Chicago: Rand McNally.
Dalpe, R., & Anderson, F. (1993). Evaluating the industrial relevance of public R&D laboratories. In B. Bozeman & J. Melkers (Eds.), Evaluating R&D impacts: Methods and practice (1st ed., pp. 207-228). Norwell, MA: Kluwer Academic.
Dietz, J. S. (2003). Assessing RTD program portfolios in the European Union. In P. Shapira & S. Kuhlmann (Eds.), Learning from science and technology policy evaluation: Experiences from the United States and Europe (pp. 204-222). Cheltenham, UK: Edward Elgar.
Fahrenkrog, G., Polt, W., Rojo, J., Tubke, A., & Zinocker, K. (Eds.). (2002). RTD evaluation toolbox: Assessing the socio-economic impact of RTD policies. Seville, Spain: Joint Research Centre, European Commission.
Feller, I. (2002). Performance measurement redux. American Journal of Evaluation, 23(4), 435-452.
Freeman, C., & Soete, L. (1997). The economics of industrial innovation. Cambridge, MA: MIT Press.
Georghiou, L. (1995). Assessing the Framework Programmes—A meta-evaluation. Evaluation, 1(2), 171-188.
Georghiou, L. (2003). Evaluation of research and innovation policy in Europe—New policies, new frameworks? In P. Shapira & S. Kuhlmann (Eds.), Learning from science and technology policy evaluation: Experiences from the United States and Europe (pp. 65-79). Cheltenham, UK: Edward Elgar.
Georghiou, L., Rigby, J., & Cameron, H. (2002). Assessing the socio-economic impacts of the Framework Programme. Manchester, UK: Policy Research in Engineering Science and Technology (PREST), University of Manchester.
Gibbons, M., & Georghiou, L. (1987). Evaluation of research: A selection of current practices. Paris: Organization for Economic Co-operation and Development.
Grupp, H. (2000). R&D evaluation. Research Evaluation, 8(1), 87-99.
Guy, K. (2003). Assessing RTD program portfolios in the European Union. In P. Shapira & S. Kuhlmann (Eds.), Learning from science and technology policy evaluation: Experiences from the United States and Europe (pp. 174-203). Cheltenham, UK: Edward Elgar.
Guy, K., & Arnold, E. (1998). Strategic options for the evaluation of the R&D Programmes of the European Union (final report). Brussels, Belgium: European Parliament, Directorate General for Research, Scientific and Technological Options Assessment.
Guy, K., & Nauwelaers, C. (2003). Benchmarking STI policies in Europe: In search of good practice. Institute for Prospective and Technological Studies Report, 71, 20-27.
Hawkins, R., Montalvo-Corral, C., van Audenhove, L., & Molas-Gallart, J. (2004). Analysis via case studies of the instruments used in the IST programme. Submitted to the European Commission, DG Information Society, Evaluation and Monitoring (017.31126/01/01).
Henry, G., & Mark, M. (2003). Beyond use: Understanding evaluation's influence on attitudes and actions. American Journal of Evaluation, 24(3), 293-314.
Hobday, M. (1988). Evaluating collaborative R&D programmes in information technology: The case of the U.K. Alvey programme. Technovation, 8, 271-298.
Jaffe, A. B. (1998). Measurement issues. In L. M. Branscomb & J. H. Keller (Eds.), Investing in innovation (pp. 64-84). Cambridge, MA: MIT Press.
Karatzas, I., & Fayl, G. (1999). Editorial. Institute for Prospective and Technological Studies Report Special Issue: Evaluation and Research Activities, 40, 2-4.
Kastrinos, N. (1994). Evaluating the impact of the EC Framework Programme. Technovation, 14(10), 679-688.
Kline, S. J. (1985). Innovation is not a linear process. Research Management, 28(4), 36-45.
Kline, S. J., & Rosenberg, N. (1986). An overview of innovation. In R. Landau & N. Rosenberg (Eds.), The positive sum strategy: Harnessing technology for economic growth (pp. 275-305). Washington, DC: National Academy Press.
Kostoff, R. N. (1997). The handbook of research impact assessment (No. DTIC ADA296021). Arlington, VA: Office of Naval Research.
Kuhlmann, S. (1999). Distributed intelligence: Combining evaluation, foresight and technology assessment. Institute for Prospective and Technological Studies Report, 40, 16-22.
Kuhlmann, S. (2003). Evaluation as a source of 'strategic intelligence.' In P. Shapira & S. Kuhlmann (Eds.), Factors affecting technology transfer in industry-US federal laboratory partnerships (pp. 352-375). Cheltenham, UK: Edward Elgar.
Laredo, P. (1998). The networks promoted by the framework programme and the questions they raise about its formulation and implementation. Research Policy, 27(6), 589-598.
Laredo, P., Kahane, B., Meyer, J. B., & Vinck, D. (1992). The research networks built by the MHR4 Programme. Brussels, Belgium: Commission of the European Communities.
Leeuw, F. L. (2003). Reconstructing program theories: Methods available and problems to be solved. American Journal of Evaluation, 24(1), 5-20.
Lundvall, B.-A. (1992). National systems of innovation: Towards a theory of innovation and interactive learning (1st ed.). London: Pinter.
Luukkonen, T. (1998). The difficulties in assessing the impact of EU framework programmes. Research Policy, 27(6), 599-610.
Luukkonen, T. (2002). Research evaluation in Europe: State of the art. Research Evaluation, 11(2), 81-84.
Mayne, J. (2001). Attribution through contribution analysis: Using performance measures sensibly. Canadian Journal of Program Evaluation, 16(1), 1-24.
Meyer-Krahmer, F., & Reiss, T. (1992). Ex ante evaluation and technology assessment—Two emerging elements of technology policy evaluation. Research Evaluation, 2(1), 47-54.
Molas-Gallart, J., Salter, A., Patel, P., Scott, A., & Duran, X. (2002). Measuring third stream activities. Brighton, UK: SPRU.
Molas-Gallart, J., Tang, P., Sinclair, T., Morrow, S., & Martin, B. (1999). Assessing research impact on non-academic audiences. Brighton, UK: SPRU.
Mowery, D. C., & Rosenberg, N. (1979). The influence of market demand upon innovation: A critical review of some recent empirical studies. Research Policy, 8, 105-153.
O'Doherty, D., & Arnold, E. (2003). Understanding innovation: The need for a systemic approach. Institute for Prospective and Technological Studies Report, 71, 29-36.
Office of Science and Technology. (1996). Winning through foresight: A strategy taking the foresight programme to the millennium. London: Department of Trade and Industry.
Organization for Economic Co-operation and Development. (2002). Benchmarking industry-science relationships. Paris: Author.
Policy Research in Engineering Science and Technology (PREST). (1990). The impact and utility of European Commission research programme evaluation reports (EUR13098 EN). Brussels, Belgium: European Commission.
Policy Research in Engineering Science and Technology (PREST), AUEB, BETA, ISI, Joanneum Research, IE HAS, et al. (2002). Assessing the socio-economic impact of the framework programme. Manchester, UK: University of Manchester.
Rip, A. (2003). Societal challenges for R&D evaluation. In P. Shapira & S. Kuhlmann (Eds.), Learning from science and technology policy evaluation: Experiences from the United States and Europe (pp. 32-53). Cheltenham, UK: Edward Elgar.
Roessner, J. D. (2002). Outcome measurement in the USA: State of the art. Research Evaluation, 11(2), 85-93.
Rothwell, R. (1992). Successful industrial innovation: Critical factors for the 1990s. R&D Management, 22(3), 221-239.
Rothwell, R. (1993, May). Systems integration and networking: The fifth generation innovation process. Paper presented at the Chaire Hydro-Quebec Conference en Gestion de la Technologie, Montreal, Quebec, Canada.
Sanz Menéndez, L., Fernández Carro, J. R., & García, C. E. (1999). Centralidad y cohesión en las redes de colaboración empresarial en la I+D subsidiada [Centrality and cohesion in interfirm collaborative networks participating in publicly funded R&D programs]. Papeles de Economía Española, 81, 219-241.
Schmookler, J. (1966). Invention and economic growth. Cambridge, MA: Harvard University Press.
Scriven, M. (1967). The methodology of evaluation. In R. W. Tyler, R. M. Gagné, & M. Scriven (Eds.), Perspectives of curriculum evaluation (pp. 39-83). Chicago: Rand McNally.
Shapira, P. (2003). Evaluating manufacturing extension services in the United States: Experiences and insights. In P. Shapira & S. Kuhlmann (Eds.), Factors affecting technology transfer in industry-US federal laboratory partnerships (pp. 260-292). Cheltenham, UK: Edward Elgar.
Sherwin, C. W., & Isenson, R. S. (1967, March). Estimating the science and technology components of an R&D budget. Paper presented at the Second Cost-Effectiveness Symposium of the Washington Operations Research Council, Washington, DC.
Soete, L., & Corpakis, D. (2003). Editorial: R&D for competitiveness and employment—The role of benchmarking. Institute for Prospective and Technological Studies Report, 71, 2-12.
Stame, N. (2004). Theory-led evaluation and types of complexity. Evaluation, 10(1), 58-76.
STRATA-ETAN Expert Working Group. (2002). Benchmarking national research policies: The impact of RTD on competitiveness and employment (IRCE). Brussels, Belgium: European Commission, Directorate General Research.
van Raan, A. F. J. (2000). R&D evaluation at the beginning of the new century. Research Evaluation, 8(2), 81-86.
von Hippel, E. (1988). The sources of innovation. New York: Oxford University Press.