
Research Policy 29 (2000) 657–678
www.elsevier.nl/locate/econbase

Evaluating technology programs: tools and methods

Luke Georghiou a, David Roessner b,c,*

a University of Manchester, Oxford Rd., Manchester M13 9PL, UK
b Department of Public Policy, Georgia Institute of Technology, Atlanta, GA 30332-0345, USA

c Science and Technology Policy Program, SRI International, 1611 N. Kent St., Arlington, VA 22209, USA

Abstract

This article reviews the analytical tools, methods, and designs being used to evaluate public programs intended to stimulate technological advance. The review is organized around broad policy categories rather than particular types of policy intervention, because methods used are rarely chosen independent of context. The categories addressed include publicly-supported research conducted in universities and public sector research organizations; evaluations of linkages, especially those programs seeking to promote academic-industrial and public-private partnerships; and the evaluation of diffusion and industrial extension programs. The internal evaluation procedures of science such as peer review and bibliometrics are not covered, nor are methods used to evaluate projects and individuals ex ante. Among the conclusions is the observation that evaluation work has had less of an impact in the literature than it deserves, in part because much of the most detailed and valuable work is not easily obtainable. A second conclusion is that program evaluations and performance reviews, which have distinctive objectives, measures, and tools, are becoming entangled, with the lines between the two becoming blurred. Finally, new approaches to measuring the payoffs from research that focus on linkages between knowledge producers and users, and on the characteristics of research networks, appear promising as the limitations of the production function and related methods have become apparent. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Technology programs; Tools; Methods

1. Introduction

Demand for evaluation has been fueled by the desire to understand the effects of technology policies and programs, to learn from the past and, more instrumentally, to justify the continuation of those policies to a sometimes skeptical audience. The relation between the evolution of evaluation methods and approaches and that of technology policies, we shall argue, is complex. The development of evaluation approaches has responded to varied stimuli.

* Corresponding author. Department of Public Policy, Georgia Institute of Technology, Atlanta, GA 30332-0345, USA.

In the mid-1980s, the first OECD review in this field noted the convergence of the internal evaluation traditions of science with a growing demand for evaluation of public policy in general (Gibbons and Georghiou, 1987). The latter trend would later be characterized as a feature of ‘‘new public management’’ and reach its apotheosis in the current requirement for programmatic and institutional performance indicators. A third influence was the growing tendency to associate science with competitive performance and the search for more effective ways to achieve that linkage. The growth of activity was catalogued in other reviews at that time (Roessner, 1989; Meyer-Krahmer and Montigny, 1990) that extended the analysis to innovation programs from the US and European perspectives, respectively. Since that time, as policies have increasingly focused upon fostering linkages of various types within innovation systems, evaluation methods have been developed which aim to characterise and measure those linkages.

In this paper, we have chosen to focus on technology policy rather than the broader scope of innovation policy. The evaluation literature is now so extensive that it has been necessary to be selective even within this constraint. Evaluation methods tend to cluster around particular types of policy intervention. Hence, we have found it more useful to organize the review by broad policy category than by an attempt to group methods independently of their context. Following this approach, three related focal points for evaluation are used to structure what follows. These appear to follow a sequential model of innovation from basic research, through academic–industrial linkages to industrial collaborative R&D and finally diffusion and extension. However, it will emerge from what follows that similar questions (for example those concerning economic returns) may be posed at any stage of this sequence. Furthermore, evaluations have been instrumental in exposing the many feedback loops and unexpected consequences that have modified the way in which the innovation process is understood.

The first of our focal points concerns evaluation of publicly supported research carried out in universities and public-sector research organizations. One reason this is included is the growing significance attached to the economic and social rationales for public support of research. Hence, the relevant section addresses means by which the economic and social value of science is being assessed. The internal evaluation procedures of science, principally peer review with some contribution from its offshoots in scientometrics, are not covered here. Modified peer review, or merit review as it is sometimes called, extends the terms of reference of a peer review panel to cover the broader socio-economic issues that are discussed below. Often, the outcomes of such panel reviews are highly influential because of the status of their members. However, from a methodological perspective, interest in such approaches is limited to the variations in structure and composition of consultation and to the ways in which information inputs and reporting are organized. In general, the organizational and political dimensions of evaluation, though important, are not covered in this review.

In the second part of this section, the scope is broadened to include evaluations that focus upon linkages, including those of programs seeking to promote academic–industrial and public–private partnerships. The centrality of these interfaces to technology policy has stimulated a large body of research, much of which has an evaluative character. The selection made here aims to give a flavor of the range of evaluation methods used rather than to review the topic as a whole. These interfaces are also present in the next focus, collaborative R&D programs, though industrial collaboration is usually their main aim. This policy instrument has had a special relationship with the development of evaluation, with the two growing together during the 1980s. It continues to attract a large amount of evaluation activity.

Finally, experience in the evaluation of diffusion and extension programs is discussed. In methodological terms, this involves a transition from the rather lumpy and unpredictable distribution and attribution of benefits that characterize research, to a domain where large numbers of client firms are in receipt of less visible assistance. Evaluation here is thus characterized by treatment of large data sets, though we will argue that these conceal a wide range of services and clients.

Two other points need to be made concerning the scope of this review. The first is that evaluation is a social process, which means that methods cannot be equated with techniques for collection of data or their subsequent analysis. The choice of what is significant to measure, how and when to measure it, and how to interpret the results are dependent upon the underlying model of innovation that the evaluator is using, implicitly or explicitly. Much of the data collected by evaluators are themselves conditioned by the positioning of the evaluation and those who execute it. In consequence, it is usually necessary to understand the setting of the evaluation and the discourse in which its results are located before the choice of approach can be fully appreciated.

The second qualification about our scope addresses the level of evaluation covered. The central unit of analysis is the program evaluation. In many ways, this is the easiest territory for evaluators in that a program almost by definition has boundaries in space and time and certainly should have objectives. However, when moving across national and cultural borders this criterion for inclusion may be over-restrictive. Evaluation in France, in particular, is mainly at the level of institutions, even if many of the same issues are addressed (Larédo and Mustar, 1995).

Aspects that remain excluded from this review are ex ante evaluation, evaluation of projects and of individuals (though each of these may well be linked to the evaluations we discuss). The last consideration in terms of scope is that of the most aggregated evaluations, which address whole sectors, policies, and national or international systems. Many of these tend to intrude into the general realm of technology policy studies, but some are more firmly rooted in the tradition of evaluation. We will argue in the concluding section of this paper that understanding a program may well require more work on the systemic context than is presently common practice.

2. Evaluation of the socio-economic impacts of research in universities and public laboratories

2.1. Measuring the returns

Within the past 15 years, there have been several comprehensive assessments of methods for measuring the returns, benefits, and/or impacts of public support for research. The Office of Technology Assessment's 1986 Technical Memorandum was stimulated by Congressional interest in improving research funding decisions through the use of quantitative techniques associated with the concept of investment (U.S. Congress, Office of Technology Assessment, 1986). The OTA review covered economic and output-based, quantitative measures used to evaluate R&D funding. Economic methods included macroeconomic production functions, investment analysis and consumer and producer surplus techniques. Output methods included bibliometric, patent count, converging partial indicators, and science indicators approaches.

Hertzfeld (1992) provided a probing critique of methods employed to measure the returns to space research and development, but did so within a larger framework of general approaches to measuring the economic returns to investments in research. He classified these approaches into three distinct types (Hertzfeld, 1992, p. 153):
1. The adaptation of macroeconomic production function models to estimate the effects of technological change or technical knowledge that can be attributed to R&D spending on GNP and other aggregate measures of economic impact.
2. Microeconomic models that evaluate the returns to the economy of specific technologies by estimating the consumer and producer surpluses these technologies create.
3. Measuring the direct outputs of research activity through reported or survey data on patents, licenses, contractor reports of inventions, value of royalties or sales, and so forth.

Most recently, the Critical Technologies Institute of the RAND Corporation (now renamed the Science and Technology Policy Institute) published a report prepared for the White House Office of Science and Technology Policy that reviewed methods available for evaluating fundamental science (Cozzens et al., 1994). The RAND study classified evaluation methods into three types:
1. Retrospective, historical tracing of the knowledge inputs that resulted in specific innovations.
2. Measuring research outputs in aggregate from particular sets of activities (e.g., programs, projects, fields, institutions) using bibliometrics, citation counts, patent counts, compilations of accomplishments, and so forth.
3. Economic theory/econometric methods employing, as measures of performance, productivity growth, increase in national income, or improvements in social welfare as measured by changes in consumer and/or producer surpluses (Cozzens et al., 1994, pp. 41–43).

What is striking about these critical assessments is that, despite their nearly 10-year span, there is virtually complete agreement among all three concerning the strengths and weaknesses of the various approaches to determining the benefits or payoffs from investments in research generally, and fundamental science in particular. Moreover, improvements over the period in the several techniques described have been modest and incremental. There have been numerous examples of the application of these techniques to measure the impacts of research that span the past 30 years. There is considerable agreement on which studies represent the best of each genre, and on the limitations of each approach.

Historical trace studies such as Hindsight (Sherwin and Isenson, 1967), TRACES (Illinois Institute of Technology Research Institute, 1968), and ‘‘TRACES II’’ (Battelle, 1973) offer the possibility of detailed information about the relative contribution of basic vs. applied research, institutional contexts, and sources of research support. But the method requires subjective judgments about the significance of particular research events, and suffers from lack of generalizability because of non-random innovation selection processes. It also traces single products of research backwards in time to identify the full range of contributions, rather than tracing forward from a single research activity (e.g., a project) to identify multiple impacts or consequences. Further, it is an extremely costly method; given the complexity of knowledge flow processes and variations across fields of science and technology, the number of cases necessary for generalization would be prohibitively expensive. Finally, historical trace studies fail to account for the indirect effects of research, including dead ends (from which substantial learning takes place), spillovers, and synergistic effects.

Other, more aggregate, approaches seek to overcome the limitations of historical trace studies. Measures of basic research outputs have been refined to the point where reasonably valid and reliable data on the quantity and quality of research outputs can be obtained. It has been argued that while any single output measure is insufficient, the relative effectiveness and efficiency of research programs, organizational units, and institutions can be measured using combinations of measures (e.g., Martin and Irvine, 1983). The problem for such indicators generally is one of linking outputs to impacts, especially impacts that are valued in markets or the political arena. For some policy purposes, such as resource allocation decisions among several institutions conducting basic research, these kinds of studies can be of considerable value. But they are inadequate for questions concerning the impact of basic research on larger societal goals, especially when measures of value are sought. In addition, except in the most general sense, these approaches offer little guidance to research program managers seeking to maximize the benefits that result from their programs of research. As the RAND/CTI study concluded,

Most direct indicators of R&D outcomes such as citation and other bibliometric indicators are only indirectly and loosely related to those aspects of economic and other goals of public improvement which are of greatest interest (Cozzens et al., 1994, p. 44).

As noted above, economic assessments of research fall into two basic categories: production function analyses and studies seeking social rates of return. Production function studies assume that a formal relationship exists between R&D expenditures and productivity. While there is ample evidence that such a relationship exists, there are numerous problems with this approach. Prominent among the studies that employ this approach are those by Denison (1979), Link (1982), Levy and Terleckyj (1984), Jaffe (1989) and Adams (1990). Regardless of the assumptions underlying these studies, all conclude that the relationship between private R&D expenditures and various measures of economic growth and productivity is positive and substantial. But there are fundamental problems with this approach that are exacerbated when applied to public R&D expenditures. First, the ‘‘technical information’’ term in the production function is only approximated by R&D expenditure data, and in any event its effect on the larger production system is not well understood. Second, the approach does not account adequately for externalities that result from R&D activity. Third, if the focus is on publicly supported R&D, additional problems arise because the intent of most public R&D is not to stimulate economic growth, but to achieve public agency missions. Any contribution to economic growth is thus due to indirect knowledge transfers. Fourth, the contributions of basic and applied research are difficult to distinguish, yet these are important for many policy purposes. Finally, this approach is designed to address economic benefits that result from incremental changes in production efficiencies, but cannot adequately account for the effect of new products or radical changes in production efficiency (Cozzens et al., 1994, p. 48).
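To make the approach concrete, a common formulation (offered here only as an illustrative sketch, not as the specification of any of the studies cited above) augments a Cobb–Douglas production function with a stock of knowledge built up from past R&D expenditures:

$$Q_{it} = A\,K_{it}^{\alpha}\,L_{it}^{\beta}\,R_{it}^{\gamma}\,e^{\varepsilon_{it}},$$

where $Q$ is output, $K$ physical capital, $L$ labor, $R$ the (necessarily approximate) R&D-based knowledge stock, and $\gamma$ the elasticity of output with respect to that stock. Taking logarithms yields the regression actually estimated, and the return to R&D is then reported either as $\gamma$ or as the implied marginal product $\gamma\,(Q/R)$. The criticisms listed above bear directly on this formulation: $R$ only proxies technical knowledge, spillovers sit in the error term, and public R&D aimed at agency missions has no natural place in a firm- or industry-level function.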


Aggregate social rate of return studies attempt to estimate the social benefits that accrue from changes in technology and relate the value of these benefits to the cost of the investments in research that produced the changes of interest. These studies employ, by analogy, what amount to the internal rate of return calculations often used by private firms. Social benefits are measured as the sum of consumer and producer surpluses. Prominent examples of this approach are studies by Mansfield (1980, 1991), Mansfield et al. (1977), Tewksbury et al. (1980) and Link (1981). At a more disaggregate level, the consumer surplus approach has been used to estimate the returns from public investments in technology development programs as well. We discuss this, and the use of an alternative technique, briefly in a subsequent section of this article.
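The underlying logic can be sketched (in simplified form, not the exact procedure of the studies just cited) as an internal rate of return calculation: the social rate of return $r$ is the discount rate that equates the stream of social benefits, measured as gains in consumer and producer surplus, with the stream of research and related investment costs,

$$\sum_{t=0}^{T}\frac{\Delta CS_t+\Delta PS_t}{(1+r)^{t}} \;=\; \sum_{t=0}^{T}\frac{C_t}{(1+r)^{t}},$$

where $\Delta CS_t$ and $\Delta PS_t$ are the surplus gains attributable to the innovation in year $t$ and $C_t$ the associated costs. The private rate of return replaces the numerator with the innovator's net returns alone, which is why these studies can report social rates that exceed private ones.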

As in the case of production function studies, aggregate consumer surplus studies tend to show that the aggregate rate of return to research is positive, and that the social rate of return exceeds, on average, the private rate of return. Also, as in the case of production function studies, this approach has significant drawbacks. First, the causal mechanism that links R&D investment to social or private returns is imputed but not direct. Second, because calculations of consumers' and producers' surplus depend on the existence of supply and demand curves, they are inappropriate for goods and services that are radical departures from their predecessors (which is just the case for the most interesting products of basic research). Third, because they rely on data from individual cases, like historical trace studies they are difficult to generalize from and expensive to conduct. Fourth, they rely on an adaptation of an investment model intended for short-term investment calculations, not the long-term benefits that might accrue from basic research. Thus, the discounting requirements of the models may severely underestimate the contribution of basic research. Finally, as with other approaches, spillovers are difficult to identify and account for (Cozzens et al., 1994, p. 55).

In summary, the dilemma that the limitations of these approaches pose for the research evaluator lies in the question of attribution: realising the economic effects of research requires a range of complementary factors, from within and beyond the innovation process, to be brought to bear, obscuring the relationship under study.

Hertzfeld concludes his assessment of the various techniques with a recommendation:

. . . the most promising type of economic study for measuring returns to space R&D is the documentation of actual cases based on direct survey information. Case studies provide relatively clear examples of the returns, both theoretical and actual, to space R&D investments. A well-structured series of case studies coupled with good theoretical modeling aimed at integrating the data and incorporating those data with more general economic measures of benefits may well be the way to establish reliable and ‘believable’ economic measures of the returns to space R&D (p. 168).

Hall (1995), in a review of the private and social returns to R&D, supports the employment of a broader range of approaches, and cautions that case study evidence may be hampered by a focus on ‘winners’ and needs to be supplemented by a computation of returns.

3. Evaluating linkages

Since the RAND/CTI assessment was published, a number of papers have appeared that are critical of the entire conceptual framework economists have been using to analyze scientific activity. This ‘‘new economics of science,’’ perhaps best formulated by Dasgupta and David (1994), is critical of the ‘‘old economics of science’’ based on production function-based approaches to assessing the payoffs from science (and its linkages to technology), and introduces broader perspectives that may lead to new approaches for evaluating basic science. By incorporating the insights gained from sociological studies of science, economists' perspectives are expanding to incorporate the characteristics of research institutions and the professional networks that bind knowledge producers and users. One of Dasgupta and David's propositions emerging from their work reads as follows:

The organization of research under the distinct rules and reward systems governing university scientists, on the one hand, and industry scientists and engineers, on the other, historically has permitted the evolution of symbiotic relationships between those engaged in advancing science and those engaged in advancing technology. In the modern era, each community of knowledge seekers, and society at large, has benefited enormously thereby (Dasgupta and David, 1994, p. 518).

Although much of the ‘‘new economics’’ is devoted to the implications of this broader thinking to resource allocation efficiencies, its inclusion of the processes of knowledge transfer and use and of the networks by which such transfers take place offers opportunities for new ways of thinking about how to value science beyond attempts to monetize the value of the knowledge produced and/or to infer such value from the applications that enter economic markets.

Bozeman et al., as part of an ongoing evaluation of the US Department of Energy's Office of Basic Energy Sciences, have recognized this opportunity and are currently developing new theory about knowledge and innovation that, in our view, shows promise for methodological advances in evaluating basic research (Bozeman and Rogers, 1999; Bozeman et al., 1998; Rogers and Bozeman, 1998). Bozeman and Rogers propose a ‘‘pre-economic’’ approach to evaluating scientific knowledge, one that does not purport to supplant economic evaluations but to complement them. Their approach is based on the range and repetition of uses of knowledge both by scientific colleagues and by technologists. One of the several advantages of this approach is that, unlike most evaluations that use the program or project as the unit of analysis, it incorporates the role of research collectives and networks engaged in knowledge production and use. In their view,

innovation and knowledge flows cannot be assessed independently of the collective arrangement of skilled people, their laboratories and instruments, their institutions, and their social networks of communication and collaboration. Innovation gives rise to diverse scientific and technological outcomes including, among others, scientific knowledge, new technological devices, craft and technique, measures and instruments, and commercial products. But it is the collective arrangements and their attendant dynamics, not the accumulated innovation products, that should more properly be considered the main asset when assessing the value of research (Bozeman and Rogers, 1999).

Thus, for Bozeman and Rogers, the appropriate evaluation unit of analysis should be the set of individuals connected by their uses of a particular body of information for a particular type of application — the creation of scientific or technical knowledge. They term this set of individuals the ‘‘knowledge value collective.’’ In this scheme, science can be valued by its capacity to generate uses for scientific and technical information. That capacity is captured by the dynamics of the knowledge value collectives associated with use, as measured by their growth or decline, their productivity or barrenness, and their ability to develop (or deflect) human capital.

In many ways, this work builds upon the perspective developed by the Centre for Sociology of Innovation (CSI) in France, who have focused their evaluation activities upon the emergence of networks. One approach, applied mainly at the institutional level, uses the concept of techno-economic networks to characterise in a dynamic manner the nature, types and durability of alliances between public laboratories and companies, with particular emphasis on the ‘‘intermediaries’’, meaning documents, embodied knowledge, artifacts, economic transactions and informal exchanges (Callon et al., 1997). The emphasis in the CSI approach moves away from economic appraisal, focusing instead upon how networks are assembled to realize innovations in collective goods. This brief treatment cannot do justice to the developing work on collectives, but suffice it to say that it appears to offer considerable promise for capturing many of the payoffs from basic research, such as the development of human capital, that thus far have been systematically underestimated or ignored.

Citations to scientific papers made in patents have been used as an indicator of growing linkage between academic science and industry (Narin and Noma, 1985; Narin et al., 1997). While supporting this indicator, Pavitt (1998, p. 109) has argued that patenting by universities gives ‘‘a restricted and distorted picture of the range of the contributions that university research makes to practical applications.’’ The explanation of this view lies in the complex nature of the relationship between universities and the corporate sector, involving flows of ideas and people.

Roessner and Coward (1999, forthcoming) are conducting an analytical review and synthesis of the research literature on cooperative research relationships between US academic institutions and industry, and US federal laboratories and industry. The scope of the review and synthesis was restricted to:
• empirical studies based on original or archival data only rather than theoretical, normative, or policy analyses;
• studies of research cooperation among US institutions;
• published or at least readily available reports and studies (i.e., conference papers, dissertations, theses, contractor reports, and unpublished papers generally were excluded);
• published or printed since 1986.

As of this writing, 33 studies have been reviewed of the approximately 50 to be examined. While only a handful of these studies were intended as formal program evaluations, most were intended to yield policy-relevant results. Surveys, case studies, and personal interviews accounted for most of the methods employed in these studies (22); two conducted quantitative analyses of archival data, two employed modeling, and the remainder were literature reviews or essays or used a combination of methods. As a general observation, the preponderant use of surveys, case studies, and interviews is an indication of both the recent emergence of cooperative R&D relationships as a subject of study and the complex, dynamic characteristics of these relationships.

The following three brief case studies of evaluations of programs to stimulate academic–industry linkages demonstrate both general lessons for methodological practice and the importance of the specific and individual context of each evaluation. All were subject to a mix of approaches (the normal case for evaluations) and demonstrate the strengths and limitations of each method employed in meeting the needs of different clients and stakeholders.

3.1. Evaluation of the Center for Advanced Technology Development (CATD)

The Center for Advanced Technology Development (CATD) was established in 1987 at Iowa State University (ISU) with funds from the US Department of Commerce. CATD seeks to bridge two gaps in the innovation process:
• between the research results of the university and the commercial market; and
• between a company's problems and the expertise resident at the university.

To address the first gap, CATD funds research intended to demonstrate proof-of-concept and develop advanced prototypes. To address the second, CATD can match funds provided by industry to address industrial problems.

An extensive evaluation of CATD was conducted in 1995–1996 (Roessner et al., 1996). The evaluation is significant for several reasons: there was sufficient financial support to conduct a relatively thorough study (a situation uncharacteristic of most single, university-based technology transfer organizations), there were multiple clients for the evaluation, and the evaluation design combined multiple strategies and types of data. There were three primary clients for the evaluation: the National Institute of Standards and Technology (NIST) of the US Department of Commerce, which paid for the evaluation, the Iowa State Legislature, and CATD staff. NIST was interested in knowing whether it should support long- or short-term projects. For the state, CATD needed to validate its successes and show evidence of success, as developed by an independent evaluator, to its constituencies. Finally, CATD management wanted to know which of its activities generated the greatest payoff, how its activities were perceived by its industrial clients, and what changes were needed to increase the payoffs from its activities.


A project with multiple clients, each asking different questions, called for a design employing multiple evaluation strategies. In addition, multiple strategies and multiple measures of outcomes (and impacts) would yield greater confidence in the results if the several strategies reached similar conclusions. The research team employed three research strategies:
• a survey of CATD client firms and a formal but simplified benefit/cost analysis, which would address payoff from investment questions;
• a series of detailed case studies, which would address the client perception and management questions;
• a ‘‘benchmarking’’ analysis involving similar programs, which would also address the payoff questions using a different method.

Of these, the benchmarking exercise was the most novel approach. Benchmarking programs such as CATD presents formidable challenges to analysts and evaluators, most of which have to do with the lack of data on programs that are truly comparable. For these reasons, the Georgia Tech team used three different approaches to benchmarking CATD outputs and impacts:
• data from the FY 1993 survey of members of the Association of University Technology Managers;
• four selected state cooperative technology programs; and
• results of a survey of companies conducting cooperative research with federal laboratories (Bozeman et al., 1995).

Methodologically, the case studies offered the best fit between client questions and the evaluation results. The least successful effort was benchmarking, primarily because it proved so difficult to identify comparable programs that had been evaluated at a comparable level of detail. The benefit/cost framework was required to address the questions posed by decisionmakers concerned about the future of the program itself, but the results offered no more than a modest justification for the public expenditures and little for purposes of research management. The case studies yielded three models of university–industry collaboration that could be associated with measures of impact. Managers could then decide, based on their preferred outcomes (e.g., jobs vs. increased private investment vs. startup companies), what project selection criteria to emphasize. The fact that quite different evaluation methods produced similar overall positive results strengthened the acceptability of each type of result.

3.2. Impact on industry of participation in the engineering research centers program

The National Science Foundation's Engineering Research Center (ERC) program, initiated in 1985, was intended to stimulate US industrial competitiveness by encouraging a particular brand of industry–university research collaboration (the university-based industrial research consortium), emphasizing interdisciplinary research, and fostering a team-based, systems approach to engineering education.

Evaluating such a program presents formidable challenges, especially if, as was the case here, some clients of the evaluation (e.g., Congress) wished to obtain information on benefits to companies in terms of dollar payoff of their participation in ERCs. The overall objectives of the evaluation were to examine patterns of interaction between ERCs and participating companies, and to identify results of that interaction in terms of types of impact and value of impacts (benefits to firms). These deceptively simple objectives posed several challenges because, among other reasons, the consequences of industry–university interaction take multiple forms over time, flow via multiple paths within the firm, and are likely to have intermediate effects not valued in monetary terms — but set in motion (or prevent) activities that do have such value.

In an effort to deal with these challenges in the initial research design, SRI International, the evaluator, first conducted a series of case studies within a small number of firms to provide details on intra-firm activities related to ERC participation (Feller and Roessner, 1995; Ailes et al., 1997). Next, drawing upon the case studies, a focus group consisting of representatives from companies that participate in ERCs was held. Third, the research team designed a survey instrument based on the results of the case studies, focus group, and model, and surveyed the more than 500 firms that were participating. Following the survey, SRI conducted telephone interviews with about 20 survey respondents who reported particularly detailed or varied benefits from their participation.

Results of the case studies of ERC–company pairs and the focus group required that ‘‘typical’’ approaches to survey design (e.g., how respondents are identified; the unit of analysis) be rethought, and showed that prevailing ideas about how firms value their participation in ERCs (as well as other investments in external R&D) are incomplete and simplistic. To be specific, the preliminary investigations suggested that:
• The ‘‘known’’ direct consequences observable to a firm of its participation in an ERC are extremely limited, often restricted to a handful of people in the firm.
• The business unit that pays the fee to participate (and is identified as the ‘‘member’’) may not be the business unit that receives the observable benefits from participation.
• Companies regard benefits received from participation in an ERC to be a function of their efforts to make use of ERC results and contacts, rather than of the amount of their membership fee.
• Firms rarely attempt to estimate the precise dollar benefits gained from their participation in an ERC.

These results had a number of very important implications for any effort to employ surveys to measure impacts or benefits from industry participation in collaborative R&D relationships with universities or with other R&D suppliers. First, the business unit, not the firm or even the division, is the active response category. Thus, the survey instrument must be directed to the unit that is the primary direct beneficiary of the company's participation in the ERC. Second, surveys must distinguish carefully between at least two key roles within the unit: the ‘‘champion’’ and the decision maker who approves the budget for ERC membership. Data must be obtained from both. The SRI team addressed this problem by identifying these roles as (1) the ‘‘champion,’’ the person who interacts most intensively with the ERC, and (2) the ‘‘approver,’’ the person who reviews or approves the budget for the unit's participation in the ERC. Third, the survey must establish the nature of each respondent's involvement with the ERC. Valid analysis of any impact or benefit data must control for the nature of each respondent's interaction. Finally, it is important to separate ‘‘results’’ or ‘‘impacts’’ from ‘‘benefits,’’ because different respondents within the same unit may agree on results or impacts but impute quite different levels of benefit to them.

Companies surveyed derived a multiplicity of benefits from their participation in ERCs, but few made a systematic effort to monetize these benefits or develop formal measures of payoff or cost-effectiveness in order to justify the cost of membership. Instead, they evaluated their participation in ERCs as a dynamic process rather than expecting a fixed set of benefits that can be assessed by present or retrospective rating of outcomes. Occasionally, specific outcomes were realized and an economic value estimated, but this was rare. Telephone interviews revealed that firms concluded that efforts to measure benefits in dollars would cost more than the company's investment in the ERC. Only a small proportion of companies developed a new product or process or commercialized a new product or process obtained as a result of ERC interaction, but those that did valued these benefits particularly highly. Thus, it proved to be extremely important to distinguish between whether a benefit was experienced and the value placed on that benefit.

3.3. Teaching company scheme

A European example in this domain is provided by evaluations addressing what is widely perceived as a successful initiative, the UK's Teaching Company Scheme (TCS) (Senker and Senker, 1997). The TCS is a long-running initiative that provides access to technology for firms and facilitates academic–industrial technology transfer by employing graduates to work in a partnering company on a project of relevance to the business. However, the study found little evidence of a positive association between participation in the Scheme and the performance of academic departments. In fact, there was a higher positive association for departments involved in other forms of industrial linkage. From the perspective of evaluation methodology, this example illustrates the difficulty of isolating the effects of a single stimulus from a wide range of variables and indeed of finding suitable proxies for collective academic performance when linkages are still primarily at the level of individuals. In another evaluation study, the TCS scored highest on a cost-effectiveness index devised by the UK National Audit Office to compare a range of programs which supported innovation. This index was based upon nineteen performance indicators which were intended to reflect a combination of the efficiency of delivering assistance and the measurable effect on innovation (National Audit Office, 1995). While the attempt to compare across widely differing schemes is laudable, the report serves mainly to illustrate the pitfalls involved in trying to capture complex effects with quantitative indicators, and the importance of implicit or explicit weighting of criteria. TCS did well because it involved low administrative overheads by comparison with schemes offering grants to firms. However, granting schemes have different objectives that may not be achievable through low overhead policy instruments.
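The weighting problem can be seen in a stylised version of such an index (a hypothetical form, not the National Audit Office's actual formula):

$$CE_k \;=\; \sum_{j=1}^{19} w_j\,x_{jk}, \qquad \sum_{j} w_j = 1,$$

where $x_{jk}$ is scheme $k$'s normalised score on performance indicator $j$ and $w_j$ the weight given to that indicator. The ranking of schemes can shift substantially with the choice of $w_j$: indicators rewarding cheap delivery favour a low-overhead scheme such as TCS, while indicators capturing the objectives that grant schemes pursue would pull the ordering the other way.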

4. Evaluation of collaborative R&D

For Europe, the emergence of publicly supported collaborative R&D between firms acting in research consortia (and usually with academic partners) was a distinctive feature of the 1980s, despite long antecedents in industrial research associations. The US was initially inhibited in this area by anti-trust legislation (Guy and Arnold, 1986), but private initiatives such as the Microelectronics and Computer Technology Corporation were followed by the Advanced Technology Program (ATP) in 1990. The latter was much more similar to its European counterparts in structure and aims, though somewhat smaller. It also differed in offering single company support targeted at small firms in addition to support for joint ventures.

From the perspective of this review, an important feature of such programs is the role they have played in the stimulation of research evaluation method and practice (Georghiou, 1998). The novelty of this policy instrument and the expectation that new types of effects would be significant provided a stimulus to program managers to seek a deeper understanding of their actions. Added to this has been an ongoing desire to demonstrate to policy-makers and other stakeholders that, on the one hand, the programs are contributing to the competitiveness of firms, but that nonetheless, those firms have been induced to undertake R&D that would not have taken place in the absence of an intervention. For the ATP, a further hurdle has been the need to demonstrate to political critics that the program is satisfying the theoretical criteria for intervention by operating in the margin where private returns alone do not justify the R&D investment without the additional consideration of social returns. In Europe, questions of rationale have generally been dealt with at the ex ante stage. In most cases, it has been sufficient to demonstrate that economic and social benefits have been achieved. As in the previous section, methodological issues are discussed in terms of a series of case studies, selected both for the prominence of the programs involved and for the evaluation issues they raised.

4.1. The Alvey Programme

From the earliest days of evaluation of collaborative programs, it was clear that a broad range of effects needed to be considered. The evaluation of the UK's Alvey Programme for Advanced Information Technology (Guy and Georghiou, 1992) set the style for most subsequent UK national evaluations of collaborative R&D, as well as forming an input to broader European practice. That particular exercise was a real-time evaluation, tracking the program from shortly after its inception, with periodic topic-oriented reports fed back to the management, many of which addressed the complex process issues arising from the program, for example the problems with intellectual property conditions (Cameron and Georghiou, 1995). The final report, delivered 2 years after the initiative ended, found that Alvey had succeeded in meeting its technological objectives as well as its main structural objective of fostering a collaborative culture, particularly in the academic–industrial dimension. However, it had manifestly not succeeded in its commercial objective of revitalising the UK IT industry. The problem, the evaluators found, was in the objective itself. What was, despite its size, still an R&D program had been sold as a substitute for an integrated industrial policy with goals to match. For reasons that went well beyond technological capability, the market position of UK IT firms declined substantially during that period. This was perhaps the first example of a long-running evaluation problem in this sphere: such programs have multiple and sometimes conflicting goals that are often not articulated in a format amenable to investigation through an evaluation. Hence, it was necessary to reconstruct the objectives in an evaluable form.

Since that time, the UK has developed the so-called ROAME (or ROAMEF) system, whereby prior approval for a program requires a statement detailing the rationale, objectives (verifiable if possible), and plans for appraisal of projects, monitoring and evaluation (plus feedback arrangements) (Hills and Dale, 1995). This clearly simplifies the situation for evaluators but also emphasizes the need for evaluators to engage with policy rationales, a topic returned to below in the discussion on the evaluation of the ATP.

4.2. The EU Framework Programmes

For most Europeans, the development of evaluation has gone hand-in-hand with the history of pan-European collaborative initiatives, notably the European Commission's Framework Programmes and the inter-governmental EUREKA Initiative. The Framework Programmes have been evaluated in many different ways at the behest of a variety of stakeholders (Olds, 1992; Georghiou, 1995; Guzzetti, 1995; Peterson and Sharp, 1998). The legally based evaluation process has always been based upon convening a panel of independent experts for each sub-program and asking them to report upon its scientific, managerial and socio-economic aspects. Reports focused heavily on the first two criteria, confining themselves to generalized remarks on the third. Since 1995, this approach has been elaborated, following pressure from Member States, and now consists of continuous monitoring, reporting annually, and 5-year assessments, carried out midway through program implementation but including within the frame of analysis the preceding program (Fayl et al., 1998).

However, from a methodological perspective, the most interesting developments have taken place at the margins of this system or outside it altogether. The first significant development came from a study carried out in France that aimed to look at the impact of all EU R&D funding upon the national research system (Larédo et al., 1990). This was the prototype of a family of ‘‘national impact’’ studies which, using a survey and interviews, took a cross-cutting view of effects and revealed a number of unexpected results, for example the significance of European projects in providing a basis for doctoral training. By working at the level of the organisation rather than the project, some idea of the aggregate and interactive effects of participation could be gained.

Also from France came the best known European attempt to calculate the economic effects of programs upon participating organizations. Known as the ‘‘BETA method’’ after the laboratory at Strasbourg University where it originated, this approach is based upon in-depth interviews during which managers in the organizations are asked to identify ‘‘added value’’ from sales and cost-savings, and to attribute a proportion of this to their participation in the project in question. The most innovative aspect of the model is a formula for calculating indirect benefits that arise from transfer of technology from the project to other activities of the participant, from business obtained through new contacts or reputation enhancement gained via the project, from organizational or management improvements, or from the impact on human capital or knowledge in the organization. The principal difficulty surrounding this approach is that of attributing particular economic effects to a single project, when in many cases, the innovation may have drawn upon multiple sources from inside and outside the company concerned. The method produces ratios attractive to research sponsors because in a well publicized example (Bach et al., 1995) they show direct effects some 14 times greater than the initial public investment. The figure cannot, of course, be taken as a rate of return as it does not include in the calculation any other investments necessary to realize the returns.
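A minimal sketch of the kind of ratio reported (the notation is ours, not BETA's own) is

$$\rho \;=\; \frac{\sum_{i} \alpha_i\,(D_i + I_i)}{F},$$

where, for each participant $i$, $D_i$ is the added value (sales margins and cost savings) on outputs directly linked to the project, $I_i$ the estimated indirect benefits (technology transferred to other activities, business gained through contacts or reputation, organizational and human-capital effects), $\alpha_i$ the share of that value the interviewed managers attribute to the project, and $F$ the public funding. A reported $\rho$ of around 14 is therefore an attribution-weighted benefit-to-subsidy ratio rather than a rate of return, since the participants' own complementary investments do not enter the denominator.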

Future directions for evaluation of the Framework Programme are discussed in a recent report from the European Technology Assessment Network (Airaghi et al., 1999). A new emphasis upon broader social objectives in the current Fifth Framework Programme implies the need to deal with a broader range of stakeholders and to enter the difficult territory of measuring the effects of R&D on employment, health, quality of life and the environment, all areas mainly driven by other factors. A second challenge is the need to deal with the sometimes elusive concept of ‘‘European value-added’’, in other words, to assess whether the R&D objectives were most efficiently pursued at the European Community level, as opposed to national, regional, or indeed global levels.

4.3. EUREKA

Evaluation of the EUREKA Initiative has evolved largely independently from that of the Framework Programme. EUREKA is normally described as a ‘‘bottom-up’’ initiative in which firms may seek entry for any projects that meet the broad criterion of developing advanced technology with a market orientation. There is no central budget and only a very small central secretariat coordinating decisions taken by representatives of the Member States who choose whether or not, and by how much, to fund their own nationals' participation in collaborative projects. For the first decade of EUREKA's existence, evaluation took place, optionally, at the behest and expense of the Member State holding the Chair for that year.

After an administratively oriented panel exercise in 1991, the largest evaluation of EUREKA, and indeed of any European initiative to date, took place during 1992/1993. This involved teams from 14 countries working together, with a survey of all participants and interviews with about three participants from each of a selection of projects. Findings are in Ormala et al. (1993), and a description of the process appears in Dale and Barker (1994). This evaluation stood at a crossroads in terms of European approaches as it involved an explicit combination of tools developed during the Alvey and French impact studies, with further inputs from the Nordic and German evaluation traditions (described respectively in Ormala, 1989 and Kuhlmann, 1995). Not surprisingly, it has provided a template for many exercises that followed, within and outside EUREKA. In particular, the questions developed for the survey, on topics such as the additionality of collaborative research and on the relation of R&D to firm strategy, have been extensively reproduced and adapted.

Since 1996, EUREKA has adopted a new approach which, though conventional in the tools it uses (adaptations of the survey instruments from the 1993 evaluation), has been innovative in its structure (Sand and Nedeva, 1998; Georghiou, 1999). Known as the Continuous and Systematic Evaluation, this involves the automatic despatch of a questionnaire on outputs and effects upon completion of the project (this functions as a Final Report). Shorter follow-up questionnaires are despatched after one, three and (in the future) five years to participants indicating commercial effects. In addition, a sample of some 20% of participants who completed 3 years before is interviewed. An annual report is prepared by an independent expert panel on the basis of these findings. While simple in concept, this is a rare example of an evaluation that systematically follows up effects during the market phase. A further innovation by EUREKA in late 1999 was the convening of key participants in all past and present evaluations to review their collective findings and the degree to which these had been adopted. One key recommendation (the need for post-project support for small firms) had recurred in each evaluation. Its lack of adoption was seen as a measure both of its importance and of the lack of an adequate system to follow up and learn from evaluation findings.

4.4. SEMATECH

SEMATECH, the US government-industry research consortium founded in 1987, has been the subject of several academic studies, each seeking different measures of its effectiveness and cost-effectiveness. SEMATECH originally consisted of 14 US firms in the semiconductor industry (together accounting for 80% of US semiconductor component manufacturing) and the Defense Advanced Research Projects Agency (DARPA) of the US Department of Defense. Private investment in the consortium has typically totalled US$100 million annually, matched by government funds. As of 1998, SEMATECH is totally privately supported, and the number of member firms has declined.

The broadest assessment of SEMATECH was conducted by Grindley, Mowery, and Silverman, who published their results in 1994. The authors drew upon the available literature, SEMATECH archives, and interviews with SEMATECH managers and members of the technical advisory board. They discuss evaluation criteria, observing that these are particularly problematic because of several features of research consortia generally and SEMATECH in particular:
• agreement among the several stakeholders on goals was lacking;
• similarly, agreement on appropriate evaluation criteria was lacking;
• the relevant time horizon for achievement of SEMATECH's broadest goals extends beyond the time of its existence (5 years at the time of this evaluation);
• it is extremely difficult to establish a counterfactual against which to assess the impact of what is basically a unique arrangement;
• SEMATECH has been highly flexible, changing its goals as conditions changed (Grindley et al., 1994, p. 736).
They conclude that flexibility in goals was crucial for SEMATECH's survival; that, allowing for flexibility, its goals have been achieved; and that the political goals of achieving "world leadership for the US semiconductor industry" and "industry competitiveness" constituted unrealistic criteria against which the program could be judged.

Link et al. (1996) evaluated SEMATECH with considerably more restrictive criteria in mind. They sought to measure the returns to member companies from their investments in SEMATECH, restricting the returns to those accruing from research alone, excluding benefits from improvements in research management, research integration, and spillovers (Link et al., 1996, p. 739). They drew a representative sample of eleven projects and then interviewed the people in member companies who were most knowledgeable about each project. Many companies either estimated a range of benefits or said that they experienced significant benefits but could not estimate them accurately. Respondents also reported that intangible benefits related to research management, integration, and spillovers were more important than benefits related to research. The authors proceeded to use these benefit estimates and project cost data obtained from SEMATECH accounting records to calculate an internal rate of return for member companies. The "best" estimate of IRR was 63%, a figure that included both public and private funds invested in each project, burdened with an appropriate overhead figure, and with projects weighted by size. This study demonstrates that it is possible to obtain benefit estimates in quantified form from members of research consortia but, consistent with similar results from a variety of other studies of research collaboration and technology transfer, these estimates fail to capture the bulk of benefits associated with consortium membership (Roessner and Bean, 1994; Feller and Roessner, 1995; Roessner et al., 1998).
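
For readers unfamiliar with the mechanics, the internal rate of return in such a study is simply the discount rate at which the net present value of the project's cost and benefit streams is zero. A minimal sketch with invented cash flows (not the Link et al. data) follows:

```python
# Hypothetical internal-rate-of-return calculation of the kind used by
# Link et al. (1996); the cash flows below are invented for illustration.

def npv(rate, cash_flows):
    """Net present value of a series of annual cash flows at a given rate."""
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))

def irr(cash_flows, low=-0.99, high=10.0, tol=1e-6):
    """Find the discount rate at which NPV is zero, by bisection."""
    for _ in range(200):
        mid = (low + high) / 2.0
        if npv(mid, cash_flows) > 0:
            low = mid          # NPV still positive: the IRR lies higher
        else:
            high = mid
        if high - low < tol:
            break
    return (low + high) / 2.0

# Year 0: project cost (public plus private funds, with an overhead burden);
# years 1-4: benefit estimates reported by member-company respondents.
flows = [-10.0, 2.0, 5.0, 8.0, 9.0]   # US$ million, hypothetical
print(f"internal rate of return: {irr(flows):.1%}")
```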

The third evaluation, carried out by Irwin and Klenow and published in 1996, used similarly narrow evaluation criteria: basically several measures of member firm performance as compared with non-member firms. Although the major strength of this evaluation design is, in principle, the use of an implicit control group (via regression analyses using dummy variables to represent member/non-member status), in reality the fact that SEMATECH members are the dominant firms in the industry weakens the value of this usually desirable feature. (See the discussion of the Irwin and Klenow evaluation in the chapter by Klette, Moen, and Griliches in this issue.)
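
In outline, such a design amounts to regressing a firm-level performance measure on a membership dummy plus controls; the coefficient on the dummy is the estimated member/non-member difference. A sketch on simulated data (the variables and controls are placeholders, not those used by Irwin and Klenow):

```python
# Dummy-variable comparison of member and non-member firms, in the spirit of
# Irwin and Klenow (1996); the data are simulated and purely illustrative.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
member = rng.integers(0, 2, n)                 # 1 = consortium member
firm_size = rng.lognormal(mean=3.0, sigma=1.0, size=n)

# Simulated outcome, standing in for a performance measure such as
# R&D intensity or productivity growth.
outcome = 0.05 + 0.02 * member + 0.001 * np.log(firm_size) + rng.normal(0, 0.03, n)

X = sm.add_constant(np.column_stack([member, np.log(firm_size)]))
fit = sm.OLS(outcome, X).fit()
print(fit.params)   # second coefficient: estimated member/non-member gap

# Caveat noted in the text: if members are systematically the dominant firms,
# this coefficient conflates membership effects with pre-existing differences.
```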

4.5. ATP

The US Department of Commerce's ATP is drawing substantial critical attention from policy-makers, and partly for this reason is also the supporter, and subject, of numerous formal evaluation studies. These studies are worth describing briefly here, less because any of them individually represents a methodological advance than because of the sheer size and variety of the evaluation effort. In the words of the Director of ATP's Economic Assessment Office, which supports and conducts many of these evaluations, "It [ATP] is probably the most highly scrutinized program relative to its budget size of any government program to date" (Ruegg, 1998a,b, p. 3). The range of research topics being supported by ATP as of mid-1998 suggests a rich future source of (largely economic) analyses and evaluations of virtually every aspect of the program (Ruegg, 1998b, p. 8).

Papers prepared for a recent ATP-sponsored symposium on evaluation reflect the range of studies being conducted and a number of issues related to evaluation methods. The paper by Jaffe (1998) reviews the extensive past efforts to measure the social rates of return from R&D and compare them with private returns. In the case of ATP, of course, measuring the average rates of net social return to a portfolio of projects may serve to justify (or undermine) the rationale for the program, but in the absence of predictive models directed at the individual project level, project selection decisions remain uninformed about possible spillover effects from individual projects. Identifying technological and market factors that tend to be systematically related to large positive differences between the social and private returns at the project level continues to be a formidable task facing economists.

One of the ATP program's objectives is to accelerate the development and commercialization of new technologies. Laidlaw (1998) surveyed 28 projects funded in 1991 to obtain estimates of the amount that ATP funding reduced development cycle time, and estimates of the economics of reducing cycle time by 1 year. Estimates of cost reductions or savings resulting from reduced development cycle time ranged from one million dollars to "billions," suggesting that such estimates may be highly unreliable. In another project reported in the symposium, Link (1998) sought estimates of the reduction in research costs attributable to ATP funding, realized by seven companies participating in the Printed Wiring Board Research Joint Venture. Companies estimated research cost savings due to workyears saved and testing and machine time saved, and production cost savings due to productivity improvements. Total estimates are reported 2 years into the project and at the project's end. Link provides no discussion of the reliability of these estimates, however.

Researchers at CONSAD Research attempted to estimate the preliminary impacts on the national economy of an ATP-sponsored project whose objective was to control dimensional variation in automobile manufacturing processes. At the plant level, experts were asked to estimate the direct impacts of the newly developed technologies on the production processes across different assembly plants; other experts estimated the rate of adoption of the technologies by automobile manufacturers and the magnitude of the impact of the resulting increase in product quality on the sales of automobiles. A macroeconomic inter-industry model was then employed to estimate the impacts on the US economy of the increased demand due to quality improvements. The results showed cumulative effects over 5 years of US$8.7 billion in increased total industrial output and an increase of more than 160,000 jobs (CONSAD Research, 1998).
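
The inter-industry step rests on standard input-output logic: a change in final demand is propagated through the Leontief inverse to obtain total (direct plus indirect) output effects. A toy three-sector illustration is given below; the coefficient matrix and the demand shock are invented and bear no relation to the actual CONSAD model.

```python
# Minimal input-output (Leontief) illustration; the 3-sector coefficient
# matrix and final-demand shock are hypothetical.
import numpy as np

A = np.array([[0.10, 0.30, 0.05],    # technical coefficients: input of each
              [0.20, 0.05, 0.10],    # row sector used per unit of output of
              [0.05, 0.10, 0.15]])   # each column sector
delta_demand = np.array([1.0, 0.0, 0.0])   # US$1 of extra final demand, sector 0

leontief_inverse = np.linalg.inv(np.eye(3) - A)
delta_output = leontief_inverse @ delta_demand
print(delta_output.sum())   # total direct + indirect change in output
```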

The Republican-led US Congress has been highly critical of ATP, and consequently has initiated a number of inquiries concerning its cost-effectiveness and political justification. Prominent among the formal evaluation efforts launched by the Congress is a large-scale, survey-based evaluation of ATP conducted by the General Accounting Office (GAO). An unusual aspect of the evaluation was the use of a comparison group, quasi-experimental design. In addition to 89 successful applicants for ATP awards, 39 "near winners" who had been scored as having very high scientific and technical merit, had satisfied the program's requirements, and had strong technical and business merit but had not received ATP funding were surveyed. The 128 winners and near winners were surveyed by telephone using a 76-item, closed-ended instrument. Data were obtained on applicants' efforts to obtain funding before applying to ATP, and on whether they intended to pursue their projects whether or not they received ATP funding. Near winners were asked about the fate of their projects after failing to obtain ATP support. The results were used by both critics and supporters of ATP to support their views (U.S. General Accounting Office, 1996).

If nothing else, the GAO evaluation demonstrates the difficulties associated with evaluating public programs intended to support pre-competitive technology development in private firms. While relevant and, apparently in the case of the GAO evaluation, reliable data on key questions about pre- and post-award project histories can be obtained from award-winning companies and from a comparison group of companies, drawing conclusions about whether the program is addressing a market failure or not is a political rather than an analytical exercise.

The GAO experience has been echoed in similar European work on the issue of additionality: what difference did the intervention make? Traditional treatments have focused on input additionality, that is, whether the incremental spending by an assisted firm was greater than or equal to the amount of subsidy. Papaconstantinou and Polt (1997, p. 11), summarizing an OECD conference, concluded that several participants saw:

a focus on additionality (the changes in behaviour and performance that would not have occurred without the programme) as a criterion for success is simply a reflection of the difficulty of accurately measuring spillovers or externalities and thus the net social benefits of programmes.

They present a view more consistent with the network models described earlier in this article: that the concept of 'behavioral additionality' (induced and persistent changes in the innovative behavior of firms) provides a sounder measure of the effects of intervention (Buisseret et al., 1995), in keeping with policy rationales founded in the notion of systemic rather than market failure.

4.6. Economic estimates of the net benefits of public support of technology development

Although our focus in this article is on methods for evaluating technology programs, in contrast to either more aggregate levels of analysis (e.g., policies) or less aggregate levels (e.g., individual projects or technologies), we would be remiss if we omitted reference to methods that economists have developed to measure the net social benefits of public support for technology development. The consumer/producer surplus method, pioneered by Griliches (1958) and Mansfield et al. (1977), uses estimates of producer and consumer surplus to measure the social and private rates of return to investment in particular technologies. Link and Scott (1998) have used what they term a "counterfactual" model to determine the relative efficiency of public vs. private investment in technologies with public good characteristics. In principle, these two approaches could be applied to the individual projects or technologies that comprise the portfolio of technologies in a publicly supported program of technology R&D, and the results aggregated to evaluate the entire program. The consumer/producer surplus approach would yield an estimate of whether, and by how much, the average social rate of return over all projects or technologies in the portfolio exceeds the average private rate of return. The counterfactual approach would yield an estimate of whether the average benefit/cost ratio summed over all projects is greater than one and, if so, by how much.
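
As a reminder of the surplus logic, the gain in consumer surplus from a research-induced price reduction under linear demand is the rectangle on existing output plus the triangle on the additional output. A toy calculation with invented numbers (a full Griliches/Mansfield analysis would also track producer surplus, R&D costs over time, and discounting):

```python
# Toy consumer-surplus calculation of the Griliches/Mansfield type;
# linear demand is assumed and every figure is hypothetical.

p0, q0 = 10.0, 100.0      # price and quantity before the new technology
p1, q1 = 8.0, 120.0       # price and quantity after adoption

# Rectangle: existing units bought more cheaply; triangle: surplus on the
# additional units purchased once the price has fallen.
consumer_surplus_gain = (p0 - p1) * q0 + 0.5 * (p0 - p1) * (q1 - q0)

cumulative_rd_investment = 50.0   # public plus private R&D behind the change
print(consumer_surplus_gain / cumulative_rd_investment)   # crude benefit/cost ratio
```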

The counterfactual approach has been applied to several technologies supported by NIST (e.g., Link, 1996), but not, as far as we know, to overall programs such as ATP and SEMATECH that support multiple technologies. Both the Griliches/Mansfield and the Link/Scott approaches rest on industry-reported estimates of private investments in technology development. In addition, the consumer/producer surplus approach requires knowledge of the supply-demand parameters for process technologies, and must make fairly heroic assumptions to obtain equivalent estimates for new products. The counterfactual approach requires estimates by industry of what their costs would have been in the absence of government support (thus the label, counterfactual). Both approaches provide useful insights into the issue of net returns from public investments in support of new technologies, but since both rely, in part, on expressed preferences rather than revealed preferences, they embody a subjective element and must be used and interpreted with that subjectivity in mind.

5. Evaluation of diffusion and extension programs¹

For several decades, industrialized nations have initiated a variety of programs intended to promote the diffusion of industrial technology within their borders. The primary purpose of most of these programs has been to enhance the competitiveness of target firms, which in turn is expected to increase the competitiveness, and hence the economic growth, of the nation or region in which the target firms operate. Some of these programs single out new, state-of-the-art technologies for diffusion; others emphasize best practice technologies and techniques; still others focus as much or more on business practices, training, and marketing as on technology itself. In most instances, target firms are small- and medium-sized manufacturing enterprises (SMEs), typically defined as firms with fewer than 500 employees. Program staff seek to deliver a range of services whose number and mix vary widely across programs in the general terrain of training, advice to firms and promotion of linkages.

¹ Although purists may argue that the terms signify differences among program goals or practices, for our purposes, the programs whose evaluation we discuss in this section can be labelled interchangeably as industrial modernization, industrial extension, or manufacturing extension.

Each of the 50 states in the United States now has at least one industrial modernization/extension program. States have been the initiators of these types of programs, with significant industrial extension programs in several states dating back to the 1960s. Not until very recently did the US federal government begin to support a nationwide program of such programs under the Manufacturing Extension Program (MEP) within the Commerce Department's NIST. The MEP has an active evaluation program, and most larger evaluative efforts to date, and the attempt to create a community of evaluation researchers focused on MEP-supported extension centers, have been supported or encouraged by this group. Although state-supported programs have existed for decades, and the national effort is now ten years old, the state of the art in evaluating industrial modernization programs (at least in the US) is just beginning to evolve from a fairly primitive state. In a 1996 special issue of this journal devoted to the evaluation of industrial modernization programs, Shapira and Roessner (1996) observed that at the time there was considerable experimentation, innovation, and mutual learning, but "to date there have been comparatively few systematic evaluation efforts in the industrial modernization field" (p. 182). They cited the newness of the field, the small amount of resources devoted to evaluation, and lack of agreement on designs, measures, and who should manage the evaluations.

Feller et al. (1996) offer a harsh assessment of the reasons for the relatively primitive state of the evaluation art as of 1996. They discount the usual reasons for such situations, namely inherent conceptual problems or difficulties in obtaining data, claiming that in other program areas such as manpower training, medicine, and education similar issues have not forestalled the emergence of a substantial evaluation research literature, tradition, and practitioner community (p. 313). With few exceptions, they say, referring to the state of US evaluation, "the ability to repeat the litany of formidable barriers has served as a convenient rationale for not conducting evaluations or as an umbrella justification for limited expectations or egregious shortcomings." In the remainder of this section, we focus on these few exceptions and on the progress that has been made in the last 4 years.

In their 1996 review of methods used to evaluate industrial modernization programs, Shapira et al. (1996) concluded that "most of the evaluation methods used by industrial modernization programs are either indirect (program monitoring or external reviews) or implemented with one data point after service completion" (p. 202). Cost and the lack of demand for more sophisticated evaluation strategies largely account for the situation; as Shapira and Youtie (1998) observed recently, there appears to be no direct correlation between the usefulness of an evaluation method and that method's degree of sophistication or use of controls. The state of the evaluation art is thus defined by a small number of studies that employ relatively rigorous designs: the identification and use of comparison groups and the use of statistical controls to achieve internal validity of results.

Identification of appropriate comparison groups² is a daunting task because, among other reasons, firms receiving services from industrial modernization programs (clients) are self-selected, automatically distinguishing them in unknown ways from non-client firms. At least two prominent evaluations have used different approaches to identifying a comparison group of non-client firms. The Michigan Manufacturing Technology Center (MMTC), one of the first NIST manufacturing centers, has evaluated the impact of 5 years of service delivery to SMEs by comparing a set of performance measures collected from client firms to performance measures on the same variables from firms participating in the Performance Benchmarking Service (PBS), a separate service operating under the auspices of the MMTC.³ PBS currently has about 3000 firms in its data set. Shapira and Youtie's evaluation of the Georgia Manufacturing Extension Alliance (GMEA) uses a different method of identifying a comparison group (Shapira and Youtie, 1998). The GMEA evaluation sent a "control" survey to all manufacturers in the state of Georgia with more than ten employees. The 1000 responses received were weighted to reflect the actual distribution of manufacturers by industry and employment size. Then GMEA client performance, measured by value added per employee, was compared with the weighted sample of all Georgia manufacturers.

² In this article, we use the term "comparison" groups rather than "control" groups to distinguish the use of quasi-experimental designs, which employ the logic of comparing firms that received services with similar firms that did not, from true experimental designs in which treatment and control groups are selected at random from a population of potential target firms.

³ This description of the MMTC evaluation is based on Dziczek et al. (1998).
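
Returning to the GMEA "control" survey, the weighting step described above is ordinary post-stratification: each respondent is weighted by the ratio of its stratum's population share to its sample share, so that the weighted sample reproduces the population distribution. A sketch with invented strata and counts:

```python
# Post-stratification weighting of the kind described for the GMEA survey;
# the strata and counts below are hypothetical.

population = {"food": 4000, "textiles": 2500, "machinery": 3500}   # all manufacturers
respondents = {"food": 300, "textiles": 450, "machinery": 250}     # survey returns

total_pop = sum(population.values())
total_resp = sum(respondents.values())

# Weight = population share of the stratum / sample share of the stratum.
# The same logic extends to cross-classifications by industry and size.
weights = {s: (population[s] / total_pop) / (respondents[s] / total_resp)
           for s in population}
print(weights)
```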

Jarmin (1999), in an interesting effort to measure the performance of MEP client firms against non-client manufacturers, used the US Census Bureau's Longitudinal Research Database (LRD) to create a nonclient comparison group. Jarmin's evaluation used data from eight MEP centers in two states. He linked client data obtained from these eight centers to the LRD by using the Standard Statistical Establishment List (SSEL), which uses the same identifiers as the LRD. Then he linked clients to the SSEL by creating matching variables such as firm name and zip code from the several fields that are common between the two data sets. Jarmin controlled for selection bias, a serious problem in all such comparison group analyses, by estimating a Heckman-style two-stage model (Jarmin, 1999, p. 103).
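
A stripped-down, control-function version of such a two-step selection correction is sketched below; the data are simulated and the variable names are placeholders standing in for the MEP client records and the LRD, not the specification Jarmin actually estimated.

```python
# Sketch of a Heckman-style two-step selection correction; all data are
# simulated and purely illustrative.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(1)
n = 1000
size = rng.normal(size=n)                      # observed plant characteristic
u = rng.normal(size=n)                         # unobserved factor
client = (0.5 * size + u > 0).astype(int)      # self-selection into services
growth = 0.02 + 0.01 * client + 0.005 * size + 0.03 * u + rng.normal(0, 0.05, n)

# Step 1: probit for the selection (client) equation, then the correction term
# (generalized residual / inverse Mills ratio) evaluated at the fitted index.
Z = sm.add_constant(size)
probit = sm.Probit(client, Z).fit(disp=0)
index = Z @ probit.params
lam = np.where(client == 1,
               norm.pdf(index) / norm.cdf(index),
               -norm.pdf(index) / (1.0 - norm.cdf(index)))

# Step 2: outcome equation augmented with the correction term; the coefficient
# on `client` is the selection-corrected estimate of the program effect.
X = sm.add_constant(np.column_stack([client, size, lam]))
print(sm.OLS(growth, X).fit().params)
```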

The GMEA evaluation employed a very wide range of evaluation techniques to gather and analyze data, thus exemplifying perhaps the most comprehensive state of the art in industrial modernization evaluation. In particular, they developed:
• a customer profile assembled by program personnel at the point of initial contact with a customer;
• activity reports that track field agent activities and customer interactions;
• client valuation surveys administered by mail to each customer when all major GMEA services had been completed;
• progress tracking via benchmarking and non-customer controls via the state survey of all manufacturers and a 1-year follow-up of GMEA customers; and
• a series of case studies to provide in-depth information about the effects of GMEA services on firm operations and profitability.

In addition to the obvious analyses of these data, Shapira and Youtie combined data on the private and public returns to GMEA-related investments and divided this figure by private plus public investment to obtain a benefit/cost ratio. Private returns included increases in sales; savings in labor, materials, energy, or other costs; reductions in the amount of inventory; and avoidance of capital spending. Private investment included estimates of the value of customer staff time commitment, increased capital spending, and fees. Public investment included federal, state and local program expenditures. Public returns were measured by federal, state, and local taxes paid by companies and their employees, estimated from sales increases or job creation/retention. The analysis accounted for zero-sum effects (i.e., sales increases may be at the expense of other firms in the state) by adjusting sales to about 30% of the reported sales in the benefit/cost model.
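
The arithmetic is a simple ratio once the return and investment categories have been estimated. A back-of-the-envelope sketch, with every figure invented rather than drawn from the GMEA evaluation:

```python
# Hypothetical benefit/cost calculation in the spirit of the GMEA evaluation;
# all figures below are invented for illustration.

ZERO_SUM_ADJUSTMENT = 0.30   # count only ~30% of reported sales increases

private_returns = (
    ZERO_SUM_ADJUSTMENT * 20.0   # reported sales increases (US$ million)
    + 4.0                        # labor, materials, energy and other savings
    + 1.5                        # inventory reductions
    + 2.0                        # avoided capital spending
)
public_returns = 1.2             # taxes from sales increases / jobs retained

private_investment = 3.0 + 2.5 + 0.8   # staff time, capital spending, fees
public_investment = 4.0                # federal, state and local expenditures

bc_ratio = (private_returns + public_returns) / (private_investment + public_investment)
print(f"benefit/cost ratio: {bc_ratio:.2f}")
```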

The results of these evaluations raise two kinds of issues. First, there are considerable methodological issues. Most are related to the complexity of the programs themselves and to the inability of evaluators to identify a "true" control group of non-client firms. Second, there are political issues related to differences between what the evaluation community and the practitioner/policy community regard as the most useful and credible attributes of evaluations. With regard to the first issue, it has proven highly problematic to sort out the effects of widely varying lengths and types of services delivered to client firms, which are accompanied by widely varying levels of resource commitment by the clients themselves. Aggregating across any significant number of clients masks enormous variations in these two key variables. Further complicating this aspect of evaluations is the difficulty that firms have estimating the dollar value of industrial modernization program services for their operations.

The clients for evaluations of industrial modernization programs are as widely varied as the programs themselves. Federal officials, state-level politicians, and practitioners all seek and value different kinds of information, and have different criteria of "value." As Shapira and Youtie (1998) point out, US federal sponsors have little interest in measures of customer satisfaction, but instead seek measures of economic impacts. Information about the relative impact of different program services is of vital interest to program managers and field staff, but has little value for those charged with justifying the program's existence. Testimonials (basically superficial case studies) prove useful for program justification, but more formal cases, conducted using rigorous social science methods, have no greater impact for justification purposes. They can be highly valued, however, by program managers and field staff because they reveal the paths by which services are translated into changes in client operations and, in turn, into changes in client productivity.

European experiences of evaluating diffusion and extension programs are most extensive in Germany, as a consequence of that country's technology policy traditions. However, an assessment as harsh as those in the US has been made by Kuhlmann and Meyer-Krahmer (1995) about the state of the art:

The practice of evaluation of technology policy in Germany, when critically surveyed, does not nurture any naïve illusions about satisfying impact control. So far, evaluations have scarcely done more than provide evidence that technology policy interventions correlate with trends in technological and industrial development.

They conclude that progress is dependent, among other things, upon more objective data (less reliant on perceptions of participants) and upon integration of evaluation with prospective methods for technology assessment, in order to assess longer term unintended impacts of public intervention in technology. In response, a network of European evaluators has been attempting to unify evaluation with both TA and foresight approaches under the umbrella of a European Strategic Intelligence System (Kuhlmann et al., 1999).

6. Conclusions

In Section 1, we observed that evaluation methods and the methods used for more general technology policy studies are related; in other words, broad social and political trends influence the practice of technology program evaluation. At certain times, evaluation studies have been at the leading edge of technology policy studies, for example in eliciting an understanding of collaborative R&D and of networking more generally. In other respects, they have tended to lag behind for methodological and political reasons. The demand for evidence of direct economic effects has left many evaluation studies coupled to a more linear view of the world than is common in the mainstream of technology policy studies. In general, though, evaluation may be seen as complementary to work carried out in other branches of the field, for example, more aggregated studies of national performance or more detailed work at the level of the organization. Evaluation work has probably had less of an impact in the literature than it deserves, in part because much of the most detailed and valuable work is not easily obtainable. There is a disturbing tendency for evaluation data that could form a valuable reference point for future studies to be lost in the grey literature, especially for those in countries other than the one in which the study was performed. Evaluators should make greater efforts to ensure that their main findings are also published in more conventional media. This problem also has inhibited the effectiveness of evaluation itself. Despite their limitations, most studies would benefit from a greater use of benchmarking and comparison group approaches. The use of archival data and multivariate methods should continue to be explored as one means of constructing analytical controls.

Looking at the broader role of evaluation in policy, some interesting changes are on the horizon.⁴ In the US, passage of the Government Performance and Results Act (GPRA) in 1993 has galvanized the attention of the R&D community. GPRA calls for annual "performance reviews" at relatively high levels of agency activity, generally well above the program level. It also contains an "escape clause" that enables agencies supporting fundamental research to utilize qualitative measures of performance rather than the quantitative ones favored in the legislation. Program evaluation activity has increased in response to GPRA, under the apparent assumption that these evaluations will help support agency GPRA requirements. Appropriately or not, program evaluations and performance reviews are becoming entangled, and it appears to us that, as pressure to develop performance measures and justify program budgets increases as a result of GPRA, the lines between the two will become blurred. A similar tendency has also emerged in Europe, with monitoring and performance indicator approaches intersecting with evaluations, and in some cases, competing. This raises some important issues, not the least of which echoes our earlier observation that valid program evaluations must, increasingly, account for the systemic context in which they are performed. Technology programs do not exist in a vacuum, either politically or theoretically. The short-term yet aggregate perspective of GPRA's performance reviews conflicts with the longer-term, program-level, yet context-dependent perspective of technology program evaluations. The fundamental requirement for the design of a performance indicator regime is a clear understanding of context, goals and the relationships which link goals to effects. Whether this important distinction will be recognized and dealt with by government officials and the evaluation community remains to be seen.

⁴ For a comparison of research evaluation practices in two different political settings, the US and Canada, and of the differences between GPRA and Canada's earlier, formal requirement that program evaluation be an integral part of the budget process, see Roessner and Melkers (1997).

Limitations of the production function approach to measuring the payoffs from research are now apparent, but promising new directions are just emerging. Return on Investment and Internal Rate of Return measures, despite their quantitative appeal, have limited value for program justification and even less for R&D program management decisions. Methods are needed that capture more fully the noneconomic benefits from research, or at least the benefits not easily translated into monetized form by those who receive them directly (e.g., private firms) or who seek to develop valid metrics (e.g., professional evaluators). We look forward to the outcome of efforts that focus on the characteristics of networks and linkages between knowledge producers and knowledge users as possible units of analysis, and measures of value, for the products of research. Methods that emphasize the dimensions of human capital and institutional development offer fertile ground for development.

Despite their widespread use as techniques for obtaining estimates of the benefits of research, and especially of linkages between research institutions, surveys of the intended beneficiaries are problematic. The studies we reviewed in this article document some of the difficulties: within the "user" organization, those benefiting most from the interaction between external sources of knowledge and technology may not be the same as those who interact directly with such sources and know the most about the interaction; the greatest benefits are longer term and qualitative rather than short-term and quantitative, and are thus difficult to estimate in monetized form; efforts to obtain and validate such estimates conflict with firms' proprietary concerns and with their priorities for allocating staff time (i.e., responding to surveys); the inherently risky and long-term nature of the innovation process itself often precludes reliable estimates of the payoffs from research. Where monetized estimates are obtained it must be recognised that these embody the (unknown) model of innovation and consequent attribution of benefits that the respondent is applying. The tendency to broaden the range of acknowledged stakeholders in R&D programs that has emerged recently in Europe provides another problem for survey approaches as many of these "social" groups are too distant from the research to provide detailed and structured responses. We are not advocating that surveys be shelved as appropriate evaluation tools, but we do urge that the subtleties now appearing in reported research be anticipated and incorporated in future evaluation designs. We also encourage, as mentioned above, the creative construction of comparison groups, as was done in the GAO evaluation of the ATP program, and the use of case studies as the preferred method for understanding what actually transpires in the complex process of technological innovation.

We made the point that progress in technology program evaluation methods has been incremental over the past 15 years. Evaluation, as the applied end of technology policy studies, will continue to grapple with real-world problems such as lack of clarity in objectives and the need to meet multiple and often conflicting stakeholder needs. On the positive side, evaluation is an adaptable beast and, through a creative combination of methods, has often managed to provide illumination and rational analysis in the most difficult of circumstances. It also continues to have much to offer to technology policy studies as a whole. There is a trade-off in which evaluators have to work within more restrictive terms of reference than other researchers, but in return gain access to a depth of data that would otherwise not be possible. They also provide a conduit by means of which the broader concepts of technology policy studies are carried into the policy arena, often at the highest levels. Our review suggests to us that there are a number of promising directions that evaluation methods might take that could lead to significant advances. We hope we are right, and that agencies supporting such work recognize the potential and act upon it.

References

Ailes, C.P., Roessner, D., Feller, I., 1997. The Impact on Industry of Interaction with Engineering Research Centers. SRI International, Arlington, VA, January 1997. Final Report prepared for the National Science Foundation, Engineering Education and Centers Division.
Airaghi, A., Busch, N.E., Georghiou, L., Kuhlmann, S., Ledoux, M.J., van Raan, A.F.J., Viana Baptista, J., 1999. Options and Limits for Assessing the Socio-Economic Impact of European RTD Programmes. ETAN, Commission of the European Communities, January 1999.
Bach, L., Conde-Molist, N., Ledoux, M.J., Matt, M., Schaeffer, V., 1995. Evaluation of the economic effects of BRITE-EURAM programmes on the European industry. Scientometrics 34 (3), 325–349.
Battelle Columbus Laboratories, 1973. Interactions of Science and Technology in the Innovative Process: Some Case Studies. Battelle Columbus Laboratories, Columbus.
Bozeman, B., Rogers, J., 1999. "Use-and-Transformation": An Elementary Theory of Knowledge and Innovation. School of Public Policy, Georgia Institute of Technology, Atlanta, GA (draft paper).
Bozeman, B., Papadakis, M., Coker, K., 1995. Industry Perspectives on Commercial Interactions with Federal Laboratories. Georgia Institute of Technology, School of Public Policy, Atlanta, GA, January 1995.
Bozeman, B., Rogers, J., Roessner, D., Klein, H., Park, J., 1998. The R&D Value Mapping Project: Final Report. Report to the Department of Energy, Office of Basic Energy Sciences. Georgia Institute of Technology, Atlanta, GA.
Buisseret, T., Cameron, H., Georghiou, L., 1995. What difference does it make? Additionality in the public support of R&D in large firms. International Journal of Technology Management 10 (4–6), 587–600.
Callon, M., Larédo, P., Mustar, P., 1997. Technico-economic networks and the analysis of structural effects. In: Callon, M., Larédo, P., Mustar, P. (Eds.), The Strategic Management of Research and Technology – Evaluation of Programmes. Economica International, Paris.
Cameron, H., Georghiou, L., 1995. Managerial performance – the process evaluation. In: Callon, M., Larédo, P., Mustar, P. (Eds.), The Strategic Management of Research and Technology – Evaluation of Programmes. Economica International, Paris.
CONSAD Research, 1998. Estimating economic impacts of new dimensional control technology applied to automobile body manufacturing. Journal of Technology Transfer 23 (2), 53–60.
Cozzens, S., Popper, S., Bonomo, J., Koizumi, K., Flanagan, A., 1994. Methods for Evaluating Fundamental Science. RAND/CTI DRU-875/2-CTI, Washington, DC.
Dale, A., Barker, K., 1994. The evaluation of EUREKA: a pan-European collaborative evaluation of a pan-European collaborative technology programme. Research Evaluation 4 (2), 66–74.
Dasgupta, P., David, P.A., 1994. Toward a new economics of science. Research Policy 23, 487–521.
Denison, E.F., 1979. Accounting for Slower Economic Growth: The United States in the 1970s. Brookings, Washington, DC.
Dziczek, K., Luria, D., Wiarda, E., 1998. Assessing the impact of a manufacturing extension center. Journal of Technology Transfer 23 (1), 29–35.
Fayl, G., Dumont, Y., Durieux, L., Karatzas, I., O'Sullivan, L., 1998. Evaluation of research and technological development programmes: a tool for policy design. Research Evaluation 7 (2).
Feller, I., Roessner, D., 1995. What does industry expect from university partnerships? Issues in Science and Technology, Fall 1995, 80–84.
Feller, I., Glasmeier, A., Mark, M., 1996. Issues and perspectives on evaluating manufacturing modernization programs. Research Policy 25 (2), 309–319.
Georghiou, L., 1995. Assessing the framework programmes – a meta-evaluation. Evaluation 1 (2), 171–188.
Georghiou, L., 1998. Issues in the evaluation of innovation and technology policy. Evaluation 4 (1), 37–51.
Georghiou, L., 1999. Socio-economic effects of collaborative R&D – European experiences. Journal of Technology Transfer 24, 69–79.
Gibbons, M., Georghiou, L., 1987. Evaluation of Research – A Selection of Current Practices. OECD, Paris.
Griliches, Z., 1958. Research costs and social returns: hybrid corn and related innovations. Journal of Political Economy.
Grindley, P., Mowery, D.C., Silverman, B., 1994. SEMATECH and collaborative research: lessons in the design of high-technology consortia. Journal of Policy Analysis and Management 13 (4), 723–758.
Guy, K., Arnold, E., 1986. Parallel Convergence. Frances Pinter, London, pp. 32–67.


Guzzetti, L., 1995. A brief history of European Union research policy. Commission of the European Communities, October 1995, pp. 76–83.
Hall, B.H., 1995. The private and social returns to research and development: what have we learned? Paper presented at AEI-Brookings Conference on The Contributions of Research to the Economy and Society, Washington, DC, October 3, 1994; revised 1995.
Hertzfeld, H., 1992. Measuring the returns to space research and development. In: Hertzfeld, H., Greenberg, J. (Eds.), Space Economics. American Institute of Astronautics, Washington.
Hills, P.V., Dale, A., 1995. Research Evaluation 5 (1), 35–44.
Illinois Institute of Technology Research Institute, 1968. Technology in Retrospect and Critical Events in Science. National Science Foundation, Washington.
Irwin, D., Klenow, P., 1996. High-tech R&D subsidies – estimating the effects of SEMATECH. Journal of International Economics 40, 323–344.
Jaffe, A., 1989. Real effects of academic research. American Economic Review 79 (5), 957–970.
Jaffe, A.B., 1998. The importance of 'spillovers' in the policy mission of the advanced technology program. Journal of Technology Transfer 23 (2), 11–19.
Jarmin, R.S., 1999. Evaluating the impact of manufacturing extension on productivity growth. Journal of Policy Analysis and Management 18 (1), 99–119.
Kuhlmann, S., 1995. German government departments' experience of RT and D programme evaluation and methodology. Scientometrics 34 (3), 461–471.
Kuhlmann, S., Meyer-Krahmer, F., 1995. Practice of technology policy evaluation in Germany: introduction and overview. In: Becher, G., Kuhlmann, S. (Eds.), Evaluation of Technology Policy Programmes in Germany. Kluwer, Dordrecht.
Kuhlmann et al., 1999. Final Report of the Advanced Science and Technology Policy Planning Network (ASTPP), A Thematic Network of the European Targeted Socio-Economic Research Programme (TSER). Report to Commission of the European Communities, Contract No. SOE1-CT96-1013.
Laidlaw, F.J., 1998. ATP's impact on accelerating development and commercialization of advanced technology. Journal of Technology Transfer 23 (2), 33–41.
Larédo, P., Mustar, P., 1995. France, the guarantor model and the institutionalisation of evaluation. Research Evaluation 5 (1).
Larédo, P., Callon, M., et al., 1990. L'impact des programmes communautaires sur le tissu scientifique et technique français. La Documentation Française, Paris.
Levy, D.M., Terleckyj, N., 1984. Effects of government R&D on private R&D investment and productivity. Bell Journal of Economics 14, 551–561.
Link, A., 1981. Basic research and productivity increase in manufacturing: additional evidence. American Economic Review 71 (5), 1111–1112.
Link, A., 1982. The impact of federal research and development spending on productivity. IEEE Transactions on Engineering Management EM-29 (4), 166–169.
Link, A.N., 1996. Evaluating Public Sector Research and Development. Praeger, New York.
Link, A.N., 1998. Case study of R&D efficiency in an ATP joint venture. Journal of Technology Transfer 23 (2), 43–51.
Link, A.N., Scott, J.T., 1998. Public Accountability: Evaluating Technology-Based Institutions. Kluwer, Boston.
Link, A.N., Teece, D.J., Finan, W.F., 1996. Estimating the benefits from collaboration: the case of SEMATECH. Review of Industrial Organization 11 (5), 737–751.
Mansfield, E., 1980. Basic research and productivity increase in manufacturing. American Economic Review 70.
Mansfield, E., 1991. Academic research and industrial innovation. Research Policy 20, 1–12.
Mansfield, E., Rapoport, J., Romeo, A., Wagner, S., Beardsley, G., 1977. Social and private rates of return from industrial innovations. Quarterly Journal of Economics 91 (2), 221–240.
Martin, B., Irvine, J., 1983. Assessing basic research: some partial indicators of scientific progress in radio astronomy. Research Policy, North-Holland.
Meyer-Krahmer, F., Montigny, P., 1990. Evaluations of innovation programs in selected European countries. Research Policy 18 (6), 313–332.
Narin, F., Hamilton, K.S., Olivastro, D., 1997. The increasing linkage between US technology and public science. Research Policy 26 (3), 317–330.
National Audit Office, 1995. The Department of Trade and Industry's Support for Innovation. HMSO HC715, London.
Olds, B.J., 1992. Technological Eur-phoria? An Analysis of European Community Science and Technology Programme Evaluation Reports. Beliedsstudies Technologie Economie, Rotterdam.
Ormala, E., 1989. Nordic experiences of the evaluation of technological research and development. Research Policy 18 (6), 343–359.
Ormala, E., et al., 1993. Evaluation of EUREKA Industrial and Economic Effects. EUREKA Secretariat, Brussels.
Papaconstantinou, G., Polt, W., 1997. Policy evaluation in innovation and technology: an overview. In: OECD Proceedings, Policy Evaluation in Innovation and Technology – Towards Best Practices. OECD, Paris.
Pavitt, K., 1998. Do patents reflect the useful research output of universities? Research Evaluation 7 (2), 105–111.
Peterson, J., Sharp, M., 1998. Technology Policy in the European Union. Macmillan, Basingstoke, p. 210.
Roessner, D., 1989. Evaluating government innovation programmes: lessons from the US experience. Research Policy 18 (6).
Roessner, D., Bean, A.S., 1994. Payoffs from industry interaction with federal laboratories. Journal of Technology Transfer 20 (4).
Roessner, D., Coward, H.R., 1999. An Analytical Synthesis of Empirical Research on U.S. University–Industry and Federal Laboratory–Industry Research Cooperation. SRI International, Washington, DC. Final Report to the National Science Foundation (forthcoming).
Roessner, D., Melkers, J., 1997. Evaluation of national research and technology policy programs in the United States and Canada. Journal of Evaluation and Program Planning 20 (1).

Roessner, D., Ailes, C., Feller, I., Parker, L., 1998. Impact on industry of participation in NSF's engineering research centers. Research-Technology Management 41 (5), 40–44.
Rogers, J., Bozeman, B., 1998. Creation of Knowledge Value: A Taxonomy of Knowledge Value Alliances. Paper prepared for presentation at Workshop on Research Evaluation, Ecoles des Mines, Paris, France, June 30–July 2.
Ruegg, R.T., 1998a. Symposium overview. Journal of Technology Transfer 23 (2), 3–4.
Ruegg, R.T., 1998b. The Advanced Technology Program, its evaluation plan and progress in implementation. Journal of Technology Transfer 23 (2), 5–9.
Sand, F., Nedeva, M., 1998. The EUREKA Continuous and Systematic Evaluation procedure: an assessment of the socio-economic impact of the international support given by the EUREKA Initiative to industrial R&D co-operation. Proceedings of the APEC Symposium on the Evaluation of S&T Programmes among APEC Member Economies, 2–4 December, Wellington, New Zealand. National Center for Science and Technology Evaluation, Ministry of Science and Technology of China.
Senker, J., Senker, P., 1997. Implications of industrial relationships for universities: a case study of the UK teaching company scheme. Science and Public Policy 24 (3).
Shapira, P., Roessner, J.D., 1996. Evaluating industrial modernization: introduction to the theme issue. Research Policy 25 (2), 181–183.
Shapira, P., Youtie, J., 1998. Evaluating industrial modernization: methods, results and insights from the Georgia Manufacturing Extension Alliance. Journal of Technology Transfer 23 (1), 17–27.
Shapira, P., Youtie, J., Roessner, J.D., 1996. Current practices in the evaluation of US industrial modernization programs. Research Policy 25 (2), 185–214.
Sherwin, C.W., Isenson, R.S., 1967. Project Hindsight: defense department study of the utility of research. Science 156, 1571–1577.
Tewksbury, J.G., Crandall, M.S., Crane, W.E., 1980. Measuring the societal benefits of innovation. Science 209, 658–662.
US Congress, General Accounting Office, 1996. Measuring Performance: The Advanced Technology Program and Private-Sector Funding. GAO/RCED-96-47, January 1996.
U.S. Congress, Office of Technology Assessment, 1986. Research Funding as an Investment: Can We Measure the Returns? Office of Technology Assessment, Washington, DC.