Bibliometric evidence for empirical trade-offs
in national funding strategies
Duane Shelton and Loet Leydesdorff
ISSI 2011 Durban
Outline
Modeling of Input-Output Relations
Best Models from Correlations and Regression
Trade-offs in Allocation of R&D Investments
Validation by Forecasts from Extrapolations, Regressions, and Individual Country Models
Conclusions
Some prior work
Leydesdorff: a series starting in 1990 with regressions of papers on GERD. Most recently, a 2009 publication with Wagner on which GERD components best encourage papers.
Shelton: started in 2006, modeling paper share as a function of GERD share to account for the US decline. Recently, a 2010 presentation with Foland using GERD components to account for the European Paradox.
Output dependent variables (DVs)
Papers and Paper Share: Science Citation Index, Scopus
Patents and Patent Share: Triadic, USPTO, PCT
The full paper covers all; here we will focus on those in red.
Input variables (IVs) from OECD
Overall GERD (Gross Expenditures on R&D)
GERD source components: Government, Industry, Abroad (funding from abroad), Other
GERD spending components: HERD (higher education sector), BERD (business sector), Non-Profit (other than universities), GOVERD (government labs)
Number of researchers
Shares provide the best national comparisons
Some indicators are nearly zero-sum: countries compete for a nearly fixed number of slots for paper publications and patent grants. (Paper submissions and patent applications are unbounded.)
The slots do rise slowly with time, and this complicates national comparisons.
Thus, in analyzing relative positions of nations, their share of most outputs is a more relevant indicator.
Modeling of the inputs that cause these output shares is also best done in shares.
Of course, once a model is built for shares, it can easily be used to calculate absolutes.
All inputs and outputs depend on the size of the country, which makes all country-wise correlations high and obscures which variables are most important.
One might divide all variables by some measure of size, but stepwise multiple linear regression can also tease out which input IVs are best for predicting an output DV.
IVs are added one-by-one in order of which makes the best model for the DV.
The size of nations is a confounding factor
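The one-by-one selection of IVs described above can be sketched as a forward stepwise loop. This is a minimal illustration, not the procedure of the statistical package used for the slides: the stopping rule (a minimum gain in R2, `min_gain`), the function names, and the synthetic country data are all assumptions made for the example.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on the columns of X plus an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    ss_res = float(resid @ resid)
    ss_tot = float(((y - y.mean()) ** 2).sum())
    return 1.0 - ss_res / ss_tot

def forward_stepwise(ivs, y, min_gain=0.01):
    """Add IVs one at a time, each time picking the one that raises R^2 the most.

    Stops when the best remaining IV improves R^2 by less than min_gain.
    (Assumed stopping rule; statistical packages typically use an F-test or p-value.)
    """
    chosen, best_r2 = [], 0.0
    remaining = list(ivs)
    while remaining:
        scores = {}
        for name in remaining:
            cols = [ivs[n] for n in chosen + [name]]
            scores[name] = r_squared(np.column_stack(cols), y)
        winner = max(scores, key=scores.get)
        if scores[winner] - best_r2 < min_gain:
            break
        chosen.append(winner)
        best_r2 = scores[winner]
        remaining.remove(winner)
    return chosen, best_r2

# Synthetic, made-up data: every IV scales with country size, so all are
# highly correlated with the DV, as the slide describes.
rng = np.random.default_rng(0)
size = rng.lognormal(mean=0.0, sigma=1.0, size=40)  # hypothetical country sizes
ivs = {
    "gov": size * rng.normal(1.0, 0.05, 40),
    "herd": size * rng.normal(1.0, 0.05, 40),
    "gerd": size * rng.normal(1.0, 0.20, 40),
}
papers = 0.5 * ivs["gov"] + 0.4 * ivs["herd"] + rng.normal(0.0, 0.02, 40)
order, r2 = forward_stepwise(ivs, papers)
print("IVs in order of entry:", order, "R^2:", round(r2, 3))
```

Even when every single-IV correlation is high because of size, the order of entry shows which IV carries the most independent information about the DV.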
Step-wise regression of 2007 SCI paper share (ps07) vs. three IVs
Government GERD share
HERD share
Overall GERD share
Fit of regression line
Correlations: Papers vs. Inputs
Red indicates strongest correlation of pair; it will dominate a 2 IV model

                        SCI               Scopus
                     1999    2007      1999    2007
Capital vs. Labor
  GERD               0.982   0.977     0.977   0.938
  Researchers        0.894   0.838     0.842   0.920
Funding Components
  Industry           0.973   0.959     0.968   0.920
  Government         0.989   0.989     0.986   0.944
Spending Components
  HERD               0.976   0.983     0.977   0.928
  BERD               0.980   0.968     0.975   0.927
Regressions of SCI paper share in 2007

IV1          IV2           Coeff1   Coeff2   Const    p1      p2      R2
GERD         Researchers   0.819    -0.027   0.536    0.000   0.697   95.5%
GERD         -             0.800    -        0.492    0.000   -       95.5%
Government   Industry      0.774    0.067    0.330    0.000   0.351   97.9%
Government   -             0.846    -        0.316    0.000   -       97.9%
HERD         -             0.979    -        -0.048   0.000   -       96.6%
Government   HERD          0.527    0.383    0.127    0.000   0.000   98.8%

For example, the best single IV model is:
Papers07 = 0.846 Government07 + 0.316
Step-wise regression of 2007 triadic patent share (Patents07) vs. three IVs
Industry GERD share
BERD share
Government GERD share
Fit of regression line
Fit is OK, but not as good as paper models
Coefficients (a)

                 Unstandardized Coefficients   Standardized Coefficients
Model            B         Std. Error          Beta                        t        Sig.
1  (Constant)    .438      .450                                            .974     .337
   Gov           -.973     .251                -.843                       -3.878   .000
   Ind           1.778     .224                1.725                       7.934    .000

a. Dependent Variable: Patents07
Shelton, R. D. & Leydesdorff, L. (in preparation). Publish or Patent: Bibliometric evidence for empirical trade-offs in national funding strategies
Correlations: Patents vs. Inputs
Red indicates strongest correlation of pair; it will dominate a 2 IV model

                        Triadic           USPTO
                     1999    2007      1999    2007
Capital vs. Labor
  GERD               0.924   0.895     0.947   0.830
  Researchers        0.847   0.680     0.664   0.428
Funding Components
  Industry           0.934   0.913     0.970   0.861
  Government         0.881   0.818     0.834   0.628
Spending Components
  HERD               0.949   0.890     0.910   0.791
  BERD               0.921   0.905     0.966   0.852
Regressions for 2007 triadic patent share

IV1        IV2           Coeff1   Coeff2   Const    p1      p2      R2
GERD       Researchers   1.34     -0.46    0.327    0.000   0.014   83.3%
Industry   Government    1.78     -0.973   0.438    0.000   0.000   88.6%
Industry   BERD          4.32     3.46     0.201    0.004   0.021   85.9%
Industry   NonProfit     2.04     -0.653   -0.584   0.000   0.000   98.3%
Industry   -             0.941    -        0.058    0.000   -       83.4%
BERD       -             0.953    -        0.078    0.000   -       81.8%

For example, the best single IV model is:
Patents07 = 0.941 Industry07 + 0.058
Regressions show a trade-off in allocations
To maximize papers, a country should maximize its government funding of R&D, instead of industry funding
To maximize patents, a country should do the opposite: maximize its industrial funding of R&D, which can be encouraged by government
Similarly, spending in the higher education sector seems to encourage papers, while spending in the business sector does more to encourage patents
Thus these allocations are simply a choice between longer and shorter term benefits of R&D
Not surprising, but regressions provide some quantitative confirmation of this logic
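The trade-off can be made concrete with the two-IV regressions reported for 2007 (paper share on Government and Industry; triadic patent share on Industry and Government). The sketch below assumes all shares are expressed in percent, and the starting shares of 10% are made-up values chosen only for illustration. The fits are cross-sectional, so the output is an arithmetic reading of the coefficients, not a causal prediction for any one country; note also that the Industry term in the paper model is not statistically significant (p2 = 0.351).

```python
def paper_share(gov, ind):
    """2007 SCI paper share from the Government + Industry regression (shares in %)."""
    return 0.774 * gov + 0.067 * ind + 0.330

def patent_share(gov, ind):
    """2007 triadic patent share from the Industry + Government regression (shares in %)."""
    return 1.78 * ind - 0.973 * gov + 0.438

# Hypothetical country: 10% of world government funding and 10% of world
# industry funding (illustrative values, not data from the slides).
gov, ind = 10.0, 10.0
shift = 1.0  # move one percentage point of share from government to industry

d_papers = paper_share(gov - shift, ind + shift) - paper_share(gov, ind)
d_patents = patent_share(gov - shift, ind + shift) - patent_share(gov, ind)

print(f"change in paper share:  {d_papers:+.3f} points")
print(f"change in patent share: {d_patents:+.3f} points")
```

Under these fitted coefficients, shifting share from government to industry funding lowers the modeled paper share and raises the modeled patent share, which is the trade-off the regressions point to.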
Summary of models for paper share
Simple extrapolations of trends in output paper share m_i provide a reality check for models based on input resource drivers.
The Shelton Model, based on GERD share w_i, works well for big countries. It accounts for the decline in the US and EU due to the rise of China's share of GERD: m_i = k_i w_i
The Shelton-Leydesdorff Model, based on government funding share w_i', accounts for the EU increase in efficiency in the 1990s and the long-term US decline: m_i = k_i' w_i' + c'
Adding a second IV, HERD spending share w_i'', works even better. This accounts for the EU passing the US in 1995: m_i = k_i' w_i' + k_i'' w_i'' + c''
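The three model forms summarized above can be written as small functions. This is a sketch under stated assumptions: the slides do not list per-country constants, so the example borrows the cross-country 2007 regression coefficients as stand-ins for the intercept models, and the base-year values used to calibrate the Shelton constant are hypothetical. Function and variable names are illustrative.

```python
def shelton(w_gerd, k):
    """Shelton Model: paper share proportional to GERD share."""
    return k * w_gerd

def shelton_leydesdorff(w_gov, k_gov, c):
    """Shelton-Leydesdorff Model: paper share from government funding share."""
    return k_gov * w_gov + c

def shelton_leydesdorff_herd(w_gov, w_herd, k_gov, k_herd, c):
    """Two-IV variant: government funding share plus HERD spending share."""
    return k_gov * w_gov + k_herd * w_herd + c

# Calibrate the Shelton constant from one base-year observation, then apply
# it to a new GERD share (both observations are hypothetical).
base_paper_share, base_gerd_share = 20.0, 25.0
k = base_paper_share / base_gerd_share
shelton_estimate = shelton(24.0, k)

# Stand-in coefficients taken from the 2007 cross-country regressions
# (an assumption; individual country models would use their own constants).
one_iv = shelton_leydesdorff(10.0, k_gov=0.846, c=0.316)
two_iv = shelton_leydesdorff_herd(10.0, 10.0, k_gov=0.527, k_herd=0.383, c=0.127)

print(round(shelton_estimate, 3), round(one_iv, 3), round(two_iv, 3))
```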
Validation of paper share models
Like any theory, models need to be tested to see how well they account for new phenomena.
Scattergrams can show how well regression models fit a year’s data, or perhaps a new data point. They don’t forecast the future so well.
Once key IVs are identified by statistics, individual country models can be built and tested by “forecasting the past.”
Simple extrapolation of output DVs serves as a reality check
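A simple extrapolation check of this kind can be sketched by fitting a trend to each country's recent shares and finding where two trends cross. The straight-line form is an assumption (the slides do not state the functional form used), and the two series below are made-up numbers, not data from the presentation.

```python
def fit_line(years, shares):
    """Least-squares straight line through (year, share) points; returns (slope, intercept)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def crossing_year(line_a, line_b):
    """Year at which two fitted lines intersect, or None if they are parallel."""
    (slope_a, icpt_a), (slope_b, icpt_b) = line_a, line_b
    if slope_a == slope_b:
        return None
    return (icpt_b - icpt_a) / (slope_a - slope_b)

# Hypothetical paper shares (%) for a declining leader and a rising challenger.
years = [2003, 2004, 2005, 2006, 2007]
leader = [30.0, 29.0, 28.0, 27.0, 26.0]
challenger = [6.0, 7.0, 8.0, 9.0, 10.0]

year = crossing_year(fit_line(years, leader), fit_line(years, challenger))
print("extrapolated crossing year:", round(year, 1))
```

Such a crossing year depends only on the output trend, which is why it works as a reality check on forecasts built from input resources.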
Extrapolation of SCI paper shares
This model forecasts that the PRC will not pass the US until about 2020, and the EU27 until after 2025
Extrapolation of paper share in the Scopus database
This can be compared to a recent similar forecast by the UK Royal Society.
Scattergram of paper share vs. government funding share (x-axis: % Share of Government Funding 2007; y-axis: % Share of Publications 2007, OECD+ countries; both axes 0 to 40). Points labeled: EU27, EU15, Russian Federation, China, USA, UK, Spain, Japan, Germany, France, Canada.
Same scattergram focused on smaller countries (x-axis: % Share of Government Funding, 0 to 10; y-axis: % Share of Publications, OECD+ countries without US, 0 to 9). Points labeled: Chinese Taipei, Russian Federation, China, United Kingdom, Turkey, Sweden, Spain, Korea, Japan, Italy, Germany, France, Canada, Australia.
Fitted line: y = 0.9299x + 0.1737, R2 = 0.8946
Scattergram of patent share vs. industrial funding share (x-axis: % Industrial Funding of R&D, 0 to 40; y-axis: % Triadic Patents, OECD+, 0 to 45). Points labeled: EU27, EU15, China, United States, United Kingdom, Korea, Japan, Germany, France.
Fitted line: Patents07 = 1.0811 Industry07 + 0.0104, R2 = 0.8696
Same scattergram focused on smaller countries (x-axis: % Share of Industrial Funding of R&D, 0 to 12; y-axis: % Share of Triadic Patents, 0 to 14). Points labeled: Chinese Taipei, Israel, China, United Kingdom, Sweden, Korea, Germany, France, Canada, Australia.
Fitted line: Patents07 = 0.6353 Industry07 + 0.2316, R2 = 0.394
Performance of Shelton Model in forecasting from 2005 to 2010 (x-axis: 2005 to 2017; y-axis: Percentage of World Share, 0 to 50). Series: US, EU15, and PRC, forecast vs. actual.
Based on forecasts of GERD and its share from 2005 data. Accuracy of US and EU is not bad. PRC is growing slower than forecast.
Paper Forecasts from Shelton-Leydesdorff Model
(Figure: x-axis: 2005 to 2010; y-axis: Percent of WoS, 0 to 45. Series: US, EU15, and PRC, actual vs. Geo forecast.)
Uses 5-year average of rates of Gov increase. EU and PRC fit well, but US is worse than forecast, because its rate of Gov increase has plummeted to near zero. (Individual models used.)
Performance of Shelton-Leydesdorff model: forecasting from 2005 to 2010
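The forecasting step described for this model, projecting government funding forward at a 5-year average rate of increase, might look like the sketch below. It assumes that "Geo" denotes geometric (compound) growth and that the average is an arithmetic mean of year-over-year rates; the slides specify neither. The funding series is made up for illustration.

```python
def average_growth_rate(series, window=5):
    """Mean of the last `window` year-over-year growth rates of a series."""
    recent = series[-(window + 1):]
    rates = [later / earlier - 1.0 for earlier, later in zip(recent, recent[1:])]
    return sum(rates) / len(rates)

def geometric_forecast(last_value, rate, years):
    """Compound the last observed value forward at a constant annual rate."""
    return [last_value * (1.0 + rate) ** t for t in range(1, years + 1)]

# Hypothetical government R&D funding for 2000 to 2005 (arbitrary units),
# growing at exactly 10% per year.
gov_funding = [100.0, 110.0, 121.0, 133.1, 146.41, 161.051]

rate = average_growth_rate(gov_funding)
forecast = geometric_forecast(gov_funding[-1], rate, years=5)  # 2006 to 2010
print(f"average rate: {rate:.3f}")
print([round(v, 1) for v in forecast])
```

Dividing each country's projected funding by the projected world total would give the forecast share that feeds the paper-share model. A forecast of this kind misses sudden changes in the growth rate, as the US result above shows.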
Conclusions
Regressions show that investment choices are complementary: some are best for papers and some for patents
Models based on these resource inputs have some success in forecasting
But a take-away for the professors in the audience: just using HERD share to predict paper share is surprisingly accurate
Thus if nations want to excel in papers, they should just give money to professors!
Scatterplot of Papers07 vs HERD (x-axis: HERD; y-axis: Papers07; both axes 0 to 35)
ps07 = 0.027 + 0.930 HERD, p = 0.000, R2 = 98.6%
Paper share ≈ HERD share!
Paper Share Compared to HERD Share
(Figure: x-axis: 1995 to 2007; y-axis: Percent, 0 to 40. Series: US, EU, PRC, US hs, EU hs, PRC hs.)
Forget statistics: Simply predicting paper share with HERD share works well for the US and EU. It also predicts that the EU should lead the US.
Performance of HERD as predictor of paper share