Supplementary material: An advanced data-driven hybrid model of SARIMA-NNNAR for tuberculosis incidence time series forecasting in Qinghai Province, China
Yongbin Wang1,*, Chunjie Xu2,*, Yuchun Li1, Weidong Wu1, Lihui Gui1, Jingchao Ren1, Sanqiao Yao1
1 Department of Epidemiology and Health Statistics, School of Public Health, Xinxiang Medical
University, Xinxiang, Henan Province, P.R. China; 2 Department of Occupational and
Environmental Health, School of Public Health, Capital Medical University, Beijing, P.R. China
Correspondence: Sanqiao Yao and Yongbin Wang
Department of Epidemiology and Health Statistics, School of Public Health, Xinxiang Medical
University, Xinxiang 453000, Henan Province, P.R. China
Tel: +86 037 383 1646
Email: [email protected] (Sanqiao Yao); [email protected] (Yongbin Wang)
*These authors contributed equally to this paper
Table S1 Parameter estimates and statistical tests for the best-fitting SARIMA(1,0,1)(0,1,2)12 model based on the TB incidence data between January 2004 and July 2016
Variables   Estimates   Standard error   t         P
AR1          0.879      0.068             12.926   <0.001
MA1         -0.561      0.011            -51.000   <0.001
SMA1        -0.806      0.091             -8.857   <0.001
SMA2         0.227      0.097              2.340    0.020
Abbreviations: SARIMA, seasonal autoregressive integrated moving average; AR1, autoregressive, lag 1; MA1, moving average, lag 1; SMA1, seasonal moving average, lag 1; SMA2, seasonal moving average, lag 2.
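Written with the backshift operator B (BX_t = X_{t-1}), the fitted model in Table S1 can be sketched as the equation below. Here X_t is the monthly TB incidence rate and ε_t is white noise. The equation assumes the sign convention of R's `arima()`, in which moving-average terms enter as (1 + θB). Software that uses the opposite convention would flip the signs of the MA terms. No constant term appears because Table S1 reports none. The t column is simply each estimate divided by its standard error, for example 0.879/0.068 ≈ 12.926.

```latex
(1 - 0.879B)(1 - B^{12})X_t = (1 - 0.561B)(1 - 0.806B^{12} + 0.227B^{24})\varepsilon_t
```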
Table S2 Ljung-Box Q statistics for the residual series of the three best-performing techniques at various lags, based on the TB incidence data between January 2004 and July 2016
Lags   SARIMA model         NNNAR model          SARIMA-NNNAR model
       Box-Ljung Q   P      Box-Ljung Q   P      Box-Ljung Q   P
1 0.031 0.861 0.698 0.404 0.015 0.901
3 1.036 0.793 2.263 0.520 0.172 0.982
6 12.255 0.057 10.415 0.108 2.206 0.900
9 17.203 0.046 16.629 0.055 7.014 0.636
12 17.770 0.123 18.245 0.108 8.844 0.716
15 18.802 0.223 19.844 0.178 9.971 0.822
18 25.893 0.102 28.464 0.055 14.690 0.683
21 27.706 0.149 29.045 0.064 17.466 0.683
24 28.365 0.245 32.702 0.111 18.968 0.754
27 29.440 0.340 34.605 0.182 19.343 0.857
30 30.528 0.439 35.810 0.214 25.268 0.712
33 30.951 0.570 36.462 0.311 27.091 0.756
36 32.875 0.618 37.547 0.398 36.379 0.451
Abbreviations: SARIMA, seasonal autoregressive integrated moving average; NNNAR, neural network nonlinear autoregression.
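The Q statistics in Table S2 follow the Ljung-Box formula Q = n(n + 2) Σ_{k=1}^{h} r_k²/(n - k). Here r_k is the lag-k sample autocorrelation of the residuals and n is the series length. The Python sketch below is a minimal illustration, not the authors' code. It assumes the degrees of freedom equal the lag h, with no adjustment for fitted parameters. That assumption is consistent with the reported P values: Q = 12.255 at lag 6 gives P ≈ 0.057. The chi-square tail function handles integer degrees of freedom only.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag of the series x."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
        for k in range(1, max_lag + 1)
    ]


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = total = 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail plus sqrt(2/pi) exp(-x/2) sum_i x^((2i-1)/2) / (1*3*...*(2i-1))
    total = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * i + 1)
    return total


def ljung_box(residuals, lag):
    """Return (Q, P): Q = n(n+2) * sum_{k=1..lag} r_k^2 / (n-k), P from chi-square(lag)."""
    n = len(residuals)
    r = acf(residuals, lag)
    q = n * (n + 2) * sum(r[k - 1] ** 2 / (n - k) for k in range(1, lag + 1))
    return q, chi2_sf(q, lag)


# P value for the SARIMA residuals at lag 6 in Table S2.
print(round(chi2_sf(12.255, 6), 3))  # 0.057
```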
Table S3 ARCH effects for the actual TB incidence rate and for the residual series of the three best-performing techniques at various lags, based on the TB incidence data between January 2004 and July 2016
Lags   Actual values     SARIMA model      NNNAR model       SARIMA-NNNAR model
       LM test   P       LM test   P       LM test   P       LM test   P
1 54.735 <0.001 2.869 0.090 6.605 0.010 1.829 0.176
3 62.912 <0.001 3.018 0.389 6.685 0.083 2.379 0.498
6 67.562 <0.001 6.870 0.333 6.477 0.372 4.642 0.591
9 73.503 <0.001 10.298 0.327 8.934 0.443 6.152 0.725
12 85.778 <0.001 13.166 0.357 9.540 0.656 6.694 0.877
15 92.005 <0.001 13.507 0.563 10.216 0.806 8.586 0.898
18 95.172 <0.001 21.299 0.265 13.327 0.772 9.690 0.942
21 94.102 <0.001 21.460 0.431 18.240 0.634 15.149 0.815
24 98.897 <0.001 20.867 0.647 16.656 0.863 21.403 0.615
27 101.740 <0.001 19.814 0.839 22.095 0.733 25.333 0.556
30 100.750 <0.001 19.366 0.932 26.198 0.665 26.051 0.673
33 97.734 <0.001 21.655 0.935 26.283 0.790 31.181 0.558
36 95.380 <0.001 27.301 0.851 27.673 0.839 38.237 0.368
Abbreviations: ARCH, autoregressive conditional heteroscedasticity; SARIMA, seasonal autoregressive integrated moving average; NNNAR, neural network nonlinear autoregression; LM, Lagrange multiplier.
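Table S3 reports Engle's Lagrange multiplier test for ARCH effects. The squared series is regressed on a constant and its own first q lags. The statistic LM = T·R², where T is the number of observations in that auxiliary regression, is then compared with a χ²(q) distribution. The pure-Python sketch below computes only the statistic. It illustrates the test under these assumptions, and software implementations may scale T slightly differently. The P value then comes from the chi-square upper tail, e.g. `scipy.stats.chi2.sf(lm, q)`. For instance, LM = 2.869 with q = 1 gives P ≈ 0.090, matching the SARIMA column at lag 1.

```python
def ols_r_squared(y, X):
    """R^2 of an OLS regression of y on the columns of X (X includes an intercept column)."""
    k = len(X[0])
    # Normal equations (X'X) b = X'y, stored as an augmented matrix.
    aug = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    beta = [aug[i][k] / aug[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    sst = sum((v - mean_y) ** 2 for v in y)
    sse = sum((v - f) ** 2 for v, f in zip(y, fitted))
    return 1.0 - sse / sst


def arch_lm(residuals, q):
    """Engle's ARCH LM statistic: regress e_t^2 on a constant and e_{t-1}^2..e_{t-q}^2,
    then LM = (number of regression observations) * R^2."""
    e2 = [e * e for e in residuals]
    y = e2[q:]
    X = [[1.0] + [e2[t - j] for j in range(1, q + 1)] for t in range(q, len(e2))]
    return len(y) * ols_r_squared(y, X)


# Toy residuals whose squares alternate 1, 4, 1, 4, ...: e_t^2 = 5 - e_{t-1}^2 exactly.
print(arch_lm([1.0, 2.0, 1.0, 2.0, 1.0, 2.0], q=1))
```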
Table S4 MSE and R² values in the training and validation sets for different numbers of hidden-layer units in the 5-step-ahead forecasting
Size   Training set      Validation set
       MSE     R²        MSE     R²
1 0.630 0.505 0.731 0.424
2 0.218 0.763 0.668 0.501
3 0.210 0.778 0.353 0.725
4 0.311 0.669 0.915 0.574
5 0.071 0.923 0.124 0.929
6 0.332 0.723 1.608 0.320
7 0.054 0.942 1.065 0.796
8 0.014 0.985 0.799 0.564
9 0.023 0.976 1.322 0.160
10 0.013 0.986 1.876 0.543
11 0.030 0.969 1.611 0.478
12 0.002 0.998 0.263 0.795
13 0.006 0.993 0.537 0.660
14 0.001 0.998 2.683 0.336
15 0.002 0.997 0.710 0.573
16 0.001 0.999 0.658 0.628
17 0.006 0.993 0.810 0.397
18 0.064 0.936 2.326 0.474
19 0.002 0.998 0.540 0.603
20 0.012 0.987 0.544 0.745
Abbreviations: MSE, mean squared error.
Note: Detailed descriptions of the Jordan neural network can be found in references 1 and 2. In our work, a validation set of 12 data points was randomly selected from the data, excluding the testing samples.
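The data partitioning described in the note can be sketched in Python as follows. The sketch assumes 156 monthly observations, January 2004 to December 2016, which is inferred from the date ranges in Table S6. It holds out the final 5 points (or 12, for Table S5) as the testing set and draws 12 validation points at random from the rest. The function names and the seed are hypothetical. So is the selection rule in `best_size`, which picks the smallest validation MSE; the text does not state how the hidden-layer size was chosen. Applied to the validation MSE column of Table S4, that rule would pick 5 hidden units.

```python
import random


def split_indices(n_obs, n_test, n_val, seed=0):
    """Keep the last n_test points for testing, draw n_val validation points at random
    from the remaining points, and use the rest for training."""
    candidates = list(range(n_obs - n_test))
    val = sorted(random.Random(seed).sample(candidates, n_val))
    val_set = set(val)
    train = [i for i in candidates if i not in val_set]
    test = list(range(n_obs - n_test, n_obs))
    return train, val, test


def mse(actual, predicted):
    """Mean squared error."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def best_size(val_mse_by_size):
    """Hypothetical selection rule: the hidden-layer size with the lowest validation MSE."""
    return min(val_mse_by_size, key=val_mse_by_size.get)


# Validation-set MSE by number of hidden units, copied from Table S4.
VAL_MSE_S4 = {
    1: 0.731, 2: 0.668, 3: 0.353, 4: 0.915, 5: 0.124, 6: 1.608, 7: 1.065,
    8: 0.799, 9: 1.322, 10: 1.876, 11: 1.611, 12: 0.263, 13: 0.537, 14: 2.683,
    15: 0.710, 16: 0.658, 17: 0.810, 18: 2.326, 19: 0.540, 20: 0.544,
}

train_idx, val_idx, test_idx = split_indices(n_obs=156, n_test=5, n_val=12, seed=1)
print(len(train_idx), len(val_idx), len(test_idx))  # 139 12 5
print(best_size(VAL_MSE_S4))  # 5
```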
Table S5 MSE and R² values in the training and validation sets for different numbers of hidden-layer units in the 12-step-ahead forecasting
Size   Training set      Validation set
       MSE     R²        MSE     R²
1 0.67 0.498 0.834 0.229
2 0.395 0.609 1.361 0.217
3 0.17 0.831 1.100 0.065
4 0.125 0.874 1.083 0.219
5 0.119 0.871 0.341 0.810
6 0.045 0.955 1.870 0.042
7 0.015 0.984 2.161 0.128
8 0.020 0.980 1.553 0.155
9 0.004 0.995 0.835 0.250
10 0.003 0.997 0.957 0.117
11 0.010 0.990 0.952 0.309
12 0.002 0.998 1.402 0.016
13 0.001 0.999 1.684 0.002
14 0.005 0.995 2.756 0.183
15 0.002 0.998 1.300 0.091
16 0.002 0.998 1.790 0.112
17 0.000 1.000 0.770 0.223
18 0.000 1.000 1.394 0.042
19 0.000 1.000 0.813 0.306
20 0.000 1.000 1.159 0.082
Abbreviations: MSE, mean squared error.
Table S6 Comparison of the fitting and forecasting performance measures among the four best-performing models
Models          Fitting performance             Forecasting performance
                MAE     MAPE    RMSE    MER     MAE     MAPE     RMSE    MER
In-sample dataset during January 2004 to December 2015          12-step-ahead projections
SARIMA          0.746   9.525   1.008   0.095   0.972    8.685   1.153   0.091
NNNAR           0.463   6.767   0.625   0.058   1.238   11.176   1.558   0.116
SARIMA-NNNAR    0.424   5.610   0.564   0.053   0.803    7.649   0.979   0.075
Jordan          0.604   8.553   0.778   0.076   1.491   13.152   1.834   0.140
Reduced percentages (%)
C versus A     43.164  41.053  44.048  44.211  17.372   11.968  15.120  17.415
C versus B      5.228  12.632   6.052   5.263  44.736   40.621  50.304  44.797
C versus D     29.801  34.409  27.506  30.263  46.144   41.842  46.619  46.429
In-sample dataset during January 2004 to July 2016              5-step-ahead projections
SARIMA          0.724   9.137   1.014   0.090   0.795    9.450   0.920   0.091
NNNAR           0.606   8.477   0.803   0.074   0.735    8.860   0.914   0.084
SARIMA-NNNAR    0.508   6.596   0.722   0.063   0.656    7.879   0.803   0.075
Jordan          0.581   7.946   0.765   0.071   0.794    8.940   0.806   0.088
Reduced percentages (%)
C versus A     29.881  27.790  28.839  29.978  17.526   16.614  12.685  17.453
C versus B     16.218  22.170  10.138  14.459  10.724   11.061  12.102  10.689
C versus D     12.565  16.990   5.621  11.268  17.380   11.868   0.372  14.773
Abbreviations: SARIMA, seasonal autoregressive integrated moving average; NNNAR, neural network nonlinear autoregression; MAE, mean absolute error; MAPE, mean absolute percentage error; RMSE, root mean squared error; MER, mean error rate. A is the SARIMA approach; B is the NNNAR approach; C is the SARIMA-NNNAR hybrid approach; D is the Jordan neural network.
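The error measures in Table S6 and the reduced percentages can be sketched in Python as below. MAE, MAPE and RMSE follow their standard definitions. MER is assumed here to be MAE divided by the mean of the observed values; this is a common definition, but this supplement does not state it. Each reduced percentage is 100 × (X - C)/X, where X is the comparison model's value and C is the hybrid model's value. This reproduces, for example, the 12-step-ahead C versus D entries: for MAE, 100 × (1.491 - 0.803)/1.491 ≈ 46.144.

```python
import math


def forecast_metrics(actual, predicted):
    """MAE, MAPE (%), RMSE and MER for paired observed and predicted values.

    MER is computed here as MAE divided by the mean of the observed values;
    this definition is an assumption and should be checked against the main text.
    """
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n
    mape = 100.0 * sum(abs(e) / abs(a) for e, a in zip(errors, actual)) / n
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    mer = mae / (sum(actual) / n)
    return {"MAE": mae, "MAPE": mape, "RMSE": rmse, "MER": mer}


def reduction_pct(comparison, hybrid):
    """Percentage by which the hybrid model (C) lowers an error measure relative to model X."""
    return 100.0 * (comparison - hybrid) / comparison


print(forecast_metrics([10.0, 20.0], [12.0, 18.0]))
# C versus D, MAE of the 12-step-ahead projections in Table S6.
print(round(reduction_pct(1.491, 0.803), 3))  # 46.144
```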
Figure S1 Architectural layout of a simple neural network nonlinear autoregressive (NNNAR(p, k)) model. The layout consists of a hidden layer with k neurons (k = 7 here) fed by p delayed inputs (p = 3 here), followed by an output layer with 1 neuron. It is a two-layer feed-forward network, with a sigmoid transfer function in the hidden layer and a linear transfer function in the output layer. The network is trained and produces forecasts using the p lagged values of the series as its inputs. An extension of the basic network, NNNAR(p, P, k)m, also uses the last P observations from the same season (lags m, 2m, ..., Pm) as inputs, in addition to the p lagged values.
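The forward pass described for Figure S1 can be sketched in Python as below. The inputs are the lagged values at lags 1 to p, plus lags m, 2m, ..., Pm for the seasonal extension. The hidden layer applies a logistic sigmoid, and a single linear output neuron gives the one-step forecast. Multi-step forecasts are assumed here to be produced iteratively, with each prediction fed back in as an input. The weights are hypothetical placeholders rather than fitted values, and training is not shown. R's `forecast::nnetar()`, which fits NNAR(p, P, k)m models, also scales the inputs and averages several networks trained from random starting weights. Neither step is included here.

```python
import math


def nnar_lags(p, P, m):
    """Input lags of NNNAR(p, P, k)m: 1..p plus the seasonal lags m, 2m, ..., Pm."""
    return sorted(set(range(1, p + 1)) | {m * j for j in range(1, P + 1)})


def nnar_predict(history, lags, w_hidden, b_hidden, w_out, b_out):
    """One-step prediction from a single-hidden-layer network.

    Hidden neurons use a logistic sigmoid; the single output neuron is linear.
    """
    inputs = [history[-lag] for lag in lags]
    hidden = [
        1.0 / (1.0 + math.exp(-(sum(w * x for w, x in zip(row, inputs)) + b)))
        for row, b in zip(w_hidden, b_hidden)
    ]
    return sum(v * h for v, h in zip(w_out, hidden)) + b_out


def nnar_forecast(history, horizon, lags, params):
    """Iterated multi-step forecast: each prediction is appended and reused as a lagged input."""
    series = list(history)
    for _ in range(horizon):
        series.append(nnar_predict(series, lags, *params))
    return series[len(history):]


# Hypothetical NNNAR(3, 1, 7)12 with all-zero input weights: every hidden unit outputs 0.5.
lags = nnar_lags(p=3, P=1, m=12)  # [1, 2, 3, 12]
params = ([[0.0] * len(lags)] * 7, [0.0] * 7, [1.0] * 7, 0.0)
print(nnar_forecast(list(range(24)), 2, lags, params))  # [3.5, 3.5]
```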
Figure S2 Flow chart of the SARIMA-NNNAR hybrid technique.
Figure S3 Sample ACF and PACF plots for the original TB incidence rate time series from January 2004 through December 2015 in Qinghai Province. The original series showed a marked seasonal pattern, as indicated by the spikes at lags 1, 12, and 24 in the ACF and PACF plots. A seasonal difference was therefore taken to stabilize the mean and variance over time.
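The quantities behind Figures S3 and S4 can be sketched in Python as follows. These are the seasonal difference (1 - B^12)X_t = X_t - X_{t-12}, the sample ACF, and the sample PACF obtained from the ACF by the Durbin-Levinson recursion. This is a minimal illustration on toy inputs, not the authors' code. Plotting software typically also draws approximate 95% bounds at ±1.96/√n, where n is the series length.

```python
def seasonal_difference(x, m=12):
    """(1 - B^m) x_t = x_t - x_{t-m}; the first m observations are lost."""
    return [x[t] - x[t - m] for t in range(m, len(x))]


def sample_acf(x, max_lag):
    """Sample autocorrelations r_0..r_max_lag, with r_0 = 1."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom for k in range(max_lag + 1)]


def pacf_from_acf(r):
    """Partial autocorrelations phi_11..phi_KK from r_0..r_K (Durbin-Levinson recursion)."""
    phi = []  # phi[j] holds phi_{k-1, j+1} from the previous step
    pacf = []
    for k in range(1, len(r)):
        num = r[k] - sum(phi[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf


# Hypothetical monthly series with a pure 12-month cycle: seasonal differencing removes it.
x = [float(t % 12) for t in range(48)]
print(sum(abs(v) for v in seasonal_difference(x)))  # 0.0
# An AR(1)-type ACF (r_k = 0.5**k) has a PACF that cuts off after lag 1.
print(pacf_from_acf([1.0, 0.5, 0.25, 0.125]))  # [0.5, 0.0, 0.0]
```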
Figure S4 Sample ACF and PACF plots for the seasonally differenced TB incidence rate time series from January 2004 through December 2015 in Qinghai Province. After seasonal differencing, the series looks much more stationary than the original series. The local peaks at lags 1 and 12 in the ACF and PACF plots suggest candidate values of 1 or 2 for p, P, q, and Q.
Figure S5 Diagnostic results for the residual series of the TB morbidity rate between January 2004 and July 2016 from the SARIMA(1,0,1)(0,1,2)12 model. (A) Standardized residual series; (B) autocorrelation function (ACF) plot of the residuals; (C) partial autocorrelation function (PACF) plot of the residuals; (D) Q-statistic P values. Based on these graphs, the SARIMA model appears adequate for fitting the data.
Figure S6 Diagnostic results for the residual series of the TB morbidity rate between January 2004 and July 2016 from the NNNAR(5,1,4)12 model. (A) Standardized residual series; (B) autocorrelation function (ACF) plot of the residuals; (C) partial autocorrelation function (PACF) plot of the residuals; (D) Q-statistic P values. These plots suggest that the NNNAR model is adequate for fitting the data.
Figure S7 Diagnostic results for the residual series of the TB morbidity rate between January 2004 and July 2016 from the SARIMA-NNNAR(2,18) hybrid model. (A) Standardized residual series; (B) autocorrelation function (ACF) plot of the residuals; (C) partial autocorrelation function (PACF) plot of the residuals; (D) Q-statistic P values. These plots indicate that the SARIMA-NNNAR hybrid model adequately captures the dynamic structure of the data.
Figure S8 Training error versus number of iterations for the Jordan neural network. (A) Training error in the 5-step-ahead forecasting; (B) training error in the 12-step-ahead forecasting.
1 Wu W, An SY, Guan P, Huang DS, Zhou BS. Time series analysis of human brucellosis in mainland China by using Elman and Jordan recurrent neural networks. BMC Infect Dis. 2019;19(1):414. doi:10.1186/s12879-019-4028-x
2 Bilski J, Smoląg J. Parallel approach to learning of the recurrent Jordan neural network. In: International Conference on Artificial Intelligence and Soft Computing; 2013.