MODELING THE PRECIPITATION AMOUNTS DYNAMICS FOR DIFFERENT TIME SCALES IN ROMANIA USING MULTIPLE REGRESSION APPROACH

N. BARBU 1,2, V. CUCULEANU 1,3, S. STEFAN 1

1 University of Bucharest, Faculty of Physics, RO-077125, P.O.BOX MG-11, Magurele, Bucharest, E-mail: [email protected]

2 National Meteorological Administration, Bucharest, Romania

3 Academy of Romanian Scientists, Splaiul Independentei 54, RO-050094, Bucharest, Romania

Received September 29, 2014

Water resources are very important for ecosystems, and a water deficit may cause serious social and economic problems. The aim of this study is to analyze the performance of a prediction procedure based on the Multiple Linear Regression Model (MLRM) for precipitation amounts at yearly and seasonal time scales in Romania. For this purpose we used the annual and seasonal amounts of precipitation as predictands and, as predictors, Mean Sea Level Pressure (MSLP), geopotential height at 300 hPa (HGT300), wind speed at 700 hPa (WS700), temperature at 850 hPa (T850) and Total Column Water (TCW). The selection of predictors is based on collinearity and multicollinearity analysis. Multicollinearity problems occur only during winter. All data sets used in this study are reanalysis gridded data with a spatial resolution of 2° x 2° lat/lon, obtained from the 20th Century Reanalysis Version 2 (20CR V2) Project. The analysis covers a period of 140 years between 1871 and 2010, the period 1871–2000 being used to build the MLRM and the period 2001–2010 to test its prediction performance. Using the MLRM we obtained good correlations between predicted and measured precipitation amounts. The correlation coefficient varies between 0.34 and 0.98, with the smallest values in winter and the greatest values in spring. The Spearman correlation was also used to validate the MLRM; its coefficients range from 0.25 for winter to 0.98 for spring.

Key words: precipitation amount prediction, multiple linear regression, Spearman correlation.

1. INTRODUCTION

Precipitation is a principal element of the hydrological cycle, and changes in precipitation patterns are important because they may lead to floods or droughts that cause economic losses and casualties. The frequency of precipitation plays an important role in the management of agriculture, water resources and ecosystems, and the time scales of precipitation variability range from months to years or decades. Identifying and understanding the influence of the large-scale air circulation patterns that produce temporal and spatial variations of precipitation in Romania is therefore of great importance.

Rom. Journ. Phys., Vol. 59, Nos. 9–10, P. 1127–1149, Bucharest, 2014

In addition to global warming, on regional scale, several other factors may determine future climate change such as variation in air circulation and topography [1]. Variations in air circulation influence the climate in large areas both on inter-annual and long time scales while the topography modifies the effects of air circulation on local scale [2].

Precipitation studies have been made for various time periods and on various spatial scales: global [3], hemispheric [4], regional [5] and local [6]. A general review of seasonal and annual precipitation trends in Italy using historical records was made by Buffoni et al. [7]. Various methods are used to analyze spatial and temporal variability of precipitation, for example: trend and change point analysis [8, 4], cluster analysis [9, 10], Empirical Orthogonal Function (EOF) analysis [11, 6], canonical correlation analysis [11], and multiple linear regressions [12, 13, 14].

The multiple linear regression approach was developed and successfully applied several decades ago by Klein [15] to the specification of surface temperature from the mid-troposphere circulation, mainly for the purpose of weather prediction. The multiple linear regression model has also been used in several studies of atmospheric physics, such as pollutant dynamics [16, 17], the estimation and prediction of atmospheric concentrations of the naturally occurring radionuclides radon and thoron [18], and the estimation and prediction of the radiative forcing of clouds [19].

In Romania, various studies have pointed out changes observed in the precipitation regime and their connection with changes in the large-scale circulation patterns [6, 20]. The main physical and physico-geographical factors controlling the spatial distribution of the climatic conditions in Romania are the large-scale mechanisms (represented by the air circulation) and the local mechanisms induced by the Black Sea and the Carpathian Mountains (orographic forcing). Interactions of the large-scale air circulation with the orography contribute to determining the structure of the precipitation field [21]. Winter precipitation variability in Romania is mainly controlled by variations in air circulation, the south-westerly circulation having the dominant role, modulated by the Carpathian topography [6, 20].

The aim of this paper is to develop a rainfall predictive model at yearly and seasonal time scales. To achieve this, the multiple linear regression model (MLRM) is used. The MLRM was built using as predictors mean sea level pressure, geopotential height at 300 hPa, wind speed at 700 hPa, temperature at 850 hPa and total column water. First, the predictors were tested for collinearity and multicollinearity. The selected predictors were then used to build the MLRM for a period of 130 years between 1871 and 2000. The analytical equation obtained was then used to predict the precipitation amount at yearly and seasonal time scales for the 10 years of the period 2001–2010. The performance of the MLRM was tested using the Pearson and Spearman correlations between predicted and measured precipitation amounts.

The paper is structured as follows. The data sets used in this study are presented in Section 2. Section 3 presents the methodology. Section 4 discusses the collinearity and multicollinearity analyses. Section 5 presents the estimation of the annual and seasonal precipitation amounts. Section 6 is dedicated to the prediction of the annual and seasonal precipitation amounts. The main conclusions are presented in Section 7.

2. DATA

The data sets used in this study are reanalysis gridded data obtained from NOAA’s Twentieth Century Reanalysis Project [22] and consist of the annual and seasonal amounts of precipitation (PP) and the annual and seasonal averages of mean sea level pressure (MSLP), geopotential height at 300 hPa (HGT300), wind speed at 700 hPa (WS700), temperature at 850 hPa (T850) and total column water (TCW). The newly available Twentieth Century Reanalysis (20CR) covers the period 1871–2010. It assimilates only surface pressure observations [23, 22] into an atmospheric model with a horizontal resolution of T62 (approximately 1.9°) and also uses observed monthly sea-surface temperatures and sea-ice distributions as boundary conditions. An Ensemble Kalman Filter is used to optimally combine the imperfect observations and estimates of the current state, producing an ensemble of 56 realizations of the reanalysis [22]. This allows an investigation of the observational uncertainties in the data assimilation. The spatial resolution of the gridded data is 2° x 2° latitude/longitude.

All data cover a period of 140 years, 1871–2010. The data were divided into two groups: the first group is a period of 130 years between 1871 and 2000 and the second is a period of 10 years between 2001 and 2010. The first group was used to build the MLRM, and the second to test its prediction performance. For this study we used the same data source for predictors and predictands, to be consistent with the multiple linear regression method.

The grid points, located in different regions of Romania, used to extract PP, MSLP, HGT300, WS700, T850 and TCW time series are presented in Figure 1.

Each grid point used in the study is located in a different region of Romania, with a different climate induced by the orographic barrier of the Carpathian Mountains and by the Black Sea. Grid point 1 is located in the western part of Romania, influenced by dry air masses from Central Europe and wet air masses from the Mediterranean Sea. Grid point 2 is located in the eastern part of Romania, influenced by synoptic patterns due to the Black Sea and the Russian Plain. Grid point 3 is located in the Intra-Carpathian region, influenced by the configuration of the Carpathian Mountains. Grid point 4 is located in the southern part of Romania, influenced by the southwestern air circulation and by the presence of the Black Sea.

Fig. 1 – Regions in Romania subjected to particular climate and weather patterns and associated grid points used in this study (Region 1 – western part of Romania, Region 2 – eastern part of Romania, Region 3 – Intra-Carpathian region and Region 4 – southern part of Romania).

To highlight different climatic characteristics of each region Figure 2 presents the annual amounts of precipitation for all regions in Romania subjected to particular climate and weather patterns.

Fig. 2 – Box plot of the annual amount of precipitation for all regions of Romania (whiskers represent minimum and maximum).

The smallest annual amount of precipitation averaged over the entire study period (377 mm) is found in Region 2 and the greatest (556 mm) in Region 3, followed by Region 1 (538 mm). Region 4 has a multiannual mean precipitation of 444 mm.

3. METHODOLOGY

To predict the annual and seasonal precipitation amounts, the multiple linear regression model (MLRM) was used. MLRM is a statistical technique that models a linear relationship between a continuous dependent variable (the predictand) and one or more independent variables (the predictors) [24].

The estimation procedure is made by using the analytical expression of the multiple linear regression with selected predictors. Therefore, the predictand $y(t)$ is associated with the predictors $\{x_i(t)\}_{i=1,2,\dots,n}$ by the following relationship [25]:

$$y(t) = \beta_0 + \beta_1 x_1(t) + \beta_2 x_2(t) + \dots + \beta_n x_n(t) \tag{1}$$

where β0 is the regression constant and β1…βn are the regression coefficients. The performance of the MLRM is quantified by the following statistical parameters:

• Multiple correlation coefficient (R) – is a measure of correlation between predictand and predictors;

• Squared multiple correlation coefficient (R²) – quantifies the proportion of the variance of the predictand that is explained by the predictors;

• F-test – a global test of the significance of the whole set of coefficients. It tests the null hypothesis that all regression coefficients are equal to zero against the alternative hypothesis that at least one regression coefficient is different from zero. Large values of the F statistic provide evidence against the null hypothesis;

• p-value – the probability, under the null hypothesis, of obtaining a test statistic at least as extreme as the one observed; it is compared with the chosen significance level to decide whether the null hypothesis is rejected. The p-value is computed from the Student t distribution for the individual regression coefficients and from the F distribution for the F-test of the overall model. A p-value less than 0.05 indicates that the null hypothesis may be rejected in favor of the alternative;

• Sum of Squared Residuals (SSR) – indicates the deviation between the predictand estimated or predicted by the MLRM and the measured values.

Therefore the analytical expression obtained by building up the MLRM can be used to predict the values of the dependent variable.
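The fitting step and the statistics listed above can be sketched as follows. This is a minimal illustration, not the software used in the study: the function name `fit_mlr`, the synthetic predictors and the coefficient values are assumptions made only for the example, and the F statistic uses the usual definition for a model with an intercept and n predictors.

```python
import numpy as np

def fit_mlr(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... + bn*xn, as in Eq. (1)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    m, n = X.shape                         # m samples (years), n predictors
    A = np.column_stack([np.ones(m), X])   # column of ones carries b0
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ beta
    ssr = float(np.sum((y - y_hat) ** 2))      # sum of squared residuals
    sst = float(np.sum((y - y.mean()) ** 2))   # total sum of squares
    r2 = max(0.0, 1.0 - ssr / sst)             # explained variance
    # Global F statistic; infinite for a perfect fit
    f_stat = np.inf if r2 >= 1.0 else (r2 / n) / ((1.0 - r2) / (m - n - 1))
    return {"beta": beta, "R": r2 ** 0.5, "R2": r2, "F": f_stat, "SSR": ssr}

# ASSUMPTION: synthetic data standing in for 130 years of five predictors.
rng = np.random.default_rng(0)
X = rng.normal(size=(130, 5))
true_beta = np.array([500.0, -20.0, 5.0, 30.0, 10.0, 15.0])  # hypothetical
y = true_beta[0] + X @ true_beta[1:] + rng.normal(scale=25.0, size=130)

stats = fit_mlr(X, y)
print({k: v for k, v in stats.items() if k != "beta"})
```

The returned `beta` holds the regression constant followed by the n regression coefficients, so a prediction for new predictor values is the dot product of `beta` with a row of the form (1, x1, ..., xn).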

The performance of the MLRM was also tested by using the Pearson correlation and the Spearman (rank) correlation between measured and estimated precipitation amounts at yearly and seasonal time scales. Pearson’s correlation coefficient is a measure of the strength of the linear relationship between two variables [26]. Spearman’s rank correlation coefficient is a nonparametric (distribution-free) rank statistic proposed by Spearman [27] as a measure of the strength of an association between two variables. Spearman’s coefficient is not a measure of the linear relationship between two variables, as is sometimes claimed. It assesses how well an arbitrary monotonic function can describe the relationship between two variables, without making any assumptions about the frequency distribution of the variables.
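The difference between the two coefficients can be illustrated with a short sketch. The data are an assumption chosen only to make the point (a monotonic but strongly nonlinear relation), and the rank helper assumes there are no tied values; tied data would need average ranks.

```python
import numpy as np

def pearson(a, b):
    """Pearson correlation: strength of the linear relationship."""
    return float(np.corrcoef(np.asarray(a, float), np.asarray(b, float))[0, 1])

def ranks(v):
    """Ranks 1..n of the values in v (assumes no ties)."""
    v = np.asarray(v)
    order = np.argsort(v)
    r = np.empty(len(v), dtype=float)
    r[order] = np.arange(1, len(v) + 1)
    return r

def spearman(a, b):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(a), ranks(b))

# ASSUMPTION: illustrative data, monotonic but not linear.
x = np.linspace(0.0, 5.0, 50)
y = np.exp(x)

print("Pearson :", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
```

Because `y` increases whenever `x` does, the ranks coincide and the Spearman coefficient equals 1, while the Pearson coefficient stays below 1 because the relation is not a straight line.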

4. COLLINEARITY AND MULTICOLLINEARITY ANALYSIS

The selection of predictors is an iterative process based partly on the user’s subjective judgment. The two main requirements for predictors are a good relationship with the predictand and a reasonable temporal extension. To select independent predictors for building the MLRM, collinearity and multicollinearity analyses were performed.

The collinearity analysis was made by determining the correlation coefficients (R) and the corresponding levels of significance (p-values) between predictors. Collinearity is a linear relationship between two predictors. Two collinear predictors explain the same part of the variation of the predictand, which lowers the significance of the individual partial regression coefficients. The values of the regression coefficients also become more dependent on the distribution and values of the errors. Collinearity difficulties appear when the absolute value of the correlation coefficient is close to 1 (larger than 0.99). A correlation describes the strength of an association between variables. An association between variables means that the value of one variable can be predicted, to some extent, from the value of the other. A correlation is a special kind of association: there is a linear relation between the values of the variables.
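A pairwise screening of this kind can be sketched as below. The data are synthetic, and the extra column `T850_copy` is a hypothetical near-duplicate added only so that the 0.99 threshold has something to flag; the function name `collinear_pairs` is likewise an assumption for the example.

```python
import numpy as np
from itertools import combinations

def collinear_pairs(data, names, threshold=0.99):
    """Return the correlation matrix and the predictor pairs with |R| > threshold."""
    R = np.corrcoef(data, rowvar=False)   # columns are predictors
    flagged = []
    for i, j in combinations(range(len(names)), 2):
        if abs(R[i, j]) > threshold:
            flagged.append((names[i], names[j], float(R[i, j])))
    return R, flagged

# ASSUMPTION: synthetic series for 130 years; not the reanalysis data.
rng = np.random.default_rng(1)
names = ["MSLP", "HGT300", "TCW", "T850", "WS700", "T850_copy"]
base = rng.normal(size=(130, 5))
near_copy = base[:, 3] + 0.01 * rng.normal(size=130)   # almost identical to T850
data = np.column_stack([base, near_copy])

R, flagged = collinear_pairs(data, names)
for a, b, r in flagged:
    print(f"{a} and {b} are collinear (R = {r:.4f})")
```

Only the artificial pair is reported; independent columns stay far below the threshold, which mirrors the situation described in the text when no correlation exceeds 0.99.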

Table 1 presents only the highest correlation coefficients (R) and the corresponding levels of significance (p) at yearly and seasonal time scales for the MSLP, HGT300, TCW, T850 and WS700 predictors, together with the region for which they were obtained. Because collinearity problems occur when the correlation coefficients are larger than 0.99, only the highest values of R are presented here. The R values for all variables, all regions and all time scales are less than 0.99, which indicates that there are no collinearity problems.

Table 1

The highest correlation coefficients (R) and corresponding levels of significance (p) at yearly and seasonal time scales for the MSLP, HGT300, TCW, T850 and WS700 predictors

| Time scale | Predictor | Statistic | MSLP | Region | HGT300 | Region | TCW | Region | T850 | Region |
|---|---|---|---|---|---|---|---|---|---|---|
| Year | HGT300 | R | 0.38 | 1 | | | | | | |
| | | p | 7.80E-06 | | | | | | | |
| | TCW | R | -0.64 | 4 | 0.27 | 2 | | | | |
| | | p | 3.10E-16 | | 0.002 | | | | | |
| | T850 | R | -0.44 | 4 | 0.73 | 1 | 0.5 | 4 | | |
| | | p | 1.60E-07 | | 1.00E-21 | | 1.60E-09 | | | |
| | WS700 | R | -0.53 | 1 | -0.12 | 1 | 0.33 | 1 | 0.11 | 1 |
| | | p | 8.00E-11 | | 0.17 | | 1.10E-04 | | 0.22 | |
| Winter | HGT300 | R | 0.5 | 1 | | | | | | |
| | | p | 2.00E-09 | | | | | | | |
| | TCW | R | -0.68 | 4 | 0.38 | 2 | | | | |
| | | p | 1.10E-18 | | 6.80E-06 | | | | | |
| | T850 | R | -0.45 | 4 | 0.7 | 2 | 0.79 | 2 | | |
| | | p | 1.30E-07 | | 2.00E-19 | | 9.90E-27 | | | |
| | WS700 | R | -0.52 | 4 | -0.07 | 2 | 0.35 | 1 | 0.33 | 4 |
| | | p | 2.70E-10 | | 0.46 | | 4.10E-05 | | 1.20E-04 | |
| Spring | HGT300 | R | 0.2 | 1 | | | | | | |
| | | p | 0.02 | | | | | | | |
| | TCW | R | -0.76 | 4 | 0.43 | 2 | | | | |
| | | p | 1.40E-24 | | 2.80E-07 | | | | | |
| | T850 | R | -0.57 | 4 | 0.76 | 2 | 0.77 | 3 | | |
| | | p | 2.60E-12 | | 1.60E-24 | | 3.00E-25 | | | |
| | WS700 | R | -0.42 | 1 | 0.16 | 1 | 0.42 | 1 | 0.35 | 1 |
| | | p | 8.40E-07 | | 0.06 | | 7.40E-07 | | 5.20E-05 | |
| Summer | HGT300 | R | 0.21 | 4 | | | | | | |
| | | p | 0.02 | | | | | | | |
| | TCW | R | -0.58 | 4 | 0.2 | 4 | | | | |
| | | p | 1.27E-12 | | 0.02 | | | | | |
| | T850 | R | -0.43 | 4 | 0.78 | 1 | 0.03 | 4 | | |
| | | p | 3.26E-07 | | 3.10E-26 | | 0.72 | | | |
| | WS700 | R | 0.39 | 4 | -0.29 | 3 | -0.3 | 4 | -0.34 | 2 |
| | | p | 4.80E-06 | | 8.50E-04 | | 6.20E-04 | | 8.20E-05 | |
| Autumn | HGT300 | R | 0.21 | 1 | | | | | | |
| | | p | 0.02 | | | | | | | |
| | TCW | R | -0.61 | 1 | 0.39 | 2 | | | | |
| | | p | 3.90E-14 | | 4.70E-06 | | | | | |
| | T850 | R | -0.5 | 4 | 0.79 | 1 | 0.67 | 2 | | |
| | | p | 1.50E-09 | | 1.70E-26 | | 7.60E-18 | | | |
| | WS700 | R | -0.62 | 1 | -0.15 | 2 | 0.39 | 1 | 0.23 | 1 |
| | | p | 1.20E-14 | | 0.09 | | 4.80E-06 | | 0.01 | |

From the collinearity analysis one may draw the following conclusions. T850 is positively correlated with HGT300. This is due to the anticyclonic (high-pressure) systems, which generally extend up into the upper troposphere.

TCW is negatively correlated with MSLP; R varies between -0.52 (p-value 2.14e-10) for summer in Region 1 and -0.76 (p-value 1.37e-24) for spring in Region 4. This result was expected, because the climate of Region 4 is influenced by the Mediterranean Sea and the Black Sea, while the climate of Region 1 is influenced by the dry continental air masses that come from Central Europe. Cyclonic activity, which is pronounced during spring, contributes to the advection of humidity from the Mediterranean Sea into Region 4.

WS700 is negatively correlated with MSLP and positively correlated with TCW (R ranges from 0.42, with p-value 7.4e-7, for spring in Region 1 down to 2.08e-5, with p-value 0.99, for summer in Region 3). The explanation is that wind speed is proportional to the baric gradient; strong winds are associated with cyclones and contribute to moisture advection. Region 3 is sheltered by the Carpathian Mountains, which act as a barrier against wet air masses from the Mediterranean Sea.

HGT300 is positively correlated with MSLP; R ranges from 0.5 (p-value 2.04e-9) for winter in Region 1 down to 0.06 (p-value 0.95) for spring in Region 2. Baric systems generally extend vertically up to near the tropopause, and this leads to the correlation between MSLP and HGT300.

T850 is positively correlated with TCW. It is well known that warm air can hold a larger amount of water vapor than cold air.

The variability of the frequency of occurrence and of the life cycle of the baric systems is of great importance for the selected predictors. The Carpathian Mountains modulate the climate characteristics and induce a climate regionalization of Romania.

For the selected predictors there are no collinearity problems at any time scale or in any region of Romania.

For the multicollinearity analysis the variance inflation factor (VIF) is used. The main statistic for multicollinearity diagnostic is the VIF defined as [28]:

$$\mathrm{VIF} = \frac{1}{1 - R_j^2} \tag{2}$$

where $R_j^2$ is the coefficient of determination when variable $x_j$ is regressed on the $j-1$ remaining independent variables. A variable is considered to be problematic if its VIF is larger than 10.0 [28].
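A direct way to evaluate Eq. (2) is to regress each predictor, in turn, on all the other predictors and convert the resulting coefficient of determination into a VIF. The sketch below does this with ordinary least squares; the function name `vif` and the synthetic columns are assumptions made for illustration only.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X, following Eq. (2)."""
    X = np.asarray(X, dtype=float)
    m, n = X.shape
    out = np.empty(n)
    for j in range(n):
        target = X[:, j]
        others = np.delete(X, j, axis=1)            # all predictors except x_j
        A = np.column_stack([np.ones(m), others])   # regression with intercept
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        resid = target - A @ coef
        r2_j = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2_j)
    return out

# ASSUMPTION: synthetic predictors; x3 is nearly a linear combination of x1 and x2.
rng = np.random.default_rng(2)
x1 = rng.normal(size=130)
x2 = rng.normal(size=130)
x3 = x1 + x2 + 0.1 * rng.normal(size=130)
x4 = rng.normal(size=130)                           # unrelated to the others
v = vif(np.column_stack([x1, x2, x3, x4]))
print(np.round(v, 2))
```

In this example the first three columns receive large VIF values because each can be reconstructed from the other two, whereas the unrelated fourth column stays close to 1.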

Table 2 lists the VIF values for all predictors, all regions of Romania and all time scales.

Table 2

The VIF values for all predictors, all time scales and all regions of Romania. The values of VIF that exceed 10 are marked with an asterisk

| Time scale | Predictor | Region 1 | Region 2 | Region 3 | Region 4 |
|---|---|---|---|---|---|
| Year | MSLP | 5.41 | 4.39 | 5.12 | 4.79 |
| | HGT300 | 7.21 | 5.23 | 6.31 | 4.55 |
| | TCW | 1.92 | 2.04 | 2.03 | 2.31 |
| | T850 | 5.8 | 5.28 | 5.55 | 4.71 |
| | WS700 | 1.5 | 1.26 | 1.38 | 1.24 |
| Winter | MSLP | 10.42* | 9.14 | 9.63 | 10.58* |
| | HGT300 | 14.07* | 11.47* | 12.98* | 10.79* |
| | TCW | 4.71 | 3.96 | 4.27 | 3.82 |
| | T850 | 11.96* | 12.38* | 11.66* | 10.94* |
| | WS700 | 1.47 | 1.46 | 1.4 | 1.55 |
| Spring | MSLP | 8.29 | 6.92 | 8.27 | 8.33 |
| | HGT300 | 9.49 | 7.92 | 8.64 | 8.21 |
| | TCW | 4.82 | 3.96 | 5.07 | 4.05 |
| | T850 | 9.48 | 9.84 | 9.16 | 9.97 |
| | WS700 | 1.35 | 1.15 | 1.28 | 1.17 |
| Summer | MSLP | 2.55 | 3.01 | 2.67 | 3.14 |
| | HGT300 | 4.53 | 3.26 | 3.88 | 2.44 |
| | TCW | 1.81 | 1.68 | 1.77 | 1.72 |
| | T850 | 4.45 | 4.06 | 4.01 | 3.14 |
| | WS700 | 1.1 | 1.14 | 1.13 | 1.24 |
| Autumn | MSLP | 6.1 | 4.6 | 5.69 | 4.4 |
| | HGT300 | 9.21 | 6.53 | 8.13 | 5.56 |
| | TCW | 2.05 | 2.25 | 2.09 | 1.84 |
| | T850 | 9.67 | 8.2 | 9.01 | 7.09 |
| | WS700 | 1.76 | 1.22 | 1.59 | 1.45 |

Analysis of the VIF values shows that multicollinearity is present only in winter: the VIF values exceed the upper limit (10.0) for MSLP in the western and southern parts of Romania and for HGT300 and T850 in all regions of Romania. The explanation may be that winter has a high thermodynamic stability, which means that the temporal fluctuations of the predictors are lower than in other seasons. The anticyclonic regime is dominant during winter, and this leads to the same weather situation persisting for a longer time than in other seasons. The VIF values for WS700 are the lowest and do not exceed 1.76. To avoid multicollinearity problems, the predictors whose VIF exceeds 10.0 are not used to build the MLRM for the winter season. VIF values above 10.0 can lead to multicollinearity consequences such as negative predicted values of the precipitation amount. From the VIF analysis we conclude that the considered variables are good predictors of the precipitation amount, except in winter.
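The winter selection rule can be reproduced directly from the winter rows of Table 2. The sketch below applies the threshold of 10.0 once to the tabulated values; the helper name `select_predictors` is an assumption, and an iterative variant that recomputes the VIF after each removal would be a different procedure from the one described here.

```python
# Winter VIF values copied from Table 2, ordered Region 1..4.
WINTER_VIF = {
    "MSLP":   [10.42, 9.14, 9.63, 10.58],
    "HGT300": [14.07, 11.47, 12.98, 10.79],
    "TCW":    [4.71, 3.96, 4.27, 3.82],
    "T850":   [11.96, 12.38, 11.66, 10.94],
    "WS700":  [1.47, 1.46, 1.40, 1.55],
}
VIF_LIMIT = 10.0

def select_predictors(vif_by_predictor, region, limit=VIF_LIMIT):
    """Keep the predictors whose VIF in the given region (1-based) is within the limit."""
    return [name for name, values in vif_by_predictor.items()
            if values[region - 1] <= limit]

selected = {region: select_predictors(WINTER_VIF, region) for region in range(1, 5)}
for region, predictors in selected.items():
    print(f"Region {region}: {', '.join(predictors)}")
```

The rule keeps TCW and WS700 for Regions 1 and 4 and MSLP, TCW and WS700 for Regions 2 and 3, which agrees with the winter predictor sets reported in Section 5.2.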

5. ESTIMATION OF PRECIPITATION AMOUNT

Annual and seasonal precipitation amounts were estimated by using the multiple linear regression software developed by Wessa [29]. The regression coefficients are determined by the least squares method. The estimation of the annual and seasonal amounts of precipitation is performed by building the MLRM for the 130 years of the period 1871–2000, using MSLP, HGT300, T850, TCW and WS700 as predictors. The analytical expressions obtained were then used to estimate the precipitation amount at yearly and seasonal time scales for the same period that was used to build the MLRM.

5.1. ESTIMATION OF ANNUAL PRECIPITATION AMOUNT

Table 3 presents the regression statistics for the estimation of the annual precipitation amount for all regions of Romania. One may notice that the multiple correlation coefficients (R) have similar values for all regions. The R value is greatest for the western part of Romania and the Intra-Carpathian region (0.87 for Regions 1 and 3), and the smallest value of R (0.82) was obtained for the southern part of Romania. The p-values for all regions are close to zero, which indicates that the results are statistically significant at the 0.01 level.

Table 3

Regression statistics for all regions of Romania for annual time scale

| Region | R | R² | p | F-test | SSR |
|---|---|---|---|---|---|
| 1 | 0.87 | 0.76 | 2.11E-36 | 77.19 | 275617 |
| 2 | 0.84 | 0.71 | 9.50E-32 | 60.88 | 260226 |
| 3 | 0.87 | 0.75 | 1.81E-35 | 73.69 | 323827 |
| 4 | 0.82 | 0.67 | 1.45E-28 | 51.23 | 327359 |

Figure 3 presents the estimated and measured values of annual precipitation amount for Region 1 (a), Region 2 (b), Region 3 (c) and Region 4 (d).

Fig. 3 – Estimated and measured annual amount of precipitation (PP) for all four regions of Romania (a – Region 1, b – Region 2, c – Region 3 and d – Region 4) considered in this study.

The explained variance (R²) varies between 67% for Region 4 and 76% for Region 1. The performance of the MLRM is also quantified by the SSR; small values of SSR indicate a good prediction of the precipitation amount. SSR has the smallest value for Region 2 (260226) and the highest value for Region 4 (327359). In terms of SSR, the best performance of the MLRM at the yearly time scale was noted for Region 2.
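As a rough consistency check on Table 3, the F statistic can be recomputed from the rounded R² values. The sketch assumes the usual definition of the overall F statistic for a model with an intercept, fitted on 130 annual values with five predictors; whether the software used in the study follows exactly this definition is not stated, so only approximate agreement should be expected.

```python
N_YEARS = 130        # 1871-2000
N_PREDICTORS = 5     # MSLP, HGT300, T850, TCW, WS700

# Region: (R² as rounded in Table 3, F-test value reported in Table 3)
TABLE3 = {1: (0.76, 77.19), 2: (0.71, 60.88), 3: (0.75, 73.69), 4: (0.67, 51.23)}

def f_from_r2(r2, n_samples, n_predictors):
    """Overall F statistic implied by R² for a model with an intercept."""
    return (r2 / n_predictors) / ((1.0 - r2) / (n_samples - n_predictors - 1))

relative_diff = {}
for region, (r2, f_reported) in TABLE3.items():
    f_calc = f_from_r2(r2, N_YEARS, N_PREDICTORS)
    relative_diff[region] = abs(f_calc - f_reported) / f_reported
    print(f"Region {region}: F from R² = {f_calc:.2f}, reported = {f_reported}")
```

The recomputed values differ from the tabulated ones by less than 3%, which is compatible with the two-decimal rounding of R².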

From Figure 3 it can be seen that, for all regions of Romania, the estimated annual precipitation amounts are very close to the measured values.

5.2. ESTIMATION OF SEASONAL PRECIPITATION AMOUNT

Table 4 presents the regression statistics for the estimation procedure of the MLRM for all seasons and all regions of Romania. According to the multicollinearity analysis (see Table 2), the predictors selected to build the MLRM for spring, summer and autumn are MSLP, HGT300, TCW, T850 and WS700, whereas for winter the predictors are MSLP, TCW and WS700 for the Intra-Carpathian and eastern parts of Romania and TCW and WS700 for the southern and western parts of Romania.

Table 4

Regression statistics for all four regions of Romania and for seasonal time scale

| Season | Region | R | R² | p | F-test | SSR |
|---|---|---|---|---|---|---|
| Winter | 1 | 0.51 | 0.26 | 1.90E-06 | 21.9 | 102044 |
| | 2 | 0.6 | 0.36 | 9.20E-21 | 23.21 | 92694 |
| | 3 | 0.73 | 0.53 | 5.00E-12 | 48.02 | 65584 |
| | 4 | 0.43 | 0.19 | 6.80E-09 | 14.64 | 159251 |
| Spring | 1 | 0.88 | 0.78 | 1.70E-38 | 85.54 | 49889 |
| | 2 | 0.82 | 0.67 | 1.80E-28 | 50.99 | 56434 |
| | 3 | 0.88 | 0.78 | 6.40E-39 | 87.27 | 57108 |
| | 4 | 0.85 | 0.73 | 3.80E-33 | 65.51 | 64809 |
| Summer | 1 | 0.92 | 0.84 | 4.30E-47 | 127.26 | 67677 |
| | 2 | 0.82 | 0.67 | 2.00E-28 | 50.83 | 60549 |
| | 3 | 0.91 | 0.83 | 1.20E-45 | 119.34 | 72234 |
| | 4 | 0.85 | 0.73 | 2.70E-33 | 66.01 | 35483 |
| Autumn | 1 | 0.82 | 0.68 | 5.50E-29 | 52.45 | 73293 |
| | 2 | 0.78 | 0.61 | 1.70E-23 | 38.04 | 60395 |
| | 3 | 0.83 | 0.68 | 2.20E-29 | 53.59 | 72963 |
| | 4 | 0.77 | 0.6 | 5.60E-23 | 36.81 | 81219 |

The multiple correlation coefficients (R) indicate a good correlation between estimated and measured precipitation amounts at the seasonal time scale, which suggests that the selected predictors capture well the mechanisms that control the precipitation dynamics. The greatest R (0.92) was obtained for summer in the western part of Romania and the smallest R (0.43) for the southern part of Romania during winter. In general, summer is the season with the largest values of R in all regions of Romania, whereas winter is the season with the smallest. The small values of R obtained for winter are attributed to the lower number of predictors used compared with the other seasons. The p-values are close to zero for all seasons and all regions of Romania, which indicates that the results are statistically significant at the 0.05 level. The greatest explained variance (84%) was found for summer in Region 1 and the lowest (19%) for winter in Region 4. In terms of SSR, the best performance of the MLRM was found for summer in Region 4.

Figure 4 presents the estimated and measured precipitation amount for winter for all regions of Romania (Region 1 - a, Region 2 - b, Region 3 - c and Region 4 - d).

Fig. 4 – Estimated and measured winter amount of precipitation (PP) for all regions of Romania (a – Region 1, b – Region 2, c – Region 3 and d – Region 4) considered in this study for the period 1871–2000.

Figure 5 presents the estimated and measured precipitation amount for spring for all regions of Romania (Region 1 - a, Region 2 - b, Region 3 - c and Region 4 - d).

Fig. 5 – The same as in Figure 4, but for spring.

Figure 6 presents the estimated and measured precipitation amount for summer for all regions of Romania (Region 1 - a, Region 2 - b, Region 3 - c and Region 4 - d).

Fig. 6 – The same as in Figure 4, but for summer.

Figure 7 presents the estimated and measured precipitation amount for autumn for all regions of Romania (Region 1 - a, Region 2 - b, Region 3 - c and Region 4 - d).

Fig. 7 – The same as in Figure 4, but for autumn.


Figures 4, 5, 6 and 7 show that for winter there are noticeable differences between estimated and measured values of precipitation amount. For spring, summer and autumn the differences between estimated and measured values of precipitation amount are small. From this analysis we can expect poor performance of the MLRM only for winter.

6. PREDICTION OF PRECIPITATION AMOUNT

The prediction procedure is based on the analytical regression expression fitted to the data from the previous period, using the predictors selected by the procedure described in Section 4. The analytical expressions were obtained by using the MLRM with the selected predictors for a period of 130 years, between 1871 and 2000, for yearly and seasonal time scales. The prediction period is a period of 10 years, between 2001 and 2010.
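The split between the fitting period and the prediction period can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic stand-ins for the reanalysis predictors, and the names `fit_mlr`, `predict_mlr`, `solve_linear_system` and `true_coeffs` are hypothetical. The sketch fits an ordinary least-squares regression on 1871–2000 through the normal equations and applies the fitted expression to 2001–2010.

```python
import random


def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_mlr(predictors, target):
    """Return least-squares coefficients [b0, b1, ..., bk] (b0 = intercept)."""
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, target)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict_mlr(coeffs, predictors):
    """Apply the fitted regression expression to new predictor values."""
    return [coeffs[0] + sum(c * v for c, v in zip(coeffs[1:], p))
            for p in predictors]


# Synthetic stand-in data (assumption): three standardized predictors per year.
random.seed(0)
years = list(range(1871, 2011))
true_coeffs = [600.0, -20.0, 15.0, 30.0]  # hypothetical values
X = [[random.gauss(0, 1) for _ in range(3)] for _ in years]
y = [true_coeffs[0]
     + sum(c * v for c, v in zip(true_coeffs[1:], x))
     + random.gauss(0, 5)
     for x in X]

# Fitting period 1871-2000 (130 years), prediction period 2001-2010 (10 years).
train = [i for i, yr in enumerate(years) if yr <= 2000]
test = [i for i, yr in enumerate(years) if yr >= 2001]

coeffs = fit_mlr([X[i] for i in train], [y[i] for i in train])
predicted = predict_mlr(coeffs, [X[i] for i in test])
measured = [y[i] for i in test]

for yr, p, m in zip(years[-10:], predicted, measured):
    print(f"{yr}: predicted {p:7.1f}  measured {m:7.1f}")
```

The prediction years never enter the fit, so the comparison of `predicted` with `measured` is an out-of-sample test. For real data a library routine such as `numpy.linalg.lstsq` would normally replace the hand-written solver.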

6.1. PREDICTION OF ANNUAL PRECIPITATION AMOUNT

Figure 8 presents the linear relationship between predicted and measured precipitation amount for all regions of Romania (Region 1 - a, Region 2 - b, Region 3 - c and Region 4 - d) for yearly time scale.

Fig. 8 – Measured precipitation amount (PP measured) as a function of predicted precipitation amount (PP predicted) for all regions of Romania (a – Region 1, b – Region 2, c – Region 3 and d – Region 4) for yearly time scale. The linear regression equation and corresponding parameters are shown in the bottom right corner. The prediction was made for a period of 10 years (2001–2010).

The correlation coefficients (R) are similar for all four regions of Romania and are larger than 0.85. The greatest value of the correlation coefficient between measured and predicted values is 0.92, obtained for the western part of Romania and for the Intra-Carpathian region. All values of R are statistically significant at the 0.05 level, with p-values less than 1.64e-3. For the yearly time scale, the performance of the MLRM is therefore high.

6.2. PREDICTION OF SEASONAL PRECIPITATION AMOUNT

Figures 9, 10, 11 and 12 present the linear relationship between predicted and measured precipitation amount for all regions of Romania (Region 1 - a, Region 2 - b, Region 3 - c and Region 4 - d) for winter, spring, summer, and autumn.

Fig. 9 – The same as in Figure 8, but for winter.

Fig. 10 – The same as in Figure 9, but for spring.

Fig. 11 – The same as in Figure 9, but for summer.

Fig. 12 – The same as in Figure 9, but for autumn.

The correlation coefficients (R) obtained vary between 0.34 for winter for the western part of Romania and 0.98 for spring for the eastern part of Romania. Generally, the lowest values of R are obtained for winter and the highest values of R are obtained for spring. The prediction of the precipitation amount in winter performs poorly for the western and southern parts of Romania, where the correlation coefficients are 0.34 (p-value 0.33) and 0.52 (p-value 0.13), respectively. This is attributed to the smaller number of predictors used for winter compared to the other seasons. However, for the Intra-Carpathian region and the eastern part of Romania the winter correlation coefficients are high, 0.87 and 0.73, respectively.
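The significance statements above can be checked with a short sketch, given here only as an illustration and not as the authors' procedure. It assumes the usual two-sided t-test of zero correlation with n = 10 pairs (the ten prediction years), for which the 5% critical value of Student's t with 8 degrees of freedom is 2.306. The function name `t_statistic` is hypothetical.

```python
import math


def t_statistic(r, n):
    """t statistic for testing zero correlation from a sample correlation r of n pairs."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


N_YEARS = 10    # prediction period 2001-2010
T_CRIT = 2.306  # two-sided 5% critical value, Student's t with n - 2 = 8 d.o.f.

# Correlation values quoted in the text for the yearly and winter predictions.
for r in (0.34, 0.52, 0.85, 0.92):
    t = t_statistic(r, N_YEARS)
    verdict = "significant" if abs(t) > T_CRIT else "not significant"
    print(f"R = {r:.2f}  t = {t:.2f}  {verdict} at the 0.05 level")
```

With only ten pairs, the winter values 0.34 and 0.52 fall below the critical value, in line with the reported p-values of 0.33 and 0.13, whereas values of 0.85 and above clear it comfortably.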

6.3. TESTING THE PERFORMANCE OF MLRM PREDICTION

The performance of the MLRM was tested using the Spearman correlation, which is rank-based and therefore more robust; it is not sensitive to the linear trend.
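A minimal sketch of the rank-based calculation is given below. It assumes the standard definition of the Spearman coefficient as the Pearson correlation of the ranks (with ties receiving average ranks). The ten data pairs are hypothetical and include one deliberately extreme measured value; they are not taken from the study.

```python
def pearson(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical precipitation amounts (mm); the last measured value is an outlier.
measured = [410.0, 455.0, 470.0, 500.0, 520.0, 540.0, 575.0, 600.0, 640.0, 1200.0]
predicted = [400.0, 450.0, 480.0, 495.0, 545.0, 530.0, 570.0, 610.0, 630.0, 700.0]

print(f"Pearson  R = {pearson(measured, predicted):.2f}")
print(f"Spearman R = {spearman(measured, predicted):.2f}")
```

In this illustrative case the outlier lowers the Pearson coefficient, while the Spearman coefficient depends only on the ordering of the values and stays close to 1.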

Table 5

Spearman correlation coefficients between measured and predicted annual and seasonal precipitation amount for all regions of Romania. The prediction period is 2001–2010.

          Region 1   Region 2   Region 3   Region 4
Year      0.89       0.83       0.83       0.65
Winter    0.25       0.73       0.86       0.29
Spring    0.91       0.9        0.98       0.9
Summer    0.84       0.66       0.9        0.87
Autumn    0.76       0.66       0.76       0.67

Table 5 presents the Spearman correlation coefficients between measured and predicted precipitation amount for yearly and seasonal time scales. The correlation coefficient varies between 0.25 in winter for the western part of Romania and 0.98 in spring for the eastern part of Romania. This analysis also shows that the performance of the MLRM is weaker for winter, because fewer predictors are used by the model than for the other seasons. The highest performance of the MLRM was found for spring for all regions. The Spearman correlation coefficients are close to the Pearson correlation coefficients.

7. CONCLUSIONS

In this study, the MLRM was developed to investigate the influence of the MSLP, HGT300, TCW, T850 and WS700 on the annual and seasonal precipitation amount in Romania. The collinearity and multicollinearity analysis indicates that the MSLP, HGT300, TCW, T850 and WS700 are very good predictors for precipitation amount at yearly and seasonal time scales, with a minor drawback in winter, when multicollinearity problems appear for the MSLP, HGT300 and T850 predictors.

The regression statistics for the precipitation amount estimation show large values of the multiple R, with a high level of significance (p-value close to zero), for the entire year and for all seasons; the seasonal values vary between 0.43 in winter for the southern part of Romania and 0.92 in summer for the western part of Romania.

The correlation coefficients between measured and predicted precipitation amount obtained with the MLRM for the 10 years of the period 2001–2010 vary between 0.34 for winter for the western part of Romania and 0.98 for spring for the eastern part of Romania.

For the yearly time scale the correlation coefficients vary between 0.85 and 0.92 and the Spearman correlation coefficients vary between 0.65 and 0.89. These results are in accordance with those found in previous studies. For example, Rajeevan et al. [30] reported that the correlation between predicted and observed values for a 24-year period was 0.77–0.84, and Mizanur Rahman et al. [13] reported that the correlation between predicted and observed rainfall for the 31 years during the period 1976–2007 is 0.74.

Using the Spearman rank correlation method, the correlation coefficients are approximately the same as the Pearson correlation coefficients, and vary between 0.25 in winter for the western part of Romania and 0.98 in spring for the eastern part of Romania.

The multiple regression model presented in this study can be used to predict precipitation anomalies in Romania in a future climate using simulated predictors. This is very useful because predictor fields such as HGT300, T850 or MSLP are accurately simulated by climate models.

Finally, we can conclude that the MLRM developed in this study is a very good model for precipitation amount prediction on yearly and seasonal time scales, and that the analytical equation obtained by running the MLRM for the 130-year period can be used for future climate projection. It is important to select an optimal number of predictors to build the MLRM. A small number of predictors leads to lower performance of the MLRM.

In addition, the methodology can be applied to the prediction of precipitation fields not only for the Romanian territory but also for other regions of Europe.

Acknowledgments. The work of author N. Barbu was supported by the strategic grant POSDRU/159/1.5/9.137750, “Project Doctoral and Postdoctoral programs support for increased competitiveness in Exact Sciences research” co-financed by the European Social Fund within the Sectoral Operational Program Human Resources Development 2007–2013.

The authors thank the Executive Agency for Higher Education, Research Development and Innovation Funding (UEFISCDI) for the research funds in the research project CLIMYDEX “Changes in climate extremes and associated impact in hydrological events in Romania”, code PNII-PCCE-ID-2011-2-0073.

The present study is also useful for the project EMERSYS (Toward an integrated, joint cross-border detection system and harmonized rapid responses procedures to chemical, biological, radiological and nuclear emergencies), MIS-ETC code 774.

REFERENCES

1. J. H. Christensen, B. Hewitson, A. Busuioc, A. Chen, X. Gao, I. Held, R. Jones, R. K. Kolli, W. T. Kwon, R. Laprise, V. Magana Rueda, L. Mearns, C. G. Menéndez, J. Räisänen, A. Rinke, A. Sarr and P. Whetton, Regional Climate Projections. In: Climate Change 2007: The Physical Science Basis. Contribution of Working Group I to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change [Solomon S., Qin D., Manning M., Chen Z., Marquis M., Averyt K.B., Tignor M. and Miller H.L. (eds.)]. Cambridge University Press, Cambridge, United Kingdom and New York, NY, USA (2007).

2. R. Bojariu and F. Giorgi, The North Atlantic Oscillation signal in a regional climate simulation for the European region. Tellus, 57A:641–653 (2005).

3. H. F. Diaz, R. S. Bradley and J. K. Eischeid, Precipitation Fluctuation Over Global Land Areas Since the Late 1800’s, J. Geoph. Res. 94D1:1195–1210 (1989).

4. R. S. Bradley, H. F. Diaz, J. K. Eischeid, P.D. Jones, P. M. Kelly and C. M. Goodess, Precipitation Fluctuation over Northern Hemisphere Land Areas Since the Mid-19th Century, Science (Reprint Series), 237:171–175 (1987).

5. C. D. Schönwiese, J. Rapp, T. Fuchs and M. Denhard, Observed climate trends in Europe 1891–1990. Meteorol Z NF 3:22–28 (1994).

6. A. Busuioc and H. von Storch, Changes in the winter precipitation in Romania and its relation to the large scale circulation. Tellus 48A:538–552 (1996).

7. L. Buffoni, M. Maugeri and T. Nanni, Precipitation in Italy from 1833 to 1996. Theor. Appl. Climatol. 63:33–40 (1999).

8. A. N. Pettitt, A non-parametric approach to the change-point problem. Applied Statistics.126–135 (1979).

9. G. Galliani and F. Filippini, Climate clusters in a small area. J. Clim. 3:47–63 (1985).

10. C. Cacciamani, S. Tibaldi and S. Nanni, Mesoclimatology of winter temperature and precipitation in the Alpine Region of Northern Italy. Int. J. Climatol. 14:777–814 (1994).

11. H. von Storch, Spatial patterns: EOFs and CCA. In: von Storch H, Navarra A (eds) Analysis of climate variability: applications of statistical techniques. Springer, Heidelberg, 227–258 (1995).


12. M. C. Valverde Ramíres, N. J. Ferreira and H. F. de Campos Velho, Linear and Nonlinear Statistical Downscaling for Rainfall Forecasting over Southeastern Brazil. Weather and Forecasting. 21:969–989 (2006).

13. M. D. Mizanur Rahman, M. Rafiuddin and M. D. Mahbub Alam, Seasonal forecasting of Bangladesh summer monsoon rainfall using simple multiple regression model. J. Earth Syst. Sci. 122(2):551–558 (2013).

14. R. Chifurira and D. Chikobvu, A Weighted Multiple Regression Model to Predict Rainfall Patterns: Principal Component Analysis approach. Mediterranean Journal of Social Sciences, 5(7):34–42 (2014).

15. W. H. Klein, Specification of monthly mean surface temperatures from 700 mb heights. J. Appl. Meteor. 1:154–156 (1962).

16. M. C. Hubbard and W. G. Cobourn, Development of a regression model to forecast ground-level ozone concentration in Louisville, KY. Atmos. Environ. 32(14–15):2637–2647 (1998).

17. A. Vlachogianni, P. Kassomenos, A. Karppinen, S. Karakitsios and J. Kukkonen, Evaluation of a multiple regression model for the forecasting of the concentrations of NOx and PM10 in Athens and Helsinki. Science of the Total Environment. 409:1559–1571 (2011).

18. F. Simion, V. Cuculeanu, E. Simion and A. Geicu, Modeling the ²²²Rn and ²²⁰Rn progeny concentration in atmosphere using multiple linear regression with meteorological variables as predictors. Rom. Rep. Phys. 65, 524–544 (2013).

19. V. Cuculeanu, I. Ungureanu and S. Stefan, Study of the relationship among radiative forcing, albedo and cover fraction of the clouds. Rom. Journ. Phys. 58, 987–999 (2013).

20. A. Busuioc, Large-scale mechanisms influencing the winter Romanian climate variability, Detecting and Modelling Regional Climate Change and Associated Impacts, M. Brunet and D. Lopez eds., Springer-Verlag, 333–343 (2001).

21. R. A. Houze, Cloud dynamics. Academic Press, Inc, San Diego (1993).

22. G. P. Compo, J. S. Whitaker, P. D. Sardeshmukh, N. Matsui, R. J. Allan, X. Yin, B. E. Gleason, R. S. Vose, G. Rutledge, P. Bessemoulin, S. Brönnimann, M. Brunet, R. I. Crouthamel, A. N. Grant, P. Y. Groisman, P. D. Jones, M. C. Kruk, A. C. Kruger, G. J. Marshall, M. Maugeri, H. Y. Mok, O. Nordli, T. F. Ross, R. M. Trigo, X. L. Wang, S. D. Woodruff and S. J. Worley, The Twentieth Century Reanalysis Project, Q. J. R. Meteorol. Soc. 137:1–28. doi: 10.1002/qj.776 (2011).

23. G. P. Compo, J. S. Whitaker and P. D. Sardeshmukh, Feasibility of a 100 year reanalysis using only surface pressure data. Bull. Am. Meteorol. Soc. 87:175–190, doi:10.1175/BAMS-87-2-175 (2006).

24. P. Aksornsingchai and C. Srinilta, Statistical Downscaling for Rainfall and Temperature Prediction in Thailand, Proceedings of the International MultiConference of Engineers and Computer Scientists, 2011, March 16 – 18, 2011, Hong Kong (2011).

25. D. S. Wilks, Forecast verification. Statistical Methods in the Atmospheric Sciences, Academic Press, p467 (1995).

26. K. Pearson, Mathematical contributions to the theory of evolution. III. Regression, heredity, and panmixia. Philosophical Transactions of the Royal Society Ser. A 187:253–318 (1896).

27. C. E. Spearman, The proof and measurement of association between two things. Am. J. Physiol. 15:72–101 (1904).

28. P. A. Rogerson, A Statistical Method for the Detection of Geographic Clustering. Geogr Anal 33(3):215–227 (2001).

29. P. Wessa, Free Statistics Software, Office for Research Development and Education, version 1.1.23-r7, URL http://www.wessa.net/ (2014).

30. M. Rajeevan, D. S. Pai and R. Anil Kumar, New statistical models for long range forecasting of south-west monsoon rainfall. Climate Dynamics. 28(7–8):813–828, doi:10.1007/s00382-006-0197-6 (2006).