Trivariate Frequency Analyses of Peak Discharge, Hydrograph Volume and Suspended Sediment Concentration Data Using Copulas
Nejc Bezak & Matjaž Mikoš & Mojca Šraj
Received: 9 January 2014 / Accepted: 26 March 2014 / Published online: 26 April 2014 © Springer Science+Business Media Dordrecht 2014
Abstract Copula functions are often used for multivariate frequency analyses, but discharge and suspended sediment concentrations have not yet been modelled together with the use of 3-dimensional copula functions. One hydrological station from Slovenia and five stations from the USA with watershed areas from 920 km² to 24,996 km² were used for trivariate frequency analyses of peak discharges, hydrograph volumes and suspended sediment concentrations. Different parametric marginal distributions were applied and their parameters were estimated with the method of L-moments. The maximum pseudo-likelihood method was used for copula parameter estimation. The most appropriate copula model was selected with the use of statistical and graphical tests. Symmetric and asymmetric versions of Archimedean copulas were applied according to the dependence characteristics of the individual stations. We selected the Gumbel-Hougaard copula as the most appropriate model for all discussed stations. Primary joint return periods OR and secondary Kendall’s return periods were calculated and the selected copula functions were compared. We conclude that copula functions are a useful mathematical tool, which can also be used for modelling the variables presented in this paper.
Keywords Multivariate analysis · Symmetric copulas · Asymmetric copulas · Flood frequency analysis · Suspended sediments
1 Introduction
Hydropower reservoir filling and turbine abrasion are major challenges for water managers dealing with water resources in many mountainous countries. Therefore, reliable procedures are needed to efficiently estimate suspended sediment loads. Furthermore, most of the suspended material is transported during a few extreme events, which usually coincide with high peak discharge values and consequently also with large hydrograph volumes. Frequency analyses are mostly performed in hydrology and water resources management to obtain the relationship between design variables and the recurrence interval. Therefore, copulas seem to be an interesting option for the simultaneous study of peak discharges (Q), hydrograph volumes (V) and suspended sediment concentrations (SSC).

Water Resour Manage (2014) 28:2195–2212, DOI 10.1007/s11269-014-0606-2

N. Bezak (*), M. Mikoš, M. Šraj: Faculty of Civil and Geodetic Engineering, University of Ljubljana, Jamova 2, SI-1000 Ljubljana, Slovenia
Copulas have become a frequently used mathematical tool for hydrological analyses and applications in the last decade. Copulas have been used for modelling droughts (Wong et al. 2010; Ganguli and Reddy 2012; Ma et al. 2013; Yusof et al. 2013), to check the adequacy of dam spillways (De Michele et al. 2005), for flood coincidence risk analyses (Chen et al. 2012), for rainfall analyses (Zhang and Singh 2007a; Balistrocchi and Bacchi 2011), for geostatistical interpolations (Bardossy and Li 2008; Bardossy 2011) and also for flood frequency analyses (Favre et al. 2004; Salvadori and De Michele 2004; Grimaldi and Serinaldi 2006; Zhang and Singh 2006, 2007b; Serinaldi and Grimaldi 2007; Wang et al. 2009; Reddy and Ganguli 2012; Sraj et al. 2014). Favre et al. (2004) and Salvadori and De Michele (2004) provided the basic theory for frequency analysis via copulas. Salvadori and De Michele (2004) defined different primary and secondary return periods, which are characteristic of multivariate frequency analysis. Favre et al. (2004) also presented some steps of the analysis, such as copula parameter estimation, marginal distribution selection, simulations with copulas, and graphical goodness of fit tests, which are part of the copula flood frequency analysis procedure. Most of the authors used copulas from the Archimedean family (Ali-Mikhail-Haq, Clayton, Frank, Gumbel-Hougaard, Joe) for bivariate or trivariate flood frequency analyses. Zhang and Singh (2006) used one-parameter Archimedean copulas for bivariate flood frequency analysis, while Zhang and Singh (2007b) used the Gumbel-Hougaard copula for trivariate flood frequency analysis. In both cases discharge series from North America were used. Wang et al. (2009) also used copula functions from the Archimedean family for flood frequency analysis at the confluences of river systems. This procedure can be used in areas where insufficient discharge data are available for analysis. Reddy and Ganguli (2012) applied the most frequently used Archimedean copulas (Ali-Mikhail-Haq, Clayton, Frank and Gumbel-Hougaard) for bivariate flood frequency analysis in India. Grimaldi and Serinaldi (2006) and Serinaldi and Grimaldi (2007) introduced asymmetric copulas to hydrological applications and investigated the impact of modelling an asymmetric sample with symmetric copulas. The authors found that the parameter of the symmetric copula is correlated with the smaller of the parameters of the asymmetric copulas. Therefore, a symmetric copula can underestimate the dependence between the most correlated variables (Grimaldi and Serinaldi 2006). In the symmetric case all dependences are thus described with one parameter, while in the asymmetric case more than one parameter is available to describe the dependences.
Suspended sediments are an important hydrological and environmental variable, which is correlated with soil erosion, ecological conditions of the watershed, conditions of streams, hydrotechnical works (Bonacci and Oskorus 2010) and also with the frequency of extreme rainfall events. It is a well-known fact that the majority of the suspended sediment load is transported during a few extreme events (Rodríguez-Blanco et al. 2010; Tena et al. 2011). From this point of view it seems reasonable to consider the corresponding suspended sediment concentration (SSC) events (defined based on peak discharge) in the flood frequency analysis. Frequency analysis of the hydrological variable SSC is not usually done, but some examples can be found in the literature (Tramblay et al. 2008, 2010; Benkhaled et al. 2013). Tramblay et al. (2008) made a frequency analysis of the annual maximum (AM) SSC series for more than 200 stations in North America. Different frequently used distributions were selected for analysis and the stationarity of the samples was also checked. Furthermore, Tramblay et al. (2010) carried out a regional frequency analysis of the SSC series in Californian rivers.
Benkhaled et al. (2013) performed a frequency analysis of the AM SSC series at the M’chounech gauging station on the Abiod wadi near Biskra in Algeria.
A copula function, which is a multivariate distribution function, is used to analyse discharge (Q), hydrograph volume (V) and SSC data in this paper. These three hydrological variables are usually not modelled simultaneously, and especially not with the use of copulas. Suspended sediment loads are usually correlated with peak discharge values and consequently also with hydrograph volumes, so these hydrological phenomena are multidimensional and can be analysed with the use of copulas, which can give additional information about the observed hydrological process.

The aim of this paper was to carry out frequency analyses of Q, V and SSC series from six stations with the use of 3-dimensional symmetric and asymmetric copula functions, where all required steps of the copula analysis are presented and explained on practical examples.
2 Data
For the analyses we used data from the Slovenian hydrological station Gornja Radgona on the Mura River and from five stations on rivers in the USA (Potomac-01638500, Delaware-01463500, Schuylkill-01470500, Juniata-01567000, and Iowa-05454500) (Table 1). The location of the selected stations is shown in Fig. 1. For all the considered stations daily discharge and suspended sediment concentration series were selected as the basis for the definition of the AM series samples. The main characteristics of the considered stations and samples are presented in Table 1, while the AM data samples are shown in Fig. 2.
The alpine nival-pluvial water regime, with most of the maximum discharges occurring in the summer, is characteristic of the Gornja Radgona hydrological station in Slovenia. Furthermore, the strength of seasonality r for the observed period was 0.56. If the seasonality coefficient r is near 1, the seasonality is strong, and if it is closer to 0, the timing of events is more complex and the seasonality is not significant (Burn 1997). The calculated r value for the Gornja Radgona station showed that seasonality was present, but not very strong. For the USA stations on the Delaware, Iowa, Juniata and Potomac rivers the majority of the annual maximum discharges happened in spring; however, some extreme events also happened in other parts of the year. For the Berne station on the Schuylkill River most of the annual maximum discharges occurred in winter. The seasonality coefficient r for the USA rivers varied between 0.47 for the Berne station on the Schuylkill River and 0.68 for the Newport station on the Juniata River. Watershed areas of the considered rivers were between 920 and 24,996 km², and mean AM discharge values were in the range between 233 and 3,210 m³/s; however, mean AM SSC values had smaller dispersion, extending from 265 to 716 g/m³ (Table 1). SSC measurements in Slovenia began in 1955 and almost 50 gauging stations were included in the measuring network; however, not many of them have long and continuous time series available (Bezak et al. 2013b). The USGS suspended sediment concentration database contains measurements from more than 1,500 stations around the USA with an average period of record of 3–5 years (Holtschlag 2001).

Table 1 Main characteristics of the considered annual maximum series samples

| River | Station | Watershed area [km²] | Period | Qmean; Qsd [m³/s] | Vmean; Vsd [10⁸ m³] | SSCmean; SSCsd [g/m³] |
| --- | --- | --- | --- | --- | --- | --- |
| Mura | Gornja Radgona, Slovenia | 10,197 | 1977–2005 | 619.1; 231.4 | 2.1; 1.2 | 621.0; 515.0 |
| Delaware | Trenton, New Jersey | 17,560 | 1950–1981 | 2,419.7; 1,238.1 | 8.9; 4.5 | 413.0; 301.3 |
| Iowa | Iowa City, Iowa | 8,472 | 1944–1986 | 273.7; 168.8 | 1.7; 1.4 | 684.1; 814.2 |
| Juniata | Newport, Pennsylvania | 8,687 | 1951–1984 | 1,169.0; 730.1 | 4.9; 3.5 | 332.1; 217.8 |
| Potomac | Point of Rocks, Maryland | 24,996 | 1961–1985 | 3,210.1; 1,788.1 | 11.1; 6.6 | 716.3; 478.8 |
| Schuylkill | Berne, Pennsylvania | 920 | 1950–1980 | 233.4; 137.8 | 0.7; 0.4 | 265.0; 246.6 |
Fig. 1 Location of the analysed stations, namely five stations in the USA and one station in Slovenia
Fig. 2 Presentation of the original AM data samples
Most of the analysed USA stations are located in the north-eastern part of the country; however, the Iowa River is part of the Mississippi River basin (Fig. 1). SSC values do not depend just on the hydraulic characteristics of the streams, but also on other anthropogenic (e.g. dam construction, other hydrotechnical works, land use, location of mines in watersheds) and natural (rainfall intensity, location of sediment sources) influences. Consequently, discharge values are not always a good indicator of SSC values, and Q-SSC curves (rating curves) should be used with caution when the scatter between Q and SSC is large (Rodríguez-Blanco et al. 2010). For example, for the Iowa station Kendall’s correlation coefficient between AM Q and AM SSC values was 0.1. The relationship between the Q and SSC values presented in Table 1 is therefore also not completely linear, namely high Q values do not necessarily correspond to high SSC values.
3 Methods
First, annual maximum discharges were defined and the corresponding hydrograph volumes and suspended sediment concentrations were extracted from the discharge and SSC series. Therefore, only the discharge peaks are definitely annual maxima, while hydrograph volumes and SSC values were defined based on the corresponding discharge values. To define the corresponding hydrograph volumes, baseflow was first separated from the daily discharge series. The R package lfstat (Koffler and Laaha 2012) was used to define baseflow values. Analyses and observations of hydrographs are useful for understanding numerous interacting processes within the catchment (Parajka et al. 2013). The Mann-Kendall (MK) test (Kendall 1975) was performed to detect the presence of trends in the selected samples and the Box-Pierce test (Box and Pierce 1970) was selected to test autocorrelation in the samples. However, because AM series were used in the study, autocorrelation in the samples was not expected. After testing the samples, univariate frequency analyses were carried out. Generalized extreme value (GEV), exponential (EXP), gamma (GAM), generalized Pareto (GPA), Gumbel (GUM), Pearson 3 (P3), log-Pearson 3 (LP3) and log-normal (LN) distributions were applied as marginal distributions of the Q, V and SSC samples. Parameters of the parametric distribution functions were estimated with the method of L-moments (Hosking and Wallis 1997). Non-parametric distributions (e.g. kernel density) could be an alternative to the chosen parametric distributions. The best fitting distribution function for each considered variable was selected based on different graphical tests, statistical tests and model selection criteria (Bezak et al. 2013a). The Kolmogorov-Smirnov (K-S) test, the root mean square error (RMSE) and mean absolute error (MAE) model selection criteria, and QQ plots were used for marginal distribution selection.
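As an illustration of the L-moment estimation step, the following minimal Python sketch fits a Gumbel distribution, one of the candidate marginals listed above, from the first two sample L-moments. The function names are hypothetical and the sketch is not the implementation used in the study; it assumes the standard Gumbel relations α = l2 / ln 2 and ξ = l1 − γα, where γ is Euler's constant.

```python
import math

def sample_l_moments(data):
    """Return the first two sample L-moments (l1, l2) via unbiased PWMs."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    # b1 = (1/n) * sum_j (j / (n - 1)) * x[j], with j the 0-based ascending rank
    b1 = sum(j * x[j] for j in range(n)) / (n * (n - 1))
    return b0, 2.0 * b1 - b0

def fit_gumbel_lmom(data):
    """Gumbel location xi and scale alpha from the sample L-moments."""
    l1, l2 = sample_l_moments(data)
    alpha = l2 / math.log(2.0)
    xi = l1 - 0.5772156649015329 * alpha  # Euler's constant times alpha
    return xi, alpha

def gumbel_quantile(xi, alpha, T):
    """Quantile of an annual-maximum variable for return period T [years]."""
    p = 1.0 - 1.0 / T
    return xi - alpha * math.log(-math.log(p))
```

A call such as `gumbel_quantile(*fit_gumbel_lmom(sample), 10)` would give the 10-year quantile of a sample; the other distributions considered in the study need their own L-moment relations.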
The first step of the copula approach was to assess the dependence between the modelled variables with different graphical and statistical tools. The Chi-plot (Fisher and Switzer 1985, 2001) and the K-plot (Genest and Boies 2003) were used. Likewise, Pearson, Kendall and Spearman correlation coefficients were calculated; the Pearson correlation coefficient measures only linear dependence, whereas the other two coefficients are based on ranks and are more appropriate for expressing dependence between variables. Copulas from the Archimedean family were used in this study: both symmetric and asymmetric versions of the Gumbel-Hougaard, Frank and Clayton copulas were applied to the AM series samples. Table 2 shows the selected trivariate copula functions, where θ is the parameter of the symmetric copulas, whereas θ1 and θ2 are the parameters of the asymmetric copula functions (θ2 > θ1). Trivariate asymmetric copula functions can be an alternative to the symmetric model when the dependence between two variables is stronger than the dependences between the other two pairs of variables. More information about these copula functions can be found in Joe (1997), Nelsen (1999) and Salvadori et al. (2007). Parameters of the selected copula functions were estimated with the maximum pseudo-likelihood method (Genest et al. 1995), where the log-likelihood has the form:
$$\log L(\theta) = \sum_{i=1}^{n} \log\{c_\theta(U_i)\} = \sum_{i=1}^{n} \log\left\{ c_\theta\left( \frac{R_{i,1}}{n+1}, \ldots, \frac{R_{i,d}}{n+1} \right) \right\}, \tag{1}$$
where cθ is the copula density, which can be calculated as the partial derivative of the copula functions defined in Table 2:
$$c_\theta(u_1, u_2, u_3) = \frac{\partial^3 C_\theta(u_1, u_2, u_3)}{\partial u_1 \, \partial u_2 \, \partial u_3}. \tag{2}$$
The maximum pseudo-likelihood method is a semiparametric approach and the pseudo-observation values always lie between 0 and 1 ([0,1]^d). The parameters of the theoretical copulas (Table 2) are estimated by numerical maximization of Eq. (1). Other parameter estimation techniques, such as the method of moments, maximum likelihood or inference functions for margins, are introduced in Joe (1997), Nelsen (1999) and Salvadori et al. (2007).
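The estimation in Eq. (1) can be sketched in a few lines of Python. The sketch below is only illustrative and makes several assumptions: it takes the pseudo-observations to be the within-variable ranks R divided by n + 1, as the arguments of Eq. (1) suggest, ignores ties, uses the symmetric trivariate Clayton copula because its density has a short closed form, and replaces a general-purpose optimizer with a golden-section search that presumes a single maximum. All function names are hypothetical.

```python
import math

def pseudo_observations(sample):
    """Map each column of `sample` (list of d-tuples) to ranks / (n + 1); ties are not handled."""
    n = len(sample)
    d = len(sample[0])
    u = [[0.0] * d for _ in range(n)]
    for j in range(d):
        order = sorted(range(n), key=lambda i: sample[i][j])
        for rank, i in enumerate(order, start=1):
            u[i][j] = rank / (n + 1)
    return u

def clayton3_log_density(u, theta):
    """Log density of the symmetric trivariate Clayton copula (theta > 0)."""
    s = sum(v ** (-theta) for v in u) - 2.0
    return (math.log(1.0 + theta) + math.log(1.0 + 2.0 * theta)
            - (1.0 + theta) * sum(math.log(v) for v in u)
            - (3.0 + 1.0 / theta) * math.log(s))

def log_pseudo_likelihood(theta, u_all):
    """Eq. (1): sum of log copula densities over all pseudo-observations."""
    return sum(clayton3_log_density(u, theta) for u in u_all)

def fit_clayton3(u_all, lo=0.01, hi=20.0, tol=1e-6):
    """Golden-section search for the theta that maximizes Eq. (1) on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if log_pseudo_likelihood(c, u_all) > log_pseudo_likelihood(d, u_all):
            b = d
        else:
            a = c
        c, d = b - g * (b - a), a + g * (b - a)
    return 0.5 * (a + b)
```

The search bounds `lo` and `hi` are arbitrary choices for the sketch; the asymmetric copulas of Table 2 would need a two-parameter search and their own densities from Eq. (2).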
Graphical and statistical tests were used in the study to check the adequacy of the selected copula functions for modelling the 3-dimensional sample (Q, V and SSC). We applied the Cramér-von Mises test (Genest et al. 2009):
$$S_n = \sum_{i=1}^{n} \{C_n(U_i) - C_\theta(U_i)\}^2, \tag{3}$$
where the vectors Ui are the pseudo-observations calculated from the analysed sample, Cθ is the tested theoretical copula (Table 2) and Cn is the empirical copula, which is defined as (Genest et al. 2009):
$$C_n(u) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(U_i \le u). \tag{4}$$
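Equations (3) and (4) translate almost directly into code. The following Python sketch, with hypothetical function names, computes the empirical copula and the statistic Sn for any copula passed in as a function; the symmetric Clayton copula is included only as an example of a fitted Cθ. It is a sketch of the statistic alone and does not include the bootstrap needed for p-values.

```python
def empirical_copula(u, u_all):
    """Eq. (4): share of pseudo-observations that are <= u in every component."""
    n = len(u_all)
    count = sum(1 for ui in u_all if all(a <= b for a, b in zip(ui, u)))
    return count / n

def cramer_von_mises(u_all, copula_cdf):
    """Eq. (3): S_n = sum_i {C_n(U_i) - C_theta(U_i)}^2."""
    return sum((empirical_copula(ui, u_all) - copula_cdf(ui)) ** 2 for ui in u_all)

def clayton3_cdf(u, theta):
    """Symmetric trivariate Clayton copula (theta > 0), used here as an example C_theta."""
    return (sum(v ** (-theta) for v in u) - 2.0) ** (-1.0 / theta)
```

For a fitted parameter `theta_hat`, the statistic would be obtained as `cramer_von_mises(u_all, lambda u: clayton3_cdf(u, theta_hat))`.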
According to Genest et al. (2009), the Cramér-von Mises test defined by Eq. (3) is the most powerful goodness of fit test based on the empirical process. P-values for the Cramér-von Mises test can be calculated with the parametric bootstrap procedure defined by Genest and Remillard (2008) or by a somewhat faster procedure based on the multiplier central limit theorem (Kojadinovic et al. 2011).

Table 2 Applied trivariate symmetric and asymmetric Archimedean copulas

| Copula | $C_\theta(u_1,u_2,u_3)$ or $C_{\theta_1}(u_3, C_{\theta_2}(u_1,u_2))$ |
| --- | --- |
| Symmetric Gumbel-Hougaard | $\exp\left\{-\left[(-\ln u_1)^{\theta} + (-\ln u_2)^{\theta} + (-\ln u_3)^{\theta}\right]^{1/\theta}\right\}$ |
| Symmetric Frank | $-\frac{1}{\theta}\ln\left\{1 + \frac{(\exp(-\theta u_1)-1)(\exp(-\theta u_2)-1)(\exp(-\theta u_3)-1)}{(\exp(-\theta)-1)^2}\right\}$ |
| Symmetric Clayton | $\left(u_1^{-\theta} + u_2^{-\theta} + u_3^{-\theta} - 2\right)^{-1/\theta}$ |
| Asymmetric Gumbel-Hougaard-M6 | $\exp\left\{-\left(\left[(-\ln u_1)^{\theta_2} + (-\ln u_2)^{\theta_2}\right]^{\theta_1/\theta_2} + (-\ln u_3)^{\theta_1}\right)^{1/\theta_1}\right\}$ |
| Asymmetric Frank-M3 | $-\frac{1}{\theta_1}\ln\left\{1 - (1-\exp(-\theta_1))^{-1}\left(1 - \left[1 - (1-\exp(-\theta_2))^{-1}(1-\exp(-u_1\theta_2))(1-\exp(-u_2\theta_2))\right]^{\theta_1/\theta_2}\right)(1-\exp(-u_3\theta_1))\right\}$ |
| Asymmetric Clayton-M4 | $\left[\left(u_1^{-\theta_2} + u_2^{-\theta_2} - 1\right)^{\theta_1/\theta_2} + u_3^{-\theta_1} - 1\right]^{-1/\theta_1}$ |
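To make the structure of the copulas in Table 2 concrete, the sketch below codes the symmetric Gumbel-Hougaard copula and its asymmetric (M6) counterpart in Python. The function names are hypothetical. The example relies on two properties that follow from the formulas: the asymmetric version collapses to the symmetric one when θ1 = θ2, and θ = 1 gives the independence copula.

```python
import math

def gumbel3_cdf(u1, u2, u3, theta):
    """Symmetric trivariate Gumbel-Hougaard copula (theta >= 1)."""
    s = sum((-math.log(u)) ** theta for u in (u1, u2, u3))
    return math.exp(-s ** (1.0 / theta))

def gumbel3_m6_cdf(u1, u2, u3, theta1, theta2):
    """Asymmetric Gumbel-Hougaard copula M6: u1 and u2 are nested with theta2 >= theta1 >= 1."""
    inner = ((-math.log(u1)) ** theta2 + (-math.log(u2)) ** theta2) ** (theta1 / theta2)
    return math.exp(-(inner + (-math.log(u3)) ** theta1) ** (1.0 / theta1))
```

In the asymmetric form the pair (u1, u2) carries the stronger dependence θ2, while u3 is tied to that pair through the weaker θ1, which is why this form suits a sample in which one pair of variables is more strongly correlated than the other two.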
The next step of the copula frequency analysis approach was to calculate some primary and secondary return periods. The primary joint return period OR is defined by the following expression (Salvadori et al. 2007):
$$T_{OR} = \frac{\mu}{1 - C_\theta(u)}, \tag{5}$$
where μ is the mean interarrival time between two consecutive events. The secondary return period, called Kendall’s return period, is defined as (Salvadori et al. 2011):
$$T_x^{>} = \frac{\mu}{1 - K_C(t)}, \tag{6}$$
where KC is Kendall’s distribution function associated with the copula function Cθ. This notation of the return period is meaningful because the critical layer associated with Kendall’s return period partitions R^d into three regions: the sub-critical region (R_t^<), the super-critical region (R_t^>), and the critical layer (Salvadori et al. 2011). An advantage of this approach, compared to the OR methodology, is that all realizations lying over the critical layer, which is defined by the selected t value (Eq. (6)), have the same Kendall’s return period value (Salvadori et al. 2011). Salvadori et al. (2011) provided an algorithm, based on copula simulations, for the calculation of KC, in which the only condition is that the copula function is available in parametric form (Table 2). For Archimedean copulas KC can be calculated as:
$$K_C(t) = t - \frac{\varphi(t)}{\varphi'(t^{+})}, \tag{7}$$
where φ′(t⁺) is the right derivative of the generating function φ(t), which corresponds to the chosen Archimedean copula (Vandenberghe et al. 2011).
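A short Python sketch of Eqs. (5) to (7) for the Gumbel-Hougaard copula is given below. It assumes the generator φ(t) = (−ln t)^θ, for which Eq. (7) reduces to K_C(t) = t − t ln t / θ, and μ = 1 year for annual maxima. The function names and the parameter value θ = 2 used in the usage note are hypothetical and are not taken from the study, which reports evaluating K_C with the simulation algorithm of Salvadori et al. (2011).

```python
import math

def gumbel3_cdf(u, theta):
    """Symmetric trivariate Gumbel-Hougaard copula at u = (u1, u2, u3)."""
    s = sum((-math.log(v)) ** theta for v in u)
    return math.exp(-s ** (1.0 / theta))

def t_or(u, theta, mu=1.0):
    """Eq. (5): primary joint return period OR."""
    return mu / (1.0 - gumbel3_cdf(u, theta))

def kendall_gumbel_eq7(t, theta):
    """Eq. (7) evaluated with phi(t) = (-ln t)^theta and its derivative."""
    phi = (-math.log(t)) ** theta
    dphi = -theta * (-math.log(t)) ** (theta - 1.0) / t
    return t - phi / dphi

def kendall_return_period(t, theta, mu=1.0):
    """Eq. (6): Kendall's return period for the level t."""
    return mu / (1.0 - kendall_gumbel_eq7(t, theta))
```

With all three margins at their 10-year quantiles, `t_or([0.9, 0.9, 0.9], 2.0)` is below 10 years, because the OR event occurs as soon as any one of the three variables exceeds its threshold.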
4 Results
In this section the complete procedure (marginal distribution selection, copula model definition, and multivariate return periods) of performing frequency analysis via copulas is presented. Results are mostly presented in tables, whereas individual steps of the frequency analysis are also shown in graphical form.
4.1 Marginal Distributions Selection
To define samples of maximum discharges, hydrograph volumes and suspended sediment concentrations, baseflow was extracted from the daily discharge series using the R package lfstat (Koffler and Laaha 2012). Hydrograph volumes were defined based on the baseflow separation results. So, besides maximum discharge values, the corresponding hydrograph volumes and suspended sediment concentration values were used in the analyses (Table 1).
The extracted samples were checked with the Mann-Kendall test for stationarity and with the Box-Pierce test for autocorrelation of the individual samples. For the stations on the Mura, Delaware, Schuylkill, Juniata and Potomac rivers all samples were stationary (significance level 0.01) and none of them demonstrated statistically significant (0.01) autocorrelation. For the station on the Iowa River the SSC series indicated a clear negative trend (Mann-Kendall test), which was statistically significant (0.01). Due to the presence of a statistically significant trend this station was not used for further analysis. The statistically significant negative trend could be explained by the construction of the Coralville Dam in 1958 (located upstream of the Iowa City station). Dam construction influences SSC values, because the transport of sediments is interrupted.
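The trend check described above can be sketched as follows. This is a minimal Python version of the Mann-Kendall test under simplifying assumptions: no correction for ties, the normal approximation for the statistic S, and a two-sided p-value. The function name is hypothetical and the sketch is not the software used in the study.

```python
import math

def mann_kendall(x):
    """Return (S, Z, two-sided p-value) of the Mann-Kendall trend test (no tie correction)."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            # +1 for an increase between times i and j, -1 for a decrease, 0 for a tie
            s += (x[j] > x[i]) - (x[j] < x[i])
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return s, z, p
```

A series would be flagged as non-stationary at the 0.01 level when the returned p-value is below 0.01, and the sign of S indicates the direction of the trend.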
With the use of the RMSE and MAE model selection criteria and the Kolmogorov-Smirnov test we selected the marginal distributions that gave the best fit to the AM series samples. The selected distribution functions for the individual variables of the considered stations are shown in Table 3. None of the distribution functions shown in Table 3 could be rejected (Kolmogorov-Smirnov test) at the significance level 0.05. Final results were also checked with graphical QQ plots (Fig. 3). In all cases logarithmic or extreme value distributions were selected as the most appropriate for describing the individual variables.
Table 3 Distribution functions which were selected as marginal distributions

| River | Q | V | SSC |
| --- | --- | --- | --- |
| Mura | Gumbel | Log-normal | GEV |
| Delaware | Log-Pearson III | Log-normal | Log-Pearson III |
| Juniata | Log-normal | Gumbel | GEV |
| Potomac | GEV | Gumbel | GEV |
| Schuylkill | Log-normal | Gumbel | GEV |
Fig. 3 Example of QQ plots for the selected marginal distributions for the Gornja Radgona station on the Mura River (panels a–c)
4.2 Copula Model Definition
After the marginal distribution functions were defined, the dependence between the separate variables (Q, V, and SSC) was assessed. We calculated Kendall’s correlation coefficient values for all three pairs of variables (Table 4). In three of the five stations the correlation between Q and V was higher than the correlation between Q and SSC; the correlations between Q and V and between Q and SSC were always higher than the correlation between V and SSC. Kendall’s correlation coefficients of the considered rivers were between 0.13 and 0.61 (Table 4). For the stations on the Mura, Delaware, and Potomac rivers all correlation coefficient values were statistically significant (significance level 0.05). For the stations on the Juniata and Schuylkill rivers the correlations between V and SSC were not statistically significant (0.05). For the Iowa City station only the correlation between Q and V was statistically significant (0.05); this station was in any case not used for further analysis due to the presence of a trend in the SSC sample. We also used the Chi-plot, the K-plot, and the scatter plot of pseudo-observations to evaluate the dependence between variables (Fig. 4).
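Kendall’s correlation coefficient for a pair of variables can be computed from concordant and discordant pairs, as in the Python sketch below (tau-a, no tie correction). The second function uses the relation τ = 1 − 1/θ, which holds for the bivariate Gumbel-Hougaard copula, to obtain a rough parameter value from τ; this is shown only as an illustration and is an assumption of the sketch, since the study estimates parameters by maximum pseudo-likelihood. Both function names are hypothetical.

```python
def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) pairs divided by n(n-1)/2."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx = (x[j] > x[i]) - (x[j] < x[i])
            dy = (y[j] > y[i]) - (y[j] < y[i])
            s += dx * dy
    return 2.0 * s / (n * (n - 1))

def gumbel_theta_from_tau(tau):
    """Bivariate Gumbel-Hougaard parameter implied by tau (requires 0 <= tau < 1)."""
    return 1.0 / (1.0 - tau)
```

Because the statistic depends only on the ordering of the values, it gives the same result for the raw data and for the pseudo-observations.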
Kendall’s correlation coefficients, K-plots and Chi-plots (Fig. 4) were also used to define the most appropriate copula for each gauging station. For the station on the Juniata River we decided to use asymmetric copulas, whereas for the other four stations symmetric versions of the Archimedean copulas were used (Table 2). For the Gornja Radgona (Mura), Trenton (Delaware), and Point of Rocks (Potomac) stations symmetric copulas were chosen because all correlation coefficients (Table 4) were statistically significant (significance level 0.05). For the Berne (Schuylkill) station the correlation for two pairs of variables (Q-V and Q-SSC) was larger than the correlation between hydrograph volumes and suspended sediment concentrations (V-SSC). Asymmetric copulas are mostly used in hydrology when the correlation for one pair of variables is larger than for the other two pairs (Grimaldi and Serinaldi 2006); therefore we decided to use symmetric copulas also for the Berne station on the Schuylkill River, whereas for the Newport station on the Juniata River asymmetric copulas were used (Table 2).
The next step of our analyses was to estimate the parameters of the selected copula functions. The maximum pseudo-likelihood method (Eq. (1)) was used for this purpose. The statistical goodness of fit test (Cramér-von Mises), which is defined by Eq. (3), was used to test each copula function. The parametric bootstrap procedure was selected for the calculation of p-values. For the Newport station we tested asymmetric copulas, whereas for the other four stations we carried out the Cramér-von Mises test for the symmetric versions of the Archimedean copula functions presented in Table 2. None of the copula models could be rejected by the Cramér-von Mises test at the previously mentioned significance level (0.05). Based on the Cramér-von Mises test and graphical goodness of fit tests the most appropriate copula models for each station were determined. Figure 5 shows the graphical goodness of fit test (type I) for the asymmetric Gumbel-Hougaard copula from the Archimedean family for the station on the Juniata River. Results of the graphical goodness of fit test (type II) for the symmetric Gumbel-Hougaard copula for the station on the Delaware River are presented in Fig. 6.
Table 4 Kendall’s correlation coefficient values for pairs of considered variables

| River | Sample size | Q-V | Q-SSC | V-SSC |
| --- | --- | --- | --- | --- |
| Mura | 29 | 0.56 | 0.50 | 0.40 |
| Delaware | 32 | 0.40 | 0.48 | 0.25 |
| Juniata | 34 | 0.43 | 0.34 | 0.13 |
| Potomac | 25 | 0.61 | 0.53 | 0.29 |
| Schuylkill | 31 | 0.43 | 0.43 | 0.19 |
Fig. 4 Example of dependence assessment for the Point of Rocks station on the Potomac River (panels a–f)
4.3 Multivariate Return Periods
Fig. 5 Graphical goodness of fit test I for the asymmetric Gumbel-Hougaard copula for the Newport station on the Juniata River (panels: Q–V, Q–SSC, V–SSC and Gumbel-Hougaard copula; axes: Q [m³/s], V [m³], SSC [g/m³])

The next step of the frequency analyses via copulas was to determine different primary and secondary return periods. For completely defined copula models we calculated primary return periods called OR (Eq. (5)) and secondary Kendall’s return periods (Eq. (6)). For comparison purposes we evaluated the selected return periods (OR and Kendall) for all considered copula models that were defined for each station. The joint return period OR was calculated based on the quantile values (QT, VT, and SSCT) of the considered variables that correspond to the univariate return periods of 10 and 100 years (Table 5). Quantile values were determined based on the selected marginal distribution functions (Table 3). The Gumbel-Hougaard and Frank copulas always gave higher joint return period (OR) values than the Clayton copula, and the Gumbel-Hougaard copula gave higher return period OR values than the Frank copula. Figure 7 shows the primary joint return period OR for the Gumbel-Hougaard copula, which was selected as the most appropriate for modelling Q, V and SSC for all considered stations. Furthermore, results in Fig. 7 are presented for various V values, depending on the station. Secondary Kendall’s return period values for the t value of 0.9 (Eq. (6)) are presented in Table 6. Return period values were calculated with the algorithm proposed by Salvadori et al. (2011), where m = 10⁷ was used for simulations. Differences among the three considered copulas were relatively large: the Clayton copula gave higher Kendall’s return period values than the Frank and Gumbel-Hougaard copulas, and the Frank copula gave higher Kendall’s return period values than the Gumbel-Hougaard copula. The asymmetric copulas used for the Newport station on the Juniata River gave higher Kendall’s return period values than the symmetric versions of the copulas applied to the other stations.
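A simulation-based estimate of Kendall’s return period, in the spirit of the algorithm cited above, can be sketched as follows: draw m triples from the copula, evaluate the copula at each triple, and take the share of values not exceeding t as an estimate of K_C(t). The Python sketch below uses the symmetric trivariate Clayton copula because it can be sampled with a simple frailty construction (a gamma variable shared by three independent exponential variables). The function names, the default m = 20000 and the fixed seed are assumptions of this sketch, which is far smaller than the simulation size reported in the study.

```python
import random

def sample_clayton3(theta, rng):
    """One draw (u1, u2, u3) from the symmetric Clayton copula via a gamma frailty."""
    v = rng.gammavariate(1.0 / theta, 1.0)
    return [(1.0 + rng.expovariate(1.0) / v) ** (-1.0 / theta) for _ in range(3)]

def clayton3_cdf(u, theta):
    """Symmetric trivariate Clayton copula (theta > 0)."""
    return (sum(w ** (-theta) for w in u) - 2.0) ** (-1.0 / theta)

def kendall_function_mc(t, theta, m=20000, seed=1):
    """Monte Carlo estimate of K_C(t) = P(C(U) <= t)."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(m)
               if clayton3_cdf(sample_clayton3(theta, rng), theta) <= t)
    return hits / m

def kendall_return_period_mc(t, theta, mu=1.0, m=20000, seed=1):
    """Eq. (6) with K_C replaced by its simulated estimate."""
    return mu / (1.0 - kendall_function_mc(t, theta, m, seed))
```

Because 1 − K_C(t) is small for t = 0.9, the estimate rests on only a few simulated exceedances unless m is large, which is one reason for using a large simulation size.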
5 Discussion
From Fig. 5 we can see that different pairs of variables behave differently, which is a consequence of using asymmetric copulas. However, not all dependences are modelled separately (Grimaldi and Serinaldi 2006). In the case of symmetric copulas all pairs of variables are modelled with the same copula parameter, which can in some cases lead to a loss of information. An example of a symmetric copula is shown in Fig. 6, where the symmetric Gumbel-Hougaard copula was used. Data were transformed to the real space with the use of marginal distribution functions. This type of graphical goodness of fit test was also used by Genest and Favre (2007).

Fig. 6 Graphical goodness of fit test II for the symmetric Gumbel-Hougaard copula for the Trenton station on the Delaware River (panels a–d: Q–V, Q–SSC, V–SSC, Gumbel-Hougaard copula; axes: Q [m³/s], V [m³], SSC [g/m³])

Table 5 Quantile values for univariate return period 10 years and joint return periods OR [years] for the considered stations

| River | Copula type | QT [m³/s] | VT [10⁸ m³] | SSCT [g/m³] | Gumbel-Hougaard [years] | Frank [years] | Clayton [years] |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Mura | Symmetric | 935 | 3.7 | 1,292 | 5.7 | 4.6 | 4.0 |
| Delaware | Symmetric | 3,705 | 14.9 | 792 | 5.4 | 4.4 | 4.0 |
| Juniata | Asymmetric | 1,793 | 9.0 | 586 | 5.2 | 4.3 | 4.0 |
| Potomac | Symmetric | 5,472 | 19.8 | 1,226 | 5.8 | 4.7 | 4.3 |
| Schuylkill | Symmetric | 404 | 1.3 | 530 | 5.2 | 4.4 | 4.1 |

Fig. 7 Joint return period OR [in years] for various V values for the considered stations
Different relationships between some primary, secondary, and also univariate return periods can be observed (Salvadori et al. 2007; Vandenberghe et al. 2011). From Table 5 we can see that the univariate return period is always higher than the joint return period OR, which is in agreement with the findings of Salvadori et al. (2007).
For each of the considered stations we used three different copula functions from the Archimedean family. None of the copula models could be rejected at the chosen significance level (0.05). It should be noted that the sample sizes were relatively small, which can influence the calculation of the p-values. To distinguish between copulas we used graphical goodness of fit tests. We selected the Gumbel-Hougaard copula as the most appropriate for modelling peak discharges, hydrograph volumes and suspended sediment concentrations in all cases. This copula was also selected as the most appropriate in some other hydrological applications (Zhang and Singh 2006; Poulin et al. 2007). It was selected also because the Frank and Clayton copulas can underestimate the risk of an event, as they cannot model the tail dependence efficiently (Poulin et al. 2007). This can also be seen from Table 6, where Kendall’s return periods are presented. The Clayton and Frank copulas gave higher return period values than the Gumbel-Hougaard copula for all considered stations and the differences among the calculated return periods were relatively large. Similar conclusions were also made by Poulin et al. (2007) for the calculation of the primary joint return period called AND.

Fig. 8 Generated copula values transformed into real space with the use of marginal distribution functions (histogram) and the kernel density (line) for the Newport station on the Juniata River (panels a–c)

Table 6 Kendall’s return period [years] for t (Eq. (6)) value of 0.9 for the considered stations

| River | Copula type | Gumbel-Hougaard | Frank | Clayton |
| --- | --- | --- | --- | --- |
| Mura | Symmetric | 28.8 | 174.3 | 850.9 |
| Delaware | Symmetric | 33.6 | 261.8 | 902.4 |
| Juniata | Asymmetric | 41.9 | 356.8 | 974.8 |
| Potomac | Symmetric | 27.1 | 149.2 | 433.7 |
| Schuylkill | Symmetric | 37.9 | 290.4 | 887.9 |
After we selected the most appropriate marginal distribution functions and copula model, large samples (10,000) of all three variables were generated with the selected copula model, and these triples of variables were then transformed from the copula space ([0,1]) to the real space with the use of the marginal distributions. Figure 8 shows one result of these simulations for the asymmetric Gumbel-Hougaard copula for the station on the Juniata River. Kernel density estimation was used to fit the data and the maximum of the kernel density function was selected as the value of the variable that is most likely to occur as the annual maximum of the individual variable. This procedure was carried out for all three considered variables (Q, V, and SSC). Table 7 shows the results of these simulations for different copula functions, using the same marginal functions for each station. One can notice that the copula with the highest estimated values is not always the same and that results vary from station to station. The most likely AM values (Table 7) are correlated with the sample (Q, V and SSC) mean and standard deviation values (Table 1). This was not a surprise, because higher values of Q and V are expected in rivers with larger discharges.
6 Conclusions
This paper presents a trivariate frequency analysis of Q, V and SSC with the use of copula functions, which can be used for multivariate modelling. Several parametric distributions were used as marginal distribution functions. These three variables (Q, V and SSC) are usually not considered simultaneously, and especially not with the use of copulas. Six hydrological stations from Slovenia and the USA with watershed areas from 920 km² to 24,996 km² were considered, and in total almost 200 years of AM data were analysed. Mean AM SSC values ranged between 265 and 716 g/m³, whereas mean AM discharge values had a larger range, from 233 to 3,210 m³/s. Due to the statistically significant negative trend of the SSC series for the Iowa River, this station was not used for further copula analyses. Symmetric and asymmetric Archimedean copulas were used based on the dependences among variables. After
Table 7 Comparison between most likely AM values for different copula functions
Copula      Gumbel-Hougaard                     Clayton                             Frank
River       Q [m³/s]  V [10⁸ m³]  SSC [g/m³]    Q [m³/s]  V [10⁸ m³]  SSC [g/m³]    Q [m³/s]  V [10⁸ m³]  SSC [g/m³]
Mura        506       1.4         325           508       1.2         376           494       1.4         341
Delaware    1,966     6.2         206           2,166     6.1         183           2,149     5.9         169
Juniata     914       3.6         232           826       3.6         212           898       3.7         199
Potomac     2,223     8.3         479           2,247     7.7         507           2,164     8.0         502
Schuylkill  155       0.56        129           157       0.55        129           160       0.62        127
the complete procedure of frequency analyses was carried out, some main conclusions can be made:
a) Copulas can be used for modelling peak discharges, hydrograph volumes and suspended sediment concentrations.
b) The Gumbel-Hougaard copula was selected as the most appropriate copula for modelling all three pairs of variables for all considered stations.
c) Asymmetric copulas have an advantage over the symmetric versions because they have more parameters, but symmetric copula functions can still be used in cases where the dependences are similar.
d) Different primary and secondary return periods can be computed if they are needed in practical applications (e.g. design).
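As an illustration of point d), the primary OR return period, T_OR = mu / (1 - C(u, v, w)), and the secondary Kendall's return period, T_K = mu / (1 - K_C(t)) with t = C(u, v, w), can be sketched for the symmetric trivariate Gumbel-Hougaard copula. This is a minimal sketch under several assumptions: mu = 1 year (annual maxima); the closed form of K_C is obtained here from the general expression for Archimedean copulas and should be checked before any practical use; the asymmetric (nested) versions applied at some stations are not covered; the function names, the parameter theta = 2 and the non-exceedance probabilities 0.99 are chosen only for illustration.

```python
import math

def gumbel_copula_cdf(u, v, w, theta):
    """Symmetric trivariate Gumbel-Hougaard copula C(u, v, w), theta >= 1.
    Inputs are marginal non-exceedance probabilities in (0, 1)."""
    total = sum((-math.log(p)) ** theta for p in (u, v, w))
    return math.exp(-total ** (1.0 / theta))

def kendall_function(t, theta):
    """K_C(t) = P[C(U, V, W) <= t] for the symmetric trivariate
    Gumbel-Hougaard copula (assumed closed form, see the lead-in)."""
    a = 1.0 / theta
    s = -math.log(t)
    return t * (1.0 + a * s + 0.5 * (a * a * s * s + a * (1.0 - a) * s))

def return_periods(u, v, w, theta, mu=1.0):
    """Primary OR and secondary Kendall's return periods (same unit as mu)."""
    t = gumbel_copula_cdf(u, v, w, theta)
    t_or = mu / (1.0 - t)
    t_kendall = mu / (1.0 - kendall_function(t, theta))
    return t_or, t_kendall

# Illustrative inputs (assumptions, not values from this study)
t_or, t_k = return_periods(0.99, 0.99, 0.99, theta=2.0)
print(f"T_OR = {t_or:.1f} years, T_Kendall = {t_k:.1f} years")
```

Since K_C(t) is never smaller than t, Kendall's return period is never shorter than the OR return period for the same event.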
Acknowledgement We wish to thank the Environmental Agency of the Republic of Slovenia (ARSO) for data provision. We would also like to express our thanks to the United States Geological Survey (USGS) for making the hydrological data available to the public on their web site. The results of the study are part of the Faculty of Civil and Geodetic Engineering (UL FGG) work on the Slovenian national research project J2-4096 and on the international research project SedAlp, which is financed by the European Union through the Alpine Space program. The critical and useful comments of three anonymous reviewers and the associate editor helped to improve this manuscript, for which the authors are very grateful.
References
Balistrocchi M, Bacchi B (2011) Modelling the statistical dependence of rainfall event variables through copula functions. Hydrol Earth Syst Sci 15:1959–1977. doi:10.5194/hess-15-1959-2011
Bardossy A (2011) Interpolation of groundwater quality parameters with some values below the detection limit. Hydrol Earth Syst Sci 15:2763–2775. doi:10.5194/hess-15-2763-2011
Bardossy A, Li J (2008) Geostatistical interpolation using copulas. Water Resour Res 44. doi:10.1029/2007wr006115
Benkhaled A, Higgins H, Chebana F, Necir A (2013) Frequency analysis of annual maximum suspended sediment concentrations in Abiod wadi, Biskra (Algeria). Hydrol Process. doi:10.1002/hyp.9880
Bezak N, Brilly M, Sraj M (2013a) Comparison between the peaks over threshold method and the annual maximum method for flood frequency analyses. Hydrol Sci J. doi:10.1080/02626667.2013.831174
Bezak N, Sraj M, Mikos M (2013b) Overview of suspended sediments measurements in Slovenia and an example of data analysis. Gradbeni Vestnik 62:274–280 (In Slovene)
Bonacci O, Oskorus D (2010) The changes in the lower Drava River water level, discharge and suspended sediment regime. Environ Earth Sci 59:1661–1670. doi:10.1007/s12665-009-0148-8
Box GEP, Pierce DA (1970) Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. J Am Stat Assoc 65:1509–1526. doi:10.1080/01621459.1970.10481180
Burn DH (1997) Catchment similarity for regional flood frequency analysis using seasonality measures. J Hydrol 202:212–230. doi:10.1016/s0022-1694(97)00068-1
Chen L, Singh VP, Guo SL, Hao ZC, Li TY (2012) Flood coincidence risk analysis using multivariate copula functions. J Hydrol Eng 17:742–755. doi:10.1061/(asce)he.1943-5584.0000504
De Michele C, Salvadori G, Canossi M, Petaccia A, Rosso R (2005) Bivariate statistical approach to check adequacy of dam spillway. J Hydrol Eng 10:50–57. doi:10.1061/(asce)1084-0699(2005)10:1(50)
Favre AC, El Adlouni S, Perreault L, Thiemonge N, Bobee B (2004) Multivariate hydrological frequency analysis using copulas. Water Resour Res 40. doi:10.1029/2003wr002456
Fisher NI, Switzer P (1985) Chi-plots for assessing dependence. Biometrika 72:253–265. doi:10.1093/biomet/72.2.253
Fisher NI, Switzer P (2001) Graphical assessment of dependence: is a picture worth 100 tests? Am Stat 55:233–239. doi:10.1198/000313001317098248
Ganguli P, Reddy MJ (2012) Risk assessment of droughts in Gujarat using bivariate copulas. Water Resour Manag 26:3301–3327. doi:10.1007/s11269-012-0073-6
Genest C, Boies JC (2003) Detecting dependence with Kendall plots. Am Stat 57:275–284. doi:10.1198/0003130032431
Genest C, Favre AC (2007) Everything you always wanted to know about copula modeling but were afraid to ask. J Hydrol Eng 12:347–368. doi:10.1061/(asce)1084-0699(2007)12:4(347)
Genest C, Remillard B (2008) Validity of the parametric bootstrap for goodness-of-fit testing in semiparametric models. Ann Instit Henri Poincare Probabilites Stat 44:1096–1127. doi:10.1214/07-aihp148
Genest C, Ghoudi K, Rivest LP (1995) A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika 82:543–552. doi:10.1093/biomet/82.3.543
Genest C, Remillard B, Beaudoin D (2009) Goodness-of-fit tests for copulas: a review and a power study. Insur Math Econ 44:199–213. doi:10.1016/j.insmatheco.2007.10.005
Grimaldi S, Serinaldi F (2006) Asymmetric copula in multivariate flood frequency analysis. Adv Water Resour 29:1155–1167. doi:10.1016/j.advwatres.2005.09.005
Holtschlag DJ (2001) Optimal estimation of suspended-sediment concentrations in streams. Hydrol Process 15:1133–1155. doi:10.1002/hyp.207
Hosking JRM, Wallis JR (1997) Regional frequency analysis: an approach based on L-moments. Cambridge University Press, Cambridge
Joe H (1997) Multivariate models and dependence concepts. Chapman & Hall, London
Kendall MG (1975) Multivariate analysis. Griffin, London
Koffler D, Laaha G (2012) LFSTAT - an R-package for low-flow analysis. EGU General Assembly, Vienna 22–27.4
Kojadinovic I, Yan J, Holmes M (2011) Fast large-sample goodness-of-fit tests for copulas. Stat Sin 21:841–871. doi:10.1007/s11222-009-9142-y
Ma MW, Song SB, Ren LL, Jiang SH, Song JL (2013) Multivariate drought characteristics using trivariate Gaussian and Student t copulas. Hydrol Process 27:1175–1190. doi:10.1002/hyp.8432
Nelsen RB (1999) An introduction to copulas. Springer, New York
Parajka J, Viglione A, Rogger M, Salinas JL, Sivapalan M, Bloschl G (2013) Comparative assessment of predictions in ungauged basins – part 1: runoff-hydrograph studies. Hydrol Earth Syst Sci 17:1783–1795. doi:10.5194/hess-17-1783-2013
Poulin A, Huard D, Favre AC, Pugin S (2007) Importance of tail dependence in bivariate frequency analysis. J Hydrol Eng 12:394–403. doi:10.1061/(asce)1084-0699(2007)12:4(394)
Reddy MJ, Ganguli P (2012) Bivariate flood frequency analysis of Upper Godavari River flows using Archimedean copulas. Water Resour Manag 26:3995–4018. doi:10.1007/s11269-012-0124-z
Rodríguez-Blanco ML, Taboada-Castro MM, Palleiro L, Taboada-Castro MT (2010) Temporal changes in suspended sediment transport in an Atlantic catchment, NW Spain. Geomorphology 123:181–188. doi:10.1016/j.geomorph.2010.07.015
Salvadori G, De Michele C (2004) Frequency analysis via copulas: Theoretical aspects and applications to hydrological events. Water Resour Res 40. doi:10.1029/2004wr003133
Salvadori G, De Michele C, Kottegoda NT, Rosso R (2007) Extremes in nature: an approach using copulas. Springer, Dordrecht
Salvadori G, De Michele C, Durante F (2011) On the return period and design in a multivariate framework. Hydrol Earth Syst Sci 15:3293–3305. doi:10.5194/hess-15-3293-2011
Serinaldi F, Grimaldi S (2007) Fully nested 3-copula: procedure and application on hydrological data. J Hydrol Eng 12:420–430. doi:10.1061/(asce)1084-0699(2007)12:4(420)
Sraj M, Bezak N, Brilly M (2014) Bivariate flood frequency analysis using the copula function: a case study of the Litija station on the Sava River. Hydrol Process. doi:10.1002/hyp.10145
Tena A, Batalla RJ, Vericat D, Lopez-Tarazon JA (2011) Suspended sediment dynamics in a large regulated river over a 10-year period (the lower Ebro, NE Iberian Peninsula). Geomorphology 125:73–84. doi:10.1016/j.geomorph.2010.07.029
Tramblay Y, St-Hilaire A, Ouarda T (2008) Frequency analysis of maximum annual suspended sediment concentrations in North America. Hydrol Sci J 53:236–252. doi:10.1623/hysj.53.1.236
Tramblay Y, Ouarda T, St-Hilaire A, Poulin J (2010) Regional estimation of extreme suspended sediment concentrations using watershed characteristics. J Hydrol 380:305–317. doi:10.1016/j.jhydrol.2009.11.006
Vandenberghe S, Verhoest NEC, Onof C, De Baets B (2011) A comparative copula-based bivariate frequency analysis of observed and simulated storm events: a case study on Bartlett-Lewis modeled rainfall. Water Resour Res 47. doi:10.1029/2009wr008388
Wang C, Chang NB, Yeh GT (2009) Copula-based flood frequency (COFF) analysis at the confluences of river systems. Hydrol Process 23:1471–1486. doi:10.1002/hyp.7273
Wong G, Lambert MF, Leonard M, Metcalfe AV (2010) Drought analysis using trivariate copulas conditional on climatic states. J Hydrol Eng 15:129–141. doi:10.1061/(asce)he.1943-5584.0000169
Yusof F, Hui-Mean F, Suhaila J, Yusof Z (2013) Characterisation of drought properties with bivariate copula analysis. Water Resour Manag 27:4183–4207. doi:10.1007/s11269-013-0402-4
Zhang L, Singh VP (2006) Bivariate flood frequency analysis using the copula method. J Hydrol Eng 11:150–164. doi:10.1061/(asce)1084-0699(2006)11:2(150)
Zhang L, Singh VP (2007a) Bivariate rainfall frequency distributions using Archimedean copulas. J Hydrol 332:93–109. doi:10.1016/j.jhydrol.2006.06.033
Zhang L, Singh VP (2007b) Trivariate flood frequency analysis using the Gumbel-Hougaard copula. J Hydrol Eng 12:431–439. doi:10.1061/(asce)1084-0699(2007)12:4(431)