Trivariate Frequency Analyses of Peak Discharge, Hydrograph Volume and Suspended Sediment Concentration Data Using Copulas


Nejc Bezak & Matjaž Mikoš & Mojca Šraj

N. Bezak (*) · M. Mikoš · M. Šraj
Faculty of Civil and Geodetic Engineering, University of Ljubljana, Jamova 2, SI-1000 Ljubljana, Slovenia
e-mail: [email protected]

M. Mikoš
e-mail: [email protected]

M. Šraj
e-mail: [email protected]

Water Resour Manage (2014) 28:2195–2212
DOI 10.1007/s11269-014-0606-2

Received: 9 January 2014 / Accepted: 26 March 2014 / Published online: 26 April 2014
© Springer Science+Business Media Dordrecht 2014

Abstract Copula functions are often used for multivariate frequency analyses, but discharge and suspended sediment concentrations have not yet been modelled together using 3-dimensional copula functions. One hydrological station from Slovenia and five stations from the USA, with watershed areas from 920 km² to 24,996 km², were used for trivariate frequency analyses of peak discharges, hydrograph volumes and suspended sediment concentrations. Different parametric marginal distributions were applied and their parameters were estimated with the method of L-moments. The maximum pseudo-likelihood method was used to estimate the copula parameters. The most appropriate copula model was selected using statistical and graphical tests. Symmetric and asymmetric versions of Archimedean copulas were applied according to the dependence characteristics of the individual stations. We selected the Gumbel-Hougaard copula as the most appropriate model for all discussed stations. Primary joint return periods OR and secondary Kendall’s return periods were calculated, and the selected copula functions were compared. We conclude that copula functions are a useful mathematical tool that can also be used for modelling the variables presented in this paper.

Keywords Multivariate analysis · Symmetric copulas · Asymmetric copulas · Flood frequency analysis · Suspended sediments

1 Introduction

Hydropower reservoir filling and turbine abrasion are a major challenge for water managers dealing with water resources in many mountainous countries. Reliable procedures are therefore needed to efficiently estimate suspended sediment loads. Furthermore, most of the suspended material is transported during a few extreme events, which usually coincide with high peak discharge values and consequently also with large hydrograph volumes. Frequency analyses are mostly performed in hydrology and water resources management to obtain the relationship between design variables and the recurrence interval. Copulas therefore seem to be an interesting option for the simultaneous study of peak discharges (Q), hydrograph volumes (V) and suspended sediment concentrations (SSC).

Copulas have become a frequently used mathematical tool for hydrological analyses and applications in the last decade. Copulas have been used for modelling droughts (Wong et al. 2010; Ganguli and Reddy 2012; Ma et al. 2013; Yusof et al. 2013), to check the adequacy of dam spillways (De Michele et al. 2005), for flood coincidence risk analyses (Chen et al. 2012), for rainfall analyses (Zhang and Singh 2007a; Balistrocchi and Bacchi 2011), for geostatistical interpolations (Bardossy and Li 2008; Bardossy 2011) and also for flood frequency analyses (Favre et al. 2004; Salvadori and De Michele 2004; Grimaldi and Serinaldi 2006; Zhang and Singh 2006, 2007b; Serinaldi and Grimaldi 2007; Wang et al. 2009; Reddy and Ganguli 2012; Sraj et al. 2014). Favre et al. (2004) and Salvadori and De Michele (2004) provided the basic theory for frequency analysis via copulas. Salvadori and De Michele (2004) defined different primary and secondary return periods, which are characteristic of multivariate frequency analysis. Favre et al. (2004) also presented some steps of the analysis, such as copula parameter estimation, selection of marginal distributions, simulations with copulas and graphical goodness-of-fit tests, which are part of the copula flood frequency analysis procedure. Most authors used copulas from the Archimedean family (Ali-Mikhail-Haq, Clayton, Frank, Gumbel-Hougaard, Joe) for bivariate or trivariate flood frequency analyses. Zhang and Singh (2006) used one-parameter Archimedean copulas for bivariate flood frequency analysis, while Zhang and Singh (2007b) used the Gumbel-Hougaard copula for trivariate flood frequency analysis. In both cases discharge series from North America were used. Wang et al. (2009) also used copula functions from the Archimedean family for flood frequency analysis at the confluences of river systems. This procedure can be used in areas where insufficient discharge data are available for analysis. Reddy and Ganguli (2012) applied the most frequently used Archimedean copulas (Ali-Mikhail-Haq, Clayton, Frank and Gumbel-Hougaard) for bivariate flood frequency analysis in India. Grimaldi and Serinaldi (2006) and Serinaldi and Grimaldi (2007) introduced asymmetric copulas to hydrological applications and investigated the impact of modelling an asymmetric sample with symmetric copulas. The authors found that the parameter of the symmetric copula is correlated with the smaller of the parameters of the asymmetric copulas. Therefore, a symmetric copula can underestimate the dependence between the most correlated variables (Grimaldi and Serinaldi 2006). In the symmetric case all dependences are described with one parameter, while in the asymmetric case more than one parameter is available to describe the dependences.

Suspended sediments are an important hydrological and environmental variable, which is correlated with soil erosion, ecological conditions of the watershed, conditions of streams, hydrotechnical works (Bonacci and Oskorus 2010) and also with the frequency of extreme rainfall events. It is well known that the majority of the suspended sediment load is transported during a few extreme events (Rodríguez-Blanco et al. 2010; Tena et al. 2011). From this point of view it seems reasonable to consider the corresponding suspended sediment concentration (SSC) events (defined based on peak discharge) in the flood frequency analysis. Frequency analysis of the hydrological variable SSC is not usually done, but some examples can be found in the literature (Tramblay et al. 2008, 2010; Benkhaled et al. 2013). Tramblay et al. (2008) carried out frequency analysis of the annual maximum (AM) SSC series for more than 200 stations in North America. Different frequently used distributions were selected for the analysis and the stationarity of the samples was also checked. Furthermore, Tramblay et al. (2010) carried out regional frequency analysis of the SSC series in Californian rivers. Benkhaled et al. (2013) performed frequency analysis on the AM SSC series at the M’chounech gauging station on the Abiod wadi near Biskra in Algeria.

A copula function, which is a multivariate distribution function, is used in this paper to analyse discharge (Q), hydrograph volume (V) and SSC data. These three hydrological variables are usually not modelled simultaneously, and especially not with copulas. Suspended sediment loads are usually correlated with peak discharge values and consequently also with hydrograph volumes, so these hydrological phenomena are multidimensional and can be analysed with copulas, which can provide additional information about the observed hydrological process.

The aim of this paper was to carry out frequency analyses of Q, V and SSC series from six stations using 3-dimensional symmetric and asymmetric copula functions, with all required steps of the copula analysis presented and explained on practical examples.

2 Data

For the analyses we used data from the Slovenian hydrological station Gornja Radgona on the Mura River and five stations on rivers in the USA (Potomac-01638500, Delaware-01463500, Schuylkill-01470500, Juniata-01567000, and Iowa-05454500) (Table 1). The locations of the selected stations are shown in Fig. 1. For all the considered stations, daily discharge and suspended sediment concentration series were used as the basis for defining the AM series samples. The main characteristics of the considered stations and samples are presented in Table 1, while the AM data samples are shown in Fig. 2.

Table 1 Main characteristics of the considered annual maximum series samples

| River | Station | Watershed area [km²] | Period | Qmean; Qsd [m³/s] | Vmean; Vsd [10⁸ m³] | SSCmean; SSCsd [g/m³] |
|---|---|---|---|---|---|---|
| Mura | Gornja Radgona, Slovenia | 10,197 | 1977–2005 | 619.1; 231.4 | 2.1; 1.2 | 621.0; 515.0 |
| Delaware | Trenton, New Jersey | 17,560 | 1950–1981 | 2,419.7; 1,238.1 | 8.9; 4.5 | 413.0; 301.3 |
| Iowa | Iowa City, Iowa | 8,472 | 1944–1986 | 273.7; 168.8 | 1.7; 1.4 | 684.1; 814.2 |
| Juniata | Newport, Pennsylvania | 8,687 | 1951–1984 | 1,169.0; 730.1 | 4.9; 3.5 | 332.1; 217.8 |
| Potomac | Point of Rocks, Maryland | 24,996 | 1961–1985 | 3,210.1; 1,788.1 | 11.1; 6.6 | 716.3; 478.8 |
| Schuylkill | Berne, Pennsylvania | 920 | 1950–1980 | 233.4; 137.8 | 0.7; 0.4 | 265.0; 246.6 |

An alpine nival-pluvial water regime, with most of the maximum discharges occurring in summer, is characteristic of the Gornja Radgona hydrological station in Slovenia. Furthermore, the strength of seasonality r for the observed period was 0.56. If the seasonality coefficient r is near 1, the seasonality is strong; if it is closer to 0, the timing of events is more complex and the seasonality is not significant (Burn 1997). The calculated r value for the Gornja Radgona station showed that seasonality was present, but not very strong. For the USA stations on the Delaware, Iowa, Juniata and Potomac rivers the majority of the annual maximum discharges occurred in spring; however, some extreme events also occurred in other parts of the year. For the Berne station on the Schuylkill River most of the annual maximum discharges occurred in winter. The seasonality coefficient r for the USA rivers varied between 0.47 for the Berne station on the Schuylkill River and 0.68 for the Newport station on the Juniata River. Watershed areas of the considered rivers were between 920 and 24,996 km², and mean AM discharge values similarly ranged between 233 and 3,210 m³/s; however, mean AM SSC values had smaller dispersion, extending from 265 to 716 g/m³ (Table 1). SSC measurements in Slovenia began in 1955 and almost 50 gauging stations were included in the measuring network; however, not many of them have long and continuous time series available (Bezak et al. 2013b). The USGS suspended sediment concentration database contains measurements from more than 1,500 stations around the USA, with an average period of record of 3–5 years (Holtschlag 2001).
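The seasonality strength r mentioned above can be illustrated with directional statistics: each event date is mapped to an angle on the unit circle and r is the length of the mean resultant vector. The following is a minimal sketch of that idea, not the authors' code; the function name `seasonality_strength` and the handling of leap years are assumptions made here for illustration.

```python
import calendar
import math
from datetime import date


def seasonality_strength(event_dates):
    """Return (mean_day, r) for a list of datetime.date objects.

    Each date is converted to an angle on the unit circle; r is the length
    of the mean resultant vector (0 = no seasonality, 1 = all events on
    the same day of the year).
    """
    x_sum = 0.0
    y_sum = 0.0
    for d in event_dates:
        year_length = 366 if calendar.isleap(d.year) else 365  # assumption
        angle = 2.0 * math.pi * d.timetuple().tm_yday / year_length
        x_sum += math.cos(angle)
        y_sum += math.sin(angle)
    n = len(event_dates)
    x_bar, y_bar = x_sum / n, y_sum / n
    r = math.hypot(x_bar, y_bar)
    mean_angle = math.atan2(y_bar, x_bar) % (2.0 * math.pi)
    mean_day = mean_angle * 365.0 / (2.0 * math.pi)  # on a 365-day scale
    return mean_day, r
```

Dates that cluster in one season give r close to 1, whereas dates spread evenly over the year give r close to 0, which is the interpretation used in the text.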

Fig. 1 Location of the analysed stations, namely five stations in the USA and one station in Slovenia

Fig. 2 Presentation of the original AM data samples

Most of the analysed USA stations are located in the north-eastern part of the country; however, the Iowa River is part of the Mississippi River basin (Fig. 1). SSC values depend not only on the hydraulic characteristics of the streams, but also on other anthropogenic (e.g. dam construction, other hydrotechnical works, land use, location of mines in watersheds) and natural (rainfall intensity, location of sediment sources) influences. Consequently, discharge values are not always a good indicator of SSC values, and Q-SSC curves (rating curves) should be used with caution when the scatter between Q and SSC is large (Rodríguez-Blanco et al. 2010). For example, for the Iowa station, Kendall’s correlation coefficient between AM Q and AM SSC values was 0.1. Likewise, the relationship between the Q and SSC values presented in Table 1 is not completely linear: high Q values do not necessarily correspond to high SSC values.

3 Methods

First, annual maximum discharges were defined, and the corresponding hydrograph volumes and suspended sediment concentrations were extracted from the discharge and SSC series. Therefore, only the discharge peaks are strictly annual maxima, while hydrograph volumes and SSC values were defined based on the corresponding discharge values. To define the corresponding hydrograph volumes, baseflow was first separated from the daily discharge series. The R package lfstat (Koffler and Laaha 2012) was used to define baseflow values. Analyses and observations of hydrographs are useful for understanding the numerous interacting processes within the catchment (Parajka et al. 2013). The Mann-Kendall (MK) test (Kendall 1975) was performed to detect the presence of trends in the selected samples, and the Box-Pierce test (Box and Pierce 1970) was used to test for autocorrelation in the samples. However, because AM series were used in the study, autocorrelation in the samples was not expected. After testing the samples, univariate frequency analyses were carried out. Generalized extreme value (GEV), exponential (EXP), gamma (GAM), generalized Pareto (GPA), Gumbel (GUM), Pearson 3 (P3), log-Pearson 3 (LP3) and log-normal (LN) distributions were applied as marginal distributions of the Q, V and SSC samples. Parameters of the parametric distribution functions were estimated with the method of L-moments (Hosking and Wallis 1997). Non-parametric distributions (e.g. kernel density) could be an alternative to the chosen parametric distributions. The best-fitting distribution function for each considered variable was selected based on different graphical tests, statistical tests and model selection criteria (Bezak et al. 2013a). The Kolmogorov-Smirnov (K-S) test, the root mean square error (RMSE) and mean absolute error (MAE) model selection criteria, and QQ plots were used for the selection of marginal distributions.
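The trend check described above can be sketched with a plain implementation of the Mann-Kendall test. This is an illustrative version using the usual normal approximation with tie and continuity corrections; the paper does not state which software was used for the test, so the function name `mann_kendall` and its return values are assumptions.

```python
import math


def mann_kendall(series):
    """Return (S, Z, two-sided p-value) of the Mann-Kendall trend test."""
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the later-minus-earlier difference

    # Variance of S under the null hypothesis, corrected for tied values.
    counts = {}
    for value in series:
        counts[value] = counts.get(value, 0) + 1
    var_s = n * (n - 1) * (2 * n + 5)
    for t in counts.values():
        var_s -= t * (t - 1) * (2 * t + 5)
    var_s /= 18.0

    # Continuity-corrected standard normal statistic.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided
    return s, z, p_value
```

A sample would be treated as non-stationary at the 0.01 level used in the paper when the returned p-value is below 0.01.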

The first step of the copula approach was to assess the dependence between the modelled variables using different graphical and statistical tools. The Chi-plot (Fisher and Switzer 1985, 2001) and the K-plot (Genest and Boies 2003) were used. Likewise, Pearson, Kendall and Spearman correlation coefficients were calculated; the Pearson correlation coefficient measures only linear dependence, whereas the other two coefficients are based on ranks and are more appropriate for expressing dependence between variables. Copulas from the Archimedean family were used in this study: both symmetric and asymmetric versions of the Gumbel-Hougaard, Frank and Clayton copulas were applied to the AM series samples. Table 2 shows the selected trivariate copula functions, where the parameter of the symmetric copulas is $\theta$, whereas $\theta_1$ and $\theta_2$ are the parameters of the asymmetric copula functions ($\theta_2 > \theta_1$). Trivariate asymmetric copula functions can be an alternative to the symmetric model when the dependence between two variables is stronger than the dependence between the other two pairs of variables. More information about these copula functions can be found in Joe (1997), Nelsen (1999) and Salvadori et al. (2007). Parameters of the selected copula functions were estimated with the maximum pseudo-likelihood method (Genest et al. 1995), where the log-likelihood has the form:

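$$\log L(\theta) = \sum_{i=1}^{n} \log\left\{c_\theta(U_i)\right\} = \sum_{i=1}^{n} \log\left\{ c_\theta\left( \frac{R_{i,1}}{n+1}, \ldots, \frac{R_{i,d}}{n+1} \right) \right\}, \tag{1}$$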

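where $R_{i,j}$ is the rank of the $i$-th observation of variable $j$ in a sample of size $n$, $d$ is the number of variables, and $c_\theta$ is the copula density, which can be calculated as the partial derivative of the copula functions defined in Table 2:

$$c_\theta(u_1, u_2, u_3) = \frac{\partial^3 C_\theta(u_1, u_2, u_3)}{\partial u_1\, \partial u_2\, \partial u_3}. \tag{2}$$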

The maximum pseudo-likelihood method is a semiparametric approach, and the pseudo-observations always lie in the unit hypercube $[0,1]^d$. The parameters of the theoretical copulas (Table 2) are estimated by numerical maximization of Eq. (1). Other parameter estimation techniques, such as the method of moments, maximum likelihood or inference functions for margins, are described in Joe (1997), Nelsen (1999) and Salvadori et al. (2007).
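To make the estimation step concrete, the sketch below builds rank-based pseudo-observations and maximizes the pseudo-log-likelihood of Eq. (1) for the symmetric Clayton copula, whose density has a simple closed form in any dimension. It is an illustration only: the helper names, the search interval for $\theta$ and the golden-section search (which assumes a single maximum on that interval) are assumptions, and ties in the data are not handled.

```python
import math


def pseudo_observations(sample):
    """Rank-transform each column: U[i][k] = R[i][k] / (n + 1)."""
    n, d = len(sample), len(sample[0])
    u = [[0.0] * d for _ in range(n)]
    for k in range(d):
        order = sorted(range(n), key=lambda i: sample[i][k])
        for rank, i in enumerate(order, start=1):
            u[i][k] = rank / (n + 1.0)
    return u


def clayton_log_density(u, theta):
    """Log-density of the d-variate symmetric Clayton copula (theta > 0)."""
    d = len(u)
    s = sum(x ** (-theta) for x in u) - d + 1.0
    log_c = sum(math.log(1.0 + k * theta) for k in range(d))
    log_c -= (theta + 1.0) * sum(math.log(x) for x in u)
    log_c -= (1.0 / theta + d) * math.log(s)
    return log_c


def fit_clayton(u, lower=0.01, upper=20.0, tol=1e-6):
    """Maximize Eq. (1) over theta with a golden-section search."""
    def neg_loglik(theta):
        return -sum(clayton_log_density(row, theta) for row in u)

    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lower, upper
    while b - a > tol:
        c = b - ratio * (b - a)
        e = a + ratio * (b - a)
        if neg_loglik(c) < neg_loglik(e):
            b = e
        else:
            a = c
    return (a + b) / 2.0
```

For a sample given as a list of (Q, V, SSC) tuples, `fit_clayton(pseudo_observations(sample))` returns the estimated $\theta$. The same pattern applies to the other copulas in Table 2 once their densities from Eq. (2) are available.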

Graphical and statistical tests were used in the study to check the adequacy of the selected copula functions for modelling the 3-dimensional sample (Q, V and SSC). We applied the Cramér-von Mises test (Genest et al. 2009):

$$S_n = \sum_{i=1}^{n} \left\{ C_n(U_i) - C_\theta(U_i) \right\}^2, \tag{3}$$

where the vectors $U_i$ are the pseudo-observations calculated from the analysed sample, $C_\theta$ is the tested theoretical copula (Table 2) and $C_n$ is the empirical copula, which is defined as (Genest et al. 2009):

$$C_n(u) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(U_i \le u). \tag{4}$$

Table 2 Applied trivariate symmetric and asymmetric Archimedean copulas

| Copula | $C_\theta(u_1,u_2,u_3)$ or $C_{\theta_1}(u_3, C_{\theta_2}(u_1,u_2))$ |
|---|---|
| Symmetric Gumbel-Hougaard | $\exp\left\{-\left[(-\ln u_1)^{\theta}+(-\ln u_2)^{\theta}+(-\ln u_3)^{\theta}\right]^{1/\theta}\right\}$ |
| Symmetric Frank | $-\frac{1}{\theta}\ln\left\{1+\frac{(\exp(-\theta u_1)-1)(\exp(-\theta u_2)-1)(\exp(-\theta u_3)-1)}{(\exp(-\theta)-1)^{2}}\right\}$ |
| Symmetric Clayton | $\left(u_1^{-\theta}+u_2^{-\theta}+u_3^{-\theta}-2\right)^{-1/\theta}$ |
| Asymmetric Gumbel-Hougaard (M6) | $\exp\left\{-\left(\left[(-\ln u_1)^{\theta_2}+(-\ln u_2)^{\theta_2}\right]^{\theta_1/\theta_2}+(-\ln u_3)^{\theta_1}\right)^{1/\theta_1}\right\}$ |
| Asymmetric Frank (M3) | $-\frac{1}{\theta_1}\ln\left\{1-\left(1-\exp(-\theta_1)\right)^{-1}\left(1-\left[1-\left(1-\exp(-\theta_2)\right)^{-1}\left(1-\exp(-u_1\theta_2)\right)\left(1-\exp(-u_2\theta_2)\right)\right]^{\theta_1/\theta_2}\right)\left(1-\exp(-u_3\theta_1)\right)\right\}$ |
| Asymmetric Clayton (M4) | $\left[\left(u_1^{-\theta_2}+u_2^{-\theta_2}-1\right)^{\theta_1/\theta_2}+u_3^{-\theta_1}-1\right]^{-1/\theta_1}$ |

According to Genest et al. (2009), the Cramér-von Mises test defined by Eq. (3) is the most powerful goodness-of-fit test based on the empirical process. P-values for the Cramér-von Mises test can be calculated with the parametric bootstrap procedure defined by Genest and Remillard (2008) or with a somewhat faster procedure based on the multiplier central limit theorem (Kojadinovic et al. 2011).
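The statistic of Eq. (3) can be computed directly from Eq. (4) once a candidate copula is fixed. The sketch below is a hedged illustration: the function names are chosen here, the symmetric Gumbel-Hougaard CDF from Table 2 is used as the example model, and the bootstrap needed to turn $S_n$ into a p-value is deliberately left out.

```python
import math


def empirical_copula(u_obs, point):
    """Eq. (4): share of pseudo-observations that are <= point in every coordinate."""
    hits = sum(1 for row in u_obs if all(x <= p for x, p in zip(row, point)))
    return hits / len(u_obs)


def gumbel_hougaard_cdf(u, theta):
    """Symmetric Gumbel-Hougaard copula from Table 2 (theta >= 1)."""
    s = sum((-math.log(x)) ** theta for x in u)
    return math.exp(-s ** (1.0 / theta))


def cramer_von_mises(u_obs, copula_cdf):
    """Eq. (3): sum of squared differences between empirical and fitted copula."""
    return sum((empirical_copula(u_obs, row) - copula_cdf(row)) ** 2 for row in u_obs)
```

A call such as `cramer_von_mises(u, lambda p: gumbel_hougaard_cdf(p, theta_hat))` gives $S_n$ for pseudo-observations `u` and a fitted parameter `theta_hat` (both hypothetical names). Smaller values indicate a closer fit, but the statistic is only interpretable together with its bootstrap distribution.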

The next step of the copula frequency analysis approach was to calculate some primary and secondary return periods. The primary joint return period OR is defined by the following expression (Salvadori et al. 2007):

$$T_{OR} = \frac{\mu}{1 - C_\theta(u)}, \tag{5}$$

where $\mu$ is the mean interarrival time between two consecutive events. The secondary return period, called Kendall’s return period, is defined as (Salvadori et al. 2011):

$$T_{x}^{>} = \frac{\mu}{1 - K_C(t)}, \tag{6}$$

where $K_C$ is Kendall’s distribution function associated with the copula function $C_\theta$. This definition of the return period is meaningful because the critical layer associated with Kendall’s return period partitions $\mathbb{R}^d$ into three regions: the sub-critical region ($R_t^{<}$), the super-critical region ($R_t^{>}$) and the critical layer itself (Salvadori et al. 2011). An advantage of this approach compared to the OR methodology is that all realizations lying on the critical layer, which is defined by the selected $t$ value (Eq. (6)), have the same Kendall’s return period (Salvadori et al. 2011). Salvadori et al. (2011) provided a simulation-based algorithm for calculating $K_C$, in which the only condition is that the copula function is available in parametric form (Table 2). For Archimedean copulas $K_C$ can be calculated as:

$$K_C(t) = t - \frac{\varphi(t)}{\varphi'(t^{+})}, \tag{7}$$

where $\varphi'(t^{+})$ is the right derivative of the generating function $\varphi(t)$, which corresponds to the chosen Archimedean copula (Vandenberghe et al. 2011).
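As a worked illustration of Eqs. (6) and (7), the sketch below evaluates them for the Gumbel-Hougaard generator $\varphi(t) = (-\ln t)^{\theta}$. Note that Eq. (7) as written involves only the first derivative of the generator, which is the form usually quoted for bivariate Archimedean copulas; the code evaluates exactly that expression and is not a substitute for the simulation algorithm of Salvadori et al. (2011) used in the paper. The function names and the value of $\theta$ in the usage note are hypothetical.

```python
import math


def gumbel_generator(t, theta):
    """Generator of the Gumbel-Hougaard copula: phi(t) = (-ln t)^theta."""
    return (-math.log(t)) ** theta


def gumbel_generator_derivative(t, theta):
    """First derivative of the generator: phi'(t) = -theta (-ln t)^(theta-1) / t."""
    return -theta * (-math.log(t)) ** (theta - 1.0) / t


def kendall_function(t, theta):
    """Eq. (7): K_C(t) = t - phi(t) / phi'(t)."""
    return t - gumbel_generator(t, theta) / gumbel_generator_derivative(t, theta)


def kendall_return_period(t, theta, mu=1.0):
    """Eq. (6): T = mu / (1 - K_C(t)); mu = 1 year for annual maximum series."""
    return mu / (1.0 - kendall_function(t, theta))
```

For this generator Eq. (7) simplifies to $K_C(t) = t - t \ln t / \theta$, so `kendall_return_period(0.9, 2.0)` is about 19 years for the illustrative value $\theta = 2$.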

4 Results

In this section the complete procedure (marginal distribution selection, copula model definition and multivariate return periods) of performing frequency analysis via copulas is presented. Results are mostly presented in tables, while individual steps of the frequency analysis are also shown in graphical form.

4.1 Marginal Distributions Selection

To define the samples of maximum discharges, hydrograph volumes and suspended sediment concentrations, baseflow was extracted from the daily discharge series using the R package lfstat (Koffler and Laaha 2012). Hydrograph volumes were defined based on the baseflow separation results. Thus, besides the maximum discharge values, the corresponding hydrograph volumes and suspended sediment concentration values were used in the analyses (Table 1).

The extracted samples were checked with the Mann-Kendall test for stationarity and with the Box-Pierce test for autocorrelation. For the stations on the Mura, Delaware, Schuylkill, Juniata and Potomac rivers all samples were stationary (significance level 0.01) and none of them demonstrated statistically significant (0.01) autocorrelation. For the station on the Iowa River the SSC series indicated a clear negative trend (Mann-Kendall test), which was statistically significant (0.01). Due to the presence of this statistically significant trend, the station was not used for further analysis. The negative trend could be explained by the construction of the Coralville Dam in 1958 (located upstream of the Iowa City station), because dam construction influences SSC values by interrupting the transport of sediments.

Using the RMSE and MAE model selection criteria and the Kolmogorov-Smirnov test, we selected the marginal distributions that gave the best fit to the AM series samples. The selected distribution functions for the individual variables of the considered stations are shown in Table 3. None of the distribution functions shown in Table 3 could be rejected (Kolmogorov-Smirnov test) at the significance level 0.05. Final results were also checked with graphical QQ plots (Fig. 3). In all cases logarithmic or extreme value distributions were selected as the most appropriate for describing the individual variables.
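The marginal fitting step can be illustrated for one of the candidate distributions. The sketch below estimates Gumbel parameters from the first two sample L-moments and scores the fit with an RMSE between observed and fitted quantiles. It is an assumption-laden example rather than the authors' procedure: the function names are invented here, and the plotting position i/(n+1) used for the RMSE is a choice made for illustration because the paper does not state one.

```python
import math

EULER_GAMMA = 0.5772156649015329


def sample_l_moments(data):
    """First two sample L-moments from the unbiased weighted moments b0 and b1."""
    x = sorted(data)
    n = len(x)
    b0 = sum(x) / n
    b1 = sum(i * x[i] for i in range(n)) / (n * (n - 1))
    return b0, 2.0 * b1 - b0


def fit_gumbel_lmom(data):
    """Gumbel location and scale from L-moments: l2 = scale*ln 2, l1 = loc + gamma*scale."""
    l1, l2 = sample_l_moments(data)
    scale = l2 / math.log(2.0)
    loc = l1 - EULER_GAMMA * scale
    return loc, scale


def gumbel_quantile(p, loc, scale):
    """Quantile with non-exceedance probability p (p = 1 - 1/T)."""
    return loc - scale * math.log(-math.log(p))


def rmse_quantiles(data, loc, scale):
    """RMSE between sorted observations and fitted quantiles at i/(n+1) (assumed)."""
    x = sorted(data)
    n = len(x)
    squared = [(x[i] - gumbel_quantile((i + 1) / (n + 1.0), loc, scale)) ** 2 for i in range(n)]
    return math.sqrt(sum(squared) / n)
```

Repeating the same steps for each candidate distribution and comparing the RMSE values mirrors the selection logic described above. `gumbel_quantile(0.9, loc, scale)` then gives the 10-year quantile of the fitted margin.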

Table 3 Distribution functions which were selected as marginal distributions

| River | Q | V | SSC |
|---|---|---|---|
| Mura | Gumbel | Log-normal | GEV |
| Delaware | Log-Pearson III | Log-normal | Log-Pearson III |
| Juniata | Log-normal | Gumbel | GEV |
| Potomac | GEV | Gumbel | GEV |
| Schuylkill | Log-normal | Gumbel | GEV |

Fig. 3 Example of QQ plots for the selected marginal distributions for the Gornja Radgona station on the Mura River [panels (a), (b) and (c)]

4.2 Copula Model Definition

After the marginal distribution functions were defined, the dependence between the individual variables (Q, V and SSC) was assessed. We calculated Kendall’s correlation coefficient values for all three pairs of variables (Table 4). At three of the five stations the correlation between Q and V was higher than the correlation between Q and SSC; the correlations for Q-V and Q-SSC were always higher than the correlation for V-SSC. Kendall’s correlation coefficients of the considered rivers were between 0.13 and 0.61 (Table 4). For the stations on the Mura, Delaware and Potomac rivers all correlation coefficient values were statistically significant (significance level 0.05). For the stations on the Juniata and Schuylkill rivers the correlations between V and SSC were not statistically significant (0.05). For the Iowa City station only the correlation between Q and V was statistically significant (0.05); however, this station was not used for further analysis due to the presence of a trend in the SSC sample. We also used the Chi-plot, the K-plot and a scatter plot of pseudo-observations to evaluate the dependence between variables (Fig. 4).
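The dependence assessment above rests on Kendall’s correlation coefficient and its significance. A minimal sketch follows, assuming the usual large-sample normal approximation without tie correction for the p-value; the paper does not say how significance was computed, so both function names and that choice are assumptions.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau from the counts of concordant and discordant pairs."""
    n = len(x)
    concordant = discordant = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            product = (x[i] - x[j]) * (y[i] - y[j])
            if product > 0:
                concordant += 1
            elif product < 0:
                discordant += 1
    return (concordant - discordant) / (0.5 * n * (n - 1))


def tau_p_value(tau, n):
    """Two-sided p-value for H0: tau = 0, normal approximation without ties."""
    std = math.sqrt(2.0 * (2 * n + 5) / (9.0 * n * (n - 1)))
    z = tau / std
    return math.erfc(abs(z) / math.sqrt(2.0))
```

Under this approximation, the Table 4 entry for the Delaware V-SSC pair (0.25 with a sample size of 32) gives a p-value just below 0.05, while the Juniata V-SSC pair (0.13 with a sample size of 34) gives a p-value well above 0.05. This agrees with the significance pattern reported in the text.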

Kendall’s correlation coefficients, K-plots and Chi-plots (Fig. 4) were also used to define the most appropriate copula for each gauging station. For the station on the Juniata River we decided to use asymmetric copulas, whereas for the other four stations symmetric versions of the Archimedean copulas were used (Table 2). For the Gornja Radgona (Mura), Trenton (Delaware) and Point of Rocks (Potomac) stations symmetric copulas were chosen because all correlation coefficients (Table 4) were statistically significant (significance level 0.05). For the Berne (Schuylkill) station the correlation for two pairs of variables (Q-V and Q-SSC) was larger than the correlation between hydrograph volumes and suspended sediment concentrations (V-SSC). Asymmetric copulas are mostly used in hydrology when the correlation for one pair of variables is larger than for the other two pairs (Grimaldi and Serinaldi 2006); therefore we decided to use symmetric copulas also for the Berne station on the Schuylkill River, whereas for the Newport station on the Juniata River asymmetric copulas were used (Table 2).
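The difference between the symmetric and asymmetric models can be seen directly in the Gumbel-Hougaard forms of Table 2. The sketch below implements both CDFs; the function names are chosen here, and the check that $\theta_2 \ge \theta_1 \ge 1$ reflects the usual nesting condition rather than anything specific to the paper's fitted values.

```python
import math


def gumbel_symmetric(u1, u2, u3, theta):
    """Symmetric trivariate Gumbel-Hougaard copula (one parameter for all pairs)."""
    s = (-math.log(u1)) ** theta + (-math.log(u2)) ** theta + (-math.log(u3)) ** theta
    return math.exp(-s ** (1.0 / theta))


def gumbel_asymmetric(u1, u2, u3, theta1, theta2):
    """Asymmetric (M6) form: theta2 ties (u1, u2); theta1 ties that pair to u3."""
    if not theta2 >= theta1 >= 1.0:
        raise ValueError("expected theta2 >= theta1 >= 1")
    inner = ((-math.log(u1)) ** theta2 + (-math.log(u2)) ** theta2) ** (theta1 / theta2)
    outer = inner + (-math.log(u3)) ** theta1
    return math.exp(-outer ** (1.0 / theta1))
```

With equal parameters the asymmetric form reduces to the symmetric one. Choosing $\theta_2 > \theta_1$ lets the pair $(u_1, u_2)$ be more strongly dependent than either variable is with $u_3$.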

The next step of our analyses was to estimate the parameters of the selected copula functions. The maximum pseudo-likelihood method (Eq. (1)) was used for this purpose. The Cramér-von Mises goodness-of-fit test, defined by Eq. (3), was used to test each copula function. The parametric bootstrap procedure was selected for the calculation of p-values. For the Newport station we tested asymmetric copulas, whereas for the other four stations we carried out the Cramér-von Mises test for the symmetric versions of the Archimedean copula functions presented in Table 2. None of the copula models could be rejected by the Cramér-von Mises test at the previously mentioned significance level (0.05). Based on the Cramér-von Mises test and graphical goodness-of-fit tests, the most appropriate copula model for each station was determined. Figure 5 shows the graphical goodness-of-fit test (type I) for the asymmetric Gumbel-Hougaard copula from the Archimedean family for the station on the Juniata River. Results of the graphical goodness-of-fit test (type II) for the symmetric Gumbel-Hougaard copula for the station on the Delaware River are presented in Fig. 6.

Table 4 Kendall’s correlation coefficient values for pairs of considered variables

| River | Sample size | Q-V | Q-SSC | V-SSC |
|---|---|---|---|---|
| Mura | 29 | 0.56 | 0.50 | 0.40 |
| Delaware | 32 | 0.40 | 0.48 | 0.25 |
| Juniata | 34 | 0.43 | 0.34 | 0.13 |
| Potomac | 25 | 0.61 | 0.53 | 0.29 |
| Schuylkill | 31 | 0.43 | 0.43 | 0.19 |

Fig. 4 Example of dependence assessment for the Point of Rocks station on the Potomac River [panels (a) to (f)]

4.3 Multivariate Return Periods

Fig. 5 Graphical goodness of fit test I for the asymmetric Gumbel-Hougaard copula for the Newport station on the Juniata River [panels: Q-V, Q-SSC, V-SSC and a three-dimensional Q-V-SSC view for the Gumbel-Hougaard copula; all axes range from 0 to 1]

The next step of the frequency analysis via copulas was to determine different primary and secondary return periods. For the completely defined copula models we calculated the primary return period called OR (Eq. (5)) and the secondary Kendall’s return period (Eq. (6)). For comparison purposes we evaluated the selected return periods (OR and Kendall) for all considered copula models defined for each station. The joint return period OR was calculated based on the quantile values ($Q_T$, $V_T$ and $SSC_T$) of the considered variables that correspond to the univariate return periods of 10 and 100 years (Table 5). Quantile values were determined based on the selected marginal distribution functions (Table 3). The Gumbel-Hougaard and Frank copulas always gave higher joint return period (OR) values than the Clayton copula, and the Gumbel-Hougaard copula gave higher return period OR values than the Frank copula. Figure 7 shows the primary joint return period OR for the Gumbel-Hougaard copula, which was selected as the most appropriate for modelling Q, V and SSC for all considered stations. Furthermore, the results in Fig. 7 are presented for various V values, depending on the station. Secondary Kendall’s return period values for the $t$ value of 0.9 (Eq. (6)) are presented in Table 6. Return period values were calculated with the algorithm proposed by Salvadori et al. (2011), where $m = 10^7$ was used for simulations. Differences among the three considered copulas were relatively large: the Clayton copula gave higher Kendall’s return period values than the Frank and Gumbel-Hougaard copulas, and the Frank copula gave higher values than the Gumbel-Hougaard copula. The asymmetric copulas used for the Newport station on the Juniata River gave higher Kendall’s return period values than the symmetric versions of the copulas applied to the other stations.
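The calculation behind the joint return period OR can be sketched for the symmetric Gumbel-Hougaard copula: a univariate return period T is converted to the non-exceedance probability $u = 1 - \mu/T$ for each variable, and Eq. (5) is then applied. The function names and the $\theta$ values below are illustrative assumptions; they are not the parameters fitted in the paper.

```python
import math


def gumbel_hougaard_cdf(u, theta):
    """Symmetric Gumbel-Hougaard copula from Table 2 (theta >= 1)."""
    s = sum((-math.log(x)) ** theta for x in u)
    return math.exp(-s ** (1.0 / theta))


def joint_return_period_or(univariate_T, theta, mu=1.0, d=3):
    """Eq. (5) evaluated at the quantiles that share the same univariate return period."""
    u = 1.0 - mu / univariate_T           # non-exceedance probability of each margin
    c = gumbel_hougaard_cdf([u] * d, theta)
    return mu / (1.0 - c)


if __name__ == "__main__":
    for theta in (1.0, 1.8, 5.0):          # hypothetical parameter values
        print(theta, round(joint_return_period_or(10.0, theta), 2))
```

With independence ($\theta = 1$) the 10-year quantiles give a joint OR return period of about 3.7 years. An illustrative value of $\theta = 1.8$ gives about 5.7 years, and the result approaches but never exceeds 10 years as $\theta$ grows.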

5 Discussion

From Fig. 5 we can see that different pairs of variables behave differently, which is a consequence of using asymmetric copulas. However, not all dependences are modelled separately (Grimaldi and Serinaldi 2006). In the case of symmetric copulas all pairs of variables are modelled with the same copula parameter, which can in some cases lead to a loss of information. An example of a symmetric copula is shown in Fig. 6, where the symmetric Gumbel-Hougaard copula was used. The data were transformed to real space using the marginal distribution functions. This type of graphical goodness-of-fit test was also used by Genest and Favre (2007).

Fig. 6 Graphical goodness of fit test II for the symmetric Gumbel-Hougaard copula for the Trenton station on the Delaware River [panels (a) to (d): Q-V, Q-SSC, V-SSC and a three-dimensional Q-V-SSC view for the Gumbel-Hougaard copula; axes Q [m³/s], V [m³], SSC [g/m³]]

Table 5 Quantile values for univariate return period 10 years and joint return periods OR [years] for the considered stations

| River | Copula type | QT [m³/s] | VT [10⁸ m³] | SSCT [g/m³] | Gumbel-Hougaard | Frank | Clayton |
|---|---|---|---|---|---|---|---|
| Mura | Symmetric | 935 | 3.7 | 1,292 | 5.7 | 4.6 | 4.0 |
| Delaware | Symmetric | 3,705 | 14.9 | 792 | 5.4 | 4.4 | 4.0 |
| Juniata | Asymmetric | 1,793 | 9.0 | 586 | 5.2 | 4.3 | 4.0 |
| Potomac | Symmetric | 5,472 | 19.8 | 1,226 | 5.8 | 4.7 | 4.3 |
| Schuylkill | Symmetric | 404 | 1.3 | 530 | 5.2 | 4.4 | 4.1 |

Fig. 7 Joint return period OR [in years] for various V values for the considered stations

Different relationships between some primary, secondary, and univariate return periods can be observed (Salvadori et al. 2007; Vandenberghe et al. 2011). From Table 5 we can see that the univariate return period is always higher than the joint return period OR, which is in agreement with the findings of Salvadori et al. (2007).
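The ordering of the return periods in Table 5 can be illustrated with a short calculation. The sketch below assumes the commonly used definition of the joint return period OR, T_OR = mu / (1 - C(u_Q, u_V, u_SSC)) with mu = 1 year for annual maxima, and the one-parameter (symmetric) Gumbel-Hougaard copula. The function names and the theta values are illustrative and are not the parameters estimated in this study.

```python
import math


def gumbel_hougaard_cdf(u, theta):
    """Symmetric Gumbel-Hougaard copula C(u_1, ..., u_d) for theta >= 1."""
    s = sum((-math.log(ui)) ** theta for ui in u)
    return math.exp(-s ** (1.0 / theta))


def joint_return_period_or(u, theta, mu=1.0):
    """T_OR = mu / (1 - C(u)): at least one variable exceeds its quantile."""
    return mu / (1.0 - gumbel_hougaard_cdf(u, theta))


# Non-exceedance probability of the univariate 10-year quantile, for Q, V and SSC.
u10 = [1.0 - 1.0 / 10.0] * 3

# Hypothetical dependence parameters (theta = 1 corresponds to independence).
for theta in (1.0, 1.5, 3.0):
    print(theta, round(joint_return_period_or(u10, theta), 2))
```

Under these assumptions, independence (theta = 1) gives 1 / (1 - 0.9^3), about 3.7 years, and the value approaches 10 years as theta grows. T_OR therefore stays below the univariate return period, which is consistent with the values between 4.0 and 5.8 years in Table 5.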

For each of the considered stations we used three different copula functions from the Archimedean family. None of the copula models could be rejected at the chosen significance level (0.05). It should be noted that the sample sizes were relatively small, which can influence the calculation of the p-values. To distinguish between the copulas we used graphical goodness of fit tests. We selected the Gumbel-Hougaard copula as the most appropriate for modelling peak discharges, hydrograph volumes and suspended sediment concentrations in

[Figure 8: panels (a), (b) and (c)]

Fig. 8 Generated copula values transformed into real space with the use of marginal distribution functions (histogram) and the kernel density (line) for the Newport station on the Juniata River

Table 6 Kendall’s return period [years] for a t value (Eq. (6)) of 0.9 for the considered stations

River       Copula type  Gumbel-Hougaard  Frank  Clayton
Mura        Symmetric    28.8             174.3  850.9
Delaware    Symmetric    33.6             261.8  902.4
Juniata     Asymmetric   41.9             356.8  974.8
Potomac     Symmetric    27.1             149.2  433.7
Schuylkill  Symmetric    37.9             290.4  887.9


all cases. This copula was also selected as the most appropriate in some other hydrological applications (Zhang and Singh 2006; Poulin et al. 2007). It was also selected because the Frank and Clayton copulas can underestimate the risk of an event, as they cannot model the tail dependence efficiently (Poulin et al. 2007). This can also be seen from Table 6, where Kendall’s return periods are presented. The Clayton and Frank copulas gave higher return period values than the Gumbel-Hougaard copula for all considered stations, and the differences among the calculated return periods were relatively large. Similar conclusions were also made by Poulin et al. (2007) for the calculation of the primary joint return period called AND.
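To make the sensitivity of Kendall’s return period to the dependence strength concrete, the following sketch evaluates T_K = mu / (1 - K_C(t)) for the symmetric trivariate Gumbel-Hougaard copula. It assumes the general expression of the Kendall distribution function of a d-dimensional Archimedean copula, K_C(t) = sum over k = 0..d-1 of (-1)^k / k! * phi(t)^k * psi^(k)(phi(t)), where psi(s) = exp(-s^(1/theta)), phi is its inverse, psi^(k) is the k-th derivative and d = 3. Eq. (6) of the paper may have been evaluated differently (for example by simulation), the asymmetric case is not covered, and the theta values are hypothetical rather than the fitted ones.

```python
import math


def kendall_distribution_gumbel3(t, theta):
    """K_C(t) for a symmetric trivariate Gumbel-Hougaard copula, 0 < t < 1."""
    a = 1.0 / theta       # exponent of the generator psi(s) = exp(-s**a)
    x = -math.log(t)      # equals phi(t)**a
    # Closed form of t - phi*psi'(phi) + 0.5*phi**2*psi''(phi) for this generator.
    return t * (1.0 + a * x + 0.5 * (a * a * x * x + a * (1.0 - a) * x))


def kendall_return_period(t, theta, mu=1.0):
    """Kendall's return period in years for mean inter-arrival time mu."""
    return mu / (1.0 - kendall_distribution_gumbel3(t, theta))


# Hypothetical dependence parameters, evaluated at t = 0.9 as in Table 6.
for theta in (1.5, 2.0, 3.0):
    print(theta, round(kendall_return_period(0.9, theta), 1))
```

Under these assumptions, a larger theta (stronger dependence) gives a shorter Kendall’s return period for the same t, with 10 years as the limit for t = 0.9.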

After we selected the most appropriate marginal distribution functions and copula model, large samples (10,000) of all three variables were generated with the selected copula model, and these triples were then transformed from the copula space ([0,1]) to the real space with the use of the marginal distributions. Figure 8 shows one result of these simulations for the asymmetric Gumbel-Hougaard copula for the station on the Juniata River. Kernel density estimation was used to fit the data, and the maximum of the kernel density function was selected as the most likely value of the annual maximum of the individual variable. This procedure was carried out for all three considered variables (Q, V, and SSC). Table 7 shows the results of these simulations for different copula functions, using the same marginal functions for each station. One can notice that the copula with the highest estimated values is not always the same and that the results vary from station to station. The most likely AM values (Table 7) are correlated with the sample (Q, V and SSC) mean and standard deviation values (Table 1). This was not a surprise, because higher values of Q and V are expected in rivers with larger flows.
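The simulation procedure described above can be sketched as follows. This is a minimal illustration and not the code used in the study. It samples from the symmetric Gumbel-Hougaard copula with the Marshall-Olkin frailty algorithm (the asymmetric version used for the Juniata River is not covered), uses Gumbel (EV1) marginals with made-up location and scale parameters in place of the marginal distributions selected for each station, and locates the maximum of a Gaussian kernel density estimate on a grid. All function names, parameter values and the bandwidth rule are assumptions.

```python
import numpy as np


def sample_gumbel_hougaard(n, theta, dim=3, rng=None):
    """Draw n points from a symmetric Gumbel-Hougaard copula (theta > 1)."""
    rng = np.random.default_rng() if rng is None else rng
    alpha = 1.0 / theta
    # Positive stable frailty with Laplace transform exp(-s**alpha),
    # generated with Kanter's representation.
    angle = rng.uniform(0.0, np.pi, size=n)
    w = rng.exponential(size=n)
    frailty = (np.sin(alpha * angle) / np.sin(angle) ** (1.0 / alpha)
               * (np.sin((1.0 - alpha) * angle) / w) ** ((1.0 - alpha) / alpha))
    # Marshall-Olkin step: U_i = psi(E_i / frailty) with psi(s) = exp(-s**alpha).
    e = rng.exponential(size=(n, dim))
    return np.exp(-(e / frailty[:, None]) ** alpha)


def gumbel_quantile(u, loc, scale):
    """Quantile function of the Gumbel (EV1) distribution."""
    u = np.clip(u, 1e-12, 1.0 - 1e-12)  # guard against log(0)
    return loc - scale * np.log(-np.log(u))


def kde_mode(x, grid_size=256):
    """Grid location of the maximum of a Gaussian kernel density estimate."""
    h = 1.06 * x.std(ddof=1) * x.size ** (-0.2)  # normal-reference bandwidth
    grid = np.linspace(x.min(), x.max(), grid_size)
    # The normalising constant is omitted because it does not move the maximum.
    dens = np.exp(-0.5 * ((grid[:, None] - x[None, :]) / h) ** 2).sum(axis=1)
    return grid[np.argmax(dens)]


rng = np.random.default_rng(2014)
u = sample_gumbel_hougaard(10_000, theta=2.0, rng=rng)  # hypothetical theta

# Hypothetical (loc, scale) of the marginals: Q [m3/s], V [m3], SSC [g/m3].
margins = {"Q": (500.0, 150.0), "V": (1.5e8, 0.5e8), "SSC": (300.0, 120.0)}
modes = {}
for i, (name, (loc, scale)) in enumerate(margins.items()):
    modes[name] = kde_mode(gumbel_quantile(u[:, i], loc, scale))
print(modes)
```

Because the mode of a Gumbel distribution equals its location parameter, the estimated modes should fall close to the chosen loc values, which gives a simple check of the sketch. With station-specific marginals the same three steps would yield most likely AM values of the kind listed in Table 7.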

6 Conclusions

This paper presents a trivariate frequency analysis of Q, V and SSC with the use of copula functions, which can be used for multivariate modelling. Several parametric distributions were used as marginal distribution functions. These three variables (Q, V and SSC) are usually not considered simultaneously, and especially not with the use of copulas. Six hydrological stations from Slovenia and USA with watershed areas from 920 km2 to 24,996 km2 were considered, and in total almost 200 years of AM data were analysed. Mean AM SSC values ranged between 265 and 716 g/m3, whereas mean AM discharge values had a larger range, from 233 to 3,210 m3/s. Due to the statistically significant negative trend of the SSC series for the Iowa River, this station was not used for further copula analyses. Symmetric and asymmetric Archimedean copulas were used based on the dependences among the variables. After

Table 7 Comparison between most likely AM values for different copula functions (Q in m3/s, V in 10^8 m3, SSC in g/m3)

Copula      Gumbel-Hougaard       Clayton               Frank
River       Q      V     SSC      Q      V     SSC      Q      V     SSC
Mura        506    1.4   325      508    1.2   376      494    1.4   341
Delaware    1,966  6.2   206      2,166  6.1   183      2,149  5.9   169
Juniata     914    3.6   232      826    3.6   212      898    3.7   199
Potomac     2,223  8.3   479      2,247  7.7   507      2,164  8.0   502
Schuylkill  155    0.56  129      157    0.55  129      160    0.62  127


the complete procedure of frequency analyses was carried out, some main conclusions can be made:

a) Copulas can be used for modelling peak discharges, hydrograph volumes and suspended sediment concentration values.

b) The Gumbel-Hougaard copula was selected as the most appropriate copula for modelling all three pairs of variables for all considered stations.

c) Asymmetric copulas have an advantage over the symmetric versions because they have more parameters, but symmetric copula functions can still be used in cases where the dependences are similar.

d) Different primary and secondary return periods can be computed if needed in practical applications (e.g. design).

Acknowledgement We wish to thank the Environmental Agency of the Republic of Slovenia (ARSO) for data provision. We would also like to express our thanks to the United States Geological Survey (USGS) for making the hydrological data available to the public on their web site. The results of the study are part of the Faculty of Civil and Geodetic Engineering (UL FGG) work on the Slovenian national research project J2-4096 and on the international research project SedAlp, which is financed by the European Union through the Alpine Space program. The critical and useful comments of three anonymous reviewers and the associate editor helped to improve this manuscript, for which the authors are very grateful.

References

Balistrocchi M, Bacchi B (2011) Modelling the statistical dependence of rainfall event variables through copula functions. Hydrol Earth Syst Sci 15:1959–1977. doi:10.5194/hess-15-1959-2011

Bardossy A (2011) Interpolation of groundwater quality parameters with some values below the detection limit. Hydrol Earth Syst Sci 15:2763–2775. doi:10.5194/hess-15-2763-2011

Bardossy A, Li J (2008) Geostatistical interpolation using copulas. Water Resour Res 44. doi:10.1029/2007wr006115

Benkhaled A, Higgins H, Chebana F, Necir A (2013) Frequency analysis of annual maximum suspended sediment concentrations in Abiod wadi, Biskra (Algeria). Hydrol Process. doi:10.1002/hyp.9880

Bezak N, Brilly M, Sraj M (2013a) Comparison between the peaks over threshold method and the annual maximum method for flood frequency analyses. Hydrol Sci J. doi:10.1080/02626667.2013.831174

Bezak N, Sraj M, Mikos M (2013b) Overview of suspended sediments measurements in Slovenia and an example of data analysis. Gradbeni Vestnik 62:274–280 (In Slovene)

Bonacci O, Oskorus D (2010) The changes in the lower Drava River water level, discharge and suspended sediment regime. Environ Earth Sci 59:1661–1670. doi:10.1007/s12665-009-0148-8

Box GEP, Pierce DA (1970) Distribution of residual autocorrelations in autoregressive-integrated moving average time series models. J Am Stat Assoc 65:1509–1526. doi:10.1080/01621459.1970.10481180

Burn DH (1997) Catchment similarity for regional flood frequency analysis using seasonality measures. J Hydrol 202:212–230. doi:10.1016/s0022-1694(97)00068-1

Chen L, Singh VP, Guo SL, Hao ZC, Li TY (2012) Flood coincidence risk analysis using multivariate copula functions. J Hydrol Eng 17:742–755. doi:10.1061/(asce)he.1943-5584.0000504

De Michele C, Salvadori G, Canossi M, Petaccia A, Rosso R (2005) Bivariate statistical approach to check adequacy of dam spillway. J Hydrol Eng 10:50–57. doi:10.1061/(asce)1084-0699(2005)10:1(50)

Favre AC, El Adlouni S, Perreault L, Thiemonge N, Bobee B (2004) Multivariate hydrological frequency analysis using copulas. Water Resour Res 40. doi:10.1029/2003wr002456

Fisher NI, Switzer P (1985) Chi-plots for assessing dependence. Biometrika 72:253–265. doi:10.1093/biomet/72.2.253

Fisher NI, Switzer P (2001) Graphical assessment of dependence: is a picture worth 100 tests? Am Stat 55:233–239. doi:10.1198/000313001317098248

Ganguli P, Reddy MJ (2012) Risk assessment of droughts in Gujarat using bivariate copulas. Water Resour Manag 26:3301–3327. doi:10.1007/s11269-012-0073-6


Genest C, Boies JC (2003) Detecting dependence with Kendall plots. Am Stat 57:275–284. doi:10.1198/0003130032431

Genest C, Favre AC (2007) Everything you always wanted to know about copula modeling but were afraid to ask. J Hydrol Eng 12:347–368. doi:10.1061/(asce)1084-0699(2007)12:4(347)

Genest C, Remillard B (2008) Validity of the parametric bootstrap for goodness-of-fit testing in semiparametric models. Ann Instit Henri Poincare Probabilites Stat 44:1096–1127. doi:10.1214/07-aihp148

Genest C, Ghoudi K, Rivest LP (1995) A semiparametric estimation procedure of dependence parameters in multivariate families of distributions. Biometrika 82:543–552. doi:10.1093/biomet/82.3.543

Genest C, Remillard B, Beaudoin D (2009) Goodness-of-fit tests for copulas: a review and a power study. Insur Math Econ 44:199–213. doi:10.1016/j.insmatheco.2007.10.005

Grimaldi S, Serinaldi F (2006) Asymmetric copula in multivariate flood frequency analysis. Adv Water Resour 29:1155–1167. doi:10.1016/j.advwatres.2005.09.005

Holtschlag DJ (2001) Optimal estimation of suspended-sediment concentrations in streams. Hydrol Process 15:1133–1155. doi:10.1002/hyp.207

Hosking JRM, Wallis JR (1997) Regional frequency analysis: an approach based on L-moments. Cambridge University Press, Cambridge

Joe H (1997) Multivariate models and dependence concepts. Chapman & Hall, London

Kendall MG (1975) Multivariate analysis. Griffin, London

Koffler D, Laaha G (2012) LFSTAT - an R-package for low-flow analysis. EGU General Assembly, Vienna 22–27.4

Kojadinovic I, Yan J, Holmes M (2011) Fast large-sample goodness-of-fit tests for copulas. Stat Sin 21:841–871. doi:10.1007/s11222-009-9142-y

Ma MW, Song SB, Ren LL, Jiang SH, Song JL (2013) Multivariate drought characteristics using trivariate Gaussian and Student t copulas. Hydrol Process 27:1175–1190. doi:10.1002/hyp.8432

Nelsen RB (1999) An introduction to copulas. Springer, New York

Parajka J, Viglione A, Rogger M, Salinas JL, Sivapalan M, Bloschl G (2013) Comparative assessment of predictions in ungauged basins – part 1: runoff-hydrograph studies. Hydrol Earth Syst Sci 17:1783–1795. doi:10.5194/hess-17-1783-2013

Poulin A, Huard D, Favre AC, Pugin S (2007) Importance of tail dependence in bivariate frequency analysis. J Hydrol Eng 12:394–403. doi:10.1061/(asce)1084-0699(2007)12:4(394)

Reddy MJ, Ganguli P (2012) Bivariate flood frequency analysis of Upper Godavari River flows using Archimedean copulas. Water Resour Manag 26:3995–4018. doi:10.1007/s11269-012-0124-z

Rodríguez-Blanco ML, Taboada-Castro MM, Palleiro L, Taboada-Castro MT (2010) Temporal changes in suspended sediment transport in an Atlantic catchment, NW Spain. Geomorphology 123:181–188. doi:10.1016/j.geomorph.2010.07.015

Salvadori G, De Michele C (2004) Frequency analysis via copulas: Theoretical aspects and applications to hydrological events. Water Resour Res 40. doi:10.1029/2004wr003133

Salvadori G, De Michele C, Kottegoda NT, Rosso R (2007) Extremes in nature: an approach using copulas. Springer, Dordrecht

Salvadori G, De Michele C, Durante F (2011) On the return period and design in a multivariate framework. Hydrol Earth Syst Sci 15:3293–3305. doi:10.5194/hess-15-3293-2011

Serinaldi F, Grimaldi S (2007) Fully nested 3-copula: procedure and application on hydrological data. J Hydrol Eng 12:420–430. doi:10.1061/(asce)1084-0699(2007)12:4(420)

Sraj M, Bezak N, Brilly M (2014) Bivariate flood frequency analysis using the copula function: a case study of the Litija station on the Sava River. Hydrol Process. doi:10.1002/hyp.10145

Tena A, Batalla RJ, Vericat D, Lopez-Tarazon JA (2011) Suspended sediment dynamics in a large regulated river over a 10-year period (the lower Ebro, NE Iberian Peninsula). Geomorphology 125:73–84. doi:10.1016/j.geomorph.2010.07.029

Tramblay Y, St-Hilaire A, Ouarda T (2008) Frequency analysis of maximum annual suspended sediment concentrations in North America. Hydrol Sci J 53:236–252. doi:10.1623/hysj.53.1.236

Tramblay Y, Ouarda T, St-Hilaire A, Poulin J (2010) Regional estimation of extreme suspended sediment concentrations using watershed characteristics. J Hydrol 380:305–317. doi:10.1016/j.jhydrol.2009.11.006

Vandenberghe S, Verhoest NEC, Onof C, De Baets B (2011) A comparative copula-based bivariate frequency analysis of observed and simulated storm events: a case study on Bartlett-Lewis modeled rainfall. Water Resour Res 47. doi:10.1029/2009wr008388

Wang C, Chang NB, Yeh GT (2009) Copula-based flood frequency (COFF) analysis at the confluences of river systems. Hydrol Process 23:1471–1486. doi:10.1002/hyp.7273

Wong G, Lambert MF, Leonard M, Metcalfe AV (2010) Drought analysis using trivariate copulas conditional on climatic states. J Hydrol Eng 15:129–141. doi:10.1061/(asce)he.1943-5584.0000169


Yusof F, Hui-Mean F, Suhaila J, Yusof Z (2013) Characterisation of drought properties with bivariate copula analysis. Water Resour Manag 27:4183–4207. doi:10.1007/s11269-013-0402-4

Zhang L, Singh VP (2006) Bivariate flood frequency analysis using the copula method. J Hydrol Eng 11:150–164. doi:10.1061/(asce)1084-0699(2006)11:2(150)

Zhang L, Singh VP (2007a) Bivariate rainfall frequency distributions using Archimedean copulas. J Hydrol 332:93–109. doi:10.1016/j.jhydrol.2006.06.033

Zhang L, Singh VP (2007b) Trivariate flood frequency analysis using the Gumbel-Hougaard copula. J Hydrol Eng 12:431–439. doi:10.1061/(asce)1084-0699(2007)12:4(431)
