A Flexible Copula Speci cation Test under Censorship...(2010) proposed a likelihood-based criterion...

A Flexible Copula Specification Test under Censorship

Juan Lin∗ Ximing Wu†

Abstract

We propose a copula specification test for copula-based multivariate survival models under

censorship. This flexible test is applicable to both Archimedean and non-Arhcimedean copulas

and provides useful pointers for alternative copula construction when a null distribution is

rejected. It is shown to be consistent and asymptotically distribution free. We demonstrate

its good finite sample performance via Monte Carlo simulations. Two empirical applications

illustrate the usefulness of this test.

Keywords: copula specification; smooth test; censorship; survival analysis

∗Department of Finance, School of Economics & Wang Yanan Institute for Studies in Economics, Xiamen Uni-versity, China. Email: linjuan.bnu@gmail.com†Corresponding author. Department of Agricultural Economics, Texas A&M University, College Station, TX,

77843, U.S.A. Tel: (979) 458-1355, Fax: (979) 862-1563, Email: xwu@tamu.edu

1 Introduction

Copula has been an increasingly popular tool to model the dependence structure of multivariate

data. One major advantage of the copula approach is its flexibility in separately modeling the

marginal distributions of individual variables and their dependence structure; cf. Cherubini et al.

(2004, 2011), McNeil et al. (2005), Patton (2012) and references therein. It has been widely

appreciated that the specification of copula models is of great importance since different copulas

lead to multivariate models that may have very different dependence properties. For instance, the

Gaussian copula has been widely used to model the collateralized debt obligation (CDO) due to its

simplicity. However, it imposes the restriction that extreme events are asymptotically independent

and therefore does not accommodate tail dependence during times of financial turmoil. Due to this

drawback, it has been vilified as “the formula that killed Wall Street” (Salmon, 2009). A number

of studies have attempted to address the issue of copula specification. For general overviews of

copula specification tests for complete (i.e., uncensored) data, see among others Chen and Fan

(2005, 2006), Genest et al. (2009), Berg (2009) and Fermanian (2013).

This study concerns censored data, which are frequently encountered in many areas, such as

actuarial science, finance, economics and biology; cf. Frees et al. (1996), Klugman and Parsa (1999),

Denuit et al. (2006), Rigobon and Stoker (2007) among others. For example, in household economic

surveys, high earnings are typically topcoded (right censored) for confidentiality. Unemployment

duration data are often censored because an unemployment spell might continue beyond the end of

data collection period. In credit risk analysis, a firm’s lifetime is censored if either default does not

occur by the end of the observation period, or if the firm drops out of the sample for reasons other

than default such as merger, acquisition or liquidation. In actuarial science, the insurance losses are

usually bounded from above because the amount of claim cannot exceed the insured’s policy limit.

The lifetimes of annuitants are censored if they live beyond the end of their contractual guarantee

period.

There is a small body of work that extends copula specification tests to censored data, with

a focus on bivariate Archimedean copulas. Shih (1998) introduced a procedure to assess the fit

of the Clayton copula. Wang and Wells (2000) proposed a model selection procedure within a

family of Archimedean copulas based on the truncated version of the Kendall process introduced

by Genest and Rivest (1993). Lakhal-Chaieb (2010) further modified Wang and Wells’s (2000)

test, proposing a new inverse probability censoring weighted estimator for Kendall’s τ . Emura

et al. (2010) proposed a test that is based on the discrepancy between the unweighted and weighted

concordance estimators of the associated Archimedean copula parameter. Wang (2010) considered a

test based on the correlation between two random quantities that are shown to be independent under

Archimedean copulas by Genest and Rivest (1993). Yilmaz and Lawless (2011) proposed some

likelihood and pseudo-likelihood ratio procedures for parametric and semiparametric Archimedean

copula families.

Several recent studies considered non-Archidemean copulas for censored data. Andersen et al.

(2005) proposed a goodness-of-fit test based on the partition of the unit square and the comparison

of the estimated frequencies for a parametrically estimated copula and an empirical copula. This

test, however, may not be suitable for heavy censoring; see Yilmaz and Lawless (2011). Chen et al.

(2010) proposed a likelihood-based criterion of copula selection under general censorship. Gribkova

and Lopez (2015) considered non-parametric copula estimation under bivariate censoring.

In this paper we propose a general copula specification test for censored data along the lines

of Neyman’s (1937) smooth test of distributions. The smooth test has been developed in many

directions. For instance, Gray and Pierce (1985) extended it to censored data; Lin and Wu (2015)

applied it to copula specification for complete data. Ledwina (1994) introduced a data driven

smooth test of distribution, which is generalized to test independence by Kallenberg and Ledwina

(1999). This study extends data driven smooth test to multivariate models with censored data.

Unlike most of the existing tests, the proposed test accommodates both Archimedean and non-

Archimedean copulas. We show that the proposed test is asymptotically distribution free and

establish its large sample properties. In addition, it is diagnostic and conductive to improved

copula specification: when a null hypothesis is rejected under moment functions selected by some

data-driven principle, an alternative copula can be constructed by augmenting the null distribution

with these selected moment functions in the spirit of Efron and Tibshirani (1996).

The remainder of this paper is organized as follows. Section 2 briefly reviews Neyman’s smooth

test of distribution and several extensions pertinent to this study. Section 3 proposes a copula

smooth test for censored data under simple hypotheses without unknown parameters. In Section

4, we present a general copula smooth test with estimated marginal distributions and copula pa-

rameters and derive its large sample properties. Finite sample performances of the proposed test

are investigated in section 5 through Monte Carlo simulations. Section 6 provides two empirical

applications in actuarial science. All proofs are relegated to the appendix.

Throughout the paper, we use upper case letters to denote cumulative distribution functions

and corresponding lower case letters to denote density functions. We use subscript t to index

observations and subscript j to denote the coordinate of multivariate random vectors. We use Ik

to denote an k × k identity matrix and a prime symbol to indicate matrix transpose.

2 Background

Neyman (1937) introduced the smooth test of goodness of fit. The main thrust of smooth test is

to embed a hypothesized null density into a larger exponential family of alternatives, transforming

the hypothesis regarding a certain distribution into that of a general distribution against a nested

special case. Gray and Pierce (1985) extended Neyman’s smooth test to accommodate censored

data. Denote by T the survival time of interest from an unknown continuous distribution and

by D a censoring variable. We assume that D is independent of T . Suppose n independently

(but not necessarily identically) distributed observations (Xt, δt)nt=1 are available, where Xt =

Tt ∧ Dt and δt = I(Tt ≤ Dt), in which a ∧ b = min(a, b) and I(·) is the indicator function.

The hypothesis of interest is whether a random sample T1, ..., Tn are generated from a specific

distribution F (t; θ), where θ is a vector of finite-dimensional unknown parameters. Let f(t; θ) and

S(t; θ) be the corresponding density and survival function respectively. Under the null hypothesis,

S(Tt; θ)nt=1 are distributed according to the standard uniform distribution. Gray and Pierce

(1985) considered a family of alternative densities for T of the form

f(t; θ) exp(λ′ψ(S(t; θ))− λ0), (1)

where ψ = (ψ1, ..., ψk)′ is a vector of linearly independent, bounded real valued functions defined

on [0, 1], λ = (λ1, ..., λk)′ ∈ Rk, and λ0 is a normalization constant.1 Neyman’s smooth test of

distribution then amounts to the test of λ = 0.

Define ut = S(Xt; θ). The average log-likelihood of the observed data under alternative (1)

through a change of variable is given by

n∑t=1

δt[log f(Xt; θ) + λ′ψ(ut)] + (1− δt) log

∫ ut

0exp(λ′ψ(u))du

− λ0. (2)

Let θ be the maximum likelihood estimate for θ and ut = S(Xt; θ). The score of (2) with respect

to λ, evaluated at λ = 0, is given by

n∑t=1

g(ut, δt),

g(ut, δt) = δtψ(ut) + (1− δt)∫ ut

0ψ(u)du/ut −

0ψ(u)du

= δtψ(ut) + (1− δt)E[ψ(U)|U < ut]− E[ψ(U)],

where U is a standard uniform random variable. It is seen that the uncensored observations con-

tribute to the score function in the usual form of ψ(S(Xt; θ)), while contributions from the censored

observations take the form E[ψ(S(Tt; θ))|Tt > Xt]. Thus, g yields the conditional expectation of

the score function for the complete data Ttnt=1 given the observed data (Xt, δt)nt=1. This is

similar to the idea of the EM algorithm for censored data proposed by Dempster et al. (1977).

It is straightforward to show that under the null hypothesis, n−1∑n

t=1E[g(ut, δt)] = 0. Let Ω =

n−1∑n

t=1 var[g(ut, δt)]. It follows that under the null hypothesis, gp→ 0 and

√nΩ−1/2g

d→ N(0, Ik)

1In a similar spirit, Pena (1998) considered a smooth goodness-of-fit test for hazard functions, embedding a baselinehazard function in a larger family of hazard functions.

as n → ∞ under suitable regularity conditions. Let Ω be a consistent estimate of Ω. Neyman’s

smooth test of distribution for censored data is then given by

Nk = ng′Ω−1g.

Under the null hypothesis, Nk converges in distribution to the χ2 distribution with k degrees of

freedom as n→∞.

Compared to the omnibus tests (e.g. the Komogorov-Smirnov test or Cramer-von Mises test),

the smooth tests have attractive finite sample properties (see Rayner and Best (1990) for a compre-

hensive review of the smooth tests). In practice, caution should be exercised regarding the choice

of the number of basis functions k, which influences the power of smooth tests. A customary prac-

tice is to use a small k, usually less than 4. Alternatively Ledwina (1994) proposed a data driven

smooth test. She suggested first applying the Bayesian Information Criterion (BIC) to choose a

suitable distribution within family (1) and then constructing the test statistic based on the chosen

distribution. In an extension to Ledwina (1994), Inglot and Ledwina (2006) considered a hybrid

selection criterion, which combines the strength of the BIC and Akaike Information Criterion (AIC)

and is shown to improve the power.

Kallenberg and Ledwina (1999) and Kallenberg (2009) proposed smooth tests for bivariate

distributions. Let ψi1i2(u1, u2) = ψi1(u1)ψi2(u2) and Ψ be a non-empty subset of ψi1i2 : 1 ≤ i1, i2 ≤d(n). Denote the cardinality of Ψ by k = |Ψ| and write Ψ = Ψ1, ...,Ψk. They considered a

bivariate version of (1) for the joint distribution of the pseudo-observations of two random variables,

(k∑i=1

λiΨi(u1, u2)− λ0

), (u1, u2) ∈ [0, 1]2, (3)

where λ0 is a normalization constant. The test of independence within the exponential family (3)

is equivalent to a test of the hypothesis: λ = (λ1, . . . , λk)′ = 0. A smooth test of independence can

then be constructed analogously to the test of univariate uniformity.

In their study of bivariate distributions, Kallenberg and Ledwina (1999) suggested two spec-

ifications of the candidate set Ψ. The first specification includes only ‘diagonal’ entries such as

ψ11, ψ22, ... and so on. The second specification is more flexible and allows ‘non-diagonal’ entries

ψi1i2 with i1 6= i2. The set Ψ is arranged such that Ψ1 = ψ11 and the remaining entries are ar-

ranged in the descending order according to the absolute value of sample average of ψi1i2 ’s. In

either case, the BIC truncation rule is then applied to the candidate set.2 Denote by K the number

of terms selected according to the BIC. Kallenberg and Ledwina (1999) showed that Kp→ 1 and

the corresponding test statistic converges in distribution to χ21 under independence as n→∞ .

2In a similar spirit, Kallenberg (2009) used a threshold rule to select significant elements of Ψ for copula specifi-cations.

3 Copula test for censored data under simple hypothesis

To fix idea, in this section we introduce the smooth copula specification test for censored data under

the simplifying assumption of no unknown parameters. The general test under unknown marginal

distributions and copula parameters is given in the next section.

A copula is a multivariate probability distribution with standard uniform marginal distributions.

Copulas describe the dependence between random variables. Suppose (T1, T2) is a paired survival

times with joint survival function S(t1, t2) and density function f(t1, t2). Let (S1, S2) and (f1, f2)

denote the corresponding continuous marginal survival functions and density functions for (T1, T2).

A straightforward application of Sklar’s (1959) Theorem yields that there exists a unique copula

function C(·, ·) with density function c(·, ·) on [0, 1]2, such that

S(t1, t2) = C(S1(t1), S2(t2)),

f(t1, t2) = c(S1(t1), S2(t2))f1(t1)f2(t2),

where C(u1, u2) is sometimes called the survival copula.

Let c0(u1, u2;α), where (u1, u2) ∈ [0, 1]2, be a class of copula density functions parametrized

by a finite p-dimensional coefficient α ∈ A ⊂ Rp. In this section we consider the simplest case

where the marginal distributions are known and the hypothesized copula distribution is completely

specified with α = α0. The corresponding simple null hypothesis then takes the form

H0 : Pr(c(u1, u2) = c0(u1, u2;α0)) = 1, (4)

against the alternative hypothesis

H1 : Pr(c(u1, u2) = c0(u1, u2;α0)) < 1.

for some α0 ∈ A ⊂ Rp.We propose a smooth test of copula specification in the presence of censored data. Let ψ =

(ψ1, . . . , ψk)′ be a vector of linearly independent bounded real valued functions defined on [0, 1]2

and λ = (λ1, ..., λk)′ ∈ Rk. Similarly to Neyman’s smooth test of distribution, our smooth test

of copula specification can be motivated by a smooth alternative. Denote U0j = Sj(Tj), j = 1, 2.

Consider a family of densities for (U01 , U

02 ) of the form,

cψ(u1, u2;α0, λ) = c0(u1, u2;α0) exp(λ′ψ(u1, u2)− λ0) (5)

with a normalization constant λ0 = log∫c0(u1, u2;α0) exp(λ′ψ(u1, u2))du1du2. We note that cψ

in (5) has an appealing information theoretic interpretation: it can be derived as the density that

minimizes the conditional Kullback-Leibler information criterion between the target density and

the null density subject to moment conditions associated with ψ (cf. Efron and Tibshirani (1996)).

Under the assumption that (U01 , U

02 ) are distributed according to a distribution in family (5),

hypothesis (4) is equivalent to that λ = 0. Define

µψ = E[ψ(U01 , U

02 )] =

∫[0,1]2

ψ(u1, u2)c0(u1, u2;α0)du1du2,

where the expectation is taken with respect to the hypothesized null copula distribution. For sim-

plicity, we shall omit the subscript [0, 1]2 in the double integration whenever there is no ambiguity.

In the presence of right censorship, suppose we have n independent (but not necessarily identically

distributed) observations (X1t, X2t, δ1t, δ2t)nt=1, where Tjt and Djt are survival and censoring vari-

ables, Xjt = Tjt ∧ Djt and δjt = I(Tjt ≤ Djt), j = 1, 2. One can then construct a score test on

λ = 0 based on the discrepancy between µψ, which is the population mean of ψ(U01 , U

02 ) under

the null, and the sample analog of the conditional mean of the ψ(U01 , U

02 ) given the observations

(X1t, X2t, δ1t, δ2t)nt=1.

Let Ujt = Sj(Xjt), j = 1, 2. Denote by Cψ(u1, u2;α0) the distribution function of the den-

sity cψ(u1, u2;α0). The average log-likelihood under the parametric copula family (5) is given by

n−1∑n

t=1 lψ(U1t, U2t;α0), where

lψ(u1t, u2t;α) = δ1tδ2t log cψ(u1t, u2t;α) + δ1t(1− δ2t) log∂Cψ(u1t, u2t;α)

+ (1− δ1t)δ2t log∂Cψ(u1t, u2t;α)

∂u2+ (1− δ1t)(1− δ2t) logCψ(u1t, u2t;α). (6)

Note that we suppress the dependence of lψ(u1t, u2t;α) on δ1t, δ2t for notational simplicity. Such

simplifications will also be applied to other likelihood functions. Substituting (5) into (6) yields

the following log-likelihood

n∑t=1

δ1tδ2t[λ

′ψ(U1t, U2t) + log c0(U1t, U2t;α0)]

+ δ1t(1− δ2t) log

∫ U2t

0exp(λ′ψ(U1t, u2))c0(U1t, u2;α0)du2

+ (1− δ1t)δ2t log

∫ U1t

0exp(λ′ψ(u1, U2t))c0(u1, U2t;α0)du1

+ (1− δ1t)(1− δ2t) log

∫ U2t

∫ U1t

0exp(λ′ψ(u1, u2))c0(u1, u2;α0)du1du2 − λ0

The derivative of (7) with respect to λ, evaluated at λ = 0, is given by

gψ =1

n∑t=1

gψ(U1t, U2t;α0), (8)

gψ(u1t, u2t;α) = δ1tδ2tψ(u1t, u2t) + δ1t(1− δ2t)

∫ u2t0 ψ(u1t, u2)c0(u1t, u2;α)du2

C1(u1t, u2t;α)

+ (1− δ1t)δ2t

∫ u1t0 ψ(u1, u2t)c0(u1, u2t;α)du1

C2(u1t, u2t;α)

+ (1− δ1t)(1− δ2t)

∫ u2t0

∫ u1t0 ψ(u1, u2)c0(u1, u2;α)du1du2

C0(u1t, u2t;α)

−∫ψ(u1, u2)c0(u1, u2;α)du1du2 (9)

with C1(u1t, u2t;α) =∫ u2t

0 c0(u1t, u2;α)du2 and C2(u1t, u2t;α) =∫ u1t

0 c0(u1, u2t;α)du1. Again we

suppress the dependence of gψ on δ1t and δ2t when there is no confusion.

Similar to g defined in Section 2, gψ estimates the conditional expectation of the score function

for the (unobserved) complete data given the observations (X1t, X2t, δ1t, δ2t)nt=1. Define

Ωψ =1

n∑t=1

var[gψ(U1t, U2t;α0)]. (10)

It follows that under the null hypothesis, gψp→ 0 and

√nΩ−1/2ψ gψ

d→ N(0, Ik) as n → ∞ under

suitable regularity conditions.

Next define Ωψ = n−1∑n

t=1 gψ(U1t, U2t;α0)gψ(U1t, U2t;α0)′. We construct a smooth test for

copula specification as follows:

Qk = ng′ψΩ−1ψ gψ. (11)

Under null hypothesis (4), Qk can be shown to converge in distribution to the χ2 distribution with

k degrees of freedom under suitable regularity conditions given in the next section.

4 Copula test for censored data under composite hypothesis

In practice, the marginal distributions and copula coefficients are usually unknown and have to be

estimated. In the presence of unknown parameters, we now face the following composite hypothesis:

H0 : Pr(c(u1, u2) = c0(u1, u2;α0)) = 1 for some α0 ∈ A, (12)

against the alternative hypothesis

H1 : Pr(c(u1, u2) = c0(u1, u2;α)) < 1 for all α ∈ A.

One can estimate the marginal distributions using parametric or nonparametric methods. In

general, tests with parametrically estimated marginal distributions are more efficient if the para-

metric margins are correctly specified. However, under incorrect distributional assumptions, the

tests can suffer persistent size distortions even if the sample size goes to infinity; see Yilmaz and

Lawless (2011). In contrast, tests with non-parametrically estimated marginals are immune from

misspecification of the marginal distributions. We therefore focus on tests with non-parametrically

estimated marginals in this study.

Recall that Ujt = Sj(Xjt), j = 1, 2. We estimate Sj(·) by the Kaplan-Meier estimator

Sj(x) =∏

Xj(t)≤x

(1− 1

n− t+ 1

)δj(t),

where for j = 1, 2, Xj(1) ≤ Xj(2) ≤ ... ≤ Xj(n) are the order statistics of Xjtnt=1, and δj(t)nt=1 are

similarly defined. Lai and Ying (1991) established the consistency of Sj(·) under the assumption

of independent censoring.

Define (U1t, U2t) = (S1(X1t), S2(X2t)). Given (U1t, U2t)nt=1, we proceed to estimate the copula

parameters via the maximum likelihood estimation,

αn = arg maxα

n∑t=1

l(U1t, U2t;α), (13)

where l(u1t, u2t;α) is the same as (6) with c0 taking the place of cψ and C0 taking the place of Cψ.

Let ΨS be a non-empty subset of basis functions ψi1i2 : 1 ≤ i1, i2 ≤ d, where d is a fixed

upper bound.3 Denote the cardinality of ΨS by KS = |ΨS |. Similarly to (5), we consider a smooth

alternative distribution given by

cS(u1, u2;α0, λS) = c0(u1, u2;α0) exp(∑KS

i=1 λS,iΨS,i(u1, u2)− λS,0), (14)

where λS = (λS,1, ..., λS,KS)′ and λS,0 is a normalization constant.

One can envision that a smooth test on hypothesis (12) can be constructed based on the

estimated score

gS(αn) =1

n∑t=1

gS(U1t, U2t; αn), (15)

where gS(u1t, u2t;α) is the same as (9) with ψ replaced by ΨS . However, the test statistics (11) for

simple hypotheses is not directly applicable here because of the presence of nuisance parameters in

(15). There are two sets of nuisance parameters: the finite dimensional copula parameter α and

the infinite-dimensional marginal survival function Sj , j = 1, 2.

3Note that Escanciano and Lobato (2009) and Escanciano et al. (2013) also considered a fixed upper bound in theselection criterion.

In order to construct a test based on (15), we need to account for the influences of nuisance

parameters. We start with the asymptotic distribution of αn, which is studied by Chen et al. (2010).

Let lα(u1, u2;α) and lj(u1, u2;α) denote the partial derivatives of l(u1, u2;α) with respect to α and

uj . Further denote lαα(u1, u2;α) = ∂2l(u1, u2;α)/∂α∂α′, lαj(u1, u2;α) = ∂2l(u1, u2;α)/∂α∂uj and

Wjt ≡Wj(Xjt, δjt;α0) = Elαj(U1s, U2s;α0)Ij(Xjt, δjt)(Xjs)|Xjt, δjt, (16)

Ij(Xjt, δjt)(Xjs) = −Sj(Xjs)

[∫ Xjs

−∞

dNjt(u)

Pn,j(u)−∫ Xjs

−∞

IXjt ≥ udHj(u)

Pn,j(u)

where Hj(x) ≡ − log(Sj(x)) is the cumulative hazard function of Xj , Njt(x) = δjtI(Xjt ≤ x),

dNjt(x) = Njt(x)−Njt(x−), and Pnj(x) = n−1∑n

k=1 P (Xjk ≥ x). Also define

Bn = − 1

n∑t=1

Elαα(U1t, U2t;α0),

Σn =1

n∑t=1

varlα(U1t, U2t;α0) +2∑j=1

The asymptotic distribution of αn is given by the following theorem.

Theorem 1. (a) Under conditions C1-C5 given in the Appendix, ||αn − α0|| = op(1). (b) Fur-

thermore, if conditions A1-A4 hold, then BnΣ−1/2n√n(αn − α0)

d→ N(0, Ip).

Under the null hypothesis (12), we also have Bn = n−1∑n

t=1 lα(U1t, U2t;α0)lα(U1t, U2t;α0)′ by

the information matrix equality. The asymptotic variance of√n(αn − α0), B−1

n ΣnB−1n , can be

simplified to

B−1n +

nB−1n

n∑t=1

varW1t +W2tB−1n .

If the marginal distributions are known, the asymptotic variance of αn is reduced to B−1n .

Next define

n∑t=1

lα(U1t, U2t; αn)lα(U1t, U2t; αn)′,

Wjt =1

n∑s=1,s 6=t

lαj(U1s, U2s; αn)Ij(Xjt, δjt)(Xjs),

where for j = 1, 2,

[I(Xjt ≤ Xjs)δjt

n−1∑n

k=1 I(Xjk ≥ Xjt)− 1

n∑l=1

I(Xjs ≥ Xjl)I(Xjt ≥ Xjl)δjl[n−1

∑nk=1 I(Xjk ≥ Xjl)]2

As indicated in Chen et al. (2010), an alternative expression for Ij(Xjt, δjt)(Xjs) is

IXjt ≤ Xjs, δjt = 1Pnj(Xjt)

−∑

Xjl≤Xjs

I(Xjt ≥ Xjl)∆Hj(Xjl)

Pnj(Xjl)

,where Pnj(x) = n−1

∑nk=1 I(Xjk ≥ x) and ∆Hj(x) =

IYj(x)>0Yj(x)

dNj(x), in which Yj(x) =∑n

k=1 IXjk ≥x and Nj(x) =

∑nk=1Njk(x). ∆Hj(x) is the so-called Nelson’s estimator. Chen et al. (2010) es-

tablished the equivalence between these two expressions.

The asymptotic variance of αn can be consistently estimated by

B−n + B−n

n∑t=1

(2∑j=1

2∑j=1

Wjt)′

B−n ,

where B−n is the generalized inverse of Bn. Further define

GS,α =1

n∑t=1

E[gS,α(U1t, U2t;α0)],

Zjt ≡ Zj(Xjt, δjt;α0) = E[gS,j(U1s, U2s;α0)Ij(Xjt, δjt)(Xjs)|Xjt, δjt], j = 1, 2, (18)

where gS,α(u1, u2;α) and gS,j(u1, u2;α) are the partial derivatives of gS(u1, u2;α) with respect to

α and uj . We obtain the following asymptotic distribution of gS(αn).

Theorem 2. Suppose that all conditions of Theorem 1 are satisfied and conditions T1-T3 given

in the Appendix hold. Under null hypothesis (12), gS(αn)p→ 0 and

√nΩ−1/2S gS(αn)

d→ N(0, IKS),

ΩS =1

n∑t=1

gS(U1t, U2t;α0) +

2∑j=1

Zjt + G′S,αB−1n

lα(U1t, U2t;α0) +

2∑j=1

. (19)

Remark 1. The asymptotic variance ΩS reflects the influence of nuisance parameters in gS(αn).

If the marginal distributions are known, the terms involving W ′s and Z ′s are not needed, re-

sulting in ΩS = n−1∑n

t=1 vargS(U1t, U2t;α0) + G′S,αB−1

n lα(U1t, U2t;α0)

. When the copula pa-

rameter α is known, the third term on the right hand side of (19) is not needed, resulting in

ΩS = n−1∑n

t=1 vargS(U1t, U2t;α0) +∑2

j=1 Zjt. If both the marginal distributions and copula

parameter are known, ΩS is further reduced to n−1∑n

t=1 vargS(U1t, U2t;α0), which is exactly the

variance (10) derived in the previous section for simple hypotheses.

Remark 2. Another interesting special case is the independent copula, under which the estimation

error of the marginal distribution is asymptotically ignorable; see Genest et al. (1995). The proposed

test can be accordingly simplified to test independence between T1 and T2 under censorship. Similarly

to (14), we consider a smooth alternative distribution

cS(u1, u2;λS) = exp(∑KS

i=1 λS,iΨS,i(u1, u2)− λS,0).

The corresponding score function is given by gI = n−1∑n

t=1 gI(U1t, U2t), where

gI(u1t, u2t) = δ1tδ2tψ(u1t, u2t) + δ1t(1− δ2t)

∫ u2t0 ψ(u1t, u2)du2

+ (1− δ1t)δ2t

∫ u1t0 ψ(u1, u2t)du1

+ (1− δ1t)(1− δ2t)

∫ u2t0

∫ u1t0 ψ(u1, u2)du1du2

u1tu2t

It follows readily that gIp→ 0 and

√nΩ−1/2I gI

d→ N(0, IKS), where

ΩI =1

n∑t=1

gI(U1t, U2t) +2∑j=1

.This test extends the rank based independence test of Kallenberg and Ledwina (1999) to censored

Next we present a consistent estimator for ΩS . Define

GS,α =1

n∑t=1

gS,α(U1t, U2t; αn), (20)

Zjt =1

n∑s=1,s 6=t

gS,j(U1t, U2t; αn)Ij(Xjt, δjt)(Xjs), j = 1, 2. (21)

ϕt = gS(U1t, U2t; αn) +2∑j=1

Zjt + G′S,αB−n

lα(U1t, U2t; αn) +2∑j=1

It follows that ΩS can be estimated consistently by

ΩS =1

n∑t=1

(ϕt − ϕ)(ϕt − ϕ)′, (22)

where ϕ = n−1∑n

t=1 ϕt.

We can now construct a semiparametric smooth test of copula specification as follows.

Theorem 3. Suppose all conditions of Theorem 1 and 2 are satisfied. The semiparametric smooth

test of copula specification for censored data is given by

QS = ngS(αn)′Ω−1S gS(αn). (23)

Under null hypothesis (12), QSd→ χ2

A critical component of data driven smooth tests is the selection of suitable set of functions

ΨS from a candidate set Ψ = ψi1i2 : 1 ≤ i1, i2 ≤ d. In the spirit of Kallenberg and Ledwina

(1999), we rearrange the candidate set to Ψs such that Ψs,1 = ψ11 and the remaining elements,

ψi1i2 : 1 ≤ i1, i2 ≤ d, (i1, i2) 6= (1, 1), are ordered in the descending order according to their

corresponding entries in the vector Ω−1/2Ψ |

√ngΨ|. The normalization of the moments is necessary

as these moments are generally correlated and differ in variance.4 Denote by |Ψs| the cardinality of

Ψs. Let Ψs,(k) = Ψs,1, ...,Ψs,k, k = 1, ..., |Ψs| and the corresponding gs,(k)(U1t, U2t;α0), Ωs,(k) and

Qs,(k), as given in (15), (22) and (23), are similarly defined; their sample analogs are denoted by

gs,(k)(αn), Ωs,(k) and Qs,(k), respectively. For each k, let us,(k) = Ω−1/2s,(k) gs,(k)(αn). Following Inglot

and Ledwina (2006), we use the following criterion to select a suitable ΨS , whose cardinality is

denoted by KS :

KS = mink : Qs,(k) − Γ(k, n, ζ) ≥ Qs,(r) − Γ(r, n, ζ), 1 ≤ k, r ≤ |Ψs|, (24)

where Γ(r, n, ζ) is a complexity penalty defined as:

Γ(r, n, ζ) =

r log n if max1≤i≤|Ψs| |√nus,i| ≤

√ζ log n,

2r if max1≤i≤|Ψs| |√nus,i| >

√ζ log n,

where us,i is the ith element of us,(k) and ζ = 2.4. Note that the penalty is ‘adaptive’ such that

either the AIC or BIC is adopted in a data driven manner, depending on the empirical evidence

pertinent to the magnitude of deviation from the null copula distribution.

Next we present the asymptotic properties of the proposed test QS based on a set of basis

functions ΨS selected according to the procedure described above.

4In contrast, Kallenberg and Ledwina (1999) use ranks to construct a test of independence. The moments basedon Legendre polynomials are free of nuisance parameters and orthonormal under the null hypothesis.

Theorem 4. Let KS be selected according to (24). Suppose all conditions of Theorem 1 and 2 hold.

(a) Suppose (U1t, U2t) follows the distribution C0(u1, u2, α0). Then Pr(KS = 1)p→ 1 and QS

d→ χ21.

(b) Let P be an alternative. Suppose that under P, αn converges to some pseudo-parameter α

such that n−1∑n

t=1EP [gs,Υ(S1(X1t), S2(X2t);α)] 6= 0 for some Υ in 1, ..., k, where gs,Υ(u1t, u2t;α)

is the Υth element of gs,(k)(u1t, u2t;α), k in 1, . . . , |Ψs|. Then QS →∞ as n→∞.

Since the proposed test is constructed in a data driven fashion, the number of functions selected

is random even under the null hypothesis. Consequently, the χ21 distribution does not provide

an adequate approximation to the distribution of the test statistic for moderate sample sizes.

Nevertheless since it is asymptotically distribution free, the asymptotic distribution of the test

under the null can be easily approximated via simulations. Below we present a simple simulation

procedure for this purpose.

• Step 1: Generate an n-by-2 i.i.d. random sample (U1t, U2t)nt=1 from the Uniform[0, 1]2.

• Step 2: Select a set of basis functions ΨS according to (24); Calculate the test statistic Q∗S .

• Step 3: Repeat steps 1 and 2 L times; denote the results by Q∗S,lLl=1.

• Step 4: Use the qth percentile of Q∗S,lLl=1 to approximate the qth percent critical value for

the distribution of QS .

We stress that unlike tests that rely on bootstrap procedures for inference, the simulation

procedure adopted in this test is particularly easy to implement. It samples from independent

standard uniform distributions and does not entail simulation of the unknown censoring process

nor repeated estimation of the hypothesized copula from bootstrapped samples.

5 Simulations

We conduct a series of Monte Carlo simulations to assess the finite-sample performance of the

proposed test. Following the design of Wang (2010), we consider two commonly used parametric

Archimedean copulas as the null copula distributions: the Clayton and Frank copula. They share

a common structure Cφ(u1, u2) = φ−1φ(u1) + φ(u2), 0 < u1, u2 < 1, where the function φ(·) is

often referred to as the generator of an Archimedean copula. Regarding the alternative copulas,

we include the same two Archimedean copulas and Gumbel copula. For each copula, we consider

three levels of dependence by setting its parameter such that the corresponding Kendall’s τ = 0.3,

0.5 and 0.7. We generate the marginal distributions of T1 and T2 from the standard exponential

distribution. The censoring variable D1 and D2 are generated from the exponential distributions

with mean 5, such that there are about 30% of pairs where at least one component is censored. We

set sample size n = 300 and repeat each experiment 1,000 times.

We select the basis functions from the normalized Legendre polynomials, which are orthonormal

with respect to the Lebesgue measure on [0, 1]. Our ψ functions consist of elements of the tensor

products of these basis functions. Let ψj denote the jth normalized Legendre polynomial and

ψij(u1, u2) = ψi(u1)ψj(u2). We consider the candidate set ψ11, ψ12, ψ22, ψ13, ψ23, ψ33, which is the

upper triangle of a 3 by 3 matrix of basis functions. We apply the information criterion prescribed

in the previous section to select a suitable set of basis functions, based on which the test statistic is

calculated. We have experimented with larger candidate sets and found little difference in terms of

size and power across experiment designs and sample sizes. The critical value is computed following

the prescribed simulation procedure.

Most existing copula tests for censored data rely on bootstrap for inference. One exception is

Wang (2010). Let U and V be uniform random variables with an Archimedean copula function Cφ

and Y1 = φ(U)/ φ(U) + φ(V ) and Y2 = Cφ(U, V ). Genest and Rivest (1993) show that Y1 and Y2

are independent. Wang (2010) designed an Archimedean copula specification test that examines

the correlation between estimated Y1 and Y2. For censored data, he applied a multiple imputation

procedure to recover the pseudo complete data and then construct the test based on multiply

imputed complete data sets. Both our test and Wang’s test do not rely on bootstrap critical

values and therefore are computationally efficient. Unlike Wang’s test, ours does not require data

imputation.

For each experiment, we estimate the marginal distributions nonparametrically and calculate

their corresponding test statistics of copula specification. The empirical sizes and powers of the

proposed tests QS are reported in Table 1. For comparison, we also include the reported results

by Wang (2010) in the columns signified by Tm. The bold entries reflect the empirical sizes and

the regular font entries show the power of the tests. Our test shows reasonable size around 5% and

Wang’s test tends to be undersized. The two tests have similar powers in the testing of Clayton

copula. When the null hypothesis is Frank copula, our test shows considerably higher power,

especially when τ = 0.3 and 0.5. We caution that the two tests are not completely comparable.

Wang’s test utilizes a relationship specific to Archimedean copulas while our test is a general one

that can be applied to all copulas. In addition, Wang’s test only examines one specific deviation from

the null (no-zero correlation between two independent random variables), whereas ours considers

many plausible deviations in a data-driven fashion.

Table 1: Empirical sizes and powers of the proposed tests (%)

Copula under the null hypothesis

Clayton Frank

True Copula τ QS Tm QS Tm

Clayton 0.3 7.7 2.0 51.3 10.7

0.5 7.6 3.0 91.7 43.5

0.7 6.4 5.3 96.1 73.4

Frank 0.3 69.5 76.8 4.1 2.0

0.5 97.1 100 4.7 2.2

0.7 98.4 100 7.8 3.2

Gumbel 0.3 95.2 84.6 47.6 19.6

0.5 100 99.6 84.5 67.0

0.7 100 100 97.1 94.2

Note: This table reports the finite sample performances of the proposed test (QS). The finite sample performances

of Wang’s (2010) test (Tm) are also reported. The bold entries reflect the empirical sizes. The nominal size is 5%.

The sample size is 300. Results are based on 1, 000 replications.

6 Empirical example

In this section, we apply the proposed tests to two insurance company data.5 In either example, we

first conduct the specification test. When the null hypothesis is rejected, we proceed to estimate

an alternative copula function using diagnostic information provided by the test.

6.1 LOSS and ALAE data

This dataset, collected by the US Insurance Services Office, consists of 1,500 observations of insur-

ance loss and Allocated Loss Adjustment Expenses (ALAE). Among them, 34 observations of the

loss are right censored because the amount of claim cannot exceed the policy limit. The ALAE

data are not censored. Frees and Valdez (1998), Klugman and Parsa (1999), Chen and Fan (2005),

Denuit et al. (2006) and Lin and Wu (2015) have analyzed the subset of 1,466 complete observa-

tions. Chen et al. (2010) and Yilmaz and Lawless (2011) have studied the full dataset with 1,500

observations. In the analysis of the full dataset, it is important to account of the censoring in the

loss data; for example, the mean loss of censored claims is much higher than the corresponding

mean for uncensored claims, see Frees and Valdez (1998).

5We thank the Society of Actuaries, through the courtesy of Edward W. (Jed) Frees and Emiliano A. Valdez, forallowing use of these two data sets in this paper.

We estimate the marginal survival functions by the Kaplan-Meier estimator. The scatterplots

of the loss versus ALAE data on the logarithm scale and the resulting pseudo-observations are

displayed in Figure 1a and 1b.6 The black and gray symbols represent uncensored and censored

observations respectively in Figure 1a and 1b. There exists apparent positive right-tail dependence

between Loss and ALAE: large losses tend to be associated with large ALAE’s. On the other hand,

there is relatively weaker left-tail dependence between them. Due to this asymmetric dependence,

Gumbel copula was chosen as the best fitting copula by the existing studies. We therefore conduct

the specification test under null hypotheses of Gumbel copula. The semiparametric estimator yields

αn = 1.4440. The smooth test statistic QS = 0.0151, with a data-driven dimension KS = 1. The

simulated critical value for the sample size n = 1, 500 at the 5% level is 5.6612. In accordance with

the previous studies, the proposed smooth test does not reject the hypothesis of Gumbel copula.

2 4 6 8 10 12 14

*********************

********

**********************************

********

***********************

*******

*********************************

**********************************************************

****************************

***************

***********

******

*********************************************************************

*************

***************

********

******************************************

********

*********

*******************************************************************

***********

********

********************

************************************************

********

*********

*******************************

******

************

****************************************

******

********************

******

**************

***********

**************

******

************************

********

*******

***************

******

********

*********

********

******

********

*******

(a) Loss and ALAE on a log scale

0.0 0.2 0.4 0.6 0.8 1.0

*******************

*********************************

************************

*******************************

**********************************************************

******************

**********

******

***********

**********************************************************************

*******

********

***********

*******************************

*******************************************************************

******

**************

*************************************************

************

*******************

***************************************

*******************

******

********

****************

***********

******

*******

******

*********

**** *

******

(b) Pseudo-observations

Figure 1: Scatterplots of Loss Versus ALAE

We also conduct the specification test under the null hypotheses of Gaussian copula due to its

popularity in practice 7. The semiparametric estimate of the Gaussian copula coefficient yields

αn = 0.4671. The smooth test statistic, with a data-driven dimension KS = 3, is calculated to

be QS = 39.5312, considerably larger than the critical value of 5.6612. Thus the proposed test

decisively rejected the hypothesis of Gaussian copula at the 5% level.

6In this section, the pseudo-observations correspond to the marginal distributions rather than the marginal survivalfunctions.

7One important application of Gaussian copula for censored data is to construct the conditional Spearman corre-lations. Ang and Chen (2002) used the truncated normal distribution to define the conditional Pearson correlationson joint downside movements or upside movements, which occur when both a U.S. equity portfolio and the U.S.market fall or rise. They find significant correlation asymmetry in the U.S. equity market. By applying the Gaussiancopula, it’s straightforward to construct the conditional Spearman’s correlations and capture the asymmetry in thefinancial market.

An advantage of the proposed tests is that they provide useful pointers in the construction of

alternative copulas. When a null hypothesis is rejected, we can augment the null copula distribution

with extra features suggested by the selected moment functions of our data-driven test. This

strategy is used by Kallenberg (2008) for copula estimation with complete data and here we extend

it to censored data. In the present example, given the hypothesized Gaussian copula and the

selected moment functions ψ11, ψ12, ψ23, the implied alternative copula takes the form

ca(u1, u2;α, λ) = cG(u1, u2;α) exp(λ1ψ11(u1, u2) + λ2ψ12(u1, u2) + λ3ψ23(u1, u2)− λ0),

where cG is the Gaussian copula density function, and λ0 is a normalization constant. We adopt a

two-stage estimation approach. In the first step, we estimate the parameter α by restricting λi = 0,

i = 1, 2, 3. In the second step, we fix the estimated αn and use the EM algorithm to evaluate

the log-likelihood function corresponding to the parametric copula family ca(u1, u2; αn, λ). At the

maximization step, we use a Newton-type algorithm (see Wu, 2010) to update λ’s.

Figure 2a and 2b report the contour plots of the Gaussian copula density cG(αn = 0.4671) and

the alternative copula density function ca. The Gaussian copula allows only symmetric dependence.

In contrast, the alternative copula density captures a salient feature of the data: the density at the

upper-right corner is clearly higher than that at the lower-left tail corner, reflecting the asymmetric

tail behaviors revealed in Figure 1b.

0.2 0.4 0.6 0.8

(a) Gaussian copula density cG

0.2 0.6

(b) Alternative copula density ca

Figure 2: Contour plots of copula density functions for Loss and ALAE data

6.2 Annuitants mortality data

Standard insurance industry practice assumes independence of lifetimes when pricing annuities

where the promise is based on more than one life. Frees et al. (1996) study a dataset on 14,947

joint and last-survivor annuity contracts of a large Canadian insurer over the period December 29,

1988, through December 31, 1993. They use a Frank copula to model joint-life mortality and find

strong positive dependence between joint lives. Their finding suggests that annuity values ought

to be reduced by approximately 5 percent when dependent mortality models are used. Similarly

Carriere (2000) found that a (generalized) Frank copula fits the data well.

The dataset contains 11,454 heterosexual non-duplicating joint life contracts. During the sam-

ple period, there were 195 contracts with both annuitants died, 1,303 contracts with one of two

annuitants died, and 9,956 contracts with both annuitants survived. Due to the extremely high

censoring rate (98.3%) in this data, we utilize a more balanced subset in this exercise. In particu-

lar, we include 195 uncensored observations and randomly draw 195 observations from the sample.

The resultant subsample has 29 contracts with one variable censored and 166 contracts with both

variables censored; the overall censoring rate is 50%. We estimate the marginal survival functions

by the Kaplan-Meier estimator. The scatterplots of the lifetimes of the primary annuitants versus

those of secondary annuitants, and their corresponding pseudo-observations are displayed in Figure

3a and 3b. The black and gray symbols represent uncensored and censored lifetimes respectively

in Figure 3a and 3b. We conduct the specification test under the null hypotheses of Frank copula.

The semiparametric estimator yields αn = 7.9965, implying a strong positive dependence between a

couple’s mortalities. The smooth test statistic QS = 5.0244, with a data-driven dimension KS = 1.

The simulated critical value for the sample size n = 390 at the 5% level is 7.6409. In accordance

with the previous studies, the proposed smooth test does not reject the hypothesis of Frank copula.

References

Andersen, P. K., Ekstrøm, C. T., Klein, J. P., Shu, Y., and Zhang, M.-J. (2005), “A Class of Good-

ness of Fit Tests for a Copula Based on Bivariate Right-Censored Data,” Biometrical Journal,

47, 815–824.

Ang, A. and Chen, J. (2002), “Asymmetric correlations of equity portfolios,” Journal of financial

Economics, 63, 443–494.

Berg, D. (2009), “Copula goodness-of-fit testing: an overview and power comparison,” European

Journal of Finance, 15, 675–701.

60 70 80 90 100

Lifetimes of primary annuitants

*******

********

*******

************

********

*******

********

(a) Lifetimes of the annuitants

0.0 0.2 0.4 0.6 0.8 1.0

Lifetimes of primary annuitants

*******

******

(b) Pseudo-observations

Figure 3: Scatterplots of lifetimes of the primary and secondary annuitants

Carriere, J. F. (2000), “Bivariate Survival Models for Coupled Lives,” Scandinavian Actuarial

Journal, 2000, 17–32.

Chen, X. and Fan, Y. (2005), “Pseudo-likelihood ratio tests for semiparametric multivariate copula

model selection,” Canadian Journal of Statistics, 33, 389–414.

— (2006), “Estimation and model selection of semiparametric copula-based multivariate dynamic

models under copula misspecification,” Journal of Econometrics, 135, 125–154.

Chen, X., Fan, Y., Pouzo, D., and Ying, Z. (2010), “Estimation and model selection of semipara-

metric multivariate survival functions under general censorship,” Journal of Econometrics, 157,

129 – 142.

Cherubini, U., Luciano, E., and Vecchiato, W. (2004), Copula Methods in Finance, Wiley, England.

Cherubini, U., Mulinacci, S., Gobbi, F., and Romagnoli, S. (2011), Dynamic Copula methods in

finance, Wiley, England.

Dempster, A. P., Laird, N. M., and Rubin, D. B. (1977), “Maximum likelihood from incomplete

data via the EM algorithm,” Journal of the Royal Statistical Society. Series B (Methodological),

1–38.

Denuit, M., Purcaruff, O., and Van Keilegorni, I. (2006), “Bivariate Archimedean copula models

for censored data in non-life insurance,” Journal of Actuarial Practice, 13, 5–32.

Efron, B. and Tibshirani, R. (1996), “Using specially designed exponential families for density

estimation,” The Annals of Statistics, 24, 2431–2461.

Emura, T., Lin, C.-W., and Wang, W. (2010), “A goodness-of-fit test for Archimedean copula

models in the presence of right censoring,” Computational Statistics & Data Analysis, 54, 3033–

Escanciano, J. C. and Lobato, I. N. (2009), “An automatic portmanteau test for serial correlation,”

Journal of Econometrics, 151, 140–149.

Escanciano, J. C., Lobato, I. N., and Zhu, L. (2013), “Automatic specification testing for vector

autoregressions and multivariate nonlinear time series models,” Journal of Business & Economic

Statistics, 31, 426–437.

Fermanian, J.-D. (2013), “An overview of the goodness-of-fit test problem for copulas,” in Copulae

in Mathematical and Quantitative Finance, Springer, pp. 61–89.

Frees, E. and Valdez, E. (1998), “Understanding relationships using copulas,” North American

actuarial journal, 2, 1–25.

Frees, E. W., Carriere, J., and Valdez, E. (1996), “Annuity valuation with dependent mortality,”

Journal of Risk and Insurance, 229–261.

Genest, C., Ghoudi, K., and Rivest, L. (1995), “A Semiparametric Estimation Procedure of De-

pendence Parameters in Multivariate Families of Distributions,” Biometrika, 82(3), 543–552.

Genest, C., Remillard, B., and Beaudoin, D. (2009), “Goodness-of-fit tests for copulas: A review

and a power study,” Insurance: Mathematics and Economics, 44, 199–213.

Genest, C. and Rivest, L.-P. (1993), “Statistical inference procedures for bivariate Archimedean

copulas,” Journal of the American statistical Association, 88, 1034–1043.

Gray, R. and Pierce, D. (1985), “Goodness-of-fit tests for censored survival data,” The Annals of

Gribkova, S. and Lopez, O. (2015), “Non-parametric Copula Estimation Under Bivariate Censor-

ing,” Scandinavian Journal of Statistics, 42, 925–946.

Inglot, T. and Ledwina, T. (2006), “Towards data driven selection of a penalty function for data

driven Neyman tests,” Linear algebra and its applications, 417, 124–133.

Jennrich, R. I. (1969), “Asymptotic properties of non-linear least squares estimators,” The Annals

of Mathematical Statistics, 40, 633–643.

Kallenberg, W. C. (2008), “Modelling dependence,” Insurance: Mathematics and Economics, 42,

127–146.

— (2009), “Estimating copula densities, using model selection techniques,” Insurance: mathematics

and economics, 45, 209–223.

Kallenberg, W. C. and Ledwina, T. (1999), “Data-driven rank tests for independence,” Journal of

the American Statistical Association, 94, 285–301.

Klugman, S. and Parsa, R. (1999), “Fitting bivariate loss distributions with copulas,” Insurance:

Mathematics and Economics, 24, 139–148.

Lai, T. L. and Ying, Z. (1991), “Estimating a distribution function with truncated and censored

data,” The Annals of Statistics, 19(1), 417–442.

Lakhal-Chaieb, M. (2010), “Copula inference under censoring,” Biometrika, 97, 505–512.

Ledwina, T. (1994), “Data driven version of the Neyman smooth test of fit,” Journal of American

Statistical Association, 89, 1000–1005.

Lin, J. and Wu, X. (2015), “Smooth tests of copula specifications,” Journal of Business & Economic

McNeil, A., Frey, R., and Embrechts, P. (2005), Quantitative risk management: concepts, tech-

niques, and tools, Princeton university press.

Neyman, J. (1937), “‘Smooth test’ for goodness of fit,” Skandinaviske Aktuarietidskrift, 20, 150–199.

Patton, A. (2012), “A review of copula models for economic time series,” Journal of Multivariate

Analysis, 110, 4–18.

Pena, E. A. (1998), “Smooth goodness-of-fit tests for composite hypothesis in hazard based models,”

The Annals of Statistics, 26, 1935–1971.

Rayner, J. and Best, D. (1990), “Smooth tests of goodness of fit: an overview,” International

Statistical Review, 58, 9–17.

Rigobon, R. and Stoker, T. M. (2007), “Estimation with censored regressors: basic issues,” Inter-

national Economic Review, 48, 1441–1467.

Salmon, F. (2009), “Recipe for Disaster: The Formula That Killed Wall Street,” Wired, 17, 74–79,

Shih, J. H. (1998), “A goodness-of-fit test for association in a bivariate survival model,” Biometrika,

85, 189–200.

Shih, J. H. and Louis, T. A. (1995), “Inferences on the Association Parameter in Copula Models

for Bivariate Survival Data,” Biometrics, 51, 1384–1399.

Wang, A. (2010), “Goodness-of-fit tests for Archimedean copula models,” Statistica Sinica, 441–

Wang, W. and Wells, M. T. (2000), “Model selection and semiparametric inference for bivariate

failure-time data,” Journal of the American Statistical Association, 95, 62–72.

Wu, X. (2010), “Exponential Series Estimator of multivariate densities,” Journal of Econometrics,

156, 354 – 366.

Yilmaz, Y. E. and Lawless, J. F. (2011), “Likelihood ratio procedures and tests of fit in parametric

and semiparametric copula models with censored data,” Lifetime data analysis, 17, 386–408.

Appendix

We first introduce some notations. For a vector a = (a1, ..., ak)′ and r > 0, define |a|r =

i=1 |ai|r)1/r, |a| = |a|2 and |a|∞ = max1≤i≤k |ai|. For a matrix A = (aij)1≤i≤m,1≤j≤n, let ‖A‖denote the Euclidean norm and ‖A‖∞ = max1≤i≤m,1≤j≤n |aij |. For j = 1, 2, denote Ujt = Sj(Xjt).

To simplify notation, we let∑

t =∑n

t=1,∑

j =∑2

j=1. Some conditions are needed to establish the

large sample properties of the proposed tests.

Conditions C1 - C5 are sufficient to ensure the convergence of the maximum likelihood esti-

mators αn to α0 in Theorem 1.

C1 (i) The sequence of survival variables, (T1t, T2t)nt=1, is an i.i.d. sample from an unknown

survival function S(t1, t2) with continuous marginal survival functions Sj(·), j = 1, 2.

(ii) The sequence of censoring variables (D1t, D2t)nt=1, is an independent sample with joint

survival functions Gt(x1, x2)nt=1 = P (D1t > x1, D2t > x2)nt=1 and marginal survival

functions Gjt(·)nt=1, j = 1, 2.

(iii) The censoring variables (D1t, D2t) are independent of survival variables (T1t, T2t) and

there is no mass concentration at 0 in the sense that lim supn→∞ n−1∑

t(1−Gjt(η))→ 0 as

η → 0.

C2 The true (unknown) copula function C(u1, u2) has continuous partial derivatives.

C3 Let A be a compact subset of Rp. For every ε > 0,

lim infα∈A:|α−α0|≥ε

E[l(U1t, U2t;α0)]− E[l(U1t, U2t;α)] > 0.

C4 (i) For any (u1, u2) ∈ (0, 1)2, l(u1, u2;α) is a continuous function of α ∈ A.

(ii) Let Lt = supα∈A |l(U1t, U2t;α)| and Ltα = supα∈A |lα(U1t, U2t;α)|. Then

limM→∞

limn→∞

ELtI(Lt ≥M) + LtαI(Ltα ≥M) = 0.

(iii) For any η > 0, ε > 0, there is M > 0 such that |l(u1, u2;α)| ≤ M |l(u1, u2;α)| for all

α ∈ A and all uj ∈ [η, 1) such that 1− uj ≥ ε(1− uj), j = 1, 2.

C5 If Tjtnt=1 are subject to non-trivial censoring (i.e., Djt 6=∞ ), then Sj is truncated at the tail

in the sense that for some τj , Sj(xj) = Sj(τj) for all xj ≥ τj and

lim inf n−1∑n

t=1Gjt(τj)Sj(τj) > 0.

In contrast to the censoring mechanism in Shih and Louis (1995), Condition C1(ii) allows the

censoring variables (D1t, D2t)nt=1 to be non-identically distributed. In addition, no assumptions

are made on the joint survival function Gt(x1, x2) of the censoring variables (D1t, D2t). Therefore,

general censoring mechanism is allowed in this framework, for example, both variables are censored,

one censored and the other uncensored, or one random censoring and the other fixed censoring.

Conditions C3 is the identifiability condition for the tests. Condition C5 is imposed to handle the

tail instability of the Kaplan-Meier estimator, specifically for non-identically distributed censoring

times.

Conditions A1 - A4 are sufficient to ensure the asymptotic normality of αn in Theorem 1.

A1 (i) Bn = −n−1∑

tElαα(U1t, U2t;α0) is positive definite;

(i) Σn = n−1∑

t varlα(U1t, U2t;α0) +∑

jWjt is finite and positive definite, where Wjt is

defined in (16);

(ii) lα(U1t, U2t;α0) +∑

jWjtnt=1 satisfies Lindeberg condition.

A2 lαα(u1, u2;α) and lαj(u1, u2;α), j = 1, 2 are well-defined and continuous in (u1, u2;α) ∈(0, 1)2 ×A.

A3 (i) |lα(u1, u2;α0)| ≤ q∏2j=1uj(1 − uj)−aj for some q > 0 and aj ≥ 0 such that

lim supn−1∑

tE[∏2j=1Ujt(1− Ujt)−2aj ] <∞;

(ii)|lαj(u1, u2;α0)| ≤ constant × uj(1 − uj)−bjuk(1 − uk)−ak for some bj , ak and j 6= k

such that lim supn−1∑

tE[Ujt(1−Ujt)ξj−bjUkt(1−Ukt)−ak ] <∞ for some ξj ∈ (0, 1/2).

A4 (i)Let Ltαj = supα∈A |lαj(U1t, U2t;α)| and Ltαα = supα∈A |lαα(U1t, U2t;α)|.Then, limM→∞ lim supn→∞ n

−1∑

tELtαjI(Ltαj ≥M) + LtααI(Ltαα ≥M) = 0;

Conditions A3 and A4 allow the score function and its partial derivatives with respect to uj ,

j = 1, 2 to blow up at the boundaries but require them be dominated by a weighted function which

satisfies certain moment conditions. These conditions characterize many popular copula functions

such as the Gaussian, t and Clayton copulas (see Chen et al. (2010) for details).

Proof of Theorem 1. Under conditions C1-C5, the consistence of αn is readily obtained according

to Proposition 3.1 of Chen et al. (2010). Next under conditions A1-A4, the asymptotic normality

of αn is given by Proposition 3.2 of Chen et al. (2010).

Conditions T1 - T3 are sufficient to ensure the asymptotic normality of the test statistics QS

in (23).

T1 (i) For any (u1, u2) ∈ (0, 1)2, gS(u1, u2;α) is a continuous function of α, gS(u1, u2;α) has

continuous partial derivatives with respect to u1, u2;

(ii) n−1∑

tE[supα∈A:|α−α0|=o(1) |gS(U1t, U2t;α)|] <∞.

T2 (i) For any (u1, u2) ∈ (0, 1)2, gS,α(u1, u2;α) is a continuous function of α in a neighborhood

of α0;

(ii) n−1∑

tE[supα∈A:|α−α0|=o(1) |gS,α(U1t, U2t;α)|] <∞.

T3 (i) ΩS defined in (19) is finite and positive definite; (ii) Ωs,(k) has all its eigenvalues bounded

below and above by some finite positive constants.

Proof of Theorem 2. In the proofs that follow we make use of Lemma A.1. of Chen et al. (2010).

We present the results below for the reader’s convenience.

Lemma (Chen et al., 2010) Suppose that Conditions C1 and C5 are satisfied. Then: (i) the

marginal Kaplan-Meier estimators are uniformly strongly consistent: supx≤τj |Sj(x) − Sj(x)| → 0

a.s. for j = 1, 2; (ii) they can be expressed as martingale integrals:

Sj(x)− Sj(x) = −Sj(x)

−∞

Sj(u−)

∑t dMjt(u)∑

t I(Xjt ≥ u)

where Mjt(x) = Njt(x)−∫ x−∞ Jjt(u)dHj(u), in which Njt(x) = δjtI(Xjt ≤ x), Jjt(x) = I(Xjt ≥ x),

and Hj(x) = − log(Sj(x)).

By Theorem 1, we have |αn − α0| = op(1). Applying Taylor’s expansion to g(αn) with respect

to αn at α0 yields

gS(αn) =1

gS(U1t, U2t;α0) + g′S,α(U1t, U2t; αn)(αn − α0)

, (A.1)

where αn is between α0 and αn.

Under Condition T2, we have

supα∈A:|α−α0|=o(1)

∣∣∣∣∣ 1n∑t

gS,α(U1t, U2t;α)− GS,α

∣∣∣∣∣ = o(1).

It follows that (A.1) can be written as,

gS(αn) =1

gS(U1t, U2t;α0) + G′S,α(αn − α0) + op(n−1/2). (A.2)

Next Theorem 1 indicates that αn can be expressed as an asymptotically linear estimator such that

αn − α0 = B−1n

lα(U1t, U2t;α0) + op(1). (A.3)

Plugging (A.3) into (A.2) yields,

gS(αn) =1

gS(U1t, U2t;α0) + G′S,αB−1n lα(U1t, U2t;α0)+ op(1) (A.4)

By the mean-value theorem, we expand (A.4) to obtain

gS(αn) =1

gS(U1t, U2t;α0) + G′S,αB−1n lα(U1t, U2t;α0)+ Jn + op(1),

[gS,j(U1t, U2t;α0) + G′S,αB−1n lαj(U1t, U2t;α0)](Ujt − Ujt),

in which (U1t, U2t) lie on the line segment between (U1t, U2t) and (U1t, U2t).

By Lemma 1, we have

Jn = − 1

gS,j(U1t, U2t;α0) + G′S,αB−1n lαj(U1t, U2t;α0)

Sj(Xjt)

∫ Xjt

−∞

Sj(u−)

∑s dMjs(u)∑

s I(Xjs ≥ u).

Then following the same logic of the proof in Proposition 3.2 of Chen et al. (2010), we can see

that gS(αn) is asymptotically a sum of independent zero-mean random vectors. Given Condition

T3 (i), Theorem 2 now follows from the standard multivariate central limit theorem for independent

but non-identically distributed random variables.

Proof of Theorem 3. By Theorem 1, we have αnp→ α0. Using the boundedness of ψi, 1 ≤ i ≤ d, it

follows that GS,αp→ GS,α and ΩS

p→ ΩS . The result of this theorem then follows readily from the

asymptotic normality of gS(αn) given in Theorem 2.

Proof of Theorem 4. Define the simplified BIC rule as follows

SKS = mink : Qs,(k) − k log n ≥ Qs,(r) − r log n, 1 ≤ k, r ≤ |Ψs|, (A.5)

In order to prove Theorem 4(a), we need to prove that, under the null hypothesis,

limn→∞

Pr(KS = SKS) = 1, (A.6)

and that

limn→∞

Pr(SKS = 1) = 1, (A.7)

We start by proving (A.6). Recall that us,(k) = Ω−1/2s,(k) gs,(k)(αn), which is a k× 1 vector. Define the

An(ζ) =√

n|us,(k)|∞ >√ζ log n

By the definition of Γ(r, n, ζ) defined in (25), in order to prove (A.6), it suffices to prove P (An(ζ))→

0. Define

us,(k) = Ω−1/2s,(k)

gs,(k)(U1t, U2t, α0).

We can rewrite us,(k) as,√nus,(k) =

√nΩ−1/2s,(k) Ω

1/2s,(k)us,(k) +Rn(k),

Rn(k) =√nΩ−1/2s,(k)

gs,(k)(αn)− 1

gs,(k)(U1t, U2t, α0)

. (A.8)

By (A.4) and ‖αn − α0‖ = Op(n−1/2), we have

gs,(k)(αn) =1

gs,(k)(U1t, U2t, α0) +Op(n−1/2).

By (A.8), we can obtain ‖Rn(k)‖∞ = Op(1). Therefore, for any ε > 0, we have P (|Rn(k)|∞ >

2−1√ζ log n) ≤ ε/2 for all large n.

Also define the event

AJ =‖Ω−1/2

s,(k) Ω1/2s,(k)‖ > J

where J is a positive constant to be defined later.

By the consistency of θn and uniform law of large number theorem (ULLN) of Jennrich (1969,

Theorem 2), we have,

‖Ωs,(k) − Ωs,(k)‖ = op(1)

Under Condition T3 (ii), it’s straightforward to show that there exists a positive constant J such

that P (AJ) < ε/2 for all ε > 0.

Notice that

P (An(ζ)) ≤ P (An(ζ) ∩AcJ) + P (AJ)

≤ P(√

n|Ω−1/2s,(k) Ω

1/2s,(k)us,(k)|∞ > 2−1

√ζ log n,AcJ

(|Rn(k)|∞ > 2−1

√ζ log n

)+ ε/2

≤ P(√

n‖Ω−1/2s,(k) Ω

1/2s,(k)‖|us,(k)| > 2−1

√ζ log n,AcJ

)+ ε,

≤ P(√

n|us,(k)| > (2J)−1√ζ log n

)+ ε,

where AcJ denotes the complementary set of AJ .

Now since√n|us,(k)| = Op(1), it follows that

n|us,(k)| > (2J)−1√ζ log n

)→ 0,

as n→∞. Hence, since ε > 0 was arbitrary we conclude that (A.6) holds.

Next, we prove (A.7) also holds. In view of (A.6), we have Pr(SKS = 1) = 1−∑|Ψs|

k=2 Pr(SKS =

k). Because SKS = k implies that dimension k “beats ”dimension 1 and Qs,(1) ≥ 0, we obtain by

definition of SKS ,

Pr(SKS = k) ≤ Pr(Qs,(k) − k log n ≥ Qs,(1) − log n) = Pr(Qs,(k) ≥ (k − 1) log n).

We have

Pr(Qs,(k) ≥ (k − 1) log n) ≤ Q1n +Q2n,

where Q1n = Pr(Qs,(k) ≥ (k − 1) log n/2) and Q2n = Pr(|√nus,(k) −

√nus,(k)|2 ≥ (k − 1) log n/2).

Since under the null Qs,(k) converges to a non-degenerate (χ2k distributed) variable for any k,

it’s easy to prove that Q1n → 0 as n→∞. Similarly, because√n|us,(k) − us,(k)| = Op(1), Q2n → 0

as n→∞.

Further we have

Pr(QS ≤ x) = Pr(Qs,(1) ≤ x)− Pr(Qs,(1) ≤ x, SKS ≥ 2) + Pr(QS ≤ x, SKS ≥ 2).

Because Qs,(1)d→ χ2

1 under (12) and Pr(SKS ≥ 2) = 0 as n → ∞, it follows immediately that

QSd→ χ2

We now prove Theorem 4(b). Define the simplified AIC rule as follows

AKS = mink : Qs,(k) − 2k ≥ Qs,(r) − 2r, 1 ≤ k, r ≤ |Ψs|. (A.9)

In order to prove Theorem 4(b), we need to prove that under the alternative distribution P,

limn→∞

Pr(KS = AKS) = 1, (A.10)

and that

Pr(AKS ≥ Υ)→ 1, (A.11)

We denote by An(ζ)c the complementary set of An(ζ). Under the alternative distribution P, there

exists Υ ≤ |Ψs| such that n−1∑

tEP [gs,Υ(S1(X1t), S2(X2t), α)] 6= 0. Since Υ ≤ |Ψs|, we then have

P (An(ζ)c) ≤ P(√

n|us,(Υ)| <√ζ log n

)→ 0,

since us,(Υ) = us,(Υ) + op(1) by the ULLN. Hence, (A.10) holds.

Now, according to the ULLN, for k = 1, ...,Υ−1, we have |us,(k)|2 → 0 and |us,(Υ)|2 → |us,Υ|2 >0. We can obtain,

P (AKS = k) ≤ P (Qs,(k) − 2k ≥ Qs,(Υ) − 2Υ)

≤ P (n|us,(k)|2 ≥ 2(k −Υ) + n|us,(Υ)|2)→ 0

Therefore, (A.11) also holds.

For a generic constant C > 0, we have

Pr(QS ≤ C) = Pr(QS ≤ C,AKS ≥ Υ) + o(1) ≤ P (n|us,(Υ)|2 ≤ C) + o(1) = o(1),

where the last equality follows the ULLN. Then, QS →∞ as n→∞.

A Flexible Copula Speci cation Test under Censorship...(2010) proposed a likelihood-based criterion...

Documents

DynamicValuationandHedgingofCounterparty CreditExposure · Counterparty Risk Markovian Copula Model and Common Shocks Copula Interpretation Hedging the CVA in the Markovian Copula

Therapeutic Censorship

Vine Copula

News censorship

The Military Censorship of Pictures: Photographs that came under … · 2016-10-21 · The Military Censorship of Pictures: Photographs that came under the ban during the world war

Censorship and Self-Censorship in Research on the Iraq War By …personal.rhul.ac.uk/uhte/014/Censorship/Atlanta... · 2016-03-17 · 1 . Censorship and Self-Censorship in Research

Copula Marginal Algorithm

Green Censorship

Package 'copula

Burton Lee - Startups Under War & Censorship - European Entrepreneurship - Course Intro - Stanford Engineering - Jan 5 2015

Film Censorship

Copula Report

Dynamic asset allocation under regime switching: an in-sample and out-of-sample study under the Copula-Opinion Pooling framework

Package ‘CDVine’ · 28 = rotated BB6 copula (90 degrees) 29 = rotated BB7 copula (90 degrees) 30 = rotated BB8 copula (90 degrees) 33 = rotated Clayton copula (270 degrees) 34

Israeli Censorship

Internet Censorship

The Military Censorship of Pictures: Photographs that came under

Package ‘VineCopula’ · 28 = rotated BB6 copula (90 degrees) 29 = rotated BB7 copula (90 degrees) 30 = rotated BB8 copula (90 degrees) 33 = rotated Clayton copula (270 degrees)

Risk Management in Oil Market: A Comparison between ...journals.ut.ac.ir/article_76015_5eeb16251372d30bd92b2c6f...models such as normal copula, Gumbel copula, and Clayton copula, estimated

Music censorship