
A Bayesian Nonlinear Model for Forecasting Insurance Loss Payments

Yanwei Zhang
Statistical Research, CNA Insurance Company
[email protected]

Vanja Dukic
Applied Mathematics, University of Colorado-Boulder
[email protected]

James Guszcza
Actuarial, Risk & Analytics, Deloitte Consulting LLP
[email protected]

Abstract

We propose a Bayesian nonlinear hierarchical model that addresses some of the major challenges non-life insurance companies face when forecasting the outstanding claim amounts for which they will ultimately be liable. This approach is distinctive in several ways. First, data from individual companies are treated as repeated measurements of various cohorts of claims, thus respecting the correlation between successive observations. Second, nonlinear growth curves are used to model the loss development process in a way that is intuitively appealing and facilitates prediction and extrapolation beyond the range of the available data. Third, a hierarchical structure is employed to reflect the natural variation of major parameters between the claim cohorts, accounting for their heterogeneity. This approach enables us to carry out inference at the level of industry, company, and/or accident year, based on the full posterior distribution of all quantities of interest. In addition, prior experience and expert opinion can be incorporated into the analyses through judgmentally selected prior probability distributions. The ability of the Bayesian framework to carry out simultaneous inference based on the joint posterior is of great importance for insurance solvency monitoring and industry decision making.

Keywords: Bayesian estimation, Generalized linear model, Hierarchical model, Insurance loss reserving, Longitudinal data, Nonlinear growth curve.

1 Background

A distinctive feature of insurance is that it is a product whose cost to the supplier is unknown at the time of sale. Indeed, the loss payments for many types of liability insurance claims can take many months or even years to complete. Late reported claims, judicial proceedings, and schedules of benefits for serious employer's liability claims are among the many reasons for lengthy claim settlement periods. This fundamental fact presents insurers with a number of unique analytical challenges relating to the valuation of the outstanding liabilities, and is one of the primary reasons for the existence of the actuarial profession. The need for statistical methods in insurance was first exemplified by Thomas Bayes' intellectual executor Richard Price, who worked as a consultant for London's Equitable life insurance company (Hacking 1990).

Every insurance company must set aside a provision, known as a loss reserve, to pay all claims for which it is liable. The uncertainty surrounding this ultimate liability implies that the loss reserving process must go beyond pure accounting to involve statistical forecasting. This uncertainty has two fundamental sources. First, because claims are reported at random times, there is uncertainty about the number of claims for which the insurer will ultimately be liable. Second, there is uncertainty about the ultimate size of the insurer's liability even after a claim has been reported. Expert judgment is used to establish so-called case reserves to pay the liability for specific claims. However, aggregate case reserves tend to grow, or in the insurance vernacular "develop", over time due to factors such as worse-than-anticipated injury severities, uncertain medical costs, and the likelihood that certain claims will at some point enter litigation.

The loss reserve is typically a non-life insurance company's largest balance sheet liability. Its proper estimation is therefore a matter of vital importance to the company. Indeed, the credit rating agency A. M. Best recently performed a study of 1,023 general (non-life) insurance company failures from 1969 to 2007. This study attributed 38.1% of these failures to inadequate reserves, higher than the failure rate attributable to the next three most frequent causes combined. In contrast, the study estimated that catastrophes accounted for 4.2% of insurance company failures (see Coyne 2008).

Unsurprisingly, the traditional statistical methods used to estimate unpaid liabilities have been criticized as inadequate to the task. For example, in discussing loss reserve shortfalls experienced by the insurance industry in 2002 and 2003, the Standard & Poor's (2003) report cited "naivety or knavery" on the part of the actuarial profession. Recent years have also seen increasing discussion within the actuarial community of the need for loss reserving techniques that are more solidly grounded in rigorous statistical methodology.

In particular, much of the recent literature has emphasized the need to adopt methodologies that provide probabilistic reserve ranges rather than mere point estimates. For example, the 2005 Casualty Actuarial Society Working Party on Quantifying Variability in Reserve Estimates stated that the fundamental question a risk-bearing entity must ask itself is: "Given · · · our current state of knowledge, what is the probability that [the entity's] final payments will be no larger than the given value?" This statement reflects a rapidly growing interest in methods for estimating predictive distributions of outstanding loss payments. This growing interest is a fairly recent phenomenon. As recently as 2002, England and Verrall (2002) wrote, "there is little in the actuarial literature which considers the predictive distribution of reserve outcomes; to date, the focus has been on estimating variability using prediction errors".

Consistent with this growing interest, a spate of Bayesian models (e.g., de Alba 2002; Ntzoufras and Dellaportas 2002; de Alba and Nieto-Barajas 2008; Meyers 2009) has appeared in the recent loss reserving literature. Many of the familiar advantages of Bayesian methodology and forecasting are of great practical importance in this context. Most notably, Bayesian posterior distributions of ultimate loss payments are of considerable utility in strategic planning, accounting, and regulatory settings, where probabilistic ranges and estimates of such quantities as "Value at Risk" [VaR] and "probability of ruin" are increasingly required. A related point is that the conceptual clarity of the Bayesian paradigm minimizes the risk of regulatory bodies or company management misinterpreting probabilistic statements regarding reserve ranges. Another important point is that loss reserving analysts typically possess considerable experience and background knowledge which must sometimes be incorporated in loss reserving analyses in fairly arbitrary or atheoretical ways. The Bayesian paradigm offers a formal mechanism for incorporating into one's analysis information not contained in the available data. Indeed, this mechanism is consistent in a mathematically precise way with certain standard actuarial methodology (e.g., see Verrall 2004). Furthermore, modern Bayesian simulation methods enable one to fit complex models that more accurately reflect the inherent nature of the loss development process being modelled. For all of these reasons, there is a growing consensus, shared by these authors, that Bayesian methods are ideally suited to the loss reserving problem.

Bayesian loss reserving models inherently rely on Bayesian forecasting: one conditions upon all currently available information when making a forecast. As a result, Bayesian forecasting naturally accounts for all uncertainty involved in estimation of model parameters: it provides not only point estimates of the future outcomes given the known quantities to date, but also probability density estimates of these future outcomes. This is particularly useful when these densities are non-symmetric or heavy-tailed, or when forecasts are to be used in further decision making. Therefore, Bayesian forecasting, when performed properly, offers many advantages over classical, frequentist forecasting. A good introduction and overview of Bayesian forecasting ideas can be found in Geweke and Whiteman (2006) and Pole, West and Harrison (1994).

However, most of the Bayesian loss reserving models proposed to date suffer from either or both of two major shortcomings. First, most of these models must be supplemented by ad hoc extrapolations in situations such as long-tailed liability insurance, in which the development of payments for a cohort of claims can extend well beyond the range of the available data. Second, most of these models fail to account for the within-cohort dependencies arising from the repeated measures nature of loss reserving data sets. To overcome these limitations, we adopt a model that employs a nonlinear growth curve to describe the average loss development in a way that is intuitively appealing and that facilitates prediction beyond the range of the data. Furthermore, a multi-level hierarchical structure is employed to reflect the longitudinal nature of the data and account for the within-cohort dependencies. The use of hierarchical modelling further improves on existing methodology by providing a statistically sound method for practitioners to pool information from multiple companies in order to perform industry or cross-company analyses, or produce more stable estimates for the company or companies of primary interest.

The next section will describe the data used in typical loss reserving exercises, discuss some of the more prominent traditional loss reserving methods, and provide comments motivating the Bayesian growth curve model to be presented here. Section 3 will specify the details of this model. Section 4 will apply the model to historical workers' compensation data for 10 large insurers, and report the resulting inferences. Section 5 will investigate the model assumptions, compare the model's predictions to the actual values from hold-out data sets, and perform a sensitivity analysis by varying prior assumptions. Section 6 will discuss topics for future research and provide closing comments.

2 Loss reserving models

2.1 Mathematical formulation of loss reserving

The data used for standard loss reserving analyses are typically presented in the form of an incomplete table, as illustrated in the first part of Table 1 under the heading "Observed loss triangle". Here, cell yi(tj) denotes the aggregate cumulative paid loss amount for a cohort of insurance claims that occurred in year i, evaluated at tj months since the inception of year i. The ith row therefore tracks the growth (known as "development") of the cumulative loss payments for the cohort of claims that originated in the ith "accident year". The time dimension tj can be viewed as the "development age" of the cohort at various evaluation points. Throughout this paper, any quantity x that depends on evaluation time t will be explicitly expressed in the form x(t). Let I denote the number of rows (accident years) and J denote the number of columns (evaluation points). The rows and columns can represent such time increments as months, quarters, or years. Without loss of generality, we will assume that the rows and columns represent yearly increments and are equal in number: I = J. We will assume that data through the end of accident year I are available, and we denote the observed data as DI = {yi(tj) | i + j ≤ I + 1}. This data set is typically referred to as a "loss triangle" in the actuarial community. The complement of the loss triangle represents the future, as yet unobserved data and shall be denoted DcI = {yi(tj) | I + 1 < i + j ≤ 2I}. For ease of presentation, we henceforth adopt the convention that t0 = 0.
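These index sets translate directly into code. A minimal sketch (the function name is ours, with I = 10 to match the yearly data used later):

```python
# Index sets for a loss triangle with I accident years and I evaluation points.
# Cells are 1-indexed as in the text: (i, j) is observed when i + j <= I + 1.
def triangle_index_sets(I):
    observed = {(i, j) for i in range(1, I + 1)
                       for j in range(1, I + 1) if i + j <= I + 1}
    future = {(i, j) for i in range(1, I + 1)
                     for j in range(1, I + 1) if i + j > I + 1}
    return observed, future

D, Dc = triangle_index_sets(10)
# Accident year 1 is fully observed; accident year 10 has a single evaluation.
print(len(D), len(Dc))  # 55 45
```

Note that the observed triangle holds I(I + 1)/2 cells and its complement I(I − 1)/2, which for I = 10 gives 55 and 45.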

The fundamental goal of insurance loss reserving is to estimate the total outstanding liability associated with in-force and previously issued insurance policies. In mathematical terms, the goal is to estimate the ultimate cumulative loss for each accident year using the historical payment information contained in DI. In the case of so-called "short-tailed" insurance products such as property damage, the majority of claims are settled within the first several years of the claim's occurrence. For example, it is typically only a matter of weeks or months between the time of fire or tornado damage and the time the resulting claims are settled. In such cases, it suffices to estimate the ultimate loss cost as the value of yi(tI) for each accident year i, assuming there is no further development after age tI, even when there are only a few evaluations available (say, 5 evaluations with tI = 60 months). In general, however, loss payments can extend well beyond even a moderately large development age tI (such as, for example, 120 months) to some future time that is unknown a priori. We will denote the point in time at which all claims have been settled as t∞, and we call the unobserved growth of the loss between tI and t∞ the tail development (also displayed in Table 1). In general, the goal of insurance loss reserving is therefore to estimate yi(t∞) for i = 1, · · · , I given DI.

         Observed loss triangle                          Tail      Premium
 i     t1         t2         ...   tI−1        tI        ...  t∞
 1     y1(t1)     y1(t2)     ···   y1(tI−1)    y1(tI)                p1
 2     y2(t1)     y2(t2)     ···   y2(tI−1)                          p2
 ...   ...        ...        ···
 i     yi(t1)     ···        yi(tI+1−i)                              pi
 ...   ...        ...
 I−1   yI−1(t1)   yI−1(t2)                                           pI−1
 I     yI(t1)                                                        pI

Table 1: Representation of a typical insurance loss reserving data set.

In many loss reserving analyses, in addition to the payment information contained in the loss triangle, some measure of risk exposure for each year is also provided to reflect unequal volumes of business from year to year. For example, one would generally expect more losses to be paid out in a year when more risks are underwritten. One commonly used volume measure is aggregate insurance premium, the amount charged to the insureds to cover the expected loss cost associated with the policy plus the underwriting expenses and a profit margin needed by the issuing company. We denote the premium for the ith year as pi, as displayed in the final column of Table 1. A related statistic of great practical importance is the loss ratio, defined as yi(tj)/pi for the ith year at the jth evaluation. The ultimate loss ratio yi(t∞)/pi is a fundamental measure of the profitability of the business underwritten in the ith year. For example, if expenses take about 10% of the charged premium, an ultimate loss ratio greater than 90% would mean that the insurance company is losing money, unless offsetting gains result from investing premiums in the financial markets.
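As a worked example of this profitability arithmetic (hypothetical round numbers, not figures from the paper):

```python
# Hypothetical figures for one accident year (not taken from the paper's data).
premium = 100.0        # p_i, aggregate premium
ultimate_loss = 95.0   # y_i(t_infinity), ultimate cumulative loss
expense_ratio = 0.10   # expenses as a fraction of premium

loss_ratio = ultimate_loss / premium
underwriting_margin = 1.0 - expense_ratio - loss_ratio
print(loss_ratio)                     # 0.95
print(round(underwriting_margin, 2))  # -0.05: an underwriting loss, absent investment income
```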

2.2 Generalized linear models and Bayesian loss reserving models

For much of its history, the actuarial profession has, of practical necessity, relied on fairly simple deterministic projection methods for estimating outstanding liabilities. While practical and easy to implement in spreadsheet form, these traditional methods lack statistical methodology for assessing loss reserve variability. For example, Taylor and Ashe (1983) remarked that "one finds it interesting that there has, to date, been little systematic development of techniques for examining the dispersions of the various estimates of outstandings".


This shortcoming is increasingly regarded as a detriment in the context of today's complex operational and regulatory landscape. The recent actuarial literature has given considerable attention to so-called stochastic loss reserving methods. Most of these methods involve fitting a generalized linear model [GLM] (McCullagh and Nelder 1989) to the incremental losses zi(tj) = yi(tj) − yi(tj−1), for j = 1, · · · , I. (We adopt the convention that yi(t0) = 0 to simplify notation.) A well-known and widely used specification for the expected value of zi(tj) is

g(E[zi(tj)]) = µ + αi + βj, ∀ i, j = 1, · · · , I, (1)

where µ is the overall mean, αi and βj are the accident year and development age effects, respectively, and g is the link function. The zi(tj) are assumed to be independently distributed from the exponential family (see McCullagh and Nelder 1989). Among the more popular GLM forms are the over-dispersed Poisson, Negative Binomial, and Gamma models, all using log link functions. The log-Normal distribution is also often employed, where log[zi(tj)] is modelled as Normally distributed and g is taken to be the identity link. See England and Verrall (2002) for a comprehensive overview of these methods and their variants.

After parameter estimation, the reserve for year i ≥ 2 is estimated to be the sum of the predicted unobserved incremental losses, i.e., ∑_{j=I+1−i}^{I} zi(tj), and the total reserve of all years combined is ∑_{i=2}^{I} ∑_{j=I+1−i}^{I} zi(tj). England and Verrall (1999) calculate the prediction error of this estimate both analytically and through bootstrap simulation. The latter also yields the predictive distribution of the loss reserves (see also Pinheiro et al. 2003 for details).

The growing interest in methods that produce predictive distributions of loss reserves has spurred the use of Bayesian models in insurance loss reserving. While not an exhaustive list, the following are some notable contributions on this topic. De Alba (2002) considers a number of specifications that allow the use of measures of exposure or loss counts in addition to loss payment amounts; Ntzoufras and Dellaportas (2002) explore the use of state space modelling and the inclusion of inflation factors; de Alba and Nieto-Barajas (2008) propose a model that accounts for correlation through the use of a correlated Gamma process. These models extend the GLM by adding more complex structures that allow the use of additional information, or by relaxing certain GLM assumptions.

However, a common feature of most of these models, frequentist and Bayesian alike, is that they are best equipped to produce predictions up to time tI. As noted above, this is appropriate only for short-tailed forms of insurance where it is reasonable to assume that all claim payments occur on or before time tI. In the general case, standard practice is to estimate yi(t∞) by multiplying yi(tI) by a "tail factor" that has been judgmentally selected using industry benchmark data. Aside from its ad hoc nature, a major drawback of this practice is the difficulty it creates in obtaining a measure of the uncertainty of the resulting ultimate loss estimates (e.g., see Mack 1999).

A second limitation of mainstream frequentist loss reserving models is that many do not reflect the longitudinal nature of loss triangles in their model designs. The independence assumption present in these models is usually not fulfilled in practice (e.g., see de Alba and Nieto-Barajas 2008).

2.3 Introducing growth curves and hierarchical structure

The primary goal of this paper is to introduce a hierarchical Bayesian loss reserving model that addresses both of these limitations. The model introduced here has similarities to the nonlinear growth models that are widely used in biological and biomedical sciences to model nonlinear patterns in repeated measurements (e.g., see Davidian and Giltinan 1995). Clark (2003) introduced a stochastic loss reserving model that uses growth curves to model the loss development, where the mean of the cumulative losses yi(tj) is assumed to follow a nonlinear growth curve in the following way:

E[yi(tj)] = pi · γ · G(tj ; Θ). (2)

In the above equation, pi is the (given) premium for the ith accident year, and γ denotes the all-year-combined expected ultimate loss ratio for the policies that generated the claims being analyzed, assuming that the expected ultimate loss ratios are the same across different years. Therefore pi · γ equals the expected ultimate loss for the ith accident year. G(tj ; Θ) is a parametric growth curve that depends on parameters Θ and measures the percentage of ultimate losses that have emerged as of time tj. Therefore G must have the properties that G(t0; Θ) = 0 and G(tj ; Θ) → 1 as tj → t∞. Meyers (2009) performs a Bayesian analysis using the structure in (2), with a Beta cumulative probability distribution serving as the growth curve.

Note that model (2) assumes a common ultimate loss ratio γ for all accident years. While motivated by a reasonable desire for parsimony, this simplifying assumption is not entirely satisfactory. Guszcza (2008) therefore adds hierarchical structure to the model. Doing so accounts for within-accident-year dependencies in the data and allows the loss ratio γ, and in principle the growth curve parameters Θ, to vary randomly across accident years.

The nonlinear structure implicit in (2) brings several advantages over the more popular GLM-based models. First, the nonlinear structure is intuitively appealing, as can be seen by examining Fig. 1. Here, the cumulative aggregate loss payment patterns are plotted over evaluation time for each year for the 10 insurance companies used in this study, where each panel represents a company and each line represents an accident year. We see that the development of each accident year's aggregate losses (the lines) all show a nonlinear growth pattern, motivating the use of parametric growth models in typical loss reserving analyses. Second, the expected ultimate loss E[yi(t∞)] = pi × γ is explicitly modelled, and variability measures can therefore be calculated directly from the model. Third, the γ parameter representing the ultimate loss ratio is of fundamental importance in insurance company operations. Company management will typically possess relevant expert opinions and/or industry data pertaining to the estimation of γ. This additional information is naturally incorporated into one's analysis through a Bayesian prior on γ. Finally, model predictions at any evaluation age can readily be calculated using the growth curve.

[Figure 1 here: ten panels, Comp #1 through Comp #10; x-axis: evaluation time in months (12–108); y-axis: cumulative loss.]

Figure 1: Observed growth of cumulative losses for the 10 companies studied. The scale for each panel is set differently, and the labels of the y-axis are therefore suppressed. The most recent accident year has only one evaluation and is therefore not displayed.

Building on the approach of Clark (2003) and Guszcza (2008), we propose a fully Bayesian nonlinear hierarchical loss reserving model using (2) as a starting point. The empirical loss development patterns displayed in Fig. 1 suggest two distinct motivations for introducing hierarchical structure. The first is the need to account for within-cohort dependencies resulting from the longitudinal nature of the data, as has already been commented on. The second is the fact that the loss development patterns appear to vary across accident years and across companies. Thus, this model at once addresses the two fundamental limitations of mainstream GLM-based loss reserving models, facilitates the simultaneous estimation of multiple loss triangles, and enjoys the benefits of Bayesian forecasting shared by other Bayesian loss reserving models.


3 The Bayesian nonlinear model

We are now ready to formulate the multi-company nonlinear Bayesian growth model. Table 1 is a typical representation of the data used to perform a loss reserve analysis for a single company. Suppose we have data for K companies, each with I accident years and I evaluation periods. We will use yik(tj) to denote the cumulative loss of accident year i and company k at the jth evaluation.

3.1 Data distribution

To reflect the longitudinal nature of the loss payment amounts, we model cumulative, rather than incremental, losses. Since the cumulative insurance losses must be positive, we specify the model on the logarithmic scale as

log yik(tj) = log µik(tj) + εik(tj), (3)

where

µik(tj) = pik · γik ·G(tj ; Θk), (4)

and where the error term εik(tj) follows a first-order autoregressive process as

εik(tj) = ρ · εik(tj−1) + δik(tj), (5)

δik(tj) ∼ N[0, σk² · (1 − ρ²)], (6)

εik(t0) ∼ N(0, σk²). (7)

In (4), the growth curve G(tj ; Θk) is chosen to be the log-logistic curve

G(tj ; ωk, θk) = tj^ωk / (tj^ωk + θk^ωk). (8)

The growth curve parameter θk corresponds to the development age at which half of the ultimate loss has emerged, and the parameter ωk describes the slope of the curve around θk. There will generally be many possible candidate growth curves available (e.g., see Seber and Wild 1989). Selecting an appropriate growth curve is a matter of judgment and of calling upon one's background knowledge of the development rate of the type of insurance losses being modelled. The log-Normal distribution has also been employed by several other authors (e.g., de Alba 2002 and Ntzoufras and Dellaportas 2002), and it might be preferable in situations where the downside potential is very large (see England and Verrall 2002). To avoid overfitting the data, the model is specified in such a way as to allow the ultimate loss ratios γik to vary by both company and accident year, while the growth parameters (ωk, θk)′ are allowed to vary only by company. The assumption that loss development across accident years follows the same pattern within a single company underlies a majority of actuarial reserving models and is generally accepted in the literature (e.g., see Mack 1993). Guszcza (2008) tests this assumption in a typical loss triangle, and finds no significant variation of the growth parameters by accident year.
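The log-logistic curve in (8) is straightforward to code and to check against its required boundary behaviour, G(t0) = 0 and G(t) → 1 (a sketch with illustrative parameter values, not estimates from the data):

```python
def log_logistic(t, omega, theta):
    """Log-logistic growth curve, eq. (8): G(t) = t^omega / (t^omega + theta^omega)."""
    if t == 0:
        return 0.0
    return t**omega / (t**omega + theta**omega)

# Illustrative values: theta is the age (months) at which half the ultimate loss has emerged,
# and omega controls the slope of the curve around theta.
omega, theta = 1.5, 24.0

print(log_logistic(0, omega, theta))      # 0.0, so G(t0) = 0
print(log_logistic(theta, omega, theta))  # 0.5, half the ultimate loss at t = theta
print(log_logistic(1e6, omega, theta))    # approaches 1 as t grows
```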

Equations (5)-(7) define a first-order autoregressive process that is initiated by an error εik(t0). This structure is necessary because a discrepancy between the observed and fitted losses at a particular evaluation time can be propagated to subsequent observations. The autoregressive process thus accounts for the serial correlation that typically arises from repeated measures. The variance parameter σk is allowed to vary by company since the loss payment patterns for different companies are expected to exhibit differing degrees of variability. We assign diffuse prior distributions to ρ and the σk's so that they are estimated largely from the data: for example, ρ ∼ U(−1, 1) and σk ∼ U(0, 100) for each company k, where U denotes the Uniform distribution.
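The role of the variance scaling in (6) can be verified by simulation: initializing at (7) and iterating (5) leaves every εik(tj) with marginal variance σk². A quick NumPy check with arbitrary values of ρ and σ:

```python
import numpy as np

rng = np.random.default_rng(1)
rho, sigma = 0.6, 0.3
n_paths, n_steps = 200_000, 8

# Start at the stationary distribution, as in eq. (7).
eps = rng.normal(0.0, sigma, n_paths)
for _ in range(n_steps):
    # Innovation sd sigma * sqrt(1 - rho^2), as in eq. (6), keeps Var[eps] = sigma^2 at every age.
    eps = rho * eps + rng.normal(0.0, sigma * np.sqrt(1 - rho**2), n_paths)

print(round(float(eps.std()), 2))  # close to sigma = 0.3
```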

3.2 Multi-level model structure

The ultimate loss ratio parameters γik for a particular company vary due to fluctuations in business experience from year to year. Since they must be positive, we model them with a Normal distribution on the logarithmic scale. For each company k, we model γik as

log γik ∼ N(log γk, σγyear²) for each year i, (9)

where σγyear is the accident-year level variation of the loss ratios on the logarithmic scale. As the company-level parameters (γk, ωk, θk)′ must be positive, we use a multivariate Normal model on the logarithmic scale as

log(γk, ωk, θk)′ ∼ N[log(γ, ω, θ)′, Σ] for each company k, (10)

where the diagonal components of Σ are σγcomp², σω², and σθ², respectively. Thus, σγcomp represents the company-level variation of the loss ratios on the logarithmic scale, as distinct from the accident-year level variation σγyear. It should be noted that (9) introduces another source of correlation across development ages, since each accident year's underlying loss propensity γik is shared by all of the repeated measures obtained on that year. Similarly, correlation across the accident years within a company is introduced since different accident years rely on the same parameter vector (γk, ωk, θk)′. De Alba and Nieto-Barajas (2008) approach correlation across development ages by introducing a correlated Gamma process assuming that accident years are independent. Our specification relaxes each of the assumptions of development-age and accident-year independence implicit in most mainstream loss reserving models (e.g., see Mack 1993 and England and Verrall 1999).


We assign diffuse hyperprior distributions; e.g., log γ ∼ N(0, 100²), log ω ∼ N(0, 100²), log θ ∼ N(0, 100²), σγyear ∼ U(0, 100) and Σ ∼ Inv-Wishart3(I), where I is the 3 × 3 identity matrix and the degrees of freedom for the Inverse Wishart are set relatively low, but still high enough to maintain a proper distribution (e.g., see Johnson and Kotz 1972).
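As a sketch of how the hierarchy in (9) and (10) generates data, the following NumPy prior-predictive simulation draws company-level parameters and accident-year loss ratios, then traces the implied mean loss ratio path along the growth curve. The hyperparameter values are illustrative stand-ins for the diffuse hyperpriors, chosen only so the draws are readable:

```python
import numpy as np

rng = np.random.default_rng(2)
K, I = 3, 10                          # companies, accident years
mu = np.log([0.7, 1.5, 24.0])         # population-level (gamma, omega, theta), illustrative
Sigma = np.diag([0.05, 0.02, 0.04])   # company-level covariance on the log scale
sigma_gamma_year = 0.1                # accident-year variation of log loss ratios

# Company level, eq. (10): log(gamma_k, omega_k, theta_k)' ~ N(mu, Sigma).
company = np.exp(rng.multivariate_normal(mu, Sigma, size=K))

# Accident-year level, eq. (9): log gamma_ik ~ N(log gamma_k, sigma_gamma_year^2).
gamma_ik = np.exp(rng.normal(np.log(company[:, 0])[:, None],
                             sigma_gamma_year, (K, I)))

t = np.arange(12, 121, 12.0)          # evaluation ages in months
for k in range(K):
    g, w, th = company[k]
    G = t**w / (t**w + th**w)                       # log-logistic growth, eq. (8)
    mean_loss_ratio_path = gamma_ik[k].mean() * G   # E[y]/p for an average accident year
    print(k, mean_loss_ratio_path.round(2))
```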

4 The model applied to workers’ compensation insurance data

4.1 Description of the data

We next apply the model specified above to a real data set. The data are extracted from the statutory annual statements that insurance companies are required to report to the National Association of Insurance Commissioners (the consortium of state-level insurance regulators in the United States) each year. We choose the workers' compensation line of business, which is typically purchased by employers to cover the medical expense and wage replacement for employees who are injured during the course of employment. This is a type of long-tailed insurance line: the payments for some claims can stretch out over the course of a 20-year time horizon, or even longer. An example is the class of so-called permanent total disability claims. Such claims arise from work injuries or illnesses, such as loss of vision or limbs, that prevent the worker from returning to gainful employment. A permanently and totally disabled employee may be entitled to weekly benefits that are provided initially for a period of 450 weeks, but these benefits can continue beyond this period provided that the injured worker is able to show that he or she remains unable to earn wages. Benefits to survivors also account for the long-tailed nature of workers' compensation. For example, survivors of a claimant who died in his or her twenties may continue to receive benefits 30 or more years after the claimant's death. Due to its long-tailed nature, the workers' compensation line of business represents the largest portion of the U.S. property and casualty industry's net reserves, contributing nearly a quarter of total current reserves.

Of the 1,070 companies represented in the available data, we select data for 10 large insurers whose combined premium volume accounts for approximately 36% of the industry's total premium. We mask the names of these 10 companies and refer to them as Comp #1, · · · , Comp #10, reverse-ordered by premium volume. The common accident years available are from 1988 to 1997, resulting in 10 years of historical experience. Losses are evaluated at 12-month intervals, with the highest available development age being 120 months. In addition, "lower triangle" losses Dc10 = {y_ik(t_j) | i + j ≥ 12, i = 2, · · · , 10} are available for 4 of the 10 companies studied. These entries were collected from calendar years 1998 to 2006. However, since Schedule P does not report any claims older than 120 months, there will be no data available for t_j > 120. So for the 4 companies having measures during 1998 to 2006, the available data is represented by a full 10 × 10 grid as shown in Fig. 2. For the remaining companies, the data corresponding to the lower right corner (the black area) of the grid is not available.




Figure 2: Visualization of the partition of the data into the training set and two out-of-sample validation sets (V1 and V2).

Although our data contains losses for 10 calendar years, we use only the first 9 of these years to fit the model. The remaining data are used as out-of-sample data with which we test the model's one-year-ahead forecasting accuracy. That is, we remove the most recently evaluated losses for each accident year (the gray area in Fig. 2) and set them aside to serve as the first of our hold-out data sets, V1 = {y_ik(t_j) | i + j = 11, i = 1, · · · , 10}. The training set is thus D9 = {y_ik(t_j) | i + j ≤ 10, i = 1, · · · , 9} (the light area in Fig. 2). Moreover, we use the realized values of the unobserved cells Dc10 for the 4 companies as the second validation data set, denoted V2, to test the model's ability to predict losses multiple years ahead. This partition of the available loss data into the training set and the two validation sets for a given company is illustrated in Fig. 2.

In summary, for each company, we have 1 + · · · + 9 = 45 observations in the training set and 10 data points in the first hold-out set V1; for each of the 4 companies having measures during 1998 to 2006, we have 1 + · · · + 9 = 45 data points in the second hold-out set V2. Thus, we use 45 × 10 = 450 observations to estimate model parameters, and evaluate the model on 10 × 10 = 100 and 45 × 4 = 180 data points in V1 and V2, respectively. As seen in the plot of the cumulative loss over time for each company in the training data set in Fig. 1, the nonlinear growth pattern is clear, and growth beyond 108 months is likely, given that the curve still has a noticeably positive slope at the 108-month development age.
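The three index sets and the counts above can be verified directly. A small Python sketch (our notation, with i the accident year and j the development lag, so that t_j = 12j months):

```python
# Index sets for the data partition, following the definitions in the text:
# D9 (training), V1 (one-year-ahead hold-out), V2 (multi-year hold-out).
D9 = {(i, j) for i in range(1, 10) for j in range(1, 11) if i + j <= 10}
V1 = {(i, j) for i in range(1, 11) for j in range(1, 11) if i + j == 11}
V2 = {(i, j) for i in range(2, 11) for j in range(1, 11) if i + j >= 12}

# Per-company counts: 45 training cells, 10 in V1, 45 in V2
print(len(D9), len(V1), len(V2))              # 45 10 45
# Totals: 10 companies contribute to D9 and V1, but only 4 to V2
print(len(D9) * 10, len(V1) * 10, len(V2) * 4)  # 450 100 180
```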




Figure 3: Several diagnostic plots based on three separate chains, used to check the convergence of the MCMC simulations. The top plot shows the γ values over MCMC iterations; the left plot is the smoothed marginal posterior density of θ; the right plot is the autocorrelation plot for ω.

4.2 Inference

To estimate the parameters in the model outlined in Section 3.1, we simulate the full posterior distribution using Markov chain Monte Carlo (MCMC) simulation (see, for example, Gelman et al. 2004). The "Metropolis-within-Gibbs" algorithm sequentially samples parameters from their lower-dimensional full conditional distributions over many iterations. The full conditionals are derived from the joint posterior density, which is proportional to the product of the prior and the likelihood. Denoting the probability density of the data by f, that of γ_k by g, that of γ_ik by h, and the prior by π, the full joint density is

$$
\begin{aligned}
&\pi(\sigma_1^2, \cdots, \sigma_{10}^2, \sigma_{\gamma_{year}}^2, \Sigma, \rho, \gamma, \omega, \theta)
\times \prod_{k=1}^{10} g(\gamma_k, \omega_k, \theta_k \mid \gamma, \omega, \theta, \Sigma)
\times \prod_{k=1}^{10} \prod_{i=1}^{9} h(\gamma_{ik} \mid \gamma_k, \sigma_{\gamma_{year}}^2) \\
&\quad \times \prod_{k=1}^{10} \prod_{i=1}^{9} \prod_{j=1}^{10-i} f[y_{ik}(t_j) \mid p_{ik}, t_j, \gamma_{ik}, \omega_k, \theta_k, \sigma_k^2, \rho] \\
&= N(\log\gamma \mid 0, 100^2) \times N(\log\omega \mid 0, 100^2) \times N(\log\theta \mid 0, 100^2) \times \mathrm{Inv\text{-}Wishart}_3(I) \\
&\quad \times \prod_{k=1}^{10} N[\log(\gamma_k, \omega_k, \theta_k)' \mid \log(\gamma, \omega, \theta)', \Sigma]
\times \prod_{k=1}^{10} \prod_{i=1}^{9} N(\log\gamma_{ik} \mid \log\gamma_k, \sigma_{\gamma_{year}}^2) \\
&\quad \times \prod_{k=1}^{10} \prod_{i=1}^{9} \prod_{j=1}^{10-i} f[y_{ik}(t_j) \mid p_{ik}, t_j, \gamma_{ik}, \omega_k, \theta_k, \sigma_k^2, \rho], \qquad (11)
\end{aligned}
$$

where, due to the autoregressive process, the probability density for $y_{ik}(t_j)$ is

$$
f[y_{ik}(t_j) \mid p_{ik}, t_j, \gamma_{ik}, \omega_k, \theta_k, \sigma_k^2, \rho] =
\begin{cases}
N[\log y_{ik}(t_j) \mid \log\mu_{ik}(t_j),\ \sigma_k^2] & \text{if } j = 1,\\[4pt]
N\{\log y_{ik}(t_j) \mid \log\mu_{ik}(t_j) + \rho \cdot \log[y_{ik}(t_{j-1})/\mu_{ik}(t_{j-1})],\ \sigma_k^2 \cdot (1-\rho^2)\} & \text{if } j \ge 2.
\end{cases}
$$
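A direct transcription of this two-case conditional density may make the structure clearer; the sketch below (function name ours) evaluates the log-density of log y_ik(t_j):

```python
import math

def log_f(log_y, log_mu, j, sigma_k, rho, log_y_prev=None, log_mu_prev=None):
    """Conditional log-density of log y_ik(t_j) under the AR(1) lognormal model.

    For j = 1 the density is N(log mu, sigma_k^2); for j >= 2 the mean is
    shifted by rho times the previous log-residual and the variance shrinks
    to sigma_k^2 * (1 - rho^2).
    """
    if j == 1:
        mean, var = log_mu, sigma_k**2
    else:
        mean = log_mu + rho * (log_y_prev - log_mu_prev)
        var = sigma_k**2 * (1.0 - rho**2)
    return -0.5 * math.log(2.0 * math.pi * var) - (log_y - mean)**2 / (2.0 * var)
```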

The MCMC algorithm is implemented in the WinBUGS software for Bayesian inference (e.g., see Spiegelhalter et al. 2003; Scollnik 2001 for an actuarial application), as called from R. We ran 100,000 iterations in three parallel chains, discarding the burn-in period of the first 60,000 iterations, at which point approximate convergence was achieved (the potential scale reduction factors of Gelman and Rubin 1992 were below 1.1 for all parameters). To reduce autocorrelation, we retained every 40th iteration of each chain. This resulted in 1,000 simulated draws per chain.

To assess convergence of the Markov chain, we examined several diagnostic plots of selected parameters, as illustrated in Fig. 3. The three example plots displayed are the trace plot of γ, the posterior density plot of θ, and the autocorrelation plot of ω, respectively. We see that the three chains appear to have mixed fairly well, the density is bell-shaped, and the autocorrelation is not statistically different from 0. Each of these plots suggests that the simulation has achieved approximate convergence.
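The potential scale reduction factor used above as the numerical convergence criterion can be sketched as follows (the basic Gelman and Rubin version, without the split-chain refinement; the 1.1 threshold is the one used in the text):

```python
import numpy as np

def gelman_rubin(chains):
    """Potential scale reduction factor R-hat for one scalar parameter.

    `chains` is an (m, n) array: m parallel chains of n retained draws each.
    Values near 1 (e.g. below 1.1) suggest approximate convergence.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)           # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()     # within-chain variance
    var_plus = (n - 1) / n * W + B / n        # pooled variance estimate
    return np.sqrt(var_plus / W)

# Well-mixed chains drawn from the same distribution should give R-hat near 1
rng = np.random.default_rng(1)
print(gelman_rubin(rng.normal(size=(3, 1000))))
```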

4.3 Results

Inference is made using the simulated sample values for each parameter. The posterior median estimates and the 50% credible intervals (shown in brackets) for the parameters of key interest are γ = 0.693 [0.644, 0.748], ω = 1.84 [1.71, 1.98], θ = 25.3 [23.5, 27.1] and ρ = 0.479 [0.445, 0.511]. We see that as of 108 months of development age, the industry-average emergence percentage of the ultimate claim is approximately 93.5%, based on the log-logistic growth curve in (8) evaluated at the median estimates of ω and θ. Fig. 4 displays the observed loss ratios of the first accident year, y_1k(t_j)/p_1k, from the data along with 15 randomly drawn curves γ_1k · G(t_j; ω_k, θ_k) from the posterior




Figure 4: Observed loss ratios y_1k(t_j)/p_1k (plotted as crosses) and 15 randomly drawn curves γ_1k · G(t_j; ω_k, θ_k) are plotted for the first accident year of each of the 10 companies. Since these are loss ratios, all the panels share the same scale of 0 to 1.

simulated values to illustrate the goodness of fit and the posterior uncertainty. We show this only for the first accident year to avoid overly complex plots. This plot indicates that the nonlinear model structure seems to adequately capture the loss development trend over time. The plot further indicates that there is some variation in the precision of the estimates across the companies. For example, the sampled curves for Comp #8 appear more tightly clustered than those for Comp #9.
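The approximately 93.5% emergence figure quoted earlier can be reproduced from the posterior medians, assuming the usual log-logistic parameterization G(t; ω, θ) = t^ω / (t^ω + θ^ω) (as in Clark 2003; the function name is ours):

```python
def G(t, omega, theta):
    """Log-logistic growth curve (assumed parameterization, as in Clark 2003)."""
    return t**omega / (t**omega + theta**omega)

# Industry-average emergence at 108 months, at the posterior medians of ω and θ
print(round(G(108, 1.84, 25.3), 3))  # 0.935
```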

The estimated value of σ_γyear, the variation of the γ_ik's across accident years on the logarithmic scale, is σ_γyear = 0.157, which corresponds to approximately 0.693 × 0.157 = 0.109 on the original scale using the delta method. The estimated σ_γcomp, the variation of the γ_k's across different companies on the logarithmic scale, is σ_γcomp = 0.311, which corresponds to approximately 0.693 × 0.311 = 0.216 on the original scale. We therefore see that the variation across companies is approximately twice the average variation across accident years within the same company. This is a reasonable result in that the mixes of the books of business, the distributions of risk exposures, and protocols



for risk selection and claims handling tend to be more similar and stable across different years within the same company than across different companies. This between-company heterogeneity would be further amplified should smaller insurers, which write business only in certain regions or market segments, be included in the data. One specific source of the large cross-company variation in the data could be the broad introduction of large deductible policies in the 1990s in the United States. With large deductible policies, insurers offer policyholders the option of retaining a large amount of losses, typically 100,000 dollars or more, to reduce the cost of insurance. This new type of workers' compensation insurance has quite different loss propensity from traditional full-coverage insurance, and the varying mixes of the large deductible policies with other policies across the companies can result in quite different loss experiences.

With the joint posterior density of the model's parameters in hand, we proceed to estimate loss reserves, the quantity of primary interest. This can be readily achieved by drawing the ultimate loss y_ik(t_∞) for i = 1, · · · , 9 and k = 1, · · · , 10 using the estimated posterior density. That is, we simulate

$$
\log y_{ik}(t_\infty) \sim N[\log(p_{ik} \cdot \gamma_{ik}),\ \sigma_k^2]. \qquad (12)
$$

The aggregate reserve for each company k is simply the sum of the ultimate losses across accident years minus the sum of the latest observed losses. That is, the aggregate reserve for company k, R_k, is calculated as

$$
R_k = \sum_{i=1}^{9} y_{ik}(t_\infty) - \sum_{i=1}^{9} y_{ik}(t_{10-i}). \qquad (13)
$$

The median, standard deviation and 50% interval of the reserve for each of the 10 companies are displayed in Table 2.
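Equations (12) and (13) translate directly into a simulation routine. A simplified sketch in Python (all names are ours; in the full analysis, γ_ik and σ_k would themselves vary across posterior draws rather than being held fixed):

```python
import numpy as np

rng = np.random.default_rng(2)

def simulate_reserve(p, gamma, sigma_k, latest, n_sims=3000):
    """Posterior-predictive sketch of the aggregate reserve R_k for one company.

    p and gamma hold the premium p_ik and one draw of the ultimate loss ratio
    gamma_ik per accident year (length 9); latest holds the most recently
    observed losses y_ik(t_{10-i}).
    """
    log_mean = np.log(np.asarray(p) * np.asarray(gamma))  # log(p_ik * gamma_ik)
    # Draw the ultimate losses y_ik(t_inf) as in equation (12)
    ultimates = rng.lognormal(mean=log_mean, sigma=sigma_k, size=(n_sims, 9))
    # Aggregate reserve as in equation (13)
    reserves = ultimates.sum(axis=1) - np.sum(latest)
    return (np.median(reserves),
            reserves.std(ddof=1),
            np.percentile(reserves, [25, 75]))  # median, sd, 50% interval
```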

To enable comparison between this approach and a traditional stochastic loss reserving approach, Table 2 also displays the Bayesian estimated reserve and prediction error at evaluation age 108 months (we replace y_ik(t_∞) with y_ik(t_9) in (13)), as well as the estimates from a traditional GLM that assumes no tail development beyond 108 months (see England and Verrall 1999 for details of the calculation of the prediction error). We use the over-dispersed Poisson model with log link function for the GLM, as this model replicates one of the most popular deterministic methods ("Chain Ladder") used in industry (e.g., see England and Verrall 2002). The Bayesian estimated reserves projected to t_9 are reasonably close to those of the GLM method. The Bayesian model yields lower estimated reserves than the GLM model for 3 of the 10 companies. However, the prediction error of the Bayesian model on average is approximately twice that of the GLM. This is because the Bayesian analysis takes into account the uncertainty in all unknown parameters, as well as heterogeneity between companies and years. Note that in so doing, the Bayesian approach overcomes a serious difficulty with the GLM and other frequentist reserving models. Namely, relying on asymptotic results when



Company | Bayesian at t∞                    | Bayesian at t9    | GLM at t9
        | Reserve  Pred Err  50% Interval   | Reserve  Pred Err | Reserve  Pred Err
--------|-----------------------------------|-------------------|------------------
1       | 260.98   46.84  (230.80, 292.54)  | 170.33   25.98    | 155.99   10.90
2       | 173.13   22.00  (159.37, 188.60)  | 136.20   15.13    | 139.63    7.11
3       | 216.19   13.95  (206.70, 224.83)  | 151.82    9.01    | 130.71    4.53
4       |  81.95    7.39  (77.17, 87.14)    |  63.28    4.80    |  54.69    3.46
5       |  44.60    6.69  (40.33, 49.21)    |  37.95    5.14    |  33.56    2.12
6       |  48.86    5.27  (45.48, 52.41)    |  38.31    3.97    |  37.00    2.05
7       |  34.45    2.19  (33.03, 35.90)    |  26.21    1.49    |  25.11    0.91
8       |  22.91    2.06  (21.62, 24.32)    |  16.46    1.37    |  16.83    0.72
9       |  30.66    5.62  (27.11, 34.42)    |  22.58    3.22    |  18.39    1.52
10      |  19.88    1.35  (18.94, 20.80)    |  15.47    0.91    |  17.71    0.68

Table 2: Reserve estimates from the Bayesian nonlinear model (projected to t∞ and t9, respectively) and the GLM for the 10 companies. Original losses are divided by 10,000 before they are used in the model. The GLM uses a Poisson distribution with over-dispersion, the mean structure in (1) and a log link function. It is fitted separately for each triangle and no tail factor is applied.

analyzing small data sets such as loss triangles can lead to serious underestimates of the uncertainty of one's model parameters (see Gelman and Hill 2007). As one would expect, all the Bayesian reserves projected to t∞ are significantly greater than the GLM estimates projected to t9, on average by a factor of about 1.4. Assuming no tail development beyond the observed range of the data would therefore result in a significant reserve shortfall. Further, even if a suitable tail factor were supplied, the variability of the resulting reserves could be considerably underestimated if the uncertainty of the tail development were not appropriately accounted for.

5 Model checking and diagnostics

We have thus far presented the course of a typical Bayesian analysis: we have specified a probability model and computed the posterior distribution of the quantities of interest. Next we assess the model's adequacy and performance. The adequacy of our model is investigated via examination of residuals. The model's predictive power is assessed by comparing the model's predictions with the actual values in the hold-out validation data sets. In addition, we perform a sensitivity analysis to ascertain the degree to which posterior inference for certain key model parameters changes under a range of different prior assumptions.



5.1 Residual diagnostics

Since the model has a first-order autoregressive feature, the raw residuals $\log y_{ik}(t_j) - \log\mu_{ik}(t_j)$ will have neither constant variance nor zero correlation. We therefore wish to transform the raw residuals to have approximately constant variance and zero correlation using the formula

$$
r_{ik}(t_j) = \frac{\log y_{ik}(t_j) - E[\log y_{ik}(t_j) \mid \log y_{ik}(t_{j-1})]}{\sqrt{\mathrm{Var}[\log y_{ik}(t_j) \mid \log y_{ik}(t_{j-1})]}}. \qquad (14)
$$

These standardized residuals are then plotted against the mean $\log\mu_{ik}(t_j)$ in Fig. 5 to check for any systematic departures from the model for the mean response. As this is a Bayesian analysis, thousands of residuals will be generated. The plot in Fig. 5 shows one randomly selected realization (see Gelman et al. 2004). We see that the scatterplot displays no systematic pattern, with a more or less random scatter around a constant mean of zero, indicating that the mean is correctly specified. Further, most residuals are bounded by [−3, 3] and no clear outliers are detected.
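The standardization in (14) can be written out explicitly under the AR(1) model, with the two cases mirroring the conditional density of Section 4.2; a sketch (our function name):

```python
import math

def standardized_residual(log_y, log_mu, j, sigma_k, rho,
                          log_y_prev=None, log_mu_prev=None):
    """Transform a raw residual to the standardized form in (14).

    Under the AR(1) model, the conditional mean and standard deviation differ
    between the first development lag (j = 1) and later lags (j >= 2).
    """
    if j == 1:
        cond_mean, cond_sd = log_mu, sigma_k
    else:
        cond_mean = log_mu + rho * (log_y_prev - log_mu_prev)
        cond_sd = sigma_k * math.sqrt(1.0 - rho**2)
    return (log_y - cond_mean) / cond_sd
```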


Figure 5: Standardized residuals plotted against the expected values. Both the residuals and the expected values are on the logarithmic scale.



5.2 Hold-out set validation

As discussed in Section 4.1, we set aside two sets of hold-out data for model validation. The data sets V1 and V2 are used to test the performance of the Bayesian forecasting of losses one year and multiple years ahead, respectively. The performance of the model in predicting losses one year ahead is of great importance because it is often the critical element in the projection of the net cash flow for the upcoming calendar year, and therefore underlies most risk management programs. In addition, it is the key component in assessing the necessary capital charge due to underwriting risks that is required by current solvency regulations (see Ohlsson and Lauzeningks 2009).

It should also be noted that both hold-out sets contain realized values from a new accident year (the 10th accident year). The Bayesian model allows one to generate predictions for a new accident year (where i > I) using the multi-level structure in (9). In contrast, traditional spreadsheet projection and fixed-effects GLM-based methods use accident year as a categorical variable (using fixed-effect year-specific parameters) and do not employ a hierarchical structure. These traditional methods are therefore equipped only to predict loss payout for risks that have been written prior to year I (where i ≤ I). This precludes a straightforward comparison of the predictive performance of the proposed and GLM-based industry standard models on the two validation sets.

To generate predictions for the new accident year i = 10 using the proposed model, one first simulates γ_ik for each company k using (9) with the realized values of γ_k and σ_γyear, and then generates y_ik(t_j) for each evaluation time j and company k using (3)-(8). Here, we simply use the realized premium p_ik for i = 10 in calculating the mean. However, in the case where this information is not available, analysts usually have an estimated or planned premium volume for the upcoming business year. This estimate can be used to replace p_ik, perhaps modelled by a suitable distribution to reflect the initial uncertainty of this measure. This issue will not be further pursued in this paper.

After 3,000 simulations are drawn for the hold-out sets V1 and V2, we compute the posterior 95% and 50% credible intervals and check whether they cover the corresponding realized values. For the first hold-out set V1, the coverage rates for the 95% and 50% intervals are approximately 95% and 57%, respectively. For the second set V2, the coverage rates for the 95% and 50% intervals are about 81% and 40%, respectively. Coverage for the model's longer-term predictions is therefore slightly below the nominal rate. However, a closer investigation of the validation data points that failed to be covered by the constructed intervals reveals that the majority of them are from the accident years i = 8 and i = 9. For example, of the 35 validation points outside the 95% intervals, 14 are from accident year i = 8 and 18 are from accident year i = 9. As there are two or fewer training data points from accident years 8 and 9 for any given company, it is perhaps unrealistic to expect predictions for these two years to be as stable as the predictions for years 1-7. In an actual reserving project, a logical next step would be to impose informative priors for these accident years to reflect expected trends that cannot be inferred from the data (see, e.g., Verrall 2004). The ability to incorporate background knowledge or beliefs in this way is of course a major advantage of the Bayesian approach.
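The coverage check described above reduces to comparing each realized value with the central interval of its posterior predictive draws; a minimal sketch (function name ours):

```python
import numpy as np

def coverage_rate(draws, actuals, level=0.95):
    """Share of realized hold-out values inside the central posterior
    predictive interval of the given level.

    draws: (n_sims, n_points) posterior predictive simulations;
    actuals: (n_points,) realized losses.
    """
    alpha = (1.0 - level) / 2.0
    lo = np.quantile(draws, alpha, axis=0)        # lower interval endpoints
    hi = np.quantile(draws, 1.0 - alpha, axis=0)  # upper interval endpoints
    return np.mean((actuals >= lo) & (actuals <= hi))
```

For a well-calibrated model, the empirical rate should be close to the nominal level, as observed for V1 above.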




Figure 6: Histogram of the two-sided p-values.

In addition to computing the coverage rates, we also assess the performance of our model on the out-of-sample data using a predictive evaluation criterion commonly used in Bayesian analysis. We use a checking function based on Gelfand et al. (1992). Specifically, we compute the two-sided "p-values", i.e., the chance of getting a more "extreme" observation: 2 · min{p(Y_i(t_j) < y_i(t_j)), p(Y_i(t_j) > y_i(t_j))}, where p(Y_i(t_j) < y_i(t_j)) is the posterior probability of getting a value less than the observed y. If the specified model is correct, these p-values should be approximately uniformly distributed. We perform this check only for V1, as the validation points in V2 are highly correlated, violating the conditional independence assumptions underlying most standard checking procedures. The histogram of the p-values for V1, shown in Fig. 6, suggests that these p-values are roughly uniformly distributed.
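Given the posterior predictive draws for a cell, the two-sided p-value above is a one-liner; a sketch (function name ours):

```python
import numpy as np

def two_sided_pvalue(draws, y):
    """Two-sided predictive p-value in the spirit of Gelfand et al. (1992):
    twice the smaller of the posterior probabilities of falling below or
    above the realized value y."""
    p_lo = np.mean(draws < y)   # P(Y < y) estimated from the draws
    p_hi = np.mean(draws > y)   # P(Y > y) estimated from the draws
    return 2.0 * min(p_lo, p_hi)
```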

5.3 Sensitivity analysis to prior specification

In Bayesian analysis, it is typically the case that a multitude of prior distributions are available to describe the range of one's prior beliefs or expert opinions. It is thus possible that the present model provides a reasonable fit to the data, but that different posterior inference would result from alternate prior distributions. The sensitivity of the Bayesian posterior estimates to various prior distribution assumptions should therefore be assessed. This is particularly true in the case of insurance loss reserving analyses, where it is reported that company management bodies sometimes manipulate loss reserves in order to manage earnings (Beaver et al. 2003) or mask solvency problems (Gaver




Figure 7: Sensitivity analysis to the prior distribution of γ. The subplots show the posterior median (top left), standard deviation (top right) and the lower (bottom left) and upper bounds (bottom right) of the 50% credible intervals of the reserve for each company. The priors were all Gamma distributions, but with different means and variances. The means and variances of the priors are shown in the labels on the horizontal axis as "mean/variance", and the diffuse Log-Normal specification is labelled as "diffuse" for easy comparison. The roughly horizontal lines indicate very little change in the posterior estimates for the same company.

and Paterson 2004).

To this end, we replace the diffuse prior for γ (the industry-average ultimate loss ratio) with six alternate prior distributions. Gamma distributions are now employed with means of 0.5, 0.7 and 0.9. The priors with means of 0.5 and 0.9 reflect "aggressive" and "conservative" opinions, respectively, about the average ultimate loss ratio, while the prior with mean of 0.7 corresponds to the "realistic" opinion, and is close to our posterior median of γ. We specify two sets of Gamma prior distributions with these means, with variance fixed at 0.1 and 0.2, respectively, resulting in 6 different priors in total. For each of these six priors, we summarize the posterior distribution of the main quantity of interest, the loss reserve for each company. The median, standard deviation, and lower and upper bounds of the 50% intervals of the reserves by company using the different priors are shown



in Fig. 7, along with those using the present diffuse prior. We see that the posterior distribution of the reserves for each company is fairly consistent across these different prior distribution scenarios, indicating that the data is informative about the parameters of interest.
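Since the Gamma priors above are specified by their means and variances, converting to the shape/rate parameterization used by most Bayesian software is a one-line moment-matching exercise (function name ours):

```python
def gamma_shape_rate(mean, variance):
    """Shape/rate parameterization of a Gamma prior matching a given mean
    and variance: mean = shape/rate, variance = shape/rate^2."""
    rate = mean / variance
    shape = mean * rate
    return shape, rate

# The "realistic" prior with mean 0.7 and variance 0.1
print(gamma_shape_rate(0.7, 0.1))  # approximately (4.9, 7.0)
```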

6 Discussion

We have analyzed insurance loss payment data for a type of insurance (workers' compensation) that is characterized by long periods of loss development beyond the range of standard data sets reported in insurers' annual statements. This feature is captured in the model by the specification of a nonlinear growth curve, thereby enabling the model to produce estimates of the ultimate values of paid losses. This eliminates the need, shared by many commonly used techniques, for a subsequent ad hoc extrapolation. We used data from 10 large companies and specified a hierarchical structure that enables one to analyze these companies simultaneously. Inference can thus be made about the industry average as well as company- and accident year-specific ultimate loss ratios and growth patterns. The analysis is fully Bayesian, and therefore yields the full posterior distributions for all quantities of interest. This allows prediction of the upcoming calendar year's cash flow through the use of simulation. In sum, the model proposed in this paper addresses many of the practical challenges that insurance companies face when performing loss reserving analyses, and provides a statistically sound way to improve current actuarial practice.

Further model structure can be added as the specifics of the situation demand. For example, this paper does not account for inflation from year to year. However, inflation can readily be incorporated into the nonlinear growth model structure. For example, if a steady rate of inflation δ is assumed to be reasonable, then the model structure in (4) can be suitably amended to include the inflation factor:

$$
\mu_{ik}(t_j) = p_{ik} \cdot \gamma_{ik} \cdot G(t_j; \omega_k, \theta_k) \cdot \exp[(i + j - 2) \cdot \delta]. \qquad (15)
$$

In the above equation, we treat the first calendar year {y_ik(t_j) | i + j = 2} as the base, and subsequent calendar year losses are inflated by a factor exp[(i + j − 2) · δ]. The inflation parameter δ can be estimated from the data or pre-specified. This provides an alternative to the traditional actuarial strategy of restating ("detrending") the historical data to account for historical inflation and adjusting projected loss payments to account for anticipated future inflation. See Ntzoufras and Dellaportas (2002) for an example of a Bayesian model with the inclusion of inflation.
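As a sketch, the amended mean in (15) can be written as follows (our function name; `G_t` stands for the already-evaluated growth curve G(t_j; ω_k, θ_k)):

```python
import math

def mu_with_inflation(p_ik, gamma_ik, G_t, i, j, delta):
    """Mean structure (15): the growth-curve mean inflated from the first
    calendar year (i + j = 2) at a constant rate delta per calendar year."""
    return p_ik * gamma_ik * G_t * math.exp((i + j - 2) * delta)
```

For the base calendar year (i + j = 2) the exponential factor is exp(0) = 1, so the mean reduces to the uninflated form in (4).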

The nonlinear growth curve assumption in (4) is motivated by an inspection of the loss development patterns exhibited in Fig. 1. This specification generally works well except in certain circumstances in which the data is so volatile that negative growth is observed or no clear nonlinear growth pattern is detected. This can happen for certain special insurance lines in which many salvage recoveries and subrogation (payments from third parties) occur. In such situations, variants



of the GLM (de Alba 2006) can be used. Semiparametric models (e.g., see Antonio and Beirlant 2008) can be employed if the assumption of a nonlinear curve is not appropriate.

It should be noted that nonlinear models and semiparametric smoothing models have been considered in the literature; these allow the extrapolation of loss payments to some known point beyond t_I and produce a corresponding uncertainty measure from the model (see England and Verrall 2002). However, these models are based on incremental losses and no saturation level is included as in the growth model. As a result, the estimate of the ultimate loss y_i(t_∞) is not readily available.

The multi-level structure employed for the ultimate loss ratios allows them to vary by accident year and by company. We made the assumption that they are exchangeable at each level, as we are unaware of any systematic differences that might exist across accident years or across companies in the example we consider. This assumption could be further relaxed to include relevant company-level information to reduce the unexplained variation at each level. For example, at the company level, indicators could be created for regional insurers underwriting business only in specific areas. Such companies could have different risk profiles and more loss ratio variability than national insurers. Similarly, the insurance companies' commercial credit scores could conceivably be used to reflect variation in the companies' financial stability.

7 Acknowledgments

We thank Glenn Meyers, Jed Frees and Peng Shi for preparing and providing us with access to the data used in this study, and David Clark for his many helpful remarks.

8 References

1. Antonio K. and Beirlant J. (2008). Issues in Claims Reserving and Credibility: A Semiparametric Approach with Mixed Models. The Journal of Risk and Insurance, 75(3), 643-676.

2. Beaver W. H., McNichols M. F., and Nelson K. K. (2003). Management of the Loss Reserve Accrual and the Distribution of Earnings in the Property-casualty Insurance Industry. Journal of Accounting and Economics, 35, 347-376.

3. Clark D. R. (2003). LDF Curve-Fitting and Stochastic Reserving: A Maximum Likelihood Approach. Casualty Actuarial Society Forum. Available at http://www.casact.org/pubs/forum/03fforum/03ff041.pdf.

4. Coyne F. J. (2008). Loss Reserving: A Fresh Look: The Difficulty in Setting Reserves and the Risk of Insolvency Are Just Two of the Many Reasons to Revisit Reserving. Best's Review.

5. Davidian M. and Giltinan D. (1995). Nonlinear Models for Repeated Measurement Data. London: Chapman and Hall.

6. de Alba E. (2002). Bayesian Estimation of Outstanding Claims Reserves. North American Actuarial Journal, 6(4), 1-20.

7. de Alba E. (2006). Claims Reserving When There Are Negative Values in the Runoff Triangle: Bayesian Analysis Using the Three-parameter Log-normal Distribution. North American Actuarial Journal, 10(3), 1-15.

8. de Alba E. and Nieto-Barajas L. E. (2008). Claims Reserving: A Correlated Bayesian Model. Insurance: Mathematics and Economics, 43, 368-376.

9. England P. D. and Verrall R. J. (1999). Analytic and Bootstrap Estimates of Prediction Errors in Claims Reserving. Insurance: Mathematics and Economics, 25, 281-293.

10. England P. D. and Verrall R. J. (2002). Stochastic Claims Reserving in General Insurance (with discussion). British Actuarial Journal, 8, 443-544.

11. Gelfand A. E., Dey D. K., and Chang H. (1992). Model Determination Using Predictive Distributions with Implementation via Sampling-based Methods. In Bayesian Statistics 4 (ed. J. M. Bernardo, J. O. Berger, A. P. Dawid, and A. F. M. Smith), pp. 147-168. Oxford University Press.

12. Johnson N. L. and Kotz S. (1972). Distributions in Statistics, Volume 4. New York: Wiley.

13. Gaver J. J. and Paterson J. S. (2004). Do Insurers Manipulate Loss Reserves to Mask Insolvency Problems? Journal of Accounting and Economics, 37, 393-416.

14. Gelman A. and Rubin D. B. (1992). Inference from Iterative Simulation Using Multiple Sequences (with discussion). Statistical Science, 7, 457-511.

15. Gelman A., Carlin J. B., Stern H. S., and Rubin D. B. (2004). Bayesian Data Analysis. London: Chapman and Hall.

16. Gelman A. and Hill J. (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press.

17. Geweke J. and Whiteman C. H. (2006). Bayesian Forecasting. In Handbook of Economic Forecasting, Volume 1, pp. 3-80. Edited by Graham Elliott, Clive W. J. Granger, and Allan Timmerman. Amsterdam: North-Holland.

18. Guszcza J. (2008). Hierarchical Growth Curve Models for Loss Reserving. Casualty Actuarial Society Forum. Available at http://www.casact.org/pubs/forum/08fforum/7Guszcza.pdf.

19. Hacking I. (1990). The Taming of Chance. Cambridge University Press.

20. Mack T. (1993). Distribution Free Calculation of the Standard Error of Chain Ladder Reserve Estimates. ASTIN Bulletin, 23, 213-225.

21. Mack T. (1999). The Standard Error of Chain Ladder Reserve Estimates: Recursive Calculation and Inclusion of a Tail Factor. ASTIN Bulletin, 29(2), 361-366.

22. McCullagh P. and Nelder J. (1989). Generalized Linear Models. Boca Raton: Chapman and Hall.

23. Meyers G. (2009). Stochastic Loss Reserving with the Collective Risk Model. Variance, 3(2), 239-269.

24. Ntzoufras I. and Dellaportas P. (2002). Bayesian Modelling of Outstanding Liabilities Incorporating Claim Count Uncertainty. North American Actuarial Journal, 6(1), 113-128.

25. Ohlsson E. and Lauzeningks J. (2009). The One-year Non-life Insurance Risk. Insurance: Mathematics and Economics, 45, 203-208.

26. Pinheiro P. J. R., Andrade e Silva J. M., and Centeno M. L. (2003). Bootstrap Methodology in Claim Reserving. The Journal of Risk and Insurance, 70, 701-714.

27. Pole A., West M., and Harrison J. (1994). Applied Bayesian Forecasting and Time Series Analysis. New York: Chapman-Hall.

28. Scollnik D. P. M. (2001). Actuarial Modeling with MCMC and BUGS. North American Actuarial Journal, 5(2), 96-124.

29. Seber G. A. F. and Wild C. J. (1989). Nonlinear Regression. New York: John Wiley and Sons.

30. Spiegelhalter D., Thomas A., Best N., Gilks W., and Lunn D. (2003). BUGS. MRC Biostatistics Unit, Cambridge, U.K. Available at www.mrc-bsu.cam.ac.uk/bugs/.

31. Standard & Poor's (2003). Insurance Actuaries: A Crisis of Credibility.

32. Taylor G. C. and Ashe F. R. (1983). Second Moments of Estimates of Outstanding Claims. Journal of Econometrics, 23, 37-61.

33. Verrall R. J. (2004). A Bayesian Generalized Linear Model for the Bornhuetter-Ferguson Method of Claims Reserving. North American Actuarial Journal, 8(3), 67-89.
