Modelling Non-life Insurance Policyholder Price Sensitivity1114349/FULLTEXT01.pdfIf P&C Insurance Ltd, at their headquarters in Stockholm, Sweden. If P&C, or Property & Casualty, is

INDEGREE PROJECT TECHNOLOGY,FIRST CYCLE, 15 CREDITS

,STOCKHOLM SWEDEN 2017

Modelling Non-life Insurance Policyholder Price SensitivityA Statistical Analysis Performed with Logistic Regression

PATRIK HARDIN

SAM TABARI

KTH ROYAL INSTITUTE OF TECHNOLOGYSCHOOL OF ENGINEERING SCIENCES

Modelling Non-life Insurance Policyholder Price Sensitivity

A Statistical Analysis Performed with Logistic Regression PATRIK HARDIN SAM TABARI Degree Projects in Applied Mathematics and Industrial Economics Degree Programme in Industrial Engineering and Management KTH Royal Institute of Technology year 2017 Supervisors at If P&C: Henrik Bosaeus, Jonathan Fransson, Filip Allard Supervisors at KTH: Thomas Önskog, Per Thulin Examiner at KTH: Henrik Hult

TRITA-MAT-K 2017:10 ISRN-KTH/MAT/K--17/10--SE Royal Institute of Technology School of Engineering Sciences KTH SCI SE-100 44 Stockholm, Sweden URL: www.kth.se/sci

Abstract

This bachelor thesis within mathematical statistics studies the possibility of modellingthe renewal probability for commercial non-life insurance policyholders. The projectwas carried out in collaboration with the non-life insurance company If P&C InsuranceLtd. at their headquarters in Stockholm, Sweden. The paper includes an introductionto underlying concepts within insurance and mathematics and a detailed review of theanalytical process followed by a discussion and conclusions.

The first stages of the project were the initial collection and processing of explanatoryinsurance data and the development of a logistic regression model for policy renewal. Aninitial model was built and modern methods of mathematics and statistics were appliedin order obtain a final model consisting of 9 significant characteristics. The regressionmodel had a predictive power of 61%. This suggests that it to a certain degree ispossible to predict the renewal probability of non-life insurance policyholders based ontheir characteristics. The results from the final model were ultimately translated into ameasure of price sensitivity which can be implemented in both pricing models and CRMsystems.

We believe that price sensitivity analysis, if done correctly, is a natural step in improvingthe current pricing models in the insurance industry and this project provides a foun-dation for further research in this area.

Keywords: Mathematical Statistics, Regression Analysis, Logistic Regression,Generalized Linear Model, Insurance Pricing, Price Sensitivity, Data Analysis

1

Sammanfattning

Detta kandidatexamensarbete inom matematisk statistik undersoker mojligheten attmodellera fornyelsegraden for kommersiella skadeforsarkringskunder. Arbetet utfordes isamarbete med If Skadeforsakring vid huvudkontoret i Stockholm, Sverige. Uppsatseninnehaller en introduktion till underliggande koncept inom forsakring och matematiksamt en utforlig oversikt over projektets analytiska process, foljt av en diskussion ochslutsatser.

De huvudsakliga delarna av projektet var insamling och bearbetning av forklarandeforsakringsdata samt utvecklandet och tolkningen av en logistisk regressionsmodell forfornyelsegrad. En forsta modell byggdes och moderna metoder inom matematik ochstatistik utfordes for att erhalla en slutgiltig regressionsmodell uppbyggd av 9 signifikantakundkaraktaristika. Regressionsmodellen hade en forklaringsgrad av 61% vilket pekarpa att det till en viss grad ar mojligt att forklara fornyelsegraden hos forsakringskunderutifran dessa karaktaristika. Resultaten fran den slutgiltiga modellen oversattes slutligentill ett priskanslighetsmatt vilket mojliggjorde implementering i prissattningsmodellersamt CRM-system.

Vi anser att priskanslighetsanalys, om korrekt genomfort, ar ett naturligt steg i utvecklin-gen av dagens prissattningsmodeller inom forsakringsbranschen och detta projekt laggeren grund for fortsatta studier inom detta omrade.

Nyckelord: Matematisk Statistik, Regression, Logistisk Regression, Forsakringsprissattning,Priskanslighet, Dataanalys

2

Acknowledgements

A special thanks goes out to the analysts of Product & Price department at If P&CInsurance that have introduced us to the world of insurance and mentored us throughoutthis project including, Henrik Bosaeus, Jonathan Fransson and especially Filip Allardwhom we have been working closest with.

We would also like to thank our thesis supervisor at KTH Royal Institute of Technology,Thomas Onskog from the department of Mathematical Statistics. Thomas has through-out the project continuously supported us and given us feedback which has been muchappreciated and very helpful.

Another special thanks goes out to Per Thulin from the department of Industrial Eco-nomics at KTH Royal Institute of Technology. Per has guided us on the economic aspectsof this project and his help has been much appreciated.

3

Contents

1 Introduction 81.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81.2 Project Formulation and Goals . . . . . . . . . . . . . . . . . . . . . . . . 81.3 Project Approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91.4 Project Scope and Tools . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101.5 Literature Review . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101.6 Prior Research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11

2 Insurance Theory 122.1 Business Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122.2 Profitability Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.3 Insurance Pricing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

3 Mathematical Theory 163.1 Regression Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163.2 Linear Regression Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . 16

3.2.1 Ordinary Linear Model - OLM . . . . . . . . . . . . . . . . . . . . 183.2.2 Generalized Linear Model - GLM . . . . . . . . . . . . . . . . . . . 183.2.3 Logistic Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . 183.2.4 Maximum Likelihood Estimation . . . . . . . . . . . . . . . . . . . 20

3.3 Statistical Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . . . . . 223.3.1 Goodness of Fit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223.3.2 Pearson Goodness of Fit test . . . . . . . . . . . . . . . . . . . . . 233.3.3 Hosmer-Lemeshow Goodness of Fit test . . . . . . . . . . . . . . . 23

3.4 Model Validation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243.4.1 AIC & BIC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243.4.2 Receiver Operating Characteristic . . . . . . . . . . . . . . . . . . 253.4.3 Likelihood-Ratio Test . . . . . . . . . . . . . . . . . . . . . . . . . 253.4.4 Deviance Goodness-of-Fit . . . . . . . . . . . . . . . . . . . . . . . 263.4.5 Residual Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263.4.6 Influence and Leverage Diagnostics . . . . . . . . . . . . . . . . . . 27

3.5 Multicollinearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 283.5.1 Marginal Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4 Data 294.1 Data Structures and Levels . . . . . . . . . . . . . . . . . . . . . . . . . . 29

4.1.1 Desired structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304.2 Data Collection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314.3 Data Processing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4.3.1 Primary Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314.3.2 Secondary Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314.3.3 Characteristic Selection . . . . . . . . . . . . . . . . . . . . . . . . 32

4

4.4 Characteristic Grouping and Explanatory Variables . . . . . . . . . . . . . 324.5 Response Variable . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

5 Model Development 345.1 Initial Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 345.2 Analyzing the Initial Model . . . . . . . . . . . . . . . . . . . . . . . . . . 34

5.2.1 β Odds Ratios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 355.3 Regrouped Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375.4 Analyzing the Regrouped Model . . . . . . . . . . . . . . . . . . . . . . . 37

5.4.1 β Odds Ratios . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375.4.2 Multicollinearity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 375.4.3 Statistical Hypothesis testing . . . . . . . . . . . . . . . . . . . . . 38

5.5 Intuitive Reduced Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395.6 Stepwise Reduced Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395.7 Final Model Selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

6 Results 416.1 Final Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41

6.1.1 Statistical Hypothesis Testing . . . . . . . . . . . . . . . . . . . . . 416.1.2 Receiver Operating Characteristic . . . . . . . . . . . . . . . . . . 436.1.3 Residuals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446.1.4 Leverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 456.1.5 Influence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

6.2 Model Comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

7 Discussion 487.1 Final Model Validation & Adequacy . . . . . . . . . . . . . . . . . . . . . 48

7.1.1 Predictive Power . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487.1.2 Statistical Hypothesis Tests Analysis . . . . . . . . . . . . . . . . . 487.1.3 Residuals, Leverage and Influence . . . . . . . . . . . . . . . . . . 48

7.2 Comparing the Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 497.3 Analyzing the Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . 50

7.3.1 Characteristic 4 - Sales Channel . . . . . . . . . . . . . . . . . . . 507.3.2 Characteristic 7 - Price Increase . . . . . . . . . . . . . . . . . . . 51

7.4 Sources of Error and Uncertainty . . . . . . . . . . . . . . . . . . . . . . . 52

8 Price Sensitivity Analysis 538.1 Relation Between Renewal Probability and Price Sensitivity . . . . . . . . 538.2 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

8.2.1 Implementation Areas . . . . . . . . . . . . . . . . . . . . . . . . . 558.3 Economic Impact of Price Sensitivity Analysis . . . . . . . . . . . . . . . . 55

8.3.1 If P&C Insurance . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568.3.2 Corporates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568.3.3 Society . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

5

8.4 Ethics of Price Sensitivity Analysis . . . . . . . . . . . . . . . . . . . . . . 57

9 Conclusions 59

10 Recommendations 60

References 63

Appendix 64

6

List of Figures

1 Insurance Levels . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 292 Odds estimates for Initial Characteristic 1 . . . . . . . . . . . . . . . . . . 353 Odds estimates for Initial Characteristic 18 . . . . . . . . . . . . . . . . . 364 ROC Curve for Final Model . . . . . . . . . . . . . . . . . . . . . . . . . . 435 Scatterplot of Pearson Residuals . . . . . . . . . . . . . . . . . . . . . . . 446 Scatterplot of Deviance Residuals . . . . . . . . . . . . . . . . . . . . . . . 447 Scatterplot of Leverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458 Distribution of Leverage . . . . . . . . . . . . . . . . . . . . . . . . . . . . 459 Scatterplot of DFBETAS for intercept . . . . . . . . . . . . . . . . . . . . 4610 Distribution of DFBETAS for intercept . . . . . . . . . . . . . . . . . . . 4611 Odds Ratios for Characteristic 4 . . . . . . . . . . . . . . . . . . . . . . . 5012 Odds Ratios for Characteristic 7 . . . . . . . . . . . . . . . . . . . . . . . 5113 Price Sensitivity Slopes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

List of Tables

1 Example data set aggregated to Policy Period level. . . . . . . . . . . . . 302 Characteristic Grouping Examples . . . . . . . . . . . . . . . . . . . . . . 333 Null Hypothesis β = 0 for each characteristic subset . . . . . . . . . . . . 384 Null Hypothesis β = 0 for

each characteristic subset . . . . . . . . . . . . . . . . . . . . . . . . . . . 415 Global Null Hypothesis β = 0 for entire characteristic set . . . . . . . . . 416 Pearson and Deviance Goodness-of-Fit Hypothesis Tests . . . . . . . . . . 427 Partitions for the Hosmer-Lemeshow Goodness-of-Fit Tests . . . . . . . . 428 Hosmer-Lemeshow Goodness-of-Fit Tests Results . . . . . . . . . . . . . . 429 Model validation comparisons between the different models . . . . . . . . 4710 Model comparisons deltas with Initial Modelas reference . . . . . . . . . . 4711 Residual, Leverage and Influence in the Final Model . . . . . . . . . . . . 48

7

1 Introduction

1.1 Background

Insurance is a contract between an insurer and a policyholder in which the insuringentity offers financial protection and reimbursement to the policyholder in the event ofan incurred loss. The core parts of an insurance contract, also known as an insurancepolicy, are the circumstances under which the policyholder is covered, the time periodin which the insurance policy is valid, the insurance price known as the premium andthe deductible which is an amount paid by the policyholder in the event of a claim. Inthe event of a covered loss, the policyholder reports a claim to the insurer which thenprovides the agreed compensation.

The insurance industry serves the purpose of hedging against the risk of contingent lossand is a common form of risk management for a large variety of policyholders. Forinstance, instead of holding large amounts of cash to replace property in the event of adisaster, a company can take out an insurance policy and as a result allow themselves tosafely carry out investments and other value creating activities in their business.

The Swedish insurance industry is a free competitive market with multiple actors andhas since a pricing deregulation during the 90’s grown in complexity. For the insur-ance companies it is vital that policies are offered at a competitive yet profitable price.Generally, this is achieved through statistical analysis of risk characterizing parametersof the policyholders to predict future reimbursement costs for specific customer groups,which then are reflected in the offered premiums.

Accordingly, a better estimation of the risk will allow for a more accurate pricing and acompetitive advantage for the insurer. Optimizing and improving insurance pricing al-gorithms to better understand the customer is therefore something insurance companiesput a lot of effort in, in order to overcome the competition.

1.2 Project Formulation and Goals

As we are heading towards a more digitalized world, the amount of accessible data isincreasing and the analytical methods are becoming more sophisticated. Data Mining,Data Science and Big Data are all trending concepts on the rise and are increasinglybeing applied in new fields. For the insurance industry which relies on statistical analysis,this naturally means that risk prediction is incrementally improving. Moreover, as moreinformation is available, ideas of how understanding the customer in other ways thanimplied risk emerge.

This project aims to research how characteristics may predict price sensitivity amongcurrent insurance policyholders and how this information may be used to optimize anddevelop insurance pricing at renewal. The project was carried out in collaboration with

8

If P&C Insurance Ltd, at their headquarters in Stockholm, Sweden. If P&C, or Property& Casualty, is the largest actor on the Nordic non-life insurance market with around20-25% market share and consists of four main ares of operation; Private, Commercial,Large Enterprises and the Baltic Region.

The outcome of this project could have a great impact on the insurance pricing modelsthat cover a major part of the gross premium written by If P&C. More specifically,pricing models which successfully predict and take price sensitivity into considerationare believed to improve the customer satisfaction as premium price is one of its keycomponents, if not the most essential. With better optimized pricing models that takepolicyholder price sensitivity into account, renewal rate might improve, and accordinglyincrease the gross premiums written, the total insurance volume, without raising orlowering the premiums as an overall net, implying a competitive edge for If P&C.

1.3 Project Approach

Price sensitivity will in this project be analyzed through modelling the Renewal Prob-ability for different customer categories and for different price increases1, holding otherfactors such as coverage and deductible constant. The renewal probability is definedas the probability of actively renewing an insurance policy as the current subscriptionperiod is coming to an end.

The price sensitivity for a certain customer is then derived as the change in renewalprobability divided by the change in price increase.

Price Sensitivity :=∆ Renewal Probability

∆ Price Increase(1)

If the renewal probability drops quickly as the price increase rises, the customer is pricesensitive an vice versa.

Ultimately, the modelling of policy renewal probability will in this project be approachedthrough developing a logistic generalized linear regression model, based on historical dataof If P&C’s written policies containing customer specific characteristics.

1Increase in premium

9

1.4 Project Scope and Tools

The analysis of this project will be be based of data from a subset of all written insurancepolicies by If P&C Insurance. The conditions are

• Commercial insurance policies

• Swedish policyholders

• Policies starting from the year 2014 and later

All data management, computing and model analysis will be carried out in the soft-ware suite Statistical Analysis System, SAS. Plotting and visualization of results will beperformed both in SAS and in Microsoft Excel.

1.5 Literature Review

Throughout this project, information was gathered from several sources. The knowledgemay be divided in two main parts; Insurance Theory and Mathematical Theory.

Information about insurance theory was primarily collected from interviews with actu-aries and pricing analysts at If P&C. Before starting this project, a 5 day course ininsurance theory was attended at If P&C. The majority of information for the insurancetheory was gathered during this course and from having a continuous dialog with theactuaries and pricing analysts at If P&C and working very close throughout the project.The insurance theory was essential for this project and provided a wider understandingon core concepts, project achievements and its potential impact for If P&C.

To gain even deeper knowledge of how specifically GLM models are used in pricing non-life insurance policies, the book Non-Life Insurance Pricing with Generalized LinearModels by E. Ohlsson, B. Johansson [4] was very helpful. For further reading into thissubject, this book is highly recommended.

Mathematical theory was collected from a variety of sources, both from textbooks andfrom the internet. The main source for the mathematical theory was the fifth editionof Introduction to Linear Regression Analysis by Douglas C. Montgomery, ElizabethA. Peck and G. Geoffrey Vining[2]. The book is a textbook for courses in regressionanalysis including the course SF2930 at KTH Royal Institute of Technology. It combinesboth theory behind regression analysis with real world examples and puts an emphasison presenting the necessary steps for applying regression analysis for statistical modelbuilding.

In addition to the textbook Introduction to Linear Regression Analysis a variety ofinternet sources were used to gather deeper understanding on statistical measures used inlogistic regression analysis. A majority of these came from Wikipedia and SAS Analytics.Please find the full bibliography at the end for further reference on the literature usedin writing this report.

10

1.6 Prior Research

Before this project was carried out, previous work and research in the two main areasPrice Sensitivity and Big Data Analysis, both inside and outside the insurance industry,were studied.

Price sensitivity in general is a well researched area that has been growing with theincrease of available data on customers and their behaviour.

Why price sensitivity is of interest is because it can be used to optimize the pricing of agood or service for a specific subset of a given population. This is done to have a bettersales approach and will hopefully increase both customers satisfaction as well as increaserevenue and profitability for the company.

McKinsey mentions in their report on How retailers can improve price perception prof-itability [20] that ”New methodologies, powered by big data and advanced analytics,can help retailers attract value-conscious consumers without sacrificing margins.” Themain idea behind this is to identify something called key value categories (KVC) and keyvalue items (KVI) that are products that the customers tend to notice and remember.The idea is to price these products very competitively to attract customers and raiseprices on other items to stay profitable. The trick in doing this effectively is firstly tobe good at identifying what the KVCs and KVIs are and secondly to have a fact-based,data-driven and systematic approach in pricing these products.

Just like price sensitivity, big data has also been a up and coming research area with alot of activity recently. Big data analysis is the science of collection a very large amountof complex raw data, process this data and then apply modern methods of statistics andmathematics to analyze it. The analysis will hopefully produce valuable and actionableinformation for the analyst.

With regards to the insurance industry, big data analysis is still an up and coming areathat rapidly is getting more and more traction. As stated in an article from FinancialTimes[21], big data is something that will revolutionize major parts of the insuranceindustry in the coming years. The main application area for big data analysis is projectedto be ”Pricing, Underwriting and Risk Selection”. Implementation in pricing analysisis in other words the largest application of big data for insurance companies and this isthe exact area of our report.

Financial Conduct Authority (FCA) in the United Kingdom have previously requestedinformation from insurers on how they are using big data analysis in their operations.Their findings reveal that the use of data analysis could include a market study, adjust-ments to policy or guidance or another form of intervention”. This is also in the scopeof our project.

A similar project regarding policyholder price sensitivity was initiated by If P&C lastyear, but with a different scope and analytical foundation. Ideas from this project havebeen studied and considered when applicable. [11]

11

2 Insurance Theory

2.1 Business Model

The business model of an insurance company is, in short, to collect premiums from writ-ing insurance policies upfront, invest the premiums and in the future compensate theholder of insurance policy in case of any damages or claims that might occur. Premi-ums must be priced high enough to generate sustainable profit, but low enough to becompetitive.

A few key terms are introduced below to further present how an insurance companymeasures profitability

Claim Cost: The cost associated with damages and claims from the policyholders. Canbe divided into the following categories:

• Paid: What actually has been paid to the customer.

• Case: Based on the incurred claims. The claims are known and reported but notyet paid.

• IBNR: Incurred but not reported claims. Since the insurance company is unawareof these claims, the estimate of their cost is more volatile.

Administrative Costs: This includes costs for salaries, office space and other costs torun the company. In other words, regular business expenses.

Return: This is the company’s profit and what is left for the shareholders.

Gross Written Premium (GWP): The total premiums that are collected during asubscription period, usually one year. This is the sum of what all customers pay theinsurance company and is payed all at once.

Gross Earned Premium (GEP): Linearization of GWP. GWP is linearly distributedover the time of the insurance period to get GEP. GEP will equal GWP at the end ofthe insurance period as GWP increases linearly with time. For example if the GWP is100 SEK, the GEP would be 100 · (1/365) after 1 day and 100 · (155/365) after 155days.

Ratios: Different profitability ratios are presented below. These are used to evaluatehow well the company is doing and how well the policies have been priced.

• Paid Ratio: Paid / GEP

• Reported Ratio: (Paid + Case) / GEP

• Risk Ratio: (Paid + Case + IBNR) / GEP

• Cost Ratio: Administrative Costs / GEP

12

• Combined Ratio: (Paid + Case + IBNR + Administrative Costs ) / GEP

The Combined Ratio is the most important metric for insurance companies as a Com-bined Ratio below 1 which essentially implies that the company is profitable.

2.2 Profitability Modeling

The owner of If P&C, the Finnish financial group Sampo, has a Return on Equity, ROE,requirement of 17.5% on the capital it has invested in If. The ROE is a function offour variables; Revenue, Combined Ratio, Investment Return and Capital Buffer and isderived from the following formula

ROE =Revenue · (1 - Combined Ratio) + Investment Return

Capital Buffer· (1− Tax Rate)

It is clear that in order to reach a maximum value for the ROE, a large Revenue, smallCombined Ratio, high Investment Return and a small Capital Buffer are desired. Eachcomponent of this equation is presented below

Revenue

The Revenue is the total sum of all collected premiums for a time period. In other words,the revenue is equal to the gross written premiums for one year.

Combined Ratio

The Combined Ratio is essentially what percentage of the collected premiums that isneeded to cover all the company’s expenses. 1 - CR is hence the percentage of thecollected premiums that the company can keep as profit.

Investment Return

The investment return comes from investing the premiums that are collected upfront.Since there is a time gap from when the premiums are collected until the expenses interms of claim costs are paid, that money can be invested and collect interest in themeantime. But not all the of the gross written premiums can be invested. In theexample below the remains from the GWP are used to cover business expenses and canhence not be invested. The Capital Buffer is on the other hand invested and generatesinvestment return.

• Gross Written Premiums (GWP)

• Unearned Premium Reserve: A% of GWP

• Claims Reserve: B% of GWP

• Additional External Capital (from Sampo): Amount equal to C% of GWP

13

• Total Capital = GWP · (1+A%+B%+C%)

A higher return on the invested premiums help increase the ROE. The ROE is hencesensitive to changes in interest rates and If’s attitude toward risk taking in investingthe premiums. In a higher interest rate environment, a higher Combined Ratio can beoffset by higher investment returns and the company can still reach the ROE target fromSampo.

Capital Buffer The Capital Buffer is capital that must be place aside and comes fromSampo. The reason for this is partly regulatory, insurance companies must legally putaside capital as stated by the Swedish capital requirement rules. However, If P&Cchooses to hold more capital in addition to this requirement. The reason for this is toobtain a higher credit rating and show their customers that they are in a strong financialposition. A large Capital Buffer is in other words also used as a sign of stability andsolidity toward If’s customers.

Tax Rate To find the true ROE, the pre-tax ROE has to be adjusted for tax. TheSwedish corporate tax rate is 22% and hence the pre-tax ROE needs to be multipliedwith a factor of 1− 0.22 = 0.78 to find the actual ROE.

This is the actual ROE that Sampo is requiring. Since the tax rate rarely changes,this is view as a constant and not a variable in calculating the ROE and modeling thecompany’s profitability.

2.3 Insurance Pricing

Correct pricing of the insurance policies is essential and at the heart of every insurancecompany. It is the absolute largest factor that determines the combined ratio and thecompetitiveness of the company. Too low premiums and the combined ratio will exceed1, but too high premiums and the company will loose its customers.

Every insurance company has slightly different approach in how they price their policies’premiums. However, the main method of pricing is by estimating the expected totalclaims cost for a specific policy. This is done by creating an insurance tariff whichis built upon a number of different explanatory variables that are used to predict thefuture expected claim cost for a policy. When this is done, adjustments are made and apremium for the policy is derived.

Premium = (Risk + Administrative Costs + Return) · Adjustment Term

The Risk associated with a policy is calculated from an insurance tariff and is theexpected value of the claims cost for a policy. Administrative Costs and Return areadded to Risk to cover the administration costs and produce a return for the company.When all these are added up, the risk-correct premium for a specific policy in isolationis derived.

14

An insurance tariff is in essence a regression model built upon a number of explanatoryvariables. Building a tariff is a data intensive process where a lot of historical datais collected, processed and together with statistical analysis tools used to train a risk-correct pricing model for a specific type of insurance. The derived premium reflects theexpected damage costs for next year.

With a complete tariff at hand, the risk-correct premium for a new policy, the tariff rate,is calculated by inserting the values for the specific variable values into the model whichthen outputs the rate. The majority of the tariff models are regularly updated as moreand more claims data is collected.

The risk-correct policy premium is in a last step then multiplied by an adjustment term.The adjustment term consist of a number of coefficients that in turn are multiplied to-gether. These coefficients represent different factors that make appropriate adjustmentsto the tariff rate. Adjustment factors might for instance be group insurances and allianceagreements.

Depending on the results from this project, a new coefficient, Customer Price Sensitivity,might be added to the adjustment term. This will hopefully improve the pricing modeland lead to a higher renewal rate among If’s customers.

15

3 Mathematical Theory

3.1 Regression Analysis

”Regression analysis is a statistical technique for investigating and modelingthe relationship between variables. Applications of regression are numer-ous and occur in almost every field, including engineering, the physical andchemical sciences, economics, management, life and biological sciences, andthe social sciences. In fact, regression analysis may be the most widely usedstatistical technique.” [2, page 1]

Examples of problems which could be analyzed and modeled with regression analysisinclude

• How the fuel consumption of a specific car depends on the number of people trav-eling, the average speed of the car, outside temperature etc.

• How the price of an apartment depends the location, size, closeness to a park, etc.

• How the renewal rate for Commercial P&C Insurance might depend on severalcustomer characteristics.

3.2 Linear Regression Modelling

A linear regression model assumes a linear relationship between the response variabledenoted yi and the explanatory variables denoted xij . This is modelled with the help ofan error term εi, adding noise to the relationship.

Given a set of n observation of one response variable and k explanatory variables{yi , xi1, xi2, . . . , xik}ni=1 a linear regression model may be written on the followingform

yi = β0 + β1xi1 + β2xi2 + · · ·+ βkxik + εi i ∈ {1, . . . , n} (2)

16

Or more compactly asy = Xβ+ ε (3)

Where

y =[y1 y2 . . . yn

]T ∈ Rn×1

ε =[ε1 ε2 . . . εn

]T ∈ Rn×1

β =[β0 β1 β2 . . . βk

]T ∈ R(k+1)×1

X =

1 x11 . . . x1k1 x21 . . . x2k...

.... . .

...1 xn1 . . . xnk

∈ Rn×(k+1)

• y is the vector of the observed values for the response variable.

• X is the matrix of the observed values for the explanatory variables. The ones inthe first column yields the variable-independent intercept parameter β0.

• xi is the ith row of X, explanatory variables for the ith observation.

• β is the vector of unknown coefficient parameters.

• ε is the vector of error terms.

Since values for both y and X are known and collected from historical data, this yieldsan over-determined system of equations with the objective of optimally estimating the βparameters. The Least Squares and Maximum Likelihood methods are two of the mostcommon ways of estimating the β parameters. When the β parameters are estimated amodel is created with the ability to predict response values for a new observation set ofexplanatory data.

17

3.2.1 Ordinary Linear Model - OLM

In the Ordinary Linear Model, errors are assumed to be independent, homoscedastic2

and normally distributed with mean 0. Thus implying εi is viewed a realization of arandom variable belonging to a to the normal distribution that follows

ε ∈ Nn(0, σ2In)

An ordinary linear model containing only one explanatory variable is referred to as aSimple Linear Regression model

yi = β0 + β1xi1 + εi i ∈ {1, . . . , n}

If more than one explanatory variable is present in a ordinary linear model it is insteadcalled a Multiple Linear Regression

yi = β0 + β1xi1 + β2xi2 + · · ·+ βkxik + εi i ∈ {1, . . . , n}

y = Xβ+ ε

3.2.2 Generalized Linear Model - GLM

The Generalized Linear Model is an extension of the Ordinary Linear Model allowingthe error term ε to be modelled after other distributions besides the normal distribu-tion.

GLM is hence an alternative approach to data transformation when performing linearregression where assumptions of a constant variance and normality are not satisfied. Theonly requirement of GLM is that the modelled error distribution must be a member ofthe Exponential Family of distributions i.e. Normal, Poisson, Binomial, Exponential orGamma distribution and a few others. [2, p. 421]

3.2.3 Logistic Regression

Logistic Regression is a special case of GLM where the response variable only can takethe value of either 0 or 1, often representing whether an event has occurred or not such aspass/fail or win/lose[2, p.422][3]. An example of a simple problem that can be modeledwith logistic regression is the probability of passing an exam (outcome; 1 = pass, 0 =fail) explained by the hours of studying.

2Having constant variance

18

The Logistic Regression model in its essence strives to predict the probability that theresponse variable assumes the value 1. This is viewed as an outcome of a BernoulliTrial.

Pr(yi = 1) = πi

Pr(yi = 0) = 1− πi

The expected value of the response variable is

E[yi] = 1 · (πi) + 0 · (1− πi) = πi

The knowledge that E[εi] = 0 and that the model is unbiased gives

E[yi] = E[xTi β] + E[εi] = xT

i β+ 0

Hence

E[yi] = xTi β = πi

The expected value of the response is now proved to be the same as the probability thatresponse assumes the value 1.

This causes all sorts of problems with using the usual linear response function; y =Xβ + ε. For one the previous section shows that there is an obvious constraint on theresponse variable; 0 ≤ E[yi] = πi ≤ 1. The error term can also only take two values;εi = 1−xT

i β (for yi = 1) and εi = −xTi β (for yi = 0) and from this it can also be proved

that the variance is not constant. All of which make the usual linear response functionnot feasible to use for logistic regression.

To get around this problem a new response function needs to be chosen. When aresponse variable is binary as in this case, it is common to use a nonlinear responsefunction. Usually a nonlinear S-shaped curve called the logistic response function isused and has the following form

E[yi] =exp(xT

i β)

1 + exp(xTi β)

=1

1 + exp(−xTi β)

(4)

In order to linearize this expression, the response function is logarithmized. This is donewith something called a logit transformation.

19

Since E[yi] = πi the following equality is acquired

exp(xTi β)

1 + exp(xTi β)

= πi

Solving for xTi β gives

xTi β = ln

πi1− πi

This is the logit-transformation of πi and the ratio πi(1−πi) is called odds.

odds =πi

1− πi∈ R+

logit = ln( πi

1− πi)∈ R

3.2.4 Maximum Likelihood Estimation

In logistic regression models, the Maximum Likelihood method is commonly used toestimate the β-parameters and in turn the linear predictor xTβ.[2, p. 424]

To use the maximum likelihood method, the likelihood function first needs to be found.The likelihood function is a function used to calculate the total probability for a seriesof events to occur. This function is then maximized to find the coefficient values causingthe highest probability for a series of events to happen.

Since the response variable yi is binary and only can take the value of 0 or 1, it has thefollowing Bernoulli distribution

Pr(yi = 1) = πi

Pr(yi = 0) = 1− πi

This can be rewritten with the following probability density function is for each inde-pendent observation

fi(yi) = πyii (1− πi)1−yi (5)

For i = 1, 2, ..., n where yi takes the value of either 0 or 1.

20

Independent probabilistic events are multiplicative. This results in the following likeli-hood function

L(y,β) =n∏i=1

fi(yi) (6)

L(y,β) =

n∏i=1

πyii (1− πi)1−yi (7)

To linearize and make this function easier to work with, the function is logaritmized andthe result is the log-likelihood function

ln[L(y,β)] = ln[n∏i=1

fi(yi)] (8)

ln[L(y,β)] =n∑i=1

[yi ln(πi

1− πi)] +

n∑i=1

ln(1− πi) (9)

Since we previously proved that xTβ = ln π(1−π) we can rewrite the log-likelihood function

as

ln[L(y,β)] =n∑i=1

yixTi β−

n∑i=1

ln[1 + exp(xTi β)] (10)

The log-likelihood function ln[L(y,β)] is now found and in order to determine its max-imum value, it first needs to be differentiated with respect to β. The differentiationwill yield an equation system that needs to be solved numerically. The solution to suchsystem is the maximum value of the log-likelihood function. This is a vector containingthe best estimates of the β-parameters for the model called β.

β = [β0, β1, β2, . . . βk]T (11)

The fitted value for a logistic regression model is then on the following form

yi = πi =exp(xTβ)

1 + exp(xTβ)=

1

1 + exp(−xTβ)(12)

21

3.3 Statistical Hypothesis Testing

There are usually considered to be two stages in data analysis; Exploratory Data Analysisand Confirmatory Data Analysis[15].

Exploratory data analysis deals with analyzing current data sets to investigate theirfeatures and characteristics. Applying methods of regression analysis and building astatistical model from a data set falls under this stage.

The purpose of confirmatory data analysis is on the other hand to evaluate the resultsthat are found from the exploratory data analysis. It evaluates how good the resultsare and whether the results have occurred by chance or not. To answer these ques-tions, traditional statistical tools such as significance, confidence and interference can beused.

This section addresses confirmatory data analysis, more broadly referred to as statisticalhypothesis testing, and presents the most applicable methods of statistical hypothesistesting to this project.

3.3.1 Goodness of Fit

Goodness of fit is a general term describing how well a statistical model fits a set ofobservations. This is commonly measured through hypothesis testing where a NullHypothesis can be rejected with a certain level of significance.

For instance, a null hypothesis may be that a certain variable in a regression model isinsignificant

H0 : βj = 0 H1 : βj 6= 0

This is in its essence carried out by comparing a specific test statistic to the distributionof a random variable which it asymptotically approaches.

If the null hypothesis presented above is accepted, it is suggested that the explanatoryvariable associated with βj not is significant for the model and should therefore beexcluded.

In order to either accept or reject the null hypothesis, the p-value is calculated. The p-value is derived from the comparison between the statistic and the corresponding randomvariable distribution. The p-value is formally the probability of obtaining results thatare at least as extreme as the observed, given that the null hypothesis is true.[9] In otherwords, a low p-value suggests that the null hypothesis can be be rejected.

Generally, if the p-value is less than or equal to the selected significance level α, thenull hypothesis is rejected. In the example above this would imply that the explanatory

22

variable associated with βj is a meaningful addition to the model and should hence notbe excluded.[18]

3.3.2 Pearson Goodness of Fit test

The Pearson χ2-test is a statistical test used to investigate the goodness of fit for aregression model which utilizes the χ2-statistic defined by [2, p. 432]

χ2 =n∑i=1

yi − niπiniπi(1− πi)

(13)

The statistic corresponds to the normalized sum of squared distances between observedand fitted response values, also referred to as the Pearson Residuals which will be ad-dressed in isolation.

The statistic is compared to a random variable following a χ2 distribution with n − pdegrees of freedom and the selected confidence level 1− α

χ2α(n− p)

If the test statistic exceeds the critical value of the random variable, the null hypothesisis rejected. Thus for the Pearson test, smaller values of the statistic (large p-values)imply that the model fits the data better.[2, p. 432] The statistic is also commonlyexpressed as a ratio by dividing with the degrees of freedom, n-p. If the ratio is largerthan 1, the goodness of fit of the model is debatable.

3.3.3 Hosmer-Lemeshow Goodness of Fit test

The Hosmer-Lemeshow test is another statistical test for goodness of fit where observa-tions are classified into g groups based on estimated probabilities of success[2, p. 433].The Hosmer-Lemeshow statistic is essentially a Pearson statistic with slight adjustmentsto average the event probability for each group

HL =

g∑j=1

(Oj −Nj πj)2

Nj πj(1− πj)(14)

The HL statistic follows a χ2 distribution with g−2 degrees of freedom and small valuesof the statistic (large p-values) suggest that the model is adequate.

23

3.4 Model Validation

Model validation deals with statistical tests and tools that create a basis for evaluatingand selecting from two or more comparable models. The way regression modeling usuallyis carried out is by initially creating a Full Model including all explanatory variables thatare intuitively found to be interesting. This model is then adjusted and reduced withthe help of the statistical hypothesis testing. Model validation tests are very helpfulwhen when analyzing the improvement when developing a full model into a reducedmodel.

3.4.1 AIC & BIC

Akaike’s Information Criterion, AIC, is a common criterion for model complex-ity.

The AIC estimates the quality of a model in relation to another model and can be usedas a basis for choosing between two comparable models. The AIC-value for a model inisolation is more or less meaningless but when comparing between models, the AIC canbe a very helpful tool. The AIC essentially estimates the loss of information in a modeland hence a model with a low value for AIC-value is preferable.

AIC = 2k − 2 ln(L) (15)

k is the number of parameters that has been estimated in the model and L is themaximized value of the likelihood function that was derived in a previous section.

From this equation we see that deriving a low AIC value is a trade-off between addingmore explanatory variables to the model and getting a higher value for L. It is easy toincrease the maximum value of the likelihood function by simply adding more and moreexplanatory variables. However this causes a problem with over-fitting the model andmakes the model less applicable in the real world. A final model should only includeparameters that has a clear predicting ability in accordance with Occam’s Razor

”Among competing hypotheses, the one with the fewest assumptions shouldbe selected.”

AIC deals with this by penalizing the use of more parameters by increase the ”2k”-termwhich is bad for a low AIC-value at the same time as encouraging a high value for L bygiving it a negative coefficient which helps low the AIC-value for large values of L.

24

Bayesian Information Criterion has essential the same function as AIC.

BIC = ln(n)k − 2 ln(L) (16)

In addition to k and L, n is the number of data points in the data-set.

Both AIC and BIC are criterion that analyses and penalizes over-fitting by using apenalty term for the use of more parameters in the model. However the penalty term inBIC, ln(n)k, is larger than the penalty term in AIC, 2k.

AIC and BIC are usually used together and complement each other in the way that ifa model both has a a lower AIC and BIC value, it is sure to say that it is preferable tothe other comparable model.[1, p. 216-217]

3.4.2 Receiver Operating Characteristic

The Receiver Operating Characteristic is a graphical tool for determining the a modelsprediction ability.

In a ROC graph, the logistic regression model’s ability to predict negative outcomesfrom yi = 0 is plotted against it’s ability to predict positive outcomes from yi = 1 in atwo-dimensional chart.

In other words [1 − Pr(yi = 0 | yi = 0)] (x-axis) is plotted against [Pr(yi = 1 | yi = 1)](y-axis). This will create a concave plot between the points (0, 0) and (1, 1).

The area between the ROC curve and the x-axis, i.e. the area under the curve is calledthe concordance index, which is an absolute measure of the logistic model’s predictivepower. Naturally, when choosing between models, a model with higher predictive poweris preferable all else equal.

For example a ROC curve that is a straight line between the points (0, 0) and (1, 1) willgive a concordance index of 0.5 meaning that the prediction ability of the logistic modelis no better than guessing randomly.

A concordance index of 0.7 on the other hand indicates that there is a 70% probabilitythat an observation from yi = 1 will have a larger model value than an observation fromyi = 0.[1, p. 228-229]

3.4.3 Likelihood-Ratio Test

The likelihood-ratio test evaluates the results of two models through goodness of fit bycomparing the observed values for each model with the value were expected.

Two models that are of interest to compare would be the initial model (IM) and thefinal model (FM) for our project. The log-likelihood function for the initial model (IM)

25

is compared to the log-likelihood function of the final model (FM) in the followingway

LR = 2 lnL(IM)

L(FM)= 2 [lnL(IM)− lnL(FM)] (17)

L(IM) = Value of the likelihood function of the initial model.

L(FM) = Value of the likelihood function of the final model.

LR is a test-statistic which follows a χ2 distribution with degrees of freedom equal tothe difference in the number of parameters in the models. If the LR statistic is exceedsthe significance level α, i.e. is larger than the upper α percentage of the χ2 distribution,the notion that the final model is better is rejected.[2, p. 431]

3.4.4 Deviance Goodness-of-Fit

The Deviance measure of goodness of fit tests hypotheses on subsets of the model pa-rameters and addresses the dissimilarity between the current model and a saturatedmodel[2, p. 433].

Deviance is defined as

D = 2 ln[L(Saturated Model)

L(Current Model)

]and follows a χ2 distribution with n − p degrees of freedom. Low deviance and largep-values suggest a satisfactory fit for the current model.

Another measure of adequacy is acquired by dividing the deviance by the degrees of free-dom. If this ratio is greater than 1, the current model is not a good fit[2, p. 433].

3.4.5 Residual Analysis

Further measures of model adequacy may be obtained by analyzing the difference ofthe predicted outcome and the observed values for the response variable in a regressionmodel, referred to as Residuals.

ei = yi − yi = yi − niπi i = 1, 2, ..., n (18)

26

For logistic regression models with binary response data, this is regularly done throughanalyzing the Pearson Residuals and the Deviance Residuals[7], which are the mostappropriate for logistic model adequacy checks[2, p. 440]. They are respectively definedby

pi =yi − niπi√niπi(1− πi)

(19)

di = ±

√2yi

[ln( yiniπi

)+ (ni − yi) ln

( ni − yini(1− πi

)](20)

From the equations, it is clear that observations with the response value 1 will have apositive residual and vice versa for the response value 0. Thus, two separate lines willemerge in a scatterplot of the residual size versus predicted probability. If the majority ofPearson residuals falls within the horizontal band ±3 it is usually a sign that the modelis accurate and highly predictive[17]. On the contrary, large residuals might imply thatOutliers are present in the data.

3.4.6 Influence and Leverage Diagnostics

Residual analysis can be further utilized for identifying observations with influence andleverage.

Leverage refers to how far away the explanatory data of an observation is from otherobservations and a high-leverage point forces the regression line towards itself. Aninfluential point is an outlier observation which highly affects the estimated β parametersand the deletion of a highly influential point will result in a notable change. Furthermore,a leverage point is necessarily not an influential point and vice versa.

The leverage for the ith observation is usually measured by the ith diagnoal value in theHat Matrix defined by

H = X(XTX)−1XT hii = xTi (XTX)−1xi

An observation with leverage exceeding 2pn is usually considered a leverage point[2, p.

213]

Influence is often measured through DFBETAS [5][2, p.217], measuring the change in βjparameters with the deletion of observation i

DFBETASj,i =βj − βj(i)S2(i)Cjj

27

An observation with a absolute DFBETAS value exceeding 2/√n is proposed to be

influential[5].

Low levels of leverage and influence is advantageous as it usually strengthens the pre-dictive power of the regression model[8]. Thus, these measures are often a good measureof model adequacy. Moreover, treatment of influential outliers is sometimes treated bydeletion as observations are viewed as non-representative for what the model is attempt-ing to predict. However, excluding data purely from these measures comes with the riskof strengthening a false hypothesis and might improve a faulty model.[2, p. 221]

3.5 Multicollinearity

Multicollinearity is a measure of correlation between explanatory variables used in aregression model. The reason why this is important to analyze is due to the fact thattwo or more variables that have a high correlation are unnecessary for the model. Inthat case one of the two variables are not contributing to the model significantly. It doesnot provide any additional information i.e. everything it explains can be explained withthe other variable instead.

3.5.1 Marginal Estimation

To investigate the presence of multicollinearity of explanatory variables in a regressionmodel, marginal regression models for all explanatory variables can be analyzed. Essen-tially, this means creating one independent regression model per explanatory variable inthe main model and analyze each autonomous response prediction in comparing to thefull model prediction.

A high deviation between the marginal and the regular parameter estimators for a certainvariable, might imply correlation to another variable present in the main model renderingthe estimators less accurate. Moreover, if the confidence intervals for the marginalestimators are considerably narrower compared to the main ones, correlation is likewiseimplied.[23]

28

4 Data

The first part of the project involved managing raw insurance data in order to obtain anfinal data set for analysis. Due to the sheer amount of available data and differing for-mats, preparation in terms of collection, structuring, aggregation, joining and selectionplayed a major part in the analysis of this project as a whole.

4.1 Data Structures and Levels

Information about an insurance policy can be stored in several ways and levels. For anobserved commercial insurance at If P&C, the five main underlying levels are Customer,Policy, Exposure, Product and Product Module as presented below in Figure 1. Differentlevels may be appropriate to analyze at different times, depending on what is to beobtained from the data.

Observations in a data set on a certain level may be aggregated up through summing,increasing the data level and reducing the number of observations. For instance, the com-bined sum of all Policy premiums for a company will yield the corresponding premiumpaid by that Customer.

Customer

Policy

Exposure

Product

Product Module

Company A

Insurance B

Property B

Property Standard

LiabilityCrisis

Insurance A

Car A

Half Coverage

Other Vehicle DmgGlass Dmg

TPL

Injury

Property A

Property Standard

LiabilityCrisis

Figure 1: Insurance Levels

29

4.1.1 Desired structure

An insurance policy may be renewed between subscription periods3, essentially meaningthat the Policy and its respective exposures remain the same but valid for a later date.This can be seen as a new iteration of what really is the same insurance policy with thesimilar terms.

Since renewal is to be predicted with respect to customer specific parameters, the de-sired data structure for the project was thus concluded to be Policy Period level, withone observation per subscription period of each policy. This allows information aboutrenewal to be obtained while also analyzing the corresponding policyholder specific, orCustomer specific, characteristics that might predict renewal.

Table 1: Example data set aggregated to Policy Period level.

Customer Policy Period Char 1 Char 2

C1 P1 Per1 13 100

C1 P1 Per2 13 100

C1 P1 Per3 7 100

C1 P2 Per1 6 100

C1 P2 Per2 21 100

C2 P3 Per1 7 350

C2 P3 Per2 8 350

Char 1 is an arbitrary characteristic specific to each Policy Period.

Char 2, on the other hand, is specific to each Customer ans is therefore repeated for eachappearance for the same customer in the data. It might for example represent numberof employees.

3Time when the insurance policy is active, normally spanning 1 year

30

4.2 Data Collection

Raw data was retrieved from a primary and a secondary source in If P&C’s data ware-houses containing insurance policy information and customer Characteristics. Charac-teristics and other insurance attributes were represented in a separate column with onerow per observation, determined by the data level.

4.3 Data Processing

To acquire a data set on the desired Policy Period level, the raw data was heavilyprocessed and reformatted though filtration, aggregation and merging. This procedurewas divided into two separate parts, essentially preparing the data from each source andthen performing a mutual merge.

4.3.1 Primary Data

The primary data was aggregated up to Policy Period level and an initial filtration ofcharacteristics was performed. This included both a qualitative analysis of the expectedpredictability of renewal and a quantitative analysis where a certain degree of missingdata entries were not to be exceeded.

A binary characteristic representing if the observed insurance policy period was renewedor not was also introduced through a series of operations. In brief, this included removingall active policies and for the remaining determining if the policyholder deliberately hadrenewed or canceled the insurance subscription for the offered price increase. Wherethis could not be ensured, observations were excluded from the data. For instance, acancellation occurring before receiving a new offer, more than than 90 days before thematurity of the policy, would suggest that the customer has cancelled the policy for areason other than not being satisfied with the offered premium.

4.3.2 Secondary Data

The secondary data was then merged on using a common identifier in the sets. This datawas structured on Customer Level, with one observation per customer for a given timestamp. These observation were not necessarily identical and only the latest availableinformation about the customer when renewal did or did not occur was merged on toeach policy in the primary data. This assured that only correct information regarding thecustomers was joined to reach a realistic picture of the customer characteristics which,naturally, may change over time.

Observations to which there were no matches with the secondary data were chosen notto be excluded from the combined data set, but were given a missing indicator.

31

This process yielded the final data set with a total of 176320 observed insurance policyperiods.

4.3.3 Characteristic Selection

With the completed combined data set, another iteration of qualitative and quantitativecharacteristic selection was carried out along with a filtration of time stamps and otherintermediate data used purely for data merging.

Ultimately, 20 final characteristics with sufficient data, low mutual correlation and alikely predictability of renewal were selected, throughout this report referred to as Char-acteristics 1 - 20.

4.4 Characteristic Grouping and Explanatory Variables

The final characteristics were accordingly divided into groups, where each group wereto represent an explanatory variable in the logistic regression model. Since an observedpolicy period will in every case belong to one group per characteristic, the explanatoryvariables will act as Dummy Variables.

Categorization was essentially performed by dividing each characteristic into discretegroups where the observed insurance policyholders were believed to act similar withrespect to renewal. An initial grouping process was carried out where data for eachcharacteristic was first divided into larger groups such as; Small, Medium, Large andMega Companies for instance. Subgroups for each of these were then created in whichthe intervals were based both on the distribution of the data as well as an intuitiveapproach. In order to consider the data reliable, the minimum number of data points ineach group was set to 500 observations.

In cases where it was difficult to group the data based on intuition, the tails of thedistribution were initially assigned their own groups and the remaining data was dividedin equidistant intervals. In general, the non-binary characteristics seemed to belong toone of two main probability distributions; negative exponential and normal distribu-tion.

It was decided to not exclude potential outliers from the data but to instead assignextreme groups, denoted with X, to filter and treat tail values. Moreover, a groupdenoted Z was also introduced for characteristics where missing values could occur.

32

Table 2: Characteristic Grouping Examples

Characteristic Explanatory Variable Group

Char 1 xi,1 Axi,2 Bxi,3 Cxi,4 Dxi,5 Exi,6 Fxi,7 Gxi,8 Hxi,9 Ixi,10 JRef. X

Char 13 xi,54 Axi,55 Bxi,56 Cxi,57 Dxi,58 Exi,59 Fxi,60 Gxi,61 Hxi,62 Ixi,63 Jxi,64 Kxi,65 X1xi,66 X2Ref. Z

4.5 Response Variable

As renewal probability was to be predicted, the response variable for the regression modelwas selected as the logistic binary characteristic representing the deliberate renewaldecision for each observed insurance policy period.

yi =

{1 Insurance policy was renewed

0 Insurance policy was not renewed

33

5 Model Development

With a complete data set at hand and an initial selection and categorization of char-acteristics, a final model was developed through a number of steps presented in thissection.

The main development stages were

• Initial Model

• Regrouped Model

• Reduced Models −→ Final Model

Furthermore, for each characteristic and their respective grouping configuration, a Marginalmodel was built parallel to all models in the development stages.

5.1 Initial Model

An initial Logistic Generalized Linear Model was constructed as a full model containingthe complete set of Characteristics 1 - 20.

As displayed in Table 2 on page 33, each characteristic group, except for one referencegroup per characteristic, act as one unique explanatory variable in the GLM resultingin a total of 155 explanatory variables for the initial model. These take on the value 0or 1 for each observed policy depending on which group the observed insurance policyperiod belongs to. Fundamentally, this results in a maximum of 20 explanatory variablesreceiving the value 1 whereas the rest remain 0.

ln( yi

1− yi)

=

156∑j=0

xijβj (21)

Modelling was performed in SAS through the LOGISTIC procedure which througha Maximum Likelihood approach determined the β-parameters for each variable.[12]More specifically, Fisher’s Scoring Alhorithm was carried out for numerically solving theMaximum Likelihood Equation[6].

The complete set of odds ratio plots of the Initial Model are located in the appendix.

5.2 Analyzing the Initial Model

The initial model was then analyzed with respect to adequacy and performance, to laythe foundation for improvement towards the final model. A number of different metricswere looked at, as presented in following paragraphs.

34

5.2.1 β Odds Ratios

β-estimates and their respective confidence intervals[13], along with corresponding marginalestimate, were analyzed to draw conclusions about the Initial Model.

The β-estimates were mainly analyzed as Odds Ratios since the odds configuration givesa multiplicative relation between estimates, making measuring comparisons and trendsmore intuitive. For instance, belonging to a group with an estimate of 1.2 the sizecompared to another would imply a 20% larger odds for renewal.

Figure 2: Odds estimates for Initial Characteristic 1

35

Figure 3: Odds estimates for Initial Characteristic 18

In Figure 2, presenting odds ratios for Characteristic 1, a positive trend while advanc-ing through the groups is clear along with narrow confidence intervals, which suggestsa significant difference in renewal probability between the groups. The odds marginalestimates are also close to the regular odds estimates, implying that the characteristicis not highly correlated to other characteristics and thus does not contribute to multi-collinearity. On the contrary, the trend for Characteristic 18, presented in Figure 3, isflat which implies a low significance as no difference in renewal probability is apparentbetween the groups. Confidence intervals are also less narrow and marginal estimatedoes not follow the estimate as closely, all suggesting that this Characteristic is lessuseful.

In order to increase significance and reduce the number of tariff cells in the initial model,characteristics with deviance or absence of trends were regrouped through relocation ofgroup dividers and merging of adjacent groups with almost identical estimates. Fur-thermore, Characteristics 14, 15 and 16 seemed all very useful and interesting froman intuitive standpoint but did not show any significance in isolation. Given this dis-crepancy, these characteristics were chosen to be kept as ratios by dividing them withcharacteristic 12.

This procedure of regrouping characteristics was performed before any conclusions weredrawn regarding characteristic correlations and significance as the latter might very wellbe subject to change when regrouping has been carried out.

36

Regrouping was carried out in iterations until a satisfactory characteristic grouping wasachieved, yielding the Regrouped Model.

5.3 Regrouped Model

The regrouped model was, like the initial model, built upon Characteristics 1 - 20 butwith an updated grouping. As a result of group merging, the number of explanatoryvariables in the regrouped model dropped to 124.

Naturally, the regrouped model showed more stable trends in the β-estimates, loweredcomplexity and a similar performance level. The complete set of odds ratio plots of theRegrouped Model are found in the appendix.

With the Regrouped model at hand, the next step for reaching the final model concernedcharacteristic selection which was carried out through two parallel methods.

1. Analyzing the Regrouped Model yielding the Intuitively Reduced Model

2. Performing a stepwise regression in SAS yielding the Stepwise Reduced Model

5.4 Analyzing the Regrouped Model

The intuitive variable selection of the Regrouped Model was performed with three mainqualities in mind and though a procedure described in this section.

• Clarity of trends in estimates

• Statistical Significance

• Low correlation

5.4.1 β Odds Ratios

First, Odds Ratio plots of the Regrouped Model similar to Figures 2 and 3 on page36 were analyzed and tight confidence intervals and a clear trend gave an indicationof the considered characteristic being significant in predicting renewal. In contrary,a characteristic showing a flat trend would have little or no impact on the renewalestimation and was generally therefore considered insignificant[23].

5.4.2 Multicollinearity

Furthermore, a characteristic showing large confidence intervals in the β-estimate whensimultaneously having tight confidence intervals for the respective marginal estimateswould reasonably suggest a high correlation with another characteristic[23]. This occurs

37

since what was previously explained in an isolated variable is partly explained elsewhereas the clarity in the odds ratio trend fades when additional variables are introduced.Characteristics with this property, and characteristics showing high deviation in marginalestimate from the regular estimate were consequently further discussed and tested formulticollinearity, ultimately resulting in exclusion from the model when appropriate.Characteristics which from the marginal estimates were shown to be correlated werefurther analyzed in the correlation matrix of the model, where high correlation betweenthe variables of the two characteristics supported the exclusion decision.

5.4.3 Statistical Hypothesis testing

Secondly, χ2 hypothesis testing was carried out in SAS for determining characteristicsignificance. Essentially, testing that the subset of β associated with each characteristicis equal to the zero vector[14].

Table 3: Null Hypothesis β = 0 for each characteristic subset

Effect df Wald χ2 Pr > χ2

Char 1 10 82.0122 <.0.001Char 2 16 181.1345 <.0.001Char 3 6 35.9207 <.0.001Char 4 4 292.2338 <.0.001Char 5 7 84.6334 <.0.001Char 6 5 30.8132 <.0.001Char 7 10 42.6909 <.0.001Char 8 1 0.0162 0.8986Char 9 1 9.2687 0.0023Char 10 1 2.9696 0.0848Char 11 1 0.2743 0.6005Char 12 9 26.4175 0.0017Char 13 7 54.2683 <.0.001Char 14 7 6.000 0.5398Char 15 4 9.8624 0.0428Char 16 6 1.7554 0.9408Char 17 2 10.8384 0.0044Char 18 3 6.6472 0.0840Char 19 12 44.4454 <.0.001Char 20 6 10.5083 0.1048

df = Degrees of Freedom α = 0.05

38

5.5 Intuitive Reduced Model

Based on the results of the estimate confidence intervals and hypothesis testing, thefirst action was to discard the binary variables which either showed flat trends, wideconfidence intervals and highly clustered data in one of the groups. The second actionwas to discard variables which lacked a clear and explainable trend, a purely intuitivediscussion which was carried out together with the pricing analysts at If P&C. Theseactions meant the exclusion of Characteristics 8-11 and 17-20.

The final action was to consider statistical significance and correlations for the remain-ing candidate characteristics. The characteristics 14,15,16 which were all divided bycharacteristic 12 now showed high mutual correlation for their respective explanatoryvariables as well as deviating marginal estimates. As a result, only characteristic 14 waskept.

With these actions, 9 characteristics remained, all with clear trends, narrow confidenceintervals, low mutual correlations and all with a significance < .001, together composingthe Intuitively Reduced Model.

5.6 Stepwise Reduced Model

Additionally, from the Regrouped Model, a stepwise selection of characteristics was per-formed in SAS, a computing intensive process which in iterations adds and discardscharacteristics resulting in a best model based on performance, correlation and andcomplexity[16].

The procedure resulted in a best model consisting of 13 characteristics. The completeresults and steps of the Stepwise Reduced Model are located in the appendix.

5.7 Final Model Selection

Selection of a final model for predicting the renewal Probability was ultimately carriedout through weighing both reduced models against each other.

The Stepwise Reduced Model with 13 characteristics, consisted of the complete set of the9 characteristics in the Intuitively Reduced Model plus an additional 4. Hence, these 4were analyzed one final time, having the same qualities as prior in mind.

All 4 excess characteristics were statistically significant to a degree equivalent to the9 mutual, while also lacking notable correlations with other characteristics in the In-tuitively Reduced Model. However, a common property for all 4 was the absence of ajustified and clear trend in the estimates and hence a questionable applicability in arealistic model to be used in the industry.

39

Thus, the Intuitively Reduced Model was ultimately decided to be the Final Model,strengthened by a similar selection of characteristics by model optimization proceduresin SAS and a final intuitive overview with a realistic implementation perspective inmind.

40

6 Results

6.1 Final Model

The final model was composed by Characteristics 1-7,13 and 14 as concluded in theModel Development section. All odds ratio plots along with a table of the complete setof explanatory variables can be found in the Appendix.

6.1.1 Statistical Hypothesis Testing

Several statistical hypothesis tests were performed on the Final Model to test goodnessof fit on both characteristic and global significance level with results presented in thefollowing tables.

Table 4: Null Hypothesis β = 0 foreach characteristic subset

Effect df Value Pr > χ2

Char 1 10 88.9829 <0.001Char 2 16 178.6172 <0.001Char 3 6 36.3679 <0.001Char 4 4 321.4928 <0.001Char 5 7 76.3229 <0.001Char 6 5 28.2064 <0.001Char 7 10 44.6222 <0.001Char 13 8 108.9709 <0.001Char 14 5 44.0605 <0.001

Table 5: Global Null Hypothesis β = 0 forentire characteristic set

Test Value df Pr > χ2

Likeihood Ratio 1226.8046 89 <.0.001Score 1309.5172 89 <.0.001Wald 1268.4367 89 <.0.001


41

Table 6: Pearson and Deviance Goodness-of-Fit Hypothesis Tests

Value df Value/df Pr > χ2

Deviance 71219 17 · 104 0.4134 1.000Pearson 171328.059 17 · 104 0.9944 0.9492


Table 7: Partitions for the Hosmer-Lemeshow Goodness-of-Fit Tests

Event Non-Event

Group Total Observed Expected Observed Expected

1 17237 15654 15561.14 1584 1675.862 17240 15958 15985.51 1282 1254.493 17236 16059 16142.60 1177 1093.404 17268 16230 16284.69 1038 983.315 17241 16350 16343.14 891 897.866 17237 16377 16408.75 860 828.257 17263 16515 16496.69 748 766.318 17234 16548 16531.66 686 702.349 17236 16659 16607.71 577 628.2910 171168 16705 16660.80 463 507.20


Table 8: Hosmer-Lemeshow Goodness-of-Fit Tests Results

Value df Pr > χ2

26.7812 8 0.0008


42

6.1.2 Receiver Operating Characteristic

Figure 4: ROC Curve for Final Model

43

6.1.3 Residuals

Figure 5: Scatterplot of Pearson Residuals

Figure 6: Scatterplot of Deviance Residuals

44

6.1.4 Leverage

Figure 7: Scatterplot of Leverage

Figure 8: Distribution of Leverage

45

6.1.5 Influence

Figure 9: Scatterplot of DFBETAS for intercept

Figure 10: Distribution of DFBETAS for intercept

46

6.2 Model Comparisons

Table 9: Model validation comparisons between the different models

Initial Model Regrouped Model Final Model

Observations 172360 172360 172360Explanatory Variables 155 124 71AUC 0.612072 0.611071 0.605842Log Likelihood -35537 -35545 -35609Pearson statistic 170723 170963 171328Deviance statistic 71075 71090 71220AIC 71373 71328 71363BIC 72871 72525 72087

Table 10: Model comparisons deltas with Initial Modelas reference

Initial Model Regrouped Model Final Model

∆Observations - 0 0∆Explanatory Variables - -31 -84∆AUC - -0.001001 -0.006230∆Log Likelihood - -8 -72∆Pearson statistic - 240 605∆Deviance statistic - 15 145∆AIC - -45 -10∆BIC - -346 -784

47

7 Discussion

7.1 Final Model Validation & Adequacy

7.1.1 Predictive Power

The predictive power of the Final Model given by the concordance index from the ROC-curve is 60.58%. This indicates that the model to some extent lacks the ability toaccurately predict renewal. Besides the fact that the explaining characteristics do notaccount for the entire truth behind the decision of renewal, the predictive power mightalso have been directly affected by the fact that the distribution of the response variablewas roughly 90% events and 10% non-events resulting in a lowered accuracy.

7.1.2 Statistical Hypothesis Tests Analysis

The hypothesis tests show that the individual characteristics and the model as a whole issignificant to a high degree. Furthermore, the Deviance and Pearson tests also indicatehigh p-values suggesting a satisfactory goodness of fit. Furthermore, the statistic valuesdivided by the degrees of freedom in both tests do not exceed unity implying there is noreason to doubt the fit adequacy[2, p. 432]

On the contrary, the results of Hosmer-Lemeshow test show a p-value of around 1% whichdoes not exceed the significance level of 5% giving reason to doubt the fit. However,this test has been shown to be somewhat unreliable and is by many considered to haveserious problems[10].

In conclusion, the statistical tests has shown a good results for the Final Model strength-ening its goodness of fit and overall adequacy.

7.1.3 Residuals, Leverage and Influence

In table 11 below, the Pearson Residuals, Hat Values, DFBETAS are compared to thecutoffs derived in the Residual Analysis section.

Table 11: Residual, Leverage and Influence in the Final Model

Pearson Residuals Hat Values DFBETAS

Cutoff - 2p/n 2/√n

Cutoff Value ±3 0.00084 0.00482Obs. Outisde Cutoff 8788 14115 3169Ratio 5.0986% 8.1893% 1.8386%

n = 172360 p = 72

48

Roughly 5% of the residuals were outside the threshold size of ±3 which is a rela-tively small proportion. Furthermore, as displayed in Figure 5, all residuals outside thisthreshold were negative which suggests that the model less accurately predicts a non-event compared to an event. This makes sense as only around 10% of observed insurancepolicies in the data were not renewed. For these observations, the model had a slightlyhigher predicted probability of renewal compared to what was observed. Overall, theresiduals are adequate and implies a good fit.

Around 8% of observations were classified as leverage points which is slightly larger thanthe amount of outliers. However, only around 2% of observations could be consideredinfluential which is almost negligible. In addition, the amount of positive and negativeDFBETAS were roughly the same.

In conclusion, these cutoffs and measures had to be considered with a degree of skepti-cism, but were nevertheless rough indicators of the overall quality of the fit.

The underlying data was ultimately considered reliable and influential observations wereunder the model development and analysis concluded to not be discarded to avoid biasedand favored results.

”As a general rule, if there is an error in recording a measured value or if thesample point is indeed invalid or not part of the population that was intendedto be sampled, then discarding the observation is appropriate. However, ifanalysis reveals that an influential point is a valid observation, then there isno justification for its removal.”[2, p. 221]

7.2 Comparing the Models

Tables 9 and 10 on page 47 show how a variety of model adequacy measures differbetween the three main model development stages. A decreasing trend in complexityis clear through the BIC measure which was expected as the models were graduallyreduced. This may be further linked to the number of tariff cells for the models, i.e.the number of different available combinations of characteristics in which an arbitrarycustomer will belong to. By discarding a characteristic, the number of tariff cells willdecrease with a factor of the amount of groups resulting in a higher significance for eachcell as the probability of two customers belonging to the same cell increases. This isbeneficial in many ways and makes it generally easier to draw conclusions about specificcustomers.

Moreover, the measures for goodness of fit and predictive power slightly decrease asthe model is reduced but not to a degree which renders the final model inadequate ormarkedly inferior, as deduced in the Model Development section.

49

7.3 Analyzing the Characteristics

In this section, the impact on renewal probability for two characteristics are more closelyanalyzed. A discussion is presented on what the groups in each characteristic representsan why they have different odds for renewal. It is also discussed whether it is possibleto see a trend among the groups and why the trends look as they do. In a last step it isalso analyzed if the trends are logical and intuitive.

7.3.1 Characteristic 4 - Sales Channel

Characteristic 4 represents the Sales Channel through which the policy was sold to thecustomer. The groups of this characteristic can be divided into two sub-categories; wherethe customer have little or not interaction with If P&C and where the customer interactswith If P&Crelatively frequently. A is the group with least interaction and D the groupwith the most interaction. As presented in 11, the odds for renewal for the sales channelgroups correlate strongly with how much interaction the customer has with If.

Figure 11: Odds Ratios for Characteristic 4

The β parameters and odds ratios for Sales Channel resemble well with our logicalintuition of how the characteristic would perform. An insurance customer which interactsfrequently with If P&C receives customer service to a higher degree which reasonablycorrelates with their customer satisfaction as a whole. On the contrary, a customerwith little to no interaction with If P&C might feel less attached and will thus be moreinclined to try out other insurance companies on the market.

50

Ultimately, this will translate into a higher renewal probability for the more satisfied,more approached, customer.

7.3.2 Characteristic 7 - Price Increase

Characteristic 7 represents the size of the offered Price Increase for a policy betweenthe matured subscription period to the following. A is the group that has been offeredthe smallest change in premium and J has been offered the largest. The initial andintuitive speculation for the price increase odds ratios was that the trend would bestrictly negative, where groups that represent a larger premium increase would have alower probability of renewal.

Figure 12: Odds Ratios for Characteristic 7

The results strengthened this speculation as the odds ratio trend is relatively flat fromgroup A to F, and then decreases strictly. However, on closer inspection there is anapparent increases in odds from group D to F. At first, this seems to be very illogicalbut with deeper analysis of the results, an explanation was found.

Policyholders in group F are likely the ones that probably had a claim and hence accepta higher policy premium for the next period. In other words, they have gotten value fortheir insurance and are happy to accept a higher premium for the next period. However,policyholders in group D might just have been given a large increase, without necessarilyhaving used their insurance. Therefore, it makes logical sense that they are less likelyto renew their policy from a customer satisfaction standpoint.

51

For groups beyond F the intuition was correct as a declining trend in the odds ratios isclear where a larger price increase translates into a lower probability of renewal.

The rest of the characteristics were interpreted in a similar way. An intuitive speculationof how the characteristic groups would differ was initially carried out which was thencompared to the quantitative results. This was done in order to more deeply understandwhat the driving force behind a characteristic was and that there was causality behindits results, not only correlation. The rest of the characteristics can unfortunately not beopenly presented and analyzed in the report due to confidentiality.

7.4 Sources of Error and Uncertainty

The possibly largest source of error for the model is the large amount of missing valuesfor many characteristics. After discussions with the pricing analyst at If P&C, thesevalues were chosen to be included in the model and not sorted out. The main reasonwas that the model still should be able to predict the probability of renewal even if acustomer misses values for one or more of the characteristic. If an observation in thedata has a missing value in one characteristic, valuable information might still be storedin other characteristics with non-empty values.

On the contrary, the downside of including these values is that they could have a nega-tive affect on performance of the logistic regression model since GLM takes correlationbetween the groups for each characteristic into account. Better results and a higher pre-dictive power for the model might be found if missing values were to be sorted out. Thisis something that analysts who further develop on this model should look into.

Another source of uncertainty in the project as a whole has been the degree of sub-jectivity and the occasional lack of given best practices. This involves the quality ofcharacteristics grouping for instance, where a different allocation might very well havebeen superior. There is also a level of subjectivity in the interpretation of the results;sometimes the pure mathematical significance and performance of the regression modeldo not provide the entire truth and different measures must be weighted against eachother. In a project such as this where one purpose was to find trends in the explanatoryvariables, significance of explanatory variables might for instance be more interestingthan pure predictive power as all models have some degree of uncertainty and do notaccount for the entire truth behind what is to be predicted.

52

8 Price Sensitivity Analysis

In this section it is discussed how the results of the statistical analysis can be implementedand what kind of impact that could have. This is not only viewed from the perspectiveof If P&C, but also from the perspective their customers and the insurance industry, aswell as from the broader perspective of the society.

8.1 Relation Between Renewal Probability and Price Sensitivity

What has been analyzed and predicted with the logistic regression model is the renewalprobability of a policy explained with a customer’s specific characteristics. This initself is very valuable knowledge. Although, making the translation from a customer’sprobability of renewal into price sensitivity will facilitate the implementation of themodel.

In order to do this translation, non price sensitive and price sensitive customers mustfirst be defined. In brief, customers who are likely to renew their policies no matterthe size of the price increase are viewed as non price sensitive. The renewal probabilityfor these customers have little change when shifting the price increase. Accordingly,customers whose renewal probability quickly drops when the price increase increases areviewed as price sensitive customers. Refer to equation 1 presented on page 9

Price Sensitivity :=∆ Renewal Probability

∆ Price Increase(1)

Figure 13: Price Sensitivity Slopes

53

To visulaize this, a scatter plot of price increase against the probability of renewal fora specific Customer Group is presented in Figure 13 above. A customer group is ahomogeneous group, where members have the same set of characteristics but have beenoffered different premium increases. The slope of the fitted line represents the pricesensitivity of each customer group. The steeper the slope, the more price sensitive thegroup is. Price sensitivity is in other words analyzed by looking at how a change inpremium affects a customer groups probability of renewal.

In Figure 13 it is clear that the customers in the orange category are more price sensitivethan the customers in the blue category by observing the difference in slopes of each fit.However, even though there are clear differences between customer groups, it should bepointed out that this relationship does not necessarily have to be linear but that thegradient most likely is increasing as the price increase grows. A more detailed methodfor deriving the price sensitivity measure would for instance be to assume a piecewiselinear relationship and derive a gradient for every point in each customer group. Thiswould yield a price sensitivity measure for every possible customer.

8.2 Implementation

As suggested in the Project Formulation and the Insurance Pricing sections, a goal forthis project was to add a new coefficient to the adjustment term in If P&C’s pricingmodels that would represent the customer price sensitivity.

This can be achieved by using the price sensitivity measure to classify customers intodifferent price sensitivity groups such as Price Sensitive, Neutral and Not Price Sensitive.Each group would the be assigned a coefficient value that would be part of the adjustmentterm. However, exactly how the adjustment coefficient will be derived and how theclassification will be carried out is unfortunately confidential.

Furthermore, the price sensitivity classification can be employed for more than justpricing adjustments. For instance, the classification might also be used to identify pricesensitive customer groups where improved customer relationship management, CRM,could be useful in order to improve customer satisfaction and in turn increase theirrenewal rate.

In summary, a more detailed implementation method is therefore to create an interactivemodel where it is possible to generate price sensitivity measures and classifications fordifferent sets of characteristics. These measures may then be compared and analysis overhow price sensitivity changes among customer groups can be performed. This would bea very useful CRM tool.

54

8.2.1 Implementation Areas

The knowledge about a customer’s price sensitivity could be used in a variety of scenariosand in several parts of If P&C as an organization. The scenarios and parts of theorganization where the applicability is seemingly most clear include the following.

Renewal: Using price sensitivity analysis for predicting renewal is the area which hasbeen primarily analyzed in this project. The usefulness of implementing this type ofanalysis for renewal has been the focus throughout the report.

New Customers: If the characteristics in the model require no prior knowledge aboutthe customer, this type of analysis may also be used when selling new insurance policiesto new customers. The only requirement is that the characteristics data for a newcustomer is easily accessible, reliable and up to date.

Other Markets & Business Areas: This project has been focused on the Swedishcommercial insurance market but there is no apparent reason why this type of analysisshould not be applied in the other Nordic countries where If P&C has operations aswell. With smaller modifications, If P&C could also implement our findings in theirother divisions, for example the private insurance division. This is something that IfP&C’s analysts for different divisions and markets should look in to.

8.3 Economic Impact of Price Sensitivity Analysis

The impact of the project findings are analyzed from an economic perspective in thissection. This is important because even if the model has mathematical and statisticalsignificance, it will be more or less useless if it does not have any significance or impacton an economic level for either If P&C, their customers or society.

There are connections between this project and research into price elasticity of demand.Price elasticity of demand, Ed, is an microeconomic measure that describes how a the de-mand for a good or service changes with a change in its price, all other things equal.

Ed =P

Qd× dQd

dP(22)

Ed is the point-price elasticity of demand, P is the price and Qd is the quantity de-manded.

Although this is similar to what has been analyzed microeconomic theory is unfortu-nately not widely applicable in insurance pricing. Insurance is one of the few marketswhere two customers can be offered two different prices for the exact same product. Thisis mainly due to the fact that the price for a policy is fundamentally not derived fromthe laws of supply and demand as there are many aspects of uncertainty when pricing

55

an insurance policy. The premium is instead calculated from modelling the risk as wasdescribed in the Insurance Pricing section.

8.3.1 If P&C Insurance

The goal of this project has been to optimize and improve the current pricing models atIf P&C by including policyholder price sensitive as a factor. This is an important areato look into since increased policy renewal both is a sign of customer satisfaction and apotential source of profit for If P&C. As was discussed in the previous section about theimplementation of this project, there are two main areas where price sensitivity analysiscould improve If P&C business.

• Better Pricing Models

• Better Customer Relationship Management

Being able to model and use the knowledge of the price sensitivity of their customerswould hence give If P&C a substantial competitive edge compared to other Nordicinsurance companies. More refined and individualized pricing along with more satisfiedcustomers will likely cause the renewal rate to improve and accordingly increase thecompany’s gross written premiums.

Furthermore, staying competitive and finding a competitive advantage is good for thelong term corporate survival of If P&C. This also allows for making investments inorder to improve the current business. This might for example be investments intodigital transformation and information technology, improved working environment andergonomics as well as improving their attractiveness as an employer in order attract thebest talent.

Naturally, if one company would become much more competitive than the rest of theindustry this would force the the other companies to adapt and evolve as well. Thisis a result of a free, open market where competition among the market participantsultimately drive improvement and development.

8.3.2 Corporates

Given that price sensitivity analysis improves pricing to the extent that it increases therenewal rate, If P&C will be able to lower prices for all their customers. This is supportedby the fact that large part of the administrative expenses are incurred during the time ofsale, at the beginning of a customer life-cycle. The time of the sale is when If P&C willhave a lot of contact with the customer which is costly. Hence if the customer life-cycleis longer, it will lower the average yearly administration cost per customer. This in turn,will make room for generally lower premiums.

56

It is clear that this is positive for If P&C’s corporate customers. They will lower theirexpenses and improve their profitability as a result of more affordable insurance premi-ums.

8.3.3 Society

To understand what impact this project may have on a macroeconomic level, it is im-portant to understand the current role of the insurance industry.

Being able to hedge for and mitigate risk is essential for companies in order to actuallytake risks in the first place. Without risk taking there would be very little large scaleinvestments. For example without property insurance there would be no investmentsin large factories or new office spaces, without liability insurance consultancies wouldnot be given the trust to do their job and without marine insurance global trade wouldstagnate. Therefore access to insurance plays an essential part of the economy whichencourages corporate investments and risk taking.

Having an insurance can also free up capital for a lot of companies. Instead of hoardingcapital that is put aside to cover an eventual damage, the company can buy an insurancepolicy and only put aside the deductible. This will free up capital for the company thatcan be used for investments.

Price sensitivity analysis in general will help make insurance cheaper in general, aswas argued in the previous section. Cheaper insurance and more individualized pricingwill help make insurance more accessible for many companies. More companies mighttherefore get insurance and other companies will insure more of their business. This willalso help to free up capital across the economy and will translate into more corporateinvestments.

All of this is very favorable on a macroeconomic level and creates a good upwardsspiral. Investments create jobs, jobs lower unemployment, low unemployment increasesconsumer confidence which in turn increases consumption and this continues on and onand on.

However, it important to remember that all of this will take time and that it currently isunclear if and how much prices can be lowered on average. The scenario and dynamicsthat are described above are the absolute best outcome of successful price sensitivityanalysis. There are also some fundamental concerns with price sensitivity analysis whichare described in the section below.

8.4 Ethics of Price Sensitivity Analysis

Many fear that big data and price sensitivity analysis in the insurance industry mightcause price discrimination of specific groups. The fear is rooted in that the insurance

57

companies have a clear information arbitrage over their customers and are able to selec-tively raise prices for certain customer groups with certain characteristics in an unfairor discriminatory way.

The Swedish insurance industry is already regulated with regards to price discrimination.It is for example since December of 2012 illegal to use gender as an explanatory variablewhen pricing policies. A man and a woman, all other things equal, must be given thesame premium for the same policy by law. It could however be debated if using genderjust improves the risk assessment in the pricing models which makes it more fair or if itis discriminatory toward either gender.

Where do we draw the line between characteristics that should be protected from possiblediscrimination and characteristics that are okay to use in risk assessment? This is a veryhard question and we have no definitive answer. However, we disagree with the fearand belief that price sensitivity is an important characteristic to protect from possiblediscrimination. Price sensitivity has no shape, color or origin. Instead if done correctly,we view price sensitivity analysis as a natural step in improving the insurance premiumpricing models. The possibility of discrimination is outweighed by the benefits of thistype of analysis which include that a more loyal customer base will make it overallcheaper for all customers as was previously argued.

58

9 Conclusions

From the results and discussion of our project we can conclude that it is, to some extent,possible to predict the price sensitivity of non-life insurance policyholders. However, wealso recognize that there are other factors that lie behind the decision to renew or notrenew an insurance policy that are outside the spectrum of this analysis. Hence, pricesensitivity can not entirely be predicted from a customers specific characteristics. Theperformance of our model is a reflection of this.

Regarding the the implementation of the findings, it can be concluded that a translationfrom renewal probability to customer price sensitivity is possible. From this, a classi-fication system that distinguishes the level of price sensitivity of different customers isderived. This classification system can be implement in both pricing models and CRMsystems.

However, it is not clear whether the predictive power of our model is sufficient to beimplemented in the pricing models straight away. To get clarity in this more discussionneeds to be had with the actuaries and pricing analysts at If P&C. It might also be inter-esting to compare the predictive power of our regression model to similar GLM modelsthat are already employed in order to benchmark the predictive performance.

Finally, our statistical analysis has shown that there are in fact several customer char-acteristics that with significance can be used to predict the renewal probability of acustomer. This fundamentally validates the use of price sensitivity analysis in insurancepricing and our project should be viewed as a foundation for more research into this. Webelieve that price sensitivity analysis, if done correctly, is a natural step in improvingthe pricing models in the insurance industry and we encourage further research into thisarea.

59

10 Recommendations

For an analyst building on our project, a natural extension would be to look at morecharacteristics, outside of our original 20, to analyze if there are any other characteristicsthat can be used to predict price sensitivity. If any are found, these could be added to ourcurrent model in order to increase it’s performance. Our model is a good foundation forfurther analysis and as more and more data becomes available over time and adjustmentsare made to the model, we believe predictive power will increase accordingly.

The analyst should also look into the main source of error in this project; the largeportion of the data containing missing values. Better results and a higher predictivepower for the model might be found if the missing values were to be sorted out.

Finally, there is no doubt in our minds that price sensitivity will be implemented as acomplement to risk correct pricing in the future.

60

References

[1] A. Agresti.Categorical Data Analysis, 2nd edition.John Wiley & Sons, Inc., Hoboken, New Jersey (2002)ISBN: 0-471-36093-7

[2] D. Montgomery, E. Peck, G. Vining, Wiley.Introduction to Linear Regression Analysis, 5th edition.John Wiley & Sons, Inc., Hoboken, New Jersey (2012)ISBN: 978-0470-54281-1

[3] D. R. Cox.The Regression Analysis of Binary SequencesJ Roy Stat Soc B. Vol. 20, No. 2 (1958), pp. 215-242JSTOR: 2983890http://www.jstor.org/stable/2983890

[4] E. Ohlsson, B. Johansson.Non-Life Insurance Pricing with Generalized Linear ModelsSpringer-Verlag, Berlin Heidelberg (2010)ISBN: 978-3-642-10790-0

[5] D. A. Belsley, E. Kuh, R. E. WelschRegression Diagnostics: Identifying Influential Data and Sources of CollinearityJohn Wiley & Sons, Inc. (1980)ISBN: 978-0-471-05856-4

[6] R. I. Jennrich, P. F. Sampson.Newton-Raphson and related algorithms for maximum likelihood variance componentestimation.Technometrics, p. 11-17 (1976)

[7] G. RodriguezGeneralized Linear Models, 3.8 Regression Diagnostics for Binary Datahttp://data.princeton.edu/wws509/notes/c3s8.html

[8] S.K. Sarkar, H. Midi, S. RanaDetection of Outliers and Influential Observations in Binary Logistic Regression:An Empirical StudyJournal of Applied Sciences 11 (1): pp. 26-35 (2011)ISSN 1812-5654 / DOI: 10.3923/jas.2011.26.35

[9] R. L. Wasserstein, N. A. Lazar. The ASA’s Statement on p-Values:Context, Process, and PurposeThe American Statistician 70 (2) pp. 129-133 (2016)DOI: 10.1080/00031305.2016.1154108

61

[10] P. D. AllisonMeasures of Fit for Logistic RegressionSAS Global Forum, Paper 1485-2014 https://statisticalhorizons.com/

wp-content/uploads/GOFForLogisticRegression-Paper.pdf

[11] K. Knobel, L. LaestadiusFornyelsegrad och priskanslighet inom foretagsforsakringarRoyal Institute of Technology, School of Engineering Sciences

[12] SAS/STAT(R) 9.2 User’s Guide, Second Edition.The LOGISTIC Procedurehttps://support.sas.com/documentation/cdl/en/statug/63033/HTML/

default/viewer.htm#logistic_toc.htm

[13] SAS/STAT(R) 9.2 User’s Guide, Second Edition.The LOGISTIC Procedure, Confidence intervals for paramtershttps://support.sas.com/documentation/cdl/en/statug/63033/HTML/

default/viewer.htm#logistic_toc.htm

[14] SAS/STAT(R) 9.22 User’s Guide, Second Edition.Logistic Modeling with Categorical Predictorshttps://support.sas.com/documentation/cdl/en/statug/63347/HTML/

default/viewer.htm#statug_logistic_sect060.htm

[15] SAS/IML(R) Studio 3.4: User’s Guide.Exploratory and Confirmatory Data Analysishttp://support.sas.com/documentation/cdl/en/imlsug/64254/HTML/

default/viewer.htm#imlsug_ugintro_sect003.htm

[16] SAS/STAT(R) 9.2 User’s Guide, Second Edition.Stepwise Logistic Regression and Predicted Valueshttps://support.sas.com/documentation/cdl/en/statug/63033/HTML/

default/viewer.htm#statug_logistic_sect052.htm

[17] The Pennsylvania State University, Eberly College of ScienceSTAT 504 7.2.1 - Model Diagnostics https://onlinecourses.science.psu.edu/

stat504/node/161

[18] The Pennsylvania State University, Eberly College of Science3.2 - Hypothesis Testing (P-value approach)https://onlinecourses.science.psu.edu/statprogram/node/138

[19] J. Manyika, M. Chui.Big data: The next frontier for innovation, competition, and productivityMcKinsey Global Institute (2011)

[20] O. Heinrich, A. Mussa, S. Zerbi.How retailers can improve price perceptionprofitably

62

http://www.mckinsey.com/industries/retail/our-insights/

how-retailers-can-improve-price-perception-profitably

[21] E. LLullBig Data Analysis to Transform Insurance IndustryFinancial TImes, Risk Management (2016)https://www.ft.com/content/3273a7d4-00d2-11e6-99cb-83242733f755

[22] Filip Allard.Pricing Analyst, If P&C Insurance Ltd.

[23] Henrik Bosaeus.Pricing Analyst, If P&C Insurance Ltd.

63

Appendix

Initial Model

Odds Ratio Plots

Characteristic 1 Characteristic 2



64




65





66

Regrouped Model

Odds Ratio Plots




67




68





69

Final Model

Odds Ratio Plots




70


Characteristic 14

71

β and Confidence Intervals Table

Group Odds Odds CI Lower Odds CI Upper Observations

Intercept 17.77531 15.00131 21.06221

Char 1 A 0.7883 0.73672 0.84288 22721B 0.89775 0.85216 0.9456 35081C 1.0067 0.94778 1.0694 23107D 1.11841 1.04171 1.20091 16434E 1.0772 0.993 1.1675 11841F 1.04561 0.95343 1.1468 8477G 1.1394 1.02541 1.266 6798H 1.04341 0.93214 1.1678 5487I 0.98525 0.87282 1.11241 4462J 1.0427 0.91012 1.19471 3625

Char 2 A 1.51131 1.3695 1.6677 1112B D E 1.2271 0.95334 1.5792 12718C 0.92095 0.8508 0.99682 17294F 0.8518 0.7964 0.9123 27886G 0.87055 0.82086 0.92322 6010H 0.99555 0.8911 1.11211 4803I 0.94778 0.83761 1.0725 11442J 0.876 0.8064 0.95142 2218K 0.94077 0.79575 1.11221 14021L 0.93941 0.86826 1.0162 27382M 0.80723 0.76274 0.85425 6726N O 1.00931 0.90638 1.1238 3172P 0.96134 0.82542 1.1194 4311Q 0.82887 0.7319 0.93886 4685R 1.032 0.90794 1.1753 8917S 1.2781 1.14621 1.4252 0.94951

Char 3 A 0.7349 0.56877 0.90138 986B 0.74991 0.62382 1.2846 1888C 1.1596 1.0468 1.4000 61753D 1.2577 1.1298 1.391 47698E 1.2487 1.12031 1.6045 59315F 1.0845 0.7328 1.6046 292

72

Group Odds Odds CI Lower Odds CI Upper Observations

Char 4 A 0.63183 0.5585 0.7147 34995B 0.97671 0.86455 1.1034 117742C 1.0824 0.94942 1.234 18190D 1.9839 1.415 2.78161 1073

Char 5 A 0.84015 0.78185 0.90311 22336B 0.92496 0.87782 0.97462 44665C 1.0045 0.9385 1.0747 18205D 1.09411 1.03371 1.157 35923E 1.14861 1.0821 1.2194 31150F 1.17661 1.0885 1.2719 14090G 1.0148 0.9042 1.139 4643

Char 6 A 0.84444 0.7903 0.90211 115100B 0.87035 0.8145 0.9294 34166C 0.93523 0.86455 1.01161 13480D 0.9486 0.8538 1.05371 5445E 1.1127 0.94786 1.3062 2204

Char 7 A 0.90556 0.77393 1.05971 2664B 1.1113 1.02 1.21091 54773C 1.07381 0.99264 1.1617 67538D 0.95078 0.87255 1.0361 25268E 1.00111 0.90565 1.1066 11077F 1.1997 1.0367 1.38821 4026G 1.13421 0.9788 1.3143 3527H 1.022 0.8075 1.2968 1040I 0.87891 0.69733 1.1076 953J 0.62218 0.4854 0.7974 1046

Char 13 X1 0.63027 0.55131 0.7208 2101A 0.77375 0.67157 0.89146 2296B 0.91012 0.81768 1.012 5037C 0.99008 0.9052 1.083 8497D 1.0490 0.98025 1.1229 20793E 1.1154 1.0571 1.1771 40853F 1.07981 1.0043 1.1611 15435X2 1.13851 0.96361 1.3451 2056

Char 14 A 0.9687 0.9032 1.03941 15908B 1.0412 0.98345 1.1026 52401C 1.1460 1.069 1.2287 19326D 1.22481 1.0630 1.411 3103X 0.96352 0.85145 1.09021 3156

73

TRITA -MAT-K 2017:10

ISRN -KTH/MAT/K--17/10--SE

www.kth.se

Documents

Modelling Non-life Insurance Policyholder Price Sensitivity1114349/FULLTEXT01.pdfIf P&C Insurance Ltd, at their headquarters in Stockholm, Sweden. If P&C, or Property & Casualty, is