© Deloitte Consulting, 2004
Predictive Modeling for Property-Casualty Insurance

James Guszcza, FCAS, MAAA
Peter Wu, FCAS, MAAA

SoCal Actuarial Club, LAX
September 22, 2004
Predictive Modeling: 3 Levels of Discussion

Strategy
- Profitable growth
- Retain most profitable policyholders

Methodology
- Model design (actuarial)
- Modeling process

Technique
- GLM vs. decision trees vs. neural nets…
Methodology vs Technique

How does data mining need actuarial science?
- Variable creation
- Model design
- Model evaluation

How does actuarial science need data mining?
- Advances in computing and modeling techniques
- Ideas from other fields can be applied to insurance problems
Semantics: DM vs PM

One connotation: Data Mining (DM) is about knowledge discovery in large industrial databases.
- Data exploration techniques (some brute force)
- e.g. discover the strength of credit variables

Predictive Modeling (PM) applies statistical techniques (like regression) after the knowledge discovery phase is completed.
- Quantify & synthesize relationships found during knowledge discovery
- e.g. build a credit model
Strategy: Why do Data Mining?
Think Baseball!
Bay Area Baseball

In 1999, Billy Beane (general manager of the Oakland Athletics) found a novel use of data mining.
- Not a wealthy team: ranked 12th (out of 14) in payroll
- How to compete with rich teams?

Beane hired a statistics whiz to analyze statistics advocated by baseball guru Bill James.
Beane was able to hire excellent players undervalued by the market.
A year after Beane took over, the A's ranked 2nd!
Implication

Beane quantified how well a player would do. Not perfectly, just better than his peers.

Implication: be on the lookout for fields where an expert is required to reach a decision by judgmentally synthesizing quantifiable information across many dimensions.
(Sound like insurance underwriting?)

Maybe a predictive model can beat the pro.
Example

Who is worse?... And by how much?
- A 20-year-old driver with 1 minor violation who pays his bills on time and was written by your best agent
- A mature driver with a recent accident who has paid his bills late a few times

Unlike the human, the algorithm knows how much weight to give each dimension…

Classic PM strategy: build underwriting models to achieve profitable growth.
Keeping Score

Billy Beane                   → CEO who wants to run the next Progressive
Beane's scouts                → Underwriter
Potential team member         → Potential insured
Bill James' stats             → Predictive variables, old or new (e.g. credit)
Billy Beane's number cruncher → You! (or people on your team)
What is Predictive Modeling?
Three Concepts

Scoring engines
- A "predictive model" by any other name…

Lift curves
- How much worse than average are the policies with the worst scores?

Out-of-sample tests
- How well will the model work in the real world?
- Unbiased estimate of predictive power
Classic Application: Scoring Engines

Scoring engine: a formula that classifies or separates policies (or risks, accounts, agents…) into:
- profitable vs. unprofitable
- retaining vs. non-retaining…

A (non-)linear function f( ) of several predictive variables produces a continuous range of scores:

score = f(X1, X2, …, XN)
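As an illustration of such an f( ), a toy linear scoring engine might combine a few policy characteristics into one continuous score. The variables and weights below are hypothetical, chosen only to sketch the idea, not taken from the talk:

```python
# A minimal illustrative scoring engine of the form score = f(X1, X2, ..., XN).
# Higher score = riskier policy; all weights here are made-up examples.

def score(driver_age, prior_violations, years_insured):
    """Toy linear scoring engine mapping policy characteristics to a score."""
    return (0.5 * max(0, 25 - driver_age)   # youth surcharge below age 25
            + 2.0 * prior_violations        # behavioral signal
            - 0.3 * years_insured)          # tenure credit

young = score(20, 1, 1)     # young driver, one violation, new customer
mature = score(45, 0, 10)   # mature, clean, long-tenured driver
```

The point of the formula is exactly the slide's: the weights, not just the functional form, do the work of trading off the dimensions against each other.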
What "Powers" a Scoring Engine?

Scoring engine: score = f(X1, X2, …, XN)

The X1, X2, …, XN are as important as the f( )!
- Why actuarial expertise is necessary

A large part of the modeling process consists of variable creation and selection.
- Usually possible to generate 100's of variables
- Steepest part of the learning curve
Model Evaluation: Lift Curves

- Sort data by score
- Break the dataset into 10 equal pieces
- Best "decile": lowest scores → lowest loss ratio (LR)
- Worst "decile": highest scores → highest LR
- Difference: "lift"

Lift = segmentation power
Lift → ROI of the modeling project
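The decile construction above can be sketched in a few lines. This is a minimal stdlib-only sketch assuming each policy is a (score, loss_ratio) pair; the function names are illustrative, not from the talk:

```python
# Sort policies by model score, split into 10 deciles, and compute
# the average loss ratio per decile. "Lift" is the worst-vs-best spread.

def decile_lift(policies):
    """policies: list of (score, loss_ratio) pairs.
    Returns the average loss ratio of each of 10 score-ordered deciles."""
    ranked = sorted(policies, key=lambda p: p[0])  # ascending score
    n = len(ranked)
    deciles = []
    for d in range(10):
        chunk = ranked[d * n // 10:(d + 1) * n // 10]
        deciles.append(sum(lr for _, lr in chunk) / len(chunk))
    return deciles

def lift(deciles):
    """Spread between the worst (highest-LR) and best (lowest-LR) decile."""
    return deciles[-1] - deciles[0]
```

With a well-separating model the decile averages rise monotonically from the best decile to the worst, which is exactly the shape of a good lift curve.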
Out-of-Sample Testing

- Randomly divide the data into 3 pieces: training data, test data, validation data
- Use the training data to fit models
- Score the test data to create a lift curve
- Perform the train/test steps iteratively until you have a model you're happy with
- During this iterative phase, the validation data is set aside in a "lock box"
- Once the model has been finalized, score the validation data and produce a lift curve
- Result: an unbiased estimate of future performance
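The random three-way split above can be sketched as follows; the 60/20/20 proportions are an illustrative assumption (the talk does not specify them):

```python
# Random train / test / validation split, stdlib only.
import random

def train_test_validate(rows, seed=0, fractions=(0.6, 0.2, 0.2)):
    """Shuffle rows and split into train, test, and validation sets.
    The validation set is the 'lock box': untouched until the model is final."""
    rng = random.Random(seed)       # fixed seed for reproducibility
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(fractions[0] * n)
    n_test = int(fractions[1] * n)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validate = shuffled[n_train + n_test:]   # lock box
    return train, test, validate
```

The discipline matters more than the mechanics: because the validation rows never influence any train/test iteration, the final validation lift curve is an honest estimate of out-of-sample performance.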
Comparison of Techniques

Models built to detect whether an email message is really spam.
- "Gains charts" from several models
- Analogous to lift curves; good for a binary target
- All techniques work ok!
- Good variable creation is at least as important as the modeling technique.

[Figure: "Spam Email Detection - Gains Charts": Perc.Fraud vs. Perc.Total for a perfect model, MARS, neural net, decision tree, GLM, and regression]
Credit Scoring is an Example

All of these concepts apply to credit scoring:
- Knowledge discovery in databases (KDD)
- Scoring engine
- Lift curve evaluation translates to LR improvement → ROI
- Blind-test validation

Credit scoring has been the insurance industry's segue into data mining.
Applications Beyond Credit

The classic: profitability scoring model
- Underwriting/pricing applications
- Retention models
- Elasticity models
- Cross-sell models
- Lifetime value models
- Agent/agency monitoring
- Target marketing
- Fraud detection
- Customer segmentation
  - no target variable ("unsupervised learning")
Data Sources

Company's internal data:
- Policy-level records
- Loss & premium transactions
- Agent database
- Billing
- VIN
- …

Externally purchased data:
- Credit
- CLUE
- MVR
- Census
- …
The Predictive Modeling Process
Early: Variable Creation
Middle: Data Exploration & Modeling
Late: Analysis & Implementation
Variable Creation

- Research possible data sources
- Extract/purchase data
- Check data for quality (QA)
  - Messy! (still deep in the mines)
- Create predictive and target variables
  - Opportunity to quantify tribal wisdom… and come up with new ideas
  - Can be a very big task!
- Steepest part of the learning curve
Types of Predictive Variables

- Behavioral: historical claim, billing, credit …
- Policyholder: age/gender, # employees …
- Policy specifics: vehicle age, construction type …
- Territorial: census, weather …
Data Exploration & Variable Transformation

- 1-way analyses of predictive variables
- Exploratory Data Analysis (EDA)
- Data visualization
- Use EDA to cap / transform predictive variables
  - Extreme values
  - Missing values
  - … etc.
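Capping extreme values and filling missing ones, as described above, can be sketched like this. The 99th-percentile cap and median fill are illustrative defaults chosen for the example, not prescriptions from the talk:

```python
# EDA-driven variable transformation: cap extremes, fill missing values.

def percentile(values, pct):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(values)
    k = min(len(ordered) - 1, int(pct / 100 * len(ordered)))
    return ordered[k]

def cap_variable(values, pct=99):
    """Cap values above the pct-th percentile; fill None (missing)
    entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    cap = percentile(observed, pct)
    fill = percentile(observed, 50)   # median as a simple default
    return [min(v, cap) if v is not None else fill for v in values]
```

In practice the cap level and fill rule come from the 1-way EDA itself (looking at each variable's distribution), not from a fixed rule.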
Multivariate Modeling

- Examine correlations among the variables
- Weed out redundant, weak, poorly distributed variables
- Model design
- Build candidate models
  - Regression/GLM
  - Decision trees/MARS
  - Neural networks
- Select final model
Building the Model

1. Pare down the collection of predictive variables to a manageable set
2. Iterative process
   - Build candidate models on "training data"
   - Evaluate on "test data"
   - Many things to tweak:
     - Different target variables
     - Different predictive variables
     - Different modeling techniques
     - # NN nodes, hidden layers; tree splitting rules…
Considerations

- Do the signs/magnitudes of the parameters make sense? Are they statistically significant?
- Is the model biased for/against certain types of policies? States? Policy sizes? …
- Does predictive power hold up for large policies?
- Continuity: are there small changes in input values that might produce large swings in scores?
- Make sure that an agent can't game the system
Model Analysis & Implementation

- Perform model analytics
  - Necessary for the client to gain comfort with the model
- Calibrate models
  - Create a user-friendly "scale" (the client dictates)
- Implement models
  - Programming skills are critical here
- Monitor performance
  - Distribution of scores over time, predictiveness, usage of the model…
- Plan model maintenance
Modeling Techniques
Where Actuarial Science Needs Data Mining
The Greatest Hits

Unsupervised: no target variable
- Clustering
- Principal components (dimension reduction)

Supervised: predict a target variable
- Regression
- GLM
- Neural networks
- MARS: Multivariate Adaptive Regression Splines
- CART: Classification And Regression Trees
Regression and its Relations

GLM: relax regression's distributional assumptions
- Logistic regression (binary target)
- Poisson regression (count target)

MARS & NN: clever ways of automatically transforming and interacting input variables
- Why: sometimes "true" relationships aren't linear
- Universal approximators: model any functional form

CART is simplified MARS.
Neural Net Motivation

Let X1, X2, X3 be three predictive variables:
- policy age, historical LR, driver age

Let Y be the target variable:
- loss ratio

A NNET model is a complicated, non-linear function φ such that:

φ(X1, X2, X3) ≈ Y
In visual terms…

[Figure: network diagram. Inputs X1, X2, X3 (plus a constant 1) feed hidden nodes Z1 and Z2 via weights a01, a02, a11, a12, a21, a22, a31, a32; the hidden nodes (plus a constant 1) feed the output Y via weights b0, b1, b2.]
NNET lingo

- Green: "input layer"
- Red: "hidden layer"
- Yellow: "output layer"
- The {a, b} numbers are "weights" to be estimated.
- The network architecture and the weights constitute the model.
In more detail…

Z1 = 1 / (1 + exp(-(a01 + a11·X1 + a21·X2 + a31·X3)))

Z2 = 1 / (1 + exp(-(a02 + a12·X1 + a22·X2 + a32·X3)))

Y = 1 / (1 + exp(-(b0 + b1·Z1 + b2·Z2)))
In more detail…

The NNET model results from substituting the expressions for Z1 and Z2 into the expression for Y.
In more detail…

Notice that the expression for Y has the form of a logistic regression. Similarly with Z1 and Z2.
In more detail…

You can therefore think of a NNET as a set of logistic regressions embedded in another logistic regression.
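The forward pass of this two-hidden-node network is a few lines of code. This is a sketch assuming the weight layout from the diagram (a's from inputs to hidden nodes, b's from hidden nodes to output); the specific weight values in the example are made up:

```python
# Forward pass of the two-hidden-node NNET: two logistic regressions (Z1, Z2)
# feeding a third logistic regression (Y).
import math

def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))

def nnet_score(x1, x2, x3, a, b):
    """a[j] holds (a0j, a1j, a2j, a3j) for hidden node Zj (intercept first);
    b holds (b0, b1, b2) for the output node."""
    z1 = logistic(a[0][0] + a[0][1] * x1 + a[0][2] * x2 + a[0][3] * x3)
    z2 = logistic(a[1][0] + a[1][1] * x1 + a[1][2] * x2 + a[1][3] * x3)
    return logistic(b[0] + b[1] * z1 + b[2] * z2)

# Hypothetical fitted weights, for illustration only:
a = [(0.1, 0.2, -0.3, 0.4), (-0.2, 0.5, 0.1, -0.1)]
b = (0.0, 1.0, -1.0)
y = nnet_score(1.0, 2.0, 3.0, a, b)
```

In a real model the weights would be estimated by training (e.g. backpropagation); here they are fixed so the "logistic regressions inside a logistic regression" structure is visible.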
Universal Approximators

The essential idea: by layering several logistic regressions in this way…
…we can model any functional form, no matter how many non-linearities or interactions between the variables X1, X2, …
- by varying only the # of nodes and training cycles

NNETs are sometimes called "universal function approximators".
MARS / CART Motivation

- NNETs use the logistic function to combine variables and automatically model any functional form.
- MARS uses an analogous clever idea, its "basis functions", to do the same work.
- CART can be viewed as simplified MARS: its basis functions are horizontal step functions.
- NNETs, MARS, and CART are all cousins of classic regression analysis.
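The contrast between these basis functions is easy to show concretely. A sketch, not from the talk, with illustrative knot locations and weights:

```python
# MARS builds models from hinge basis functions (piecewise-linear, zero on
# one side of a knot); a CART split acts like a horizontal step function.

def hinge(x, t):
    """MARS-style basis function with knot t."""
    return max(0.0, x - t)

def step(x, t):
    """CART-style basis function: constant on each side of the split t."""
    return 1.0 if x > t else 0.0

def mars_like(x):
    """A toy MARS-style fit: a weighted sum of hinge functions."""
    return 2.0 + 0.5 * hinge(x, 10.0) - 0.3 * hinge(x, 25.0)
```

Summing hinges gives a continuous piecewise-linear fit; summing steps (as a tree does across its leaves) gives a piecewise-constant one, which is the sense in which CART is "simplified MARS".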
References

For beginners: Data Mining Techniques
- Michael Berry & Gordon Linoff

For mavens: The Elements of Statistical Learning
- Jerome Friedman, Trevor Hastie, Robert Tibshirani