Upload
roderick-mosley
View
216
Download
0
Tags:
Embed Size (px)
Citation preview
America
CAS Seminar on RatemakingMarch 2005Presented by: Serhat Guven
An Introduction to GLM TheoryRefinements
2
America
An Introduction to GLM Theory
OUTLINE
Background
GLM Building Blocks
Link Function
Error Distribution
Model Structure
Diagnostics
Summary
PURPOSE: To discuss techniques that refine the structural design, including proper tools and diagnostics, of the GLM thereby allowing for a more flexible model
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
3
America
Purpose of Predictive Modeling
To predict a response variable using a series of explanatory variables (or rating factors).
Traditional methods focus on the parameters, modeling requires the analyst to consider the validation of the parameters.
Dependent/ResponseLossesClaims
Retention
WeightsClaims
ExposuresPremium
Statistical Model
Model ResultsParameters
Validation Statistics
Independent/PredictorsAge AccidentsLimit Convictions
Territory Credit Score
Background
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
4
America
Purpose of Predictive Modeling
To produce a sensible model that explains recent historical experience and is likely to be predictive of future experience.
Strong predictive power yet very
poor explanatory power
Good predictor of previous experience but poor predictor of
future experience
Overall Mean“Best” Model
1 parameter for each observation
Model Complexity
(Number of Parameters)
Traditional methods tend to create unnecessarily complex structures that tend to overfit the data.
Background
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
5
America
GLMs generalize the traditional regression models by introducing nonlinearity through the link function and loosening the normality assumption
Generalized Linear Models
y = h(Linear Combination of Rating Factors) + Error
g=h-1 is called the LINK function and is chosen to measure the “signal” most accurately
Error should reflect underlying process and can come from the exponential family
Response Variable
Systematic Component
Random Component= +
Linear combination of rating factors is the model structure
Signal Noise
Background
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
6
America
Generalized Linear Models
εYY
ξ) h(XβμY
More formally:
Response Variable
Systematic Component
Random Component= +
Where:
And:
ω
)μφV(Var(Y)
Link function (g=h-1) Links random and systematic component.
Design MatrixIdentifies predictor variables for each observation.
ParametersQuantities estimated via log likelihood function.
Offset TermAllows incorporation of known effects or restrictions.
Scale Parameter
Variance Function
Prior Weights
Background
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
7
America
Generalized Linear Models
)1()1()1(1)-(rT11)-(rT)( μyμ'gηWXXWXβ rrrr
)μ(V)μ('g
ωW
)(2)((r)
rrdiag
The general solution for the GLM parameters:
Where:
weights
rr
ω
Xβμ
μgη
hg
)()(
1
and:
Link function
Error Distribution
Model Structure
Background
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
8
America
GLMs generalize the traditional regression models by introducing nonlinearity through the link function and loosening the normality assumption
Components of a GLM
y = h(Linear Combination of Rating Factors) + Error
Response Variable
Systematic Component
Random Component= +
Signal Noise
Building Blocks
Basic Building Blocks
Link Function
Error Structure
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
9
America
GLM Building Blocks: Link Functions
Link function (g=h-1) chosen to based on how the factors are related to produce the best signal:
-Log: variables related multiplicatively (e.g., risk modeling)
- Identity: variables related additively (e.g., risk modeling)
-Logit: retention or risk modeling
-Reciprocal: canonical link for gamma distribution (e.g., severity modeling)
-Mixed: additive/multiplicative rating algorithms
y = h(Linear Combination of Rating Factors) + Error
Link Functions
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
10
America
Example: Log Link
The rating structure is multiplicative and the premium for a youthful policyholder living in Area C is: $1,955=$1,000 x 1.700 x 1.150
Policyholder Age (p)
Relativity
Youthful
Adult
Mature
Seniors
1.700
1.000
0.800
1.100
Rating Area (r)
Relativity
A
B
C
D
E
0.900
1.000
1.150
1.300
1.500
Base Premium = 1,000
The signal allows us to populate rating tables:
Premium = Base Premium x Policyholder Age x Rating Area
=exp(.531)
=exp(0.000)
=exp(-0.223)
=exp(0.095)
=exp(-0.105)
=exp(0.000)
=exp(0.140)
=exp(0.262)
=exp(0.405)
=exp(6.908)
= exp(6.908) * exp(0.531) * exp(0.140)
= exp(6.908 + 0.531 + 0.140)
= exp(b+p+r)
h(linear combination of rating factors)
Link Functions
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
11
America
Link function relates the independent predictors to the response in a non linear form :
Pure Multiplicative – Log
XβexpηY
)Xβ(1
1ηY
e
XβηY
Xβ
1ηY
- Pure Additive - Identity
- Logit
- Reciprocal
GLM Building Blocks: Link FunctionsLink Functions
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
12
America
Mixed Rating Algorithms
Mixed additive – multiplicative rating algorithm:
Base Rate*(Age+Gender+Usage)*Driving Record*Territory*Limit
Mixture model structures do not fit within the framework of garden variety GLMs
Solutions:
- Create n way tables from the pure multiplicative models
- Build hierarchal GLM to systematically estimate the additive component from the multiplicative factors
- Remove the restriction on the link function to calculate parameters directly
Link Functions
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
13
America
Generalized Nonlinear Models
TeSXY Z
ZeXY
Examples models that do not fit into traditional GLMs:
Mixed Additive Multiplicative Model
Alternative Mixtures
ZPeX-e1
1Y
Complicating the logit functions
Alternate link functions allow the ability to introduce additional nonlinearities into the solution.
Link Functions
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
14
America
Solution for Non-Linear Models
Use identity link function
Replace design matrix X in GLM solution with D
yWDDWDβ 1)-(rT11)-(rT)( r
)μ(V
ωW
)((r)
rdiag
ijth element of D is derivative of with regard to the jth parameter for the ith observation
Sort of equivalent to solving GLM with different design matrix. However, there are two “linear predictors” in use
Where
“…the method is likely to be most useful for determining if a reasonable fit can be improved, rather than for the somewhat more optimistic goal of correcting a hopeless situation.”
-Pregibon (1980)
Link Functions
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
15
America
GLM Building Blocks: Error Structure
Density: Severity
0
200
400
600
800
1,000
1,200
1,400
0 2,000 4,000 6,000 8,000 10,000 12,000 14,000
Range
Den
sit
y
Severity
Reflects the variability of the underlying process
Frequency: Frequency
0
10,000
20,000
30,000
40,000
50,000
60,000
70,000
80,000
90,000
100,000
0 1 2 3 4
Range
Fre
qu
en
cy
Frequency
- Gamma consistent with severity modeling, may want to try Inverse Gaussian
- Poisson consistent with frequency modeling
- Tweedie consistent with pure premium modeling
- Normal useful for a variety of applications
y = h(Linear Combination of Rating Factors) + Error
Density: Pure Premium
0
200
400
600
800
1,000
1,200
1,400
1,600
1,800
2,000
2,200
2,400
0 2,000 4,000 6,000 8,000 10,000 12,000 14,000
Range
Den
sit
y
Pure Premium
Density: Loss Ratio
0
100
200
300
400
500
600
700
800
900
1,000
-0.6 -0.4 -0.2 0.0 0.2 0.4 0.6 0.8 1.0 1.2 1.4 1.6 1.8
Range
Den
sit
y
Loss Ratio
Error Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
16
America
Error Structure: Variance Function
Observed Response Error StructureVariance Function
V()Scale Parameter
Normal µ0
Claim Frequency Poisson µ 1
Claim Severity Gamma µ2
Risk Premium Gamma or Tweedie µT µT
New/Renewal Rate Binomial µ (1-µ) 1
Error structure is also used to incorporate assumptions about uncertainty and the prediction
Error Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
17
America
Example: Binomial Distribution
Binomial
Basic functional form in decision modeling
Belongs to the exponential family of distributions
Can be extended to multinomial distributions
-
0.0500
0.1000
0.1500
0.2000
0.2500
0.3000
Mean
Va
ria
nc
e
Extreme probabilities of success/failure
related to low variability
Higher variability associated with less certain probability
outcomes
Variance Function = (1-)
Error Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
18
America
Additional Variance Functions
Observed Response Error StructureVariance Function
V()Scale Parameter
Normal µ0
Claim Frequency Poisson µ 1
Claim Severity Gamma µ2
Risk Premium Gamma or Tweedie µT µT
New/Renewal Rate Binomial µ (1-µ) 1
Claim FrequencyOver-dispersed
Poissonµ K
Claim Severity Inverse Gaussian µ3
Error structure is also used to incorporate assumptions about the uncertainty and the predicted value
Error Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
19
America
Heterogeneous exposure bases
If model together, exposures with high variability may mask patterns of less random risks
If loss trends vary by exposure class, the proportion each represents of the total will change and may mask important trends
Independent predictors can have different effects on different perils
If cannot split data, use joint modeling techniques to improve overall fit
Error Structure: Scale Parameter
-3.5
-3.0
-2.5
-2.0
-1.5
-1.0
-0.5
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
-1,000 0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 9,000
Fitted Value
Stu
den
tized
Sta
nd
ard
ized
Devia
nce R
esid
uals
> .2
> .5
> .7
> 1.4
> 2.2
> 2.9
> 3.6
> 4.3
> 5.0
> 5.7
> 6.5
> 7.2
> 7.9
> 8.6
> 9.3
> 10.1
> 10.8
> 11.5
> 12.2
> 12.9
> 13.6
> 14.4
> 15.1
> 15.8
> 16.5
> 17.2
> 17.9
> 18.7
> 19.4
> 20.1
> 20.8
> 21.5
> 22.3
> 23.0
> 23.7
> 24.4
> 25.1
> 25.8
> 26.6
> 27.3
-5
-4
-3
-2
-1
0
1
2
3
-1,000 0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 9,000 10,000 11,000
Fitted Value
Stu
den
tized
Sta
nd
ard
ized
Devia
nce R
esid
uals
> .3
> .5
> .8
> 1.6
> 2.5
> 3.3
> 4.1
> 4.9
> 5.7
> 6.6
> 7.4
> 8.2
> 9.0
> 9.8
> 10.7
> 11.5
> 12.3
> 13.1
> 13.9
> 14.8
> 15.6
> 16.4
> 17.2
> 18.1
> 18.9
> 19.7
> 20.5
> 21.3
> 22.2
> 23.0
> 23.8
> 24.6
> 25.4
> 26.3
> 27.1
> 27.9
> 28.7
> 29.5
> 30.4
> 31.2
Cluster of negative residuals caused by combination of two types of risks
Joint modeling techniques improved the residual plot
Error Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
20
America
GLM Building Blocks: Model Structure
Include variables that are predictive, exclude those that are not
- Gender may not have major impact on theft severity
Simplify some rating factors, if full inclusion not necessary
- Some levels within a particular predictor may be grouped together (e.g., 50-54 year olds)
- A curve may replicate the signal (e.g., amount of insurance)
- Scoring levels to combine rating factors into a single concept thereby untangling impacts of various factors
Complicate model if the relationship between levels of one variable depends on another characteristic
- The difference between males and females depends on age
y = h(Linear Combination of Rating Factors) + Error
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
21
America
Complicating the Model: Interactions
Interactions are required when the combined effect of multiple rating levels of two different independent rating factors is different than the additive effect of the simple parameters.
Interaction topics
- Interactions versus correlations
- Identifying interactions
- Full and partial interactions
- Simplifying interactions
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
22
America
Interactions versus Correlations
Male Female Totals
Youthful wYM wYF wY.
Adult wAM wAF wA.
Mature wMM wMF wM.
Seniors wSM wSF wS.
Totals w.M w.F w..
Distributional Correlations: Observed Weights
Let:
i j
ji
Q
ww
ji,
2ji,ji,
..
.
..
...ji,
w
)w(w
wwww
Assumption of distributional independence
Testing the assumption – Cramer’s V scales this statistic from (-1,+1)
A simple GLM model addresses distributional correlations.
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
23
America
Interactions versus Correlations
Male Female
Youthful yYM yYF
Adult yAM yAF
Mature yMM yMF
Seniors ySM ySF
Interactions: Observed Responses
Assumption of simple model adequacy.
Testing the assumption-
Chi Squared Test scales this statistic from (0,1)
Let:
i j ji,
2ji,ji,
ji,kji,ji,
y
)y(yQ
)ξ βh(xμy
A more complex GLM model is required to handles such interactions.
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
24
America
Identifying Interactions
Decomposing the Interaction for simplification
- graphically
-Chi-squared test
-Signs test
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
25
America
Identifying Interactions
Understanding complex relationships between multiple rating factors.Policyholder Age x Policyholder Sex Interaction
0.0000
0.0001
0.0002
0.0003
0.0004
0.0005
0.0006
0.0007
0.0008
0%
5%
10%
15%
20%
25%
30%
35%
40%
45%
50%
55%
Policyholder Age
Policyholder Sex (Male)
Policyholder Sex (Female)
Policyholder Age x Policyholder Sex Interaction - Rescaled
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
1.1
1.2
0%
5%
10%
15%
20%
25%
30%
35%
40%
45%
50%
55%
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34
35-3
9
40-4
4
45-4
9
50-5
4
55-5
9
60-6
4
65-6
970
+
Policyholder Age
Policyholder Sex (Male)
Policyholder Sex (Female)
- Can view interactions on an “absolute” scale …
- … or on a rescaled basis.
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
26
America
Identifying Interactions
Consistency of Interaction over time.
Predicted Values [LossYear (1995)]
-0.0002
-0.0001
0.0000
0.0001
0.0002
0.0003
0.0004
0.0005
0.0006
0.0007
0.0008
0%
2%
4%
6%
8%
10%
12%
14%
16%
18%
20%
22%
24%
Policyholder Age
Policyholder Sex (Male)
Policyholder Sex (Female)
Predicted Values [LossYear (1996)]
-0.0002
-0.0001
0.0000
0.0001
0.0002
0.0003
0.0004
0.0005
0.0006
0.0007
0.0008
0.0009
0%
5%
10%
15%
20%
25%
30%
35%
Policyholder Age
Policyholder Sex (Male)
Policyholder Sex (Female)
- Three way interaction with time …
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
27
America
Parameter Notation: Simple Model
Relationship between rating levels of one rating factor is constant for all levels of other rating variables.
Assume two rating variables:
- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female
Simple Model: Age + Gender
5 Parameters
Male Female
Youthful β0+βY
β0+βY+βF
Adult β0 β0+βF
Mature β0+βM
β0+βM+βF
Seniors β0+βS
β0+βS+βF
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
28
America
Simple Factor Model
Predicted Values
0.500
0.700
0.900
1.100
1.300
1.500
1.700
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35-39 40-44 45-49 50-54 55-59 60-64 65-69 70+Policyholder Age
PolicyholderSex(Male)
PolicyholderSex(Female)
Simple Model: Age + Gender
Relationship between males and females is a constant exp(βF) at each age.
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
29
America
Parameter Notation: Full Interaction
Male Female
Youthful β0+βY
β0+βY+βF
Adult β0 β0+βF
Mature β0+βM
β0+βM+βF
Seniors β0+βS
β0+βS+βF
Male Female
Youthful β0+βY β0+βY+βF+βYF
Adult β0 β0+βF
Mature β0+βM β0+βM+βF+βMF
Seniors β0+βS β0+βS+βF+βSF
Relationship between rating levels of one rating factor is different for all levels of another rating variable.
Assume two rating variables:
- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female
No interaction: Age + Gender
Interaction: Age + Gender+Age.Gender
# Parameters: 5 # Parameters: 8
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
30
America
Predicted Values
0.500
0.700
0.900
1.100
1.300
1.500
1.700
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35-39 40-44 45-49 50-54 55-59 60-64 65-69 70+Policyholder Age
PolicyholderSex(Male)
PolicyholderSex(Female)
Full Interaction Factor Model
Predicted Values
0.500
1.000
1.500
2.000
2.500
3.000
3.500
4.000
4.500
5.000
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35-39 40-44 45-49 50-54 55-59 60-64 65-69 70+Policyholder Age
PolicyholderSex(Male)
PolicyholderSex(Female)
Full Interaction Model:
Age + Gender + Age.Gender
Relationship between males and females is a different at each age.
Simple Model:
Age + Gender
Relationship between males and females is a constant exp(βF) at each age.
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
31
America
Parameter Notation: Partial Interaction
Male Female
Youthful β0+βY β0+βY+βYF
Adult β0 β0
Mature β0+βM β0+βM+βMF
Seniors β0+βS β0+βS+βSF
Relationship between rating levels of one rating factor is different for all levels of another rating variable, except there is no difference at the base level of the other rating variable.
Assume two rating variables:
- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female
Partial Interaction: Age + Age.Gender
Male Female
Youthful β0+βY β0+βY+βF+βYF
Adult β0 β0+βF
Mature β0+βM β0+βM+βF+βMF
Seniors β0+βS β0+βS+βF+βSF
Interaction: Age + Gender + Age.Gender
# Parameters: 8 # Parameters: 7
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
32
America
Predicted Values
0.500
1.000
1.500
2.000
2.500
3.000
3.500
4.000
4.500
5.000
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35-39 40-44 45-49 50-54 55-59 60-64 65-69 70+Policyholder Age
PolicyholderSex(Male)
PolicyholderSex(Female)
Partial Interaction Factor Model
Predicted Values
0.50
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
5.00
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35-39 40-44 45-49 50-54 55-59 60-64 65-69 70+Policyholder Age
PolicyholderSex(Male)
PolicyholderSex(Female)
Partial Interaction Model:
Age + Age.Gender
Interaction adjusts for removal of the simple gender term, except at the base level (36-40).
Full Interaction Model:
Age + Gender + Age.Gender
Relationship between males and females is a different at each age.
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
33
America
Parameter Notation: Partial Interaction
Male Female
Youthful β0 β0+βF+βYF
Adult β0 β0+βF
Mature β0 β0+βF+βMF
Seniors β0 β0+βF+βSF
Relationship between rating levels of one rating factor is different for all levels of another rating variable, except there is no difference at the base level of the other rating variable.
Assume two rating variables:
- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female
Partial Interaction: Gender + Age.Gender
Male Female
Youthful β0+βY β0+βY+βF+βYF
Adult β0 β0+βF
Mature β0+βM β0+βM+βF+βMF
Seniors β0+βS β0+βS+βF+βSF
Interaction: Age + Gender + Age.Gender
# Parameters: 8 # Parameters: 5
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
34
America
Predicted Values
0.500
1.000
1.500
2.000
2.500
3.000
3.500
4.000
4.500
5.000
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35-39 40-44 45-49 50-54 55-59 60-64 65-69 70+
Policyholder Age
PolicyholderSex(Male)
PolicyholderSex(Female)
Partial Interaction Factor Model
Partial Interaction Model:
Gender + Age.Gender
Predicted Values
0.50
1.00
1.50
2.00
2.50
3.00
3.50
4.00
4.50
5.00
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35-39 40-44 45-49 50-54 55-59 60-64 65-69 70+
Policyholder Age
PolicyholderSex(Male)
P olicyholderSex(Female)
Interaction adjusts for removal of the simple age term, except at the base level (male).
Full Interaction Model:
Age + Gender + Age.Gender
Relationship between males and females is a different at each age.
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
35
America
Parameter Notation: Interaction Only
Male Female
Youthful β0 β0+βYF
Adult β0 β0
Mature β0 β0+βMF
Seniors β0 β0+βSF
Only variation allowed is at non-base levels.
Assume two rating variables:
- Age: Youthful, Adult (base), Mature, Seniors
- Gender: Male (base), Female
Interaction Only: Age.Gender
Male Female
Youthful β0+βY β0+βY+βF+βYF
Adult β0 β0+βF
Mature β0+βM β0+βM+βF+βMF
Seniors β0+βS β0+βS+βF+βSF
Interaction: Age + Gender + Age.Gender
# Parameters: 8 # Parameters: 4
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
36
America
Predicted Values
0.50
0.60
0.70
0.80
0.90
1.00
1.10
1.20
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35-39 40-44 45-49 50-54 55-59 60-64 65-69 70+Policyholder Age
PolicyholderSex(Male)
PolicyholderSex(Female)
Interaction Only Factor Model
Predicted Values
0.50
0.60
0.70
0.80
0.90
1.00
1.10
1.20
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35-39 40-44 45-49 50-54 55-59 60-64 65-69 70+Policyholder Age
PolicyholderSex(Male)
PolicyholderSex(Female)
Interaction adjusts for removal of the simple terms, except at the base levels (male and 36-40).
Interaction adjusts for removal of the simple age term, except at the base level (male).
Partial Interaction Model:
Gender + Age.Gender
Interaction Only Model:
Age.Gender
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
37
America
Parameter Notation: Summaries
Male Female
Youthful β0 β0+βYF
Adult β0 β0
Mature β0 β0+βMF
Seniors β0 β0+βSF
Interaction Only: Age.Gender
Male Female
Youthful β0+βY β0+βY+βF+βYF
Adult β0 β0+βF
Mature β0+βM β0+βM+βF+βMF
Seniors β0+βS β0+βS+βF+βSF
Interaction: Age + Gender + Age.Gender
# Parameters: 8
# Parameters: 4
Male Female
Youthful β0 β0+βF+βYF
Adult β0 β0+βF
Mature β0 β0+βF+βMF
Seniors β0 β0+βF+βSF
Partial Interaction: Gender + Age.Gender
# Parameters: 5
Male Female
Youthful β0+βY β0+βY+βYF
Adult β0 β0
Mature β0+βM β0+βM+βMF
Seniors β0+βS β0+βS+βSF
Partial Interaction: Age + Age.Gender
# Parameters: 7
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
38
America
Simplifying Interactions
Age and Gender (Male at Base)
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
1.1
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35-39 40-44 45-49 50-54 55-59 60-64 65-69 70+
Policyholder Age
0
10
20
30
40
50
Complex relationships can be simplified using curves, groups, etc.
Age and Gender Predicted Values
0.00005
0.00015
0.00025
0.00035
0.00045
0.00055
16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35-39 40-44 45-49 50-54 55-59 60-64 65-69 70+
Policyholder Age
0
10
20
30
40
503rd Degree Curve
4th Degree Curve Ages
Grouped
Males same as FemalesMale/Female
Relativity same for 21-24
- Age curve simplified with several curves and a grouping.
- Relationship between males and females simplified too. Male/Female
Relativity varies by Age
Model Structure
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
Definitions
Parameterization
Identification
Simplification
39
America
Testing Assumptions: Macro Residual Analysis
Normal Error Structure/Log Link (Studentized Standardized Deviance Residuals)
-8
-6
-4
-2
0
2
4
6
8
10
12
14
16
10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 15.0 15.5 16.0 16.5Transformed Fitted Value
> 1
> 3
> 6
> 11
> 16
> 21
> 26
> 31
> 37
> 42
> 47
> 52
> 57
> 62
> 68
> 73
> 78
> 83
> 88
Gammar Error/Log Link (Studentized Standardized Dev iance Residuals)
-8
-6
-4
-2
0
2
4
6
8
10
12
14
16
10.5 11.0 11.5 12.0 12.5 13.0 13.5 14.0 14.5 15.0 15.5 16.0 16.5Transformed Fitted Value
> 1
> 2
> 4
> 6
> 8
> 9
> 11
> 13
> 15
> 16
> 18
> 20
> 22
> 24
> 25
> 27
> 31
- Asymmetrical appearance suggests power of variance function is too low
Plot of all residuals tests selected error structure/link function
- Elliptical pattern is ideal
Crunched Residuals (Group Size: 72)
-0.04
-0.03
-0.02
-0.01
0.00
0.01
0.02
0.03
0.04
0.05
0.06
0.00000 0.00005 0.00010 0.00015 0.00020 0.00025 0.00030 0.00035 0.00040 0.00045 0.00050 0.00055 0.00060 0.00065 0.00070
Fitted Value
- Two concentrations suggests two perils: split of use joint modeling
- Use crunched residuals for frequency
-3.5
-3.0
-2.5
-2.0
-1.5
-1.0
-0.5
0.0
0.5
1.0
1.5
2.0
2.5
3.0
3.5
-1,000 0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 9,000
Fitted Value
Stu
den
tized
Sta
nd
ard
ized
Devia
nce R
esid
uals
> .2
> .5
> .7
> 1.4
> 2.2
> 2.9
> 3.6
> 4.3
> 5.0
> 5.7
> 6.5
> 7.2
> 7.9
> 8.6
> 9.3
> 10.1
> 10.8
> 11.5
> 12.2
> 12.9
> 13.6
> 14.4
> 15.1
> 15.8
> 16.5
> 17.2
> 17.9
> 18.7
> 19.4
> 20.1
> 20.8
> 21.5
> 22.3
> 23.0
> 23.7
> 24.4
> 25.1
> 25.8
> 26.6
> 27.3
Diagnostics
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
40
America
Testing Assumptions: Micro Residual Analysis
Largest 100 Studentized Standardized Deviance Residuals
-6
-4
-2
0
2
4
6
8
10
12
11.5 12 12.5 13 13.5 14 14.5 15 15.5 16 16.5
Transformed Fitted Value
Small Large
Good OK OK
Poor OK Problem
Examine largest residuals…
Influence
Fit
Largest 100 Cook's Deviance Residuals
0
0.005
0.01
0.015
0.02
0.025
12 12.5 13 13.5 14 14.5 15 15.5 16 16.5 17
Transformed Fitted Value
- Standardized deviance gives a measure of “fit” (performance)
- Cook’s deviance gives a measure of “influence”
“Problem” points may require further investigation
Diagnostics
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
41
America
Testing Predictiveness: Sampling
Training and Testing
Data
Training Data
Test Data
80%
20%
Model Structure and Parameters
Build
Test
OK
Not OK
Done
Diagnostics
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
42
America
Testing Predictiveness: Bootstrapping
Bootstrapping
Data
Training Data
Test Data
80% 20%
Model Structure
Build
TestModel Parameters
DoneOK
Not OK
Diagnostics
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
43
America
Testing Predictiveness: Gains Curve
Gains Curves
Order observations by fitted values (descending).
Plot cumulative fitted against cumulative weight.
High fitted values should correspond to high observed values.
Gini Coefficient : The larger coefficient implies greater predictiveness.
Fitted Number of Claims
Diagnostics
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics
44
America
Summary
Predictive Models corrects for methodological flaws associated with traditional approaches
- Excludes unsystematic effect
- Corrects distributional bias
- Identifies and models response correlation
GLM Building Blocks can be refined to find the signals for stochastic processes
Link Functions can be adjusted to create non linear solutions
Error Distributions are expanded to understand the relationship between the uncertainty and prediction
Model Structures are flexible to reflect underlying process
GLM diagnostics aid to better decision making abilities
• Background
– Link Function
– Error Distribution
– Model Structure
• Summary
• GLM Building Blocks
• Diagnostics