Fall 2015. Statistical Models For Crash Data Modeling Process Determine Modeling Objectives Definition (Intersections, Pedestrians, etc.) Data availability

Predictive Methods and Development of Statistical Models - Part II

Fall 2015

Statistical Models For Crash Data: Modeling Process

Determine Modeling Objectives
• Definition (Intersections, Pedestrians, etc.)
• Data availability
• Unit Scales (Crashes/year; Severity; etc.)

Establish Appropriate Process
• Sampling Models
• Observational Models
• Process/System State Models
• Parameter Models (Bayesian Models Only)

Statistical Models For Crash Data: Modeling Process

Determine Inferential Goals
• Point estimate (Value + Standard Error)
• Distribution (Bayesian Models)
• Percentiles (2.5%, 85%, etc.; Bayesian Models)

Select Computation Techniques
• Frequentist (MLE)
• Bayesian (via simulation)
• Empirical Bayes

Evaluate Models
• Goodness-of-Fit
• Prediction
• Confidence Intervals

Statistical Models For Crash Data: Data and Methodological Issues Associated with Crash-Frequency Data

Data/Methodological Issue: Associated Problems

• Overdispersion: Can violate some of the basic count-data modeling assumptions of some modeling approaches
• Underdispersion: As with overdispersion, can violate some of the basic count-data modeling assumptions of some modeling approaches
• Time-varying explanatory variables: Averaging of variables over studied time intervals ignores potentially important variations within time intervals, which can result in erroneous parameter estimates
• Temporal and spatial correlation: Correlation over time and space causes losses in estimation efficiency
• Low sample mean and small sample size: Causes an excess number of observations where zero crashes are observed, which can cause errors in parameter estimates
• Injury severity and crash type correlation: Correlation between severities and crash types causes losses in estimation efficiency when separate severity-count models are estimated
• Under reporting: Under reporting can distort model predictions and lead to erroneous inferences with regard to the influence of explanatory variables
• Omitted variables bias: If significant variables are omitted from the model, parameter estimates will be biased and possibly erroneous inferences with regard to the influence of explanatory variables will result
• Endogenous variables: If endogenous variables are included without appropriate statistical corrections, parameter estimates will be biased and erroneous inferences with regard to the influence of explanatory variables may be drawn
• Functional form: If an incorrect functional form is used, the result will be biased parameter estimates and possibly erroneous inferences with regard to the influence of explanatory variables
• Fixed parameters: If parameters are estimated as fixed when they actually vary across observations, the result will be biased parameter estimates and possibly erroneous inferences with regard to the influence of explanatory variables

Statistical Models For Crash Data: Summary of Existing Models for Analyzing Crash-Frequency Data

Model Type / Advantages / Disadvantages

• Poisson
  ◦ Advantages: Most basic model; easy to estimate
  ◦ Disadvantages: Cannot handle over- and under-dispersion; negatively influenced by the low sample mean and small sample size bias
• Negative binomial/Poisson-gamma
  ◦ Advantages: Easy to estimate; can account for overdispersion
  ◦ Disadvantages: Cannot handle under-dispersion; can be adversely influenced by the low sample mean and small sample size bias
• Poisson-lognormal
  ◦ Advantages: More flexible than the Poisson-gamma to handle over-dispersion
  ◦ Disadvantages: Cannot handle under-dispersion; can be adversely influenced by the low sample mean and small sample size bias (less than the Poisson-gamma); cannot estimate a varying dispersion parameter
• Zero-inflated Poisson and negative binomial
  ◦ Advantages: Handles datasets that have a large number of zero-crash observations
  ◦ Disadvantages: Can create theoretical inconsistencies; zero-inflated negative binomial can be adversely influenced by the low sample mean and small sample size bias
• Conway-Maxwell-Poisson
  ◦ Advantages: Can handle under- and over-dispersion or combination of both using a variable dispersion (scaling) parameter
  ◦ Disadvantages: Could be negatively influenced by the low sample mean and small sample size bias; no multivariate extensions available to date
• Gamma
  ◦ Advantages: Can handle under-dispersed data
  ◦ Disadvantages: Truncated distribution (full gamma function); independence of data (incomplete gamma function)
• Generalized estimating equation models
  ◦ Advantages: Can handle temporal correlation
  ◦ Disadvantages: May need to determine or evaluate the type of temporal correlation a priori; results sensitive to missing values
• Generalized additive models
  ◦ Advantages: More flexible than the traditional generalized estimating equation models; allows non-linear variable interactions
  ◦ Disadvantages: Relatively complex to implement; may not be easily transferable to other datasets

Statistical Models For Crash Data: Summary of Existing Models for Analyzing Crash-Frequency Data

Model Type / Advantages / Disadvantages

• Random-effects models
  ◦ Advantages: Handles temporal and spatial correlation
  ◦ Disadvantages: May not be easily transferable to other datasets
• Negative multinomial
  ◦ Advantages: Can account for overdispersion and serial correlation; panel count data
  ◦ Disadvantages: Cannot handle under-dispersion; can be adversely influenced by the low sample mean and small sample size bias
• Random-parameters models
  ◦ Advantages: More flexible than the traditional fixed parameter models in accounting for unobserved heterogeneity
  ◦ Disadvantages: Complex estimation process; may not be easily transferable to other datasets
• Bivariate/multivariate models
  ◦ Advantages: Can model different crash types simultaneously; more flexible functional form than the generalized estimating equation models (can use non-linear functions)
  ◦ Disadvantages: Complex estimation process; requires formulation of correlation matrix
• Finite mixture/Markov Switching
  ◦ Advantages: Can be used for analyzing sources of dispersion in the data
  ◦ Disadvantages: Complex estimation process; may not be easily transferable to other datasets
• Duration models
  ◦ Advantages: By considering the time between crashes (as opposed to crash frequency directly), allows for a very in-depth analysis of data and duration effects
  ◦ Disadvantages: Requires more detailed data than traditional crash frequency models; time-varying explanatory variables are difficult to handle
• Hierarchical/Multilevel Models
  ◦ Advantages: Can handle temporal, spatial and other correlations among groups of observations
  ◦ Disadvantages: May not be easily transferable to other datasets; correlation results can be difficult to interpret
• Neural Network, Bayesian Neural Network, and support vector machine
  ◦ Advantages: Non-parametric approach does not require an assumption about distribution of data; flexible functional form; usually provides better statistical fit than traditional parametric models
  ◦ Disadvantages: Complex estimation process; may not be transferable to other datasets; work as black-boxes; may not have interpretable parameters

Review of Multivariate Linear Models: Ordinary Least Squares

Method:

This is an estimation technique used for estimating unknown coefficients. It consists of solving p = k + 1 simultaneous linear equations, which are obtained by minimizing the sum of squared errors.

Let

$$ y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i $$

$$ y_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \varepsilon_i, \qquad i = 1, 2, \ldots, n $$

Note: E(ε) = 0 and var(ε) = σ²

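Review of Multivariate Linear Models

The least squares function S is given by

$$ S = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \Big( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \Big)^2 $$

The S function is to be minimized with respect to β0, β1, …, βk. The least squares estimators, say b0, b1, …, bk, must satisfy

$$ \frac{\partial S}{\partial \beta_0}\bigg|_{b_0, b_1, \ldots, b_k} = -2 \sum_{i=1}^{n} \Big( y_i - b_0 - \sum_{j=1}^{k} b_j x_{ij} \Big) = 0 $$

$$ \frac{\partial S}{\partial \beta_j}\bigg|_{b_0, b_1, \ldots, b_k} = -2 \sum_{i=1}^{n} \Big( y_i - b_0 - \sum_{l=1}^{k} b_l x_{il} \Big) x_{ij} = 0, \qquad j = 1, 2, \ldots, k $$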

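Review of Multivariate Linear Models

It is easier to solve the equations by using a matrix format. The equations can be written in the following way:

$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} $$

where

$$ \mathbf{y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\mathbf{X} = \begin{bmatrix} 1 & x_{11} & x_{12} & \cdots & x_{1k} \\ 1 & x_{21} & x_{22} & \cdots & x_{2k} \\ \vdots & \vdots & \vdots & & \vdots \\ 1 & x_{n1} & x_{n2} & \cdots & x_{nk} \end{bmatrix}, \quad
\boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}, \quad
\boldsymbol{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix} $$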

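Review of Multivariate Linear Models

Need to find the least squares estimator b that minimizes

$$ S(\boldsymbol{\beta}) = \sum_{i=1}^{n} \varepsilon_i^2 = \boldsymbol{\varepsilon}'\boldsymbol{\varepsilon} = (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) $$

It can be shown that S(β) can be expressed this way

$$ S(\boldsymbol{\beta}) = \mathbf{y}'\mathbf{y} - 2\boldsymbol{\beta}'\mathbf{X}'\mathbf{y} + \boldsymbol{\beta}'\mathbf{X}'\mathbf{X}\boldsymbol{\beta} $$

The least squares estimator* must satisfy

$$ \frac{\partial S}{\partial \boldsymbol{\beta}}\bigg|_{\mathbf{b}} = -2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{0} $$

which simplifies to

$$ \mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{y} $$

$$ \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} $$

* b is called the ordinary least squares estimator of β.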
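As a numerical check of the normal equations X'Xb = X'y, the following is a minimal sketch in plain Python. The helper names (`transpose`, `matmul`, `solve`, `ols`) and the data are made up for illustration and are not part of the lecture; the system is solved by Gaussian elimination rather than by forming the inverse explicitly.

```python
def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*a)]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def solve(a, rhs):
    """Solve a x = rhs by Gaussian elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def ols(x_rows, y):
    """Return b solving X'X b = X'y; a column of ones is prepended for b0."""
    X = [[1.0] + list(row) for row in x_rows]
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(xi * yi for xi, yi in zip(col, y)) for col in Xt]
    return solve(XtX, Xty)

# Made-up data generated exactly from y = 1 + 2*x1 - 3*x2 (no error term).
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1)]
y = [1 + 2 * x1 - 3 * x2 for x1, x2 in x_rows]
b = ols(x_rows, y)
print([round(v, 6) for v in b])
```

Because the data contain no error term, the estimates should recover the generating coefficients (1, 2, -3) up to floating-point rounding.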

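Review of Multivariate Linear Models: Maximum Likelihood

Method:

The likelihood function is found from the joint probability distribution of the observations. Given the assumption that the errors are normally distributed and the variance σ² is constant, the likelihood function is the following (normal distribution)

$$ L(\mathbf{y}, \boldsymbol{\beta}, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[ -\frac{1}{2\sigma^2} (\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) \right] $$

Same model as before:

$$ \mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} $$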

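Review of Multivariate Linear Models

The maximum likelihood estimators are the values of the parameters β and σ² that maximize the likelihood function. Maximizing the likelihood is equivalent to maximizing the log-likelihood, ln(L). The log-likelihood is:

$$ \ln[L(\mathbf{y}, \boldsymbol{\beta}, \sigma^2)] = -\frac{n}{2}\ln(2\pi) - \frac{n}{2}\ln(\sigma^2) - \frac{1}{2\sigma^2}(\mathbf{y} - \mathbf{X}\boldsymbol{\beta})'(\mathbf{y} - \mathbf{X}\boldsymbol{\beta}) $$

The derivative of the log-likelihood function is called the score function. Taking the derivatives with respect to the coefficients β and equating to zero yields

$$ \frac{\partial \ln(L)}{\partial \boldsymbol{\beta}}\bigg|_{\mathbf{b}} = -\frac{1}{2\sigma^2}\left(-2\mathbf{X}'\mathbf{y} + 2\mathbf{X}'\mathbf{X}\mathbf{b}\right) = \mathbf{0} $$

$$ \frac{1}{\sigma^2}\left(\mathbf{X}'\mathbf{y} - \mathbf{X}'\mathbf{X}\mathbf{b}\right) = \mathbf{0} $$

$$ \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y} $$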

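Review of Multivariate Linear Models

Taking the partial derivative with respect to σ² gives

$$ \frac{\partial \ln(L)}{\partial \sigma^2}\bigg|_{\mathbf{b}, \hat{\sigma}^2} = -\frac{n}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4}(\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b}) = 0 $$

Which is

$$ \hat{\sigma}^2 = \frac{1}{n}(\mathbf{y} - \mathbf{X}\mathbf{b})'(\mathbf{y} - \mathbf{X}\mathbf{b}) $$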
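The maximum likelihood variance estimate divides the residual sum of squares by n. The short sketch below, using made-up data and a one-variable model (so p = 2), compares it with the usual unbiased estimator that divides by n - p; the function name `simple_ols` is illustrative only.

```python
def simple_ols(x, y):
    """Closed-form least squares fit of y = b0 + b1*x (one explanatory variable)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n, p = len(x), 2

b0, b1 = simple_ols(x, y)
# Residual sum of squares (y - Xb)'(y - Xb).
rss = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))

sigma2_mle = rss / n             # maximum likelihood estimate
sigma2_unbiased = rss / (n - p)  # unbiased estimate, shown for comparison
print(b0, b1, sigma2_mle, sigma2_unbiased)
```

The two variance estimates differ only by the factor n / (n - p), so the difference matters mainly for small samples.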

Generalized Linear Models

In the previous overheads, the normal distribution played an important role in estimating the coefficients of probabilistic models and in making inferences about them. Unfortunately, there are many practical situations where the normal assumption is not valid. Count data, binary responses (0 or 1), and continuous variables with positive, highly skewed distributions cannot be modeled with normally distributed errors.

The generalized linear model (GLM) was developed to allow fitting regression models for univariate response data that follow a very general family of distributions called the exponential family. This family includes the normal, Poisson, binomial, negative binomial, geometric, gamma, etc.

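Statistical Models For Crash Data: Poisson-gamma Model (NB)

The crash count (or any count) follows a Poisson distribution:

$$ f(y_i \mid \mu_i) = \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!} $$

Conditional on μi, yi is Poisson distributed, so its conditional mean and variance are both equal to μi. In the Poisson-gamma model, μi itself follows a gamma distribution with mean ui and shape parameter φ, and the marginal distribution of yi is obtained by integrating over μi:

$$ f(y_i \mid u_i, \phi) = \int_0^{\infty} \frac{e^{-\mu_i}\,\mu_i^{y_i}}{y_i!} \cdot \frac{(\phi/u_i)^{\phi}}{\Gamma(\phi)}\, \mu_i^{\phi - 1} e^{-\phi \mu_i / u_i} \, d\mu_i $$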

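Statistical Models For Crash Data: Poisson-gamma Model (NB)

The PDF of the Poisson-gamma regression for yi is

$$ f(y_i) = \frac{\Gamma(y_i + \phi)}{\Gamma(y_i + 1)\,\Gamma(\phi)} \left( \frac{\phi}{\phi + u_i} \right)^{\phi} \left( \frac{u_i}{\phi + u_i} \right)^{y_i} $$

The mean and variance are given by

$$ E(y_i) = u_i $$

$$ Var(y_i) = u_i + \frac{u_i^2}{\phi} \quad \text{or} \quad Var(y_i) = u_i + \alpha u_i^2, \quad \alpha = 1/\phi $$

The mean function is given by

$$ E(y_i) = u_i = \exp(\mathbf{x}_i'\boldsymbol{\beta}) $$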
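To make the Poisson-gamma probability function concrete, here is a minimal sketch that evaluates it with the standard library and checks numerically that the probabilities sum to one and reproduce the stated mean u and variance u + u²/φ. The function name `nb_pmf` and the values of u and φ are illustrative assumptions, not values from the slides.

```python
import math

def nb_pmf(y, u, phi):
    """Poisson-gamma (NB) probability of count y, with mean u and inverse dispersion phi."""
    log_p = (math.lgamma(y + phi) - math.lgamma(y + 1) - math.lgamma(phi)
             + phi * math.log(phi / (phi + u))
             + y * math.log(u / (phi + u)))
    return math.exp(log_p)

# Illustrative values only.
u, phi = 2.0, 1.5

# Sum over a range of counts large enough that the remaining tail is negligible.
probs = [nb_pmf(y, u, phi) for y in range(500)]
total = sum(probs)
mean = sum(y * p for y, p in enumerate(probs))
var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))

print(total, mean, var, u + u ** 2 / phi)
```

Working on the log scale with `math.lgamma` avoids overflow in the gamma functions for large counts.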

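Statistical Models For Crash Data: Poisson-gamma Model

Example: Crash data at 3-legged signalized intersections

Functional form:

$$ \mu = \beta_0 F_{maj}^{\beta_1} F_{min}^{\beta_2} $$

Functional form needed to model crash data (log link):

$$ \mu = e^{\beta_0 + \beta_1 \ln(F_{maj}) + \beta_2 \ln(F_{min})} $$

Where,

μ = Expected number of crashes
Fmaj = Major traffic flow
Fmin = Minor traffic flow

Need to take the natural log of the flow variables so that the log-linear form reproduces the power form; the multiplicative constant of the power form is then e raised to the estimated intercept.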

Statistical Models For Crash Data: Poisson-gamma Model

The GENMOD Procedure

Model Information

  Data Set              WORK.C
  Distribution          Negative Binomial
  Link Function         Log
  Dependent Variable    Total    Total

  Number of Observations Read    255
  Number of Observations Used    255

Criteria For Assessing Goodness Of Fit

  Criterion                    DF         Value      Value/DF
  Deviance                    252      288.8580        1.1463
  Scaled Deviance             252      288.8580        1.1463
  Pearson Chi-Square          252      312.6975        1.2409
  Scaled Pearson X2           252      312.6975        1.2409
  Log Likelihood                       836.0686
  Full Log Likelihood                 -606.7989
  AIC (smaller is better)             1221.5978
  AICC (smaller is better)            1221.7578
  BIC (smaller is better)             1235.7628

Algorithm converged.

Analysis Of Maximum Likelihood Parameter Estimates

  Parameter     DF    Estimate    Standard Error    Wald 95% Confidence Limits    Wald Chi-Square    Pr > ChiSq
  Intercept      1    -10.0648            1.3659       -12.7420     -7.3876                 54.29        <.0001
  logf_maj       1      0.7517            0.1320         0.4929      1.0105                 32.41        <.0001
  logf_min       1      0.4837            0.0562         0.3735      0.5939                 74.01        <.0001
  Dispersion     1      0.3153            0.0519         0.2135      0.4170

NOTE: The negative binomial dispersion parameter was estimated by maximum likelihood.

Fitted model as reported on the slide:

$$ \mu = e^{-10.113} F_{maj}^{0.740} F_{min}^{0.505} = 4.05 \times 10^{-5}\, F_{maj}^{0.740} F_{min}^{0.505} $$

$$ Var(y) = \mu + 0.313\,\mu^2 $$

The values in this equation (-10.113, 0.740, 0.505, and a dispersion of 0.313) differ slightly from the GENMOD estimates listed above (-10.0648, 0.7517, 0.4837, and 0.3153).
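The fitted equation can be applied directly to a pair of flows. The sketch below uses the coefficients reported in the slide's equation (-10.113, 0.740, 0.505, and dispersion 0.313); the function names and the flow values are hypothetical, and the flow units are assumed to match those used when the model was estimated.

```python
import math

def expected_crashes(f_maj, f_min, b0=-10.113, b1=0.740, b2=0.505):
    """Expected crash count: exp(b0) * F_maj**b1 * F_min**b2."""
    return math.exp(b0) * f_maj ** b1 * f_min ** b2

def nb_variance(mu, alpha=0.313):
    """Poisson-gamma variance: mu + alpha * mu**2."""
    return mu + alpha * mu ** 2

# Hypothetical flows chosen only to illustrate the calculation.
f_maj, f_min = 20000, 2000
mu = expected_crashes(f_maj, f_min)
var = nb_variance(mu)

print(f"multiplicative constant exp(b0) = {math.exp(-10.113):.3e}")
print(f"expected crashes = {mu:.2f}, variance = {var:.2f}")
```

Since both flow exponents are below 1, the expected crash count grows less than proportionally with either flow.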

Statistical Models For Crash Data: Statistical fit (Goodness of fit)

There are various methods for evaluating the statistical fit of models. The methods can be divided into two categories:

Likelihood Statistics

• Log-Likelihood

• Deviance

• Pearson Chi-Square

• Akaike’s Information Criterion (AIC)

• Bayesian Information Criterion (BIC)

Model Errors

• Mean Absolute Deviation

• Mean Squared Prediction Errors

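Statistical Models For Crash Data: Log-likelihood

Poisson:

$$ \ln L(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left[ y_i \ln(\mu_i) - \mu_i - \ln(y_i!) \right] $$

NB:

$$ \ln L(\alpha, \boldsymbol{\beta}) = \sum_{i=1}^{n} \left[ y_i \ln\!\left( \frac{\alpha \mu_i}{1 + \alpha \mu_i} \right) - \alpha^{-1} \ln(1 + \alpha \mu_i) + \ln \Gamma(y_i + \alpha^{-1}) - \ln \Gamma(y_i + 1) - \ln \Gamma(\alpha^{-1}) \right] $$

Where:

$$ \mu_i = \exp(\mathbf{x}_i'\boldsymbol{\beta}) $$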
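The two log-likelihoods can be evaluated with a few lines of standard-library Python. This is a sketch under the usual parameterization Var(y) = μ + αμ²; the function names and the sample counts and means are made up for illustration.

```python
import math

def poisson_loglik(y, mu):
    """Poisson log-likelihood: sum of y*ln(mu) - mu - ln(y!)."""
    return sum(yi * math.log(mi) - mi - math.lgamma(yi + 1)
               for yi, mi in zip(y, mu))

def nb_loglik(y, mu, alpha):
    """Negative binomial log-likelihood with dispersion alpha (Var = mu + alpha*mu^2)."""
    inv = 1.0 / alpha
    total = 0.0
    for yi, mi in zip(y, mu):
        total += (yi * math.log(alpha * mi / (1.0 + alpha * mi))
                  - inv * math.log(1.0 + alpha * mi)
                  + math.lgamma(yi + inv) - math.lgamma(yi + 1) - math.lgamma(inv))
    return total

# Made-up observed counts and fitted means.
y = [0, 1, 3, 2, 7]
mu = [0.5, 1.2, 2.5, 2.0, 6.0]

ll_poisson = poisson_loglik(y, mu)
ll_nb = nb_loglik(y, mu, alpha=0.3)
# As alpha approaches zero the NB log-likelihood approaches the Poisson one.
ll_nb_small_alpha = nb_loglik(y, mu, alpha=1e-6)
print(ll_poisson, ll_nb, ll_nb_small_alpha)
```

The small-alpha evaluation illustrates that the Poisson model is the limiting case of the Poisson-gamma model when the dispersion vanishes.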

Log-likelihood

Example: Crash data at 3-legged signalized intersections

Poisson: ln L = -685.34

NB: ln L = -606.80


The deviance statistic is defined as twice the difference between the maximum log-likelihood achievable (y=μ) and the log-likelihood of the fitted model:

$$ D(\mathbf{y} \mid \hat{\boldsymbol{\mu}}) = 2\{ \ell(\mathbf{y}) - \ell(\hat{\boldsymbol{\mu}}) \} $$

When competing models are compared, the model with the lowest deviance offers the best statistical fit. A note of caution: this is only valid when the dispersion parameter Φ is the same for each competing model.


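The deviance statistic for the Poisson model is the following:

$$ D_P = 2 \sum_{i=1}^{n} \left[ y_i \ln\!\left( \frac{y_i}{\hat{\mu}_i} \right) - (y_i - \hat{\mu}_i) \right] $$

The deviance statistic for the Poisson-gamma model is the following:

$$ D_{NB} = 2 \sum_{i=1}^{n} \left[ y_i \ln\!\left( \frac{y_i}{\hat{\mu}_i} \right) - (y_i + \alpha^{-1}) \ln\!\left( \frac{y_i + \alpha^{-1}}{\hat{\mu}_i + \alpha^{-1}} \right) \right] $$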
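The deviance formulas translate directly into code. The sketch below treats the y ln(y/μ) term as zero when y = 0 (its limiting value); function names and data are illustrative assumptions.

```python
import math

def poisson_deviance(y, mu_hat):
    """Poisson deviance: 2 * sum[ y*ln(y/mu) - (y - mu) ], with 0*ln(0) taken as 0."""
    total = 0.0
    for yi, mi in zip(y, mu_hat):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

def nb_deviance(y, mu_hat, alpha):
    """Poisson-gamma deviance for dispersion alpha (Var = mu + alpha*mu^2)."""
    inv = 1.0 / alpha
    total = 0.0
    for yi, mi in zip(y, mu_hat):
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi + inv) * math.log((yi + inv) / (mi + inv))
    return 2.0 * total

# Made-up observed counts and fitted means.
y = [0, 1, 3, 2, 7]
mu_hat = [0.5, 1.2, 2.5, 2.0, 6.0]

d_poisson = poisson_deviance(y, mu_hat)
d_nb = nb_deviance(y, mu_hat, alpha=0.3)
print(d_poisson, d_nb)
```

Both statistics are zero when every fitted mean equals the observed count and grow as the fit deteriorates.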


For the example data, the deviance statistic for the Poisson model is the following:

$$ D_P = 644.4 $$

The deviance statistic for the Poisson-gamma model is the following:

$$ D_{NB} = 288.9 $$

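Statistical fit (Goodness of fit)

AIC:

$$ AIC = -2\ln L + 2P $$

BIC:

$$ BIC = -2\ln L + P\ln(n) $$

P = number of estimated parameters (all regression coefficients, including the intercept, plus the dispersion parameter when one is estimated)

n = number of observations

AIC and BIC penalize the fit when additional variables are added to the model.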

AIC and BIC

AIC:

$$ AIC_P = -2(-685.3) + 2 \times 3 = 1{,}376.7 $$

$$ AIC_{NB} = -2(-606.8) + 2 \times 4 = 1{,}221.6 $$

BIC:

$$ BIC_P = -2(-685.3) + 3\ln(255) = 1{,}387.2 $$

$$ BIC_{NB} = -2(-606.8) + 4\ln(255) = 1{,}235.8 $$
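The AIC and BIC calculations can be reproduced from the reported log-likelihoods (-685.34 for the Poisson model with 3 parameters, -606.80 for the NB model with 4 parameters, n = 255). The sketch below uses hypothetical function names; small differences from the slide values come from rounding of the log-likelihoods.

```python
import math

def aic(loglik, p):
    """Akaike's Information Criterion: -2 ln L + 2P."""
    return -2.0 * loglik + 2.0 * p

def bic(loglik, p, n):
    """Bayesian Information Criterion: -2 ln L + P ln(n)."""
    return -2.0 * loglik + p * math.log(n)

n = 255
# (log-likelihood, number of estimated parameters) as reported in the example.
models = {"Poisson": (-685.34, 3), "NB": (-606.80, 4)}

results = {name: (aic(ll, p), bic(ll, p, n)) for name, (ll, p) in models.items()}
for name, (a, b) in results.items():
    print(f"{name}: AIC = {a:.1f}, BIC = {b:.1f}")
```

Both criteria are lower for the NB model, which agrees with its higher log-likelihood even after the penalty for the extra dispersion parameter.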

Statistical fit (Model Errors)

Mean Absolute Deviation (MAD)

This criterion has been proposed by Oh et al. (2003) to evaluate the fit of models. The Mean Absolute Deviation (MAD) calculates the average absolute difference between the estimated and observed values:

$$ MAD = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right| $$

Mean Squared Prediction Error (MSPE)

The Mean Squared Prediction Error (MSPE) is a traditional indicator of error and calculates the average squared difference between the estimated and observed values:

$$ MSPE = \frac{1}{n} \sum_{i=1}^{n} \left( \hat{y}_i - y_i \right)^2 $$
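Both error measures are simple averages over the observations. A minimal sketch with made-up predicted and observed counts (function and variable names are illustrative):

```python
def mad(y_hat, y):
    """Mean Absolute Deviation between predicted and observed values."""
    return sum(abs(p - o) for p, o in zip(y_hat, y)) / len(y)

def mspe(y_hat, y):
    """Mean Squared Prediction Error between predicted and observed values."""
    return sum((p - o) ** 2 for p, o in zip(y_hat, y)) / len(y)

# Made-up observed crash counts and model predictions.
y_obs = [0, 2, 1, 4]
y_pred = [0.5, 1.5, 1.0, 3.0]

mad_value = mad(y_pred, y_obs)    # (0.5 + 0.5 + 0 + 1) / 4
mspe_value = mspe(y_pred, y_obs)  # (0.25 + 0.25 + 0 + 1) / 4
print(mad_value, mspe_value)
```

Because MSPE squares the differences, a single large miss (here the last site) weighs more heavily than in MAD.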

Recent Models for Over-dispersion:
◦ Poisson-lognormal: Poisson mean follows a lognormal distribution
◦ Poisson-Weibull: Poisson mean follows a Weibull distribution
◦ Random-Parameters (investigation of the variance)
◦ Negative Binomial-Lindley (highly dispersed data): overcomes problems with zero-inflated models
◦ Generalized Sichel (highly dispersed data)
◦ Generalized Waring (highly dispersed data; investigation of variance)
◦ Finite mixture (Poisson and Poisson-gamma; investigation of variance and structure of data)
◦ Bayesian Model Averaging (automatically compares different models)
◦ See Accident Analysis & Prevention (AA&P) and Safety Science for info on some of these models.


Recent Models for Under-dispersion:
◦ Not very common; usually with low sample mean and often based on model output (conditional on the mean).
◦ All the models below can also be used for over-dispersion
◦ Gamma: time-dependent; observations not independent
◦ Conway-Maxwell-Poisson: has become increasingly popular
◦ Double-Poisson: work published
◦ Hyper-Poisson: work published


Statistical Models For Crash Data: Low Mean Issue

Crash data often have the characteristic that the mean μ can be very low (below 1.0)

This creates problems with goodness-of-fit and prediction

Read papers by
◦ Wood, G.R. (2002) Generalised Linear Accident Models and Goodness of Fit Testing. Accident Analysis & Prevention, Vol. 34, pp. 417-427.
◦ Lord, D. (2006) Modeling Motor Vehicle Crashes using Poisson-gamma Models: Examining the Effects of Low Sample Mean Values and Small Sample Size on the Estimation of the Fixed Dispersion Parameter. Accident Analysis & Prevention, Vol. 38, No. 4, pp. 751-766.

Statistical Models For Crash Data: Time Trend Effects

[Figure: mean crashes per year (vertical axis, 0 to 2.5) plotted against year (horizontal axis, 0 to 7).]

Statistical Models For Crash Data: Time Trend Effects

Goal: capture changes that vary from year to year directly in the model.

The model structure is given by the following:

$$ y_{it} = \beta_{0,t} + \sum_{j=1}^{p} \beta_j x_{ji} + \varepsilon_{it} $$

Time trend captured with the intercept (i.e., one intercept for each year)

Characteristic: each year is defined as a different observation.

Issues: Since each site is observed repeatedly over time, temporal (serial) correlation exists and affects the statistical inferences drawn from the model. Therefore, you need to account for this correlation in the model.

Modeling approach: Generalized Estimating Equations (GEE); Random-Effects models, etc.

Bayes Methods

• The Bayes method approaches the analysis of data differently than the classical (frequentist) method
• Subjective judgment is more easily incorporated with the observed data and models
• Unknown coefficients of regression models are treated as random variables
• Data analysis is less limited by the number of observations (can be supplemented with subjective judgment)
• Computationally intensive (no longer an issue)

Bayes Methods

• The Bayes method makes inferences from data using probability models for quantities that are observed and for quantities one is interested in learning about
• Bayesian data analysis can be divided into three steps:
  ◦ Setting up a full probability model: provide a joint probability distribution for all observable and unobservable quantities
  ◦ Conditioning on observed data: calculating and interpreting the appropriate posterior distribution (conditional probability distribution)
  ◦ Evaluating the fit of the model and the implications of the posterior distribution
• Emphasis is placed on interval estimation (credible intervals, the Bayesian counterpart of confidence intervals) rather than hypothesis testing

For the EB method, different weights are assigned to the prior distribution and the standard estimate, respectively

In safety analyses, the weights are estimated with the assumption that the mean (μ) for each site follows a Gamma distribution

The EB estimates have been found to outperform other estimates, such as the MLE

The EB framework is presented on the next overhead

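Empirical Bayes Model

Formulation:

$$ \hat{\hat{\mu}}_i = \hat{\gamma}_i\,\hat{\mu}_i + (1 - \hat{\gamma}_i)\,y_i $$

$$ \hat{\gamma}_i = \frac{1}{1 + \hat{\mu}_i / \hat{\phi}} $$

where

φ = Dispersion parameter of NB regression (inverse dispersion form, with Var(y) = μ + μ²/φ)

μ̂ = Mean of a Poisson-gamma regression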

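Using the same example shown earlier:

F1 = 24,164; F2 = 3,392; y = 10

The values are estimated as follows

$$ \hat{\mu} = 5.5 \times 10^{-5} \times 24{,}164^{0.816} \times 2{,}560^{0.3732} = 3.9 \ \text{crashes per year} $$

$$ \hat{\gamma} = \frac{1}{1 + \dfrac{3.90}{2.46}} = 0.39 $$

$$ \hat{\hat{\mu}} = 0.39 \times 3.9 + (1 - 0.39) \times 10 = 7.63 \ \text{crashes per year} $$

Note that the calculation of the mean uses a minor flow of 2,560 rather than the listed F2 = 3,392; the value of 3.9 crashes per year corresponds to 2,560.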
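The EB calculation in the example can be sketched as a small function. The inputs (model mean 3.9, dispersion parameter 2.46, observed count 10) are the values used on the slide; the function name `eb_estimate` is illustrative, and the weight is assumed to take the form 1 / (1 + mean / dispersion parameter), which is the form implied by the slide's arithmetic.

```python
def eb_estimate(mu_hat, phi, y):
    """Return (weight, EB estimate) combining the model mean and the observed count."""
    gamma = 1.0 / (1.0 + mu_hat / phi)           # weight given to the model mean
    eb = gamma * mu_hat + (1.0 - gamma) * y      # weighted average
    return gamma, eb

# Values from the worked example: model mean, dispersion parameter, observed count.
gamma, eb = eb_estimate(mu_hat=3.9, phi=2.46, y=10)
print(f"weight = {gamma:.3f}, EB estimate = {eb:.2f}")
```

With the unrounded weight the estimate comes out at about 7.64, close to the 7.63 reported on the slide, which uses the weight rounded to 0.39. In either case the EB estimate lies between the model mean (3.9) and the observed count (10).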

Empirical Bayes Model

[Figure: crashes per year (vertical axis) versus year (horizontal axis: 1, 2, t), showing the MLE estimate (3.9), the EB estimate (7.63), and the observed value (10).]
