Train Accidents Report

Analysis of Train Accidents in the U.S. During 2001 – 2012

Imran A. Khan

Summary

Many factors influence the severity of rail accidents. Based on my findings, season is an important factor for fatalities: more fatalities occur during summer. The estimated increase in fatalities per severe accident in summer, relative to spring, is about 0.22, with a 95% confidence interval between 0.04 and 0.4. It is therefore important for the FRA to apply extra safety measures when trains run during the summer season. Type of accident and cause of accident significantly affect damage cost at the 5% level. A train accident at an RR grade crossing is associated with higher damage cost, so improving safety at RR grade crossings can reduce the severity of damage cost. The FRA should also train its staff well in safety in order to minimize human error.

Honor pledge: On my honor, I pledge that I am the sole author of this paper and I have accurately cited all

help and references used in its completion.

Imran A. Khan

1. Problem Description

1.1. Situation

According to Federal Railroad Administration (FRA) data from 2001 – 2012 [3], about 2,500 train accidents occur annually in the U.S. These incidents cause damage costs and injuries ranging from moderately severe to fatal. Many of the incidents involve small damage costs and no deaths. A typical train accident over these 12 years resulted in no injuries, as shown in Figure 1. I observe that more than 13% of the accidents lead to high damage cost and about 1% lead to fatalities. These are the extreme values above the upper whisker of the boxplots, and I consider them severe accidents.
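The "above the upper whisker" rule used to flag severe accidents can be sketched as follows. This is a minimal illustration that assumes the common Tukey convention (the whisker ends at the largest observation not exceeding Q3 + 1.5 × IQR); the damage values and the function name `upper_whisker` are made up for the example and are not taken from the FRA data. R's boxplot computes quartiles slightly differently (via hinges), so thresholds on real data may differ a little.

```python
import statistics

def upper_whisker(values):
    """Largest observation not above Q3 + 1.5 * IQR (Tukey convention, assumed)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fence = q3 + 1.5 * (q3 - q1)
    return max(v for v in values if v <= fence)

# Hypothetical damage costs in thousands of dollars (not FRA data).
damage = [10, 20, 30, 40, 50, 60, 70, 80, 500]

whisker = upper_whisker(damage)
severe = [v for v in damage if v > whisker]  # cases treated as "severe"
print(whisker, severe)
```

Only the single value far above the rest is flagged here, which mirrors how a small share of accidents ends up in the severe subset.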

Figure 1. Boxplot of severity metrics

For a given accident, many factors come into play. Table 1 shows that human error accounts for the largest share of train accidents (34%), followed by track, roadbed and structures failures (32%). Derailment is the most common type of accident and accounts for the largest share of damage cost (Table 2). However, this type of accident causes very few fatalities. Only 7% of train accidents occur at highway-rail crossings, but they account for most of the fatalities (85%).

[Figure 1 panels: "Boxplot of Fatalities" (y-axis: Fatalities); "Boxplot of Total Damage" (y-axis: Total Damage, x $1,000,000)]

Table 1. Frequency table for severity metrics by type of cause

Type of cause                            Number  Fatalities  Total Damage ($)  Number (%)  Fatalities (%)  Total Damage (%)
Mechanical and Electrical Failures (E)     3617           0       561,518,909         12%              0%               17%
Train operation - Human Factors (H)       10655          35       756,877,988         34%              7%               23%
Miscellaneous Causes (M)                   6327         452       625,508,542         20%             90%               19%
Signal and Communication (S)                589           0        24,838,419          2%              0%                1%
Track, Roadbed and Structures (T)          9785          16     1,377,074,145         32%              3%               41%

Table 2. Frequency table for severity metrics by type of accident

Type of accident              Number  Fatalities  Total Damage ($)  Number (%)  Fatalities (%)  Total Damage (%)
Derailment (1)                 20694          22     2,552,192,579         67%              4%               76%
Head on collision (2)             99           8        83,868,583          0%              2%                3%
Rear-end collision (3)           218           3        66,591,167          1%              1%                2%
Side collision (4)              1221           4       121,312,370          4%              1%                4%
Raking collision (5)             500           0        26,922,342          2%              0%                1%
Broken train collision (6)        62           0         4,455,089          0%              0%                0%
Highway-rail crossing (7)       2309         428       160,160,623          7%             85%                5%
RR Grade Crossing (8)              2           0         7,324,022          0%              0%                0%
Obstruction (9)                  728          20        59,304,105          2%              4%                2%
Explosive (10)                    12           0        18,268,351          0%              0%                1%
Fire (11)                        231           0        37,921,122          1%              0%                1%
Other impacts (12)              3428           6       134,453,596         11%              1%                4%
Others (13)                     1469          11        73,044,054          5%              2%                2%

Figure 2. Boxplot of severity metrics vs. season

[Figure 2 panels: "Fatalities vs. Season"; "Total Damage vs. Season" (y-axis: Total Damage, x $1,000,000); x-axis: Season (Spring, Summer, Autumn, Winter)]

Figure 3. Boxplot of severity metrics vs. type of accident

[Figure 3 panels: "Fatalities vs. Type of Accident"; "Total Damage vs. Type of Accident" (y-axis: Total Damage, x $1,000,000); x-axis: Type (1 to 13)]

Figure 4. Boxplot of severity metrics vs. cause of accident

In this study, the focus is on the severity of train accidents, i.e. incidents that lead to more fatalities and higher damage cost, and on how that severity can be minimized. Summer appears to lead to more fatalities than the other seasons, while total damage is similar across the four seasons (Figure 2). Different types of accident lead to different severity. Among severe accidents, explosive accidents appear to be the major type in terms of damage cost (Figure 3). Cause of accident is another factor that affects damage cost (Figure 4).

Figures 5 and 6 show a relationship between speed and the severity metrics. Higher train speed tends to go with more fatalities and higher damage cost. The plots also suggest that the more people are evacuated, the more the severity of rail accidents can be reduced. Gross tonnage of a train (TONS) and number of head end locomotives (HEADEND1) are other important factors related to the severity metrics.

[Figure 4 panels: "Fatalities vs. Cause of Accident"; "Total Damage vs. Cause of Accident" (y-axis: Total Damage, x $1,000,000); x-axis: Cause (E, H, M, S, T)]

Figure 5. Scatterplot matrix between fatalities (TOTKLD) and the other quantitative variables

[Figure 5 variables: TOTKLD, TRNSPD, EVACUATE, TONS, HEADEND1]

Figure 6. Scatterplot matrix between total damage (ACCDMG) and the other quantitative variables

[Figure 6 variables: ACCDMG, TRNSPD, EVACUATE, TONS, HEADEND1]

The biplots in Figure 7 indicate that many factors are related to fatalities and damage cost, since many vectors point in the same direction as TOTKLD and ACCDMG. This agrees with the scatterplot matrices in Figures 5 and 6, and the variables behind those vectors are potential factors for the severity metrics.

Figure 7. Biplot for human and cost damage

1.2. Goal

The purpose of this study is to provide recommendations so that the FRA can take action to reduce the severity of railroad accidents in terms of fatalities and total damage.

1.3. Metrics

I use multiple linear regression models to model the severity metrics. I consider several potential factors, such as type of accident, cause of accident, and season, to predict the severity of rail accidents. I use a significance level of 5% throughout this study. If the p-value is less than 0.05, the null hypothesis is rejected in favor of the alternative. If the p-value is greater than 0.05, the null hypothesis is not rejected. I also use the adjusted R², AIC and BIC criteria to compare models.
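To make the comparison criteria concrete, the sketch below computes the adjusted R² from R² and shows how AIC and BIC differ only in their complexity penalty. It uses the values reported for the fatalities models in Table B2. The function names are illustrative, and the parameter count k (coefficients plus intercept plus error variance) is an assumption about how the AIC and BIC values were computed.

```python
import math

def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n cases and p coefficients (intercept not counted)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

def bic_from_aic(aic, k, n):
    """BIC implied by an AIC value computed from the same log-likelihood.

    AIC = -2*logLik + 2*k and BIC = -2*logLik + k*ln(n),
    so BIC = AIC + k * (ln(n) - 2).
    """
    return aic + k * (math.log(n) - 2)

n = 391  # cases used for the fatalities model

# Reduced model: R^2 = 29.6% with 15 coefficients (from "on 15 and 375 DF").
print(round(adjusted_r2(0.296, n, 15), 4))

# Assumed k = coefficients + intercept + error variance.
print(round(bic_from_aic(846.043, 15 + 2, n), 2))  # reduced model
print(round(bic_from_aic(846.044, 7 + 2, n), 2))   # stepwise model
```

Under these assumptions the results land close to the adjusted R² (26.79%) and BIC values (913.51 and 881.76) in Table B2. Because ln(391) is about 5.97, BIC charges roughly three times the per-parameter penalty of AIC, which is why it favors the smaller stepwise model even though the two AIC values are nearly identical.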

[Figure 7 panels: "Fatalities" biplot (axes: Comp.1, Comp.2; vectors: TOTKLD, CARS, CARSDMG, CARSHZD, TEMP, TRNSPD, EVACUATE, TONS, HEADEND1, Latitude, Longitud); "Total Damage" biplot (axes: Comp.1, Comp.2; vectors: ACCDMG, CARS, CARSDMG, CARSHZD, TEMP, TRNSPD, EVACUATE, TONS, HEADEND1, Latitude, Longitud)]

1.4. Hypothesis

Based on my observations in Figures 2, 3, and 4, I have three hypotheses regarding the severity of rail accidents:

Hypothesis 1:
H0: Season does not affect the number of fatalities.
H1: Season affects the number of fatalities.

Hypothesis 2:
H0: Type of accident does not affect damage cost.
H1: Type of accident affects damage cost.

Hypothesis 3:
H0: Cause of accident does not affect damage cost.
H1: Cause of accident affects damage cost.

2. Approach

2.1. Data

The data set is obtained from the FRA railroad accident records for 2001 – 2012 [3]. In total, there are 42033 accidents over the 12 years with 140 relevant variables. I find that 20% of the data points are duplicates. I also find one data point from 2002 with an extreme evacuation value: 50000 people reportedly evacuated in a single accident, which is very unlikely. I treat this value as a typo because it is very large compared to the other cases (Figure B1, Appendix B). Thus, I do not include the duplicates or this data point in the analysis. I also exclude data points from September 11, 2001, because the chance of such an event happening again is almost zero and its damage cost is far higher than in the other cases. This leaves 30973 data points for this study.

Since my interest is in severe accidents, only the extreme cases above the upper whisker of the severity boxplots are taken into account: accidents with at least one fatality for the fatalities model, and accidents with damage cost of at least $143,861 for the total damage model. I use all potential predictor variables, including confounding variables, in the initial models. In total, there are 21 predictors: 10 continuous variables and 11 categorical variables, including SEASON, which I create as a new variable (Table A1, Appendix A). I remove cases with missing values, since the methodology requires complete observations. In total, I use 391 cases to model fatalities (TOTKLD) and 2954 cases to model total damage (ACCDMG).
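The derived SEASON variable can be sketched as a simple month-to-season mapping. This is a minimal illustration that follows the month ranges listed for SEASON in Appendix A; the function name `season_from_month` is hypothetical, and the original variable was created in R rather than Python.

```python
def season_from_month(month):
    """Map a calendar month (1-12) to a season, using the Appendix A ranges."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    if 3 <= month <= 5:
        return "Spring"   # Mar - May (base case)
    if 6 <= month <= 8:
        return "Summer"   # Jun - Aug
    if 9 <= month <= 11:
        return "Autumn"   # Sep - Nov
    return "Winter"       # Dec - Feb

print([season_from_month(m) for m in (1, 4, 7, 10, 12)])
```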

2.2. Analysis

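In the modeling of severity metrics, I perform multiple linear regression analysis using R software with the general model

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$$

where Y is the severity metric, X_1, ..., X_p are the predictors, and ε is the error term.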

The stages of data analysis are as follows:

1. I convert all categorical predictor variables into dummy variables. For example, TYPE has 13 levels, and R automatically encodes these 13 levels into 12 dummy variables with derailment as the base case. See Table A1 in Appendix A for the base case selected for each categorical variable.
2. I fit a simple linear regression for each of the 21 potential predictor variables. A continuous predictor with p-value > 0.25 is not considered in the initial model (Full Model).
3. I reduce the full models by dropping all the non-significant predictors (Reduced Model). I use a partial F-test to examine whether the smaller set of predictors can be retained.
4. I also perform an alternative model selection, i.e. a stepwise selection procedure, to select important predictors in the full model (Step Model).
5. I then compare the reduced model and the step model by the adjusted R² and AIC criteria to select the best model. I cannot use cross-validation for model comparison because some folds do not contain every level of the factor variables, so the regression model cannot be fitted and evaluated consistently across folds.
6. I introduce second order and interaction terms for the selected model.
7. I use graphical diagnostic plots to examine how well the regression assumptions are satisfied.
8. I transform the response variable if the regression assumptions are violated.
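The dummy coding described in step 1 can be illustrated with a short sketch. It mimics the treatment coding that R applies to factors, where the base level gets no indicator of its own. The function name `dummy_code` and the column naming scheme are assumptions made for this example.

```python
def dummy_code(value, levels, base, prefix="TYPE"):
    """Treatment (dummy) coding: one 0/1 indicator per non-base level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {f"{prefix}{lvl}": int(value == lvl) for lvl in levels if lvl != base}

type_levels = list(range(1, 14))  # accident types 1..13, derailment = 1

# A highway-rail crossing accident (type 7): exactly one indicator is 1.
row = dummy_code(7, type_levels, base=1)
print(len(row), row["TYPE7"], sum(row.values()))

# A derailment (the base case): all 12 indicators are 0.
base_row = dummy_code(1, type_levels, base=1)
print(sum(base_row.values()))
```

With this coding, each TYPE coefficient in a fitted model is read as a difference from the derailment base case.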

For the fatalities model, there are 10 predictor variables in the full model (Table B1, Appendix B). This model can be reduced by dropping 7 variables, i.e. TRNSPD, TONS, HEADEND1, TYPE, TYPTRK, TRKCLAS, and CAUSE (partial F-statistic 0.913, p-value 0.5743). The BIC favors the step model, which has the smaller BIC value (881.76), but the AIC and the adjusted R² both favor the reduced model (Table B2, Appendix B). I then consider a second order model with interaction terms based on the reduced model. The second order model does not fit better than the first order model: a partial F-test gives an F-statistic of 1.39 with p-value 0.24. This means the interaction terms and the second order term of EVACUATE are not important, and the first order model is preferable. Diagnostic plots show that the fitted model moderately violates the regression assumptions. The residuals are generally scattered randomly across the range of fitted values, and the points generally fall around the line in the QQ plot (Figure B2, Appendix B). Transforming the response variable with the Box-Cox method does not improve the fit (Figure B3, Appendix B), so the fitted model without interaction and second order terms is chosen for ease of interpretation. Table 3 summarizes the estimated coefficients (standard errors) and the corresponding p-values for the first and second order models.

Table 3. Comparison of the first and second order models for fatalities

                                        First order model                 Second order model
                                        Estimate (Std. Error)  P-value    Estimate (Std. Error)  P-value
(Intercept)                             1.07 (0.08)            <0.0001    1.07 (0.08)            <0.0001
EVACUATE                                0.001 (0)              <0.0001    0 (0.002)              0.87
Type of consist (base case: TYPEQ 1)
  TYPEQ 2                               0.46 (0.09)            <0.0001    0.46 (0.09)            <0.0001
  TYPEQ 3                               0.1 (0.14)             0.46       0.09 (0.14)            0.51
  TYPEQ 4                               -0.3 (0.7)             0.67       -0.3 (0.7)             0.67
  TYPEQ 6                               -0.07 (0.7)            0.92       -0.07 (0.7)            0.92
  TYPEQ 7                               0.19 (0.35)            0.58       0.18 (0.35)            0.60
  TYPEQ 8                               0.02 (0.2)             0.93       0.01 (0.2)             0.95
  TYPEQ 9                               -0.3 (0.5)             0.55       -0.3 (0.5)             0.55
  TYPEQ A                               -0.3 (0.7)             0.67       -0.3 (0.7)             0.67
  TYPEQ C                               0.08 (0.35)            0.82       0.1 (0.35)             0.78
  TYPEQ D                               0.43 (0.41)            0.29       0.44 (0.41)            0.28
  TYPEQ E                               -0.07 (0.7)            0.92       -0.07 (0.7)            0.92
Season (base case: Spring)
  Summer                                0.23 (0.1)             0.02       0.22 (0.1)             0.03
  Autumn                                -0.07 (0.11)           0.51       -0.06 (0.11)           0.61
  Winter                                0.03 (0.1)             0.74       0.03 (0.1)             0.80
EVACUATE²                                                                 4.7E-7 (1.8E-6)        0.80
EVACUATE x Summer                                                         0.003 (0.004)          0.43
EVACUATE x Autumn                                                         0.0001 (0.003)         0.97
EVACUATE x Winter                                                         -0.001 (0.009)         0.95

For the total damage model, there are 16 predictor variables in the initial model (Table B1, Appendix B). Seven variables are not significant in the full model, so only 9 predictors are kept in the reduced model, i.e. CARSHZD, EVACUATE, TRNSPD, TONS, TYPE, TRNDIR, REGION, TYPTRK, and CAUSE. A partial F-test shows that, at the 5% level, the reduced model is not significantly worse than the full model (F-statistic 1.74, p-value 0.08). The BIC favors the reduced model, which has the smallest BIC value (87957.48). However, the model from the stepwise selection procedure is selected as the best model, since its AIC is smaller and its adjusted R² is larger than those of the reduced model (Table B3, Appendix B). As shown in Table 4, extending the selected stepwise model with second order and interaction terms (Model 2) gives a significantly better fit (p-value < 0.0001). Model 2 is then reduced with the stepwise selection procedure. Some interaction and second order terms can be dropped, so Model 3 is retained (p-value = 0.99).

Table 4. Partial F-test

Model                                               Res. Df  RSS       Df  Sum of Squares  F-test  P-value
Model 1: Step model (first order model)             2906     1.32E+15
Model 2: Step model including second order and
         interaction terms (second order model)     2883     1.24E+15  23  7.80E+13        7.89    <0.0001
Model 3: Model 2 after performing stepwise
         selection procedure                        2886     1.24E+15  3   4.35E+10        0.03    0.99
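The partial F statistic in Table 4 can be reproduced approximately from the printed values. The sketch below is a minimal illustration of the standard nested-model formula, F = (SS / df_terms) / (RSS_larger / df_resid_larger); the function name `partial_f` is hypothetical, and the inputs are the rounded figures from the table, so the result only approximates the reported one. The p-value is omitted because it needs an F distribution function that the Python standard library does not provide.

```python
def partial_f(ss_explained, df_terms, rss_larger, df_resid_larger):
    """Partial F statistic for two nested linear models.

    ss_explained: drop in residual sum of squares from the added terms
    df_terms: number of added coefficients
    rss_larger, df_resid_larger: RSS and residual DF of the larger model
    """
    return (ss_explained / df_terms) / (rss_larger / df_resid_larger)

# Model 1 vs. Model 2, using the rounded values printed in Table 4.
f_stat = partial_f(ss_explained=7.80e13, df_terms=23,
                   rss_larger=1.24e15, df_resid_larger=2883)
print(round(f_stat, 2))
```

The value comes out close to the reported 7.89; the small gap is due to rounding in the tabulated sums of squares.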

The diagnostic plots show that the selected model (Model 3) moderately violates the regression assumptions (Figure B4, Appendix B). As with the fatalities model, transforming the response variable with the Box-Cox method does not improve the fit (Figure B5, Appendix B). Therefore, the fitted model with second order terms and no transformation of the response variable is chosen for ease of interpretation. Table 5 summarizes the estimated coefficients, standard errors, and p-values.

Table 5. The selected second order model for total damage

                                             Estimate (Std. Error)   P-value
Intercept                                    746000 (393000)         0.058
CARSDMG                                      8660 (6180)             0.161
CARSHZD                                      110000 (31500)          <0.0001
EVACUATE                                     283 (102)               0.006
TRNSPD                                       -24100 (4930)           <0.0001
TONS                                         26.9 (4.12)             <0.0001
Type of accident (base case: derailment)
  Head on collision                          1470000 (134000)        <0.0001
  Rearend collision                          465000 (106000)         <0.0001
  Side collision                             251000 (73000)          0.001
  Raking collision                           111000 (140000)         0.429
  Broken train collision                     -199000 (221000)        0.369
  Hwy-rail crossing                          -302000 (82000)         <0.0001
  RR Grade Crossing                          6760000 (658000)        <0.0001
  Obstruction                                242000 (123000)         0.049
  Explosive – detonation                     266000 (658000)         0.686
  Fire / violent rupture                     -56300 (113000)         0.620
  Other impacts                              104000 (73000)          0.155
  Others                                     -47600 (111000)         0.667
Train direction (base case: north)
  South                                      -70600 (65200)          0.279
  East                                       -116000 (61000)         0.058
  West                                       -158000 (63900)         0.013
FRA designated region (base case: Region 1)
  Region 2                                   -71000 (107000)         0.509
  Region 3                                   -260000 (107000)        0.015
  Region 4                                   -133000 (103000)        0.196
  Region 5                                   -207000 (96700)         0.033
  Region 6                                   -196000 (98600)         0.046
  Region 7                                   81800 (107000)          0.443
  Region 8                                   -108000 (105000)        0.306
Type of consist (base case: TYPEQ “”-NA)
  TYPEQ 1                                    -225000 (383000)        0.556
  TYPEQ 2                                    -18500 (392000)         0.962
  TYPEQ 3                                    562000 (417000)         0.177
  TYPEQ 4                                    -411000 (409000)        0.315
  TYPEQ 5                                    -426000 (440000)        0.333
  TYPEQ 6                                    83500 (398000)          0.834
  TYPEQ 7                                    -85300 (384000)         0.824
  TYPEQ 8                                    -54500 (401000)         0.892
  TYPEQ 9                                    -117000 (426000)        0.783
  TYPEQ A                                    -39800 (418000)         0.924
  TYPEQ B                                    560000 (610000)         0.359
  TYPEQ D                                    -502000 (611000)        0.411
Type of track (base case: Main)
  Yard                                       -83900 (64900)          0.196
  Siding                                     377000 (132000)         <0.0001
  Industry                                   -27800 (110000)         0.801
Cause of accident (base case: E)
  H                                          -340000 (76400)         <0.0001
  M                                          -12400 (76100)          0.871
  S                                          -214000 (195000)        0.271
  T                                          -253000 (68400)         <0.0001
CARSHZD²                                     -2820 (1230)            0.022
TRNSPD²                                      232 (42.3)              <0.0001
TONS²                                        0 (0)                   0.032
Longitud²                                    12.3 (3.92)             <0.0001
TRNSPD x South                               96.2 (2410)             0.968
TRNSPD x East                                7690 (2290)             <0.0001
TRNSPD x West                                7900 (2400)             <0.0001
TRNSPD x Region 2                            2290 (3780)             0.545
TRNSPD x Region 3                            19500 (3770)            <0.0001
TRNSPD x Region 4                            12600 (3550)            <0.0001
TRNSPD x Region 5                            15700 (3370)            0.000
TRNSPD x Region 6                            17300 (3400)            <0.0001
TRNSPD x Region 7                            9200 (3690)             0.013
TRNSPD x Region 8                            12100 (3690)            <0.0001
TRNSPD x Yard                                -7370 (5980)            0.218
TRNSPD x Siding                              -25500 (7220)           <0.0001
TRNSPD x Industry                            -12200 (10200)          0.231
TRNSPD x H                                   16400 (2740)            <0.0001
TRNSPD x M                                   4740 (2490)             0.056
TRNSPD x S                                   4680 (10200)            0.646
TRNSPD x T                                   14800 (2190)            <0.0001

3. Evidence

I find that season is an important factor for fatalities. The partial F-test shows that season cannot be eliminated from the model (F-statistic: 3.49, p-value: 0.016). The p-value for the summer season is 0.03, so I have evidence at the 5% level to reject my null hypothesis. The resulting coefficient indicates that the number of fatalities is higher during summer: the estimated increase in fatalities per severe accident, relative to spring, is about 0.22, with a 95% confidence interval between 0.04 and 0.4. Based on the final model for fatalities, TYPEQ is another important factor: accidents involving passenger trains (TYPEQ 2) have significantly more fatalities than those involving freight trains (Table 3).

For total damage, cause and type of accident are important factors. Different causes and types of accident lead to different damage costs, and the differences are statistically significant. At the 5% level, these effects cannot be dropped from the model (F-statistic 5.82, p-value < 0.0001). Therefore, I reject my null hypotheses that cause and type of accident do not affect total damage. Train speed and cause of accident have an interaction effect on damage cost (Figure B6, Appendix B), which means that the relationship between total damage and cause of accident depends on the train speed. At high train speed, human error leads to more damage cost. Furthermore, with the other factors held fixed, the expected total damage for a severe accident is highest at an RR grade crossing, i.e. $7,506,000, and this effect is highly significant.
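Two of the numbers in this section follow directly from the Table 5 coefficients, and a short sketch makes the arithmetic explicit. The coefficient values are copied from Table 5; the function names and the example speeds are assumptions chosen for illustration. The baseline figure assumes all numeric predictors are zero and all other factors are at their base levels, which is one reading of "other factors fixed".

```python
# Coefficients copied from Table 5 (dollars).
INTERCEPT = 746_000
RR_GRADE_CROSSING = 6_760_000  # TYPE 8 relative to derailment (base case)
CAUSE_H = -340_000             # human factors relative to cause E (base case)
TRNSPD_X_H = 16_400            # extra dollars per mph when the cause is H

def rr_grade_crossing_baseline():
    """Fitted damage at an RR grade crossing with everything else at zero/base."""
    return INTERCEPT + RR_GRADE_CROSSING

def human_vs_mechanical_gap(speed_mph):
    """Fitted damage for cause H minus cause E at a given train speed."""
    return CAUSE_H + TRNSPD_X_H * speed_mph

print(rr_grade_crossing_baseline())        # matches the $7,506,000 in the text
print(human_vs_mechanical_gap(10))         # negative: H is cheaper at low speed
print(human_vs_mechanical_gap(40))         # positive: H is costlier at high speed
print(round(-CAUSE_H / TRNSPD_X_H, 1))     # speed (mph) where the gap changes sign
```

Under these assumptions, the fitted gap between human-factor and mechanical causes changes sign at roughly 21 mph, which is one way to read the statement that human error matters more at high train speed.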

4. Recommendation

The evidence shows that several factors can lead to severe train accidents, including season, type of accident, and cause of accident. The models selected to test my hypotheses explain about 25% of the variation in the severity of rail accidents, based on the adjusted R² (Tables B2 and B3, Appendix B). The effect of season on fatalities is statistically significant at the 5% level. The estimated increase in fatalities per severe accident in summer, relative to spring, is about 0.22, with a 95% confidence interval between 0.04 and 0.4. The effects of type of accident and cause of accident on total damage are also significant. At the 5% level, these factors cannot be eliminated from the model, so they are important to the severity of train accidents. This confirms my findings based on the plots shown in Figures 2, 3, and 4. The results indicate that the FRA should put extra safety requirements in place for trains running during the summer season.

Human errors are often unavoidable, and modeling the damage cost shows that human error is one of the most important factors behind higher damage cost. The FRA should train its staff well in safety so that failures due to human error can be minimized. In addition, it is important to improve safety for trains at RR grade crossings.

5. References

[1] D. E. Brown and L. Barnes, “Laboratory 1: Train accidents,” August 2013, assignment in class SYS 4021.

[2] D. E. Brown and L. Barnes, “Laboratory 1: Train accidents template,” August 2013, assignment in class SYS 4021.

[3] Federal Railroad Administration, “Federal Railroad Administration Office of Safety Analysis,” August 2012. [Online]. Available: http://safetydata.fra.dot.gov/officesafety/

Appendix A

Table A1. Accident Description

No  Field Name  Type                  Description
1   TOTKLD      Response variable     Fatalities - total killed for railroads
2   ACCDMG      Response variable     Total reportable damage on all reports in $
3   CARS        Continuous variable   # of cars carrying hazmat
4   CARSDMG     Continuous variable   # of hazmat cars damaged or derailed
5   CARSHZD     Continuous variable   # of cars that released hazmat
6   EVACUATE    Continuous variable   # of persons evacuated
7   TEMP        Continuous variable   Temperature in degrees Fahrenheit
8   TRNSPD      Continuous variable   Speed of train in miles per hour
9   TONS        Continuous variable   Gross tonnage, excluding power units
10  HEADEND1    Continuous variable   # of head end locomotives
11  Latitude    Continuous variable   Latitude in decimal degrees, explicit decimal, explicit +/- (WGS84)
12  Longitud    Continuous variable   Longitude in decimal degrees, explicit decimal, explicit +/- (WGS84)
13  TYPE        Categorical variable  Type of accident: 01 = derailment (base case), 02 = head on collision,
                                      03 = rearend collision, 04 = side collision, 05 = raking collision,
                                      06 = broken train collision, 07 = hwy-rail crossing, 08 = RR Grade Crossing,
                                      09 = obstruction, 10 = explosive – detonation, 11 = fire / violent rupture,
                                      12 = other impacts, 13 = other (described in narrative)
14  VISIBLTY    Categorical variable  Daylight period: 1 = dawn (base case), 2 = day, 3 = dusk, 4 = dark
15  WEATHER     Categorical variable  Weather conditions: 1 = clear (base case), 2 = cloudy, 3 = rain, 4 = fog,
                                      5 = sleet, 6 = snow
16  TRNDIR      Categorical variable  Train direction: 1 = north (base case), 2 = south, 3 = east, 4 = west
17  REGION      Categorical variable  FRA designated region (1 = base case)
18  TYPEQ       Categorical variable  Type of consist: 1 = freight train (base case), 2 = passenger train,
                                      3 = commuter train, 4 = work train, 5 = single car, 6 = cut of cars,
                                      7 = yard / switching, 8 = light loco(s), 9 = maint / inspect. car,
                                      A = spec. MoW q
19  TYPTRK      Categorical variable  Type of track: 1 = main (base case), 2 = yard, 3 = siding, 4 = industry
20  TRKCLAS     Categorical variable  FRA track class: 1-9, X (1 = base case)
21  RCL         Categorical variable  Remote control locomotive: 0 = not a remotely controlled operation (base case),
                                      1 = remote control portable transmitter, 2 = remote control tower operation,
                                      3 = remote control portable transmitter (more than one remote control)
22  CAUSE       Categorical variable  Primary cause of incident: E = Mechanical and Electrical Failures (base case),
                                      H = Human Factors, M = Miscellaneous Causes, S = Signal and Communication,
                                      T = Track, Roadbed and Structures
23  SEASON      Categorical variable  Season of accident: 1 = spring (Mar – May) (base case), 2 = summer (Jun – Aug),
                                      3 = autumn (Sep – Nov), 4 = winter (Dec – Feb)


Appendix B

Table B1. P-value of the overall F-statistic in simple regression model

Predictor           Fatalities   Total Damage
CARS                0.96         0.89
CARSDMG             0.40         0.00
CARSHZD             0.86         0.00
EVACUATE            0.00         0.00
TEMP                0.27         0.74
TRNSPD              0.01         0.00
TONS                0.13         0.00
HEADEND1            0.10         0.81
Latitude            0.82         0.00
Longitud            0.71         0.00
factor(TYPE)        0.00         0.00
factor(VISIBLTY)    0.99         0.24
factor(WEATHER)     0.34         0.69
factor(TRNDIR)      0.83         0.00
factor(REGION)      0.28         0.00
factor(TYPEQ)       0.05         0.00
factor(TYPTRK)      0.00         0.00
factor(TRKCLAS)     0.17         0.00
factor(RCL)         NA*          0.00
factor(CAUSE)       0.01         0.05
factor(SEASON)      0.05         0.46

*NA: cannot be estimated since only one level is available under the RCL variable for the fatalities model

Table B2. Model comparison for fatalities (response variable: TOTKLD)

Predictor variables
  Full Model:      EVACUATE, TRNSPD, TONS, HEADEND1, TYPE, TYPEQ, TYPTRK, TRKCLAS, CAUSE, SEASON
  Reduced Model:   EVACUATE, TYPEQ, SEASON
  Stepwise Model:  EVACUATE, TRNSPD, CAUSE, SEASON

                 Full Model   Reduced Model   Stepwise Model
R²               33.22%       29.6%           26.66%
adjusted R²      26.43%       26.79%          25.32%
AIC              867.419      846.043         846.044
BIC              1018.23      913.51          881.76

Overall significance
  Full Model:      F-statistic: 4.892 on 36 and 354 DF, p-value: 8.785e-16
  Reduced Model:   F-statistic: 10.51 on 15 and 375 DF, p-value: < 2.2e-16
  Stepwise Model:  F-statistic: 19.89 on 7 and 383 DF, p-value: < 2.2e-16

Partial F-test, Full vs. Reduced Model: F: 0.913, p-value: 0.5743

Table B3. Model comparison for total damage (response variable: ACCDMG)

Predictor variables
  Full Model:      CARSDMG, CARSHZD, EVACUATE, TRNSPD, TONS, Latitude, Longitud, TYPE, VISIBLTY, TRNDIR, REGION, TYPEQ, TYPTRK, TRKCLAS, RCL, CAUSE
  Reduced Model:   CARSHZD, EVACUATE, TRNSPD, TONS, TYPE, TRNDIR, REGION, TYPTRK, CAUSE
  Stepwise Model:  CARSDMG, CARSHZD, EVACUATE, TRNSPD, TONS, Longitud, TYPE, TRNDIR, REGION, TYPEQ, TYPTRK, CAUSE

                 Full Model   Reduced Model   Stepwise Model
R²               27.2%        25.14%          26.67%
adjusted R²      25.61%       24.29%          25.49%
AIC              87725.23     87747.80        87714.55
BIC              88114.63     87957.48        88008.11

Overall significance
  Full Model:      F-statistic: 17.14 on 63 and 2890 DF, p-value: < 2.2e-16
  Reduced Model:   ... 2.2e-16
  Stepwise Model:  ... 2.2e-16

Partial F-test, Full vs. Reduced Model: F-test: 1.74, p-value: 0.08

Figure B1. Boxplot for fatalities and number of people evacuated for each year to identify potential outliers

[Figure B1 panel: "Number of people evacuated in each year" (x-axis: Year, 2001 to 2012; y-axis: Number of People Evacuated)]

Figure B2. Diagnostic plot for the selected fatalities model before transformation

[Figure B2 panels: Residuals vs Fitted; Normal Q-Q; Scale-Location; Residuals vs Leverage]

Figure B3. Diagnostic plot for fatalities model after transformation with Box-Cox method (lambda=-2)

[Figure B3 panels: Residuals vs Fitted; Normal Q-Q; Scale-Location; Residuals vs Leverage]

Figure B4. Diagnostic plot for the selected total damage model before transformation

[Figure B4 panels: Residuals vs Fitted; Normal Q-Q; Scale-Location; Residuals vs Leverage]

Figure B5. Diagnostic plot for the selected total damage model after transformation with Box-Cox method (lambda=-0.5)

[Figure B5 panels: Residuals vs Fitted; Normal Q-Q; Scale-Location; Residuals vs Leverage]

Figure B6. Interaction plot of train speed and cause with damage cost of accident

[Figure B6 panel: x-axis: TRNSPD; y-axis: ACCDMG; one line per CAUSE (M, E, H, S, T)]
