Analysis of Train Accidents in the U.S. During
2001 – 2012
Imran A. Khan
Summary
Many factors affect the severity of rail accidents. My findings indicate that season is an important factor in the number of fatalities: more fatalities occur during summer. The summer effect on fatalities is estimated at about 0.22 (relative to spring), with a 95% confidence interval between 0.04 and 0.4, so the FRA should add extra safety measures for trains running in summer. Type of accident and cause of accident significantly affect the damage cost at the 5% level. A train accident at an RR grade crossing is associated with higher damage cost, so greater safety at RR grade crossings can reduce the severity of damage cost. The FRA should also train its people well in safety in order to minimize human error.
Honor pledge: On my honor, I pledge that I am the sole author of this paper and I have accurately cited all
help and references used in its completion.
Imran A. Khan
1. Problem Description
1.1. Situation
According to Federal Railroad Administration (FRA) data from 2001 – 2012 [3], about 2,500 train accidents occur annually in the U.S. These incidents cause damage costs and injuries ranging from moderately severe to fatal. Many of the incidents have low damage cost and do not lead to any deaths. A typical train accident in these 12 years caused no fatalities, as shown in Figure 1. More than 13% of the accidents led to high damage cost, and about 1% led to fatalities. These are the extreme values above the upper whisker of the boxplots, and I consider them severe accidents.
Figure 1. Boxplot of severity metrics
For a given accident, many factors come into play. Table 1 shows that human factors are the most common cause, accounting for 34% of train accidents, followed by track, roadbed and structures failures at 32%. Derailment is the most common type of accident and accounts for the largest share of damage cost (Table 2). However, this type of accident causes very few fatalities. Only 7% of train accidents occur at highway-rail crossings, but they account for 85% of fatalities.
[Figure 1 panels: Boxplot of Fatalities; Boxplot of Total Damage (x $1,000,000)]
Table 1. Frequency table for severity metrics by type of cause
                                                    Count (#)                             Share (%)
Type of cause                             Number  Fatalities  Total Damage ($)   Number  Fatalities  Total Damage
Mechanical and Electrical Failures (E)      3617           0       561,518,909      12%          0%           17%
Train operation - Human Factors (H)        10655          35       756,877,988      34%          7%           23%
Miscellaneous Causes (M)                    6327         452       625,508,542      20%         90%           19%
Signal and Communication (S)                 589           0        24,838,419       2%          0%            1%
Track, Roadbed and Structures (T)           9785          16     1,377,074,145      32%          3%           41%
Table 2. Frequency table for severity metrics by type of accident
                                          Count (#)                             Share (%)
Type of accident                Number  Fatalities  Total Damage ($)   Number  Fatalities  Total Damage
Derailment (1)                   20694          22     2,552,192,579      67%          4%           76%
Head on collision (2)               99           8        83,868,583       0%          2%            3%
Rear-end collision (3)             218           3        66,591,167       1%          1%            2%
Side collision (4)                1221           4       121,312,370       4%          1%            4%
Raking collision (5)               500           0        26,922,342       2%          0%            1%
Broken train collision (6)          62           0         4,455,089       0%          0%            0%
Highway-rail crossing (7)         2309         428       160,160,623       7%         85%            5%
RR Grade Crossing (8)                2           0         7,324,022       0%          0%            0%
Obstruction (9)                    728          20        59,304,105       2%          4%            2%
Explosive (10)                      12           0        18,268,351       0%          0%            1%
Fire (11)                          231           0        37,921,122       1%          0%            1%
Other impacts (12)                3428           6       134,453,596      11%          1%            4%
Others (13)                       1469          11        73,044,054       5%          2%            2%
Figure 2. Boxplot of severity metrics vs. season
[Figure 2 panels: Fatalities vs. Season; Total Damage (x $1,000,000) vs. Season; seasons: Spring, Summer, Autumn, Winter]
Figure 3. Boxplot of severity metrics vs. type of accident
[Figure 3 panels: Fatalities vs. Type of Accident; Total Damage (x $1,000,000) vs. Type of Accident; types 1 to 13 as coded in Table 2]
Figure 4. Boxplot of severity metrics vs. cause of accident
This study focuses on the severity of train accidents, i.e. incidents that lead to more fatalities and higher damage cost, and on how that severity can be minimized. Summer appears to lead to more fatalities than the other seasons, while total damage is similar across the four seasons (Figure 2). Different types of accident lead to different severity. Among severe accidents, explosive accidents appear to be the major type causing damage cost (Figure 3). Cause of accident is another factor that affects damage cost (Figure 4).
Figures 5 and 6 show a relationship between speed and the severity metrics. Higher train speed tends to cause more fatalities and damage cost. The plots also suggest that the more people are evacuated, the more the severity of rail accidents can be reduced. Gross tonnage of a train (TONS) and number of head end locomotives (HEADEND1) are other important factors related to the severity metrics.
[Figure 4 panels: Fatalities vs. Cause of Accident; Total Damage (x $1,000,000) vs. Cause of Accident; causes E, H, M, S, T as coded in Table 1]
Figure 5. Scatterplot matrix between fatalities (TOTKLD) and the other quantitative variables
[Figure 5 variables: TOTKLD, TRNSPD, EVACUATE, TONS, HEADEND1]
Figure 6. Scatterplot matrix between total damage (ACCDMG) and the other quantitative variables
[Figure 6 variables: ACCDMG, TRNSPD, EVACUATE, TONS, HEADEND1]
The biplots in Figure 7 show that many factors are related to fatalities and damage cost, since many vectors point in the same direction as TOTKLD and ACCDMG. This is consistent with the scatterplot matrices in Figures 5 and 6; those vectors are potential factors in the severity metrics.
Figure 7. Biplot for human and cost damage
1.2. Goal
The purpose of this study is to provide recommendations so that the FRA can take action to reduce the severity of railroad accidents in terms of fatalities and total damage.
1.3. Metrics
I use multiple linear regression models to model the severity metrics. I consider several potential factors, such as type of accident, cause of accident, and season, to predict the severity of rail accidents. I use a significance level of 5% throughout this study. If the p-value is less than 0.05, the null hypothesis is rejected in favor of the alternative. If the p-value is greater than 0.05, the null hypothesis is not rejected. I also use adjusted R², AIC and BIC to compare models.
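As a hedged illustration of one of these criteria, the sketch below recomputes adjusted R² from R², the number of cases n, and the number of estimated slope coefficients p. The inputs are the values reported for the fatalities models in Table B2 (n = 391, with p read from the numerator degrees of freedom of each overall F-statistic). The function name `adjusted_r2` is illustrative and is not part of the original analysis.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted R^2 for a linear model with an intercept and p slope coefficients."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


# Values as reported in Table B2 (fatalities models, n = 391 cases).
n_cases = 391
reported = {
    "full": (0.3322, 36),      # (R^2, number of slope coefficients)
    "reduced": (0.2960, 15),
    "stepwise": (0.2666, 7),
}

for name, (r2, p) in reported.items():
    print(f"{name}: adjusted R2 = {100 * adjusted_r2(r2, n_cases, p):.2f}%")
```

The recomputed values agree with the adjusted R² column of Table B2 up to rounding of the reported R² values. The penalty grows with p, which is why the full model has the largest R² but not the largest adjusted R².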
[Figure 7 panels: Fatalities biplot (Comp.1 vs. Comp.2) with vectors TOTKLD, CARS, CARSDMG, CARSHZD, TEMP, TRNSPD, EVACUATE, TONS, HEADEND1, Latitude, Longitud; Total Damage biplot (Comp.1 vs. Comp.2) with vectors ACCDMG, CARS, CARSDMG, CARSHZD, TEMP, TRNSPD, EVACUATE, TONS, HEADEND1, Latitude, Longitud]
1.4. Hypothesis
Based on my observations in Figures 2, 3, and 4, I have three hypotheses regarding the severity of rail accidents:
Hypothesis 1:
H0: Season does not lead to more deaths.
H1: Season leads to more deaths.
Hypothesis 2:
H0: Type of accident does not affect damage cost.
H1: Type of accident affects damage cost.
Hypothesis 3:
H0: Cause of accident does not affect damage cost.
H1: Cause of accident affects damage cost.
2. Approach
2.1. Data
The data set is obtained from the FRA railroad accident data for 2001 – 2012 [3]. In total, there are 42033 accidents over the 12 years with 140 relevant variables. I find that 20% of the data points are duplicates. I also find one data point with an extreme evacuation value in 2002: the reported number is 50000 people evacuated in one accident, which is very unlikely. I treat this value as a typo because it is very large compared to the other cases (Figure B1, Appendix B). Thus, I exclude the duplicates and this data point from the analysis. I also exclude data points from September 11, 2001, because the chance of such an event happening again is almost zero and its damage cost is far higher than in the other cases. This leaves 30973 data points for this study.
Since my interest is in severe accidents, only extreme cases above the upper whisker of the boxplots of the severity metrics are taken into account, that is, accidents with at least one fatality (for the fatalities model) and accidents with damage cost of at least $143,861 (for the total damage model). I use all potential predictor variables, including confounding variables, in the initial models. In total, there are 21 predictors: 10 continuous variables and 11 categorical variables, including SEASON, which I created as a new variable (Table A1, Appendix A). I remove cases with missing values since the methodology requires complete observations. In total, I use 391 cases to model fatalities (TOTKLD) and 2954 cases to model total damage (ACCDMG).
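The "above the upper whisker" rule can be sketched as follows. This is a minimal illustration on made-up numbers, not the FRA data: it assumes the common boxplot convention of an upper fence at Q3 + 1.5 × IQR, and it uses the quartile definition of Python's `statistics.quantiles`, which may differ slightly from the hinges R's boxplot uses. The helper names and both sample lists are hypothetical.

```python
import statistics


def upper_fence(values):
    """Upper boxplot fence: Q3 + 1.5 * IQR (quartiles by the 'inclusive' method)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return q3 + 1.5 * (q3 - q1)


def severe_cases(values):
    """Keep only the observations that lie strictly above the upper fence."""
    fence = upper_fence(values)
    return [v for v in values if v > fence]


# Hypothetical damage costs (in $1,000) and fatality counts, for illustration only.
damages = [5, 8, 10, 12, 15, 18, 20, 25, 30, 400]
fatalities = [0, 0, 0, 0, 0, 0, 0, 0, 2, 0]

print(upper_fence(damages), severe_cases(damages))
print(upper_fence(fatalities), severe_cases(fatalities))
```

When most accidents have zero fatalities, both quartiles are zero, so the fence is zero and any accident with at least one fatality counts as severe. This matches the "at least one fatality" criterion described above.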
2.2. Analysis
To model the severity metrics, I perform multiple linear regression analysis in R with the general model
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon$$
The stages of data analysis are as follows:
1. I convert all categorical predictor variables into dummy variables. For example, TYPE has 13 levels, and R automatically encodes these 13 levels as 12 dummy variables with derailment as the base case. See Table A1 in Appendix A for the base case selected for each categorical variable.
2. I fit a simple linear regression for each of the 21 potential predictor variables. A predictor with p-value > 0.25 is not considered in the initial model (Full Model).
3. I reduce the full models by dropping all the non-significant predictors (Reduced Model). I use a partial F-test to examine whether the smaller set of predictors can be retained.
4. I also perform an alternative model selection, i.e. a stepwise selection procedure, to select important predictors in the full model (Step Model).
5. I then compare the reduced model and the step model by adjusted R² and AIC to select the best model. I cannot use cross-validation for model comparison because, on some folds, certain levels of a factor variable are not present in the data used to fit the regression model.
6. I introduce second order terms and interaction terms for the selected model.
7. I use graphical diagnostic plots to examine how well the regression assumptions are satisfied.
8. I transform the response variable if the regression assumptions are violated.
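Step 1 can be illustrated with a small sketch of treatment (dummy) coding. It is written in plain Python rather than R, purely to show the idea: a factor with k levels becomes k - 1 indicator columns, and the base case is the row of all zeros. The function `dummy_encode` is a hypothetical helper, not the routine R uses internally.

```python
def dummy_encode(values, base):
    """Treatment coding: one 0/1 column per level except the base level."""
    levels = sorted(set(values))
    if base not in levels:
        raise ValueError(f"base level {base!r} does not occur in the data")
    columns = [level for level in levels if level != base]
    rows = [[1 if value == level else 0 for level in columns] for value in values]
    return columns, rows


# The 13 TYPE codes from Table 2, with derailment (code 1) as the base case.
type_codes = list(range(1, 14))
columns, rows = dummy_encode(type_codes, base=1)

print(len(columns))   # number of dummy variables
print(rows[0])        # derailment: all zeros
print(rows[6])        # hwy-rail crossing (code 7): a single 1
```

With this coding, each fitted coefficient for a TYPE level is a difference from derailment, which is how the "base case" rows in the coefficient tables are read.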
For the fatalities model, there are 10 predictor variables in the full model (Table B1, Appendix B). This model can be reduced by dropping 7 variables, i.e. TRNSPD, TONS, HEADEND1, TYPE, TYPTRK, TRKCLAS, and CAUSE (partial F-statistic 0.913, p-value 0.5743). BIC favors the step model, which has the smaller BIC value (881.76), but AIC and adjusted R² both favor the reduced model (Table B2, Appendix B). Next, I consider a second order model, including interaction terms, built on the reduced model. The second order model does not fit better than the first order model: a partial F-test gives F = 1.39 with p-value 0.24. This means that the interaction terms and the second order term of EVACUATE are not important, and the first order model is preferable. Diagnostic plots show that the fitted model moderately violates the regression assumptions. The residuals are generally scattered randomly across the range of fitted values, and the points generally fall around the line in the QQ plot (Figure B2, Appendix B). Transforming the response variable with the Box-Cox method does not improve the fit (Figure B3, Appendix B), so the fitted model without interaction and second order terms is chosen for ease of interpretation. Table 3 summarizes the estimated coefficients (standard errors) and the corresponding p-values for the first and second order models.
Table 3. Comparison of the first and second order models for fatalities
                                        First order model                  Second order model
                                        Estimate (Std. Error)   P-value    Estimate (Std. Error)   P-value
(Intercept)                             1.07 (0.08)             <0.0001    1.07 (0.08)             <0.0001
EVACUATE                                0.001 (0)               <0.0001    0 (0.002)               0.87
Type of consist (base case: TYPEQ 1)
TYPEQ 2                                 0.46 (0.09)             <0.0001    0.46 (0.09)             <0.0001
TYPEQ 3                                 0.1 (0.14)              0.46       0.09 (0.14)             0.51
TYPEQ 4                                 -0.3 (0.7)              0.67       -0.3 (0.7)              0.67
TYPEQ 6                                 -0.07 (0.7)             0.92       -0.07 (0.7)             0.92
TYPEQ 7                                 0.19 (0.35)             0.58       0.18 (0.35)             0.60
TYPEQ 8                                 0.02 (0.2)              0.93       0.01 (0.2)              0.95
TYPEQ 9                                 -0.3 (0.5)              0.55       -0.3 (0.5)              0.55
TYPEQ A                                 -0.3 (0.7)              0.67       -0.3 (0.7)              0.67
TYPEQ C                                 0.08 (0.35)             0.82       0.1 (0.35)              0.78
TYPEQ D                                 0.43 (0.41)             0.29       0.44 (0.41)             0.28
TYPEQ E                                 -0.07 (0.7)             0.92       -0.07 (0.7)             0.92
Season (base case: Spring)
Summer                                  0.23 (0.1)              0.02       0.22 (0.1)              0.03
Autumn                                  -0.07 (0.11)            0.51       -0.06 (0.11)            0.61
Winter                                  0.03 (0.1)              0.74       0.03 (0.1)              0.80
Second order and interaction terms (second order model only)
EVACUATE^2                                                                 4.7E-7 (1.8E-6)         0.80
EVACUATE x Summer                                                          0.003 (0.004)           0.43
EVACUATE x Autumn                                                          0.0001 (0.003)          0.97
EVACUATE x Winter                                                          -0.001 (0.009)          0.95
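As a worked check, the partial F-statistic reported for dropping the seven variables from the full fatalities model can be approximated from the R² values in Table B2. This assumes the standard R² form of the partial F-statistic for nested linear models, where q is the number of dropped coefficients (36 - 15 = 21) and 354 is the residual degrees of freedom of the full model:

```latex
F = \frac{(R^2_{\text{full}} - R^2_{\text{reduced}})/q}{(1 - R^2_{\text{full}})/(n - p - 1)}
  = \frac{(0.3322 - 0.2960)/21}{(1 - 0.3322)/354}
  \approx \frac{0.001724}{0.001886}
  \approx 0.914
```

This agrees with the reported F-statistic of 0.913 up to rounding of the R² values.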
For the total damage model, 16 predictor variables are considered in the initial model (Table B1, Appendix B). Seven variables are not significant in the full model, so only 9 predictors are kept in the reduced model, i.e. CARSHZD, EVACUATE, TRNSPD, TONS, TYPE, TRNDIR, REGION, TYPTRK, and CAUSE. A partial F-test does not reject the reduced model in favor of the full model at the 5% level (F-statistic 1.74, p-value 0.08). BIC favors the reduced model, which has the smaller BIC value (87957.48). However, I select the model from the stepwise selection procedure as the best model, since its AIC is smaller and its adjusted R² is larger than those of the reduced model (Table B3, Appendix B). As shown in Table 4, the stepwise model extended with second order and interaction terms (Model 2) fits significantly better (p-value < 0.0001). Model 2 is then reduced by another stepwise selection. Some interaction and second order terms can be dropped from the model, so Model 3 is retained (p-value = 0.99).
Table 4. Partial F-test
Model                                                                                       Res. Df   RSS        Df   Sum of Squares   F-test   P-value
Model 1: Step model (first order model)                                                     2906      1.32E+15
Model 2: Step model including second order and interaction terms (second order model)      2883      1.24E+15   23   7.80E+13         7.89     <0.0001
Model 3: Model 2 after performing stepwise selection procedure                              2886      1.24E+15   3    4.35E+10         0.03     0.99
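The F values in Table 4 can be recomputed from the tabulated sums of squares. The sketch below assumes the usual partial F-statistic for nested linear models: the drop in residual sum of squares per dropped parameter, divided by the residual mean square of the larger model (here Model 2, with 2883 residual degrees of freedom). The function name `partial_f` is illustrative, and the inputs are the rounded values from the table, so the results match only to about two significant digits.

```python
def partial_f(ss_diff: float, df_diff: int, rss_large: float, df_resid_large: int) -> float:
    """Partial F-statistic for two nested linear models.

    ss_diff        -- difference in residual sum of squares between the models
    df_diff        -- number of parameters by which the models differ
    rss_large      -- residual sum of squares of the larger model
    df_resid_large -- residual degrees of freedom of the larger model
    """
    return (ss_diff / df_diff) / (rss_large / df_resid_large)


# Rounded values from Table 4; Model 2 is the larger model in both comparisons.
f_model1_vs_model2 = partial_f(7.80e13, 23, 1.24e15, 2883)
f_model3_vs_model2 = partial_f(4.35e10, 3, 1.24e15, 2883)

print(f"Model 1 vs Model 2: F = {f_model1_vs_model2:.2f}")
print(f"Model 3 vs Model 2: F = {f_model3_vs_model2:.2f}")
```

A large F (first comparison) means the added second order and interaction terms reduce the residual sum of squares by much more than chance would explain. An F near zero (second comparison) means the three terms removed by the stepwise step carry almost no explanatory power.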
The diagnostic plots show that the selected model (Model 3) moderately violates the regression assumptions (Figure B4, Appendix B). As with the fatalities model, transforming the response variable with the Box-Cox method does not improve the fit (Figure B5, Appendix B). Therefore, the fitted model with second order terms and no transformation of the response variable is chosen for ease of interpretation. Table 5 summarizes the estimated coefficients, standard errors, and p-values.
Table 5. The selected second order model for total damage
Estimate (Std. Error) P-value
Intercept 746000 (393000) 0.058
CARSDMG 8660 (6180) 0.161
CARSHZD 110000 (31500) <0.0001
EVACUATE 283 (102) 0.006
TRNSPD -24100 (4930) <0.0001
TONS 26.9 (4.12) <0.0001
Type of accident (base case: derailment)
Head on collision 1470000 (134000) <0.0001
Rearend collision 465000 (106000) <0.0001
Side collision 251000 (73000) 0.001
Raking collision 111000 (140000) 0.429
Broken train collision -199000 (221000) 0.369
Hwy-rail crossing -302000 (82000) <0.0001
RR Grade Crossing 6760000 (658000) <0.0001
Obstruction 242000 (123000) 0.049
Explosive – detonation 266000 (658000) 0.686
Fire / violent rupture -56300 (113000) 0.620
Other impacts 104000 (73000) 0.155
Others -47600 (111000) 0.667
Train direction (base case: north)
South -70600 (65200) 0.279
East -116000 (61000) 0.058
West -158000 (63900) 0.013
FRA designated region (base case: Region 1)
Region 2 -71000 (107000) 0.509
Region 3 -260000 (107000) 0.015
Region 4 -133000 (103000) 0.196
Region 5 -207000 (96700) 0.033
Region 6 -196000 (98600) 0.046
Region 7 81800 (107000) 0.443
Region 8 -108000 (105000) 0.306
Type of consist (base case: TYPEQ “”-NA)
TYPEQ 1 -225000 (383000) 0.556
TYPEQ 2 -18500 (392000) 0.962
TYPEQ 3 562000 (417000) 0.177
TYPEQ 4 -411000 (409000) 0.315
TYPEQ 5 -426000 (440000) 0.333
TYPEQ 6 83500 (398000) 0.834
TYPEQ 7 -85300 (384000) 0.824
TYPEQ 8 -54500 (401000) 0.892
TYPEQ 9 -117000 (426000) 0.783
TYPEQ A -39800 (418000) 0.924
TYPEQ B 560000 (610000) 0.359
TYPEQ D -502000 (611000) 0.411
Type of track (base case: Main)
Yard -83900 (64900) 0.196
Siding 377000 (132000) <0.0001
Industry -27800 (110000) 0.801
Cause of accident (base case: E)
H -340000 (76400) <0.0001
M -12400 (76100) 0.871
S -214000 (195000) 0.271
T -253000 (68400) <0.0001
CARSHZD^2 -2820 (1230) 0.022
TRNSPD^2 232 (42.3) <0.0001
TONS^2 0 (0) 0.032
Longitud^2 12.3 (3.92) <0.0001
TRNSPD x South 96.2 (2410) 0.968
TRNSPD x East 7690 (2290) <0.0001
TRNSPD x West 7900 (2400) <0.0001
TRNSPD x Region 2 2290 (3780) 0.545
TRNSPD x Region 3 19500 (3770) <0.0001
TRNSPD x Region 4 12600 (3550) <0.0001
TRNSPD x Region 5 15700 (3370) 0.000
TRNSPD x Region 6 17300 (3400) <0.0001
TRNSPD x Region 7 9200 (3690) 0.013
TRNSPD x Region 8 12100 (3690) <0.0001
TRNSPD x Yard -7370 (5980) 0.218
TRNSPD x Siding -25500 (7220) <0.0001
TRNSPD x Industry -12200 (10200) 0.231
TRNSPD x H 16400 (2740) <0.0001
TRNSPD x M 4740 (2490) 0.056
TRNSPD x S 4680 (10200) 0.646
TRNSPD x T 14800 (2190) <0.0001
3. Evidence
I find that season is an important factor in fatalities. The partial F-test shows that season cannot be eliminated from the model (F-statistic: 3.49, p-value: 0.016). The p-value for summer is 0.03, so I have evidence to reject my null hypothesis at the 5% level. The coefficient indicates that the number of fatalities is higher during summer. The summer effect on fatalities is estimated at about 0.22, with a 95% confidence interval between 0.04 and 0.4. Based on the final model for fatalities, TYPEQ is another important factor in the number of deaths.
For total damage, cause and type of accident are important factors. Different causes and different types of accident lead to different damage cost, and the differences are statistically significant. At the 5% level, these effects cannot be dropped from the model (F-statistic 5.82, p-value < 0.0001). Therefore, I reject my null hypotheses that cause and type of accident do not affect total damage. Train speed and cause of accident also have an interaction effect on damage cost (Figure B6, Appendix B). This means that the relationship between total damage and cause of accident depends on train speed. At high train speed, human error causes more damage cost. Furthermore, with the other factors held fixed, the expected total damage for a severe accident is higher at an RR Grade Crossing, i.e. $7,506,000, and the effect is highly significant at the 5% level.
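The two numerical statements in this section can be traced back to the coefficients in Table 5. The sketch below is a hedged illustration using the rounded estimates from that table. It assumes the $7,506,000 figure is the intercept plus the RR Grade Crossing coefficient, with every continuous predictor at zero and every other factor at its base case. The constant and function names are illustrative.

```python
# Rounded estimates taken from Table 5 (total damage model, in dollars).
INTERCEPT = 746_000
RR_GRADE_CROSSING = 6_760_000   # type of accident, relative to derailment
CAUSE_H = -340_000              # human factors, relative to cause E
TRNSPD_X_CAUSE_H = 16_400       # extra dollars per mph for cause H vs. cause E


def baseline_damage_rr_grade_crossing() -> int:
    """Fitted damage at an RR grade crossing with all other terms at zero/base case."""
    return INTERCEPT + RR_GRADE_CROSSING


def human_vs_mechanical_gap(speed_mph: float) -> float:
    """Fitted damage for cause H minus cause E at the same speed, all else equal."""
    return CAUSE_H + TRNSPD_X_CAUSE_H * speed_mph


# Speed above which the fitted damage for cause H exceeds that for cause E.
break_even_speed = -CAUSE_H / TRNSPD_X_CAUSE_H

print(baseline_damage_rr_grade_crossing())   # 7506000
print(human_vs_mechanical_gap(10), human_vs_mechanical_gap(60))
print(round(break_even_speed, 1))
```

Under these assumptions the fitted gap between human-factor and mechanical causes is negative at low speed and positive at high speed, which is what an interaction between train speed and cause means in practice. The break-even speed depends on the rounded coefficients and should be read as approximate.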
4. Recommendation
The evidence shows that several factors can lead to severe train accidents, including season, type of accident, and cause of accident. The best models for my hypotheses explain about 25% of the variation in the severity of rail accidents, based on the adjusted R² (Tables B2 and B3, Appendix B). At the 5% level, the effect of season on fatalities is statistically significant. The summer effect on fatalities is estimated at about 0.22, with a 95% confidence interval between 0.04 and 0.4. The effects of type of accident and cause of accident on total damage are also significant. At the 5% level, these factors cannot be eliminated from the model, so they are important to the severity of train accidents. This confirms my findings from the plots in Figures 2, 3, and 4. The results suggest that the FRA should add extra safety requirements for trains running during summer.
Human errors are often unavoidable, and modeling the damage cost shows that human error is one of the most important factors in higher damage cost. The FRA should train its people well in safety so that human error can be minimized. In addition, it is important to put greater safety measures in place for trains at RR Grade Crossings.
5. References
[1] D. E. Brown and L. Barnes, “Laboratory 1: Train accidents,” August 2013, assignment in class SYS 4021.
[2] D. E. Brown and L. Barnes, “Laboratory 1: Train accidents template,” August 2013, assignment in class SYS 4021.
[3] Federal Railroad Administration, “Federal Railroad Administration Office of Safety Analysis,” August 2012. [Online]. Available: http://safetydata.fra.dot.gov/officesafety/
Appendix A
Table A1. Accident description
No Field Name Description Type
1 TOTKLD Fatalities - total killed for railroads Response variable
2 ACCDMG Total reportable damage on all reports in $ Response variable
3 CARS # of cars carrying hazmat Continuous variable
4 CARSDMG # of hazmat cars damaged or derailed Continuous variable
5 CARSHZD # of cars that released hazmat Continuous variable
6 EVACUATE # of persons evacuated Continuous variable
7 TEMP Temperature in degrees Fahrenheit Continuous variable
8 TRNSPD Speed of train in miles per hour Continuous variable
9 TONS Gross tonnage, excluding power units Continuous variable
10 HEADEND1 # of head end locomotives Continuous variable
11 Latitude Latitude in decimal degrees, explicit decimal, explicit +/- (WGS84) Continuous variable
12 Longitud Longitude in decimal degrees, explicit decimal, explicit +/- (WGS84) Continuous variable
13 TYPE Type of accident: 01 = derailment (base case), 02 = head on collision, 03 = rearend collision, 04 = side collision, 05 = raking collision, 06 = broken train collision, 07 = hwy-rail crossing, 08 = RR Grade Crossing, 09 = obstruction, 10 = explosive – detonation, 11 = fire / violent rupture, 12 = other impacts, 13 = other (described in narrative) Categorical variable
14 VISIBLTY Daylight period: 1 = dawn (base case), 2 = day, 3 = dusk, 4 = dark Categorical variable
15 WEATHER Weather conditions: 1 = clear (base case), 2 = cloudy, 3 = rain, 4 = fog, 5 = sleet, 6 = snow Categorical variable
16 TRNDIR Train direction: 1 = north (base case), 2 = south, 3 = east, 4 = west Categorical variable
17 REGION FRA designated region (1 = base case) Categorical variable
18 TYPEQ Type of consist: 1 = freight train (base case), 2 = passenger train, 3 = commuter train, 4 = work train, 5 = single car, 6 = cut of cars, 7 = yard / switching, 8 = light loco(s), 9 = maint / inspect car, A = spec. MoW q Categorical variable
19 TYPTRK Type of track: 1 = main (base case), 2 = yard, 3 = siding, 4 = industry Categorical variable
20 TRKCLAS FRA track class: 1-9, X (1 = base case) Categorical variable
21 RCL Remote control locomotive: 0 = not a remotely controlled operation (base case), 1 = remote control portable transmitter, 2 = remote control tower operation, 3 = remote control portable transmitter (more than one remote control) Categorical variable
22 CAUSE Primary cause of incident: E = Mechanical and Electrical Failures (base case), H = Human Factors, M = Miscellaneous Causes, S = Signal and Communication, T = Track, Roadbed and Structures Categorical variable
23 SEASON Season of accident: 1 = spring (Mar – May) (base case), 2 = summer (Jun – Aug), 3 = autumn (Sep – Nov), 4 = winter (Dec – Feb) Categorical variable
Appendix B
Table B1. P-value of the overall F-statistic in simple regression model
Predictor Fatalities Total Damage
CARS 0.96 0.89
CARSDMG 0.40 0.00
CARSHZD 0.86 0.00
EVACUATE 0.00 0.00
TEMP 0.27 0.74
TRNSPD 0.01 0.00
TONS 0.13 0.00
HEADEND1 0.10 0.81
Latitude 0.82 0.00
Longitud 0.71 0.00
factor(TYPE) 0.00 0.00
factor(VISIBLTY) 0.99 0.24
factor(WEATHER) 0.34 0.69
factor(TRNDIR) 0.83 0.00
factor(REGION) 0.28 0.00
factor(TYPEQ) 0.05 0.00
factor(TYPTRK) 0.00 0.00
factor(TRKCLAS) 0.17 0.00
factor(RCL) NA 0.00
factor(CAUSE) 0.01 0.05
factor(SEASON) 0.05 0.46
*NA: cannot be estimated since only one level available under RCL variable for fatalities model
Table B2. Model comparison for fatalities (response variable: TOTKLD)
Predictor variables
  Full Model: EVACUATE, TRNSPD, TONS, HEADEND1, TYPE, TYPEQ, TYPTRK, TRKCLAS, CAUSE, SEASON
  Reduced Model: EVACUATE, TYPEQ, SEASON
  Stepwise Model: EVACUATE, TRNSPD, CAUSE, SEASON
               Full Model   Reduced Model   Stepwise Model
R²             33.22%       29.6%           26.66%
adjusted R²    26.43%       26.79%          25.32%
AIC            867.419      846.043         846.044
BIC            1018.23      913.51          881.76
Overall significance
  Full Model: F-statistic: 4.892 on 36 and 354 DF, p-value: 8.785e-16
  Reduced Model: F-statistic: 10.51 on 15 and 375 DF, p-value: < 2.2e-16
  Stepwise Model: F-statistic: 19.89 on 7 and 383 DF, p-value: < 2.2e-16
Partial F-test, Full vs. Reduced Model: F: 0.913, p-value: 0.5743
Table B3. Model comparison for total damage (response variable: ACCDMG)
Predictor variables
  Full Model: CARSDMG, CARSHZD, EVACUATE, TRNSPD, TONS, Latitude, Longitud, TYPE, VISIBLTY, TRNDIR, REGION, TYPEQ, TYPTRK, TRKCLAS, RCL, CAUSE
  Reduced Model: CARSHZD, EVACUATE, TRNSPD, TONS, TYPE, TRNDIR, REGION, TYPTRK, CAUSE
  Stepwise Model: CARSDMG, CARSHZD, EVACUATE, TRNSPD, TONS, Longitud, TYPE, TRNDIR, REGION, TYPEQ, TYPTRK, CAUSE
               Full Model   Reduced Model   Stepwise Model
R²             27.2%        25.14%          26.67%
adjusted R²    25.61%       24.29%          25.49%
AIC            87725.23     87747.80        87714.55
BIC            88114.63     87957.48        88008.11
Overall significance
  Full Model: F-statistic: 17.14 on 63 and 2890 DF, p-value: < 2.2e-16
  Reduced Model: F-statistic: 29.71 on 33 and 2920 DF, p-value: < 2.2e-16
  Stepwise Model: F-statistic: 22.49 on 47 and 2906 DF, p-value: < 2.2e-16
Partial F-test, Full vs. Reduced Model: F: 1.74, p-value: 0.08
Figure B1. Boxplot for fatalities and number of people evacuated for each year to identify potential outliers
[Figure B1 panel: Number of people evacuated in each year; x-axis: Year (2001 to 2012); y-axis: Number of People Evacuated]
Figure B2. Diagnostic plot for the selected fatalities model before transformation
[Figure B2 panels: Residuals vs Fitted; Normal Q-Q; Scale-Location; Residuals vs Leverage]
Figure B3. Diagnostic plot for fatalities model after transformation with Box-Cox method (lambda = -2)
[Figure B3 panels: Residuals vs Fitted; Normal Q-Q; Scale-Location; Residuals vs Leverage]
Figure B4. Diagnostic plot for the selected total damage model before transformation
[Figure B4 panels: Residuals vs Fitted; Normal Q-Q; Scale-Location; Residuals vs Leverage]
Figure B5. Diagnostic plot for the selected total damage model after transformation with Box-Cox method (lambda = -0.5)
[Figure B5 panels: Residuals vs Fitted; Normal Q-Q; Scale-Location; Residuals vs Leverage]
Figure B6. Interaction plot of train speed and cause with damage cost of accident
[Figure B6 axes: TRNSPD (x), ACCDMG (y); one line per CAUSE: M, E, H, S, T]