Master Thesis in Statistics, Data Analysis and Knowledge Discovery
Isolating and quantifying factors affecting body and paint business for Volvo Cars
Muhammad Awais Khan
Abstract
This thesis focuses on identifying the degree of contribution of the most important factors affecting the Body and Paint business of Volvo Car Corporation in Sweden. The Body and Paint business for VCCS depends directly on the number of registered accidents. Our major purpose is to determine the factors which have a direct or indirect effect on the reduction in the number of accidents in Sweden and to analyze to what degree they may affect the business. During interviews with senior staff members, we found that city safety cars in particular were mentioned by most of the specialists. Other important factors highlighted were mileage, weather, company car/private car and age of the car.
City Safety is a technology designed to help the driver mitigate, and in certain situations avoid, collisions at low speed by automatically braking the vehicle. The estimated claim frequency, i.e. claims per contract, was 50% lower for City Safety equipped cars than for other warranty car models without the system. The study also analyzes the effect of rain, mean temperature and snow on Volvo body part sales in Stockholm, Sweden. Temperature and snow had a significant impact on road accidents. Snow was shown to be the leading variable, as the number of accidents increases sharply with increased snowfall. Temperature is the second most important variable: as the temperature decreases by 1 °C, the sales of the body and paint business in Stockholm increase by 1.6%.
Time variables such as weekday, month and year also play a significant role in this model. On Fridays, 51% more accidents are expected than on Sundays.
Acknowledgements
I would like to thank everyone who helped me during the development of this thesis
paper; in particular, my supervisor Prof. Anders Nordgaard. I would also like to thank
Mattias Villani and Oleg for providing guidance and support, as well as giving me
suggestions for improvements.
I would also like to thank Bart Smits for introducing such an interesting topic and
providing me an opportunity to work with it. His comments and suggestions were very
helpful in researching and understanding the business. I would also like to thank Mikael
Thorin of Volvia, Gothenburg, for providing us with the necessary research data.
Table of contents
1 Introduction
1. Background
Objective
Outline of Thesis
2 Problem discussion
3 Data
1. Data sources
2. Raw data
1. City safety vs. Non-City safety cars dataset
2. Weather and parts sales dataset
3. Decision tree dataset
4 Methods
1. Effect of City Safety on Accidents
2. Effect of Weather on Body & Paint parts sales
3. Decision Tree for count data
5 Results
1. Effect of city safety technology on accidents
2. Effect of Weather and time variables
3. Decision Tree for count data
6 Discussion
1. Effect of city safety technology on accidents
2. Effect of Weather and time variables
7 Conclusions
8 Literature
9 Appendix
1. Effect of city safety technology on accidents (Graphs and Tables)
2. Effect of Weather and time variables (Graphs and Tables)
1 Introduction
1. Background
Volvo was founded in Sweden in 1927. Today, Volvo Car Corporation (VCC) is a relatively small player in the global car industry, with 373,525 cars sold in 2010 (Tannou & Westerman, 2012).
The Volvo Body and Paint business plays an important role in the company’s business and contributes substantially to sales in VCC. According to Figure 1, the Body and Paint business contributes 32.6% of annual sales in Sweden and 32.8% of annual sales in the rest of the world.
Volvo's business has expanded throughout the world, with its largest markets for cars in the USA, Sweden, China, Germany, the UK and Belgium.
The core of Volvo's business is the selling of cars and car parts. The main focus of this study, however, lies solely with the latter. The car parts are divided into various groups and subgroups, each of which contributes to the business in its own way. Each subgroup is categorized according to its function; for example, the Body and Paint subgroup has various function groups, such as radiators, headlamps, windscreens, painted bumpers and unpainted bumpers. Apart from function groups, the parts also have specific names and numbers, through which they are easily identifiable in the system. The contribution of the Body and Paint subgroup to Volvo's overall sales worldwide and in Sweden during 2010 and 2011 (with the contribution of some important function groups) is shown in the pie chart in Figure 1. It is clear from the bar line plot that each function group decreases at a varying percentage rate.
One of the most important function groups in the Body and Paint group, for example, is “painted bumper”, which constitutes, according to Figure 1, roughly 5% of the group sales.
Compared with 2010, Volvo cars (insurance and warranty cars) had 6% fewer accidents in 2011, which consequently resulted in a 16% decrease in Body and Paint sales in 2011. According to [European accident research and safety report 2013], in 2009 around 35,500 fatalities were registered in traffic accidents in European countries. This was a 10% decrease compared to the previous year (2008), which can be considered a positive sign for Volvo's city safety cars department, which strives to achieve “crash-free” results by 2020 [Toward crash free cars: Volvo car corporation].
Since the Body and Paint branch constitutes a significant part of the company's overall sales, it is important for Volvo Car Corporation to identify the possible causes behind the decrease in the number of accidents during 2011 and to isolate and quantify the factors which have a direct effect on the decrease in Volvo body and paint part sales.
Figure 1 (Importance of Body and Paint Business)
Objective
The overall objective of the thesis is to identify the important factors affecting Volvo
Body and Paints business, as well as to isolate and quantify the effect of each factor on
the sales of the parts in Sweden.
The specific aims of thesis are:
1. Isolate important factors affecting Body and Paint business.
2. Quantify the effect of each factor on Body and Paint business.
3. Make a comparison between city safety cars vs other car models.
Outline of Thesis
The thesis is divided into several studies.
The first study is centered on the comparison between city safety cars and non-city safety
cars, to analyze and identify whether there is any difference in the number of accidents
between the two types of car model.
The second study focuses on the relationships between Volvo Body and Paint parts sales
in Stockholm, air temperature, precipitation and time variables.
The final study is a tree representation for the total number of accidents in 2010 and 2011
used to identify the categorical factors which play an important role in causing the
accidents.
2 Problem discussion
It is important to understand the Body and Paint business and to analyze the relationship between accidents and the external factors behind them (weather, mileage, city safety, etc.). The point to consider is that VCCS body parts sales are directly proportional to the number of accidents, as an accident potentially results in the car owner contacting a Volvo dealer in order to arrange the repair.
3 Data
1. Data sources
The data for both monthly and daily sales for the VCCS business were obtained from Volvia, which is based in Gothenburg, Sweden. Volvia is a Swedish insurance company that specializes in Volvo cars and has more than 30% of Volvo car owners as customers.
Alongside Volvo car owners, Volvia provides services to customers owning Renault, Land Rover and Jaguar cars.
Each year Volvia arranges around 400,000 insurance contracts for Volvo cars. The customers are either insurance-based or warranty-based.
2. Raw data
1. City safety vs. Non-City safety cars dataset
In this study, we use data from Sweden for the period January 2010 to December 2011 and compare city safety cars with non-city safety cars (warranty cars and insurance cars).
Figure 2 (A subset of the monthly dataset)
The screenshot shown in Figure 2 contains a part of the data, in which accidents are considered the response variable, and other factors such as contracts, year and month are predictors that will be considered in the model building process.
2. Weather and parts sales dataset
In order to see the effect of weather and time variables on the overall sales in Sweden, it is relatively difficult to consider the whole data, as there is no linkage between daily weather and the parts sales throughout the whole of Sweden. So it was important, instead, to turn attention to a proxy. Figure 3 contains a comparison between Stockholm monthly sales data and sales data of the entire Sweden. From Figure 3 it becomes clear that Stockholm data can be used as a proxy for the whole country, as there is a high degree of similarity in peaks and troughs between the two plots.
Figure 3 (Total claims vs. Stockholm Claims)
[Chart: Total Claims Vs Stockholm Claims; series: TClaims, Stock-VA]
Weather data was taken from Swedish National databases; temperature and precipitation were the common predictors, where the temperature was measured in °C and precipitation in millimeters.
Snow and rain might also have an effect on daily sales, and they can be derived from temperature and precipitation.
The following criteria were considered for transformation:
If temperature is less than -1.1 and precipitation is greater than 0, then mark precipitation as snow.
If temperature is greater than -1.1 and less than 3.3, then
Snow = (3.3 - temperature) / (precipitation * (3.3 - temperature))
Rain = precipitation - snow
Figure 4 (a subset of daily sales dataset)
Figure 4 shows a table screenshot of the daily parts sales dataset from Stockholm; various time and lagged weather variables are included in the table as well.
3. Decision tree dataset
The scope of this study covers car accidents that occurred in Sweden during the years 2010 and 2011. The data obtained from Volvia databases was relatively unclean; some of the variables had outliers and missing values. Hence data cleaning, which involves checking the completeness of data records, handling missing values and removing errors, was performed first.
The final dataset was organized around 8 predictors and 1 response variable, the frequency of accidents involving Volvo cars. The final dataset contains a total of 710 records. Each predictor is described in Table 1.
Table 1 (Description of variables)
Variable Name | Role | Variable type | Description
Accident frequency | Target | Count/Categorical | Frequencies of car accidents: 1: <400; 2: <300; 3: <200; 4: <100; 5: <50; 6: <10
Gender [Car Owner] | Input | Categorical | Sex of car owner (Male/Female)
Mileage | Input | Categorical | Distance traveled by the car in 1 year: 1: 0-10,000; 2: 10,000-15,000; 3: 15,000-20,000; 4: 20,000-25,000; 5: >25,000
Contracts | Input | Continuous | Total number of cars registered in that year
Car Type | Input | Binary | CC = company car; PC = private car
Age of car | Input | Continuous | Cars less than 6 years old were taken (car age = number of days)
Month | Input | Categorical | AS = 6, 7, 8, 9, 10, 11; DM = 12, 1, 2, 3; AJ = 4, 5
Weekday | Input | Categorical | 1-7 (Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday)
Year | Input | Binary | Year 2010 and 2011
4 Methods
1. Effect of City Safety on Accidents
As stated by Tannou & Westerman (2012), the decrease in the number of accidents was expected to be around 5% per year. According to the accident and contract data provided by several insurance companies (in particular, Volvia and IF AB Sweden), we found an almost 6% decrease in accidents in 2011 compared to 2010. The decrease in 2011 is due to some external and internal factors.
In Sweden, the winter season is a major factor, since Sweden often has cold weather and hence snowy or icy roads, which cause high numbers of accidents. It was also noted that the 2010 winter season was much harsher than that of 2011, as shown in Figure 5.
Figure 5 (Weekly temperature for Stockholm)
There are also some other important factors, e.g. the city safety system; this section focuses on the comparison between city safety and non-city safety cars, in order to analyze whether there are any differences in the number of accidents between the two types of car models.
[Chart: Stockholm Average Weekly Temperature; vertical axis: Temperature in °C]
Although many safety systems have previously been introduced by Volvo, in this thesis we look specifically at how City Safety works under certain circumstances.
City Safety technology works very effectively at low speeds, which resulted in a large decrease in accidents right after the technology was introduced on the market. We thus consider city safety technology an important factor, as it seems to have a strong influence on VCCS parts sales.
Figure 6 (city safety technology sensor)¹
¹ Martin Distner et al., 2008, City Safety - A system addressing rear-end collisions at low speeds, Volvo Cars Sweden
City Safety technology monitors the traffic in front of the car with the help of a laser sensor that is built into the upper section of the windscreen, as shown in Figure 6. It can detect the rear end of a vehicle in front of the car. If the driver is about to drive into the vehicle in front and does not react in time, the car brakes automatically. The technology targets everyday low-speed scenarios, such as traffic jams or entering roundabouts, situations where a large portion of collisions occur due to distracted drivers [Martin Distner et al., 2008].
The purpose of City Safety is to reduce the number of low-speed collisions, which mainly occur because of distraction or inattention of the driver. In the US, data from a sample of 100 cars were analyzed to investigate whether there is a relation between collisions and inattention or distraction of the drivers involved. According to the report (the first analysis of its kind, in which the researchers collected detailed information on a large number of near-crash events), nearly 80% of all analyzed crashes and 65% of all near-crashes involved driver inattention just prior to the onset of the conflict (Neale et al., 2005). Analyzing the UK national accident database (STATS19) for the period starting from 2005, Grover et al. (2007) found that in 44% of the situations the drivers took no avoiding action prior to the collision.
The objective of this section is to evaluate the effectiveness of city safety technology in terms of avoided crashes, by using real-life crash data. The rate of accidents in the cars equipped with City Safety was compared with the corresponding rate for other Volvo car models, such as older warranty cars 2010 and insurance cars.
As the response variable is count data (i.e. data that consist of count observations), there are a number of limitations related to the choice of a model for analyzing the effects of city safety technology.
The rate of accidents was estimated as the claim frequency per insured vehicle year:
$$r_{CS} = \frac{A_{CS}}{N_{CS}} \qquad \text{(Eq. 1)}$$
where
$A_{CS}$ = number of accidents with City Safety
$N_{CS}$ = number of registered cars with City Safety
The total number of accidents was aggregated by month, giving a sample size of 72 observations. SAS 9.2 was used to build Poisson and negative binomial regression models.
Poisson Regression
Poisson regression analysis is a technique used to model dependent variables that describe count data (Cameron et al., 1998). It is often applied to study the occurrence of small numbers of counts as a function of a set of predictor variables in experimental and observational studies in many disciplines, including Economy, Demography, Psychology, Biology and Medicine (Gardener et al., 1995).
In this case Poisson regression is used to model the relationship between the rate of accidents and potentially useful independent variables (year, month, car type). The number of accidents over contracts can be considered by using Poisson regression with contracts as an offset value. The 95% CI for the rate was calculated by using a normal approximation to this distribution.
Poisson regression often uses the log link function, which ensures that all of the predicted values of the dependent variable will be nonnegative (Montgomery et al., 2006). In particular, let $Y_i$, $i = 1, \dots, n$, be n random variables representing the dependent variable and $(x_{i1}, \dots, x_{ik})$, $i = 1, \dots, n$, be the corresponding values of the k independent variables. For Poisson regression, $Y_i$, $i = 1, \dots, n$, are modeled as independent Poisson variables. The expected value of $Y_i$ is linked to a linear function of the independent variables, using a log link function:
$$\log\big(E(Y_i \mid x_{i1}, \dots, x_{ik})\big) = \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}$$
The Poisson regression model with explanatory and offset variables can be represented as
$$\log(\mu_i) = \log(\text{contracts}_i) + \beta_0 + \beta_1 x_{i1} + \dots + \beta_k x_{ik}$$
Equation 1
where $\mu_i = E(Y_i)$ and $\beta_0, \dots, \beta_k$ are the parameters of the model. In Poisson regression there are categorical variables such as month, year and car type. Categorical variables are converted into dummy variables: a variable with ℓ levels gives ℓ − 1 binary variables, for example ℓ = 12 levels for the variable month.
Negative binomial Regression:
There might be some situations in which the Poisson regression does not fit the data well and the fitted data contain overdispersion. Overdispersion may arise when the variance of the response is greater than its mean. In order to overcome this problem and improve the results, negative binomial regression is used. In the negative binomial regression function, overdispersion is taken as a parameter, which accounts for the heterogeneity in count data. This model is a generalization of Poisson regression, where it predicts $\mu_i$ on the basis of $X_i$ and a dispersion parameter $\alpha$.
Note that if $\alpha = 0$ the model becomes the Poisson regression model. Thus negative binomial regression has higher flexibility than Poisson regression. Also, as $\alpha$ becomes smaller, the negative binomial model approaches the Poisson regression model (Colin Cameron & Pravin K. Trivedi, 1998).
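To make the role of the offset and of the dummy-variable coefficients concrete, the following is a minimal sketch and not the SAS model fitted in this thesis. It relies on a known property of the Poisson log-link model with a log(contracts) offset and a single categorical predictor: the maximum-likelihood fit reproduces each group's observed rate, so each coefficient is the log of a rate ratio against the reference group. The counts in `totals`, the function names and the variance form μ + αμ² (the common NB2 parameterization) are assumptions made for illustration.

```python
import math

# Hypothetical totals per car type: (accidents, contracts).
# Illustrative numbers only; they are not taken from the thesis data.
totals = {
    "warranty": (900, 45000),
    "insurance": (1200, 90000),
    "city_safety": (300, 30000),
}

def rate(accidents, contracts):
    """Accident rate as claims per contract."""
    return accidents / contracts

def log_rate_coefficients(totals, reference):
    """Intercept and group coefficients of a Poisson log-link model with a
    log(contracts) offset and one categorical predictor. In this special
    case the fit reproduces the observed group rates, so every coefficient
    is a log rate ratio against the reference group."""
    ref_rate = rate(*totals[reference])
    intercept = math.log(ref_rate)
    coefs = {g: math.log(rate(*v) / ref_rate) for g, v in totals.items()}
    return intercept, coefs

def expected_count(intercept, coef, contracts):
    """Inverse log link: mu = exp(log(contracts) + beta0 + beta_group)."""
    return math.exp(math.log(contracts) + intercept + coef)

def nb_variance(mu, alpha):
    """Assumed NB2 variance function; alpha = 0 gives the Poisson variance."""
    return mu + alpha * mu ** 2

intercept, coefs = log_rate_coefficients(totals, reference="warranty")
for group, (acc, con) in totals.items():
    mu = expected_count(intercept, coefs[group], con)
    print(f"{group:12s} beta={coefs[group]:+.3f} fitted={mu:.1f} observed={acc}")
```

With more than one predictor the coefficients no longer have this closed form and an iterative fit is needed, which is what a statistical package does.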
Regarding the comparison of model performance in terms of accident statistics, the XC60 model in particular appeared to be the most desired by Volvo customers. A report from the Highway Loss Data Institute (HLDI) (Russ Rader, 2011) indicates that Volvo XC60 midsize SUVs equipped with the standard City Safety system have far fewer registered accidents than vehicles without the safety feature. Table 2 shows accident rates of the insurance car pool and the warranty car pool.
Table 2 (Claims per contract for different car models)
Car Models | Claims 2010 | Claim/Contract 2010 | Index 2010 | Claims 2011 | Claim/Contract 2011 | Index 2011
XC90(275) | 1668 | 0.22 | 139 | 1380 | 0.19 | 102
C30(533) | 1498 | 0.20 | 124 | 1548 | 0.20 | 110
Others | 35338 | 0.16 | 102 | 32789 | 0.17 | 91
XC70(295) | 1829 | 0.16 | 98 | 1348 | 0.13 | 71
S40(644) | 1644 | 0.13 | 83 | 1143 | 0.11 | 62
S60 II(134) | 108 | 0.06 | 40 | 923 | 0.18 | 98
V60(155) | 28 | 0.04 | 25 | 1682 | 0.13 | 72
XC60(156) | 1314 | 0.16 | 100 | 2546 | 0.18 | 100
Total | 71978 | 0.17 | | 67414 | 0.16 |
Consider Figure 7, which graphs the accident rate for city safety cars and new warranty cars (model year 2010). The graph shows that the accident rates for the two types of cars are very alike, which suggests that the safety characteristics of the new car models are also improving, along with the technological developments in the field. Older warranty cars have a higher accident rate.
Figure 7 (Accident rate for different group cars)
[Chart: Claim/contract by month, 2010 to 2011 (model year 2010); series: Warranty2010, City Safety, Older W Cars]
2. Effect of Weather on Body & Paint parts sales
Weather is an important factor that directly affects accident statistics, and thus is related to changes in VCCS body and paint sales. During the winter season, sales of body and paint parts in Sweden increase; during the summer season, conversely, a much lower volume of parts is expected to be sold. Another aspect that contributes directly to notably higher sales during winter is the more serious damage that cars are exposed to in harsher winter conditions, which raises the probability of demand for expensive parts. Following the above, we can presuppose that any increase in the number of accidents can lead to an increase in Volvo’s Body and Paint sales.
According to several studies, snow appears to be a central cause of traffic chaos (see e.g. Thornes, 2005; London Assembly, 2009). The impact varies considerably from study to study; Smith (1982), for example, found an increase in accidents of just 2.2%, whereas other studies have reported almost a doubling of the accident rate (Codling, 1974; Andreescu and Frost, 1998; Suggett, 1999).
Tests show that the road surface reaches its most slippery condition when the temperature is close to zero degrees Celsius (Moore, 1975). Campbell (1986), however, researching the same topic in Winnipeg, Canada, found that the number of accidents in the temperature range below -15°C was surprisingly higher than in the range -15°C to 0°C. Snow, ice, rainfall, wind, fog and low sun are not the only factors that contribute to traffic accidents; even hot temperatures (>34°C) have been shown to be a contributing factor in Saudi Arabia (Nofal and Saeed, 1997). Other factors can also affect driving, for instance sudden illness (Lam and Lam, 2005) or drink driving (Meyhew et al., 1986; Horwood and Fergusson, 2000; Evans, 2004). Even superstition can play a significant role in causing an accident; Näyhä (2002) found that the risk of fatal accidents on Friday the 13th, compared with other Fridays, was higher by a factor of 1.63 for women and 1.02 for men. Fatal accidents among female drivers occurred most often in the temperature interval -3°C to 1°C, which coincides with slippery road conditions.
In addition, the reports show that a lot of other weather and time factors are quite
important as well, as harsh snowy March 2010 was said to have brought more accidents
than an average mild winter month. Cool, rainy spring makes people more careful with
regard to driving. During summer season, on the other hand, for short distances most
people prefer to use bicycles (going to a nearby mall, for example). Moreover, the fact
that sometimes people do not initiate the repair immediately after an accident results in
the occurrence of lagged time effects.
All of the above mentioned aspects are the important factors which have to be considered
within the framework of such research. Using subset algorithms, we can choose the most
appropriate factors for a particular model under the analysis.
Selecting the best subset model requires us to search for all possible subsets; e.g. if we
have (p-1) predictors, then the best subset algorithm constructs almost 2ᵖˉ¹ alternative
models.
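The growth in the number of candidate models can be illustrated with a short sketch. The predictor names are hypothetical and chosen only for the example; the point is that enumerating every subset of p − 1 predictors, including the empty model, yields 2ᵖˉ¹ candidates.

```python
from itertools import combinations

def all_subsets(predictors):
    """Yield every subset of the predictor names, including the empty set."""
    for size in range(len(predictors) + 1):
        for subset in combinations(predictors, size):
            yield subset

# Hypothetical predictor names, used only for illustration.
predictors = ["temperature", "snow", "rain", "weekday"]
subsets = list(all_subsets(predictors))

# Four predictors give 2 ** 4 candidate models.
print(len(subsets))
```

Each additional predictor doubles the count, which is why exhaustive search quickly becomes impractical.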
Model selection procedures, also known as subset selection or variable selection procedures, are executed in accordance with particular criteria, which allow identifying the most appropriate model. One criterion for selecting the appropriate parameters in the model is $R_p^2$, which can be defined as
$$R_p^2 = 1 - \frac{SSE_p}{SSTO}$$
For large values of n or for a multivariate dependent variable the method of generating all subsets becomes infeasible. Thus, polynomial stepwise (greedy) procedures have been proposed. These procedures are based on adding or deleting variables one at a time according to a specific criterion (R. Draper, 1966; R. Hocking, 1976; A. F. Seber, Sen and M. Srivastava). The Forward Selection procedure starts with no variable in the model and adds one variable at a time until either a stopping criterion is satisfied or all variables are selected. During each step, the variable with the largest single degree of freedom F-value among those eligible is considered for inclusion. That is, a variable is added to a p-factor regression equation if
$$F_i = \max_i \frac{RSS_p - RSS_{p+i}}{\hat{\sigma}^2_{p+i}} > F_{IN}$$
where (p + i) denotes the quantities computed when variable i is added to the current p-factor equation. The stopping criterion for the procedure is given by the specification of the quantity FIN.
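The forward selection loop described above can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the procedure run for this thesis: the partial F-value is computed in the usual form (RSS_p − RSS_{p+i}) / (RSS_{p+i} / (n − p − 2)), the threshold `f_in=4.0` is an arbitrary choice, and the small dataset is hypothetical.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def rss(columns, y):
    """Residual sum of squares of a least-squares fit with intercept."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(design, y)) for a in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2
               for row, yi in zip(design, y))

def forward_select(data, y, f_in=4.0):
    """Add the candidate with the largest partial F while it exceeds f_in."""
    selected, remaining = [], list(data)
    n = len(y)
    while remaining:
        rss_now = rss([data[v] for v in selected], y)
        best_name, best_f = None, 0.0
        for name in remaining:
            rss_new = rss([data[v] for v in selected + [name]], y)
            df = n - (len(selected) + 2)  # intercept + selected + candidate
            f_value = (rss_now - rss_new) / (rss_new / df)
            if f_value > best_f:
                best_name, best_f = name, f_value
        if best_name is None or best_f <= f_in:
            break
        selected.append(best_name)
        remaining.remove(best_name)
    return selected

# Hypothetical data: sales fall with temperature; "rain" adds almost nothing.
data = {
    "rain": [1.0, -1.0, 0.0, 0.0, 0.0, -1.0, 1.0],
    "temperature": [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0],
}
y = [116.0, 110.0, 104.0, 100.0, 94.0, 90.0, 86.0]
selected = forward_select(data, y)
print(selected)
```

The sketch does not guard against a perfect fit (zero residual sum of squares) or exhausted degrees of freedom; a production routine would need both checks.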
It is important to see the effect of weather and time variables on daily sales. For this, an appropriate model must be selected. Linear regression fits the dataset linearly, so it is important that the variables have a linear relationship with the response variable. A linear model can be stated as
$$Y = \alpha + \sum_{j} \beta_j X_j + \varepsilon$$
where α is the intercept and the βj are the linear coefficients of the X variables in multiple linear regression. In this model, lagged weather variables may have a certain effect on Body and Paint part sales in Stockholm, and they can be selected by the forward selection method.
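Lagged weather variables of the kind mentioned here can be built by shifting a daily series by a fixed number of days. The sketch below is only an illustration: the column names, the choice of one- and two-day lags and all values are hypothetical.

```python
def lagged(series, lag):
    """Shift a daily series by `lag` days; the first `lag` days have no value."""
    return [None] * lag + series[:len(series) - lag]

# Hypothetical daily snowfall (mm) and part sales, for illustration only.
snow = [0.0, 5.0, 12.0, 3.0, 0.0, 0.0]
sales = [40, 42, 55, 70, 61, 45]

rows = [
    {"sales": s, "snow": x0, "snow_lag1": x1, "snow_lag2": x2}
    for s, x0, x1, x2 in zip(sales, snow, lagged(snow, 1), lagged(snow, 2))
    if x1 is not None and x2 is not None  # drop days without full lag history
]

for row in rows:
    print(row)
```

Each row then carries the same-day value together with the values from the preceding days, so a regression can pick up repairs that are initiated some days after the weather event.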
3. Decision Tree for count data
A decision tree is a graphical way to divide a large amount of data into smaller groups or rules and to make a decision on the response variable. The decision is made by taking the predictors as input variables and the target as the response variable.
A decision tree model consists of a set of rules for dividing a large collection of observations into smaller homogeneous groups with respect to a particular target variable (Yap Bee Wah et al., 2012). It is preferable for the target variable to be categorical, as the decision tree calculates the probability that a given record belongs to each target category. Given a target variable and a set of explanatory variables, decision tree algorithms automatically determine which variables are most important, and subsequently sort the observations into the correct output category (Olson and Yong, 2006).
The common decision tree algorithms in data mining software are CHAID (Chi-Square Automatic Interaction Detector), CART (Classification and Regression Tree) and C5. The splitting criteria for CART, C5 and CHAID are the Gini index, entropy and the chi-square test respectively (Yap Bee Wah et al., 2012).
The objective of this study is to model the number of accidents over a one-year time period. For this purpose a classification tree model is developed. The principle behind the CART tree model is to minimize the impurity in the terminal nodes. For this, the tree-growing method is used, which recursively partitions the data to minimize the impurity in the terminal nodes.
The impurity of a node is defined as follows
$$i(t) = \phi\big(p(1|t), p(2|t), \dots, p(J|t)\big) \qquad \text{(Eq. 2)}$$
where i(t) is the measure of impurity of node t, p(j|t) are the node proportions and $\phi$ is a non-negative function².
The measure of node impurity by the Gini index criterion is defined as
$$i(t) = \sum_{i \neq j} p(i|t)\, p(j|t) \qquad \text{(Eq. 3)}$$
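As a small worked illustration of the Gini criterion, the sketch below computes the impurity of a node from its class labels and the weighted impurity of two child nodes after a candidate split. The labels reuse accident-frequency categories purely as example values, and the function names are our own. The pairwise sum over i ≠ j equals 1 − Σ p(j|t)², which the second function makes explicit.

```python
from collections import Counter

def node_proportions(labels):
    """Class proportions p(j|t) of the records in a node."""
    n = len(labels)
    return [count / n for count in Counter(labels).values()]

def gini_impurity(labels):
    """Gini index computed as 1 - sum_j p(j|t)^2."""
    return 1.0 - sum(p * p for p in node_proportions(labels))

def gini_pairwise(labels):
    """Same quantity written as the sum over i != j of p(i|t) * p(j|t)."""
    props = node_proportions(labels)
    return sum(pi * pj
               for a, pi in enumerate(props)
               for b, pj in enumerate(props) if a != b)

def split_impurity(left, right):
    """Impurity of a split: child impurities weighted by child size."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)

# Illustrative node with two equally frequent classes.
node = ["<50"] * 4 + ["<100"] * 4
print(gini_impurity(node))                   # impurity before splitting
print(split_impurity(node[:4], node[4:]))    # a split that separates the classes
```

A tree-growing step would evaluate `split_impurity` for every candidate threshold and keep the split with the largest reduction relative to the parent node.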
The partitioning is done by searching all possible threshold values for all input variables to find the threshold that leads to the greatest improvement in the impurity score of the resultant nodes. The pruning step is then done to create a sequence of simpler trees, by cutting off increasingly important nodes. This step needs a complexity parameter, which can be calculated through a cost function of the misclassification of data and the size of the tree². The last step is to select a tree of the right size from the pruned trees. Large trees normally result in higher misclassification when applied to new data sets. Assessment of the tree is also a valuable step, which normally takes the pruned tree and a test sample as input.
If the misclassification rate on the test sample is low, the pruned tree will produce a tree-like structure diagram and the decision rules whereby important information can be extracted³.
² Li-Yen Chang and Wen-Chieh Chen (2005): “Data mining of tree-based models to analyze freeway accident frequency”
³ Yap, B. W., Ismail, N. H. and Fong, S., “Predicting Car Purchase Intent Using Data Mining Approach,” IEEE Eighth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD) Proceedings, 2011, pp. 2052-2057
5 Results
1. Effect of city safety technology on accidents:
The Volvo car accident frequencies in the sample range from 0 to 72, with mean =
607 and standard deviation = 321. For the Poisson regression, the deviance divided by the
degrees of freedom equals 30, which is far greater than 1 and indicates an overdispersion
problem. For the negative binomial regression the corresponding value is 1.08, which is much
closer to 1, but the month variable becomes insignificant. From the likelihood ratio test
statistics shown in Table 7 in the Appendix (Effect of city safety technology on accidents
(Graphs and Tables)), at least two independent variables are significant predictors of the
frequency of accidents. In Table 3 the values of AIC and BIC for the negative binomial
regression model are lower than for the Poisson regression model. Hence the negative binomial
regression model demonstrates a better fit than the Poisson regression model.
Table 3 (comparison of Poisson and Negative Binomial Regression)
Criterion                   Poisson Regression   Negative Binomial Regression
AIC (smaller is better)     2355.6120            904.9228
AICC (smaller is better)    2364.1834            905.8319
BIC (smaller is better)     2389.7620            916.3061
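The overdispersion check used above (deviance divided by degrees of freedom compared with 1) can be sketched as follows. This Python example is a simplified illustration, not the SAS procedure used in the thesis: it fits an intercept-only Poisson model, for which the fitted mean is simply the sample mean, and the counts are hypothetical.

```python
import math


def poisson_deviance(observed, fitted):
    """Poisson deviance 2 * sum(y * log(y / mu) - (y - mu)), with 0 * log(0) taken as 0."""
    total = 0.0
    for y, mu in zip(observed, fitted):
        log_term = y * math.log(y / mu) if y > 0 else 0.0
        total += 2.0 * (log_term - (y - mu))
    return total


# Hypothetical monthly accident counts (assumed data, not the thesis sample)
counts = [0, 1, 2, 3, 8, 15, 30, 45, 60, 72]

# Intercept-only Poisson model: every fitted value equals the sample mean
mean_count = sum(counts) / len(counts)
fitted = [mean_count] * len(counts)

degrees_of_freedom = len(counts) - 1  # n observations minus one estimated parameter
dispersion_ratio = poisson_deviance(counts, fitted) / degrees_of_freedom
print(dispersion_ratio)
```

A ratio far above 1, as with these assumed counts, points to overdispersion and motivates a negative binomial model, which adds a dispersion parameter.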
Hence the estimated negative binomial regression model can be written as
$\log(\mu) = \log(\text{contracts}) + \big(2.24 - 0.57 \cdot \text{citysafety} + (0) \cdot \text{warranty} - 0.39 \cdot \text{Insurance} - 0.29 \cdot \text{year2010} + (0) \cdot \text{year2011}\big)$
where µ is the number of accidents in a month.
The estimated negative binomial regression coefficients, given that the other variables are
held constant in the model, show that the difference in the logs of expected counts is
expected to be 0.42 units lower for insurance cars and 0.71 units lower for city safety cars
compared to warranty cars.
In order to compare the rate of accidents between city safety and insurance car models,
let C = city safety and I = insurance car models. As in Equation 1, r is the rate of
accidents; then the exp(difference in β) value gives the ratio between the rates of the two
car models:
$r_I / r_C = \exp(\beta_I - \beta_C) = \exp(-0.42 + 0.71) = 1.33$
This indicates that insurance car models have a 33% higher accident rate than city safety
car models.
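The rate comparison above can be reproduced from the negative binomial estimates in Table 8 of the Appendix. The Python sketch below is only an illustration; the dictionary and function names are hypothetical, and warranty cars are the reference level with coefficient 0.

```python
import math

# Negative binomial estimates from Table 8 (reference level: warranty cars)
beta = {"warranty": 0.0, "insurance": -0.4204, "citysafety": -0.7109}


def rate_ratio(group, reference):
    """Ratio of expected accident rates, exp(beta_group - beta_reference)."""
    return math.exp(beta[group] - beta[reference])


def percent_change(group, reference="warranty"):
    """Relative difference in accident rate versus the reference, in percent."""
    return 100.0 * (rate_ratio(group, reference) - 1.0)


print(round(percent_change("insurance"), 1))   # insurance vs warranty
print(round(percent_change("citysafety"), 1))  # city safety vs warranty
print(round(rate_ratio("insurance", "citysafety"), 2))  # insurance vs city safety
```

With the unrounded estimates the insurance to city safety ratio comes out slightly above the 1.33 obtained from the rounded coefficients, and the percentage changes are close to the Effect column of Table 8.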
2. Effect of Weather and time variables
To see the effect of weather and time variables on sales, a linear regression model is
fitted. From the output we notice that 55% of the variation is explained by the model. All
five variables appear to be significant in terms of their effect on overall sales.
Figure 8 shows the whole process of multiple linear regression, i.e. the data is partitioned
into training, test and validation datasets. After data partitioning, the variables that are
highly correlated with log_partsales are chosen for the model. Figures 9 and 10 show that
the partitioning of the data is done using the stratified partitioning method. Two important
variables, year and month, are used for data partitioning. These variables are ordinal and
categorical in nature.
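Stratified partitioning by year and month can be sketched as follows. The partitioning in this thesis was done in SAS Enterprise Miner; the Python sketch below only illustrates the idea, and the split fractions, the seed and the example rows are assumptions that are not taken from the thesis.

```python
import random
from collections import defaultdict


def stratified_partition(rows, key, fractions=(0.4, 0.3, 0.3), seed=1):
    """Split rows into (train, validation, test) separately within each stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)

    train, validation, test = [], [], []
    for members in strata.values():
        rng.shuffle(members)
        n_train = round(len(members) * fractions[0])
        n_valid = round(len(members) * fractions[1])
        train += members[:n_train]
        validation += members[n_train:n_train + n_valid]
        test += members[n_train + n_valid:]  # remainder of the stratum
    return train, validation, test


# Hypothetical daily records (year, month, day): 10 days per month, 2009 to 2011
rows = [(y, m, d) for y in (2009, 2010, 2011) for m in range(1, 13) for d in range(1, 11)]
train, validation, test = stratified_partition(rows, key=lambda r: (r[0], r[1]))
print(len(train), len(validation), len(test))
```

Because the split is made inside every year and month combination, each partition contains observations from all seasons and all years.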
Figure 8 (SAS Miner window for multiple linear regression)
Figure 9 (sample stratification)
Figure 10 (stratified variable list)
Figure 11 shows the training, test and validation error for the data set. It also shows that
the model fits the data well, with minimum squared error.
Figure 11 (Standard error in training, test and validation)
Model Fit Statistics
R-Square   0.5547       Adj R-Sq   0.5439
AIC        -692.7205    BIC        -690.2904
SBC        -672.5527    C(p)       4.5867
The R-square value shows that the model explains almost 55% of the variation.
Type 3 Analysis of Effects
Effect      DF   Sum of Squares   F Value   Pr > F
Accidents   1    7.9411           212.04    <.0001
lagPrr4     1    0.2537           6.77      0.0099
lagTem1     1    0.1607           4.29      0.0395
lagTem2     1    0.3283           8.77      0.0034
Year        2    0.4293           5.73      0.0038
From the type 3 analysis of effects, all listed effects are significant in the model at the 5% level.
Analysis of Maximum Likelihood Estimates
Parameter   DF   Estimate   Standard Error   t Value   Pr > |t|
Intercept   1    4.3289     0.0380           113.88    <.0001
Accidents   1    0.0464     0.00318          14.56     <.0001
lagPrr4     1    -0.00625   0.00240          -2.60     0.0099
lagTem1     1    0.0109     0.00528          2.07      0.0395
lagTem2     1    -0.0160    0.00539          -2.96     0.0034
year 2009   1    -0.00621   0.0193           -0.32     0.7485
year 2010   1    -0.0553    0.0205           -2.69     0.0076
From the estimates of the variables chosen by the selection model, precipitation lagged by
4 days (lagPrr4) and temperature lagged by 2 days (lagTem2) have a negative relation with
body and paint part sales in Stockholm. The year variable is also important: relative to
2011, the estimated effect on sales in Stockholm is negative for 2009 and 2010. The accidents
variable has the highest effect on daily sales.
The effect of each variable in the list is shown as follows
Accidents   LagPrr4   LagTemp1   LagTemp2   Year2010
4.7%        -0.6%     1.1%       -1.6%      -5.4%
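The percentage effects listed above can be recovered from the parameter estimates as 100 · (exp(b) − 1). The Python sketch below is an illustration only; it assumes that the estimates are interpreted on a natural-log scale, which is the reading that reproduces the reported percentages, and the names used are hypothetical.

```python
import math

# Parameter estimates from the Analysis of Maximum Likelihood Estimates table
estimates = {
    "Accidents": 0.0464,
    "lagPrr4": -0.00625,
    "lagTem1": 0.0109,
    "lagTem2": -0.0160,
    "year 2010": -0.0553,
}


def percent_effect(b):
    """Percentage change in sales for a one-unit increase in the predictor."""
    return 100.0 * (math.exp(b) - 1.0)


effects = {name: round(percent_effect(b), 1) for name, b in estimates.items()}
for name, value in effects.items():
    print(f"{name}: {value}%")
```

For small coefficients the percentage effect is close to 100 · b, which is why the table values nearly match the estimates multiplied by 100.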
3. Decision Tree for count data
Two samples were taken from the data: the learning (training) sample contains 164
observations and the test sample contains 124 observations. Before running the rpart function
in R, it is important to set the criteria according to which the tree is split. A minimum
split of 10 observations was set to create the tree, which means that a node must contain at
least 10 observations before a split is attempted.
In validation, if we have more than one similar observation the node could be validated;
otherwise it would be rejected.
The tree is formed by the Gini index method, which partitions the data by searching all
possible threshold values of the input variables. The partitioning is done by choosing the
greatest improvement in the impurity score of the resulting nodes. Once the tree is formed
from the learning dataset, it is important to prune the tree based on the complexity
parameter. The complexity parameter is given a value that prunes the tree so that it has a
small number of nodes and a minimum standard error. It is always better to have a tree with
fewer splits and minimum error.
Table 4 (complexity parameter with standard error)
    CP         nsplit   rel error   xerror    xstd
1   0.291667   0        1           1         0.06572
2   0.078125   2        0.41667     0.47917   0.059928
3   0.041667   4        0.26042     0.39583   0.056284
4   0.020833   5        0.21875     0.38542   0.055758
5   0.017361   6        0.19792     0.44792   0.058672
6   0.01       9        0.14583     0.48958   0.060318
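# Prune the fitted rpart tree at complexity parameter cp = 0.03
modeltree.e <- prune(modeltree, cp = 0.03)
plot(modeltree.e)  # draw the pruned tree
text(modeltree.e)  # add the split labels to the plot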
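The choice of cp = 0.03 can be related to Table 4 with a small sketch. The Python code below is an illustration under an assumption that the text does not state explicitly, namely that the pruning value was chosen to obtain the subtree with the smallest cross-validated error (xerror). It finds that row and the range of cp values lying between its CP and the CP of the previous row; the function names are hypothetical.

```python
# Rows copied from Table 4: (CP, nsplit, rel_error, xerror, xstd)
cp_table = [
    (0.291667, 0, 1.0, 1.0, 0.06572),
    (0.078125, 2, 0.41667, 0.47917, 0.059928),
    (0.041667, 4, 0.26042, 0.39583, 0.056284),
    (0.020833, 5, 0.21875, 0.38542, 0.055758),
    (0.017361, 6, 0.19792, 0.44792, 0.058672),
    (0.01, 9, 0.14583, 0.48958, 0.060318),
]


def min_xerror_row(table):
    """Return the row with the smallest cross-validated error."""
    return min(table, key=lambda row: row[3])


def cp_interval(table, row):
    """Return (lower, upper): the row's own CP and the CP of the preceding row."""
    index = table.index(row)
    upper = table[index - 1][0] if index > 0 else float("inf")
    return row[0], upper


best = min_xerror_row(cp_table)
lower, upper = cp_interval(cp_table, best)
print("splits:", best[1], "cp between", lower, "and", upper)
```

Under this reading, the value 0.03 falls between the CP values of rows 4 and 3, so it corresponds to the subtree with 5 splits, which has the lowest xerror in Table 4.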
Figure 12 (Decision tree for categorical response)
Figure 12 shows how the tree is formed based on the minimum split and minimum standard
error.
The next step is to assess the tree on the test sample; the main purpose of this step is to
see how large the error is when new sample data are provided.
Table 5 (classification matrix for the test sample)
        <10   <100   <200   <300   <50
<10     36    0      0      0      1
<100    0     7      3      0      3
<200    0     4      14     1      0
<300    0     0      3      1      0
<50     8     0      0      0      43
err <- 1.0 - sum(diag(mc)) / sum(mc)  # share of test observations off the diagonal
print(err)  # 0.1854839
The above calculation shows a misclassification rate of about 18.5%, i.e. the tree classifies almost 81% of the test observations correctly.
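The error rate can be checked directly from the counts in Table 5. The Python sketch below is an illustration that re-enters the table as a nested list; the variable and function names are hypothetical.

```python
# Counts from Table 5; rows and columns are ordered <10, <100, <200, <300, <50
mc = [
    [36, 0, 0, 0, 1],
    [0, 7, 3, 0, 3],
    [0, 4, 14, 1, 0],
    [0, 0, 3, 1, 0],
    [8, 0, 0, 0, 43],
]


def misclassification_rate(matrix):
    """One minus the share of observations on the main diagonal."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return 1.0 - correct / total


err = misclassification_rate(mc)
print(round(err, 7), round(1.0 - err, 3))
```

Of the 124 test observations, 101 lie on the diagonal, so 23 are misclassified.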
6 Discussion:
1. Effect of city safety technology on accidents:
The rate of accidents is greatly reduced in cars equipped with city safety compared with
other Volvo car models without city safety technology. There are 6 city safety equipped
car models in the accidents database for the years 2010 and 2011. For 2010, three car models
were considered insignificant due to small sample size (SSS) and were removed from the
analysis. Table 6 shows the accident rate for the different city safety car models.
Table 6 (Accident rate in % for City Safety cars)
                 2010   2011
Volvo S80 II     SSS    9%
Volvo V70 III    SSS    12%
Volvo XC70 III   SSS    8%
Volvo S60        7%     18%
Volvo V60        4%     13%
Volvo XC60       16%    18%
Total            14%    16%
The comparative analysis was done on the rest of the Volvo cars equipped with city safety
technology. The effect was assessed by comparing them with all new cars, i.e. warranty
cars, and with insurance cars.
The study covers two calendar years in order to cover both summer and winter conditions
equally. In Sweden, winter often means cold weather and hence snowy or icy roads.
It was difficult to include the weather variables in this data set, because the number of
daily accidents becomes small when a particular city such as Stockholm is considered. To
overcome this problem, the month variable is used as an input parameter, which isolates the
weather effect in the model.
From the empirical analysis in Table 6 above, the accident rate is below 16% for the year
2011. For the warranty and insurance cars, however, it was 20% and 10% respectively. This
also shows that the technology has an effect on the number of accidents.
Several reports on the effect of city safety technology have also been published. The work
of [Martin Distner et al 2008] was based on observational analysis due to the lack of real
accident data. That report estimated that City Safety has the potential to reduce the risk
of soft-tissue neck injuries in the rear-end impacted car by approximately 60%. Another
report, based on real data, was published in 2012 by Volvia AB Gothenburg. In that report
only one car model with city safety technology, the XC60, was compared with other warranty
car models. The report states that the number of accidents is reduced by 33%. The main
difference between the present study and the previous work is that I took the whole pool of
city safety technology cars and compared it with other warranty and insurance cars, whereas
the previous work either covered only a single city safety model or was based on
observational analysis. In this study the Poisson regression model and the negative binomial
regression model are used as tools for the comparative analysis. From the Poisson regression
the effect was calculated as a 42% reduction in accidents, whereas from the negative binomial
regression the estimated reduction increased to 50%. However, the month variable becomes
insignificant in the negative binomial regression model.
2. Effect of Weather and time variables
If each selected variable has a linear relationship with the daily sales, then the appropriate
model is multiple linear regression. From Figure 13 it is clear that almost all important
variables have a linear relationship with the response. Some variables show a nonlinear
relationship due to the small number of data points. As can be seen from the plot of rain
against sales, some data points are less than 0 mm, which is technically impossible, so the
negative rain values are removed before fitting the linear model to find the effect.
[Matrix plot of log_partsales vs accidents, Temperature, snow, rain, winter, summer, PRR, contracts]
Figure 13 (relationship between response and predictors)
From Figure 13, the relationship between accidents and log part sales suggests that when
accidents and contracts increase, the log of body and paint sales in Stockholm also
increases; this means that both variables have a positive and linear relationship with the
response.
It is also important that the response is normally distributed; Figure 14 shows that the
response variable is normally distributed except for a few data points which seem to be
outliers or extreme values.
The daily part sales in SEK have a positive skewness coefficient, viz. 1.66, but after
converting the daily part sales into log part sales the skewness becomes negative, i.e.
-0.57. This is also shown by the normal probability plot and histogram of the daily sales in
Figures 14 and 16 in the Appendix. As there is skewness and kurtosis in the response data, it
is considered important to transform it before creating a model.
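The effect of the log transformation on skewness can be illustrated with a small sketch. The Python code below uses the moment coefficient of skewness (third central moment divided by the 1.5th power of the second); statistical packages may apply a small-sample correction, so their values can differ slightly. The sales figures are hypothetical and are not the thesis data.

```python
import math


def skewness(values):
    """Moment coefficient of skewness: m3 / m2 ** 1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed daily sales (assumed example data)
sales = [10, 12, 15, 20, 30, 50, 100, 400]
log_sales = [math.log10(v) for v in sales]

print(skewness(sales), skewness(log_sales))
```

For right-skewed data such as these, the logarithm compresses the long upper tail, so the skewness of the transformed values is smaller than that of the raw values.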
The acf() function in R is used to produce the autocorrelation plot. It describes the
strength of the relationship between different points in the series. In the Appendix,
Figure 17 is used to show such plots for sales in Stockholm on a daily basis.
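The quantity shown in an autocorrelation plot can be sketched as follows. This Python function is an illustration of the usual sample autocorrelation at lag k, i.e. the sum of products of deviations k steps apart divided by the sum of squared deviations; the function name and the example series are hypothetical.

```python
def sample_acf(series, max_lag):
    """Sample autocorrelation r_k for k = 0, ..., max_lag."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    denominator = sum(d * d for d in deviations)
    return [
        sum(deviations[t] * deviations[t + k] for t in range(n - k)) / denominator
        for k in range(max_lag + 1)
    ]


# Hypothetical short series (assumed example data)
r = sample_acf([1, 2, 3, 4, 5], max_lag=2)
print(r)
```

The value at lag 0 is always 1, and values that stay high over many lags indicate that neighbouring days in the series are strongly related.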
It is also important to see the correlation between each pair of variables; for this we use
the correlation matrix between the response and the other important variables. Figure 20
shows the correlation matrix between all important variables.
The correlation for the lagged temperature and precipitation variables decreases after 3
lagged days. The correlation between accidents and sales is very high, which shows that
as accidents increase, sales also increase. Temperature and precipitation have a negative
effect on sales, while snow has a positive correlation.
7 Conclusions
We have analyzed daily crash data for Stockholm, collected during a three-year period, i.e.
from 2009 to 2011. The major purpose of the analysis was to find and quantify the
factors that have a direct effect on the Body and Paint business for VCCS.
The study outcomes show that cars equipped with the city safety system have lower
accident rates than insurance cars and other warranty cars. The findings suggest that
warranty cars have 50% and 32% higher accident rates than city safety cars and
5-year-old insurance cars, respectively.
Weather also has a significant impact on road safety. In terms of crash frequency, rate
and severity, winter weather appears to be far more dangerous than wet weather. Most of
the weather-related crashes occur during the winter season, during snowfall and when
the overall temperature is below -15 °C and the precipitation is higher than 1. The
analyses suggest that temperature has an effect of 1.6% on sales. Rain has a mostly
negative effect on sales, which means that people drive less or more slowly during rain.
In addition, the results of the study suggest that the time variables, such as weekday,
month and week, also play an important role in the business.
Finally, in the tree-based approach two models have been implemented. The model with the
contract variable has shown that an increase in the number of cars for a particular model
can result in a consequent increase in the number of accidents.
8 Literature:
Irene, E. & Magdalena, L. (2012). The effect of a low-speed automatic brake system estimated from real life data (If Insurance Company P&C Ltd & Volvo Car Corporation).
M Distner, M Bengtsson, T Broberg, L Jakobsson. City safety: Volvo Cars Sweden, paper Number 09-0371.
Tannou, M. and Westerman, G. Volvo Cars Corporation: Shifting from a B2B to a “B2B+B2C” Business Model: “The MIT center for digital business”, http://ebusiness.mit.edu/research/papers/2012.04_Tannou_Westerman_Volvo%20Cars%20Corporation_298.pdf.
Gardner W, Mulvey EP, Shaw EC (1995). Regression analyses of counts and rates: Poisson, overdispersed Poisson and negative binomial models. Psychological Bulletin, 118(3):392-404.
Russ Rader (2011). High Tech System on Volvos is Preventing Crashes.
Thornes, J.E., 2005. Snow and road chaos in Birmingham on 28th January, 2004. Weather 60, 146-149.
Codling, P.J., 1974. Weather and road accidents. In: Taylor JA (ed) Climatic resources and economic activity. David & Charles Holdings, Newton Abbot, p 205-222.
Moore, D.F., 1975. The friction of pneumatic tyres. Oxford: Elsevier Scientific. 220pp.
Campbell LR. 1986. Assessment of traffic collision occurrence related to winter conditions in the city of Winnipeg: 1974 to 1984. City of Winnipeg.
Nofal FH, Saeed AAW. 1997. Seasonal variation and weather effects on road traffic accidents in Riyadh City. Public Health 111: 51-55.
Lam LT, Lam MKP. 2005. The association between sudden illness and motor vehicle crash mortality and injury among older drivers in NSW, Australia. Accident Analysis and Prevention 37: 563-567.
Mayhew DR, Donelson AC, Beirness DJ, Simpson HM. 1986. Youth, alcohol and relative risk of crash involvement. Accident Analysis and Prevention 18: 273-287.
Horwood LJ, Fergusson DM. 2000. Drink driving and traffic accidents in young people. Accident Analysis and Prevention 32: 805-814.
Evans L. 2004. Traffic Safety. SSS, Bloomfield Hills, MI.
N. R. Draper and H. Smith. Applied Regression Analysis. John Wiley & Sons, Inc., 1966.
R. R. Hocking. The analysis and selection of variables in linear regression. Biometrics, 32:1–49, 1976.
G. A. F. Seber. Linear regression analysis. John Wiley & Sons, 1977.
A. Sen and M. Srivastava. Regression Analysis. Theory, Methods, and Applications, volume 38 of Springer Texts in Statistics. Springer-Verlag, New York Inc, 1990.
Olson, D. and Yong, S., Introduction to Business Data Mining. McGraw Hill International Edition, 2006.
Neale VL, Dingus TA, Klauer SG, Sudweeks J, Goodman M. An Overview of the 100 Car Naturalistic Study and Findings, Paper No. 05-0400, 19th Int. ESV Conf., 2005.
Grover C, Knight I, Okoro F, Simmons I, Couper G, Massie P, Smith B. Automated Emergency Brake Systems: Technical Requirements, Costs and Benefits, Published Project Report PPR227, Contract ENTR/05/17.01, DG Enterprise, European Commission, 2007.
Lisa A. White. Predicting hospital admissions with poisson regression analysis: http://www.dtic.mil/cgi-bin/GetTRDoc?AD=ADA501543, June 2009.
D. Montgomery, E. Peck and G. Vining, Introduction to Linear Regression Analysis. John Wiley & Sons, Inc. 2006, pp. 427, 450.
Cameron, A. C. and Trivedi, P. K., Regression analysis of count data. Cambridge University Press, 1998.
Greene, W., Functional forms for the negative binomial model for count data. Economics Letters 99, 2008, pp. 585-590.
9 Appendix
1. Effect of city safety technology on accidents (Graphs and Tables)
Table 7 (likelihood ratio test statistics for categorical variables)
LR Statistics For Type 3 Analysis
Source   DF   Chi-Square   Pr > ChiSq
year     1    22.06        <.0001
type     2    64.32        <.0001
Table 8 (output from negative binomial regression)
Parameter         DF   Estimate   Effect   Standard Error   Wald 95% Confidence Limits   Pr > ChiSq
Intercept         1    2.2310              0.0605           2.1124    2.3495             <.0001
year 2010         1    -0.2925             0.0580           -0.4063   -0.1788            <.0001
year 2011         0    0.0000              0.0000           0.0000    0.0000             .
type Insurance    1    -0.4204    -34%     0.0696           -0.5568   -0.2840            <.0001
type citysafety   1    -0.7109    -50%     0.0713           -0.8506   -0.5711            <.0001
type warranty     0    0.0000              0.0000           0.0000    0.0000             .
Dispersion        1    0.0561              0.0099           0.0397    0.0792
2. Effect of Weather and time variables (Graphs and Tables)
[Normal probability plot of log_partsales (Percent vs log_partsales): Mean 4.767, StDev 0.2798, N 577, AD 0.829, P-Value 0.032]
Figure 14 (Normality plot for response variables)
[Histogram of partsales with fitted normal curve (Frequency vs partsales): Mean 70714, StDev 45308, N 577]
Figure 15 (histogram partsales)
[Histogram of log_partsales with fitted normal curve (Frequency vs log_partsales): Mean 4.767, StDev 0.2798, N 577]
Figure 16 (histogram for log partsales)
Figure 17 (distribution of log of part sales)
Figure 18 (residuals vs ordered row count)
[Autocorrelation plot (ACF vs Lag, lags 0 to 25): Series carData$log_partsales]
Figure 19 (predicted vs original)
Figure 20 (Correlation matrix)
Presentation Date: 04/06/2013
Publishing Date (Electronic version): 25/06/2013
Department and Division: Division of Statistics, Department of Computer and Information Science
URL, Electronic Version: http://urn.kb.se/resolve?urn=urn:nbn:se:liu:diva-94567
Publication Title: Isolating and quantifying factors affecting body and paint business for Volvo Cars
Author(s): Muhammad Awais Khan
Abstract
This thesis focuses on identifying the degree of contribution of the most important factors affecting the Body and Paint business of Volvo Car Corporation in Sweden. It is clear that the Body and Paint business for VCCS directly depends on the number of registered accidents. Our major purpose is to determine the factors which have a direct or indirect effect on the reduction in the number of accidents in Sweden and to analyze to what degree they may affect the business. During the interviews with senior staff members, we discovered that city safety cars in particular were mentioned by most of the specialists. Other important factors highlighted were mileage, weather, company car/private car and age of the car.
City Safety is a technology designed to help the driver mitigate, and in certain situations avoid, collisions at low speed by automatically braking the vehicle. The estimated claim rate frequency, i.e. claims per contract, was 50% lower for city safety equipped cars than for other warranty car models without the system. The study also analyses the effect of rain, mean temperature and snow on Volvo body part sales in Stockholm, Sweden. Temperature and snow impacted road accidents significantly. Snow was shown to be the leading variable, as the number of accidents increases sharply with increased snowfall. Temperature is the second most important variable, i.e. as the temperature decreases by 1 °C the sales of the body and paint business in Stockholm increase by 1.6%.
Time variables such as weekday, month and year also play a significant role in this model. On Fridays, 51% more accidents are expected than on Sundays.
Keywords: Volvo Car customer service, City safety technology, accident rate, machine learning models, Poisson Regression, Negative binomial regression, CART
Language: English
Number of Pages: 48
Type of Publication: Degree thesis
ISRN: LIU-IDA/STAT-A--13/003--SE