9
Research Article The Application of Data Mining Technology to Build a Forecasting Model for Classification of Road Traffic Accidents Yau-Ren Shiau, 1 Ching-Hsing Tsai, 1 Yung-Hsiang Hung, 2 and Yu-Ting Kuo 2 1 Department of Industrial Engineering and System Management, Feng Chia University, No. 100 Wenhua Road, Taichung 40724, Taiwan 2 Department of Industrial Engineering and Management, National Chin-Yi University of Technology, No. 57, Section 2, Zhongshan Road, Taiping District, Taichung 41170, Taiwan Correspondence should be addressed to Yung-Hsiang Hung; [email protected] Received 24 March 2015; Revised 27 June 2015; Accepted 28 June 2015 Academic Editor: Aime’ Lay-Ekuakille Copyright © 2015 Yau-Ren Shiau et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. With the ever-increasing number of vehicles on the road, traffic accidents have also increased, resulting in the loss of lives and properties, as well as immeasurable social costs. e environment, time, and region influence the occurrence of traffic accidents. e life and property loss is expected to be reduced by improving traffic engineering, education, and administration of law and advocacy. is study observed 2,471 traffic accidents which occurred in central Taiwan from January to December 2011 and used the Recursive Feature Elimination (RFE) of Feature Selection to screen the important factors affecting traffic accidents. It then established models to analyze traffic accidents with various methods, such as Fuzzy Robust Principal Component Analysis (FRPCA), Backpropagation Neural Network (BPNN), and Logistic Regression (LR). e proposed model aims to probe into the environments of traffic accidents, as well as the relationships between the variables of road designs, rule-violation items, and accident types. e results showed that the accuracy rate of classifiers FRPCA-BPNN (85.89%) and FRPCA-LR (85.14%) combined with FRPCA is higher than that of BPNN (84.37%) and LR (85.06%) by 1.52% and 0.08%, respectively. Moreover, the performance of FRPCA-BPNN and FRPCA-LR combined with FRPCA in classification prediction is better than that of BPNN and LR. 1. Introduction As the demand for vehicles rises, the number of vehicles on the road increases greatly and traffic jams worsen, especially during rush hours; thus, traffic accidents are more likely to occur. Faced with more severe accidents, the traffic problem has become a topic of concern in Taiwan. e statistics of the Ministry of Health and Welfare (2013) indicated that accidental injury is the sixth major cause of death in Taiwan, with 6,873 deaths from accidental injuries. Most traffic accidents are caused by improper driving behaviors, and one of the major reasons is that drivers failed to pay attention to the road ahead (Ministry of Transportation, 2008). According to the statistical data of the National Police Agency (2012), the number of road traffic accidents with death in Taichung City was next to Kaohsiung City. In 2012, the number of traffic accidents causing death was 208, and the death toll was 210. In 2012, the number of traffic accidents causing death was 198, the death toll was 203, and the gradient of number of accidents was 3.41%. Due to the increase in urban population, at a growth rate of 0.76% in 2012, the occurrence rate of road traffic accidents increased accordingly. In 2011, accidental injury ranked sixth among the ten major causes of death in Taiwan, and the death toll from motor vehicle accidents was about 30, accounting for 17.3% of the death rate per 100,000 persons (MOHW, 2012). is study uses the traffic accident data from the NPA of the region from January to December 2012 as the data source. e data content includes 17 items, such as weather, light rays, road category, speed limit, road type, accident site, road conditions, and roadblocks. ere are 2,471 original observations. According to previous transportation research [15], the causes of traffic accidents are mostly human factors, such as speeding, violation of signals, and drunk driving, Hindawi Publishing Corporation Mathematical Problems in Engineering Volume 2015, Article ID 170635, 8 pages http://dx.doi.org/10.1155/2015/170635

Research Article The Application of Data Mining Technology ...downloads.hindawi.com/journals/mpe/2015/170635.pdfResearch Article The Application of Data Mining Technology to Build

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Research Article The Application of Data Mining Technology ...downloads.hindawi.com/journals/mpe/2015/170635.pdfResearch Article The Application of Data Mining Technology to Build

Research ArticleThe Application of Data Mining Technology to Build aForecasting Model for Classification of Road Traffic Accidents

Yau-Ren Shiau1 Ching-Hsing Tsai1 Yung-Hsiang Hung2 and Yu-Ting Kuo2

1Department of Industrial Engineering and System Management Feng Chia University No 100 Wenhua RoadTaichung 40724 Taiwan2Department of Industrial Engineering and Management National Chin-Yi University of Technology No 57 Section 2Zhongshan Road Taiping District Taichung 41170 Taiwan

Correspondence should be addressed to Yung-Hsiang Hung hys502ncutedutw

Received 24 March 2015 Revised 27 June 2015 Accepted 28 June 2015

Academic Editor Aimersquo Lay-Ekuakille

Copyright copy 2015 Yau-Ren Shiau et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

With the ever-increasing number of vehicles on the road traffic accidents have also increased resulting in the loss of lives andproperties as well as immeasurable social costsThe environment time and region influence the occurrence of traffic accidentsThelife and property loss is expected to be reduced by improving traffic engineering education and administration of law and advocacyThis study observed 2471 traffic accidents which occurred in central Taiwan from January to December 2011 and used the RecursiveFeature Elimination (RFE) of Feature Selection to screen the important factors affecting traffic accidents It then establishedmodelsto analyze traffic accidents with various methods such as Fuzzy Robust Principal Component Analysis (FRPCA) BackpropagationNeural Network (BPNN) and Logistic Regression (LR) The proposed model aims to probe into the environments of trafficaccidents as well as the relationships between the variables of road designs rule-violation items and accident types The resultsshowed that the accuracy rate of classifiers FRPCA-BPNN (8589) and FRPCA-LR (8514) combined with FRPCA is higherthan that of BPNN (8437) and LR (8506) by 152 and 008 respectively Moreover the performance of FRPCA-BPNN andFRPCA-LR combined with FRPCA in classification prediction is better than that of BPNN and LR

1 Introduction

As the demand for vehicles rises the number of vehicles onthe road increases greatly and traffic jams worsen especiallyduring rush hours thus traffic accidents are more likely tooccur Faced with more severe accidents the traffic problemhas become a topic of concern in Taiwan The statisticsof the Ministry of Health and Welfare (2013) indicatedthat accidental injury is the sixth major cause of death inTaiwan with 6873 deaths from accidental injuries Mosttraffic accidents are caused by improper driving behaviorsand one of the major reasons is that drivers failed to payattention to the road ahead (Ministry of Transportation2008)

According to the statistical data of the National PoliceAgency (2012) the number of road traffic accidents withdeath in Taichung City was next to Kaohsiung City In2012 the number of traffic accidents causing death was 208

and the death toll was 210 In 2012 the number of trafficaccidents causing death was 198 the death toll was 203 andthe gradient of number of accidents was minus341 Due to theincrease in urban population at a growth rate of 076 in2012 the occurrence rate of road traffic accidents increasedaccordingly In 2011 accidental injury ranked sixth amongthe ten major causes of death in Taiwan and the death tollfrom motor vehicle accidents was about 30 accounting for173 of the death rate per 100000 persons (MOHW 2012)This study uses the traffic accident data from the NPA of theregion from January to December 2012 as the data sourceThe data content includes 17 items such as weather lightrays road category speed limit road type accident siteroad conditions and roadblocks There are 2471 originalobservations

According to previous transportation research [1ndash5]the causes of traffic accidents are mostly human factorssuch as speeding violation of signals and drunk driving

Hindawi Publishing CorporationMathematical Problems in EngineeringVolume 2015 Article ID 170635 8 pageshttpdxdoiorg1011552015170635

2 Mathematical Problems in Engineering

as well as the interaction between road environments andtraffic engineering facilities This study identifies the keyfactors that affect traffic accidents using Feature Selectionand establishes models to analyze traffic accidents and theirtypes with various methods such as Fuzzy Robust PrincipalComponent Analysis (FRPCA) Back Propagation NeuralNetwork (BPNN) and FRPCA-Logistic Regression (LR)Theenvironments of traffic accidents and the variables of roaddesigns identified by the model could serve as reference forthe police force and regulatory authorities to design and plansand improve traffic safety thus decreasing the ratio of trafficaccidents damage to property and loss of lives

With the advancements of information technology datamining becomes increasinglymature and useful informationwithout preconditions can be found in databases Relationalmodels can be built to determine the correlation betweencharacterization factors of traffic accidents and casualtiesThis study uses the Recursive Feature Elimination (RFE)FRPCA BPNN and LR of Feature Selection to determineimportant factors influencing traffic accidentsThe results canprovide suggestions for improving the occurrence of trafficaccidents Finally the model was statistically evaluated

The remainder of this paper is organized as followsSection 2 reviews the literature concerning the severity ofinjuries in traffic accidents Section 3 presents the FRPCASection 4 discusses the research data Section 5 offers conclu-sions

2 Material and Methods

Many studies have focused on forecasting and modelingtraffic accidents and analyzed the results The results suggestthat the significant factors influencing the occurrence ofaccidentsmust be eliminated or controlled in order to preventtraffic accidents and reduce injuries and deaths

In terms of research methods most studies use BPNN orLR to forecast or model analysis results [6ndash9] Gang andZhuping [10] suggested that the PSO-SVM is better thanBPNN in traffic safety forecasting Chang et al [6] usedthe established modeling method and LR to discuss thecontributing factors and conditions of driving after drinkingThe analysis results showed that law enforcement driversrsquodrinking habits and regulatory knowledge of drunk drivingapparently influenced driversrsquo selecting drunk driving behav-iors Kong and Yang [8] used LR for casualties and drivingspeed in traffic accident survey data and found that regardingthe correlation of collisions between vehicles and pedestriansthe risk of pedestrian death was 26when the vehiclersquos speedwas 50 kmh 50 when the speed was 58 kmh and 82when the speed was 70 kmh However the analysis resultshowed that age was not a major risk factor in death Fu andZhou [7] pointed out that the traditional BPNN has somedefects such as local minima too many iterations and tooslow training Therefore the improved LM-BP neural net-work was used for forecasting The forecast results of trafficaccidents death toll and amount of direct economic losswere significant thus the BP network is applicable to trafficaccident forecasting

A number of recent studies have used data mining orstatistical methods [11ndash15] Karacasua and Er [14] used chi-square significance testing to analyze whether the same ageand gender have similar traffic accidents as well as the cor-relation among education age gender and psychology Thefindings showed that (1) males were more prone to trafficaccidents than females (2) driving while being intoxicatedand speeding were major causes Kanchan et al [13] usedstatistical software for analysis and found that the injuredwere mostly male and the major causes of death were headand abdominal injuries Traffic accidents are a significantpublic health hazard thus first aid should be strengthenedand traffic regulations and health education should be strictlyimplemented Kashani et al [16] used classification andCART to analyze traffic collision data The results showedthat improper passing and not using seat belts were the mostimportant factors influencing the severity of injuries De Onaet al [15] used Latent Class Cluster (LCC) to reduce theheterogeneity of traffic accident data and combined it withBayesian Networks (BNS) to recognize major factors Theresults indicated that weather factors pavement markingsand road width were significant factors

Based on the above discussion this study uses BPNNLR and statistical methods differing from previous studieswhich aim at accident patterns and types The FRPCA isused for data preprocessing which is combined with BPNNand LR models to analyze the performance of the aforesaidfour classification models (BPNN LR FRPCA-BPNN andFRPCA-LR) in forecasting

21 Research Method This study uses four constructsnamely (1) natural factors (2) environmental factors (3)road design and (4) accident types and patterns of roadtraffic accident cases in Taichung City to discuss the factorsinfluencing the occurrence of road traffic accidents Theresearch structure is as shown in Figure 1

22 Data Preprocessing of Feature Selection This study usesRFE as the Feature Selection method which is a FeatureSelection algorithm with the principle proposed by Guyonet al [17] Guyon used RFE to select the key and importantfeature set which not only shortens classification computingtime but also improves the classification accuracy rate Thepurpose of RFE is to calculate the weight vectors of eachfeature which are ordered according to the calculated weightvectors as the basis of classification RFE is an iterativeprocess that eliminates features backwards and its feature setscreening procedure is described as follows

(1) Use current data set to train classifier(2) Calculate the weight of each feature(3) Delete the feature with minimum weight

The iterative process is ended when there is one featureremaining A list of features ordered according to the weightsis obtained as a result of execution and unimportant oruncorrelated features are eliminated from the list first thusthey are listed at the end whereas the most importantfeatures are eliminated last and are listed at the front [18 19]

Mathematical Problems in Engineering 3

Road factors(I) Type(II) Location(III) Conditions(IV) Barriers

Road design(I) Semaphore(II) Separation facility

(directionlane)

Natural factors(I) Weather(II) Light

Accident types and patterns(I) People and motor vehicles(II) Motor vehicles(III) Single vehicle

FRPCA

Logisticregression

Injured-only traffic accident

Feature selection-RFE

BPNNLogisticregressionBPNN

Evaluation and analysis

Original data

Figure 1 Research structure

The RFE selects the feature set in three major stepsimports the data set for classification calculates the weight ofeach feature and deletes the feature with minimum weightFeature ordering is obtained the feature with minimumweight square is removed in each cycle and then the remain-ing features are retrained to obtain a new feature orderingRFE continuously executes this process and a feature orderlist is obtained [20] It is noteworthy that one of the featuresordered in the front does not always enable the classifierto obtain the best classification performance however thecombination of multiple features enables the classifier toobtain the optimum classification performance ThereforeRFE algorithm can select the most complementary featurecombination [4]

23 Backpropagation Neural Network (BPNN) BPNN is themost frequently used supervised learning among the neuralnetworks and is highly effective in classification problems[21] The parameters are divided into structural param-eters and learning parameters The structural parametersinclude the number of hidden layers while the learningparameters include the learning rate initial weight rangeand momentum term Generally Trial and Error is usedto determine the optimal parameter values when selectingstructural parameters and learning parameters The mostused nonlinear transfer function in the hidden layer of BPNNis the log-sigmoid transfer function whose output is between

0 and 1 in order to respond to the negative infinity to positiveinfinity input of neurons

An alternative is the tangent sigmoid transfer functionldquotansigrdquo as shown in the hidden layer The linear transferfunction purelin is in the output layer If the sigmoid transferfunction is used in the output layer the network output isrestricted to a very small range If the linear transfer functionis used in the output layer the network output can be anarbitrary value

24 LR LR can be used to analyze one or several fore-cast values These results have a binary (eg existence ornonexistence of an event) relationship [22 23] LR is derivedfrom the cumulative probability function of the logisticmodel and is a linear probability model which is similarto a linear regression model The difference is that LR cantest the dependent variable of a nominal scale where thediscussed dependent variable is discontinuous especially inbinary classification The purpose of LR is to establish thesimplest and fittest analysis result Furthermore it can beused in a practicalmodel to forecast the relationships betweendependent variables and a set of forecast variances wherethe explanatory variable can be a categorical or continuousvariable

25 FRPCA The nonlinear FRPCA algorithm is deducedfrom the linear fuzzy principal component analysis algo-rithm as introduced by Yang and Wang [24] and the

4 Mathematical Problems in Engineering

nonlinear criteria in blind source separation of Karhunen etal [25]The robust principal component analysis as proposedby Yang andWang is established on the principal componentanalysis learning rule and energy function as proposed by Xuand Yuilles [26] and the objective function bias is proposedThese methods are briefly introduced as follows Xu andYuilles [26] proposed the optimal function of constraint 119906

119894isin

0 1

119864 (119880119908) =

119899

sum

119894=1119906

119894119890 (119909

119894) + 120578

119899

sum

119894=1(1minus119906

119894) (1)

The objective is to minimize 119864(119880119908) where 119883 = 11990911199092 119909119899 is the data set 119880 = 119906

119894| 119894 = 1 119899 is the

membership set 120578 is the threshold 119906119894is the binary variable

and 119908 is the continuous variable rendering gradient descentmethod optimization difficult to solve thus they trans-formed the minimization problem where Gibbs distributionis maximized by the following equation

119875 (119880119908) =

exp (minus120574119864 (119880119908))119885

(2)

where 119885 is the separation function ensuring sum119880int119908119875(119880

119908) = 1 119890(119909119894) can be one of the following functions

1198901 (119909119894) =1003817

1003817

1003817

1003817

1003817

119909

119894minus119908

119879119909

119894119908

1003817

1003817

1003817

1003817

1003817

2

1198902 (119909119894) =1003817

1003817

1003817

1003817

119909

119894

1003817

1003817

1003817

1003817

2minus

1003817

1003817

1003817

1003817

1003817

119908

119879119909

119894

1003817

1003817

1003817

1003817

1003817

2

119908

2 = 119909

119879

119894119909

119894minus

119908

119879119909

119894119909

119879

119894119908

119908

119879119908

(3)

The gradient descent rule for minimizing 1198641 = sum

119899

119894=1 1198901(119909119894)and 1198642 = sum

119899

119894=1 1198902(119909119894) is

119908

new= 119908

old+120572

119905[119910 (119909

119894minus119906) + (119910 minus V) 119909

119894]

119908

new= 119908

old+120572

119905[119909

119894119910minus

119908

119908

119879119908

119910

2]

(4)

where 120572119905is the learning rate 119910 = 119908119879119909

119894

Therefore Yang andWang [24] proposed a new objectivefunction

Min119864 =119899

sum

119894=1119906

1198981119894119890 (119909

119894) + 120578

119899

sum

119894=1(1minus119906

119894)

1198981 (5)

The constraints are 119906119894isin [0 1] 1198981 isin [1infin) where 119906

119894is

the membership value belonging to the 119909119894data cluster (1 minus

119906

119894) is themembership value of the 119909

119894disturbance cluster and

1198981 is the fuzzy variable In this case 119890(119909119894) is the error

between the measured 119909119894and the cluster center which is

similar to the 119862-means algorithm [27]As 119906119894is a continuous variable it avoids the difficulty of

an optimummix of discrete types and continuous types thusthe gradient descentmethod can be used First 119906

119894equals 0 as

calculated by the slope of (2) so

119906

119894=

11 + [119890 (119909

119894) 120578]

1(1198981minus1) (6)

119906

119894is replaced in (2) and the following equation is obtained

119864 =

119899

sum

119894=1[

11 + [119890 (119909

119894) 120578]

1(1198981minus1)]

1198981minus1

119890 (119909

119894)

(7)

On the other hand 119908 gradient is

120597119864

120597119908

= [

11 + [119890 (119909

119894) 120578]

1(1198981minus1)]

1198981

[

120597119890 (119909

119894)

120597119908

] (8)

1198981 is the fuzzy variable If 1198981 = 1 the fuzzy membership isdemoted to a fixed membership and can be determined bythe following rule

119906

119894=

1 if (119890 (119909119894)) lt 120578

0 otherwise(9)

In this case 120578 is the hard threshold where 1198981 is not setbut 1198981 = 2 inmost studies Yang andWang [24] deduced thefollowing process of an optimization algorithm

(1) The number of iterations is set as 119905 = 1 the iterationis constrained as 119879 the learning coefficient is 1205720 isin

(0 1] the soft threshold 120578 is a small positive and theweight 119908 is randomly initialized

(2) In a case of less than 119879 execute step (3) to step (9)(3) Calculate 120572

119905= 1205720(1 minus 119905119879) set 119894 = 1 and 120590 = 0

(4) For observation frequency execute step (5) to step (8)(5) Calculate 119910 = 119908119879119909

119894 119906 = 119910119908 and V = 119908119879119906

(6) The new weight is 119908new= 119908

old+ 120572

119905120573(119909

119894)[119910(119909

119894minus 119906) +

(119910 minus V)119909119894]

(7) The new temporary count is 120575 = 120575 + 1198901(119909119894)(8) Add from 1 to 119894(9) Calculate 120578 = (120575119899) and add from 1 to 119905

It is almost affirmative that the new weight 119908 approaches theprincipal component vector [19 27 28]

3 Case Study

This section is divided into three parts collection of trafficaccident data preprocessing of the research data and sub-stituting the data after Feature Selection in FRPCA BPNNand LR Four groups of information are obtained includingBPNN LR FRPCA-BPNN and FRPCA-LR

31 Data Collection The data variables used in this study are17 input variables including weather light rays road patternaccident site sight distance and separation facility thereare 2471 observationsTheoutput variables are accident typesand patterns including (1) vehicle-vehicle accidents (2)person-automobilemotorcycle and (3) automobilemotor-cycle data variables There are 2096 observations ofCategory (1) vehicle-vehicle there are 146 observations ofCategory (2) person-automobilemotorcycle and there are229 observations of Category (3) automobilemotorcycleThe aforesaid variables are coded as 119884

1 119884

2 and 119884

3

respectively as listed in Table 1

Mathematical Problems in Engineering 5

Table 1 Number of traffic accidents and output variable codes inthe region in 2012

Output variable Number of data Percentage CodePeople and motor vehicles 146 59 119884

1

Motor vehicles 2096 8482 119884

2

Single vehicle 229 93 119884

3

Total 2471 100

32 Feature Selection Result This study uses RFE for datapreprocessing The 17 variables of the database are codedbefore Feature Selection as shown in Table 2The 17 variablesare sequenced according to importance as shown in Table 3

4 Empirical Research Result

This study uses the first 7 variables in order of importanceas obtained by the Feature Selection of RFE as inputvariables while the person-automobilemotorcycle vehicle-vehicle and automobilemotorcycle are output variables asshown in Table 4The 7 input variables and 3 output variablesare substituted in FRPCA BPNN and LR respectively inorder to obtain BPNN LR FRPCA-BPNN and FRPCA-LRclassifier models The experimental procedure is as followsStep (1) 2471 data sets are used as test data the 17 variablesare sequenced according to their importance by using theRFE data preprocessing method and the first 7 variables arerearranged as the test data set (119878

1) Step (2) classifier mod-

eling and performance evaluation the BPNN LR FRPCA-BPNN and FRPCA-LR classifiers are tested respectively inorder to evaluate the performance of the classifiers duringclassification

41 Analysis of BPNN Model The 7 input variables are sub-stituted in FRPCA in order to obtain the scores and loadingsof the principal components The scores of the principalcomponents can be used to classify various observationpoints and to integrate the scores of the principal componentsof each observation point in order to calculate an averageweighted integral indicator The coefficient of the correlationbetween new variables and old variables is called loadingswhich represent the influence or importance of the originalvariable to the new variables where larger loadings representhigher influence

The loadings can be obtained from

119897

119894119895=

119908

119894119895

119904

119895

sdot

radic

120582

119894 (10)

where 119897119894119895is the loadings of No 119895 variable on No 119894 principal

component 119908119894119895is the weight of No 119895 variable on No 119894

principal component 120582119894is the eigenvalue of No 119894 principal

component (ie variance) and 119904119895is the standard deviation of

No 119895 variableExperimental combination 1 the experimental parame-

ters of the BPNN forecasting model are set as follows epochs= 1000 learning rate lr is 01 03 and 05 respectively and

Table 2 Codes of 17 variables

Variable code Variable name119883

1Weather

119883

2Light rays

119883

3Road type

119883

4Speed limit

119883

5Road pattern

119883

6Accident site

119883

7Pavement

119883

8Pavement state

119883

9Pavement defect

119883

10Barrier

119883

11Sight distance

119883

12Signal type

119883

13Signal action

119883

14Separation facility

119883

15In fast or general lane

119883

16In fast and slow lanes

119883

17Edge of pavement

each experiment is repeated 5 times with the results ofthe three experiments as shown in Table 5 Experimentalcombination 2 the FRPAC-BPNN forecasting model is dif-ferent from experimental combination 1 where the principalcomponent scores are converted by executing FRPCA beforebuilding BPNN and then the BPNN forecasting networkis built The experimental results show the learning rate ofthe lr = empirical results of BPNN versus FRPCA-BPNNforecasting models

42 LR Analysis Experimental combination 3 the LR clas-sifier is constructed and the data are imported into LR forclassification forecasting according to the aforesaid test dataset 1198781 Experimental combination 4 the FRPCA-LR classifier

is constructed as experimental combination 2 the 1198781data

set is converted into principal component scores by FRPCAand then the LR classifier is constructed Each experiment isconducted five times the experimental results are as shownin Table 6 The experimental results show that the averageaccuracy rate and standard deviation of the FRPCA-LRmodel are 08506 plusmn 00021 which is better than the 08514 plusmn00031 of the LR classification model

The LR model investigates the impacts on the pattern oftraffic accidents according to the types and patterns of thetraffic accidents The optimal model is shown in Table 7Thissection discusses the correlation between the vehicles andthe environment based on 2325 pieces of data for analysisAfter the deletion of 146 pieces of data involving humanand vehicles the dependent variables are divided into thetwo categories of ldquovehicle to vehiclerdquo and ldquovehicle in itselfrdquoaccording to the types and patterns of traffic accidents Oddratio is adopted to represent the relevant influences of EventA to the occurrence of Event B In Table 7 the odd ratioof crossroads among the road types is 301 meaning thatthe risk of traffic accidents on ldquocrossroadsrdquo is higher than

6 Mathematical Problems in Engineering

Table 3 Sequence of feature attributes of environmental factors

Rank Code Variable name Importance1 119883

16In fast and slow lanes 0069791

2 119883

15In fast or general lane 0045356

3 119883

17Edge of pavement 0042181

4 119883

5Road pattern 0041695

5 119883

6Accident site 0037548

6 119883

14Separation facility 0025303

7 119883

11Sight distance 0018811

8 119883

2Light rays 0017205

9 119883

3Road type 0017063

10 119883

1Weather 0015428

11 119883

8Pavement state 0015322

12 119883

4Speed limit 0012274

13 119883

13Signal action 0009509

14 119883

10Barrier 0008938

15 119883

12Signal type 0002745

16 119883

9Pavement defect 0002059

17 119883

7Pavement 0000180

Table 4 Input (119909) and output (119910) parameters

Code(119909) Input variable Code

(119910) Output variable

119883

5Road pattern

119884

1People and motor vehicles

119883

6Accident site

119884

2Motor vehicles

119883

11Sight distance

119884

3Single vehicle

119883

14Separation facility

119883

15In fast or general lane

119883

16In fast and slow lanes

119883

17Edge of pavement

Table 5 Empirical results of BPNN versus FRPCA-BPNN

BPNN FRPCA-BPNNLearning rate 01 03 05 01 03 05BPNN-1 08232 08067 08089 08607 08594 08583BPNN-2 08523 08412 08335 08615 08572 08591BPNN-3 08508 08543 08560 08557 08623 08599BPNN-4 08550 08530 08530 08563 08553 08572BPNN-5 08574 08567 08542 08584 08607 08609Mean 08477 08424 08411 08585 08590 08591SD 00139 00208 02019 00026 00028 00010

the 138 of ldquoone-way roadsrdquo 119890301minus138 = 119890

123= 342 In

other words the ratio of traffic accidents at intersections is324 times that of one-way roads Likewise the ratio of trafficaccidents on ldquointersectionsrdquo under the category of ldquotrafficlocationsrdquo is also the highest at 724 times that of generalroads 119890572minus374 = 119890

198 Table 7 shows the importance of

Table 6 Empirical results of LR versus FRPCA-LR

Logistic FRPCA-LogisticLogistic-1 08480 08503Logistic-2 08523 08528Logistic-3 08491 08497Logistic-4 08483 08497Logistic-5 08551 08543Mean 08506 08514SD 00031 00021

traffic accident environments and road design to the ratio oftraffic accidents

5 Conclusions

Traffic safety depends on road design road configurationvehicle performance traffic regulations and the effective-ness of implementation The main means of transport inmiddle and low income countries include walking bicyclemotorcycle and bus while that of high income countries isautomobiles Therefore the traffic safety control measuresof high income countries are not completely applicableto middle and low income countries and thus should beimported and improved to fit local transportation and roadusage conditions [29]

The report of the World Health Organization (WHO)indicated that about 12 million people die from trafficaccidents in the world annually about 3400 people diefrom traffic accidents per day approximately 1000 peopleare injured or disabled children pedestrians cyclers andthe elderly are the most vulnerable road users and 85 offatalities and 90 of the disabled live in middle and lowincome countries The scientific analysis of accident data aswell as the implementation of relevant safety measures canprevent the occurrence of traffic accidents thus reducing theseverity of injuries

At present with the rapid development of cities it isnecessary to make efficient forecasting in order that decisionmakers can make preventions and decisions in advancein order to reduce the death rate This study uses RFEFRPCA BPNN and LR to analyze the classification accuracyrate of the traffic accident data of the region Accordingto the experimental results the classification accuracy rateof BPNN LR FRPCA-BPNN and FRPCA-LR is higherthan 80 thus forecast performance is significant Furtheranalysis shows that the network performance of the FRPCA-BPNN and FRPCA-LR classifiers combined with FRPCAis better than BPNN and LR According to Tables 5 and6 the accuracy rate of classifiers FRPCA-BPNN (8589)and FRPCA-LR (8514) combined with FRPCA is higherthan BPNN (8437) and LR (8506) by 152 and 008respectively meaning the FRPCA-BPNN and FRPCA-LRhave better classification forecast ability

In traffic accident analysis or verification results thehuman factor is mostly regarded as the first cause of traf-fic accidents However the road environment has certain

Mathematical Problems in Engineering 7

Table 7 The impacts on the types and patterns of traffic accidents as imposed by the various factors and represented in the LR model

Variables Saliency Odd ratioConstant terms 0067 0540

Relevant environmental factorsRoad types

One-way road 0044 138Crossroad 0028 301

Location of the accidentIntersection 0012 572Ordinary road sections 0032 374

Visibility rangeSatisfactory mdash mdashPoor 0024 159

Relevant road designDirectional facilities

Two-way no-passing markings 0084 167Directional markings 0074 145No directional facilities 0040 226

Traffic separation facilitiesFast lanes and slow lanesNo lane-change markings (without signs) 017 119Lane markings (with signs) 0024 189Lane markings (without signs) 0049 302No lane markings 0005 571

Separation linesSeparations lines between the fast lanes and slow lanes 0142 191No separations lines between the fast lanes and slow lanes 0028 425

SidelinesYes mdash mdashNo 0032 399

Total sample size 2471

correlation and improper intersection design or planning islikely to cause traffic accidents In comparison to other trafficaccident sites a forked road intersection is the most probableaccident site This study used RFE to select 7 input variablesfrom 17 input variables Based on the 7 input variables theenvironmental factor and road design are found to be thecauses of road traffic accidents in the region According tothe statistical data of the Taichung Police Station the 4 maincauses among the 67 causes of accidents are as follows (1)not allowing other vehicles to pass as per regulations (2)not aware of the situation ahead (3) violating specific sign(line) bans and (4) not maintaining a safe driving distanceaccounting for 2372 1354 751 and 822 respectivelyof the total number of traffic accidents and the total propor-tion is as high as 5299The road authoritiesmay refer to the7 variables of traffic accidents and road design as proposedin this study regarding future road designs and plans As forthe 4 main causes of accidents on road sections involving theabove seven variables of traffic accidents and road designthey should be the priorities in the future elimination actionsof the police force If improvements and preventive measures

are made road safety can be substantially increased therebyreducing traffic accidents and fatalitiesThefindings can serveas reference for the police force and management authoritiesto improve roads as well as the assessment and managementmodels for the elimination of traffic offences

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] D D Clarke PWard C Bartle andW Truman ldquoKiller crashesfatal road traffic accidents in the UKrdquo Accident Analysis ampPrevention vol 42 no 2 pp 764ndash770 2010

[2] K-V Hung and L-T Huyen ldquoEducation influence in trafficsafety a case study in Vietnamrdquo IATSS Research vol 34 no 2pp 87ndash93 2011

[3] H Gjerde A S Christophersen P T Normann and JMoslashrlandldquoToxicological investigations of drivers killed in road traffic

8 Mathematical Problems in Engineering

accidents in Norway during 2006ndash2008rdquo Forensic Science Inter-national vol 212 no 1ndash3 pp 102ndash109 2011

[4] Y Komada S Asaoka T Abe and Y Inoue ldquoShort sleepduration sleep disorders and traffic accidentsrdquo IATSS Researchvol 37 no 1 pp 1ndash7 2013

[5] S A Shappell and D A Wiegmann ldquoHuman factors investi-gation and analysis of accidents and incidentsrdquo in Encyclopediaof Forensic Sciences pp 440ndash449 Academic Press 2nd edition2013

[6] L-Y ChangD-J Lin C-HHuang andK-K Chang ldquoAnalysisof contributory factors for driving under the influence ofalcohol a stated choice approachrdquo Transportation Research PartF Traffic Psychology and Behaviour vol 18 pp 11ndash20 2013

[7] H Fu and Y Zhou ldquoThe traffic accident prediction basedon neural networkrdquo in Proceedings of the 2nd InternationalConference on Digital Manufacturing and Automation (ICDMArsquo11) pp 1349ndash1350 IEEE Zhangjiajie Hunan August 2011

[8] C-Y Kong and J-K Yang ldquoLogistic regression analysis ofpedestrian casualty risk in passenger vehicle collisions inChinardquoAccident Analysis and Prevention vol 42 no 4 pp 987ndash993 2010

[9] Y-K Ou Y-C Liu and F-Y Shih ldquoRisk prediction modelfor driversrsquo in-vehicle activitiesmdashapplication of task analysisand back-propagation neural networkrdquoTransportationResearchPart F Traffic Psychology and Behaviour vol 18 pp 83ndash93 2013

[10] R Gang and Z Zhuping ldquoTraffic safety forecasting methodby particle swarm optimization and support vector machinerdquoExpert SystemswithApplications vol 38 no 8 pp 10420ndash104242011

[11] S S Durduran ldquoA decision making system to automaticrecognize of traffic accidents on the basis of a GIS platformrdquoExpert Systems with Applications vol 37 no 12 pp 7729ndash77362010

[12] K El-Basyouny and T Sayed ldquoSafety performance functionsusing traffic conflictsrdquo Safety Science vol 51 no 1 pp 160ndash1642013

[13] T Kanchan V Kulkarni S M Bakkannavar N Kumar andB Unnikrishnan ldquoAnalysis of fatal road traffic accidents in acoastal township of South Indiardquo Journal of Forensic and LegalMedicine vol 19 no 8 pp 448ndash451 2012

[14] M Karacasua and A Er ldquoAn analysis on distribution of trafficfaults in accidents based on driverrsquos age and gender Eskisehircaserdquo ProcediandashSocial and Behavioral Sciences vol 20 pp 776ndash785 2011

[15] J De Ona G Lopez R Mujalli and F J Calvo ldquoAnalysis oftraffic accidents on rural highways using LatentClass Clusteringand BayesianNetworksrdquoAccident Analysis amp Prevention vol 51pp 1ndash10 2013

[16] A T Kashani SM Afshin andA Ranjbari ldquoAnalysis of factorsassociated with traffic injury severity on rural roads in IranrdquoJournal of Injury and Violence Research vol 4 no 1 pp 36ndash412012

[17] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[18] C Chu A-L Hsu K-H Chou P Bandettini C Lin andAlzheimerrsquos Disease Neuroimaging Initiative ldquoDoes featureselection improve classification accuracy Impact of samplesize and feature selection on classification using anatomicalmagnetic resonance imagesrdquoNeuroImage vol 60 no 1 pp 59ndash70 2012

[19] X-H Lin F-F Yang L Zhou et al ldquoA support vector machine-recursive feature elimination feature selectionmethod based onartificial contrast variables and mutual informationrdquo Journal ofChromatography B vol 910 pp 149ndash155 2012

[20] D Wei S Li and M-K Tan ldquoGraph embedding based featureselectionrdquo Neurocomputing vol 93 pp 115ndash125 2012

[21] M Ahmadzadeh A H Fard B Saranjam and H R SalimildquoPrediction of residual stresses in gas arc welding by backpropagation neural networkrdquo NDT amp E International vol 52pp 136ndash143 2012

[22] P Reed and Y-Q Wu ldquoLogistic regression for risk factormodelling in stuttering researchrdquo Journal of Fluency Disordersvol 38 no 2 pp 88ndash101 2013

[23] P Reed ldquoThe effect of traffic tickets on road traffic crashesrdquoAccident Analysis amp Prevention vol 64 pp 86ndash91 2014

[24] T-N Yang and S-D Wang ldquoRobust algorithms for principalcomponent analysisrdquo Pattern Recognition Letters vol 20 no 9pp 927ndash933 1999

[25] J Karhunen P Pajunen and E Oja ldquoThe nonlinear PCAcriterion in blind source separation relations with otherapproachesrdquo Neurocomputing vol 22 no 1ndash3 pp 5ndash20 1998

[26] L Xu and A L Yuille ldquoRobust principal component analysisby self-organizing rules based on statistical physics approachrdquoIEEE Transactions on Neural Networks vol 6 no 1 pp 131ndash1431995

[27] F Gharibnezhad L E Mujica J Rodellar and C-P FritzenldquoDamage detection using robust fuzzy principal componentanalysisrdquo UPCommons Conference Report 2013

[28] P Luukka ldquoNonlinear fuzzy robust PCA algorithms and sim-ilarity classifier in bankruptcy analysisrdquo Expert Systems withApplications vol 37 no 12 pp 8296ndash8302 2010

[29] G-G Zhang K K W Yau and G Chen ldquoRisk factorsassociatedwith traffic violations and accident severity inChinardquoAccident Analysis amp Prevention vol 59 pp 18ndash25 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 2: Research Article The Application of Data Mining Technology ...downloads.hindawi.com/journals/mpe/2015/170635.pdfResearch Article The Application of Data Mining Technology to Build

2 Mathematical Problems in Engineering

as well as the interaction between road environments andtraffic engineering facilities This study identifies the keyfactors that affect traffic accidents using Feature Selectionand establishes models to analyze traffic accidents and theirtypes with various methods such as Fuzzy Robust PrincipalComponent Analysis (FRPCA) Back Propagation NeuralNetwork (BPNN) and FRPCA-Logistic Regression (LR)Theenvironments of traffic accidents and the variables of roaddesigns identified by the model could serve as reference forthe police force and regulatory authorities to design and plansand improve traffic safety thus decreasing the ratio of trafficaccidents damage to property and loss of lives

With the advancements of information technology datamining becomes increasinglymature and useful informationwithout preconditions can be found in databases Relationalmodels can be built to determine the correlation betweencharacterization factors of traffic accidents and casualtiesThis study uses the Recursive Feature Elimination (RFE)FRPCA BPNN and LR of Feature Selection to determineimportant factors influencing traffic accidentsThe results canprovide suggestions for improving the occurrence of trafficaccidents Finally the model was statistically evaluated

The remainder of this paper is organized as followsSection 2 reviews the literature concerning the severity ofinjuries in traffic accidents Section 3 presents the FRPCASection 4 discusses the research data Section 5 offers conclu-sions

2 Material and Methods

Many studies have focused on forecasting and modelingtraffic accidents and analyzed the results The results suggestthat the significant factors influencing the occurrence ofaccidentsmust be eliminated or controlled in order to preventtraffic accidents and reduce injuries and deaths

In terms of research methods most studies use BPNN orLR to forecast or model analysis results [6ndash9] Gang andZhuping [10] suggested that the PSO-SVM is better thanBPNN in traffic safety forecasting Chang et al [6] usedthe established modeling method and LR to discuss thecontributing factors and conditions of driving after drinkingThe analysis results showed that law enforcement driversrsquodrinking habits and regulatory knowledge of drunk drivingapparently influenced driversrsquo selecting drunk driving behav-iors Kong and Yang [8] used LR for casualties and drivingspeed in traffic accident survey data and found that regardingthe correlation of collisions between vehicles and pedestriansthe risk of pedestrian death was 26when the vehiclersquos speedwas 50 kmh 50 when the speed was 58 kmh and 82when the speed was 70 kmh However the analysis resultshowed that age was not a major risk factor in death Fu andZhou [7] pointed out that the traditional BPNN has somedefects such as local minima too many iterations and tooslow training Therefore the improved LM-BP neural net-work was used for forecasting The forecast results of trafficaccidents death toll and amount of direct economic losswere significant thus the BP network is applicable to trafficaccident forecasting

A number of recent studies have used data mining orstatistical methods [11ndash15] Karacasua and Er [14] used chi-square significance testing to analyze whether the same ageand gender have similar traffic accidents as well as the cor-relation among education age gender and psychology Thefindings showed that (1) males were more prone to trafficaccidents than females (2) driving while being intoxicatedand speeding were major causes Kanchan et al [13] usedstatistical software for analysis and found that the injuredwere mostly male and the major causes of death were headand abdominal injuries Traffic accidents are a significantpublic health hazard thus first aid should be strengthenedand traffic regulations and health education should be strictlyimplemented Kashani et al [16] used classification andCART to analyze traffic collision data The results showedthat improper passing and not using seat belts were the mostimportant factors influencing the severity of injuries De Onaet al [15] used Latent Class Cluster (LCC) to reduce theheterogeneity of traffic accident data and combined it withBayesian Networks (BNS) to recognize major factors Theresults indicated that weather factors pavement markingsand road width were significant factors

Based on the above discussion this study uses BPNNLR and statistical methods differing from previous studieswhich aim at accident patterns and types The FRPCA isused for data preprocessing which is combined with BPNNand LR models to analyze the performance of the aforesaidfour classification models (BPNN LR FRPCA-BPNN andFRPCA-LR) in forecasting

21 Research Method This study uses four constructsnamely (1) natural factors (2) environmental factors (3)road design and (4) accident types and patterns of roadtraffic accident cases in Taichung City to discuss the factorsinfluencing the occurrence of road traffic accidents Theresearch structure is as shown in Figure 1

22 Data Preprocessing of Feature Selection This study usesRFE as the Feature Selection method which is a FeatureSelection algorithm with the principle proposed by Guyonet al [17] Guyon used RFE to select the key and importantfeature set which not only shortens classification computingtime but also improves the classification accuracy rate Thepurpose of RFE is to calculate the weight vectors of eachfeature which are ordered according to the calculated weightvectors as the basis of classification RFE is an iterativeprocess that eliminates features backwards and its feature setscreening procedure is described as follows

(1) Use current data set to train classifier(2) Calculate the weight of each feature(3) Delete the feature with minimum weight

The iterative process is ended when there is one featureremaining A list of features ordered according to the weightsis obtained as a result of execution and unimportant oruncorrelated features are eliminated from the list first thusthey are listed at the end whereas the most importantfeatures are eliminated last and are listed at the front [18 19]

Mathematical Problems in Engineering 3

Road factors(I) Type(II) Location(III) Conditions(IV) Barriers

Road design(I) Semaphore(II) Separation facility

(directionlane)

Natural factors(I) Weather(II) Light

Accident types and patterns(I) People and motor vehicles(II) Motor vehicles(III) Single vehicle

FRPCA

Logisticregression

Injured-only traffic accident

Feature selection-RFE

BPNNLogisticregressionBPNN

Evaluation and analysis

Original data

Figure 1 Research structure

The RFE selects the feature set in three major stepsimports the data set for classification calculates the weight ofeach feature and deletes the feature with minimum weightFeature ordering is obtained the feature with minimumweight square is removed in each cycle and then the remain-ing features are retrained to obtain a new feature orderingRFE continuously executes this process and a feature orderlist is obtained [20] It is noteworthy that one of the featuresordered in the front does not always enable the classifierto obtain the best classification performance however thecombination of multiple features enables the classifier toobtain the optimum classification performance ThereforeRFE algorithm can select the most complementary featurecombination [4]

23 Backpropagation Neural Network (BPNN) BPNN is themost frequently used supervised learning among the neuralnetworks and is highly effective in classification problems[21] The parameters are divided into structural param-eters and learning parameters The structural parametersinclude the number of hidden layers while the learningparameters include the learning rate initial weight rangeand momentum term Generally Trial and Error is usedto determine the optimal parameter values when selectingstructural parameters and learning parameters The mostused nonlinear transfer function in the hidden layer of BPNNis the log-sigmoid transfer function whose output is between

0 and 1 in order to respond to the negative infinity to positiveinfinity input of neurons

An alternative is the tangent sigmoid transfer functionldquotansigrdquo as shown in the hidden layer The linear transferfunction purelin is in the output layer If the sigmoid transferfunction is used in the output layer the network output isrestricted to a very small range If the linear transfer functionis used in the output layer the network output can be anarbitrary value

24 LR LR can be used to analyze one or several fore-cast values These results have a binary (eg existence ornonexistence of an event) relationship [22 23] LR is derivedfrom the cumulative probability function of the logisticmodel and is a linear probability model which is similarto a linear regression model The difference is that LR cantest the dependent variable of a nominal scale where thediscussed dependent variable is discontinuous especially inbinary classification The purpose of LR is to establish thesimplest and fittest analysis result Furthermore it can beused in a practicalmodel to forecast the relationships betweendependent variables and a set of forecast variances wherethe explanatory variable can be a categorical or continuousvariable

25 FRPCA The nonlinear FRPCA algorithm is deducedfrom the linear fuzzy principal component analysis algo-rithm as introduced by Yang and Wang [24] and the

4 Mathematical Problems in Engineering

nonlinear criteria in blind source separation of Karhunen etal [25]The robust principal component analysis as proposedby Yang andWang is established on the principal componentanalysis learning rule and energy function as proposed by Xuand Yuilles [26] and the objective function bias is proposedThese methods are briefly introduced as follows Xu andYuilles [26] proposed the optimal function of constraint 119906

119894isin

0 1

119864 (119880119908) =

119899

sum

119894=1119906

119894119890 (119909

119894) + 120578

119899

sum

119894=1(1minus119906

119894) (1)

The objective is to minimize 119864(119880119908) where 119883 = 11990911199092 119909119899 is the data set 119880 = 119906

119894| 119894 = 1 119899 is the

membership set 120578 is the threshold 119906119894is the binary variable

and 119908 is the continuous variable rendering gradient descentmethod optimization difficult to solve thus they trans-formed the minimization problem where Gibbs distributionis maximized by the following equation

119875 (119880119908) =

exp (minus120574119864 (119880119908))119885

(2)

where 119885 is the separation function ensuring sum119880int119908119875(119880

119908) = 1 119890(119909119894) can be one of the following functions

1198901 (119909119894) =1003817

1003817

1003817

1003817

1003817

119909

119894minus119908

119879119909

119894119908

1003817

1003817

1003817

1003817

1003817

2

1198902 (119909119894) =1003817

1003817

1003817

1003817

119909

119894

1003817

1003817

1003817

1003817

2minus

1003817

1003817

1003817

1003817

1003817

119908

119879119909

119894

1003817

1003817

1003817

1003817

1003817

2

119908

2 = 119909

119879

119894119909

119894minus

119908

119879119909

119894119909

119879

119894119908

119908

119879119908

(3)

The gradient descent rule for minimizing 1198641 = sum

119899

119894=1 1198901(119909119894)and 1198642 = sum

119899

119894=1 1198902(119909119894) is

119908

new= 119908

old+120572

119905[119910 (119909

119894minus119906) + (119910 minus V) 119909

119894]

119908

new= 119908

old+120572

119905[119909

119894119910minus

119908

119908

119879119908

119910

2]

(4)

where 120572119905is the learning rate 119910 = 119908119879119909

119894

Therefore Yang andWang [24] proposed a new objectivefunction

Min119864 =119899

sum

119894=1119906

1198981119894119890 (119909

119894) + 120578

119899

sum

119894=1(1minus119906

119894)

1198981 (5)

The constraints are 119906119894isin [0 1] 1198981 isin [1infin) where 119906

119894is

the membership value belonging to the 119909119894data cluster (1 minus

119906

119894) is themembership value of the 119909

119894disturbance cluster and

1198981 is the fuzzy variable In this case 119890(119909119894) is the error

between the measured 119909119894and the cluster center which is

similar to the 119862-means algorithm [27]As 119906119894is a continuous variable it avoids the difficulty of

an optimummix of discrete types and continuous types thusthe gradient descentmethod can be used First 119906

119894equals 0 as

calculated by the slope of (2) so

119906

119894=

11 + [119890 (119909

119894) 120578]

1(1198981minus1) (6)

119906

119894is replaced in (2) and the following equation is obtained

119864 =

119899

sum

119894=1[

11 + [119890 (119909

119894) 120578]

1(1198981minus1)]

1198981minus1

119890 (119909

119894)

(7)

On the other hand 119908 gradient is

120597119864

120597119908

= [

11 + [119890 (119909

119894) 120578]

1(1198981minus1)]

1198981

[

120597119890 (119909

119894)

120597119908

] (8)

1198981 is the fuzzy variable If 1198981 = 1 the fuzzy membership isdemoted to a fixed membership and can be determined bythe following rule

119906

119894=

1 if (119890 (119909119894)) lt 120578

0 otherwise(9)

In this case 120578 is the hard threshold where 1198981 is not setbut 1198981 = 2 inmost studies Yang andWang [24] deduced thefollowing process of an optimization algorithm

(1) The number of iterations is set as 119905 = 1 the iterationis constrained as 119879 the learning coefficient is 1205720 isin

(0 1] the soft threshold 120578 is a small positive and theweight 119908 is randomly initialized

(2) In a case of less than 119879 execute step (3) to step (9)(3) Calculate 120572

119905= 1205720(1 minus 119905119879) set 119894 = 1 and 120590 = 0

(4) For observation frequency execute step (5) to step (8)(5) Calculate 119910 = 119908119879119909

119894 119906 = 119910119908 and V = 119908119879119906

(6) The new weight is 119908new= 119908

old+ 120572

119905120573(119909

119894)[119910(119909

119894minus 119906) +

(119910 minus V)119909119894]

(7) The new temporary count is 120575 = 120575 + 1198901(119909119894)(8) Add from 1 to 119894(9) Calculate 120578 = (120575119899) and add from 1 to 119905

It is almost affirmative that the new weight 119908 approaches theprincipal component vector [19 27 28]

3 Case Study

This section is divided into three parts collection of trafficaccident data preprocessing of the research data and sub-stituting the data after Feature Selection in FRPCA BPNNand LR Four groups of information are obtained includingBPNN LR FRPCA-BPNN and FRPCA-LR

31 Data Collection The data variables used in this study are17 input variables including weather light rays road patternaccident site sight distance and separation facility thereare 2471 observationsTheoutput variables are accident typesand patterns including (1) vehicle-vehicle accidents (2)person-automobilemotorcycle and (3) automobilemotor-cycle data variables There are 2096 observations ofCategory (1) vehicle-vehicle there are 146 observations ofCategory (2) person-automobilemotorcycle and there are229 observations of Category (3) automobilemotorcycleThe aforesaid variables are coded as 119884

1 119884

2 and 119884

3

respectively as listed in Table 1

Mathematical Problems in Engineering 5

Table 1 Number of traffic accidents and output variable codes inthe region in 2012

Output variable Number of data Percentage CodePeople and motor vehicles 146 59 119884

1

Motor vehicles 2096 8482 119884

2

Single vehicle 229 93 119884

3

Total 2471 100

32 Feature Selection Result This study uses RFE for datapreprocessing The 17 variables of the database are codedbefore Feature Selection as shown in Table 2The 17 variablesare sequenced according to importance as shown in Table 3

4 Empirical Research Result

This study uses the first 7 variables in order of importanceas obtained by the Feature Selection of RFE as inputvariables while the person-automobilemotorcycle vehicle-vehicle and automobilemotorcycle are output variables asshown in Table 4The 7 input variables and 3 output variablesare substituted in FRPCA BPNN and LR respectively inorder to obtain BPNN LR FRPCA-BPNN and FRPCA-LRclassifier models The experimental procedure is as followsStep (1) 2471 data sets are used as test data the 17 variablesare sequenced according to their importance by using theRFE data preprocessing method and the first 7 variables arerearranged as the test data set (119878

1) Step (2) classifier mod-

eling and performance evaluation the BPNN LR FRPCA-BPNN and FRPCA-LR classifiers are tested respectively inorder to evaluate the performance of the classifiers duringclassification

41 Analysis of BPNN Model The 7 input variables are sub-stituted in FRPCA in order to obtain the scores and loadingsof the principal components The scores of the principalcomponents can be used to classify various observationpoints and to integrate the scores of the principal componentsof each observation point in order to calculate an averageweighted integral indicator The coefficient of the correlationbetween new variables and old variables is called loadingswhich represent the influence or importance of the originalvariable to the new variables where larger loadings representhigher influence

The loadings can be obtained from

119897

119894119895=

119908

119894119895

119904

119895

sdot

radic

120582

119894 (10)

where 119897119894119895is the loadings of No 119895 variable on No 119894 principal

component 119908119894119895is the weight of No 119895 variable on No 119894

principal component 120582119894is the eigenvalue of No 119894 principal

component (ie variance) and 119904119895is the standard deviation of

No 119895 variableExperimental combination 1 the experimental parame-

ters of the BPNN forecasting model are set as follows epochs= 1000 learning rate lr is 01 03 and 05 respectively and

Table 2 Codes of 17 variables

Variable code Variable name119883

1Weather

119883

2Light rays

119883

3Road type

119883

4Speed limit

119883

5Road pattern

119883

6Accident site

119883

7Pavement

119883

8Pavement state

119883

9Pavement defect

119883

10Barrier

119883

11Sight distance

119883

12Signal type

119883

13Signal action

119883

14Separation facility

119883

15In fast or general lane

119883

16In fast and slow lanes

119883

17Edge of pavement

each experiment is repeated 5 times with the results ofthe three experiments as shown in Table 5 Experimentalcombination 2 the FRPAC-BPNN forecasting model is dif-ferent from experimental combination 1 where the principalcomponent scores are converted by executing FRPCA beforebuilding BPNN and then the BPNN forecasting networkis built The experimental results show the learning rate ofthe lr = empirical results of BPNN versus FRPCA-BPNNforecasting models

42 LR Analysis Experimental combination 3 the LR clas-sifier is constructed and the data are imported into LR forclassification forecasting according to the aforesaid test dataset 1198781 Experimental combination 4 the FRPCA-LR classifier

is constructed as experimental combination 2 the 1198781data

set is converted into principal component scores by FRPCAand then the LR classifier is constructed Each experiment isconducted five times the experimental results are as shownin Table 6 The experimental results show that the averageaccuracy rate and standard deviation of the FRPCA-LRmodel are 08506 plusmn 00021 which is better than the 08514 plusmn00031 of the LR classification model

The LR model investigates the impacts on the pattern oftraffic accidents according to the types and patterns of thetraffic accidents The optimal model is shown in Table 7Thissection discusses the correlation between the vehicles andthe environment based on 2325 pieces of data for analysisAfter the deletion of 146 pieces of data involving humanand vehicles the dependent variables are divided into thetwo categories of ldquovehicle to vehiclerdquo and ldquovehicle in itselfrdquoaccording to the types and patterns of traffic accidents Oddratio is adopted to represent the relevant influences of EventA to the occurrence of Event B In Table 7 the odd ratioof crossroads among the road types is 301 meaning thatthe risk of traffic accidents on ldquocrossroadsrdquo is higher than

6 Mathematical Problems in Engineering

Table 3 Sequence of feature attributes of environmental factors

Rank Code Variable name Importance1 119883

16In fast and slow lanes 0069791

2 119883

15In fast or general lane 0045356

3 119883

17Edge of pavement 0042181

4 119883

5Road pattern 0041695

5 119883

6Accident site 0037548

6 119883

14Separation facility 0025303

7 119883

11Sight distance 0018811

8 119883

2Light rays 0017205

9 119883

3Road type 0017063

10 119883

1Weather 0015428

11 119883

8Pavement state 0015322

12 119883

4Speed limit 0012274

13 119883

13Signal action 0009509

14 119883

10Barrier 0008938

15 119883

12Signal type 0002745

16 119883

9Pavement defect 0002059

17 119883

7Pavement 0000180

Table 4 Input (119909) and output (119910) parameters

Code(119909) Input variable Code

(119910) Output variable

119883

5Road pattern

119884

1People and motor vehicles

119883

6Accident site

119884

2Motor vehicles

119883

11Sight distance

119884

3Single vehicle

119883

14Separation facility

119883

15In fast or general lane

119883

16In fast and slow lanes

119883

17Edge of pavement

Table 5 Empirical results of BPNN versus FRPCA-BPNN

BPNN FRPCA-BPNNLearning rate 01 03 05 01 03 05BPNN-1 08232 08067 08089 08607 08594 08583BPNN-2 08523 08412 08335 08615 08572 08591BPNN-3 08508 08543 08560 08557 08623 08599BPNN-4 08550 08530 08530 08563 08553 08572BPNN-5 08574 08567 08542 08584 08607 08609Mean 08477 08424 08411 08585 08590 08591SD 00139 00208 02019 00026 00028 00010

the 138 of ldquoone-way roadsrdquo 119890301minus138 = 119890

123= 342 In

other words the ratio of traffic accidents at intersections is324 times that of one-way roads Likewise the ratio of trafficaccidents on ldquointersectionsrdquo under the category of ldquotrafficlocationsrdquo is also the highest at 724 times that of generalroads 119890572minus374 = 119890

198 Table 7 shows the importance of

Table 6 Empirical results of LR versus FRPCA-LR

Logistic FRPCA-LogisticLogistic-1 08480 08503Logistic-2 08523 08528Logistic-3 08491 08497Logistic-4 08483 08497Logistic-5 08551 08543Mean 08506 08514SD 00031 00021

traffic accident environments and road design to the ratio oftraffic accidents

5 Conclusions

Traffic safety depends on road design road configurationvehicle performance traffic regulations and the effective-ness of implementation The main means of transport inmiddle and low income countries include walking bicyclemotorcycle and bus while that of high income countries isautomobiles Therefore the traffic safety control measuresof high income countries are not completely applicableto middle and low income countries and thus should beimported and improved to fit local transportation and roadusage conditions [29]

The report of the World Health Organization (WHO)indicated that about 12 million people die from trafficaccidents in the world annually about 3400 people diefrom traffic accidents per day approximately 1000 peopleare injured or disabled children pedestrians cyclers andthe elderly are the most vulnerable road users and 85 offatalities and 90 of the disabled live in middle and lowincome countries The scientific analysis of accident data aswell as the implementation of relevant safety measures canprevent the occurrence of traffic accidents thus reducing theseverity of injuries

At present with the rapid development of cities it isnecessary to make efficient forecasting in order that decisionmakers can make preventions and decisions in advancein order to reduce the death rate This study uses RFEFRPCA BPNN and LR to analyze the classification accuracyrate of the traffic accident data of the region Accordingto the experimental results the classification accuracy rateof BPNN LR FRPCA-BPNN and FRPCA-LR is higherthan 80 thus forecast performance is significant Furtheranalysis shows that the network performance of the FRPCA-BPNN and FRPCA-LR classifiers combined with FRPCAis better than BPNN and LR According to Tables 5 and6 the accuracy rate of classifiers FRPCA-BPNN (8589)and FRPCA-LR (8514) combined with FRPCA is higherthan BPNN (8437) and LR (8506) by 152 and 008respectively meaning the FRPCA-BPNN and FRPCA-LRhave better classification forecast ability

In traffic accident analysis or verification results thehuman factor is mostly regarded as the first cause of traf-fic accidents However the road environment has certain

Mathematical Problems in Engineering 7

Table 7 The impacts on the types and patterns of traffic accidents as imposed by the various factors and represented in the LR model

Variables Saliency Odd ratioConstant terms 0067 0540

Relevant environmental factorsRoad types

One-way road 0044 138Crossroad 0028 301

Location of the accidentIntersection 0012 572Ordinary road sections 0032 374

Visibility rangeSatisfactory mdash mdashPoor 0024 159

Relevant road designDirectional facilities

Two-way no-passing markings 0084 167Directional markings 0074 145No directional facilities 0040 226

Traffic separation facilitiesFast lanes and slow lanesNo lane-change markings (without signs) 017 119Lane markings (with signs) 0024 189Lane markings (without signs) 0049 302No lane markings 0005 571

Separation linesSeparations lines between the fast lanes and slow lanes 0142 191No separations lines between the fast lanes and slow lanes 0028 425

SidelinesYes mdash mdashNo 0032 399

Total sample size 2471

correlation and improper intersection design or planning islikely to cause traffic accidents In comparison to other trafficaccident sites a forked road intersection is the most probableaccident site This study used RFE to select 7 input variablesfrom 17 input variables Based on the 7 input variables theenvironmental factor and road design are found to be thecauses of road traffic accidents in the region According tothe statistical data of the Taichung Police Station the 4 maincauses among the 67 causes of accidents are as follows (1)not allowing other vehicles to pass as per regulations (2)not aware of the situation ahead (3) violating specific sign(line) bans and (4) not maintaining a safe driving distanceaccounting for 2372 1354 751 and 822 respectivelyof the total number of traffic accidents and the total propor-tion is as high as 5299The road authoritiesmay refer to the7 variables of traffic accidents and road design as proposedin this study regarding future road designs and plans As forthe 4 main causes of accidents on road sections involving theabove seven variables of traffic accidents and road designthey should be the priorities in the future elimination actionsof the police force If improvements and preventive measures

are made road safety can be substantially increased therebyreducing traffic accidents and fatalitiesThefindings can serveas reference for the police force and management authoritiesto improve roads as well as the assessment and managementmodels for the elimination of traffic offences

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] D D Clarke PWard C Bartle andW Truman ldquoKiller crashesfatal road traffic accidents in the UKrdquo Accident Analysis ampPrevention vol 42 no 2 pp 764ndash770 2010

[2] K-V Hung and L-T Huyen ldquoEducation influence in trafficsafety a case study in Vietnamrdquo IATSS Research vol 34 no 2pp 87ndash93 2011

[3] H Gjerde A S Christophersen P T Normann and JMoslashrlandldquoToxicological investigations of drivers killed in road traffic

8 Mathematical Problems in Engineering

accidents in Norway during 2006ndash2008rdquo Forensic Science Inter-national vol 212 no 1ndash3 pp 102ndash109 2011

[4] Y Komada S Asaoka T Abe and Y Inoue ldquoShort sleepduration sleep disorders and traffic accidentsrdquo IATSS Researchvol 37 no 1 pp 1ndash7 2013

[5] S A Shappell and D A Wiegmann ldquoHuman factors investi-gation and analysis of accidents and incidentsrdquo in Encyclopediaof Forensic Sciences pp 440ndash449 Academic Press 2nd edition2013

[6] L-Y ChangD-J Lin C-HHuang andK-K Chang ldquoAnalysisof contributory factors for driving under the influence ofalcohol a stated choice approachrdquo Transportation Research PartF Traffic Psychology and Behaviour vol 18 pp 11ndash20 2013

[7] H Fu and Y Zhou ldquoThe traffic accident prediction basedon neural networkrdquo in Proceedings of the 2nd InternationalConference on Digital Manufacturing and Automation (ICDMArsquo11) pp 1349ndash1350 IEEE Zhangjiajie Hunan August 2011

[8] C-Y Kong and J-K Yang ldquoLogistic regression analysis ofpedestrian casualty risk in passenger vehicle collisions inChinardquoAccident Analysis and Prevention vol 42 no 4 pp 987ndash993 2010

[9] Y-K Ou Y-C Liu and F-Y Shih ldquoRisk prediction modelfor driversrsquo in-vehicle activitiesmdashapplication of task analysisand back-propagation neural networkrdquoTransportationResearchPart F Traffic Psychology and Behaviour vol 18 pp 83ndash93 2013

[10] R Gang and Z Zhuping ldquoTraffic safety forecasting methodby particle swarm optimization and support vector machinerdquoExpert SystemswithApplications vol 38 no 8 pp 10420ndash104242011

[11] S S Durduran ldquoA decision making system to automaticrecognize of traffic accidents on the basis of a GIS platformrdquoExpert Systems with Applications vol 37 no 12 pp 7729ndash77362010

[12] K El-Basyouny and T Sayed ldquoSafety performance functionsusing traffic conflictsrdquo Safety Science vol 51 no 1 pp 160ndash1642013

[13] T Kanchan V Kulkarni S M Bakkannavar N Kumar andB Unnikrishnan ldquoAnalysis of fatal road traffic accidents in acoastal township of South Indiardquo Journal of Forensic and LegalMedicine vol 19 no 8 pp 448ndash451 2012

[14] M Karacasua and A Er ldquoAn analysis on distribution of trafficfaults in accidents based on driverrsquos age and gender Eskisehircaserdquo ProcediandashSocial and Behavioral Sciences vol 20 pp 776ndash785 2011

[15] J De Ona G Lopez R Mujalli and F J Calvo ldquoAnalysis oftraffic accidents on rural highways using LatentClass Clusteringand BayesianNetworksrdquoAccident Analysis amp Prevention vol 51pp 1ndash10 2013

[16] A T Kashani SM Afshin andA Ranjbari ldquoAnalysis of factorsassociated with traffic injury severity on rural roads in IranrdquoJournal of Injury and Violence Research vol 4 no 1 pp 36ndash412012

[17] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[18] C Chu A-L Hsu K-H Chou P Bandettini C Lin andAlzheimerrsquos Disease Neuroimaging Initiative ldquoDoes featureselection improve classification accuracy Impact of samplesize and feature selection on classification using anatomicalmagnetic resonance imagesrdquoNeuroImage vol 60 no 1 pp 59ndash70 2012

[19] X-H Lin F-F Yang L Zhou et al ldquoA support vector machine-recursive feature elimination feature selectionmethod based onartificial contrast variables and mutual informationrdquo Journal ofChromatography B vol 910 pp 149ndash155 2012

[20] D Wei S Li and M-K Tan ldquoGraph embedding based featureselectionrdquo Neurocomputing vol 93 pp 115ndash125 2012

[21] M Ahmadzadeh A H Fard B Saranjam and H R SalimildquoPrediction of residual stresses in gas arc welding by backpropagation neural networkrdquo NDT amp E International vol 52pp 136ndash143 2012

[22] P Reed and Y-Q Wu ldquoLogistic regression for risk factormodelling in stuttering researchrdquo Journal of Fluency Disordersvol 38 no 2 pp 88ndash101 2013

[23] P Reed ldquoThe effect of traffic tickets on road traffic crashesrdquoAccident Analysis amp Prevention vol 64 pp 86ndash91 2014

[24] T-N Yang and S-D Wang ldquoRobust algorithms for principalcomponent analysisrdquo Pattern Recognition Letters vol 20 no 9pp 927ndash933 1999

[25] J Karhunen P Pajunen and E Oja ldquoThe nonlinear PCAcriterion in blind source separation relations with otherapproachesrdquo Neurocomputing vol 22 no 1ndash3 pp 5ndash20 1998

[26] L Xu and A L Yuille ldquoRobust principal component analysisby self-organizing rules based on statistical physics approachrdquoIEEE Transactions on Neural Networks vol 6 no 1 pp 131ndash1431995

[27] F Gharibnezhad L E Mujica J Rodellar and C-P FritzenldquoDamage detection using robust fuzzy principal componentanalysisrdquo UPCommons Conference Report 2013

[28] P Luukka ldquoNonlinear fuzzy robust PCA algorithms and sim-ilarity classifier in bankruptcy analysisrdquo Expert Systems withApplications vol 37 no 12 pp 8296ndash8302 2010

[29] G-G Zhang K K W Yau and G Chen ldquoRisk factorsassociatedwith traffic violations and accident severity inChinardquoAccident Analysis amp Prevention vol 59 pp 18ndash25 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 3: Research Article The Application of Data Mining Technology ...downloads.hindawi.com/journals/mpe/2015/170635.pdfResearch Article The Application of Data Mining Technology to Build

Mathematical Problems in Engineering 3

Road factors(I) Type(II) Location(III) Conditions(IV) Barriers

Road design(I) Semaphore(II) Separation facility

(directionlane)

Natural factors(I) Weather(II) Light

Accident types and patterns(I) People and motor vehicles(II) Motor vehicles(III) Single vehicle

FRPCA

Logisticregression

Injured-only traffic accident

Feature selection-RFE

BPNNLogisticregressionBPNN

Evaluation and analysis

Original data

Figure 1 Research structure

The RFE selects the feature set in three major stepsimports the data set for classification calculates the weight ofeach feature and deletes the feature with minimum weightFeature ordering is obtained the feature with minimumweight square is removed in each cycle and then the remain-ing features are retrained to obtain a new feature orderingRFE continuously executes this process and a feature orderlist is obtained [20] It is noteworthy that one of the featuresordered in the front does not always enable the classifierto obtain the best classification performance however thecombination of multiple features enables the classifier toobtain the optimum classification performance ThereforeRFE algorithm can select the most complementary featurecombination [4]

23 Backpropagation Neural Network (BPNN) BPNN is themost frequently used supervised learning among the neuralnetworks and is highly effective in classification problems[21] The parameters are divided into structural param-eters and learning parameters The structural parametersinclude the number of hidden layers while the learningparameters include the learning rate initial weight rangeand momentum term Generally Trial and Error is usedto determine the optimal parameter values when selectingstructural parameters and learning parameters The mostused nonlinear transfer function in the hidden layer of BPNNis the log-sigmoid transfer function whose output is between

0 and 1 in order to respond to the negative infinity to positiveinfinity input of neurons

An alternative is the tangent sigmoid transfer functionldquotansigrdquo as shown in the hidden layer The linear transferfunction purelin is in the output layer If the sigmoid transferfunction is used in the output layer the network output isrestricted to a very small range If the linear transfer functionis used in the output layer the network output can be anarbitrary value

24 LR LR can be used to analyze one or several fore-cast values These results have a binary (eg existence ornonexistence of an event) relationship [22 23] LR is derivedfrom the cumulative probability function of the logisticmodel and is a linear probability model which is similarto a linear regression model The difference is that LR cantest the dependent variable of a nominal scale where thediscussed dependent variable is discontinuous especially inbinary classification The purpose of LR is to establish thesimplest and fittest analysis result Furthermore it can beused in a practicalmodel to forecast the relationships betweendependent variables and a set of forecast variances wherethe explanatory variable can be a categorical or continuousvariable

25 FRPCA The nonlinear FRPCA algorithm is deducedfrom the linear fuzzy principal component analysis algo-rithm as introduced by Yang and Wang [24] and the

4 Mathematical Problems in Engineering

nonlinear criteria in blind source separation of Karhunen etal [25]The robust principal component analysis as proposedby Yang andWang is established on the principal componentanalysis learning rule and energy function as proposed by Xuand Yuilles [26] and the objective function bias is proposedThese methods are briefly introduced as follows Xu andYuilles [26] proposed the optimal function of constraint 119906

119894isin

0 1

119864 (119880119908) =

119899

sum

119894=1119906

119894119890 (119909

119894) + 120578

119899

sum

119894=1(1minus119906

119894) (1)

The objective is to minimize 119864(119880119908) where 119883 = 11990911199092 119909119899 is the data set 119880 = 119906

119894| 119894 = 1 119899 is the

membership set 120578 is the threshold 119906119894is the binary variable

and 119908 is the continuous variable rendering gradient descentmethod optimization difficult to solve thus they trans-formed the minimization problem where Gibbs distributionis maximized by the following equation

119875 (119880119908) =

exp (minus120574119864 (119880119908))119885

(2)

where 119885 is the separation function ensuring sum119880int119908119875(119880

119908) = 1 119890(119909119894) can be one of the following functions

1198901 (119909119894) =1003817

1003817

1003817

1003817

1003817

119909

119894minus119908

119879119909

119894119908

1003817

1003817

1003817

1003817

1003817

2

1198902 (119909119894) =1003817

1003817

1003817

1003817

119909

119894

1003817

1003817

1003817

1003817

2minus

1003817

1003817

1003817

1003817

1003817

119908

119879119909

119894

1003817

1003817

1003817

1003817

1003817

2

119908

2 = 119909

119879

119894119909

119894minus

119908

119879119909

119894119909

119879

119894119908

119908

119879119908

(3)

The gradient descent rule for minimizing 1198641 = sum

119899

119894=1 1198901(119909119894)and 1198642 = sum

119899

119894=1 1198902(119909119894) is

119908

new= 119908

old+120572

119905[119910 (119909

119894minus119906) + (119910 minus V) 119909

119894]

119908

new= 119908

old+120572

119905[119909

119894119910minus

119908

119908

119879119908

119910

2]

(4)

where 120572119905is the learning rate 119910 = 119908119879119909

119894

Therefore Yang andWang [24] proposed a new objectivefunction

Min119864 =119899

sum

119894=1119906

1198981119894119890 (119909

119894) + 120578

119899

sum

119894=1(1minus119906

119894)

1198981 (5)

The constraints are 119906119894isin [0 1] 1198981 isin [1infin) where 119906

119894is

the membership value belonging to the 119909119894data cluster (1 minus

119906

119894) is themembership value of the 119909

119894disturbance cluster and

1198981 is the fuzzy variable In this case 119890(119909119894) is the error

between the measured 119909119894and the cluster center which is

similar to the 119862-means algorithm [27]As 119906119894is a continuous variable it avoids the difficulty of

an optimummix of discrete types and continuous types thusthe gradient descentmethod can be used First 119906

119894equals 0 as

calculated by the slope of (2) so

119906

119894=

11 + [119890 (119909

119894) 120578]

1(1198981minus1) (6)

119906

119894is replaced in (2) and the following equation is obtained

119864 =

119899

sum

119894=1[

11 + [119890 (119909

119894) 120578]

1(1198981minus1)]

1198981minus1

119890 (119909

119894)

(7)

On the other hand 119908 gradient is

120597119864

120597119908

= [

11 + [119890 (119909

119894) 120578]

1(1198981minus1)]

1198981

[

120597119890 (119909

119894)

120597119908

] (8)

1198981 is the fuzzy variable If 1198981 = 1 the fuzzy membership isdemoted to a fixed membership and can be determined bythe following rule

119906

119894=

1 if (119890 (119909119894)) lt 120578

0 otherwise(9)

In this case 120578 is the hard threshold where 1198981 is not setbut 1198981 = 2 inmost studies Yang andWang [24] deduced thefollowing process of an optimization algorithm

(1) The number of iterations is set as 119905 = 1 the iterationis constrained as 119879 the learning coefficient is 1205720 isin

(0 1] the soft threshold 120578 is a small positive and theweight 119908 is randomly initialized

(2) In a case of less than 119879 execute step (3) to step (9)(3) Calculate 120572

119905= 1205720(1 minus 119905119879) set 119894 = 1 and 120590 = 0

(4) For observation frequency execute step (5) to step (8)(5) Calculate 119910 = 119908119879119909

119894 119906 = 119910119908 and V = 119908119879119906

(6) The new weight is 119908new= 119908

old+ 120572

119905120573(119909

119894)[119910(119909

119894minus 119906) +

(119910 minus V)119909119894]

(7) The new temporary count is 120575 = 120575 + 1198901(119909119894)(8) Add from 1 to 119894(9) Calculate 120578 = (120575119899) and add from 1 to 119905

It is almost affirmative that the new weight 119908 approaches theprincipal component vector [19 27 28]

3 Case Study

This section is divided into three parts collection of trafficaccident data preprocessing of the research data and sub-stituting the data after Feature Selection in FRPCA BPNNand LR Four groups of information are obtained includingBPNN LR FRPCA-BPNN and FRPCA-LR

31 Data Collection The data variables used in this study are17 input variables including weather light rays road patternaccident site sight distance and separation facility thereare 2471 observationsTheoutput variables are accident typesand patterns including (1) vehicle-vehicle accidents (2)person-automobilemotorcycle and (3) automobilemotor-cycle data variables There are 2096 observations ofCategory (1) vehicle-vehicle there are 146 observations ofCategory (2) person-automobilemotorcycle and there are229 observations of Category (3) automobilemotorcycleThe aforesaid variables are coded as 119884

1 119884

2 and 119884

3

respectively as listed in Table 1

Mathematical Problems in Engineering 5

Table 1 Number of traffic accidents and output variable codes inthe region in 2012

Output variable Number of data Percentage CodePeople and motor vehicles 146 59 119884

1

Motor vehicles 2096 8482 119884

2

Single vehicle 229 93 119884

3

Total 2471 100

32 Feature Selection Result This study uses RFE for datapreprocessing The 17 variables of the database are codedbefore Feature Selection as shown in Table 2The 17 variablesare sequenced according to importance as shown in Table 3

4 Empirical Research Result

This study uses the first 7 variables in order of importanceas obtained by the Feature Selection of RFE as inputvariables while the person-automobilemotorcycle vehicle-vehicle and automobilemotorcycle are output variables asshown in Table 4The 7 input variables and 3 output variablesare substituted in FRPCA BPNN and LR respectively inorder to obtain BPNN LR FRPCA-BPNN and FRPCA-LRclassifier models The experimental procedure is as followsStep (1) 2471 data sets are used as test data the 17 variablesare sequenced according to their importance by using theRFE data preprocessing method and the first 7 variables arerearranged as the test data set (119878

1) Step (2) classifier mod-

eling and performance evaluation the BPNN LR FRPCA-BPNN and FRPCA-LR classifiers are tested respectively inorder to evaluate the performance of the classifiers duringclassification

41 Analysis of BPNN Model The 7 input variables are sub-stituted in FRPCA in order to obtain the scores and loadingsof the principal components The scores of the principalcomponents can be used to classify various observationpoints and to integrate the scores of the principal componentsof each observation point in order to calculate an averageweighted integral indicator The coefficient of the correlationbetween new variables and old variables is called loadingswhich represent the influence or importance of the originalvariable to the new variables where larger loadings representhigher influence

The loadings can be obtained from

119897

119894119895=

119908

119894119895

119904

119895

sdot

radic

120582

119894 (10)

where 119897119894119895is the loadings of No 119895 variable on No 119894 principal

component 119908119894119895is the weight of No 119895 variable on No 119894

principal component 120582119894is the eigenvalue of No 119894 principal

component (ie variance) and 119904119895is the standard deviation of

No 119895 variableExperimental combination 1 the experimental parame-

ters of the BPNN forecasting model are set as follows epochs= 1000 learning rate lr is 01 03 and 05 respectively and

Table 2 Codes of 17 variables

Variable code Variable name119883

1Weather

119883

2Light rays

119883

3Road type

119883

4Speed limit

119883

5Road pattern

119883

6Accident site

119883

7Pavement

119883

8Pavement state

119883

9Pavement defect

119883

10Barrier

119883

11Sight distance

119883

12Signal type

119883

13Signal action

119883

14Separation facility

119883

15In fast or general lane

119883

16In fast and slow lanes

119883

17Edge of pavement

each experiment is repeated 5 times with the results ofthe three experiments as shown in Table 5 Experimentalcombination 2 the FRPAC-BPNN forecasting model is dif-ferent from experimental combination 1 where the principalcomponent scores are converted by executing FRPCA beforebuilding BPNN and then the BPNN forecasting networkis built The experimental results show the learning rate ofthe lr = empirical results of BPNN versus FRPCA-BPNNforecasting models

42 LR Analysis Experimental combination 3 the LR clas-sifier is constructed and the data are imported into LR forclassification forecasting according to the aforesaid test dataset 1198781 Experimental combination 4 the FRPCA-LR classifier

is constructed as experimental combination 2 the 1198781data

set is converted into principal component scores by FRPCAand then the LR classifier is constructed Each experiment isconducted five times the experimental results are as shownin Table 6 The experimental results show that the averageaccuracy rate and standard deviation of the FRPCA-LRmodel are 08506 plusmn 00021 which is better than the 08514 plusmn00031 of the LR classification model

The LR model investigates the impacts on the pattern oftraffic accidents according to the types and patterns of thetraffic accidents The optimal model is shown in Table 7Thissection discusses the correlation between the vehicles andthe environment based on 2325 pieces of data for analysisAfter the deletion of 146 pieces of data involving humanand vehicles the dependent variables are divided into thetwo categories of ldquovehicle to vehiclerdquo and ldquovehicle in itselfrdquoaccording to the types and patterns of traffic accidents Oddratio is adopted to represent the relevant influences of EventA to the occurrence of Event B In Table 7 the odd ratioof crossroads among the road types is 301 meaning thatthe risk of traffic accidents on ldquocrossroadsrdquo is higher than

6 Mathematical Problems in Engineering

Table 3 Sequence of feature attributes of environmental factors

Rank Code Variable name Importance1 119883

16In fast and slow lanes 0069791

2 119883

15In fast or general lane 0045356

3 119883

17Edge of pavement 0042181

4 119883

5Road pattern 0041695

5 119883

6Accident site 0037548

6 119883

14Separation facility 0025303

7 119883

11Sight distance 0018811

8 119883

2Light rays 0017205

9 119883

3Road type 0017063

10 119883

1Weather 0015428

11 119883

8Pavement state 0015322

12 119883

4Speed limit 0012274

13 119883

13Signal action 0009509

14 119883

10Barrier 0008938

15 119883

12Signal type 0002745

16 119883

9Pavement defect 0002059

17 119883

7Pavement 0000180

Table 4 Input (119909) and output (119910) parameters

Code(119909) Input variable Code

(119910) Output variable

119883

5Road pattern

119884

1People and motor vehicles

119883

6Accident site

119884

2Motor vehicles

119883

11Sight distance

119884

3Single vehicle

119883

14Separation facility

119883

15In fast or general lane

119883

16In fast and slow lanes

119883

17Edge of pavement

Table 5 Empirical results of BPNN versus FRPCA-BPNN

BPNN FRPCA-BPNNLearning rate 01 03 05 01 03 05BPNN-1 08232 08067 08089 08607 08594 08583BPNN-2 08523 08412 08335 08615 08572 08591BPNN-3 08508 08543 08560 08557 08623 08599BPNN-4 08550 08530 08530 08563 08553 08572BPNN-5 08574 08567 08542 08584 08607 08609Mean 08477 08424 08411 08585 08590 08591SD 00139 00208 02019 00026 00028 00010

the 138 of ldquoone-way roadsrdquo 119890301minus138 = 119890

123= 342 In

other words the ratio of traffic accidents at intersections is324 times that of one-way roads Likewise the ratio of trafficaccidents on ldquointersectionsrdquo under the category of ldquotrafficlocationsrdquo is also the highest at 724 times that of generalroads 119890572minus374 = 119890

198 Table 7 shows the importance of

Table 6 Empirical results of LR versus FRPCA-LR

Logistic FRPCA-LogisticLogistic-1 08480 08503Logistic-2 08523 08528Logistic-3 08491 08497Logistic-4 08483 08497Logistic-5 08551 08543Mean 08506 08514SD 00031 00021

traffic accident environments and road design to the ratio oftraffic accidents

5 Conclusions

Traffic safety depends on road design road configurationvehicle performance traffic regulations and the effective-ness of implementation The main means of transport inmiddle and low income countries include walking bicyclemotorcycle and bus while that of high income countries isautomobiles Therefore the traffic safety control measuresof high income countries are not completely applicableto middle and low income countries and thus should beimported and improved to fit local transportation and roadusage conditions [29]

The report of the World Health Organization (WHO)indicated that about 12 million people die from trafficaccidents in the world annually about 3400 people diefrom traffic accidents per day approximately 1000 peopleare injured or disabled children pedestrians cyclers andthe elderly are the most vulnerable road users and 85 offatalities and 90 of the disabled live in middle and lowincome countries The scientific analysis of accident data aswell as the implementation of relevant safety measures canprevent the occurrence of traffic accidents thus reducing theseverity of injuries

At present with the rapid development of cities it isnecessary to make efficient forecasting in order that decisionmakers can make preventions and decisions in advancein order to reduce the death rate This study uses RFEFRPCA BPNN and LR to analyze the classification accuracyrate of the traffic accident data of the region Accordingto the experimental results the classification accuracy rateof BPNN LR FRPCA-BPNN and FRPCA-LR is higherthan 80 thus forecast performance is significant Furtheranalysis shows that the network performance of the FRPCA-BPNN and FRPCA-LR classifiers combined with FRPCAis better than BPNN and LR According to Tables 5 and6 the accuracy rate of classifiers FRPCA-BPNN (8589)and FRPCA-LR (8514) combined with FRPCA is higherthan BPNN (8437) and LR (8506) by 152 and 008respectively meaning the FRPCA-BPNN and FRPCA-LRhave better classification forecast ability

In traffic accident analysis or verification results thehuman factor is mostly regarded as the first cause of traf-fic accidents However the road environment has certain

Mathematical Problems in Engineering 7

Table 7 The impacts on the types and patterns of traffic accidents as imposed by the various factors and represented in the LR model

Variables Saliency Odd ratioConstant terms 0067 0540

Relevant environmental factorsRoad types

One-way road 0044 138Crossroad 0028 301

Location of the accidentIntersection 0012 572Ordinary road sections 0032 374

Visibility rangeSatisfactory mdash mdashPoor 0024 159

Relevant road designDirectional facilities

Two-way no-passing markings 0084 167Directional markings 0074 145No directional facilities 0040 226

Traffic separation facilitiesFast lanes and slow lanesNo lane-change markings (without signs) 017 119Lane markings (with signs) 0024 189Lane markings (without signs) 0049 302No lane markings 0005 571

Separation linesSeparations lines between the fast lanes and slow lanes 0142 191No separations lines between the fast lanes and slow lanes 0028 425

SidelinesYes mdash mdashNo 0032 399

Total sample size 2471

correlation and improper intersection design or planning islikely to cause traffic accidents In comparison to other trafficaccident sites a forked road intersection is the most probableaccident site This study used RFE to select 7 input variablesfrom 17 input variables Based on the 7 input variables theenvironmental factor and road design are found to be thecauses of road traffic accidents in the region According tothe statistical data of the Taichung Police Station the 4 maincauses among the 67 causes of accidents are as follows (1)not allowing other vehicles to pass as per regulations (2)not aware of the situation ahead (3) violating specific sign(line) bans and (4) not maintaining a safe driving distanceaccounting for 2372 1354 751 and 822 respectivelyof the total number of traffic accidents and the total propor-tion is as high as 5299The road authoritiesmay refer to the7 variables of traffic accidents and road design as proposedin this study regarding future road designs and plans As forthe 4 main causes of accidents on road sections involving theabove seven variables of traffic accidents and road designthey should be the priorities in the future elimination actionsof the police force If improvements and preventive measures

are made road safety can be substantially increased therebyreducing traffic accidents and fatalitiesThefindings can serveas reference for the police force and management authoritiesto improve roads as well as the assessment and managementmodels for the elimination of traffic offences

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] D D Clarke PWard C Bartle andW Truman ldquoKiller crashesfatal road traffic accidents in the UKrdquo Accident Analysis ampPrevention vol 42 no 2 pp 764ndash770 2010

[2] K-V Hung and L-T Huyen ldquoEducation influence in trafficsafety a case study in Vietnamrdquo IATSS Research vol 34 no 2pp 87ndash93 2011

[3] H Gjerde A S Christophersen P T Normann and JMoslashrlandldquoToxicological investigations of drivers killed in road traffic

8 Mathematical Problems in Engineering

accidents in Norway during 2006ndash2008rdquo Forensic Science Inter-national vol 212 no 1ndash3 pp 102ndash109 2011

[4] Y Komada S Asaoka T Abe and Y Inoue ldquoShort sleepduration sleep disorders and traffic accidentsrdquo IATSS Researchvol 37 no 1 pp 1ndash7 2013

[5] S A Shappell and D A Wiegmann ldquoHuman factors investi-gation and analysis of accidents and incidentsrdquo in Encyclopediaof Forensic Sciences pp 440ndash449 Academic Press 2nd edition2013

[6] L-Y ChangD-J Lin C-HHuang andK-K Chang ldquoAnalysisof contributory factors for driving under the influence ofalcohol a stated choice approachrdquo Transportation Research PartF Traffic Psychology and Behaviour vol 18 pp 11ndash20 2013

[7] H Fu and Y Zhou ldquoThe traffic accident prediction basedon neural networkrdquo in Proceedings of the 2nd InternationalConference on Digital Manufacturing and Automation (ICDMArsquo11) pp 1349ndash1350 IEEE Zhangjiajie Hunan August 2011

[8] C-Y Kong and J-K Yang ldquoLogistic regression analysis ofpedestrian casualty risk in passenger vehicle collisions inChinardquoAccident Analysis and Prevention vol 42 no 4 pp 987ndash993 2010

[9] Y-K Ou Y-C Liu and F-Y Shih ldquoRisk prediction modelfor driversrsquo in-vehicle activitiesmdashapplication of task analysisand back-propagation neural networkrdquoTransportationResearchPart F Traffic Psychology and Behaviour vol 18 pp 83ndash93 2013

[10] R Gang and Z Zhuping ldquoTraffic safety forecasting methodby particle swarm optimization and support vector machinerdquoExpert SystemswithApplications vol 38 no 8 pp 10420ndash104242011

[11] S S Durduran ldquoA decision making system to automaticrecognize of traffic accidents on the basis of a GIS platformrdquoExpert Systems with Applications vol 37 no 12 pp 7729ndash77362010

[12] K El-Basyouny and T Sayed ldquoSafety performance functionsusing traffic conflictsrdquo Safety Science vol 51 no 1 pp 160ndash1642013

[13] T Kanchan V Kulkarni S M Bakkannavar N Kumar andB Unnikrishnan ldquoAnalysis of fatal road traffic accidents in acoastal township of South Indiardquo Journal of Forensic and LegalMedicine vol 19 no 8 pp 448ndash451 2012

[14] M Karacasua and A Er ldquoAn analysis on distribution of trafficfaults in accidents based on driverrsquos age and gender Eskisehircaserdquo ProcediandashSocial and Behavioral Sciences vol 20 pp 776ndash785 2011

[15] J De Ona G Lopez R Mujalli and F J Calvo ldquoAnalysis oftraffic accidents on rural highways using LatentClass Clusteringand BayesianNetworksrdquoAccident Analysis amp Prevention vol 51pp 1ndash10 2013

[16] A T Kashani SM Afshin andA Ranjbari ldquoAnalysis of factorsassociated with traffic injury severity on rural roads in IranrdquoJournal of Injury and Violence Research vol 4 no 1 pp 36ndash412012

[17] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[18] C Chu A-L Hsu K-H Chou P Bandettini C Lin andAlzheimerrsquos Disease Neuroimaging Initiative ldquoDoes featureselection improve classification accuracy Impact of samplesize and feature selection on classification using anatomicalmagnetic resonance imagesrdquoNeuroImage vol 60 no 1 pp 59ndash70 2012

[19] X-H Lin F-F Yang L Zhou et al ldquoA support vector machine-recursive feature elimination feature selectionmethod based onartificial contrast variables and mutual informationrdquo Journal ofChromatography B vol 910 pp 149ndash155 2012

[20] D Wei S Li and M-K Tan ldquoGraph embedding based featureselectionrdquo Neurocomputing vol 93 pp 115ndash125 2012

[21] M Ahmadzadeh A H Fard B Saranjam and H R SalimildquoPrediction of residual stresses in gas arc welding by backpropagation neural networkrdquo NDT amp E International vol 52pp 136ndash143 2012

[22] P Reed and Y-Q Wu ldquoLogistic regression for risk factormodelling in stuttering researchrdquo Journal of Fluency Disordersvol 38 no 2 pp 88ndash101 2013

[23] P Reed ldquoThe effect of traffic tickets on road traffic crashesrdquoAccident Analysis amp Prevention vol 64 pp 86ndash91 2014

[24] T-N Yang and S-D Wang ldquoRobust algorithms for principalcomponent analysisrdquo Pattern Recognition Letters vol 20 no 9pp 927ndash933 1999

[25] J Karhunen P Pajunen and E Oja ldquoThe nonlinear PCAcriterion in blind source separation relations with otherapproachesrdquo Neurocomputing vol 22 no 1ndash3 pp 5ndash20 1998

[26] L Xu and A L Yuille ldquoRobust principal component analysisby self-organizing rules based on statistical physics approachrdquoIEEE Transactions on Neural Networks vol 6 no 1 pp 131ndash1431995

[27] F Gharibnezhad L E Mujica J Rodellar and C-P FritzenldquoDamage detection using robust fuzzy principal componentanalysisrdquo UPCommons Conference Report 2013

[28] P Luukka ldquoNonlinear fuzzy robust PCA algorithms and sim-ilarity classifier in bankruptcy analysisrdquo Expert Systems withApplications vol 37 no 12 pp 8296ndash8302 2010

[29] G-G Zhang K K W Yau and G Chen ldquoRisk factorsassociatedwith traffic violations and accident severity inChinardquoAccident Analysis amp Prevention vol 59 pp 18ndash25 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 4: Research Article The Application of Data Mining Technology ...downloads.hindawi.com/journals/mpe/2015/170635.pdfResearch Article The Application of Data Mining Technology to Build

4 Mathematical Problems in Engineering

nonlinear criteria in blind source separation of Karhunen etal [25]The robust principal component analysis as proposedby Yang andWang is established on the principal componentanalysis learning rule and energy function as proposed by Xuand Yuilles [26] and the objective function bias is proposedThese methods are briefly introduced as follows Xu andYuilles [26] proposed the optimal function of constraint 119906

119894isin

0 1

119864 (119880119908) =

119899

sum

119894=1119906

119894119890 (119909

119894) + 120578

119899

sum

119894=1(1minus119906

119894) (1)

The objective is to minimize 119864(119880119908) where 119883 = 11990911199092 119909119899 is the data set 119880 = 119906

119894| 119894 = 1 119899 is the

membership set 120578 is the threshold 119906119894is the binary variable

and 119908 is the continuous variable rendering gradient descentmethod optimization difficult to solve thus they trans-formed the minimization problem where Gibbs distributionis maximized by the following equation

119875 (119880119908) =

exp (minus120574119864 (119880119908))119885

(2)

where 119885 is the separation function ensuring sum119880int119908119875(119880

119908) = 1 119890(119909119894) can be one of the following functions

1198901 (119909119894) =1003817

1003817

1003817

1003817

1003817

119909

119894minus119908

119879119909

119894119908

1003817

1003817

1003817

1003817

1003817

2

1198902 (119909119894) =1003817

1003817

1003817

1003817

119909

119894

1003817

1003817

1003817

1003817

2minus

1003817

1003817

1003817

1003817

1003817

119908

119879119909

119894

1003817

1003817

1003817

1003817

1003817

2

119908

2 = 119909

119879

119894119909

119894minus

119908

119879119909

119894119909

119879

119894119908

119908

119879119908

(3)

The gradient descent rule for minimizing 1198641 = sum

119899

119894=1 1198901(119909119894)and 1198642 = sum

119899

119894=1 1198902(119909119894) is

119908

new= 119908

old+120572

119905[119910 (119909

119894minus119906) + (119910 minus V) 119909

119894]

119908

new= 119908

old+120572

119905[119909

119894119910minus

119908

119908

119879119908

119910

2]

(4)

where 120572119905is the learning rate 119910 = 119908119879119909

119894

Therefore Yang andWang [24] proposed a new objectivefunction

Min119864 =119899

sum

119894=1119906

1198981119894119890 (119909

119894) + 120578

119899

sum

119894=1(1minus119906

119894)

1198981 (5)

The constraints are 119906119894isin [0 1] 1198981 isin [1infin) where 119906

119894is

the membership value belonging to the 119909119894data cluster (1 minus

119906

119894) is themembership value of the 119909

119894disturbance cluster and

1198981 is the fuzzy variable In this case 119890(119909119894) is the error

between the measured 119909119894and the cluster center which is

similar to the 119862-means algorithm [27]As 119906119894is a continuous variable it avoids the difficulty of

an optimummix of discrete types and continuous types thusthe gradient descentmethod can be used First 119906

119894equals 0 as

calculated by the slope of (2) so

119906

119894=

11 + [119890 (119909

119894) 120578]

1(1198981minus1) (6)

119906

119894is replaced in (2) and the following equation is obtained

119864 =

119899

sum

119894=1[

11 + [119890 (119909

119894) 120578]

1(1198981minus1)]

1198981minus1

119890 (119909

119894)

(7)

On the other hand 119908 gradient is

120597119864

120597119908

= [

11 + [119890 (119909

119894) 120578]

1(1198981minus1)]

1198981

[

120597119890 (119909

119894)

120597119908

] (8)

1198981 is the fuzzy variable If 1198981 = 1 the fuzzy membership isdemoted to a fixed membership and can be determined bythe following rule

119906

119894=

1 if (119890 (119909119894)) lt 120578

0 otherwise(9)

In this case 120578 is the hard threshold where 1198981 is not setbut 1198981 = 2 inmost studies Yang andWang [24] deduced thefollowing process of an optimization algorithm

(1) The number of iterations is set as 119905 = 1 the iterationis constrained as 119879 the learning coefficient is 1205720 isin

(0 1] the soft threshold 120578 is a small positive and theweight 119908 is randomly initialized

(2) In a case of less than 119879 execute step (3) to step (9)(3) Calculate 120572

119905= 1205720(1 minus 119905119879) set 119894 = 1 and 120590 = 0

(4) For observation frequency execute step (5) to step (8)(5) Calculate 119910 = 119908119879119909

119894 119906 = 119910119908 and V = 119908119879119906

(6) The new weight is 119908new= 119908

old+ 120572

119905120573(119909

119894)[119910(119909

119894minus 119906) +

(119910 minus V)119909119894]

(7) The new temporary count is 120575 = 120575 + 1198901(119909119894)(8) Add from 1 to 119894(9) Calculate 120578 = (120575119899) and add from 1 to 119905

It is almost affirmative that the new weight 119908 approaches theprincipal component vector [19 27 28]

3 Case Study

This section is divided into three parts collection of trafficaccident data preprocessing of the research data and sub-stituting the data after Feature Selection in FRPCA BPNNand LR Four groups of information are obtained includingBPNN LR FRPCA-BPNN and FRPCA-LR

31 Data Collection The data variables used in this study are17 input variables including weather light rays road patternaccident site sight distance and separation facility thereare 2471 observationsTheoutput variables are accident typesand patterns including (1) vehicle-vehicle accidents (2)person-automobilemotorcycle and (3) automobilemotor-cycle data variables There are 2096 observations ofCategory (1) vehicle-vehicle there are 146 observations ofCategory (2) person-automobilemotorcycle and there are229 observations of Category (3) automobilemotorcycleThe aforesaid variables are coded as 119884

1 119884

2 and 119884

3

respectively as listed in Table 1

Mathematical Problems in Engineering 5

Table 1 Number of traffic accidents and output variable codes inthe region in 2012

Output variable Number of data Percentage CodePeople and motor vehicles 146 59 119884

1

Motor vehicles 2096 8482 119884

2

Single vehicle 229 93 119884

3

Total 2471 100

32 Feature Selection Result This study uses RFE for datapreprocessing The 17 variables of the database are codedbefore Feature Selection as shown in Table 2The 17 variablesare sequenced according to importance as shown in Table 3

4 Empirical Research Result

This study uses the first 7 variables in order of importanceas obtained by the Feature Selection of RFE as inputvariables while the person-automobilemotorcycle vehicle-vehicle and automobilemotorcycle are output variables asshown in Table 4The 7 input variables and 3 output variablesare substituted in FRPCA BPNN and LR respectively inorder to obtain BPNN LR FRPCA-BPNN and FRPCA-LRclassifier models The experimental procedure is as followsStep (1) 2471 data sets are used as test data the 17 variablesare sequenced according to their importance by using theRFE data preprocessing method and the first 7 variables arerearranged as the test data set (119878

1) Step (2) classifier mod-

eling and performance evaluation the BPNN LR FRPCA-BPNN and FRPCA-LR classifiers are tested respectively inorder to evaluate the performance of the classifiers duringclassification

41 Analysis of BPNN Model The 7 input variables are sub-stituted in FRPCA in order to obtain the scores and loadingsof the principal components The scores of the principalcomponents can be used to classify various observationpoints and to integrate the scores of the principal componentsof each observation point in order to calculate an averageweighted integral indicator The coefficient of the correlationbetween new variables and old variables is called loadingswhich represent the influence or importance of the originalvariable to the new variables where larger loadings representhigher influence

The loadings can be obtained from

119897

119894119895=

119908

119894119895

119904

119895

sdot

radic

120582

119894 (10)

where 119897119894119895is the loadings of No 119895 variable on No 119894 principal

component 119908119894119895is the weight of No 119895 variable on No 119894

principal component 120582119894is the eigenvalue of No 119894 principal

component (ie variance) and 119904119895is the standard deviation of

No 119895 variableExperimental combination 1 the experimental parame-

ters of the BPNN forecasting model are set as follows epochs= 1000 learning rate lr is 01 03 and 05 respectively and

Table 2 Codes of 17 variables

Variable code Variable name119883

1Weather

119883

2Light rays

119883

3Road type

119883

4Speed limit

119883

5Road pattern

119883

6Accident site

119883

7Pavement

119883

8Pavement state

119883

9Pavement defect

119883

10Barrier

119883

11Sight distance

119883

12Signal type

119883

13Signal action

119883

14Separation facility

119883

15In fast or general lane

119883

16In fast and slow lanes

119883

17Edge of pavement

each experiment is repeated 5 times with the results ofthe three experiments as shown in Table 5 Experimentalcombination 2 the FRPAC-BPNN forecasting model is dif-ferent from experimental combination 1 where the principalcomponent scores are converted by executing FRPCA beforebuilding BPNN and then the BPNN forecasting networkis built The experimental results show the learning rate ofthe lr = empirical results of BPNN versus FRPCA-BPNNforecasting models

42 LR Analysis Experimental combination 3 the LR clas-sifier is constructed and the data are imported into LR forclassification forecasting according to the aforesaid test dataset 1198781 Experimental combination 4 the FRPCA-LR classifier

is constructed as experimental combination 2 the 1198781data

set is converted into principal component scores by FRPCAand then the LR classifier is constructed Each experiment isconducted five times the experimental results are as shownin Table 6 The experimental results show that the averageaccuracy rate and standard deviation of the FRPCA-LRmodel are 08506 plusmn 00021 which is better than the 08514 plusmn00031 of the LR classification model

The LR model investigates the impacts on the pattern oftraffic accidents according to the types and patterns of thetraffic accidents The optimal model is shown in Table 7Thissection discusses the correlation between the vehicles andthe environment based on 2325 pieces of data for analysisAfter the deletion of 146 pieces of data involving humanand vehicles the dependent variables are divided into thetwo categories of ldquovehicle to vehiclerdquo and ldquovehicle in itselfrdquoaccording to the types and patterns of traffic accidents Oddratio is adopted to represent the relevant influences of EventA to the occurrence of Event B In Table 7 the odd ratioof crossroads among the road types is 301 meaning thatthe risk of traffic accidents on ldquocrossroadsrdquo is higher than

6 Mathematical Problems in Engineering

Table 3 Sequence of feature attributes of environmental factors

Rank Code Variable name Importance1 119883

16In fast and slow lanes 0069791

2 119883

15In fast or general lane 0045356

3 119883

17Edge of pavement 0042181

4 119883

5Road pattern 0041695

5 119883

6Accident site 0037548

6 119883

14Separation facility 0025303

7 119883

11Sight distance 0018811

8 119883

2Light rays 0017205

9 119883

3Road type 0017063

10 119883

1Weather 0015428

11 119883

8Pavement state 0015322

12 119883

4Speed limit 0012274

13 119883

13Signal action 0009509

14 119883

10Barrier 0008938

15 119883

12Signal type 0002745

16 119883

9Pavement defect 0002059

17 119883

7Pavement 0000180

Table 4 Input (119909) and output (119910) parameters

Code(119909) Input variable Code

(119910) Output variable

119883

5Road pattern

119884

1People and motor vehicles

119883

6Accident site

119884

2Motor vehicles

119883

11Sight distance

119884

3Single vehicle

119883

14Separation facility

119883

15In fast or general lane

119883

16In fast and slow lanes

119883

17Edge of pavement

Table 5 Empirical results of BPNN versus FRPCA-BPNN

BPNN FRPCA-BPNNLearning rate 01 03 05 01 03 05BPNN-1 08232 08067 08089 08607 08594 08583BPNN-2 08523 08412 08335 08615 08572 08591BPNN-3 08508 08543 08560 08557 08623 08599BPNN-4 08550 08530 08530 08563 08553 08572BPNN-5 08574 08567 08542 08584 08607 08609Mean 08477 08424 08411 08585 08590 08591SD 00139 00208 02019 00026 00028 00010

the 138 of ldquoone-way roadsrdquo 119890301minus138 = 119890

123= 342 In

other words the ratio of traffic accidents at intersections is324 times that of one-way roads Likewise the ratio of trafficaccidents on ldquointersectionsrdquo under the category of ldquotrafficlocationsrdquo is also the highest at 724 times that of generalroads 119890572minus374 = 119890

198 Table 7 shows the importance of

Table 6 Empirical results of LR versus FRPCA-LR

Logistic FRPCA-LogisticLogistic-1 08480 08503Logistic-2 08523 08528Logistic-3 08491 08497Logistic-4 08483 08497Logistic-5 08551 08543Mean 08506 08514SD 00031 00021

traffic accident environments and road design to the ratio oftraffic accidents

5 Conclusions

Traffic safety depends on road design road configurationvehicle performance traffic regulations and the effective-ness of implementation The main means of transport inmiddle and low income countries include walking bicyclemotorcycle and bus while that of high income countries isautomobiles Therefore the traffic safety control measuresof high income countries are not completely applicableto middle and low income countries and thus should beimported and improved to fit local transportation and roadusage conditions [29]

The report of the World Health Organization (WHO)indicated that about 12 million people die from trafficaccidents in the world annually about 3400 people diefrom traffic accidents per day approximately 1000 peopleare injured or disabled children pedestrians cyclers andthe elderly are the most vulnerable road users and 85 offatalities and 90 of the disabled live in middle and lowincome countries The scientific analysis of accident data aswell as the implementation of relevant safety measures canprevent the occurrence of traffic accidents thus reducing theseverity of injuries

At present with the rapid development of cities it isnecessary to make efficient forecasting in order that decisionmakers can make preventions and decisions in advancein order to reduce the death rate This study uses RFEFRPCA BPNN and LR to analyze the classification accuracyrate of the traffic accident data of the region Accordingto the experimental results the classification accuracy rateof BPNN LR FRPCA-BPNN and FRPCA-LR is higherthan 80 thus forecast performance is significant Furtheranalysis shows that the network performance of the FRPCA-BPNN and FRPCA-LR classifiers combined with FRPCAis better than BPNN and LR According to Tables 5 and6 the accuracy rate of classifiers FRPCA-BPNN (8589)and FRPCA-LR (8514) combined with FRPCA is higherthan BPNN (8437) and LR (8506) by 152 and 008respectively meaning the FRPCA-BPNN and FRPCA-LRhave better classification forecast ability

In traffic accident analysis or verification results thehuman factor is mostly regarded as the first cause of traf-fic accidents However the road environment has certain

Mathematical Problems in Engineering 7

Table 7 The impacts on the types and patterns of traffic accidents as imposed by the various factors and represented in the LR model

Variables Saliency Odd ratioConstant terms 0067 0540

Relevant environmental factorsRoad types

One-way road 0044 138Crossroad 0028 301

Location of the accidentIntersection 0012 572Ordinary road sections 0032 374

Visibility rangeSatisfactory mdash mdashPoor 0024 159

Relevant road designDirectional facilities

Two-way no-passing markings 0084 167Directional markings 0074 145No directional facilities 0040 226

Traffic separation facilitiesFast lanes and slow lanesNo lane-change markings (without signs) 017 119Lane markings (with signs) 0024 189Lane markings (without signs) 0049 302No lane markings 0005 571

Separation linesSeparations lines between the fast lanes and slow lanes 0142 191No separations lines between the fast lanes and slow lanes 0028 425

SidelinesYes mdash mdashNo 0032 399

Total sample size 2471

correlation and improper intersection design or planning islikely to cause traffic accidents In comparison to other trafficaccident sites a forked road intersection is the most probableaccident site This study used RFE to select 7 input variablesfrom 17 input variables Based on the 7 input variables theenvironmental factor and road design are found to be thecauses of road traffic accidents in the region According tothe statistical data of the Taichung Police Station the 4 maincauses among the 67 causes of accidents are as follows (1)not allowing other vehicles to pass as per regulations (2)not aware of the situation ahead (3) violating specific sign(line) bans and (4) not maintaining a safe driving distanceaccounting for 2372 1354 751 and 822 respectivelyof the total number of traffic accidents and the total propor-tion is as high as 5299The road authoritiesmay refer to the7 variables of traffic accidents and road design as proposedin this study regarding future road designs and plans As forthe 4 main causes of accidents on road sections involving theabove seven variables of traffic accidents and road designthey should be the priorities in the future elimination actionsof the police force If improvements and preventive measures

are made road safety can be substantially increased therebyreducing traffic accidents and fatalitiesThefindings can serveas reference for the police force and management authoritiesto improve roads as well as the assessment and managementmodels for the elimination of traffic offences

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] D D Clarke PWard C Bartle andW Truman ldquoKiller crashesfatal road traffic accidents in the UKrdquo Accident Analysis ampPrevention vol 42 no 2 pp 764ndash770 2010

[2] K-V Hung and L-T Huyen ldquoEducation influence in trafficsafety a case study in Vietnamrdquo IATSS Research vol 34 no 2pp 87ndash93 2011

[3] H Gjerde A S Christophersen P T Normann and JMoslashrlandldquoToxicological investigations of drivers killed in road traffic

8 Mathematical Problems in Engineering

accidents in Norway during 2006ndash2008rdquo Forensic Science Inter-national vol 212 no 1ndash3 pp 102ndash109 2011

[4] Y Komada S Asaoka T Abe and Y Inoue ldquoShort sleepduration sleep disorders and traffic accidentsrdquo IATSS Researchvol 37 no 1 pp 1ndash7 2013

[5] S A Shappell and D A Wiegmann ldquoHuman factors investi-gation and analysis of accidents and incidentsrdquo in Encyclopediaof Forensic Sciences pp 440ndash449 Academic Press 2nd edition2013

[6] L-Y ChangD-J Lin C-HHuang andK-K Chang ldquoAnalysisof contributory factors for driving under the influence ofalcohol a stated choice approachrdquo Transportation Research PartF Traffic Psychology and Behaviour vol 18 pp 11ndash20 2013

[7] H Fu and Y Zhou ldquoThe traffic accident prediction basedon neural networkrdquo in Proceedings of the 2nd InternationalConference on Digital Manufacturing and Automation (ICDMArsquo11) pp 1349ndash1350 IEEE Zhangjiajie Hunan August 2011

[8] C-Y Kong and J-K Yang ldquoLogistic regression analysis ofpedestrian casualty risk in passenger vehicle collisions inChinardquoAccident Analysis and Prevention vol 42 no 4 pp 987ndash993 2010

[9] Y-K Ou Y-C Liu and F-Y Shih ldquoRisk prediction modelfor driversrsquo in-vehicle activitiesmdashapplication of task analysisand back-propagation neural networkrdquoTransportationResearchPart F Traffic Psychology and Behaviour vol 18 pp 83ndash93 2013

[10] R Gang and Z Zhuping ldquoTraffic safety forecasting methodby particle swarm optimization and support vector machinerdquoExpert SystemswithApplications vol 38 no 8 pp 10420ndash104242011

[11] S S Durduran ldquoA decision making system to automaticrecognize of traffic accidents on the basis of a GIS platformrdquoExpert Systems with Applications vol 37 no 12 pp 7729ndash77362010

[12] K El-Basyouny and T Sayed ldquoSafety performance functionsusing traffic conflictsrdquo Safety Science vol 51 no 1 pp 160ndash1642013

[13] T Kanchan V Kulkarni S M Bakkannavar N Kumar andB Unnikrishnan ldquoAnalysis of fatal road traffic accidents in acoastal township of South Indiardquo Journal of Forensic and LegalMedicine vol 19 no 8 pp 448ndash451 2012

[14] M Karacasua and A Er ldquoAn analysis on distribution of trafficfaults in accidents based on driverrsquos age and gender Eskisehircaserdquo ProcediandashSocial and Behavioral Sciences vol 20 pp 776ndash785 2011

[15] J De Ona G Lopez R Mujalli and F J Calvo ldquoAnalysis oftraffic accidents on rural highways using LatentClass Clusteringand BayesianNetworksrdquoAccident Analysis amp Prevention vol 51pp 1ndash10 2013

[16] A T Kashani SM Afshin andA Ranjbari ldquoAnalysis of factorsassociated with traffic injury severity on rural roads in IranrdquoJournal of Injury and Violence Research vol 4 no 1 pp 36ndash412012

[17] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[18] C Chu A-L Hsu K-H Chou P Bandettini C Lin andAlzheimerrsquos Disease Neuroimaging Initiative ldquoDoes featureselection improve classification accuracy Impact of samplesize and feature selection on classification using anatomicalmagnetic resonance imagesrdquoNeuroImage vol 60 no 1 pp 59ndash70 2012

[19] X-H Lin F-F Yang L Zhou et al ldquoA support vector machine-recursive feature elimination feature selectionmethod based onartificial contrast variables and mutual informationrdquo Journal ofChromatography B vol 910 pp 149ndash155 2012

[20] D Wei S Li and M-K Tan ldquoGraph embedding based featureselectionrdquo Neurocomputing vol 93 pp 115ndash125 2012

[21] M Ahmadzadeh A H Fard B Saranjam and H R SalimildquoPrediction of residual stresses in gas arc welding by backpropagation neural networkrdquo NDT amp E International vol 52pp 136ndash143 2012

[22] P Reed and Y-Q Wu ldquoLogistic regression for risk factormodelling in stuttering researchrdquo Journal of Fluency Disordersvol 38 no 2 pp 88ndash101 2013

[23] P Reed ldquoThe effect of traffic tickets on road traffic crashesrdquoAccident Analysis amp Prevention vol 64 pp 86ndash91 2014

[24] T-N Yang and S-D Wang ldquoRobust algorithms for principalcomponent analysisrdquo Pattern Recognition Letters vol 20 no 9pp 927ndash933 1999

[25] J Karhunen P Pajunen and E Oja ldquoThe nonlinear PCAcriterion in blind source separation relations with otherapproachesrdquo Neurocomputing vol 22 no 1ndash3 pp 5ndash20 1998

[26] L Xu and A L Yuille ldquoRobust principal component analysisby self-organizing rules based on statistical physics approachrdquoIEEE Transactions on Neural Networks vol 6 no 1 pp 131ndash1431995

[27] F Gharibnezhad L E Mujica J Rodellar and C-P FritzenldquoDamage detection using robust fuzzy principal componentanalysisrdquo UPCommons Conference Report 2013

[28] P Luukka ldquoNonlinear fuzzy robust PCA algorithms and sim-ilarity classifier in bankruptcy analysisrdquo Expert Systems withApplications vol 37 no 12 pp 8296ndash8302 2010

[29] G-G Zhang K K W Yau and G Chen ldquoRisk factorsassociatedwith traffic violations and accident severity inChinardquoAccident Analysis amp Prevention vol 59 pp 18ndash25 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 5: Research Article The Application of Data Mining Technology ...downloads.hindawi.com/journals/mpe/2015/170635.pdfResearch Article The Application of Data Mining Technology to Build

Mathematical Problems in Engineering 5

Table 1 Number of traffic accidents and output variable codes inthe region in 2012

Output variable Number of data Percentage CodePeople and motor vehicles 146 59 119884

1

Motor vehicles 2096 8482 119884

2

Single vehicle 229 93 119884

3

Total 2471 100

32 Feature Selection Result This study uses RFE for datapreprocessing The 17 variables of the database are codedbefore Feature Selection as shown in Table 2The 17 variablesare sequenced according to importance as shown in Table 3

4 Empirical Research Result

This study uses the first 7 variables in order of importanceas obtained by the Feature Selection of RFE as inputvariables while the person-automobilemotorcycle vehicle-vehicle and automobilemotorcycle are output variables asshown in Table 4The 7 input variables and 3 output variablesare substituted in FRPCA BPNN and LR respectively inorder to obtain BPNN LR FRPCA-BPNN and FRPCA-LRclassifier models The experimental procedure is as followsStep (1) 2471 data sets are used as test data the 17 variablesare sequenced according to their importance by using theRFE data preprocessing method and the first 7 variables arerearranged as the test data set (119878

1) Step (2) classifier mod-

eling and performance evaluation the BPNN LR FRPCA-BPNN and FRPCA-LR classifiers are tested respectively inorder to evaluate the performance of the classifiers duringclassification

41 Analysis of BPNN Model The 7 input variables are sub-stituted in FRPCA in order to obtain the scores and loadingsof the principal components The scores of the principalcomponents can be used to classify various observationpoints and to integrate the scores of the principal componentsof each observation point in order to calculate an averageweighted integral indicator The coefficient of the correlationbetween new variables and old variables is called loadingswhich represent the influence or importance of the originalvariable to the new variables where larger loadings representhigher influence

The loadings can be obtained from

119897

119894119895=

119908

119894119895

119904

119895

sdot

radic

120582

119894 (10)

where 119897119894119895is the loadings of No 119895 variable on No 119894 principal

component 119908119894119895is the weight of No 119895 variable on No 119894

principal component 120582119894is the eigenvalue of No 119894 principal

component (ie variance) and 119904119895is the standard deviation of

No 119895 variableExperimental combination 1 the experimental parame-

ters of the BPNN forecasting model are set as follows epochs= 1000 learning rate lr is 01 03 and 05 respectively and

Table 2 Codes of 17 variables

Variable code Variable name119883

1Weather

119883

2Light rays

119883

3Road type

119883

4Speed limit

119883

5Road pattern

119883

6Accident site

119883

7Pavement

119883

8Pavement state

119883

9Pavement defect

119883

10Barrier

119883

11Sight distance

119883

12Signal type

119883

13Signal action

119883

14Separation facility

119883

15In fast or general lane

119883

16In fast and slow lanes

119883

17Edge of pavement

each experiment is repeated 5 times with the results ofthe three experiments as shown in Table 5 Experimentalcombination 2 the FRPAC-BPNN forecasting model is dif-ferent from experimental combination 1 where the principalcomponent scores are converted by executing FRPCA beforebuilding BPNN and then the BPNN forecasting networkis built The experimental results show the learning rate ofthe lr = empirical results of BPNN versus FRPCA-BPNNforecasting models

42 LR Analysis Experimental combination 3 the LR clas-sifier is constructed and the data are imported into LR forclassification forecasting according to the aforesaid test dataset 1198781 Experimental combination 4 the FRPCA-LR classifier

is constructed as experimental combination 2 the 1198781data

set is converted into principal component scores by FRPCAand then the LR classifier is constructed Each experiment isconducted five times the experimental results are as shownin Table 6 The experimental results show that the averageaccuracy rate and standard deviation of the FRPCA-LRmodel are 08506 plusmn 00021 which is better than the 08514 plusmn00031 of the LR classification model

The LR model investigates the impacts on the pattern oftraffic accidents according to the types and patterns of thetraffic accidents The optimal model is shown in Table 7Thissection discusses the correlation between the vehicles andthe environment based on 2325 pieces of data for analysisAfter the deletion of 146 pieces of data involving humanand vehicles the dependent variables are divided into thetwo categories of ldquovehicle to vehiclerdquo and ldquovehicle in itselfrdquoaccording to the types and patterns of traffic accidents Oddratio is adopted to represent the relevant influences of EventA to the occurrence of Event B In Table 7 the odd ratioof crossroads among the road types is 301 meaning thatthe risk of traffic accidents on ldquocrossroadsrdquo is higher than

6 Mathematical Problems in Engineering

Table 3 Sequence of feature attributes of environmental factors

Rank Code Variable name Importance1 119883

16In fast and slow lanes 0069791

2 119883

15In fast or general lane 0045356

3 119883

17Edge of pavement 0042181

4 119883

5Road pattern 0041695

5 119883

6Accident site 0037548

6 119883

14Separation facility 0025303

7 119883

11Sight distance 0018811

8 119883

2Light rays 0017205

9 119883

3Road type 0017063

10 119883

1Weather 0015428

11 119883

8Pavement state 0015322

12 119883

4Speed limit 0012274

13 119883

13Signal action 0009509

14 119883

10Barrier 0008938

15 119883

12Signal type 0002745

16 119883

9Pavement defect 0002059

17 119883

7Pavement 0000180

Table 4 Input (119909) and output (119910) parameters

Code(119909) Input variable Code

(119910) Output variable

119883

5Road pattern

119884

1People and motor vehicles

119883

6Accident site

119884

2Motor vehicles

119883

11Sight distance

119884

3Single vehicle

119883

14Separation facility

119883

15In fast or general lane

119883

16In fast and slow lanes

119883

17Edge of pavement

Table 5 Empirical results of BPNN versus FRPCA-BPNN

BPNN FRPCA-BPNNLearning rate 01 03 05 01 03 05BPNN-1 08232 08067 08089 08607 08594 08583BPNN-2 08523 08412 08335 08615 08572 08591BPNN-3 08508 08543 08560 08557 08623 08599BPNN-4 08550 08530 08530 08563 08553 08572BPNN-5 08574 08567 08542 08584 08607 08609Mean 08477 08424 08411 08585 08590 08591SD 00139 00208 02019 00026 00028 00010

the 138 of ldquoone-way roadsrdquo 119890301minus138 = 119890

123= 342 In

other words the ratio of traffic accidents at intersections is324 times that of one-way roads Likewise the ratio of trafficaccidents on ldquointersectionsrdquo under the category of ldquotrafficlocationsrdquo is also the highest at 724 times that of generalroads 119890572minus374 = 119890

198 Table 7 shows the importance of

Table 6 Empirical results of LR versus FRPCA-LR

Logistic FRPCA-LogisticLogistic-1 08480 08503Logistic-2 08523 08528Logistic-3 08491 08497Logistic-4 08483 08497Logistic-5 08551 08543Mean 08506 08514SD 00031 00021

traffic accident environments and road design to the ratio oftraffic accidents

5 Conclusions

Traffic safety depends on road design road configurationvehicle performance traffic regulations and the effective-ness of implementation The main means of transport inmiddle and low income countries include walking bicyclemotorcycle and bus while that of high income countries isautomobiles Therefore the traffic safety control measuresof high income countries are not completely applicableto middle and low income countries and thus should beimported and improved to fit local transportation and roadusage conditions [29]

The report of the World Health Organization (WHO)indicated that about 12 million people die from trafficaccidents in the world annually about 3400 people diefrom traffic accidents per day approximately 1000 peopleare injured or disabled children pedestrians cyclers andthe elderly are the most vulnerable road users and 85 offatalities and 90 of the disabled live in middle and lowincome countries The scientific analysis of accident data aswell as the implementation of relevant safety measures canprevent the occurrence of traffic accidents thus reducing theseverity of injuries

At present with the rapid development of cities it isnecessary to make efficient forecasting in order that decisionmakers can make preventions and decisions in advancein order to reduce the death rate This study uses RFEFRPCA BPNN and LR to analyze the classification accuracyrate of the traffic accident data of the region Accordingto the experimental results the classification accuracy rateof BPNN LR FRPCA-BPNN and FRPCA-LR is higherthan 80 thus forecast performance is significant Furtheranalysis shows that the network performance of the FRPCA-BPNN and FRPCA-LR classifiers combined with FRPCAis better than BPNN and LR According to Tables 5 and6 the accuracy rate of classifiers FRPCA-BPNN (8589)and FRPCA-LR (8514) combined with FRPCA is higherthan BPNN (8437) and LR (8506) by 152 and 008respectively meaning the FRPCA-BPNN and FRPCA-LRhave better classification forecast ability

In traffic accident analysis or verification results thehuman factor is mostly regarded as the first cause of traf-fic accidents However the road environment has certain

Mathematical Problems in Engineering 7

Table 7 The impacts on the types and patterns of traffic accidents as imposed by the various factors and represented in the LR model

Variables Saliency Odd ratioConstant terms 0067 0540

Relevant environmental factorsRoad types

One-way road 0044 138Crossroad 0028 301

Location of the accidentIntersection 0012 572Ordinary road sections 0032 374

Visibility rangeSatisfactory mdash mdashPoor 0024 159

Relevant road designDirectional facilities

Two-way no-passing markings 0084 167Directional markings 0074 145No directional facilities 0040 226

Traffic separation facilitiesFast lanes and slow lanesNo lane-change markings (without signs) 017 119Lane markings (with signs) 0024 189Lane markings (without signs) 0049 302No lane markings 0005 571

Separation linesSeparations lines between the fast lanes and slow lanes 0142 191No separations lines between the fast lanes and slow lanes 0028 425

SidelinesYes mdash mdashNo 0032 399

Total sample size 2471

correlation and improper intersection design or planning islikely to cause traffic accidents In comparison to other trafficaccident sites a forked road intersection is the most probableaccident site This study used RFE to select 7 input variablesfrom 17 input variables Based on the 7 input variables theenvironmental factor and road design are found to be thecauses of road traffic accidents in the region According tothe statistical data of the Taichung Police Station the 4 maincauses among the 67 causes of accidents are as follows (1)not allowing other vehicles to pass as per regulations (2)not aware of the situation ahead (3) violating specific sign(line) bans and (4) not maintaining a safe driving distanceaccounting for 2372 1354 751 and 822 respectivelyof the total number of traffic accidents and the total propor-tion is as high as 5299The road authoritiesmay refer to the7 variables of traffic accidents and road design as proposedin this study regarding future road designs and plans As forthe 4 main causes of accidents on road sections involving theabove seven variables of traffic accidents and road designthey should be the priorities in the future elimination actionsof the police force If improvements and preventive measures

are made road safety can be substantially increased therebyreducing traffic accidents and fatalitiesThefindings can serveas reference for the police force and management authoritiesto improve roads as well as the assessment and managementmodels for the elimination of traffic offences

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] D D Clarke PWard C Bartle andW Truman ldquoKiller crashesfatal road traffic accidents in the UKrdquo Accident Analysis ampPrevention vol 42 no 2 pp 764ndash770 2010

[2] K-V Hung and L-T Huyen ldquoEducation influence in trafficsafety a case study in Vietnamrdquo IATSS Research vol 34 no 2pp 87ndash93 2011

[3] H Gjerde A S Christophersen P T Normann and JMoslashrlandldquoToxicological investigations of drivers killed in road traffic

8 Mathematical Problems in Engineering

accidents in Norway during 2006ndash2008rdquo Forensic Science Inter-national vol 212 no 1ndash3 pp 102ndash109 2011

[4] Y Komada S Asaoka T Abe and Y Inoue ldquoShort sleepduration sleep disorders and traffic accidentsrdquo IATSS Researchvol 37 no 1 pp 1ndash7 2013

[5] S A Shappell and D A Wiegmann ldquoHuman factors investi-gation and analysis of accidents and incidentsrdquo in Encyclopediaof Forensic Sciences pp 440ndash449 Academic Press 2nd edition2013

[6] L-Y ChangD-J Lin C-HHuang andK-K Chang ldquoAnalysisof contributory factors for driving under the influence ofalcohol a stated choice approachrdquo Transportation Research PartF Traffic Psychology and Behaviour vol 18 pp 11ndash20 2013

[7] H Fu and Y Zhou ldquoThe traffic accident prediction basedon neural networkrdquo in Proceedings of the 2nd InternationalConference on Digital Manufacturing and Automation (ICDMArsquo11) pp 1349ndash1350 IEEE Zhangjiajie Hunan August 2011

[8] C-Y Kong and J-K Yang ldquoLogistic regression analysis ofpedestrian casualty risk in passenger vehicle collisions inChinardquoAccident Analysis and Prevention vol 42 no 4 pp 987ndash993 2010

[9] Y-K Ou Y-C Liu and F-Y Shih ldquoRisk prediction modelfor driversrsquo in-vehicle activitiesmdashapplication of task analysisand back-propagation neural networkrdquoTransportationResearchPart F Traffic Psychology and Behaviour vol 18 pp 83ndash93 2013

[10] R Gang and Z Zhuping ldquoTraffic safety forecasting methodby particle swarm optimization and support vector machinerdquoExpert SystemswithApplications vol 38 no 8 pp 10420ndash104242011

[11] S S Durduran ldquoA decision making system to automaticrecognize of traffic accidents on the basis of a GIS platformrdquoExpert Systems with Applications vol 37 no 12 pp 7729ndash77362010

[12] K El-Basyouny and T Sayed ldquoSafety performance functionsusing traffic conflictsrdquo Safety Science vol 51 no 1 pp 160ndash1642013

[13] T Kanchan V Kulkarni S M Bakkannavar N Kumar andB Unnikrishnan ldquoAnalysis of fatal road traffic accidents in acoastal township of South Indiardquo Journal of Forensic and LegalMedicine vol 19 no 8 pp 448ndash451 2012

[14] M Karacasua and A Er ldquoAn analysis on distribution of trafficfaults in accidents based on driverrsquos age and gender Eskisehircaserdquo ProcediandashSocial and Behavioral Sciences vol 20 pp 776ndash785 2011

[15] J De Ona G Lopez R Mujalli and F J Calvo ldquoAnalysis oftraffic accidents on rural highways using LatentClass Clusteringand BayesianNetworksrdquoAccident Analysis amp Prevention vol 51pp 1ndash10 2013

[16] A T Kashani SM Afshin andA Ranjbari ldquoAnalysis of factorsassociated with traffic injury severity on rural roads in IranrdquoJournal of Injury and Violence Research vol 4 no 1 pp 36ndash412012

[17] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[18] C Chu A-L Hsu K-H Chou P Bandettini C Lin andAlzheimerrsquos Disease Neuroimaging Initiative ldquoDoes featureselection improve classification accuracy Impact of samplesize and feature selection on classification using anatomicalmagnetic resonance imagesrdquoNeuroImage vol 60 no 1 pp 59ndash70 2012

[19] X-H Lin F-F Yang L Zhou et al ldquoA support vector machine-recursive feature elimination feature selectionmethod based onartificial contrast variables and mutual informationrdquo Journal ofChromatography B vol 910 pp 149ndash155 2012

[20] D Wei S Li and M-K Tan ldquoGraph embedding based featureselectionrdquo Neurocomputing vol 93 pp 115ndash125 2012

[21] M Ahmadzadeh A H Fard B Saranjam and H R SalimildquoPrediction of residual stresses in gas arc welding by backpropagation neural networkrdquo NDT amp E International vol 52pp 136ndash143 2012

[22] P Reed and Y-Q Wu ldquoLogistic regression for risk factormodelling in stuttering researchrdquo Journal of Fluency Disordersvol 38 no 2 pp 88ndash101 2013

[23] P Reed ldquoThe effect of traffic tickets on road traffic crashesrdquoAccident Analysis amp Prevention vol 64 pp 86ndash91 2014

[24] T-N Yang and S-D Wang ldquoRobust algorithms for principalcomponent analysisrdquo Pattern Recognition Letters vol 20 no 9pp 927ndash933 1999

[25] J Karhunen P Pajunen and E Oja ldquoThe nonlinear PCAcriterion in blind source separation relations with otherapproachesrdquo Neurocomputing vol 22 no 1ndash3 pp 5ndash20 1998

[26] L Xu and A L Yuille ldquoRobust principal component analysisby self-organizing rules based on statistical physics approachrdquoIEEE Transactions on Neural Networks vol 6 no 1 pp 131ndash1431995

[27] F Gharibnezhad L E Mujica J Rodellar and C-P FritzenldquoDamage detection using robust fuzzy principal componentanalysisrdquo UPCommons Conference Report 2013

[28] P Luukka ldquoNonlinear fuzzy robust PCA algorithms and sim-ilarity classifier in bankruptcy analysisrdquo Expert Systems withApplications vol 37 no 12 pp 8296ndash8302 2010

[29] G-G Zhang K K W Yau and G Chen ldquoRisk factorsassociatedwith traffic violations and accident severity inChinardquoAccident Analysis amp Prevention vol 59 pp 18ndash25 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: Research Article The Application of Data Mining Technology ...downloads.hindawi.com/journals/mpe/2015/170635.pdfResearch Article The Application of Data Mining Technology to Build

6 Mathematical Problems in Engineering

Table 3 Sequence of feature attributes of environmental factors

Rank Code Variable name Importance1 119883

16In fast and slow lanes 0069791

2 119883

15In fast or general lane 0045356

3 119883

17Edge of pavement 0042181

4 119883

5Road pattern 0041695

5 119883

6Accident site 0037548

6 119883

14Separation facility 0025303

7 119883

11Sight distance 0018811

8 119883

2Light rays 0017205

9 119883

3Road type 0017063

10 119883

1Weather 0015428

11 119883

8Pavement state 0015322

12 119883

4Speed limit 0012274

13 119883

13Signal action 0009509

14 119883

10Barrier 0008938

15 119883

12Signal type 0002745

16 119883

9Pavement defect 0002059

17 119883

7Pavement 0000180

Table 4 Input (119909) and output (119910) parameters

Code(119909) Input variable Code

(119910) Output variable

119883

5Road pattern

119884

1People and motor vehicles

119883

6Accident site

119884

2Motor vehicles

119883

11Sight distance

119884

3Single vehicle

119883

14Separation facility

119883

15In fast or general lane

119883

16In fast and slow lanes

119883

17Edge of pavement

Table 5 Empirical results of BPNN versus FRPCA-BPNN

BPNN FRPCA-BPNNLearning rate 01 03 05 01 03 05BPNN-1 08232 08067 08089 08607 08594 08583BPNN-2 08523 08412 08335 08615 08572 08591BPNN-3 08508 08543 08560 08557 08623 08599BPNN-4 08550 08530 08530 08563 08553 08572BPNN-5 08574 08567 08542 08584 08607 08609Mean 08477 08424 08411 08585 08590 08591SD 00139 00208 02019 00026 00028 00010

the 138 of ldquoone-way roadsrdquo 119890301minus138 = 119890

123= 342 In

other words the ratio of traffic accidents at intersections is324 times that of one-way roads Likewise the ratio of trafficaccidents on ldquointersectionsrdquo under the category of ldquotrafficlocationsrdquo is also the highest at 724 times that of generalroads 119890572minus374 = 119890

198 Table 7 shows the importance of

Table 6 Empirical results of LR versus FRPCA-LR

Logistic FRPCA-LogisticLogistic-1 08480 08503Logistic-2 08523 08528Logistic-3 08491 08497Logistic-4 08483 08497Logistic-5 08551 08543Mean 08506 08514SD 00031 00021

traffic accident environments and road design to the ratio oftraffic accidents

5 Conclusions

Traffic safety depends on road design road configurationvehicle performance traffic regulations and the effective-ness of implementation The main means of transport inmiddle and low income countries include walking bicyclemotorcycle and bus while that of high income countries isautomobiles Therefore the traffic safety control measuresof high income countries are not completely applicableto middle and low income countries and thus should beimported and improved to fit local transportation and roadusage conditions [29]

The report of the World Health Organization (WHO)indicated that about 12 million people die from trafficaccidents in the world annually about 3400 people diefrom traffic accidents per day approximately 1000 peopleare injured or disabled children pedestrians cyclers andthe elderly are the most vulnerable road users and 85 offatalities and 90 of the disabled live in middle and lowincome countries The scientific analysis of accident data aswell as the implementation of relevant safety measures canprevent the occurrence of traffic accidents thus reducing theseverity of injuries

At present with the rapid development of cities it isnecessary to make efficient forecasting in order that decisionmakers can make preventions and decisions in advancein order to reduce the death rate This study uses RFEFRPCA BPNN and LR to analyze the classification accuracyrate of the traffic accident data of the region Accordingto the experimental results the classification accuracy rateof BPNN LR FRPCA-BPNN and FRPCA-LR is higherthan 80 thus forecast performance is significant Furtheranalysis shows that the network performance of the FRPCA-BPNN and FRPCA-LR classifiers combined with FRPCAis better than BPNN and LR According to Tables 5 and6 the accuracy rate of classifiers FRPCA-BPNN (8589)and FRPCA-LR (8514) combined with FRPCA is higherthan BPNN (8437) and LR (8506) by 152 and 008respectively meaning the FRPCA-BPNN and FRPCA-LRhave better classification forecast ability

In traffic accident analysis or verification results thehuman factor is mostly regarded as the first cause of traf-fic accidents However the road environment has certain

Mathematical Problems in Engineering 7

Table 7 The impacts on the types and patterns of traffic accidents as imposed by the various factors and represented in the LR model

Variables Saliency Odd ratioConstant terms 0067 0540

Relevant environmental factorsRoad types

One-way road 0044 138Crossroad 0028 301

Location of the accidentIntersection 0012 572Ordinary road sections 0032 374

Visibility rangeSatisfactory mdash mdashPoor 0024 159

Relevant road designDirectional facilities

Two-way no-passing markings 0084 167Directional markings 0074 145No directional facilities 0040 226

Traffic separation facilitiesFast lanes and slow lanesNo lane-change markings (without signs) 017 119Lane markings (with signs) 0024 189Lane markings (without signs) 0049 302No lane markings 0005 571

Separation linesSeparations lines between the fast lanes and slow lanes 0142 191No separations lines between the fast lanes and slow lanes 0028 425

SidelinesYes mdash mdashNo 0032 399

Total sample size 2471

correlation and improper intersection design or planning islikely to cause traffic accidents In comparison to other trafficaccident sites a forked road intersection is the most probableaccident site This study used RFE to select 7 input variablesfrom 17 input variables Based on the 7 input variables theenvironmental factor and road design are found to be thecauses of road traffic accidents in the region According tothe statistical data of the Taichung Police Station the 4 maincauses among the 67 causes of accidents are as follows (1)not allowing other vehicles to pass as per regulations (2)not aware of the situation ahead (3) violating specific sign(line) bans and (4) not maintaining a safe driving distanceaccounting for 2372 1354 751 and 822 respectivelyof the total number of traffic accidents and the total propor-tion is as high as 5299The road authoritiesmay refer to the7 variables of traffic accidents and road design as proposedin this study regarding future road designs and plans As forthe 4 main causes of accidents on road sections involving theabove seven variables of traffic accidents and road designthey should be the priorities in the future elimination actionsof the police force If improvements and preventive measures

are made road safety can be substantially increased therebyreducing traffic accidents and fatalitiesThefindings can serveas reference for the police force and management authoritiesto improve roads as well as the assessment and managementmodels for the elimination of traffic offences

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] D D Clarke PWard C Bartle andW Truman ldquoKiller crashesfatal road traffic accidents in the UKrdquo Accident Analysis ampPrevention vol 42 no 2 pp 764ndash770 2010

[2] K-V Hung and L-T Huyen ldquoEducation influence in trafficsafety a case study in Vietnamrdquo IATSS Research vol 34 no 2pp 87ndash93 2011

[3] H Gjerde A S Christophersen P T Normann and JMoslashrlandldquoToxicological investigations of drivers killed in road traffic

8 Mathematical Problems in Engineering

accidents in Norway during 2006ndash2008rdquo Forensic Science Inter-national vol 212 no 1ndash3 pp 102ndash109 2011

[4] Y Komada S Asaoka T Abe and Y Inoue ldquoShort sleepduration sleep disorders and traffic accidentsrdquo IATSS Researchvol 37 no 1 pp 1ndash7 2013

[5] S A Shappell and D A Wiegmann ldquoHuman factors investi-gation and analysis of accidents and incidentsrdquo in Encyclopediaof Forensic Sciences pp 440ndash449 Academic Press 2nd edition2013

[6] L-Y ChangD-J Lin C-HHuang andK-K Chang ldquoAnalysisof contributory factors for driving under the influence ofalcohol a stated choice approachrdquo Transportation Research PartF Traffic Psychology and Behaviour vol 18 pp 11ndash20 2013

[7] H Fu and Y Zhou ldquoThe traffic accident prediction basedon neural networkrdquo in Proceedings of the 2nd InternationalConference on Digital Manufacturing and Automation (ICDMArsquo11) pp 1349ndash1350 IEEE Zhangjiajie Hunan August 2011

[8] C-Y Kong and J-K Yang ldquoLogistic regression analysis ofpedestrian casualty risk in passenger vehicle collisions inChinardquoAccident Analysis and Prevention vol 42 no 4 pp 987ndash993 2010

[9] Y-K Ou Y-C Liu and F-Y Shih ldquoRisk prediction modelfor driversrsquo in-vehicle activitiesmdashapplication of task analysisand back-propagation neural networkrdquoTransportationResearchPart F Traffic Psychology and Behaviour vol 18 pp 83ndash93 2013

[10] R Gang and Z Zhuping ldquoTraffic safety forecasting methodby particle swarm optimization and support vector machinerdquoExpert SystemswithApplications vol 38 no 8 pp 10420ndash104242011

[11] S S Durduran ldquoA decision making system to automaticrecognize of traffic accidents on the basis of a GIS platformrdquoExpert Systems with Applications vol 37 no 12 pp 7729ndash77362010

[12] K El-Basyouny and T Sayed ldquoSafety performance functionsusing traffic conflictsrdquo Safety Science vol 51 no 1 pp 160ndash1642013

[13] T Kanchan V Kulkarni S M Bakkannavar N Kumar andB Unnikrishnan ldquoAnalysis of fatal road traffic accidents in acoastal township of South Indiardquo Journal of Forensic and LegalMedicine vol 19 no 8 pp 448ndash451 2012

[14] M Karacasua and A Er ldquoAn analysis on distribution of trafficfaults in accidents based on driverrsquos age and gender Eskisehircaserdquo ProcediandashSocial and Behavioral Sciences vol 20 pp 776ndash785 2011

[15] J De Ona G Lopez R Mujalli and F J Calvo ldquoAnalysis oftraffic accidents on rural highways using LatentClass Clusteringand BayesianNetworksrdquoAccident Analysis amp Prevention vol 51pp 1ndash10 2013

[16] A T Kashani SM Afshin andA Ranjbari ldquoAnalysis of factorsassociated with traffic injury severity on rural roads in IranrdquoJournal of Injury and Violence Research vol 4 no 1 pp 36ndash412012

[17] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[18] C Chu A-L Hsu K-H Chou P Bandettini C Lin andAlzheimerrsquos Disease Neuroimaging Initiative ldquoDoes featureselection improve classification accuracy Impact of samplesize and feature selection on classification using anatomicalmagnetic resonance imagesrdquoNeuroImage vol 60 no 1 pp 59ndash70 2012

[19] X-H Lin F-F Yang L Zhou et al ldquoA support vector machine-recursive feature elimination feature selectionmethod based onartificial contrast variables and mutual informationrdquo Journal ofChromatography B vol 910 pp 149ndash155 2012

[20] D Wei S Li and M-K Tan ldquoGraph embedding based featureselectionrdquo Neurocomputing vol 93 pp 115ndash125 2012

[21] M Ahmadzadeh A H Fard B Saranjam and H R SalimildquoPrediction of residual stresses in gas arc welding by backpropagation neural networkrdquo NDT amp E International vol 52pp 136ndash143 2012

[22] P Reed and Y-Q Wu ldquoLogistic regression for risk factormodelling in stuttering researchrdquo Journal of Fluency Disordersvol 38 no 2 pp 88ndash101 2013

[23] P Reed ldquoThe effect of traffic tickets on road traffic crashesrdquoAccident Analysis amp Prevention vol 64 pp 86ndash91 2014

[24] T-N Yang and S-D Wang ldquoRobust algorithms for principalcomponent analysisrdquo Pattern Recognition Letters vol 20 no 9pp 927ndash933 1999

[25] J Karhunen P Pajunen and E Oja ldquoThe nonlinear PCAcriterion in blind source separation relations with otherapproachesrdquo Neurocomputing vol 22 no 1ndash3 pp 5ndash20 1998

[26] L Xu and A L Yuille ldquoRobust principal component analysisby self-organizing rules based on statistical physics approachrdquoIEEE Transactions on Neural Networks vol 6 no 1 pp 131ndash1431995

[27] F Gharibnezhad L E Mujica J Rodellar and C-P FritzenldquoDamage detection using robust fuzzy principal componentanalysisrdquo UPCommons Conference Report 2013

[28] P Luukka ldquoNonlinear fuzzy robust PCA algorithms and sim-ilarity classifier in bankruptcy analysisrdquo Expert Systems withApplications vol 37 no 12 pp 8296ndash8302 2010

[29] G-G Zhang K K W Yau and G Chen ldquoRisk factorsassociatedwith traffic violations and accident severity inChinardquoAccident Analysis amp Prevention vol 59 pp 18ndash25 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Research Article The Application of Data Mining Technology ...downloads.hindawi.com/journals/mpe/2015/170635.pdfResearch Article The Application of Data Mining Technology to Build

Mathematical Problems in Engineering 7

Table 7 The impacts on the types and patterns of traffic accidents as imposed by the various factors and represented in the LR model

Variables Saliency Odd ratioConstant terms 0067 0540

Relevant environmental factorsRoad types

One-way road 0044 138Crossroad 0028 301

Location of the accidentIntersection 0012 572Ordinary road sections 0032 374

Visibility rangeSatisfactory mdash mdashPoor 0024 159

Relevant road designDirectional facilities

Two-way no-passing markings 0084 167Directional markings 0074 145No directional facilities 0040 226

Traffic separation facilitiesFast lanes and slow lanesNo lane-change markings (without signs) 017 119Lane markings (with signs) 0024 189Lane markings (without signs) 0049 302No lane markings 0005 571

Separation linesSeparations lines between the fast lanes and slow lanes 0142 191No separations lines between the fast lanes and slow lanes 0028 425

SidelinesYes mdash mdashNo 0032 399

Total sample size 2471

correlation and improper intersection design or planning islikely to cause traffic accidents In comparison to other trafficaccident sites a forked road intersection is the most probableaccident site This study used RFE to select 7 input variablesfrom 17 input variables Based on the 7 input variables theenvironmental factor and road design are found to be thecauses of road traffic accidents in the region According tothe statistical data of the Taichung Police Station the 4 maincauses among the 67 causes of accidents are as follows (1)not allowing other vehicles to pass as per regulations (2)not aware of the situation ahead (3) violating specific sign(line) bans and (4) not maintaining a safe driving distanceaccounting for 2372 1354 751 and 822 respectivelyof the total number of traffic accidents and the total propor-tion is as high as 5299The road authoritiesmay refer to the7 variables of traffic accidents and road design as proposedin this study regarding future road designs and plans As forthe 4 main causes of accidents on road sections involving theabove seven variables of traffic accidents and road designthey should be the priorities in the future elimination actionsof the police force If improvements and preventive measures

are made road safety can be substantially increased therebyreducing traffic accidents and fatalitiesThefindings can serveas reference for the police force and management authoritiesto improve roads as well as the assessment and managementmodels for the elimination of traffic offences

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

References

[1] D D Clarke PWard C Bartle andW Truman ldquoKiller crashesfatal road traffic accidents in the UKrdquo Accident Analysis ampPrevention vol 42 no 2 pp 764ndash770 2010

[2] K-V Hung and L-T Huyen ldquoEducation influence in trafficsafety a case study in Vietnamrdquo IATSS Research vol 34 no 2pp 87ndash93 2011

[3] H Gjerde A S Christophersen P T Normann and JMoslashrlandldquoToxicological investigations of drivers killed in road traffic

8 Mathematical Problems in Engineering

accidents in Norway during 2006ndash2008rdquo Forensic Science Inter-national vol 212 no 1ndash3 pp 102ndash109 2011

[4] Y Komada S Asaoka T Abe and Y Inoue ldquoShort sleepduration sleep disorders and traffic accidentsrdquo IATSS Researchvol 37 no 1 pp 1ndash7 2013

[5] S A Shappell and D A Wiegmann ldquoHuman factors investi-gation and analysis of accidents and incidentsrdquo in Encyclopediaof Forensic Sciences pp 440ndash449 Academic Press 2nd edition2013

[6] L-Y ChangD-J Lin C-HHuang andK-K Chang ldquoAnalysisof contributory factors for driving under the influence ofalcohol a stated choice approachrdquo Transportation Research PartF Traffic Psychology and Behaviour vol 18 pp 11ndash20 2013

[7] H Fu and Y Zhou ldquoThe traffic accident prediction basedon neural networkrdquo in Proceedings of the 2nd InternationalConference on Digital Manufacturing and Automation (ICDMArsquo11) pp 1349ndash1350 IEEE Zhangjiajie Hunan August 2011

[8] C-Y Kong and J-K Yang ldquoLogistic regression analysis ofpedestrian casualty risk in passenger vehicle collisions inChinardquoAccident Analysis and Prevention vol 42 no 4 pp 987ndash993 2010

[9] Y-K Ou Y-C Liu and F-Y Shih ldquoRisk prediction modelfor driversrsquo in-vehicle activitiesmdashapplication of task analysisand back-propagation neural networkrdquoTransportationResearchPart F Traffic Psychology and Behaviour vol 18 pp 83ndash93 2013

[10] R Gang and Z Zhuping ldquoTraffic safety forecasting methodby particle swarm optimization and support vector machinerdquoExpert SystemswithApplications vol 38 no 8 pp 10420ndash104242011

[11] S S Durduran ldquoA decision making system to automaticrecognize of traffic accidents on the basis of a GIS platformrdquoExpert Systems with Applications vol 37 no 12 pp 7729ndash77362010

[12] K El-Basyouny and T Sayed ldquoSafety performance functionsusing traffic conflictsrdquo Safety Science vol 51 no 1 pp 160ndash1642013

[13] T Kanchan V Kulkarni S M Bakkannavar N Kumar andB Unnikrishnan ldquoAnalysis of fatal road traffic accidents in acoastal township of South Indiardquo Journal of Forensic and LegalMedicine vol 19 no 8 pp 448ndash451 2012

[14] M Karacasua and A Er ldquoAn analysis on distribution of trafficfaults in accidents based on driverrsquos age and gender Eskisehircaserdquo ProcediandashSocial and Behavioral Sciences vol 20 pp 776ndash785 2011

[15] J De Ona G Lopez R Mujalli and F J Calvo ldquoAnalysis oftraffic accidents on rural highways using LatentClass Clusteringand BayesianNetworksrdquoAccident Analysis amp Prevention vol 51pp 1ndash10 2013

[16] A T Kashani SM Afshin andA Ranjbari ldquoAnalysis of factorsassociated with traffic injury severity on rural roads in IranrdquoJournal of Injury and Violence Research vol 4 no 1 pp 36ndash412012

[17] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[18] C Chu A-L Hsu K-H Chou P Bandettini C Lin andAlzheimerrsquos Disease Neuroimaging Initiative ldquoDoes featureselection improve classification accuracy Impact of samplesize and feature selection on classification using anatomicalmagnetic resonance imagesrdquoNeuroImage vol 60 no 1 pp 59ndash70 2012

[19] X-H Lin F-F Yang L Zhou et al ldquoA support vector machine-recursive feature elimination feature selectionmethod based onartificial contrast variables and mutual informationrdquo Journal ofChromatography B vol 910 pp 149ndash155 2012

[20] D Wei S Li and M-K Tan ldquoGraph embedding based featureselectionrdquo Neurocomputing vol 93 pp 115ndash125 2012

[21] M Ahmadzadeh A H Fard B Saranjam and H R SalimildquoPrediction of residual stresses in gas arc welding by backpropagation neural networkrdquo NDT amp E International vol 52pp 136ndash143 2012

[22] P Reed and Y-Q Wu ldquoLogistic regression for risk factormodelling in stuttering researchrdquo Journal of Fluency Disordersvol 38 no 2 pp 88ndash101 2013

[23] P Reed ldquoThe effect of traffic tickets on road traffic crashesrdquoAccident Analysis amp Prevention vol 64 pp 86ndash91 2014

[24] T-N Yang and S-D Wang ldquoRobust algorithms for principalcomponent analysisrdquo Pattern Recognition Letters vol 20 no 9pp 927ndash933 1999

[25] J Karhunen P Pajunen and E Oja ldquoThe nonlinear PCAcriterion in blind source separation relations with otherapproachesrdquo Neurocomputing vol 22 no 1ndash3 pp 5ndash20 1998

[26] L Xu and A L Yuille ldquoRobust principal component analysisby self-organizing rules based on statistical physics approachrdquoIEEE Transactions on Neural Networks vol 6 no 1 pp 131ndash1431995

[27] F Gharibnezhad L E Mujica J Rodellar and C-P FritzenldquoDamage detection using robust fuzzy principal componentanalysisrdquo UPCommons Conference Report 2013

[28] P Luukka ldquoNonlinear fuzzy robust PCA algorithms and sim-ilarity classifier in bankruptcy analysisrdquo Expert Systems withApplications vol 37 no 12 pp 8296ndash8302 2010

[29] G-G Zhang K K W Yau and G Chen ldquoRisk factorsassociatedwith traffic violations and accident severity inChinardquoAccident Analysis amp Prevention vol 59 pp 18ndash25 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: Research Article The Application of Data Mining Technology ...downloads.hindawi.com/journals/mpe/2015/170635.pdfResearch Article The Application of Data Mining Technology to Build

8 Mathematical Problems in Engineering

accidents in Norway during 2006ndash2008rdquo Forensic Science Inter-national vol 212 no 1ndash3 pp 102ndash109 2011

[4] Y Komada S Asaoka T Abe and Y Inoue ldquoShort sleepduration sleep disorders and traffic accidentsrdquo IATSS Researchvol 37 no 1 pp 1ndash7 2013

[5] S A Shappell and D A Wiegmann ldquoHuman factors investi-gation and analysis of accidents and incidentsrdquo in Encyclopediaof Forensic Sciences pp 440ndash449 Academic Press 2nd edition2013

[6] L-Y ChangD-J Lin C-HHuang andK-K Chang ldquoAnalysisof contributory factors for driving under the influence ofalcohol a stated choice approachrdquo Transportation Research PartF Traffic Psychology and Behaviour vol 18 pp 11ndash20 2013

[7] H Fu and Y Zhou ldquoThe traffic accident prediction basedon neural networkrdquo in Proceedings of the 2nd InternationalConference on Digital Manufacturing and Automation (ICDMArsquo11) pp 1349ndash1350 IEEE Zhangjiajie Hunan August 2011

[8] C-Y Kong and J-K Yang ldquoLogistic regression analysis ofpedestrian casualty risk in passenger vehicle collisions inChinardquoAccident Analysis and Prevention vol 42 no 4 pp 987ndash993 2010

[9] Y-K Ou Y-C Liu and F-Y Shih ldquoRisk prediction modelfor driversrsquo in-vehicle activitiesmdashapplication of task analysisand back-propagation neural networkrdquoTransportationResearchPart F Traffic Psychology and Behaviour vol 18 pp 83ndash93 2013

[10] R Gang and Z Zhuping ldquoTraffic safety forecasting methodby particle swarm optimization and support vector machinerdquoExpert SystemswithApplications vol 38 no 8 pp 10420ndash104242011

[11] S S Durduran ldquoA decision making system to automaticrecognize of traffic accidents on the basis of a GIS platformrdquoExpert Systems with Applications vol 37 no 12 pp 7729ndash77362010

[12] K El-Basyouny and T Sayed ldquoSafety performance functionsusing traffic conflictsrdquo Safety Science vol 51 no 1 pp 160ndash1642013

[13] T Kanchan V Kulkarni S M Bakkannavar N Kumar andB Unnikrishnan ldquoAnalysis of fatal road traffic accidents in acoastal township of South Indiardquo Journal of Forensic and LegalMedicine vol 19 no 8 pp 448ndash451 2012

[14] M Karacasua and A Er ldquoAn analysis on distribution of trafficfaults in accidents based on driverrsquos age and gender Eskisehircaserdquo ProcediandashSocial and Behavioral Sciences vol 20 pp 776ndash785 2011

[15] J De Ona G Lopez R Mujalli and F J Calvo ldquoAnalysis oftraffic accidents on rural highways using LatentClass Clusteringand BayesianNetworksrdquoAccident Analysis amp Prevention vol 51pp 1ndash10 2013

[16] A T Kashani SM Afshin andA Ranjbari ldquoAnalysis of factorsassociated with traffic injury severity on rural roads in IranrdquoJournal of Injury and Violence Research vol 4 no 1 pp 36ndash412012

[17] I Guyon J Weston S Barnhill and V Vapnik ldquoGene selec-tion for cancer classification using support vector machinesrdquoMachine Learning vol 46 no 1ndash3 pp 389ndash422 2002

[18] C Chu A-L Hsu K-H Chou P Bandettini C Lin andAlzheimerrsquos Disease Neuroimaging Initiative ldquoDoes featureselection improve classification accuracy Impact of samplesize and feature selection on classification using anatomicalmagnetic resonance imagesrdquoNeuroImage vol 60 no 1 pp 59ndash70 2012

[19] X-H Lin F-F Yang L Zhou et al ldquoA support vector machine-recursive feature elimination feature selectionmethod based onartificial contrast variables and mutual informationrdquo Journal ofChromatography B vol 910 pp 149ndash155 2012

[20] D Wei S Li and M-K Tan ldquoGraph embedding based featureselectionrdquo Neurocomputing vol 93 pp 115ndash125 2012

[21] M Ahmadzadeh A H Fard B Saranjam and H R SalimildquoPrediction of residual stresses in gas arc welding by backpropagation neural networkrdquo NDT amp E International vol 52pp 136ndash143 2012

[22] P Reed and Y-Q Wu ldquoLogistic regression for risk factormodelling in stuttering researchrdquo Journal of Fluency Disordersvol 38 no 2 pp 88ndash101 2013

[23] P Reed ldquoThe effect of traffic tickets on road traffic crashesrdquoAccident Analysis amp Prevention vol 64 pp 86ndash91 2014

[24] T-N Yang and S-D Wang ldquoRobust algorithms for principalcomponent analysisrdquo Pattern Recognition Letters vol 20 no 9pp 927ndash933 1999

[25] J Karhunen P Pajunen and E Oja ldquoThe nonlinear PCAcriterion in blind source separation relations with otherapproachesrdquo Neurocomputing vol 22 no 1ndash3 pp 5ndash20 1998

[26] L Xu and A L Yuille ldquoRobust principal component analysisby self-organizing rules based on statistical physics approachrdquoIEEE Transactions on Neural Networks vol 6 no 1 pp 131ndash1431995

[27] F Gharibnezhad L E Mujica J Rodellar and C-P FritzenldquoDamage detection using robust fuzzy principal componentanalysisrdquo UPCommons Conference Report 2013

[28] P Luukka ldquoNonlinear fuzzy robust PCA algorithms and sim-ilarity classifier in bankruptcy analysisrdquo Expert Systems withApplications vol 37 no 12 pp 8296ndash8302 2010

[29] G-G Zhang K K W Yau and G Chen ldquoRisk factorsassociatedwith traffic violations and accident severity inChinardquoAccident Analysis amp Prevention vol 59 pp 18ndash25 2013

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: Research Article The Application of Data Mining Technology ...downloads.hindawi.com/journals/mpe/2015/170635.pdfResearch Article The Application of Data Mining Technology to Build

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of