8
Research Article Unrecorded Accidents Detection on Highways Based on Temporal Data Mining Shi An, Tao Zhang, Xinming Zhang, and Jian Wang School of Transportation Science and Engineering, Harbin Institute of Technology, Harbin 150090, China Correspondence should be addressed to Xinming Zhang; [email protected] Received 3 April 2014; Revised 26 May 2014; Accepted 26 May 2014; Published 15 June 2014 Academic Editor: Hamid Reza Karimi Copyright © 2014 Shi An et al. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Automatic traffic accident detection, especially not recorded by traffic police, is crucial to accident black spots identification and traffic safety. A new method of detecting traffic accidents is proposed based on temporal data mining, which can identify the unknown and unrecorded accidents by traffic police. Time series model was constructed using ternary numbers to reflect the state of traffic flow based on cell transmission model. In order to deal with the aſtereffects of linear driſt between time series and to reduce the computational cost, discrete Fourier transform was implemented to turn time series from time domain to frequency domain. e pattern of the time series when an accident happened could be recognized using the historical crash data. en taking Euclidean distance as the similarity evaluation function, similarity data mining of the transformed time series was carried out. If the result was less than the given threshold, the two time series were similar and an accident happened probably. A numerical example was carried out and the results verified the effectiveness of the proposed method. 1. Introduction Road accidents are regarded as one of the leading causes of death for people between the ages of 5 and 44 according to the World Health Organization [1]. More than that, traffic crashes result in serious economic losses on account of traffic con- gestion which in turn leads to a wide variety of adverse con- sequences such as traffic delays, supply chain interruptions, travel time unreliability, and increased noise pollution, as well as deterioration of air quality [2]. us, reducing or avoiding traffic collisions is of great significance to traffic safety. High collision concentration location (HCCL) [3] detection is an effective means to find out the accident black spots and take some necessary continuous improvement measures. Histori- cal accident data is the necessary foundation of any research on this subject. One of the main problems of the accident data was considered as the heterogeneity [4] in the previous studies. A great many methods had been proposed to solve this problem, such as latent class clustering [58], Bayesian networks (BNs) [911], and continuous risk profile (CRP) [12, 13]. One of the commonalities between these methods is that historical accident data, more specifically, recorded historical accident data, were used as the basic data. However, not all traffic crashes are known and recorded by traffic police. It is undeniable that some minor accidents oſten happened on highways and were settled privately for trivial losses. Usually, traffic accident black spots are identified mainly based on the traffic crash data recorded by traffic police departments [14]. is just helps to find out the collisions we know, while ones we do not know, which did happen, keep unconsidered yet. In part, these observations motivated our study. Traffic accidents are contingent events and are difficult to detect if there is no alarm. Nonetheless, a traffic accident was bound to create an impact on traffic flow pattern and cause different levels of congestion [15, 16]. e traffic volume, traffic speed, and traffic density were changed by crashes, even minor accidents. us, capturing the change of these traffic flow parameters is very helpful for traffic accidents, especially unrecorded traffic accidents detection. Given this, an automatic traffic accident detection method was proposed in this paper. Traffic flow data was used and simulated by cell transmission model (CTM). According to the different inflow between two cells, time series model was constructed to reflect traffic flow state. Time series pattern, when accidents happened, was established based on historical accident data. To overcome the defect Hindawi Publishing Corporation Mathematical Problems in Engineering Volume 2014, Article ID 852495, 7 pages http://dx.doi.org/10.1155/2014/852495

Research Article Unrecorded Accidents Detection on ...downloads.hindawi.com/journals/mpe/2014/852495.pdf · Unrecorded Accidents Detection Using Similarity Mining. Similarity search

  • Upload
    others

  • View
    14

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Research Article Unrecorded Accidents Detection on ...downloads.hindawi.com/journals/mpe/2014/852495.pdf · Unrecorded Accidents Detection Using Similarity Mining. Similarity search

Research ArticleUnrecorded Accidents Detection on HighwaysBased on Temporal Data Mining

Shi An Tao Zhang Xinming Zhang and Jian Wang

School of Transportation Science and Engineering Harbin Institute of Technology Harbin 150090 China

Correspondence should be addressed to Xinming Zhang 12b332001hiteducn

Received 3 April 2014 Revised 26 May 2014 Accepted 26 May 2014 Published 15 June 2014

Academic Editor Hamid Reza Karimi

Copyright copy 2014 Shi An et al This is an open access article distributed under the Creative Commons Attribution License whichpermits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Automatic traffic accident detection especially not recorded by traffic police is crucial to accident black spots identification andtraffic safety A new method of detecting traffic accidents is proposed based on temporal data mining which can identify theunknown and unrecorded accidents by traffic police Time series model was constructed using ternary numbers to reflect the stateof traffic flow based on cell transmission model In order to deal with the aftereffects of linear drift between time series and toreduce the computational cost discrete Fourier transform was implemented to turn time series from time domain to frequencydomainThe pattern of the time series when an accident happened could be recognized using the historical crash dataThen takingEuclidean distance as the similarity evaluation function similarity datamining of the transformed time series was carried out If theresult was less than the given threshold the two time series were similar and an accident happened probably A numerical examplewas carried out and the results verified the effectiveness of the proposed method

1 Introduction

Road accidents are regarded as one of the leading causes ofdeath for people between the ages of 5 and 44 according to theWorldHealthOrganization [1]More than that traffic crashesresult in serious economic losses on account of traffic con-gestion which in turn leads to a wide variety of adverse con-sequences such as traffic delays supply chain interruptionstravel time unreliability and increased noise pollution as wellas deterioration of air quality [2] Thus reducing or avoidingtraffic collisions is of great significance to traffic safety Highcollision concentration location (HCCL) [3] detection is aneffective means to find out the accident black spots and takesome necessary continuous improvement measures Histori-cal accident data is the necessary foundation of any researchon this subject One of the main problems of the accidentdata was considered as the heterogeneity [4] in the previousstudies A great many methods had been proposed to solvethis problem such as latent class clustering [5ndash8] Bayesiannetworks (BNs) [9ndash11] and continuous risk profile (CRP)[12 13] One of the commonalities between these methodsis that historical accident data more specifically recordedhistorical accident data were used as the basic data However

not all traffic crashes are known and recorded by traffic policeIt is undeniable that someminor accidents often happened onhighways and were settled privately for trivial losses Usuallytraffic accident black spots are identified mainly based on thetraffic crash data recorded by traffic police departments [14]This just helps to find out the collisions we know while oneswe do not know which did happen keep unconsidered yetIn part these observations motivated our study

Traffic accidents are contingent events and are difficultto detect if there is no alarm Nonetheless a traffic accidentwas bound to create an impact on traffic flow pattern andcause different levels of congestion [15 16]The traffic volumetraffic speed and trafficdensitywere changed by crashes evenminor accidents Thus capturing the change of these trafficflow parameters is very helpful for traffic accidents especiallyunrecorded traffic accidents detection

Given this an automatic traffic accident detectionmethod was proposed in this paper Traffic flow data wasused and simulated by cell transmission model (CTM)According to the different inflow between two cells timeseries model was constructed to reflect traffic flow state Timeseries pattern when accidents happened was establishedbased on historical accident data To overcome the defect

Hindawi Publishing CorporationMathematical Problems in EngineeringVolume 2014 Article ID 852495 7 pageshttpdxdoiorg1011552014852495

2 Mathematical Problems in Engineering

of Euclidean distance which does not consider the lineardrift in the time domain and to reduce the computationalcost discrete Fourier transform was implemented to turnthe time series from time domain to frequency domainLeveraging the strengths of temporal data mining at findingtime-varying patterns any time series that were similar to thegiven pattern could be figured out namely the unknown andunrecorded accidents by similarity search The premier aimand contribution of this paper are to find out ldquothe hiddenaccidentsrdquo such as compounding in private using temporaldata mining method A case study using the real highwaytraffic data in Harbin China was conducted for verification

2 Material and Methods

21 Construction of Time Series Reflecting Traffic Flow StateFor highway traffic flow there was a unique state in eachperiod One of the ways to describe this was the metaphor ofa screen capture for the traffic flow over consecutive periodsof time Every picture reflected the state of the traffic flowat a certain time These pictures constituted a sequence overtime This was the consideration of building the time seriesmodel to describe the evolution of traffic flow in this paperTraffic conditions estimation was achieved through dynamictraffic assignment (DTA) simulation that utilized temporalaspects of a transportation system Different values of inflowbetween cells in CTM were typically expressed as ternarynumbers (0 1 and 2) A series of ternary numbers generatedby CTM were introduced to illustrate the traffic flow stateThen time series data were created by converting ternarynumbers to decimal numbers Thus the traffic flow statecould be reflected by time series data and it was the basicwork for unrecorded accidents mining

211 Cell Transmission Model To model the propagation oftrafficflowand construct time series data in the section belowthe spread of highway traffic flow was simulated by CTMin this paper CTM was proposed by Daganzo [17 18] andwas considered as a proper method It was believed that therelationship between traffic flow (119902) and density (119896) was of theform depicted figurally as follows

119902 = min V119896 119902max 120596 (119896119895 minus 119896) (1)

where V 119902max 120596 and 119896119895 denoted the free-flow speed themaximum flow (or capacity) the backward wave speed andthe maximum (or jam) density respectively as shown inFigure 1

In Figure 1 if the density is less than 1198961 the traffic flow119902 is equal to V119896 if it is between 1198961 and 1198962 then 119902 reaches itsmaximum 119902max if it is between 1198962 and 119896119895 119902 is equal to 120596(119896119895 minus119896) and 119902 is 0 when the density reaches 119896119895

Then the continuous Lighthill-Whitham-Richards(LWR) equations [19 20] for a single highway link werediscretized through this method and could be approximatedby a set of difference equations The state of the system wasupdated over timeThus the discontinuous changes of trafficflow could be captured

kj1

+

1

120596

qmax

Flow

k1 k2 kj

minus120596

Density

Figure 1 The relationship of traffic flow and density in CTM

InCTM a single roadwas divided into homogeneous sec-tions (cells) 119894 whose lengths equaled the distance traveled byfree-flowing traffic speed in one clock intervalThe state of thesystem at instant 119905 was then given by the number of vehiclescontained in each cell 119899119894(119905) The following parameters weredefined for each cell119873119894(119905) is the maximum number of vehicles that can be

present in cell 119894 at time 119905 and 119876119894(119905) is the maximum numberof vehicles that can flow into cell 119894 when the clock advancedfrom 119905 to 119905 + 1

These constants could vary with time (eg contingenttraffic incidents or conscious traffic control measures) butthis dependence was able to be ignored for simplicity ofnotation The first constant 119873119894(119905) was defined to be theproduct of the cellrsquos length and its jam density and the secondonewas the product of the time interval and the cellrsquos capacity

If cells were numbered consecutively starting with theupstream end of the road from 119894 = 1 to 119868 the recursiverelationship of the CTM as discussed by Daganzo [17 18]could be expressed as

119899119894 (119905 + 1) = 119899119894 (119905) + 119910119894 (119905) minus 119910119894+1 (119905) (2a)

where119910119894(119905)was the inflow to cell 119894 in the time interval (119905 119905+1)given by

119910119894 (119905) = min 119899119894minus1 (119905) 119876119894 (119905) 120575 [119873119894 (119905) minus 119899119894 (119905)] (2b)

where 120575 = 120596VThe formulas (2a) and (2b) constituted the fundamental

equations of CTM Equation (2a) expressed the status updatesof cells over time while the latter gave the variations forupdating

212 Constructing Time Series of Traffic Flow Time seriesdata was a sequence of data evolving through time Therewere two strengths of temporal data mining data-based andpattern-based The former was more likely to approach thetruth the latter was more likely to extract the features Everytraffic accident was considered to change the traffic flow statemore or less Features of this change were supposed to beextracted by temporal data mining and these features wereable to be used to find out ldquothe hidden accidentsrdquo

Gao et al used the NaSch traffic model to simulate theevolution of traffic flow [21 22] The state of traffic flow at

Mathematical Problems in Engineering 3

0 0 0 0

0 1 1 0

0 1 1 1

1 2 1 1

2 2 2 1t = 4

t = 3

t = 2

t = 1

t = 0

Figure 2 The traffic flow trends with the tick of a clock

each time periodwas regarded as a node of a networkThen acomplex network was constructed also known as a multiple-mode system model which could describe the evolution oftraffic flow Zhao et al studied state estimation of this kind ofsystems [23] However the temporal relations among thesenodes were ignored in their research

According to CTM at each time step the inflow 119910119894(119905) tocell 119894 could be 119899119894minus1(119905) 119876119894(119905) or 120575[119873119894(119905) minus 119899119894(119905)] When 119910119894(119905) =119899119894minus1(119905) the state of the cell was represented by number 0when 119910119894(119905) = 119876119894(119905) it was represented by number 1 otherwiserepresented by number 2 Then the state of the system wasexpressed by a sequence of ternary numbers for example0 0 1 2 1

The model employed was described as follows For 119873cells we assumed that at time period (119905 119905 + 1) the statewas represented by a set of ternary numbers namely 119878119905 =1199041 1199042 119904119873 Then 119878119905 is the precursor of 119878119905+1 Here eachternary number 119904119894 can be considered as an element whichcan take three different states that is 119904119894 = 0 1 2 Figure 2depicted the evolution of traffic flow with the tick of a clock

As shown in Figure 2 when the clock advanced from 0to 4 the system (a single segment) state change could bediscovered clearly using the ldquoscreen capturingrdquo method Thestate 119878119905 was time-varying and was represented by a sequenceof numbers For convenience a parameter119872was introduced

250

200

150

100

50

0

Stat

e

0 5 10 15 20

Clock

Figure 3 Time series data reflecting traffic flow trends of a five-cellroad

in this paper to represent the value of 119878119905 and119872 = sum119873119894=1 3119894119904119894

thus ternary numbers were converted to decimal numbersfor example 0 0 1 2 1 was converted to 16 Thus timeseries data was created (see Figure 3)

Each single decimal number represented a systemrsquos stateAs shown in Figure 3 at time step 9 for instance the valueof the system state was 78 thus the ternary numbers were0 2 2 2 0 which was the state of traffic flow

22 Feature Extraction Noise in the raw time series datacould reduce accuracy and creditability of data miningLinear drift was certainly an example In many clusteringanalysis methods119870-means for example Euclidean distancewas frequently used as a similarity measure function Theunrecorded accidents detection method proposed in thispaper was mostly based on similarity mining which wouldbe discussed later Linear drift was the most important factorto influence the accuracy of the results

For time series 119883 = 1199091 1199092 119909119899 and 119884 = 11991011199102 119910119899 if at time 119905 119909119894 minus 119910119894 = 120576 119894 = 1 119899 where 120576 was aconstantThat was to say that the relationship between119883 and119884 was linear drift in the time domain as shown in Figure 4

In this case if Euclidean distance was used and 120576 wasbeyond the threshold these two time series119883 and119884were notconsidered to be similar by mining algorithms While theyhad similar shape and trend apparently the judging resultwas inaccurate obviously Aiming to prevent such errors andto realize data compression and reduce the computationalcost it was necessary to extract feature from the original timeseries data using the image in feature space to replace theoriginal one

Discrete Fourier transform (DFT) which had uniquemerits in time series analysis was an alternative way Fora given time series object DFT could be used to turnit to frequency domain from time domain According toParseval theory the time-domain energy function was equalto the frequency-domain energy function And most of theenergy in frequency domain concentrated on the first few

4 Mathematical Problems in Engineering

0 5 10 15 20

Clock

300

250

200

150

100

50

0

Stat

e

X

Y

Figure 4 Linear drift between time series119883 and 119884

coefficients hence other coefficients could be omitted Thusthe remaining coefficients were able to be seen as the featuresof the original time series

For a given time series 119883 = 119909119905 119905 = 0 1 119899 minus 1translating the series from time domain to frequency domainthe new sequence obtained byDFTwas denoted by119883 = 1199091015840119891119891 = 0 1 119899 minus 1 where

1199091015840119891 =

1

radic119899

119899minus1

sum

119905=0

119909119905 exp(minus1198952120587119891119905

119899) 119891 = 0 1 119899 minus 1 (3)

Taking the data in Figure 3 for example the result of DFTwas shown in Figure 5

The area in which the frequency was greater than 00was the real transformed data of the original finite timeseries 17 elements were left out of the initial total of 21ones Data compression had been realized and as a result ofthe transformation from time domain to frequency domainlinear drift no longer existed aswell as other noises in the timedomain

23 Unrecorded Accidents Detection Using Similarity MiningSimilarity search is an important research field in temporaldata mining As mentioned above each recorded accidentcould bring a piece of time series data and all recordedhistorical accidents data over a period of time could explainthe traffic flow trends when accidents happened After dataprocessing using the above methods which could be calleddata preprocessing clustering analysis was supposed to beimplemented in this paper The classical 119870-means methodwould be able to meet the accuracy requirements due tothe appropriate data preprocessing Results of cluster wereconsidered as ldquonormal traffic flow trendsrdquo under accidentsand the ldquohidden accidentsrdquo could be found out by similaritysearch

Themethod of similarity measurement used in this paperwas Euclidean distance Set the time sequence of ldquonormal

180

160

140

120

100

80

60

40

20

0minus05 00 05

Am

plitu

de

Frequency (Hz)

Figure 5 Discrete Fourier transform

Recorded accidents location

Traffic flow data

DFT

Time series of traffic flow trends

ldquoNormal traffic flow trendsrdquo under accidents

Clustering

Ordinary data

Unrecorded accidents location

Simila

rity

mining

Figure 6 The procedure of unrecorded accidents detection

traffic flow trendsrdquo of a single road segment which was 119909119894whose length was 119899 The time series to be measured wasdenoted by 119910119894 with its length 119873 119873 ge 119899 generally Insimilarity search subsequences of 119910119894 whose lengths were 119899were themeasuring objectThese subsequences were denotedby 119911119894 It was known that the number of 119911119894 was 119869 119869 =119873minus119899+1Then the similaritymetric function could be definedas follows

min119869

119899

sum

119894=1

(119909119894 minus 119870119869119911119869119894 )2 (4)

where 119870119869 was the scaling factor The calculation times were119873 minus 119899 + 1 obviously

So far for a single road segment the procedure to detectunrecorded accidents was described in Figure 6

Mathematical Problems in Engineering 5

Figure 7 The study site and data

3 Results and Discussions

31 The Study Site and Data Preparation A case study wasconducted based on the data extracted from records collectedon Beijing-Harbin Expressway (G1) between Harbin andLalinhe from January 2010 to July 2011 The traffic accidentsdataset included a total of 73 crashes recorded on the 72-kilometer length road in total The study site and the trafficaccidents data were shown in Figure 7

In one and a half years for such a highwaywith the annualtraffic reaching 8ndash10 million ldquoonlyrdquo 73 crashes happenedExperience told us that it did not indicate how safe thehighway was but some minor accidents were not knownor recorded It was meaningful to detect the unrecordedcollisions for traffic safety research

As the limit of detection device traffic flow data workedout using data collected by highway toll collection system Fordetailed calculation process readers could refer toWeng et al[24] The estimation accuracy could meet the requirementstechnically

32 Numerical Experiment In the numerical experiment thetime horizonwas set within half an hour and the time intervalwas 30 secondsThe accidents happened at the eleventh clockinterval namely 5 minutes after the start time of simulationThus therewere 60 points in one piece of time series dataThefree-flow speedwas 120 kmh thus the length of each cell wasthe product of the free-flow speed and the unit clock intervalthat was 120 kmh lowast 30 s = 1 km Based on the historicalaccident data the accident location was set in the middlecell in CTM and five cells were brought in to represent asingle highway segment where a crash happened as shownin Figure 8

The virtual cell in Figure 8 was on behalf of the demandgeneration There were five inflow states as shown by thefive arrows in the figure above Using the method shownin Figure 6 first of all it was significant to identify theldquonormal traffic flow trendsrdquo under accidents Time seriesdata was conducted using the method mentioned above andDFT was implemented then for each crash record 119870-meanscluster analysis was adopted and the results were indicatedin Figure 9

As is shown in Figure 9 two clusters were generated bythis method In Figure 9(a) there were 34 accidents gatheredtogether The common point of them was that congestionformed anddissipated gradually in the simulation periodThe

Accident spotVirtual cell

Figure 8 CTM for a single highway segment

shapes of these time series data were similar to the normaldistribution curveAccording to the accident records of trafficpolice most of these crashes were single-vehicle crashes andboth-car accidents In Figure 9(b) 33 crashes were involvedand the congestion was not eliminated in study period Two-thirds of them were crashes between two heavy trucks andone-third were multivehicle accidents This kind of crashescould cause severe congestion and influence traffic seriouslyActually another cluster existed in this case and only fiveaccidents were involved This cluster had less interferenceto traffic but for the records they were collisions betweenvehicles and pedestrians It was a serious threat to trafficsafety but beyond the scope of this paper so the result wasnot shown here

For the unrecorded accidents usually there were nocasualties and little losses The reason why they were notrecordedwas that theseminor accidentswere settled privatelyand even kept unknown by the traffic police So there wasevery reason to believe that these unrecorded accidents weresingle-vehicle crashes or both-car accidentsThus the clustershown in Figure 9(a) was our target in the similarity searchFor verification one accident out of the 73 crashes wasseparated out pretending to be unknown

By similarity search the ldquopreparedrdquo accident was foundout and the time series data was shown in Figure 10 Thetraffic flow trend in Figure 10 was indeed similar to the clusterin Figure 9(a) At the first 10 clock intervals in both figuresthe traffic was smooth because of the unsaturated traffic flowand no accidents Then congestion formed and dissipatedgradually in a certain period The result of similarity searchproved the reliability of the proposed method

4 Conclusions

Unrecorded accidents were significant to identify trafficaccident-prone location Based on the observation that thetraffic volume traffic speed and traffic density were changedby crashes even minor accidents an automatic traffic acci-dent identification method was proposed As most of thecurrent studies did not pay enough attention to the timefactor when studying the relationship between traffic stateand crashes on highways this paper proposed a methodto construct time series data using traffic flow data whenaccidents happened To avoid the defect of not consideringthe linear drift in the time domain between two sequencesDFT was carried out to extract features from original timeseries data Traffic flow trend could be well understood byclustering analysis Then through the method of similaritysearch unrecorded accidents which were believed to besingle-vehicle crashes or both-car accidents were foundout The case study using real data in Harbin showed thefeasibleness of the proposed method

6 Mathematical Problems in Engineering

250

200

150

100

50

0

Stat

e

0 10 6030 40 5020

Clock

(a)

250

200

150

100

50

0

Stat

e

0 10 6030 40 5020

Clock

(b)

Figure 9 Results of cluster

0 10 6030 40 5020

Clock

250

200

150

100

50

0

Stat

e

Figure 10 Result of similarity search

For further research data of car insurance would bevaluable for data mining in this area or for verification of theautomatic traffic accidents identification And further studycould focus on the traffic flow trends under accidents suchas the influence diffusion and the elimination of the crashinfluence

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the HTRDP (863) under Grantno 2012AA112310 and Research Fund for the Doctoral Pro-gram of Higher Education of Ministry of Education of China(20112302110054)

References

[1] E Krug ldquoDecade of action for road safety 2011ndash2020rdquo Injuryvol 43 no 1 pp 6ndash7 2012

[2] A Gregoriades and K C Mouskos ldquoBlack spots identificationthrough a Bayesian Networks quantification of accident riskindexrdquo Transportation Research C Emerging Technologies vol28 pp 28ndash43 2013

[3] O H Kwon M J Park H Yeo and K Chung ldquoEvaluatingthe performance of network screening methods for detectinghigh collision concentration locations on highwaysrdquo AccidentAnalysis and Prevention vol 51 pp 141ndash149 2013

[4] P T Savolainen F L Mannering D Lord and M A QuddusldquoThe statistical analysis of highway crash-injury severities areview and assessment ofmethodological alternativesrdquoAccidentAnalysis and Prevention vol 43 no 5 pp 1666ndash1676 2011

[5] B Depaire G Wets and K Vanhoof ldquoTraffic accident segmen-tation bymeans of latent class clusteringrdquoAccident Analysis andPrevention vol 40 no 4 pp 1257ndash1266 2008

[6] J M Pardillo-Mayora C A Domınguez-Lira and R Jurado-Pina ldquoEmpirical calibration of a roadside hazardousness indexfor Spanish two-lane rural roadsrdquoAccident Analysis and Preven-tion vol 42 no 6 pp 2018ndash2023 2010

[7] B-J Park and D Lord ldquoApplication of finite mixture models forvehicle crash data analysisrdquo Accident Analysis and Preventionvol 41 no 4 pp 683ndash691 2009

[8] B-J Park D Lord and J D Hart ldquoBias properties of Bayesianstatistics in finite mixture of negative binomial regressionmodels in crash data analysisrdquoAccident Analysis and Preventionvol 42 no 2 pp 741ndash749 2010

[9] M Simoncic ldquoA Bayesian network model of two-car accidentsrdquoJournal of Transportation and Statistics vol 7 no 2-3 pp 13ndash252005

[10] J De Ona R O Mujalli and F J Calvo ldquoAnalysis of traffic acci-dent injury severity on Spanish rural highways using Bayesiannetworksrdquo Accident Analysis and Prevention vol 43 no 1 pp402ndash411 2011

[11] R O Mujalli and J De Ona ldquoA method for simplifying theanalysis of traffic accidents injury severity on two-lane highwaysusing Bayesian networksrdquo Journal of Safety Research vol 42 no5 pp 317ndash326 2011

Mathematical Problems in Engineering 7

[12] K Chung K Jang S Madanat and S Washington ldquoProactivedetection of high collision concentration locations on high-waysrdquo Transportation Research A Policy and Practice vol 45no 9 pp 927ndash934 2011

[13] K Chung D R Ragland S Madanat and S M Oh ldquoThecontinuous risk profile approach for the identification of highcollision concentration locations on congested highwaysrdquo inProceedings of the 19th International Symposium on Transporta-tion amp TrafficTheory (ISTTT rsquo09) pp 463ndash480 2009

[14] G Vandenbulcke I Thomas and L Int Panis ldquoPredict-ing cycling accident risk in Brussels a spatial case-controlapproachrdquo Accident Analysis and Prevention vol 62 pp 341ndash357 2014

[15] Z Zheng ldquoEmpirical analysis on relationship between trafficconditions and crash occurrencesrdquo Procedia-Social and Behav-ioral Sciences vol 43 pp 302ndash312 2012

[16] A Hamzehei E Chung and M Miska ldquoPre-crash and non-crash traffic flow trends analysis on motorwaysrdquo in Proceedingsof the Australasian Transport Research Forum 2013

[17] C F Daganzo ldquoThe cell transmission model a dynamic repre-sentation of highway traffic consistent with the hydrodynamictheoryrdquo Transportation Research B Methodological vol 28 no4 pp 269ndash287 1994

[18] C F Daganzo ldquoThe cell transmission model part II networktrafficrdquo Transportation Research B Methodological vol 29 no2 pp 79ndash93 1995

[19] M J Lighthill and G B Whitham ldquoOn kinematic waves I flowmovement in long rivers II a theory of traffic flow on longcrowded roadsrdquo Proceedings of the Royal Society of London Avol 229 no 1178 pp 281ndash316 1955

[20] P I Richards ldquoShock waves on the highwayrdquo OperationsResearch vol 4 pp 42ndash51 1956

[21] G Zi-You and L Ke-Ping ldquoEvolution of traffic flow with scale-free topologyrdquo Chinese Physics Letters vol 22 no 10 pp 2711ndash2714 2005

[22] X-G Li Z-Y Gao K-P Li and X-M Zhao ldquoRelationshipbetween microscopic dynamics in traffic flow and complexityin networksrdquo Physical Review E vol 76 no 1 Article ID 0161102007

[23] X Zhao H Liu J Zhang and H Li ldquoMultiple-mode observerdesign for a class of switched linear systemsrdquo IEEE Transactionson Automation Science and Engineering 2013

[24] J-C Weng L-L Liu and B Du ldquoETC data based traffic infor-mation mining techniquesrdquo Journal of Transportation SystemsEngineering and Information Technology vol 10 no 2 pp 57ndash63 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 2: Research Article Unrecorded Accidents Detection on ...downloads.hindawi.com/journals/mpe/2014/852495.pdf · Unrecorded Accidents Detection Using Similarity Mining. Similarity search

2 Mathematical Problems in Engineering

of Euclidean distance which does not consider the lineardrift in the time domain and to reduce the computationalcost discrete Fourier transform was implemented to turnthe time series from time domain to frequency domainLeveraging the strengths of temporal data mining at findingtime-varying patterns any time series that were similar to thegiven pattern could be figured out namely the unknown andunrecorded accidents by similarity search The premier aimand contribution of this paper are to find out ldquothe hiddenaccidentsrdquo such as compounding in private using temporaldata mining method A case study using the real highwaytraffic data in Harbin China was conducted for verification

2 Material and Methods

21 Construction of Time Series Reflecting Traffic Flow StateFor highway traffic flow there was a unique state in eachperiod One of the ways to describe this was the metaphor ofa screen capture for the traffic flow over consecutive periodsof time Every picture reflected the state of the traffic flowat a certain time These pictures constituted a sequence overtime This was the consideration of building the time seriesmodel to describe the evolution of traffic flow in this paperTraffic conditions estimation was achieved through dynamictraffic assignment (DTA) simulation that utilized temporalaspects of a transportation system Different values of inflowbetween cells in CTM were typically expressed as ternarynumbers (0 1 and 2) A series of ternary numbers generatedby CTM were introduced to illustrate the traffic flow stateThen time series data were created by converting ternarynumbers to decimal numbers Thus the traffic flow statecould be reflected by time series data and it was the basicwork for unrecorded accidents mining

211 Cell Transmission Model To model the propagation oftrafficflowand construct time series data in the section belowthe spread of highway traffic flow was simulated by CTMin this paper CTM was proposed by Daganzo [17 18] andwas considered as a proper method It was believed that therelationship between traffic flow (119902) and density (119896) was of theform depicted figurally as follows

119902 = min V119896 119902max 120596 (119896119895 minus 119896) (1)

where V 119902max 120596 and 119896119895 denoted the free-flow speed themaximum flow (or capacity) the backward wave speed andthe maximum (or jam) density respectively as shown inFigure 1

In Figure 1 if the density is less than 1198961 the traffic flow119902 is equal to V119896 if it is between 1198961 and 1198962 then 119902 reaches itsmaximum 119902max if it is between 1198962 and 119896119895 119902 is equal to 120596(119896119895 minus119896) and 119902 is 0 when the density reaches 119896119895

Then the continuous Lighthill-Whitham-Richards(LWR) equations [19 20] for a single highway link werediscretized through this method and could be approximatedby a set of difference equations The state of the system wasupdated over timeThus the discontinuous changes of trafficflow could be captured

kj1

+

1

120596

qmax

Flow

k1 k2 kj

minus120596

Density

Figure 1 The relationship of traffic flow and density in CTM

InCTM a single roadwas divided into homogeneous sec-tions (cells) 119894 whose lengths equaled the distance traveled byfree-flowing traffic speed in one clock intervalThe state of thesystem at instant 119905 was then given by the number of vehiclescontained in each cell 119899119894(119905) The following parameters weredefined for each cell119873119894(119905) is the maximum number of vehicles that can be

present in cell 119894 at time 119905 and 119876119894(119905) is the maximum numberof vehicles that can flow into cell 119894 when the clock advancedfrom 119905 to 119905 + 1

These constants could vary with time (eg contingenttraffic incidents or conscious traffic control measures) butthis dependence was able to be ignored for simplicity ofnotation The first constant 119873119894(119905) was defined to be theproduct of the cellrsquos length and its jam density and the secondonewas the product of the time interval and the cellrsquos capacity

If cells were numbered consecutively starting with theupstream end of the road from 119894 = 1 to 119868 the recursiverelationship of the CTM as discussed by Daganzo [17 18]could be expressed as

119899119894 (119905 + 1) = 119899119894 (119905) + 119910119894 (119905) minus 119910119894+1 (119905) (2a)

where119910119894(119905)was the inflow to cell 119894 in the time interval (119905 119905+1)given by

119910119894 (119905) = min 119899119894minus1 (119905) 119876119894 (119905) 120575 [119873119894 (119905) minus 119899119894 (119905)] (2b)

where 120575 = 120596VThe formulas (2a) and (2b) constituted the fundamental

equations of CTM Equation (2a) expressed the status updatesof cells over time while the latter gave the variations forupdating

212 Constructing Time Series of Traffic Flow Time seriesdata was a sequence of data evolving through time Therewere two strengths of temporal data mining data-based andpattern-based The former was more likely to approach thetruth the latter was more likely to extract the features Everytraffic accident was considered to change the traffic flow statemore or less Features of this change were supposed to beextracted by temporal data mining and these features wereable to be used to find out ldquothe hidden accidentsrdquo

Gao et al used the NaSch traffic model to simulate theevolution of traffic flow [21 22] The state of traffic flow at

Mathematical Problems in Engineering 3

0 0 0 0

0 1 1 0

0 1 1 1

1 2 1 1

2 2 2 1t = 4

t = 3

t = 2

t = 1

t = 0

Figure 2 The traffic flow trends with the tick of a clock

each time periodwas regarded as a node of a networkThen acomplex network was constructed also known as a multiple-mode system model which could describe the evolution oftraffic flow Zhao et al studied state estimation of this kind ofsystems [23] However the temporal relations among thesenodes were ignored in their research

According to CTM at each time step the inflow 119910119894(119905) tocell 119894 could be 119899119894minus1(119905) 119876119894(119905) or 120575[119873119894(119905) minus 119899119894(119905)] When 119910119894(119905) =119899119894minus1(119905) the state of the cell was represented by number 0when 119910119894(119905) = 119876119894(119905) it was represented by number 1 otherwiserepresented by number 2 Then the state of the system wasexpressed by a sequence of ternary numbers for example0 0 1 2 1

The model employed was described as follows For 119873cells we assumed that at time period (119905 119905 + 1) the statewas represented by a set of ternary numbers namely 119878119905 =1199041 1199042 119904119873 Then 119878119905 is the precursor of 119878119905+1 Here eachternary number 119904119894 can be considered as an element whichcan take three different states that is 119904119894 = 0 1 2 Figure 2depicted the evolution of traffic flow with the tick of a clock

As shown in Figure 2 when the clock advanced from 0to 4 the system (a single segment) state change could bediscovered clearly using the ldquoscreen capturingrdquo method Thestate 119878119905 was time-varying and was represented by a sequenceof numbers For convenience a parameter119872was introduced

250

200

150

100

50

0

Stat

e

0 5 10 15 20

Clock

Figure 3 Time series data reflecting traffic flow trends of a five-cellroad

in this paper to represent the value of 119878119905 and119872 = sum119873119894=1 3119894119904119894

thus ternary numbers were converted to decimal numbersfor example 0 0 1 2 1 was converted to 16 Thus timeseries data was created (see Figure 3)

Each single decimal number represented a systemrsquos stateAs shown in Figure 3 at time step 9 for instance the valueof the system state was 78 thus the ternary numbers were0 2 2 2 0 which was the state of traffic flow

22 Feature Extraction Noise in the raw time series datacould reduce accuracy and creditability of data miningLinear drift was certainly an example In many clusteringanalysis methods119870-means for example Euclidean distancewas frequently used as a similarity measure function Theunrecorded accidents detection method proposed in thispaper was mostly based on similarity mining which wouldbe discussed later Linear drift was the most important factorto influence the accuracy of the results

For time series 119883 = 1199091 1199092 119909119899 and 119884 = 11991011199102 119910119899 if at time 119905 119909119894 minus 119910119894 = 120576 119894 = 1 119899 where 120576 was aconstantThat was to say that the relationship between119883 and119884 was linear drift in the time domain as shown in Figure 4

In this case if Euclidean distance was used and 120576 wasbeyond the threshold these two time series119883 and119884were notconsidered to be similar by mining algorithms While theyhad similar shape and trend apparently the judging resultwas inaccurate obviously Aiming to prevent such errors andto realize data compression and reduce the computationalcost it was necessary to extract feature from the original timeseries data using the image in feature space to replace theoriginal one

Discrete Fourier transform (DFT) which had uniquemerits in time series analysis was an alternative way Fora given time series object DFT could be used to turnit to frequency domain from time domain According toParseval theory the time-domain energy function was equalto the frequency-domain energy function And most of theenergy in frequency domain concentrated on the first few

4 Mathematical Problems in Engineering

0 5 10 15 20

Clock

300

250

200

150

100

50

0

Stat

e

X

Y

Figure 4 Linear drift between time series119883 and 119884

coefficients hence other coefficients could be omitted Thusthe remaining coefficients were able to be seen as the featuresof the original time series

For a given time series 119883 = 119909119905 119905 = 0 1 119899 minus 1translating the series from time domain to frequency domainthe new sequence obtained byDFTwas denoted by119883 = 1199091015840119891119891 = 0 1 119899 minus 1 where

1199091015840119891 =

1

radic119899

119899minus1

sum

119905=0

119909119905 exp(minus1198952120587119891119905

119899) 119891 = 0 1 119899 minus 1 (3)

Taking the data in Figure 3 for example the result of DFTwas shown in Figure 5

The area in which the frequency was greater than 00was the real transformed data of the original finite timeseries 17 elements were left out of the initial total of 21ones Data compression had been realized and as a result ofthe transformation from time domain to frequency domainlinear drift no longer existed aswell as other noises in the timedomain

23 Unrecorded Accidents Detection Using Similarity MiningSimilarity search is an important research field in temporaldata mining As mentioned above each recorded accidentcould bring a piece of time series data and all recordedhistorical accidents data over a period of time could explainthe traffic flow trends when accidents happened After dataprocessing using the above methods which could be calleddata preprocessing clustering analysis was supposed to beimplemented in this paper The classical 119870-means methodwould be able to meet the accuracy requirements due tothe appropriate data preprocessing Results of cluster wereconsidered as ldquonormal traffic flow trendsrdquo under accidentsand the ldquohidden accidentsrdquo could be found out by similaritysearch

Themethod of similarity measurement used in this paperwas Euclidean distance Set the time sequence of ldquonormal

180

160

140

120

100

80

60

40

20

0minus05 00 05

Am

plitu

de

Frequency (Hz)

Figure 5 Discrete Fourier transform

Recorded accidents location

Traffic flow data

DFT

Time series of traffic flow trends

ldquoNormal traffic flow trendsrdquo under accidents

Clustering

Ordinary data

Unrecorded accidents location

Simila

rity

mining

Figure 6 The procedure of unrecorded accidents detection

traffic flow trendsrdquo of a single road segment which was 119909119894whose length was 119899 The time series to be measured wasdenoted by 119910119894 with its length 119873 119873 ge 119899 generally Insimilarity search subsequences of 119910119894 whose lengths were 119899were themeasuring objectThese subsequences were denotedby 119911119894 It was known that the number of 119911119894 was 119869 119869 =119873minus119899+1Then the similaritymetric function could be definedas follows

min119869

119899

sum

119894=1

(119909119894 minus 119870119869119911119869119894 )2 (4)

where 119870119869 was the scaling factor The calculation times were119873 minus 119899 + 1 obviously

So far for a single road segment the procedure to detectunrecorded accidents was described in Figure 6

Mathematical Problems in Engineering 5

Figure 7 The study site and data

3 Results and Discussions

31 The Study Site and Data Preparation A case study wasconducted based on the data extracted from records collectedon Beijing-Harbin Expressway (G1) between Harbin andLalinhe from January 2010 to July 2011 The traffic accidentsdataset included a total of 73 crashes recorded on the 72-kilometer length road in total The study site and the trafficaccidents data were shown in Figure 7

In one and a half years for such a highwaywith the annualtraffic reaching 8ndash10 million ldquoonlyrdquo 73 crashes happenedExperience told us that it did not indicate how safe thehighway was but some minor accidents were not knownor recorded It was meaningful to detect the unrecordedcollisions for traffic safety research

As the limit of detection device traffic flow data workedout using data collected by highway toll collection system Fordetailed calculation process readers could refer toWeng et al[24] The estimation accuracy could meet the requirementstechnically

32 Numerical Experiment In the numerical experiment thetime horizonwas set within half an hour and the time intervalwas 30 secondsThe accidents happened at the eleventh clockinterval namely 5 minutes after the start time of simulationThus therewere 60 points in one piece of time series dataThefree-flow speedwas 120 kmh thus the length of each cell wasthe product of the free-flow speed and the unit clock intervalthat was 120 kmh lowast 30 s = 1 km Based on the historicalaccident data the accident location was set in the middlecell in CTM and five cells were brought in to represent asingle highway segment where a crash happened as shownin Figure 8

The virtual cell in Figure 8 was on behalf of the demandgeneration There were five inflow states as shown by thefive arrows in the figure above Using the method shownin Figure 6 first of all it was significant to identify theldquonormal traffic flow trendsrdquo under accidents Time seriesdata was conducted using the method mentioned above andDFT was implemented then for each crash record 119870-meanscluster analysis was adopted and the results were indicatedin Figure 9

As is shown in Figure 9 two clusters were generated bythis method In Figure 9(a) there were 34 accidents gatheredtogether The common point of them was that congestionformed anddissipated gradually in the simulation periodThe

Accident spotVirtual cell

Figure 8 CTM for a single highway segment

shapes of these time series data were similar to the normaldistribution curveAccording to the accident records of trafficpolice most of these crashes were single-vehicle crashes andboth-car accidents In Figure 9(b) 33 crashes were involvedand the congestion was not eliminated in study period Two-thirds of them were crashes between two heavy trucks andone-third were multivehicle accidents This kind of crashescould cause severe congestion and influence traffic seriouslyActually another cluster existed in this case and only fiveaccidents were involved This cluster had less interferenceto traffic but for the records they were collisions betweenvehicles and pedestrians It was a serious threat to trafficsafety but beyond the scope of this paper so the result wasnot shown here

For the unrecorded accidents usually there were nocasualties and little losses The reason why they were notrecordedwas that theseminor accidentswere settled privatelyand even kept unknown by the traffic police So there wasevery reason to believe that these unrecorded accidents weresingle-vehicle crashes or both-car accidentsThus the clustershown in Figure 9(a) was our target in the similarity searchFor verification one accident out of the 73 crashes wasseparated out pretending to be unknown

By similarity search the ldquopreparedrdquo accident was foundout and the time series data was shown in Figure 10 Thetraffic flow trend in Figure 10 was indeed similar to the clusterin Figure 9(a) At the first 10 clock intervals in both figuresthe traffic was smooth because of the unsaturated traffic flowand no accidents Then congestion formed and dissipatedgradually in a certain period The result of similarity searchproved the reliability of the proposed method

4 Conclusions

Unrecorded accidents were significant to identify trafficaccident-prone location Based on the observation that thetraffic volume traffic speed and traffic density were changedby crashes even minor accidents an automatic traffic acci-dent identification method was proposed As most of thecurrent studies did not pay enough attention to the timefactor when studying the relationship between traffic stateand crashes on highways this paper proposed a methodto construct time series data using traffic flow data whenaccidents happened To avoid the defect of not consideringthe linear drift in the time domain between two sequencesDFT was carried out to extract features from original timeseries data Traffic flow trend could be well understood byclustering analysis Then through the method of similaritysearch unrecorded accidents which were believed to besingle-vehicle crashes or both-car accidents were foundout The case study using real data in Harbin showed thefeasibleness of the proposed method

6 Mathematical Problems in Engineering

250

200

150

100

50

0

Stat

e

0 10 6030 40 5020

Clock

(a)

250

200

150

100

50

0

Stat

e

0 10 6030 40 5020

Clock

(b)

Figure 9 Results of cluster

0 10 6030 40 5020

Clock

250

200

150

100

50

0

Stat

e

Figure 10 Result of similarity search

For further research data of car insurance would bevaluable for data mining in this area or for verification of theautomatic traffic accidents identification And further studycould focus on the traffic flow trends under accidents suchas the influence diffusion and the elimination of the crashinfluence

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the HTRDP (863) under Grantno 2012AA112310 and Research Fund for the Doctoral Pro-gram of Higher Education of Ministry of Education of China(20112302110054)

References

[1] E Krug ldquoDecade of action for road safety 2011ndash2020rdquo Injuryvol 43 no 1 pp 6ndash7 2012

[2] A Gregoriades and K C Mouskos ldquoBlack spots identificationthrough a Bayesian Networks quantification of accident riskindexrdquo Transportation Research C Emerging Technologies vol28 pp 28ndash43 2013

[3] O H Kwon M J Park H Yeo and K Chung ldquoEvaluatingthe performance of network screening methods for detectinghigh collision concentration locations on highwaysrdquo AccidentAnalysis and Prevention vol 51 pp 141ndash149 2013

[4] P T Savolainen F L Mannering D Lord and M A QuddusldquoThe statistical analysis of highway crash-injury severities areview and assessment ofmethodological alternativesrdquoAccidentAnalysis and Prevention vol 43 no 5 pp 1666ndash1676 2011

[5] B Depaire G Wets and K Vanhoof ldquoTraffic accident segmen-tation bymeans of latent class clusteringrdquoAccident Analysis andPrevention vol 40 no 4 pp 1257ndash1266 2008

[6] J M Pardillo-Mayora C A Domınguez-Lira and R Jurado-Pina ldquoEmpirical calibration of a roadside hazardousness indexfor Spanish two-lane rural roadsrdquoAccident Analysis and Preven-tion vol 42 no 6 pp 2018ndash2023 2010

[7] B-J Park and D Lord ldquoApplication of finite mixture models forvehicle crash data analysisrdquo Accident Analysis and Preventionvol 41 no 4 pp 683ndash691 2009

[8] B-J Park D Lord and J D Hart ldquoBias properties of Bayesianstatistics in finite mixture of negative binomial regressionmodels in crash data analysisrdquoAccident Analysis and Preventionvol 42 no 2 pp 741ndash749 2010

[9] M Simoncic ldquoA Bayesian network model of two-car accidentsrdquoJournal of Transportation and Statistics vol 7 no 2-3 pp 13ndash252005

[10] J De Ona R O Mujalli and F J Calvo ldquoAnalysis of traffic acci-dent injury severity on Spanish rural highways using Bayesiannetworksrdquo Accident Analysis and Prevention vol 43 no 1 pp402ndash411 2011

[11] R O Mujalli and J De Ona ldquoA method for simplifying theanalysis of traffic accidents injury severity on two-lane highwaysusing Bayesian networksrdquo Journal of Safety Research vol 42 no5 pp 317ndash326 2011

Mathematical Problems in Engineering 7

[12] K Chung K Jang S Madanat and S Washington ldquoProactivedetection of high collision concentration locations on high-waysrdquo Transportation Research A Policy and Practice vol 45no 9 pp 927ndash934 2011

[13] K Chung D R Ragland S Madanat and S M Oh ldquoThecontinuous risk profile approach for the identification of highcollision concentration locations on congested highwaysrdquo inProceedings of the 19th International Symposium on Transporta-tion amp TrafficTheory (ISTTT rsquo09) pp 463ndash480 2009

[14] G Vandenbulcke I Thomas and L Int Panis ldquoPredict-ing cycling accident risk in Brussels a spatial case-controlapproachrdquo Accident Analysis and Prevention vol 62 pp 341ndash357 2014

[15] Z Zheng ldquoEmpirical analysis on relationship between trafficconditions and crash occurrencesrdquo Procedia-Social and Behav-ioral Sciences vol 43 pp 302ndash312 2012

[16] A Hamzehei E Chung and M Miska ldquoPre-crash and non-crash traffic flow trends analysis on motorwaysrdquo in Proceedingsof the Australasian Transport Research Forum 2013

[17] C F Daganzo ldquoThe cell transmission model a dynamic repre-sentation of highway traffic consistent with the hydrodynamictheoryrdquo Transportation Research B Methodological vol 28 no4 pp 269ndash287 1994

[18] C F Daganzo ldquoThe cell transmission model part II networktrafficrdquo Transportation Research B Methodological vol 29 no2 pp 79ndash93 1995

[19] M J Lighthill and G B Whitham ldquoOn kinematic waves I flowmovement in long rivers II a theory of traffic flow on longcrowded roadsrdquo Proceedings of the Royal Society of London Avol 229 no 1178 pp 281ndash316 1955

[20] P I Richards ldquoShock waves on the highwayrdquo OperationsResearch vol 4 pp 42ndash51 1956

[21] G Zi-You and L Ke-Ping ldquoEvolution of traffic flow with scale-free topologyrdquo Chinese Physics Letters vol 22 no 10 pp 2711ndash2714 2005

[22] X-G Li Z-Y Gao K-P Li and X-M Zhao ldquoRelationshipbetween microscopic dynamics in traffic flow and complexityin networksrdquo Physical Review E vol 76 no 1 Article ID 0161102007

[23] X Zhao H Liu J Zhang and H Li ldquoMultiple-mode observerdesign for a class of switched linear systemsrdquo IEEE Transactionson Automation Science and Engineering 2013

[24] J-C Weng L-L Liu and B Du ldquoETC data based traffic infor-mation mining techniquesrdquo Journal of Transportation SystemsEngineering and Information Technology vol 10 no 2 pp 57ndash63 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 3: Research Article Unrecorded Accidents Detection on ...downloads.hindawi.com/journals/mpe/2014/852495.pdf · Unrecorded Accidents Detection Using Similarity Mining. Similarity search

Mathematical Problems in Engineering 3

0 0 0 0

0 1 1 0

0 1 1 1

1 2 1 1

2 2 2 1t = 4

t = 3

t = 2

t = 1

t = 0

Figure 2 The traffic flow trends with the tick of a clock

each time periodwas regarded as a node of a networkThen acomplex network was constructed also known as a multiple-mode system model which could describe the evolution oftraffic flow Zhao et al studied state estimation of this kind ofsystems [23] However the temporal relations among thesenodes were ignored in their research

According to CTM at each time step the inflow 119910119894(119905) tocell 119894 could be 119899119894minus1(119905) 119876119894(119905) or 120575[119873119894(119905) minus 119899119894(119905)] When 119910119894(119905) =119899119894minus1(119905) the state of the cell was represented by number 0when 119910119894(119905) = 119876119894(119905) it was represented by number 1 otherwiserepresented by number 2 Then the state of the system wasexpressed by a sequence of ternary numbers for example0 0 1 2 1

The model employed was described as follows For 119873cells we assumed that at time period (119905 119905 + 1) the statewas represented by a set of ternary numbers namely 119878119905 =1199041 1199042 119904119873 Then 119878119905 is the precursor of 119878119905+1 Here eachternary number 119904119894 can be considered as an element whichcan take three different states that is 119904119894 = 0 1 2 Figure 2depicted the evolution of traffic flow with the tick of a clock

As shown in Figure 2 when the clock advanced from 0to 4 the system (a single segment) state change could bediscovered clearly using the ldquoscreen capturingrdquo method Thestate 119878119905 was time-varying and was represented by a sequenceof numbers For convenience a parameter119872was introduced

250

200

150

100

50

0

Stat

e

0 5 10 15 20

Clock

Figure 3 Time series data reflecting traffic flow trends of a five-cellroad

in this paper to represent the value of 119878119905 and119872 = sum119873119894=1 3119894119904119894

thus ternary numbers were converted to decimal numbersfor example 0 0 1 2 1 was converted to 16 Thus timeseries data was created (see Figure 3)

Each single decimal number represented a systemrsquos stateAs shown in Figure 3 at time step 9 for instance the valueof the system state was 78 thus the ternary numbers were0 2 2 2 0 which was the state of traffic flow

22 Feature Extraction Noise in the raw time series datacould reduce accuracy and creditability of data miningLinear drift was certainly an example In many clusteringanalysis methods119870-means for example Euclidean distancewas frequently used as a similarity measure function Theunrecorded accidents detection method proposed in thispaper was mostly based on similarity mining which wouldbe discussed later Linear drift was the most important factorto influence the accuracy of the results

For time series 119883 = 1199091 1199092 119909119899 and 119884 = 11991011199102 119910119899 if at time 119905 119909119894 minus 119910119894 = 120576 119894 = 1 119899 where 120576 was aconstantThat was to say that the relationship between119883 and119884 was linear drift in the time domain as shown in Figure 4

In this case if Euclidean distance was used and 120576 wasbeyond the threshold these two time series119883 and119884were notconsidered to be similar by mining algorithms While theyhad similar shape and trend apparently the judging resultwas inaccurate obviously Aiming to prevent such errors andto realize data compression and reduce the computationalcost it was necessary to extract feature from the original timeseries data using the image in feature space to replace theoriginal one

Discrete Fourier transform (DFT) which had uniquemerits in time series analysis was an alternative way Fora given time series object DFT could be used to turnit to frequency domain from time domain According toParseval theory the time-domain energy function was equalto the frequency-domain energy function And most of theenergy in frequency domain concentrated on the first few

4 Mathematical Problems in Engineering

0 5 10 15 20

Clock

300

250

200

150

100

50

0

Stat

e

X

Y

Figure 4 Linear drift between time series119883 and 119884

coefficients hence other coefficients could be omitted Thusthe remaining coefficients were able to be seen as the featuresof the original time series

For a given time series 119883 = 119909119905 119905 = 0 1 119899 minus 1translating the series from time domain to frequency domainthe new sequence obtained byDFTwas denoted by119883 = 1199091015840119891119891 = 0 1 119899 minus 1 where

1199091015840119891 =

1

radic119899

119899minus1

sum

119905=0

119909119905 exp(minus1198952120587119891119905

119899) 119891 = 0 1 119899 minus 1 (3)

Taking the data in Figure 3 for example the result of DFTwas shown in Figure 5

The area in which the frequency was greater than 00was the real transformed data of the original finite timeseries 17 elements were left out of the initial total of 21ones Data compression had been realized and as a result ofthe transformation from time domain to frequency domainlinear drift no longer existed aswell as other noises in the timedomain

23 Unrecorded Accidents Detection Using Similarity MiningSimilarity search is an important research field in temporaldata mining As mentioned above each recorded accidentcould bring a piece of time series data and all recordedhistorical accidents data over a period of time could explainthe traffic flow trends when accidents happened After dataprocessing using the above methods which could be calleddata preprocessing clustering analysis was supposed to beimplemented in this paper The classical 119870-means methodwould be able to meet the accuracy requirements due tothe appropriate data preprocessing Results of cluster wereconsidered as ldquonormal traffic flow trendsrdquo under accidentsand the ldquohidden accidentsrdquo could be found out by similaritysearch

Themethod of similarity measurement used in this paperwas Euclidean distance Set the time sequence of ldquonormal

180

160

140

120

100

80

60

40

20

0minus05 00 05

Am

plitu

de

Frequency (Hz)

Figure 5 Discrete Fourier transform

Recorded accidents location

Traffic flow data

DFT

Time series of traffic flow trends

ldquoNormal traffic flow trendsrdquo under accidents

Clustering

Ordinary data

Unrecorded accidents location

Simila

rity

mining

Figure 6 The procedure of unrecorded accidents detection

traffic flow trendsrdquo of a single road segment which was 119909119894whose length was 119899 The time series to be measured wasdenoted by 119910119894 with its length 119873 119873 ge 119899 generally Insimilarity search subsequences of 119910119894 whose lengths were 119899were themeasuring objectThese subsequences were denotedby 119911119894 It was known that the number of 119911119894 was 119869 119869 =119873minus119899+1Then the similaritymetric function could be definedas follows

min119869

119899

sum

119894=1

(119909119894 minus 119870119869119911119869119894 )2 (4)

where 119870119869 was the scaling factor The calculation times were119873 minus 119899 + 1 obviously

So far for a single road segment the procedure to detectunrecorded accidents was described in Figure 6

Mathematical Problems in Engineering 5

Figure 7 The study site and data

3 Results and Discussions

31 The Study Site and Data Preparation A case study wasconducted based on the data extracted from records collectedon Beijing-Harbin Expressway (G1) between Harbin andLalinhe from January 2010 to July 2011 The traffic accidentsdataset included a total of 73 crashes recorded on the 72-kilometer length road in total The study site and the trafficaccidents data were shown in Figure 7

In one and a half years for such a highwaywith the annualtraffic reaching 8ndash10 million ldquoonlyrdquo 73 crashes happenedExperience told us that it did not indicate how safe thehighway was but some minor accidents were not knownor recorded It was meaningful to detect the unrecordedcollisions for traffic safety research

As the limit of detection device traffic flow data workedout using data collected by highway toll collection system Fordetailed calculation process readers could refer toWeng et al[24] The estimation accuracy could meet the requirementstechnically

32 Numerical Experiment In the numerical experiment thetime horizonwas set within half an hour and the time intervalwas 30 secondsThe accidents happened at the eleventh clockinterval namely 5 minutes after the start time of simulationThus therewere 60 points in one piece of time series dataThefree-flow speedwas 120 kmh thus the length of each cell wasthe product of the free-flow speed and the unit clock intervalthat was 120 kmh lowast 30 s = 1 km Based on the historicalaccident data the accident location was set in the middlecell in CTM and five cells were brought in to represent asingle highway segment where a crash happened as shownin Figure 8

The virtual cell in Figure 8 was on behalf of the demandgeneration There were five inflow states as shown by thefive arrows in the figure above Using the method shownin Figure 6 first of all it was significant to identify theldquonormal traffic flow trendsrdquo under accidents Time seriesdata was conducted using the method mentioned above andDFT was implemented then for each crash record 119870-meanscluster analysis was adopted and the results were indicatedin Figure 9

As is shown in Figure 9 two clusters were generated bythis method In Figure 9(a) there were 34 accidents gatheredtogether The common point of them was that congestionformed anddissipated gradually in the simulation periodThe

Accident spotVirtual cell

Figure 8 CTM for a single highway segment

shapes of these time series data were similar to the normaldistribution curveAccording to the accident records of trafficpolice most of these crashes were single-vehicle crashes andboth-car accidents In Figure 9(b) 33 crashes were involvedand the congestion was not eliminated in study period Two-thirds of them were crashes between two heavy trucks andone-third were multivehicle accidents This kind of crashescould cause severe congestion and influence traffic seriouslyActually another cluster existed in this case and only fiveaccidents were involved This cluster had less interferenceto traffic but for the records they were collisions betweenvehicles and pedestrians It was a serious threat to trafficsafety but beyond the scope of this paper so the result wasnot shown here

For the unrecorded accidents usually there were nocasualties and little losses The reason why they were notrecordedwas that theseminor accidentswere settled privatelyand even kept unknown by the traffic police So there wasevery reason to believe that these unrecorded accidents weresingle-vehicle crashes or both-car accidentsThus the clustershown in Figure 9(a) was our target in the similarity searchFor verification one accident out of the 73 crashes wasseparated out pretending to be unknown

By similarity search the ldquopreparedrdquo accident was foundout and the time series data was shown in Figure 10 Thetraffic flow trend in Figure 10 was indeed similar to the clusterin Figure 9(a) At the first 10 clock intervals in both figuresthe traffic was smooth because of the unsaturated traffic flowand no accidents Then congestion formed and dissipatedgradually in a certain period The result of similarity searchproved the reliability of the proposed method

4 Conclusions

Unrecorded accidents were significant to identify trafficaccident-prone location Based on the observation that thetraffic volume traffic speed and traffic density were changedby crashes even minor accidents an automatic traffic acci-dent identification method was proposed As most of thecurrent studies did not pay enough attention to the timefactor when studying the relationship between traffic stateand crashes on highways this paper proposed a methodto construct time series data using traffic flow data whenaccidents happened To avoid the defect of not consideringthe linear drift in the time domain between two sequencesDFT was carried out to extract features from original timeseries data Traffic flow trend could be well understood byclustering analysis Then through the method of similaritysearch unrecorded accidents which were believed to besingle-vehicle crashes or both-car accidents were foundout The case study using real data in Harbin showed thefeasibleness of the proposed method

6 Mathematical Problems in Engineering

250

200

150

100

50

0

Stat

e

0 10 6030 40 5020

Clock

(a)

250

200

150

100

50

0

Stat

e

0 10 6030 40 5020

Clock

(b)

Figure 9 Results of cluster

0 10 6030 40 5020

Clock

250

200

150

100

50

0

Stat

e

Figure 10 Result of similarity search

For further research data of car insurance would bevaluable for data mining in this area or for verification of theautomatic traffic accidents identification And further studycould focus on the traffic flow trends under accidents suchas the influence diffusion and the elimination of the crashinfluence

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the HTRDP (863) under Grantno 2012AA112310 and Research Fund for the Doctoral Pro-gram of Higher Education of Ministry of Education of China(20112302110054)

References

[1] E Krug ldquoDecade of action for road safety 2011ndash2020rdquo Injuryvol 43 no 1 pp 6ndash7 2012

[2] A Gregoriades and K C Mouskos ldquoBlack spots identificationthrough a Bayesian Networks quantification of accident riskindexrdquo Transportation Research C Emerging Technologies vol28 pp 28ndash43 2013

[3] O H Kwon M J Park H Yeo and K Chung ldquoEvaluatingthe performance of network screening methods for detectinghigh collision concentration locations on highwaysrdquo AccidentAnalysis and Prevention vol 51 pp 141ndash149 2013

[4] P T Savolainen F L Mannering D Lord and M A QuddusldquoThe statistical analysis of highway crash-injury severities areview and assessment ofmethodological alternativesrdquoAccidentAnalysis and Prevention vol 43 no 5 pp 1666ndash1676 2011

[5] B Depaire G Wets and K Vanhoof ldquoTraffic accident segmen-tation bymeans of latent class clusteringrdquoAccident Analysis andPrevention vol 40 no 4 pp 1257ndash1266 2008

[6] J M Pardillo-Mayora C A Domınguez-Lira and R Jurado-Pina ldquoEmpirical calibration of a roadside hazardousness indexfor Spanish two-lane rural roadsrdquoAccident Analysis and Preven-tion vol 42 no 6 pp 2018ndash2023 2010

[7] B-J Park and D Lord ldquoApplication of finite mixture models forvehicle crash data analysisrdquo Accident Analysis and Preventionvol 41 no 4 pp 683ndash691 2009

[8] B-J Park D Lord and J D Hart ldquoBias properties of Bayesianstatistics in finite mixture of negative binomial regressionmodels in crash data analysisrdquoAccident Analysis and Preventionvol 42 no 2 pp 741ndash749 2010

[9] M Simoncic ldquoA Bayesian network model of two-car accidentsrdquoJournal of Transportation and Statistics vol 7 no 2-3 pp 13ndash252005

[10] J De Ona R O Mujalli and F J Calvo ldquoAnalysis of traffic acci-dent injury severity on Spanish rural highways using Bayesiannetworksrdquo Accident Analysis and Prevention vol 43 no 1 pp402ndash411 2011

[11] R O Mujalli and J De Ona ldquoA method for simplifying theanalysis of traffic accidents injury severity on two-lane highwaysusing Bayesian networksrdquo Journal of Safety Research vol 42 no5 pp 317ndash326 2011

Mathematical Problems in Engineering 7

[12] K Chung K Jang S Madanat and S Washington ldquoProactivedetection of high collision concentration locations on high-waysrdquo Transportation Research A Policy and Practice vol 45no 9 pp 927ndash934 2011

[13] K Chung D R Ragland S Madanat and S M Oh ldquoThecontinuous risk profile approach for the identification of highcollision concentration locations on congested highwaysrdquo inProceedings of the 19th International Symposium on Transporta-tion amp TrafficTheory (ISTTT rsquo09) pp 463ndash480 2009

[14] G Vandenbulcke I Thomas and L Int Panis ldquoPredict-ing cycling accident risk in Brussels a spatial case-controlapproachrdquo Accident Analysis and Prevention vol 62 pp 341ndash357 2014

[15] Z Zheng ldquoEmpirical analysis on relationship between trafficconditions and crash occurrencesrdquo Procedia-Social and Behav-ioral Sciences vol 43 pp 302ndash312 2012

[16] A Hamzehei E Chung and M Miska ldquoPre-crash and non-crash traffic flow trends analysis on motorwaysrdquo in Proceedingsof the Australasian Transport Research Forum 2013

[17] C F Daganzo ldquoThe cell transmission model a dynamic repre-sentation of highway traffic consistent with the hydrodynamictheoryrdquo Transportation Research B Methodological vol 28 no4 pp 269ndash287 1994

[18] C F Daganzo ldquoThe cell transmission model part II networktrafficrdquo Transportation Research B Methodological vol 29 no2 pp 79ndash93 1995

[19] M J Lighthill and G B Whitham ldquoOn kinematic waves I flowmovement in long rivers II a theory of traffic flow on longcrowded roadsrdquo Proceedings of the Royal Society of London Avol 229 no 1178 pp 281ndash316 1955

[20] P I Richards ldquoShock waves on the highwayrdquo OperationsResearch vol 4 pp 42ndash51 1956

[21] G Zi-You and L Ke-Ping ldquoEvolution of traffic flow with scale-free topologyrdquo Chinese Physics Letters vol 22 no 10 pp 2711ndash2714 2005

[22] X-G Li Z-Y Gao K-P Li and X-M Zhao ldquoRelationshipbetween microscopic dynamics in traffic flow and complexityin networksrdquo Physical Review E vol 76 no 1 Article ID 0161102007

[23] X Zhao H Liu J Zhang and H Li ldquoMultiple-mode observerdesign for a class of switched linear systemsrdquo IEEE Transactionson Automation Science and Engineering 2013

[24] J-C Weng L-L Liu and B Du ldquoETC data based traffic infor-mation mining techniquesrdquo Journal of Transportation SystemsEngineering and Information Technology vol 10 no 2 pp 57ndash63 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 4: Research Article Unrecorded Accidents Detection on ...downloads.hindawi.com/journals/mpe/2014/852495.pdf · Unrecorded Accidents Detection Using Similarity Mining. Similarity search

4 Mathematical Problems in Engineering

0 5 10 15 20

Clock

300

250

200

150

100

50

0

Stat

e

X

Y

Figure 4 Linear drift between time series119883 and 119884

coefficients hence other coefficients could be omitted Thusthe remaining coefficients were able to be seen as the featuresof the original time series

For a given time series 119883 = 119909119905 119905 = 0 1 119899 minus 1translating the series from time domain to frequency domainthe new sequence obtained byDFTwas denoted by119883 = 1199091015840119891119891 = 0 1 119899 minus 1 where

1199091015840119891 =

1

radic119899

119899minus1

sum

119905=0

119909119905 exp(minus1198952120587119891119905

119899) 119891 = 0 1 119899 minus 1 (3)

Taking the data in Figure 3 for example the result of DFTwas shown in Figure 5

The area in which the frequency was greater than 00was the real transformed data of the original finite timeseries 17 elements were left out of the initial total of 21ones Data compression had been realized and as a result ofthe transformation from time domain to frequency domainlinear drift no longer existed aswell as other noises in the timedomain

23 Unrecorded Accidents Detection Using Similarity MiningSimilarity search is an important research field in temporaldata mining As mentioned above each recorded accidentcould bring a piece of time series data and all recordedhistorical accidents data over a period of time could explainthe traffic flow trends when accidents happened After dataprocessing using the above methods which could be calleddata preprocessing clustering analysis was supposed to beimplemented in this paper The classical 119870-means methodwould be able to meet the accuracy requirements due tothe appropriate data preprocessing Results of cluster wereconsidered as ldquonormal traffic flow trendsrdquo under accidentsand the ldquohidden accidentsrdquo could be found out by similaritysearch

Themethod of similarity measurement used in this paperwas Euclidean distance Set the time sequence of ldquonormal

180

160

140

120

100

80

60

40

20

0minus05 00 05

Am

plitu

de

Frequency (Hz)

Figure 5 Discrete Fourier transform

Recorded accidents location

Traffic flow data

DFT

Time series of traffic flow trends

ldquoNormal traffic flow trendsrdquo under accidents

Clustering

Ordinary data

Unrecorded accidents location

Simila

rity

mining

Figure 6 The procedure of unrecorded accidents detection

traffic flow trendsrdquo of a single road segment which was 119909119894whose length was 119899 The time series to be measured wasdenoted by 119910119894 with its length 119873 119873 ge 119899 generally Insimilarity search subsequences of 119910119894 whose lengths were 119899were themeasuring objectThese subsequences were denotedby 119911119894 It was known that the number of 119911119894 was 119869 119869 =119873minus119899+1Then the similaritymetric function could be definedas follows

min119869

119899

sum

119894=1

(119909119894 minus 119870119869119911119869119894 )2 (4)

where 119870119869 was the scaling factor The calculation times were119873 minus 119899 + 1 obviously

So far for a single road segment the procedure to detectunrecorded accidents was described in Figure 6

Mathematical Problems in Engineering 5

Figure 7 The study site and data

3 Results and Discussions

31 The Study Site and Data Preparation A case study wasconducted based on the data extracted from records collectedon Beijing-Harbin Expressway (G1) between Harbin andLalinhe from January 2010 to July 2011 The traffic accidentsdataset included a total of 73 crashes recorded on the 72-kilometer length road in total The study site and the trafficaccidents data were shown in Figure 7

In one and a half years for such a highwaywith the annualtraffic reaching 8ndash10 million ldquoonlyrdquo 73 crashes happenedExperience told us that it did not indicate how safe thehighway was but some minor accidents were not knownor recorded It was meaningful to detect the unrecordedcollisions for traffic safety research

As the limit of detection device traffic flow data workedout using data collected by highway toll collection system Fordetailed calculation process readers could refer toWeng et al[24] The estimation accuracy could meet the requirementstechnically

32 Numerical Experiment In the numerical experiment thetime horizonwas set within half an hour and the time intervalwas 30 secondsThe accidents happened at the eleventh clockinterval namely 5 minutes after the start time of simulationThus therewere 60 points in one piece of time series dataThefree-flow speedwas 120 kmh thus the length of each cell wasthe product of the free-flow speed and the unit clock intervalthat was 120 kmh lowast 30 s = 1 km Based on the historicalaccident data the accident location was set in the middlecell in CTM and five cells were brought in to represent asingle highway segment where a crash happened as shownin Figure 8

The virtual cell in Figure 8 was on behalf of the demandgeneration There were five inflow states as shown by thefive arrows in the figure above Using the method shownin Figure 6 first of all it was significant to identify theldquonormal traffic flow trendsrdquo under accidents Time seriesdata was conducted using the method mentioned above andDFT was implemented then for each crash record 119870-meanscluster analysis was adopted and the results were indicatedin Figure 9

As is shown in Figure 9 two clusters were generated bythis method In Figure 9(a) there were 34 accidents gatheredtogether The common point of them was that congestionformed anddissipated gradually in the simulation periodThe

Accident spotVirtual cell

Figure 8 CTM for a single highway segment

shapes of these time series data were similar to the normaldistribution curveAccording to the accident records of trafficpolice most of these crashes were single-vehicle crashes andboth-car accidents In Figure 9(b) 33 crashes were involvedand the congestion was not eliminated in study period Two-thirds of them were crashes between two heavy trucks andone-third were multivehicle accidents This kind of crashescould cause severe congestion and influence traffic seriouslyActually another cluster existed in this case and only fiveaccidents were involved This cluster had less interferenceto traffic but for the records they were collisions betweenvehicles and pedestrians It was a serious threat to trafficsafety but beyond the scope of this paper so the result wasnot shown here

For the unrecorded accidents usually there were nocasualties and little losses The reason why they were notrecordedwas that theseminor accidentswere settled privatelyand even kept unknown by the traffic police So there wasevery reason to believe that these unrecorded accidents weresingle-vehicle crashes or both-car accidentsThus the clustershown in Figure 9(a) was our target in the similarity searchFor verification one accident out of the 73 crashes wasseparated out pretending to be unknown

By similarity search the ldquopreparedrdquo accident was foundout and the time series data was shown in Figure 10 Thetraffic flow trend in Figure 10 was indeed similar to the clusterin Figure 9(a) At the first 10 clock intervals in both figuresthe traffic was smooth because of the unsaturated traffic flowand no accidents Then congestion formed and dissipatedgradually in a certain period The result of similarity searchproved the reliability of the proposed method

4 Conclusions

Unrecorded accidents were significant to identify trafficaccident-prone location Based on the observation that thetraffic volume traffic speed and traffic density were changedby crashes even minor accidents an automatic traffic acci-dent identification method was proposed As most of thecurrent studies did not pay enough attention to the timefactor when studying the relationship between traffic stateand crashes on highways this paper proposed a methodto construct time series data using traffic flow data whenaccidents happened To avoid the defect of not consideringthe linear drift in the time domain between two sequencesDFT was carried out to extract features from original timeseries data Traffic flow trend could be well understood byclustering analysis Then through the method of similaritysearch unrecorded accidents which were believed to besingle-vehicle crashes or both-car accidents were foundout The case study using real data in Harbin showed thefeasibleness of the proposed method

6 Mathematical Problems in Engineering

250

200

150

100

50

0

Stat

e

0 10 6030 40 5020

Clock

(a)

250

200

150

100

50

0

Stat

e

0 10 6030 40 5020

Clock

(b)

Figure 9 Results of cluster

0 10 6030 40 5020

Clock

250

200

150

100

50

0

Stat

e

Figure 10 Result of similarity search

For further research data of car insurance would bevaluable for data mining in this area or for verification of theautomatic traffic accidents identification And further studycould focus on the traffic flow trends under accidents suchas the influence diffusion and the elimination of the crashinfluence

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the HTRDP (863) under Grantno 2012AA112310 and Research Fund for the Doctoral Pro-gram of Higher Education of Ministry of Education of China(20112302110054)

References

[1] E Krug ldquoDecade of action for road safety 2011ndash2020rdquo Injuryvol 43 no 1 pp 6ndash7 2012

[2] A Gregoriades and K C Mouskos ldquoBlack spots identificationthrough a Bayesian Networks quantification of accident riskindexrdquo Transportation Research C Emerging Technologies vol28 pp 28ndash43 2013

[3] O H Kwon M J Park H Yeo and K Chung ldquoEvaluatingthe performance of network screening methods for detectinghigh collision concentration locations on highwaysrdquo AccidentAnalysis and Prevention vol 51 pp 141ndash149 2013

[4] P T Savolainen F L Mannering D Lord and M A QuddusldquoThe statistical analysis of highway crash-injury severities areview and assessment ofmethodological alternativesrdquoAccidentAnalysis and Prevention vol 43 no 5 pp 1666ndash1676 2011

[5] B Depaire G Wets and K Vanhoof ldquoTraffic accident segmen-tation bymeans of latent class clusteringrdquoAccident Analysis andPrevention vol 40 no 4 pp 1257ndash1266 2008

[6] J M Pardillo-Mayora C A Domınguez-Lira and R Jurado-Pina ldquoEmpirical calibration of a roadside hazardousness indexfor Spanish two-lane rural roadsrdquoAccident Analysis and Preven-tion vol 42 no 6 pp 2018ndash2023 2010

[7] B-J Park and D Lord ldquoApplication of finite mixture models forvehicle crash data analysisrdquo Accident Analysis and Preventionvol 41 no 4 pp 683ndash691 2009

[8] B-J Park D Lord and J D Hart ldquoBias properties of Bayesianstatistics in finite mixture of negative binomial regressionmodels in crash data analysisrdquoAccident Analysis and Preventionvol 42 no 2 pp 741ndash749 2010

[9] M Simoncic ldquoA Bayesian network model of two-car accidentsrdquoJournal of Transportation and Statistics vol 7 no 2-3 pp 13ndash252005

[10] J De Ona R O Mujalli and F J Calvo ldquoAnalysis of traffic acci-dent injury severity on Spanish rural highways using Bayesiannetworksrdquo Accident Analysis and Prevention vol 43 no 1 pp402ndash411 2011

[11] R O Mujalli and J De Ona ldquoA method for simplifying theanalysis of traffic accidents injury severity on two-lane highwaysusing Bayesian networksrdquo Journal of Safety Research vol 42 no5 pp 317ndash326 2011

Mathematical Problems in Engineering 7

[12] K Chung K Jang S Madanat and S Washington ldquoProactivedetection of high collision concentration locations on high-waysrdquo Transportation Research A Policy and Practice vol 45no 9 pp 927ndash934 2011

[13] K Chung D R Ragland S Madanat and S M Oh ldquoThecontinuous risk profile approach for the identification of highcollision concentration locations on congested highwaysrdquo inProceedings of the 19th International Symposium on Transporta-tion amp TrafficTheory (ISTTT rsquo09) pp 463ndash480 2009

[14] G Vandenbulcke I Thomas and L Int Panis ldquoPredict-ing cycling accident risk in Brussels a spatial case-controlapproachrdquo Accident Analysis and Prevention vol 62 pp 341ndash357 2014

[15] Z Zheng ldquoEmpirical analysis on relationship between trafficconditions and crash occurrencesrdquo Procedia-Social and Behav-ioral Sciences vol 43 pp 302ndash312 2012

[16] A Hamzehei E Chung and M Miska ldquoPre-crash and non-crash traffic flow trends analysis on motorwaysrdquo in Proceedingsof the Australasian Transport Research Forum 2013

[17] C F Daganzo ldquoThe cell transmission model a dynamic repre-sentation of highway traffic consistent with the hydrodynamictheoryrdquo Transportation Research B Methodological vol 28 no4 pp 269ndash287 1994

[18] C F Daganzo ldquoThe cell transmission model part II networktrafficrdquo Transportation Research B Methodological vol 29 no2 pp 79ndash93 1995

[19] M J Lighthill and G B Whitham ldquoOn kinematic waves I flowmovement in long rivers II a theory of traffic flow on longcrowded roadsrdquo Proceedings of the Royal Society of London Avol 229 no 1178 pp 281ndash316 1955

[20] P I Richards ldquoShock waves on the highwayrdquo OperationsResearch vol 4 pp 42ndash51 1956

[21] G Zi-You and L Ke-Ping ldquoEvolution of traffic flow with scale-free topologyrdquo Chinese Physics Letters vol 22 no 10 pp 2711ndash2714 2005

[22] X-G Li Z-Y Gao K-P Li and X-M Zhao ldquoRelationshipbetween microscopic dynamics in traffic flow and complexityin networksrdquo Physical Review E vol 76 no 1 Article ID 0161102007

[23] X Zhao H Liu J Zhang and H Li ldquoMultiple-mode observerdesign for a class of switched linear systemsrdquo IEEE Transactionson Automation Science and Engineering 2013

[24] J-C Weng L-L Liu and B Du ldquoETC data based traffic infor-mation mining techniquesrdquo Journal of Transportation SystemsEngineering and Information Technology vol 10 no 2 pp 57ndash63 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 5: Research Article Unrecorded Accidents Detection on ...downloads.hindawi.com/journals/mpe/2014/852495.pdf · Unrecorded Accidents Detection Using Similarity Mining. Similarity search

Mathematical Problems in Engineering 5

Figure 7 The study site and data

3 Results and Discussions

31 The Study Site and Data Preparation A case study wasconducted based on the data extracted from records collectedon Beijing-Harbin Expressway (G1) between Harbin andLalinhe from January 2010 to July 2011 The traffic accidentsdataset included a total of 73 crashes recorded on the 72-kilometer length road in total The study site and the trafficaccidents data were shown in Figure 7

In one and a half years for such a highwaywith the annualtraffic reaching 8ndash10 million ldquoonlyrdquo 73 crashes happenedExperience told us that it did not indicate how safe thehighway was but some minor accidents were not knownor recorded It was meaningful to detect the unrecordedcollisions for traffic safety research

As the limit of detection device traffic flow data workedout using data collected by highway toll collection system Fordetailed calculation process readers could refer toWeng et al[24] The estimation accuracy could meet the requirementstechnically

32 Numerical Experiment In the numerical experiment thetime horizonwas set within half an hour and the time intervalwas 30 secondsThe accidents happened at the eleventh clockinterval namely 5 minutes after the start time of simulationThus therewere 60 points in one piece of time series dataThefree-flow speedwas 120 kmh thus the length of each cell wasthe product of the free-flow speed and the unit clock intervalthat was 120 kmh lowast 30 s = 1 km Based on the historicalaccident data the accident location was set in the middlecell in CTM and five cells were brought in to represent asingle highway segment where a crash happened as shownin Figure 8

The virtual cell in Figure 8 was on behalf of the demandgeneration There were five inflow states as shown by thefive arrows in the figure above Using the method shownin Figure 6 first of all it was significant to identify theldquonormal traffic flow trendsrdquo under accidents Time seriesdata was conducted using the method mentioned above andDFT was implemented then for each crash record 119870-meanscluster analysis was adopted and the results were indicatedin Figure 9

As is shown in Figure 9 two clusters were generated bythis method In Figure 9(a) there were 34 accidents gatheredtogether The common point of them was that congestionformed anddissipated gradually in the simulation periodThe

Accident spotVirtual cell

Figure 8 CTM for a single highway segment

shapes of these time series data were similar to the normaldistribution curveAccording to the accident records of trafficpolice most of these crashes were single-vehicle crashes andboth-car accidents In Figure 9(b) 33 crashes were involvedand the congestion was not eliminated in study period Two-thirds of them were crashes between two heavy trucks andone-third were multivehicle accidents This kind of crashescould cause severe congestion and influence traffic seriouslyActually another cluster existed in this case and only fiveaccidents were involved This cluster had less interferenceto traffic but for the records they were collisions betweenvehicles and pedestrians It was a serious threat to trafficsafety but beyond the scope of this paper so the result wasnot shown here

For the unrecorded accidents usually there were nocasualties and little losses The reason why they were notrecordedwas that theseminor accidentswere settled privatelyand even kept unknown by the traffic police So there wasevery reason to believe that these unrecorded accidents weresingle-vehicle crashes or both-car accidentsThus the clustershown in Figure 9(a) was our target in the similarity searchFor verification one accident out of the 73 crashes wasseparated out pretending to be unknown

By similarity search the ldquopreparedrdquo accident was foundout and the time series data was shown in Figure 10 Thetraffic flow trend in Figure 10 was indeed similar to the clusterin Figure 9(a) At the first 10 clock intervals in both figuresthe traffic was smooth because of the unsaturated traffic flowand no accidents Then congestion formed and dissipatedgradually in a certain period The result of similarity searchproved the reliability of the proposed method

4 Conclusions

Unrecorded accidents were significant to identify trafficaccident-prone location Based on the observation that thetraffic volume traffic speed and traffic density were changedby crashes even minor accidents an automatic traffic acci-dent identification method was proposed As most of thecurrent studies did not pay enough attention to the timefactor when studying the relationship between traffic stateand crashes on highways this paper proposed a methodto construct time series data using traffic flow data whenaccidents happened To avoid the defect of not consideringthe linear drift in the time domain between two sequencesDFT was carried out to extract features from original timeseries data Traffic flow trend could be well understood byclustering analysis Then through the method of similaritysearch unrecorded accidents which were believed to besingle-vehicle crashes or both-car accidents were foundout The case study using real data in Harbin showed thefeasibleness of the proposed method

6 Mathematical Problems in Engineering

250

200

150

100

50

0

Stat

e

0 10 6030 40 5020

Clock

(a)

250

200

150

100

50

0

Stat

e

0 10 6030 40 5020

Clock

(b)

Figure 9 Results of cluster

0 10 6030 40 5020

Clock

250

200

150

100

50

0

Stat

e

Figure 10 Result of similarity search

For further research data of car insurance would bevaluable for data mining in this area or for verification of theautomatic traffic accidents identification And further studycould focus on the traffic flow trends under accidents suchas the influence diffusion and the elimination of the crashinfluence

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the HTRDP (863) under Grantno 2012AA112310 and Research Fund for the Doctoral Pro-gram of Higher Education of Ministry of Education of China(20112302110054)

References

[1] E Krug ldquoDecade of action for road safety 2011ndash2020rdquo Injuryvol 43 no 1 pp 6ndash7 2012

[2] A Gregoriades and K C Mouskos ldquoBlack spots identificationthrough a Bayesian Networks quantification of accident riskindexrdquo Transportation Research C Emerging Technologies vol28 pp 28ndash43 2013

[3] O H Kwon M J Park H Yeo and K Chung ldquoEvaluatingthe performance of network screening methods for detectinghigh collision concentration locations on highwaysrdquo AccidentAnalysis and Prevention vol 51 pp 141ndash149 2013

[4] P T Savolainen F L Mannering D Lord and M A QuddusldquoThe statistical analysis of highway crash-injury severities areview and assessment ofmethodological alternativesrdquoAccidentAnalysis and Prevention vol 43 no 5 pp 1666ndash1676 2011

[5] B Depaire G Wets and K Vanhoof ldquoTraffic accident segmen-tation bymeans of latent class clusteringrdquoAccident Analysis andPrevention vol 40 no 4 pp 1257ndash1266 2008

[6] J M Pardillo-Mayora C A Domınguez-Lira and R Jurado-Pina ldquoEmpirical calibration of a roadside hazardousness indexfor Spanish two-lane rural roadsrdquoAccident Analysis and Preven-tion vol 42 no 6 pp 2018ndash2023 2010

[7] B-J Park and D Lord ldquoApplication of finite mixture models forvehicle crash data analysisrdquo Accident Analysis and Preventionvol 41 no 4 pp 683ndash691 2009

[8] B-J Park D Lord and J D Hart ldquoBias properties of Bayesianstatistics in finite mixture of negative binomial regressionmodels in crash data analysisrdquoAccident Analysis and Preventionvol 42 no 2 pp 741ndash749 2010

[9] M Simoncic ldquoA Bayesian network model of two-car accidentsrdquoJournal of Transportation and Statistics vol 7 no 2-3 pp 13ndash252005

[10] J De Ona R O Mujalli and F J Calvo ldquoAnalysis of traffic acci-dent injury severity on Spanish rural highways using Bayesiannetworksrdquo Accident Analysis and Prevention vol 43 no 1 pp402ndash411 2011

[11] R O Mujalli and J De Ona ldquoA method for simplifying theanalysis of traffic accidents injury severity on two-lane highwaysusing Bayesian networksrdquo Journal of Safety Research vol 42 no5 pp 317ndash326 2011

Mathematical Problems in Engineering 7

[12] K Chung K Jang S Madanat and S Washington ldquoProactivedetection of high collision concentration locations on high-waysrdquo Transportation Research A Policy and Practice vol 45no 9 pp 927ndash934 2011

[13] K Chung D R Ragland S Madanat and S M Oh ldquoThecontinuous risk profile approach for the identification of highcollision concentration locations on congested highwaysrdquo inProceedings of the 19th International Symposium on Transporta-tion amp TrafficTheory (ISTTT rsquo09) pp 463ndash480 2009

[14] G Vandenbulcke I Thomas and L Int Panis ldquoPredict-ing cycling accident risk in Brussels a spatial case-controlapproachrdquo Accident Analysis and Prevention vol 62 pp 341ndash357 2014

[15] Z Zheng ldquoEmpirical analysis on relationship between trafficconditions and crash occurrencesrdquo Procedia-Social and Behav-ioral Sciences vol 43 pp 302ndash312 2012

[16] A Hamzehei E Chung and M Miska ldquoPre-crash and non-crash traffic flow trends analysis on motorwaysrdquo in Proceedingsof the Australasian Transport Research Forum 2013

[17] C F Daganzo ldquoThe cell transmission model a dynamic repre-sentation of highway traffic consistent with the hydrodynamictheoryrdquo Transportation Research B Methodological vol 28 no4 pp 269ndash287 1994

[18] C F Daganzo ldquoThe cell transmission model part II networktrafficrdquo Transportation Research B Methodological vol 29 no2 pp 79ndash93 1995

[19] M J Lighthill and G B Whitham ldquoOn kinematic waves I flowmovement in long rivers II a theory of traffic flow on longcrowded roadsrdquo Proceedings of the Royal Society of London Avol 229 no 1178 pp 281ndash316 1955

[20] P I Richards ldquoShock waves on the highwayrdquo OperationsResearch vol 4 pp 42ndash51 1956

[21] G Zi-You and L Ke-Ping ldquoEvolution of traffic flow with scale-free topologyrdquo Chinese Physics Letters vol 22 no 10 pp 2711ndash2714 2005

[22] X-G Li Z-Y Gao K-P Li and X-M Zhao ldquoRelationshipbetween microscopic dynamics in traffic flow and complexityin networksrdquo Physical Review E vol 76 no 1 Article ID 0161102007

[23] X Zhao H Liu J Zhang and H Li ldquoMultiple-mode observerdesign for a class of switched linear systemsrdquo IEEE Transactionson Automation Science and Engineering 2013

[24] J-C Weng L-L Liu and B Du ldquoETC data based traffic infor-mation mining techniquesrdquo Journal of Transportation SystemsEngineering and Information Technology vol 10 no 2 pp 57ndash63 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: Research Article Unrecorded Accidents Detection on ...downloads.hindawi.com/journals/mpe/2014/852495.pdf · Unrecorded Accidents Detection Using Similarity Mining. Similarity search

6 Mathematical Problems in Engineering

250

200

150

100

50

0

Stat

e

0 10 6030 40 5020

Clock

(a)

250

200

150

100

50

0

Stat

e

0 10 6030 40 5020

Clock

(b)

Figure 9 Results of cluster

0 10 6030 40 5020

Clock

250

200

150

100

50

0

Stat

e

Figure 10 Result of similarity search

For further research data of car insurance would bevaluable for data mining in this area or for verification of theautomatic traffic accidents identification And further studycould focus on the traffic flow trends under accidents suchas the influence diffusion and the elimination of the crashinfluence

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgments

This work was supported by the HTRDP (863) under Grantno 2012AA112310 and Research Fund for the Doctoral Pro-gram of Higher Education of Ministry of Education of China(20112302110054)

References

[1] E Krug ldquoDecade of action for road safety 2011ndash2020rdquo Injuryvol 43 no 1 pp 6ndash7 2012

[2] A Gregoriades and K C Mouskos ldquoBlack spots identificationthrough a Bayesian Networks quantification of accident riskindexrdquo Transportation Research C Emerging Technologies vol28 pp 28ndash43 2013

[3] O H Kwon M J Park H Yeo and K Chung ldquoEvaluatingthe performance of network screening methods for detectinghigh collision concentration locations on highwaysrdquo AccidentAnalysis and Prevention vol 51 pp 141ndash149 2013

[4] P T Savolainen F L Mannering D Lord and M A QuddusldquoThe statistical analysis of highway crash-injury severities areview and assessment ofmethodological alternativesrdquoAccidentAnalysis and Prevention vol 43 no 5 pp 1666ndash1676 2011

[5] B Depaire G Wets and K Vanhoof ldquoTraffic accident segmen-tation bymeans of latent class clusteringrdquoAccident Analysis andPrevention vol 40 no 4 pp 1257ndash1266 2008

[6] J M Pardillo-Mayora C A Domınguez-Lira and R Jurado-Pina ldquoEmpirical calibration of a roadside hazardousness indexfor Spanish two-lane rural roadsrdquoAccident Analysis and Preven-tion vol 42 no 6 pp 2018ndash2023 2010

[7] B-J Park and D Lord ldquoApplication of finite mixture models forvehicle crash data analysisrdquo Accident Analysis and Preventionvol 41 no 4 pp 683ndash691 2009

[8] B-J Park D Lord and J D Hart ldquoBias properties of Bayesianstatistics in finite mixture of negative binomial regressionmodels in crash data analysisrdquoAccident Analysis and Preventionvol 42 no 2 pp 741ndash749 2010

[9] M Simoncic ldquoA Bayesian network model of two-car accidentsrdquoJournal of Transportation and Statistics vol 7 no 2-3 pp 13ndash252005

[10] J De Ona R O Mujalli and F J Calvo ldquoAnalysis of traffic acci-dent injury severity on Spanish rural highways using Bayesiannetworksrdquo Accident Analysis and Prevention vol 43 no 1 pp402ndash411 2011

[11] R O Mujalli and J De Ona ldquoA method for simplifying theanalysis of traffic accidents injury severity on two-lane highwaysusing Bayesian networksrdquo Journal of Safety Research vol 42 no5 pp 317ndash326 2011

Mathematical Problems in Engineering 7

[12] K Chung K Jang S Madanat and S Washington ldquoProactivedetection of high collision concentration locations on high-waysrdquo Transportation Research A Policy and Practice vol 45no 9 pp 927ndash934 2011

[13] K Chung D R Ragland S Madanat and S M Oh ldquoThecontinuous risk profile approach for the identification of highcollision concentration locations on congested highwaysrdquo inProceedings of the 19th International Symposium on Transporta-tion amp TrafficTheory (ISTTT rsquo09) pp 463ndash480 2009

[14] G Vandenbulcke I Thomas and L Int Panis ldquoPredict-ing cycling accident risk in Brussels a spatial case-controlapproachrdquo Accident Analysis and Prevention vol 62 pp 341ndash357 2014

[15] Z Zheng ldquoEmpirical analysis on relationship between trafficconditions and crash occurrencesrdquo Procedia-Social and Behav-ioral Sciences vol 43 pp 302ndash312 2012

[16] A Hamzehei E Chung and M Miska ldquoPre-crash and non-crash traffic flow trends analysis on motorwaysrdquo in Proceedingsof the Australasian Transport Research Forum 2013

[17] C F Daganzo ldquoThe cell transmission model a dynamic repre-sentation of highway traffic consistent with the hydrodynamictheoryrdquo Transportation Research B Methodological vol 28 no4 pp 269ndash287 1994

[18] C F Daganzo ldquoThe cell transmission model part II networktrafficrdquo Transportation Research B Methodological vol 29 no2 pp 79ndash93 1995

[19] M J Lighthill and G B Whitham ldquoOn kinematic waves I flowmovement in long rivers II a theory of traffic flow on longcrowded roadsrdquo Proceedings of the Royal Society of London Avol 229 no 1178 pp 281ndash316 1955

[20] P I Richards ldquoShock waves on the highwayrdquo OperationsResearch vol 4 pp 42ndash51 1956

[21] G Zi-You and L Ke-Ping ldquoEvolution of traffic flow with scale-free topologyrdquo Chinese Physics Letters vol 22 no 10 pp 2711ndash2714 2005

[22] X-G Li Z-Y Gao K-P Li and X-M Zhao ldquoRelationshipbetween microscopic dynamics in traffic flow and complexityin networksrdquo Physical Review E vol 76 no 1 Article ID 0161102007

[23] X Zhao H Liu J Zhang and H Li ldquoMultiple-mode observerdesign for a class of switched linear systemsrdquo IEEE Transactionson Automation Science and Engineering 2013

[24] J-C Weng L-L Liu and B Du ldquoETC data based traffic infor-mation mining techniquesrdquo Journal of Transportation SystemsEngineering and Information Technology vol 10 no 2 pp 57ndash63 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Research Article Unrecorded Accidents Detection on ...downloads.hindawi.com/journals/mpe/2014/852495.pdf · Unrecorded Accidents Detection Using Similarity Mining. Similarity search

Mathematical Problems in Engineering 7

[12] K Chung K Jang S Madanat and S Washington ldquoProactivedetection of high collision concentration locations on high-waysrdquo Transportation Research A Policy and Practice vol 45no 9 pp 927ndash934 2011

[13] K Chung D R Ragland S Madanat and S M Oh ldquoThecontinuous risk profile approach for the identification of highcollision concentration locations on congested highwaysrdquo inProceedings of the 19th International Symposium on Transporta-tion amp TrafficTheory (ISTTT rsquo09) pp 463ndash480 2009

[14] G Vandenbulcke I Thomas and L Int Panis ldquoPredict-ing cycling accident risk in Brussels a spatial case-controlapproachrdquo Accident Analysis and Prevention vol 62 pp 341ndash357 2014

[15] Z Zheng ldquoEmpirical analysis on relationship between trafficconditions and crash occurrencesrdquo Procedia-Social and Behav-ioral Sciences vol 43 pp 302ndash312 2012

[16] A Hamzehei E Chung and M Miska ldquoPre-crash and non-crash traffic flow trends analysis on motorwaysrdquo in Proceedingsof the Australasian Transport Research Forum 2013

[17] C F Daganzo ldquoThe cell transmission model a dynamic repre-sentation of highway traffic consistent with the hydrodynamictheoryrdquo Transportation Research B Methodological vol 28 no4 pp 269ndash287 1994

[18] C F Daganzo ldquoThe cell transmission model part II networktrafficrdquo Transportation Research B Methodological vol 29 no2 pp 79ndash93 1995

[19] M J Lighthill and G B Whitham ldquoOn kinematic waves I flowmovement in long rivers II a theory of traffic flow on longcrowded roadsrdquo Proceedings of the Royal Society of London Avol 229 no 1178 pp 281ndash316 1955

[20] P I Richards ldquoShock waves on the highwayrdquo OperationsResearch vol 4 pp 42ndash51 1956

[21] G Zi-You and L Ke-Ping ldquoEvolution of traffic flow with scale-free topologyrdquo Chinese Physics Letters vol 22 no 10 pp 2711ndash2714 2005

[22] X-G Li Z-Y Gao K-P Li and X-M Zhao ldquoRelationshipbetween microscopic dynamics in traffic flow and complexityin networksrdquo Physical Review E vol 76 no 1 Article ID 0161102007

[23] X Zhao H Liu J Zhang and H Li ldquoMultiple-mode observerdesign for a class of switched linear systemsrdquo IEEE Transactionson Automation Science and Engineering 2013

[24] J-C Weng L-L Liu and B Du ldquoETC data based traffic infor-mation mining techniquesrdquo Journal of Transportation SystemsEngineering and Information Technology vol 10 no 2 pp 57ndash63 2010

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: Research Article Unrecorded Accidents Detection on ...downloads.hindawi.com/journals/mpe/2014/852495.pdf · Unrecorded Accidents Detection Using Similarity Mining. Similarity search

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of