06552441-libre

8/12/2019 06552441-libre

1/5

ROAD TRAFFIC PREDICTION USING BAYESIAN NETWORKS

Poo Kuan Hoong, Ian K. T. Tan, Ong Kok Chien, Choo-Yee Ting

Faculty of Computing and Informatics, Multimedia University, Cyberjaya, Malaysia.

{khpoo, ian, ong.kok.chien08, cyting}@mmu.edu.my

Keywords: Road Traffic Prediction, Context Aware, Personalized, Bayesian Networks.

Abstract

Having prior knowledge of road conditions for planned or unplanned journeys is beneficial in terms of not only time but potentially cost. Being able to obtain real-time information further enhances these benefits. Current systems rely on huge infrastructure investments by governments to install cameras, road sensors and billboards to keep motorists informed. At best, these efforts are available only at pre-identified hotspots. Radio broadcasts are an alternative, but they rely on reports by other motorists, and such reports are often delayed and not tailored to the individual motorist. Given the limitations of existing approaches to obtaining real-time road conditions, this research work leverages mobile devices that provide context-sensitive information to propose a predictive analytics framework based on a Bayesian Network for road condition prediction. This paper aims to contribute by (i) defining a set of evidence (variables) that could potentially be utilized for road condition prediction and (ii) constructing a Bayesian Network model to predict road conditions. In conclusion, we present a novel approach that provides potentially unlimited coverage of road traffic conditions with substantially reduced infrastructure investments.

1 Introduction

Knowing the road traffic conditions while travelling is advantageous when deciding on alternative routes or even additional planned detours. This is evident in daily life, where radio stations compete for listeners by providing regular traffic reports, especially during peak traffic hours. However, these reports are typically delayed or incorrect, and they depend largely on third-party sources that are difficult to verify, such as listeners calling in to the radio stations to report on the traffic.

City councils and local governments invest significant financial resources to deliver traffic reports through the installation of webcams and traffic billboards [10][7]. However, not only do these efforts require significant investment and expensive ongoing maintenance, but users must also actively interact with the system to obtain localized information that is relevant to them.

The proliferation of smartphones with Global Positioning System (GPS) receivers and Internet connectivity presents an opportunity to research Awareness Information Environments in Mobile Computing [14] and to develop a framework that is suitable for reporting road traffic in real time. In a comprehensive survey, Anagnostopoulos et al. [1] described a 3-dimensional model space for context-aware systems, where the three dimensions are context model, sensor-centric support, and system behaviour. In this research, we propose to map our context-sensitive predictive analytics approach onto this model. The context data will comprise more than just location [13] and will also include vehicle direction, speed, altitude, driver's information and vehicle type. In our model, the 2nd dimension, sensor-centric support, will take the form of weather, maps and road design information. From these context data and sensor-centric support, the overall road traffic system behaviour will be derived through real-time predictive analytics.

Research has shown that unexpected incidents cause inaccuracy in forecasted traffic conditions [21][16][11]. Such incidents include sudden changes in traffic flow, weather conditions (e.g., rain or snow) [2], road conditions, accidents [18][8][17], and road works or road construction [18]. According to statistics provided by the Federal Highway Administration, United States, 75% of weather-related vehicle crashes occur on wet pavement and 47% happen during rainfall. In addition, [9] states that rain can increase the crash rate by 71% and the injury rate by 49%. Besides unexpected incidents on the road, the inaccuracy of a predictive model is also due to insufficient data for training the model [16]. Such data is often difficult to collect.

Various predictive analytics approaches have been proposed by researchers to address the challenges in traffic flow prediction. Prediction methods vary from very simple to complex. Examples of simple methods are the random walk (RW), which is based solely on information about current traffic conditions; the historical average (HA), which uses average flow rates to predict current flow rates; and the informed historical average (IHA), which combines RW and HA to predict current traffic flow rates. These methods have been shown to work in specific situations [19]. More complex methods include time series models such as Autoregressive Integrated Moving Average (ARIMA) and seasonal ARIMA [19]. In addition, data mining approaches such as Artificial Neural Networks (ANN), simulation, regression, fuzzy-neural, and Markov chain models have been employed for prediction of traffic flow [3][4][5][6][12][20][22]. Not only do these data mining methods require a dataset for model training, but their accuracy also drops when incomplete evidence is fed into the model. Therefore, this research work proposes Bayesian Networks (BN) as an alternative to address incomplete information in road traffic condition prediction.
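The three simple predictors named above (RW, HA and IHA) can be illustrated with a minimal Python sketch. The function names and the sample flow values are hypothetical, and the way IHA blends RW and HA here (a fixed weighted average controlled by `weight`) is an assumption made for illustration only; the cited literature may define the combination differently.

```python
def random_walk(recent_flows):
    """RW: forecast the next flow as the most recent observation."""
    return recent_flows[-1]


def historical_average(past_flows_same_period):
    """HA: forecast the flow as the mean of historical flows for this period."""
    return sum(past_flows_same_period) / len(past_flows_same_period)


def informed_historical_average(recent_flows, past_flows_same_period, weight=0.5):
    """IHA (assumed form): weighted blend of the RW and HA forecasts."""
    rw = random_walk(recent_flows)
    ha = historical_average(past_flows_same_period)
    return weight * rw + (1.0 - weight) * ha


if __name__ == "__main__":
    # Hypothetical vehicle counts per 15-minute interval.
    recent = [410, 430, 455]
    history = [400, 420, 440]
    print("RW :", random_walk(recent))
    print("HA :", historical_average(history))
    print("IHA:", informed_historical_average(recent, history))
```

None of these predictors uses evidence about incidents, which is the gap the BN approach is meant to address.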

Recent studies have shown that BN have been employed as an alternative solution to the challenges faced when modelling and predicting road traffic conditions [15][23]. For instance, the work in [15] applied BN to traffic flow forecasting in the city of Beijing. In that study, the BN is mapped onto the city's roads, with the arcs of the network representing the traffic flow and the weight of each arc representing the volume of traffic flow. That is, the higher the volume, the greater the strength of the arc. Based on a dataset of Beijing traffic information, the BN model was constructed using the Expectation Maximization (EM) algorithm. The performance of the model was then evaluated using another set of Beijing traffic information to determine its accuracy. The originally proposed model was later enhanced into a spatio-temporal BN [16], where the network incorporates background information such as people's activities around shopping centres, car parks and home communities.

In a study by Zheng et al. [23], a combination of a BN and a Neural Network was employed. The model was trained and tested using a dataset comprising traffic information at 15-minute intervals gathered from Singapore's Ayer Rajah Expressway. The evaluation results showed that the combined model outperformed predictors that merely consist of two Neural Networks. While the above two studies employed BN for short-term prediction of traffic flow, the study in [11] employs Dynamic Bayesian Networks (DBN) to predict traffic flow in real time. The technique used was the multi-regression dynamic model (MDM), which aims at preserving the conditional independences and causal drives exhibited by the traffic flow series. The advantage of employing a DBN lies in its ability to cater for real-time changes rather than a one-time short-term prediction. In that study, the model was trained and tested using a dataset comprising traffic flow collected in London over a six-month period.

2 Bayesian Network (BN) Model

Figure 1: Proposed Bayesian Network model for Road Condition Prediction (M1)

In this research work, road conditions can only be predicted in light of information about the traffic received from various sources [21], with the ultimate aim of predicting the likelihood of traffic jams. More often than not, information acquired from such events is uncertain, and therefore the accuracy of forecasted traffic conditions can vary [21][16][11]. Maximizing the number of events used as evidence was therefore an important phase when building the BN model.

Figure 1 depicts the proposed BN model for prediction of road conditions in this research work (hereafter referred to as M1). There are two main sub-networks, namely, the Event sub-network and the Road Condition sub-network. The Event sub-network consists of nodes that represent the events from which information about road conditions can be obtained, mainly from micro-blogging, i.e., Twitter traffic tweets. Such events are referred to as evidential nodes in BNs. The events (evidential nodes) identified in this research work are weather web services, tweet for bad weather, tweet for accident, low travel speed, tweet for road block, announcement for road block, tweet for road construction, announcement for road construction, and tweet for others.

The second sub-network is the Road Condition sub-network. The nodes in this sub-network are Bad weather condition, Accident, Road Block, Road Construction, and Others. These nodes are referred to as intermediate nodes. Arcs are directed from the intermediate nodes to the evidential nodes so that the likelihood of traffic jams can be indirectly inferred from the states instantiated to the evidential nodes. For instance, the probability of a traffic jam will increase if the probability of an accident increases. However, the probability of an accident depends on whether or not the inference engine receives a tweet about an accident. Similarly, in light of receiving a tweet about a road block and an announcement about a road block, the probability of Road Block will increase, which in turn increases the chance of traffic jams. Information about road blocks in Malaysia can be obtained via Twitter posts from accounts including @KLroadblock, @TrafficDotMy, @kltrafficupdate, @kltraffic, @LLMinfotrafik, @amptraffic, @LEKAStrafik and @plustrafik. Statistics show that 17 out of 245 records (6.94%) of the tweets about traffic jams attribute the jam to a road block at a particular location. As shown in Figure 1, there is an arc directed from traffic jam to low travel speed. This arc indicates that low travel speed has a direct influence on the inferred occurrence of a traffic jam.

[Figure 1 node labels. Event sub-network: Weather web service, Tweet for bad weather, Tweet for accident, Low travel speed, Tweet for road block, Road block announcement, Tweet for road construction, Road construction announcement, Tweet for others. Road condition sub-network: Bad weather condition, Accident, Road block, Road construction, Others. Further node: Traffic jam.]
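The road block example can be made concrete with a small inference-by-enumeration sketch in Python. It covers only a fragment of the network (Road Block, its two evidential nodes, and Traffic jam), and every probability value below is a hypothetical placeholder chosen for illustration; none of them is taken from the CPTs of M1.

```python
# Hypothetical CPT values (placeholders, not the paper's parameters).
P_ROAD_BLOCK = 0.07                          # prior P(RoadBlock = yes)
P_TWEET = {True: 0.80, False: 0.05}          # P(TweetForRoadBlock = yes | RoadBlock)
P_ANNOUNCEMENT = {True: 0.60, False: 0.02}   # P(AnnouncementForRoadBlock = yes | RoadBlock)
P_JAM = {True: 0.90, False: 0.20}            # P(TrafficJam = yes | RoadBlock)


def likelihood(table, parent_state, observed):
    """Probability of the observed child state given the parent state."""
    p_yes = table[parent_state]
    return p_yes if observed else 1.0 - p_yes


def posterior_road_block(tweet, announcement):
    """P(RoadBlock = yes | evidence), by enumerating both RoadBlock states."""
    weights = {}
    for road_block in (True, False):
        prior = P_ROAD_BLOCK if road_block else 1.0 - P_ROAD_BLOCK
        weights[road_block] = (
            prior
            * likelihood(P_TWEET, road_block, tweet)
            * likelihood(P_ANNOUNCEMENT, road_block, announcement)
        )
    return weights[True] / (weights[True] + weights[False])


def probability_of_jam(tweet, announcement):
    """P(TrafficJam = yes | evidence), marginalising over RoadBlock."""
    p_rb = posterior_road_block(tweet, announcement)
    return p_rb * P_JAM[True] + (1.0 - p_rb) * P_JAM[False]


if __name__ == "__main__":
    for evidence in [(True, True), (True, False), (False, False)]:
        print(evidence, posterior_road_block(*evidence), probability_of_jam(*evidence))
```

With arcs directed from the intermediate node to its evidential nodes, observing a tweet and an announcement raises the belief in Road Block through Bayes' rule, and that belief then propagates to the traffic jam node.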

3 Implementation

This study began with the collection of events to serve as evidence for the BN. The identified events include road traffic, road construction, road blocks, and weather, gathered from various online resources such as website announcements, weather web services and social networking updates, especially Twitter tweets. The preliminary evaluation of the proposed BN model was restricted to Twitter tweets and users' reports as the main sources of evidence.

The Bayesian Network model was constructed using GeNIe & SMILE. SMILE stands for Structural Modelling, Inference, and Learning Engine. It consists of C++ library classes implementing graphical decision-theoretic methods, such as BN and influence diagrams, directly amenable to inclusion in intelligent systems. GeNIe is a Windows-based user interface application for graphical decision-theoretic models. In short, SMILE is the engine and GeNIe is the graphical front end that helps to construct BNs and influence diagrams. Both modules were developed at the Decision Systems Laboratory, University of Pittsburgh.

Preparing the dataset for evaluating the proposed BN shown in Figure 1 was not a trivial task in this research work, mainly because there is no standard dataset that can be used directly to evaluate the predictive accuracy of the network. The evaluation phase therefore began with the preparation of a dataset, which was generated by randomly assigning values to each of the features. From the 1000 randomly generated records, human experts handpicked 200 records that matched their beliefs to form a dataset for training and testing the proposed BN model for traffic prediction.

The evaluation process began with the creation of two variations of the BN, namely, a Naïve Bayesian Network (M2) and a parameter-learning Bayesian Network (M3), as benchmarks to measure the predictive accuracy of the proposed Bayesian Network (M1), particularly when no human intervention was involved in the creation process.

Figure 2: Sample Scenario of Road Information for Jalan Duta and Joint Probability Expression

Before the models can be evaluated, the raw data must be pre-processed and the models trained. Two C++ programs, namely, FILTER and TESTER, were developed to perform this work. FILTER pre-processes the data by keeping the cases that match the criteria of interest (the filtering rules) and dividing them into two separate files, 20% for testing and 80% for training. The filtering rules are: (1) if there is evidence of a particular event/incident, then the event must have occurred; (2) if there is no evidence of a particular event/incident, the event may still have occurred; (3) if no event, including others, occurred, then the traffic jam does not occur.
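The filtering rules and the 80/20 split can be sketched as follows. This is not the authors' C++ FILTER program; it is a minimal Python illustration in which the record layout (`evidence`, `occurred`, `traffic_jam`), the event names, the use of a single evidence flag per event, and the seeded shuffle before splitting are all assumptions.

```python
import random

# Hypothetical event names mirroring the intermediate nodes of the model.
EVENTS = ["bad_weather", "accident", "road_block", "road_construction", "others"]


def is_consistent(record):
    """Return True if a record satisfies the three filtering rules."""
    # Rule 1: evidence of an event implies that the event occurred.
    for event in EVENTS:
        if record["evidence"][event] and not record["occurred"][event]:
            return False
    # Rule 2: without evidence the event may or may not have occurred,
    # so there is nothing to check.
    # Rule 3: if no event occurred at all, there must be no traffic jam.
    if not any(record["occurred"][e] for e in EVENTS) and record["traffic_jam"]:
        return False
    return True


def filter_and_split(records, train_fraction=0.8, seed=0):
    """Keep consistent records and split them into training and testing sets."""
    kept = [r for r in records if is_consistent(r)]
    rng = random.Random(seed)   # seeded shuffle is an assumption, for repeatability
    rng.shuffle(kept)
    cut = int(len(kept) * train_fraction)
    return kept[:cut], kept[cut:]


if __name__ == "__main__":
    none = {e: False for e in EVENTS}
    sample = [
        {"evidence": {**none, "accident": True},
         "occurred": {**none, "accident": True}, "traffic_jam": True},
        {"evidence": dict(none), "occurred": dict(none), "traffic_jam": True},
    ]
    print([is_consistent(r) for r in sample])
```

In the paper's model some events have two evidential nodes (a tweet and an announcement); a fuller version would treat either one as evidence for rule 1.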

TESTER is the main program that evaluates the accuracy of the models. This program uses the SMILE library, which allows us to perform BN operations such as parameter learning on top of an existing model and generating a Naïve Bayes model from datasets. In addition, this program is able to read the dataset in .csv format and set the evidence in the model to obtain the final results, such as whether a traffic jam is likely to occur.

Sample Scenario

There's a tweet about heavy rain at Jalan Duta,

According to the weather web service, the weather condition is bad at Jalan Duta,

There's no tweet about an accident at Jalan Duta,

The vehicles are moving slowly at Jalan Duta,

There's no roadblock and no road construction,

There's no other tweet about Jalan Duta.

Joint Probability Expression:

P(TrafficJam=YES | BadWeatherCondition=YES) x

P(BadWeatherCondition=YES | WeatherWebService=YES, TweetForBadWeather=YES) x

P(TrafficJam=YES | Accident=NO) x

P(Accident=NO | TweetForAccident=NO) x

P(TrafficJam=YES | LowTravelSpeed=YES) x

P(TrafficJam=YES | RoadBlock=NO) x

P(RoadBlock=NO | TweetForRoadBlock=NO, AnnouncementForRoadBlock=NO) x

P(TrafficJam=YES | RoadConstruction=NO) x

P(RoadConstruction=NO | TweetForRoadConstruction=NO, AnnouncementForRoadConstruction=NO) x

P(TrafficJam=YES | Others=NO) x

P(Others=NO | TweetForOthers=NO)

In order to evaluate the accuracy of M1, three experiments were conducted. In the first experiment, M1, M2, and M3 (with five iterations of parameter learning) were compared using five different datasets.

In the second experiment, M1 was compared with M2 using 10 different datasets. Lastly, the third experiment was conducted to determine the influence of dataset size on the accuracy of the model.

4 Results and Discussion

Figure 3: Comparing M1, M2, and M3 with 5 datasets

Figures 3, 4 and 5 present the results of the three experiments conducted. As shown in Figure 3, the value of M3 remains constant after the first iteration, as the parameters had already reached their optimal level. Hence, there is no need to perform iterative learning on top of M3 in a real working environment.

Figure 4: Performance comparison of M1 and M2 with 10

datasets

In terms of accuracy of road traffic prediction, we measured the number of test results that matched the actual results and the percentage of matches. M2 showed the lowest average accuracy (58.57%), followed by M1 (74.37%), while M3 scored the highest accuracy of 76.01%. As depicted in Figure 4, M1 consistently outperforms M2 regardless of the dataset size. The average accuracy of M1 was 72.10%, which is 5.13% higher than the average accuracy of M2.
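The accuracy measure described above, the number of predictions that match the actual outcome and the corresponding percentage, can be written as a short Python sketch. The function name and the sample labels are hypothetical and are not taken from the TESTER program.

```python
def accuracy(predicted, actual):
    """Return (number of matches, percentage of matches) for two label lists."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one test case is required")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches, 100.0 * matches / len(actual)


if __name__ == "__main__":
    # Hypothetical traffic-jam predictions for four test cases.
    predicted = ["YES", "NO", "YES", "YES"]
    actual = ["YES", "YES", "YES", "NO"]
    print(accuracy(predicted, actual))
```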

Figure 5: Accuracy Comparison between M1, M2, and M3 using 3 datasets

Figure 5 shows the result of the third experiment. From our observation, the size of the dataset does not influence the accuracy of the BN model (M3) for road traffic prediction. The smallest dataset, with 170 cases, scored the highest accuracy, while the medium dataset, with 1,700 cases, scored the lowest accuracy, lower even than the largest dataset of 17,000 cases.

5 Conclusion

The research work presented in this paper had three objectives: (1) to identify the variables that can be used to infer traffic conditions, (2) to identify the relationships between these variables and traffic jams, and (3) to construct a BN model that represents the relationships among the variables in order to infer traffic jams. The variables (unexpected incidents) that we identified for inferring traffic conditions are accident, weather condition (rain/snow), road work (construction), road block, and speed of vehicles. We evaluated our constructed BN model (M1) against a Naïve Bayesian Network (M2) and a parameter-learning Bayesian Network (M3). Based on our initial results, our proposed BN model shows promising results, even though its road traffic prediction accuracy still leaves room for improvement. For future work, we intend to improve the BN model in order to achieve more accurate road traffic prediction.

References

[1] C. B. Anagnostopoulos, A. Tsounis, S. Hadjiefthymiades, "Context Awareness in Mobile Computing Environments", Journal of Wireless Personal Communication, Vol. 42 (3), Aug. 2007, pp. 445-464, (2007).

[2] R. Billot, "Integrating The Effects Of Adverse Weather Conditions On Traffic: Methodology, Empirical Analysis And Bayesian Modelling", [Retrieved from] http://www.ectri.org/YRS09/Papiers/Session3/Billot_R_Session3_Traffic(2).pdf, (2009).

[3] R. Chrobok, J. Wahle, and M. Schreckenberg, "Traffic forecast using simulations of large scale networks", in Proc. 4th IEEE Int. Conf. Intelligent Transportation Systems, Oakland, CA, pp. 434-439, (2001).

[4] M. Danech-Pajouh and M. Aron, "ATHENA, a method for short-term inter-urban traffic forecasting", INRETS, Paris, France, Tech. Rep. 177, (1991).

[5] G. A. Davis, "Adaptive forecasting of freeway traffic congestion", Transp. Res. Rec., no. 1287, pp. 29-33, (1990).

[6] M. Der Voort, M. Dougherty, and S. Watson, "Combining Kohonen maps with ARIMA time series models to forecast traffic flow", Transp. Res., Part C Emerg. Technol., Vol. 4 (5), pp. 307-318, (1996).

[7] Dewan Bandaraya Kuala Lumpur (DBKL) Integrated Transport Information System. [Online] http://www.itis.com.my/atis/index.jsf

[8] S.O. John, E.B. Fabian, "Distributed or Centralized - The Applications Take", Proceedings of the 6th Annual IEEE communications society conference on Sensor, Mesh and Ad Hoc Communications and Networks (SECON'09), pp. 709-718, (2009).

[9] Q. Lin, W. Nixon, "Effects of Adverse Weather on Traffic Crashes: Systematic Review and Meta-Analysis", in Proceedings of the 87th annual meeting of the Transportation Research Board, CDROM, Transportation Research Board of the National Academies, Washington, D.C., pp. 139-146, (2008).

[10] New Zealand Transport Agency: Auckland Traffic Flow. [Online] http://www.nzta.govt.nz/traffic/current-conditions/webcams/auckland/traffic.phtml

[11] C. Queen, C. Albers, "Intervention and causality: forecasting traffic flows using a dynamic Bayesian network", Journal of the American Statistical Association, Vol. 104 (486), pp. 669-681, (2009).

[12] B. L. Smith and M. Demetsky, "Traffic flow forecasting: Comparison of modelling approaches", J. Transp. Eng., Vol. 123 (4), pp. 261-266, (1997).

[13] A. Schmidt, M. Beigl, Hans-W. Gellersen, "There is more to context than location", Computers & Graphics, Vol. 23 (6), Dec., pp. 893-901, (1999).

[14] K. Stefanidis, E. Pitoura, "Related Work on Context-Aware Systems", Work in Progress Report, Department of Computer Science, University of Ioannina, Greece. [Retrieved from] http://softsys.cs.uoi.gr/deca/deca-survey.pdf, (2001).

[15] S. Sun, C. Zhang, and G. Yu, "A Bayesian network approach to traffic flow forecasting", IEEE Trans. Intell. Transp. Syst., Vol. 7 (1), pp. 124-131, (2006).

[16] S. Sun, C. Zhang, and Y. Zhang, "Traffic flow forecasting using a spatio-temporal Bayesian Network predictor", in Artificial Neural Networks: Formal Models and Their Applications (ICANN), pp. 273-278, (2005).

[17] G.Z. Tan, Z.P. Liu, Y.D. Wang, "The Determination and Analysis of Traffic Congestion Evacuation Priority", 2nd IITA International Conference on Geoscience and Remote Sensing, pp. 484-487, (2010).

[18] M. Wachs, "Fighting traffic congestion with information technology", Issues in Science and Technology, National Academy of Sciences. [Retrieved from] http://www.highbeam.com/doc/1G1-93659945.html, (2002).

[19] B. M. William, "Modeling and forecasting vehicular traffic flow as a seasonal stochastic time series process", Ph.D. dissertation, Dept. Civil Eng., Univ. Virginia, Charlottesville, VA, (1999).

[20] H. B. Yin, S. C. Wong, J. M. Xu, and C. K. Wong, "Urban traffic flow prediction using a fuzzy neural approach", Transp. Res., Part C Emerg. Technol., Vol. 10 (2), pp. 85-98, Apr. (2002).

[21] J.Y. Young, M.G. Cho, "A Short-Term Prediction Model for Forecasting Traffic Information Using Bayesian Network", Third 2008 International Conference on Convergence and Hybrid Information Technology (ICCIT), pp. 242-247, (2008).

[22] G. Q. Yu, J. M. Hu, C. S. Zhang, L. K. Zhuang, and J. Y. Song, "Short term traffic flow forecasting based on Markov chain model", in Proc. IEEE Intelligent Vehicles Symp., Columbus, OH, pp. 208-212, (2003).

[23] W. Zheng, D. Lee, and Q. Shi, "Short-term freeway traffic flow prediction: Bayesian combined neural network approach", J. Transp. Eng., Vol. 132 (2), pp. 114-121, (2006).
