
Hindawi Journal of Advanced Transportation, Volume 2019, Article ID 3072495, 9 pages. https://doi.org/10.1155/2019/3072495

Research Article

Developing Train Station Parking Algorithms: New Frameworks Based on Fuzzy Reinforcement Learning

Wei Li,1,2 Kai Xian,2 Jiateng Yin,3 and Dewang Chen4

1 School of Traffic and Transportation, Beijing Jiaotong University, Beijing 100044, China
2 Beijing Transport Institute, No. 9, LiuLiQiao South Lane, Fengtai District, Beijing, China
3 State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing 100044, China
4 College of Mathematics and Computer Science, Fuzhou University, Fuzhou 350116, China

Correspondence should be addressed to Jiateng Yin (13111054@bjtu.edu.cn) and Dewang Chen (dwchen@fzu.edu.cn)

Received 20 March 2019; Accepted 15 May 2019; Published 4 August 2019

Academic Editor: Luigi Dell'Olio

Copyright © 2019 Wei Li et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Train station parking (TSP) accuracy is important for enhancing the efficiency of train operation and the safety of passengers in urban rail transit. However, TSP is always subject to a series of uncertain factors such as extreme weather and uncertain conditions of rail track resistances. To increase parking accuracy, robustness, and self-learning ability, we propose new train station parking frameworks using reinforcement learning (RL) theory combined with the information of balises. Three algorithms were developed, involving a stochastic optimal selection algorithm (SOSA), a Q-learning algorithm (QLA), and a fuzzy function based Q-learning algorithm (FQLA), in order to reduce the parking error in urban rail transit. Meanwhile, five braking rates are adopted as the action vector of the three algorithms, and some statistical indices are developed to evaluate parking errors. Simulation results based on real-world data show that the parking errors of the three algorithms are all within ±30 cm, which meets the requirement of urban rail transit.

1. Introduction

Urban rail transit systems include underground railway, streetcar, light rail transit, monorail, automated guideway transit, and magnetic levitation; they have received increased attention in large cities since urban rail systems have remarkable advantages in speed, safety, punctuality, environmental friendliness, and land saving. In modern rail transit lines, train station parking (TSP) is an important technique that needs to be addressed. Specifically, TSP refers to the running train stopping precisely at the parking spot when it enters the station. Imprecise TSP may impact the efficiency, convenience, and safety of urban rail transit. In particular, most newly built transit stations install platform screen doors (PSDs) to isolate the platform from the rail track. PSDs reduce the cold and hot air exchange between the platform area and the rail track area. They also prevent passengers from falling or jumping onto the rail track and ensure the safety of passengers [1, 2]. Imprecise parking can directly lead to serious consequences: the train doors cannot be accessed normally, and passengers cannot get on and off the train easily [3]. In such cases, the train will be delayed, which causes a dramatic decrease in corridor capacity [4].

A lot of existing studies have addressed the TSP problem. For example, Yoshimoto et al. applied predictive fuzzy control technology, which achieves better parking accuracy than the proportional-integral-derivative control method [5]. Yasunobu et al. used fuzzy inference for automatic train parking control; the parking area was divided into several sections in order to use different fuzzy inference rules [6, 7]. Hou et al. introduced terminal iterative learning control to train automatic stopping control, updating the current controller using the parking errors from previous iterations [8]. After several iterations, the train parking error was reduced below 0.5 m, though no explicit statistical results were presented in this study. Zhou studied Gaussian process regression and the Boosting regression algorithm for the accurate parking problem, and these two methods were compared with the linear regression method [9]. Soft computing methods have been applied to train parking in urban rail transit by Chen and Gao, where three models were presented: a linear model, a generalized regression neural network model, and an adaptive network based fuzzy inference system (ANFIS) model [10]. The ANFIS model can keep the parking errors within 30 cm with probability greater than 99.9% in simulation. However, the simulation experiments were conducted under an ideal environment, and no disturbances were taken into account in this study.

Figure 1: Parking area and balises distribution [17]. Panels (a)-(d) show the train passing balises B1, B2, B3, and BN−1, each with input Mi and output ui over segments 1 to N−1.

Based on the braking characteristics of trains in urban rail transit, two simplified models were developed to estimate the TSP error in [11]. Then, a precise stop control algorithm based on LQR (linear-quadratic regulation) was proposed, combining the vehicle dynamics model and system identification theory, which kept the train station parking errors within 50 cm. Chen et al. addressed the train parking problem using a new machine learning technique and proposed a novel online learning control strategy with the help of the precise location data of balises installed in stations [12-14]. Experiments were conducted to test the algorithm performance under variations of different train parameters, and the best algorithm achieved an average parking error of 3.8 cm. Even though a lot of studies have been proposed in the existing literature, we note that train station parking is a very complex problem due to many influencing external factors, for example, the weather, train running states, or on-board passengers, and these uncertain factors have not been explicitly tackled in the former literature [15].

The theory of reinforcement learning (RL) provides a normative account that describes how agents may optimize their control under an uncertain environment [16]. As a typical algorithm in the RL field, Q-learning uses an action-value function to approximate the optimal control strategies of an agent and has been widely and successfully implemented in many complex control problems. In particular, Q-learning utilizes the information from the interactions between an agent and the environment and gradually learns the optimal control strategies for a multistage decision problem. This process is actually very similar to TSP, where a train can be regarded as an agent. Therefore, this paper aims to introduce RL theory and Q-learning algorithms to solve the TSP problem in urban rail transit systems.

Specifically, in this paper we first establish a simulation training platform based on real-world data, involving the distance, speed limit, gradient, and position of balises between the Jiugong station and the Xiaohongmen station of the Yizhuang Line in Beijing Subway. Three algorithms were proposed: a stochastic optimal selection algorithm (SOSA), a Q-learning algorithm (QLA), and a fuzzy function Q-learning algorithm (FQLA). The performances of the three algorithms were analyzed, and the experimental results were compared. The rest of the paper is organized as follows. In Section 2, the TSP problem is described, including five different braking rates, and the TSP simulation platform is introduced. In Section 3, three algorithms for reducing and estimating the parking errors are developed based on reinforcement learning (RL) theory, and seven statistical indices are defined to calculate and evaluate the accuracy and reliability of these three algorithms. In Section 4, experimental results of the three algorithms are compared and analyzed in detail using the field data in the simulation platform. Finally, conclusions and future research are outlined in Section 5.

2. Problem Description and Environment Construction of RL

2.1. Train Station Parking Problem. As shown in Figure 1, several balises are installed on the tracks in rail stations, and the train can correct its position and velocity once it goes through each balise. Typically, the train enters the station area and starts to brake at the initial spot (first balise $S_1$), which is $L$ meters away from the stopping spot (last balise $S_N$). The area between $S_1$ and $S_N$ is also called the parking area. There are $N$ balises distributed in this area (including $S_1, \ldots, S_N$) according to the actual layout of the urban rail transit line. Meanwhile, the distribution of the balises satisfies the fundamental requirement that the balises near the stopping spot are dense and the balises far from the stopping spot are sparse.

Figure 2: Structure of the parking simulation platform (input module, algorithm module, braking module, and display module, with simulated resistance and random disturbance acting on the braking module).

There are different braking rates at each balise. Every time the train reaches a balise, we compute 5 different train braking rates through the following calculations, forming the 5-element vector A. They are the following.

(1) Nonlearning strategy (NLs) braking rate: we can calculate the theoretical braking rate $a_i^t$ by using the kinematic formula. This strategy does not consider the influence of interference on parking. The target braking rate $a_i^g$ equals the theoretical braking rate in this nonlearning strategy:

$$V_T^2 - V_i^2 = 2a_i^t S_i \xrightarrow{V_T = 0} a_i^t = -\frac{V_i^2}{2S_i} \qquad (1)$$

$$a_i^g = a_i^t \qquad (2)$$

where $V_T$ is the train speed at the last balise, $V_i$ represents the train speed at the current balise, and $S_i$ is the distance between the current balise and the parking spot.

(2) Fixed learning strategy (FLs) braking rate: based on NLs, we add a fixed learning rate $\eta$ and $\Delta a_i$, which is the deviation of the actual braking rate from the theoretical braking rate:

$$V_{i+1}^2 - V_i^2 = 2a_i^r D_i \Longrightarrow a_i^r = \frac{V_{i+1}^2 - V_i^2}{2D_i} \qquad (3)$$

$$\Delta a_i = a_i^r - a_i^t \qquad (4)$$

$$a_i^g = a_i^t - \eta\,\Delta a_i \qquad (5)$$

The calculation of $a_i^t$ is the same as in NLs, where $V_i$ represents the train speed at the current balise, $V_{i+1}$ represents the train speed at the next balise, and $D_i$ is the distance between the current balise and the next balise.

(3) Variable learning strategy (VLs) braking rate: use a variable learning rate, which takes into account the train speed passing through the balises and the arrangement of the balises, instead of a fixed learning rate:

$$\eta_i = \lambda \frac{V_i}{S_i} \qquad (6)$$

$$a_i^g = a_i^t - \eta_i\,\Delta a_i \qquad (7)$$

The definitions of $V_i$ and $S_i$ are the same as in NLs, and the definition of $\Delta a_i$ is the same as in FLs, where $\lambda$ is a constant named the fixed adjustment coefficient.

(4) Gradient descent learning strategy (GDLs) braking rate: the objective function of GDLs is as follows:

$$J = 0.5\,(S_i - S_i^e)^2 \qquad (8)$$

where $S_i$ represents the actual remaining distance and $S_i^e$ is the estimated remaining distance. We have

$$a_i^g = a_i^t - \eta_i^G \frac{\partial J}{\partial a_i^t} \qquad (9)$$

where $\eta_i^G$ represents the gradient descent learning rate.

(5) Newton descent learning strategy (NDLs) braking rate: the objective function of NDLs is the same as that of GDLs, and we get

$$a_i^g = a_i^t - \eta_i^N \frac{\partial J / \partial a_i^t}{\partial^2 J / \partial (a_i^t)^2} \qquad (10)$$

where $\eta_i^N$ represents the Newton descent learning rate.

In practice, we can choose one of the above control strategies once a train goes through a balise to adjust the train braking rate, in order to stop the train precisely at the last balise $S_N$ under an uncertain environment. To model the uncertain environment, we next construct a simulation platform to simulate the train parking process.
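To make the five strategies concrete, the following Python sketch computes the candidate braking rates at a balise from Eqs. (1)-(10); the function and variable names, the default learning rates, and the numerical treatment of $\partial J / \partial a_i^t$ for GDLs/NDLs are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def braking_rate_vector(V_i, V_next, S_i, D_i,
                        eta=0.5, lam=0.05, eta_G=0.1, eta_N=0.5):
    """Candidate target braking rates A = [NLs, FLs, VLs, GDLs, NDLs].

    V_i    -- train speed at the current balise (m/s)
    V_next -- measured speed at the next balise (m/s)
    S_i    -- remaining distance to the stopping spot (m)
    D_i    -- distance between the current and the next balise (m)
    """
    a_t = -V_i**2 / (2.0 * S_i)                  # Eq. (1): theoretical rate
    nls = a_t                                    # Eq. (2)

    a_r = (V_next**2 - V_i**2) / (2.0 * D_i)     # Eq. (3): actual rate
    delta_a = a_r - a_t                          # Eq. (4)
    fls = a_t - eta * delta_a                    # Eq. (5): fixed learning rate

    eta_i = lam * V_i / S_i                      # Eq. (6): variable rate
    vls = a_t - eta_i * delta_a                  # Eq. (7)

    # GDLs/NDLs minimize J = 0.5*(S_i - S_e)^2, Eq. (8). The estimated
    # remaining distance S_e(a), including an assumed actuation dead time,
    # is our placeholder for the platform's braking model.
    T_DELAY = 0.4                                # assumed dead time (s)
    def J(a):
        S_e = V_i * T_DELAY + V_i**2 / (2.0 * abs(a))
        return 0.5 * (S_i - S_e)**2

    h = 1e-4                                     # central differences
    dJ = (J(a_t + h) - J(a_t - h)) / (2.0 * h)
    d2J = (J(a_t + h) - 2.0 * J(a_t) + J(a_t - h)) / h**2
    gdls = a_t - eta_G * dJ                      # Eq. (9)
    ndls = a_t - eta_N * dJ / d2J if d2J else a_t  # Eq. (10)

    return np.array([nls, fls, vls, gdls, ndls])
```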

2.2. Parking Simulation Platform. The parking system is used to simulate the actual train parking process in rail stations. The system mainly includes an input module, an algorithm module, a braking module, and a display module, as shown in Figure 2.

Parameters are set up in the input module according to the field data. The input parameters include the initial speed $V_0$ when the train starts braking, the length of the parking area $L$, the number of balises $n$, the distance between each pair of balises $D$, the resistance effect, the white noise, the type of algorithm, the disturbance, and the number of iterations. These parameters provide the initialized circumstance and related configuration information. We can change the simulation circumstance by changing the parameters above.

Figure 3: Diagram of state, reward, and action vector: the TSP system feeds the current state ($S_1$ to $S_5$) and current reward $R$ to the parking algorithm, which returns an action from the vector A = (NLs, FLs, VLs, GDLs, NDLs).

The main function of the algorithm module is to generate the target braking rate $a$. The algorithm module calculates the 5 different train braking rates mentioned above and then selects the optimal, or most appropriate, target braking rate according to the algorithms.

The simulated resistance and random disturbance represent the actual urban rail transit system. The simulated resistance generates the current train resistance from the current speed, using a quadratic formula in the train speed called the basic resistance unit equation:

$$W_0 = \alpha v^2 + \beta v + \varphi \qquad (11)$$

where $W_0$ represents the basic resistance unit (N/t), $v$ is the speed of the train (km/h), $\varphi$ represents the part of rolling resistance unrelated to the train speed, $\beta$ is a parameter that describes wheel flange friction resistance, and $\alpha$ is associated with the square of the speed. The white noise generator can generate random white noise to test the anti-interference ability of the algorithms.
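As an illustration, a minimal version of the simulated resistance with additive white noise might look as follows; the coefficient values and noise level are placeholders, not the calibrated field values.

```python
import numpy as np

def basic_resistance(v_kmh, alpha=6.5e-4, beta=1.4e-2, phi=1.2):
    """Basic resistance unit W0 (N/t) of Eq. (11); placeholder coefficients."""
    return alpha * v_kmh**2 + beta * v_kmh + phi

def disturbed_resistance(v_kmh, rng, noise_std=0.05):
    """Add zero-mean white noise to test anti-interference ability."""
    return basic_resistance(v_kmh) + rng.normal(0.0, noise_std)

rng = np.random.default_rng(seed=42)
print(disturbed_resistance(40.0, rng))   # resistance unit at 40 km/h
```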

The braking module is used to park the train according to the target braking rate $a$. The transfer function of the braking module includes a time delay, a time constant, and the train operating mode transition. The transfer function of the braking module is

$$G_a(s) = \frac{a}{1 + T_p s}\, e^{-T_d s} \qquad (12)$$

where $a$ is the target braking rate, $T_d$ represents the time delay, and $T_p$ is the time constant.
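A discrete-time sketch of the braking dynamics in Eq. (12), a first-order lag with dead time, is shown below; the sampling step and the values of $T_p$ and $T_d$ are assumptions for illustration only.

```python
import numpy as np

def simulate_braking(commands, dt=0.1, T_p=0.6, T_d=0.4):
    """Pass commanded braking rates through G(s) = e^(-T_d*s) / (1 + T_p*s).

    commands -- target braking rate sampled every dt seconds
    returns  -- actual braking rate at each step
    """
    delay_steps = int(round(T_d / dt))
    delayed = [0.0] * delay_steps + list(commands)  # dead-time buffer
    actual, out = 0.0, []
    for k in range(len(commands)):
        actual += dt / T_p * (delayed[k] - actual)  # first-order lag step
        out.append(actual)
    return np.array(out)

# Example: a step command to -0.8 m/s^2 is approached gradually.
print(simulate_braking([-0.8] * 30)[:5])
```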

Finally, the display module mainly shows the experimental results, including the parking error and the target braking rate.

3. Algorithm Design and Performance Indices

3.1. Stochastic Optimal Selection Algorithm (SOSA). In SOSA, $s_1$ to $s_n$ correspond to the current $n$ states of the algorithm. We assume that the reward function of the current state is $R$; $R$ transfers each state $s_i$ into a real number, $R: s_i \rightarrow \mathbb{R}$. $\Delta$ represents the distance between the running train and the current nearest balise. If the train has passed the current nearest balise, the value of $\Delta$ is negative; otherwise, it is positive. $R(s_i)$, the relationship between the current state and the current reward, is given as

$$R(s_i) = \begin{cases} +3, & 0\,\text{cm} \le |\Delta| \le 10\,\text{cm} \\ +2, & 10\,\text{cm} < |\Delta| \le 20\,\text{cm} \\ +1, & 20\,\text{cm} < |\Delta| \le 30\,\text{cm} \\ -3, & 30\,\text{cm} < |\Delta| \end{cases} \qquad (i = 1, 2, \ldots, n) \qquad (13)$$

When the distance $\Delta$ is less than 10 cm, the reward $R(s_i)$ is the largest among all the cases. When $\Delta$ is beyond ±30 cm, the reward $R(s_i)$ is the smallest and negative. The reward function immediately indicates the reward of the train in its current state; in this algorithm, we do not have to consider the long-term reward of the state. The bigger the value of the reward function, the better the performance of the algorithm.
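In code, the immediate reward of Eq. (13) is a simple threshold function of the distance to the nearest balise; the only assumption here is that $\Delta$ is supplied in centimeters.

```python
def immediate_reward(delta_cm):
    """Immediate reward R(s_i) of Eq. (13) from the signed distance (cm)."""
    d = abs(delta_cm)
    if d <= 10:
        return 3
    if d <= 20:
        return 2
    if d <= 30:
        return 1
    return -3
```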

We consider the train braking rate of the controller module as the output action vector A. When a train passes through a balise in the simulation, the system changes the target braking rate to ensure accurate parking. Because of the system time delay and time constant, the actual braking rate of the train approaches the target braking rate gradually.

In this paper, we adopt five braking rates: the nonlearning strategy (NLs) braking rate, the fixed learning strategy (FLs) braking rate, the variable learning strategy (VLs) braking rate, the gradient descent learning strategy (GDLs) braking rate, and the Newton descent learning strategy (NDLs) braking rate. The action vector A is a 5-element vector including the braking rates mentioned above, A = [NLs, FLs, VLs, GDLs, NDLs], shown in Figure 3. When a train passes through a balise, the SOSA chooses one strategy from vector A randomly as the current target braking rate. After the train passes through all balises (also called one episode), the algorithm records the parking error and action sequence. The system starts the next episode from the initial spot until $R(s_i)$ is large enough, i.e., bigger than a preset value $\theta$.

The SOSA consists of the following steps.

Step 1. For all the states, initialize the system parameters and assume that the train enters the parking area.

Step 2. After passing through each balise, calculate the action vector A and the current reward $R(s_i)$. Choose an action from A randomly as the current target braking rate.

Step 3. Store the current reward in the array, then repeat Step 2 until the train reaches the stopping spot. Store the parking error and action sequence. Go to Step 4.

Step 4. If $R(s_i) < \theta$, go to Step 1; otherwise, use the current action sequence as the train braking rate.
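A compact sketch of SOSA's Steps 1-4 follows, reusing `immediate_reward` from above; the simulator interface (`env.reset`, `action_vector`, `apply_braking`, `delta_cm`) is a hypothetical stand-in for the platform of Section 2.2.

```python
import random

def sosa(env, theta=3, max_episodes=1000):
    """Randomly select a strategy at each balise until the final reward
    reaches the preset value theta (Steps 1-4)."""
    for _ in range(max_episodes):
        env.reset()                              # Step 1: enter parking area
        actions, rewards = [], []
        for balise in env.balises:
            A = balise.action_vector()           # Step 2: 5 candidate rates
            k = random.randrange(len(A))         # random strategy choice
            env.apply_braking(A[k])
            actions.append(k)                    # Step 3: store the sequence
            rewards.append(immediate_reward(balise.delta_cm))
        if rewards[-1] >= theta:                 # Step 4: good enough
            return actions                       # keep this action sequence
    return None
```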

3.2. Q-Learning Algorithm. In QLA, $s_1$ to $s_n$ correspond to the current $n$ states of the algorithm, and the action vector is A = [NLs, FLs, VLs, GDLs, NDLs]. Each state-action pair is associated with a value $Q(s, a)$. According to $Q(s, a)$, the algorithm picks an action from vector A. Besides, the algorithm updates the Q function according to the $\varepsilon$-greedy action selection mechanism [18]. $Q(s_i, a)$ is the estimated total reward under state $s_i$ and action $a$. We have

$$Q(s, a) = E\left[R(s_0, a) + \gamma R(s_1, a) + \gamma^2 R(s_2, a) + \cdots + \gamma^i R(s_i, a) \mid s_0 = s\right] = R(s, a) + \gamma \sum_{s'} P_{sa}(s')\,Q(s', a) \qquad (14)$$

where the discount factor satisfies $0 \le \gamma < 1$; in this paper the discount factor $\gamma$ equals 0.99. $s$ represents the current state, and $s'$ is the next state following $s$. $P_{sa}$ is the state transition probability distribution, i.e., the probability that the state transfers from $s_i$ to $s_j$ under action $a$. The train actually runs along the railway track, so no matter what action is chosen, the next state must be $s_{i+1}$; we therefore let $P_{sa}(s') = 1$. $R(s, a)$ is defined in the same way as $R(s)$ in the stochastic optimal selection algorithm.

The bigger $Q(s, a)$ is, the larger the immediate reward at each balise will be. In order to obtain the maximum $Q$, the algorithm needs to update the Q function as shown in

$$\pi(s) := \arg\max_a \sum_{s'} P_{sa}(s')\,Q(s', a) \qquad (15)$$

The optimal $Q$ can be expressed as $Q^*$. It represents the total reward obtained according to the above updating strategy; we get

$$Q^*(s, a) = R(s, a) + \gamma \sum_{s' \in S} P_{sa}(s') \max_a Q^*(s', a) \qquad (16)$$

Figure 4: Training flowchart of QLA. A strategy is adopted and $Q(s, a)$ is calculated; if it differs from the previous iteration, $Q(s, a)$ is updated and other actions are chosen in the next iteration, until $Q(s, a)$ equals $Q^*(s, a)$.

The overall training process of QLA is presented in Figure 4, consisting of the following steps.

Step 1. Initialize $Q(s, a)$.

Step 2. Adopt a strategy $\pi$ with any action sequence. When passing each balise, record the sequence of states, actions, and immediate reward values using (13). After each episode, calculate $Q(s, a)$ according to (14).

Step 3. If the current strategy and the previously selected strategy are different, update the strategy using the action selection mechanism: select, with larger probability, the action that makes $Q(s, a)$ the largest, as in (15); on the contrary, select other actions with a smaller probability.

Step 4. Repeat the steps mentioned above; when $Q(s, a) \rightarrow Q^*(s, a)$, we achieve the best strategy $\pi \rightarrow \pi^*$.
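A minimal tabular Q-learning sketch for this setting: the balise index serves as the state, the next state is always $s + 1$ (so $P_{sa}(s') = 1$), and selection is $\varepsilon$-greedy as in [18]. $\gamma = 0.99$ follows the paper, while the learning rate `ALPHA` and `EPS` are assumed hyperparameters.

```python
import numpy as np

N_BALISES, N_ACTIONS = 6, 5           # states = balise indices, 5 strategies
GAMMA, ALPHA, EPS = 0.99, 0.1, 0.1    # gamma from the paper; alpha, eps assumed
Q = np.zeros((N_BALISES, N_ACTIONS))
rng = np.random.default_rng(0)

def select_action(s):
    """Epsilon-greedy selection over the action vector A."""
    if rng.random() < EPS:
        return int(rng.integers(N_ACTIONS))
    return int(np.argmax(Q[s]))

def q_update(s, a, r):
    """One-step update toward Eq. (16); the next state is always s + 1."""
    target = r + GAMMA * np.max(Q[s + 1]) if s + 1 < N_BALISES else r
    Q[s, a] += ALPHA * (target - Q[s, a])
```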

3.3. Fuzzy Q-Learning Algorithm. FQLA is basically the same as QLA; it is an improved version of QLA. Instead of using the discrete reward function, fuzzy Q-learning uses a continuous one, where the value of the immediate reward is a typical Gauss-type membership function, i.e.,

$$y = f(x; \sigma, c) = e^{-(x - c)^2 / 2\sigma^2} \qquad (17)$$

and the fuzzy membership function is shown in Figure 5.

Figure 5: Reward as a Gauss-type membership function (a bell curve peaking at $x = c$, with values in $[0, 1]$).
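The continuous reward of Eq. (17) transcribes directly to code; $\sigma = 0.1$ and $c = 0$ are the values the experiments in Section 4 settle on, and interpreting $\Delta$ in meters under that $\sigma$ is our assumption.

```python
import math

def fuzzy_reward(delta_m, sigma=0.1, c=0.0):
    """Gauss-type membership reward, Eq. (17); peaks at delta_m = c."""
    return math.exp(-((delta_m - c) ** 2) / (2.0 * sigma ** 2))
```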

The FQLA is almost the same as QLA.

Step 1. Initialize the action-value function $Q(s, a)$.

Step 2. Adopt a strategy $\pi$ with any action sequence. When passing each balise, record the sequence of states, actions, and immediate reward values using (17). After each episode, calculate $Q(s, a)$ according to (14).

Step 3. If the current strategy and the previously selected strategy are different, update the strategy using the action selection mechanism: select, with larger probability, the action that makes $Q(s, a)$ the largest, as in (15); on the contrary, select other actions with a smaller probability.

Step 4. Repeat the steps mentioned above; when $Q(s, a) \rightarrow Q^*(s, a)$, we achieve the best strategy $\pi \rightarrow \pi^*$.

3.4. The Performance Indices. To compare the performance of the three algorithms, we define the following evaluation performance indices.

(1) Convergence number ($N_c$): the number of learning trials after which the parking errors become stable.

(2) Average error under different speeds ($E_{ds}$): to test the merits of a parking algorithm, we need many simulation trials, which produce a great deal of parking information. Among these data, the parking error is one of the most important indices, because it is the most direct indicator of parking quality. With $n$ different initial speeds, an algorithm yields $n$ parking errors after each episode. $E_{ds}$ is the absolute average of the $n$ parking errors, as shown in the following formula:

$$E_{ds} = \frac{\sum_{i=1}^{n} |\mu_i|}{n} \qquad (18)$$

where $\mu_i$ is the parking error at the $i$-th initial speed.

(3) Average error ($E$): $E$ is the average of the $E_{ds}$ values, as shown in

$$E = \frac{\sum_{i=1}^{m} E_{ds_i}}{m} \qquad (19)$$

(4) Maximum parking error ($\mu_{max}$): the maximum parking error is an important performance indicator. $\mu_{max}$ is the largest parking error among all the parking errors; if $\mu_{max}$ is too big, it will seriously affect the parking safety of the train. We have

$$\mu_{max} = \max(\mu_1, \mu_2, \ldots, \mu_n) \qquad (20)$$

(5) Minimum of the average error ($E_{min}$): the minimum of the average error is also an important indicator. $E_{min}$ is the minimum among all the $E_{ds}$ values. We have

$$E_{min} = \min(E_{ds_1}, E_{ds_2}, \ldots, E_{ds_m}) \qquad (21)$$

(6) Root mean square ($\sigma$): $\sigma$ reflects the fluctuation (discrete degree) of the parking errors. For a parking algorithm, we need to assess this fluctuation, which indicates the stability of the algorithm. We have

$$\sigma = \sqrt{\frac{\sum_{i=1}^{m} (E - E_{ds_i})^2}{m - 1}} \qquad (22)$$

(7) Parking probability in the required range ($P_p$): $P_p$ is a measurement of the estimated reliability and an important index of automatic parking ability. The parking approach will be unusable if $P_p$ is too small to meet the requirement of practical use. We have

$$P_i = \frac{x - E}{\sigma} \qquad (23)$$

$$P_p = \left(1 - 2\left(1 - \int_{-\infty}^{P_i} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-P_i^2/2}\, dP_i\right)\right) \times 100\% \qquad (24)$$

A large number of studies have set the allowable train parking error within the range [−30 cm, 30 cm] and required the parking probability to be larger than 99.5% [8, 10, 13]. For example, Line 3 of the Dalian rail transit in China requires the train parking accuracy to be within [−30 cm, 30 cm] with a required reliability of 99.5% [10]. In Beijing Subway, the allowable parking error is also 30 cm, with over 99.8% reliability.
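As a sketch, the statistical indices (18)-(22) can be computed over a matrix of parking errors (one row per trial, one column per initial speed); the NumPy layout and the use of error magnitudes for $\mu_{max}$ are our assumptions.

```python
import numpy as np

def performance_indices(errors_cm):
    """errors_cm: array of shape (m trials, n initial speeds), in cm."""
    E_ds = np.abs(errors_cm).mean(axis=1)     # Eq. (18): per-trial average
    E = E_ds.mean()                           # Eq. (19)
    mu_max = np.abs(errors_cm).max()          # Eq. (20): worst single error
    E_min = E_ds.min()                        # Eq. (21)
    m = len(E_ds)
    sigma = np.sqrt(((E - E_ds) ** 2).sum() / (m - 1))   # Eq. (22)
    return E, mu_max, E_min, sigma
```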

4. The Experimental Results

To approach the real-world running scenario, according to the above mathematical model, we choose field data between the Jiugong station and the Xiaohongmen station of the Yizhuang Line of Beijing Subway for the train station parking simulation platform. We set the length of the parking area to 100 meters. There are six balises in this parking area; the distances between the balises and the parking spot are 100 m, 64 m, 36 m, 16 m, 4 m, and 0 m, respectively. This arrangement matches the actual balise layout.

The maximum velocity (i.e., speed limit) for the train to enter the area is 14.28 m/s. Since there exists a time delay in the train braking system, we select 11.5 m/s as the maximum initial speed of a train based on actual operating data. In the algorithms, we did not consider the parking error under one single speed: a single speed cannot fully test the reliability and adaptability of the algorithms. So we set the initial speed to vary from 9 m/s to 11.5 m/s and pick 10 different initial speeds, one in every 0.25 m/s speed interval. The purpose is to consider the influence of variable initial speeds. The threshold of the SOSA is [−5 cm, 5 cm]. We use the Gauss membership function in FQLA to substitute for the current reward $R(s_i)$ in QLA; after debugging and experimenting, we pick the parameters $\sigma = 0.1$ and $c = 0$. Figure 6 presents the errors $E_{ds}$ for 10 different initial speeds during 50 trials. As can be seen in Figure 6, the convergence iterations $N_c$ differ among the three algorithms. The parking error $E_{ds}$ becomes smaller and smaller over the trials, and the results also demonstrate that most of the parking errors end up smaller than ±30 cm. With respect to computational effort, the three algorithms are very close, because they are all based on the theory of reinforcement learning. Since each trial includes 10 train station parking processes with different initial speeds, it takes about 15 seconds to complete each trial, and the total training time is about 750 seconds.

Figure 6: Parking errors (m) of the three algorithms (SOS, QL, FQL) over 50 trials with 10 different initial speeds.

Figure 7: Parking errors (m) of the three algorithms over 50 trials with 100 different initial speeds.

We also conduct more trials to test the variation of $N_c$ when there are 100 initial speeds. When the number of initial speeds reaches 100, we obtain the variation of train parking errors shown in Figure 7.

The $N_c$ we obtain is larger than the $N_c$ under 10 initial speeds. We can conclude from Figures 6 and 7 that the more initial speeds are set, the slower the convergence and the larger the convergence number will be.

The performance comparison of these three algorithms can be seen in Tables 1 and 2. It is clear that QLA and FQLA have smaller $N_c$ than SOSA. This indicates that the convergence is much faster in QLA and FQLA, especially when the number of initial speeds is increased.

Table 1: Performance indices of three algorithms with 10 initial speeds.

Indicators   N_c    E (cm)   E_min (cm)   σ (cm)   P_p
SOSA         39     4.27     1.99         3.34     99.99%
QLA          20     4.24     1.86         4.32     99.99%
FQLA         18     3.99     1.83         3.95     99.99%

Table 2: Performance indices of three algorithms with 100 initial speeds.

Indicators   N_c    E (cm)   E_min (cm)   σ (cm)   P_p
SOSA         49     4.18     1.58         3.67     99.99%
QLA          24     3.87     1.40         4.14     99.99%
FQLA         21     3.46     1.31         3.65     99.99%

As can be seen from the figures and tables, the three developed algorithms work well on the field data between the two stations in Beijing Subway and improve most of the performance indices, especially in decreasing $E$. QLA and FQLA have smaller $E$ than SOSA. The convergence of FQLA is about 1.9 times faster, the fastest among the three algorithms. FQLA has the smallest $E$ and $E_{min}$, while QLA has the largest $\sigma$. As for $P_p$, the values of the three algorithms are almost the same, and all are larger than 99.5%. We can draw the following conclusions: (1) the three algorithms adapt well to the field data, achieve good performance indices, and gradually reduce the parking error; (2) in general, FQLA ranks first and SOSA ranks third, SOSA having the worst $N_c$, $E$, and $E_{min}$; (3) all three algorithms meet the requirements of accurate parking and are easy to implement. However, the intelligence, exploration, and self-learning ability of SOSA are much worse than those of QLA and FQLA.

Figure 8: Parking errors (m) of the three algorithms for initial speeds from 9 to 11.5 m/s after 50 trials.

In addition, we also test the performance of the algorithms under different initial speeds after 50 trials. Figure 8 presents the parking errors under each different initial speed.

In Figure 8, we see that $E_{ds}$ is within the range of ±30 cm, and the errors of QLA and FQLA are closer to 0 cm than those of SOSA. The results also show that the error fluctuation of FQLA is smaller than that of SOSA and QLA. We can draw similar conclusions from Tables 1 and 2: FQLA performs the best among the three algorithms, indicating that adding fuzzy functions to Q-learning can further improve its performance for TSP. In addition, it is worth mentioning that [13] proposed several online learning algorithms for the train station parking problem, with parking errors of about 5 cm to 6 cm. The above results show that our developed algorithms can further reduce the parking errors compared with existing studies.

5. Conclusions

To address the TSP issue in urban rail transit, this paper developed one stochastic algorithm and two machine learning algorithms based on the theory of reinforcement learning. Furthermore, we proposed five braking rates as the action vector of the three algorithms within the scope of RL. Performance indices were defined to evaluate the performance and reliability of the algorithms. We evaluated the proposed algorithms on the simulation platform with field data collected in Beijing Subway. Our results show that these three algorithms can achieve good results that keep the train station parking error within 30 cm. In addition, the fuzzy Q-learning algorithm performs the best among these algorithms due to its fast and stable convergence. In practical applications, rail managers could use the simulation platform to optimize the value function with the aid of our developed algorithms and then use the well-trained Q functions to generate optimal train parking actions in real time, reducing the parking errors of trains.

In the future, we will explore new algorithms and other methods based on soft computing and machine learning to further increase the accuracy and reliability of train parking. In particular, we will combine reinforcement learning with deep learning algorithms to further enhance the flexibility and accuracy of TSP. Meanwhile, there still exist many practical issues that need to be taken into consideration. For example, we only consider the TSP problem in urban rail transit, while the train braking system is much more complex in high-speed railways. Compared with urban rail transit networks, in which each line is a relatively enclosed system independent of the others and each train moves like a "shuttle" on fixed tracks, high-speed railway networks are relatively open systems with heterogeneous trains. The trains usually have different circulation plans, and each train may travel among different railway lines according to its circulation plan. Therefore, the design of TSP algorithms for these open systems would be much more complex than for urban transit systems, in order to be flexible under various external circumstances. We will address these issues in our future research.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors claim that there are no conflicts of interest in this paper.

Acknowledgments

This research was supported by the National Natural Science Foundation of China through Grants 61790570, 61790573, and U1734210; by the Key Laboratory of Intelligent Metro of Universities in Fujian through Grant 2018IMUF001; by the Beijing Jiaotong University Education Foundation through Grant 9907006519; and by the China Postdoctoral Science Foundation Funded Project through Grant 2019M650474.

References

[1] J. Yin, T. Tang, L. Yang, J. Xun, Y. Huang, and Z. Gao, "Research and development of automatic train operation for railway transportation systems: a survey," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 548-572, 2017.
[2] E. Coscetta and A. Carteni, "The hedonic value of railways terminals: a quantitative analysis of the impact of stations quality on travelers behavior," Transportation Research Part A, vol. 61, pp. 41-52, 2014.
[3] E. Ben-Elia and D. Ettema, "Rewarding rush-hour avoidance: a study of commuters' travel behavior," Transportation Research Part A: Policy and Practice, vol. 45, no. 7, pp. 567-582, 2011.
[4] J. Yin, L. Yang, X. Zhou, T. Tang, and Z. Gao, "Balancing a one-way corridor capacity and safety-oriented reliability: a stochastic optimization approach for metro train timetabling," Naval Research Logistics (NRL), vol. 66, no. 4, pp. 297-320, 2019.
[5] K. Yoshimoto, K. Kataoka, and K. Komaya, "A feasibility study of train automatic stop control using range sensors," in Proceedings of the 2001 IEEE Intelligent Transportation Systems, pp. 802-807, USA, August 2001.
[6] S. Yasunobu, S. Miyamoto, and H. Ihara, "A fuzzy control for train automatic stop control," Transactions of the Society of Instrument and Control Engineers, vol. 2, no. 1, pp. 1-8, 2002.
[7] S. Yasunobu and S. Miyamoto, "Automatic train operation system by predictive fuzzy control," Industrial Applications of Fuzzy Control, vol. 1, no. 18, pp. 1-18, 1985.
[8] Z. Hou, Y. Wang, C. Yin, and T. Tang, "Terminal iterative learning control based station stop control of a train," International Journal of Control, vol. 84, no. 7, pp. 1263-1274, 2011.
[9] J. Zhou and D. Chen, "Applications of machine learning methods in problem of precise train stopping," Computer Engineering and Applications, vol. 46, no. 25, pp. 226-230, 2010.
[10] D. Chen and C. Gao, "Soft computing methods applied to train station parking in urban rail transit," Applied Soft Computing, vol. 12, no. 2, pp. 759-767, 2012.
[11] G. He, Research on Train Precision Stop Control Algorithm Based on LQR, Beijing Jiaotong University, Beijing, China, 2009.
[12] R. Chen and D. Chen, "Heuristic self-learning algorithms and simulation of train station stop," Railway Computer Application, vol. 22, no. 6, pp. 9-13, 2013.
[13] D. Chen, R. Chen, Y. Li, and T. Tang, "Online learning algorithms for train automatic stop control using precise location data of balises," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1526-1535, 2013.
[14] J. Yin, D. Chen, and L. Li, "Intelligent train operation algorithms for subway by expert system and reinforcement learning," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 6, pp. 2561-2571, 2014.
[15] J. Liu, "Analysis of station dwelling error and adjustment solution for operation after screen door's installation," Urban Mass Transit, vol. 1, pp. 44-47, 2009.
[16] V. Mnih, K. Kavukcuoglu, D. Silver et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529-533, 2015.
[17] J. Yin, D. Chen, T. Tang, L. Zhu, and W. Zhu, "Balise arrangement optimization for train station parking via expert knowledge and genetic algorithm," Applied Mathematical Modelling, vol. 40, no. 19-20, pp. 8513-8529, 2016.
[18] J. Yin, T. Tang, L. Yang, Z. Gao, and B. Ran, "Energy-efficient metro train rescheduling with uncertain time-variant passenger demands: an approximate dynamic programming approach," Transportation Research Part B: Methodological, vol. 91, pp. 178-210, 2016.


Page 2: Developing Train Station Parking Algorithms: New Frameworks …downloads.hindawi.com/journals/jat/2019/3072495.pdf · 2019. 8. 4. · New Frameworks Based on Fuzzy Reinforcement Learning

2 Journal of Advanced Transportation

Segment2

Segment1

SegmentN-1

1Input 1M1Output u1

3132

333-1

(a) Train passes balise B1

2Segment

SegmentN-1

31

3233

3-1

2Input 2M2

Output u2

(b) Train passes balise B2

SegmentN-1

31

3233

3-1

3

Input 3M3Output u3

(c) Train passes balise B3

SegmentN-1

31

3233

3-1

-1Input Nminus1MNminus1

Output uNminus1

(d) Train passes balise BN-1

Figure 1 Parking area and balises distribution [17]

applied to train parking in urban rail transit by Chen andGao and three models were presented involving a linearmodel a generalized regression neural network model andan adaptive network based fuzzy inference (ANFIS) model[10] The ANFIS can keep the parking errors within 30cmwith the probability being greater than 999 in simulationHowever the simulation experiments are conducted underideal environment and nodisturbances are taken into accountin this study

Based on the braking characteristics of trains in urbanrail transit two simplified models were developed to estimatethe TSP error in [11] Then a precise stop control algorithmbased on LQR (linear-quadratic-regulation) was proposedcombining the vehicles dynamics model and system iden-tification theory which obtained the train station parkingerrors within 50cm Chen et al tried to address the trainparking problem using a new machine learning techniqueand proposed a novel online learning control strategy withthe help of the precise location data of balises installed instations [12ndash14] Experiments were conducted to test thealgorithm performance under the variation of different trainparameters and the best algorithm achieved an average park-ing error of 38cm Even though a lot of studies are proposedin the existing literature we note that the train station parkingis a very complex problem due to many influencing externalfactors for example the weather train running states or on-board passengers while these uncertain factors have not beenexplicitly tackled in former literature [15]

The theory of reinforcement learning (RL) provides anormative account that describes how agents may optimizetheir control under uncertain environment [16] As a typicalalgorithm in RL field Q-learning uses action-value functionto approximate the optimal control strategies of an agentand has been widely successfully implemented into manycomplex control problems In particular Q-learning utilizes

the information through the interactions between an agentand the environment and gradually learns the optimal controlstrategies for a multistage decision problem And this processis actually very similar with TSP where a train can be termedas an agent Therefore this paper aims to introduce the RLtheory and Q-learning algorithms to solve the TSP problemin urban rail transit systems

Specifically in this paper we first establish a simulationtraining platform based on the real-world data involvingthe distance speed limit gradient and position of balisesbetween the Jiugong station and the Xiaohongmen stationof Yizhuang Line in Beijing Subway Three algorithms wereproposedmdasha stochastic optimal selection algorithm (SOSA)a Q-learning algorithm (QLA) and a fuzzy function Q-learning algorithm (FQLA) Performances of three algo-rithms were analyzed and the experimental results werecompared The rest of the paper is organized as follows InSection 2 TSP problem is described including five differentbraking rates and the TSP simulation platform is introducedIn Section 3 three algorithms for reducing and estimatingthe parking errors are developed based on the reinforcementlearning (RL) theory Seven statistical indices are defined tocalculate and evaluate the accuracy and reliability of thesethree algorithms In Section 4 experimental results of thesethree algorithms are compared and analyzed in detail usingthe field data in simulation platform Finally conclusions andfuture research are outlined in Section 5

2 Problem Description andEnvironment Construction of RL

21 Train Station Parking Problem As shown in Figure 1several balises are installed on the tracks in rail stations andthe train can correct its position and velocity once it goesthrough each balise Typically the train enters the station

Journal of Advanced Transportation 3

Inputmodule

Algorithmmodule

Brakingmodule

Displaymodule

Simulateresistance

Randomdisturbance

Basic resistance abc

White noise

V0 D etc target braking rate

E

Figure 2 Structure of the parking simulation platform

area and starts to brake at the initial spot (first balise S1)which is 119871 meters away from the stopping spot (last baliseS119873) This area between S1 and S119873 is also called parkingarea There are 119873 balises distributed in this area (includingS1 sdot sdot sdot S119873) according to the actual distribution of the urban railtransit Meanwhile the distribution of the balises satisfies thefundamental requirement that the balises near the stoppingspot are dense and the balises far from the stopping spot aresparse

There are different braking rates in each balise Every timethe train reaches a balise we will get 5 different train brakingrates through the following calculation also called 5-elementvector A They are the following(1) Nonlearning strategy (NLs) braking rate we cancalculate the theoretical braking rate 119886119905119894 by using the kine-matic formula This strategy does not consider the influenceof interference on parking The target braking rate 119886119892119894 wewant equals the theoretical braking rate in this nonlearningstrategy

1198812119879 minus 1198812119894 = 2119886119905119894119878119894 119881119879=0997888997888997888997888rarr 119886119905119894 = minus11988121198942119878119894 (1)

119886119892119894 = 119886119905119894 (2)

where119881119879 is the train speed at the last balise 119881119894 represents thetrain speed at the current balise 119878119894 is the distance between thecurrent balise and the parking spot(2) Fixed learning strategy (FLs) braking rate based onNLs we add a fixed learning rate 120578 and Δ119886119894 which is thedeviation of actual braking rate with theoretical braking rate

1198812119894+1 minus 1198812119894 = 2119886119903119894119863119894 119881119894+1 =0997888997888997888997888997888rarr 119886119903119894 = (1198812119894+1 minus 1198812119894 )2119863119894 (3)

Δ119886119894 = 119886119903119894 minus 119886119905119894 (4)

119886119892119894 = 119886119905119894 minus 120578Δ119886119894 (5)

The calculation method of 119886119905119894 is the same as NLs where 119881119894represents the train speed at the current balise119881119894+1 representsthe train speed at the next balise 119863119894 is the distance betweenthe current balise and the next balise(3) Variable learning strategy (VLs) braking rate usevariable learning rate which takes into account the train

speed passing through the balises and arrangement of thebalises instead of fixed learning rate

120578119894 = 120582119881119894119878119894 (6)

119886119892119894 = 119886119905119894 minus 120578119894Δ119886119894 (7)

Thedefinition of119881119894 and 119878119894 is the same as inNLsThedefinitionofΔ119886119894 is equivalent toΔ119886119894 of FLs where 120582 is a constant namedfixed adjustment coefficient(4) Gradient descent learning strategy (GDLs) brakingrate objective function of GDLs is as follows

119869 = 05 (119878119894 minus 119878119890119894 )2 (8)

where 119878119894 represents the actual remaining distance 119878119890119894 is theestimated remaining distance We have

119886119892119894 = 119886119905119894 minus 120578119866119894 120597119869120597119886119905119894 (9)

where 120578119866119894 represents the gradient descent learning rate(5) Newton descent learning strategy (NDLs) brakingrate objective function of NDLs is the same as GDLs andwe get

119886119892119894 = 119886119905119894 minus 120578119873119894 1205971198691205971198861199051198941205972119869120597 (119886119905119894 )2 (10)

where 120578119873119894 represents the Newton descent learning rateIn practice we can choose one of the above control

strategies once a train goes through a balise to adjust thetrain braking rate in order to stop the train precisely atthe last balise S119873 under uncertain environment To modelthe uncertain environment we next construct a simulationplatform to simulate the train parking process

22 Parking Simulation Platform Theparking system is usedto simulate the actual train parking process in rail stationsThe system mainly includes an input module algorithmmodule braking module and display module shown asFigure 2

Parameters are set up in the input module according tothe field data The input parameters include initial speed 1198810

4 Journal of Advanced Transportation

TSP system

Parking algorithm

Current reward R(MC)

Action vector A(NLs FLs

VLs GDLsNDLs)

StateS1 S2S3 S4

S5

Figure 3 Diagram of state reward and action vector

when the train starts braking the length of the parking area 119871the number of balises 119899 the distance between each balise 119863resistance effect white noise type of algorithm disturbanceand number of iterations These parameters provide the ini-tialized circumstance and related configuration informationWe can change the simulation circumstance by changing theparameters above

Themain function of the algorithm module is to generatethe target braking rate 119886 Algorithm module can calculatethe 5 different train braking rates mentioned above and thenselect the optimal or the most appropriate target braking rateaccording to the algorithms

Simulation resistance and random disturbance representthe actual urban rail transit systemThe simulation resistancegenerates the current train resistance based on the currentspeed using the formula related to the speed of the train inthe form of a quadratic equation called basic resistance unitequation

1198820 = 120572V2 + 120573V + 120593 (11)

where1198820 represents the basic resistance unit (Nt) V is speedof train (kmh) 120593 represents the part of rolling resistanceunrelated to the train speed 120573 is parameter that is used todescribe wheel flange friction resistance and 120572 is associatedwith the square of speed The white noise generator cangenerate random white noise to test the anti-interferenceability of the algorithms

The braking module is used to park the train accordingto the target braking rate 119886 The transfer function of thebraking module includes time delay time constant andtrain operating mode transition The transfer function of thebraking module is

119866119886 (119904) = 1198861 + 119879119901119904 119890minus119879119889119904 (12)

where 119886 is the target braking rate119879119889 represents the timedelayand 119879119901 is the time constant

Finally display module mainly shows the experimentalresults including parking error and target braking rate

3 Algorithm Design and Performance Indices

31 Stochastic Optimal Selection Algorithm (SOSA) In SOSA1199041 to 119904119899 correspond to current 119899 states of algorithm at themoment We assume that the reward function about currentstate is 119877 119877 can transfer each state into the real number119904119894 R997888rarr R Δ represents the distance between the runningtrain and the current nearest balise If the train passes thecurrent nearest balise the value of Δ is negative otherwise itis positive 119877(119904119894) is the relationship between the current stateand the current reward shown as

119877 (119904119894) =

+3 0119888119898 le |Δ| le 10119888119898+2 10119888119898 le |Δ| le 20119888119898+1 20119888119898 le |Δ| le 30119888119898minus3 30119888119898 le |Δ|

(119894 = 1 2 119899)(13)

When the distance Δ is less than 10cm the reward function119877(119904119894) is the largest among all the results When the distanceΔ is beyond plusmn30cm the reward function 119877(119904119894) is smallestand negative The reward function immediately indicates thereward of the train in its current state In this algorithm wedo not have to consider the long-term reward of the stateThe bigger the value of the reward function is the better theperformance of this algorithm will be

We consider the train braking rate of the controller module as the output action vector A. When a train passes a balise in the simulation, the system changes the target braking rate to ensure accurate parking. Because of the system time delay and time constant, the actual braking rate of the train approaches the target braking rate gradually.

In this paper, we adopt five braking rates: the nonlearning strategy (NLs) braking rate, the fixed learning strategy (FLs) braking rate, the variable learning strategy (VLs) braking rate, the gradient descent learning strategy (GDLs) braking rate, and the Newton descent learning strategy (NDLs) braking rate. The action vector A is the 5-element vector containing these braking rates, A = [NLs, FLs, VLs, GDLs, NDLs], as shown in Figure 3.

Figure 3: Diagram of state, reward, and action vector.

When a train passes through a balise, SOSA randomly chooses one strategy from vector A as the current target braking rate. After the train has passed through all balises (also called one episode), the algorithm records the parking error and the action sequence. The system then starts the next episode from the initial spot, until $R(s_i)$ is large enough, i.e., bigger than a preset value $\theta$.

SOSA consists of the following steps (a sketch of the resulting episode loop follows Step 4).

Step 1. For all states, initialize the system parameters and assume that the train enters the parking area.

Step 2. After passing each balise, calculate the action vector A and the current reward $R(s_i)$. Choose an action from A at random as the current target braking rate.

Step 3. Store the current reward in the array, then repeat Step 2 until the train reaches the stopping spot. Store the parking error and the action sequence. Go to Step 4.

Step 4. If $R(s_i) < \theta$, go to Step 1; otherwise, use the current action sequence as the train braking rate.
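Putting Steps 1–4 together gives the episode loop sketched below. The `run_episode` callback, assumed to simulate one parking run under a per-balise action choice and to return the parking error, the action sequence, and the final reward, is a hypothetical hook into the simulation platform.

```python
import random

ACTIONS = ("NLs", "FLs", "VLs", "GDLs", "NDLs")

def sosa(run_episode, theta=3, max_episodes=1000):
    """Stochastic optimal selection (Steps 1-4): sample a random action per
    balise until an episode's final reward reaches the preset threshold."""
    for _ in range(max_episodes):
        policy = lambda balise_index: random.choice(ACTIONS)   # Step 2
        error, action_seq, final_reward = run_episode(policy)  # Step 3
        if final_reward >= theta:                              # Step 4
            return action_seq, error
    return action_seq, error  # fall back to the last episode if never converged
```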

3.2. Q-Learning Algorithm. In QLA, $s_1$ to $s_n$ correspond to the current $n$ states of the algorithm, and the action vector is A = [NLs, FLs, VLs, GDLs, NDLs]. Each state-action pair is associated with a value $Q(s, a)$. According to $Q(s, a)$, the algorithm picks an action from vector A; it also updates the Q function according to the $\varepsilon$-greedy action selection mechanism [18]. $Q(s_i, a)$ is the estimated total reward under state $s_i$ and action $a$. We have

$$Q(s, a) = E\left[R(s_0, a) + \gamma R(s_1, a) + \gamma^2 R(s_2, a) + \cdots + \gamma^i R(s_i, a) \mid s_0 = s\right] = R(s, a) + \gamma \sum_{s' \in S} P_{sa}(s')\, Q(s', a) \tag{14}$$

where the discount factor satisfies $0 \le \gamma < 1$; in this paper $\gamma = 0.99$. Here $s$ represents the current state and $s'$ the next state after $s$, while $P_{sa}$ is the state transition probability, i.e., the probability that the state transfers from $s_i$ to $s_j$ under action $a$. Since the train runs along the railway track, the next state must be $s_{i+1}$ no matter which action is chosen, so we let $P_{sa}(s') = 1$. $R(s, a)$ is defined in the same way as $R(s)$ in the stochastic optimal selection algorithm.

The bigger $Q(s, a)$ is, the larger the immediate reward at each balise will be. To obtain the maximum $Q$, the algorithm updates the strategy as shown in

$$\pi(s) := \arg\max_a \sum_{s'} P_{sa}(s')\, Q(s', a) \tag{15}$$

The optimal $Q$ is denoted by $Q^*$; it represents the total reward under the above updating strategy. We get

$$Q^*(s, a) = R(s, a) + \gamma \sum_{s' \in S} P_{sa}(s') \max_a Q^*(s', a) \tag{16}$$
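The following sketch shows one training episode of a standard tabular Q-learning update consistent with (14)–(16), using the deterministic transition $P_{sa}(s_{i+1}) = 1$ and the $\varepsilon$-greedy selection of [18]. The learning rate $\alpha$ is an assumption; the paper specifies only $\gamma = 0.99$. The `run_step` callback is an assumed simulator hook returning the immediate reward of Eq. (13).

```python
import random

def q_learning_episode(run_step, n_states, actions, Q,
                       gamma=0.99, alpha=0.1, eps=0.1):
    """One training episode over the balise sequence; the successor of
    state i is always i + 1 (P_sa = 1, as in the text)."""
    for i in range(n_states - 1):
        if random.random() < eps:                       # epsilon-greedy [18]
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda x: Q[i][x])
        r = run_step(i, a)                              # immediate reward, Eq. (13)
        target = r + gamma * max(Q[i + 1].values())     # Eqs. (14) and (16)
        Q[i][a] += alpha * (target - Q[i][a])
    return Q

# Q can be initialized as: Q = [{a: 0.0 for a in ACTIONS} for _ in range(n_states)]
```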

Figure 4: Training flowchart of QLA (adopt a strategy; calculate $Q(s, a)$; if the strategy differs from the previous iteration, update $Q(s, a)$; stop when $Q(s, a)$ equals $Q^*(s, a)$, otherwise choose other actions in the next iteration).

The overall training process of QLA is presented in Figure 4 and consists of the following steps.

Step 1. Initialize $Q(s, a)$.

Step 2. Adopt a strategy $\pi$ with any action sequence. When passing each balise, record the sequence of states, actions, and immediate reward values using (13). After each episode, calculate $Q(s, a)$ according to (14).

Step 3. If the current strategy differs from the previously selected strategy, update the strategy using the action selection mechanism: with larger probability, select the action that makes $Q(s, a)$ largest, as in (15); otherwise, QL selects other actions with a smaller probability.

Step 4. Repeat the steps above; when $Q(s, a) \rightarrow Q^*(s, a)$, we achieve the best strategy $\pi \rightarrow \pi^*$.

3.3. Fuzzy Q-Learning Algorithm. FQLA is basically the same as QLA and can be regarded as an improved version of it. Instead of the discrete reward function, fuzzy Q-learning uses a continuous one, in which the immediate reward is a typical Gauss-type membership function, i.e.,

$$y = f(x; \sigma, c) = e^{-(x - c)^2 / 2\sigma^2} \tag{17}$$

The shape of this fuzzy membership function is shown in Figure 5.
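In code, the fuzzy reward of (17) is a one-liner; $\sigma = 0.1$ and $c = 0$ are the values picked in Section 4, and interpreting the input as the parking deviation in meters is our assumption.

```python
import math

def fuzzy_reward(x, sigma=0.1, c=0.0):
    """Continuous reward of Eq. (17): Gauss-type membership of the parking
    deviation x (assumed to be in meters)."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))
```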

The FQLA steps are almost the same as those of QLA:

Figure 5: Reward as a Gauss-type membership function.

Step 1. Initialize the action-value function $Q(s, a)$.

Step 2. Adopt a strategy $\pi$ with any action sequence. When passing each balise, record the sequence of states, actions, and immediate reward values using (17). After each episode, calculate $Q(s, a)$ according to (14).

Step 3. If the current strategy differs from the previously selected strategy, update the strategy using the action selection mechanism: with larger probability, select the action that makes $Q(s, a)$ largest, as in (15); otherwise, QL selects other actions with a smaller probability.

Step 4. Repeat the steps above; when $Q(s, a) \rightarrow Q^*(s, a)$, we achieve the best strategy $\pi \rightarrow \pi^*$.

3.4. The Performance Indices. To compare the performance of the three algorithms, we define the following parking performance indices.

(1) Convergence number ($N_c$): the number of learning trials after which the parking errors tend to be stable.

(2) Average error under different speeds ($E_{ds}$): to assess the merits of a parking algorithm, many simulation trials are needed, and among all the parking information produced by the simulations, the parking error is one of the most important indices because it is the most direct indicator of parking quality. With $n$ different initial speeds, an algorithm produces $n$ parking errors after each episode; $E_{ds}$ is the average of the absolute values of these $n$ parking errors, as shown in the following formula:

$$E_{ds} = \frac{\sum_{i=1}^{n} |\mu_i|}{n} \tag{18}$$

where $\mu_i$ is the parking error at the $i$th initial speed.

(3) Average error ($E$): $E$ is the average of the $E_{ds}$ values over $m$ episodes, as shown in

$$E = \frac{\sum_{i=1}^{m} E_{ds,i}}{m} \tag{19}$$

(4) Maximum parking error ($\mu_{max}$): the maximum parking error is an important performance indicator; $\mu_{max}$ is the maximum among all the parking errors. If $\mu_{max}$ is too big, it seriously affects the parking safety of the train. We have

$$\mu_{max} = \max(\mu_1, \mu_2, \ldots, \mu_n) \tag{20}$$

(5) Minimum of the average error ($E_{min}$): the minimum error is also an important indicator; $E_{min}$ is the minimum among all the $E_{ds}$ values. We have

$$E_{min} = \min(E_{ds,1}, E_{ds,2}, \ldots, E_{ds,m}) \tag{21}$$

(6) Root mean square ($\sigma$): $\sigma$ reflects the fluctuation (degree of dispersion) of the parking errors. In the parking process we need to assess this fluctuation, which indicates the stability of the algorithm. We have

$$\sigma = \sqrt{\frac{\sum_{i=1}^{m} \left(E - E_{ds,i}\right)^2}{m - 1}} \tag{22}$$

(7) Parking probability in the required range ($P_p$): $P_p$ measures the estimated reliability and is an important index of the automatic parking ability. The algorithm is not usable in practice if $P_p$ is too small to meet the requirement of practical use. We have

$$P_i = \frac{x - E}{\sigma} \tag{23}$$

$$P_p = \left(1 - 2\left(1 - \int_{-\infty}^{P_i} \frac{1}{\sqrt{2\pi}} e^{-P_i^2/2}\, dP_i\right)\right) \times 100\% \tag{24}$$

Many existing studies require the parking error of the train to be within the range [-30 cm, 30 cm] and the parking probability to be larger than 99.5% [8, 10, 13]. For example, Line 3 of the Dalian rail transit in China requires the train parking accuracy to be within [-30 cm, 30 cm] with a required reliability of 99.5% [10]; in the Beijing Subway, the allowable parking error is also 30 cm, with over 99.8% reliability. (A short sketch computing the seven indices above is given below.)
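The seven indices translate into the following sketch; evaluating (23)–(24) at $x = 30$ cm, i.e., the boundary of the required range, is our reading of the text.

```python
import math

def performance_indices(errors, x_required=30.0):
    """Compute the indices of Eqs. (18)-(24).

    `errors` is a list of episodes, each a list of parking errors (cm) over
    the n initial speeds; x_required is the required range bound (+-30 cm).
    """
    E_ds = [sum(abs(mu) for mu in ep) / len(ep) for ep in errors]   # Eq. (18)
    m = len(E_ds)
    E = sum(E_ds) / m                                               # Eq. (19)
    mu_max = max(mu for ep in errors for mu in ep)                  # Eq. (20)
    E_min = min(E_ds)                                               # Eq. (21)
    sigma = math.sqrt(sum((E - e) ** 2 for e in E_ds) / (m - 1))    # Eq. (22)
    P_i = (x_required - E) / sigma                                  # Eq. (23)
    Phi = 0.5 * (1.0 + math.erf(P_i / math.sqrt(2.0)))              # normal CDF
    P_p = (1.0 - 2.0 * (1.0 - Phi)) * 100.0                         # Eq. (24)
    return {"E_ds": E_ds, "E": E, "mu_max": mu_max,
            "E_min": E_min, "sigma": sigma, "P_p": P_p}
```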

4. The Experimental Results

To approximate a real-world running scenario according to the above mathematical model, we use field data between the Jiugong and Xiaohongmen stations on the Yizhuang Line of the Beijing Subway in the train station parking simulation platform. The length of the parking area is set to 100 meters, with six balises whose distances to the parking spot are 100 m, 64 m, 36 m, 16 m, 4 m, and 0 m, respectively. This arrangement matches the actual balise layout.

The maximum velocity (i.e., speed limit) for a train entering the area is 14.28 m/s. Since there is a time delay in the train braking system, we select 11.5 m/s as the maximum initial speed of a train, based on actual operating data. The algorithms are not evaluated under one single speed, because a single speed cannot fully test their reliability and adaptability. Instead, the initial speed varies from 9 m/s to 11.5 m/s, and we pick 10 different initial speeds at 0.25 m/s intervals, so as to consider the influence of a variable initial speed. The threshold of SOSA is [-5 cm, 5 cm]. We use the Gauss membership function in FQLA to substitute the current reward $R(s_i)$ in QLA; after debugging and experimenting, we pick the parameters $\sigma = 0.1$ and $c = 0$.
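For reference, the experimental setup reduces to a few constants. The exact endpoints of the 10-speed grid are an assumption, since 9–11.5 m/s at 0.25 m/s spacing would give 11 values.

```python
# Experimental setup of Section 4 (values from the paper; speed-grid
# endpoints are an assumption, see the note above).
balise_distances_m = [100, 64, 36, 16, 4, 0]               # distance to the stopping spot
initial_speeds_ms = [9.0 + 0.25 * k for k in range(10)]    # 9.00 ... 11.25 m/s
```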

Figure 6: 50 trials of the three algorithms with 10 different initial speeds (parking error in meters versus trial number).

Figure 7: 50 trials of the three algorithms with 100 different initial speeds (parking error in meters versus trial number).

Figure 6 presents the errors $E_{ds}$ for 10 different initial speeds during 50 trials. As can be seen in Figure 6, the convergence iterations $N_c$ differ among the three algorithms. The parking error $E_{ds}$ becomes smaller and smaller over the trials, and the results also demonstrate that most of the parking errors fall within $\pm 30$ cm in the end. With respect to computational effort, the three algorithms are very close, because they are all based on reinforcement learning. Since each trial includes 10 train station parking runs with different initial speeds, it takes about 15 seconds to complete each trial, and the total training time is about 750 seconds.

We also conduct more trials to test the variation of $N_c$ when there are 100 initial speeds; the resulting variation of the train parking errors is shown in Figure 7.

The $N_c$ obtained is larger than that under 10 initial speeds. We can conclude from Figures 6 and 7 that the more initial speeds are set, the slower the convergence and the larger the convergence number.

The performance comparison of the three algorithms is given in Tables 1 and 2. It is clear that QLA and FQLA have smaller $N_c$ than SOSA, which indicates that convergence is much faster in QLA and FQLA, especially when the number of initial speeds is increased.

As can be seen from the figures and tables, the three developed algorithms work well on the field data between the two stations of the Beijing Subway and improve most of the performance indices, especially by decreasing $E$: QLA and FQLA have smaller $E$ than SOSA.

Figure 8: Each parking error after 50 trials under different initial speeds (parking error in meters versus initial speed $V_0$, 9–11.5 m/s).

Table 1: Performance indices of the three algorithms with 10 initial speeds.

Indicators   $N_C$   $E$ (cm)   $E_{min}$ (cm)   $\sigma$ (cm)   $P_p$ (%)
SOSA          39      4.27       1.99             3.34            99.99
QLA           20      4.24       1.86             4.32            99.99
FQLA          18      3.99       1.83             3.95            99.99

Table 2: Performance indices of the three algorithms with 100 initial speeds.

Indicators   $N_C$   $E$ (cm)   $E_{min}$ (cm)   $\sigma$ (cm)   $P_p$ (%)
SOSA          49      4.18       1.58             3.67            99.99
QLA           24      3.87       1.40             4.14            99.99
FQLA          21      3.46       1.31             3.65            99.99

The $N_c$ of FQLA is better by a factor of about 1.9, giving it the fastest convergence among the three algorithms. FQLA has the smallest $E$ and $E_{min}$, while QLA shows the largest $\sigma$. As for $P_p$, the three algorithms are almost identical, all above 99.5%. We can draw the following conclusions: (1) the three algorithms adapt well to the field data, achieve good performance indices, and gradually reduce the parking error; (2) overall, FQLA ranks first and SOSA ranks third, with SOSA having the worst $N_c$, $E$, and $E_{min}$; (3) all three algorithms meet the requirements of accurate parking and are easy to implement, although the intelligence, exploration, and self-learning of SOSA are much weaker than those of QLA and FQLA.

In addition, we test the performance of the algorithms under different initial speeds after 50 trials. Figure 8 presents the parking error at each initial speed.

In Figure 8, $E_{ds}$ stays within $\pm 30$ cm, and the errors of QLA and FQLA are closer to 0 cm than those of SOSA. The results also show that the error fluctuation of FQLA is smaller than that of SOSA and QLA. Similar conclusions can be drawn from Tables 1 and 2: FQLA performs best among the three algorithms, indicating that adding fuzzy functions to Q-learning can further improve its performance for TSP. It is also worth mentioning that [13] proposed several online learning algorithms for the train station parking problem, with train parking errors of about 5 cm to 6 cm; the above results show that our algorithms can further reduce the parking errors compared with these existing studies.

5. Conclusions

To address the TSP issue in urban rail transit, this paper developed one stochastic algorithm and two machine learning algorithms based on the theory of reinforcement learning. Furthermore, we proposed five braking rates as the action vector of the three algorithms within the RL framework, and defined performance indices to evaluate their performance and reliability. We evaluated the proposed algorithms on a simulation platform with field data collected from the Beijing Subway. Our results show that all three algorithms achieve good results, keeping the train station parking error within 30 cm. The fuzzy Q-learning algorithm performs best, owing to its fast and stable convergence. In practical applications, rail managers could use the simulation platform to optimize the value function with the aid of the developed algorithms, and then use the well-trained Q functions to generate optimal train parking actions in real time, reducing the parking errors of trains.

In the future, we will explore new algorithms and other methods based on soft computing and machine learning to further increase the accuracy and reliability of train parking. In particular, we will combine reinforcement learning with deep learning algorithms to further enhance the flexibility and accuracy of TSP. Meanwhile, many practical issues remain to be taken into consideration. For example, we only consider the TSP problem in urban rail transit, while the train braking system is much more complex in high-speed railways. Compared with urban rail transit networks, in which each line is a relatively enclosed system independent of the others and each train moves like a "shuttle" on fixed tracks, high-speed railway networks are relatively open systems with heterogeneous trains. The trains usually have different circulation plans, and each train may travel among different railway lines according to its circulation plan. Therefore, the design of TSP algorithms for these open systems, which must remain flexible under various external circumstances, would be much more complex than for urban transit systems. We will address these issues in our future research.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the National Natural Science Foundation of China through Grants 61790570, 61790573, and U1734210; by the Key Laboratory of Intelligent Metro of Universities in Fujian through Grant 2018IMUF001; by the Beijing Jiaotong University Education Foundation through Grant 9907006519; and by the China Postdoctoral Science Foundation through Grant 2019M650474.

References

[1] J. Yin, T. Tang, L. Yang, J. Xun, Y. Huang, and Z. Gao, "Research and development of automatic train operation for railway transportation systems: a survey," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 548–572, 2017.

[2] E. Coscetta and A. Carteni, "The hedonic value of railways terminals: a quantitative analysis of the impact of stations quality on travelers behavior," Transportation Research Part A, vol. 61, pp. 41–52, 2014.

[3] E. Ben-Elia and D. Ettema, "Rewarding rush-hour avoidance: a study of commuters' travel behavior," Transportation Research Part A: Policy and Practice, vol. 45, no. 7, pp. 567–582, 2011.

[4] J. Yin, L. Yang, X. Zhou, T. Tang, and Z. Gao, "Balancing a one-way corridor capacity and safety-oriented reliability: a stochastic optimization approach for metro train timetabling," Naval Research Logistics (NRL), vol. 66, no. 4, pp. 297–320, 2019.

[5] K. Yoshimoto, K. Kataoka, and K. Komaya, "A feasibility study of train automatic stop control using range sensors," in Proceedings of the 2001 IEEE Intelligent Transportation Systems, pp. 802–807, USA, August 2001.

[6] S. Yasunobu, S. Miyamoto, and H. Ihara, "A fuzzy control for train automatic stop control," Transactions of the Society of Instrument and Control Engineers, vol. 2, no. 1, pp. 1–8, 2002.

[7] S. Yasunobu and S. Miyamoto, "Automatic train operation system by predictive fuzzy control," Industrial Applications of Fuzzy Control, vol. 1, no. 18, pp. 1–18, 1985.

[8] Z. Hou, Y. Wang, C. Yin, and T. Tang, "Terminal iterative learning control based station stop control of a train," International Journal of Control, vol. 84, no. 7, pp. 1263–1274, 2011.

[9] J. Zhou and D. Chen, "Applications of machine learning methods in problem of precise train stopping," Computer Engineering and Applications, vol. 46, no. 25, pp. 226–230, 2010.

[10] D. Chen and C. Gao, "Soft computing methods applied to train station parking in urban rail transit," Applied Soft Computing, vol. 12, no. 2, pp. 759–767, 2012.

[11] G. He, Research on Train Precision Stop Control Algorithm Based on LQR, Beijing Jiaotong University, Beijing, China, 2009.

[12] R. Chen and D. Chen, "Heuristic self-learning algorithms and simulation of train station stop," Railway Computer Application, vol. 22, no. 6, pp. 9–13, 2013.

[13] D. Chen, R. Chen, Y. Li, and T. Tang, "Online learning algorithms for train automatic stop control using precise location data of balises," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1526–1535, 2013.

[14] J. Yin, D. Chen, and L. Li, "Intelligent train operation algorithms for subway by expert system and reinforcement learning," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 6, pp. 2561–2571, 2014.

[15] J. Liu, "Analysis of station dwelling error and adjustment solution for operation after screen door's installation," Urban Mass Transit, vol. 1, pp. 44–47, 2009.

[16] V. Mnih, K. Kavukcuoglu, D. Silver et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.

[17] J. Yin, D. Chen, T. Tang, L. Zhu, and W. Zhu, "Balise arrangement optimization for train station parking via expert knowledge and genetic algorithm," Applied Mathematical Modelling, vol. 40, no. 19-20, pp. 8513–8529, 2016.

[18] J. Yin, T. Tang, L. Yang, Z. Gao, and B. Ran, "Energy-efficient metro train rescheduling with uncertain time-variant passenger demands: an approximate dynamic programming approach," Transportation Research Part B: Methodological, vol. 91, pp. 178–210, 2016.



Parameters are set up in the input module according tothe field data The input parameters include initial speed 1198810

4 Journal of Advanced Transportation

TSP system

Parking algorithm

Current reward R(MC)

Action vector A(NLs FLs

VLs GDLsNDLs)

StateS1 S2S3 S4

S5

Figure 3 Diagram of state reward and action vector

when the train starts braking the length of the parking area 119871the number of balises 119899 the distance between each balise 119863resistance effect white noise type of algorithm disturbanceand number of iterations These parameters provide the ini-tialized circumstance and related configuration informationWe can change the simulation circumstance by changing theparameters above

Themain function of the algorithm module is to generatethe target braking rate 119886 Algorithm module can calculatethe 5 different train braking rates mentioned above and thenselect the optimal or the most appropriate target braking rateaccording to the algorithms

Simulation resistance and random disturbance representthe actual urban rail transit systemThe simulation resistancegenerates the current train resistance based on the currentspeed using the formula related to the speed of the train inthe form of a quadratic equation called basic resistance unitequation

1198820 = 120572V2 + 120573V + 120593 (11)

where1198820 represents the basic resistance unit (Nt) V is speedof train (kmh) 120593 represents the part of rolling resistanceunrelated to the train speed 120573 is parameter that is used todescribe wheel flange friction resistance and 120572 is associatedwith the square of speed The white noise generator cangenerate random white noise to test the anti-interferenceability of the algorithms

The braking module is used to park the train accordingto the target braking rate 119886 The transfer function of thebraking module includes time delay time constant andtrain operating mode transition The transfer function of thebraking module is

119866119886 (119904) = 1198861 + 119879119901119904 119890minus119879119889119904 (12)

where 119886 is the target braking rate119879119889 represents the timedelayand 119879119901 is the time constant

Finally display module mainly shows the experimentalresults including parking error and target braking rate

3 Algorithm Design and Performance Indices

31 Stochastic Optimal Selection Algorithm (SOSA) In SOSA1199041 to 119904119899 correspond to current 119899 states of algorithm at themoment We assume that the reward function about currentstate is 119877 119877 can transfer each state into the real number119904119894 R997888rarr R Δ represents the distance between the runningtrain and the current nearest balise If the train passes thecurrent nearest balise the value of Δ is negative otherwise itis positive 119877(119904119894) is the relationship between the current stateand the current reward shown as

119877 (119904119894) =

+3 0119888119898 le |Δ| le 10119888119898+2 10119888119898 le |Δ| le 20119888119898+1 20119888119898 le |Δ| le 30119888119898minus3 30119888119898 le |Δ|

(119894 = 1 2 119899)(13)

When the distance Δ is less than 10cm the reward function119877(119904119894) is the largest among all the results When the distanceΔ is beyond plusmn30cm the reward function 119877(119904119894) is smallestand negative The reward function immediately indicates thereward of the train in its current state In this algorithm wedo not have to consider the long-term reward of the stateThe bigger the value of the reward function is the better theperformance of this algorithm will be

We consider the train braking rate of the controllermodule as the output action vector A When a train passesthrough a balise in the simulation the system will change thetarget braking rate to ensure an accurate parking Because ofsystem time delay and time constant the actual braking rateof the train will approach the target braking rate gradually

In this paper we adopt five braking rates They are non-learning strategy (NLs) braking rate fixed learning strategy(FLs) braking rate variable learning strategy (VLs) brakingrate gradient descent learning strategy(GDLs) braking rateand Newton descent learning strategy (NDLs) braking rateThe action vector A is a 5-element vector including thebraking rates mentioned above A=[NLs FLs VLs GDLsNDLs] shown as Figure 3 When a train passes through

Journal of Advanced Transportation 5

a balise the SOSA will choose one strategy from vectorA randomly as the current target braking rate After thetrain passes through all balises also called one episode thealgorithm records the parking error and action sequenceThesystem will start the next episode from the initial spot until119877(119904119894) is large enough or bigger than a preset value 120579

The SOSA consist of the following steps

Step 1 For all the states initialize the system parametersassume that the train enters the parking area

Step 2 After passing through each balise calculate the actionvector A and current reward 119877(119904119894) Choose an action from Arandomly as the current target braking rate

Step 3 Store the current reward in the array then repeat Step2 until the train reaches the stopping spot Store the parkingerror and action sequence Go to Step 4

Step 4 If 119877(119904119894) lt 120579 go to Step 1 or else use the current actionsequence as the train braking rate

32 Q-Learning Algorithm In QLA 1199041 to 119904119899 correspond tocurrent 119899 states of algorithm at themomentThe action vectorA=[NLs FLs VLs GDLs NDLs] Each state and action areconnected with the value of 119876(119904 119886) According to the 119876(119904 119886)the algorithm will pick an action from vector A Besides thealgorithm can also update the Q function according to the120576-greedy action selection mechanism [18]119876(119904119894 119886) function is the estimated total reward under thestate 119904119894 and action 119886 We have

119876 (119904 119886) = 119864 [119877 (1199040 119886) + 120574119877 (1199041 119886) + 1205742119877 (1199042 119886) + sdot sdot sdot+ 120574119894119877 (119904119894 119886) | 1199040 = 119904] = 119877 (119904 119886)+ 120574sum1198781015840

119875119904119886 (1199041015840)119876 (1199041015840 119886)(14)

where the discounted factor is 0 le 120574 lt 1 In this paperdiscounted factor 120574 equals 099 119904 represents the current stateand 1199041015840 is the next state when the last state is 119904 119875119904119886 is statetransition distribution probability It is the probability thatstate transfers from 119904119894 to 119904119895 using the action 119886 The trainactually runs along the railway track So no matter whataction is chosen the next state must be 119904119894+1 We decide tolet 119875119904119886(1199041015840)=1 The 119877(119904 119886) is defined the same as the 119877(119904) instochastic optimal selection algorithm

The bigger the119876(119904 119886) is the larger the immediate rewardin each balise will be In order to obtain the maximum119876 thealgorithm needs to update the Q function as shown in

120587 (119904) fl argmax119886sum1199041015840

119875119904119886 (1199041015840)119876 (1199041015840 119886) (15)

The optimal119876 can be expressed as a119876lowast It represents the totalreward according to the above updating strategy we get

119876lowast (119904 119886) = 119877 (119904 119886) + 120574sum1199041015840isin119878

119875119904119886 (1199041015840)max119886119876lowast (1199041015840 119886) (16)

Start

If it is different fromprevious iteration

End

Update Q(sa)

If Q(sa) equals Qlowast(sa)

Adopt a strategy

Calculate Q(sa)

Choose other actions inthe next iteration

Yes

No

No

Yes

Figure 4 Training flowchart of QLA

The overall training processes of QLA are presented inFigure 4 consisting of the following steps

Step 1 Initialize 119876(119904 119886)Step 2 Adopt a strategy 120587 with any action sequence Whenpassing by each balise record the sequence of state actionand immediate reward value using (13) After each episodecalculate the 119876(119904 119886) according to (14)Step 3 If the current strategy and the previous selected strat-egy are different then update the strategy using the actionselection mechanism Select an action with larger probabilitythat can make 119876(119904 119886) become the largest one like (15) Onthe contrary QL will select other actions with a smallerprobability

Step 4 Repeat the steps mentioned above when 119876(119904 119886) 997888rarr119876lowast(119904 119886) we can achieve best strategy 120587 997888rarr 120587lowast33 Fuzzy Q-Learning Algorithm FQLA is basically the sameas QLA It is an improved version of QLAWithout using thediscrete function instead the fuzzy function Q-learning usesa continuous function where the value of immediate rewardis a typical Gauss type membership function ie

119910 = 119891 (119909 120590 119888) = 119890(119909minus119888)221205902 (17)

and the fuzzy member function is shown in Figure 5The FQLA is almost the same as QLA

Step 1 Initialize the action-value function 119876(119904 119886)Step 2 Adopt a strategy 120587 with any action sequence Whenpassing by each balise record the sequence of state action

6 Journal of Advanced Transportation

6420 8 10minus4minus6minus8 minus2minus10

x

0

02

04

06

08

1

y

Figure 5 Reward as Gauss type membership function

and immediate reward value using (17) After each episodecalculate the 119876(119904 119886) according to (14)Step 3 If the current strategy and the previous selectedstrategy are different then update the strategy using theaction selection mechanism Select an action with largerprobability that can make 119876(119904 119886) become the largest onelike (15) On the contrary QL will select other actions witha smaller probability

Step 4 Repeat the steps mentioned above when 119876(119904 119886) 997888rarr119876lowast(119904 119886) we can achieve best strategy 120587 997888rarr 120587lowast34The Performance Indices To compare the performance ofthree algorithms we need to define the parking performanceindices We have the following evaluation performanceindices(1) The convergence number (119873119888) is t he number oflearning trails when the parking errors tend to be stable(2) Average errors under different speed (119864119889119904) in orderto test the merits of the parking algorithm we need to havemany simulation trials Our simulations have produced agreat deal of parking information Among them the parkingerror is one of the most important indices in the parkinginformation because it is the most direct indicator of theparking quality With n different initial speeds algorithm canget n parking error after each episode The 119864119889119904 is the absoluteaverage of n parking error as shown in the following formula

119864119889119904 = sum119899119894=1 10038161003816100381610038161205831198941003816100381610038161003816119899 (18)

where 120583119894 is parking error at different initial speeds(3) Average error (119864) 119864 is the average of 119864119889119904 as shown in

119864 = sum119898119894=1 119864119889119904119894119898 (19)

(4)The maximum error of the parking error (120583119898119886119909) themaximum error of the parking error here is an importantindicator of the performance indices 120583119898119886119909 is the maximumparking error among all the parking errors If the 120583119898119886119909 is toobig it will seriously affect the parking safety of the train wehave

120583119898119886119909 = max (1205831 1205832 120583119899) (20)

(5)The minimum of the average error (119864119898119894119899) the mini-mum error of the parking error is also an important indicator119864119898119894119899 is the minimum among all the 119864119889119904 We have

119864119898119894119899 = min (1198641198891199041 1198641198891199042 119864119889119904119898) (21)

(6) Root of mean square (120590) the 120590 reflects the parkingerror fluctuations (discrete degree) In the process of parkingalgorithm we need to assess the fluctuation which indicatesthe stability of the algorithm we have

120590 = radicsum119898119894=1 (119864 minus 119864119889119904119894)2119898 minus 1 (22)

(7) Parking probability in the required range (119875119901) 119875119901 is ameasurement of the estimated reliability This is an importantindex to control the automatic parking ability The parkingprobability will be unavailable if the 119875119901 is too small to meetthe requirement of practical use

119875119894 = (119909 minus 119864)120590 (23)

119875119901 = (1 minus 2(1 minus int119875119894minusinfin

1120590radic2120587119890minus1198752119894 2119889119875119894)) times 100 (24)

A large number of studies have given the parking errorof the train within the range [-30cm30cm] and made theparking probability requirement larger than 995 [8 10 13]For example the third line of Dalian rail transit in Chinarequires train parking accuracy to be within [-30cm30cm]and the required reliability is 995 [10] In Beijing Subwaythe allowable parking error is also 30cm with over 998reliability

4 The Experimental Results

To approach the real-world running scenaric according tothe above mathematical model we choose field data betweenthe Jiugong station and theXiaohongmen station of YizhuangLine of Beijing Subway in the train station parking simulationplatform We set that the length of the parking area is 100metersThere are six balises in this parking areaThe distancebetween the balises and the parking spot is 100m 64m 36m16m 4m and 0m respectivelyThe arrangement of the balisesmeets the actual balises

The maximum velocity (ie speed limit) for the train toenter into the area is 1428ms While there exists time delayin the train braking system we select 115ms as themaximuminitial speed of a train based on actual operating data Inthe algorithms we did not consider the parking error underone single speed By using single speed we can not fullytest the reliability and adaptability of the algorithm So weset that the initial speed varies from 9ms to 115ms andwe pick 10 different initial speeds in every 025ms speedinterval The purpose is to consider the influence under thevariable initial speed The threshold of the SOSA is [-5cm5cm] We use Gauss membership function in FQLA to

Journal of Advanced Transportation 7

5 10 15 20 25 30 35 40 45 500Trail times

0

002

004

006

008

01

012

014

016

018

02

Park

ing

erro

rs (m

)

SOSQLFQL

Figure 6 50 trials of three algorithms about 10 different initial speeds

5 10 15 20 25 30 35 40 45 500Trail times

0

002

004

006

008

01

012

014

016

018

02

Park

ing

erro

rs (m

)

SOSQLFQL

Figure 7 50 trials of three algorithms about 100 different initial speeds

substitute the current reward 119877(119904119894) in QLA After debuggingand experimenting we pick parameters 120590=01 119888=0 Figure 6presents the errors 119864119889119904 of 10 different initial speeds during 50trials As can be seen in Figure 6 the convergence iterations119873119888 are different by the three algorithms The parking error119864119889119904 is becoming smaller and smaller after times of trail andthe results also demonstrate that most of the parking errorscan be smaller than plusmn30cm in the end With respect to thecomputational efforts we note that these three algorithmsare very close because they are all based on the theory ofreinforcement learning Since each trial includes 10 times oftrain station parking process with different initial speed ittakes about 15 seconds to complete each trial and the totaltraining time is about 750 seconds

We also conduct more trials to test the variation of 119873119888when there are 100 initial speedsWhen the numbers of initial

speed reach 100 we obtain the variation of train parkingerrors as shown in Figure 7

The 119873119888 we get is larger than the 119873119888 received under 10initial speeds We can conclude from Figures 6 and 7 that themore initial speeds are set the slower convergence and largerconvergence number it will be

The performance comparison of these three algorithmcan be seen in Tables 1 and 2 It is clear that QLA andFQLA have smaller 119873119888 than SOSA This indicates that theconvergence is much faster in QLA and FQLA especiallywhen the number of initial speeds is increased

As can be seen from the figures and tables the developedthree algorithms work well on the field data between twostations in Beijing Subway which can improve most of theperformance indices especially in decreasing 119864 QLA andFQLA have smaller 119864 than SOSA The 119873119888 rate of FQLA is

8 Journal of Advanced Transportation

95 10 105 11 1159Initial speeds V0 (ms)

minus025minus02

minus015minus01

minus0050

00501

01502

025

Park

ing

erro

rs (m

)

FQLSOSQL

Figure 8 Each parking error after 50 trials about different initial speeds

Table 1 Performance indices of three algorithms with 10 initialspeeds

Indicators 119873119862 119864(119888119898) 119864119898119894119899(119888119898) 120590(119888119898) 119875119901SOSA 39 427 199 334 9999QLA 20 424 186 432 9999FQLA 18 399 183 395 9999

Table 2 Performance indices of three algorithms with 100 initialspeeds

Indicators 119873119862 119864(119888119898) 119864119898119894119899(119888119898) 120590(119888119898) 119875119901SOSA 49 418 158 367 9999QLA 24 387 140 414 9999FQLA 21 346 131 365 9999

about 19 times and it has the fastest convergence among thethree algorithms FQLA has the smallest 119864 and 119864119898119894119899 QLAachieves the largest 120590When it comes to119875119901 the119875119901 of the threealgorithms is almost the same and they are all larger than995 We can obtain the following conclusions (1) thesethree algorithms can adapt to the field data which have goodperformance indices and gradually reduce the parking error(2) in general FQLA ranks the first and SOSA ranks the thirdSOSA has the worst119873119888 119864 and 119864119898119894119899 (3) three algorithms canall meet the requirements of accurate parking and are easy tobe implemented However the intelligence exploration andself-learning of SOSA are much worse than those of QLA andFQLA

In addition we also test the performance of algorithmswith different initial speed after 50 times of trails Figure 8presents the parking errors under each different initial speed

In Figure 8 we see that 119864119889119904 is within the range of plusmn30cmand the errors of QLA and FQLA are closer to 0cm than thatof SOSAThe results also show us that the error fluctuation of

FQLA performs better than SOSA and QLA We can almostcome to similar conclusions from Tables 1 and 2 FQLAperforms the best among the three algorithms indicating thatadding fuzzy functions into Q-learning can further improveits performance for TSP In addition it is worth to mentionthat [13] proposed several online learning algorithms for trainstation parking problem The train parking errors by theseonline learning algorithms are about 5cm to 6cm We seefrom the above results that our developed algorithms canfurther reduce the parking errors compared with existingstudies

5 Conclusions

To address the TSP issue in urban rail transit this paperdeveloped one stochastic algorithm and two machine learn-ing algorithms based on theory of reinforcement learningFurthermore we propose five braking rates as the actionvector of the three algorithms within the scope of RL Perfor-mance indices were defined to evaluate the performance andreliability of the issue We evaluated the proposed algorithmsin the simulation platform with field data collected in BeijingSubway Our results show that these three algorithms canachieve good results that guarantee the train station parkingerror within 30 cm In addition fuzzy Q-learning algorithmperforms the best among these algorithms due to its fastand stable convergence In practical applications the railmanagers could use the simulation platform to optimize thevalue function with the aid of our developed algorithms andthen use the well-trained Q functions to generate the optimaltrain parking actions in real time to reduce the parking errorsof trains

In the future we will explore new algorithms and othermethods based on soft computing and machine learning tofurther increase the accuracy and reliability for the train park-ing In particular we will combine reinforcement learning

Journal of Advanced Transportation 9

with deep learning algorithms to further enhance the flexi-bility and accuracy for TSP Meanwhile there still exist manypractical issues which need to be taken into consideration inthis paper For example we only consider the TSP problemin urban rail transit while the train braking system is muchmore complex in high-speed railways Compared with urbanrail transit networks inwhich each line is a relatively enclosedsystem that is independent of each other and each trainmoveslike a ldquoshuttlerdquo on the fixed tracks the high-speed railwaynetworks are relatively open systems with heterogeneoustrains The trains usually have different circulation plansand each train may travel among different railway linesaccording to its circulation planTherefore the design of TSPalgorithms for these open systems would be definitely muchmore complex than that of urban transit systems in order tobe flexible with various external circumstances We will fulfillthese issues in our future research

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

Theauthors claim that there are no conflicts of interest in thispaper

Acknowledgments

This research was supported by the National Natural ScienceFoundation of China through Grants 61790570 61790573and U1734210 by the Key Laboratory of Intelligent Metro ofUniversities in Fujian through Grant 2018IMUF001 by theBeijing Jiaotong University Education Foundation throughGrant 9907006519 and by the China Postdoctoral ScienceFoundation Funded Project through Grant 2019M650474

References

[1] J Yin T Tang L Yang J Xun Y Huang and Z Gao ldquoResearchand development of automatic train operation for railwaytransportation systems a surveyrdquo Transportation Research PartC Emerging Technologies vol 85 pp 548ndash572 2017

[2] E Coscetta and A Carteni ldquoThe hedonic value of railwaysterminals a quantitative analysis of the impact of stationsquality on travelers behaviorrdquo Trasportation Research Part Avol 61 pp 41ndash52 2014

[3] E Ben-Elia and D Ettema ldquoRewarding rush-hour avoidance astudy of commutersrsquo travel behaviorrdquo Transportation ResearchPart A Policy and Practice vol 45 no 7 pp 567ndash582 2011

[4] J Yin L Yang X Zhou T Tang and Z Gao ldquoBalancing aone-way corridor capacity and safety-oriented reliability astochastic optimization approach for metro train timetablingrdquoNaval Research Logistics (NRL) vol 66 no 4 pp 297ndash320 2019

[5] K Yoshimoto K Kataoka andKKomaya ldquoA feasibility study oftrain automatic stop control using range sensorsrdquo in Proceedingsof the 2001 IEEE Intelligent Transportation Systems pp 802ndash807USA August 2001

[6] S Yasunobu S Miyamoto and H Ihara ldquoA fuzzy control fortrain automatic stop controlrdquo Transactions of the Society ofInstrument and Control Engineers vol 2 no 1 pp 1ndash8 2002

[7] S Yasunobu and S Miyamoto ldquoAutomatic train operationsystem by predictive fuzzy controlrdquo Industrial Applications ofFuzzy Control vol 1 no 18 pp 1ndash18 1985

[8] Z Hou Y Wang C Yin and T Tang ldquoTerminal iterative learn-ing control based station stop control of a trainrdquo InternationalJournal of Control vol 84 no 7 pp 1263ndash1274 2011

[9] J Zhou and D Chen ldquoApplications of machine learning meth-ods in problem of precise train stoppingrdquoComputer Engineeringand applications vol 46 no 25 pp 226ndash230 2010

[10] D Chen and C Gao ldquoSoft computing methods applied to trainstation parking in urban rail transitrdquo Applied Soft Computingvol 12 no 2 pp 759ndash767 2012

[11] GHeResearch on Train Precision StopControl AlgorithmBasedon LQR Beijing Jiaotong University Beijing China 2009

[12] R Chen and D Chen ldquoHeuristic self-learning algorithms andsimulation of train station stoprdquo Railway Computer Applicationvol 22 no 6 pp 9ndash13 2013

[13] D Chen R Chen Y Li and T Tang ldquoOnline learning algo-rithms for train automatic stop control using precise locationdata of balisesrdquo IEEE Transactions on Intelligent TransportationSystems vol 14 no 3 pp 1526ndash1535 2013

[14] J YinD Chen andL Li ldquoIntelligent train operation algorithmsfor subway by expert system and reinforcement learningrdquo IEEETransactions on Intelligent Transportation Systems vol 15 no 6pp 2561ndash2571 2014

[15] J Liu ldquoAnalysis of station dwelling error and adjustmentsolution for operation after screen doorrsquos installationrdquo UrbanMass Transit vol 1 pp 44ndash47 2009

[16] V Mnih K Kavukcuoglu D Silver et al ldquoHuman-level controlthrough deep reinforcement learningrdquo Nature vol 518 no7540 pp 529ndash533 2015

[17] J Yin D Chen T Tang L Zhu and W Zhu ldquoBalise arrange-ment optimization for train station parking via expert knowl-edge and genetic algorithmrdquo Applied Mathematical Modellingvol 40 no 19-20 pp 8513ndash8529 2016

[18] J Yin T Tang L Yang Z Gao and B Ran ldquoEnergy-efficientmetro train reschedulingwith uncertain time-variant passengerdemands an approximate dynamic programming approachrdquoTransportation Research Part B Methodological vol 91 pp 178ndash210 2016

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 4: Developing Train Station Parking Algorithms: New Frameworks …downloads.hindawi.com/journals/jat/2019/3072495.pdf · 2019. 8. 4. · New Frameworks Based on Fuzzy Reinforcement Learning

4 Journal of Advanced Transportation

TSP system

Parking algorithm

Current reward R(MC)

Action vector A(NLs FLs

VLs GDLsNDLs)

StateS1 S2S3 S4

S5

Figure 3 Diagram of state reward and action vector

when the train starts braking the length of the parking area 119871the number of balises 119899 the distance between each balise 119863resistance effect white noise type of algorithm disturbanceand number of iterations These parameters provide the ini-tialized circumstance and related configuration informationWe can change the simulation circumstance by changing theparameters above

Themain function of the algorithm module is to generatethe target braking rate 119886 Algorithm module can calculatethe 5 different train braking rates mentioned above and thenselect the optimal or the most appropriate target braking rateaccording to the algorithms

Simulation resistance and random disturbance representthe actual urban rail transit systemThe simulation resistancegenerates the current train resistance based on the currentspeed using the formula related to the speed of the train inthe form of a quadratic equation called basic resistance unitequation

1198820 = 120572V2 + 120573V + 120593 (11)

where1198820 represents the basic resistance unit (Nt) V is speedof train (kmh) 120593 represents the part of rolling resistanceunrelated to the train speed 120573 is parameter that is used todescribe wheel flange friction resistance and 120572 is associatedwith the square of speed The white noise generator cangenerate random white noise to test the anti-interferenceability of the algorithms

The braking module is used to park the train accordingto the target braking rate 119886 The transfer function of thebraking module includes time delay time constant andtrain operating mode transition The transfer function of thebraking module is

$$G_a(s) = \frac{a}{1 + T_p s}\, e^{-T_d s} \tag{12}$$

where $a$ is the target braking rate, $T_d$ represents the time delay, and $T_p$ is the time constant.
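To make the dynamics of (12) concrete, the following minimal sketch simulates the braking module in discrete time: a pure dead time $T_d$ followed by a first-order lag with time constant $T_p$. The numerical values of $T_p$, $T_d$, and the step size are assumptions for illustration only.

```python
from collections import deque

class BrakingModule:
    """Discrete-time sketch of the transfer function (12): the actual
    braking rate tracks the delayed target rate with time constant Tp,
    after a pure dead time Td. Parameter values are illustrative."""

    def __init__(self, Tp=0.6, Td=0.4, dt=0.05):
        self.Tp, self.dt = Tp, dt
        n_delay = max(1, int(round(Td / dt)))
        self.buffer = deque([0.0] * n_delay)  # models the factor e^(-Td*s)
        self.actual_rate = 0.0

    def step(self, target_rate):
        self.buffer.append(target_rate)
        delayed = self.buffer.popleft()  # target issued Td seconds ago
        # Forward-Euler step of the lag 1/(1 + Tp*s):
        # d(actual)/dt = (delayed - actual) / Tp
        self.actual_rate += self.dt * (delayed - self.actual_rate) / self.Tp
        return self.actual_rate
```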

Finally, the display module shows the experimental results, including the parking error and the target braking rate.

3. Algorithm Design and Performance Indices

3.1. Stochastic Optimal Selection Algorithm (SOSA). In SOSA, $s_1$ to $s_n$ correspond to the current $n$ states of the algorithm at the moment. We assume that the reward function for the current state is $R$, which maps each state $s_i$ to a real number, $R: S \rightarrow \mathbb{R}$. $\Delta$ represents the distance between the running train and the current nearest balise; if the train has passed the current nearest balise, $\Delta$ is negative, otherwise it is positive. $R(s_i)$, the relationship between the current state and the current reward, is given by

$$R(s_i) = \begin{cases} +3, & 0\,\text{cm} \le |\Delta| \le 10\,\text{cm} \\ +2, & 10\,\text{cm} < |\Delta| \le 20\,\text{cm} \\ +1, & 20\,\text{cm} < |\Delta| \le 30\,\text{cm} \\ -3, & 30\,\text{cm} < |\Delta| \end{cases} \qquad (i = 1, 2, \ldots, n) \tag{13}$$

When the distance $|\Delta|$ is less than 10 cm, the reward $R(s_i)$ is the largest among all the cases; when $|\Delta|$ is beyond $\pm 30$ cm, $R(s_i)$ is smallest and negative. The reward function immediately indicates the reward of the train in its current state; in this algorithm we do not have to consider the long-term reward of the state. The bigger the value of the reward function, the better the performance of the algorithm.
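The piecewise reward (13) translates directly into code; a minimal sketch:

```python
def current_reward(delta_cm):
    """Immediate reward R(s_i) of Eq. (13); delta_cm is the signed
    distance (in cm) between the train and the current nearest balise."""
    d = abs(delta_cm)
    if d <= 10:
        return 3
    if d <= 20:
        return 2
    if d <= 30:
        return 1
    return -3
```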

We consider the train braking rate of the controller module as the output action vector $A$. When a train passes a balise in the simulation, the system changes the target braking rate to ensure accurate parking. Because of the system time delay and time constant, the actual braking rate of the train approaches the target braking rate gradually.

In this paper we adopt five braking rates: the non-learning strategy (NLs) braking rate, the fixed learning strategy (FLs) braking rate, the variable learning strategy (VLs) braking rate, the gradient descent learning strategy (GDLs) braking rate, and the Newton descent learning strategy (NDLs) braking rate. The action vector $A$ is a 5-element vector comprising the braking rates above, $A = [\text{NLs}, \text{FLs}, \text{VLs}, \text{GDLs}, \text{NDLs}]$, as shown in Figure 3.


When a train passes through a balise, the SOSA chooses one strategy from vector $A$ at random as the current target braking rate. After the train passes through all balises (one episode), the algorithm records the parking error and the action sequence. The system starts the next episode from the initial spot, until $R(s_i)$ is large enough, i.e., bigger than a preset value $\theta$.

The SOSA consists of the following steps.

Step 1. For all the states, initialize the system parameters; assume that the train enters the parking area.

Step 2. After passing each balise, calculate the action vector $A$ and the current reward $R(s_i)$. Choose an action from $A$ at random as the current target braking rate.

Step 3. Store the current reward in the array, then repeat Step 2 until the train reaches the stopping spot. Store the parking error and the action sequence. Go to Step 4.

Step 4. If $R(s_i) < \theta$, go to Step 1; otherwise use the current action sequence as the train braking rate.
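A minimal sketch of this loop is given below. The hook `run_episode` is hypothetical: it stands for the simulation platform, calls the supplied `policy()` once per balise, and returns the final reward $R(s_i)$ of the episode.

```python
import random

ACTIONS = ["NLs", "FLs", "VLs", "GDLs", "NDLs"]  # the five braking strategies

def run_sosa(run_episode, theta=3, max_episodes=1000):
    """SOSA Steps 1-4: run random-choice episodes until the final
    reward reaches the preset threshold theta."""
    best_actions = []
    for _ in range(max_episodes):
        actions = []

        def policy():
            a = random.choice(ACTIONS)  # Step 2: pick a strategy at random
            actions.append(a)
            return a

        final_reward = run_episode(policy)  # Steps 2-3: one full episode
        best_actions = actions
        if final_reward >= theta:           # Step 4: threshold reached
            break
    return best_actions
```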

3.2. Q-Learning Algorithm. In QLA, $s_1$ to $s_n$ correspond to the current $n$ states of the algorithm at the moment, and the action vector is $A = [\text{NLs}, \text{FLs}, \text{VLs}, \text{GDLs}, \text{NDLs}]$. Each state-action pair is associated with a value $Q(s, a)$. According to $Q(s, a)$, the algorithm picks an action from vector $A$; it also updates the Q function according to the $\varepsilon$-greedy action selection mechanism [18]. The function $Q(s_i, a)$ is the estimated total reward under the state $s_i$ and action $a$. We have

$$Q(s, a) = E\left[ R(s_0, a) + \gamma R(s_1, a) + \gamma^2 R(s_2, a) + \cdots + \gamma^i R(s_i, a) \mid s_0 = s \right] = R(s, a) + \gamma \sum_{s' \in S} P_{sa}(s')\, Q(s', a) \tag{14}$$

where the discount factor satisfies $0 \le \gamma < 1$; in this paper $\gamma = 0.99$. $s$ represents the current state and $s'$ is the next state after $s$. $P_{sa}$ is the state transition probability, i.e., the probability that the state transfers from $s_i$ to $s_j$ under action $a$. Since the train actually runs along the railway track, no matter what action is chosen, the next state must be $s_{i+1}$; we therefore let $P_{sa}(s') = 1$. $R(s, a)$ is defined the same as $R(s)$ in the stochastic optimal selection algorithm.

The bigger $Q(s, a)$ is, the larger the immediate reward at each balise will be. To obtain the maximum $Q$, the algorithm updates the strategy as shown in

$$\pi(s) := \arg\max_a \sum_{s'} P_{sa}(s')\, Q(s', a) \tag{15}$$

The optimal $Q$ is denoted $Q^*$. It represents the total reward under the above updating strategy; we get

$$Q^*(s, a) = R(s, a) + \gamma \sum_{s' \in S} P_{sa}(s') \max_a Q^*(s', a) \tag{16}$$
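Because $P_{sa}(s') = 1$, the backup in (16) collapses to a single term, which makes a tabular sketch straightforward. The exploration rate below is an assumed value; the paper only states that an $\varepsilon$-greedy mechanism is used.

```python
import random
from collections import defaultdict

GAMMA = 0.99    # discount factor used in the paper
EPSILON = 0.1   # assumed exploration rate for the epsilon-greedy mechanism

Q = defaultdict(float)  # tabular Q(s, a), keyed by (state, action)

def select_action(state, actions, eps=EPSILON):
    """Epsilon-greedy: exploit the best known action with probability
    1 - eps, otherwise explore a random one."""
    if random.random() < eps:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update_q(state, action, reward, next_state, actions):
    """Backup toward Q* of Eq. (16); the train always moves on to the
    next balise, so the sum over s' collapses to one term."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] = reward + GAMMA * best_next
```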

[Figure 4: Training flowchart of QLA. Starting from an adopted strategy, the algorithm calculates $Q(s,a)$, updates it if it differs from the previous iteration, chooses other actions in the next iteration, and ends once $Q(s,a)$ equals $Q^*(s,a)$.]

The overall training process of QLA is presented in Figure 4, consisting of the following steps.

Step 1. Initialize $Q(s, a)$.

Step 2. Adopt a strategy $\pi$ with any action sequence. When passing each balise, record the sequence of state, action, and immediate reward value using (13). After each episode, calculate $Q(s, a)$ according to (14).

Step 3. If the current strategy and the previously selected strategy differ, update the strategy using the action selection mechanism: select, with larger probability, an action that makes $Q(s, a)$ the largest, as in (15); otherwise, QLA selects other actions with a smaller probability.

Step 4. Repeat the steps above; when $Q(s, a) \rightarrow Q^*(s, a)$, we achieve the best strategy $\pi \rightarrow \pi^*$.

3.3. Fuzzy Q-Learning Algorithm. FQLA is basically the same as QLA; it is an improved version of QLA. Instead of the discrete reward function, fuzzy Q-learning uses a continuous one, where the value of the immediate reward is a typical Gauss-type membership function, i.e.,

$$y = f(x; \sigma, c) = e^{-(x - c)^2 / 2\sigma^2} \tag{17}$$

and the fuzzy membership function is shown in Figure 5. The steps of FQLA are almost the same as those of QLA; a sketch of the Gaussian reward follows.
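A minimal sketch of this continuous reward, using the $\sigma = 0.1$ and $c = 0$ reported in Section 4:

```python
import math

def fuzzy_reward(x, sigma=0.1, c=0.0):
    """Gauss-type membership reward of Eq. (17):
    y = exp(-(x - c)**2 / (2 * sigma**2)), where x is the distance
    between the train and the current nearest balise."""
    return math.exp(-((x - c) ** 2) / (2.0 * sigma ** 2))
```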

Step 1. Initialize the action-value function $Q(s, a)$.


[Figure 5: Reward as a Gauss-type membership function, $y$ versus $x$, with $y$ ranging from 0 to 1 and $x$ from −10 to 10.]

Step 2. Adopt a strategy $\pi$ with any action sequence. When passing each balise, record the sequence of state, action, and immediate reward value using (17). After each episode, calculate $Q(s, a)$ according to (14).

Step 3. If the current strategy and the previously selected strategy differ, update the strategy using the action selection mechanism: select, with larger probability, an action that makes $Q(s, a)$ the largest, as in (15); otherwise, QLA selects other actions with a smaller probability.

Step 4. Repeat the steps above; when $Q(s, a) \rightarrow Q^*(s, a)$, we achieve the best strategy $\pi \rightarrow \pi^*$.

3.4. The Performance Indices. To compare the performance of the three algorithms, we define the following evaluation performance indices.

(1) The convergence number ($N_c$): the number of learning trials after which the parking errors tend to be stable.

(2) Average error under different speeds ($E_{ds}$): to test the merits of a parking algorithm we need many simulation trials, which produce a great deal of parking information. Among this information, the parking error is one of the most important indices, because it is the most direct indicator of parking quality. With $n$ different initial speeds, an algorithm obtains $n$ parking errors after each episode. $E_{ds}$ is the average of the absolute values of these $n$ parking errors, as shown in the following formula:

$$E_{ds} = \frac{\sum_{i=1}^{n} |\mu_i|}{n} \tag{18}$$

where $\mu_i$ is the parking error at the $i$-th initial speed.

(3) Average error ($E$): $E$ is the average of the $E_{ds}$ values, as shown in

$$E = \frac{\sum_{i=1}^{m} E_{ds,i}}{m} \tag{19}$$

(4) The maximum parking error ($\mu_{max}$): the maximum parking error is an important indicator among the performance indices. $\mu_{max}$ is the largest of all the parking errors; if $\mu_{max}$ is too big, it will seriously affect the parking safety of the train. We have

$$\mu_{max} = \max(\mu_1, \mu_2, \ldots, \mu_n) \tag{20}$$

(5) The minimum of the average error ($E_{min}$): the minimum average error is also an important indicator. $E_{min}$ is the minimum among all the $E_{ds}$ values. We have

$$E_{min} = \min(E_{ds,1}, E_{ds,2}, \ldots, E_{ds,m}) \tag{21}$$

(6) Root mean square ($\sigma$): $\sigma$ reflects the fluctuation (degree of dispersion) of the parking errors. In the parking process we need to assess this fluctuation, which indicates the stability of the algorithm. We have

$$\sigma = \sqrt{\frac{\sum_{i=1}^{m} (E - E_{ds,i})^2}{m - 1}} \tag{22}$$

(7) Parking probability in the required range ($P_p$): $P_p$ is a measure of the estimated reliability and an important index of automatic parking ability. The algorithm will be unusable in practice if $P_p$ is too small to meet the requirement of practical use:

$$P_i = \frac{x - E}{\sigma} \tag{23}$$

$$P_p = \left(1 - 2\left(1 - \int_{-\infty}^{P_i} \frac{1}{\sqrt{2\pi}}\, e^{-P_i^2/2}\, dP_i\right)\right) \times 100\% \tag{24}$$

where $x$ is the boundary of the required parking error range (e.g., 30 cm).
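A compact sketch of indices (18), (19), and (22)–(24); it assumes the parking errors are grouped per trial and that $x$ is the 30 cm boundary of the required range.

```python
import math
from statistics import NormalDist

def performance_indices(errors_by_trial, allowed_cm=30.0):
    """errors_by_trial: list of m trials, each a list of the n signed
    parking errors (cm) obtained at the n initial speeds."""
    eds = [sum(abs(mu) for mu in trial) / len(trial)
           for trial in errors_by_trial]                         # Eq. (18)
    m = len(eds)
    E = sum(eds) / m                                             # Eq. (19)
    sigma = math.sqrt(sum((E - e) ** 2 for e in eds) / (m - 1))  # Eq. (22)
    Pi = (allowed_cm - E) / sigma                                # Eq. (23)
    Pp = (1.0 - 2.0 * (1.0 - NormalDist().cdf(Pi))) * 100.0      # Eq. (24)
    return E, sigma, Pp
```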

Many studies take the required range of the train parking error as $[-30\,\text{cm}, 30\,\text{cm}]$ and require the parking probability to be larger than 99.5% [8, 10, 13]. For example, Line 3 of the Dalian rail transit in China requires the train parking accuracy to be within $[-30\,\text{cm}, 30\,\text{cm}]$ with a required reliability of 99.5% [10]. In the Beijing Subway, the allowable parking error is also 30 cm, with over 99.8% reliability.

4. The Experimental Results

To approach the real-world running scenario, according to the above mathematical model, we use field data between the Jiugong station and the Xiaohongmen station of the Yizhuang Line of the Beijing Subway in the train station parking simulation platform. The length of the parking area is set to 100 meters, with six balises in this area. The distances between the balises and the parking spot are 100 m, 64 m, 36 m, 16 m, 4 m, and 0 m, respectively; this arrangement matches the actual balise layout.
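For concreteness, the experimental layout can be recorded as follows; the names are illustrative, not part of the platform's interface.

```python
# Distances (m) from each of the six balises to the stopping spot.
BALISE_DISTANCES_M = [100, 64, 36, 16, 4, 0]

def initial_speed_grid(v_min=9.0, v_max=11.5, step=0.25):
    """Candidate initial speeds from 9 m/s to 11.5 m/s in 0.25 m/s steps."""
    n = int(round((v_max - v_min) / step)) + 1
    return [round(v_min + i * step, 2) for i in range(n)]
```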

The maximum velocity (i.e., speed limit) for the train to enter the area is 14.28 m/s. Since there exists a time delay in the train braking system, we select 11.5 m/s as the maximum initial speed of a train based on actual operating data. The algorithms do not consider the parking error under one single speed, because a single speed cannot fully test the reliability and adaptability of an algorithm. We therefore let the initial speed vary from 9 m/s to 11.5 m/s and pick 10 different initial speeds at 0.25 m/s intervals, in order to consider the influence of variable initial speeds. The threshold of the SOSA is $[-5\,\text{cm}, 5\,\text{cm}]$.


[Figure 6: 50 trials of the three algorithms (SOS, QL, FQL) with 10 different initial speeds; parking errors (m) versus trial number.]

[Figure 7: 50 trials of the three algorithms (SOS, QL, FQL) with 100 different initial speeds; parking errors (m) versus trial number.]

We use the Gauss membership function in FQLA to substitute for the current reward $R(s_i)$ in QLA; after debugging and experimenting, we pick the parameters $\sigma = 0.1$ and $c = 0$. Figure 6 presents the errors $E_{ds}$ for 10 different initial speeds during 50 trials. As can be seen in Figure 6, the convergence iterations $N_c$ differ among the three algorithms. The parking error $E_{ds}$ becomes smaller and smaller over the trials, and the results also demonstrate that most of the parking errors fall within $\pm 30$ cm in the end. With respect to the computational effort, the three algorithms are very close, because they are all based on the theory of reinforcement learning. Since each trial includes 10 train station parking processes with different initial speeds, it takes about 15 seconds to complete each trial, and the total training time is about 750 seconds.

We also conduct more trials to test the variation of $N_c$ when there are 100 initial speeds. When the number of initial speeds reaches 100, we obtain the variation of train parking errors shown in Figure 7.

The $N_c$ we obtain is larger than the $N_c$ under 10 initial speeds. We can conclude from Figures 6 and 7 that the more initial speeds are set, the slower the convergence and the larger the convergence number.

The performance comparison of these three algorithms can be seen in Tables 1 and 2. It is clear that QLA and FQLA have smaller $N_c$ than SOSA, which indicates that convergence is much faster in QLA and FQLA, especially when the number of initial speeds is increased.

As can be seen from the figures and tables, the three developed algorithms work well on the field data between the two stations of the Beijing Subway and improve most of the performance indices, especially by decreasing $E$: QLA and FQLA have smaller $E$ than SOSA.


[Figure 8: Parking error of each algorithm (FQL, SOS, QL) after 50 trials, versus initial speed $V_0$ from 9 to 11.5 m/s; parking errors (m) lie within about ±0.25 m.]

Table 1: Performance indices of the three algorithms with 10 initial speeds.

Indicators   $N_C$   $E$ (cm)   $E_{min}$ (cm)   $\sigma$ (cm)   $P_p$ (%)
SOSA         39      4.27       1.99             3.34            99.99
QLA          20      4.24       1.86             4.32            99.99
FQLA         18      3.99       1.83             3.95            99.99

Table 2: Performance indices of the three algorithms with 100 initial speeds.

Indicators   $N_C$   $E$ (cm)   $E_{min}$ (cm)   $\sigma$ (cm)   $P_p$ (%)
SOSA         49      4.18       1.58             3.67            99.99
QLA          24      3.87       1.40             4.14            99.99
FQLA         21      3.46       1.31             3.65            99.99

The convergence of FQLA is about 1.9 times faster, making it the fastest among the three algorithms. FQLA has the smallest $E$ and $E_{min}$, while QLA shows the largest $\sigma$. As for $P_p$, the three algorithms are almost the same, all larger than 99.5%. We can draw the following conclusions: (1) the three algorithms adapt well to the field data, achieve good performance indices, and gradually reduce the parking error; (2) overall, FQLA ranks first and SOSA ranks third, with SOSA having the worst $N_c$, $E$, and $E_{min}$; (3) all three algorithms meet the requirements of accurate parking and are easy to implement, although the intelligent exploration and self-learning of SOSA are much weaker than those of QLA and FQLA.

In addition, we test the performance of the algorithms at different initial speeds after 50 trials. Figure 8 presents the parking error at each initial speed.

In Figure 8 we see that $E_{ds}$ stays within $\pm 30$ cm, and the errors of QLA and FQLA are closer to 0 cm than those of SOSA. The results also show that the error fluctuation of FQLA is smaller than that of SOSA and QLA. We come to conclusions similar to those from Tables 1 and 2: FQLA performs best among the three algorithms, indicating that adding fuzzy functions to Q-learning can further improve its performance for TSP. It is also worth mentioning that [13] proposed several online learning algorithms for the train station parking problem, with parking errors of about 5 cm to 6 cm; the above results show that our algorithms further reduce the parking errors compared with existing studies.

5. Conclusions

To address the TSP issue in urban rail transit, this paper developed one stochastic algorithm and two machine learning algorithms based on the theory of reinforcement learning. Furthermore, we proposed five braking rates as the action vector of the three algorithms within the scope of RL, and defined performance indices to evaluate their performance and reliability. We evaluated the proposed algorithms on a simulation platform with field data collected from the Beijing Subway. Our results show that the three algorithms achieve good results and keep the train station parking error within 30 cm. In addition, the fuzzy Q-learning algorithm performs best, owing to its fast and stable convergence. In practical applications, rail managers could use the simulation platform to optimize the value function with the aid of our algorithms, and then use the well-trained Q functions to generate optimal train parking actions in real time to reduce the parking errors of trains.

In the future we will explore new algorithms and other methods based on soft computing and machine learning to further increase the accuracy and reliability of train parking. In particular, we will combine reinforcement learning with deep learning algorithms to further enhance the flexibility and accuracy of TSP. Meanwhile, many practical issues remain to be considered. For example, we only address the TSP problem in urban rail transit, whereas the train braking system is much more complex in high-speed railways. Compared with urban rail transit networks, in which each line is a relatively enclosed system independent of the others and each train moves like a "shuttle" on fixed tracks, high-speed railway networks are relatively open systems with heterogeneous trains. The trains usually have different circulation plans, and each train may travel among different railway lines according to its plan. Therefore, the design of TSP algorithms for these open systems, which must be flexible with respect to various external circumstances, would be considerably more complex than for urban transit systems. We will address these issues in our future research.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research was supported by the National Natural Science Foundation of China through Grants 61790570, 61790573, and U1734210; by the Key Laboratory of Intelligent Metro of Universities in Fujian through Grant 2018IMUF001; by the Beijing Jiaotong University Education Foundation through Grant 9907006519; and by the China Postdoctoral Science Foundation Funded Project through Grant 2019M650474.

References

[1] J. Yin, T. Tang, L. Yang, J. Xun, Y. Huang, and Z. Gao, "Research and development of automatic train operation for railway transportation systems: a survey," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 548–572, 2017.

[2] E. Coscetta and A. Carteni, "The hedonic value of railways terminals: a quantitative analysis of the impact of stations quality on travelers behavior," Transportation Research Part A, vol. 61, pp. 41–52, 2014.

[3] E. Ben-Elia and D. Ettema, "Rewarding rush-hour avoidance: a study of commuters' travel behavior," Transportation Research Part A: Policy and Practice, vol. 45, no. 7, pp. 567–582, 2011.

[4] J. Yin, L. Yang, X. Zhou, T. Tang, and Z. Gao, "Balancing a one-way corridor capacity and safety-oriented reliability: a stochastic optimization approach for metro train timetabling," Naval Research Logistics (NRL), vol. 66, no. 4, pp. 297–320, 2019.

[5] K. Yoshimoto, K. Kataoka, and K. Komaya, "A feasibility study of train automatic stop control using range sensors," in Proceedings of the 2001 IEEE Intelligent Transportation Systems, pp. 802–807, USA, August 2001.

[6] S. Yasunobu, S. Miyamoto, and H. Ihara, "A fuzzy control for train automatic stop control," Transactions of the Society of Instrument and Control Engineers, vol. 2, no. 1, pp. 1–8, 2002.

[7] S. Yasunobu and S. Miyamoto, "Automatic train operation system by predictive fuzzy control," Industrial Applications of Fuzzy Control, vol. 1, no. 18, pp. 1–18, 1985.

[8] Z. Hou, Y. Wang, C. Yin, and T. Tang, "Terminal iterative learning control based station stop control of a train," International Journal of Control, vol. 84, no. 7, pp. 1263–1274, 2011.

[9] J. Zhou and D. Chen, "Applications of machine learning methods in problem of precise train stopping," Computer Engineering and Applications, vol. 46, no. 25, pp. 226–230, 2010.

[10] D. Chen and C. Gao, "Soft computing methods applied to train station parking in urban rail transit," Applied Soft Computing, vol. 12, no. 2, pp. 759–767, 2012.

[11] G. He, Research on Train Precision Stop Control Algorithm Based on LQR, Beijing Jiaotong University, Beijing, China, 2009.

[12] R. Chen and D. Chen, "Heuristic self-learning algorithms and simulation of train station stop," Railway Computer Application, vol. 22, no. 6, pp. 9–13, 2013.

[13] D. Chen, R. Chen, Y. Li, and T. Tang, "Online learning algorithms for train automatic stop control using precise location data of balises," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1526–1535, 2013.

[14] J. Yin, D. Chen, and L. Li, "Intelligent train operation algorithms for subway by expert system and reinforcement learning," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 6, pp. 2561–2571, 2014.

[15] J. Liu, "Analysis of station dwelling error and adjustment solution for operation after screen door's installation," Urban Mass Transit, vol. 1, pp. 44–47, 2009.

[16] V. Mnih, K. Kavukcuoglu, D. Silver et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.

[17] J. Yin, D. Chen, T. Tang, L. Zhu, and W. Zhu, "Balise arrangement optimization for train station parking via expert knowledge and genetic algorithm," Applied Mathematical Modelling, vol. 40, no. 19-20, pp. 8513–8529, 2016.

[18] J. Yin, T. Tang, L. Yang, Z. Gao, and B. Ran, "Energy-efficient metro train rescheduling with uncertain time-variant passenger demands: an approximate dynamic programming approach," Transportation Research Part B: Methodological, vol. 91, pp. 178–210, 2016.

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 5: Developing Train Station Parking Algorithms: New Frameworks …downloads.hindawi.com/journals/jat/2019/3072495.pdf · 2019. 8. 4. · New Frameworks Based on Fuzzy Reinforcement Learning

Journal of Advanced Transportation 5

a balise the SOSA will choose one strategy from vectorA randomly as the current target braking rate After thetrain passes through all balises also called one episode thealgorithm records the parking error and action sequenceThesystem will start the next episode from the initial spot until119877(119904119894) is large enough or bigger than a preset value 120579

The SOSA consist of the following steps

Step 1 For all the states initialize the system parametersassume that the train enters the parking area

Step 2 After passing through each balise calculate the actionvector A and current reward 119877(119904119894) Choose an action from Arandomly as the current target braking rate

Step 3 Store the current reward in the array then repeat Step2 until the train reaches the stopping spot Store the parkingerror and action sequence Go to Step 4

Step 4 If 119877(119904119894) lt 120579 go to Step 1 or else use the current actionsequence as the train braking rate

32 Q-Learning Algorithm In QLA 1199041 to 119904119899 correspond tocurrent 119899 states of algorithm at themomentThe action vectorA=[NLs FLs VLs GDLs NDLs] Each state and action areconnected with the value of 119876(119904 119886) According to the 119876(119904 119886)the algorithm will pick an action from vector A Besides thealgorithm can also update the Q function according to the120576-greedy action selection mechanism [18]119876(119904119894 119886) function is the estimated total reward under thestate 119904119894 and action 119886 We have

119876 (119904 119886) = 119864 [119877 (1199040 119886) + 120574119877 (1199041 119886) + 1205742119877 (1199042 119886) + sdot sdot sdot+ 120574119894119877 (119904119894 119886) | 1199040 = 119904] = 119877 (119904 119886)+ 120574sum1198781015840

119875119904119886 (1199041015840)119876 (1199041015840 119886)(14)

where the discounted factor is 0 le 120574 lt 1 In this paperdiscounted factor 120574 equals 099 119904 represents the current stateand 1199041015840 is the next state when the last state is 119904 119875119904119886 is statetransition distribution probability It is the probability thatstate transfers from 119904119894 to 119904119895 using the action 119886 The trainactually runs along the railway track So no matter whataction is chosen the next state must be 119904119894+1 We decide tolet 119875119904119886(1199041015840)=1 The 119877(119904 119886) is defined the same as the 119877(119904) instochastic optimal selection algorithm

The bigger the119876(119904 119886) is the larger the immediate rewardin each balise will be In order to obtain the maximum119876 thealgorithm needs to update the Q function as shown in

120587 (119904) fl argmax119886sum1199041015840

119875119904119886 (1199041015840)119876 (1199041015840 119886) (15)

The optimal119876 can be expressed as a119876lowast It represents the totalreward according to the above updating strategy we get

119876lowast (119904 119886) = 119877 (119904 119886) + 120574sum1199041015840isin119878

119875119904119886 (1199041015840)max119886119876lowast (1199041015840 119886) (16)

Start

If it is different fromprevious iteration

End

Update Q(sa)

If Q(sa) equals Qlowast(sa)

Adopt a strategy

Calculate Q(sa)

Choose other actions inthe next iteration

Yes

No

No

Yes

Figure 4 Training flowchart of QLA

The overall training processes of QLA are presented inFigure 4 consisting of the following steps

Step 1 Initialize 119876(119904 119886)Step 2 Adopt a strategy 120587 with any action sequence Whenpassing by each balise record the sequence of state actionand immediate reward value using (13) After each episodecalculate the 119876(119904 119886) according to (14)Step 3 If the current strategy and the previous selected strat-egy are different then update the strategy using the actionselection mechanism Select an action with larger probabilitythat can make 119876(119904 119886) become the largest one like (15) Onthe contrary QL will select other actions with a smallerprobability

Step 4 Repeat the steps mentioned above when 119876(119904 119886) 997888rarr119876lowast(119904 119886) we can achieve best strategy 120587 997888rarr 120587lowast33 Fuzzy Q-Learning Algorithm FQLA is basically the sameas QLA It is an improved version of QLAWithout using thediscrete function instead the fuzzy function Q-learning usesa continuous function where the value of immediate rewardis a typical Gauss type membership function ie

119910 = 119891 (119909 120590 119888) = 119890(119909minus119888)221205902 (17)

and the fuzzy member function is shown in Figure 5The FQLA is almost the same as QLA

Step 1 Initialize the action-value function 119876(119904 119886)Step 2 Adopt a strategy 120587 with any action sequence Whenpassing by each balise record the sequence of state action

6 Journal of Advanced Transportation

6420 8 10minus4minus6minus8 minus2minus10

x

0

02

04

06

08

1

y

Figure 5 Reward as Gauss type membership function

and immediate reward value using (17) After each episodecalculate the 119876(119904 119886) according to (14)Step 3 If the current strategy and the previous selectedstrategy are different then update the strategy using theaction selection mechanism Select an action with largerprobability that can make 119876(119904 119886) become the largest onelike (15) On the contrary QL will select other actions witha smaller probability

Step 4 Repeat the steps mentioned above when 119876(119904 119886) 997888rarr119876lowast(119904 119886) we can achieve best strategy 120587 997888rarr 120587lowast34The Performance Indices To compare the performance ofthree algorithms we need to define the parking performanceindices We have the following evaluation performanceindices(1) The convergence number (119873119888) is t he number oflearning trails when the parking errors tend to be stable(2) Average errors under different speed (119864119889119904) in orderto test the merits of the parking algorithm we need to havemany simulation trials Our simulations have produced agreat deal of parking information Among them the parkingerror is one of the most important indices in the parkinginformation because it is the most direct indicator of theparking quality With n different initial speeds algorithm canget n parking error after each episode The 119864119889119904 is the absoluteaverage of n parking error as shown in the following formula

119864119889119904 = sum119899119894=1 10038161003816100381610038161205831198941003816100381610038161003816119899 (18)

where 120583119894 is parking error at different initial speeds(3) Average error (119864) 119864 is the average of 119864119889119904 as shown in

119864 = sum119898119894=1 119864119889119904119894119898 (19)

(4)The maximum error of the parking error (120583119898119886119909) themaximum error of the parking error here is an importantindicator of the performance indices 120583119898119886119909 is the maximumparking error among all the parking errors If the 120583119898119886119909 is toobig it will seriously affect the parking safety of the train wehave

120583119898119886119909 = max (1205831 1205832 120583119899) (20)

(5)The minimum of the average error (119864119898119894119899) the mini-mum error of the parking error is also an important indicator119864119898119894119899 is the minimum among all the 119864119889119904 We have

119864119898119894119899 = min (1198641198891199041 1198641198891199042 119864119889119904119898) (21)

(6) Root of mean square (120590) the 120590 reflects the parkingerror fluctuations (discrete degree) In the process of parkingalgorithm we need to assess the fluctuation which indicatesthe stability of the algorithm we have

120590 = radicsum119898119894=1 (119864 minus 119864119889119904119894)2119898 minus 1 (22)

(7) Parking probability in the required range (119875119901) 119875119901 is ameasurement of the estimated reliability This is an importantindex to control the automatic parking ability The parkingprobability will be unavailable if the 119875119901 is too small to meetthe requirement of practical use

119875119894 = (119909 minus 119864)120590 (23)

119875119901 = (1 minus 2(1 minus int119875119894minusinfin

1120590radic2120587119890minus1198752119894 2119889119875119894)) times 100 (24)

A large number of studies have given the parking errorof the train within the range [-30cm30cm] and made theparking probability requirement larger than 995 [8 10 13]For example the third line of Dalian rail transit in Chinarequires train parking accuracy to be within [-30cm30cm]and the required reliability is 995 [10] In Beijing Subwaythe allowable parking error is also 30cm with over 998reliability

4 The Experimental Results

To approach the real-world running scenaric according tothe above mathematical model we choose field data betweenthe Jiugong station and theXiaohongmen station of YizhuangLine of Beijing Subway in the train station parking simulationplatform We set that the length of the parking area is 100metersThere are six balises in this parking areaThe distancebetween the balises and the parking spot is 100m 64m 36m16m 4m and 0m respectivelyThe arrangement of the balisesmeets the actual balises

The maximum velocity (ie speed limit) for the train toenter into the area is 1428ms While there exists time delayin the train braking system we select 115ms as themaximuminitial speed of a train based on actual operating data Inthe algorithms we did not consider the parking error underone single speed By using single speed we can not fullytest the reliability and adaptability of the algorithm So weset that the initial speed varies from 9ms to 115ms andwe pick 10 different initial speeds in every 025ms speedinterval The purpose is to consider the influence under thevariable initial speed The threshold of the SOSA is [-5cm5cm] We use Gauss membership function in FQLA to

Journal of Advanced Transportation 7

5 10 15 20 25 30 35 40 45 500Trail times

0

002

004

006

008

01

012

014

016

018

02

Park

ing

erro

rs (m

)

SOSQLFQL

Figure 6 50 trials of three algorithms about 10 different initial speeds

5 10 15 20 25 30 35 40 45 500Trail times

0

002

004

006

008

01

012

014

016

018

02

Park

ing

erro

rs (m

)

SOSQLFQL

Figure 7 50 trials of three algorithms about 100 different initial speeds

substitute the current reward 119877(119904119894) in QLA After debuggingand experimenting we pick parameters 120590=01 119888=0 Figure 6presents the errors 119864119889119904 of 10 different initial speeds during 50trials As can be seen in Figure 6 the convergence iterations119873119888 are different by the three algorithms The parking error119864119889119904 is becoming smaller and smaller after times of trail andthe results also demonstrate that most of the parking errorscan be smaller than plusmn30cm in the end With respect to thecomputational efforts we note that these three algorithmsare very close because they are all based on the theory ofreinforcement learning Since each trial includes 10 times oftrain station parking process with different initial speed ittakes about 15 seconds to complete each trial and the totaltraining time is about 750 seconds

We also conduct more trials to test the variation of 119873119888when there are 100 initial speedsWhen the numbers of initial

speed reach 100 we obtain the variation of train parkingerrors as shown in Figure 7

The 119873119888 we get is larger than the 119873119888 received under 10initial speeds We can conclude from Figures 6 and 7 that themore initial speeds are set the slower convergence and largerconvergence number it will be

The performance comparison of these three algorithmcan be seen in Tables 1 and 2 It is clear that QLA andFQLA have smaller 119873119888 than SOSA This indicates that theconvergence is much faster in QLA and FQLA especiallywhen the number of initial speeds is increased

As can be seen from the figures and tables the developedthree algorithms work well on the field data between twostations in Beijing Subway which can improve most of theperformance indices especially in decreasing 119864 QLA andFQLA have smaller 119864 than SOSA The 119873119888 rate of FQLA is

8 Journal of Advanced Transportation

95 10 105 11 1159Initial speeds V0 (ms)

minus025minus02

minus015minus01

minus0050

00501

01502

025

Park

ing

erro

rs (m

)

FQLSOSQL

Figure 8 Each parking error after 50 trials about different initial speeds

Table 1 Performance indices of three algorithms with 10 initialspeeds

Indicators 119873119862 119864(119888119898) 119864119898119894119899(119888119898) 120590(119888119898) 119875119901SOSA 39 427 199 334 9999QLA 20 424 186 432 9999FQLA 18 399 183 395 9999

Table 2 Performance indices of three algorithms with 100 initialspeeds

Indicators 119873119862 119864(119888119898) 119864119898119894119899(119888119898) 120590(119888119898) 119875119901SOSA 49 418 158 367 9999QLA 24 387 140 414 9999FQLA 21 346 131 365 9999

about 19 times and it has the fastest convergence among thethree algorithms FQLA has the smallest 119864 and 119864119898119894119899 QLAachieves the largest 120590When it comes to119875119901 the119875119901 of the threealgorithms is almost the same and they are all larger than995 We can obtain the following conclusions (1) thesethree algorithms can adapt to the field data which have goodperformance indices and gradually reduce the parking error(2) in general FQLA ranks the first and SOSA ranks the thirdSOSA has the worst119873119888 119864 and 119864119898119894119899 (3) three algorithms canall meet the requirements of accurate parking and are easy tobe implemented However the intelligence exploration andself-learning of SOSA are much worse than those of QLA andFQLA

In addition we also test the performance of algorithmswith different initial speed after 50 times of trails Figure 8presents the parking errors under each different initial speed

In Figure 8 we see that 119864119889119904 is within the range of plusmn30cmand the errors of QLA and FQLA are closer to 0cm than thatof SOSAThe results also show us that the error fluctuation of

FQLA performs better than SOSA and QLA We can almostcome to similar conclusions from Tables 1 and 2 FQLAperforms the best among the three algorithms indicating thatadding fuzzy functions into Q-learning can further improveits performance for TSP In addition it is worth to mentionthat [13] proposed several online learning algorithms for trainstation parking problem The train parking errors by theseonline learning algorithms are about 5cm to 6cm We seefrom the above results that our developed algorithms canfurther reduce the parking errors compared with existingstudies

5 Conclusions

To address the TSP issue in urban rail transit this paperdeveloped one stochastic algorithm and two machine learn-ing algorithms based on theory of reinforcement learningFurthermore we propose five braking rates as the actionvector of the three algorithms within the scope of RL Perfor-mance indices were defined to evaluate the performance andreliability of the issue We evaluated the proposed algorithmsin the simulation platform with field data collected in BeijingSubway Our results show that these three algorithms canachieve good results that guarantee the train station parkingerror within 30 cm In addition fuzzy Q-learning algorithmperforms the best among these algorithms due to its fastand stable convergence In practical applications the railmanagers could use the simulation platform to optimize thevalue function with the aid of our developed algorithms andthen use the well-trained Q functions to generate the optimaltrain parking actions in real time to reduce the parking errorsof trains

In the future we will explore new algorithms and othermethods based on soft computing and machine learning tofurther increase the accuracy and reliability for the train park-ing In particular we will combine reinforcement learning

Journal of Advanced Transportation 9

with deep learning algorithms to further enhance the flexi-bility and accuracy for TSP Meanwhile there still exist manypractical issues which need to be taken into consideration inthis paper For example we only consider the TSP problemin urban rail transit while the train braking system is muchmore complex in high-speed railways Compared with urbanrail transit networks inwhich each line is a relatively enclosedsystem that is independent of each other and each trainmoveslike a ldquoshuttlerdquo on the fixed tracks the high-speed railwaynetworks are relatively open systems with heterogeneoustrains The trains usually have different circulation plansand each train may travel among different railway linesaccording to its circulation planTherefore the design of TSPalgorithms for these open systems would be definitely muchmore complex than that of urban transit systems in order tobe flexible with various external circumstances We will fulfillthese issues in our future research

Data Availability

The data used to support the findings of this study areavailable from the corresponding author upon request

Conflicts of Interest

Theauthors claim that there are no conflicts of interest in thispaper

Acknowledgments

This research was supported by the National Natural ScienceFoundation of China through Grants 61790570 61790573and U1734210 by the Key Laboratory of Intelligent Metro ofUniversities in Fujian through Grant 2018IMUF001 by theBeijing Jiaotong University Education Foundation throughGrant 9907006519 and by the China Postdoctoral ScienceFoundation Funded Project through Grant 2019M650474

References

[1] J Yin T Tang L Yang J Xun Y Huang and Z Gao ldquoResearchand development of automatic train operation for railwaytransportation systems a surveyrdquo Transportation Research PartC Emerging Technologies vol 85 pp 548ndash572 2017

[2] E Coscetta and A Carteni ldquoThe hedonic value of railwaysterminals a quantitative analysis of the impact of stationsquality on travelers behaviorrdquo Trasportation Research Part Avol 61 pp 41ndash52 2014

[3] E Ben-Elia and D Ettema ldquoRewarding rush-hour avoidance astudy of commutersrsquo travel behaviorrdquo Transportation ResearchPart A Policy and Practice vol 45 no 7 pp 567ndash582 2011

[4] J Yin L Yang X Zhou T Tang and Z Gao ldquoBalancing aone-way corridor capacity and safety-oriented reliability astochastic optimization approach for metro train timetablingrdquoNaval Research Logistics (NRL) vol 66 no 4 pp 297ndash320 2019

[5] K Yoshimoto K Kataoka andKKomaya ldquoA feasibility study oftrain automatic stop control using range sensorsrdquo in Proceedingsof the 2001 IEEE Intelligent Transportation Systems pp 802ndash807USA August 2001

[6] S Yasunobu S Miyamoto and H Ihara ldquoA fuzzy control fortrain automatic stop controlrdquo Transactions of the Society ofInstrument and Control Engineers vol 2 no 1 pp 1ndash8 2002

[7] S Yasunobu and S Miyamoto ldquoAutomatic train operationsystem by predictive fuzzy controlrdquo Industrial Applications ofFuzzy Control vol 1 no 18 pp 1ndash18 1985

[8] Z Hou Y Wang C Yin and T Tang ldquoTerminal iterative learn-ing control based station stop control of a trainrdquo InternationalJournal of Control vol 84 no 7 pp 1263ndash1274 2011

[9] J Zhou and D Chen ldquoApplications of machine learning meth-ods in problem of precise train stoppingrdquoComputer Engineeringand applications vol 46 no 25 pp 226ndash230 2010

[10] D Chen and C Gao ldquoSoft computing methods applied to trainstation parking in urban rail transitrdquo Applied Soft Computingvol 12 no 2 pp 759ndash767 2012

[11] GHeResearch on Train Precision StopControl AlgorithmBasedon LQR Beijing Jiaotong University Beijing China 2009

[12] R Chen and D Chen ldquoHeuristic self-learning algorithms andsimulation of train station stoprdquo Railway Computer Applicationvol 22 no 6 pp 9ndash13 2013

[13] D Chen R Chen Y Li and T Tang ldquoOnline learning algo-rithms for train automatic stop control using precise locationdata of balisesrdquo IEEE Transactions on Intelligent TransportationSystems vol 14 no 3 pp 1526ndash1535 2013

[14] J YinD Chen andL Li ldquoIntelligent train operation algorithmsfor subway by expert system and reinforcement learningrdquo IEEETransactions on Intelligent Transportation Systems vol 15 no 6pp 2561ndash2571 2014

[15] J Liu ldquoAnalysis of station dwelling error and adjustmentsolution for operation after screen doorrsquos installationrdquo UrbanMass Transit vol 1 pp 44ndash47 2009

[16] V Mnih K Kavukcuoglu D Silver et al ldquoHuman-level controlthrough deep reinforcement learningrdquo Nature vol 518 no7540 pp 529ndash533 2015

[17] J Yin D Chen T Tang L Zhu and W Zhu ldquoBalise arrange-ment optimization for train station parking via expert knowl-edge and genetic algorithmrdquo Applied Mathematical Modellingvol 40 no 19-20 pp 8513ndash8529 2016

[18] J Yin T Tang L Yang Z Gao and B Ran ldquoEnergy-efficientmetro train reschedulingwith uncertain time-variant passengerdemands an approximate dynamic programming approachrdquoTransportation Research Part B Methodological vol 91 pp 178ndash210 2016

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 6: Developing Train Station Parking Algorithms: New Frameworks …downloads.hindawi.com/journals/jat/2019/3072495.pdf · 2019. 8. 4. · New Frameworks Based on Fuzzy Reinforcement Learning

6 Journal of Advanced Transportation

6420 8 10minus4minus6minus8 minus2minus10

x

0

02

04

06

08

1

y

Figure 5 Reward as Gauss type membership function

and immediate reward value using (17) After each episodecalculate the 119876(119904 119886) according to (14)Step 3 If the current strategy and the previous selectedstrategy are different then update the strategy using theaction selection mechanism Select an action with largerprobability that can make 119876(119904 119886) become the largest onelike (15) On the contrary QL will select other actions witha smaller probability

Step 4 Repeat the steps mentioned above when 119876(119904 119886) 997888rarr119876lowast(119904 119886) we can achieve best strategy 120587 997888rarr 120587lowast34The Performance Indices To compare the performance ofthree algorithms we need to define the parking performanceindices We have the following evaluation performanceindices(1) The convergence number (119873119888) is t he number oflearning trails when the parking errors tend to be stable(2) Average errors under different speed (119864119889119904) in orderto test the merits of the parking algorithm we need to havemany simulation trials Our simulations have produced agreat deal of parking information Among them the parkingerror is one of the most important indices in the parkinginformation because it is the most direct indicator of theparking quality With n different initial speeds algorithm canget n parking error after each episode The 119864119889119904 is the absoluteaverage of n parking error as shown in the following formula

119864119889119904 = sum119899119894=1 10038161003816100381610038161205831198941003816100381610038161003816119899 (18)

where 120583119894 is parking error at different initial speeds(3) Average error (119864) 119864 is the average of 119864119889119904 as shown in

119864 = sum119898119894=1 119864119889119904119894119898 (19)

(4)The maximum error of the parking error (120583119898119886119909) themaximum error of the parking error here is an importantindicator of the performance indices 120583119898119886119909 is the maximumparking error among all the parking errors If the 120583119898119886119909 is toobig it will seriously affect the parking safety of the train wehave

120583119898119886119909 = max (1205831 1205832 120583119899) (20)

(5)The minimum of the average error (119864119898119894119899) the mini-mum error of the parking error is also an important indicator119864119898119894119899 is the minimum among all the 119864119889119904 We have

119864119898119894119899 = min (1198641198891199041 1198641198891199042 119864119889119904119898) (21)

(6) Root of mean square (120590) the 120590 reflects the parkingerror fluctuations (discrete degree) In the process of parkingalgorithm we need to assess the fluctuation which indicatesthe stability of the algorithm we have

120590 = radicsum119898119894=1 (119864 minus 119864119889119904119894)2119898 minus 1 (22)

(7) Parking probability in the required range (119875119901) 119875119901 is ameasurement of the estimated reliability This is an importantindex to control the automatic parking ability The parkingprobability will be unavailable if the 119875119901 is too small to meetthe requirement of practical use

119875119894 = (119909 minus 119864)120590 (23)

119875119901 = (1 minus 2(1 minus int119875119894minusinfin

1120590radic2120587119890minus1198752119894 2119889119875119894)) times 100 (24)

A large number of studies have given the parking errorof the train within the range [-30cm30cm] and made theparking probability requirement larger than 995 [8 10 13]For example the third line of Dalian rail transit in Chinarequires train parking accuracy to be within [-30cm30cm]and the required reliability is 995 [10] In Beijing Subwaythe allowable parking error is also 30cm with over 998reliability

4 The Experimental Results

To approach the real-world running scenaric according tothe above mathematical model we choose field data betweenthe Jiugong station and theXiaohongmen station of YizhuangLine of Beijing Subway in the train station parking simulationplatform We set that the length of the parking area is 100metersThere are six balises in this parking areaThe distancebetween the balises and the parking spot is 100m 64m 36m16m 4m and 0m respectivelyThe arrangement of the balisesmeets the actual balises

The maximum velocity (ie speed limit) for the train toenter into the area is 1428ms While there exists time delayin the train braking system we select 115ms as themaximuminitial speed of a train based on actual operating data Inthe algorithms we did not consider the parking error underone single speed By using single speed we can not fullytest the reliability and adaptability of the algorithm So weset that the initial speed varies from 9ms to 115ms andwe pick 10 different initial speeds in every 025ms speedinterval The purpose is to consider the influence under thevariable initial speed The threshold of the SOSA is [-5cm5cm] We use Gauss membership function in FQLA to

Journal of Advanced Transportation 7

5 10 15 20 25 30 35 40 45 500Trail times

0

002

004

006

008

01

012

014

016

018

02

Park

ing

erro

rs (m

)

SOSQLFQL

Figure 6 50 trials of three algorithms about 10 different initial speeds

5 10 15 20 25 30 35 40 45 500Trail times

0

002

004

006

008

01

012

014

016

018

02

Park

ing

erro

rs (m

)

SOSQLFQL

Figure 7 50 trials of three algorithms about 100 different initial speeds

substitute the current reward 119877(119904119894) in QLA After debuggingand experimenting we pick parameters 120590=01 119888=0 Figure 6presents the errors 119864119889119904 of 10 different initial speeds during 50trials As can be seen in Figure 6 the convergence iterations119873119888 are different by the three algorithms The parking error119864119889119904 is becoming smaller and smaller after times of trail andthe results also demonstrate that most of the parking errorscan be smaller than plusmn30cm in the end With respect to thecomputational efforts we note that these three algorithmsare very close because they are all based on the theory ofreinforcement learning Since each trial includes 10 times oftrain station parking process with different initial speed ittakes about 15 seconds to complete each trial and the totaltraining time is about 750 seconds

We also conduct more trials to test the variation of 119873119888when there are 100 initial speedsWhen the numbers of initial

speed reach 100 we obtain the variation of train parkingerrors as shown in Figure 7

The 119873119888 we get is larger than the 119873119888 received under 10initial speeds We can conclude from Figures 6 and 7 that themore initial speeds are set the slower convergence and largerconvergence number it will be

The performance comparison of these three algorithmcan be seen in Tables 1 and 2 It is clear that QLA andFQLA have smaller 119873119888 than SOSA This indicates that theconvergence is much faster in QLA and FQLA especiallywhen the number of initial speeds is increased

As can be seen from the figures and tables the developedthree algorithms work well on the field data between twostations in Beijing Subway which can improve most of theperformance indices especially in decreasing 119864 QLA andFQLA have smaller 119864 than SOSA The 119873119888 rate of FQLA is

8 Journal of Advanced Transportation

95 10 105 11 1159Initial speeds V0 (ms)

minus025minus02

minus015minus01

minus0050

00501

01502

025

Park

ing

erro

rs (m

)

FQLSOSQL

Figure 8 Each parking error after 50 trials about different initial speeds

Table 1 Performance indices of three algorithms with 10 initialspeeds

Indicators 119873119862 119864(119888119898) 119864119898119894119899(119888119898) 120590(119888119898) 119875119901SOSA 39 427 199 334 9999QLA 20 424 186 432 9999FQLA 18 399 183 395 9999

Table 2 Performance indices of three algorithms with 100 initialspeeds

Indicators 119873119862 119864(119888119898) 119864119898119894119899(119888119898) 120590(119888119898) 119875119901SOSA 49 418 158 367 9999QLA 24 387 140 414 9999FQLA 21 346 131 365 9999

about 19 times and it has the fastest convergence among thethree algorithms FQLA has the smallest 119864 and 119864119898119894119899 QLAachieves the largest 120590When it comes to119875119901 the119875119901 of the threealgorithms is almost the same and they are all larger than995 We can obtain the following conclusions (1) thesethree algorithms can adapt to the field data which have goodperformance indices and gradually reduce the parking error(2) in general FQLA ranks the first and SOSA ranks the thirdSOSA has the worst119873119888 119864 and 119864119898119894119899 (3) three algorithms canall meet the requirements of accurate parking and are easy tobe implemented However the intelligence exploration andself-learning of SOSA are much worse than those of QLA andFQLA

In addition, we also test the performance of the algorithms at each initial speed after 50 trials. Figure 8 presents the parking error under each initial speed.

In Figure 8, we see that E_ds stays within ±30 cm, and the errors of QLA and FQLA are closer to 0 cm than those of SOSA. The results also show that the error fluctuation of FQLA is smaller than that of SOSA and QLA. We can draw much the same conclusions from Tables 1 and 2: FQLA performs best among the three algorithms, indicating that adding fuzzy functions to Q-learning can further improve its performance for TSP. In addition, it is worth mentioning that [13] proposed several online learning algorithms for the train station parking problem, with parking errors of about 5 cm to 6 cm. The above results show that our developed algorithms can further reduce the parking errors compared with existing studies.

5. Conclusions

To address the TSP issue in urban rail transit, this paper developed one stochastic algorithm and two machine learning algorithms based on the theory of reinforcement learning. Furthermore, we proposed five braking rates as the action vector of the three algorithms within the scope of RL, and defined performance indices to evaluate their accuracy and reliability. We evaluated the proposed algorithms on a simulation platform with field data collected from the Beijing Subway. Our results show that all three algorithms keep the train station parking error within ±30 cm. In addition, the fuzzy Q-learning algorithm performs best among them owing to its fast and stable convergence. In practical applications, rail managers could use the simulation platform to optimize the value function with the aid of our developed algorithms, and then use the well-trained Q functions to generate optimal train parking actions in real time, reducing the parking errors of trains.
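In such a deployment, only the trained Q-table needs to run online: at each balise-triggered decision point, the controller looks up the current state and applies the greedy braking rate. A minimal sketch, in which the five braking-rate values and the state encoding are placeholders of ours, not the paper's:

```python
def select_braking_rate(Q, state: int, braking_rates) -> float:
    """Greedy policy over a trained Q-table: return the braking rate
    whose Q-value is highest in the current state."""
    return braking_rates[int(Q[state].argmax())]

# Hypothetical action vector of five braking rates (m/s^2):
rates = [-0.4, -0.6, -0.8, -1.0, -1.2]
# a = select_braking_rate(Q, current_state, rates)
```

Because the lookup is a single row scan over five actions, such a policy is cheap enough to evaluate at every balise update.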

In the future, we will explore new algorithms and other methods based on soft computing and machine learning to further increase the accuracy and reliability of train parking. In particular, we will combine reinforcement learning with deep learning algorithms to further enhance the flexibility and accuracy of TSP. Meanwhile, many practical issues remain that this paper does not address. For example, we only consider the TSP problem in urban rail transit, while the train braking system is much more complex in high-speed railways. Compared with urban rail transit networks, in which each line is a relatively enclosed system independent of the others and each train moves like a "shuttle" on fixed tracks, high-speed railway networks are relatively open systems with heterogeneous trains. The trains usually have different circulation plans, and each train may travel among different railway lines according to its circulation plan. Therefore, designing TSP algorithms for these open systems would be considerably more complex than for urban transit systems, in order to remain flexible under various external circumstances. We will address these issues in our future research.

Data Availability

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding this paper.

Acknowledgments

This research was supported by the National Natural Science Foundation of China through Grants 61790570, 61790573, and U1734210; by the Key Laboratory of Intelligent Metro of Universities in Fujian through Grant 2018IMUF001; by the Beijing Jiaotong University Education Foundation through Grant 9907006519; and by the China Postdoctoral Science Foundation through Grant 2019M650474.

References

[1] J. Yin, T. Tang, L. Yang, J. Xun, Y. Huang, and Z. Gao, "Research and development of automatic train operation for railway transportation systems: a survey," Transportation Research Part C: Emerging Technologies, vol. 85, pp. 548–572, 2017.

[2] E. Cascetta and A. Cartenì, "The hedonic value of railways terminals: a quantitative analysis of the impact of stations quality on travelers behavior," Transportation Research Part A, vol. 61, pp. 41–52, 2014.

[3] E. Ben-Elia and D. Ettema, "Rewarding rush-hour avoidance: a study of commuters' travel behavior," Transportation Research Part A: Policy and Practice, vol. 45, no. 7, pp. 567–582, 2011.

[4] J. Yin, L. Yang, X. Zhou, T. Tang, and Z. Gao, "Balancing a one-way corridor capacity and safety-oriented reliability: a stochastic optimization approach for metro train timetabling," Naval Research Logistics (NRL), vol. 66, no. 4, pp. 297–320, 2019.

[5] K. Yoshimoto, K. Kataoka, and K. Komaya, "A feasibility study of train automatic stop control using range sensors," in Proceedings of the 2001 IEEE Intelligent Transportation Systems Conference, pp. 802–807, USA, August 2001.

[6] S. Yasunobu, S. Miyamoto, and H. Ihara, "A fuzzy control for train automatic stop control," Transactions of the Society of Instrument and Control Engineers, vol. 2, no. 1, pp. 1–8, 2002.

[7] S. Yasunobu and S. Miyamoto, "Automatic train operation system by predictive fuzzy control," Industrial Applications of Fuzzy Control, vol. 1, no. 18, pp. 1–18, 1985.

[8] Z. Hou, Y. Wang, C. Yin, and T. Tang, "Terminal iterative learning control based station stop control of a train," International Journal of Control, vol. 84, no. 7, pp. 1263–1274, 2011.

[9] J. Zhou and D. Chen, "Applications of machine learning methods in problem of precise train stopping," Computer Engineering and Applications, vol. 46, no. 25, pp. 226–230, 2010.

[10] D. Chen and C. Gao, "Soft computing methods applied to train station parking in urban rail transit," Applied Soft Computing, vol. 12, no. 2, pp. 759–767, 2012.

[11] G. He, Research on Train Precision Stop Control Algorithm Based on LQR, Beijing Jiaotong University, Beijing, China, 2009.

[12] R. Chen and D. Chen, "Heuristic self-learning algorithms and simulation of train station stop," Railway Computer Application, vol. 22, no. 6, pp. 9–13, 2013.

[13] D. Chen, R. Chen, Y. Li, and T. Tang, "Online learning algorithms for train automatic stop control using precise location data of balises," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 3, pp. 1526–1535, 2013.

[14] J. Yin, D. Chen, and L. Li, "Intelligent train operation algorithms for subway by expert system and reinforcement learning," IEEE Transactions on Intelligent Transportation Systems, vol. 15, no. 6, pp. 2561–2571, 2014.

[15] J. Liu, "Analysis of station dwelling error and adjustment solution for operation after screen door's installation," Urban Mass Transit, vol. 1, pp. 44–47, 2009.

[16] V. Mnih, K. Kavukcuoglu, D. Silver, et al., "Human-level control through deep reinforcement learning," Nature, vol. 518, no. 7540, pp. 529–533, 2015.

[17] J. Yin, D. Chen, T. Tang, L. Zhu, and W. Zhu, "Balise arrangement optimization for train station parking via expert knowledge and genetic algorithm," Applied Mathematical Modelling, vol. 40, no. 19-20, pp. 8513–8529, 2016.

[18] J. Yin, T. Tang, L. Yang, Z. Gao, and B. Ran, "Energy-efficient metro train rescheduling with uncertain time-variant passenger demands: an approximate dynamic programming approach," Transportation Research Part B: Methodological, vol. 91, pp. 178–210, 2016.
