Research Article

An Energy Management Strategy for a Super-Mild Hybrid Electric Vehicle Based on a Known Model of Reinforcement Learning

Yanli Yin,1 Yan Ran,1 Liufeng Zhang,1 Xiaoliang Pan,2 and Yong Luo3

1 School of Mechatronics & Automobile Engineering, Chongqing Jiao Tong University, Chongqing 400054, China
2 Chongqing Changan Automobile Stock Co., Ltd., Chongqing 400054, China
3 Key Laboratory of Advanced Manufacturing Technology for Automobile Parts, Ministry of Education, Chongqing University of Technology, Chongqing 400054, China

Correspondence should be addressed to Yanli Yin; cqu [email protected]

Received 9 December 2018; Revised 20 February 2019; Accepted 21 February 2019; Published 1 April 2019

Academic Editor: Yongji Wang

Journal of Control Science and Engineering, Volume 2019, Article ID 9259712, 12 pages, https://doi.org/10.1155/2019/9259712

Copyright © 2019 Yanli Yin et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

A global optimal control strategy not only requires the driving cycle to be known in advance but is also difficult to implement online because of its large calculation volume. In this work, reinforcement learning (RL), an artificial intelligence-based control approach, is applied to the energy management strategy of a super-mild hybrid electric vehicle. Based on time-speed datasets of sample driving cycles, a stochastic model of the driver's power demand is developed. Using Markov decision process theory, a mathematical model of an RL-based energy management strategy is established, which takes the minimum cumulative return expectation as its optimization objective. A policy iteration algorithm is adopted to obtain the optimum control policy, which takes the vehicle speed, driver's power demand, and state of charge (SOC) as the input and the engine power as the output. A CYC WVUCITY simulation model is established on a MATLAB/Simulink platform. The results show that, compared with dynamic programming, this method can not only adapt to random driving cycles and reduce fuel consumption by 2.4%, but can also be implemented online because of its small calculation volume.

1. Introduction

With the increasing problems of global warming, air pollution, and energy shortage, hybrid electric vehicles (HEVs) have been extensively studied because of their potential to significantly improve fuel economy and reduce emissions.

Energy management strategies are critical for HEVs to achieve optimal performance and energy efficiency through power split control. Energy management strategies can be divided, according to the way they are implemented, into rule-based strategies and optimization-based strategies.

Optimization-based strategies form an important group of energy management strategies and can be divided into strategies based on instantaneous optimization, global optimization, model predictive control, and artificial intelligence.

Rule-based strategies are primarily based on human experience. The torque distribution of the engine and motor is based on preset control rules, which are formulated from the steady-state MAP charts of the engine and motor. Reference [1] proposed an energy management strategy based on a logic threshold and a fuzzy algorithm for improving fuel economy. Wang et al. [2] developed algorithms for On/Off control, load tracking control, and bus voltage control and conducted a simulation.
A rule-based control strategy is simple and can easily be implemented online; however, such a static control strategy is not optimal in theory, and it does not consider dynamic changes in working conditions. An instantaneous optimal control strategy can ensure an optimum objective at every time step; however, it cannot guarantee an optimum design over the whole driving cycle. Compared with the logic threshold strategy, its calculation amount is large, but it can still be implemented online. Reference [3] formulated the energy management strategy of an HEV based on driving style recognition.




Table 1: Super-mild HEV parameters.

Component      Parameter                      Value
Engine         Engine type                    JL472Q1
               Rated power (kW)               49 ± 2.45
               Maximum torque (N·m)           82 ± 4.1
Motor          Rated power (kW)               5
               Rated speed (r·min−1)          1000
Battery        Rated capacity (A·h)           6.5
               Rated voltage (V)              80
Transmission   Minimum gear ratio             0.498
               Maximum gear ratio             4.04
               PGT parameter                  3.875
               Fixed gear ratio               2.4704
Others         Final drive gear ratio         4.4059
               Motor reducing gear ratio      1.5

Reference [4] combined a K-means clustering algorithm with an equivalent consumption minimization strategy to realize the energy management of the whole vehicle.

An MPC strategy can ensure an objective optimum design in the prediction domain and can be implemented online. Reference [5] proposed a stochastic model predictive control-based energy management strategy, using the vehicle location, traveling direction, and terrain information of the area, for HEVs operating in hilly regions with light traffic.

A global optimal control strategy can guarantee an objective optimum design over a given driving cycle by distributing the power of the engine and motor. However, it can only be implemented offline because of the large volume of calculations involved. Reference [6] proposed an improved dynamic programming (DP) control strategy for hybrid electric buses based on the state transition probability. DP was applied to an EVT-based HEV powertrain to realize its optimal control in [7]. A DP algorithm was also used for global optimization of the performance of a speed-coupling ISG HEV in [8].

Intelligent energy management strategies include fuzzy logic, neural network, genetic algorithm, and machine learning-based control strategies. Energy management strategies based on machine learning include those based on supervised learning [9, 10], unsupervised learning [11, 12], and reinforcement learning.

RL is a data-driven approach that treats the system as a black box, regardless of whether it is linear or nonlinear. As a self-adaptive optimal control method based on machine learning, RL has been widely applied to the learning control of several nonlinear systems.

Reference [13] proposed an energy management strategy for a plug-in hybrid electric bus (PHEB) based on RL. Liu et al. [14] proposed an RL-enabled energy management strategy that uses the speedy Q-learning algorithm to accelerate the convergence rate in Markov Chain-based control policy computation. Reference [15] developed an online energy management controller for a plug-in HEV based on driving condition recognition and a genetic algorithm. In [16], deep Q-learning was adopted for energy management, and the strategy was proposed and verified. RL was shown to derive model-free and adaptive control for energy management in [17]. Liu et al. [18] proposed a bi-level control framework that combined predictive learning with RL to formulate an energy management strategy.

RL can ensure a global optimum over the driving cycle and does not require foreseeing the driving cycle. Compared with complex dynamic models, data-driven models can be implemented online because of their small calculation volume.

For the super-mild HEV, [19] studied rule-based and instantaneous optimization methods applied to energy management strategies. However, RL has not been reported to have been applied to the energy management strategy of a super-mild HEV.

This study establishes an energy management strategy for a super-mild HEV based on a known model of RL. The optimal control results are obtained by a policy iteration algorithm. Based on the MATLAB/Simulink simulation platform, a simulation is conducted on the economic performance of the vehicle.

2. Super-Mild Hybrid Electric Vehicle Model

2.1. Structure and Main Parameters. A super-mild HEV is primarily composed of an engine, a motor, and a continuously variable transmission (CVT) with reflux power. The CVT with reflux power includes a metal belt continuously variable transmission, a fixed speed ratio gear transmission device, a planetary gear transmission device, wet clutches, a one-way clutch, and a brake. Its structure is shown in Figure 1. The main parameters are listed in Table 1.

2.2. Working Modes. There are four working modes of a super-mild HEV: only-motor mode, only-engine mode, engine-charging mode, and regenerative braking mode, as shown in Figure 2.


Figure 1: Super-mild HEV architecture. 1: metal belt continuously variable transmission; 2: fixed speed ratio gear transmission device; 3: planetary gear transmission device; B: brake; L1, L2, L3: wet friction clutch; L4: one-way clutch.

Figure 2: Working modes of the super-mild HEV: (a) only-motor, (b) only-engine, (c) engine-charging, (d) regenerative braking.

2.3. Vehicle Dynamics Model

2.3.1. Power Demand Model. The power demand at the wheel is the sum of the powers needed to overcome the rolling resistance, air resistance, and acceleration resistance:

$$P_{req} = (F_f + F_w + F_j)\,v, \quad F_f = fmg, \quad F_w = \frac{C_D A v^2}{21.15}, \quad F_j = \delta m \frac{dv}{dt} \tag{1}$$

where $F_f$ is the rolling resistance, $F_w$ is the air resistance, $F_j$ is the acceleration resistance, $v$ is the vehicle speed, $f$ is the rolling resistance coefficient, $m$ is the vehicle mass, $C_D$ is the air resistance coefficient, $A$ is the frontal area, and $\delta$ is the conversion coefficient of rotating mass.
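For illustration, a minimal Python sketch of Formula (1) is given below; the vehicle parameters (mass, drag coefficient, frontal area, and so on) are placeholder values rather than the ones used in the paper, and the unit handling (speed in km/h, output power in kW) is an assumption.

def power_demand_kw(v_kmh, dv_dt, m=1200.0, f=0.015, C_D=0.32, A=2.0, delta=1.05, g=9.81):
    """Wheel power demand of Formula (1) for speed v (km/h) and acceleration dv/dt (m/s^2)."""
    F_f = f * m * g                        # rolling resistance (N)
    F_w = C_D * A * v_kmh ** 2 / 21.15     # air resistance (N), with v in km/h
    F_j = delta * m * dv_dt                # acceleration resistance (N)
    return (F_f + F_w + F_j) * (v_kmh / 3.6) / 1000.0   # P_req in kW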

2.3.2. Engine Model. The engine is a highly nonlinear system, and its working process is very complex. Therefore, engine data for steady-state conditions are obtained through experimental testing. Based on these data, an engine torque model and an effective fuel consumption model are established, as shown in Figures 3 and 4, respectively.
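As one possible way to use such steady-state maps in simulation, the sketch below interpolates a brake-specific fuel consumption (bsfc) surface over engine speed and torque; the grids and surface values are hypothetical stand-ins for the measured data behind Figures 3 and 4.

import numpy as np
from scipy.interpolate import RegularGridInterpolator

speed_grid = np.linspace(1000.0, 6000.0, 6)      # engine speed grid (r/min)
torque_grid = np.linspace(0.0, 80.0, 5)          # engine torque grid (N·m)
# Placeholder bsfc surface (g/kWh); in practice these values come from the bench test.
bsfc_table = 250.0 + 3e-5 * (speed_grid[:, None] - 3000.0) ** 2 \
             + 2e-2 * (torque_grid[None, :] - 60.0) ** 2
bsfc = RegularGridInterpolator((speed_grid, torque_grid), bsfc_table)

def fuel_rate_g_per_s(n_rpm, T_nm):
    """Instantaneous fuel rate (g/s) = bsfc (g/kWh) * engine power (kW) / 3600."""
    P_kw = T_nm * n_rpm * 2.0 * np.pi / 60.0 / 1000.0
    return float(bsfc([[n_rpm, T_nm]])[0]) * P_kw / 3600.0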

2.3.3. Battery Model. Based on a Ni-MH battery performance experiment, the electromotive force and internal resistance models are obtained, as shown in Formulas (2) and (3):

$$E_{soc} = E_0 + \sum_{i=1}^{5} E_i\,\mathrm{SOC}^i \tag{2}$$

where $E_0$ is the electromotive constant of the battery, $E_i$ are the fitting coefficients, SOC is the state of charge, and $E_{soc}$ is the electromotive force under the current state.

$$R_{soc} = \delta_0 \left( R_0 + \sum_{i=1}^{6} \lambda_i\,\mathrm{SOC}^i \right) \tag{3}$$


Figure 3: Engine torque model (engine torque T (N·m) versus engine rotation speed n (r·min−1) and throttle opening (%)).

Figure 4: Engine fuel consumption rate model (fuel consumption rate (g·kWh−1) versus engine rotation speed n (r·min−1) and engine torque T (N·m)).

where $R_0$ is the internal resistance constant of the battery, $\lambda_i$ are the fitting coefficients, $\delta_0$ is the compensation coefficient of internal resistance with the change in current, SOC is the state of charge, and $R_{soc}$ is the internal resistance under the current state.

The change in SOC is calculated by Formulas (4)–(6):

$$\Delta SOC = -\frac{I_{bat}(t)}{Q_{bat}} \tag{4}$$

$$I_{bat}(t) = \frac{E_{soc} - \sqrt{E_{soc}^2 - 4 P_{bat} R_{soc}}}{2 R_{soc}} \tag{5}$$

Therefore,

$$\Delta SOC = -\frac{E_{soc} - \sqrt{E_{soc}^2 - 4 P_{bat} R_{soc}}}{2 R_{soc} Q_{bat}} \tag{6}$$

where $P_{bat}$ is the battery power, $I_{bat}$ is the battery current, and $Q_{bat}$ is the battery capacity.
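A minimal sketch of Formulas (2)–(6) follows; the polynomial coefficients and the per-second discretization (battery capacity expressed in A·s) are assumptions for illustration, since the fitted Ni-MH coefficients are not listed in the paper.

import math

# Placeholder fit coefficients; the paper obtains these from Ni-MH test data.
E0, R0, delta0 = 78.0, 0.12, 1.0
E_coef = [4.0, -3.0, 2.0, -1.0, 0.5]                  # E_1 ... E_5 of Formula (2)
lam_coef = [0.05, -0.04, 0.03, -0.02, 0.01, -0.005]   # lambda_1 ... lambda_6 of Formula (3)

def emf(soc):
    return E0 + sum(E_coef[i - 1] * soc ** i for i in range(1, 6))

def resistance(soc):
    return delta0 * (R0 + sum(lam_coef[i - 1] * soc ** i for i in range(1, 7)))

def delta_soc(p_bat_w, soc, q_bat_as, dt=1.0):
    """SOC change over one step of dt seconds for battery power P_bat (W), Formulas (4)-(6)."""
    e, r = emf(soc), resistance(soc)
    i_bat = (e - math.sqrt(e ** 2 - 4.0 * p_bat_w * r)) / (2.0 * r)   # Formula (5)
    return -i_bat * dt / q_bat_as                                     # Formulas (4) and (6)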

3. Stochastic Modeling of Driver's Power Demand

Traditionally, the driver's power demand is obtained according to a given driving cycle; however, in reality, the driving cycle is random. The discrete-time stochastic process is regarded as a Markov decision process (MDP). In other words, the transition probability from the current state to the next state depends only on the current state and the selected action; it is independent of historical states. The power demand at the next state depends only on the current power demand and is independent of any previous state. To establish the transition probability of the power demand, a large volume of data is required. In this study, time-speed datasets of the UDDS and ECE EUDC driving cycles are adopted to calculate the power demand at each moment (Figure 5). The transition probability matrix of the driver's power demand is then obtained using the maximum likelihood estimation method.

The steps to calculate the power demand transition probability based on the driving cycle are as follows. First, the power demand is discretized into a finite set of $s$ values:

$$P_{req} \in \{P^1_{req}, P^2_{req}, \cdots, P^s_{req}\} \tag{7}$$


Figure 5: ECE EUDC and UDDS driving cycles (speed (km·h−1) versus time (s)).

The transition probability $P_{ij}$ is the probability of moving from the current state $P^i_{req}$ to the next state $P^j_{req}$:

$$P_{ij} = P\left\{P_{req}(k+1) = P^j_{req} \mid P_{req}(k) = P^i_{req}\right\}, \quad i,j = 1,2,\cdots,s, \quad \sum_{j=1}^{s} P_{ij} = 1 \tag{8}$$

According to the maximum likelihood estimation method, the power transition probability can be obtained by

$$P_{ij} = \frac{m_{ij}}{m_i}, \quad i,j \in \{1,2,\cdots,s\} \tag{9}$$

where $m_{ij}$ represents the number of times that the transition from $P^i_{req}$ to $P^j_{req}$ has occurred at the given vehicle speed, and $m_i$ represents the total number of times that $P^i_{req}$ has occurred at that vehicle speed; it is given by

$$m_i = \sum_{j=1}^{s} m_{ij} \tag{10}$$
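The counting estimator of Formulas (8)–(10) can be sketched as follows; p_req_seq, the sequence of discretized power-demand indices at one speed level, is a hypothetical input extracted beforehand from the driving-cycle data.

import numpy as np

def transition_matrix(p_req_seq, s):
    """Maximum likelihood estimate of the s x s power-demand transition matrix."""
    m = np.zeros((s, s))
    for i, j in zip(p_req_seq[:-1], p_req_seq[1:]):
        m[i, j] += 1.0                    # m_ij: observed transitions P_req^i -> P_req^j
    m_i = m.sum(axis=1, keepdims=True)    # Formula (10)
    return np.divide(m, m_i, out=np.zeros_like(m), where=m_i > 0)   # Formula (9)

# Toy example: 10 discrete power levels, a short index sequence
P = transition_matrix([0, 1, 1, 2, 3, 2, 1, 0, 0, 1], s=10)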

Based on the data of the ECE EUDC and UDDS driving cycles, transition probability matrices for the power demand are obtained at given speeds of 10 km/h, 20 km/h, 30 km/h, and 40 km/h, as shown in Figure 6.

4. Known-Model RL Energy Management Strategy

4.1. Addressing Energy Management by RL. From a macroscopic perspective, the energy management strategy of an HEV involves determining the driver's power demand according to the driver's operation (accelerator pedal or brake pedal) and distributing the optimal power split between the two power sources (engine and motor) on the premise of guaranteeing dynamic performance. From a microscopic perspective, the energy management strategy can be abstracted as a sequential optimization decision problem. RL is a machine learning method based on the Markov decision process, which can solve sequential optimization decision problems.

The process of solving a sequential decision problem by RL is as follows. First, a Markov decision process is represented by the tuple $(s, a, P, r, \gamma)$, where $s$ is the state set, $a$ is the action set, $P$ is the transition probability, $r$ is the return function, and $\gamma$ is the discount factor (Figure 7). Second, based on the Markov decision process, the vehicle controller of the HEV is regarded as the agent, the distribution of engine power is regarded as the action $a$, and the hybrid electric system except for the vehicle controller is regarded as the environment. In order to achieve minimum cumulative fuel consumption, the agent takes a certain action to interact with the environment. After the environment accepts the action, the state changes, and an immediate return is generated and fed back to the agent. The agent chooses the next action based on the immediate return and the current state of the environment and then interacts with the environment again. The agent interacts with the environment continually, thus generating a considerable amount of data. RL utilizes the generated data to modify the action variable; the agent then interacts with the environment again and generates new data, which is utilized to further optimize the action variable. After several iterations, the agent learns the optimal action that can complete the corresponding task; in other words, it determines the decision sequence $\pi^*_0 \rightarrow \pi^*_1 \rightarrow \cdots \rightarrow \pi^*_t$ (Figure 8), thereby solving the sequential decision problem.

4.2. Mathematical Model of an Energy Management Strategy Based on RL. Mathematical models of the action variable $a$, state variable $s$, and immediate return $r$ are established according to RL theory. The SOC, vehicle speed $v$, and power demand $P_{req}$ are taken as the state variables; the engine power $P_e$ is taken as the action variable; and the sum of the equivalent fuel consumption over a one-step state transition and an SOC penalty is taken as the immediate return $r$.


Figure 6: Power demand transition probability matrices at (a) v = 10 km/h, (b) v = 20 km/h, (c) v = 30 km/h, (d) v = 40 km/h (transition probability (%) versus current demand power $P^i_{req}$ (kW) and next demand power $P^j_{req}$ (kW)).

Figure 7: Schematic of reinforcement learning (the agent applies an action to the environment and receives the resulting state and reward).

Figure 8: Policy decision sequence.


Algorithm 1: The policy iteration (PI) algorithm.
1. Input: state transition probability $P_{ij}$, return function $r$, discount factor $\gamma$. Initialize the value function $V(s) = 0$ and the policy $\pi_0$.
2. Repeat for k = 0, 1, ...
3.   Find $V^{\pi_k}$ (policy evaluation).
4.   $\pi_{k+1}(s) = \arg\min_a q^{\pi_k}(s, a)$ (policy improvement).
5. Until $\pi_{k+1} = \pi_k$.
6. Output: $\pi^* = \pi_k$.

Here, $\lambda'$ is the equivalent fuel consumption factor, $m_{fuel}$ is the fuel consumption, $m_{ele}$ is the electricity consumption, $\alpha$ is the penalty factor of the battery, and $SOC_{ref}$ is the reference value of the SOC:

$$s = [SOC, v, P_{req}], \quad a = P_e, \quad r = m_{fuel} + \lambda' m_{ele} + \alpha \left(SOC - SOC_{ref}\right)^2 \tag{11}$$
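For concreteness, a small sketch of these definitions is given below; the equivalence factor, the SOC penalty weight, and the reference SOC are assumed tuning values, not the calibrated ones from the paper.

LAMBDA_EQ = 0.8    # assumed equivalent fuel consumption factor lambda'
ALPHA = 50.0       # assumed SOC penalty factor alpha
SOC_REF = 0.6      # assumed reference SOC

def immediate_return(m_fuel, m_ele, soc):
    """One-step return r of Formula (11): equivalent fuel use plus a quadratic SOC penalty."""
    return m_fuel + LAMBDA_EQ * m_ele + ALPHA * (soc - SOC_REF) ** 2

state = (0.6, 30.0, 8.0)   # s = [SOC, v (km/h), P_req (kW)]
action = 6.0               # a = P_e (kW)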

In an infinite time domain, the Markov decision process solves the problem of determining a sequence of decision policies that minimizes the cumulative return expectation of the random process, called the optimal state value function $V^*(s)$:

$$V^*(s) = \min_{\pi} E\left[\sum_{t=0}^{\infty} \gamma^t r_{t+1}\right] \tag{12}$$

where $\gamma \in [0, 1]$ is the discount factor, $r_{t+1}$ represents the return value at time $t + 1$, and $\pi$ represents the control policy.

Meanwhile, the control and state variables must also meet the following constraints:

$$SOC_{min} \le SOC \le SOC_{max}, \quad v_{min} \le v \le v_{max}, \quad P_{e,min} \le P_e \le P_{e,max}, \quad P_{m,min} \le P_m \le P_{m,max} \tag{13}$$

where the subscripts min and max represent the minimum and maximum threshold values of the state of charge, speed, and power, respectively.
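One simple way to respect the box constraints of Formula (13) when enumerating candidate actions is to discard engine-power values whose implied motor power leaves its admissible range; the limits below are placeholder values, not the vehicle's actual ratings.

import numpy as np

P_E_MIN, P_E_MAX = 0.0, 30.0     # assumed engine power limits (kW)
P_M_MIN, P_M_MAX = -5.0, 5.0     # assumed motor power limits (kW); negative = generating

def feasible_engine_powers(p_req_kw, candidates=np.linspace(0.0, 30.0, 31)):
    """Keep only engine-power actions whose implied motor power P_m = P_req - P_e is admissible."""
    p_m = p_req_kw - candidates
    mask = (candidates >= P_E_MIN) & (candidates <= P_E_MAX) & (p_m >= P_M_MIN) & (p_m <= P_M_MAX)
    return candidates[mask]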

4.3. Policy Iteration Algorithm. For Formula (12), according to the Bellman equation,

$$V^*(s) = \min_{\pi} \sum_{a} \pi(s, a) \sum_{s'} P(s, a, s') \left( r(s, a, s') + \gamma V^*(s') \right) \tag{14}$$

where $s'$ represents the state at the next time step.

Table 2: Simulation results based on ECE EUDC.

Control Policy                 Fuel Consumption (L/100 km)
Rule-based                     5.368
Instantaneous optimization     5.256
DP                             4.952
RL                             5.069

The purpose of the solution is to determine the optimal policy $\pi^*$:

$$\pi^*(s) = \arg\min_{\pi} \sum_{a} \pi(s, a) \sum_{s'} P(s, a, s') \left( r(s, a, s') + \gamma V^*(s') \right) \tag{15}$$

(15)

A policy iteration (PI) algorithm is used to solve this random-process problem. It involves a policy evaluation step and a policy improvement step.

The calculation process is shown in Algorithm 1. In the policy evaluation step, for a control policy $\pi_k(s)$ (the subscript $k$ represents the number of iterations), the corresponding state value function $V^{\pi_k}(s)$ is calculated as shown in (16):

$$V^{\pi_k}(s) = \sum_{a} \pi_k(s, a) \sum_{s'} P(s, a, s') \left( r(s, a, s') + \gamma V^{\pi_k}(s') \right) \tag{16}$$

In the policy improvement step, the improved policy $\pi_{k+1}(s)$ is determined through a greedy strategy, as shown in (17):

$$\pi_{k+1}(s) = \arg\min_{a} q^{\pi_k}(s, a), \quad q^{\pi_k}(s, a) = r(s, a) + \gamma \sum_{s'} P(s, a, s') V^{\pi_k}(s') \tag{17}$$

During policy iteration, policy evaluation and policy improvement are performed alternately until both the state value function and the policy converge.
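A generic tabular version of Algorithm 1 and Formulas (14)–(17) can be sketched as follows; it assumes the state and action spaces have already been discretized into indices, and that the transition tensor P (shape S×A×S) and return table r (shape S×A) have been built from the vehicle and power-demand models described above.

import numpy as np

def policy_iteration(P, r, gamma=0.95, tol=1e-6):
    """Tabular policy iteration for a cost-minimizing MDP: P[s, a, s'], r[s, a]."""
    S, A, _ = P.shape
    policy = np.zeros(S, dtype=int)                  # initial policy pi_0
    V = np.zeros(S)                                  # initial value function V(s) = 0
    while True:
        # Policy evaluation: iterate Formula (16) until V^{pi_k} converges.
        while True:
            P_pi = P[np.arange(S), policy]           # S x S transition matrix under pi_k
            V_new = r[np.arange(S), policy] + gamma * P_pi @ V
            converged = np.max(np.abs(V_new - V)) < tol
            V = V_new
            if converged:
                break
        # Policy improvement: greedy (minimizing) action of Formula (17).
        q = r + gamma * np.einsum("sat,t->sa", P, V)
        new_policy = np.argmin(q, axis=1)
        if np.array_equal(new_policy, policy):       # stopping rule of Algorithm 1
            return policy, V
        policy = new_policy

Once the action indices are mapped back to engine powers, the returned policy array plays the role of the engine power optimization tables illustrated in Figure 9.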

The policy iteration algorithm is adopted to obtain the optimum control policy, which takes the vehicle speed, driver's power demand, and SOC as the input and the engine power as the output. Figure 9 shows the optimized engine powers at vehicle speeds of 10 km/h, 20 km/h, 30 km/h, and 40 km/h.


Figure 9: Optimization charts of engine power at (a) v = 10 km/h, (b) v = 20 km/h, (c) v = 30 km/h, (d) v = 40 km/h (engine power (kW) versus SOC and demand power $P_{req}$ (kW)).

Figure 10: Implementation frame of the energy management strategy. In the offline frame, driving-cycle data, the Markov chain transition probability matrix, the cost function (immediate return), and policy iteration produce the engine power optimization charts; in the online frame, the driver model (accelerator and brake pedals) yields the power demand, the engine power is looked up from the offline charts, and the engine, motor, battery, and transmission models deliver the power to the vehicle.

It can be seen from Figure 9 that only the motor works when the SOC is high and the power demand is low; when both the SOC and the power demand are high, only the engine works; and when the SOC is low, the engine is in the charging mode.

Figure 10 illustrates the offline and online implementation frames of the energy management strategy. In the offline part, the power demand transition probability matrix is obtained from the Markov chain. Mathematical models of the state variable $s$, action variable $a$, and immediate return $r$ are derived according to RL theory. The policy iteration algorithm is employed to determine the engine power optimization tables. In the online part, the driver's power demand is obtained from the positions of the driver's accelerator pedal and brake pedal. The power of the engine and motor is then distributed by looking up the offline engine power optimization tables. Finally, the power is transferred to the wheels by the transmission system.
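The online step therefore reduces to a table look-up. The sketch below interpolates a hypothetical engine-power table over (SOC, speed, power demand); the grids and table values are placeholders for the result of the offline policy iteration.

import numpy as np
from scipy.interpolate import RegularGridInterpolator

soc_grid = np.linspace(0.3, 0.7, 9)
v_grid = np.array([10.0, 20.0, 30.0, 40.0])          # vehicle speed grid (km/h)
p_req_grid = np.linspace(-10.0, 25.0, 15)            # power demand grid (kW)
engine_power_table = np.zeros((9, 4, 15))            # placeholder for the offline result

lookup = RegularGridInterpolator((soc_grid, v_grid, p_req_grid), engine_power_table,
                                 bounds_error=False, fill_value=None)

def split_power(soc, v_kmh, p_req_kw):
    """Online power split: engine power from the offline table, motor covers the remainder."""
    p_e = float(lookup([[soc, v_kmh, p_req_kw]])[0])
    p_m = p_req_kw - p_e
    return p_e, p_m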


Figure 11: ECE EUDC simulation results based on RL: (a) vehicle speed (km·h−1), (b) gear ratio $i_g$, (c) SOC, (d) demand power $P_{req}$ (kW), (e) motor power $P_m$ (kW), (f) engine torque $T_e$ (N·m), (g) motor torque $T_m$ (N·m), (h) equivalent fuel consumption $b_{equ}$ (mL·s−1).

5. Simulation Experiment and Results

A simulation is conducted on the MATLAB/Simulink platform, taking the ECE EUDC driving cycle as the simulation driving cycle and setting the initial SOC to 0.6. The energy management strategy based on a known model of RL is adopted, simulating the vehicle operation status online. The results are shown in Figure 11.

From Figure 11, we can see how the following parameters change with time: the transmission gear ratio $i_g$, battery SOC, power demand $P_{req}$, motor power $P_m$, engine torque $T_e$, motor torque $T_m$, and instantaneous equivalent fuel consumption $b_{equ}$. From Figure 11(b), it can be seen that the gear ratio varies continuously from 0.498 to 4.04; this is primarily because of the use of a CVT with reflux power, which, unlike an AMT, can deliver power without interruption. In Figure 11(c), the SOC of the battery changes from the initial value of 0.6 to the final value of 0.5845 (ΔSOC = 0.0155); this meets the HEV SOC balance requirement before and after the cycle (−0.02 ≤ ΔSOC ≤ 0.02). The power demand change curve of the vehicle, obtained from the ECE EUDC cycle, is shown in Figure 11(d). The power distribution curves of the motor and the engine, obtained according to the engine power optimization control tables, are shown in Figures 11(e) and 11(f). The motor power being negative indicates that the motor is in the generating state.


Figure 12: Comparison between DP and RL optimization results based on ECE EUDC: (a) SOC, (b) motor torque $T_m$ (N·m), (c) engine torque $T_e$ (N·m).

Figure 13: Comparison between DP and RL optimization results based on CYC WVUCITY: (a) vehicle speed (km·h−1), (b) SOC, (c) motor torque $T_m$ (N·m), (d) engine torque $T_e$ (N·m).


Table 3: Simulation results based on CYC WVUCITY.

Control Policy   Fuel Consumption (L/100 km)   Computation time (s)*
                                               offline      online
DP               4.983                         5280         —
RL               4.864                         4300         3.5

* A 2.5 GHz microprocessor with 8 GB RAM was used.

In order to validate the optimal performance of the RL control strategy, vehicle simulation tests based on DP and RL were carried out. Figure 12 shows a comparison of the optimization results of the two control methods based on the ECE EUDC driving cycle; the solid line indicates the optimization results of RL, and the dotted line indicates those of DP. Figure 12(a) shows the SOC optimization trajectories of the DP- and RL-based strategies. The trend of the two curves is essentially the same; the difference between the final SOC and the initial SOC is within 0.02, and the SOC is relatively stable. Compared with RL, the SOC curve of the DP-based strategy fluctuates more strongly; this is primarily related to the distribution of the power source torque. Figures 12(b) and 12(c) show the engine and motor torque distribution curves of the DP- and RL-based strategies. The engine torque curves essentially coincide; however, the motor torque curves differ somewhat. This is primarily reflected in the torque distribution when the motor is in the generating state.

Table 2 shows the equivalent fuel consumption obtained through DP and RL optimization. Compared with that obtained by DP (4.952 L/100 km), the fuel consumption obtained by RL is 2.3% higher. The reason for this is that DP ensures a global optimum only for a given driving cycle (ECE EUDC), whereas RL optimizes the result over a series of driving cycles in an average sense, thereby realizing a cumulative expected value optimum. Compared with the rule-based (5.368 L/100 km) and instantaneous optimization (5.256 L/100 km) strategies proposed in the literature [19], RL decreased the fuel consumption by 5.6% and 3.6%, respectively.

In order to validate the adaptability of RL to random driving cycles, CYC WVUCITY is also selected as a simulation cycle. The optimization results are shown in Figure 13. Figure 13(b) shows the variation of the battery SOC under the DP and RL control strategies; the solid line indicates the optimization results of RL, and the dotted line indicates those of DP. Starting from the same initial value of 0.6, the final SOC values for the DP and RL strategies are 0.59 and 0.58, respectively; the SOC remains essentially stable. Figures 13(c) and 13(d) show the optimal torque curves of the motor and the engine based on the DP and RL strategies. It can be seen that the changing trends of the torque curves of the two strategies are largely the same; in comparison, the torque obtained by DP fluctuates more strongly. Table 3 lists the equivalent fuel consumption of the two control strategies. Compared with DP, RL saves 2.4% of fuel and adapts well to a random cycle. Meanwhile, the computation times of DP and RL were recorded, including offline and online computation time. The offline computation time of DP is 5280 s, and that of RL is 4300 s. Because of its large calculation volume and the fact that the driving cycle is unknown in advance, DP cannot be realized online, whereas RL is not limited to a certain cycle and can be realized online by embedding the engine power optimization tables into the vehicle controller. The online operation time of RL is 3.5 s for the CYC WVUCITY simulation cycle.

6. Conclusion

(1) We established a stochastic Markov model of the driver's power demand based on datasets of the UDDS and ECE EUDC driving cycles.
(2) An RL energy management strategy was proposed, which takes the SOC, vehicle speed, and power demand as the state variables, the engine power as the action variable, and the minimum cumulative return expectation as the optimization objective.
(3) Using the MATLAB/Simulink platform, a CYC WVUCITY simulation model was established. The results show that, compared with DP, RL could reduce fuel consumption by 2.4% and be implemented online.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research is supported by the Scientific and Technology Research Program of Chongqing Municipal Education Commission (Grant no. KJQN201800718) and the Chongqing Research Program of Basic Research and Frontier Technology (Grant no. cstc2017jcyjAX0053).

References

[1] G. Zhao and Z. Chen, "Energy management strategy for series hybrid electric vehicle," Journal of Northeastern University, vol. 34, no. 4, pp. 583–587, 2013.

[2] W. Wang and X. Cheng, "Forward simulation on the energy management system for series hybrid electric bus," Automotive Engineering, vol. 35, no. 2, pp. 121–126, 2013.

[3] D. Qin and S. Zhan, "Management strategy of hybrid electrical vehicle based on driving style recognition," Journal of Mechanical Engineering, vol. 52, no. 8, pp. 162–169, 2016.

[4] S. Zhan and D. Qin, "Energy management strategy of HEV based on driving cycle recognition using genetic optimized K-means clustering algorithm," Chinese Journal of Highway and Transport, vol. 29, no. 4, pp. 130–137, 2016.

[5] X. Zeng and J. Wang, "A parallel hybrid electric vehicle energy management strategy using stochastic model predictive control with road grade preview," IEEE Transactions on Control Systems Technology, vol. 23, no. 6, pp. 2416–2423, 2015.

[6] X. Tang and B. Wang, "Control strategy for hybrid electric bus based on state transition probability," Automotive Engineering Magazine, vol. 38, no. 3, pp. 263–268, 2016.

[7] Z. Ma and S. Cui, "Applying dynamic programming to HEV powertrain based on EVT," in Proceedings of the 2015 International Conference on Control, Automation and Information Sciences, vol. 22, pp. 37–41, 2015.

[8] Y. Yang and P. Ye, "A research on the fuzzy control strategy for a speed-coupling ISG HEV based on dynamic programming optimization," Automotive Engineering, vol. 38, no. 6, pp. 674–679, 2016.

[9] Z. Chen, C. C. Mi, J. Xu, X. Gong, and C. You, "Energy management for a power-split plug-in hybrid electric vehicle based on dynamic programming and neural networks," IEEE Transactions on Vehicular Technology, vol. 63, no. 4, pp. 1567–1580, 2014.

[10] C. Sun, S. J. Moura, X. Hu, J. K. Hedrick, and F. Sun, "Dynamic traffic feedback data enabled energy management in plug-in hybrid electric vehicles," IEEE Transactions on Control Systems Technology, vol. 23, no. 3, pp. 1075–1086, 2015.

[11] B. Bader, "An energy management strategy for plug-in hybrid electric vehicles," Materia, vol. 29, no. 11, 2013.

[12] Z. Chen, L. Li, B. Yan, C. Yang, C. Marina Martinez, and D. Cao, "Multimode energy management for plug-in hybrid electric buses based on driving cycles prediction," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 10, pp. 2811–2821, 2016.

[13] Z. Chen and Y. Liu, "Energy management strategy for plug-in hybrid electric bus with evolutionary reinforcement learning method," Journal of Mechanical Engineering, vol. 53, no. 16, pp. 86–93, 2017.

[14] T. Liu, B. Wang, and C. Yang, "Online Markov Chain-based energy management for a hybrid tracked vehicle with speedy Q-learning," Energy, vol. 160, pp. 544–555, 2018.

[15] L. Teng, "Online energy management for multimode plug-in hybrid electric vehicles," IEEE Transactions on Industrial Informatics, vol. 2018, no. 10, pp. 1–9, 2018.

[16] J. Wu, H. He, J. Peng, Y. Li, and Z. Li, "Continuous reinforcement learning of energy management with deep Q network for a power split hybrid electric bus," Applied Energy, vol. 222, pp. 799–811, 2018.

[17] T. Liu, X. Hu, S. E. Li, and D. Cao, "Reinforcement learning optimized look-ahead energy management of a parallel hybrid electric vehicle," IEEE/ASME Transactions on Mechatronics, vol. 22, no. 4, pp. 1497–1507, 2017.

[18] T. Liu and X. Hu, "A bi-level control for energy efficiency improvement of a hybrid tracked vehicle," IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1616–1625, 2018.

[19] Y. Yin, Study on Optimization Matching and Simulation for Super-Mild HEV, Chongqing University, Chongqing, China, 2011.


Page 2: An Energy Management Strategy for a Super-Mild Hybrid

2 Journal of Control Science and Engineering

Table 1 Super-mild HEV parameters

Component Parameters Value

EngineEngine type JL472Q1

Rated power kW 49plusmn245Maximum torque N∙m 82plusmn41

Motor Rated power kW 5Rated speed r∙minminus1 1000

Battery Rated capacity A∙h 65Rated voltage V 80

Transmission

Minimum gear ratio 0498Maximum gear ratio 404

PGT parameter 3875Fixed gear ratio 24704

Others Final drive gear ratio 44059Motor reducing gear ratio 15

[4] combined K-means clustering algorithm with equivalentconsumption minimization strategy to realize the energymanagement of the whole vehicle

An MPC strategy can ensure an objective optimumdesign in the prediction domain and can be implementedonline Reference [5] proposed a stochastic model predictivecontrol-based energy management strategy using the vehiclelocation traveling direction and terrain information of thearea for HEVs operating in hilly regions with light traffic

A global optimal control strategy can guarantee anobjective optimum design over a given driving cycle bydistributing the power of the engine and motor Howeverit can only be implemented offline because of the largevolume of calculations involved Reference [6] proposed animproved dynamic programming (DP) control strategy forhybrid electric buses based on the state transition probabilityDP was applied to an EVT-based HEV powertrain to realizeits optimal control in [7] A DP algorithm was also used forglobal optimization on the performance of a speed couplingISG HEV in [8]

Intelligent energy management strategies include fuzzylogic neural network genetic algorithm and machinelearning-based control strategies Energy managementstrategies based on machine learning include those based onsupervised learning [9 10] unsupervised learning [11 12]and reinforcement learning

RL is a data-driven approach that assumes the system asa black box regardless of whether it is linear or nonlinearAs a type of self-adaptive optimal control method based onmachine learning RL has been widely applied to the learningcontrol of several nonlinear systems

Reference [13] proposed an energy management strategyfor PHEB based on RL Liu et al [14] proposed an RL-enabled energy management strategy by using the speedyQ-learning algorithm to accelerate the convergence rate inMarkov Chain-based control policy computation Reference[15] developed an online energy management controller fora plug-in HEV based on driving condition recognition and a

genetic algorithm In [16] deep Q learning was adopted forenergy management and the strategy was proposed and veri-fied RL was shown to derive model-free and adaptive controlfor energy management in [17] Liu et al [18] proposed a bi-level control framework that combined predictive learningwith RL to formulate an energy management strategy

RL can ensure global optimum over the driving cycleand does not require foreseeing the driving cycle Comparedwith complex dynamic models data-driven models can beimplemented online because of the small calculation volume

Aimed at super-mild HEV [19] studied rule-based andinstantaneous optimization methods to be applied to energymanagement strategies for super-mild HEVs However RLhas not been reported to be applied to an energymanagementstrategy of super-mild HEVs

This study establishes an energy management strategyfor super-mild HEVs based on a known model of RL Theoptimal control results are obtained by a policy iteration algo-rithm Based on theMATLABSimulink simulation platforma simulation is conducted on the economic performance ofthe vehicle

2 Super-Mild Hybrid Electric Vehicle Model

21 Structure and Main Parameters A super-mild HEV isprimarily composed of an engine motor and continuouslyvariable transmission with reflux power The continuouslyvariable transmission with reflux power includes a metal beltcontinuously variable transmission a fixed speed ratio geartransmission device a planetary gear transmission devicewet clutches a one-way clutch and a brake Its structure isshown in Figure 1Themain parameters are shown in Table 1

22 Working Modes There are four working modes ofa super-mild HEV only-motor mode only-engine modeengine-charging mode and regenerative braking mode asshown in Figure 2

Journal of Control Science and Engineering 3

L0

Battery

CVT system with reflux power

Motor

EngineBL1

L2

L4 L3

1

2 3

1-metal belt continuously variable transmission2-fixed speed ratio gear transmission device3-planetary gear transmission device B-brakeL1 L2 L3-wet friction clutch 4-one-way clutch

Figure 1 Super-mild HEV architecture

L0

Battery

BL1L2

L4 L3

1

2 3

(a)

L0

Battery

BL1L2

L4 L3

1

2 3

(b)

L0

Battery

BL1L2

L4 L3

1

2 3

(c)

L0

Battery

BL1

L2

L4 L3

1

2 3

(d)

Figure 2 Working modes of super-mild HEV (a) only-motor (b) only-engine (c) Engine-charging (d) Regenerative braking

23 Vehicle Dynamics Model

231 Power DemandModel The power demand at the wheelis the power sum of rolling resistance air resistance andacceleration resistance

119875re119902 = (119865119891 + 119865119908 + 119865119895) V119865119891 = 119891119898119892119865119908 = 119862119863119860V22115119865119895 = 120575119898119889V119889119905

(1)

where 119865119891 is the rolling resistance 119865119908 is the air resis-tance 119865119895 is the acceleration resistance V is the vehiclespeed 119891 is the rolling resistance coefficient 119898 is thevehicle mass 119862119863 is the air resistance coefficient 119860 is thefrontal area and 120575 is the conversion coefficient of rotationmass

232 Engine Model The engine is a highly nonlinear systemand its working process is very complex Therefore enginedata for a steady-state condition are obtained through exper-iment testing Based on these data an engine torque modeland an effective fuel consumption model are established asshown in Figures 3 and 4 respectively

233 BatteryModel Based on aNi-MHbattery performanceexperiment the electromotive force and internal resistancemodel are obtained as shown in Formulas (2) and (3)

119864119904119900119888 = 1198640 + 5sum1

119864119894SOC119894 (2)

where 1198640 is the electromotive constant of the battery 119864119894 is thefitting coefficients SOC is the state of charge and 119864soc is theelectromotive force under the current state

119877119904119900119888 = 1205750(1198770 + 6sum1

120582119894SOC119894) (3)

4 Journal of Control Science and Engineering

10002000 3000 4000

5000 6000

0

50

1000

20

40

60

80

100

Engi

ne to

rque

T

(Nmiddotm

)rottle opening ()

Engine rotation speed n(rmiddotmin-1)

Figure 3 Engine torque model

10002000

30004000

50006000

020

4060

80

200

400

600

800

1000

Engine torque T (Nmiddotm)

Engine rotation speed n(rmiddotmin-1)

Fuel

cons

umpt

ion

rate

(gmiddotk

wh-1

)

Figure 4 Engine fuel consumption rate model

where 1198770 is the internal resistance constant of battery 120582119894 isthe fitting coefficient 1205750 is the compensation coefficient ofinternal resistance with the change in current SOC is state ofcharge and 119877soc is internal resistance under the current state

The process to calculate the change in SOC is shown byFormulas (4) sim (6)

Δ119878119874119862 = minus119868119887119886119905 (119905)119876119887119886119905 (4)

119868119887119886119905 (119905) = 119864119904119900119888 minus radic1198641199041199001198882 minus 41198751198871198861199051198771199041199001198882119877119904119900119888 (5)

Therefore

Δ119878119874119862 = minus119864119904119900119888 minus radic1198641199041199001198882 minus 41198751198871198861199051198771199041199001198882119877119904119900119888119876119887119886119905 (6)

where 119875bat is the battery power 119868bat is the battery current and119876bat is the battery capacity

3 Stochastic Modeling of DriverrsquosPower Demand

Traditionally the driverrsquos power demand is obtained accord-ing to a given driving cycle however in reality the drivingcycle is random The discrete-time stochastic process isregarded as Markov decision process (MDP) In other wordsthe transition probability from the current state to the nextstate only depends on the current state and the selectedaction this is independent of historical states The powerdemand at next state only depends on the current powerdemand which is independent of any previous state Toestablish the transition probability of power demand a largevolume of data is required In this study time-speed datasetsof the UDDS and ECE EUDC driving cycles are adopted tocalculate the power demand at each moment (Figure 5) Thetransition probability matrix of the driverrsquos power demand isobtained using the maximum likelihood estimation method

The steps to calculate the power demand based on thedriving cycle are as follows

119875119903119890119902 isin 1198751119903119890119902 1198752119903119890119902 sdot sdot sdot 119875119904119903119890119902 (7)

Journal of Control Science and Engineering 5

0 200 400 600 800 1000 1200 14000

50

100

150

Time (s)

Time (s)0 200 400 600 800 1000 1200 1400

0

20

40

60

80

100

Spee

d

(kmmiddoth

-1)

Spee

d

(kmmiddoth

-1)

Figure 5 ECE EUDC and UDDS driving cycles

The transition probability 119875119894119895 is the probability from thecurrent state 119875119894req to the next state 119875119895req119875119894119895 = 119875 119875119903119890119902 (119896 + 1) = 119875119895119903119890119902 | 119875119903119890119902 (119896) = 119875119894119903119890119902

119894 119895 = 1 2 sdot sdot sdot 119904 and 119904sum119895=1

119875119894119895 = 1 (8)

According to the maximum likelihood estimationmethod the power transition probability can be obtained by

119875119894119895 = 119898119894119895119898119894 119894 119895 isin 1 2 sdot sdot sdot 119904 (9)

where 119898119894119895 represents the number of times that the transitionfrom119875119894req to119875119895req has occured given the vehicle speed and119898119894represents the total number of times that 119875119894req has occurredat the vehicle speed it is given by

119898119894 = 119904sum119895=1

119898119894119895 (10)

Based on the data of ECE EUDC and UDDS drivingcycles transition probability matrices for the power demandare obtained at given speeds of 10 kmh 20 kmh 30 kmh40 kmh as shown in Figure 6

4 Known-Model RL EnergyManagement Strategy

41 Addressing Energy Management by RL From a macro-scopic perspective the energy management strategy ofan HEV involves determining the drivers power demandaccording to the drivers operation (acceleration pedal orbraking pedal) and distributing optimal power split betweentwo power sources (engine and motor) on the premise of

guaranteeing dynamic performance Fromamicroscopic per-spective the energy management strategy can be abstractedas solving a sequential optimization decision problem RL is amachine learningmethod based onMarkov decision processwhich can solve sequential optimization decision problems

The process of solving sequential decision problem by RLis as follows First a Markov decision process is representedby the tuple (119904 119886 119875 119903 120574) where 119904 is the state set 119886 is the actionset 119875 is the transition probability 119903 is the return functionand 120574 is the discount factor (Figure 7) Second based on theMarkov decision process the vehicle controller of an HEV isregarded as an agent the distribution of its engine power isregarded as action 119886 and the hybrid electric system exceptfor the vehicle controller is regarded as the environment Inorder to achieve minimum cumulative fuel consumption theagent takes certain action to interact with the environmentAfter the environment accepts the action the state beginsto change and an immediate return is generated to feedbackto the agent The agent chooses the next action based onan immediate return and the current state of environmentand then interacts with the environment again The agentinteracts with the environment continually thus generating aconsiderable amount of data RL utilizes the generated datato modify the action variable Then the agent interacts withthe environment again and generates new data this newdata is utilized to further optimize the action variable Afterseveral iterations the agent will learn the optimal action thatcan complete the corresponding task in other words it willdetermine the decision sequence 120587lowast0 997888rarr 120587lowast1 997888rarr sdot sdot sdot 120587lowast119905(Figure 8) thereby solving the sequential decision problem

42 Mathematical Model of an Energy Management StrategyBased on RL Mathematical models of the action variable119886 state variable 119904 and immediate return 119903 are establishedaccording to RL theoryThe SOC vehicle speed V and powerdemand 119875req are taken as state variables the engine power119875119890 is taken as the action variable and the sum of equivalentfuel consumption of a one-step state transition and the SOC

6 Journal of Control Science and Engineering

minus4 minus2 0 2 4 6 8 10

minus7minus5

minus3minus1

13

57

0

20

40

60

80

100

Tran

sitio

n pr

obab

ility

0CD

Current demand power P iLK (kW) Next demand power P

jLK

(kW)

(a) v=10kmh

49

14

minus14minus9

minus41

611

020

40

6080

100

minus11minus6

minus1

Tran

sitio

n pr

obab

ility

0CD

Current demand power P iLK (kW) Next demand power P

jLK

(kW)

(b) v=20kmh

4 6 810 12

minus4minus2 0

24

68

1012

0

20

40

60

80

100

Tran

sitio

n pr

obab

ility

0CD

Current demand power P iLK (kW)

Next demand power PjLK

(kW)

(c) v=30kmh

minus13 minus9minus5

minus13 7

11 15minus10

minus6minus2

26

10

0

50

100

Tran

sitio

n pr

obab

ility

0CD

Current demand power P iLK (kW)

Next demand power PjLK

(kW)

(d) v=40kmh

Figure 6 Power demand transition probability matrix

AGENT ENVIRONMENT

Action

State

Reward

State

R d

Figure 7 Schematic of reinforcement learning

s0 s1 stlowast

0 lowast1

Figure 8 Policy decision sequence

Journal of Control Science and Engineering 7

PI Algorithm1 Input state transition probability 119875119894119895 return function 119903 discount factor 120574Initialization value function V(s)=0Initialization policy 1205870

2 Repeat k=01 3 find 119881120587119896 policy evaluation4 120587119896+1 (119904) = arg min

a119902120587119896 (119904 119886)policy improvement

5 Until 120587119896+1 = 1205871198966 Output 120587lowast = 120587119896

Algorithm 1 Flowchart of the PI algorithm

penalty is taken as the immediate return 119903 1205821015840 is a factor of theequivalent fuel consumption 119898fuel is the fuel consumption119898ele is the electricity consumption 120572 is the penalty factor ofthe battery and 119878119874119862ref is reference value of the SOC

\[ s = [SOC,\; v,\; P_{req}], \qquad a = P_e, \qquad r = m_{fuel} + \lambda' m_{ele} + \alpha\,(SOC - SOC_{ref})^{2} \]  (11)
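As a small illustration of Formula (11), the immediate return can be evaluated per time step as below. The numerical values of λ′, α, and SOC_ref are placeholders, not the calibrated values used in the paper.

```python
def immediate_return(m_fuel, m_ele, soc, lam=2.5, alpha=50.0, soc_ref=0.6):
    """Formula (11): equivalent consumption of one state transition plus an SOC penalty.
    m_fuel: fuel consumed in the step; m_ele: electricity consumed in the step."""
    return m_fuel + lam * m_ele + alpha * (soc - soc_ref) ** 2

# Example: a step that burns 0.4 units of fuel and 0.1 units of electricity at SOC = 0.55.
print(immediate_return(0.4, 0.1, 0.55))   # 0.4 + 0.25 + 50*0.0025 = 0.775
```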

Over an infinite time horizon, the Markov decision process solves the problem of determining a sequence of decision policies that attains the minimum cumulative return expectation of the random process, called the optimal state value function V*(s):

\[ V^{*}(s) = \min_{\pi} E\left[ \sum_{t=0}^{\infty} \gamma^{t} r_{t+1} \right] \]  (12)

where γ ∈ [0, 1] is the discount factor, r_{t+1} represents the return value at time t + 1, and π represents the control policy.

Meanwhile, the control and state variables must also satisfy the following constraints:

\[ SOC_{min} \le SOC \le SOC_{max}, \quad v_{min} \le v \le v_{max}, \quad P_{e,min} \le P_{e} \le P_{e,max}, \quad P_{m,min} \le P_{m} \le P_{m,max} \]  (13)

where the subscripts min and max represent the minimum and maximum threshold values of the state of charge, speed, and power, respectively.

4.3. Policy Iteration Algorithm. For Formula (12), according to the Bellman equation,

\[ V^{*}(s) = \min_{\pi} \sum_{a} \pi(s,a) \sum_{s'} P(s,a,s') \left( r(s,a,s') + \gamma V^{*}(s') \right) \]  (14)

where s′ represents the state at the next time step.

Table 2: Simulation results based on ECE EUDC.

Control policy               Fuel consumption (L/100 km)
Rule-based                   5.368
Instantaneous optimization   5.256
DP                           4.952
RL                           5.069

The purpose of the solution is to determine the optimal policy π*:

\[ \pi^{*}(s) = \arg\min_{\pi} \sum_{a} \pi(s,a) \sum_{s'} P(s,a,s') \left( r(s,a,s') + \gamma V^{*}(s') \right) \]  (15)

A policy iteration (PI) algorithm is used to solve this random-process problem. It involves a policy evaluation step and a policy improvement step.

The calculation process is shown in Algorithm 1. In the policy evaluation step, for a control policy π_k(s) (the subscript k represents the iteration number), the corresponding state value function V^{π_k}(s) is calculated as shown in Formula (16):

\[ V^{\pi_{k}}(s) = \sum_{a} \pi_{k}(s,a) \sum_{s'} P(s,a,s') \left( r(s,a,s') + \gamma V^{\pi_{k}}(s') \right) \]  (16)

In the policy improvement step, the improved policy π_{k+1}(s) is determined through a greedy strategy, as shown in Formula (17):

\[ \pi_{k+1}(s) = \arg\min_{a} q^{\pi_{k}}(s,a), \qquad q^{\pi_{k}}(s,a) = r(s,a) + \gamma \sum_{s'} P(s,a,s') V^{\pi_{k}}(s') \]  (17)

During policy iteration, policy evaluation and policy improvement are performed alternately until the state value function and the policy converge.
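A compact sketch of the tabular policy iteration described by Formulas (16) and (17) is given below. It assumes the state and action spaces have already been discretized and that the transition probabilities P(s, a, s′) and one-step returns r(s, a, s′) are available as arrays; the array shapes, tolerance, and example MDP are illustrative, not the paper's calibration.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95, tol=1e-6):
    """Tabular PI for a minimization MDP with a deterministic policy.
    P[s, a, s2]: transition probability; R[s, a, s2]: one-step return (cost)."""
    n_s, n_a, _ = P.shape
    V = np.zeros(n_s)                         # value function initialized to zero
    policy = np.zeros(n_s, dtype=int)         # initial policy pi_0
    while True:
        # Policy evaluation (Formula (16)) for the fixed policy, iterated to convergence.
        while True:
            V_new = np.array([np.dot(P[s, policy[s]], R[s, policy[s]] + gamma * V)
                              for s in range(n_s)])
            diff = np.max(np.abs(V_new - V))
            V = V_new
            if diff < tol:
                break
        # Policy improvement (Formula (17)): greedy minimizing step on q(s, a).
        q = np.einsum('sat,sat->sa', P, R + gamma * V[None, None, :])
        new_policy = np.argmin(q, axis=1)
        if np.array_equal(new_policy, policy):    # stop when the policy is stable
            return policy, V
        policy = new_policy

# Example with a tiny random MDP (3 states, 2 actions), for illustration only.
rng = np.random.default_rng(0)
P = rng.random((3, 2, 3)); P /= P.sum(axis=2, keepdims=True)   # normalize transitions
R = rng.random((3, 2, 3))
pi_star, V_star = policy_iteration(P, R)
print(pi_star, V_star)
```

In the paper's setting, the resulting policy table is indexed by the vehicle speed, the driver's power demand, and the SOC, and it stores the optimal engine power for each grid point.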

The policy iteration algorithm is adopted to obtain the optimum control policy, which takes the vehicle speed, the driver's power demand, and the SOC as inputs and the engine power as the output. Figure 9 shows the optimized engine power at vehicle speeds of 10 km/h, 20 km/h, 30 km/h, and 40 km/h.

Figure 9: Optimization charts of engine power at (a) v = 10 km/h, (b) v = 20 km/h, (c) v = 30 km/h, and (d) v = 40 km/h. Each chart gives the engine power (kW) as a function of the demand power P_req (kW) and the SOC.

Figure 10: Implementation frame of the energy management strategy. The offline frame derives the power demand transition probability matrix from the Markov chain, evaluates the cost function through the engine, motor, battery, and transmission models, and runs policy iteration to produce the engine power optimization charts; the online frame converts the accelerator and brake pedal openings from the driver model into a power demand and distributes power by looking up the engine power table.

It can be seen from Figure 9 that only the motor works when the SOC is high and the power demand is low; when both the SOC and the power demand are high, only the engine works; and when the SOC is low, the engine operates in charging mode.

Figure 10 illustrates the offline and online implementation frames of the energy management strategy. In the offline part, the power demand transition probability matrix is obtained from the Markov chain, mathematical models of the state variable s, action variable a, and immediate return r are derived according to RL theory, and the policy iteration algorithm is employed to determine the engine power optimization tables. In the online part, the driver's power demand is obtained from the openings of the driver's accelerator and brake pedals; the power of the engine and motor is then distributed by looking up the offline engine power optimization tables; finally, the power is transferred to the wheels by the transmission system.
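The online part therefore reduces to a table lookup followed by a simple power split. The sketch below illustrates this flow under assumed names: the pedal-to-demand mapping, the grid axes, the placeholder table, and the interpolation routine are hypothetical stand-ins for the controller's embedded tables.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Hypothetical offline products: grid axes and an engine-power table P_e(v, P_req, SOC).
v_axis = np.array([10.0, 20.0, 30.0, 40.0])            # km/h
p_req_axis = np.linspace(-15.0, 15.0, 31)               # kW
soc_axis = np.linspace(0.3, 0.7, 9)
engine_power_table = np.random.rand(4, 31, 9) * 25.0    # placeholder for the optimized chart

lookup = RegularGridInterpolator((v_axis, p_req_axis, soc_axis), engine_power_table,
                                 bounds_error=False, fill_value=None)

def demand_from_pedals(accel, brake, v):
    """Illustrative mapping from pedal openings (0..1) to a power demand in kW."""
    return 30.0 * accel - 20.0 * brake

def online_power_split(accel, brake, v, soc):
    """One control step: pedal openings -> power demand -> table lookup -> power split."""
    p_req = demand_from_pedals(accel, brake, v)
    p_e = lookup([v, p_req, soc]).item()    # engine power from the offline optimization table
    p_m = p_req - p_e                       # motor covers the remainder (negative => generating)
    return p_e, p_m

print(online_power_split(accel=0.3, brake=0.0, v=25.0, soc=0.55))
```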

Figure 11: ECE EUDC simulation results based on RL: (a) vehicle speed (km/h); (b) gear ratio; (c) SOC; (d) demand power P_req (kW); (e) motor power P_m (kW); (f) engine torque T_e (N·m); (g) motor torque T_m (N·m); (h) instantaneous equivalent fuel consumption b_equ (mL/s).

5. Simulation Experiment and Results

A simulation is conducted on the MATLAB/Simulink platform, taking the ECE EUDC driving cycle as the simulation driving cycle and setting the initial SOC to 0.6. The energy management strategy based on a known model of RL is adopted to simulate the vehicle operation status online. The results are shown in Figure 11.

From Figure 11, we can see how the following parameters change with time: the transmission gear ratio i_g, the battery SOC, the power demand P_req, the motor power P_m, the engine torque T_e, the motor torque T_m, and the instantaneous equivalent fuel consumption b_equ. From Figure 11(b), it can be seen that the gear ratio varies continuously from 0.498 to 4.04; this is primarily because of the use of a CVT with reflux power, which, unlike an AMT, can transmit power without interruption. In Figure 11(c), the battery SOC changes from the initial value of 0.6 to the final value of 0.5845 (ΔSOC = 0.0155); this meets the HEV SOC balance requirement before and after the cycle (−0.02 ≤ ΔSOC ≤ 0.02). The power demand curve of the vehicle obtained on the ECE EUDC cycle is shown in Figure 11(d). The power distribution curves of the motor and the engine are obtained according to the engine power optimization control tables, as shown in Figures 11(e) and 11(f). When the motor power is negative, the motor is in the generating state.

Figure 12: Comparison between DP and RL optimization results based on ECE EUDC: (a) SOC; (b) motor torque T_m (N·m); (c) engine torque T_e (N·m).

Figure 13: Comparison between DP and RL optimization results based on CYC WVUCITY: (a) vehicle speed (km/h); (b) SOC; (c) motor torque T_m (N·m); (d) engine torque T_e (N·m).

Table 3: Simulation results based on CYC WVUCITY.

Control policy    Fuel consumption (L/100 km)    Computation time (s)^a
                                                 Offline      Online
DP                4.983                          5280         —
RL                4.864                          4300         3.5

^a A 2.5 GHz microprocessor with 8 GB RAM was used.

In order to validate the optimization performance of the RL control strategy, vehicle simulation tests based on DP and RL were carried out. Figure 12 compares the optimization results of the two control methods on the ECE EUDC driving cycle; the solid line indicates the RL results and the dotted line indicates the DP results. Figure 12(a) shows the SOC optimization trajectories of the DP- and RL-based strategies. The trends of the two curves are essentially the same, the difference between the final SOC and the initial SOC is within 0.02, and the SOC is relatively stable. Compared with RL, the SOC curve of the DP-based strategy fluctuates more; this is primarily related to the distribution of the power-source torque. Figures 12(b) and 12(c) show the motor and engine torque distribution curves of the DP and RL strategies. The engine torque curves essentially coincide; however, the motor torque curves differ somewhat, primarily in the torque distribution when the motor is in the generating state.

Table 2 shows the equivalent fuel consumption obtained through DP and RL optimization. Compared with that obtained by DP (4.952 L/100 km), the fuel consumption obtained by RL is 2.3% higher. The reason is that DP ensures a global optimum only for a given driving cycle (ECE EUDC), whereas RL optimizes the result over a series of driving cycles in an average sense, thereby realizing a cumulative expected-value optimum. Compared with the rule-based (5.368 L/100 km) and instantaneous optimization (5.256 L/100 km) strategies proposed in the literature [19], RL decreased the fuel consumption by 5.6% and 3.6%, respectively.
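For reference, the quoted percentages follow directly from the Table 2 values; note that the 2.3% figure matches the RL–DP gap expressed relative to the RL result (relative to DP it would be about 2.4%):

\[ \frac{5.069 - 4.952}{5.069} \approx 2.3\%, \qquad \frac{5.368 - 5.069}{5.368} \approx 5.6\%, \qquad \frac{5.256 - 5.069}{5.256} \approx 3.6\% \]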

In order to validate the adaptability of RL to random driving cycles, CYC WVUCITY is also selected as a simulation cycle. The optimization results are shown in Figure 13. Figure 13(b) shows the variation of the battery SOC under the DP and RL control strategies; the solid line indicates the RL results and the dotted line indicates the DP results. Starting from the same initial value of 0.6, the final SOC values of the DP and RL strategies are 0.59 and 0.58, respectively, so the SOC remains essentially stable. Figures 13(c) and 13(d) show the optimal torque curves of the motor and the engine under the DP and RL strategies. The changing trends of the torque curves of the two strategies are largely the same; in comparison, the torque obtained by DP fluctuates more. Table 3 lists the equivalent fuel consumption of the two control strategies: compared with DP, RL saves 2.4% fuel and adapts well to a random cycle. Meanwhile, the computation times of DP and RL, including offline and online computation times, are recorded. The offline computation time of DP is 5280 s and that of RL is 4300 s. Because of its large calculation volume and because the driving cycle is unknown in advance, DP cannot be implemented online, whereas RL is not limited to a certain cycle and can be implemented online by embedding the engine power optimization tables in the vehicle controller. The online operation time of RL is 3.5 s for the CYC WVUCITY simulation cycle.

6. Conclusion

(1) We established a stochastic Markov model of the driver's power demand based on datasets of the UDDS and ECE EUDC driving cycles.
(2) An RL energy management strategy was proposed, which takes the SOC, vehicle speed, and power demand as the state variables, the engine power as the action variable, and the minimum cumulative return expectation as the optimization objective.
(3) Using the MATLAB/Simulink platform, a CYC WVUCITY simulation model was established. The results show that, compared with DP, RL could reduce fuel consumption by 2.4% and be implemented online.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research is supported by the Scientific and Technology Research Program of Chongqing Municipal Education Commission (Grant no. KJQN201800718) and the Chongqing Research Program of Basic Research and Frontier Technology (Grant no. cstc2017jcyjAX0053).

References

[1] G. Zhao and Z. Chen, "Energy management strategy for series hybrid electric vehicle," Journal of Northeastern University, vol. 34, no. 4, pp. 583–587, 2013.

[2] W. Wang and X. Cheng, "Forward simulation on the energy management system for series hybrid electric bus," Automotive Engineering, vol. 35, no. 2, pp. 121–126, 2013.

[3] D. Qin and S. Zhan, "Management strategy of hybrid electrical vehicle based on driving style recognition," Journal of Mechanical Engineering, vol. 52, no. 8, pp. 162–169, 2016.

[4] S. Zhan and D. Qin, "Energy management strategy of HEV based on driving cycle recognition using genetic optimized K-means clustering algorithm," Chinese Journal of Highway and Transport, vol. 29, no. 4, pp. 130–137, 2016.

[5] X. Zeng and J. Wang, "A parallel hybrid electric vehicle energy management strategy using stochastic model predictive control with road grade preview," IEEE Transactions on Control Systems Technology, vol. 23, no. 6, pp. 2416–2423, 2015.

[6] X. Tang and B. Wang, "Control strategy for hybrid electric bus based on state transition probability," Automotive Engineering Magazine, vol. 38, no. 3, pp. 263–268, 2016.

[7] Z. Ma and S. Cui, "Applying dynamic programming to HEV powertrain based on EVT," in Proceedings of the 2015 International Conference on Control, Automation and Information Sciences, vol. 22, pp. 37–41, 2015.

[8] Y. Yang and P. Ye, "A research on the fuzzy control strategy for a speed-coupling ISG HEV based on dynamic programming optimization," Automotive Engineering, vol. 38, no. 6, pp. 674–679, 2016.

[9] Z. Chen, C. C. Mi, J. Xu, X. Gong, and C. You, "Energy management for a power-split plug-in hybrid electric vehicle based on dynamic programming and neural networks," IEEE Transactions on Vehicular Technology, vol. 63, no. 4, pp. 1567–1580, 2014.

[10] C. Sun, S. J. Moura, X. Hu, J. K. Hedrick, and F. Sun, "Dynamic traffic feedback data enabled energy management in plug-in hybrid electric vehicles," IEEE Transactions on Control Systems Technology, vol. 23, no. 3, pp. 1075–1086, 2015.

[11] B. Bader, "An energy management strategy for plug-in hybrid electric vehicles," Materia, vol. 29, no. 11, 2013.

[12] Z. Chen, L. Li, B. Yan, C. Yang, C. Marina Martinez, and D. Cao, "Multimode energy management for plug-in hybrid electric buses based on driving cycles prediction," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 10, pp. 2811–2821, 2016.

[13] Z. Chen and Y. Liu, "Energy management strategy for plug-in hybrid electric bus with evolutionary reinforcement learning method," Journal of Mechanical Engineering, vol. 53, no. 16, pp. 86–93, 2017.

[14] T. Liu, B. Wang, and C. Yang, "Online Markov chain-based energy management for a hybrid tracked vehicle with speedy Q-learning," Energy, vol. 160, pp. 544–555, 2018.

[15] L. Teng, "Online energy management for multimode plug-in hybrid electric vehicles," IEEE Transactions on Industrial Informatics, vol. 2018, no. 10, pp. 1–9, 2018.

[16] J. Wu, H. He, J. Peng, Y. Li, and Z. Li, "Continuous reinforcement learning of energy management with deep Q network for a power split hybrid electric bus," Applied Energy, vol. 222, pp. 799–811, 2018.

[17] T. Liu, X. Hu, S. E. Li, and D. Cao, "Reinforcement learning optimized look-ahead energy management of a parallel hybrid electric vehicle," IEEE/ASME Transactions on Mechatronics, vol. 22, no. 4, pp. 1497–1507, 2017.

[18] T. Liu and X. Hu, "A bi-level control for energy efficiency improvement of a hybrid tracked vehicle," IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1616–1625, 2018.

[19] Y. Yin, Study on Optimization Matching and Simulation for Super-Mild HEV, Chongqing University, Chongqing, China, 2011.

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 3: An Energy Management Strategy for a Super-Mild Hybrid

Journal of Control Science and Engineering 3

L0

Battery

CVT system with reflux power

Motor

EngineBL1

L2

L4 L3

1

2 3

1-metal belt continuously variable transmission2-fixed speed ratio gear transmission device3-planetary gear transmission device B-brakeL1 L2 L3-wet friction clutch 4-one-way clutch

Figure 1 Super-mild HEV architecture

L0

Battery

BL1L2

L4 L3

1

2 3

(a)

L0

Battery

BL1L2

L4 L3

1

2 3

(b)

L0

Battery

BL1L2

L4 L3

1

2 3

(c)

L0

Battery

BL1

L2

L4 L3

1

2 3

(d)

Figure 2 Working modes of super-mild HEV (a) only-motor (b) only-engine (c) Engine-charging (d) Regenerative braking

23 Vehicle Dynamics Model

231 Power DemandModel The power demand at the wheelis the power sum of rolling resistance air resistance andacceleration resistance

119875re119902 = (119865119891 + 119865119908 + 119865119895) V119865119891 = 119891119898119892119865119908 = 119862119863119860V22115119865119895 = 120575119898119889V119889119905

(1)

where 119865119891 is the rolling resistance 119865119908 is the air resis-tance 119865119895 is the acceleration resistance V is the vehiclespeed 119891 is the rolling resistance coefficient 119898 is thevehicle mass 119862119863 is the air resistance coefficient 119860 is thefrontal area and 120575 is the conversion coefficient of rotationmass

232 Engine Model The engine is a highly nonlinear systemand its working process is very complex Therefore enginedata for a steady-state condition are obtained through exper-iment testing Based on these data an engine torque modeland an effective fuel consumption model are established asshown in Figures 3 and 4 respectively

233 BatteryModel Based on aNi-MHbattery performanceexperiment the electromotive force and internal resistancemodel are obtained as shown in Formulas (2) and (3)

119864119904119900119888 = 1198640 + 5sum1

119864119894SOC119894 (2)

where 1198640 is the electromotive constant of the battery 119864119894 is thefitting coefficients SOC is the state of charge and 119864soc is theelectromotive force under the current state

119877119904119900119888 = 1205750(1198770 + 6sum1

120582119894SOC119894) (3)

4 Journal of Control Science and Engineering

10002000 3000 4000

5000 6000

0

50

1000

20

40

60

80

100

Engi

ne to

rque

T

(Nmiddotm

)rottle opening ()

Engine rotation speed n(rmiddotmin-1)

Figure 3 Engine torque model

10002000

30004000

50006000

020

4060

80

200

400

600

800

1000

Engine torque T (Nmiddotm)

Engine rotation speed n(rmiddotmin-1)

Fuel

cons

umpt

ion

rate

(gmiddotk

wh-1

)

Figure 4 Engine fuel consumption rate model

where 1198770 is the internal resistance constant of battery 120582119894 isthe fitting coefficient 1205750 is the compensation coefficient ofinternal resistance with the change in current SOC is state ofcharge and 119877soc is internal resistance under the current state

The process to calculate the change in SOC is shown byFormulas (4) sim (6)

Δ119878119874119862 = minus119868119887119886119905 (119905)119876119887119886119905 (4)

119868119887119886119905 (119905) = 119864119904119900119888 minus radic1198641199041199001198882 minus 41198751198871198861199051198771199041199001198882119877119904119900119888 (5)

Therefore

Δ119878119874119862 = minus119864119904119900119888 minus radic1198641199041199001198882 minus 41198751198871198861199051198771199041199001198882119877119904119900119888119876119887119886119905 (6)

where 119875bat is the battery power 119868bat is the battery current and119876bat is the battery capacity

3 Stochastic Modeling of DriverrsquosPower Demand

Traditionally the driverrsquos power demand is obtained accord-ing to a given driving cycle however in reality the drivingcycle is random The discrete-time stochastic process isregarded as Markov decision process (MDP) In other wordsthe transition probability from the current state to the nextstate only depends on the current state and the selectedaction this is independent of historical states The powerdemand at next state only depends on the current powerdemand which is independent of any previous state Toestablish the transition probability of power demand a largevolume of data is required In this study time-speed datasetsof the UDDS and ECE EUDC driving cycles are adopted tocalculate the power demand at each moment (Figure 5) Thetransition probability matrix of the driverrsquos power demand isobtained using the maximum likelihood estimation method

The steps to calculate the power demand based on thedriving cycle are as follows

119875119903119890119902 isin 1198751119903119890119902 1198752119903119890119902 sdot sdot sdot 119875119904119903119890119902 (7)

Journal of Control Science and Engineering 5

0 200 400 600 800 1000 1200 14000

50

100

150

Time (s)

Time (s)0 200 400 600 800 1000 1200 1400

0

20

40

60

80

100

Spee

d

(kmmiddoth

-1)

Spee

d

(kmmiddoth

-1)

Figure 5 ECE EUDC and UDDS driving cycles

The transition probability 119875119894119895 is the probability from thecurrent state 119875119894req to the next state 119875119895req119875119894119895 = 119875 119875119903119890119902 (119896 + 1) = 119875119895119903119890119902 | 119875119903119890119902 (119896) = 119875119894119903119890119902

119894 119895 = 1 2 sdot sdot sdot 119904 and 119904sum119895=1

119875119894119895 = 1 (8)

According to the maximum likelihood estimationmethod the power transition probability can be obtained by

119875119894119895 = 119898119894119895119898119894 119894 119895 isin 1 2 sdot sdot sdot 119904 (9)

where 119898119894119895 represents the number of times that the transitionfrom119875119894req to119875119895req has occured given the vehicle speed and119898119894represents the total number of times that 119875119894req has occurredat the vehicle speed it is given by

119898119894 = 119904sum119895=1

119898119894119895 (10)

Based on the data of ECE EUDC and UDDS drivingcycles transition probability matrices for the power demandare obtained at given speeds of 10 kmh 20 kmh 30 kmh40 kmh as shown in Figure 6

4 Known-Model RL EnergyManagement Strategy

41 Addressing Energy Management by RL From a macro-scopic perspective the energy management strategy ofan HEV involves determining the drivers power demandaccording to the drivers operation (acceleration pedal orbraking pedal) and distributing optimal power split betweentwo power sources (engine and motor) on the premise of

guaranteeing dynamic performance Fromamicroscopic per-spective the energy management strategy can be abstractedas solving a sequential optimization decision problem RL is amachine learningmethod based onMarkov decision processwhich can solve sequential optimization decision problems

The process of solving sequential decision problem by RLis as follows First a Markov decision process is representedby the tuple (119904 119886 119875 119903 120574) where 119904 is the state set 119886 is the actionset 119875 is the transition probability 119903 is the return functionand 120574 is the discount factor (Figure 7) Second based on theMarkov decision process the vehicle controller of an HEV isregarded as an agent the distribution of its engine power isregarded as action 119886 and the hybrid electric system exceptfor the vehicle controller is regarded as the environment Inorder to achieve minimum cumulative fuel consumption theagent takes certain action to interact with the environmentAfter the environment accepts the action the state beginsto change and an immediate return is generated to feedbackto the agent The agent chooses the next action based onan immediate return and the current state of environmentand then interacts with the environment again The agentinteracts with the environment continually thus generating aconsiderable amount of data RL utilizes the generated datato modify the action variable Then the agent interacts withthe environment again and generates new data this newdata is utilized to further optimize the action variable Afterseveral iterations the agent will learn the optimal action thatcan complete the corresponding task in other words it willdetermine the decision sequence 120587lowast0 997888rarr 120587lowast1 997888rarr sdot sdot sdot 120587lowast119905(Figure 8) thereby solving the sequential decision problem

42 Mathematical Model of an Energy Management StrategyBased on RL Mathematical models of the action variable119886 state variable 119904 and immediate return 119903 are establishedaccording to RL theoryThe SOC vehicle speed V and powerdemand 119875req are taken as state variables the engine power119875119890 is taken as the action variable and the sum of equivalentfuel consumption of a one-step state transition and the SOC

6 Journal of Control Science and Engineering

minus4 minus2 0 2 4 6 8 10

minus7minus5

minus3minus1

13

57

0

20

40

60

80

100

Tran

sitio

n pr

obab

ility

0CD

Current demand power P iLK (kW) Next demand power P

jLK

(kW)

(a) v=10kmh

49

14

minus14minus9

minus41

611

020

40

6080

100

minus11minus6

minus1

Tran

sitio

n pr

obab

ility

0CD

Current demand power P iLK (kW) Next demand power P

jLK

(kW)

(b) v=20kmh

4 6 810 12

minus4minus2 0

24

68

1012

0

20

40

60

80

100

Tran

sitio

n pr

obab

ility

0CD

Current demand power P iLK (kW)

Next demand power PjLK

(kW)

(c) v=30kmh

minus13 minus9minus5

minus13 7

11 15minus10

minus6minus2

26

10

0

50

100

Tran

sitio

n pr

obab

ility

0CD

Current demand power P iLK (kW)

Next demand power PjLK

(kW)

(d) v=40kmh

Figure 6 Power demand transition probability matrix

AGENT ENVIRONMENT

Action

State

Reward

State

R d

Figure 7 Schematic of reinforcement learning

s0 s1 stlowast

0 lowast1

Figure 8 Policy decision sequence

Journal of Control Science and Engineering 7

PI Algorithm1 Input state transition probability 119875119894119895 return function 119903 discount factor 120574Initialization value function V(s)=0Initialization policy 1205870

2 Repeat k=01 3 find 119881120587119896 policy evaluation4 120587119896+1 (119904) = arg min

a119902120587119896 (119904 119886)policy improvement

5 Until 120587119896+1 = 1205871198966 Output 120587lowast = 120587119896

Algorithm 1 Flowchart of the PI algorithm

penalty is taken as the immediate return 119903 1205821015840 is a factor of theequivalent fuel consumption 119898fuel is the fuel consumption119898ele is the electricity consumption 120572 is the penalty factor ofthe battery and 119878119874119862ref is reference value of the SOC

119904 = [119878119874119862 V 119875119903119890119902]119886 = 119875119890119903 = 119898119891119906119890119897 + 1205821015840119898119890119897119890 + 120572 (119878119874119862 minus 119878119874119862119903119890119891)2

(11)

In an infinite time domain the Markov decision processwill solve the problem of determining a sequence of decisionpolicies that can predict the minimum cumulative returnexpectation of a randomprocess called the optimal state valuefunction 119881lowast(119904)

119881lowast (119904) = min120587

119864[infinsum119905=0

120574119905119903119905+1] (12)

where 120574 isin [0 1] is the discount factor 119903119905+1 represents thereturn value at 119905 + 1 time and 120587 represents the control policy

Meanwhile the control and state variablesmust alsomeetthe following constraints

119878119874119862min le 119878119874119862 le 119878119874119862max

Vmin le V le Vmax

119875119890min le 119875119890 le 119875119890max

119875119898min le 119875119898 le 119875119898max

(13)

where the subscript min and max represent the maximumand minimum threshold values for the state of charge speedand power respectively

43 Policy Iterative Algorithm For Formula (12) according tothe Berman equation

119881lowast (119904) = minsum119886

120587 (119904 119886)sum1199041015840

119875 (119904 119886 1199041015840)sdot (119903 (119904 119886 1199041015840) + 120574119881lowast (1199041015840))

(14)

where s1015840 represents the state at the next time step

Table 2 Simulation results based on ECE EUDC

Control Policy Fuel Consumption (L100 km)Rule-based 5368Instantaneous optimization 5256DP 4952RL 5069

The purpose of the solution is to determine the optimalpolicy 120587lowast

120587lowast (119904) = argmin sum119886

120587 (119904 119886)sum1199041015840

119875 (119904 119886 1199041015840)sdot (119903 (119904 119886 1199041015840) + 120574119881lowast (1199041015840))

(15)

A policy iterative (PI) algorithm is used to solve the prob-lemof the randomprocess It involves a policy estimation stepand a policy improvement-step

The calculation process is shown in Algorithm 1In the policy evaluation step for a control policy 120587k(s)

(the subscript k represents the number of iterations) thecorresponding state value function 119881120587119896(s) is calculated asshown in

119881120587119896 (119904)= sum119886

120587119896 (119904 119886)sum1199041015840

119875 (119904 119886 1199041015840) (119903 (119904 119886 1199041015840) + 120574119881120587119896 (1199041015840)) (16)

In the policy improvement step the improved policy120587119896+1(s) is determined through a greedy strategy as shown in

120587119896+1 (119904) = argmin119886

119902120587119896 (119904 119886)119902120587119896 (119904 119886) = 119903 (119904 119886) + 120574sum119875(119904 119886 1199041015840)119881120587119896 (1199041015840) (17)

During policy iteration policy evaluation and policyimprovement are performed alternately until the state valuefunction and the policy converge

The policy iteration algorithm is adopted obtaining theoptimum control policy that takes the vehicle speed driverrsquospower demand and SOC as the input and the engine poweras the output Figure 9 shows the optimized engine powers atvehicle speeds of 10 kmh 20 kmh 30 kmh and 40 kmh

8 Journal of Control Science and Engineering

03 04 05 06 070246810

05

1015

SOC

Engi

ne p

ower

(kW

)

Demand power 0LK (kW)

(a) v=10kmh

03 04 05 06 07

2468101205

101520

SOC

Engi

ne p

ower

(kW

)

Demand power 0LK (kW)

(b) v=20kmh

03 04 05 06 0724681012

05

101520

SOC

Engi

ne p

ower

(kW

)

Demand power 0LK (kW)

(c) v=30kmh

03 04 05 06 07

05101520250

10

20

30

SOC

Engi

ne p

ower

(kW

)

Demand power 0LK (kW)

(d) v=40kmh

Figure 9 Optimization charts of engine power

Transition Probability Matrix

N

Y

Markov Chain

Offline Frame

Engine Model

SOCv

Online Frame

k=k+

1

Cost Function

0 200 400 600 800 1000 1200 14000

50

100

150

0 200 400 600 800 1000 1200 14000

20

40

60

80

100Time (s)

Time (s)

immediate return =mOF +mF+(SOC-SOCL )2

PLK=(F+FQ+FD) PCD =mCDmC

state variable s=[SOCPLK]

action variable a=P r(sa)

Motor Model

Battery Model

TransmissionModel Vehicle

Engine Power Look-up Table

Policy Iteration

Initial (s)=0 0 k=0

E+1(s)=LAGCHqE (sa)

VE (s)=sumE(sa) sumP(sas)(r(sas)+VE (s))

qE (sa)=r(sa)+sumP(sas)VE (s)

E+1 = E

lowast=E

DriverModel

PowerDemand

Accelerator pedal

Brake pedal

==

<LE

P

PG

P<N

T n

TG nG

minus4 minus2 0 2 4 6 8 10

minus7minus5

minus3minus1

13

57

0

20

40

60

80

100

49

14

minus14minus9

minus41

611

0

20

40

60

80

100

minus11minus6

minus1

4 6 810 12

minus4minus2 0

24

68

1012

0

20

40

60

80

100

minus13 minus9minus5 minus1

3 711 15

minus10minus6

minus22

610

0

50

100

=10kmh

=40kmh=30kmh

=20kmh

=10kmh

=40kmh=30kmh

=20kmh

03 04 05 06 070246810

05

1015

SOC03 04 05 06 07

2468101205

101520

SOC

03 04 05 06 0724681012

05

101520

SOC03 04 05 06 07

05101520250

10

20

30

SOC

Engine Power Optimization Chart

03 04 05 0607

05

10152025

0

10

20

30

SOC

Figure 10 Implementation frame of the energy management strategy

It can be seen from Figure 9 that only the motor workswhen the SOC is high and the power demand is low whenboth the SOC and the power demand are high only theengine works and when the SOC is low the engine is in thecharging mode

Figure 10 illustrates the offline and online implemen-tation frames of the energy management strategy In theoffline part the power demand transition probability matrixis obtained by the Markov Chain Mathematical models

of the state variable 119904 action variable 119886 and immediatereturn 119903 are derived according to RL theory The policyiteration algorithm is employed to determine engine poweroptimization tables In the online part the drivers powerdemand is obtained by the opening of drivers acceleratorpedal and brake pedal Then the power of the engine andmotor is distributed by looking up the offline engine poweroptimization tables Finally the power is transferred to thewheels by the transmission system

Journal of Control Science and Engineering 9

0 200 400 600 800 1000 1200 14000

50

100

150

Time (s)

Spee

d

(kmmiddoth

-1)

(a)

0 200 400 600 800 1000 1200 14000

1

2

3

4

Time (s)

Gea

r rat

ioi A

(b)

0 200 400 600 800 1000 1200 140003

04

05

06

07

SOC

Time (s)

(c)

0 200 400 600 800 1000 1200 1400minus50minus40minus30minus20minus10

010203040

Time (s)

Dem

and

pow

erPLK

(kW

)(d)

0 200 400 600 800 1000 1200 1400minus5

0

5

Time (s)

Mot

or p

ower

PG

(kW

)

(e)

40

0 200 400 600 800 1000 1200 14000

20

60

Time (s)

Engi

ne to

rque

T

(Nmiddotm

)

(f)

0 200 400 600 800 1000 1200 1400minus30minus20

minus100

1020

Time (s)

Mot

or to

rque

TG

(Nmiddotm

)

(g)

1200 14000 200 400 600 800 1000minus05

005101520

3035

25

40

Equi

vale

nt fu

el

Time (s)

cons

umpt

ionb

KO

(mLmiddot

s-1)

(h)

Figure 11 ECE EUDC simulation results based on RL

5 Simulation Experiment and Results

A simulation is conducted on a MATLAB Simulink plat-form taking ECE EUDC driving cycle as the simulationdriving cycle and setting the initial SOC as 06 The energymanagement strategy based on a known model of RL isadopted simulating the vehicle operation status onlineResults are shown in Figure 11

From Figure 11 we can see that the following parameterschange with time gear ratio of transmission 119894g SOC ofthe battery power demand 119875req motor power 119875m enginetorque 119879e motor torque 119879119898 and instantaneous equivalentfuel consumption 119887equ From Figure 11(b) it can be seen that

the gear ratio varies continuously from 0498 to 404 thisis primarily because of the use of a CVT with reflux powerwhich unlike AMT can realize power without interruptionIn Figure 11(c) SOC of the battery is seen to change from theinitial value of 06 to the final value of 05845SOC = 00155this can meet the HEV SOC balance requirement before andafter the cycle (minus002 le Δ119878119874119862 le 002) The power demandchange curve of the vehicle is obtained based on ECE EUDCas shown in Figure 11(d)The power distribution curves of themotor and the engine can be obtained according to the enginepower optimization control tables as shown in Figures 11(e)and 11(f) The motor power is negative which indicates thatthe motor is in the generating state

10 Journal of Control Science and Engineering

0 200 400 600 800 1000 1200 1400

SOC

04

05

06

07

RL DP

Time (s)

(a)

0 200 400 600 800 1000 1200 1400minus30

minus20

minus10

0

10

20

RL DPTime (s)

Mot

or to

rque

TG

(Nmiddotm

)

(b)

0 200 400 600 800 1000 1200 14000

10

20

30

40

5060

70

RL DP

Time (s)

Engi

ne to

rque

T

(Nmiddotm

)

(c)

Figure 12 Comparison between DP and RL optimization results based on ECE EUDC

0 500 1000 15000

102030405060

Time (s)

Spee

d

(kmmiddoth

-1)

RL(a)

0 500 1000 150004

05

06

07

SOC

Time (s)

RL DP(b)

0 500 1000 1500minus50

minus25

0

25

50

Time (s)

Mot

or to

rque

TG

(Nmiddotm

)

RL DP(c)

0 500 1000 15000

20

40

60

80

RL DP

Time (s)

Engi

ne to

rque

T

(Nmiddotm

)

(d)

Figure 13 Comparison between DP and RL optimization results based on CYC WVUCITY

Journal of Control Science and Engineering 11

Table 3 Simulation results based on CYC WVUCITY

Control Policy Fuel Consumption (L100 km) Computation time(s)a

offline onlineDP 4983 5280 mdashRL 4864 4300 35aA 25GHz microprocessor with 8GB RAM was used

In order to validate the optimal performance of the RLcontrol strategy vehicle simulation tests based on DP andRL were carried out Figure 12 shows a comparison of theoptimization results by the two control methods based onthe ECE EUDC driving cycle The solid line indicates theoptimization results of RL and the dotted line indicatesthe results of DP Figure 12(a) shows the SOC optimizationtrajectories of the DP- and RL-based strategies The trend ofthe two curves is essentially the sameThe difference betweenthe final SOC and the initial SOC is within 002 and the SOCis relatively stable Compared with RL the SOC curve of DP-based strategy fluctuates greatly this is primarily related tothe distribution of the power source torque Figures 12(b) and12(c) indicate engine and motor torque distribution curvesbased on DP and RL strategies The engine torque curvesessentially coincide however the motor torque curves aresomewhat different This is primarily reflected in the torquedistribution when the motor is in the generation state

Table 2 shows the equivalent fuel consumption obtainedthrough DP and RL optimization Compared with thatobtained by DP (4952L) the value of fuel consumptionobtained by RL is 23 higher The reason for this is thatDP only ensures a global optimum at a given driving cycle(ECE EUDC) whereas RL optimizes the result for a series ofdriving cycles in an average sense thereby realizing a cumu-lative expected value optimum Compared with rule-based(5368L) and instantaneous optimization (5256L) proposedin the literature [19] RL decreased the fuel consumption by56 and 36 respectively

In order to validate the adaptability of RL to randomdriving cycle CYC WVUCITY is also selected as a simula-tion cycle The optimization results are shown in Figure 13Figure 13(b) shows the variation curve of the SOC of thebattery based on DP and RL control strategies The solidline indicates the optimization results of RL and the dottedline indicates the results of DP The final SOC value for theDP and RL strategies is 059 and 058 respectively with thesame initial value of 06The SOC remained essentially stableFigures 13(c) and 13(d) show the optimal torque curves ofthe motor and the engine based on DP and RL strategies Itcan be seen from the figure that the changing trend of thetorque curves based on the two strategies is largely the sameIn comparison the change in the torque obtained by DPfluctuates greatly Table 3 demonstrates the equivalent fuelconsumption of the two control strategies Compared withDP RL saves fuel by 24 it can adapt with random cycleperfectly Meanwhile the computation time based onDP andRL is recorded which include offline and online computationtime The offline computation time of DP is 5280s and thatof RL is 4300s Due to large calculation volume and the

driving cycle unknown in advance DP cannot be realizedonline while RL is not limited to a certain cycle which can berealized online by embedding the engine power optimizationtables into the vehicle controller The online operation timeof RL is 35s based on CYC WVUCITY simulation cycle

6 Conclusion

(1) We established a stochastic Markov model of thedriverrsquos power demand based on datasets of the UDDS andECE EUDC driving cycles(2) An RL energy management strategy was proposedwhich takes SOC vehicle speed and power demand as statevariables engine power as power as the action variable andminimum cumulative return expectation as the optimizationobjective(3) Using the MATLABSimulink platform a CYCWVUCITY simulation model was established The resultsshow that comparedwithDP RL could reduce fuel consump-tion by 24 and be implemented online

Data Availability

The data used to support the findings of this study areincluded within the article

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This research is supported by Scientific and Technol-ogy Research Program of Chongqing Municipal EducationCommission (Grant no KJQN201800718) and ChongqingResearchProgramof Basic Research andFrontier Technology(Grant no cstc2017jcyjAX0053)

References

[1] G Zhao and Z Chen ldquoEnergy management strategy for serieshybrid electric vehiclerdquo Journal of Northeastern University vol34 no 4 pp 583ndash587 2013

[2] W Wang and X Cheng ldquoForward simulation on the energymanagement system for series hybrid electric busrdquo AutomotiveEngineering vol 35 no 2 pp 121ndash126 2013

[3] D Qin and S Zhan ldquoManagement strategy of hybrid electricalvehicle based on driving style recognitionrdquo Journal of Mechani-cal Engineering vol 52 no 8 pp 162ndash169 2016

12 Journal of Control Science and Engineering

[4] S Zhan and D Qin ldquoEnergy management strategy of HEVbased on driving cycle recognition using genetic optimized K-means clustering algorithmrdquo Chinese Journal of Highway andTransport vol 29 no 4 pp 130ndash137 2016

[5] X Zeng and JWang ldquoA Parallel Hybrid Electric Vehicle EnergyManagement Strategy Using Stochastic Model Predictive Con-trol With Road Grade Previewrdquo IEEE Transactions on ControlSystems Technology vol 23 no 6 pp 2416ndash2423 2015

[6] X Tang and B Wang ldquoControl strategy for hybrid electric busbased on state transition probabilityrdquo Automotive EngineeringMagazine vol 38 no 3 pp 263ndash268 2016

[7] Z Ma and S Cui ldquoApplying dynamic programming to HEVpowertrain based on EVTrdquo in Proceedings of the 2015 Inter-national Conference on Control Automation and InformationSciences vol 22 pp 37ndash41 2015

[8] Y Yang and P Ye ldquoA Research on the Fuzzy Control Strategy fora Speed-coupling ISG HEV Based on Dynamic Programmingoptimizationrdquo Automotive Engineering vol 38 no 6 pp 674ndash679 2016

[9] Z Chen C C Mi J Xu X Gong and C You ldquoEnergymanagement for a power-split plug-in hybrid electric vehiclebased on dynamic programming and neural networksrdquo IEEETransactions on Vehicular Technology vol 63 no 4 pp 1567ndash1580 2014

[10] C Sun S J Moura X Hu J K Hedrick and F Sun ldquoDynamictraffic feedback data enabled energy management in plug-inhybrid electric vehiclesrdquo IEEE Transactions on Control SystemsTechnology vol 23 no 3 pp 1075ndash1086 2015

[11] B Bader ldquoAn energy management strategy for plug-in hybridelectric vehiclesrdquoMateria vol 29 no 11 2013

[12] Z Chen L Li B Yan C Yang CMarinaMartinez andD CaoldquoMultimode Energy Management for Plug-In Hybrid ElectricBuses Based on Driving Cycles Predictionrdquo IEEE Transactionson Intelligent Transportation Systems vol 17 no 10 pp 2811ndash2821 2016

[13] Z Chen and Y Liu ldquoEnergy Management Strategy for Plug-inHybrid Electric Bus with Evolutionary Reinforcement Learningmethodrdquo Journal of Mechanical Engineering vol 53 no 16 pp86ndash93 2017

[14] T Liu B Wang and C Yang ldquoOnline Markov Chain-basedenergy management for a hybrid tracked vehicle with speedyQ-learningrdquo Energy vol 160 pp 544ndash555 2018

[15] L Teng ldquoOnline Energy Management for Multimode Plug-in Hybrid Electric Vehiclesrdquo IEEE Transactions on IndustrialInformatics vol 2018 no 10 pp 1ndash9 2018

[16] JWuHHe J Peng Y Li andZ Li ldquoContinuous reinforcementlearning of energy management with deep Q network for apower split hybrid electric busrdquo Applied Energy vol 222 pp799ndash811 2018

[17] T Liu X Hu S E Li and D Cao ldquoReinforcement LearningOptimized Look-Ahead Energy Management of a ParallelHybrid Electric Vehiclerdquo IEEEASME Transactions on Mecha-tronics vol 22 no 4 pp 1497ndash1507 2017

[18] T Liu and X Hu ldquoA Bi-Level Control for Energy EfficiencyImprovement of a Hybrid Tracked Vehiclerdquo IEEE Transactionson Industrial Informatics vol 14 no 4 pp 1616ndash1625 2018

[19] Y Yin Study on Optimization Matching and Simulation forSuper-Mild HEV Chongqing university Chongqing China2011

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 4: An Energy Management Strategy for a Super-Mild Hybrid

4 Journal of Control Science and Engineering

10002000 3000 4000

5000 6000

0

50

1000

20

40

60

80

100

Engi

ne to

rque

T

(Nmiddotm

)rottle opening ()

Engine rotation speed n(rmiddotmin-1)

Figure 3 Engine torque model

10002000

30004000

50006000

020

4060

80

200

400

600

800

1000

Engine torque T (Nmiddotm)

Engine rotation speed n(rmiddotmin-1)

Fuel

cons

umpt

ion

rate

(gmiddotk

wh-1

)

Figure 4 Engine fuel consumption rate model

where 1198770 is the internal resistance constant of battery 120582119894 isthe fitting coefficient 1205750 is the compensation coefficient ofinternal resistance with the change in current SOC is state ofcharge and 119877soc is internal resistance under the current state

The process to calculate the change in SOC is shown byFormulas (4) sim (6)

Δ119878119874119862 = minus119868119887119886119905 (119905)119876119887119886119905 (4)

119868119887119886119905 (119905) = 119864119904119900119888 minus radic1198641199041199001198882 minus 41198751198871198861199051198771199041199001198882119877119904119900119888 (5)

Therefore

Δ119878119874119862 = minus119864119904119900119888 minus radic1198641199041199001198882 minus 41198751198871198861199051198771199041199001198882119877119904119900119888119876119887119886119905 (6)

where 119875bat is the battery power 119868bat is the battery current and119876bat is the battery capacity

3 Stochastic Modeling of DriverrsquosPower Demand

Traditionally the driverrsquos power demand is obtained accord-ing to a given driving cycle however in reality the drivingcycle is random The discrete-time stochastic process isregarded as Markov decision process (MDP) In other wordsthe transition probability from the current state to the nextstate only depends on the current state and the selectedaction this is independent of historical states The powerdemand at next state only depends on the current powerdemand which is independent of any previous state Toestablish the transition probability of power demand a largevolume of data is required In this study time-speed datasetsof the UDDS and ECE EUDC driving cycles are adopted tocalculate the power demand at each moment (Figure 5) Thetransition probability matrix of the driverrsquos power demand isobtained using the maximum likelihood estimation method

The steps to calculate the power demand based on thedriving cycle are as follows

119875119903119890119902 isin 1198751119903119890119902 1198752119903119890119902 sdot sdot sdot 119875119904119903119890119902 (7)

Journal of Control Science and Engineering 5

0 200 400 600 800 1000 1200 14000

50

100

150

Time (s)

Time (s)0 200 400 600 800 1000 1200 1400

0

20

40

60

80

100

Spee

d

(kmmiddoth

-1)

Spee

d

(kmmiddoth

-1)

Figure 5 ECE EUDC and UDDS driving cycles

The transition probability 119875119894119895 is the probability from thecurrent state 119875119894req to the next state 119875119895req119875119894119895 = 119875 119875119903119890119902 (119896 + 1) = 119875119895119903119890119902 | 119875119903119890119902 (119896) = 119875119894119903119890119902

119894 119895 = 1 2 sdot sdot sdot 119904 and 119904sum119895=1

119875119894119895 = 1 (8)

According to the maximum likelihood estimationmethod the power transition probability can be obtained by

119875119894119895 = 119898119894119895119898119894 119894 119895 isin 1 2 sdot sdot sdot 119904 (9)

where 119898119894119895 represents the number of times that the transitionfrom119875119894req to119875119895req has occured given the vehicle speed and119898119894represents the total number of times that 119875119894req has occurredat the vehicle speed it is given by

119898119894 = 119904sum119895=1

119898119894119895 (10)

Based on the data of ECE EUDC and UDDS drivingcycles transition probability matrices for the power demandare obtained at given speeds of 10 kmh 20 kmh 30 kmh40 kmh as shown in Figure 6

4 Known-Model RL EnergyManagement Strategy

41 Addressing Energy Management by RL From a macro-scopic perspective the energy management strategy ofan HEV involves determining the drivers power demandaccording to the drivers operation (acceleration pedal orbraking pedal) and distributing optimal power split betweentwo power sources (engine and motor) on the premise of

guaranteeing dynamic performance Fromamicroscopic per-spective the energy management strategy can be abstractedas solving a sequential optimization decision problem RL is amachine learningmethod based onMarkov decision processwhich can solve sequential optimization decision problems

The process of solving sequential decision problem by RLis as follows First a Markov decision process is representedby the tuple (119904 119886 119875 119903 120574) where 119904 is the state set 119886 is the actionset 119875 is the transition probability 119903 is the return functionand 120574 is the discount factor (Figure 7) Second based on theMarkov decision process the vehicle controller of an HEV isregarded as an agent the distribution of its engine power isregarded as action 119886 and the hybrid electric system exceptfor the vehicle controller is regarded as the environment Inorder to achieve minimum cumulative fuel consumption theagent takes certain action to interact with the environmentAfter the environment accepts the action the state beginsto change and an immediate return is generated to feedbackto the agent The agent chooses the next action based onan immediate return and the current state of environmentand then interacts with the environment again The agentinteracts with the environment continually thus generating aconsiderable amount of data RL utilizes the generated datato modify the action variable Then the agent interacts withthe environment again and generates new data this newdata is utilized to further optimize the action variable Afterseveral iterations the agent will learn the optimal action thatcan complete the corresponding task in other words it willdetermine the decision sequence 120587lowast0 997888rarr 120587lowast1 997888rarr sdot sdot sdot 120587lowast119905(Figure 8) thereby solving the sequential decision problem

42 Mathematical Model of an Energy Management StrategyBased on RL Mathematical models of the action variable119886 state variable 119904 and immediate return 119903 are establishedaccording to RL theoryThe SOC vehicle speed V and powerdemand 119875req are taken as state variables the engine power119875119890 is taken as the action variable and the sum of equivalentfuel consumption of a one-step state transition and the SOC

6 Journal of Control Science and Engineering

minus4 minus2 0 2 4 6 8 10

minus7minus5

minus3minus1

13

57

0

20

40

60

80

100

Tran

sitio

n pr

obab

ility

0CD

Current demand power P iLK (kW) Next demand power P

jLK

(kW)

(a) v=10kmh

49

14

minus14minus9

minus41

611

020

40

6080

100

minus11minus6

minus1

Tran

sitio

n pr

obab

ility

0CD

Current demand power P iLK (kW) Next demand power P

jLK

(kW)

(b) v=20kmh

4 6 810 12

minus4minus2 0

24

68

1012

0

20

40

60

80

100

Tran

sitio

n pr

obab

ility

0CD

Current demand power P iLK (kW)

Next demand power PjLK

(kW)

(c) v=30kmh

minus13 minus9minus5

minus13 7

11 15minus10

minus6minus2

26

10

0

50

100

Tran

sitio

n pr

obab

ility

0CD

Current demand power P iLK (kW)

Next demand power PjLK

(kW)

(d) v=40kmh

Figure 6 Power demand transition probability matrix

AGENT ENVIRONMENT

Action

State

Reward

State

R d

Figure 7 Schematic of reinforcement learning

s0 s1 stlowast

0 lowast1

Figure 8 Policy decision sequence

Journal of Control Science and Engineering 7

PI Algorithm1 Input state transition probability 119875119894119895 return function 119903 discount factor 120574Initialization value function V(s)=0Initialization policy 1205870

2 Repeat k=01 3 find 119881120587119896 policy evaluation4 120587119896+1 (119904) = arg min

a119902120587119896 (119904 119886)policy improvement

5 Until 120587119896+1 = 1205871198966 Output 120587lowast = 120587119896

Algorithm 1 Flowchart of the PI algorithm

penalty is taken as the immediate return 119903 1205821015840 is a factor of theequivalent fuel consumption 119898fuel is the fuel consumption119898ele is the electricity consumption 120572 is the penalty factor ofthe battery and 119878119874119862ref is reference value of the SOC

$$ s = [SOC, \ v, \ P_{req}], \qquad a = P_e, \qquad r = m_{fuel} + \lambda' m_{ele} + \alpha \, (SOC - SOC_{ref})^2 \qquad (11) $$
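As a minimal illustration, the immediate return of (11) can be evaluated as below; the numerical defaults for λ′, α, and SOC_ref are placeholders, since the calibration values are not reproduced here.

```python
def immediate_return(m_fuel, m_ele, soc, lam=1.0, alpha=100.0, soc_ref=0.6):
    """Eq. (11): one-step equivalent fuel consumption plus a quadratic SOC penalty.
    lam (equivalence factor), alpha (penalty factor), and soc_ref are placeholder values."""
    return m_fuel + lam * m_ele + alpha * (soc - soc_ref) ** 2
```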

Over an infinite time horizon, the Markov decision process solves the problem of determining a sequence of decision policies that minimizes the expected cumulative return of the random process; this minimum is called the optimal state value function V*(s):

$$ V^*(s) = \min_{\pi} E\Big[\sum_{t=0}^{\infty} \gamma^t r_{t+1}\Big] \qquad (12) $$

where γ ∈ [0, 1] is the discount factor, r_{t+1} represents the return value at time t + 1, and π represents the control policy.

Meanwhile, the control and state variables must also meet the following constraints:

$$ SOC_{\min} \le SOC \le SOC_{\max}, \quad v_{\min} \le v \le v_{\max}, \quad P_{e\min} \le P_e \le P_{e\max}, \quad P_{m\min} \le P_m \le P_{m\max} \qquad (13) $$

where the subscripts min and max represent the minimum and maximum threshold values of the state of charge, speed, and power, respectively.
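In a discretized implementation these box constraints simply bound the state grid and the admissible action set. A sketch under assumed limits (the grid ranges below are illustrative, not the paper's calibrated thresholds):

```python
import numpy as np

# Illustrative bounds only; the paper's calibrated limits are not reproduced here.
SOC_MIN, SOC_MAX = 0.3, 0.7      # battery state of charge
PE_MIN, PE_MAX = 0.0, 30.0       # engine power (kW)
PM_MIN, PM_MAX = -5.0, 5.0       # motor power (kW)

soc_grid = np.linspace(SOC_MIN, SOC_MAX, 9)    # discretized SOC states
pe_grid = np.linspace(PE_MIN, PE_MAX, 31)      # discretized engine-power actions

def admissible_actions(p_req):
    """Engine powers for which the motor power P_m = P_req - P_e stays within its limits."""
    return [pe for pe in pe_grid if PM_MIN <= p_req - pe <= PM_MAX]
```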

4.3. Policy Iteration Algorithm. For Formula (12), according to the Bellman equation:

$$ V^*(s) = \min_{\pi} \sum_{a} \pi(s, a) \sum_{s'} P(s, a, s') \big( r(s, a, s') + \gamma V^*(s') \big) \qquad (14) $$

where s′ represents the state at the next time step.

Table 2: Simulation results based on ECE EUDC.

Control policy               Fuel consumption (L/100 km)
Rule-based                   5.368
Instantaneous optimization   5.256
DP                           4.952
RL                           5.069

The purpose of the solution is to determine the optimal policy π*:

$$ \pi^*(s) = \arg\min_{\pi} \sum_{a} \pi(s, a) \sum_{s'} P(s, a, s') \big( r(s, a, s') + \gamma V^*(s') \big) \qquad (15) $$

A policy iteration (PI) algorithm is used to solve this stochastic optimization problem. It involves a policy evaluation step and a policy improvement step.

The calculation process is shown in Algorithm 1. In the policy evaluation step, for a control policy π_k(s) (the subscript k represents the number of iterations), the corresponding state value function V^{π_k}(s) is calculated as shown in (16):

$$ V^{\pi_k}(s) = \sum_{a} \pi_k(s, a) \sum_{s'} P(s, a, s') \big( r(s, a, s') + \gamma V^{\pi_k}(s') \big) \qquad (16) $$

In the policy improvement step, the improved policy π_{k+1}(s) is determined through a greedy strategy, as shown in (17):

$$ \pi_{k+1}(s) = \arg\min_{a} q^{\pi_k}(s, a), \qquad q^{\pi_k}(s, a) = r(s, a) + \gamma \sum_{s'} P(s, a, s') V^{\pi_k}(s') \qquad (17) $$

During policy iteration, policy evaluation and policy improvement are performed alternately until the state value function and the policy converge.
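The alternation of (16) and (17) can be condensed into a short policy iteration routine over the discretized grids. The sketch below assumes the transition probabilities P[a] (one S-by-S matrix per action) and immediate returns r[a] (one length-S vector per action) have already been assembled from the Markov model and the return of (11); the discount factor value is an assumption, since the paper does not state it.

```python
import numpy as np

def policy_iteration(P, r, gamma=0.95, tol=1e-6):
    """Policy iteration (minimization form) for a finite MDP.
    P[a]: (S, S) transition matrix for action a; r[a]: (S,) immediate returns."""
    num_actions, num_states = len(P), P[0].shape[0]
    policy = np.zeros(num_states, dtype=int)
    V = np.zeros(num_states)
    while True:
        # Policy evaluation (Eq. 16): iterate V until consistent with the current policy.
        while True:
            V_new = np.array([r[policy[s]][s] + gamma * P[policy[s]][s] @ V
                              for s in range(num_states)])
            converged = np.max(np.abs(V_new - V)) < tol
            V = V_new
            if converged:
                break
        # Policy improvement (Eq. 17): greedy (minimizing) step on q(s, a).
        q = np.array([r[a] + gamma * P[a] @ V for a in range(num_actions)])  # shape (A, S)
        new_policy = np.argmin(q, axis=0)
        if np.array_equal(new_policy, policy):
            return policy, V   # converged: optimal policy and state value function
        policy = new_policy
```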

The policy iteration algorithm is adopted to obtain the optimum control policy, which takes the vehicle speed, driver's power demand, and SOC as inputs and the engine power as the output. Figure 9 shows the optimized engine power at vehicle speeds of 10 km/h, 20 km/h, 30 km/h, and 40 km/h.

[Figure 9: Optimization charts of engine power at (a) v = 10 km/h, (b) v = 20 km/h, (c) v = 30 km/h, and (d) v = 40 km/h. Axes: demand power P_req (kW), SOC, and engine power (kW).]

[Figure 10: Implementation frame of the energy management strategy. Offline frame: driving-cycle data and a Markov chain yield the power demand transition probability matrix; together with the cost function (immediate return), policy iteration produces the engine power optimization charts. Online frame: the driver model converts the accelerator and brake pedal openings into a power demand, the engine power look-up table splits the power between engine and motor, and the engine, motor, battery, and transmission models drive the vehicle.]

It can be seen from Figure 9 that only the motor works when the SOC is high and the power demand is low; when both the SOC and the power demand are high, only the engine works; and when the SOC is low, the engine is in the charging mode.

Figure 10 illustrates the offline and online implementation frames of the energy management strategy. In the offline part, the power demand transition probability matrix is obtained from the Markov chain. Mathematical models of the state variable s, action variable a, and immediate return r are derived according to RL theory, and the policy iteration algorithm is employed to determine the engine power optimization tables. In the online part, the driver's power demand is obtained from the openings of the accelerator pedal and brake pedal. The power of the engine and motor is then distributed by looking up the offline engine power optimization tables. Finally, the power is transferred to the wheels by the transmission system.
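Online, the controller therefore only needs to interpolate the stored chart at the measured speed, SOC, and pedal-derived power demand, with the motor supplying the remainder of the demand. A hedged sketch follows; the grid axes and table contents are illustrative, and the embedded controller uses its own look-up table format rather than SciPy.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Illustrative axes; pe_table would be filled offline by policy iteration.
v_axis = np.array([10.0, 20.0, 30.0, 40.0])         # vehicle speed (km/h)
soc_axis = np.linspace(0.3, 0.7, 9)                  # battery SOC
preq_axis = np.linspace(-15.0, 30.0, 46)             # power demand (kW)
pe_table = np.zeros((v_axis.size, soc_axis.size, preq_axis.size))

engine_power_lookup = RegularGridInterpolator(
    (v_axis, soc_axis, preq_axis), pe_table, bounds_error=False, fill_value=None)

def split_power(v, soc, p_req):
    """Online power split: engine power from the offline table, motor covers the rest."""
    p_e = engine_power_lookup([v, soc, p_req]).item()
    p_m = p_req - p_e
    return p_e, p_m
```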


[Figure 11: ECE EUDC simulation results based on RL: (a) vehicle speed (km/h), (b) transmission gear ratio i_g, (c) battery SOC, (d) power demand P_req (kW), (e) motor power P_m (kW), (f) engine torque T_e (N·m), (g) motor torque T_m (N·m), and (h) instantaneous equivalent fuel consumption b_equ (mL/s), all versus time (s).]

5. Simulation Experiment and Results

A simulation is conducted on the MATLAB/Simulink platform, taking the ECE EUDC driving cycle as the simulation driving cycle and setting the initial SOC to 0.6. The energy management strategy based on a known model of RL is adopted, simulating the vehicle operation status online. The results are shown in Figure 11.

From Figure 11, we can see how the following parameters change with time: the transmission gear ratio i_g, the battery SOC, the power demand P_req, the motor power P_m, the engine torque T_e, the motor torque T_m, and the instantaneous equivalent fuel consumption b_equ. From Figure 11(b), it can be seen that the gear ratio varies continuously from 0.498 to 4.04; this is primarily because of the use of a CVT with reflux power, which, unlike an AMT, can transfer power without interruption. In Figure 11(c), the SOC of the battery changes from the initial value of 0.6 to the final value of 0.5845 (ΔSOC = 0.0155); this meets the HEV SOC balance requirement before and after the cycle (−0.02 ≤ ΔSOC ≤ 0.02). The power demand curve of the vehicle obtained for ECE EUDC is shown in Figure 11(d). The power distribution curves of the motor and the engine, obtained from the engine power optimization control tables, are shown in Figures 11(e) and 11(f); where the motor power is negative, the motor is in the generating state.


[Figure 12: Comparison between DP and RL optimization results based on ECE EUDC: (a) battery SOC, (b) motor torque T_m (N·m), and (c) engine torque T_e (N·m), all versus time (s).]

[Figure 13: Comparison between DP and RL optimization results based on CYC WVUCITY: (a) vehicle speed (km/h), (b) battery SOC, (c) motor torque T_m (N·m), and (d) engine torque T_e (N·m), all versus time (s).]


Table 3: Simulation results based on CYC WVUCITY.

Control policy   Fuel consumption (L/100 km)   Computation time (s)^a, offline   Computation time (s)^a, online
DP               4.983                         5280                              –
RL               4.864                         4300                              35

^a A 2.5 GHz microprocessor with 8 GB RAM was used.

In order to validate the optimal performance of the RL control strategy, vehicle simulation tests based on DP and RL were carried out. Figure 12 shows a comparison of the optimization results of the two control methods based on the ECE EUDC driving cycle; the solid line indicates the optimization results of RL and the dotted line the results of DP. Figure 12(a) shows the SOC optimization trajectories of the DP- and RL-based strategies. The trend of the two curves is essentially the same: the difference between the final SOC and the initial SOC is within 0.02, and the SOC is relatively stable. Compared with RL, the SOC curve of the DP-based strategy fluctuates more, which is primarily related to the distribution of the power source torque. Figures 12(b) and 12(c) show the engine and motor torque distribution curves based on the DP and RL strategies. The engine torque curves essentially coincide; however, the motor torque curves differ somewhat, primarily in the torque distribution when the motor is in the generating state.

Table 2 shows the equivalent fuel consumption obtained through DP and RL optimization. Compared with that obtained by DP (4.952 L/100 km), the fuel consumption obtained by RL is 2.3% higher. The reason is that DP ensures a global optimum only for a given driving cycle (ECE EUDC), whereas RL optimizes the result over a series of driving cycles in an average sense, thereby realizing a cumulative expected value optimum. Compared with the rule-based (5.368 L/100 km) and instantaneous optimization (5.256 L/100 km) strategies proposed in the literature [19], RL decreases the fuel consumption by 5.6% and 3.6%, respectively.

In order to validate the adaptability of RL to random driving cycles, CYC WVUCITY is also selected as a simulation cycle. The optimization results are shown in Figure 13. Figure 13(b) shows the variation of the battery SOC under the DP and RL control strategies; the solid line indicates the optimization results of RL and the dotted line the results of DP. The final SOC values for the DP and RL strategies are 0.59 and 0.58, respectively, with the same initial value of 0.6, so the SOC remains essentially stable. Figures 13(c) and 13(d) show the optimal torque curves of the motor and the engine based on the DP and RL strategies. The changing trends of the torque curves under the two strategies are largely the same; in comparison, the torque obtained by DP fluctuates more. Table 3 lists the equivalent fuel consumption of the two control strategies. Compared with DP, RL saves 2.4% of fuel and adapts well to a random driving cycle. Meanwhile, the computation times of DP and RL are recorded, including offline and online computation time. The offline computation time of DP is 5280 s and that of RL is 4300 s. Because of its large calculation volume and because the driving cycle is unknown in advance, DP cannot be realized online, whereas RL is not limited to a certain cycle and can be realized online by embedding the engine power optimization tables into the vehicle controller. The online operation time of RL is 35 s for the CYC WVUCITY simulation cycle.

6. Conclusion

(1) We established a stochastic Markov model of the driver's power demand based on datasets of the UDDS and ECE EUDC driving cycles.
(2) An RL energy management strategy was proposed, which takes the SOC, vehicle speed, and power demand as state variables, the engine power as the action variable, and the minimum cumulative return expectation as the optimization objective.
(3) Using the MATLAB/Simulink platform, a CYC WVUCITY simulation model was established. The results show that, compared with DP, RL could reduce fuel consumption by 2.4% and be implemented online.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research is supported by the Scientific and Technology Research Program of Chongqing Municipal Education Commission (Grant no. KJQN201800718) and the Chongqing Research Program of Basic Research and Frontier Technology (Grant no. cstc2017jcyjAX0053).

References

[1] G. Zhao and Z. Chen, "Energy management strategy for series hybrid electric vehicle," Journal of Northeastern University, vol. 34, no. 4, pp. 583-587, 2013.
[2] W. Wang and X. Cheng, "Forward simulation on the energy management system for series hybrid electric bus," Automotive Engineering, vol. 35, no. 2, pp. 121-126, 2013.
[3] D. Qin and S. Zhan, "Management strategy of hybrid electrical vehicle based on driving style recognition," Journal of Mechanical Engineering, vol. 52, no. 8, pp. 162-169, 2016.
[4] S. Zhan and D. Qin, "Energy management strategy of HEV based on driving cycle recognition using genetic optimized K-means clustering algorithm," Chinese Journal of Highway and Transport, vol. 29, no. 4, pp. 130-137, 2016.
[5] X. Zeng and J. Wang, "A parallel hybrid electric vehicle energy management strategy using stochastic model predictive control with road grade preview," IEEE Transactions on Control Systems Technology, vol. 23, no. 6, pp. 2416-2423, 2015.
[6] X. Tang and B. Wang, "Control strategy for hybrid electric bus based on state transition probability," Automotive Engineering Magazine, vol. 38, no. 3, pp. 263-268, 2016.
[7] Z. Ma and S. Cui, "Applying dynamic programming to HEV powertrain based on EVT," in Proceedings of the 2015 International Conference on Control, Automation and Information Sciences, vol. 22, pp. 37-41, 2015.
[8] Y. Yang and P. Ye, "A research on the fuzzy control strategy for a speed-coupling ISG HEV based on dynamic programming optimization," Automotive Engineering, vol. 38, no. 6, pp. 674-679, 2016.
[9] Z. Chen, C. C. Mi, J. Xu, X. Gong, and C. You, "Energy management for a power-split plug-in hybrid electric vehicle based on dynamic programming and neural networks," IEEE Transactions on Vehicular Technology, vol. 63, no. 4, pp. 1567-1580, 2014.
[10] C. Sun, S. J. Moura, X. Hu, J. K. Hedrick, and F. Sun, "Dynamic traffic feedback data enabled energy management in plug-in hybrid electric vehicles," IEEE Transactions on Control Systems Technology, vol. 23, no. 3, pp. 1075-1086, 2015.
[11] B. Bader, "An energy management strategy for plug-in hybrid electric vehicles," Materia, vol. 29, no. 11, 2013.
[12] Z. Chen, L. Li, B. Yan, C. Yang, C. Marina Martinez, and D. Cao, "Multimode energy management for plug-in hybrid electric buses based on driving cycles prediction," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 10, pp. 2811-2821, 2016.
[13] Z. Chen and Y. Liu, "Energy management strategy for plug-in hybrid electric bus with evolutionary reinforcement learning method," Journal of Mechanical Engineering, vol. 53, no. 16, pp. 86-93, 2017.
[14] T. Liu, B. Wang, and C. Yang, "Online Markov chain-based energy management for a hybrid tracked vehicle with speedy Q-learning," Energy, vol. 160, pp. 544-555, 2018.
[15] L. Teng, "Online energy management for multimode plug-in hybrid electric vehicles," IEEE Transactions on Industrial Informatics, vol. 2018, no. 10, pp. 1-9, 2018.
[16] J. Wu, H. He, J. Peng, Y. Li, and Z. Li, "Continuous reinforcement learning of energy management with deep Q network for a power split hybrid electric bus," Applied Energy, vol. 222, pp. 799-811, 2018.
[17] T. Liu, X. Hu, S. E. Li, and D. Cao, "Reinforcement learning optimized look-ahead energy management of a parallel hybrid electric vehicle," IEEE/ASME Transactions on Mechatronics, vol. 22, no. 4, pp. 1497-1507, 2017.
[18] T. Liu and X. Hu, "A bi-level control for energy efficiency improvement of a hybrid tracked vehicle," IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1616-1625, 2018.
[19] Y. Yin, Study on Optimization Matching and Simulation for Super-Mild HEV, Chongqing University, Chongqing, China, 2011.




[13] Z Chen and Y Liu ldquoEnergy Management Strategy for Plug-inHybrid Electric Bus with Evolutionary Reinforcement Learningmethodrdquo Journal of Mechanical Engineering vol 53 no 16 pp86ndash93 2017

[14] T Liu B Wang and C Yang ldquoOnline Markov Chain-basedenergy management for a hybrid tracked vehicle with speedyQ-learningrdquo Energy vol 160 pp 544ndash555 2018

[15] L Teng ldquoOnline Energy Management for Multimode Plug-in Hybrid Electric Vehiclesrdquo IEEE Transactions on IndustrialInformatics vol 2018 no 10 pp 1ndash9 2018

[16] JWuHHe J Peng Y Li andZ Li ldquoContinuous reinforcementlearning of energy management with deep Q network for apower split hybrid electric busrdquo Applied Energy vol 222 pp799ndash811 2018

[17] T Liu X Hu S E Li and D Cao ldquoReinforcement LearningOptimized Look-Ahead Energy Management of a ParallelHybrid Electric Vehiclerdquo IEEEASME Transactions on Mecha-tronics vol 22 no 4 pp 1497ndash1507 2017

[18] T Liu and X Hu ldquoA Bi-Level Control for Energy EfficiencyImprovement of a Hybrid Tracked Vehiclerdquo IEEE Transactionson Industrial Informatics vol 14 no 4 pp 1616ndash1625 2018

[19] Y Yin Study on Optimization Matching and Simulation forSuper-Mild HEV Chongqing university Chongqing China2011


1. Input: state transition probability $P_{ij}$, return function $r$, discount factor $\gamma$. Initialize the value function $V(s) = 0$ and the policy $\pi_0$.
2. Repeat for $k = 0, 1, \dots$
3.   Compute $V^{\pi_k}$ (policy evaluation).
4.   $\pi_{k+1}(s) = \arg\min_{a} q^{\pi_k}(s, a)$ (policy improvement).
5. Until $\pi_{k+1} = \pi_k$.
6. Output: $\pi^{*} = \pi_k$.

Algorithm 1: Flowchart of the PI algorithm.

The equivalent fuel consumption plus an SOC deviation penalty is taken as the immediate return $r$; $\lambda'$ is the equivalent fuel consumption factor, $m_{fuel}$ is the fuel consumption, $m_{ele}$ is the electricity consumption, $\alpha$ is the penalty factor of the battery, and $SOC_{ref}$ is the reference value of the SOC:

\[
s = [SOC,\, v,\, P_{req}], \qquad a = P_e, \qquad
r = m_{fuel} + \lambda' m_{ele} + \alpha \,(SOC - SOC_{ref})^{2}
\tag{11}
\]
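For concreteness, the return in (11) can be evaluated by a small function. The sketch below is illustrative only; the field names and the values of the equivalence factor and penalty weight are assumptions, not quantities reported in the paper.

    function r = immediate_return(m_fuel, m_ele, soc, params)
    % Immediate return of Eq. (11): equivalent fuel consumption plus an
    % SOC deviation penalty. Parameter values below are placeholders.
        r = m_fuel ...                                     % fuel consumption over the step
            + params.lambda_eq * m_ele ...                 % electricity converted to equivalent fuel
            + params.alpha * (soc - params.soc_ref)^2;     % battery SOC deviation penalty
    end

    % Example call with hypothetical numbers:
    % params = struct('lambda_eq', 0.25, 'alpha', 100, 'soc_ref', 0.6);
    % r = immediate_return(0.8, 0.5, 0.58, params);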

In an infinite time domain, the Markov decision process determines a sequence of decision policies that minimizes the cumulative return expectation of the random process; this minimum is called the optimal state value function $V^{*}(s)$:

\[
V^{*}(s) = \min_{\pi} E\!\left[\sum_{t=0}^{\infty} \gamma^{t} r_{t+1}\right]
\tag{12}
\]

where $\gamma \in [0, 1]$ is the discount factor, $r_{t+1}$ is the return value at time $t+1$, and $\pi$ is the control policy.

Meanwhile, the control and state variables must also satisfy the following constraints:

\[
SOC_{\min} \le SOC \le SOC_{\max}, \quad
v_{\min} \le v \le v_{\max}, \quad
P_{e,\min} \le P_e \le P_{e,\max}, \quad
P_{m,\min} \le P_m \le P_{m,\max}
\tag{13}
\]

where the subscripts min and max denote the minimum and maximum threshold values of the state of charge, speed, and power, respectively.
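A minimal way to enforce (13) in an implementation is to reject (or clamp) infeasible state-action pairs before they are evaluated. The bounds used below are illustrative assumptions only.

    function feasible = satisfies_constraints(soc, v, Pe, Pm, lim)
    % Box constraints of Eq. (13). 'lim' is a struct of hypothetical bounds,
    % e.g. lim.soc = [0.3 0.7], lim.v = [0 120] km/h, lim.Pe = [0 30] kW,
    % lim.Pm = [-10 10] kW.
        feasible = soc >= lim.soc(1) && soc <= lim.soc(2) && ...
                   v   >= lim.v(1)   && v   <= lim.v(2)   && ...
                   Pe  >= lim.Pe(1)  && Pe  <= lim.Pe(2)  && ...
                   Pm  >= lim.Pm(1)  && Pm  <= lim.Pm(2);
    end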

4.3. Policy Iteration Algorithm. For Formula (12), according to the Bellman equation,

\[
V^{*}(s) = \min \sum_{a} \pi(s, a) \sum_{s'} P(s, a, s') \left( r(s, a, s') + \gamma V^{*}(s') \right)
\tag{14}
\]

where $s'$ represents the state at the next time step.

Table 2: Simulation results based on ECE_EUDC.

Control policy                 Fuel consumption (L/100 km)
Rule-based                     5.368
Instantaneous optimization     5.256
DP                             4.952
RL                             5.069

The purpose of the solution is to determine the optimal policy $\pi^{*}$:

\[
\pi^{*}(s) = \arg\min \sum_{a} \pi(s, a) \sum_{s'} P(s, a, s') \left( r(s, a, s') + \gamma V^{*}(s') \right)
\tag{15}
\]

A policy iteration (PI) algorithm is used to solve this random-process problem. It involves a policy evaluation step and a policy improvement step.

The calculation process is shown in Algorithm 1.

In the policy evaluation step, for a control policy $\pi_k(s)$ (the subscript $k$ represents the number of iterations), the corresponding state value function $V^{\pi_k}(s)$ is calculated as shown in

\[
V^{\pi_k}(s) = \sum_{a} \pi_k(s, a) \sum_{s'} P(s, a, s') \left( r(s, a, s') + \gamma V^{\pi_k}(s') \right)
\tag{16}
\]

In the policy improvement step, the improved policy $\pi_{k+1}(s)$ is determined through a greedy strategy, as shown in

\[
\pi_{k+1}(s) = \arg\min_{a} q^{\pi_k}(s, a), \qquad
q^{\pi_k}(s, a) = r(s, a) + \gamma \sum_{s'} P(s, a, s') \, V^{\pi_k}(s')
\tag{17}
\]

During policy iteration, policy evaluation and policy improvement are performed alternately until the state value function and the policy converge.
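The sketch below shows how Equations (16) and (17) can be iterated on a discretized state and action grid. It is schematic only: the data structures (a transition array P of size nS-by-nA-by-nS and an expected-return array R of size nS-by-nA), the tolerance, and the loop organisation are assumptions for illustration, not the paper's implementation.

    function [pi_opt, V] = policy_iteration(P, R, gamma)
    % P(s,a,s2): state transition probability, R(s,a): expected immediate return,
    % gamma: discount factor. Returns a deterministic policy pi_opt (action index
    % per state) and its value function V.
        [nS, nA, ~] = size(P);
        pi_k = ones(nS, 1);          % initial policy: first action everywhere
        V = zeros(nS, 1);            % initial value function
        while true
            % ---- Policy evaluation, Eq. (16), by iterative sweeps ----
            while true
                V_new = zeros(nS, 1);
                for s = 1:nS
                    a = pi_k(s);
                    V_new(s) = R(s, a) + gamma * squeeze(P(s, a, :))' * V;
                end
                if max(abs(V_new - V)) < 1e-6, V = V_new; break; end
                V = V_new;
            end
            % ---- Policy improvement, Eq. (17): greedy (cost-minimizing) step ----
            pi_next = zeros(nS, 1);
            for s = 1:nS
                q = zeros(nA, 1);
                for a = 1:nA
                    q(a) = R(s, a) + gamma * squeeze(P(s, a, :))' * V;
                end
                [~, pi_next(s)] = min(q);
            end
            if isequal(pi_next, pi_k), break; end   % Algorithm 1, step 5
            pi_k = pi_next;
        end
        pi_opt = pi_k;
    end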

The policy iteration algorithm is adopted to obtain the optimum control policy, which takes the vehicle speed, the driver's power demand, and the SOC as inputs and the engine power as the output. Figure 9 shows the optimized engine power at vehicle speeds of 10 km/h, 20 km/h, 30 km/h, and 40 km/h.
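Once the optimal policy has been computed offline, it can be unfolded into engine power lookup tables of the kind plotted in Figure 9, one slice per vehicle speed. The grids, the state ordering, and the placeholder policy below are assumptions made purely for illustration.

    % Hypothetical discretization grids used to index the state space.
    soc_grid  = 0.30:0.05:0.70;           % SOC
    v_grid    = [10 20 30 40];            % vehicle speed (km/h)
    preq_grid = 0:2:24;                   % demand power (kW)
    pe_grid   = 0:1:30;                   % candidate engine powers (kW), action set

    nSOC = numel(soc_grid); nV = numel(v_grid); nP = numel(preq_grid);
    % Placeholder for the optimal action indices returned by policy iteration:
    pi_opt = randi(numel(pe_grid), nSOC*nV*nP, 1);

    % Reshape the policy into a table Pe_table(SOC, v, Preq) of optimal engine powers.
    Pe_table = zeros(nSOC, nV, nP);
    for s = 1:nSOC*nV*nP
        [i, j, k] = ind2sub([nSOC nV nP], s);   % assumed linear state ordering
        Pe_table(i, j, k) = pe_grid(pi_opt(s));
    end
    % Each slice Pe_table(:, j, :) corresponds to one chart of Figure 9.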


Figure 9: Optimization charts of engine power. Panels (a)-(d) show the optimized engine power (kW) as a function of SOC and demand power P_req (kW) at v = 10 km/h, 20 km/h, 30 km/h, and 40 km/h, respectively.

Figure 10: Implementation frame of the energy management strategy. The offline frame builds the power demand transition probability matrix with a Markov chain, defines the state variable s = [SOC, v, P_req], the action variable a = P_e, and the immediate return r(s, a), and runs policy iteration until convergence to produce the engine power lookup tables. The online frame derives the driver's power demand from the accelerator and brake pedals, looks up the engine power, and feeds the engine, motor, battery, and transmission models of the vehicle.

It can be seen from Figure 9 that only the motor works when the SOC is high and the power demand is low; when both the SOC and the power demand are high, only the engine works; and when the SOC is low, the engine is in the charging mode.
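This qualitative pattern can be read off the charts as a simple mode selector. The sketch below only illustrates the tendency described above; the thresholds are hypothetical, and the actual controller uses the optimized tables, not rules.

    function mode = qualitative_mode(soc, p_req)
    % Illustrative reading of Figure 9 (not the optimized policy itself).
        if soc > 0.6 && p_req < 5
            mode = "motor only";        % high SOC, low demand
        elseif soc > 0.6 && p_req >= 5
            mode = "engine only";       % high SOC, high demand
        elseif soc < 0.4
            mode = "engine charging";   % low SOC: engine also charges the battery
        else
            mode = "blended";           % intermediate region decided by the table
        end
    end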

Figure 10 illustrates the offline and online implementation frames of the energy management strategy. In the offline part, the power demand transition probability matrix is obtained by the Markov chain; mathematical models of the state variable s, action variable a, and immediate return r are derived according to RL theory; and the policy iteration algorithm is employed to determine the engine power optimization tables. In the online part, the driver's power demand is obtained from the positions of the accelerator pedal and the brake pedal. The power of the engine and the motor is then distributed by looking up the offline engine power optimization tables. Finally, the power is transferred to the wheels by the transmission system.
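The online part therefore reduces to an interpolation in the stored tables. The following sketch assumes the Pe_table and grids from the earlier offline sketch; the pedal-to-power mapping and the limits are simplifying assumptions, not the paper's driver model.

    function [P_e, P_m] = online_power_split(soc, v_kmh, acc_pedal, brk_pedal, tbl)
    % tbl carries the offline results: tbl.soc_grid, tbl.v_grid, tbl.preq_grid,
    % tbl.Pe_table (SOC x speed x demand power). Pedal inputs are in [0, 1].
        P_req_max = 25;  P_brk_max = -15;                        % hypothetical limits (kW)
        P_req = acc_pedal * P_req_max + brk_pedal * P_brk_max;   % simplified driver model
        if P_req <= 0
            P_e = 0;  P_m = P_req;        % braking / regeneration handled by the motor
            return;
        end
        % Interpolate the optimal engine power from the offline table.
        P_e = interpn(tbl.soc_grid, tbl.v_grid, tbl.preq_grid, tbl.Pe_table, ...
                      soc, v_kmh, P_req, 'linear');
        P_e = min(max(P_e, 0), P_req + 10);   % keep within a plausible range (charging allowed)
        P_m = P_req - P_e;                    % motor covers the remainder (negative = charging)
    end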


Figure 11: ECE_EUDC simulation results based on RL: (a) vehicle speed (km/h), (b) gear ratio i_g, (c) SOC, (d) demand power P_req (kW), (e) motor power P_m (kW), (f) engine torque T_e (N·m), (g) motor torque T_m (N·m), and (h) instantaneous equivalent fuel consumption b_equ (mL/s) versus time (s).

5. Simulation Experiment and Results

A simulation is conducted on the MATLAB/Simulink platform, taking the ECE_EUDC driving cycle as the simulation driving cycle and setting the initial SOC to 0.6. The energy management strategy based on a known model of RL is adopted to simulate the vehicle operation status online. The results are shown in Figure 11.

From Figure 11, we can see how the following parameters change with time: the transmission gear ratio i_g, the battery SOC, the power demand P_req, the motor power P_m, the engine torque T_e, the motor torque T_m, and the instantaneous equivalent fuel consumption b_equ. From Figure 11(b), it can be seen that the gear ratio varies continuously from 0.498 to 4.04; this is primarily because of the use of a CVT with reflux power, which, unlike an AMT, can deliver power without interruption. In Figure 11(c), the SOC of the battery changes from the initial value of 0.6 to the final value of 0.5845 (ΔSOC = 0.0155), which meets the HEV SOC balance requirement before and after the cycle (−0.02 ≤ ΔSOC ≤ 0.02). The power demand curve of the vehicle obtained for ECE_EUDC is shown in Figure 11(d). The power distribution curves of the motor and the engine, obtained from the engine power optimization control tables, are shown in Figures 11(e) and 11(f). When the motor power is negative, the motor is in the generating state.
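Panel (h) reports the instantaneous equivalent fuel consumption in mL/s; integrating it over the cycle and dividing by the driven distance gives the per-100-km figures used in Tables 2 and 3. The sketch below only shows this bookkeeping; the time, speed, and b_equ traces are assumed to be logged from the simulation.

    % t (s), v_kmh (km/h), b_equ (mL/s): traces logged from the Simulink model.
    fuel_L  = trapz(t, b_equ) / 1000;       % total equivalent fuel used (L)
    dist_km = trapz(t, v_kmh) / 3600;       % km/h integrated over s, divided by 3600 -> km
    FC_L100 = 100 * fuel_L / dist_km;       % equivalent fuel consumption (L/100 km)

    dSOC = 0.6 - 0.5845;                    % SOC drift over the cycle
    assert(abs(dSOC) <= 0.02);              % SOC balance requirement stated above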


Figure 12: Comparison between DP and RL optimization results based on ECE_EUDC: (a) SOC, (b) motor torque T_m (N·m), and (c) engine torque T_e (N·m) versus time (s).

Figure 13: Comparison between DP and RL optimization results based on CYC_WVUCITY: (a) vehicle speed (km/h), (b) SOC, (c) motor torque T_m (N·m), and (d) engine torque T_e (N·m) versus time (s).


Table 3: Simulation results based on CYC_WVUCITY.

Control policy    Fuel consumption (L/100 km)    Computation time (s)^a
                                                 offline      online
DP                4.983                          5280         n/a
RL                4.864                          4300         3.5

^a A 2.5 GHz microprocessor with 8 GB RAM was used.

In order to validate the optimal performance of the RL control strategy, vehicle simulation tests based on DP and RL were carried out. Figure 12 compares the optimization results of the two control methods on the ECE_EUDC driving cycle; the solid line indicates the optimization results of RL, and the dotted line indicates the results of DP. Figure 12(a) shows the SOC optimization trajectories of the DP- and RL-based strategies. The trend of the two curves is essentially the same, the difference between the final SOC and the initial SOC is within 0.02, and the SOC is relatively stable. Compared with RL, the SOC curve of the DP-based strategy fluctuates more; this is primarily related to the distribution of the power source torque. Figures 12(b) and 12(c) show the motor and engine torque distribution curves based on the DP and RL strategies. The engine torque curves essentially coincide; however, the motor torque curves differ somewhat, primarily in the torque distribution when the motor is in the generating state.

Table 2 shows the equivalent fuel consumption obtained through DP and RL optimization. Compared with that obtained by DP (4.952 L/100 km), the fuel consumption obtained by RL is 2.3% higher. The reason is that DP guarantees a global optimum only for a given driving cycle (ECE_EUDC), whereas RL optimizes the result for a series of driving cycles in an average sense, thereby realizing an optimum of the cumulative expected value. Compared with the rule-based strategy (5.368 L/100 km) and the instantaneous optimization strategy (5.256 L/100 km) proposed in the literature [19], RL decreases the fuel consumption by 5.6% and 3.6%, respectively.

In order to validate the adaptability of RL to random driving cycles, CYC_WVUCITY is also selected as a simulation cycle. The optimization results are shown in Figure 13. Figure 13(b) shows the variation of the battery SOC under the DP and RL control strategies; the solid line indicates the optimization results of RL, and the dotted line indicates the results of DP. The final SOC values for the DP and RL strategies are 0.59 and 0.58, respectively, with the same initial value of 0.6, so the SOC remains essentially stable. Figures 13(c) and 13(d) show the optimal torque curves of the motor and the engine based on the DP and RL strategies. The changing trends of the torque curves under the two strategies are largely the same; in comparison, the torque obtained by DP fluctuates more. Table 3 lists the equivalent fuel consumption of the two control strategies. Compared with DP, RL saves 2.4% of fuel and adapts well to a random cycle. Meanwhile, the computation times of DP and RL, including the offline and online computation times, were recorded. The offline computation time of DP is 5280 s and that of RL is 4300 s.

Owing to its large calculation volume and its need to know the driving cycle in advance, DP cannot be implemented online, whereas RL is not limited to a specific cycle and can be implemented online by embedding the engine power optimization tables into the vehicle controller. The online computation time of RL is 3.5 s for the CYC_WVUCITY simulation cycle.
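The relative savings quoted above follow directly from Tables 2 and 3; the short check below reproduces the 2.4% figure for CYC_WVUCITY and the 5.6% and 3.6% reductions on ECE_EUDC (all values in L/100 km).

    % CYC_WVUCITY (Table 3): RL vs. DP
    fc_dp = 4.983;  fc_rl = 4.864;
    saving_vs_dp = 100 * (fc_dp - fc_rl) / fc_dp;          % ~2.4 %

    % ECE_EUDC (Table 2): RL vs. rule-based and instantaneous optimization
    fc_rule = 5.368;  fc_inst = 5.256;  fc_rl_ece = 5.069;
    red_vs_rule = 100 * (fc_rule - fc_rl_ece) / fc_rule;   % ~5.6 %
    red_vs_inst = 100 * (fc_inst - fc_rl_ece) / fc_inst;   % ~3.6 %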

6. Conclusion

(1) We established a stochastic Markov model of the driver's power demand based on datasets of the UDDS and ECE_EUDC driving cycles.

(2) An RL energy management strategy was proposed that takes the SOC, vehicle speed, and power demand as state variables, the engine power as the action variable, and the minimum cumulative return expectation as the optimization objective.

(3) Using the MATLAB/Simulink platform, a CYC_WVUCITY simulation model was established. The results show that, compared with DP, RL could reduce fuel consumption by 2.4% and be implemented online.

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research is supported by the Scientific and Technology Research Program of Chongqing Municipal Education Commission (Grant no. KJQN201800718) and the Chongqing Research Program of Basic Research and Frontier Technology (Grant no. cstc2017jcyjAX0053).

References

[1] G. Zhao and Z. Chen, "Energy management strategy for series hybrid electric vehicle," Journal of Northeastern University, vol. 34, no. 4, pp. 583–587, 2013.
[2] W. Wang and X. Cheng, "Forward simulation on the energy management system for series hybrid electric bus," Automotive Engineering, vol. 35, no. 2, pp. 121–126, 2013.
[3] D. Qin and S. Zhan, "Management strategy of hybrid electrical vehicle based on driving style recognition," Journal of Mechanical Engineering, vol. 52, no. 8, pp. 162–169, 2016.
[4] S. Zhan and D. Qin, "Energy management strategy of HEV based on driving cycle recognition using genetic optimized K-means clustering algorithm," Chinese Journal of Highway and Transport, vol. 29, no. 4, pp. 130–137, 2016.
[5] X. Zeng and J. Wang, "A parallel hybrid electric vehicle energy management strategy using stochastic model predictive control with road grade preview," IEEE Transactions on Control Systems Technology, vol. 23, no. 6, pp. 2416–2423, 2015.
[6] X. Tang and B. Wang, "Control strategy for hybrid electric bus based on state transition probability," Automotive Engineering Magazine, vol. 38, no. 3, pp. 263–268, 2016.
[7] Z. Ma and S. Cui, "Applying dynamic programming to HEV powertrain based on EVT," in Proceedings of the 2015 International Conference on Control, Automation and Information Sciences, vol. 22, pp. 37–41, 2015.
[8] Y. Yang and P. Ye, "A research on the fuzzy control strategy for a speed-coupling ISG HEV based on dynamic programming optimization," Automotive Engineering, vol. 38, no. 6, pp. 674–679, 2016.
[9] Z. Chen, C. C. Mi, J. Xu, X. Gong, and C. You, "Energy management for a power-split plug-in hybrid electric vehicle based on dynamic programming and neural networks," IEEE Transactions on Vehicular Technology, vol. 63, no. 4, pp. 1567–1580, 2014.
[10] C. Sun, S. J. Moura, X. Hu, J. K. Hedrick, and F. Sun, "Dynamic traffic feedback data enabled energy management in plug-in hybrid electric vehicles," IEEE Transactions on Control Systems Technology, vol. 23, no. 3, pp. 1075–1086, 2015.
[11] B. Bader, "An energy management strategy for plug-in hybrid electric vehicles," Materia, vol. 29, no. 11, 2013.
[12] Z. Chen, L. Li, B. Yan, C. Yang, C. Marina Martinez, and D. Cao, "Multimode energy management for plug-in hybrid electric buses based on driving cycles prediction," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 10, pp. 2811–2821, 2016.
[13] Z. Chen and Y. Liu, "Energy management strategy for plug-in hybrid electric bus with evolutionary reinforcement learning method," Journal of Mechanical Engineering, vol. 53, no. 16, pp. 86–93, 2017.
[14] T. Liu, B. Wang, and C. Yang, "Online Markov chain-based energy management for a hybrid tracked vehicle with speedy Q-learning," Energy, vol. 160, pp. 544–555, 2018.
[15] L. Teng, "Online energy management for multimode plug-in hybrid electric vehicles," IEEE Transactions on Industrial Informatics, vol. 2018, no. 10, pp. 1–9, 2018.
[16] J. Wu, H. He, J. Peng, Y. Li, and Z. Li, "Continuous reinforcement learning of energy management with deep Q network for a power split hybrid electric bus," Applied Energy, vol. 222, pp. 799–811, 2018.
[17] T. Liu, X. Hu, S. E. Li, and D. Cao, "Reinforcement learning optimized look-ahead energy management of a parallel hybrid electric vehicle," IEEE/ASME Transactions on Mechatronics, vol. 22, no. 4, pp. 1497–1507, 2017.
[18] T. Liu and X. Hu, "A bi-level control for energy efficiency improvement of a hybrid tracked vehicle," IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1616–1625, 2018.
[19] Y. Yin, Study on Optimization Matching and Simulation for Super-Mild HEV, Chongqing University, Chongqing, China, 2011.

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 8: An Energy Management Strategy for a Super-Mild Hybrid

8 Journal of Control Science and Engineering

03 04 05 06 070246810

05

1015

SOC

Engi

ne p

ower

(kW

)

Demand power 0LK (kW)

(a) v=10kmh

03 04 05 06 07

2468101205

101520

SOC

Engi

ne p

ower

(kW

)

Demand power 0LK (kW)

(b) v=20kmh

03 04 05 06 0724681012

05

101520

SOC

Engi

ne p

ower

(kW

)

Demand power 0LK (kW)

(c) v=30kmh

03 04 05 06 07

05101520250

10

20

30

SOC

Engi

ne p

ower

(kW

)

Demand power 0LK (kW)

(d) v=40kmh

Figure 9 Optimization charts of engine power

Transition Probability Matrix

N

Y

Markov Chain

Offline Frame

Engine Model

SOCv

Online Frame

k=k+

1

Cost Function

0 200 400 600 800 1000 1200 14000

50

100

150

0 200 400 600 800 1000 1200 14000

20

40

60

80

100Time (s)

Time (s)

immediate return =mOF +mF+(SOC-SOCL )2

PLK=(F+FQ+FD) PCD =mCDmC

state variable s=[SOCPLK]

action variable a=P r(sa)

Motor Model

Battery Model

TransmissionModel Vehicle

Engine Power Look-up Table

Policy Iteration

Initial (s)=0 0 k=0

E+1(s)=LAGCHqE (sa)

VE (s)=sumE(sa) sumP(sas)(r(sas)+VE (s))

qE (sa)=r(sa)+sumP(sas)VE (s)

E+1 = E

lowast=E

DriverModel

PowerDemand

Accelerator pedal

Brake pedal

==

<LE

P

PG

P<N

T n

TG nG

minus4 minus2 0 2 4 6 8 10

minus7minus5

minus3minus1

13

57

0

20

40

60

80

100

49

14

minus14minus9

minus41

611

0

20

40

60

80

100

minus11minus6

minus1

4 6 810 12

minus4minus2 0

24

68

1012

0

20

40

60

80

100

minus13 minus9minus5 minus1

3 711 15

minus10minus6

minus22

610

0

50

100

=10kmh

=40kmh=30kmh

=20kmh

=10kmh

=40kmh=30kmh

=20kmh

03 04 05 06 070246810

05

1015

SOC03 04 05 06 07

2468101205

101520

SOC

03 04 05 06 0724681012

05

101520

SOC03 04 05 06 07

05101520250

10

20

30

SOC

Engine Power Optimization Chart

03 04 05 0607

05

10152025

0

10

20

30

SOC

Figure 10 Implementation frame of the energy management strategy

It can be seen from Figure 9 that only the motor workswhen the SOC is high and the power demand is low whenboth the SOC and the power demand are high only theengine works and when the SOC is low the engine is in thecharging mode

Figure 10 illustrates the offline and online implemen-tation frames of the energy management strategy In theoffline part the power demand transition probability matrixis obtained by the Markov Chain Mathematical models

of the state variable 119904 action variable 119886 and immediatereturn 119903 are derived according to RL theory The policyiteration algorithm is employed to determine engine poweroptimization tables In the online part the drivers powerdemand is obtained by the opening of drivers acceleratorpedal and brake pedal Then the power of the engine andmotor is distributed by looking up the offline engine poweroptimization tables Finally the power is transferred to thewheels by the transmission system

Journal of Control Science and Engineering 9

0 200 400 600 800 1000 1200 14000

50

100

150

Time (s)

Spee

d

(kmmiddoth

-1)

(a)

0 200 400 600 800 1000 1200 14000

1

2

3

4

Time (s)

Gea

r rat

ioi A

(b)

0 200 400 600 800 1000 1200 140003

04

05

06

07

SOC

Time (s)

(c)

0 200 400 600 800 1000 1200 1400minus50minus40minus30minus20minus10

010203040

Time (s)

Dem

and

pow

erPLK

(kW

)(d)

0 200 400 600 800 1000 1200 1400minus5

0

5

Time (s)

Mot

or p

ower

PG

(kW

)

(e)

40

0 200 400 600 800 1000 1200 14000

20

60

Time (s)

Engi

ne to

rque

T

(Nmiddotm

)

(f)

0 200 400 600 800 1000 1200 1400minus30minus20

minus100

1020

Time (s)

Mot

or to

rque

TG

(Nmiddotm

)

(g)

1200 14000 200 400 600 800 1000minus05

005101520

3035

25

40

Equi

vale

nt fu

el

Time (s)

cons

umpt

ionb

KO

(mLmiddot

s-1)

(h)

Figure 11 ECE EUDC simulation results based on RL

5 Simulation Experiment and Results

A simulation is conducted on a MATLAB Simulink plat-form taking ECE EUDC driving cycle as the simulationdriving cycle and setting the initial SOC as 06 The energymanagement strategy based on a known model of RL isadopted simulating the vehicle operation status onlineResults are shown in Figure 11

From Figure 11 we can see that the following parameterschange with time gear ratio of transmission 119894g SOC ofthe battery power demand 119875req motor power 119875m enginetorque 119879e motor torque 119879119898 and instantaneous equivalentfuel consumption 119887equ From Figure 11(b) it can be seen that

the gear ratio varies continuously from 0498 to 404 thisis primarily because of the use of a CVT with reflux powerwhich unlike AMT can realize power without interruptionIn Figure 11(c) SOC of the battery is seen to change from theinitial value of 06 to the final value of 05845SOC = 00155this can meet the HEV SOC balance requirement before andafter the cycle (minus002 le Δ119878119874119862 le 002) The power demandchange curve of the vehicle is obtained based on ECE EUDCas shown in Figure 11(d)The power distribution curves of themotor and the engine can be obtained according to the enginepower optimization control tables as shown in Figures 11(e)and 11(f) The motor power is negative which indicates thatthe motor is in the generating state

10 Journal of Control Science and Engineering

0 200 400 600 800 1000 1200 1400

SOC

04

05

06

07

RL DP

Time (s)

(a)

0 200 400 600 800 1000 1200 1400minus30

minus20

minus10

0

10

20

RL DPTime (s)

Mot

or to

rque

TG

(Nmiddotm

)

(b)

0 200 400 600 800 1000 1200 14000

10

20

30

40

5060

70

RL DP

Time (s)

Engi

ne to

rque

T

(Nmiddotm

)

(c)

Figure 12 Comparison between DP and RL optimization results based on ECE EUDC

0 500 1000 15000

102030405060

Time (s)

Spee

d

(kmmiddoth

-1)

RL(a)

0 500 1000 150004

05

06

07

SOC

Time (s)

RL DP(b)

0 500 1000 1500minus50

minus25

0

25

50

Time (s)

Mot

or to

rque

TG

(Nmiddotm

)

RL DP(c)

0 500 1000 15000

20

40

60

80

RL DP

Time (s)

Engi

ne to

rque

T

(Nmiddotm

)

(d)

Figure 13 Comparison between DP and RL optimization results based on CYC WVUCITY

Journal of Control Science and Engineering 11

Table 3 Simulation results based on CYC WVUCITY

Control Policy Fuel Consumption (L100 km) Computation time(s)a

offline onlineDP 4983 5280 mdashRL 4864 4300 35aA 25GHz microprocessor with 8GB RAM was used

In order to validate the optimal performance of the RLcontrol strategy vehicle simulation tests based on DP andRL were carried out Figure 12 shows a comparison of theoptimization results by the two control methods based onthe ECE EUDC driving cycle The solid line indicates theoptimization results of RL and the dotted line indicatesthe results of DP Figure 12(a) shows the SOC optimizationtrajectories of the DP- and RL-based strategies The trend ofthe two curves is essentially the sameThe difference betweenthe final SOC and the initial SOC is within 002 and the SOCis relatively stable Compared with RL the SOC curve of DP-based strategy fluctuates greatly this is primarily related tothe distribution of the power source torque Figures 12(b) and12(c) indicate engine and motor torque distribution curvesbased on DP and RL strategies The engine torque curvesessentially coincide however the motor torque curves aresomewhat different This is primarily reflected in the torquedistribution when the motor is in the generation state

Table 2 shows the equivalent fuel consumption obtainedthrough DP and RL optimization Compared with thatobtained by DP (4952L) the value of fuel consumptionobtained by RL is 23 higher The reason for this is thatDP only ensures a global optimum at a given driving cycle(ECE EUDC) whereas RL optimizes the result for a series ofdriving cycles in an average sense thereby realizing a cumu-lative expected value optimum Compared with rule-based(5368L) and instantaneous optimization (5256L) proposedin the literature [19] RL decreased the fuel consumption by56 and 36 respectively

In order to validate the adaptability of RL to randomdriving cycle CYC WVUCITY is also selected as a simula-tion cycle The optimization results are shown in Figure 13Figure 13(b) shows the variation curve of the SOC of thebattery based on DP and RL control strategies The solidline indicates the optimization results of RL and the dottedline indicates the results of DP The final SOC value for theDP and RL strategies is 059 and 058 respectively with thesame initial value of 06The SOC remained essentially stableFigures 13(c) and 13(d) show the optimal torque curves ofthe motor and the engine based on DP and RL strategies Itcan be seen from the figure that the changing trend of thetorque curves based on the two strategies is largely the sameIn comparison the change in the torque obtained by DPfluctuates greatly Table 3 demonstrates the equivalent fuelconsumption of the two control strategies Compared withDP RL saves fuel by 24 it can adapt with random cycleperfectly Meanwhile the computation time based onDP andRL is recorded which include offline and online computationtime The offline computation time of DP is 5280s and thatof RL is 4300s Due to large calculation volume and the

driving cycle unknown in advance DP cannot be realizedonline while RL is not limited to a certain cycle which can berealized online by embedding the engine power optimizationtables into the vehicle controller The online operation timeof RL is 35s based on CYC WVUCITY simulation cycle

6 Conclusion

(1) We established a stochastic Markov model of thedriverrsquos power demand based on datasets of the UDDS andECE EUDC driving cycles(2) An RL energy management strategy was proposedwhich takes SOC vehicle speed and power demand as statevariables engine power as power as the action variable andminimum cumulative return expectation as the optimizationobjective(3) Using the MATLABSimulink platform a CYCWVUCITY simulation model was established The resultsshow that comparedwithDP RL could reduce fuel consump-tion by 24 and be implemented online

Data Availability

The data used to support the findings of this study areincluded within the article

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This research is supported by Scientific and Technol-ogy Research Program of Chongqing Municipal EducationCommission (Grant no KJQN201800718) and ChongqingResearchProgramof Basic Research andFrontier Technology(Grant no cstc2017jcyjAX0053)

References

[1] G Zhao and Z Chen ldquoEnergy management strategy for serieshybrid electric vehiclerdquo Journal of Northeastern University vol34 no 4 pp 583ndash587 2013

[2] W Wang and X Cheng ldquoForward simulation on the energymanagement system for series hybrid electric busrdquo AutomotiveEngineering vol 35 no 2 pp 121ndash126 2013

[3] D Qin and S Zhan ldquoManagement strategy of hybrid electricalvehicle based on driving style recognitionrdquo Journal of Mechani-cal Engineering vol 52 no 8 pp 162ndash169 2016

12 Journal of Control Science and Engineering

[4] S Zhan and D Qin ldquoEnergy management strategy of HEVbased on driving cycle recognition using genetic optimized K-means clustering algorithmrdquo Chinese Journal of Highway andTransport vol 29 no 4 pp 130ndash137 2016

[5] X Zeng and JWang ldquoA Parallel Hybrid Electric Vehicle EnergyManagement Strategy Using Stochastic Model Predictive Con-trol With Road Grade Previewrdquo IEEE Transactions on ControlSystems Technology vol 23 no 6 pp 2416ndash2423 2015

[6] X Tang and B Wang ldquoControl strategy for hybrid electric busbased on state transition probabilityrdquo Automotive EngineeringMagazine vol 38 no 3 pp 263ndash268 2016

[7] Z Ma and S Cui ldquoApplying dynamic programming to HEVpowertrain based on EVTrdquo in Proceedings of the 2015 Inter-national Conference on Control Automation and InformationSciences vol 22 pp 37ndash41 2015

[8] Y Yang and P Ye ldquoA Research on the Fuzzy Control Strategy fora Speed-coupling ISG HEV Based on Dynamic Programmingoptimizationrdquo Automotive Engineering vol 38 no 6 pp 674ndash679 2016

[9] Z Chen C C Mi J Xu X Gong and C You ldquoEnergymanagement for a power-split plug-in hybrid electric vehiclebased on dynamic programming and neural networksrdquo IEEETransactions on Vehicular Technology vol 63 no 4 pp 1567ndash1580 2014

[10] C Sun S J Moura X Hu J K Hedrick and F Sun ldquoDynamictraffic feedback data enabled energy management in plug-inhybrid electric vehiclesrdquo IEEE Transactions on Control SystemsTechnology vol 23 no 3 pp 1075ndash1086 2015

[11] B Bader ldquoAn energy management strategy for plug-in hybridelectric vehiclesrdquoMateria vol 29 no 11 2013

[12] Z Chen L Li B Yan C Yang CMarinaMartinez andD CaoldquoMultimode Energy Management for Plug-In Hybrid ElectricBuses Based on Driving Cycles Predictionrdquo IEEE Transactionson Intelligent Transportation Systems vol 17 no 10 pp 2811ndash2821 2016

[13] Z Chen and Y Liu ldquoEnergy Management Strategy for Plug-inHybrid Electric Bus with Evolutionary Reinforcement Learningmethodrdquo Journal of Mechanical Engineering vol 53 no 16 pp86ndash93 2017

[14] T Liu B Wang and C Yang ldquoOnline Markov Chain-basedenergy management for a hybrid tracked vehicle with speedyQ-learningrdquo Energy vol 160 pp 544ndash555 2018

[15] L Teng ldquoOnline Energy Management for Multimode Plug-in Hybrid Electric Vehiclesrdquo IEEE Transactions on IndustrialInformatics vol 2018 no 10 pp 1ndash9 2018

[16] JWuHHe J Peng Y Li andZ Li ldquoContinuous reinforcementlearning of energy management with deep Q network for apower split hybrid electric busrdquo Applied Energy vol 222 pp799ndash811 2018

[17] T Liu X Hu S E Li and D Cao ldquoReinforcement LearningOptimized Look-Ahead Energy Management of a ParallelHybrid Electric Vehiclerdquo IEEEASME Transactions on Mecha-tronics vol 22 no 4 pp 1497ndash1507 2017

[18] T Liu and X Hu ldquoA Bi-Level Control for Energy EfficiencyImprovement of a Hybrid Tracked Vehiclerdquo IEEE Transactionson Industrial Informatics vol 14 no 4 pp 1616ndash1625 2018

[19] Y Yin Study on Optimization Matching and Simulation forSuper-Mild HEV Chongqing university Chongqing China2011

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 9: An Energy Management Strategy for a Super-Mild Hybrid

Journal of Control Science and Engineering 9

0 200 400 600 800 1000 1200 14000

50

100

150

Time (s)

Spee

d

(kmmiddoth

-1)

(a)

0 200 400 600 800 1000 1200 14000

1

2

3

4

Time (s)

Gea

r rat

ioi A

(b)

0 200 400 600 800 1000 1200 140003

04

05

06

07

SOC

Time (s)

(c)

0 200 400 600 800 1000 1200 1400minus50minus40minus30minus20minus10

010203040

Time (s)

Dem

and

pow

erPLK

(kW

)(d)

0 200 400 600 800 1000 1200 1400minus5

0

5

Time (s)

Mot

or p

ower

PG

(kW

)

(e)

40

0 200 400 600 800 1000 1200 14000

20

60

Time (s)

Engi

ne to

rque

T

(Nmiddotm

)

(f)

0 200 400 600 800 1000 1200 1400minus30minus20

minus100

1020

Time (s)

Mot

or to

rque

TG

(Nmiddotm

)

(g)

1200 14000 200 400 600 800 1000minus05

005101520

3035

25

40

Equi

vale

nt fu

el

Time (s)

cons

umpt

ionb

KO

(mLmiddot

s-1)

(h)

Figure 11 ECE EUDC simulation results based on RL

5 Simulation Experiment and Results

A simulation is conducted on a MATLAB Simulink plat-form taking ECE EUDC driving cycle as the simulationdriving cycle and setting the initial SOC as 06 The energymanagement strategy based on a known model of RL isadopted simulating the vehicle operation status onlineResults are shown in Figure 11

From Figure 11 we can see that the following parameterschange with time gear ratio of transmission 119894g SOC ofthe battery power demand 119875req motor power 119875m enginetorque 119879e motor torque 119879119898 and instantaneous equivalentfuel consumption 119887equ From Figure 11(b) it can be seen that

the gear ratio varies continuously from 0498 to 404 thisis primarily because of the use of a CVT with reflux powerwhich unlike AMT can realize power without interruptionIn Figure 11(c) SOC of the battery is seen to change from theinitial value of 06 to the final value of 05845SOC = 00155this can meet the HEV SOC balance requirement before andafter the cycle (minus002 le Δ119878119874119862 le 002) The power demandchange curve of the vehicle is obtained based on ECE EUDCas shown in Figure 11(d)The power distribution curves of themotor and the engine can be obtained according to the enginepower optimization control tables as shown in Figures 11(e)and 11(f) The motor power is negative which indicates thatthe motor is in the generating state

10 Journal of Control Science and Engineering

0 200 400 600 800 1000 1200 1400

SOC

04

05

06

07

RL DP

Time (s)

(a)

0 200 400 600 800 1000 1200 1400minus30

minus20

minus10

0

10

20

RL DPTime (s)

Mot

or to

rque

TG

(Nmiddotm

)

(b)

0 200 400 600 800 1000 1200 14000

10

20

30

40

5060

70

RL DP

Time (s)

Engi

ne to

rque

T

(Nmiddotm

)

(c)

Figure 12 Comparison between DP and RL optimization results based on ECE EUDC

0 500 1000 15000

102030405060

Time (s)

Spee

d

(kmmiddoth

-1)

RL(a)

0 500 1000 150004

05

06

07

SOC

Time (s)

RL DP(b)

0 500 1000 1500minus50

minus25

0

25

50

Time (s)

Mot

or to

rque

TG

(Nmiddotm

)

RL DP(c)

0 500 1000 15000

20

40

60

80

RL DP

Time (s)

Engi

ne to

rque

T

(Nmiddotm

)

(d)

Figure 13 Comparison between DP and RL optimization results based on CYC WVUCITY

Journal of Control Science and Engineering 11

Table 3 Simulation results based on CYC WVUCITY

Control Policy Fuel Consumption (L100 km) Computation time(s)a

offline onlineDP 4983 5280 mdashRL 4864 4300 35aA 25GHz microprocessor with 8GB RAM was used

In order to validate the optimal performance of the RLcontrol strategy vehicle simulation tests based on DP andRL were carried out Figure 12 shows a comparison of theoptimization results by the two control methods based onthe ECE EUDC driving cycle The solid line indicates theoptimization results of RL and the dotted line indicatesthe results of DP Figure 12(a) shows the SOC optimizationtrajectories of the DP- and RL-based strategies The trend ofthe two curves is essentially the sameThe difference betweenthe final SOC and the initial SOC is within 002 and the SOCis relatively stable Compared with RL the SOC curve of DP-based strategy fluctuates greatly this is primarily related tothe distribution of the power source torque Figures 12(b) and12(c) indicate engine and motor torque distribution curvesbased on DP and RL strategies The engine torque curvesessentially coincide however the motor torque curves aresomewhat different This is primarily reflected in the torquedistribution when the motor is in the generation state

Table 2 shows the equivalent fuel consumption obtainedthrough DP and RL optimization Compared with thatobtained by DP (4952L) the value of fuel consumptionobtained by RL is 23 higher The reason for this is thatDP only ensures a global optimum at a given driving cycle(ECE EUDC) whereas RL optimizes the result for a series ofdriving cycles in an average sense thereby realizing a cumu-lative expected value optimum Compared with rule-based(5368L) and instantaneous optimization (5256L) proposedin the literature [19] RL decreased the fuel consumption by56 and 36 respectively

In order to validate the adaptability of RL to randomdriving cycle CYC WVUCITY is also selected as a simula-tion cycle The optimization results are shown in Figure 13Figure 13(b) shows the variation curve of the SOC of thebattery based on DP and RL control strategies The solidline indicates the optimization results of RL and the dottedline indicates the results of DP The final SOC value for theDP and RL strategies is 059 and 058 respectively with thesame initial value of 06The SOC remained essentially stableFigures 13(c) and 13(d) show the optimal torque curves ofthe motor and the engine based on DP and RL strategies Itcan be seen from the figure that the changing trend of thetorque curves based on the two strategies is largely the sameIn comparison the change in the torque obtained by DPfluctuates greatly Table 3 demonstrates the equivalent fuelconsumption of the two control strategies Compared withDP RL saves fuel by 24 it can adapt with random cycleperfectly Meanwhile the computation time based onDP andRL is recorded which include offline and online computationtime The offline computation time of DP is 5280s and thatof RL is 4300s Due to large calculation volume and the

driving cycle unknown in advance DP cannot be realizedonline while RL is not limited to a certain cycle which can berealized online by embedding the engine power optimizationtables into the vehicle controller The online operation timeof RL is 35s based on CYC WVUCITY simulation cycle

6 Conclusion

(1) We established a stochastic Markov model of thedriverrsquos power demand based on datasets of the UDDS andECE EUDC driving cycles(2) An RL energy management strategy was proposedwhich takes SOC vehicle speed and power demand as statevariables engine power as power as the action variable andminimum cumulative return expectation as the optimizationobjective(3) Using the MATLABSimulink platform a CYCWVUCITY simulation model was established The resultsshow that comparedwithDP RL could reduce fuel consump-tion by 24 and be implemented online

Data Availability

The data used to support the findings of this study areincluded within the article

Conflicts of Interest

The authors declare that there are no conflicts of interestregarding the publication of this paper

Acknowledgments

This research is supported by Scientific and Technol-ogy Research Program of Chongqing Municipal EducationCommission (Grant no KJQN201800718) and ChongqingResearchProgramof Basic Research andFrontier Technology(Grant no cstc2017jcyjAX0053)

References

[1] G Zhao and Z Chen ldquoEnergy management strategy for serieshybrid electric vehiclerdquo Journal of Northeastern University vol34 no 4 pp 583ndash587 2013

[2] W Wang and X Cheng ldquoForward simulation on the energymanagement system for series hybrid electric busrdquo AutomotiveEngineering vol 35 no 2 pp 121ndash126 2013

[3] D Qin and S Zhan ldquoManagement strategy of hybrid electricalvehicle based on driving style recognitionrdquo Journal of Mechani-cal Engineering vol 52 no 8 pp 162ndash169 2016

12 Journal of Control Science and Engineering

[4] S Zhan and D Qin ldquoEnergy management strategy of HEVbased on driving cycle recognition using genetic optimized K-means clustering algorithmrdquo Chinese Journal of Highway andTransport vol 29 no 4 pp 130ndash137 2016

[5] X Zeng and JWang ldquoA Parallel Hybrid Electric Vehicle EnergyManagement Strategy Using Stochastic Model Predictive Con-trol With Road Grade Previewrdquo IEEE Transactions on ControlSystems Technology vol 23 no 6 pp 2416ndash2423 2015

[6] X Tang and B Wang ldquoControl strategy for hybrid electric busbased on state transition probabilityrdquo Automotive EngineeringMagazine vol 38 no 3 pp 263ndash268 2016

[7] Z Ma and S Cui ldquoApplying dynamic programming to HEVpowertrain based on EVTrdquo in Proceedings of the 2015 Inter-national Conference on Control Automation and InformationSciences vol 22 pp 37ndash41 2015

[8] Y Yang and P Ye ldquoA Research on the Fuzzy Control Strategy fora Speed-coupling ISG HEV Based on Dynamic Programmingoptimizationrdquo Automotive Engineering vol 38 no 6 pp 674ndash679 2016

[9] Z Chen C C Mi J Xu X Gong and C You ldquoEnergymanagement for a power-split plug-in hybrid electric vehiclebased on dynamic programming and neural networksrdquo IEEETransactions on Vehicular Technology vol 63 no 4 pp 1567ndash1580 2014

[10] C Sun S J Moura X Hu J K Hedrick and F Sun ldquoDynamictraffic feedback data enabled energy management in plug-inhybrid electric vehiclesrdquo IEEE Transactions on Control SystemsTechnology vol 23 no 3 pp 1075ndash1086 2015

[11] B Bader ldquoAn energy management strategy for plug-in hybridelectric vehiclesrdquoMateria vol 29 no 11 2013

[12] Z Chen L Li B Yan C Yang CMarinaMartinez andD CaoldquoMultimode Energy Management for Plug-In Hybrid ElectricBuses Based on Driving Cycles Predictionrdquo IEEE Transactionson Intelligent Transportation Systems vol 17 no 10 pp 2811ndash2821 2016

[13] Z Chen and Y Liu ldquoEnergy Management Strategy for Plug-inHybrid Electric Bus with Evolutionary Reinforcement Learningmethodrdquo Journal of Mechanical Engineering vol 53 no 16 pp86ndash93 2017

[14] T Liu B Wang and C Yang ldquoOnline Markov Chain-basedenergy management for a hybrid tracked vehicle with speedyQ-learningrdquo Energy vol 160 pp 544ndash555 2018

[15] L Teng ldquoOnline Energy Management for Multimode Plug-in Hybrid Electric Vehiclesrdquo IEEE Transactions on IndustrialInformatics vol 2018 no 10 pp 1ndash9 2018

[16] JWuHHe J Peng Y Li andZ Li ldquoContinuous reinforcementlearning of energy management with deep Q network for apower split hybrid electric busrdquo Applied Energy vol 222 pp799ndash811 2018

[17] T Liu X Hu S E Li and D Cao ldquoReinforcement LearningOptimized Look-Ahead Energy Management of a ParallelHybrid Electric Vehiclerdquo IEEEASME Transactions on Mecha-tronics vol 22 no 4 pp 1497ndash1507 2017

[18] T Liu and X Hu ldquoA Bi-Level Control for Energy EfficiencyImprovement of a Hybrid Tracked Vehiclerdquo IEEE Transactionson Industrial Informatics vol 14 no 4 pp 1616ndash1625 2018

[19] Y Yin Study on Optimization Matching and Simulation forSuper-Mild HEV Chongqing university Chongqing China2011

International Journal of

AerospaceEngineeringHindawiwwwhindawicom Volume 2018

RoboticsJournal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Active and Passive Electronic Components

VLSI Design

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Shock and Vibration

Hindawiwwwhindawicom Volume 2018

Civil EngineeringAdvances in

Acoustics and VibrationAdvances in

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Electrical and Computer Engineering

Journal of

Advances inOptoElectronics

Hindawiwwwhindawicom

Volume 2018

Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom

The Scientific World Journal

Volume 2018

Control Scienceand Engineering

Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom

Journal ofEngineeringVolume 2018

SensorsJournal of

Hindawiwwwhindawicom Volume 2018

International Journal of

RotatingMachinery

Hindawiwwwhindawicom Volume 2018

Modelling ampSimulationin EngineeringHindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Chemical EngineeringInternational Journal of Antennas and

Propagation

International Journal of

Hindawiwwwhindawicom Volume 2018

Hindawiwwwhindawicom Volume 2018

Navigation and Observation

International Journal of

Hindawi

wwwhindawicom Volume 2018

Advances in

Multimedia

Submit your manuscripts atwwwhindawicom

Page 10: An Energy Management Strategy for a Super-Mild Hybrid

10 Journal of Control Science and Engineering

0 200 400 600 800 1000 1200 1400

SOC

04

05

06

07

RL DP

Time (s)

(a)

0 200 400 600 800 1000 1200 1400minus30

minus20

minus10

0

10

20

RL DPTime (s)

Mot

or to

rque

TG

(Nmiddotm

)

(b)

0 200 400 600 800 1000 1200 14000

10

20

30

40

5060

70

RL DP

Time (s)

Engi

ne to

rque

T

(Nmiddotm

)

(c)

Figure 12 Comparison between DP and RL optimization results based on ECE EUDC

0 500 1000 15000

102030405060

Time (s)

Spee

d

(kmmiddoth

-1)

RL(a)

0 500 1000 150004

05

06

07

SOC

Time (s)

RL DP(b)

0 500 1000 1500minus50

minus25

0

25

50

Time (s)

Mot

or to

rque

TG

(Nmiddotm

)

RL DP(c)

0 500 1000 15000

20

40

60

80

RL DP

Time (s)

Engi

ne to

rque

T

(Nmiddotm

)

(d)

Figure 13 Comparison between DP and RL optimization results based on CYC WVUCITY

Journal of Control Science and Engineering 11

Table 3 Simulation results based on CYC WVUCITY

Control Policy Fuel Consumption (L100 km) Computation time(s)a

offline onlineDP 4983 5280 mdashRL 4864 4300 35aA 25GHz microprocessor with 8GB RAM was used

In order to validate the optimal performance of the RLcontrol strategy vehicle simulation tests based on DP andRL were carried out Figure 12 shows a comparison of theoptimization results by the two control methods based onthe ECE EUDC driving cycle The solid line indicates theoptimization results of RL and the dotted line indicatesthe results of DP Figure 12(a) shows the SOC optimizationtrajectories of the DP- and RL-based strategies The trend ofthe two curves is essentially the sameThe difference betweenthe final SOC and the initial SOC is within 002 and the SOCis relatively stable Compared with RL the SOC curve of DP-based strategy fluctuates greatly this is primarily related tothe distribution of the power source torque Figures 12(b) and12(c) indicate engine and motor torque distribution curvesbased on DP and RL strategies The engine torque curvesessentially coincide however the motor torque curves aresomewhat different This is primarily reflected in the torquedistribution when the motor is in the generation state

Table 2 shows the equivalent fuel consumption obtainedthrough DP and RL optimization Compared with thatobtained by DP (4952L) the value of fuel consumptionobtained by RL is 23 higher The reason for this is thatDP only ensures a global optimum at a given driving cycle(ECE EUDC) whereas RL optimizes the result for a series ofdriving cycles in an average sense thereby realizing a cumu-lative expected value optimum Compared with rule-based(5368L) and instantaneous optimization (5256L) proposedin the literature [19] RL decreased the fuel consumption by56 and 36 respectively

In order to validate the adaptability of RL to random driving cycles, CYC WVUCITY is also selected as a simulation cycle; the optimization results are shown in Figure 13. Figure 13(b) shows the variation of the battery SOC under the DP and RL control strategies, where the solid line indicates the optimization results of RL and the dotted line indicates the results of DP. Starting from the same initial value of 0.6, the final SOC values of the DP and RL strategies are 0.59 and 0.58, respectively, so the SOC remains essentially stable. Figures 13(c) and 13(d) show the optimal torque curves of the motor and the engine under the DP and RL strategies. The changing trends of the torque curves obtained by the two strategies are largely the same; in comparison, the torque obtained by DP fluctuates more. Table 3 lists the equivalent fuel consumption of the two control strategies: compared with DP, RL saves 2.4% of fuel and adapts well to a random cycle. The computation times of DP and RL, including offline and online computation, are also recorded. The offline computation time of DP is 5280 s and that of RL is 4300 s. Because of its large calculation volume and its need to know the driving cycle in advance, DP cannot be realized online, whereas RL is not limited to a particular cycle and can be realized online by embedding the engine power optimization tables into the vehicle controller. The online operation time of RL is 35 s on the CYC WVUCITY simulation cycle.
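The online deployment relies only on querying the engine power tables produced offline. The sketch below illustrates one way such a table lookup could be organized in the controller; the state grids, the nearest-neighbor lookup, and the function names are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

# Illustrative state grids; the paper's actual discretization is not reproduced here.
soc_grid   = np.linspace(0.4, 0.7, 7)      # battery SOC
speed_grid = np.linspace(0.0, 70.0, 15)    # vehicle speed (km/h)
pdem_grid  = np.linspace(-20.0, 30.0, 11)  # driver's power demand (kW)

# Offline RL result: optimal engine power (kW) for every discrete state.
# Filled with zeros here as a placeholder; in practice it comes from policy iteration.
engine_power_table = np.zeros((len(soc_grid), len(speed_grid), len(pdem_grid)))

def nearest_index(grid, value):
    """Index of the grid point closest to the measured value."""
    return int(np.argmin(np.abs(grid - value)))

def online_engine_power(soc, speed, p_dem):
    """One online control step: a constant-time table lookup, no optimization."""
    i = nearest_index(soc_grid, soc)
    j = nearest_index(speed_grid, speed)
    k = nearest_index(pdem_grid, p_dem)
    p_eng = engine_power_table[i, j, k]
    p_mot = p_dem - p_eng          # motor covers the remaining demand
    return p_eng, p_mot

# Example call at one sampling instant:
p_eng, p_mot = online_engine_power(soc=0.58, speed=32.0, p_dem=8.5)
```

Because each control step reduces to indexing a precomputed table, the online burden is small, which is what allows the strategy to run in the vehicle controller.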

6. Conclusion

(1) We established a stochastic Markov model of the driver's power demand based on datasets of the UDDS and ECE EUDC driving cycles.
(2) An RL energy management strategy was proposed that takes the SOC, vehicle speed, and power demand as state variables, the engine power as the action variable, and the minimum cumulative return expectation as the optimization objective.
(3) Using the MATLAB/Simulink platform, a CYC WVUCITY simulation model was established. The results show that, compared with DP, RL could reduce fuel consumption by 2.4% and be implemented online.
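As a compact illustration of the policy iteration scheme summarized in item (2), the loop below alternates policy evaluation and policy improvement over a discretized state space. It is a minimal generic sketch, not the authors' code: the transition matrices, stage costs, grid size, and discount factor are placeholders standing in for the Markov demand model and the fuel model described in the paper.

```python
import numpy as np

# Placeholder MDP: n_states discrete (SOC, speed, power demand) combinations and
# n_actions discrete engine-power levels. Random values are used only so the
# sketch runs end to end.
rng = np.random.default_rng(0)
n_states, n_actions, gamma = 50, 5, 0.95
P = rng.random((n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)          # row-stochastic transition matrices
cost = rng.random((n_states, n_actions))   # expected fuel cost per (state, action)

policy = np.zeros(n_states, dtype=int)
for _ in range(100):
    # Policy evaluation: solve (I - gamma * P_pi) V = c_pi for the current policy.
    P_pi = P[policy, np.arange(n_states), :]
    c_pi = cost[np.arange(n_states), policy]
    V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, c_pi)
    # Policy improvement: choose the engine power minimizing the expected cost-to-go.
    Q = cost + gamma * np.einsum('aij,j->ia', P, V)
    new_policy = Q.argmin(axis=1)
    if np.array_equal(new_policy, policy):
        break                               # converged to the optimal policy
    policy = new_policy
```

The converged `policy` array plays the role of the engine power optimization table that is embedded in the controller for online use.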

Data Availability

The data used to support the findings of this study are included within the article.

Conflicts of Interest

The authors declare that there are no conflicts of interest regarding the publication of this paper.

Acknowledgments

This research is supported by the Scientific and Technology Research Program of Chongqing Municipal Education Commission (Grant no. KJQN201800718) and the Chongqing Research Program of Basic Research and Frontier Technology (Grant no. cstc2017jcyjAX0053).

References

[1] G. Zhao and Z. Chen, "Energy management strategy for series hybrid electric vehicle," Journal of Northeastern University, vol. 34, no. 4, pp. 583–587, 2013.

[2] W. Wang and X. Cheng, "Forward simulation on the energy management system for series hybrid electric bus," Automotive Engineering, vol. 35, no. 2, pp. 121–126, 2013.

[3] D. Qin and S. Zhan, "Management strategy of hybrid electrical vehicle based on driving style recognition," Journal of Mechanical Engineering, vol. 52, no. 8, pp. 162–169, 2016.


[4] S. Zhan and D. Qin, "Energy management strategy of HEV based on driving cycle recognition using genetic optimized K-means clustering algorithm," Chinese Journal of Highway and Transport, vol. 29, no. 4, pp. 130–137, 2016.

[5] X. Zeng and J. Wang, "A parallel hybrid electric vehicle energy management strategy using stochastic model predictive control with road grade preview," IEEE Transactions on Control Systems Technology, vol. 23, no. 6, pp. 2416–2423, 2015.

[6] X. Tang and B. Wang, "Control strategy for hybrid electric bus based on state transition probability," Automotive Engineering Magazine, vol. 38, no. 3, pp. 263–268, 2016.

[7] Z. Ma and S. Cui, "Applying dynamic programming to HEV powertrain based on EVT," in Proceedings of the 2015 International Conference on Control, Automation and Information Sciences, vol. 22, pp. 37–41, 2015.

[8] Y. Yang and P. Ye, "A research on the fuzzy control strategy for a speed-coupling ISG HEV based on dynamic programming optimization," Automotive Engineering, vol. 38, no. 6, pp. 674–679, 2016.

[9] Z. Chen, C. C. Mi, J. Xu, X. Gong, and C. You, "Energy management for a power-split plug-in hybrid electric vehicle based on dynamic programming and neural networks," IEEE Transactions on Vehicular Technology, vol. 63, no. 4, pp. 1567–1580, 2014.

[10] C. Sun, S. J. Moura, X. Hu, J. K. Hedrick, and F. Sun, "Dynamic traffic feedback data enabled energy management in plug-in hybrid electric vehicles," IEEE Transactions on Control Systems Technology, vol. 23, no. 3, pp. 1075–1086, 2015.

[11] B. Bader, "An energy management strategy for plug-in hybrid electric vehicles," Materia, vol. 29, no. 11, 2013.

[12] Z. Chen, L. Li, B. Yan, C. Yang, C. Marina Martinez, and D. Cao, "Multimode energy management for plug-in hybrid electric buses based on driving cycles prediction," IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 10, pp. 2811–2821, 2016.

[13] Z. Chen and Y. Liu, "Energy management strategy for plug-in hybrid electric bus with evolutionary reinforcement learning method," Journal of Mechanical Engineering, vol. 53, no. 16, pp. 86–93, 2017.

[14] T. Liu, B. Wang, and C. Yang, "Online Markov chain-based energy management for a hybrid tracked vehicle with speedy Q-learning," Energy, vol. 160, pp. 544–555, 2018.

[15] L. Teng, "Online energy management for multimode plug-in hybrid electric vehicles," IEEE Transactions on Industrial Informatics, vol. 2018, no. 10, pp. 1–9, 2018.

[16] J. Wu, H. He, J. Peng, Y. Li, and Z. Li, "Continuous reinforcement learning of energy management with deep Q network for a power split hybrid electric bus," Applied Energy, vol. 222, pp. 799–811, 2018.

[17] T. Liu, X. Hu, S. E. Li, and D. Cao, "Reinforcement learning optimized look-ahead energy management of a parallel hybrid electric vehicle," IEEE/ASME Transactions on Mechatronics, vol. 22, no. 4, pp. 1497–1507, 2017.

[18] T. Liu and X. Hu, "A bi-level control for energy efficiency improvement of a hybrid tracked vehicle," IEEE Transactions on Industrial Informatics, vol. 14, no. 4, pp. 1616–1625, 2018.

[19] Y. Yin, Study on Optimization Matching and Simulation for Super-Mild HEV, Chongqing University, Chongqing, China, 2011.
