

IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 63, NO. 9, SEPTEMBER 2018 2771

Time-Inconsistent Mean-Field Stochastic LQ Problem: Open-Loop Time-Consistent Control

Yuan-Hua Ni, Ji-Feng Zhang, Fellow, IEEE, and Miroslav Krstic, Fellow, IEEE

Abstract: This paper is concerned with the open-loop time-consistent solution of time-inconsistent mean-field stochastic linear-quadratic (LQ) optimal control. Different from standard stochastic linear-quadratic problems, both the system matrices and the weighting matrices depend on the initial times, and the conditional expectations of the control and state enter quadratically into the cost functional. Such features ruin Bellman’s principle of optimality and result in the time inconsistency of optimal control. Based on the dynamical nature of the systems involved, a kind of open-loop time-consistent equilibrium control is investigated in this paper. It is shown that the existence of open-loop equilibrium control for a fixed initial pair is equivalent to the solvability of a set of forward–backward stochastic difference equations with stationary condition and convexity condition. By decoupling the forward–backward stochastic difference equations, necessary and sufficient conditions in terms of linear difference equations and generalized difference Riccati equations are given for the existence of open-loop equilibrium control for a fixed initial pair. Moreover, the existence of open-loop time-consistent equilibrium controls for all the initial pairs is shown to be equivalent to the solvability of a set of coupled constrained generalized difference Riccati equations and two sets of constrained linear difference equations.

Index Terms: Forward–backward stochastic difference equation, mean-field theory, stochastic linear-quadratic optimal control, time inconsistency.

Manuscript received July 25, 2017; revised July 31, 2017; accepted November 5, 2017. Date of publication November 22, 2017; date of current version August 28, 2018. The work of Y.-H. Ni was supported by the National Natural Science Foundation of China under Grant 11471242 and Grant 61773222. The work of J.-F. Zhang was supported in part by the National Natural Science Foundation of China under Grant 61227902 and in part by the National Key Basic Research Program of China (973 Program) under Grant 2014CB845301. This paper was presented in part at the 35th Chinese Control Conference, Jul. 27–29, 2016. Recommended by Associate Editor E. Zhou. (Corresponding author: Yuan-Hua Ni.)

Y.-H. Ni is with the College of Computer and Control Engineering, Nankai University, Tianjin 300350, P.R. China (e-mail: [email protected]).

J.-F. Zhang is with the Key Laboratory of Systems and Control, Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100190, P.R. China, and also with the School of Mathematical Sciences, University of Chinese Academy of Sciences, Beijing 100049, P.R. China (e-mail: [email protected]).

M. Krstic is with the Department of Mechanical and Aerospace Engineering, University of California, San Diego, La Jolla, CA 92093-0411 USA (e-mail: [email protected]).

Digital Object Identifier 10.1109/TAC.2017.2776740

I. INTRODUCTION

A. Time Consistency versus Time Inconsistency

Though not mentioned frequently, time consistency is indeed an essential notion in optimal control theory, which relates to Bellman’s principle of optimality. To see this, recall a standard discrete-time stochastic optimal control problem, whose system dynamics and cost functional are given, respectively, by

$$\left\{\begin{aligned}
& X_{k+1} = f(k, X_k, u_k, w_k) \\
& X_t = x \in \mathbb{R}^n, \quad k \in \mathbb{T}_t, \ t \in \mathbb{T}
\end{aligned}\right. \tag{1}$$

and

$$J(t, x; u) = \sum_{k=t}^{N-1} E\big[e^{-\delta(k-t)} L(k, X_k, u_k)\big] + E\big[e^{-\delta(N-t)} h(X_N)\big]. \tag{2}$$

Here, $\mathbb{T}_t = \{t, \ldots, N-1\}$, $\mathbb{T} = \{0, 1, \ldots, N-1\}$, and $N$ is a positive integer; $\{X_k, k \in \widetilde{\mathbb{T}}_t\}$ and $\{u_k, k \in \mathbb{T}_t\}$ with $\widetilde{\mathbb{T}}_t = \{t, \ldots, N\}$ are the state process and the control process, respectively; $\{w_k, k \in \mathbb{T}\}$ is a stochastic disturbance process; $E$ is the operator of mathematical expectation. Without loss of generality, the functions $f$, $L$, and $h$ are assumed bounded. Let $\mathcal{U}[t, N-1]$ be a set of admissible controls. Then, we have the following optimal control problem.

Problem (C): Letting $(t, x) \in \mathbb{T} \times \mathbb{R}^n$, find a $\bar{u} \in \mathcal{U}[t, N-1]$ such that

$$J(t, x; \bar{u}) = \inf_{u \in \mathcal{U}[t, N-1]} J(t, x; u). \tag{3}$$

The above problem is called Problem (C) for the initial pair $(t, x)$, and Problem (C) for other initial pairs can be formulated similarly. Any $\bar{u} \in \mathcal{U}[t, N-1]$ satisfying (3) is called an optimal control for the initial pair $(t, x)$, and $\bar{X} = \{\bar{X}_k = \bar{X}(k; t, x, \bar{u}), k \in \widetilde{\mathbb{T}}_t\}$ is the corresponding optimal trajectory. Furthermore, $(\bar{X}, \bar{u})$ is referred to as an optimal pair for the initial pair $(t, x)$.

Let $(\bar{X}, \bar{u})$ be an optimal pair for the initial pair $(t, x)$; as the dynamics evolve, we indeed face a family of optimal control problems, namely, Problem (C) for the initial pairs $\{(k, \bar{X}_k), k \in \mathbb{T}_t\}$. Bellman’s principle of optimality tells us that the optimal controls of this family of problems are interrelated, namely, for any $\tau \in \mathbb{T}_{t+1} = \{t+1, \ldots, N-1\}$, $\bar{u}|_{\mathbb{T}_\tau} = \{\bar{u}_\tau, \ldots, \bar{u}_{N-1}\}$ (the restriction of $\bar{u}$ on $\mathbb{T}_\tau = \{\tau, \ldots, N-1\}$) is an optimal control of Problem (C) for the initial pair $(\tau, \bar{X}_\tau)$. This property is the cornerstone of Bellman’s dynamic programming and is referred to as the time consistency of optimal control, which is essential for handling optimal control problems like Problem (C) and its continuous-time counterpart. In such a situation, we say that Problem (C) is time consistent.

0018-9286 © 2017 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

However, time consistency fails in many situations. For instance, when the exponential discounting function $e^{-\delta(k-t)}$ in (2) is replaced by other discounting functions, the corresponding problem is not time consistent, i.e., it is time inconsistent; see the examples in [5] and [19] on hyperbolic discounting and quasi-geometric discounting. In addition, when the conditional expectations of the state and/or control enter nonlinearly into the cost functional, the considered optimal control problems are time inconsistent too; a notable example is the mean–variance utility [2], [5], [9], [11], [22], [24]. In such a case, the smoothing property of conditional expectation is not sufficient to ensure the time consistency of optimal control.
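The effect of replacing the exponential discounting function can be seen in a toy comparison, given here as a minimal sketch rather than as an example taken from [5] or [19]. The two dated rewards, the rate 0.3, and the hyperbolic form 1/(1+s) are hypothetical choices made only for illustration. A planner standing at time t compares the discounted values of the two rewards; the question is whether the ranking depends on t.

```python
import math
from typing import Callable, List


def preferred(t: int, discount: Callable[[int], float]) -> str:
    """Return which of two dated rewards looks better when evaluated at time t."""
    # Hypothetical rewards: 10 units at time 10 ("early") or 16 units at time 12 ("late").
    early = discount(10 - t) * 10.0
    late = discount(12 - t) * 16.0
    return "early" if early > late else "late"


def exponential(s: int) -> float:
    # Exponential discounting e^{-delta * s} with an assumed delta = 0.3.
    return math.exp(-0.3 * s)


def hyperbolic(s: int) -> float:
    # One common hyperbolic form, 1 / (1 + s); chosen for illustration only.
    return 1.0 / (1.0 + s)


# Ranking seen from each evaluation time t = 0, ..., 10.
exp_choices: List[str] = [preferred(t, exponential) for t in range(11)]
hyp_choices: List[str] = [preferred(t, hyperbolic) for t in range(11)]
```

With these numbers, the exponential planner prefers the early reward at every t, because the ratio of the two discounted values does not depend on t. The hyperbolic planner prefers the late reward for t up to 7 and the early reward from t = 8 on, so a plan fixed at t = 0 is abandoned later.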

B. Literature Review

Problems with nonlinear terms of conditional expectation (in the cost functional) are classified as mean-field stochastic optimal control [37]. In [22], recognizing the time inconsistency (called nonseparability there), Li and Ng derived the optimal policy of multiperiod mean–variance portfolio selection by using an embedding scheme. Note that the optimal policy of [22] is with respect to the initial pair, i.e., it is optimal only when viewed at the initial time. This kind of solution is now called the precommitment optimal solution.

A precommitment optimal solution is a static notion, which maps the considered initial pair into an admissible control set. When a precommitment optimal control (for an initial pair) is applied, its restriction to the tail time horizon is not an optimal control for the intertemporal initial pair. This static trait conflicts with the dynamic nature of (time-inconsistent) optimal control, as time is involved in the problem setting. Though the static solution is of some practical and theoretical value, it neglects and has not really addressed the time inconsistency. In contrast, another approach handles the time inconsistency in a dynamic manner; instead of seeking a precommitment optimal control, some kinds of equilibrium solutions are dealt with. This is mainly motivated by practical applications in economics and finance, and has recently attracted considerable interest and effort.

The explicit formulation of time inconsistency was initiated by Strotz [29] in 1955, whereas its qualitative analysis can be traced back to the work of Smith [28]. Strotz studied the general discounting problem, and in the discrete-time case, his idea is to tackle the time inconsistency by a leader-follower game with hierarchical structure. Specifically, controls at different time points were viewed as different selves (players), and every self integrated the policies of his successor into his own decision. By a backward procedure, the equilibrium policy (if it exists) was obtained. Inspired by Strotz and intending to tackle practical problems in economics and finance, hundreds of works have been concerned with the time inconsistency of dynamic systems described by ordinary difference or differential equations; see, for example, [12], [13], [15], [19], [20], [26] and references therein. Unfortunately, as pointed out by Ekeland [12], [13], it is hard to prove the existence of Strotz’s equilibrium policy. Therefore, it is necessary and of great importance to develop a general theory of time-inconsistent optimal control. This, on the one hand, can enrich optimal control theory, and on the other hand, can provide instructive methodology to push the solvability of practical problems. Recently, this topic has attracted considerable attention from the theoretical control community; see, for example, [5], [17], [18], [30], [32], [34], [37] and references therein.

For time-inconsistent LQ problems, two kinds of time-consistent equilibrium solutions are studied, namely, the open-loop equilibrium control and the closed-loop equilibrium strategy [17], [18], [32], [34], [37]. The two formulations are investigated separately because, in dynamic game theory, an open-loop control differs significantly from a closed-loop strategy [3], [36]. To compare, the open-loop formulation is to find an open-loop equilibrium “control,” whereas the “strategy” is the object of the closed-loop formulation. By a strategy, we mean a decision rule that a controller uses to select a control action based on the available information set. Mathematically, a strategy is a mapping or operator on the information set. When the available information is substituted into a strategy, the open-loop value or open-loop realization of this strategy is obtained. Strotz’s equilibrium solution [29] is essentially a closed-loop equilibrium strategy, which is further elaborately developed by Yong for LQ optimal control [32], [37] as well as nonlinear optimal control [33], [34]. In contrast, the open-loop equilibrium control is extensively studied in [17], [18], and [37]. In particular, the closed-loop formulation can be viewed as an extension of Bellman’s dynamic programming, and the corresponding equilibrium strategy (if it exists) is derived by a backward procedure [32]–[34], [37]. The open-loop equilibrium control, on the other hand, is characterized via a maximum-principle-like methodology [17], [18].
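The distinction between a strategy and its open-loop realization can be sketched in a few lines of code. The scalar deterministic system x_{k+1} = a x_k + b u_k, the feedback rule u_k = -0.5 x_k, and the function names below are hypothetical and are not taken from the paper; they only illustrate that a strategy is a map on the available information, while its realization is a fixed sequence of control values.

```python
from typing import Callable, List, Tuple

# A strategy maps (time k, observed state x_k) to a control value u_k.
Strategy = Callable[[int, float], float]


def realize(strategy: Strategy, x0: float, a: float, b: float,
            steps: int) -> Tuple[List[float], List[float]]:
    """Run the strategy in closed loop; return the states and the realized controls."""
    xs, us = [x0], []
    for k in range(steps):
        u = strategy(k, xs[-1])          # substitute the available information
        us.append(u)
        xs.append(a * xs[-1] + b * u)    # x_{k+1} = a x_k + b u_k
    return xs, us


def replay(controls: List[float], x0: float, a: float, b: float) -> List[float]:
    """Apply a fixed (open-loop) control sequence, ignoring the observed state."""
    xs = [x0]
    for u in controls:
        xs.append(a * xs[-1] + b * u)
    return xs


def feedback(k: int, x: float) -> float:
    # Hypothetical decision rule: halve the current state.
    return -0.5 * x


states, realized_controls = realize(feedback, 4.0, 1.0, 1.0, 3)
```

Replaying `realized_controls` from the same initial state 4.0 reproduces `states`. From a different initial state, the replayed sequence and the strategy produce different trajectories, which is why the two notions must be kept apart.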

Portfolio selection seeks the best allocation of wealth among a basket of securities. The (single-period) mean–variance formulation was pioneered by Markowitz [24] in 1952; it is the cornerstone of modern portfolio theory and is widely used in both academia and industry. The multiperiod mean–variance portfolio selection is the natural extension of [24] and has been extensively studied. It was not until 2000 that Li and Ng [22] and Zhou and Li [38] reported, for the first time, the analytical precommitment optimal policies for the discrete-time case and the continuous-time case, respectively. As noted above, multiperiod mean–variance portfolio selection is a particular example of time-inconsistent optimal control; the recent developments in time-inconsistent optimal control and the revisits of multiperiod mean–variance portfolio selection [2], [6], [9], [10], [17], [18] have stimulated each other.

It is noted that some nondegeneracy assumptions are imposed in [2], [6], [9], [10], [17], and [18]. Specifically, the volatilities of the stocks in [2], [6], [17], and [18] and the return rates of the risky securities in [9] and [10] are assumed to be nondegenerate. To make the formulation more practical, it is natural to consider, at least in theory, how to generalize these results to the case where degeneracy is allowed. In fact, mean–variance portfolio selection problems with degenerate covariance matrices date back to the 1970s. In [7] or the “corrected” version [27], Buser et al. propose the single-period version with a possibly singular covariance matrix. Clearly, such a class of problems is more general than the classical ones [24], and more consistent with reality.

To address the case with possibly degenerate return rates, it is better to put multiperiod mean–variance portfolio selection within the framework of time-inconsistent mean-field stochastic LQ optimal control (with indefinite weighting matrices), which has not been established yet. Note that the running weighting matrices in [17], [18], [32], [34], and [37] are assumed to be nonnegative definite and positive definite. For standard time-consistent indefinite stochastic LQ optimal control, readers are referred to, for example, [1], [8], [31] and references therein.

C. Contents of This Paper

In this paper, we shall investigate a time-inconsistent indefinite mean-field stochastic LQ optimal control problem. The matrices in the system dynamics and cost functional also depend on the initial times; this is an extension of the general discounting functions that appear in cost functionals. The contents of this paper are as follows.

The notion of open-loop equilibrium control is introduced in Section II, which is a discrete-time counterpart of that for the continuous-time problem [17], [18]. Different from the precommitment optimal control, the equilibrium control is only locally optimal in an infinitesimal sense. Furthermore, the open-loop equilibrium control is defined for a fixed initial time-state pair; its existence is shown to be equivalent to a stationary condition and a convexity condition, which involve a set of forward–backward stochastic difference equations (FBSΔEs). Necessary and sufficient conditions are then obtained, respectively, for the stationary condition and the convexity condition; by combining them, the existence of open-loop equilibrium control is further characterized.

The convexity condition is equivalent to the nonnegative definiteness of some matrices related to a set of linear difference equations (LDEs), which is called the solvability of those constrained LDEs. The stationary condition is characterized via a property of the ranges of some matrices that involve another set of LDEs and a set of generalized difference Riccati equations (GDREs). If we further let the initial pair vary, a neater result about the existence of open-loop equilibrium control is obtained. Specifically, Problem (LQ) admitting an open-loop equilibrium control for any initial pair is shown to be equivalent to the solvability of two sets of constrained LDEs (39), (41) and a set of constrained GDREs (40). It is worth pointing out that the set of GDREs (40) does not have a symmetric structure, i.e., its solution (if it exists) is not symmetric. Furthermore, all the open-loop equilibrium controls are obtained.

As an application of the derived theory, Section V investigates the multiperiod mean–variance portfolio selection. A necessary and sufficient condition is given for the existence of an open-loop equilibrium portfolio control, which is completely characterized by the returns of the risky and riskless assets. If the return rates of the risky securities are nondegenerate, the equilibrium portfolio control exists.

From our derived results, we have the following remarks.

1) Most existing results about time-inconsistent LQ problems are for the continuous-time case [17], [18], [32], [34], [37], and the study of the discrete-time case is lagging behind. As noted above, the discrete-time multiperiod mean–variance portfolio selection is a notable example of discrete-time time-inconsistent LQ problems, and its full investigation motivates and requires a general theory of discrete-time time-inconsistent LQ optimal control. Developing such a theory is the aim of this paper.

2) The novelties of this paper are as follows.

First, no definiteness constraint is imposed on the weighting matrices of the cost functional, namely, the considered problem is an indefinite LQ optimal control problem. On the one hand, the indefinite setting provides a maximal capacity to model and deal with LQ-type problems, whose study will generalize existing results to some extent. On the other hand and most importantly, general explicit answers have not been reported about whether or not definite weighting matrices could ensure the existence of open-loop equilibrium control for a time-inconsistent LQ problem. Therefore, the essential and weakest conditions are much desired for ensuring the existence of open-loop equilibrium control, and it is not necessary to impose the definiteness constraint on the weighting matrices.

Second, necessary and sufficient conditions are obtained for the existence of open-loop equilibrium control of Problem (LQ) for both the case with a fixed initial pair and the case with all the initial pairs. The conditions are in terms of discrete-time LDEs and GDREs, and can be verified by iteratively solving the LDEs and GDREs.

Third, a necessary and sufficient condition is derived for the existence of open-loop equilibrium portfolio control of multiperiod mean–variance portfolio selection [Problem (MV)]. The obtained condition is completely characterized by the returns of the risky and riskless assets. If the return rates of the risky securities are nondegenerate (this is the common assumption in the literature), the equilibrium portfolio control exists.

If the system dynamics and cost functional are both independent of the initial time, the corresponding LQ problem will be a dynamic version of that considered in [25], where the conditional expectation operators are replaced by the expectation operators. For more details on mean-field stochastic optimal control and related mean-field games, we refer to [4], [11], [14], [16], [21], [25], [35] and the references therein.

The rest of this paper is organized as follows. Section II introduces the notion of open-loop equilibrium control of Problem (LQ). In Sections III and IV, necessary and sufficient conditions on the existence of open-loop equilibrium control are presented for both the case with a fixed initial pair and the case with all the initial pairs. Section V studies the multiperiod mean–variance portfolio selection, and some concluding remarks are given in Section VI.

II. OPEN-LOOP EQUILIBRIUM CONTROL

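Consider the following controlled stochastic difference equation (SΔE)

$$\left\{\begin{aligned}
X^t_{k+1} ={}& \big(A_{t,k}X^t_k + \bar{A}_{t,k}E_tX^t_k + B_{t,k}u_k + \bar{B}_{t,k}E_tu_k + f_{t,k}\big) \\
& + \sum_{i=1}^{p}\big(C^i_{t,k}X^t_k + \bar{C}^i_{t,k}E_tX^t_k + D^i_{t,k}u_k + \bar{D}^i_{t,k}E_tu_k + d^i_{t,k}\big)w^i_k \\
X^t_t ={}& x, \quad k \in \mathbb{T}_t, \ t \in \mathbb{T}
\end{aligned}\right. \tag{4}$$

where $A_{t,k}, \bar{A}_{t,k}, C^i_{t,k}, \bar{C}^i_{t,k} \in \mathbb{R}^{n \times n}$, $B_{t,k}, \bar{B}_{t,k}, D^i_{t,k}, \bar{D}^i_{t,k} \in \mathbb{R}^{n \times m}$, and $f_{t,k}, d^i_{t,k} \in \mathbb{R}^n$ are deterministic matrices and vectors. In (4), the noise process $\{w_k = (w^1_k, \ldots, w^p_k)^T, k \in \mathbb{T}\}$ is assumed to be a vector-valued martingale difference sequence defined on a probability space $(\Omega, \mathcal{F}, P)$ with

$$E_k[w_k] = 0, \quad E_k[w_kw_k^T] = \Gamma_k, \quad k \geq 0. \tag{5}$$

$E_t$ in (4) is the conditional mathematical expectation $E[\,\cdot\,|\mathcal{F}_t]$, where $\mathcal{F}_t = \sigma\{w_l, l = 0, 1, \ldots, t-1\}$, and $\mathcal{F}_0$ is understood as $\{\emptyset, \Omega\}$. Furthermore, $\Gamma_k = (\gamma^{ij}_k)_{p \times p}$ is assumed to be deterministic.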
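To see how the mean-field terms $E_tX^t_k$ and $E_tu_k$ in (4) evolve, one can take $E_t$ on both sides of (4). The following short derivation is a sketch added for orientation; it assumes that $X^t_k$ and $u_k$ are $\mathcal{F}_k$-measurable and square integrable, as is the case for the admissible controls used in this paper.

```latex
% For k >= t, the coefficient of w^i_k in (4) is F_k-measurable and F_t \subset F_k,
% so the tower property together with (5) gives
\[
  E_t\big[(\,\cdot\,)\, w^i_k\big]
  = E_t\big[(\,\cdot\,)\, E_k[w^i_k]\big] = 0, \qquad i = 1, \ldots, p .
\]
% Hence the conditional mean obeys a noise-free linear recursion:
\[
  E_t X^t_{k+1}
  = (A_{t,k} + \bar{A}_{t,k})\, E_t X^t_k
  + (B_{t,k} + \bar{B}_{t,k})\, E_t u_k + f_{t,k},
  \qquad E_t X^t_t = x .
\]
```

Only the sums $A_{t,k} + \bar{A}_{t,k}$ and $B_{t,k} + \bar{B}_{t,k}$ enter this mean recursion.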


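The cost functional associated with the system (4) is

$$\begin{aligned}
J(t, x; u) ={}& \sum_{k=t}^{N-1} E_t\Big[(X^t_k)^T Q_{t,k} X^t_k + (E_tX^t_k)^T \bar{Q}_{t,k} E_tX^t_k + u_k^T R_{t,k} u_k \\
& \qquad\quad + (E_tu_k)^T \bar{R}_{t,k} E_tu_k + 2q_{t,k}^T X^t_k + 2\rho_{t,k}^T u_k\Big] \\
& + E_t\big[(X^t_N)^T G_t X^t_N\big] + (E_tX^t_N)^T \bar{G}_t E_tX^t_N + 2E_t(g_t^T X^t_N)
\end{aligned} \tag{6}$$

where $Q_{t,k}, \bar{Q}_{t,k}, R_{t,k}, \bar{R}_{t,k}$, $k \in \mathbb{T}_t$, $G_t, \bar{G}_t$ are deterministic symmetric matrices of appropriate dimensions, and $q_{t,k}, \rho_{t,k}$, $k \in \mathbb{T}_t$, $g_t$ are deterministic vectors.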

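In (4), the initial state $x$ is in $l^2_{\mathcal{F}}(t; \mathbb{R}^n)$, which is defined as

$$l^2_{\mathcal{F}}(t; \mathbb{R}^n) = \big\{\zeta \in \mathbb{R}^n \,\big|\, \zeta \text{ is } \mathcal{F}_t\text{-measurable}, \ E|\zeta|^2 < \infty\big\}.$$

Similarly, we can define $l^2_{\mathcal{F}}(k; \mathbb{R}^n)$ and $l^2_{\mathcal{F}}(k; \mathbb{R}^m)$, $k \in \mathbb{T}$. Furthermore, let

$$l^2_{\mathcal{F}}(\mathbb{T}_t; \mathbb{R}^m) = \big\{\nu = \{\nu_k, k \in \mathbb{T}_t\} \,\big|\, \nu_k \text{ is } \mathcal{F}_k\text{-measurable}, \ E|\nu_k|^2 < \infty, \ k \in \mathbb{T}_t\big\}.$$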

Then, we pose the following optimal control problem.

Problem (LQ): For the initial pair $(t, x)$, find a $u^* \in l^2_{\mathcal{F}}(\mathbb{T}_t; \mathbb{R}^m)$ such that

$$J(t, x; u^*) = \inf_{u \in l^2_{\mathcal{F}}(\mathbb{T}_t; \mathbb{R}^m)} J(t, x; u) \tag{7}$$

holds.

Due to the time inconsistency, in this paper we seek an equilibrium control of the following type.

Definition II.1: Given $t \in \mathbb{T}$ and $x \in l^2_{\mathcal{F}}(t; \mathbb{R}^n)$, $u^{t,x,*} \in l^2_{\mathcal{F}}(\mathbb{T}_t; \mathbb{R}^m)$ is called an open-loop equilibrium control of Problem (LQ) for the initial pair $(t, x)$, if

$$J\big(k, X^{t,x,*}_k; u^{t,x,*}|_{\mathbb{T}_k}\big) \leq J\big(k, X^{t,x,*}_k; (u_k, u^{t,x,*}|_{\mathbb{T}_{k+1}})\big) \tag{8}$$

holds for any $k \in \mathbb{T}_t$ and any $u_k \in l^2_{\mathcal{F}}(k; \mathbb{R}^m)$. Here, $u^{t,x,*}|_{\mathbb{T}_k}$ and $u^{t,x,*}|_{\mathbb{T}_{k+1}}$ (with $\mathbb{T}_k = \{k, \ldots, N-1\}$, $\mathbb{T}_{k+1} = \{k+1, \ldots, N-1\}$) are the restrictions of $u^{t,x,*}$ on $\mathbb{T}_k$ and $\mathbb{T}_{k+1}$, respectively; and $X^{t,x,*}$ is given by

$$\left\{\begin{aligned}
X^{t,x,*}_{k+1} ={}& \big[(A_{k,k} + \bar{A}_{k,k})X^{t,x,*}_k + (B_{k,k} + \bar{B}_{k,k})u^{t,x,*}_k + f_{k,k}\big] \\
& + \sum_{i=1}^{p}\big[(C^i_{k,k} + \bar{C}^i_{k,k})X^{t,x,*}_k + (D^i_{k,k} + \bar{D}^i_{k,k})u^{t,x,*}_k + d^i_{k,k}\big]w^i_k \\
X^{t,x,*}_t ={}& x, \quad k \in \mathbb{T}_t
\end{aligned}\right. \tag{9}$$

which is called the equilibrium state corresponding to $u^{t,x,*}$.

Noting that $u^{t,x,*}|_{\mathbb{T}_k} = (u^{t,x,*}_k, u^{t,x,*}|_{\mathbb{T}_{k+1}})$, the control $(u_k, u^{t,x,*}|_{\mathbb{T}_{k+1}})$ on the right-hand side of (8) differs from $u^{t,x,*}|_{\mathbb{T}_k}$ only at time instant $k$. Intuitively, the cost functional will not decrease if one deviates from $u^{t,x,*}$ at a single time instant. Hence, $\{u^{t,x,*}_t, \ldots, u^{t,x,*}_{N-1}\}$ can be viewed as an equilibrium of a multiperson game with hierarchical structure. By its definition, $u^{t,x,*}$ is time consistent in the sense that for any $k \in \mathbb{T}_t$, $u^{t,x,*}|_{\mathbb{T}_k}$ is an open-loop equilibrium control for the initial pair $(k, X^{t,x,*}_k)$.
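One way to read the equilibrium state equation (9) is as a single step of (4) restarted at each time $k$. The computation below is offered as an interpretation under the measurability assumptions of Definition II.1, not as the authors' derivation.

```latex
% Apply (4) with initial time t := k and initial state X^k_k = X^{t,x,*}_k for one step:
\[
  X^{k}_{k+1}
  = A_{k,k} X^{k}_k + \bar{A}_{k,k} E_k X^{k}_k
  + B_{k,k} u_k + \bar{B}_{k,k} E_k u_k + f_{k,k}
  + \sum_{i=1}^{p} \big( C^i_{k,k} X^{k}_k + \bar{C}^i_{k,k} E_k X^{k}_k
  + D^i_{k,k} u_k + \bar{D}^i_{k,k} E_k u_k + d^i_{k,k} \big) w^i_k .
\]
% Since X^{t,x,*}_k and u^{t,x,*}_k are F_k-measurable,
\[
  E_k X^{t,x,*}_k = X^{t,x,*}_k, \qquad E_k u^{t,x,*}_k = u^{t,x,*}_k ,
\]
% so each pair of coefficients collapses into a sum, e.g.
\[
  A_{k,k} X^{t,x,*}_k + \bar{A}_{k,k} E_k X^{t,x,*}_k
  = (A_{k,k} + \bar{A}_{k,k}) X^{t,x,*}_k ,
\]
% which is the form appearing in (9).
```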

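Throughout this paper, we adopt the following notations:

$$\left\{\begin{aligned}
& \mathcal{A}_{k,\ell} = A_{k,\ell} + \bar{A}_{k,\ell}, \quad \mathcal{B}_{k,\ell} = B_{k,\ell} + \bar{B}_{k,\ell} \\
& \mathcal{C}^i_{k,\ell} = C^i_{k,\ell} + \bar{C}^i_{k,\ell}, \quad \mathcal{D}^i_{k,\ell} = D^i_{k,\ell} + \bar{D}^i_{k,\ell} \\
& \mathcal{Q}_{k,\ell} = Q_{k,\ell} + \bar{Q}_{k,\ell}, \quad \mathcal{R}_{k,\ell} = R_{k,\ell} + \bar{R}_{k,\ell} \\
& \mathcal{G}_k = G_k + \bar{G}_k, \quad i = 1, \ldots, p, \ k \in \mathbb{T}_t, \ \ell \in \mathbb{T}_k.
\end{aligned}\right. \tag{10}$$

Then, (9) is simply rewritten as

$$\left\{\begin{aligned}
X^{t,x,*}_{k+1} ={}& \big[\mathcal{A}_{k,k}X^{t,x,*}_k + \mathcal{B}_{k,k}u^{t,x,*}_k + f_{k,k}\big] + \sum_{i=1}^{p}\big[\mathcal{C}^i_{k,k}X^{t,x,*}_k + \mathcal{D}^i_{k,k}u^{t,x,*}_k + d^i_{k,k}\big]w^i_k \\
X^{t,x,*}_t ={}& x, \quad k \in \mathbb{T}_t.
\end{aligned}\right. \tag{11}$$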

III. PROBLEM (LQ) FOR A FIXED INITIAL PAIR

A. First Characterization on the Existence of Open-Loop Equilibrium Control

Throughout Section III, we study Problem (LQ) for the fixed initial pair $(t, x)$, which is simply denoted as Problem (LQ)$_{tx}$. First, a difference formula of cost functionals is given.

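Lemma III.1: Let $\zeta \in l^2_{\mathcal{F}}(k; \mathbb{R}^n)$, $u = \{u_\ell, \ell \in \mathbb{T}_k\} \in l^2_{\mathcal{F}}(\mathbb{T}_k; \mathbb{R}^m)$, $\bar{u}_k \in l^2_{\mathcal{F}}(k; \mathbb{R}^m)$ and $\lambda \in \mathbb{R}$. Then, we have

$$\begin{aligned}
& J\big(k, \zeta; (u_k + \lambda\bar{u}_k, u|_{\mathbb{T}_{k+1}})\big) - J(k, \zeta; u) \\
& \quad = \lambda^2 \widetilde{J}(k, 0; \bar{u}_k) + 2\lambda\Big[\mathcal{R}_{k,k}u_k + \mathcal{B}^T_{k,k}E_kZ^k_{k+1} + \sum_{i=1}^{p}(\mathcal{D}^i_{k,k})^T E_k(Z^k_{k+1}w^i_k) + \rho_{k,k}\Big]^T \bar{u}_k
\end{aligned} \tag{12}$$

where $u|_{\mathbb{T}_{k+1}} = \{u_{k+1}, \ldots, u_{N-1}\}$ and

$$\begin{aligned}
\widetilde{J}(k, 0; \bar{u}_k) ={}& E_k\big[\bar{u}_k^T \mathcal{R}_{k,k} \bar{u}_k\big] + \sum_{\ell=k}^{N-1} E_k\Big[(Y^{k,\bar{u}_k}_\ell)^T Q_{k,\ell} Y^{k,\bar{u}_k}_\ell + (E_kY^{k,\bar{u}_k}_\ell)^T \bar{Q}_{k,\ell} E_kY^{k,\bar{u}_k}_\ell\Big] \\
& + E_k\big[(Y^{k,\bar{u}_k}_N)^T G_k Y^{k,\bar{u}_k}_N\big] + (E_kY^{k,\bar{u}_k}_N)^T \bar{G}_k E_kY^{k,\bar{u}_k}_N
\end{aligned} \tag{13}$$

with

$$\left\{\begin{aligned}
& Y^{k,\bar{u}_k}_{\ell+1} = A_{k,\ell}Y^{k,\bar{u}_k}_\ell + \bar{A}_{k,\ell}E_kY^{k,\bar{u}_k}_\ell + \sum_{i=1}^{p}\big(C^i_{k,\ell}Y^{k,\bar{u}_k}_\ell + \bar{C}^i_{k,\ell}E_kY^{k,\bar{u}_k}_\ell\big)w^i_\ell \\
& Y^{k,\bar{u}_k}_{k+1} = \mathcal{B}_{k,k}\bar{u}_k + \sum_{i=1}^{p}\mathcal{D}^i_{k,k}\bar{u}_kw^i_k \\
& Y^{k,\bar{u}_k}_k = 0, \quad \ell \in \mathbb{T}_{k+1}.
\end{aligned}\right. \tag{14}$$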


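Furthermore, $Z^k_{k+1}$ in (12) is computed via the following FBSΔE

$$\left\{\begin{aligned}
X^k_{\ell+1} ={}& \big(A_{k,\ell}X^k_\ell + \bar{A}_{k,\ell}E_kX^k_\ell + B_{k,\ell}u_\ell + \bar{B}_{k,\ell}E_ku_\ell + f_{k,\ell}\big) \\
& + \sum_{i=1}^{p}\big(C^i_{k,\ell}X^k_\ell + \bar{C}^i_{k,\ell}E_kX^k_\ell + D^i_{k,\ell}u_\ell + \bar{D}^i_{k,\ell}E_ku_\ell + d^i_{k,\ell}\big)w^i_\ell \\
Z^k_\ell ={}& Q_{k,\ell}X^k_\ell + \bar{Q}_{k,\ell}E_kX^k_\ell + q_{k,\ell} + A^T_{k,\ell}E_\ell Z^k_{\ell+1} + \bar{A}^T_{k,\ell}E_kZ^k_{\ell+1} \\
& + \sum_{i=1}^{p}\big[(C^i_{k,\ell})^T E_\ell(Z^k_{\ell+1}w^i_\ell) + (\bar{C}^i_{k,\ell})^T E_k(Z^k_{\ell+1}w^i_\ell)\big] \\
X^k_k ={}& \zeta, \quad Z^k_N = G_kX^k_N + \bar{G}_kE_kX^k_N + g_k, \quad \ell \in \mathbb{T}_k.
\end{aligned}\right. \tag{15}$$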

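Proof: See Appendix A. □

From Lemma III.1, we have the following result.

Theorem III.1: The following statements are equivalent.

i) Problem (LQ)$_{tx}$ admits an open-loop equilibrium control.

ii) The following assertions hold.

a) The convexity condition

$$\inf_{\bar{u}_k \in l^2_{\mathcal{F}}(k; \mathbb{R}^m)} \widetilde{J}(k, 0; \bar{u}_k) \geq 0, \quad k \in \mathbb{T}_t \tag{16}$$

is satisfied, where $\widetilde{J}(k, 0; \bar{u}_k)$ is given in (13).

b) There exists a $u^{t,x,*} \in l^2_{\mathcal{F}}(\mathbb{T}_t; \mathbb{R}^m)$ such that the stationary condition

$$0 = \mathcal{R}_{k,k}u^{t,x,*}_k + \mathcal{B}^T_{k,k}E_kZ^{k,t,x}_{k+1} + \sum_{i=1}^{p}(\mathcal{D}^i_{k,k})^T E_k(Z^{k,t,x}_{k+1}w^i_k) + \rho_{k,k}, \quad k \in \mathbb{T}_t \tag{17}$$

is satisfied. Here, $Z^{k,t,x}_{k+1}$ is computed via the FBSΔE

$$\left\{\begin{aligned}
X^{k,t,x}_{\ell+1} ={}& \big(A_{k,\ell}X^{k,t,x}_\ell + \bar{A}_{k,\ell}E_kX^{k,t,x}_\ell + B_{k,\ell}u^{t,x,*}_\ell + \bar{B}_{k,\ell}E_ku^{t,x,*}_\ell + f_{k,\ell}\big) \\
& + \sum_{i=1}^{p}\big(C^i_{k,\ell}X^{k,t,x}_\ell + \bar{C}^i_{k,\ell}E_kX^{k,t,x}_\ell + D^i_{k,\ell}u^{t,x,*}_\ell + \bar{D}^i_{k,\ell}E_ku^{t,x,*}_\ell + d^i_{k,\ell}\big)w^i_\ell \\
Z^{k,t,x}_\ell ={}& Q_{k,\ell}X^{k,t,x}_\ell + \bar{Q}_{k,\ell}E_kX^{k,t,x}_\ell + q_{k,\ell} + A^T_{k,\ell}E_\ell Z^{k,t,x}_{\ell+1} + \bar{A}^T_{k,\ell}E_kZ^{k,t,x}_{\ell+1} \\
& + \sum_{i=1}^{p}\big[(C^i_{k,\ell})^T E_\ell(Z^{k,t,x}_{\ell+1}w^i_\ell) + (\bar{C}^i_{k,\ell})^T E_k(Z^{k,t,x}_{\ell+1}w^i_\ell)\big] \\
X^{k,t,x}_k ={}& X^{t,x,*}_k \\
Z^{k,t,x}_N ={}& G_kX^{k,t,x}_N + \bar{G}_kE_kX^{k,t,x}_N + g_k, \quad \ell \in \mathbb{T}_k
\end{aligned}\right. \tag{18}$$

and the initial state $X^{t,x,*}_k$ of the forward SΔE of (18) is computed via

$$\left\{\begin{aligned}
X^{t,x,*}_{k+1} ={}& \big[\mathcal{A}_{k,k}X^{t,x,*}_k + \mathcal{B}_{k,k}u^{t,x,*}_k + f_{k,k}\big] + \sum_{i=1}^{p}\big[\mathcal{C}^i_{k,k}X^{t,x,*}_k + \mathcal{D}^i_{k,k}u^{t,x,*}_k + d^i_{k,k}\big]w^i_k \\
X^{t,x,*}_t ={}& x, \quad k \in \mathbb{T}_t.
\end{aligned}\right. \tag{19}$$

Under any of the above conditions, $u^{t,x,*}$ given in ii) is an open-loop equilibrium control for the initial pair $(t, x)$.

Proof: See Appendix B. □

Remark III.1: As the stationary condition (17) holds for $k \in \mathbb{T}_t$, we have a set of FBSΔEs, which are coupled with (19) via the initial states $X^{k,t,x}_k = X^{t,x,*}_k$, $k \in \mathbb{T}_t$.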

B. Convexity Condition

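This subsection studies the convexity condition (16). First, we give a compact form of $\widetilde J(k,0;\bar u_k)$.

Lemma III.2: $\widetilde J(k,0;\bar u_k)$ can be expressed as
\[
\widetilde J(k,0;\bar u_k) = \bar u_k^T W_k \bar u_k \tag{20}
\]
where
\[
W_k = \mathcal{R}_{k,k} + \mathcal{B}^T_{k,k}\mathcal{P}_{k,k+1}\mathcal{B}_{k,k} + \sum_{i,j=1}^p \gamma^{ij}_k (\mathcal{D}^i_{k,k})^T P_{k,k+1}\mathcal{D}^j_{k,k} \tag{21}
\]
with $P_{k,k+1}$ and $\mathcal{P}_{k,k+1}$ computed via
\[
\left\{
\begin{aligned}
P_{k,\ell} &= Q_{k,\ell} + A^T_{k,\ell}P_{k,\ell+1}A_{k,\ell} + \sum_{i,j=1}^p \gamma^{ij}_\ell (C^i_{k,\ell})^T P_{k,\ell+1}C^j_{k,\ell} \\
\mathcal{P}_{k,\ell} &= \mathcal{Q}_{k,\ell} + \mathcal{A}^T_{k,\ell}\mathcal{P}_{k,\ell+1}\mathcal{A}_{k,\ell} + \sum_{i,j=1}^p \gamma^{ij}_\ell (\mathcal{C}^i_{k,\ell})^T P_{k,\ell+1}\mathcal{C}^j_{k,\ell} \\
P_{k,N} &= G_k, \quad \mathcal{P}_{k,N} = \mathcal{G}_k, \quad \ell \in \mathbb{T}_k.
\end{aligned}
\right.
\tag{22}
\]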

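Proof: From (14), it follows that
\[
\left\{
\begin{aligned}
\mathbb{E}_kY^{k,\bar u_k}_{\ell+1} &= \mathcal{A}_{k,\ell}\mathbb{E}_kY^{k,\bar u_k}_\ell, \quad \ell \in \mathbb{T}_{k+1} \\
\mathbb{E}_kY^{k,\bar u_k}_{k+1} &= \mathcal{B}_{k,k}\mathbb{E}_k\bar u_k \\
\mathbb{E}_kY^{k,\bar u_k}_k &= 0.
\end{aligned}
\right.
\]
Let $\bar P_{k,\ell} = \mathcal{P}_{k,\ell} - P_{k,\ell}$, $\ell \in \mathbb{T}_k$. By adding to and subtracting
\[
\sum_{\ell=k}^{N-1}\mathbb{E}_k\Big[(Y^{k,\bar u_k}_{\ell+1})^TP_{k,\ell+1}Y^{k,\bar u_k}_{\ell+1} - (Y^{k,\bar u_k}_\ell)^TP_{k,\ell}Y^{k,\bar u_k}_\ell + (\mathbb{E}_kY^{k,\bar u_k}_{\ell+1})^T\bar P_{k,\ell+1}\mathbb{E}_kY^{k,\bar u_k}_{\ell+1} - (\mathbb{E}_kY^{k,\bar u_k}_\ell)^T\bar P_{k,\ell}\mathbb{E}_kY^{k,\bar u_k}_\ell\Big]
\]
from (13), we have
\[
\begin{aligned}
\widetilde J(k,0;\bar u_k)
&= \sum_{\ell=k}^{N-1}\mathbb{E}_k\Big[(Y^{k,\bar u_k}_\ell)^TQ_{k,\ell}Y^{k,\bar u_k}_\ell + (\mathbb{E}_kY^{k,\bar u_k}_\ell)^T\bar Q_{k,\ell}\mathbb{E}_kY^{k,\bar u_k}_\ell \\
&\qquad + (Y^{k,\bar u_k}_{\ell+1})^TP_{k,\ell+1}Y^{k,\bar u_k}_{\ell+1} - (Y^{k,\bar u_k}_\ell)^TP_{k,\ell}Y^{k,\bar u_k}_\ell \\
&\qquad + (\mathbb{E}_kY^{k,\bar u_k}_{\ell+1})^T\bar P_{k,\ell+1}\mathbb{E}_kY^{k,\bar u_k}_{\ell+1} - (\mathbb{E}_kY^{k,\bar u_k}_\ell)^T\bar P_{k,\ell}\mathbb{E}_kY^{k,\bar u_k}_\ell\Big] + \bar u_k^T\mathcal{R}_{k,k}\bar u_k \\
&= \sum_{\ell=k+1}^{N-1}\mathbb{E}_k\Big[(\mathbb{E}_kY^{k,\bar u_k}_\ell)^T\Big(\mathcal{Q}_{k,\ell} + \mathcal{A}^T_{k,\ell}\mathcal{P}_{k,\ell+1}\mathcal{A}_{k,\ell} + \sum_{i,j=1}^p\gamma^{ij}_\ell(\mathcal{C}^i_{k,\ell})^TP_{k,\ell+1}\mathcal{C}^j_{k,\ell} - \mathcal{P}_{k,\ell}\Big)\mathbb{E}_kY^{k,\bar u_k}_\ell \\
&\qquad + (Y^{k,\bar u_k}_\ell - \mathbb{E}_kY^{k,\bar u_k}_\ell)^T\Big(Q_{k,\ell} + A^T_{k,\ell}P_{k,\ell+1}A_{k,\ell} + \sum_{i,j=1}^p\gamma^{ij}_\ell(C^i_{k,\ell})^TP_{k,\ell+1}C^j_{k,\ell} - P_{k,\ell}\Big)(Y^{k,\bar u_k}_\ell - \mathbb{E}_kY^{k,\bar u_k}_\ell)\Big] \\
&\qquad + \bar u_k^T\Big[\mathcal{R}_{k,k} + \mathcal{B}^T_{k,k}\mathcal{P}_{k,k+1}\mathcal{B}_{k,k} + \sum_{i,j=1}^p\gamma^{ij}_k(\mathcal{D}^i_{k,k})^TP_{k,k+1}\mathcal{D}^j_{k,k}\Big]\bar u_k \\
&= \bar u_k^T\Big[\mathcal{R}_{k,k} + \mathcal{B}^T_{k,k}\mathcal{P}_{k,k+1}\mathcal{B}_{k,k} + \sum_{i,j=1}^p\gamma^{ij}_k(\mathcal{D}^i_{k,k})^TP_{k,k+1}\mathcal{D}^j_{k,k}\Big]\bar u_k.
\end{aligned}
\tag{23}
\]
∎

By Lemma III.2 and Theorem III.1, the following result is straightforward.

Theorem III.2: The following statements are equivalent.

1) The convexity condition (16) is satisfied.

2) The inequalities
\[
W_k \succeq 0, \quad k \in \mathbb{T}_t \tag{24}
\]
hold, i.e., $W_k$, $k \in \mathbb{T}_t$, are nonnegative definite, where $W_k$ is given in (21).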
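The test in Theorem III.2 is straightforward to run numerically. The following sketch, which is not part of the paper, iterates (22) backward, assembles $W_k$ from (21), and checks nonnegative definiteness through the smallest eigenvalue of the symmetrized matrix. The container `coef` (a dict of callables such as `coef["A"](k, l)`), the function names, and the tolerance are illustrative assumptions; the noise-indexed coefficients are assumed to be returned as lists of $p$ matrices, and `gamma(l)` as the $p \times p$ matrix $(\gamma^{ij}_\ell)$.

```python
import numpy as np

def quad_sum(g, left, mid, right):
    """Return sum_{i,j} g[i, j] * left[i].T @ mid @ right[j]."""
    p = len(left)
    return sum(g[i, j] * left[i].T @ mid @ right[j]
               for i in range(p) for j in range(p))

def convexity_matrix(k, N, coef, gamma):
    """Assemble W_k of (21) from the backward recursions (22).

    coef[name](k, l) is a hypothetical accessor for the coefficient
    matrices; calligraphic quantities are formed as plain + bar.
    """
    cal = lambda name, l: coef[name](k, l) + coef[name + "bar"](k, l)
    cal_list = lambda name, l: [m + mb for m, mb in
                                zip(coef[name](k, l), coef[name + "bar"](k, l))]
    P = coef["G"](k)                        # P_{k,N}
    Pc = coef["G"](k) + coef["Gbar"](k)     # calligraphic P_{k,N}
    for l in range(N - 1, k, -1):           # l = N-1, ..., k+1
        A, Ac = coef["A"](k, l), cal("A", l)
        C, Cc = coef["C"](k, l), cal_list("C", l)
        g = gamma(l)
        P_new = coef["Q"](k, l) + A.T @ P @ A + quad_sum(g, C, P, C)
        Pc_new = cal("Q", l) + Ac.T @ Pc @ Ac + quad_sum(g, Cc, P, Cc)
        P, Pc = P_new, Pc_new               # now P_{k,l}, calligraphic P_{k,l}
    Bc, Dc = cal("B", k), cal_list("D", k)
    return cal("R", k) + Bc.T @ Pc @ Bc + quad_sum(gamma(k), Dc, P, Dc)

def is_nonnegative_definite(W, tol=1e-10):
    """Check W >= 0 in the matrix sense, as required by (24)."""
    return bool(np.linalg.eigvalsh((W + W.T) / 2).min() >= -tol)
```

Calling `convexity_matrix(k, N, coef, gamma)` for each `k` in the index set of interest and passing the result to `is_nonnegative_definite` gives a direct check of (24).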

C. Stationary Condition

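We now switch to the stationary condition (17). The following lemma gives an expression of the backward state $Z^{k,t,x}$ of the FBSΔE (18), provided that $u^{t,x,*}$ is a linear function of $X^{t,x,*}$.

Lemma III.3: Letting $k \in \mathbb{T}$, $\tau \in \mathbb{T}_k$, suppose that $\{u^{t,x,*}_\ell, \ell \in \mathbb{T}_\tau\}$ in (18) has the form $u^{t,x,*}_\ell = \Psi_\ell X^{t,x,*}_\ell + \alpha_\ell$, $\ell \in \mathbb{T}_\tau$, with $\mathbb{T}_\tau = \{\tau, \ldots, N-1\}$ and $\Psi_\ell, \alpha_\ell$, $\ell \in \mathbb{T}_\tau$, being deterministic matrices. Then, the backward state $\{Z^{k,t,x}_\ell, \ell \in \mathbb{T}_\tau\}$ has the following expression:
\[
Z^{k,t,x}_\ell = P_{k,\ell}X^{k,t,x}_\ell + \bar P_{k,\ell}\mathbb{E}_kX^{k,t,x}_\ell + T_{k,\ell}X^{t,x,*}_\ell + \bar T_{k,\ell}\mathbb{E}_kX^{t,x,*}_\ell + \pi_{k,\ell}, \quad \ell \in \mathbb{T}_\tau. \tag{25}
\]
Here, $\bar P_{k,\ell} = \mathcal{P}_{k,\ell} - P_{k,\ell}$ with $P_{k,\ell}, \mathcal{P}_{k,\ell}$ computed via (22); and $T_{k,\ell}$, $\bar T_{k,\ell}$, $\pi_{k,\ell}$ are given by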

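\[
\left\{
\begin{aligned}
T_{k,\ell} &= A^T_{k,\ell}T_{k,\ell+1}\mathcal{A}_{\ell,\ell} + \sum_{i,j=1}^p\gamma^{ij}_\ell(C^i_{k,\ell})^TT_{k,\ell+1}\mathcal{C}^j_{\ell,\ell} \\
&\quad + \Big\{A^T_{k,\ell}P_{k,\ell+1}B_{k,\ell} + A^T_{k,\ell}T_{k,\ell+1}\mathcal{B}_{\ell,\ell} + \sum_{i,j=1}^p\gamma^{ij}_\ell\big[(C^i_{k,\ell})^TP_{k,\ell+1}D^j_{k,\ell} + (C^i_{k,\ell})^TT_{k,\ell+1}\mathcal{D}^j_{\ell,\ell}\big]\Big\}\Psi_\ell \\
\bar T_{k,\ell} &= A^T_{k,\ell}\bar T_{k,\ell+1}\mathcal{A}_{\ell,\ell} + \bar A^T_{k,\ell}\mathcal{T}_{k,\ell+1}\mathcal{A}_{\ell,\ell} + \sum_{i,j=1}^p\gamma^{ij}_\ell(\bar C^i_{k,\ell})^TT_{k,\ell+1}\mathcal{C}^j_{\ell,\ell} \\
&\quad + \Big\{A^T_{k,\ell}P_{k,\ell+1}\bar B_{k,\ell} + A^T_{k,\ell}\bar P_{k,\ell+1}\mathcal{B}_{k,\ell} + A^T_{k,\ell}\bar T_{k,\ell+1}\mathcal{B}_{\ell,\ell} + \bar A^T_{k,\ell}\mathcal{P}_{k,\ell+1}\mathcal{B}_{k,\ell} + \bar A^T_{k,\ell}\mathcal{T}_{k,\ell+1}\mathcal{B}_{\ell,\ell} \\
&\qquad + \sum_{i,j=1}^p\gamma^{ij}_\ell\big[(C^i_{k,\ell})^TP_{k,\ell+1}\bar D^j_{k,\ell} + (\bar C^i_{k,\ell})^TP_{k,\ell+1}\mathcal{D}^j_{k,\ell} + (\bar C^i_{k,\ell})^TT_{k,\ell+1}\mathcal{D}^j_{\ell,\ell}\big]\Big\}\Psi_\ell \\
T_{k,N} &= 0, \quad \bar T_{k,N} = 0, \quad \ell \in \mathbb{T}_\tau
\end{aligned}
\right.
\tag{26}
\]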

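and
\[
\left\{
\begin{aligned}
\pi_{k,\ell} &= \mathcal{A}^T_{k,\ell}\mathcal{P}_{k,\ell+1}\big(\mathcal{B}_{k,\ell}\alpha_\ell + f_{k,\ell}\big) + \mathcal{A}^T_{k,\ell}\mathcal{T}_{k,\ell+1}\big(\mathcal{B}_{\ell,\ell}\alpha_\ell + f_{\ell,\ell}\big) + \mathcal{A}^T_{k,\ell}\pi_{k,\ell+1} \\
&\quad + \sum_{i,j=1}^p\gamma^{ij}_\ell\big[(\mathcal{C}^i_{k,\ell})^TP_{k,\ell+1}\big(\mathcal{D}^j_{k,\ell}\alpha_\ell + d^j_{k,\ell}\big) + (\mathcal{C}^i_{k,\ell})^TT_{k,\ell+1}\big(\mathcal{D}^j_{\ell,\ell}\alpha_\ell + d^j_{\ell,\ell}\big)\big] + q_{k,\ell} \\
\pi_{k,N} &= g_k, \quad \ell \in \mathbb{T}_\tau
\end{aligned}
\right.
\tag{27}
\]
with $\mathcal{T}_{k,\ell} = T_{k,\ell} + \bar T_{k,\ell}$, $\ell \in \mathbb{T}_\tau$.

Proof: See Appendix C. ∎

The proof of the above lemma uses backward induction: starting from the terminal time $N$ and then $N-1$, the expression (25) is obtained step by step.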

For a given matrix $M \in \mathbb{R}^{n\times m}$, its Moore–Penrose inverse is denoted by $M^\dagger$, which is in $\mathbb{R}^{m\times n}$. The following lemma is from [1].

Lemma III.4: Let matrices $L$, $M$, and $N$ be given with appropriate sizes. Then, $LXM = N$ has a solution $X$ if and only if $LL^\dagger NM^\dagger M = N$. Moreover, the solution of $LXM = N$ can be expressed as $X = L^\dagger NM^\dagger + Y - L^\dagger LYMM^\dagger$, where $Y$ is a matrix with appropriate size.
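As a numerical illustration of this pseudoinverse lemma, the sketch below (not from the paper) tests solvability of $LXM = N$ through the identity $LL^\dagger NM^\dagger M = N$ and, when the test passes, returns the solution $L^\dagger NM^\dagger + Y - L^\dagger LYMM^\dagger$ for a chosen free matrix $Y$. The function name `solve_lxm` and the tolerance are assumptions made for this example.

```python
import numpy as np

def solve_lxm(L, M, N, Y=None, tol=1e-9):
    """Return one solution X of L X M = N, or None if no solution exists."""
    Lp, Mp = np.linalg.pinv(L), np.linalg.pinv(M)
    # Solvability test: L L^+ N M^+ M == N
    if not np.allclose(L @ Lp @ N @ Mp @ M, N, atol=tol):
        return None
    if Y is None:
        Y = np.zeros((L.shape[1], M.shape[0]))  # free parameter, default 0
    return Lp @ N @ Mp + Y - Lp @ L @ Y @ M @ Mp
```

Different choices of `Y` give different solutions whenever $L$ or $M$ is rank deficient; with `Y = None` the minimum-norm solution $L^\dagger NM^\dagger$ is returned.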

Based on the above results, we have the following theorem.

Theorem III.3: The following statements are equivalent.

i) The stationary condition (17) is satisfied.

ii) The condition
\[
\mathcal{H}_kX^{t,x,*}_k + \beta_k \in \mathrm{Ran}(\mathcal{W}_k), \quad k \in \mathbb{T}_t \tag{28}
\]


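holds. Here, $\mathcal{H}_k, \mathcal{W}_k, \beta_k$, $k \in \mathbb{T}_t$, are given by
\[
\left\{
\begin{aligned}
\mathcal{W}_k &= \mathcal{R}_{k,k} + \mathcal{B}^T_{k,k}\big(\mathcal{P}_{k,k+1} + \mathcal{T}_{k,k+1}\big)\mathcal{B}_{k,k} + \sum_{i,j=1}^p\gamma^{ij}_k(\mathcal{D}^i_{k,k})^T\big(P_{k,k+1} + T_{k,k+1}\big)\mathcal{D}^j_{k,k} \\
\mathcal{H}_k &= \mathcal{B}^T_{k,k}\big(\mathcal{P}_{k,k+1} + \mathcal{T}_{k,k+1}\big)\mathcal{A}_{k,k} + \sum_{i,j=1}^p\gamma^{ij}_k(\mathcal{D}^i_{k,k})^T\big(P_{k,k+1} + T_{k,k+1}\big)\mathcal{C}^j_{k,k} \\
\beta_k &= \mathcal{B}^T_{k,k}\big[\big(\mathcal{P}_{k,k+1} + \mathcal{T}_{k,k+1}\big)f_{k,k} + \pi_{k,k+1}\big] + \sum_{i,j=1}^p\gamma^{ij}_k(\mathcal{D}^i_{k,k})^T\big(P_{k,k+1} + T_{k,k+1}\big)d^j_{k,k} + \rho_{k,k} \\
& k \in \mathbb{T}_t
\end{aligned}
\right.
\tag{29}
\]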

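with
\[
\left\{
\begin{aligned}
T_{k,\ell} &= A^T_{k,\ell}T_{k,\ell+1}\mathcal{A}_{\ell,\ell} + \sum_{i,j=1}^p\gamma^{ij}_\ell(C^i_{k,\ell})^TT_{k,\ell+1}\mathcal{C}^j_{\ell,\ell} \\
&\quad - \Big\{A^T_{k,\ell}P_{k,\ell+1}B_{k,\ell} + A^T_{k,\ell}T_{k,\ell+1}\mathcal{B}_{\ell,\ell} + \sum_{i,j=1}^p\gamma^{ij}_\ell\big[(C^i_{k,\ell})^TP_{k,\ell+1}D^j_{k,\ell} + (C^i_{k,\ell})^TT_{k,\ell+1}\mathcal{D}^j_{\ell,\ell}\big]\Big\}\mathcal{W}^\dagger_\ell\mathcal{H}_\ell \\
\mathcal{T}_{k,\ell} &= \mathcal{A}^T_{k,\ell}\mathcal{T}_{k,\ell+1}\mathcal{A}_{\ell,\ell} + \sum_{i,j=1}^p\gamma^{ij}_\ell(\mathcal{C}^i_{k,\ell})^TT_{k,\ell+1}\mathcal{C}^j_{\ell,\ell} \\
&\quad - \Big\{\mathcal{A}^T_{k,\ell}\mathcal{P}_{k,\ell+1}\mathcal{B}_{k,\ell} + \mathcal{A}^T_{k,\ell}\mathcal{T}_{k,\ell+1}\mathcal{B}_{\ell,\ell} + \sum_{i,j=1}^p\gamma^{ij}_\ell\big[(\mathcal{C}^i_{k,\ell})^TP_{k,\ell+1}\mathcal{D}^j_{k,\ell} + (\mathcal{C}^i_{k,\ell})^TT_{k,\ell+1}\mathcal{D}^j_{\ell,\ell}\big]\Big\}\mathcal{W}^\dagger_\ell\mathcal{H}_\ell \\
T_{k,N} &= 0, \quad \mathcal{T}_{k,N} = 0, \quad \ell \in \mathbb{T}_k, \quad k \in \mathbb{T}_t
\end{aligned}
\right.
\tag{30}
\]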

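and
\[
\left\{
\begin{aligned}
\pi_{k,\ell} &= \mathcal{A}^T_{k,\ell}\mathcal{P}_{k,\ell+1}\big(f_{k,\ell} - \mathcal{B}_{k,\ell}\mathcal{W}^\dagger_\ell\beta_\ell\big) + \mathcal{A}^T_{k,\ell}\mathcal{T}_{k,\ell+1}\big(f_{\ell,\ell} - \mathcal{B}_{\ell,\ell}\mathcal{W}^\dagger_\ell\beta_\ell\big) \\
&\quad + \sum_{i,j=1}^p\gamma^{ij}_\ell\big[(\mathcal{C}^i_{k,\ell})^TP_{k,\ell+1}\big(d^j_{k,\ell} - \mathcal{D}^j_{k,\ell}\mathcal{W}^\dagger_\ell\beta_\ell\big) + (\mathcal{C}^i_{k,\ell})^TT_{k,\ell+1}\big(d^j_{\ell,\ell} - \mathcal{D}^j_{\ell,\ell}\mathcal{W}^\dagger_\ell\beta_\ell\big)\big] \\
&\quad + \mathcal{A}^T_{k,\ell}\pi_{k,\ell+1} + q_{k,\ell} \\
\pi_{k,N} &= g_k, \quad \ell \in \mathbb{T}_k, \quad k \in \mathbb{T}_t.
\end{aligned}
\right.
\tag{31}
\]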

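Furthermore, in (28) $\mathrm{Ran}(\mathcal{W}_k)$ is the range of $\mathcal{W}_k$, and $X^{t,x,*}_k$ is computed via
\[
\left\{
\begin{aligned}
X^{t,x,*}_{k+1} &= \big[\big(\mathcal{A}_{k,k} - \mathcal{B}_{k,k}\mathcal{W}^\dagger_k\mathcal{H}_k\big)X^{t,x,*}_k - \mathcal{B}_{k,k}\mathcal{W}^\dagger_k\beta_k + f_{k,k}\big] \\
&\quad + \sum_{i=1}^p\big[\big(\mathcal{C}^i_{k,k} - \mathcal{D}^i_{k,k}\mathcal{W}^\dagger_k\mathcal{H}_k\big)X^{t,x,*}_k - \mathcal{D}^i_{k,k}\mathcal{W}^\dagger_k\beta_k + d^i_{k,k}\big]w^i_k \\
X^{t,x,*}_t &= x, \quad k \in \mathbb{T}_t.
\end{aligned}
\right.
\tag{32}
\]
Under any of the above conditions, $u^{t,x,*}$ in (17) is selected as
\[
u^{t,x,*}_k = -\mathcal{W}^\dagger_k\mathcal{H}_kX^{t,x,*}_k - \mathcal{W}^\dagger_k\beta_k, \quad k \in \mathbb{T}_t \tag{33}
\]
with $X^{t,x,*}$ given in (32). Furthermore, we have
\[
Z^{k,t,x}_\ell = P_{k,\ell}\big(X^{k,t,x}_\ell - \mathbb{E}_kX^{k,t,x}_\ell\big) + \mathcal{P}_{k,\ell}\mathbb{E}_kX^{k,t,x}_\ell + T_{k,\ell}\big(X^{t,x,*}_\ell - \mathbb{E}_kX^{t,x,*}_\ell\big) + \mathcal{T}_{k,\ell}\mathbb{E}_kX^{t,x,*}_\ell + \pi_{k,\ell}, \quad \ell \in \mathbb{T}_k. \tag{34}
\]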
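To make the feedback structure of (32) and (33) concrete, the following sketch (not from the paper) performs one closed-loop step: it forms the control $-\mathcal{W}^\dagger_k(\mathcal{H}_kx + \beta_k)$ and then propagates the state with the combined (calligraphic) coefficients at $(k,k)$. All argument names are assumptions for this example; `w` is one sampled noise vector, and `C`, `D`, `d` are lists indexed by the noise component.

```python
import numpy as np

def equilibrium_step(x, w, Wc, Hc, beta, A, B, f, C, D, d):
    """One step of (32)-(33).

    Wc, Hc, beta : the quantities of (29) at time k (assumed precomputed).
    A, B, f      : combined drift coefficients at (k, k).
    C, D, d      : lists of combined diffusion coefficients at (k, k).
    w            : sampled noise values w^1_k, ..., w^p_k.
    """
    u = -np.linalg.pinv(Wc) @ (Hc @ x + beta)   # control (33)
    drift = A @ x + B @ u + f
    diffusion = sum(wi * (Ci @ x + Di @ u + di)
                    for wi, Ci, Di, di in zip(w, C, D, d))
    return u, drift + diffusion                 # (u_k, X_{k+1})
```

Iterating this step from $X_t = x$ along a simulated noise path produces one sample trajectory of the equilibrium state, provided the range condition (28) holds at every step.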

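Proof: See Appendix D. ∎

Remark III.2: Note that $P_{k,\ell}$ and $\mathcal{P}_{k,\ell}$ are symmetric, whereas $T_{k,\ell}$ and $\mathcal{T}_{k,\ell}$ are generally nonsymmetric, as $\mathcal{A}_{\ell,\ell}$, $\mathcal{B}_{\ell,\ell}$, $\mathcal{C}_{\ell,\ell}$, and $\mathcal{D}_{\ell,\ell}$ appear in the expressions of $T_{k,\ell}$ and $\mathcal{T}_{k,\ell}$. Let the stationary condition (17) hold. From Lemma III.4 and "i)⇒ii)" of the proof of Theorem III.3, any control of the form
\[
u^{t,x,*}_k = -\mathcal{W}^\dagger_k\mathcal{H}_kX^{t,x,*}_k - \mathcal{W}^\dagger_k\widetilde\beta_k + \mathcal{Y}_k, \quad k \in \mathbb{T}_t \tag{35}
\]
satisfies (17). Here, $\mathcal{Y}_k = (I - \mathcal{W}^\dagger_k\mathcal{W}_k)Y_k$ with $Y_k \in \mathbb{R}^m$, and $\{\widetilde\beta_k, k \in \mathbb{T}_t\}$ is given by
\[
\left\{
\begin{aligned}
\widetilde\beta_k &= \mathcal{B}^T_{k,k}\big[\big(\mathcal{P}_{k,k+1} + \mathcal{T}_{k,k+1}\big)f_{k,k} + \widetilde\pi_{k,k+1}\big] + \sum_{i,j=1}^p\gamma^{ij}_k(\mathcal{D}^i_{k,k})^T\big(P_{k,k+1} + T_{k,k+1}\big)d^j_{k,k} + \rho_{k,k} \\
& k \in \mathbb{T}_t
\end{aligned}
\right.
\]
with
\[
\left\{
\begin{aligned}
\widetilde\pi_{k,\ell} &= \mathcal{A}^T_{k,\ell}\mathcal{P}_{k,\ell+1}\big[f_{k,\ell} - \mathcal{B}_{k,\ell}\big(\mathcal{W}^\dagger_\ell\widetilde\beta_\ell - \mathcal{Y}_\ell\big)\big] + \mathcal{A}^T_{k,\ell}\mathcal{T}_{k,\ell+1}\big[f_{\ell,\ell} - \mathcal{B}_{\ell,\ell}\big(\mathcal{W}^\dagger_\ell\widetilde\beta_\ell - \mathcal{Y}_\ell\big)\big] \\
&\quad + \sum_{i,j=1}^p\gamma^{ij}_\ell\big[(\mathcal{C}^i_{k,\ell})^TP_{k,\ell+1}\big(d^j_{k,\ell} - \mathcal{D}^j_{k,\ell}(\mathcal{W}^\dagger_\ell\widetilde\beta_\ell - \mathcal{Y}_\ell)\big) + (\mathcal{C}^i_{k,\ell})^TT_{k,\ell+1}\big(d^j_{\ell,\ell} - \mathcal{D}^j_{\ell,\ell}(\mathcal{W}^\dagger_\ell\widetilde\beta_\ell - \mathcal{Y}_\ell)\big)\big] \\
&\quad + \mathcal{A}^T_{k,\ell}\widetilde\pi_{k,\ell+1} + q_{k,\ell} \\
\widetilde\pi_{k,N} &= g_k, \quad \ell \in \mathbb{T}_k, \quad k \in \mathbb{T}_t.
\end{aligned}
\right.
\]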


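Furthermore, $X^{t,x,*}$ in (35) is given by
\[
\left\{
\begin{aligned}
X^{t,x,*}_{k+1} &= \big[\big(\mathcal{A}_{k,k} - \mathcal{B}_{k,k}\mathcal{W}^\dagger_k\mathcal{H}_k\big)X^{t,x,*}_k - \mathcal{B}_{k,k}\big(\mathcal{W}^\dagger_k\widetilde\beta_k - \mathcal{Y}_k\big) + f_{k,k}\big] \\
&\quad + \sum_{i=1}^p\big[\big(\mathcal{C}^i_{k,k} - \mathcal{D}^i_{k,k}\mathcal{W}^\dagger_k\mathcal{H}_k\big)X^{t,x,*}_k - \mathcal{D}^i_{k,k}\big(\mathcal{W}^\dagger_k\widetilde\beta_k - \mathcal{Y}_k\big) + d^i_{k,k}\big]w^i_k \\
X^{t,x,*}_t &= x, \quad k \in \mathbb{T}_t.
\end{aligned}
\right.
\]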
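The non-uniqueness described in Remark III.2 comes from the null space of the pseudoinverse: adding $(I - \mathcal{W}^\dagger\mathcal{W})Y$ to a solution of $\mathcal{W}u + \mathcal{H}x + \beta = 0$ gives another solution. The sketch below (not from the paper) illustrates only this single-step algebra, with $\beta$ treated as given; the dependence of $\widetilde\beta_k$ on the free vectors through $\widetilde\pi$ is not reproduced. The function name is an assumption.

```python
import numpy as np

def stationary_controls(Wc, Hc, beta, x, Y):
    """Return -W^+ (H x + beta) + (I - W^+ W) Y for a free vector Y."""
    Wp = np.linalg.pinv(Wc)
    free_part = (np.eye(Wc.shape[1]) - Wp @ Wc) @ Y
    return -Wp @ (Hc @ x + beta) + free_part
```

When $\mathcal{W}$ is singular and $\mathcal{H}x + \beta$ lies in its range, two different choices of `Y` yield two different controls that both satisfy the linear equation; when $\mathcal{W}$ is invertible the free part vanishes.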

D. Second Characterization on the Existence of Open-Loop Equilibrium Control

By Theorem III.1, Theorem III.2, Theorem III.3, and Remark III.2, we have the following result, which gives conditions for the existence of an open-loop equilibrium control.

Theorem III.4: The following statements are equivalent.

i) Problem (LQ)$_{tx}$ admits an open-loop equilibrium control.

ii) The conditions (24) and (28) hold.

Under any of the above conditions, any control of the form
\[
u^{t,x,*}_k = -\mathcal{W}^\dagger_k\mathcal{H}_kX^{t,x,*}_k - \mathcal{W}^\dagger_k\widetilde\beta_k + \mathcal{Y}_k, \quad k \in \mathbb{T}_t \tag{36}
\]
is an open-loop equilibrium control.

The above theorem is concerned with the existence of an open-loop equilibrium control. Another important issue is the uniqueness of the open-loop equilibrium control, which is studied in the following theorem.

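Theorem III.5: The following statements are equivalent.

i) Problem (LQ)$_{tx}$ admits a unique open-loop equilibrium control.

ii) The following assertions hold.

a) The condition (24) is satisfied.

b) $\mathcal{W}_k$, $k \in \mathbb{T}_t$, are invertible, where $\mathcal{W}_k$ is given in (29).

Under any of the above conditions, the unique open-loop equilibrium control is given by
\[
u^{t,x,*}_k = -\mathcal{W}^{-1}_k\mathcal{H}_kX^{t,x,*}_k - \mathcal{W}^{-1}_k\beta_k, \quad k \in \mathbb{T}_t \tag{37}
\]
with $X^{t,x,*}$ given by
\[
\left\{
\begin{aligned}
X^{t,x,*}_{k+1} &= \big[\big(\mathcal{A}_{k,k} - \mathcal{B}_{k,k}\mathcal{W}^{-1}_k\mathcal{H}_k\big)X^{t,x,*}_k - \mathcal{B}_{k,k}\mathcal{W}^{-1}_k\beta_k + f_{k,k}\big] \\
&\quad + \sum_{i=1}^p\big[\big(\mathcal{C}^i_{k,k} - \mathcal{D}^i_{k,k}\mathcal{W}^{-1}_k\mathcal{H}_k\big)X^{t,x,*}_k - \mathcal{D}^i_{k,k}\mathcal{W}^{-1}_k\beta_k + d^i_{k,k}\big]w^i_k \\
X^{t,x,*}_t &= x, \quad k \in \mathbb{T}_t.
\end{aligned}
\right.
\]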

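Proof: i)⇒ii). The condition (24) holds by Theorem III.4. We further have b); otherwise, controls of the form (35) with different choices of $Y_k$ would also be open-loop equilibrium controls, which contradicts the uniqueness.

ii)⇒i). As $\mathcal{W}_k$, $k \in \mathbb{T}_t$, are invertible, $\mathrm{Ran}(\mathcal{W}_k) = \mathbb{R}^m$ and (28) holds automatically. According to Theorem III.1 and Theorem III.4, Problem (LQ)$_{tx}$ admits an open-loop equilibrium control. Due to the nonsingularity of $\mathcal{W}_k$, $k \in \mathbb{T}_t$, and the proof of Theorem III.3, the open-loop equilibrium control is unique and is given by (37). ∎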

IV. CASE WITH ALL THE INITIAL PAIRS

In this section, we will let the initial time $t$ and initial state $x$ range over $\mathbb{T}$ and $l^2_{\mathcal F}(t;\mathbb{R}^n)$, respectively; this is referred to as the case with all the initial pairs. Problem (LQ) for the initial pair $(t,x)$ will be simply denoted as Problem (LQ)$_{tx}$, and similar meanings hold for other initial pairs.

First, we give an interesting result on the unique existence of an open-loop equilibrium control, which follows from Theorem III.5.

Proposition IV.1: Let $t \in \mathbb{T}$ and $x \in l^2_{\mathcal F}(t;\mathbb{R}^n)$. Then, the following statements are equivalent.

i) Problem (LQ)$_{tx}$ admits a unique open-loop equilibrium control.

ii) For any $k \in \mathbb{T}_t$ and any $\xi \in l^2_{\mathcal F}(k;\mathbb{R}^n)$, Problem (LQ)$_{k\xi}$ admits a unique open-loop equilibrium control.

Unfortunately, a result similar to Proposition IV.1 does not hold if we consider only the existence of an open-loop equilibrium control. Instead, the following assertion holds.

Theorem IV.1: The following statements are equivalent.

i) For any $t \in \mathbb{T}$ and any $x \in l^2_{\mathcal F}(t;\mathbb{R}^n)$, Problem (LQ)$_{tx}$ admits an open-loop equilibrium control.

ii) The set of constrained LDEs

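\[
\left\{
\begin{aligned}
&\left\{
\begin{aligned}
P_{k,\ell} &= Q_{k,\ell} + A^T_{k,\ell}P_{k,\ell+1}A_{k,\ell} + \sum_{i,j=1}^p\gamma^{ij}_\ell(C^i_{k,\ell})^TP_{k,\ell+1}C^j_{k,\ell} \\
\mathcal{P}_{k,\ell} &= \mathcal{Q}_{k,\ell} + \mathcal{A}^T_{k,\ell}\mathcal{P}_{k,\ell+1}\mathcal{A}_{k,\ell} + \sum_{i,j=1}^p\gamma^{ij}_\ell(\mathcal{C}^i_{k,\ell})^TP_{k,\ell+1}\mathcal{C}^j_{k,\ell} \\
P_{k,N} &= G_k, \quad \mathcal{P}_{k,N} = \mathcal{G}_k, \quad \ell \in \mathbb{T}_k
\end{aligned}
\right. \\
&W_k \succeq 0 \\
&k \in \mathbb{T}
\end{aligned}
\right.
\tag{38}
\]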

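and the set of constrained GDREs
\[
\left\{
\begin{aligned}
&\left\{
\begin{aligned}
T_{k,\ell} &= A^T_{k,\ell}T_{k,\ell+1}\mathcal{A}_{\ell,\ell} + \sum_{i,j=1}^p\gamma^{ij}_\ell(C^i_{k,\ell})^TT_{k,\ell+1}\mathcal{C}^j_{\ell,\ell} \\
&\quad - \Big\{A^T_{k,\ell}P_{k,\ell+1}B_{k,\ell} + A^T_{k,\ell}T_{k,\ell+1}\mathcal{B}_{\ell,\ell} + \sum_{i,j=1}^p\gamma^{ij}_\ell\big[(C^i_{k,\ell})^TP_{k,\ell+1}D^j_{k,\ell} + (C^i_{k,\ell})^TT_{k,\ell+1}\mathcal{D}^j_{\ell,\ell}\big]\Big\}\mathcal{W}^\dagger_\ell\mathcal{H}_\ell \\
\mathcal{T}_{k,\ell} &= \mathcal{A}^T_{k,\ell}\mathcal{T}_{k,\ell+1}\mathcal{A}_{\ell,\ell} + \sum_{i,j=1}^p\gamma^{ij}_\ell(\mathcal{C}^i_{k,\ell})^TT_{k,\ell+1}\mathcal{C}^j_{\ell,\ell} \\
&\quad - \Big\{\mathcal{A}^T_{k,\ell}\mathcal{P}_{k,\ell+1}\mathcal{B}_{k,\ell} + \mathcal{A}^T_{k,\ell}\mathcal{T}_{k,\ell+1}\mathcal{B}_{\ell,\ell} + \sum_{i,j=1}^p\gamma^{ij}_\ell\big[(\mathcal{C}^i_{k,\ell})^TP_{k,\ell+1}\mathcal{D}^j_{k,\ell} + (\mathcal{C}^i_{k,\ell})^TT_{k,\ell+1}\mathcal{D}^j_{\ell,\ell}\big]\Big\}\mathcal{W}^\dagger_\ell\mathcal{H}_\ell \\
T_{k,N} &= 0, \quad \mathcal{T}_{k,N} = 0, \quad \ell \in \mathbb{T}_k
\end{aligned}
\right. \\
&\mathcal{W}_k\mathcal{W}^\dagger_k\mathcal{H}_k - \mathcal{H}_k = 0 \\
&k \in \mathbb{T}
\end{aligned}
\right.
\tag{39}
\]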


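and the set of constrained LDEs
\[
\left\{
\begin{aligned}
&\left\{
\begin{aligned}
\pi_{k,\ell} &= \mathcal{A}^T_{k,\ell}\mathcal{P}_{k,\ell+1}\big(f_{k,\ell} - \mathcal{B}_{k,\ell}\mathcal{W}^\dagger_\ell\beta_\ell\big) + \mathcal{A}^T_{k,\ell}\mathcal{T}_{k,\ell+1}\big(f_{\ell,\ell} - \mathcal{B}_{\ell,\ell}\mathcal{W}^\dagger_\ell\beta_\ell\big) \\
&\quad + \sum_{i,j=1}^p\gamma^{ij}_\ell\big[(\mathcal{C}^i_{k,\ell})^TP_{k,\ell+1}\big(d^j_{k,\ell} - \mathcal{D}^j_{k,\ell}\mathcal{W}^\dagger_\ell\beta_\ell\big) + (\mathcal{C}^i_{k,\ell})^TT_{k,\ell+1}\big(d^j_{\ell,\ell} - \mathcal{D}^j_{\ell,\ell}\mathcal{W}^\dagger_\ell\beta_\ell\big)\big] \\
&\quad + \mathcal{A}^T_{k,\ell}\pi_{k,\ell+1} + q_{k,\ell} \\
\pi_{k,N} &= g_k, \quad \ell \in \mathbb{T}_k
\end{aligned}
\right. \\
&\mathcal{W}_k\mathcal{W}^\dagger_k\beta_k - \beta_k = 0 \\
&k \in \mathbb{T}
\end{aligned}
\right.
\tag{40}
\]
are solvable in the sense that
\[
\left\{
\begin{aligned}
&W_k \succeq 0 \\
&\mathcal{W}_k\mathcal{W}^\dagger_k\mathcal{H}_k - \mathcal{H}_k = 0 \\
&\mathcal{W}_k\mathcal{W}^\dagger_k\beta_k - \beta_k = 0 \\
&k \in \mathbb{T}
\end{aligned}
\right.
\tag{41}
\]
holds, i.e., the solutions of (38)–(40) satisfy (41). Here
\[
\left\{
\begin{aligned}
W_k &= \mathcal{R}_{k,k} + \mathcal{B}^T_{k,k}\mathcal{P}_{k,k+1}\mathcal{B}_{k,k} + \sum_{i,j=1}^p\gamma^{ij}_k(\mathcal{D}^i_{k,k})^TP_{k,k+1}\mathcal{D}^j_{k,k} \\
\mathcal{W}_k &= \mathcal{R}_{k,k} + \mathcal{B}^T_{k,k}\big(\mathcal{P}_{k,k+1} + \mathcal{T}_{k,k+1}\big)\mathcal{B}_{k,k} + \sum_{i,j=1}^p\gamma^{ij}_k(\mathcal{D}^i_{k,k})^T\big(P_{k,k+1} + T_{k,k+1}\big)\mathcal{D}^j_{k,k} \\
\mathcal{H}_k &= \mathcal{B}^T_{k,k}\big(\mathcal{P}_{k,k+1} + \mathcal{T}_{k,k+1}\big)\mathcal{A}_{k,k} + \sum_{i,j=1}^p\gamma^{ij}_k(\mathcal{D}^i_{k,k})^T\big(P_{k,k+1} + T_{k,k+1}\big)\mathcal{C}^j_{k,k} \\
\beta_k &= \mathcal{B}^T_{k,k}\big[\big(\mathcal{P}_{k,k+1} + \mathcal{T}_{k,k+1}\big)f_{k,k} + \pi_{k,k+1}\big] + \sum_{i,j=1}^p\gamma^{ij}_k(\mathcal{D}^i_{k,k})^T\big(P_{k,k+1} + T_{k,k+1}\big)d^j_{k,k} + \rho_{k,k} \\
& k \in \mathbb{T}.
\end{aligned}
\right.
\]

Under any of the above conditions, any control of the form (36) is an open-loop equilibrium control of Problem (LQ)$_{tx}$.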
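Once the matrices at a given time $k$ have been computed, the three constraints in (41) reduce to one eigenvalue test and two projection tests, since $\mathcal{W}\mathcal{W}^\dagger$ is the orthogonal projector onto $\mathrm{Ran}(\mathcal{W})$. The following sketch (not from the paper) performs these checks for a single $k$; the function name, the returned keys, and the tolerance are assumptions.

```python
import numpy as np

def check_constraints_41(W, Wc, Hc, beta, tol=1e-9):
    """Check the three conditions of (41) at one time index.

    W    : convexity matrix of (21)
    Wc   : calligraphic W of (29); Hc, beta : as in (29)
    """
    proj = Wc @ np.linalg.pinv(Wc)  # projector onto Ran(Wc)
    return {
        "W_psd": bool(np.linalg.eigvalsh((W + W.T) / 2).min() >= -tol),
        "H_in_range": bool(np.allclose(proj @ Hc, Hc, atol=tol)),
        "beta_in_range": bool(np.allclose(proj @ beta, beta, atol=tol)),
    }
```

Solvability in the sense of Theorem IV.1 requires all three entries to be true for every $k \in \mathbb{T}$, with the matrices generated by the coupled recursions (38)–(40).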

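Proof: i)⇒ii). From Theorem III.4, the constrained LDEs (38) are solvable, and for any $t \in \mathbb{T}$, $x \in l^2_{\mathcal F}(t;\mathbb{R}^n)$, the condition (28) holds, i.e.,
\[
\mathcal{H}_kX^{t,x,*}_k + \beta_k \in \mathrm{Ran}(\mathcal{W}_k), \quad k \in \mathbb{T}_t.
\]
In particular, taking $k = t$, we have
\[
\mathcal{H}_tx + \beta_t \in \mathrm{Ran}(\mathcal{W}_t), \quad t \in \mathbb{T}
\]
or, equivalently,
\[
\mathcal{W}_t\mathcal{W}^\dagger_t(\mathcal{H}_tx + \beta_t) = \mathcal{H}_tx + \beta_t, \quad t \in \mathbb{T}. \tag{42}
\]
Letting $x = 0$ in (42), we have
\[
\mathcal{W}_t\mathcal{W}^\dagger_t\beta_t = \beta_t, \quad t \in \mathbb{T}
\]
which further implies
\[
\mathcal{W}_t\mathcal{W}^\dagger_t\mathcal{H}_tx = \mathcal{H}_tx, \quad t \in \mathbb{T}. \tag{43}
\]
Noting that (43) holds for any $x \in l^2_{\mathcal F}(t;\mathbb{R}^n)$, we obtain
\[
\mathcal{W}_t\mathcal{W}^\dagger_t\mathcal{H}_t - \mathcal{H}_t = 0, \quad t \in \mathbb{T}.
\]
Hence, (39) and (40) are solvable.

ii)⇒i). As (38)–(40) are solvable, the conditions (24) and (28) hold for every initial pair. By Theorem III.4, for any $t \in \mathbb{T}$ and any $x \in l^2_{\mathcal F}(t;\mathbb{R}^n)$, Problem (LQ)$_{tx}$ admits an open-loop equilibrium control. ∎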

Corollary IV.1: Let
\[
Q_{k,\ell},\ \mathcal{Q}_{k,\ell} \succeq 0, \quad R_{k,\ell},\ \mathcal{R}_{k,\ell} \succ 0, \quad k \in \mathbb{T},\ \ell \in \mathbb{T}_k. \tag{44}
\]
Then, the following statements are equivalent.

1) For any $t \in \mathbb{T}$ and any $x \in l^2_{\mathcal F}(t;\mathbb{R}^n)$, Problem (LQ)$_{tx}$ admits an open-loop equilibrium control.

2) Equations (39) and (40) are solvable.

Proof: In this situation, $W_k$, $k \in \mathbb{T}$, are positive definite, i.e., $W_k \succ 0$, $k \in \mathbb{T}$. Hence, the conclusion follows. ∎

Let us make some rough observations under the condition (44). Assuming (44), consider Problem (LQ)$_{tx}$ for $t \in \mathbb{T}$ and $x \in l^2_{\mathcal F}(t;\mathbb{R}^n)$. Let us begin with $t = N-1$. Noting that $\mathcal{W}_{N-1} = W_{N-1} \succ 0$, Problem (LQ)$_{N-1,x}$ admits a unique open-loop equilibrium control, and $u^{N-1,x,*}_{N-1}$ is easily obtained:
\[
u^{N-1,x,*}_{N-1} = -\mathcal{W}^{-1}_{N-1}\mathcal{H}_{N-1}x - \mathcal{W}^{-1}_{N-1}\beta_{N-1}.
\]
Now move to the case $t = N-2$. If we have selected $u^{N-2,x,*}_{N-1}$, from Lemmas III.1 and III.2 we have

J(N − 2, x; (uN −2 , uN −2,x,∗N −1 ))

= uTN −2WN −2uN −2

+ 2[ρN −2,N −2 + BT

N −2,N −2EN −2ZN −2,0N −1

+p∑

i=1

(DiN −2,N −2)

T EN −1(ZN −2,0N −1 wi

N −2)]T

uN −2

+ J(N − 2, x; (0, uN −2,x,∗N −1 ))

� 〈WN −2uN −2 , uN −2〉 + 2〈MN −2(ZN −2,0N −1 ), uN −2〉

+ J(N − 2, x; (0, uN −2,x,∗N −1 )). (45)

In the above, ZN −2,0N −1 is computed via

⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

ZN −2,0 = AT

N −2,EZN −2,0+1 + AT

N −2,EN −2ZN −2,0+1

+∑p

i=1

[(Ci

N −2,)T E(Z

N −2,0+1 wi

)

+ (CiN −2,)

T EN −2(ZN −2,0+1 wi

)]

+ QN −2,XN −2,0 + QN −2,EN −2X

N −2,0

+ qN −2,

ZN −2,0N = GN −2X

N −2,0N + GN −2EN −2X

N −2,0N

+ gN −2

∈ {N − 2, N − 1}


where XN −2,0 is given by⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

XN −2,0N =

(AN −2,N −1X

N −2,0N −1

+ AN −2,N −1EN −2XN −1,0N −2

+ BN −2,N −1uN −2,x,∗N −1

+ BN −2,N −1EN −2uN −2,x,∗N −1

+ fN −2,N −1)

+∑p

i=1

[Ci

N −2,N −1XN −2,0N −1

+ CiN −2,N −1EN −2X

N −1,0N −2

+ DiN −2,N −1u

N −2,x,∗N −1

+ DiN −2,N −1EN −2u

N −2,x,∗N −1

+ diN −2,N −1

]wi

N −1

XN −2,0N −1 =

(AN −2,N −2X

N −2,0N −2 + fN −2,N −2

)+∑p

i=1

[Ci

N −2,N −2XN −2,0N −2

+ diN −2,N −2

]wi

N −2

XN −2,0N −2 = x

∈ {N − 2, N − 1}

(46)

and 〈·, ·〉 is the inner product on Rm , and

MN −2(ZN −2,0N −1 ) = ρN −2,N −2 + BT

N −2,N −2EN −2ZN −2,0N −1

+p∑

i=1

(DiN −2,N −2)

T EN −1(ZN −2,0N −1 wi

N −2).

If we "select"
\[
u^{N-2,x,*}_{N-2} = -W^{-1}_{N-2}M_{N-2}\big(Z^{N-2,0}_{N-1}\big) \tag{47}
\]
then the inequality
\[
J\big(N-2,x;(u^{N-2,x,*}_{N-2},u^{N-2,x,*}_{N-1})\big) \le J\big(N-2,x;(u_{N-2},u^{N-2,x,*}_{N-1})\big)
\]
seems to hold.

However, the choice (47) is questionable. If $u^{N-2,x,*}$ exists, we should have
\[
u^{N-2,x,*}_{N-1} = -\mathcal{W}^{-1}_{N-1}\mathcal{H}_{N-1}X^{N-2,x,*}_{N-1} - \mathcal{W}^{-1}_{N-1}\beta_{N-1}
\]
and
\[
X^{N-2,x,*}_{N-1} = \mathcal{A}_{N-2,N-2}x + \mathcal{B}_{N-2,N-2}u^{N-2,x,*}_{N-2} + \sum_{i=1}^p\big(\mathcal{C}^i_{N-2,N-2}x + \mathcal{D}^i_{N-2,N-2}u^{N-2,x,*}_{N-2}\big)w^i_{N-2}.
\]
Hence, $u^{N-2,x,*}_{N-1}$ depends on $u^{N-2,x,*}_{N-2}$, and so does $Z^{N-2,0}_{N-1}$. Therefore, the right-hand side of (47) is a functional of $u^{N-2,x,*}_{N-2}$, and it cannot be concluded that (47) makes sense under the assumption $W_{N-2} \succ 0$. Recall that $\{\{T_{k,\ell}, \ell \in \mathbb{T}_k\}, k \in \mathbb{T}\}$ is also needed to characterize the open-loop equilibrium control, and that for $k \in \mathbb{T}$
\[
\mathcal{W}_k = W_k + \mathcal{B}^T_{k,k}\mathcal{T}_{k,k+1}\mathcal{B}_{k,k} + \sum_{i,j=1}^p\gamma^{ij}_k(\mathcal{D}^i_{k,k})^TT_{k,k+1}\mathcal{D}^j_{k,k}.
\]
Note that the elements of $\{\{T_{k,\ell}, \ell \in \mathbb{T}_k\}, k \in \mathbb{T}\}$ are generally nonsymmetric. It is not known whether $W_k \succ 0$ ensures the nonsingularity of $\mathcal{W}_k$. Therefore, the solvability of (39) and (40) has to be checked case by case [by validating (41)].

V. MULTIPERIOD MEAN–VARIANCE PORTFOLIO SELECTION

Consider a capital market consisting of one riskless asset and $n$ risky assets within a time horizon $N$. Let $s_k\ (>1)$ be a given deterministic return of the riskless asset at time period $k$ and $e_k = (e^1_k, \ldots, e^n_k)^T$ the vector of random returns of the $n$ risky assets at period $k$. We assume that the vectors $e_k$, $k = 0, 1, \ldots, N-1$, are statistically independent and that the only information known about the random return vector $e_k$ is its first two moments: its mean $\mathbb{E}(e_k) = (\mathbb{E}e^1_k, \mathbb{E}e^2_k, \ldots, \mathbb{E}e^n_k)^T$ and its covariance $\mathrm{Cov}(e_k) = \mathbb{E}[(e_k - \mathbb{E}e_k)(e_k - \mathbb{E}e_k)^T]$. Clearly, $\mathrm{Cov}(e_k)$ is nonnegative definite, i.e., $\mathrm{Cov}(e_k) \succeq 0$.

Let $X_k$ be the wealth of the investor at the beginning of the $k$th period, and let $u^i_k$, $i = 1, 2, \ldots, n$, be the amount invested in the $i$th risky asset at period $k$. Then, $X_k - \sum_{i=1}^n u^i_k$ is the amount invested in the riskless asset at period $k$, and the wealth at the beginning of the $(k+1)$th period [22] is given by
\[
X_{k+1} = \sum_{i=1}^n e^i_ku^i_k + \Big(X_k - \sum_{i=1}^n u^i_k\Big)s_k = s_kX_k + O^T_ku_k \tag{48}
\]
where $O_k$ is the excess return vector of risky assets [22] defined as
\[
O_k = (O^1_k, O^2_k, \ldots, O^n_k)^T = (e^1_k - s_k, e^2_k - s_k, \ldots, e^n_k - s_k)^T. \tag{49}
\]

Clearly, $X_k \in \mathbb{R}$, $k \in \mathbb{T}$. In this section, we consider the case where short selling of stocks is allowed, i.e., $u^i_k$, $i = 1, \ldots, n$, could take values in $\mathbb{R}$, which leads to an unconstrained mean–variance portfolio selection formulation.
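The equivalence of the two expressions in (48) can be checked on numbers. The sketch below (not from the paper) computes next-period wealth both directly, as risky payoff plus riskless payoff, and through the excess-return form $s_kX_k + O^T_ku_k$; the function names and the sample values are illustrative assumptions. A negative entry of `u` corresponds to a short position.

```python
import numpy as np

def next_wealth(x, u, e, s):
    """Wealth update in excess-return form: s * x + O^T u, with O = e - s."""
    excess = e - s
    return s * x + excess @ u

def next_wealth_direct(x, u, e, s):
    """Wealth update as risky payoff plus riskless payoff."""
    return e @ u + (x - u.sum()) * s
```

For example, with wealth 100, positions `u = [10, 20]`, risky returns `e = [1.1, 1.2]`, and riskless return `s = 1.05`, both functions give 108.5.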

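Let
\[
\mathcal{F}_k = \sigma(e_\ell,\ \ell = 0, 1, \ldots, k-1)
\]
which contains $\mathcal{F}'_k = \sigma(X_\ell,\ \ell = 0, 1, \ldots, k)$. Then, the time-inconsistent version of the multiperiod mean–variance problem [22] can be formulated as follows.

Problem (MV): Letting $t \in \mathbb{T}$ and $x \in l^2_{\mathcal F}(t;\mathbb{R})$, find $u^* \in l^2_{\mathcal F}(\mathbb{T}_t;\mathbb{R}^n)$ such that
\[
J_m(t,x;u^*) = \inf_{u \in l^2_{\mathcal F}(\mathbb{T}_t;\mathbb{R}^n)} J_m(t,x;u).
\]
Here
\[
J_m(t,x;u) = \lambda\mathbb{E}_t(X_N - \mathbb{E}_tX_N)^2 - \mathbb{E}_tX_N
\]
which is subject to
\[
\left\{
\begin{aligned}
X_{k+1} &= s_kX_k + O^T_ku_k \\
X_t &= x
\end{aligned}
\right.
\]
with $\lambda > 0$ the tradeoff parameter between the mean and the variance of the terminal wealth.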

It is noted that some nondegeneracy assumptions are imposed in [2], [6], [9], [10], [17], and [18]. Specifically, the volatilities of the stocks in [2], [6], [17], and [18] and the return rates of the risky securities in [9], [10], and [22] are assumed to be nondegenerate. In this section, we do not impose a nondegeneracy constraint on $\mathrm{Cov}(e_k)$, $\mathrm{Cov}(O_k)$, $k \in \mathbb{T}$, and we seek the weakest condition for the existence of an open-loop equilibrium portfolio control of Problem (MV).

To solve Problem (MV), we shall transform (48) into a linear controlled system of the form (4), to which the general theory of the preceding sections applies. Precisely, define
\[
\left\{
\begin{aligned}
w^i_k &= e^i_k - s_k - \mathbb{E}(e^i_k - s_k) \\
D^i_k &= (0, \ldots, 0, 1, 0, \ldots, 0) \\
& i = 1, \ldots, n, \quad k = 0, 1, \ldots, N-1
\end{aligned}
\right.
\]
where the $i$th entry of $D^i_k$ is 1. Then, $\{w_k = (w^1_k, \ldots, w^n_k)^T, k \in \mathbb{T}\}$ is a martingale difference sequence, as $e_k$, $k = 0, \ldots, N-1$, are statistically independent. Furthermore
\[
\mathbb{E}_k[w_kw^T_k] = \mathbb{E}[w_kw^T_k] = \mathrm{Cov}(e_k) = (\gamma^{ij}_k)_{n\times n}.
\]
This leads to
\[
\left\{
\begin{aligned}
X_{k+1} &= \big(s_kX_k + (\mathbb{E}O_k)^Tu_k\big) + \sum_{i=1}^n D^i_ku_kw^i_k \\
X_t &= x.
\end{aligned}
\right.
\tag{50}
\]

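Due to Theorem IV.1, we have the following result.

Theorem V.1: The following statements are equivalent.

1) For any $t \in \mathbb{T}$ and any $x \in l^2_{\mathcal F}(t;\mathbb{R})$, Problem (MV) admits an open-loop equilibrium portfolio control.

2) $\mathbb{E}O_k \in \mathrm{Ran}(\mathrm{Cov}(O_k))$, $k \in \mathbb{T}$.

Under any of the above conditions
\[
u^{t,x,*}_k = -\mathcal{W}^\dagger_k\beta_k, \quad k \in \mathbb{T}_t \tag{51}
\]
is an open-loop equilibrium portfolio control for the initial pair $(t,x)$, where
\[
\left\{
\begin{aligned}
\mathcal{W}_k &= P_{k+1}\mathrm{Cov}(O_k) \\
\beta_k &= \pi_{k+1}\mathbb{E}O_k \\
& k \in \mathbb{T}
\end{aligned}
\right.
\tag{52}
\]
with
\[
\left\{
\begin{aligned}
P_k &= s^2_kP_{k+1} \\
\pi_k &= s_k\pi_{k+1} \\
P_N &= \lambda, \quad \pi_N = -\tfrac{1}{2} \\
& k \in \mathbb{T}.
\end{aligned}
\right.
\]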

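Proof: In this case, (38)–(40) become
\[
\left\{
\begin{aligned}
P_k &= s^2_kP_{k+1} \\
\mathcal{P}_k &= s^2_k\mathcal{P}_{k+1} \equiv 0 \\
P_N &= \lambda, \quad \mathcal{P}_N = 0 \\
W_k &= \sum_{i,j=1}^n\gamma^{ij}_k(D^i_k)^TP_{k+1}D^j_k = P_{k+1}\mathrm{Cov}(O_k) \succeq 0 \\
& k \in \mathbb{T}
\end{aligned}
\right.
\tag{53}
\]
\[
\left\{
\begin{aligned}
T_k &= s^2_kT_{k+1} - s_k(P_{k+1} + T_{k+1})(\mathbb{E}O_k)^T\mathcal{W}^\dagger_k\mathcal{H}_k \\
\mathcal{T}_k &= \mathcal{T}_{k+1}\big[s^2_k - s_k(\mathbb{E}O_k)^T\mathcal{W}^\dagger_k\mathcal{H}_k\big] \equiv 0 \\
T_N &= 0, \quad \mathcal{T}_N = 0 \\
& \mathcal{W}_k\mathcal{W}^\dagger_k\mathcal{H}_k - \mathcal{H}_k = 0 \\
& k \in \mathbb{T}
\end{aligned}
\right.
\tag{54}
\]
and
\[
\left\{
\begin{aligned}
\pi_k &= s_k\pi_{k+1} \\
\pi_N &= -\tfrac{1}{2} \\
& \mathcal{W}_k\mathcal{W}^\dagger_k\beta_k - \beta_k = 0 \\
& k \in \mathbb{T}
\end{aligned}
\right.
\tag{55}
\]
where
\[
\left\{
\begin{aligned}
\mathcal{W}_k &= \sum_{i,j=1}^n\gamma^{ij}_k(D^i_k)^T\big(P_{k+1} + T_{k+1}\big)D^j_k = \big(P_{k+1} + T_{k+1}\big)\mathrm{Cov}(O_k) \\
\mathcal{H}_k &= 0 \\
\beta_k &= \pi_{k+1}\mathbb{E}O_k \\
& k \in \mathbb{T}.
\end{aligned}
\right.
\tag{56}
\]

Noting that $P_{k+1} > 0$ and $\mathcal{H}_k = 0$, $k \in \mathbb{T}$, (53) and (54) are solvable, and $T_k = 0$, $k \in \mathbb{T}$. As $\pi_{k+1} \neq 0$, $k \in \mathbb{T}$, the solvability of (55) is then equivalent to the fact that $\mathbb{E}O_k \in \mathrm{Ran}(\mathrm{Cov}(O_k))$, $k \in \mathbb{T}$. By Theorem IV.1, we reach the conclusion. ∎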

Concerning the uniqueness of the open-loop equilibrium control, we have the following result.

Theorem V.2: The following statements are equivalent.

1) For any $t \in \mathbb{T}$ and any $x \in l^2_{\mathcal F}(t;\mathbb{R})$, Problem (MV) admits a unique open-loop equilibrium portfolio control.

2) $\mathrm{Cov}(O_k) \succ 0$, $k \in \mathbb{T}$.

Under any of the above conditions, the unique open-loop equilibrium portfolio control for the initial pair $(t,x)$ is given by
\[
u^{t,x,*}_k = -\mathcal{W}^{-1}_k\beta_k, \quad k \in \mathbb{T}_t \tag{57}
\]
with $\mathcal{W}_k, \beta_k$, $k \in \mathbb{T}$, given in (52).

Proof: The proof follows from Theorem III.5, Proposition IV.1, and Theorem V.1. ∎

Note that $\mathrm{Cov}(O_k) \succ 0$, $k \in \mathbb{T}$, is a common assumption in multiperiod mean–variance portfolio selection [9], [11], [22]. In this situation, substituting $P_{k+1} = \lambda(s_{k+1}\cdots s_{N-1})^2$ and $\pi_{k+1} = -\tfrac{1}{2}s_{k+1}\cdots s_{N-1}$ from (52) into (57), the open-loop equilibrium portfolio control for the initial pair $(t,x)$ is
\[
u^{t,x,*}_k = \frac{1}{2\lambda s_{k+1}\cdots s_{N-1}}\big(\mathrm{Cov}(O_k)\big)^{-1}\mathbb{E}O_k, \quad k \in \mathbb{T}_t.
\]
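The portfolio formulas can be cross-checked numerically. The sketch below (not from the paper) steps the scalar recursions for $P_k$ and $\pi_k$ backward, forms $\mathcal{W}_k$ and $\beta_k$ as in (52), and returns $-\mathcal{W}^\dagger_k\beta_k$ after testing the range condition of Theorem V.1. It also evaluates the closed form obtained by substituting (52) into (57), which carries the positive factor $1/(2\lambda s_{k+1}\cdots s_{N-1})$ because $\pi_{k+1} < 0$. Function names and the data layout (`s` as an array of riskless returns, `cov_O` and `mean_O` as per-period lists) are assumptions.

```python
import numpy as np

def equilibrium_portfolio(k, s, lam, cov_O, mean_O):
    """Return -W_k^+ beta_k with W_k = P_{k+1} Cov(O_k), beta_k = pi_{k+1} E O_k."""
    N = len(s)
    P, pi = lam, -0.5                    # terminal values P_N, pi_N
    for l in range(N - 1, k, -1):        # step back to P_{k+1}, pi_{k+1}
        P, pi = s[l] ** 2 * P, s[l] * pi
    W = P * cov_O[k]
    beta = pi * mean_O[k]
    Wp = np.linalg.pinv(W)
    if not np.allclose(W @ Wp @ beta, beta):
        raise ValueError("E O_k is not in Ran(Cov(O_k))")
    return -Wp @ beta

def closed_form(k, s, lam, cov_O, mean_O):
    """Closed form for nonsingular Cov(O_k)."""
    scale = 2.0 * lam * np.prod(s[k + 1:])   # empty product equals 1
    return np.linalg.solve(cov_O[k], mean_O[k]) / scale
```

With a singular covariance matrix, `equilibrium_portfolio` still returns a control whenever the mean excess return lies in the range of the covariance, which is the situation covered by Theorem V.1 but not by Theorem V.2.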

This section studied only the simplest dynamic mean–variance model [22]. Dynamic mean–variance portfolio optimization for more general models is a desirable topic for future work.

VI. CONCLUSION

In this paper, the open-loop time-consistent equilibrium control is investigated for a kind of mean-field stochastic LQ problem, where both the system matrices and the weighting matrices depend on the initial time, and the conditional expectations of the control and state enter quadratically into the cost functional. Necessary and sufficient conditions are presented for both the case with a fixed initial pair and the case with all the initial pairs. Furthermore, a set of constrained GDREs and two sets of constrained LDEs are introduced to characterize the open-loop equilibrium control. Note that this paper is concerned with the time consistency of open-loop control. For future research, the time consistency of the strategy should be studied.


APPENDIX

A. Proof of Lemma III.1

Let us replace uk with uk + λuk in the forward SΔE of (15),and denote its solution by Xk,λ. Then, we have⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

X k , λ + 1 −X k

+ 1λ

=(Ak,

X k , λ −X k

λ+ Ak,

Ek X k , λ −Ek X k

λ

)+∑p

i=1

(Ci

k,X k , λ

−X k

λ+ Ci

k,Ek X k , λ

−Ek X k

λ

)wi

X k , λk + 1 −X k

k + 1λ

=(Ak,k

X k , λk −X k

k

λ+ Ak,k

Ek X k , λk −Ek X k

k

λ

+Bk,k uk + Bk,k uk

)+∑p

i=1

(Ci

k,kX k , λ

k −X kk

λ

+Cik,k

Ek X k , λk −Ek X k

k

λ+ Di

k,k uk + Dik,kEk uk

)wi

k

X k , λk −X k

k

λ= 0, ∈ Tk+1 .

Denoting X k , λ −X k

λby Y k,uk

, we get (14). Note that Xk,λ =

Xk + λY k,uk

, ∀ ∈ Tk . Then, we have

J(k, ζ; (uk + λuk , u|Tk + 1 )) − J(k, ζ;u)

=N −1∑=k

Ek

[(Xk

+ λY k,uk

)T Qk,(Xk + λY k,uk

)

+ [Ek (Xk + λY k,uk

)]T Qk,Ek (Xk + λY k,uk

)

+ 2qTk,(X

k + λY k,uk

) − (Xk )T Qk,X

k

− [EkXk ]T Qk,EX

k − 2qT

k,Xk

]+ (uk + λuk )T (Rk,k + Rk,k )(uk + λuk )

+ 2ρTk,(uk + λuk ) − uT

k (Rk,k + Rk,k )uk − 2ρTk,uk

+ [Ek (XkN + λY k,uk

N )]T GkEk (XkN + λY k,uk

N )

+ Ek

[(Xk

N + λY k,uk

N )T Gk (XkN + λY k,uk

N )]

+ 2Ek [gTk (Xk

N + λY k,uk

N )]

− Ek

[(Xk

N )T GkXkN

]− (EkXk

N )T GkEkXkN

− 2Ek gTk Xk

N

= 2λ

{N −1∑=k

Ek

[(Xk

)T Qk,Yk,uk

+ qTk,Y

k

+ [EkXk ]T Qk,EkY k,uk

]+ uT

k (Rk,k + Rk,k )uk

+ ρTk, uk + Ek

[(Xk,uk

N )T GkY k,uk

N

]+ Ek [gT

k Y k,uk

N ]

+ [EkXkN ]T GkEkY k,uk

N

}

+ λ2

{N −1∑=k

Ek

[(Y k,uk

)T Qk,Yk,uk

+ (EkY k,uk

)T Qk,EkY k,uk

]

+ Ek

[uT

k (Rk,k + Rk,k )uk

]+ Ek

[(Y k,uk

N )T GkY k,uk

N

]

+ (EkY k,uk

N )T GkEkY k,uk

N

}. (58)

From (15), it holds that EkZkN = GkEkXk

N + gk and ZkN −

EkZkN = Gk (Xk

N − EkXkN ). Noting Y k,uk

k = 0, then, we have

N −1∑=k

Ek

[(Xk

)T Qk,Yk,uk

+ qTk,Y

k,uk

+[EkXk ]T Qk,EkY k,uk

]+ uT

k (Rk,k + Rk,k )uk

+ ρTk, uk + Ek

[(Xk,uk

N )T GkY k,uk

N

]+ [EkXk

N ]T GkEkY k,uk

N + Ek [gTk Y k,uk

N ]

=N −1∑=k

Ek

[(Qk,(X

k,uk

− EkXk,uk

)

+ ATk,(EZ

k+1 − EkZk

+1)

+p∑

i=1

(Cik,)

T(E(Zk

+1wi) − Ek (Zk

+1wi))

− (Zk − EkZk

))T

(Y k,uk

− EkY k,uk

)

+(Qk,EkXk,uk

+ qk, + ATk,EkZk

+1

+p∑

i=1

(Cik ,)

T Ek (Zk+1w

i) − EkZk

)T

EkY k,uk

]

+[Rk,kuk + BT

k,kEkZkk+1

+p∑

i=1

(Dik ,k )T Ek (Zk

k+1wik ) + ρk,k

]Tuk

=[Rk,kuk + BT

k,kEkZkk+1

+p∑

i=1

(Dik ,k )T Ek (Zk

k+1wik ) + ρk,k

]Tuk .

This together with (58) implies the conclusion. �

B. Proof of Theorem III.1

i)⇒ii). Let $u^{t,x,*}$ be an open-loop equilibrium control. As (18) is a decoupled FBSΔE, (18) is solvable. From (12), we have
\[
\begin{aligned}
& J\big(k,X^{t,x,*}_k;(u^{t,x,*}_k + \lambda\bar u_k, u^{t,x,*}|_{\mathbb{T}_{k+1}})\big) - J\big(k,X^{t,x,*}_k;u^{t,x,*}\big) \\
&\quad = 2\lambda\Big[\mathcal{R}_{k,k}u^{t,x,*}_k + \mathcal{B}^T_{k,k}\mathbb{E}_kZ^{k,t,x}_{k+1} + \sum_{i=1}^p(\mathcal{D}^i_{k,k})^T\mathbb{E}_k\big(Z^{k,t,x}_{k+1}w^i_k\big) + \rho_{k,k}\Big]^T\bar u_k + \lambda^2\widetilde J(k,0;\bar u_k) \\
&\quad \ge 0.
\end{aligned}
\tag{59}
\]
Noting that (59) holds for any $\lambda \in \mathbb{R}$ and $\bar u_k \in l^2_{\mathcal F}(k;\mathbb{R}^m)$, we have (16) and (17). In fact, if (16) were not satisfied, then there would be a $\bar u_k$ such that the difference in (59) tends to $-\infty$ as $\lambda \to \infty$. This is impossible. Furthermore, if for some $k_0 \in \mathbb{T}_t$
\[
\gamma_{k_0} = \mathcal{R}_{k_0,k_0}u^{t,x,*}_{k_0} + \mathcal{B}^T_{k_0,k_0}\mathbb{E}_{k_0}Z^{k_0,t,x}_{k_0+1} + \sum_{i=1}^p(\mathcal{D}^i_{k_0,k_0})^T\mathbb{E}_{k_0}\big(Z^{k_0,t,x}_{k_0+1}w^i_{k_0}\big) + \rho_{k_0,k_0} \neq 0
\]
we let $\bar u_{k_0} = \gamma_{k_0}$. Then, (59) implies that
\[
2\lambda|\gamma_{k_0}|^2 + \lambda^2\widetilde J(k_0,0;\gamma_{k_0}) \ge 0
\]
holds for any $\lambda \in \mathbb{R}$. However, for a negative number $\lambda$ of sufficiently small magnitude, the linear term dominates the quadratic one, so that
\[
2\lambda|\gamma_{k_0}|^2 + \lambda^2\widetilde J(k_0,0;\gamma_{k_0}) < 0
\]
and a contradiction arises. Therefore, $\gamma_{k_0}$ must be 0, and (17) holds.

ii)⇒i). In this case, for any $\lambda \in \mathbb{R}$ and $\bar u_k \in l^2_{\mathcal F}(k;\mathbb{R}^m)$ we have
\[
J\big(k,X^{t,x,*}_k;(u^{t,x,*}_k + \lambda\bar u_k, u^{t,x,*}|_{\mathbb{T}_{k+1}})\big) - J\big(k,X^{t,x,*}_k;u^{t,x,*}\big) = \lambda^2\widetilde J(k,0;\bar u_k) \ge 0.
\]
Hence, $u^{t,x,*}$ is an open-loop equilibrium control. ∎

C. Proof of Lemma III.3

It is assumed that ut,x,∗ = ΨX

t,x,∗ + α, ∈ Tτ . Then, we

have

Xk,t,xN = Ak,N−1X

k,t,xN −1 + Ak,N−1EkXk,t,x

N −1

+ Bk,N−1ΨN −1Xt,x,∗N −1 + Bk,N−1ΨN −1EkXt,x,∗

N −1

+ Bk,N−1αN −1 + fk,N−1

+p∑

i=1

[Ci

k,N−1Xk,t,xN −1 + Ci

k,N−1EkXk,t,xN −1

+ Dik,N−1ΨN −1X

t,x,∗N −1 + Di

k,N−1ΨN −1EkXt,x,∗N −1

+ Dik ,N−1αN −1 + di

k,N−1

]wi

N −1 .

To calculate Zk,t,xN −1 , we need some preparations. Noting that

Zk,t,xN = GkXk,t,x

N + GkEkXk,t,xN + gk

we get

ATk,N−1EN −1Z

k,t,xN

= ATk,N−1EN −1

[GkXk,t,x

N + GkEkXk,t,xN + gk

]= AT

k,N−1GkAk,N−1Xk,t,xN −1 +

[AT

k,N−1GkAk,N−1

+ ATk,N−1GkAk,N−1

]EkXk,t,x

N −1

+ ATk,N−1GkBk,N−1ΨN −1X

t,x,∗N −1

+ ATk,N−1

[GkBk,N−1 + GkBk,N−1

]ΨN −1EkXt,x,∗

N −1

+ ATk,N−1Gk

[Bk,N−1αN −1 + fk,N−1

]+ AT

k,N−1gk

and

ATk,N−1EkZk,t,x

N

= ATk,N−1gk + AT

k,N−1GkAk,N−1EkXk,t,xN −1

+ ATk,N−1GkBk,N−1ΨN −1EkXt,x,∗

N −1

+ ATk,N−1Gk (Bk,N−1αN −1 + fk,N−1).

Furthermore, it holds that

(Cik,N−1)

T EN −1(Zk,t,xN wi

N −1)

= (Cik,N−1)

T Gk

p∑j=1

γijN −1

[Cj

k,N−1Xk,t,xN −1

+ Cjk,N−1EkXk,t,x

N −1 + Djk,N−1ΨN −1X

t,x,∗N −1

+ Djk,N−1ΨN −1EkXt,x,∗

N −1 + Djk ,N−1αN −1 + dj

k,N−1

]

and

(Cik,N−1)

T Ek (Zk,t,xN wi

N −1)

= (Cik,N−1)

T Gk

p∑j=1

γijN −1

[Cj

k ,N−1EkXk,t,xN −1

+ Djk ,N−1ΨN −1EkXt,x,∗

N −1 + Djk ,N−1αN −1 + dj

k,N−1

].

Therefore

Zk,t,xN −1

={

Qk,N−1 + ATk,N−1GkAk,N−1

+p∑

i,j=1

γijN −1(C

ik,N−1)

T GkCjk,N−1

}Xk,t,x

N −1

+{

Qk,N−1 + ATk,N−1GkAk,N−1

+ ATk,N−1GkAk,N−1 + AT

k,N−1GkAk,N−1


+p∑

i,j=1

γijN −1

[(Ci

k,N−1)T Gk Cj

k,N−1

+ (Cik,N−1)

T GkCjk ,N−1

]}EkXk,t,x

N −1

+{

ATk,N−1GkBk,N−1

+ CTk,N−1GkDk,N−1

}ΨN −1X

t,x,∗N −1

+{

ATk,N−1

[GkBk,N−1 + GkBk,N−1

]

+ ATk,N−1GkBk,N−1 +

p∑i,j=1

γijN −1

[(Ci

k,N−1)T GkDj

k,N−1

+ (Cik,N−1)

T GkDjk ,N−1

]}ΨN −1EkXt,x,∗

N −1

+ ATk,N−1Gk

[Bk,N−1αN −1 + fk,N−1

]

+p∑

i,j=1

γijN −1CT

k,N−1Gk

[Dk,N−1αN −1 + dk,N−1

]

+ ATk,N−1gk + qk,N−1

= Pk,N−1Xk,t,xN −1 + Pk,N−1EkXk,t,x

N −1 + Tk,N−1Xt,x,∗N −1

+ Tk,N−1EkXt,x,∗N −1 + πk,N−1 .

We now calculate Zk,t,xN −2 . Note that

ATk,N−2EN −2Z

k,t,xN −1

= ATk,N−2Pk,N−1Ak,N−2X

k,t,xN −2

+(AT

k,N−2Pk,N−1Ak,N−2 + ATk,N−2 Pk,N−1Ak,N−2

)× EkXk,t,x

N −2 +[AT

k,N−2Pk,N−1Bk,N−2ΨN −2

+ ATk,N−2Tk,N−1

(AN −2,N −2 + BN −2,N −2ΨN −2

)]Xt,x,∗

N −2

+[AT

k,N−2Pk,N−1Bk,N−2ΨN −2

+ ATk,N−2 Pk,N−1Bk,N−2ΨN −2

+ ATk,N−2 Tk,N−1

(AN −2,N −2 + BN −2,N −2ΨN −2

)]× EkXt,x,∗

N −2 + ATk,N−2Pk,N−1

(Bk,N−2αN −2 + fk,N−2

)+ AT

k,N−2Tk,N−1(BN −2,N −2αN −2 + fN −2,N −2

)+ AT

k,N−2πk,N−1

and similar expressions for CTk,N−2EN −2(Z

k,t,xN −1 wN −2), AT

k,N−2

EkZk,t,xN −1 , and CT

k,N−2Ek

(Zk,t,x

N −1 wN −2). Then, from (18)

we have

Zk,t,xN −2

={

Qk,N−2 + ATk,N−2Pk,N−1Ak,N−2

+p∑

i,j=1

γijN −1(C

ik,N−2)

T Pk,N−1Cjk,N−2

}Xk,t,x

N −2

+{

Qk,N−2 + ATk,N−2Pk,N−1Ak,N−2

+ ATk,N−2 Pk,N−1Ak,N−2 + AT

k,N−2Pk,N−1Ak,N−2

+p∑

i,j

γijN −1

[(Ci

k,N−2)T Pk,N−1C

jk,N−2

+ (Cik,N−2)

T Pk,N−1Cjk ,N−2

]}EkXk,t,x

N −2

+{

ATk,N−2Pk,N−1Bk,N−2ΨN −2

+ ATk,N−2Tk,N−1

(AN −2,N −2 + BN −2,N −2ΨN −2

}

+p∑

i,j=1

γijN −2

[(Ci

k,N−2)T Pk,N−1D

jk,N−2ΨN −1

+ (Cik,N−2)

T Tk,N−1(Cj

N −2,N −2 + DjN −2,N −2ΨN −2

)]}× Xt,x,∗

N −2 +{

ATk,N−2Pk,N−1Bk,N−2ΨN −2

+ ATk,N−2 Pk,N−1Bk,N−2ΨN −2

+ ATk,N−2 Tk,N−1

(AN −2,N −2 + BN −2,N −2ΨN −2

)+

p∑i,j=1

γijN −2(C

ik,N−2)

T Pk,N−1Djk,N−2ΨN −2

+ ATk,N−2Pk,N−1Bk,N−2ΨN −2 + AT

k,N−2Tk,N−1

×(AN −2,N −2 + BN −2,N −2ΨN −2

)+

p∑i,j=1

γijN −2

[(Ci

k,N−2)T Pk,N−1Dj

k ,N−2ΨN −1

+ (Cik,N−2)

T Tk,N−1(Cj

N −2,N −2 + DjN −2,N −2ΨN −2

)]}× EkXt,x,∗

N −2 + ATk,N−2Pk,N−1

(Bk,N−2αN −2 + fk,N−2

)+ AT

k,N−2Tk,N−1(BN −2,N −2αN −2 + fN −2,N −2

)+

p∑i,j=1

γijN −2

[(Ci

k ,N−2)T Pk,N−1

(Dj

k ,N−2αN −2 + djk,N−2

)

+ (Cik ,N−2)

T Tk,N−1(Dj

N −2,N −2αN −2 + djN −2,N −2

)]+ AT

k,N−2πk,N−2 + qk,N−2

= Pk,N−2Xk,t,xN −2 + Pk,N−2EkXk,t,x

N −2 + Tk,N−2Xt,x,∗N −2

+ Tk,N−2EkXt,x,∗N −2 + πk,N−2 .

By deduction, we achieve the conclusion. �


D. Proof of Theorem III.3

i)⇒ii). Let ut,x,∗ be the one that satisfies the condition (17).Noting that Xk,t,x

k = Xt,x,∗k of (19), we have

Xk,t,xk+1 = Xt,x,∗

k+1 , k ∈ Tt (60)

as

ZN −1,t,xN = GN −1X

N −1,t,xN + GN −1EN −1X

N −1,t,xN + gN −1

we have

EN −1ZN −1,t,xN

= GN −1AN −1,N −1Xt,x,∗N −1

+ GN −1BN −1,N −1ut,x,∗N −1

+ GN −1fN −1,N −1 + gN −1 . (61)

Similarly, it holds that

EN −1(ZN −1,t,xN wi

N −1)

=p∑

j=1

γijN −1GN −1Cj

N −1,N −1Xt,x,∗N −1

+p∑

j=1

γijN −1GN −1Dj

N −1,N −1ut,x,∗N −1

+p∑

j=1

γijN −1GN −1d

jN −1,N −1 . (62)

From (17), (61), and (62), we have

0 = WN −1ut,x,∗N −1 + HN −1X

t,x,∗N −1 + βN −1 (63)

where⎧⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎨⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎪⎩

WN −1 = RN −1,N −1 + BTN −1,N −1GN −1BN −1,N −1

+∑p

i,j=1 γijN −1(Di

N −1,N −1)T GN −1Dj

N −1,N −1

HN −1 = BTN −1,N −1GN −1AN −1,N −1

+∑p

i,j=1 γijN −1(Di

N −1,N −1)T GN −1Cj

N −1,N −1

βN −1 = BTN −1,N −1

[GN −1fN −1,N −1 + gN −1

]+∑p

i,j=1 γijN −1(Di

N −1,N −1)T GN −1d

jN −1,N −1

+ ρN −1,N −1 .

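Note that $X^{t,x,*}_{N-1}$ is not influenced by $u^{t,x,*}_{N-1}$. From Lemma III.4, $u^{t,x,*}_{N-1}$ can be selected as

$$
u^{t,x,*}_{N-1} = -W^{\dagger}_{N-1}H_{N-1}X^{t,x,*}_{N-1} - W^{\dagger}_{N-1}\beta_{N-1} \triangleq \Psi_{N-1}X^{t,x,*}_{N-1} + \alpha_{N-1} \tag{64}
$$

and

$$
\big(I - W_{N-1}W^{\dagger}_{N-1}\big)\big(H_{N-1}X^{t,x,*}_{N-1} + \beta_{N-1}\big) = 0
$$

holds, which is equivalent to

$$
H_{N-1}X^{t,x,*}_{N-1} + \beta_{N-1} \in \mathrm{Ran}(W_{N-1}).
$$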
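To make the selection (64) and the range condition concrete, consider a hypothetical two-input case in which $W_{N-1}$ is singular. The numerical values below are chosen purely for illustration and are not taken from the paper.

$$
W_{N-1} = \begin{bmatrix} 2 & 0 \\ 0 & 0 \end{bmatrix}, \qquad
W^{\dagger}_{N-1} = \begin{bmatrix} 1/2 & 0 \\ 0 & 0 \end{bmatrix}, \qquad
I - W_{N-1}W^{\dagger}_{N-1} = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix}.
$$

If $H_{N-1}X^{t,x,*}_{N-1} + \beta_{N-1} = (4, 0)^T$, then $(I - W_{N-1}W^{\dagger}_{N-1})(4, 0)^T = 0$, so the range condition holds, and (64) gives

$$
u^{t,x,*}_{N-1} = -W^{\dagger}_{N-1}\begin{bmatrix} 4 \\ 0 \end{bmatrix} = \begin{bmatrix} -2 \\ 0 \end{bmatrix}, \qquad
W_{N-1}u^{t,x,*}_{N-1} + \begin{bmatrix} 4 \\ 0 \end{bmatrix} = \begin{bmatrix} -4 + 4 \\ 0 \end{bmatrix} = 0,
$$

which is exactly (63). If instead $H_{N-1}X^{t,x,*}_{N-1} + \beta_{N-1} = (4, 1)^T$, then $(I - W_{N-1}W^{\dagger}_{N-1})(4, 1)^T = (0, 1)^T \neq 0$: the second row of (63) would read $0 + 1 = 0$, so no control can satisfy it.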

Moving to the case $k = N-2$ and proceeding by backward induction, we then obtain (28) and (33).

ii)⇒i). Let $X^{t,x,*}$ and $u^{t,x,*}$ be given by (32) and (33). From (28), we have

$$
0 = W_k u^{t,x,*}_k + H_k X^{t,x,*}_k + \beta_k, \quad k \in \mathbb{T}_t. \tag{65}
$$

Furthermore, by (33) and Lemma III.3, condition (25), or equivalently (34), holds. Similarly to (61) and (62), we have

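$$
\begin{aligned}
\mathbb{E}_k Z^{k,t,x}_{k+1} &= (P_{k,k+1} + T_{k,k+1})A_{k,k}X^{t,x,*}_k + (P_{k,k+1} + T_{k,k+1})B_{k,k}u^{t,x,*}_k \\
&\quad + (P_{k,k+1} + T_{k,k+1})f_{k,k} + \pi_{k,k+1}
\end{aligned} \tag{66}
$$

and

$$
\begin{aligned}
\mathbb{E}_k\big(Z^{k,t,x}_{k+1}w^i_k\big) &= \sum_{j=1}^{p}\gamma^{ij}_k\big(P_{k,k+1} + T_{k,k+1}\big)C^j_{k,k}X^{t,x,*}_k + \sum_{j=1}^{p}\gamma^{ij}_k\big(P_{k,k+1} + T_{k,k+1}\big)D^j_{k,k}u^{t,x,*}_k \\
&\quad + \sum_{j=1}^{p}\gamma^{ij}_k\big(P_{k,k+1} + T_{k,k+1}\big)d^j_{k,k}.
\end{aligned} \tag{67}
$$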

Combining (65)–(67), we have the stationary condition (17). □

REFERENCES

[1] M. Ait Rami, X. Chen, and X. Y. Zhou, “Discrete-time indefinite LQcontrol with state and control dependent noises,” J. Global Optim., vol. 23,pp. 245–265, 2002.

[2] S. Basak and G. Chabakauri, “Dynamic mean-variance asset allocation,”Review Financial Studies, vol. 23, pp. 2970–3016, 2010.

[3] T. Basar and G. J. Olsder, Dynamic Noncooperative Game Theory (Clas-sics in Applied Mathematics), 2nd ed. Philadelphia, PA, USA: SIAM,1999.

[4] A. Bensoussan, J. Frehse, and P. Yam, Mean Field Games and Mean FieldType Control Theory. New York, NY, USA: Springer, 2013.

[5] T. Bjork and A. Murgoci, “A general theory of Markovian time incon-sisitent stochastic control problems,” 2010, doi: 10.2139/ssrn.1694759.[Online]. Available: http://ssrn.com/abstract=1694759

[6] T. Bjork, A. Murgoci, and X. Y. Zhou, “Mean-variance portfolio optimiza-tion with state dependent risk aversion,” Math. Finance, vol. 24, pp. 1–24,2014.

[7] S. A. Buser, “Mean-variance portfolio with either a singular or nonsingu-lar variance-covariance matrix,” J. Financial Quantitative Anal., vol. 12,pp. 347–361, 1977.

[8] S. Chen, X. Li, and X. Y. Zhou, “Stochastic linear quadratic regulators withindefinite control weight costs,” SIAM J. Control Optim., 1998, vol. 36,pp. 1685–1702.

[9] X. Cui, D. Li, and X. Li, “Mean-variance policy, time consistency inefficiency and minimum-variance signed supermartingale measure fordiscrete-time cone constrained markets,” Math. Finance, vol. 27, no. 2,pp. 471–504, 2017.

[10] X. Cui, D. Li, S. Wang, and S. Zhu, “Better than dynamic meanvariance:Time inconsistency and free cash flow stream,” Math. Finance, vol. 22,pp. 346–378, 2012.

[11] X. Cui, X. Li, and D. Li, “Unified framework for optimal multi-periodmean-variance portfolio selection under mean-field formulation,” IEEETrans. Autom. Control, vol. 59, no. 7, pp. 1833–1844, Jul. 2014.

[12] I. Ekeland and A. Lazrak, “Being serious about non-commitment: Sub-game perfect equilibrium in continuous time,” 2008. [Online]. Available:http://arxiv.org/abs/math/0604264

[13] I. Ekeland and T. A. Privu, “Investment and consumption without com-mitment,” Math. Financial Econ., vol. 2, no. 1, pp. 57–86, 2008.

Page 16: Time-Inconsistent Mean-Field Stochastic LQ Problem: Open

2786 IEEE TRANSACTIONS ON AUTOMATIC CONTROL, VOL. 63, NO. 9, SEPTEMBER 2018

[14] R. J. Elliott, X. Li, and Y. H. Ni, “Discrete time mean-field stochasticinear-quadratic optimal control problems,” Automatica, vol. 49, no. 11,pp. 3222–3233, 2013.

[15] S. M. Goldman, “Consistent plan,” Rev. Econ. Studies, vol. 47, pp. 533–537, 1980.

[16] M. Huang, P. E. Caines, and R. P. Malhame, “Large-population cost-coupled LQG problems with nonuniform agents: individual-mass behaviorand decentralized ε-Nash equilibria,” IEEE Trans. Autom. Control, vol. 52,no. 9, pp. 1560–1571, Sep. 2007.

[17] Y. Hu, H. Jin, and X. Y. Zhou, “Time-inconsistent stochastic linear-quadratic control,” SIAM J. Control Optim., vol. 50, pp. 1548–1572, 2012.

[18] Y. Hu, H. Jin, and X. Y. Zhou, “Time-inconsistent stochastic linear-quadratic control: Characterization and uniqueness of equilibrium,” SIAMJ. Control Optim., vol. 50, no. 3, pp. 1548–1572, 2017.

[19] P. Krusell and A. A. Smith, “Consumption and savings decisions withquasi-georetric discounting,” Econometrica, vol. 71, no. 1, pp. 365–375,2003.

[20] D. Laibson, “Golden eggs and hyperbolic discounting,” Quart. J. Econ.,vol. 112, pp. 443–477, 1997.

[21] J. M. Lasry and P. L. Lions, “Mean-field games,” Japn. J. Math., vol. 2,pp. 229–260, 2007.

[22] D. Li and W. L. Ng, “Optimal dynamic portfolio selection: Multi-periodmean-variance formulation,” Math. Finance, vol. 10, pp. 387–406, 2000.

[23] X. Li, Y. H. Ni, and J. F. Zhang, “On time-consistent solution to time-inconsistent linear-quadratic optimal control of discrete-time stochasticsystems,” arXiv:1703.01942.

[24] H. Markowitz, “Portfolio selection,” J. Finance, vol. 7, pp. 77–91, 1952.[25] Y. H. Ni, J. F. Zhang, and X. Li, “Indefinite mean-field stochastic linear-

quadratic optimal control,” IEEE Trans. Autom. Control, vol. 60, no. 7,pp. 1786–1800, Jul. 2015.

[26] I. Palacios-Huerta, “Time-inconsistent preferences in Adam Smith andDavis Hume,” Hist. Political Econ., vol. 35, pp. 391–401, 2003.

[27] P. J. Ryan and J. Lefoll, “A comment on mean-variance portfolio witheither a singlar or nonsingular variance-covariance matrix,” J. FinancialQuant. Anal., vol. 16, pp. 389–395, 1981.

[28] A. Smith, The Theory of Moral Sentiments, 1st ed., 1759 (Reprint, London,U.K.: Oxford Univ. Press, 1976).

[29] R. H. Strotz, “Myopia and inconsistency in dynamic utility maximization,”Rev. Econ. Studies, vol. 23, pp. 165–180, 1955.

[30] H. Y. Wang and Z. Wu, “Time-inconsistent optimal control problem withrandom coefficients and stochastic equilibrium HJB equation,” Math. Con-trol Related Fields, vol. 5, no. 3, pp. 651–678, 2015.

[31] J. M. Yong and X. Y. Zhou, Stochastic Controls: Hamiltonian Systemsand HJB Equations. New York, NY, USA: Springer, 1999.

[32] J. M. Yong, “A deterministic linear quadratic time-inconsitent optimalcontrol problem,” Math. Control Related Fields, vol. 1, no. 1, pp. 83–118,2011.

[33] J. M. Yong, “Time-inconsistent optimal control problems and the equilib-rium HJB equation,” Math. Control Related Fields, vol. 2, no. 3, pp. 271–329.

[34] J. M. Yong, “Deterministic time-inconsistent optimal control problems—An essentially cooperative approach,” Acta Mathematicae ApplicataeSinica, vol. 28, pp. 1–20, 2012.

[35] J. M. Yong, “A linear-quadratic optimal control problem for mean-field stochastic differential equations,” SIAM J. Control Optim., vol. 51,pp. 2809–2838, 2013.

[36] J. M. Yong, Differential Games—A Concise Introduction. Singapore:World Scientific, 2015.

[37] J. M. Yong, “Linear-quadratic optimal control problems for mean-field stochastic differential equations—Time-consistent solutions,” Trans.Amer. Math. Soc., vol. 369, pp. 5467–5523, 2017.

[38] X. Y. Zhou and D. Li, “Continuous-time mean-variance portfolio selection:A stochastic LQ framework,” Appl. Math. Optim., vol. 42, no. 1, pp. 19–33,2000.

Yuan-Hua Ni received the Ph.D. degree in operations research and cybernetics from the Chinese Academy of Sciences, Beijing, China, in 2010.

He is currently with the College of Computer and Control Engineering, Nankai University (NKU), Tianjin, China, as an Associate Professor. Before joining NKU, he was with the Tianjin Polytechnic University, Tianjin, China. From April 2014 to May 2015, he was a Visiting Scholar with the University of California, San Diego, CA, USA. His current research interests include stochastic systems and stochastic control.

Ji-Feng Zhang (M’92–SM’97–F’14) received the B.S. degree in mathematics from Shandong University, Jinan, China, in 1985 and the Ph.D. degree from the Institute of Systems Science (ISS), Chinese Academy of Sciences (CAS), Beijing, China, in 1991.

Since 1985, he has been with the ISS, CAS, and currently is the Director of ISS. He is the Editor-in-Chief of Journal of Systems Science and Mathematical Sciences and All About Systems and Control, and the Deputy Editor-in-Chief of Science China Information Sciences and Systems Engineering-Theory and Practice. He was a Managing Editor of the Journal of Systems Science and Complexity, the Deputy Editor-in-Chief of Acta Automatica Sinica and Control Theory and Applications, and an Associate Editor of several other journals, including the IEEE TRANSACTIONS ON AUTOMATIC CONTROL, SIAM Journal on Control and Optimization, etc. His current research interests include system modeling, adaptive control, stochastic systems, and multiagent systems.

Dr. Zhang is an IFAC Fellow and an Academician of the International Academy for Systems and Cybernetic Sciences. He twice received the Second Prize of the State Natural Science Award of China, in 2010 and 2015, the Distinguished Young Scholar Fund from National Natural Science Foundation of China in 1997, the First Prize of the Young Scientist Award of CAS in 1995, and the Outstanding Advisor Award of CAS in 2007, 2008, and 2009. He worked as a Vice-Chair of the IFAC Technical Board; member of the Board of Governors, IEEE Control Systems Society; Convenor of Systems Science Discipline, Academic Degree Committee of the State Council of China; Vice President of the Systems Engineering Society of China, and the Chinese Association of Automation. He was the General Co-Chair of the 33rd and the 36th Chinese Control Conferences; IPC Chair of the 2012 IEEE Conference on Control Applications, the 9th World Congress on Intelligent Control and Automation, and the 17th IFAC Symposium on System Identification; and is an IPC Vice-Chair of the 20th IFAC World Congress, etc.

Miroslav Krstic (S’92–M’95–SM’99–F’02) received the Dipl. Ing. degree from University of Belgrade, Electrical Engineering, Yugoslavia in 1989 and the M.S. and Ph.D. degrees from University of California, Santa Barbara, Electrical Engineering in 1992 and 1994, respectively. Mr. Krstic holds the Alspach Endowed Chair and is the Founding Director with the Cymer Center for Control Systems and Dynamics, University of California, San Diego (UCSD), La Jolla, CA, USA. He also serves as an Associate Vice Chancellor for Research at UCSD. He has coauthored eleven books on adaptive, nonlinear, and stochastic control, extremum seeking, control of PDE systems including turbulent flows, and control of delay systems.

Mr. Krstic was the recipient of the UC Santa Barbara best dissertation award and student best paper awards at CDC and ACC, as a graduate student. He is a Fellow of IFAC, ASME, SIAM, AAAS, and IET (U.K.), an Associate Fellow of AIAA, and foreign member of the Academy of Engineering of Serbia. He was the recipient of the PECASE, NSF Career, and ONR Young Investigator awards, the Axelby and Schuck paper prizes, the Chestnut textbook prize, the ASME Nyquist Lecture Prize, and the first UCSD Research Award given to an engineer. He has also been awarded the Springer Visiting Professorship at UC Berkeley, the Distinguished Visiting Fellowship of the Royal Academy of Engineering, the Invitation Fellowship of the Japan Society for the Promotion of Science, and the Honorary Professorships from the Northeastern University (Shenyang), Chongqing University, and Donghua University, China. He serves as a Senior Editor for the IEEE TRANSACTIONS ON AUTOMATIC CONTROL and Automatica, as an editor of two Springer book series, and has served as Vice President for Technical Activities of the IEEE Control Systems Society and as the Chair of the IEEE CSS Fellow Committee.