16
1 Learning-based Attacks in Cyber-Physical Systems Mohammad Javad Khojasteh, Anatoly Khina, Massimo Franceschetti, and Tara Javidi Abstract—We introduce the problem of learning-based attacks in a simple abstraction of cyber-physical systems—the case of a discrete-time, linear, time-invariant plant that may be subject to an attack that overrides the sensor readings and the controller actions. The attacker attempts to learn the dynamics of the plant and subsequently override the controller’s actuation signal, to destroy the plant without being detected. The attacker can feed fictitious sensor readings to the controller using its estimate of the plant dynamics and mimic the legitimate plant operation. The controller, on the other hand, is constantly on the lookout for an attack; once the controller detects an attack, it immediately shuts the plant off. In the case of scalar plants, we derive an upper bound on the attacker’s deception probability for any measurable control policy when the attacker uses an arbitrary learning algorithm to estimate the system dynamics. We then derive lower bounds for the attacker’s deception probability for both scalar and vector plants by assuming a specific authentication test that inspects the empirical variance of the system disturbance. We also show how the controller can improve the security of the system by superimposing a carefully crafted privacy-enhancing signal on top of the “nominal control policy.” Finally, for nonlinear scalar dynamics that belong to the Reproducing Kernel Hilbert Space (RKHS), we investigate the performance of attacks based on nonlinear Gaussian-processes (GP) learning algortihms. Index Terms—Cyber-physical systems security, learning for dynamics and control, secure control, system identification, man- in-the-middle attack, physical-layer authentication. I. I NTRODUCTION Recent technological advances in wireless communications and computation, and their integration into networked control and cyber-physical systems (CPS), open the door to a myriad of new and exciting applications, including cloud robotics and automation [2]. However, the distributed nature of CPS is often a source of vulnerability. Security breaches in CPS can have catastrophic consequences ranging from hampering the economy by obtaining financial gain, to hijacking au- tonomous vehicles and drones, to terrorism by manipulating life-critical infrastructures [3]–[5]. Real-world instances of security breaches in CPS, that were discovered and made public, include the revenge sewage attack in Maroochy Shire, The material in this paper was presented in part at the 8th IFAC Workshop on Distributed Estimation and Control in Networked Systems, 2019 [1]. This research was partially supported by NSF awards CNS-1446891 and ECCS- 1917177. This work has received funding from the European Union’s Horizon 2020 research and innovation programme under the Marie Sklodowska-Curie grant agreement No 708932. M. J. Khojasteh is with the Department of Electrical Engineering, Cal- ifornia Institute of Technology, Pasadena, CA 91125, USA. Some of this work was performed while at University of California, San Diego (e-mail: [email protected]). A. Khina is with the School of Electrical Engineering, Tel AvivUniversity, Tel Aviv, Israel 6997801 (e-mail: [email protected]). M. Franceschetti, and T. Javidi are with the Department of Electrical and Computer Engineering, University of California, San Diego, La Jolla, CA 92093, USA (e-mails: {mfranceschetti, tjavidi}@eng.ucsd.edu). 
Australia; the Ukraine power grid cyber-attack; the German steel mill cyber-attack; the Davis-Besse nuclear power plant attack in Ohio, USA; and the Iranian uranium-enrichment facility attack via the Stuxnet malware [6]. Studying and pre- venting such security breaches via control-theoretic methods has received a great deal of attention in recent years [7]–[23]. An important and widely studied class of attacks in CPS is based on the “man-in-the-middle” (MITM) paradigm [24]: an attacker takes over the sensor signals of the physical plant and mimics stable and safe operation. At the same time, the attacker also overrides the control signals with malicious inputs to push the plant toward a catastrophic trajectory. It follows that CPS must constantly monitor the plant outputs and look for anomalies in the fake sensor signals to detect such attacks. The attacker, on the other hand, aims to override the sensor readings in a manner that would be indistinguishable from the legitimate ones. The MITM attack has been extensively studied in two special cases [24]–[28]. The first case is the replay attack, in which the attacker observes and records the legitimate system behavior for a given time window and then replays this recording periodically at the controller’s input [25]–[27]. The second case is the statistical-duplicate attack, which assumes that the attacker has acquired complete knowledge of the dynamics and parameters of the system, and can construct arbitrarily long fictitious sensor readings that are statistically identical to the actual signals [24], [28]. The replay attack assumes no knowledge of the system parameters—and as a consequence, it is relatively easy to detect. An effective way to counter the replay attack consists of superimposing a random watermark signal, unknown to the attacker, on top of the control signal [29]–[33]. The statistical-duplicate attack assumes full knowledge of the system dynamics—and as a consequence, it requires a more sophisticated detection procedure, as well as additional assumptions on the attacker or controller behavior to ensure it can be detected. To combat the attacker’s full knowledge, the controller may adopt moving target [34]–[37] or baiting [38], [39] techniques. Alternatively, the controller may introduce private randomness in the con- trol input using watermarking [28]. In this scenario, a vital assumption is made: although the attacker observes the true sensor readings, it is barred from observing the control actions, as otherwise it would be omniscient and undetectable. Our contributions are as follows. First, we observe that in many practical situations the attacker does not have full knowledge of the system and cannot simulate a statistically indistinguishable copy of the system. On the other hand, the attacker can carry out more sophisticated attacks than simply replaying previous sensor readings, by attempting to “learn” the system dynamics from the observations. For this reason, we study learning-based attacks, in which the attacker attempts

Learning-based Attacks in Cyber-Physical Systemsacsweb.ucsd.edu/~mkhojast/Papers/Authentication-learning-based.pdf · in-the-middle attack, physical-layer authentication. I. INTRODUCTION

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Learning-based Attacks in Cyber-Physical Systemsacsweb.ucsd.edu/~mkhojast/Papers/Authentication-learning-based.pdf · in-the-middle attack, physical-layer authentication. I. INTRODUCTION

1

Learning-based Attacks in Cyber-Physical SystemsMohammad Javad Khojasteh, Anatoly Khina, Massimo Franceschetti, and Tara Javidi

Abstract—We introduce the problem of learning-based attacksin a simple abstraction of cyber-physical systems—the case of adiscrete-time, linear, time-invariant plant that may be subject toan attack that overrides the sensor readings and the controlleractions. The attacker attempts to learn the dynamics of the plantand subsequently override the controller’s actuation signal, todestroy the plant without being detected. The attacker can feedfictitious sensor readings to the controller using its estimate of theplant dynamics and mimic the legitimate plant operation. Thecontroller, on the other hand, is constantly on the lookout for anattack; once the controller detects an attack, it immediately shutsthe plant off. In the case of scalar plants, we derive an upperbound on the attacker’s deception probability for any measurablecontrol policy when the attacker uses an arbitrary learningalgorithm to estimate the system dynamics. We then derive lowerbounds for the attacker’s deception probability for both scalarand vector plants by assuming a specific authentication test thatinspects the empirical variance of the system disturbance. We alsoshow how the controller can improve the security of the systemby superimposing a carefully crafted privacy-enhancing signalon top of the “nominal control policy.” Finally, for nonlinearscalar dynamics that belong to the Reproducing Kernel HilbertSpace (RKHS), we investigate the performance of attacks basedon nonlinear Gaussian-processes (GP) learning algortihms.

Index Terms—Cyber-physical systems security, learning fordynamics and control, secure control, system identification, man-in-the-middle attack, physical-layer authentication.

I. INTRODUCTION

Recent technological advances in wireless communicationsand computation, and their integration into networked controland cyber-physical systems (CPS), open the door to a myriadof new and exciting applications, including cloud roboticsand automation [2]. However, the distributed nature of CPSis often a source of vulnerability. Security breaches in CPScan have catastrophic consequences ranging from hamperingthe economy by obtaining financial gain, to hijacking au-tonomous vehicles and drones, to terrorism by manipulatinglife-critical infrastructures [3]–[5]. Real-world instances ofsecurity breaches in CPS, that were discovered and madepublic, include the revenge sewage attack in Maroochy Shire,

The material in this paper was presented in part at the 8th IFAC Workshopon Distributed Estimation and Control in Networked Systems, 2019 [1]. Thisresearch was partially supported by NSF awards CNS-1446891 and ECCS-1917177. This work has received funding from the European Union’s Horizon2020 research and innovation programme under the Marie Skłodowska-Curiegrant agreement No 708932.

M. J. Khojasteh is with the Department of Electrical Engineering, Cal-ifornia Institute of Technology, Pasadena, CA 91125, USA. Some of thiswork was performed while at University of California, San Diego (e-mail:[email protected]).

A. Khina is with the School of Electrical Engineering, Tel Aviv University,Tel Aviv, Israel 6997801 (e-mail: [email protected]).

M. Franceschetti, and T. Javidi are with the Department ofElectrical and Computer Engineering, University of California, SanDiego, La Jolla, CA 92093, USA (e-mails: mfranceschetti,[email protected]).

Australia; the Ukraine power grid cyber-attack; the Germansteel mill cyber-attack; the Davis-Besse nuclear power plantattack in Ohio, USA; and the Iranian uranium-enrichmentfacility attack via the Stuxnet malware [6]. Studying and pre-venting such security breaches via control-theoretic methodshas received a great deal of attention in recent years [7]–[23].

An important and widely studied class of attacks in CPS isbased on the “man-in-the-middle” (MITM) paradigm [24]: anattacker takes over the sensor signals of the physical plantand mimics stable and safe operation. At the same time,the attacker also overrides the control signals with maliciousinputs to push the plant toward a catastrophic trajectory. Itfollows that CPS must constantly monitor the plant outputsand look for anomalies in the fake sensor signals to detect suchattacks. The attacker, on the other hand, aims to override thesensor readings in a manner that would be indistinguishablefrom the legitimate ones.

The MITM attack has been extensively studied in twospecial cases [24]–[28]. The first case is the replay attack,in which the attacker observes and records the legitimatesystem behavior for a given time window and then replays thisrecording periodically at the controller’s input [25]–[27]. Thesecond case is the statistical-duplicate attack, which assumesthat the attacker has acquired complete knowledge of thedynamics and parameters of the system, and can constructarbitrarily long fictitious sensor readings that are statisticallyidentical to the actual signals [24], [28]. The replay attackassumes no knowledge of the system parameters—and asa consequence, it is relatively easy to detect. An effectiveway to counter the replay attack consists of superimposinga random watermark signal, unknown to the attacker, ontop of the control signal [29]–[33]. The statistical-duplicateattack assumes full knowledge of the system dynamics—andas a consequence, it requires a more sophisticated detectionprocedure, as well as additional assumptions on the attackeror controller behavior to ensure it can be detected. To combatthe attacker’s full knowledge, the controller may adopt movingtarget [34]–[37] or baiting [38], [39] techniques. Alternatively,the controller may introduce private randomness in the con-trol input using watermarking [28]. In this scenario, a vitalassumption is made: although the attacker observes the truesensor readings, it is barred from observing the control actions,as otherwise it would be omniscient and undetectable.

Our contributions are as follows. First, we observe thatin many practical situations the attacker does not have fullknowledge of the system and cannot simulate a statisticallyindistinguishable copy of the system. On the other hand, theattacker can carry out more sophisticated attacks than simplyreplaying previous sensor readings, by attempting to “learn”the system dynamics from the observations. For this reason, westudy learning-based attacks, in which the attacker attempts

Page 2: Learning-based Attacks in Cyber-Physical Systemsacsweb.ucsd.edu/~mkhojast/Papers/Authentication-learning-based.pdf · in-the-middle attack, physical-layer authentication. I. INTRODUCTION

2

to learn a model of the plant dynamics, and show that theycan outperform replay attacks on linear systems by providinga lower bound on the attacker’s deception probability usinga specific learning algorithm. Secondly, we derive a conversebound on the attacker’s deception probability in the specialcase of scalar systems. This holds for any (measurable) controlpolicy, and for any learning algorithm that may be used bythe attacker to estimate the dynamics of the plant. Thirdly,for any learning algorithm utilized by the attacker to estimatethe dynamics of the plant, we show that adding a properprivacy-enhancing signal to the “nominal control policy” canlower the deception probability. Finally, we offer a treatmentfor nonlinear scalar dynamics that belong to a ReproducingKernel Hilbert Space (RKHS), by studying the performance ofa nonlinear attack based on machine-learning GP algorithms.

Throughout the paper, we assume that the attacker hasfull access to both sensor and control signals. The controller,on the other hand, has perfect knowledge of the systemdynamics and tries to discover the attack from the observationsthat are maliciously injected by the attacker. This assumedinformation-pattern imbalance between the controller and theattacker is justified since the controller is tuned in muchlonger than the attacker and thus has knowledge of the systemdynamics to a far greater precision than the attacker. On theother hand, the attacker can completely hijack the sensor andcontrol signals that travel through a communication networkthat has been compromised. Since in our setting the success orfailure of the attacker is dictated by its learning capabilities,our work is also related to recent studies in learning-basedcontrol [40]–[48].

The learning-based attacks are also related to the Known-Plaintext Attacks (KPA), introduced in [49], in linear systemswith linear controllers. Using pole–zero analysis from classicalsystem identification, [49] investigate necessary and sufficientconditions for which the system is identifiable for an attacker,and as a result, vulnerable against KPA. To combat KPA,[49] utilizes low-rank linear controllers that trade controlperformance for security.

The outline of the rest of the paper is as follows. Thenotations used in this work are detailed in Sec. I-A. For ease ofexposition, we start by presenting the problem for the specialcase of scalar linear plants in Sec. II, and present our mainresults for this case in Sec. III. We then extend the model andtreatment to vector linear and scalar nonlinear plants in Secs.IV and V, respectively. We conclude the paper in Sec. VI witha discussion of future directions. Due to space constraints, allthe proofs are available online in [50].

A. Notation

Throughout the paper, we use the following notation. Wedenote by N the set of natural numbers, and by R—the set ofreal numbers. All logarithms, denoted by log, are base 2. Wedenote by ‖·‖ the Euclidean norm of a vector, and by ‖·‖op—the operator norm induced by it when applied to a matrix.We denote by † the transpose operation of a matrix. For tworeal-valued functions g and h, g(x) = O (h(x)) as x → x0

means lim supx→x0|g(x)/h(x)| < ∞, and g(x) = o (h(x))

as x → x0 means limx→x0|g(x)/h(x)| = 0. We denote by

(a) Learning: During this phase, the attacker eavesdrops and learns thesystem, without altering the input signal to the controller (Yk = Xk).

(b) Hijacking: During this phase, the attacker hijacks the system andintervenes as a MITM in two places: acting as a fake plant for thecontroller (Yk = Vk) by impersonating the legitimate sensor, and asa malicious controller (Uk) for the plant aiming to destroy the plant.

Fig. 1: System model during learning-based attack phases.

xji = (xi, · · · , xj) the realization of the tuple of randomvariables Xj

i = (Xi, · · · , Xj) for i, j ∈ N, i ≤ j. Randommatrices are represented by boldface capital letters (e.g. A) andtheir realizations are represented by typewriter boldface letters(e.g. A). A B means that A−B is a positive semidefinitematrix, namely is the Loewner order of Hermitian matrices.λmax(A) denotes the largest eigenvalue of the matrix A. Wedenote by Xji = (Xi, · · · ,Xj) the realization of the tuple ofrandom vectors Xji = (Xi, · · · ,Xj) for i, j ∈ N, i ≤ j.PX denotes the distribution of the random vector X withrespect to (w.r.t) probability measure P, whereas fX denotesits probability density function (PDF) w.r.t. to the Lebesguemeasure, if it has one. An event is said to happen almost surely(a.s.) if it occurs with probability one. For real numbers a andb, a b means a is much less than b, in some numericalsense, while for probability distributions P and Q, P Qmeans P is absolutely continuous w.r.t. Q. dP/dQ denotesthe Radon–Nikodym derivative of P w.r.t. Q. The Kullback–Leibler (KL) divergence between probability distributions PXand PY is defined as

D(PX‖PY ) ,

EPX

[log

dPXdPY

], PX PY ;

∞, otherwise,

where EPXdenotes the expectation w.r.t. probability mea-

sure PX . The conditional KL divergence between probabilitydistributions PY |X and QY |X averaged over PX is definedas D

(PX|Y

∥∥QY |X ∣∣PX) , EPX

[D(PY |X=X

∥∥∥QY |X=X

)],

where (X, X) are independent and identically distributed(i.i.d.). The mutual information between random variablesX and Y is defined as I(X;Y ) , D (PXY ‖PXPY ). Theconditional mutual information between random variables X

Page 3: Learning-based Attacks in Cyber-Physical Systemsacsweb.ucsd.edu/~mkhojast/Papers/Authentication-learning-based.pdf · in-the-middle attack, physical-layer authentication. I. INTRODUCTION

3

and Y given random variable Z is defined as I(X;Y |Z) ,

EPZ

[I(X;Y |Z = Z)

], where (Z, Z) are i.i.d.

II. PROBLEM SETUP

We consider the networked control system depicted inFig. 1, where the plant dynamics are described by a scalar,discrete-time, linear time-invariant (LTI) system

Xk+1 = aXk + Uk +Wk, (1)

where Xk, a, Uk, Wk are real numbers representing the plantstate, open-loop gain of the plant, control input, and plantdisturbance, respectively, at time k ∈ N. The controller, attime k, observes Yk and generates a control signal Uk as afunction of Y k1 . If the attacker does not tamper sensor reading,at any time k ∈ N, we have Yk = Xk. We assume that theinitial condition X0 has a known (to all parties) distributionand is independent of the disturbance sequence Wk. Foranalytical purposes, we assume that the process Wk hasi.i.d. Gaussian samples of zero mean and variance σ2 that isknown to all parties. We assume, without loss of generality,that W0 = 0, E [X0] = 0, and take U0 = 0. Moreover, tosimplify the notation, let Zk , (Xk, Uk) denote the state-and-control input at time k and its trajectory up to time k—by

Zk1 , (Xk1 , U

k1 ).

The controller is equipped with a detector that tests foranomalies in the observed history Y k1 . When the controllerdetects an attack, it shuts the system down and prevents theattacker from causing further “damage” to the plant. Thecontroller/detector is aware of the plant dynamics (1) andknows the open-loop gain a of the plant. On the other hand,the attacker knows the plant dynamics (1) as well as the plantstate Xk, and control input Uk (or equivalently, Zk) at time k(see Fig. 1). However, it does not know the open-loop gain a.

We assume the open-loop gain is fixed in time, but unknownto the attacker (as in frequentist approach [51]). Nevertheless,it will be convenient, for algorithm aid, to assume a priorover the open-loop gain of the plant, and treat it as a randomvariable A, that is fixed across time, whose PDF fA is knownto the attacker, and whose realization a is known to thecontroller (cf. Sec. VI-D). We assume all random variables toexist on a common probability space with probability measureP, and Uk to be a measurable function of Y k1 for all timek ∈ N. We also denote the probability measure conditioned onA = a by Pa. Namely, for any measurable event C, we define

Pa(C) = P(C|A = a).

A is assumed to be independent of X0 and Wk|k ∈ N.

A. Learning-based Attacks

We now define Learning-based attacks that consist of twodisjoint, consecutive, passive and active phases (cf. Sec. VI-C).

Phase 1: Learning. During this phase, the attacker passivelyobserves the control input and the plant state to learn the open-loop gain of the plant. As illustrated in Fig. 1a, for all k ∈[0, L], the attacker observes the control input Uk and the plant

state Xk, and tries to learn the open-loop gain a, where Lis the duration of the learning phase. We denote by A theattacker’s estimate of the open-loop gain a. •

Phase 2: Hijacking. In this phase, the attacker aims todestroy the plant via Uk while remaining undetected. As illus-trated in Fig. 1b, from time L+ 1 and onward the attacker hi-jacks the system and feeds a malicious control signal Uk to theplant and a fictitious sensor reading Yk = Vk to the controller.•

We assume that the attacker can use any arbitrary learningalgorithm to estimate the open-loop gain a during the learningphase, and upon estimation is completed, we assume thatduring the hijacking phase the fictitious sensor reading is con-structed, in a model-based manner (cf. Sec. VI-B), as follows

Vk+1 = AVk + Uk + Wk , k = L, . . . , T − 1, (2)

where Wk for k = L, . . . , T − 1 are i.i.d. Gaussian N (0, σ2);Uk is the control signal generated by the controller, which isfed with the fictitious virtual signal Vk by the attacker; VL =XL; and A is the estimate of the open-loop gain of the plantat the conclusion of Phase 1.

B. Detection

The controller/detector, being aware of the dynamic (1)and the open-loop gain a, attempts to detect possible attacksby testing for statistical deviations from the typical behaviorof the system (1). More precisely, under legitimate systemoperation (corresponding to the null hypothesis), the controllerobservation Yk behaves according to

Yk+1 − aYk − Uk(Y k1 ) ∼ i.i.d. N (0, σ2). (3)

In the case of an attack, during Phase 2 (k > L), (3) can berewritten as

Vk+1 − aVk − Uk = Vk+1 − aVk + AVk − AVk − Uk (4a)

= Wk +(A− a

)Vk, (4b)

where (4b) follows from (2). Hence, the estimation error (A−a) dictates the ease with which an attack can be detected.

Since the Gaussian PDF with zero mean is fully charac-terized by its variance, we shall follow [28], and test foranomalies in the latter, i.e., test whether the empirical varianceof (3) is equal to the second moment of the plant disturbanceE[W 2]. To that end, we shall use a test that sets a confidence

interval of length 2δ > 0 around the expected variance, i.e., itchecks whether

1

T

T∑k=1

[Yk+1 − aYk − Uk(Y k1 )

]2∈ (Var [W ]− δ,Var [W ] + δ),

(5)

where T is called the test time. That is, as is implied by(4), the attacker manages to deceive the controller and remainundetected if

1

T

(L∑k=1

W 2k +

T∑k=L+1

(Wk + (A− a)Vk)2

)∈ (Var [W ]− δ,Var [W ] + δ).

Page 4: Learning-based Attacks in Cyber-Physical Systemsacsweb.ucsd.edu/~mkhojast/Papers/Authentication-learning-based.pdf · in-the-middle attack, physical-layer authentication. I. INTRODUCTION

4

C. Performance Measures

Definition 1. The hijack indicator at test time T is defined as

ΘT ,

0, ∀j ≤ T : Yj = Xj ;

1, otherwise.

At the test time T , the controller uses Y T1 to construct anestimate ΘT of ΘT . More precisely, ΘT = 0 if (5) occurs,otherwise ΘT = 1. •

Definition 2. The probability of deception is the probabilityof the attacker deceiving the controller and remain undetectedat the time instant T

P a,TDec , Pa(

ΘT = 0∣∣∣ΘT = 1

); (6)

the detection probability at test time T is defined as

P a,TDet , 1− P a,TDec .

Likewise, the probability of false alarm is the probability ofdetecting the attacker when it is not present, namely

P a,TFA , Pa(

ΘT = 1∣∣∣ΘT = 0

). •

Applying Chebyshev’s inequality to (5) and noting that thesystem disturbances are i.i.d. Gaussian of variance σ2, we have

PTFA ≤Var[W 2]

δ2T=

3σ4

δ2T.

Further define the deception, detection, and false-alarm prob-abilities w.r.t. the probability measure P, without conditioningon A, and denote them by PTDec, PTDet, and PTFA, respectively.For instance, PTDet is defined, w.r.t. a PDF fA of A, as

PTDet , P(

ΘT = 1∣∣∣ΘT = 1

)=

∫ ∞−∞

P a,TDet fA(a)da. (7)

III. STATEMENT OF THE RESULTS

In this section, we describe our main results for the caseof scalar plants. We provide lower and upper bounds on thedeception probability (6) of the learning-based attack (2),where the estimate A in (2) may be constructed using anarbitrary learning algorithm. Our results are valid for anymeasurable control policy Uk. We find a lower bound onthe deception probability by characterizing what the attackercan at least achieve using a least-squares (LS) algorithm, andwe derive an information theoretic converse for any learningalgorithm using Fano’s inequality [52, Chs. 2.10 & 7.9]. Whileour analysis is restricted to the asymptotic case, T →∞, it isstraightforward to extend it to the non-asymptotic case.

For analytical purposes, we assume that the power of thefictitious sensor reading is equal to β−1 <∞, namely

limT→∞

1

T

T∑k=L+1

V 2k = 1/β a.s. w.r.t. Pa. (8)

Remark 1. Assuming the control policy is memoryless, namelyUk is only dependent on Yk, the process Vk is Markov fork ≥ L + 1. By further assuming that L = o(T ) and usingthe generalization of the law of large numbers for Markovprocesses [53], we deduce

limT→∞

1

T

T∑k=L+1

V 2k ≥ Var [W ] a.s. w.r.t. Pa.

Consequently, in this case we have β ≤ 1/Var [W ]. Inaddition, when the control policy is linear and stabilizes (2),that is Uk = −ΩYk and |A− Ω| < 1, it is easy to verify that(8) holds true for β = (1− (A− Ω)2)/Var [W ]. •

A. Lower Bound on the Deception Probability

To provide a lower bound on the deception probability P a,TDec ,we consider a specific estimate A at the conclusion of thefirst phase by the attacker. To this end, we use LS estimationdue to its efficiency and amenability to recursive updateover observed incremental data [43]–[45]. The LS algorithmapproximates the overdetermined system of equations

X2

X3

...XL

= A

X1

X2

...XL−1

+

U1

U2

...UL−1

,

by minimizing the Euclidean distance A = argminA

‖Xk+1 −AXk − Uk‖ to estimate (or “identify”) the plant, thesolution to which is

A =

∑L−1k=1 (Xk+1 − Uk)Xk∑L−1

k=1 X2k

a.s. w.r.t. Pa. (9)

Remark 2. Since we assumed Wk ∼ N (0, σ2) for all k ∈ N,Pa(Xk = 0) = 0. Thus, (9) is well-defined. •

Using LS estimation (9), it follows that any linear learning-based attack (2) achieves at least the asymptotic deceptionprobability stated in the following theorem, for any measur-able control policy; its proof is available in [50].

Theorem 1. Consider any linear learning-based attack (2)with fictitious sensor reading power that satisfies (8) and anarbitrary measurable control policy Uk. Then, the asymp-totic deception probability, when using the variance test (5),is bounded from below as

limT→∞

P a,TDec = Pa(|A− a| <

√δβ)

(10a)

≥ Pa

∣∣∣∑L−1

k=1 WkXk

∣∣∣∑L−1k=1 X

2k

<√δβ

(10b)

≥ 1− 2

(1 + δβ)L/2. (10c)

An example that compares the deception probabilities of thelearning-based and replay attacks as a function of the detectionwindow size is available in Ex. 2 in App. A.Remark 3. We can study the special case of Theorem 1 for alinear controller. Using the value of β calculated in Remark 1for a linear controller Uk = −ΩYk when |A−Ω| < 1, we canrewrite (10c) as

limT→∞

P a,TDec ≥ 1− 2

(1 + δ 1−(A−Ω)2

Var[W ] )L/2. (11)

Page 5: Learning-based Attacks in Cyber-Physical Systemsacsweb.ucsd.edu/~mkhojast/Papers/Authentication-learning-based.pdf · in-the-middle attack, physical-layer authentication. I. INTRODUCTION

5

In this case, for a fixed L, as Var [W ] increases the lowerbound in (11) decreases. The reduction of attacker’s successrate can be explained by noticing that LS estimation (9) isbased on minimizing ‖Xk+1 −AXk − Uk‖, and the precisionof this estimate decreases as Var [W ] increases. •

B. Upper Bound on the Deception Probability

We now derive an upper bound on the deception proba-bility (6) of any learning-based attack (2) where A in (2)is constructed using any arbitrary learning algorithm, forany measurable control policy, when A is distributed over asymmetric interval [−R,R]. Similar results can be obtained forother interval choices. Since the uniform distribution has thehighest entropy among all distributions with finite support [52,Ch. 12], we assume that A has a uniform prior over the interval[−R,R]. We further assume that the attacker knows thisdistribution (including the value of R), whereas the controllerknows the true realization of A (as before). The proof of thefollowing theorem is available in [50].

Theorem 2. Let A be distributed uniformly over [−R,R]for some R > 0, and consider any measurable controlpolicy Uk and any learning-based attack (2) with fictitioussensor reading power (8) that satisfies

√δβ ≤ R. Then,

the asymptotic deception probability, when using the variancetest (5), is bounded from above as

limT→∞

PTDec = P(|A− A| <√δβ) (12a)

≤ Λ ,I(A;ZL1 ) + 1

log(R/√δβ)

. (12b)

In addition, if for all k ∈ 1, . . . , L, A→ (Xk, Zk−11 )→ Uk

is a Markov chain, then for any sequence of probabilitymeasures QXk|Zk−1

1, such that for all k ∈ 1, . . . , L

PXk|Zk−11 QXk|Zk−1

1, we have

Λ ≤

∑Lk=1D

(PXk|Zk−1

1 ,A

∥∥∥QXk|Zk−11

∣∣∣PZk−11 ,A

)+ 1

log(R/√δβ) . (13)

Remark 4. By looking at the numerator in (12b), it follows thatthe bound on the deception probability becomes looser as theamount of information revealed about the open-loop gain A bythe observation ZL1 increases. On the other hand, by lookingat the denominator, the bound becomes tighter as R increases.This is consistent with the observation of Zames [54] thatsystem identification becomes harder as the uncertainty aboutthe open-loop gain of the plant increases. In our case, a largeruncertainty interval R corresponds to a poorer estimation ofA by the attacker, which leads, in turn, to a decrease in theachievable deception probability. The denominator can also beinterpreted as the intrinsic uncertainty of A when it is observedat resolution

√δβ, as it corresponds to the entropy of the

random variable A when it is quantized at such resolution. •In conclusion, Thm. 2 provides two upper bounds on the

deception probability. The first bound (12b) clearly shows thatincreasing the privacy of the open-loop gain A—manifestedin the mutual information between A and the state-and-control trajectory ZL1 during the exploration phase—reduces

the deception probability. The second bound (13) allows free-dom in choosing the auxiliary probability measure QXk|Zk−1

1,

making it a rather useful bound. For instance, by choosingQXk|Zk−1

1∼ N (0, σ2), for all k ∈ N, we can rewrite the upper

bound (13) in term of EP[(AXk−1 + Uk−1)2

]as follows. The

proof of the following corollary can be found in [50].

Corollary 1. Under the assumptions of Thm. 2, if for all k ∈1, . . . , L, A → (Xk, Z

k−11 ) → Uk is a Markov chain, then

asymptotic deception probability is bounded from above by

limT→∞

PTDec ≤ G(ZL1 ), (14a)

G(ZL1 ) ,log e2σ2

∑Lk=1 EP

[(AXk−1 + Uk−1)2

]+ 1

log(R/√δβ) . (14b)

Ex. 3 in App. A compares the lower and upper bounds onthe deception probability of Thm. 1 and Corol. 1, respectively.

C. Privacy-enhancing Signal

For a given duration of the learning phase L, to increase thesecurity of the system, at any time k the controller can adda privacy-enhancing signal Γk to an unauthenticated controlpolicy Uk|k ∈ N:

Uk = Uk + Γk , k ∈ N. (15)

We refer to such a control policy Uk as the authenticatedcontrol policy Uk. We denote the states of the system thatwould be generated if only the unauthenticated control sig-nal Uk1 were applied by Xk

1 , and the resulting trajectory—byZk1 , (Xk

1 , Uk1 ).

A numerical example that illustrates the effect of theprivacy-enhancing signal on the deception probability is avail-able in Ex. 4 in App. A.Remark 5. A “good” privacy-enhancing signal entails little in-crease in the control cost [55] compared to its unauthenticatedversion while providing enhanced detection probability (6)and/or false alarm probability. Finding the optimal privacy-enhancing signal is left for future research. Interestingly, ina follow-up work [56], Ziemann and Sandberg studied theoptimal control problem of linear systems regularized withFisher information, where the latter serves as a proxy to theestimation quality of A via the Cramer–Rao lower bound. •

One may envisage that superimposing any noisy signal Γkon top of the control policy Uk|k ∈ N would necessarilyenhance the detectability of any learning-based attack (2)since the observations of the attacker are in this case noisier.However, it turns out that injecting a strong noise for somelearning algorithm may, in fact, speed up the learning processas it improves the power of the signal magnified by theopen-loop gains with respect to the observed noise [43]. Anysignal Γk that satisfies the condition proposed in the followingcorollary, whose proof available in [50], will provide enhancedguarantees on the detection probability when the attacker usesany arbitrary learning algorithm to estimate the uniformlydistributed A over the symmetric interval [−R,R].

Corollary 2. For any control policy Uk|k ∈ N with tra-jectory Zk1 = (Xk

1 , Uk1 ) and its corresponding authenticated

Page 6: Learning-based Attacks in Cyber-Physical Systemsacsweb.ucsd.edu/~mkhojast/Papers/Authentication-learning-based.pdf · in-the-middle attack, physical-layer authentication. I. INTRODUCTION

6

control policy Uk1 (15) with trajectory Zk1 = (Xk1 , U

k1 ), under

the assumptions of Corollary 1, if for all k ∈ 2, . . . , L

EP[Ψ2k−1 + 2Ψk−1(AXk−1 + Uk−1)

]< 0, (16)

where Ψk−1 ,∑k−1j=1 A

k−1−jΓj , for any L ≥ 2, the followingmajorization of G (14b) holds:

G(ZL1 ) < G(ZL1 ). (17)

Example 1. In this example, we describe a class of privacy-enhancing signal that yield better guarantees on the decep-tion probability. For all k ∈ 2, . . . , L, clearly Ψk−1 =−(AXk−1 + Uk−1)/η satisfies the condition in (16) for anyη ∈ 2, . . . , L. Thus, by choosing the privacy-enhancingsignals Γ1 = −(AX1 +U1)/η, and Γk = −(AXk +Uk)/η−∑k−2j=1 A

k−1−jΓj for all k ∈ 3, . . . , L, (17) holds. •

IV. EXTENSION TO VECTOR SYSTEMS

We now generalize our results to vector systems. Considerthe networked control system depicted in Fig. 1, with the plantdynamics replaced by a vector plant:

Xk+1 = AXk + Uk + Wk, (18)

where Xk ∈ Rn×1, Uk ∈ Rn×1, A ∈ Rn×n, Wk ∈ Rn×1

represent the plant state, control input, open-loop gain of theplant, and plant disturbance, respectively, at time k ∈ N. Thecontroller, at time k, observes Yk and generates a controlsignal Uk as a function of Yk1 , and Yk = Xk at times k ∈ Nat which the attacker does not tamper the sensor reading. Weassume that the initial condition X0 has a known (to all parties)distribution and is independent of the disturbance sequenceWk. For analytical purposes, we further assume Wk is aprocess with i.i.d. multivariate Gaussian samples of zero meanand a covariance matrix Σ that is known to all parties. Withoutloss of generality, we assume that W0 = 0, E [X0] = 0, andtake U0 = 0.

We assume the attacker uses the vector analogue of learningbased attacks described in Sec. II-A where the attacker can useany learning algorithm to estimate the open-loop gain matrixA during the learning phase. The estimation A constructed bythe attacker at the conclusion of the learning phase is utilizedto construct the fictitious sensor readings Vk according tothe vector analogue of (2), where Wk|k = L, . . . , T − 1are i.i.d. multivariate Gaussian with zero mean and covariancematrix Σ.

Similar to the scalar case, for analytical purposes, weassume that the power of the fictitious sensor reading is equalto 1/β <∞, namely

limT→∞

1

T

T∑k=L+1

‖Vk‖2 =1

βa.s. w.r.t. PA . (19)

Since the zero-mean multivariate Gaussian distribution iscompletely characterized by its covariance matrix, we shallfollow [28] and test for anomalies in the latter. To that end,define the error matrix

∆ , Σ− 1

T

T∑k=1

[Yk+1 − AYk − Uk] [Yk+1 − AYk − Uk]†.

As in (5), we use a test that sets a confidence interval, withrespect to the norm, around the expected covariance matrix,i.e., it checks whether

‖∆‖op ≤ γ, (20)

at the test time T . For the sake of analysis, we use the operatornorm in (20), which satisfies the sub-multiplicativity property.

The following lemma provides a necessary and sufficientcondition for any learning-based attack [the vector analogueof (2)] to deceive the controller and remain undetected, fora multivariate plant (18) under a covariance test (20), in thelimit of T →∞; its proof is availalbe in [50].

Lemma 1. Consider the multivariate plant (18), with somelearning-based attack [analogous to (2)], with fictitious sen-sor reading power that satisfies (19), and some measurablecontrol policy Uk. Then, the attacker manages to deceivethe controller and remain undetected, under the covaraincetest (20), a.s. in the limit T →∞, if and only if

limT→∞

1

T

∥∥∥∥∥T∑

k=L+1

(A− A)VkV†k(A− A)†

∥∥∥∥∥op

≤ γ. (21)

Lemma 1 has the following important implication.

limT→∞

1

T

∥∥∥∥∥T∑

k=L+1

(A− A)VkV†k(A− A)†

∥∥∥∥∥op

(22a)

≤ limT→∞

1

T

T∑k=L+1

∥∥∥(A− A)Vk((A− A)Vk)†∥∥∥op

(22b)

≤ limT→∞

1

T

T∑k=L+1

∥∥∥(A− A)Vk∥∥∥2

op(22c)

≤∥∥∥A− A

∥∥∥2

op/β , (22d)

where (22b) follows from the triangle inequality, (22c) and(22d) follow from the sub-multiplicativity of the operator, theidentity ‖Vk‖ = ‖Vk‖op and by putting the power constraint(19) into force.

If ‖A − A‖2op ≤ γβ, (21) holds in the limit of T → ∞,then the attacker is able to deceive the controller and remainundetected a.s., by Lemma 1. Equation (22) then implies thatthe norm of the estimation error, ‖A−A‖op, dictates the easewith which an attack can go undetected. This is used next todevelop a lower bound on the deception probability.

A. Lower Bound on the Deception Probability

We start by observing that in the case of multivariatesystems, and in contrast to their scalar counterparts, somecontrol actions might not reveal the entire plant dynamics,and in this case the attacker might not be able to learn theplant completely. This phenomenon is captured by the per-sistent excitation property of control inputs, which describescontrol-action signals that are sufficiently rich to excite all thesystem modes that will allow to learn them. While avoidingpersistently exciting control inputs can be used as a way tosecure the system against learning-based attacks, here, we use

Page 7: Learning-based Attacks in Cyber-Physical Systemsacsweb.ucsd.edu/~mkhojast/Papers/Authentication-learning-based.pdf · in-the-middle attack, physical-layer authentication. I. INTRODUCTION

7

a probabilistic variant of this property [54], [57], and assumethat the control policy satisfies it in our lower bound on thedeception probability.

Definition 3 (Persistent excitation). Given a plant (18), ζ >0, and ρ ∈ [0, 1], the control policy Uk is (ζ, ρ)-persistentlyexciting if there exists a time L0 ∈ N such that, for all τ ≥ L0,

PA(

1

τGτ ζIn×n

)≥ ρ, (23)

where Gτ is the sum of the state Gramians up to time τ :

Gτ ,τ∑k=1

XkX†k. (24)

As in Sec. III-A, to find a lower bound on the deceptionprobability P A,T

Dec , we consider a specific estimate of A, ob-tained via the LS estimation algorithm, analogous to (9), atthe conclusion of the first phase by the attacker:

A =

0n×n, det(GL−1) = 0;L−1∑k=1

(Xk+1 − Uk)X†kG−1L−1, otherwise,

(25)

where 0k×` denotes an all zero matrix of dimensions k × `.We prove next an upper bound on the estimation-error norm,

‖A−A‖op, of the above LS algorithm (25), and use it to extendthe bound in (10) to the vector case. A formal proof is availablein [50].

Lemma 2. Consider the vector plant (18). If the attackerconstructs A using LS estimation (25), and the controller usesa policy Uk for which the event in (23) occurs for τ = L−1,that is GL−1/(L− 1) ζIn×n. Then,

‖A− A‖op ≤1

ζL

L−1∑k=1

‖WkX†k‖op a.s. w.r.t. PA . (26)

The following theorem provides a lower bound on thedeception probability and it proof can be found in [50].

Theorem 3. Consider the plant (18) with a (ζ, ρ)-persistentlyexciting control policy Uk from time L0, and any learning-based attack learning-based attack [the vector analogueof (2)] with fictitious sensor reading power that satisfies (19)and with a learning phase of duration L ≥ L0 + 1. Then, theasymptotic deception probability, when using the covaraincetest (20), is bounded from below as

limT→∞

P A,TDec ≥ PA

(‖A− A‖op <

√γβ)

(27a)

≥ ρPA

(1

ζL

L−1∑k=1

‖WkX†k‖op <√γβ

). (27b)

Remark 6. The bound (10c) for scalar systems, which isindependent of the control policy and state value, has beendeveloped using the concentration bounds of [43] for the scalarLS algorithm (9). To the best of our knowledge there are nosimilar concentration bounds for the vector variant of the LSalgorithm (9) which work for any A, and a large class ofcontrol policies. Looking for such bounds seems an interestingresearch venue. •

Two numerical examples for vector systems are presentedin App. B. Ex. 5 compares the empirical success rate of thelearning-based attack with that of the replay attack. Ex. 6investigates the effect of privacy-enhancing signal on theempirical deception probability of the learning-based attackson the vector systems.

V. EXTENSION TO NONLINEAR SYSTEMS

In the previous sections, we focused on linear systems,where, for finding a lower bound on the deception probabilityof the learning-based attack, the LS algorithm has beenutilized. We now extend this treatment to nonlinear systems.

Finding the optimal solution of nonlinear LS problemsis difficult in general. Consequently, different approaches tothis problem have been introduced across the years. Promi-nent now-classical solutions include the Gauss–Newton andLevenberg–Marquardt algorithms [58]. For richer nonlineardynamics families, more sophisticated learning algorithmshave been recently proposed, such as Gaussian processes (GP)regression [41], [46], [59]–[61] and deep neural networks [42],[62]; in this section, we concentrate on GP inference.

To derive analytical results, we need to restrict the classof possible dynamics. Similarly to our treatment for linearsystems, we follow the frequentist approach.1 We work withthe natural counterpart of GP—the Reproducing Kernel HilbertSpace (RKHS) [46], [63]. To that end, we consider thefollowing scalar nonlinear system

Xk+1 = f(Zk) +Wk, (28)

where Zk = (Xk, Uk) ∈ R × U ⊆ R2; the plant disturbanceprocess Wk has i.i.d. Gaussian samples of zero mean andvariance σ2; f : X × U → R belongs to the class of RKHSfunctions spanned by the symmetric bilinear positive-definitekernel K(Z,Z ′) that specifies the RKHS.

Unlike the linear case, where the attacker needs to learnfinitely many parameters, in the case of RKHS functions,in general, there are infinitely many parameters to learn.2 Inthis case, to have a positive asymptotic deception probability,the duration of learning phase should be sufficiently large.Namely, in our analysis, we assume T = L + c, wherec ≥ 1 is a positive constant, and we investigate the deceptionprobability in the limit of L→∞.

Theorem 4. For T = L + c, the asymptotic deception prob-ability of the nonlinear analogue of (2), under the variancetest, is bounded from below by

limL→∞

P f,L+cDec ≥ lim

L→∞p

L+c∏k=L+1

(1− ξk) (29)

for all f in the RKHS, where 0 ≤ ξk ≤ 1 is defined as

ξk , e1−(

νk − χ4σσL(Vk, Uk)

)2

+ 12

∑L−1k=1 ln(1+σ−2σ2

k−1(Zk))

, (30)

1We assume the attacker uses GP regression to construct the learningalgorithm despite the function f governing the system dynamics is assumedto be non-random (cf. [63]).

2In fact, having a rich nonlinear dynamics can be used as a way to securethe system against learning-based attacks (cf. Sec. VI-E)

Page 8: Learning-based Attacks in Cyber-Physical Systemsacsweb.ucsd.edu/~mkhojast/Papers/Authentication-learning-based.pdf · in-the-middle attack, physical-layer authentication. I. INTRODUCTION

8

and νk is a sequence of non-negative reals such that

limL→∞

1

L+ c

L+c∑k=L+1

ν2k +

2

L+ c

L+c∑k=L+1

|Wk|νk ≤ δ (31)

holds with probability p.

For more details and a proof, see [50].

VI. DISCUSSION AND FUTURE WORK

A. Upper Bound on the Deception Probability

The upper bound in (22), which relates the deceptioncriterion (21) to the estimation error ‖A−A‖op, is used to findthe lower bound (27). Finding a corresponding lower bound interm of ‖A− A‖op for (22a) is the first step in extending ourresults in Thm. 2 to vector systems. Finding an upper bound onthe deception probability for vector linear and scalar nonlinearsystems, where the attacker can use any learning algorithm, isleft open for future work.

B. Model-based vs. Model-free

In this work, we mainly concentrated on linear systems; weassumed the attacker constructs the fictitious sensor reading,in a model-based manner, according to the linear model (2)and its vector variants. In general, as discussed in Sec. V, thesystem might be nonlinear, and the attacker might not be awareof the linearity or non-linearity of the dynamics. Comparingthe deception probability for the vast range of model-free andmodel-based learning methods [40], [44], [64] is an interestingresearch venue.

C. Continuous Learning and Hijacking

In this work, we assumed two disjoint consecutive phases(recall Sec. II-A): learning and hijacking, which are akin to theexploration and exploitation phases of reinforcement learning(RL) [65]. Indeed, in this two-phase process, the attackerexplores the system until it reaches a desired deceptionprobability and then moves to the exploitation phase duringwhich it drives the system to instability as quickly as it can.The two phases are completely separated due to the inherenttension between them: exploiting the system without properlyexploring it during the learning (silent) phase increases thechances of being detected.

Despite the naturalness of two-phase attacks, just like inRL [65], one may consider more general strategies whereexploration and exploitation are intertwined and gradual: astime elapses, the attacker can gain better estimates of theobserved system and gradually increase its detection-limitedattack. In these terms, our two-phase attack can be regarded asa two-stage approximation of the gradual attack and providesachievability bounds for such attacks. Studying more generalattacks is a research venue that is left for future study.

D. Oblivious Controller

A more realistic scenario is the one in which neither theattacker nor the controller are aware of the open-loop gain of

the plant. In this scenario, both parties strive to learn the open-loop gain—posing two conflicting objectives to the controller,who, on the one hand, wishes to speed up its own learningprocess, while, on the other hand, wants to slow down thelearning process of the attacker.

In such a situation, standard adaptive control methods areclearly insufficient, as no asymmetry between the attacker andthe controller can be achieved under the setting of Sec. II. Tocreate a security leverage over the attacker, the controller needsto utilize a judicious privacy-enhancing signal: A properly de-signed privacy-enhancing signal should enjoy a positive doubleeffect by facilitating the learning process of the controllerwhile hindering that of the attacker at the same time. Notethat, in such a scenario, while the controller knows both Ukand Γk, the attacker is cognizant of only their sum (15)—Uk.This is reminiscent of strategic information transfer [66].

Finally, we note that, unless the controller is able to detectan MITM attack (the attacker’s hijacking phase), its learningprocess will be hampered by the fictitious signal that isgenerated according to the virtual system of the attacker (2).

E. Moving Target Defense

In our setup the attacker has full access to the controlsignal (see Fig. 1), and at time k + 1 the attacker uses thecontrol input Uk to construct the fictitious sensor readingVk+1 according to (2). Thus, the watermarking signal [25],the private random signature, might not be an effective wayto counter the learning based attacks. Here, we introduced theprivacy-enhancing signal (15) to impede the learning processof the attacker and decrease the deception probability. Anothertechnique that has been developed in the literature to counterattacks, where the attacker has full system knowledge, ishaving the controller covertly introduce virtual state variablesunknown to the attacker but that are correlated with theordinary state variables, so that modification of the originalstates will impact the extraneous states. These extraneousstates can act as a moving target [34]–[37] for the attacker. Asimilar technique is the so-called baiting, which adds an offsetto the system dynamic [38], [39]. In practice, this techniquebreaks the information symmetry between the attacker—whichhas the full system knowledge, and the controller. Using suchdefense techniques to hamper the learning process of ourproposed attacker, is an interesting research venue. In thiscontext, the controller, by potentially sacrificing the optimallyof the control task, can act in an adversarial learning setting.Assuming that the control can covertly introduce a virtualpart to the dynamics, for any given duration of the learningphase L (see Fig. 1), sufficiently fast changes in the cipher partof the dynamic can drastically hamper the learning processof the attacker. Also, as discussed in Sec. V, adding a richnonlinearity to the dynamics can be used as a way to securethe system against learning-based attacks.

F. Further Future Directions

Other future directions can explore the extension of theestablished results to partially-observable linear vector systemswhere the control (actuation) gain is unknown, characterising

Page 9: Learning-based Attacks in Cyber-Physical Systemsacsweb.ucsd.edu/~mkhojast/Papers/Authentication-learning-based.pdf · in-the-middle attack, physical-layer authentication. I. INTRODUCTION

9

securable and unsecurable subspaces [67] for learning-basedattacks, revising the attacker full access to both sensor andcontrol signals, designing optimal privacy-enhancing signals(recall Remark 5) for linear and nonlinear systems, andstudying the relation between our proposed privacy-enhancingsignal with the noise signal utilized to achieve differentialprivacy [68]. Further, since the controller does not know theexact time instant at which an attack might occur, a morerealistic scenario would be that of continual testing, i.e., thatin which the integrity of the system is tested at every timestep and where the false alarm and deception probabilities aredefined with a union across time. We leave this treatment forfuture research.

REFERENCES

[1] M. J. Khojasteh, A. Khina, M. Franceschetti, and T. Javidi, “Authenti-cation of cyber-physical systems under learning-based attacks,” IFAC-PapersOnLine, vol. 52, no. 20, pp. 369–374, 2019.

[2] B. Kehoe, S. Patil, P. Abbeel, and K. Goldberg, “A survey of researchon cloud robotics and automation,” IEEE Tran. on auto. sci. and eng.,vol. 12, no. 2, pp. 398–409, 2015.

[3] D. I. Urbina, J. A. Giraldo, A. A. Cardenas, N. O. Tippenhauer,J. Valente, M. Faisal, J. Ruths, R. Candell, and H. Sandberg, “Lim-iting the impact of stealthy attacks on industrial control systems,” inProceedings of the 2016 ACM SIGSAC Conference on Computer andCommunications Security, 2016, pp. 1092–1105.

[4] S. M. Dibaji, M. Pirani, D. B. Flamholz, A. M. Annaswamy, K. H.Johansson, and A. Chakrabortty, “A systems and control perspective ofCPS security,” Ann. Rev. in Cont., 2019.

[5] M. Jamei, E. Stewart, S. Peisert, A. Scaglione, C. McParland, C. Roberts,and A. McEachern, “Micro synchrophasor-based intrusion detection inautomated distribution systems: Toward critical infrastructure security,”IEEE Int. Comp., vol. 20, no. 5, pp. 18–27, 2016.

[6] H. Sandberg, S. Amin, and K. H. Johansson, “Cyberphysical security innetworked control systems: An introduction to the issue,” IEEE Cont.Magazine, vol. 35, no. 1, pp. 20–23, 2015.

[7] C.-Z. Bai, F. Pasqualetti, and V. Gupta, “Data-injection attacks instochastic control systems: Detectability and performance tradeoffs,”Automatica, vol. 82, pp. 251–260, 2017.

[8] V. Dolk, P. Tesi, C. De Persis, and W. Heemels, “Event-triggered controlsystems under denial-of-service attacks,” IEEE Trans. Cont. NetworkSys., vol. 4, no. 1, pp. 93–105, 2017.

[9] Y. Shoukry, M. Chong, M. Wakaiki, P. Nuzzo, A. Sangiovanni-Vincentelli, S. A. Seshia, J. P. Hespanha, and P. Tabuada, “SMT-basedobserver design for cyber-physical systems under sensor attacks,” ACMTrans. on CPS, vol. 2, no. 1, p. 5, 2018.

[10] Y. Chen, S. Kar, and J. M. Moura, “Cyber-physical attacks with controlobjectives,” IEEE Trans. Auto. Control, vol. 63, no. 5, pp. 1418–1425,2018.

[11] D. Shi, Z. Guo, K. H. Johansson, and L. Shi, “Causality countermeasuresfor anomaly detection in cyber-physical systems,” IEEE Trans. Auto.Control, vol. 63, no. 2, pp. 386–401, 2018.

[12] S. Dibaji, M. Pirani, A. Annaswamy, K. H. Johansson, andA. Chakrabortty, “Secure control of wide-area power systems: Confi-dentiality and integrity threats,” in Proc. IEEE Conf. Decision and Cont.(CDC), Miami Beach, FL, USA, 2018, pp. 7269–7274.

[13] R. Tunga, C. Murguia, and J. Ruths, “Tuning windowed chi-squareddetectors for sensor attacks,” in American Control Conf. (ACC), Mil-waukee, WI, USA, 2018, pp. 1752–1757.

[14] L. Niu, J. Fu, and A. Clark, "Minimum violation control synthesis on cyber-physical systems under attacks," in Proc. IEEE Conf. Decision and Cont. (CDC), Miami Beach, FL, USA, pp. 262–269.

[15] M. S. Chong, H. Sandberg, and A. M. Teixeira, "A tutorial introduction to security and privacy for cyber-physical systems," in 2019 Europ. Cont. Conf. (ECC), pp. 968–978.

[16] I. Tomic, M. J. Breza, G. Jackson, L. Bhatia, and J. A. McCann, "Design and evaluation of jamming resilient cyber-physical systems," in IEEE Inter. Conf. on Internet of Things (iThings) and IEEE Gr. Comput. and Commu. (GreenCom) and IEEE Cyber. Phy. and So. Comp. (CPSCom) and IEEE Sm. Data (SmartData), 2018, pp. 687–694.

[17] K. Ding, X. Ren, D. E. Quevedo, S. Dey, and L. Shi, "DOS attacks on remote state estimation with asymmetric information," IEEE Tran. Cont. Net. Sys., vol. 6, no. 2, pp. 653–666, 2018.

[18] A. Teixeira, I. Shames, H. Sandberg, and K. H. Johansson, "A secure control framework for resource-limited adversaries," Automatica, vol. 51, pp. 135–148, 2015.

[19] M. Xue, S. Roy, Y. Wan, and S. K. Das, "Security and vulnerability of cyber-physical," Handbook on securing cyber-physical critical infrastructure, p. 5, 2012.

[20] A. Cetinkaya, H. Ishii, and T. Hayakawa, "Networked control under random and malicious packet losses," IEEE Trans. Auto. Control, vol. 62, no. 5, pp. 2434–2449, 2017.

[21] P. N. Brown, H. P. Borowski, and J. R. Marden, "Security against impersonation attacks in distributed systems," IEEE Trans. Cont. Network Sys., vol. 6, no. 1, pp. 440–450, 2018.

[22] Y. W. Law, T. Alpcan, and M. Palaniswami, "Security games for risk minimization in automatic generation control," IEEE Trans. Power Sys., vol. 30, no. 1, pp. 223–232, 2014.

[23] I. Shames, F. Farokhi, and T. H. Summers, "Security analysis of cyber-physical systems using H2 norm," IET Cont. The. App., vol. 11, no. 11, pp. 1749–1755, 2017.

[24] R. S. Smith, "A decoupled feedback structure for covertly appropriating networked control systems," IFAC Proceedings Volumes, vol. 44, no. 1, pp. 90–95, 2011.

[25] Y. Mo, S. Weerakkody, and B. Sinopoli, "Physical authentication of control systems: designing watermarked control inputs to detect counterfeit sensor outputs," IEEE Cont. Magazine, vol. 35, no. 1, pp. 93–109, 2015.

[26] M. Zhu and S. Martínez, "On the performance analysis of resilient networked control systems under replay attacks," IEEE Trans. Auto. Control, vol. 59, no. 3, pp. 804–808, 2014.

[27] F. Miao, M. Pajic, and G. J. Pappas, "Stochastic game approach for replay attack detection," in Proc. IEEE Conf. Decision and Cont. (CDC), Florence, Italy, 2013, pp. 1854–1859.

[28] B. Satchidanandan and P. R. Kumar, "Dynamic watermarking: Active defense of networked cyber-physical systems," Proc. IEEE, vol. 105, no. 2, pp. 219–240, 2017.

[29] C. Fang, Y. Qi, P. Cheng, and W. X. Zheng, "Cost-effective watermark based detector for replay attacks on cyber-physical systems," in 2017 Asian Cont. Conf. (ASCC), 2017, pp. 940–945.

[30] P. Hespanhol, M. Porter, R. Vasudevan, and A. Aswani, "Statistical watermarking for networked control systems," in American Control Conf. (ACC), Milwaukee, WI, USA, 2018, pp. 5467–5472.

[31] M. Hosseini, T. Tanaka, and V. Gupta, "Designing optimal watermark signal for a stealthy attacker," in 2016 Europ. Cont. Conf. (ECC), pp. 2258–2262.

[32] A. Ferdowsi and W. Saad, "Deep learning for signal authentication and security in massive internet-of-things systems," IEEE Tran. on Comm., vol. 67, no. 2, pp. 1371–1387, 2018.

[33] H. Liu, J. Yan, Y. Mo, and K. H. Johansson, "An on-line design of physical watermarks," in Proc. IEEE Conf. Decision and Cont. (CDC), Miami Beach, FL, USA, pp. 440–445.

[34] S. Weerakkody and B. Sinopoli, "Detecting integrity attacks on control systems using a moving target approach," in Proc. IEEE Conf. Decision and Cont. (CDC), Osaka, Japan, 2015, pp. 5820–5826.

[35] A. Kanellopoulos and K. G. Vamvoudakis, "A moving target defense control framework for cyber-physical systems," IEEE Trans. Auto. Control, 2019.

[36] Z. Zhang, R. Deng, D. K. Yau, P. Cheng, and J. Chen, "Analysis of moving target defense against false data injection attacks on power grid," IEEE Trans. Inf. Forensics and Security, 2019.

[37] P. Griffioen, S. Weerakkody, and B. Sinopoli, "An optimal design of a moving target defense for attack detection in control systems," in American Control Conf. (ACC), Philadelphia, PA, 2019, pp. 4527–4534.

[38] D. B. Flamholz, A. M. Annaswamy, and E. Lavretsky, "Baiting for defense against stealthy attacks on cyber-physical systems," in AIAA Scitech 2019 Forum, 2019, p. 2338.

[39] A. Hoehn and P. Zhang, "Detection of covert attacks and zero dynamics attacks in cyber-physical systems," in American Control Conf. (ACC), Boston, MA, USA, 2016, pp. 302–307.

[40] S. Dean, H. Mania, N. Matni, B. Recht, and S. Tu, "On the sample complexity of the linear quadratic regulator," Foundations of Computational Mathematics, Aug 2019.

[41] M. Deisenroth and C. E. Rasmussen, "PILCO: A model-based and data-efficient approach to policy search," in Int. Conf. on Machine Learning (ICML), 2011, pp. 465–472.

[42] Y. Gal, R. McAllister, and C. E. Rasmussen, "Improving PILCO with Bayesian neural network dynamics models," in Data-Efficient Machine Learning workshop, ICML, vol. 4, 2016.

[43] A. Rantzer, "Concentration bounds for single parameter adaptive control," in American Control Conf. (ACC), Milwaukee, WI, USA, 2018, pp. 1862–1866.

[44] S. Tu and B. Recht, "Least-squares temporal difference learning for the linear quadratic regulator," in Int. Conf. on Machine Learning (ICML), 2018, pp. 5005–5014.

[45] T. Sarkar and A. Rakhlin, "Near optimal finite time identification of arbitrary linear dynamical systems," in Int. Conf. on Machine Learning (ICML), 2019, pp. 5610–5618.

[46] F. Berkenkamp, M. Turchetta, A. Schoellig, and A. Krause, "Safe model-based reinforcement learning with stability guarantees," in Adv. neu. inf. proc. sys., 2017, pp. 908–918.

[47] J. F. Fisac, A. K. Akametalu, M. N. Zeilinger, S. Kaynama, J. Gillula, and C. J. Tomlin, "A general safety framework for learning-based control in uncertain robotic systems," IEEE Trans. Auto. Control, 2018.

[48] M. J. Khojasteh, V. Dhiman, M. Franceschetti, and N. Atanasov, "Probabilistic safety constraints for learned high relative degree system dynamics," PMLR, 2020, to appear.

[49] Y. Yuan and Y. Mo, "Security in cyber-physical systems: Controller design against known-plaintext attack," in Proc. IEEE Conf. Decision and Cont. (CDC), Osaka, Japan, pp. 5814–5819.

[50] M. J. Khojasteh, A. Khina, M. Franceschetti, and T. Javidi, "Learning-based attacks in cyber-physical systems," arXiv preprint arXiv:1809.06023, 2018.

[51] B. Efron and T. Hastie, Computer Age Statistical Inference: Algorithms, Evidence, and Data Science, ser. Institute of Mathematical Statistics Monographs. Cambridge University Press, 2016.

[52] T. M. Cover and J. A. Thomas, Elements of information theory. John Wiley & Sons, 2012.

[53] R. Durrett, Probability: theory and examples. Cambridge University Press, 2010.

[54] M. Raginsky, "Divergence-based characterization of fundamental limitations of adaptive dynamical systems," in Proc. Allerton Conf. on Comm., Control, and Comput., Monticello, IL, USA, 2010, pp. 107–114.

[55] D. P. Bertsekas, "Reinforcement learning and optimal control," Athena Scientific, 2019.

[56] I. Ziemann and H. Sandberg, "Parameter privacy versus control performance: Fisher information regularized control," in American Control Conf. (ACC), Denver, CO, USA.

[57] M. Duflo, Random iterative models. Springer Science & Business Media, 2013, vol. 34.

[58] D. W. Marquardt, "An algorithm for least-squares estimation of nonlinear parameters," Jou. of the soc. for Indus. and App. Math., vol. 11, no. 2, pp. 431–441, 1963.

[59] J. Umlauft, T. Beckers, M. Kimmel, and S. Hirche, "Feedback linearization using Gaussian processes," in Proc. IEEE Conf. Decision and Cont. (CDC), Melbourne, Australia, 2017, pp. 5249–5255.

[60] N. Srinivas, A. Krause, S. M. Kakade, and M. W. Seeger, "Information-theoretic regret bounds for Gaussian process optimization in the bandit setting," IEEE Trans. Inf. Theory, vol. 58, no. 5, pp. 3250–3265, 2012.

[61] S. Shekhar, T. Javidi, et al., "Gaussian process bandits with adaptive discretization," Electronic Journal of Statistics, vol. 12, no. 2, pp. 3829–3874, 2018.

[62] S. Chen, K. Saulnier, N. Atanasov, D. D. Lee, V. Kumar, G. J. Pappas, and M. Morari, "Approximating explicit model predictive control using constrained neural networks," in American Control Conf. (ACC), Milwaukee, WI, USA, 2018, pp. 1520–1527.

[63] S. R. Chowdhury and A. Gopalan, "On kernelized multi-armed bandits," in Proceedings of the 34th International Conference on Machine Learning-Volume 70. JMLR.org, 2017, pp. 844–853.

[64] V. G. Satorras, Z. Akata, and M. Welling, "Combining generative and discriminative models for hybrid inference," in Adv. in Neu. Info. Proc. Sys., 2019, pp. 13802–13812.

[65] R. S. Sutton and A. G. Barto, Reinforcement learning: An introduction. MIT Press, Cambridge, 1998, vol. 1, no. 1.

[66] E. Akyol, C. Langbort, and T. Basar, "Privacy constrained information processing," in Proc. IEEE Conf. Decision and Cont. (CDC), Osaka, Japan, pp. 4511–4516.

[67] B. Satchidanandan and P. Kumar, "Control systems under attack: The securable and unsecurable subspaces of a linear stochastic system," in Emerging Apps. Cont. and Sys. Theory. Springer, 2018, pp. 217–228.

[68] J. Cortes, G. E. Dullerud, S. Han, J. Le Ny, S. Mitra, and G. J. Pappas, "Differential privacy in control and network systems," in Proc. IEEE Conf. Decision and Cont. (CDC), Las Vegas, NV, 2016, pp. 4252–4272.

[69] T. L. Lai and C. Z. Wei, "Least squares estimates in stochastic regression models with applications to identification and control of dynamic systems," The Annals of Statistics, pp. 154–166, 1982.

[70] J. C. Duchi and M. J. Wainwright, "Distance-based and continuum Fano inequalities with applications to statistical estimation," arXiv preprint arXiv:1311.2669, 2013.

[71] F. Zhang, Matrix theory: basic results and techniques. Springer Sci. & Bus. Med., 2011.

[72] C. K. Williams and C. E. Rasmussen, Gaussian processes for machine learning. MIT Press, Cambridge, MA, 2006, vol. 2, no. 3.

Mohammad Javad Khojasteh (S'14) did his undergraduate studies at Sharif University of Technology, from which he received double-major B.Sc. degrees in Electrical Engineering and in Pure Mathematics in 2015. He received the M.Sc. and Ph.D. degrees in Electrical and Computer Engineering from the University of California San Diego (UCSD), La Jolla, CA, in 2017 and 2019, respectively. Currently, he is a Postdoctoral Scholar at the California Institute of Technology, Pasadena, CA.

Anatoly Khina (S'08–M'17) is a Senior Lecturer in the School of Electrical Engineering, Tel Aviv University, from which he holds B.Sc. (2006), M.Sc. (2010), and Ph.D. (2016) degrees, all in Electrical Engineering. He was a Postdoctoral Scholar in the Department of Electrical Engineering, California Institute of Technology, from 2015 to 2018, and a Research Fellow at the Simons Institute for the Theory of Computing, University of California, Berkeley, during the Spring of 2018.

Massimo Franceschetti (M'98–SM'11–F'18) received the Laurea degree (with highest honors) in computer engineering from the University of Naples, Naples, Italy, in 1997, and the M.S. and Ph.D. degrees in electrical engineering from the California Institute of Technology in 1999 and 2003, respectively. He is Professor of Electrical and Computer Engineering at the University of California at San Diego (UCSD). Before joining UCSD, he was a postdoctoral scholar at the University of California at Berkeley for two years. He has held visiting positions at the Vrije Universiteit Amsterdam, the Ecole Polytechnique Federale de Lausanne, and the University of Trento. His research interests are in physical and information-based foundations of communication and control systems. He was awarded the C. H. Wilts Prize in 2003 for best doctoral thesis in electrical engineering at Caltech; the S. A. Schelkunoff Award in 2005 for best paper in the IEEE Transactions on Antennas and Propagation; a National Science Foundation (NSF) CAREER award in 2006; an Office of Naval Research (ONR) Young Investigator Award in 2007; the IEEE Communications Society Best Tutorial Paper Award in 2010; and the IEEE Control Theory Society Ruberti Young Researcher Award in 2012. He was elected Fellow of the IEEE in 2018 and became a Guggenheim Fellow for the natural sciences: engineering in 2019.

Tara Javidi (S'96–M'02) studied electrical engineering at Sharif University of Technology from 1992 to 1996. She received her M.S. degrees in electrical engineering (systems) and in applied mathematics (stochastics) from the University of Michigan, Ann Arbor. She received her Ph.D. in electrical engineering and computer science from the University of Michigan, Ann Arbor, in May 2002. From 2002 to 2004, Tara was an assistant professor of electrical engineering at the University of Washington, Seattle. She joined the University of California, San Diego, in 2005, where she is currently a professor of electrical and computer engineering. She is a member of the Information Theory Society Board of Governors and a Distinguished Lecturer of the IEEE Information Theory Society (2017/18). Tara Javidi is a founding co-director of the Center for Machine-Integrated Computing and Security and a founding faculty member of the Halicioglu Data Science Institute (HDSI) at UCSD.

Fig. 2: The attacker's success rate $P^{a,T}_{\mathrm{Dec}}$ versus the size of the detection window $T$.

APPENDIX A
SIMULATIONS FOR SCALAR SYSTEMS

Example 2. In this example, we compare the empirical performance of the variance test with the bound developed in Thm. 1. At every time $T$, the controller tests the empirical variance for anomalies over a detection window $[1, T]$, using a confidence interval $2\delta > 0$ around the expected variance (5). Here, $a = 1$, $\delta = 0.1$, $U_k = -0.88\,a\,Y_k$ for all $1 \le k \le T = 800$, the $W_k$ are i.i.d. Gaussian $\mathcal{N}(0,1)$, and 500 Monte-Carlo simulations were performed.

The learning-based attacker (2) uses the LS algorithm (9) to estimate $a$ and, as illustrated in Fig. 2, the attacker's success rate increases as the duration of the learning phase $L$ increases. This is in agreement with (10c), since the attacker can improve its estimate of $a$ and the estimation error $|\hat{A} - a|$ decreases as $L$ increases. As discussed in Sec. II-C, the false-alarm rate decays to zero as the size of the detection window $T$ tends to infinity. Hence, for a sufficiently large detection window, the attacker's success rate could potentially tend to one. Indeed, such behavior is observed in Fig. 2 for a learning-based attacker (2) with $L = 400$.

Fig. 2 also illustrates that our learning-based attack outperforms the replay attack. A replay attack with a recording length of $L = 20$ and a learning-based attack with a learning phase of length $L = 20$ are compared, and the success rate of the replay attack saturates at a lower value. Moreover, a learning-based attack with a learning phase of length $L = 8$ has a higher success rate than a replay attack with a larger recording length of $L = 20$. •
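For readers who wish to reproduce the qualitative behavior of this example, the following minimal sketch (ours, not the authors' original simulation code; the function name deception_trial and every implementation detail beyond the parameters quoted above are illustrative assumptions) runs one Monte-Carlo trial of the variance test (5) against a learning-based attacker that forms the LS estimate (9) and feeds model-based fictitious readings as in (2).

import numpy as np

def deception_trial(a=1.0, delta=0.1, T=800, L=400, var_w=1.0, seed=None):
    """One trial of the scalar variance test (5) against a learning-based attack (2).

    Phase 1 (k < L): the controller observes the true state; the attacker records it.
    Phase 2 (k >= L): the controller observes fictitious readings generated with the
    attacker's least-squares estimate of the open-loop gain a.
    Returns True if the empirical residual variance stays inside
    (var_w - delta, var_w + delta), i.e., the attack is undetected at time T.
    """
    rng = np.random.default_rng(seed)
    x, v, a_hat = 0.0, 0.0, 0.0        # true state X_k, fictitious state V_k, estimate
    xs, us, residuals = [], [], []
    for k in range(T):
        if k == L:                     # end of the learning phase: LS estimate, cf. (9)
            X, U = np.array(xs), np.array(us)
            a_hat = np.dot(X[:-1], X[1:] - U[:-1]) / np.dot(X[:-1], X[:-1])
            v = x                      # fake trajectory starts from the last true state
        y = x if k < L else v          # controller observation Y_k
        u = -0.88 * a * y              # control policy of Example 2
        w = np.sqrt(var_w) * rng.standard_normal()
        x_next = a * x + u + w         # the true plant keeps evolving
        if k < L:
            xs.append(x); us.append(u)
            y_next = x_next
        else:                          # fictitious sensor reading, cf. (2)
            v = a_hat * v + u + np.sqrt(var_w) * rng.standard_normal()
            y_next = v
        residuals.append(y_next - a * y - u)
        x = x_next
    return abs(np.mean(np.square(residuals)) - var_w) < delta

# Empirical deception probability over 200 independent trials with L = 400:
print(np.mean([deception_trial(seed=s) for s in range(200)]))

Sweeping the learning-phase length L (or replacing the hijacking phase by a replay of previously recorded readings) reproduces, qualitatively, the comparison shown in Fig. 2.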

Example 3. Thm. 1 provides a lower bound on the deception probability given $A = a$. Hence, by applying the law of total probability w.r.t. the PDF $f_A$ as in (7), we can apply the result of Thm. 1 to provide a lower bound also on the average deception probability for a random open-loop gain $A$. In this context, Fig. 3 compares the lower and upper bounds on the deception probability provided by Thm. 1 and Corol. 1, augmented with the trivial cases of zero and one probability, namely $\max\big\{0,\ 1 - 2/(1+\delta\beta)^{L/2}\big\}$ and $\min\big\{1,\ G(Z_1^L)\big\}$, where $A$ is distributed uniformly over $[-0.9, 0.9]$.

Fig. 3: Comparison of the lower and upper bounds on the deception probability, of Thm. 1 and Corol. 1, respectively.

Fig. 4: The attacker's success rate $P^{a,T}_{\mathrm{Dec}}$ versus the duration of the learning phase $L$.

Equation (14a) is valid when the control input is not a function of the random variable $A$; hence, we assumed $U_k = -0.045\,Y_k$ for all time $k \in \mathbb{N}$. Here $\delta = 0.1$, the $W_k$ are i.i.d. Gaussian with zero mean and variance $0.16$, and, for simplicity, we let $\beta = 1.1$. Although, in general, the attacker's estimate of the random open-loop gain $A$, and consequently the power of the fictitious sensor reading (8), vary based on the learning algorithm and the realization of $A$, the comparison of the lower and upper bounds in Fig. 3 is restricted to a fixed $\beta$. 2000 Monte-Carlo simulations were performed. •

Example 4. Here, the attacker uses the LS algorithm (9), the detector uses the variance test (5), $a = 1$, $T = 600$, $\delta = 0.1$, and the $W_k$ are i.i.d. Gaussian $\mathcal{N}(0,1)$. We compare the attacker's success rate, i.e., the empirical $P^{a,T}_{\mathrm{Dec}}$, as a function of the duration $L$ of the learning phase for three different control policies: I) the unauthenticated control signal $U_k = -a\,Y_k$ for all $k$; II) the authenticated control signal (15), where the $\Gamma_k$ are i.i.d. Gaussian $\mathcal{N}(0, 9)$; III) the authenticated control signal (15), where the $\Gamma_k$ are i.i.d. Gaussian $\mathcal{N}(0, 16)$. As illustrated in Fig. 4, for both the authenticated and the unauthenticated control signals, the attacker's success rate increases as the duration of the learning phase increases. This is in agreement with (10c), since the attacker can improve its estimate of $a$ as $L$ increases.

Fig. 5: The attacker's success rate $P^{A,T}_{\mathrm{Dec}}$ versus the size of the detection window $T$.

Also, for a fixed $L$, the attacker's performance deteriorates as the power of the privacy-enhancing signal $\Gamma_k$ increases. Namely, $\Gamma_k$ hampers the learning process of the attacker, and the estimation error $|\hat{A} - a|$ increases as the power of the privacy-enhancing signal increases. 500 Monte-Carlo simulations were performed. •

APPENDIX B
SIMULATIONS FOR VECTOR SYSTEMS

Example 5. In this example, we compare the empirical performance of the covariance test against the learning-based attack that utilizes the LS estimate (25), and against the replay attack. At every time $k$, the controller tests the empirical covariance for anomalies over a detection window $[1, T]$, using a confidence interval $2\gamma > 0$ around the operator norm of the error matrix $\Delta$ (20). Since we are considering the Euclidean norm for vectors, the induced operator norm amounts to $\|\Delta\|_{\mathrm{op}} = \sqrt{\lambda_{\max}(\Delta^\dagger \Delta)}$. Here, $\gamma = 0.1$, $U_k = -0.9\,A\,Y_k$ for all $1 \le k \le T = 600$, and

$$A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}. \quad (32)$$

Fig. 5 presents the performance averaged over 180 runs of a Monte-Carlo simulation. It illustrates that the vector variant of our learning-based attack also outperforms the replay attack. A learning-based attack with a learning phase of length $L = 40$ has a higher success rate than a replay attack with a larger recording length of $L = 50$. Similarly to the discussion for scalar systems in Sec. II-C, the false-alarm rate decays to zero as the size of the detection window $T$ tends to infinity. Thus, the success rate of learning-based attacks increases as the size of the detection window increases. Finally, as illustrated in Fig. 5, the attacker's success rate increases as the duration of the learning phase $L$ increases, since the attacker improves its estimate of $A$ as $L$ increases. •
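A corresponding sketch for the vector quantities used in this example is given below; it is again only illustrative (the helper names are ours), computing a matrix LS estimate of the form used in (25) via the normal equations, and the operator-norm comparison of the empirical residual covariance against $\Sigma$ that underlies the test (20).

import numpy as np

def ls_estimate(X, U):
    """LS estimate of A from X_{k+1} = A X_k + U_k + W_k, cf. (25).

    X has shape (L, n) with rows X_1, ..., X_L; U has shape (L-1, n) with rows U_1, ..., U_{L-1}.
    """
    Xk, Xk1, Uk = X[:-1], X[1:], U
    G = Xk.T @ Xk                    # Gram matrix sum_k X_k X_k^T
    B = (Xk1 - Uk).T @ Xk            # sum_k (X_{k+1} - U_k) X_k^T
    return B @ np.linalg.inv(G)

def covariance_test(residuals, Sigma, gamma):
    """Empirical covariance test: passes iff ||Delta||_op = sqrt(lambda_max(Delta^T Delta)) < gamma,

    where Delta is the gap between the empirical residual covariance and Sigma, cf. (20).
    residuals has shape (T, n), with rows Y_{k+1} - A Y_k - U_k.
    """
    Delta = residuals.T @ residuals / residuals.shape[0] - Sigma
    op_norm = np.sqrt(np.max(np.linalg.eigvalsh(Delta.T @ Delta)))
    return op_norm < gamma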

Example 6. Consider the vector-plant setting with a privacy-enhancing signal (cf. Sec. III-C) and the enhanced detection probability it yields. As in Example 5, we assume that the controller uses the empirical covariance test (20) and that the attacker utilizes the LS estimate (25). Again, the false-alarm rate decays to zero as the detection window size $T$ goes to infinity. Here, $\gamma = 0.1$, $A$ is as in (32), and $\Sigma = I_2$.

Fig. 6: The attacker's success rate $P^{A,T}_{\mathrm{Dec}}$ versus the size of the detection window $T$.

Fig. 6 compares the attacker's success rate, namely the empirical $P^{A,T}_{\mathrm{Dec}}$, as a function of the size of the detection window $T$ for two different control policies, averaged over 200 runs of a Monte-Carlo simulation: I) the unauthenticated control $U_k = -A\,Y_k$ for all $1 \le k \le T = 600$; II) the vector analogue of the authenticated control signal of (15), where the $\Gamma_k$ are i.i.d. zero-mean Gaussian with a diagonal covariance matrix with diagonal $(12, 10)$. As is evident from Fig. 6, the privacy-enhancing signal $\Gamma_k$ hampers the learning process of the attacker and consequently reduces its deception probability. •

APPENDIX C
PROOF OF THEOREM 1

We break the proof of Thm. 1 into several lemmas that are stated and proved next.

In the case of any learning-based attack (2), in the limit of $T \to \infty$, the empirical variance, which is used in the variance test (5), can be expressed in terms of the estimation error of the open-loop gain as follows.

Lemma 3. Consider any learning-based attack (2) with fictitious sensor reading power that satisfies (8) and some measurable control policy $U_k$. Then, the variance test (5) reduces to
$$\lim_{T\to\infty} \frac{1}{T}\sum_{k=1}^{T}\big[Y_{k+1} - a Y_k - U_k(Y_1^k)\big]^2 = \mathrm{Var}[W] + \frac{(\hat{A} - a)^2}{\beta} \quad \text{a.s. w.r.t. } \mathbb{P}_a.$$

Proof of Lemma 3: Since the hijacking phase of a learning-based attack (2) starts at time $k = L + 1$, using (1) and (4) we have
$$\frac{1}{T}\sum_{k=1}^{T}\big(Y_{k+1} - a Y_k - U_k(Y_1^k)\big)^2 = \frac{1}{T}\Big(\sum_{k=1}^{L} W_k^2 + \sum_{k=L+1}^{T}\big(W_k + (\hat{A} - a)V_k\big)^2\Big) \quad (33a)$$
$$= \frac{1}{T}\Big(\sum_{k=1}^{L} W_k^2 + \sum_{k=L+1}^{T} W_k^2\Big) + \frac{(\hat{A} - a)^2}{T}\sum_{k=L+1}^{T} V_k^2 + \frac{2(\hat{A} - a)}{T}\sum_{k=L+1}^{T} W_k V_k. \quad (33b)$$

Let $\mathcal{F}_k$ be the $\sigma$-field generated by $(V_k, \hat{A}, W_k, U_k)$, for $k = L, \ldots, T-1$. Then, clearly, $V_{k+1}$ is $\mathcal{F}_k$-measurable; also, $(W_{k+1}, \mathcal{F}_k)$ is a martingale difference sequence. Thus, using [69, Lemma 2, part iii], the last term in (33b) reduces to
$$\sum_{k=L+1}^{T} W_k V_k = o\Big(\sum_{k=L+1}^{T} V_k^2\Big) + O(1) \quad \text{a.s.} \quad (34)$$

in the limit $T \to \infty$. Note further that
$$\lim_{T\to\infty} \frac{1}{T}\Big(\sum_{k=1}^{L} W_k^2 + \sum_{k=L+1}^{T} W_k^2\Big) = \mathrm{Var}[W] \quad \text{a.s.} \quad (35)$$

by the strong law of large numbers [53]. Substituting (8) and (35) in (33), using (34), and taking $T$ to infinity concludes the proof of the lemma. •

In the following lemma, we prove (10a) for any learning-based attack (2) and any measurable control policy.

Lemma 4. Consider any learning-based attack (2) with fictitious sensor reading power that satisfies (8) and a measurable control policy $U_k$. Then, the asymptotic deception probability under the variance test (5) is equal to
$$\lim_{T\to\infty} P^{a,T}_{\mathrm{Dec}} = \mathbb{P}_a\big(|\hat{A} - a| < \sqrt{\delta\beta}\big).$$

Proof of Lemma 4: Under the variance test,
$$\lim_{T\to\infty} P^{a,T}_{\mathrm{Dec}} = \lim_{T\to\infty} \mathbb{E}_{\mathbb{P}_a}[\mathbb{1}_T],$$
where $\mathbb{1}_T$ is one if (5) occurs and zero otherwise. Using the dominated convergence theorem [53] and Lem. 3, we deduce
$$\lim_{T\to\infty} P^{a,T}_{\mathrm{Dec}} = \mathbb{E}_{\mathbb{P}_a}[\mathbb{1}'_T],$$
where $\mathbb{1}'_T$ is one if $(\hat{A} - a)^2/\beta \in (-\delta, \delta)$ and zero otherwise, which concludes the proof. •

Clearly, the estimation error of the LS algorithm (9) is [43]
$$\hat{A} - a = \frac{\sum_{k=1}^{L-1} W_k X_k}{\sum_{k=1}^{L-1} X_k^2} \quad \text{a.s. w.r.t. } \mathbb{P}_a.$$
Consequently, by (10a), the learning-based attack (2) can at least achieve the asymptotic deception probability (10b). Finally, since, for $k \in \{1, \ldots, L\}$, $U_k$ is a measurable function of $Y_1^k = X_1^k$, (10c) follows using Theorem 4 of [43].

APPENDIX D
PROOF OF THEOREM 2

We start by proving (12a). Using Lem. 4 and (7), we deduce
$$\lim_{T\to\infty} P^{T}_{\mathrm{Dec}} = \frac{1}{2R}\int_{-R}^{R} \mathbb{P}_a\big(|\hat{A} - a| < \sqrt{\delta\beta}\big)\,da = \frac{1}{2R}\int_{-R}^{R} \mathbb{E}_{\mathbb{P}_a}[\mathbb{1}_c]\,da,$$
where $\mathbb{1}_c$ is one if $|\hat{A} - a| < \sqrt{\delta\beta}$ and zero otherwise. Consequently, using Tonelli's theorem [53] it follows that
$$\lim_{T\to\infty} P^{T}_{\mathrm{Dec}} = \mathbb{P}\big(|A - \hat{A}| < \sqrt{\delta\beta}\big). \quad (36)$$

We now continue by proving (12b). Since the attacker observes the plant state and control input during the learning phase, which lasts $L$ time steps, and since $A \to (X_1^L, U_1^L) \to \hat{A}$ constitutes a Markov chain, using the continuous-domain version of Fano's inequality [70, Prop. 2], we have
$$\inf_{\hat{A}} \mathbb{P}\big(|A - \hat{A}| \ge \sqrt{\delta\beta}\big) \ge 1 - \frac{I(A; Z_1^L) + 1}{\log\big(R/\sqrt{\delta\beta}\big)}, \quad (37)$$
whenever $\sqrt{\delta\beta} \le R$. Finally, (12b) follows using (36) and (37).

To prove (13), we further bound $I(A; Z_1^L)$ from above via KL-divergence manipulations. The proof of the following lemma follows the arguments of [54] and is detailed in Appendix E for completeness.

Lemma 5. Assume that $A \to (X_k, Z_1^{k-1}) \to U_k$ is a Markov chain for all $k \in \{1, \ldots, L\}$. Let $\{Q_{X_k|Z_1^{k-1}}\}$ be a sequence of probability measures satisfying $P_{X_k|Z_1^{k-1}} \ll Q_{X_k|Z_1^{k-1}}$ for all $k$. Then,
$$I(A; Z_1^L) = \sum_{k=1}^{L} I\big(A; X_k \,\big|\, Z_1^{k-1}\big) \le \sum_{k=1}^{L} D\Big(P_{X_k|Z_1^{k-1}, A}\,\Big\|\,Q_{X_k|Z_1^{k-1}}\,\Big|\,P_{Z_1^{k-1}, A}\Big).$$

Applying the bound of Lem. 5 to the first bound of the theorem (12b) proves the second bound of the theorem (13).

APPENDIX E
PROOF OF LEMMA 5

We start by applying the chain rule for mutual information to $I(A; Z_1^L)$ as follows:
$$I(A; Z_1^L) = \sum_{k=1}^{L} I\big(A; Z_k \,\big|\, Z_1^{k-1}\big). \quad (38)$$

We next bound $I(A; Z_k \,|\, Z_1^{k-1})$ from above:
$$I\big(A; Z_k \,\big|\, Z_1^{k-1}\big) = I\big(A; X_k, U_k \,\big|\, Z_1^{k-1}\big) \quad (39a)$$
$$= I\big(A; X_k \,\big|\, Z_1^{k-1}\big) \quad (39b)$$
$$= D\Big(P_{X_k|Z_1^{k-1}, A}\,\Big\|\,P_{X_k|Z_1^{k-1}}\,\Big|\,P_{Z_1^{k-1}, A}\Big) \quad (39c)$$
$$= \mathbb{E}_P\left[\log \frac{dP_{X_k|Z_1^{k-1}, A}}{dP_{X_k|Z_1^{k-1}}}\right] \quad (39d)$$
$$= \mathbb{E}_P\left[\log \frac{dP_{X_k|Z_1^{k-1}, A}}{dQ_{X_k|Z_1^{k-1}}}\right] - \mathbb{E}_P\left[\log \frac{dP_{X_k|Z_1^{k-1}}}{dQ_{X_k|Z_1^{k-1}}}\right] \quad (39e)$$
$$\le D\Big(P_{X_k|Z_1^{k-1}, A}\,\Big\|\,Q_{X_k|Z_1^{k-1}}\,\Big|\,P_{Z_1^{k-1}, A}\Big), \quad (39f)$$
where we substitute the definition $Z_k \triangleq (X_k, U_k)$ to arrive at (39a); (39b) follows from the chain rule for mutual information and the Markovity assumption $A \to (X_k, Z_1^{k-1}) \to U_k$; we use the definition of the conditional mutual information in terms of the conditional KL divergence (recall the notation section) to attain (39c) and (39d); the manipulation in (39e) is valid due to the condition $P_{X_k|Z_1^{k-1}} \ll Q_{X_k|Z_1^{k-1}}$ in the setup of the lemma; and (39f) follows from the non-negativity of the KL divergence.

Substituting (39) in (38) concludes the proof.

APPENDIX F
PROOF OF COROLLARY 1

Set $Q_{X_k|Z_1^{k-1}} \sim \mathcal{N}(0, \sigma^2)$. Then, $P_{X_k|Z_1^{k-1}, A} = \mathcal{N}(A X_{k-1} + U_{k-1}, \sigma^2)$, and consequently the measure-domination condition $P_{X_k|Z_1^{k-1}} \ll Q_{X_k|Z_1^{k-1}}$ holds. Moreover,
$$D\Big(P_{X_k|Z_1^{k-1}, A}\,\Big\|\,Q_{X_k|Z_1^{k-1}}\,\Big|\,P_{Z_1^{k-1}, A}\Big) = \mathbb{E}_P\Big[D\big(\mathcal{N}(A X_{k-1} + U_{k-1}, \sigma^2)\,\big\|\,\mathcal{N}(0, \sigma^2)\big)\Big] = \frac{\log e}{2\sigma^2}\,\mathbb{E}_P\big[(A X_{k-1} + U_{k-1})^2\big]. \quad (40)$$
The result follows by combining (13) and (40).
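The Gaussian KL divergence used in the second equality of (40) is the standard one; for completeness, working in the base of the logarithm used throughout (hence the factor $\log e$), a worked version reads
$$D\big(\mathcal{N}(\mu, \sigma^2)\,\big\|\,\mathcal{N}(0, \sigma^2)\big) = \mathbb{E}_{X\sim\mathcal{N}(\mu,\sigma^2)}\left[\log\frac{e^{-(X-\mu)^2/(2\sigma^2)}}{e^{-X^2/(2\sigma^2)}}\right] = \frac{\log e}{2\sigma^2}\,\mathbb{E}\big[X^2 - (X-\mu)^2\big] = \frac{\log e}{2\sigma^2}\,\mu^2,$$
which, with $\mu = A X_{k-1} + U_{k-1}$ and an outer expectation over $P_{Z_1^{k-1}, A}$, yields the right-hand side of (40).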

APPENDIX G
PROOF OF COROLLARY 2

Using (1) and (15), we can rewrite $\bar{X}_k$ and $X_k$ explicitly as follows:
$$\bar{X}_k = A^k X_0 + \sum_{j=1}^{k-1} A^{k-1-j}\big(\bar{U}_j + W_j\big),$$
$$X_k = A^k X_0 + \sum_{j=1}^{k-1} A^{k-1-j}\big(U_j + W_j\big) = A^k X_0 + \sum_{j=1}^{k-1} A^{k-1-j}\big(\bar{U}_j + \Gamma_j + W_j\big) = \bar{X}_k + \Psi_{k-1}.$$
Thus, by (1), the following relation holds:
$$A X_{k-1} + U_{k-1} = A \bar{X}_{k-1} + \bar{U}_{k-1} + \Psi_{k-1}. \quad (41)$$
By comparing
$$G(\bar{Z}_1^L) \triangleq \frac{\frac{\log e}{2\sigma^2}\sum_{k=1}^{L} \mathbb{E}_P\big[(A \bar{X}_{k-1} + \bar{U}_{k-1})^2\big] + 1}{\log\big(R/\sqrt{\delta\beta}\big)}$$
with
$$G(Z_1^L) = \frac{\frac{\log e}{2\sigma^2}\sum_{k=1}^{L} \mathbb{E}_P\big[(A \bar{X}_{k-1} + \bar{U}_{k-1} + \Psi_{k-1})^2\big] + 1}{\log\big(R/\sqrt{\delta\beta}\big)},$$
in which we have utilized (41), and provided that (16) holds, we arrive at $G(Z_1^L) > G(\bar{Z}_1^L)$.
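For completeness, the identity (41) can be verified directly (assuming, as the derivation above suggests, that $\Psi_{k-1}$ denotes the accumulated effect of the watermark, $\Psi_{k-1} \triangleq \sum_{j=1}^{k-1} A^{k-1-j}\Gamma_j$; its formal definition appears earlier in the paper):
$$A X_{k-1} + U_{k-1} = A\big(\bar{X}_{k-1} + \Psi_{k-2}\big) + \bar{U}_{k-1} + \Gamma_{k-1} = A\bar{X}_{k-1} + \bar{U}_{k-1} + \big(A\Psi_{k-2} + \Gamma_{k-1}\big) = A\bar{X}_{k-1} + \bar{U}_{k-1} + \Psi_{k-1}.$$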

APPENDIX H
PROOF OF LEMMA 1

Since the hijacking phase of the vector analogue of the learning-based attack of (2) starts at time $k = L + 1$, using (18) and (4) we have
$$\sum_{k=1}^{T}\big[Y_{k+1} - A Y_k - U_k(Y_1^k)\big]\big[Y_{k+1} - A Y_k - U_k(Y_1^k)\big]^\dagger$$
$$= \sum_{k=1}^{L} W_k W_k^\dagger + \sum_{k=L+1}^{T}\big(W_k + (\hat{A} - A)V_k\big)\big(W_k + (\hat{A} - A)V_k\big)^\dagger \quad (42a)$$
$$= \sum_{k=1}^{L} W_k W_k^\dagger + \sum_{k=L+1}^{T} W_k W_k^\dagger + \sum_{k=L+1}^{T}\big(W_k V_k^\dagger (\hat{A} - A)^\dagger\big)^\dagger + \sum_{k=L+1}^{T} (\hat{A} - A) V_k V_k^\dagger (\hat{A} - A)^\dagger + \sum_{k=L+1}^{T} W_k V_k^\dagger (\hat{A} - A)^\dagger. \quad (42b)$$

Let $\mathcal{F}_k$ be the $\sigma$-field generated by $(V_k, \hat{A}, W_k, U_k)$, for $k = L, \ldots, T-1$. Then, $V_{k+1}^\dagger$ is $\mathcal{F}_k$-measurable, and $(W_{k+1}, \mathcal{F}_k)$ is a martingale difference sequence, i.e., $\mathbb{E}[W_{k+1} \,|\, \mathcal{F}_k] = 0$ a.s. Consequently, using [69, Lemma 2, part iii], we have
$$\sum_{k=L+1}^{T} W_k V_k^\dagger = O(1) + \begin{pmatrix} o\Big(\sum_{k=L+1}^{T} (V_k^\dagger)_1^2\Big) & \cdots & o\Big(\sum_{k=L+1}^{T} (V_k^\dagger)_n^2\Big) \\ \vdots & \ddots & \vdots \\ o\Big(\sum_{k=L+1}^{T} (V_k^\dagger)_1^2\Big) & \cdots & o\Big(\sum_{k=L+1}^{T} (V_k^\dagger)_n^2\Big) \end{pmatrix} \quad \text{a.s.} \quad (43)$$
in the limit $T \to \infty$, where $(V_k^\dagger)_i^2$ denotes the square of the $i$-th element of $V_k^\dagger$. Further, by the strong law of large numbers,
$$\lim_{T\to\infty} \frac{1}{T}\Big(\sum_{k=1}^{L} W_k W_k^\dagger + \sum_{k=L+1}^{T} W_k W_k^\dagger\Big) = \Sigma \quad \text{a.s.} \quad (44)$$

Substituting (43) and (44) in (42b) completes the proof.

APPENDIX I
PROOF OF THEOREM 3

(27a) follows from Lemma 1 and (22). We now prove (27b). By the law of total probability,
$$\mathbb{P}_A\big(\|\hat{A} - A\|_{\mathrm{op}} < \sqrt{\gamma\beta}\big) \ge \mathbb{P}_A\Big(\frac{1}{L}\sum_{k=1}^{L} X_k X_k^\dagger \succeq \zeta I_{n\times n}\Big) \cdot \mathbb{P}_A\Big(\|\hat{A} - A\|_{\mathrm{op}} < \sqrt{\gamma\beta} \,\Big|\, \frac{1}{L}\sum_{k=1}^{L} X_k X_k^\dagger \succeq \zeta I_{n\times n}\Big).$$

Since the control policy is $(\zeta, \rho)$-persistently exciting and $L - 1 \ge L_0$, the result follows from Lemma 2 and (23).

APPENDIX J
PROOF OF LEMMA 2

Since $G_{L-1}$ is a Hermitian matrix, and since the event in (23) occurs for $L - 1$, we have $\det(G_{L-1}) \neq 0$ by [71, Theorem 7.8, part 2]. Thus, when the attacker uses the LS estimate (25), we deduce
$$\hat{A} - A = \Big(\sum_{k=1}^{L-1}\big((X_{k+1} - U_k)X_k^\dagger\big) - A\,G_{L-1}\Big)G_{L-1}^{-1} = \sum_{k=1}^{L-1}\big((X_{k+1} - A X_k - U_k)X_k^\dagger\big)G_{L-1}^{-1} = \sum_{k=1}^{L-1}\big(W_k X_k^\dagger\big)G_{L-1}^{-1},$$

where the last two equalities follow from (24) and (18), respectively. Thus, using the sub-multiplicativity of operator norms and the triangle inequality, we have
$$\|\hat{A} - A\|_{\mathrm{op}} \le \sum_{k=1}^{L-1}\|W_k X_k^\dagger\|_{\mathrm{op}}\;\|G_{L-1}^{-1}\|_{\mathrm{op}}. \quad (45)$$

We now continue by upper bounding $\|G_{L-1}^{-1}\|_{\mathrm{op}}$ as follows. Since the event in (23) occurs for $L - 1$, using [71, Theorem 7.8, part 3] we deduce
$$\frac{1}{\zeta L}\,\upsilon^\dagger I_{n\times n}\,\upsilon \ge \upsilon^\dagger G_{L-1}^{-1}\upsilon \quad (46)$$
for all $\upsilon \in \mathbb{R}^{n\times 1}$. Since $G_{L-1}$ is a Hermitian positive semidefinite matrix, then so is $G_{L-1}^{-1}$ (see [71, Problem 1, Section 7.1]). Thus, using [71, Theorem 7.4], the Hermitian matrix $\sqrt{G_{L-1}^{-1}}$ exists. We continue by noticing that $\upsilon^\dagger I_{n\times n}\upsilon = \|\upsilon\|^2$ and
$$\upsilon^\dagger \Big(\sqrt{G_{L-1}^{-1}}\Big)^\dagger \sqrt{G_{L-1}^{-1}}\,\upsilon = \Big\|\sqrt{G_{L-1}^{-1}}\,\upsilon\Big\|^2.$$
Thus, using (46) we deduce
$$\Big\|\sqrt{G_{L-1}^{-1}}\Big\|_{\mathrm{op}} \le \frac{1}{\sqrt{\zeta L}}. \quad (47)$$

Using (47), the sub-multiplicativity of operator norms, and (45), (26) follows.

APPENDIX K
EXTENSION TO NONLINEAR SYSTEMS: PROOFS AND DISCUSSION

In this section, we first discuss additional details about our nonlinear results and then prove Thm. 4.

The inner product of the RKHS satisfies the reproducing property: $f(Z) = \langle f, K(Z, \cdot)\rangle_K$. The choice of kernel is problem- and application-dependent; see, e.g., [72] for a review of common kernel choices. The RKHS norm induced by the kernel $K$ is defined as $\|f\|_K \triangleq \sqrt{\langle f, f\rangle_K}$, and can be viewed as a complexity measure of the function $f$ w.r.t. the kernel $K$. We assume $\|f\|_K \le \chi$ for a known constant $\chi > 0$. The function $f$ is assumed to be completely known to the controller, while the attacker is oblivious of $f$ but is assumed to know the RKHS to which $f$ belongs.

As in (3), under legitimate system operation, the controller observation $Y_k$ behaves according to
$$Y_{k+1} - f(Y_k, U_k) \sim \text{i.i.d. } \mathcal{N}(0, \sigma^2). \quad (48)$$
Thus, following the exposition of Sec. II-B, we consider the variance test, i.e., the controller tests whether the empirical variance of (48) falls within a confidence interval of length $2\delta > 0$ around its expected variance $\sigma^2$ [cf. (5)]. That is, at test time $T$, it checks whether
$$\frac{1}{T}\sum_{k=1}^{T}\big[Y_{k+1} - f(Y_k, U_k)\big]^2 \in \big(\mathrm{Var}[W] - \delta,\ \mathrm{Var}[W] + \delta\big). \quad (49)$$

Denote by $\hat{F}$ the attacker's estimate of the function $f$ at the conclusion of Phase 1. We assume that during Phase 2 the fictitious sensor reading is constructed in a model-based fashion according to the nonlinear analogue of (2):
$$V_{k+1} = \hat{F}(V_k, U_k) + W_k, \qquad k = L, \ldots, T-1. \quad (50)$$

To provide a lower bound on the deception probability $P^{f,T}_{\mathrm{Dec}}$, we assume that the attacker uses GP inference, which is known to be closely related to RKHS functions (cf. [46], [63]), to construct an estimate $\hat{F}$ of $f$ at the conclusion of the learning phase. To that end, the function $f$ is assumed to have a Gaussian prior and is treated as a GP, $F$, with a kernel (covariance function) $K$, which is the kernel associated with the RKHS to which $f$ (28) belongs. Without loss of generality, we assume $F$ to be of zero mean.

Remark 7. Following [63], the assumption that $f$ is random and distributed according to a GP is used only for the construction of the attacker's learning algorithm and is not part of the model setup (28).

During the learning phase, the attacker observes the state-and-control trajectories $Z_1^{L-1}$ up to time $L-1$. That is, the attacker observes $Z_1^{L-1}$ (the $L-1$ inputs to the function $F$) and $X_2^L$ (their corresponding outputs, corrupted by the i.i.d. Gaussian system disturbances $W_1^{L-1}$). Then, for all $1 \le k \le L-1$, the posterior distribution of $F$ is also a GP, with mean $M_k$, covariance $K_k$, and variance $\sigma_k^2$ given by (see, e.g., [72]):
$$M_k(Z) = C_k(Z)\big(\mathbf{C}_k + I_{k\times k}\sigma^2\big)^{-1}\big(X_2^{k+1}\big)^\dagger,$$
$$K_k(Z, Z') = K(Z, Z') - C_k(Z)\big(\mathbf{C}_k + I_{k\times k}\sigma^2\big)^{-1}C_k^\dagger(Z'),$$
$$\sigma_k^2(Z) = K_k(Z, Z), \quad (51)$$
where the vector $C_k(Z) = [K(Z, Z_1)\ \ldots\ K(Z, Z_k)]$ comprises the covariances between the input $Z$ and the observed state-and-control trajectory vector $(Z_1^k)^\dagger$, and $\mathbf{C}_k \in \mathbb{R}^{k\times k}$ is the covariance matrix of the vector $(Z_1^k)^\dagger$, i.e., $(\mathbf{C}_k)_{i,j} = K(Z_i, Z_j)$.

By choosing $\hat{F} = M_{L-1}$, the minimum mean square error estimate of $F$, we showed in Thm. 4 that the nonlinear learning-based attack (50) achieves at least the asymptotic deception probability (29) under the variance test (49) for any measurable control policy.
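The posterior update (51) is the standard GP regression computation [72]; a minimal sketch (ours; the RBF kernel and all names are illustrative choices, not the kernel assumed by the paper) that returns the posterior mean and variances at query points is:

import numpy as np

def rbf_kernel(Z1, Z2, lengthscale=1.0):
    """Squared-exponential kernel K(z, z') = exp(-||z - z'||^2 / (2 l^2))."""
    d2 = np.sum((Z1[:, None, :] - Z2[None, :, :]) ** 2, axis=-1)
    return np.exp(-0.5 * d2 / lengthscale**2)

def gp_posterior(Z_train, y_train, Z_query, noise_var, kernel=rbf_kernel):
    """GP posterior mean and variance, cf. (51).

    Z_train: (L-1, d) observed inputs Z_1, ..., Z_{L-1} (state-and-control pairs);
    y_train: (L-1,)  their noisy outputs X_2, ..., X_L;
    Z_query: (m, d)  points at which the attacker evaluates its model.
    """
    C_mat = kernel(Z_train, Z_train)                          # covariance matrix of the training inputs
    c_vec = kernel(Z_query, Z_train)                          # cross-covariances C_k(Z)
    K_inv = np.linalg.inv(C_mat + noise_var * np.eye(len(Z_train)))
    mean = c_vec @ K_inv @ y_train                            # posterior mean M_k(Z)
    cov = kernel(Z_query, Z_query) - c_vec @ K_inv @ c_vec.T  # posterior covariance K_k(Z, Z')
    return mean, np.diag(cov)                                 # posterior variances sigma_k^2(Z)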

A. Proof of Theorem 4

We start by proving the following lemma, which is an extension of Lem. 3 to our nonlinear setup and shows that the deception probability is related to the performance of the learning algorithm.

Lemma 6. Given $T = L + c$, consider any learning-based attack (50) and some measurable control policy $U_k$. If
$$\frac{1}{L+c}\sum_{k=L+1}^{L+c}\big[f(V_k, U_k) - \hat{F}(V_k, U_k)\big]^2 + \frac{2}{L+c}\sum_{k=L+1}^{L+c}|W_k|\,\big|\hat{F}(V_k, U_k) - f(V_k, U_k)\big| \le \delta$$
holds in the limit of $L \to \infty$, then the attacker is able to deceive the controller and remain undetected a.s.

Proof of Lemma 6: Using (50), we have
$$V_{k+1} - f(V_k, U_k) = W_k + \hat{F}(V_k, U_k) - f(V_k, U_k).$$
Thus, given the test (49), the attacker manages to deceive the controller and remain undetected if
$$\frac{1}{T}\Big(\sum_{k=1}^{L} W_k^2 + \sum_{k=L+1}^{T}\big(W_k + \hat{F}(V_k, U_k) - f(V_k, U_k)\big)^2\Big) \in \big(\mathrm{Var}[W] - \delta,\ \mathrm{Var}[W] + \delta\big).$$
We have
$$\frac{1}{T}\Big(\sum_{k=1}^{L} W_k^2 + \sum_{k=L+1}^{T}\big(W_k + \hat{F}(V_k, U_k) - f(V_k, U_k)\big)^2\Big) = \frac{1}{T}\Big(\sum_{k=1}^{L} W_k^2 + \sum_{k=L+1}^{T} W_k^2\Big) + \frac{1}{T}\sum_{k=L+1}^{T}\big(\hat{F}(V_k, U_k) - f(V_k, U_k)\big)^2 + \frac{2}{T}\sum_{k=L+1}^{T} W_k\big(\hat{F}(V_k, U_k) - f(V_k, U_k)\big). \quad (52)$$

Since $T = L + c$, by taking $L$ to infinity and substituting (35) in (52), we deduce that if
$$\frac{1}{L+c}\sum_{k=L+1}^{L+c}\big(f(V_k, U_k) - \hat{F}(V_k, U_k)\big)^2 + \frac{2}{L+c}\sum_{k=L+1}^{L+c} W_k\big(\hat{F}(V_k, U_k) - f(V_k, U_k)\big) \in (-\delta, \delta)$$
holds in the limit of $L \to \infty$, then the attacker remains undetected a.s. The result follows using the triangle inequality. •

We continue by restating the following lemma from [46], [63] for our setup.

Lemma 7. Consider the nonlinear plant (28), where $\|f\|_K \le \chi$. If the attacker chooses $\hat{F} = M_{L-1}$ according to the GP-based inference (51), then
$$\mathbb{P}_f\Big(\big|f(V_k, U_k) - \hat{F}(V_k, U_k)\big| \le \varrho_k\,\sigma_{L-1}(V_k, U_k)\Big) \ge 1 - \xi_k,$$
for $k \ge L+1$, where $\varrho_k \triangleq \chi + 4\sigma\sqrt{\psi_{L-1} + 1 + \ln(1/\xi_k)}$, and $\psi_{L-1}$ is defined as
$$\psi_{L-1} \triangleq \frac{1}{2}\sum_{k=1}^{L-1}\ln\big(1 + \sigma^{-2}\sigma_{k-1}^2(Z_k)\big).$$
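The constants appearing in Lemma 7 can be computed directly from the posterior variances along the observed trajectory; the helpers below are an illustrative sketch only (the names are ours).

import numpy as np

def information_gain(posterior_vars, noise_var):
    """psi_{L-1} = 0.5 * sum_k ln(1 + sigma_{k-1}^2(Z_k) / sigma^2)."""
    return 0.5 * np.sum(np.log1p(np.asarray(posterior_vars) / noise_var))

def confidence_width(chi, sigma, psi, xi):
    """rho_k = chi + 4 * sigma * sqrt(psi_{L-1} + 1 + ln(1 / xi_k))."""
    return chi + 4.0 * sigma * np.sqrt(psi + 1.0 + np.log(1.0 / xi))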

Define $\nu_k \triangleq \varrho_k\,\sigma_{L-1}(V_k, U_k)$; then $\xi_k$ can be calculated as in (30). Also, using Lem. 7, we deduce that
$$\frac{1}{L+c}\sum_{k=L+1}^{L+c}\big[\hat{F}(V_k, U_k) - f(V_k, U_k)\big]^2 + \frac{2}{L+c}\sum_{k=L+1}^{L+c}|W_k|\,\big|\hat{F}(V_k, U_k) - f(V_k, U_k)\big| \le \frac{1}{L+c}\sum_{k=L+1}^{L+c}\nu_k^2 + \frac{2}{L+c}\sum_{k=L+1}^{L+c}|W_k|\,\nu_k$$
holds at least with probability $\prod_{k=L+1}^{L+c}(1 - \xi_k)$. The result now follows using Lem. 6 and (31).

By looking at (29), it follows that the bound on the deception probability depends on the variances of the posteriors at the observation points $Z_1^{L-1}$, as well as on $\sigma_{L-1}(V_k, U_k)$ for $L+1 \le k \le L+c$. In particular, as the variances of the posteriors become smaller, the attacker has less uncertainty about the system dynamics, and its success rate increases.

Remark 8. Designing a proper privacy-enhancing signal that leads to large posterior variances, and consequently an enhanced detection probability, is an interesting avenue for future research.