Projection-Based Bayesian Recursive
Estimation of ARX Model with Uniform
Innovations
Miroslav Kárný and Lenka Pavelková
Department of Adaptive Systems
Institute of Information Theory and Automation
Academy of Sciences of the Czech Republic
P.O. Box 18, 182 08 Prague 8, Czech Republic
Abstract
The autoregressive model with exogenous inputs (ARX) is a widely-used black-box type
model underlying adaptive predictors and controllers. Its innovations, the stochastic
unobserved stimulus of the model, are white, zero mean with time-invariant variance.
Mostly, the innovations are assumed to be normal, which singles out least squares as the
adequate estimation procedure. Light tails of the normal distribution imply that its
unbounded support can often be accepted as a reasonable approximate description of
physical quantities, which are mostly bounded. In some cases, however, this approximation
is too crude or does not fit subsequent processing, for instance, robust control design.
Then, techniques similar to those dealing with unknown-but-bounded equation errors are
used. They intentionally give up the stochastic interpretation of innovations and develop
various algorithms of a min-max type.
The paper assumes bounded innovations but stays within the standard, here Bayesian,
estimation setup by considering their uniform distribution. The posterior probability
density function (pdf) is first described and then approximated by a pdf with a
fixed-dimensional statistic. Consequently, the estimation can run on line. Its limited
memory allows for tracking varying parameters. In this way, an alternative to popular
forgetting techniques is also obtained.

The paper provides a complete algorithmic solution and illustrates its behavior.

Preprint submitted to Elsevier Science 22 August 2005
Key words: ARX model, Bayesian recursive estimation, uniform distribution
1 INTRODUCTION
Recursive estimation [1,2] is at the heart of a range of adaptive decision-making
units that include predictors, advising and fault-detecting systems as well as
adaptive controllers, e.g. [3–6]. Sufficiency of simple, often black-box, locally
valid models for successful decision making is the major advantage of adaptive
systems [7,8]. It makes them relatively universal and keeps costs of their
implementation low. The autoregressive model with exogenous inputs (ARX) is an
important representative of this model class. Its innovations, the stochastic
unobserved stimulus of the model, are white, zero mean and have time-invariant
variance. Mostly, the innovations are assumed to be normal, which singles out
least squares as the adequate estimation procedure [1]. Light tails of the normal
distribution imply that its unbounded support can often be accepted as a reasonable
approximation of reality, which is mostly bounded. In some cases, however, this
assumption is unrealistic, e.g. in modelling of transportation systems (for the
encountered problems see e.g. [9]), or does not fit subsequent processing, for
instance, robust control design [10]. Then,
techniques similar to those dealing with unknown-but-bounded equation er-
rors are used, see references in [11]. They often intentionally give up stochastic
interpretation of innovations and develop and analyze various algorithms of a
min-max type, cf. [12]. The resulting algorithms are definitely useful. However,
giving up the probabilistic interpretation makes solutions of related decision-
making tasks unnecessarily difficult as a rich set of statistical tools is given up,
too. Thus, it makes sense to address the discussed estimation within the “clas-
sical” probabilistic setting. The paper accepts the assumption that innovations
are bounded but stays within the standard, here Bayesian, estimation setup
[1] by assuming their uniform distribution. The posterior probability density
function (pdf) is described and then approximated by a pdf of the same type
but having a fixed-dimensional statistic. Consequently, the estimation can run
on line and its limited memory allows tracking of varying parameters. Thus,
an alternative to forgetting techniques [13] is obtained.
The exact Bayesian estimation of the ARX model is recalled, Section 2. Its
recursively applicable approximate estimation is presented in Section 3. Sec-
tion 4 discusses practical implementation aspects of the obtained algorithm,
which is summarized in Section 5. Properties of the proposed algorithm are
illustrated on simulation examples and on processing of simple transportation
data, Section 6. Remarks, Section 7, close the paper.
Throughout: ′ denotes transposition; x̊ means the number of entries in a vector x;
I denotes the unit matrix; a subscript in round brackets defines the dimensions of a
matrix; ≡ is equality by definition; X∗ denotes a set of X-values; χx(•) is the value
of the indicator of the set x∗ at the point x, the set x∗ being defined by the
conditions •; f(·|·) denotes a probability density function (pdf); t labels
discrete-time moments, t ∈ t∗ ≡ {1, 2, . . .}; dt = (yt, ut) is the data record at
time t consisting of an observed system output yt and of an optional system
input ut; X(t) denotes the sequence (X1, . . . , Xt), X ∈ {d, y, u}. Names of
arguments distinguish respective pdfs. No formal distinction is made between
a random variable, its realization and an argument of a pdf. Integrals used are
always definite and multivariate ones. The integration domain coincides with
the support of the pdf in its argument.
The approximation needed for on-line estimation exploits the Kullback-Leibler (KL)
divergence D(f||f̂) [14] that measures the proximity of a pair of pdfs f, f̂ acting
on a set X∗. It is defined as follows:

$$
D(f\,||\,\hat f) \equiv \int f(X)\,\ln\!\left(\frac{f(X)}{\hat f(X)}\right) dX. \qquad (1)
$$

The KL divergence has the following properties of interest:

$$
D(f\,||\,\hat f) \ge 0, \qquad D(f\,||\,\hat f) = 0 \ \text{iff}\ f = \hat f \ \text{almost everywhere on}\ X^*; \qquad (2)
$$

D(f||f̂) = ∞ if the set on which f̂(X) = 0 and f(X) > 0 has non-zero volume.
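As a numerical sanity check of definition (1) and property (2), the KL divergence of two uniform pdfs can be approximated by grid integration. The sketch below is illustrative only; the example densities and the grid resolution are arbitrary choices, not anything used in the paper.

```python
import numpy as np

def kl_divergence(f, g, grid):
    """Numerical KL divergence D(f||g) on an equidistant grid (rectangle rule).

    Infinite whenever f puts mass where g vanishes, cf. property (2)."""
    fx, gx = f(grid), g(grid)
    mask = fx > 0
    if np.any(mask & (gx == 0)):
        return np.inf
    dx = grid[1] - grid[0]
    return np.sum(fx[mask] * np.log(fx[mask] / gx[mask])) * dx

# Uniform densities: the support of U(0, 1) lies inside that of U(0, 2).
f = lambda x: np.where((x >= 0) & (x <= 1), 1.0, 0.0)
g = lambda x: np.where((x >= 0) & (x <= 2), 0.5, 0.0)
grid = np.linspace(0.0005, 1.9995, 2000)   # midpoints of 2000 cells on [0, 2]

print(kl_divergence(f, g, grid))           # ≈ ln 2 ≈ 0.693
print(kl_divergence(g, f, grid))           # inf: g > 0 where f = 0
```

For U(0, 1) against U(0, 2) the exact value is ln 2; in the opposite order the divergence is infinite, since U(0, 2) puts mass where U(0, 1) vanishes.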
2 ESTIMATING ARX MODEL WITH UNIFORM INNOVATIONS
The parameterized model of the system with a single output yt,

$$
f(y_t|u_t, d(t-1), \Theta) = f(y_t|\psi_t, \Theta) \equiv U_{y_t}(\psi_t'\theta,\, r)
\equiv \frac{\chi_{y_t}(-r \le y_t - \psi_t'\theta \le r)}{2r}
= \frac{\chi_{y_t}(-r \le \Psi_t'[1, -\theta']' \le r)}{2r}, \qquad (3)
$$

is the pdf that describes the ARX model with uniform innovations. In it,

ψt is a column regression vector made of the past observed data d(t − 1) and the
current system input ut; often, the state in the phase form is considered,
i.e., ψ′t ≡ [u′t, d′t−1, . . . , d′t−∂, 1] with the model order ∂ ≥ 0;

Ψ′t ≡ [yt, ψ′t] combines the modelled output and the regression vector into the single
data vector; for the state in the phase form, Ψ′t = [d′t, d′t−1, . . . , d′t−∂, 1];

θ is a column vector of regression coefficients;

r > 0 is the scalar half-width of the range of the innovations et ≡ Ψ′t[1, −θ′]′;

Θ ≡ (θ, r) labels the unknown parameters of the model;

Uy(µ, r) is the uniform pdf of y given by the expectation µ and the half-width r.
Note that a single-output model is sufficient, as the chain rule for pdfs [1] implies
that the multivariate case can be treated using a collection of such models [15].
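The generating mechanism behind (3) is straightforward to simulate. A minimal Python sketch, with illustrative coefficients (not those of Section 6), draws data from a uniform AR model with an offset and checks that every innovation stays within the half-width r:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative uniform AR model with offset, cf. (3):
#   y_t = psi_t' theta + e_t,  e_t ~ U(-r, r)  (white, zero mean).
theta = np.array([0.5, -0.3, 1.0])   # coefficients of psi_t = [y_{t-1}, y_{t-2}, 1]'
r = 0.2                              # innovation half-width

y = [0.0, 0.0]                       # prior data filling the first regressor
for t in range(100):
    psi = np.array([y[-1], y[-2], 1.0])
    y.append(psi @ theta + rng.uniform(-r, r))
y = np.array(y)

# Each data vector Psi_t = [y_t, psi_t']' satisfies |Psi_t'[1, -theta']'| <= r.
residuals = y[2:] - (theta[0] * y[1:-1] + theta[1] * y[:-2] + theta[2])
print(np.max(np.abs(residuals)) <= r)  # True by construction
```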
Under the natural conditions of control [1], which assume inputs conditionally
independent of the unknown parameters, f(ut|d(t − 1), Θ) = f(ut|d(t − 1)), the
likelihood function assigned to the uniform ARX model has the form

$$
L(d(t), \Theta) = \frac{1}{r^t}\,\chi_r(r \ge 0)\,\chi_\Theta(-\mathbf 1_t r \le W_t[-1, \theta']' \le \mathbf 1_t r), \qquad (4)
$$

where $\mathbf 1_t$ is the column vector consisting of t units and W′t ≡ [Ψ1, . . . , Ψt]. The
matrix Wt is assumed to be known; thus, the initial regression vector ψ1 has to
be known. For the state in the phase form, this is equivalent to the knowledge
of u1, d0, . . . , d−∂+1. The likelihood form (4) hints at the reproducing prior pdf

$$
f(\Theta) \propto \frac{1}{r^{\nu_0}}\,\chi_r(\bar r \ge r \ge 0)\,\chi_\Theta(-\mathbf 1_{\nu_0} r \le W_0[-1, \theta']' \le \mathbf 1_{\nu_0} r), \qquad (5)
$$

where $\infty > \bar r > 0$ is a sure upper bound on r. W0 can be interpreted as a
storage of data vectors (fictitiously) measured before the time t = 1 and ν0 is
their number. The proper prior pdf is obtained for a full-rank W0 and $\nu_0 \ge \mathring\Psi + 1$.
For the prior pdf (5), the posterior pdf has the same form:

$$
f(\Theta|d(t)) \propto \frac{1}{r^{\nu_t}}\,\chi_r(\bar r \ge r \ge 0)\,\chi_\Theta(-\mathbf 1_{\nu_t} r \le W_t[-1, \theta']' \le \mathbf 1_{\nu_t} r), \quad\text{with} \qquad (6)
$$
$$
\nu_t = \nu_{t-1} + 1, \quad \nu_0 \ge \mathring\Psi + 1 \ \text{chosen a priori},
$$
$$
W_t' = [W_{t-1}', \Psi_t], \quad \text{full-rank } W_0 \ \text{chosen a priori}.
$$
The applicability of the formula (6) is limited, as the dimension $(\nu_t, \mathring\Psi)$ of the matrix
Wt increases with the number t of processed data d(t), which also increases the
complexity of the convex support defined by them. This makes us search for
an approximate solution, especially when recursive estimation is considered.
3 APPROXIMATE RECURSIVELY FEASIBLE ESTIMATION
To obtain approximate, recursively feasible, estimation of the uniform ARX
model we have to approximate the exact posterior pdf by a pdf determined
by a statistic whose finite dimension does not increase with the increasing
number of data. Approximation of the support of the pdf by an ellipsoid is
used in connection with the estimation dealing with unknown-but-bounded
innovations [16]. Here, we propose an alternative approximation by a convex
set. Similar methodology was used in early papers addressing this problem
[17]. Intuitively, the resulting support shape is closer to the correct one than an
ellipsoid. Moreover, if need be, the presented solution can be complemented by
covering the approximate support by a union of non-overlapping boxes, using
algorithms derived in [18]. This provides a simple description of
uncertainties suitable for robust prediction and control design.
The exact posterior pdf (6) motivates the form of the approximate pdf

$$
f(\Theta|d(t-1)) = \frac{1}{I(V_{t-1}, \nu_{t-1})}\,\frac{1}{r^{\nu_{t-1}}}\,\chi_r(\bar r \ge r \ge 0)\,\chi_\Theta(-\mathbf 1_{\mathring V} r \le V_{t-1}[-1, \theta']' \le \mathbf 1_{\mathring V} r) \qquad (7)
$$

$$
I(V, \nu) \equiv \int \frac{1}{r^{\nu}}\,\chi_r(\bar r \ge r \ge 0)\,\chi_\Theta(-\mathbf 1_{\mathring V} r \le V[-1, \theta']' \le \mathbf 1_{\mathring V} r)\, d\Theta. \qquad (8)
$$

These pdfs are determined by full-rank matrices Vt−1 with a finite first dimension
$\mathring V \ge \mathring\Psi$ that is common for all t ∈ t∗.
The projection-based approximation of the Bayes rule [19] is applied for the design
of the recursively implementable data updating of these approximate pdfs. The
proper updating of f(Θ|d(t − 1)) by the data vector Ψt leads to the pdf

$$
f(\Theta|d(t)) = \frac{\chi_r(\bar r \ge r \ge 0)}{I\big([V_{t-1}', \Psi_t]',\, \nu_{t-1}+1\big)}\,\frac{1}{r^{\nu_{t-1}+1}}\,\chi_\Theta\Big(-\mathbf 1_{\mathring V + 1} r \le [V_{t-1}', \Psi_t]'[-1, \theta']' \le \mathbf 1_{\mathring V + 1} r\Big). \qquad (9)
$$
We search for the approximate posterior pdf f̂(Θ|d(t)) in the form (7), with
t replacing t − 1. In [20], this approximation is cast into the Bayesian context.
Under rather general conditions, it is found that the “best” approximation
f̂(Θ|d(t)) is to be selected as the minimizer of the KL divergence D(f||f̂) over
pdfs of the desired form (7), thus over Vt, νt.

The KL divergence is finite iff the support of f is included in the support of the
constructed approximate pdf. This can be guaranteed if some of the restrictions
on Θ, defined by rows of the matrix [V′t−1, Ψt]′, are relaxed. We use this
possibility and optimize over

$$
V_t \in V_t^* \equiv \{V_{t-1},\ {}^iV,\ i = 1, \dots, \mathring V\} \qquad (10)
$$

with iV obtained from Vt−1 by replacing its ith row with Ψ′t. As the scalar νt
expresses the number of used data vectors, it is reasonable to take

$$
\nu_t \equiv \nu(V_t, \nu_{t-1}) = \begin{cases} \nu_{t-1} & \text{if } V_t = V_{t-1} \\ \nu_{t-1} + 1 & \text{otherwise.} \end{cases} \qquad (11)
$$
Under the restrictions (10), (11) and for a fixed r, the constructed pdf is a constant
function of θ on the integration domain in the definition of the KL divergence
(1), and $\arg\min_{V \in V_t^*} D(f||\hat f) = \arg\min_{V \in V_t^*} \ln I(V, \nu)$. For the optimal choice
of Vt, we have to inspect the dependence of I(V, ν) on V. The substitution
$x' = [\underbrace{-1/r}_{x_1},\ \theta'/r]$ in (8) and a straightforward bounding from above give

$$
I(V, \nu) = \int (-x_1)^{\nu - \mathring\Psi - 1}\,\chi_{x_1}(x_1 \le -1/\bar r)\,\chi_x(-\mathbf 1_{\mathring V} \le Vx \le \mathbf 1_{\mathring V})\, dx \;\le\; \frac{\mathcal V(V)}{\hat r^{\,\nu - \mathring\Psi - 1}} \qquad (12)
$$

$$
\hat r \equiv \min_{\{\Theta:\ \bar r \ge r \ge 0,\ -r\mathbf 1_{\mathring V} \le V[-1, \theta']' \le r\mathbf 1_{\mathring V}\}} r \qquad (13)
$$

$$
\mathcal V(V) \equiv \int \chi_x(-\mathbf 1_{\mathring V} \le Vx \le \mathbf 1_{\mathring V})\, dx.
$$

The exact evaluation of I(V, ν) is hard. Thus, instead of minimizing the KL
divergence, which is an increasing function of I(V, ν), we minimize the upper
bound in (12) over $V_t^*$ given by (10). The complete argument $\hat\Theta \equiv (\hat\theta, \hat r) \equiv
(\hat\theta(V), \hat r(V)) \equiv \hat\Theta(V)$ of the minimum (13) is found by linear programming (LP).
It also serves as a good point estimate of Θ.
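The minimization (13) is an ordinary linear program once Θ = (θ, r) is stacked into one decision vector. A possible sketch using scipy.optimize.linprog follows; the synthetic data vectors play the role of the rows of V, and all sizes and bounds are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

# Hypothetical stored data vectors Psi_i = [y_i, psi_i']' generated from a
# known uniform relation y = psi' theta0 + e with |e| <= r0.
theta0, r0 = np.array([1.2, -0.7]), 0.1
Psi = []
for _ in range(30):
    psi = rng.uniform(-1, 1, size=2)
    Psi.append(np.concatenate(([psi @ theta0 + rng.uniform(-r0, r0)], psi)))
Psi = np.array(Psi)                      # rows play the role of V

# LP of type (13): minimise r over (theta, r) subject to
#   -r <= y_i - psi_i' theta <= r  for every stored data vector.
y, X = Psi[:, 0], Psi[:, 1:]
n_th = X.shape[1]
A_ub = np.vstack([np.hstack([ X, -np.ones((len(y), 1))]),   #  psi'th - r <=  y
                  np.hstack([-X, -np.ones((len(y), 1))])])  # -psi'th - r <= -y
b_ub = np.concatenate([y, -y])
c = np.zeros(n_th + 1); c[-1] = 1.0                         # objective: r
bounds = [(None, None)] * n_th + [(1e-9, 10.0)]             # r in [r_low, r_bar]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)

theta_hat, r_hat = res.x[:-1], res.x[-1]
print(res.status, r_hat <= r0 + 1e-9)    # 0 (optimal); r_hat cannot exceed r0
```

Since the true parameter pair is feasible, the minimal half-width found by the LP can never exceed the simulated one.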
In summary, the set $V_t^*$ of candidates for Vt contains Vt−1 and iV, i = 1, . . . , $\mathring V$,
where iV is obtained by replacing the ith row of Vt−1 with Ψ′t. The best option Vt
minimizes the upper bound (12), i.e., $\mathcal V(V_t)/\hat r^{\,\nu_t - \mathring\Psi - 1}(V_t) \le \mathcal V(V)/\hat r^{\,\nu_t - \mathring\Psi - 1}(V)$
for all $V \in V_t^*$, with νt given by the rule (11). Thus, it remains to inspect the
dependence of $\mathcal V(V)$ on V.
For $\mathring V = \mathring\Psi$, the substitution z ≡ Vx in $\mathcal V(V)$ implies that the volume $\mathcal V(V)$ is an
increasing function of $|V|^{-1}$, where |·| denotes the absolute value of the
determinant. Defining the vector $a \equiv (V_{t-1}')^{-1}\Psi_t$, the replacement of the ith row of
Vt−1 by Ψ′t offers the matrix

$$
{}^iV_t \equiv V_{t-1} + {}^ie\,(\Psi_t' - {}^ie'V_{t-1}) = \big(I + {}^ie\,(a - {}^ie)'\big)V_{t-1} \equiv (I + {}^ie\,{}^ib')\,V_{t-1},
$$

where I is the unit matrix and ie its ith column. The absolute value of the
determinant is $|{}^iV_t| = |I + {}^ie\,{}^ib'|\,|V_{t-1}| = |a_i|\,|V_{t-1}|$, as the absolute value $|a_i|$ of the
ith entry of a is the only non-unit eigenvalue of $I + {}^ie\,{}^ib'$, corresponding to the
eigenvector ie; the remaining eigenvectors are orthogonal to ib.

Thus, the replacement of the ith row in Vt−1 decreases the inspected volume $\mathcal V(V_{t-1})$
iff $\max_{i=1,\dots,\mathring\Psi} |a_i| > 1$. The argument of the maximum points to the row of Vt−1
to be replaced by Ψ′t.
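The rule above can be checked numerically: replacing the chosen row indeed scales the absolute determinant by |ai|. The matrices below are random stand-ins, not data from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# Square case: which row of V_{t-1} should the new data vector Psi_t replace?
# Toy full-rank matrix and vector; any such pair works.
V = rng.standard_normal((4, 4))
Psi = rng.standard_normal(4)

a = np.linalg.solve(V.T, Psi)            # a = (V_{t-1}')^{-1} Psi_t
i = int(np.argmax(np.abs(a)))            # candidate row to replace

iV = V.copy(); iV[i] = Psi               # row i replaced by Psi'
lhs = abs(np.linalg.det(iV))
rhs = abs(a[i]) * abs(np.linalg.det(V))
print(np.isclose(lhs, rhs))              # True: |det iV| = |a_i| |det V|
# Per the rule above, the replacement is made only when |a_i| > 1, since that
# is exactly when the support volume, proportional to |det V|^{-1}, shrinks.
```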
The evaluation of the optimized volume in the general case $\mathring V > \mathring\Psi$ is harder, cf.
[18]. We need, however, to find only its dependence on the possible variants iVt.
For it, let us temporarily drop the subscript t and consider singular value
decompositions ${}^iV \equiv {}^iS\,{}^i\Lambda\,{}^iT$ [21] of the full-rank $(\mathring V, \mathring\Psi)$ matrices iV. In
them, iS, iT are orthogonal square matrices of appropriate dimensions and
iΛ is the $(\mathring V, \mathring\Psi)$ matrix whose only non-zero entries are the singular values
${}^i\Lambda_{kk}$, k = 1, . . . , $\mathring\Psi$. For an arbitrary ω > 0, let us consider the extensions of
these matrices to the full-rank square $(\mathring V, \mathring V)$ matrices

$$
{}^iV_\omega = {}^iS \begin{bmatrix} {}^i\Lambda & \begin{matrix} 0_{(\mathring\Psi,\, \mathring V - \mathring\Psi)} \\ \omega I_{(\mathring V - \mathring\Psi,\, \mathring V - \mathring\Psi)} \end{matrix} \end{bmatrix} \underbrace{\begin{bmatrix} {}^iT & 0 \\ 0 & {}^iU \end{bmatrix}}_{{}^i\mathcal T} \qquad (14)
$$

where iU are arbitrary orthogonal matrices and $0_{(m,n)}$ is the (m, n) matrix
with zero entries. Consequently, the matrices ${}^i\mathcal T$ in (14) are orthogonal. Let
us inspect the volumes of the sets

$$
{}^i(x, \kappa)^*(V_\omega) \equiv \{x, \kappa :\ -\mathbf 1_{\mathring V} \le {}^iV_\omega[x', \kappa']' \le \mathbf 1_{\mathring V}\}, \qquad (15)
$$

with the “dummy” variable κ being a $(\mathring V - \mathring\Psi)$-dimensional vector. The volume
$\mathcal V({}^iV_\omega)$ of the set (15) equals $c\,|{}^iV_\omega|^{-1}$ with a common positive constant c. For
arbitrary i, j, ω, the volume ratio ${}^{ij}R_\omega \equiv \mathcal V({}^jV_\omega)/\mathcal V({}^iV_\omega)$ is well defined
and independent of ω:

$$
{}^{ij}R_\omega = \frac{|{}^iV_\omega|}{|{}^jV_\omega|} = \frac{\overbrace{|{}^iS|}^{1}\ \prod_{k=1}^{\mathring\Psi} {}^i\Lambda_{kk}\ |\omega I|\ \overbrace{|{}^i\mathcal T|}^{1}}{\underbrace{|{}^jS|}_{1}\ \prod_{k=1}^{\mathring\Psi} {}^j\Lambda_{kk}\ |\omega I|\ \underbrace{|{}^j\mathcal T|}_{1}} = \prod_{k=1}^{\mathring\Psi} \frac{{}^i\Lambda_{kk}}{{}^j\Lambda_{kk}}. \qquad (16)
$$

The sets $(x, \kappa)^*(V_\omega)$ extend the sets of interest $x^*(V) \equiv \{x :\ -\mathbf 1_{\mathring V} \le Vx \le \mathbf 1_{\mathring V}\}$
by the dummy variables κ. The volumes of these extensions vary with ω between
zero (for ω → ∞) and infinity (for ω → 0). The ratio of the overall volumes
is, however, constant and well defined for all ω > 0. Thus, it makes sense to
interpret the ratio (16) of the extended sets as the ratio of the volumes of the
corresponding original sets $x^*(V)$, $V \in V^*$. It holds that

$$
|{}^iV'\,{}^iV| = |{}^iT'\,{}^i\Lambda'\,\underbrace{{}^iS'\,{}^iS}_{I}\,{}^i\Lambda\,{}^iT| = \prod_{k=1}^{\mathring\Psi} {}^i\Lambda_{kk}^2 \qquad (17)
$$

and thus $|{}^iV'\,{}^iV|$ equals the square of the expression entering the inspected
ratio of volumes. Similarly as in the simpler case $\mathring V = \mathring\Psi$, it can be found that

$$
|{}^iV_t'\,{}^iV_t| = |V_{t-1}'V_{t-1}|\,\underbrace{\big[(1 + F'F)(1 - {}^iG'\,{}^iG) + ({}^iG'F)^2\big]}_{a_i} \qquad (18)
$$

$$
F \equiv (V_{t-1}'V_{t-1})^{-0.5}\,\Psi_t, \qquad {}^iG \equiv (V_{t-1}'V_{t-1})^{-0.5} \times i\text{th column of } V_{t-1}'.
$$

Thus, the replacement decreases the inspected volume iff $\max_{i=1,\dots,\mathring V} |a_i| > 1$.
The maximizing argument points to the row of Vt−1 to be replaced by Ψ′t to get
the best Vt. Straightforward algebra confirms that the derived rule reduces
to the former one (up to the unimportant squaring) if $\mathring V = \mathring\Psi$.
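Formula (18) can be verified against directly computed Gram determinants. The following numerical check uses arbitrary full-rank test matrices and a symmetric inverse square root built from an eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(3)

# Numerical check of (18) in the tall case (more rows in V than entries in Psi).
V = rng.standard_normal((6, 3))          # full column rank with probability 1
Psi = rng.standard_normal(3)

# Symmetric inverse square root (V'V)^{-1/2} via eigendecomposition.
w, U = np.linalg.eigh(V.T @ V)
H = U @ np.diag(w ** -0.5) @ U.T

F = H @ Psi
gamma = 1.0 + F @ F
checks = []
for i in range(V.shape[0]):
    G = H @ V[i]                         # H times the i-th column of V'
    a_i = gamma * (1.0 - G @ G) + (G @ F) ** 2
    iV = V.copy(); iV[i] = Psi           # candidate: row i replaced by Psi'
    ratio = np.linalg.det(iV.T @ iV) / np.linalg.det(V.T @ V)
    checks.append(np.isclose(ratio, a_i))
print(all(checks))                       # True: (18) holds for every row
```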
4 PRACTICAL ASPECTS
4.1 Choice of V0
The initial V0 = W0 is supposed to be chosen so that the set {Θ : $\bar r \ge r \ge 0$,
$-\mathbf 1 r \le V_0[-1, \theta']' \le \mathbf 1 r$} is bounded. Typically, we know that $r \ge \underline r \ge 0$
and $\theta \in [\underline\theta, \bar\theta]$ with some $\mathbf 1\infty > \bar\theta > \underline\theta > -\mathbf 1\infty$. Throughout, inequalities
between vector quantities are understood entry-wise. Let us take $\mathring V = \mathring\Psi$ and

$$
V_0 \equiv
\begin{bmatrix}
\underline r & 0_{(1,\mathring\psi)} \\[4pt]
\dfrac{\underline r\,(\underline\theta_1+\bar\theta_1)}{\bar\theta_1-\underline\theta_1} & \dfrac{2\underline r}{\bar\theta_1-\underline\theta_1} & & \\[4pt]
\vdots & & \ddots & \\[4pt]
\dfrac{\underline r\,(\underline\theta_{\mathring\psi}+\bar\theta_{\mathring\psi})}{\bar\theta_{\mathring\psi}-\underline\theta_{\mathring\psi}} & & & \dfrac{2\underline r}{\bar\theta_{\mathring\psi}-\underline\theta_{\mathring\psi}}
\end{bmatrix}. \qquad (19)
$$

Then, the first row guarantees that $r \ge \underline r$. The upper bound $\bar r \ge r$ is specified
independently in (7). Entries of θ are bounded individually: for the ith entry,

$$
-\frac{\underline r\,(\underline\theta_i+\bar\theta_i)}{\bar\theta_i-\underline\theta_i} + \theta_i\,\frac{2\underline r}{\bar\theta_i-\underline\theta_i} \;\le\; r \quad\Leftrightarrow\quad \theta_i \le \bar\theta_i.
$$

The correctness of the lower bound can be verified similarly.
Notice that the proposed construction of the matrices Vt, which are updated
only when the product of their singular values increases, guarantees that the
estimation task remains well-conditioned for all t.
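A possible reading of the construction (19) in code, together with the membership test it induces, is sketched below. The sign convention of the first column is this sketch's assumption and should be checked against (19); the bounds are made-up numbers:

```python
import numpy as np

# Sketch of the prior-statistic construction of type (19); the sign convention
# of the offset column is an assumption of this sketch.
r_lo = 1e-3                       # sure lower bound on the half-width r
th_lo = np.array([-2.0, -1.0])    # entry-wise lower bounds on theta
th_hi = np.array([ 3.0,  1.5])    # entry-wise upper bounds on theta

d = th_hi - th_lo
V0 = np.zeros((1 + len(d), 1 + len(d)))
V0[0, 0] = r_lo                                # first row enforces r >= r_lo
V0[1:, 0] = r_lo * (th_lo + th_hi) / d         # offset column (assumed sign)
V0[1:, 1:] = np.diag(2.0 * r_lo / d)           # scaling of individual entries

def in_support(theta, r):
    """Entry-wise check of -1*r <= V0 [-1, theta']' <= 1*r."""
    v = V0 @ np.concatenate(([-1.0], theta))
    return bool(np.all(-r <= v) and np.all(v <= r))

print(in_support(np.array([0.0, 0.0]), r_lo))     # True: inside the box
print(in_support(th_hi + 0.1, r_lo))              # False: above upper bounds
print(in_support(np.array([0.0, 0.0]), r_lo/2))   # False: r below r_lo
```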
4.2 Choice of V
The developed recursive estimation uses the projected (approximated) results
obtained at time t as the statistic of the prior pdf for time t + 1. It means that
the precious information contained in the observed data is corrupted by this
processing, cf. the discussion in [22,23]. Essentially, the correct commutative
scheme of recursive estimation with projection

sufficient statistic_{t-1} → Bayesian update by Ψt → sufficient statistic_t
      ↓ projection                                        ↓ projection
projected statistic_{t-1} → correct approximate update by Ψt → projected statistic_t

is replaced by the scheme

statistic_{t-1} → Bayesian update by Ψt → updated statistic_t
                                                ↓ projection
                                          statistic_t → Bayesian update by Ψ_{t+1} → · · ·

where statistic_t describes the approximate posterior pdf. Thus, it is generally
desirable to choose $\mathring V$ as large as possible; its practical choice is determined
by the induced computational demands, see Section 4.3.
4.3 Computational demands and robustness
The minimization of the upper bound (12) over V ∈ V ∗t requires solution
of V linear-programming (LP) tasks (13) of the size (V , Ψ). This represents
the main computational burden connected with the updating Vt−1 → Vt. It
can be substantially decreased by exploiting closeness of respective LP’s with
matrices V ∈ V ∗t (10) differing at most in two rows. Moreover, it is possible
and reasonable to restrict the search for the optimum of (12) to a subset
12
of V ∗t . It has an added advantage as it allows preserving a selected part of
Vt and to make the updating robust. Specifically, if we optimize only over
{Vt−1,iV, i = Ψ + 1, . . . , V } the prior “safe” bounds stored in V0 ≡ W0 are
unchanged. We use this option.
The prior V0 usually expresses box-type restrictions, see Section 4.1. If the
stabilized estimation results indicate that this box is too conservative, we can
replace it in one shot by the box covering the convex set determined by Vt without
the original V0. The construction of this box requires solving $2\mathring\Psi$ LP tasks of the
type $(\mathring V, \mathring\Psi)$ [18]. Detecting when this replacement is suitable represents
an independent non-trivial task; this makes us avoid this option.

Moreover, we conjecture that changes of the optimized upper bound (12) follow
predominantly the changes of the volume $\mathcal V(V)$. This brings an additional
simplification, which we use: the replacement is made according to the volume
changes. A single LP task is then solved only for the selected Vt and provides the
point estimate $\hat\Theta$ of Θ.
The form of the supports of the considered posterior pdfs implies that the solved LP
may fail only when the upper bound $\bar r$ on r is too small. When facing such a
failure, the restriction $r \le \bar r$ should be relaxed. The point estimate $\hat r$ found
without this restriction defines a new, larger upper bound $\bar r$.
4.4 Relationships to least squares
The derived algorithm is tightly related to least squares (LS). The matrix
W′tWt coincides with the so-called extended information matrix, which, together
with νt, forms the sufficient statistic of the normal ARX model. Its updating is
algorithmically equivalent to recursive LS. The matrix V′tVt, used in the
construction of Vt+1, can be taken as an approximation of the information matrix
that preserves only $\mathring V$ data vectors among those observed up to time t. Thus, the
proposed algorithm provides as a byproduct an LS version that deals with a
restricted and varying choice of informative data vectors. This complements, if not
competes with, classical forgetting based on age-weighting. The preservation of V0
proposed in Section 4.3 even avoids the known instability of exponential forgetting
under a lack of information. Thus, it is close to the advanced versions of forgetting
that counteract this problem, see [13,24].
The observed relationship also helps in implementation: the approximate in-
formation matrices should be updated in the factorized way mastered in con-
nection with LS, see [25].
5 ALGORITHMIC SUMMARY
Initial mode
• Select the structure of the ARX model, i.e., the structure of the data vectors Ψt.
• Select the dimension $\mathring V \ge \mathring\Psi$ of the statistic Vt.
• Select lower and upper bounds on the estimated parameters and construct
the matrix V0 according to (19).
• Collect the data up to the moment when the data vectors Ψt complete Vt to
its full chosen dimension $\mathring V$.
• Fill the initial data into the first regression vector ψ1, choose ν0 and set t = 0.
Recursive mode
• Set t = t + 1, acquire the data dt and create the data vector Ψt = [yt, ψ′t]′.
• Update the matrix Vt−1 by Ψt to the matrix Vt as follows.
If the number of rows of Vt−1 is smaller than $\mathring V$, set V′t = [V′t−1, Ψt]; otherwise:
· Evaluate $H = (V_{t-1}'V_{t-1})^{-0.5}$.
· Compute the vector F = HΨt, the matrix G = HV′t−1 – with the ith column
iG, cf. (18) – and evaluate the scalar γ = 1 + F′F.
· For i = 1, . . . , $\mathring V$, evaluate $a_i \equiv \gamma(1 - {}^iG'\,{}^iG) + ({}^iG'F)^2$ and then find the
entry ak with the largest absolute value, i.e., |ak| ≥ |ai|.
· Set Vt = Vt−1 if |ak| ≤ 1; else replace the kth row of Vt−1 by Ψ′t to get Vt.
• Set νt = νt−1 and preserve the point estimate $\hat\Theta_t \equiv \hat\Theta_{t-1}$ of the parameters Θ if
Vt = Vt−1; otherwise set νt = νt−1 + 1 and update the point estimate

$$
\hat\Theta_t = \arg\min_{\{\Theta:\ -\mathbf 1_{\mathring V} r \le V_t[-1, \theta']' \le \mathbf 1_{\mathring V} r,\ r \in [\underline r, \bar r]\}} r. \qquad (20)
$$

Increase $\bar r$ appropriately if the above LP fails.
• Go to the beginning of Recursive mode while data are available.
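The volume-driven part of the Recursive mode can be sketched as follows; the LP step (20) is omitted and random vectors stand in for real data vectors Ψt:

```python
import numpy as np

def update_V(V, Psi):
    """One recursive-mode update of the statistic V by a data vector Psi,
    following the row-replacement rule based on the a_i of (18).
    Returns (V_new, informative_flag)."""
    w, U = np.linalg.eigh(V.T @ V)
    H = U @ np.diag(w ** -0.5) @ U.T          # (V'V)^{-1/2}
    F = H @ Psi
    gamma = 1.0 + F @ F
    a = []
    for v in V:                               # a_i for each row v of V
        G = H @ v
        a.append(gamma * (1.0 - G @ G) + (G @ F) ** 2)
    k = int(np.argmax(np.abs(a)))
    if abs(a[k]) <= 1.0:
        return V, False                       # Psi not informative enough
    V_new = V.copy(); V_new[k] = Psi
    return V_new, True

rng = np.random.default_rng(4)
V = rng.standard_normal((8, 3))               # stand-in initial statistic
gram_dets = [np.linalg.det(V.T @ V)]
for _ in range(50):
    V, used = update_V(V, rng.standard_normal(3))
    gram_dets.append(np.linalg.det(V.T @ V))

# A replacement is accepted only when a_k > 1, so det(V'V), i.e. the
# reciprocal squared support volume, never decreases along the recursion.
print(all(d2 >= d1 - 1e-9 for d1, d2 in zip(gram_dets, gram_dets[1:])))  # True
```

Because a replacement is accepted only when |ak| > 1, the support volume can only shrink or stay fixed, which is the tracking mechanism the paper proposes in place of forgetting.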
6 ILLUSTRATIVE EXAMPLES
This section illustrates basic features of the proposed algorithm on a simulated
example and on prediction of real transportation data.
Fig. 1. Simulated data (left); prediction errors for V = 1 × Ψ = 5 (right).
Fig. 2. Estimates θ̂ of θ for V = 1 × Ψ = 5; ∗ marks the simulated values oθ.
6.1 Simulated case
The simple single-output system was simulated by the uniform AR model

$$
f(y_t\,|\,\underbrace{y_{t-1},\, y_{t-2},\, y_{t-3},\, 1}_{\psi_t'}) = 0.5\,\chi_{y_t}\Big(-\underbrace{1}_{{}^or} \le y_t - \underbrace{[2.85,\, -2.7075,\, 0.8574,\, 0]}_{{}^o\theta'}\,\psi_t \le 1\Big).
$$

Parameters of the uniform ARX model with the structure enriched by an offset
were estimated using 100 data samples for various storage lengths V. The
prior pdf was selected as recommended in Section 4.1 with $\bar r = 10$, $\underline r = 10^{-6}$,
$\bar\theta = -\underline\theta = 10^2 \times \mathbf 1_5$. The influence of the storage length V, taken as integer
multiples of Ψ = 5, was inspected on this simple example.
The processed data are on Figure 1 together with prediction errors obtained
for V = Ψ = 5. These errors are rather close to the noise realization. Typical
trajectories of the parameter estimates θ̂ for V = 5 are in Figure 2.

Fig. 3. Estimates r̂ of r for V = 1, 2, 3, 4 × Ψ = 5; ∗ marks the simulated value or.

Fig. 4. Trajectories of ρt ≡ ||θ̂t − oθ||² for V = 1, 2, 3, 4 × Ψ.

The character of these trajectories is similar for the tested dimensions V = 5, 10, 15, 20. The main
differences are in the estimates r̂ of the half-width r, see Figure 3. Also, the
trajectories of the squared norms ρt ≡ ||θ̂t − oθ||² ≡ (θ̂t − oθ)′(θ̂t − oθ) exhibit a
predictable dependence on V, see Figure 4. Table 1, containing an elementary
statistical evaluation, complements the overall picture of the properties of the
algorithm. It is worth noticing that only about 50-60% of the data is taken as
informative, which implies a corresponding reduction in computational time and
relatively infrequent changes of the point estimates.
Table 1
Elementary sample statistics of the estimation; e ≡ prediction error, ρ ≡ ||θ̂ − oθ||² ≡ squared norm of the estimation error
storage length V 5 10 15 20
mean of e -0.237 -0.051 -0.031 -0.102
minimum of e -1.020 -1.032 -1.091 -1.099
maximum of e 1.000 1.000 1.000 1.000
standard deviation of e 0.474 0.549 0.556 0.552
ratio of standard deviations of e and data 0.005 0.006 0.006 0.006
mean of ρ 4.712 2.750 2.630 2.721
minimum of ρ 0.039 0.029 0.005 0.014
maximum of ρ 16.189 17.371 17.368 17.368
standard deviation of ρ 4.064 4.852 4.875 4.843
portion of informative data [ % ] 55.000 56.000 61.000 63.000
elapsed time [s] 5.187 5.734 6.875 7.478
6.2 Transportation data
The research reported here has been motivated by practical problems in esti-
mation of urban-traffic models. Thus, the data in this domain were taken for
inspecting properties of the proposed algorithm. In a particular sub-problem,
the counts of cars crossing a cross-road per hour were recorded. To get the
required predictor, logarithms of these counts are modelled by a fourth-order
uniform AR model. The results are presented in a similar form as for the
simulated example. Predictions and prediction errors are expressed in the
original counts.

Fig. 5. Transportation data (left); prediction errors for V = 7 × Ψ = 42 (right).

Fig. 6. Estimates θ̂ of θ for V = 7 × Ψ = 42.
The processed data are shown in Figure 5 together with the prediction errors
obtained for V = 7 × Ψ = 42. These errors are typical.

Typical trajectories of the parameter estimates θ̂ and r̂ for V = 42 are in
Figure 6. The character of these trajectories is similar for the tested dimensions
V = 12, 18, 24, 30, 36, 42, 60, 180.

Table 2, containing an elementary statistical evaluation, complements the overall
picture of the properties of the algorithm. It is worth noticing that the portion
of informative data ranges over 11-59%. The extension of the storage length V
essentially ceases to influence the result quality for V > 24 = 4 × Ψ.
Table 2
Elementary sample statistics of estimation; e ≡ prediction error
storage length V 12 24 42 180
mean of e 114 100 99 96
minimum of e -320 -450 -467 -462
maximum of e 828 773 772 774
standard deviation of e 192 190 189 188
ratio of standard deviations of e and data 0.74 0.73 0.73 0.73
portion of informative data [ % ] 43 11 18 59
elapsed time [s] 4.2 11.1 22.6 702.4
7 CONCLUDING REMARKS
The paper provides a relatively straightforward approximation of the recursive
Bayesian estimation of the ARX model with uniform innovations.
With respect to practice:
• It respects hard bounds on all model parameters.
• It provides a description of possible models via the convex support of the
(approximate) posterior pdf. Such a description is often presupposed in subsequent
robust-control design.
• It makes the behavior of the point parameter estimates much quieter than is
usual with standard least squares. This may decrease the demands on repetitions
of the subsequent control design and suits well the successful approximate
design strategy of iterations spread in time [26,27]. Altogether, it makes the
resulting adaptive system applicable to high-dimensional systems in spite of
the increased computational demands (compared to least squares) of the
recursive estimation.
With respect to theory:
• It offers an interesting complement to parameter tracking based on forget-
ting (age-weighting). It is expected to serve well in cases when newer data
are not automatically more informative. Exactly the same idea can be applied
to other model types, even to normal regression models.
• It contributes to the unknown-but-bounded methodology by avoiding arti-
ficial approximations of the involved convex supports by ellipsoids.
With respect to further research, it offers interesting problems, like:
• It is necessary to address model structure estimation.
• A filtering counterpart is practically needed and its derivation represents a
real challenge.
• The use of the covering of the found support by a union of non-overlapping
boxes could and should be developed [18].
Acknowledgements
This research has been partially supported by GA CR grant 102/03/0049, by
AV CR project BADDYR 1ET100750401 and by MD CR 1F43A/003/120.
References
[1] V. Peterka, “Bayesian system identification”, in Trends and Progress in System
Identification, P. Eykhoff, Ed., pp. 239–304. Pergamon Press, Oxford, 1981.
[2] L. Ljung, System Identification: Theory for the User, Prentice-Hall, London,
1987.
[3] P.E. Wellstead and M.B. Zarrop, Self-tuning Systems, John Wiley, Chichester,
1991.
[4] IFAC, On-line fault detection and supervision in the chemical process industries,
University of Delaware, Delaware, 1992, 366 p.
[5] E. Mosca, Optimal, Predictive, and Adaptive Control, Prentice Hall, 1994.
[6] F. Doymaz, J. Chen, J.A. Romagnoli, and A. Palazoglu, “A robust strategy for
real-time process monitoring”, J. Process Control, vol. 11, no. 4, pp. 343–359,
2001.
[7] T. Bohlin, Interactive System Identification: Prospects and Pitfalls, Springer-
Verlag, New York, 1991.
[8] M. Karny, “Adaptive systems: Local approximators?”, in Preprints of the IFAC
Workshop on Adaptive Systems in Control and Signal Processing, Glasgow,
August 1998, pp. 129–134, IFAC.
[9] K.K. Srinivasan and Z.Y. Guo, “Day-to-day evolution of network flows under
departure time dynamics in commuter decisions”, Travel Demand and Land
Use 2003 Transportation Research Record, vol. 1831, pp. 47–56, 2003.
[10] M. Green and D.J.N. Limebeer, Linear robust control, Prentice Hall, New
Jersey, 1995, ISBN 0-13-102278-4.
[11] J. Roll, A. Nazin, and L. Ljung, “Nonlinear system identification via direct
weight optimization”, Automatica, vol. 41, no. 3, pp. 475–490, 2004.
[12] B.M. Ninness and G.C. Goodwin, “Rapprochement between bounded-error and
stochastic estimation theory”, International Journal of Adaptive Control and
Signal Processing, vol. 9, no. 1, pp. 107–132, 1995.
[13] R. Kulhavy and M. B. Zarrop, “On a general concept of forgetting”,
International Journal of Control, vol. 58, no. 4, pp. 905–924, 1993.
[14] S. Kullback and R. Leibler, “On information and sufficiency”, Annals of
Mathematical Statistics, vol. 22, pp. 79–87, 1951.
[15] M. Karny, “Parametrization of multi-output multi-input autoregressive-
regressive models for self-tuning control”, Kybernetika, vol. 28, no. 5, pp. 402–
412, 1992.
[16] B.T. Polyak, S.A. Nazin, C. Durieu, and E. Walter, “Ellipsoidal parameter
or state estimation under model uncertainty”, Automatica, vol. 40, no. 7, pp.
1171–1179, 2004.
[17] M. Milanese and G. Belforte, “Estimation theory and uncertainty intervals
evaluation in presence of unknown but bounded errors linear families of models
and estimators”, IEEE Transactions on Automatic Control, vol. 27, no. 2, pp.
408–414, 1982.
[18] A. Bemporad, C. Filippi, and F. Torrisi, “Inner and outer approximations of
polytopes using boxes”, Computational Geometry, vol. 27, pp. 151–178, 2004.
[19] J. Andrysek, “Approximate recursive Bayesian estimation of dynamic
probabilistic mixtures”, in Multiple Participant Decision Making, J. Andrysek,
M. Karny, and J. Kracık, Eds., pp. 39–54. Advanced Knowledge International,
Adelaide, 2004.
[20] J. M. Bernardo, “Expected information as expected utility”, The Annals of
Statistics, vol. 7, no. 3, pp. 686–690, 1979.
[21] G.H. Golub and C.F. Van Loan, Matrix Computations, The Johns Hopkins
University Press, Baltimore – London, 1989.
[22] M. Karny, “Recursive parameter estimation of regression model when the
interval of possible values is given”, Kybernetika, vol. 18, no. 2, pp. 164–178,
1982.
[23] R. Kulhavy, “Can we preserve the structure of recursive Bayesian estimation
in a limited-dimensional implementation?”, in Systems and Networks:
Mathematical Theory and Applications, U. Helmke, R. Mennicken, and
J. Saurer, Eds., vol. I, pp. 251–272. Akademie Verlag, Berlin, 1994.
[24] R. Kulhavy and M. Karny, “Tracking of slowly varying parameters by
directional forgetting”, in Preprints of the 9th IFAC World Congress, vol. X,
pp. 178–183. IFAC, Budapest, 1984.
[25] G.J. Bierman, Factorization Methods for Discrete Sequential Estimation,
Academic Press, New York, 1977.
[26] M. Karny, A. Halouskova, J. Bohm, R. Kulhavy, and P. Nedoma, “Design
of linear quadratic adaptive control: Theory and algorithms for practice”,
Kybernetika, vol. 21, 1985, Supplement to Nos. 3, 4, 5, 6.
[27] M. Karny, J. Bohm, T.V. Guy, L. Jirsa, I. Nagy, P. Nedoma, and L. Tesar,
Optimized Bayesian Dynamic Advising: Theory and Algorithms, Springer,
London, 2005, ISBN 1-85233-928-4, 552 pp.