Projection-Based Bayesian Recursive
Estimation of ARX Model with Uniform
Innovations
Miroslav Kárný and Lenka Pavelková
Department of Adaptive Systems
Institute of Information Theory and Automation
Academy of Sciences of the Czech Republic
P.O. Box 18, 182 08 Prague 8, Czech Republic
Abstract
The autoregressive model with exogenous inputs (ARX) is a widely-used black-box type
model underlying adaptive predictors and controllers. Its innovations, the stochastic
unobserved stimulus of the model, are white, zero mean with time-invariant variance.
Mostly, the innovations are assumed to be normal, which singles out least squares as the
adequate estimation procedure. Light tails of the normal distribution imply that its
unbounded support can often be accepted as a reasonable approximate description of
physical quantities, which are mostly bounded. In some cases, however, this approximation
is too crude or does not fit subsequent processing, for instance, robust control design.
Then, techniques similar to those dealing with unknown-but-bounded equation errors are
used. They intentionally give up the stochastic interpretation of innovations and develop
various algorithms of a min-max type.
The paper assumes bounded innovations but stays within the standard, here Bayesian,
estimation setup by considering their uniform distribution. The posterior probability
density function (pdf) is first described and then approximated by a pdf with a
fixed-dimensional statistic. Consequently, the estimation can run on line. Its limited
memory allows for tracking varying parameters. In this way, an alternative to popular
forgetting techniques is also obtained.

The paper provides a complete algorithmic solution and illustrates its behavior.

Preprint submitted to Elsevier Science 22 August 2005
Key words: ARX model, Bayesian recursive estimation, uniform distribution
1 INTRODUCTION
Recursive estimation [1,2] is at the heart of a range of adaptive decision-making
units that include predictors, advising and fault-detecting systems as well as
adaptive controllers, e.g. [3–6]. Sufficiency of simple, often black-box, locally
valid models for successful decision making is the major advantage of adaptive
systems [7,8]. It makes them relatively universal and keeps costs of their
implementation low. The autoregressive model with exogenous inputs (ARX) is an
important representative of this model class. Its innovations, the stochastic
unobserved stimulus of the model, are white, zero mean and have time-invariant
variance. Mostly, the innovations are assumed to be normal, which singles out
least squares as the adequate estimation procedure [1]. Light tails of the normal
distribution imply that its unbounded support can often be accepted as a reasonable
approximation of reality, which is mostly bounded. In some cases, however, this
assumption is unrealistic, e.g. in modelling of transportation systems (for the
encountered problems see e.g. [9]), or does not fit subsequent processing, for
instance, robust control design [10]. Then,
techniques similar to those dealing with unknown-but-bounded equation er-
rors are used, see references in [11]. They often intentionally give up stochastic
interpretation of innovations and develop and analyze various algorithms of a
min-max type, cf. [12]. The resulting algorithms are definitely useful. However,
giving up the probabilistic interpretation makes solutions of related decision-
making tasks unnecessarily difficult as a rich set of statistical tools is given up,
too. Thus, it makes sense to address the discussed estimation within the “clas-
sical” probabilistic setting. The paper accepts the assumption that innovations
are bounded but stays within the standard, here Bayesian, estimation setup
[1] by assuming their uniform distribution. The posterior probability density
function (pdf) is described and then approximated by a pdf of the same type
but having a fixed-dimensional statistic. Consequently, the estimation can run
on line and its limited memory allows tracking of varying parameters. Thus,
an alternative to forgetting techniques [13] is obtained.
The exact Bayesian estimation of the ARX model is recalled, Section 2. Its
recursively applicable approximate estimation is presented in Section 3. Sec-
tion 4 discusses practical implementation aspects of the obtained algorithm,
which is summarized in Section 5. Properties of the proposed algorithm are
illustrated on simulation examples and on processing of simple transportation
data, Section 6. Remarks, Section 7, close the paper.
Throughout: ′ denotes transposition; x̊ means the number of entries in a vector x;
I denotes the unit matrix; a subscript in round brackets defines the dimensions of a
matrix; ≡ is equality by definition; X∗ denotes a set of X-values; χx(•) is the value
of the indicator of the set x∗ at the point x, the set x∗ being defined by the
conditions •; f(·|·) denotes a probability density function (pdf); t labels
discrete-time moments, t ∈ t∗ ≡ {1, 2, . . .}; dt = (yt, ut) is the data record at
time t consisting of an observed system output yt and of an optional system
input ut; X(t) denotes the sequence (X1, . . . , Xt), X ∈ {d, y, u}. Names of
arguments distinguish respective pdfs. No formal distinction is made between
a random variable, its realization and an argument of a pdf. Integrals used are
always definite and multivariate ones. The integration domain coincides with
the support of the pdf in its argument.
The approximation needed for on-line estimation exploits the Kullback-Leibler (KL)
divergence D(f||f̂) [14] that measures the proximity of a pair of pdfs f, f̂ acting
on a set X∗. It is defined as follows:

$$
D(f\,||\,\hat f) \equiv \int f(X)\,\ln\!\left(\frac{f(X)}{\hat f(X)}\right) dX. \qquad (1)
$$

The KL divergence has the following properties of interest:

$$
D(f\,||\,\hat f) \ge 0, \qquad D(f\,||\,\hat f) = 0 \ \text{iff}\ f = \hat f \ \text{almost everywhere on}\ X^*; \qquad (2)
$$

D(f||f̂) = ∞ if the set on which f̂(X) = 0 and f(X) > 0 has non-zero volume.
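As a numerical sanity check of definition (1) and property (2), the KL divergence of two uniform pdfs can be approximated by grid integration. The sketch below is illustrative only; the example densities and the grid resolution are arbitrary choices, not anything used in the paper.

```python
import numpy as np

def kl_divergence(f, g, grid):
    """Numerical KL divergence D(f||g) on an equidistant grid (rectangle rule).

    Infinite whenever f puts mass where g vanishes, cf. property (2)."""
    fx, gx = f(grid), g(grid)
    mask = fx > 0
    if np.any(mask & (gx == 0)):
        return np.inf
    dx = grid[1] - grid[0]
    return np.sum(fx[mask] * np.log(fx[mask] / gx[mask])) * dx

# Uniform densities: the support of U(0, 1) lies inside that of U(0, 2).
f = lambda x: np.where((x >= 0) & (x <= 1), 1.0, 0.0)
g = lambda x: np.where((x >= 0) & (x <= 2), 0.5, 0.0)
grid = np.linspace(0.0005, 1.9995, 2000)   # midpoints of 2000 cells on [0, 2]

print(kl_divergence(f, g, grid))           # ≈ ln 2 ≈ 0.693
print(kl_divergence(g, f, grid))           # inf: g > 0 where f = 0
```

For U(0, 1) against U(0, 2) the exact value is ln 2; in the opposite order the divergence is infinite, since U(0, 2) puts mass where U(0, 1) vanishes.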
2 ESTIMATING ARX MODEL WITH UNIFORM INNOVATIONS
The parameterized model of the system with a single output yt,

$$
f(y_t|u_t, d(t-1), \Theta) = f(y_t|\psi_t, \Theta) \equiv U_{y_t}(\psi_t'\theta,\, r)
\equiv \frac{\chi_{y_t}(-r \le y_t - \psi_t'\theta \le r)}{2r}
= \frac{\chi_{y_t}(-r \le \Psi_t'[1, -\theta']' \le r)}{2r}, \qquad (3)
$$

is the pdf that describes the ARX model with uniform innovations. In it,

ψt is a column regression vector made of the past observed data d(t − 1) and the
current system input ut; often, the state in the phase form is considered,
i.e., ψ′t ≡ [u′t, d′t−1, . . . , d′t−∂, 1] with the model order ∂ ≥ 0;

Ψ′t ≡ [yt, ψ′t] combines the modelled output and the regression vector into the single
data vector; for the state in the phase form, Ψ′t = [d′t, d′t−1, . . . , d′t−∂, 1];

θ is a column vector of regression coefficients;

r > 0 is the scalar half-width of the range of the innovations et ≡ Ψ′t[1, −θ′]′;

Θ ≡ (θ, r) labels the unknown parameters of the model;

Uy(µ, r) is the uniform pdf of y given by the expectation µ and the half-width r.
Note that a single-output model is sufficient, as the chain rule for pdfs [1] implies
that the multivariate case can be treated using a collection of such models [15].
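The generating mechanism behind (3) is straightforward to simulate. A minimal Python sketch, with illustrative coefficients (not those of Section 6), draws data from a uniform AR model with an offset and checks that every innovation stays within the half-width r:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative uniform AR model with offset, cf. (3):
#   y_t = psi_t' theta + e_t,  e_t ~ U(-r, r)  (white, zero mean).
theta = np.array([0.5, -0.3, 1.0])   # coefficients of psi_t = [y_{t-1}, y_{t-2}, 1]'
r = 0.2                              # innovation half-width

y = [0.0, 0.0]                       # prior data filling the first regressor
for t in range(100):
    psi = np.array([y[-1], y[-2], 1.0])
    y.append(psi @ theta + rng.uniform(-r, r))
y = np.array(y)

# Each data vector Psi_t = [y_t, psi_t']' satisfies |Psi_t'[1, -theta']'| <= r.
residuals = y[2:] - (theta[0] * y[1:-1] + theta[1] * y[:-2] + theta[2])
print(np.max(np.abs(residuals)) <= r)  # True by construction
```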
Under the natural conditions of control [1], which assume inputs conditionally
independent of the unknown parameters, f(ut|d(t − 1), Θ) = f(ut|d(t − 1)), the
likelihood function assigned to the uniform ARX model has the form

$$
L(d(t), \Theta) = \frac{1}{r^t}\,\chi_r(r \ge 0)\,\chi_\Theta(-\mathbf 1_t r \le W_t[-1, \theta']' \le \mathbf 1_t r), \qquad (4)
$$

where $\mathbf 1_t$ is the column vector consisting of t units and W′t ≡ [Ψ1, . . . , Ψt]. The
matrix Wt is assumed to be known; thus, the initial regression vector ψ1 has to
be known. For the state in the phase form, this is equivalent to the knowledge
of u1, d0, . . . , d−∂+1. The likelihood form (4) hints at the reproducing prior pdf

$$
f(\Theta) \propto \frac{1}{r^{\nu_0}}\,\chi_r(\bar r \ge r \ge 0)\,\chi_\Theta(-\mathbf 1_{\nu_0} r \le W_0[-1, \theta']' \le \mathbf 1_{\nu_0} r), \qquad (5)
$$

where $\infty > \bar r > 0$ is a sure upper bound on r. W0 can be interpreted as a
storage of data vectors (fictitiously) measured before the time t = 1 and ν0 is
their number. The proper prior pdf is obtained for a full-rank W0 and $\nu_0 \ge \mathring\Psi + 1$.
For the prior pdf (5), the posterior pdf has the same form:

$$
f(\Theta|d(t)) \propto \frac{1}{r^{\nu_t}}\,\chi_r(\bar r \ge r \ge 0)\,\chi_\Theta(-\mathbf 1_{\nu_t} r \le W_t[-1, \theta']' \le \mathbf 1_{\nu_t} r), \quad\text{with} \qquad (6)
$$
$$
\nu_t = \nu_{t-1} + 1, \quad \nu_0 \ge \mathring\Psi + 1 \ \text{chosen a priori},
$$
$$
W_t' = [W_{t-1}', \Psi_t], \quad \text{full-rank } W_0 \ \text{chosen a priori}.
$$
The applicability of the formula (6) is limited, as the dimension $(\nu_t, \mathring\Psi)$ of the matrix
Wt increases with the number t of processed data d(t), which also increases the
complexity of the convex support defined by them. This makes us search for
an approximate solution, especially when recursive estimation is considered.
3 APPROXIMATE RECURSIVELY FEASIBLE ESTIMATION
To obtain approximate, recursively feasible, estimation of the uniform ARX
model we have to approximate the exact posterior pdf by a pdf determined
by a statistic whose finite dimension does not increase with the increasing
number of data. Approximation of the support of the pdf by an ellipsoid is
used in connection with the estimation dealing with unknown-but-bounded
innovations [16]. Here, we propose an alternative approximation by a convex
set. Similar methodology was used in early papers addressing this problem
[17]. Intuitively, the resulting support shape is closer to the correct one than an
ellipsoid. Moreover, if need be, the presented solution can be complemented by
covering the approximate support by a union of non-overlapping boxes, using
algorithms derived in [18]. This provides a simple description of
uncertainties suitable for robust prediction and control design.
The exact posterior pdf (6) motivates the form of the approximate pdf

$$
f(\Theta|d(t-1)) = \frac{1}{I(V_{t-1}, \nu_{t-1})}\,\frac{1}{r^{\nu_{t-1}}}\,\chi_r(\bar r \ge r \ge 0)\,\chi_\Theta(-\mathbf 1_{\mathring V} r \le V_{t-1}[-1, \theta']' \le \mathbf 1_{\mathring V} r) \qquad (7)
$$

$$
I(V, \nu) \equiv \int \frac{1}{r^{\nu}}\,\chi_r(\bar r \ge r \ge 0)\,\chi_\Theta(-\mathbf 1_{\mathring V} r \le V[-1, \theta']' \le \mathbf 1_{\mathring V} r)\, d\Theta. \qquad (8)
$$

These pdfs are determined by full-rank matrices Vt−1 with a finite first dimension
$\mathring V \ge \mathring\Psi$ that is common for all t ∈ t∗.
The projection-based approximation of the Bayes rule [19] is applied for the design
of the recursively implementable data updating of these approximate pdfs. The
proper updating of f(Θ|d(t − 1)) by the data vector Ψt leads to the pdf

$$
f(\Theta|d(t)) = \frac{\chi_r(\bar r \ge r \ge 0)}{I\big([V_{t-1}', \Psi_t]',\, \nu_{t-1}+1\big)}\,\frac{1}{r^{\nu_{t-1}+1}}\,\chi_\Theta\Big(-\mathbf 1_{\mathring V + 1} r \le [V_{t-1}', \Psi_t]'[-1, \theta']' \le \mathbf 1_{\mathring V + 1} r\Big). \qquad (9)
$$
We search for the approximate posterior pdf f̂(Θ|d(t)) in the form (7), with
t replacing t − 1. In [20], this approximation is cast into the Bayesian context.
Under rather general conditions, it is found that the “best” approximation
f̂(Θ|d(t)) is to be selected as the minimizer of the KL divergence D(f||f̂) over
pdfs of the desired form (7), thus over Vt, νt.

The KL divergence is finite iff the support of f is included in the support of the
constructed approximate pdf. This can be guaranteed if some of the restrictions
on Θ, defined by rows of the matrix [V′t−1, Ψt]′, are relaxed. We use this
possibility and optimize over

$$
V_t \in V_t^* \equiv \{V_{t-1},\ {}^iV,\ i = 1, \dots, \mathring V\} \qquad (10)
$$

with iV obtained from Vt−1 by replacing its ith row with Ψ′t. As the scalar νt
expresses the number of used data vectors, it is reasonable to take

$$
\nu_t \equiv \nu(V_t, \nu_{t-1}) = \begin{cases} \nu_{t-1} & \text{if } V_t = V_{t-1} \\ \nu_{t-1} + 1 & \text{otherwise.} \end{cases} \qquad (11)
$$
Under the restrictions (10), (11) and for a fixed r, the constructed pdf is a constant
function of θ on the integration domain in the definition of the KL divergence
(1), and $\arg\min_{V \in V_t^*} D(f||\hat f) = \arg\min_{V \in V_t^*} \ln I(V, \nu)$. For the optimal choice
of Vt, we have to inspect the dependence of I(V, ν) on V. The substitution
$x' = [\underbrace{-1/r}_{x_1},\ \theta'/r]$ in (8) and a straightforward bounding from above give

$$
I(V, \nu) = \int (-x_1)^{\nu - \mathring\Psi - 1}\,\chi_{x_1}(x_1 \le -1/\bar r)\,\chi_x(-\mathbf 1_{\mathring V} \le Vx \le \mathbf 1_{\mathring V})\, dx \;\le\; \frac{\mathcal V(V)}{\hat r^{\,\nu - \mathring\Psi - 1}} \qquad (12)
$$

$$
\hat r \equiv \min_{\{\Theta:\ \bar r \ge r \ge 0,\ -r\mathbf 1_{\mathring V} \le V[-1, \theta']' \le r\mathbf 1_{\mathring V}\}} r \qquad (13)
$$

$$
\mathcal V(V) \equiv \int \chi_x(-\mathbf 1_{\mathring V} \le Vx \le \mathbf 1_{\mathring V})\, dx.
$$

The exact evaluation of I(V, ν) is hard. Thus, instead of minimizing the KL
divergence, which is an increasing function of I(V, ν), we minimize the upper
bound in (12) over $V_t^*$ given by (10). The complete argument $\hat\Theta \equiv (\hat\theta, \hat r) \equiv
(\hat\theta(V), \hat r(V)) \equiv \hat\Theta(V)$ of the minimum (13) is found by linear programming (LP).
It also serves as a good point estimate of Θ.
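The minimization (13) is an ordinary linear program once Θ = (θ, r) is stacked into one decision vector. A possible sketch using scipy.optimize.linprog follows; the synthetic data vectors play the role of the rows of V, and all sizes and bounds are illustrative assumptions:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)

# Hypothetical stored data vectors Psi_i = [y_i, psi_i']' generated from a
# known uniform relation y = psi' theta0 + e with |e| <= r0.
theta0, r0 = np.array([1.2, -0.7]), 0.1
Psi = []
for _ in range(30):
    psi = rng.uniform(-1, 1, size=2)
    Psi.append(np.concatenate(([psi @ theta0 + rng.uniform(-r0, r0)], psi)))
Psi = np.array(Psi)                      # rows play the role of V

# LP of type (13): minimise r over (theta, r) subject to
#   -r <= y_i - psi_i' theta <= r  for every stored data vector.
y, X = Psi[:, 0], Psi[:, 1:]
n_th = X.shape[1]
A_ub = np.vstack([np.hstack([ X, -np.ones((len(y), 1))]),   #  psi'th - r <=  y
                  np.hstack([-X, -np.ones((len(y), 1))])])  # -psi'th - r <= -y
b_ub = np.concatenate([y, -y])
c = np.zeros(n_th + 1); c[-1] = 1.0                         # objective: r
bounds = [(None, None)] * n_th + [(1e-9, 10.0)]             # r in [r_low, r_bar]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)

theta_hat, r_hat = res.x[:-1], res.x[-1]
print(res.status, r_hat <= r0 + 1e-9)    # 0 (optimal); r_hat cannot exceed r0
```

Since the true parameter pair is feasible, the minimal half-width found by the LP can never exceed the simulated one.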
In summary, the set $V_t^*$ of candidates for Vt contains Vt−1 and iV, i = 1, . . . , $\mathring V$,
where iV is obtained by replacing the ith row of Vt−1 with Ψ′t. The best option Vt
minimizes the upper bound (12), i.e., $\mathcal V(V_t)/\hat r^{\,\nu_t - \mathring\Psi - 1}(V_t) \le \mathcal V(V)/\hat r^{\,\nu_t - \mathring\Psi - 1}(V)$
for all $V \in V_t^*$, with νt given by the rule (11). Thus, it remains to inspect the
dependence of $\mathcal V(V)$ on V.
For $\mathring V = \mathring\Psi$, the substitution z ≡ Vx in $\mathcal V(V)$ implies that the volume $\mathcal V(V)$ is an
increasing function of $|V|^{-1}$, where |·| denotes the absolute value of the
determinant. Defining the vector $a \equiv (V_{t-1}')^{-1}\Psi_t$, the replacement of the ith row of
Vt−1 by Ψ′t offers the matrix

$$
{}^iV_t \equiv V_{t-1} + {}^ie\,(\Psi_t' - {}^ie'V_{t-1}) = \big(I + {}^ie\,(a - {}^ie)'\big)V_{t-1} \equiv (I + {}^ie\,{}^ib')\,V_{t-1},
$$

where I is the unit matrix and ie its ith column. The absolute value of the
determinant is $|{}^iV_t| = |I + {}^ie\,{}^ib'|\,|V_{t-1}| = |a_i|\,|V_{t-1}|$, as the absolute value $|a_i|$ of the
ith entry of a is the only non-unit eigenvalue of $I + {}^ie\,{}^ib'$, corresponding to the
eigenvector ie; the remaining eigenvectors are orthogonal to ib.

Thus, the replacement of the ith row in Vt−1 decreases the inspected volume $\mathcal V(V_{t-1})$
iff $\max_{i=1,\dots,\mathring\Psi} |a_i| > 1$. The argument of the maximum points to the row of Vt−1
to be replaced by Ψ′t.
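The rule above can be checked numerically: replacing the chosen row indeed scales the absolute determinant by |ai|. The matrices below are random stand-ins, not data from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)

# Square case: which row of V_{t-1} should the new data vector Psi_t replace?
# Toy full-rank matrix and vector; any such pair works.
V = rng.standard_normal((4, 4))
Psi = rng.standard_normal(4)

a = np.linalg.solve(V.T, Psi)            # a = (V_{t-1}')^{-1} Psi_t
i = int(np.argmax(np.abs(a)))            # candidate row to replace

iV = V.copy(); iV[i] = Psi               # row i replaced by Psi'
lhs = abs(np.linalg.det(iV))
rhs = abs(a[i]) * abs(np.linalg.det(V))
print(np.isclose(lhs, rhs))              # True: |det iV| = |a_i| |det V|
# Per the rule above, the replacement is made only when |a_i| > 1, since that
# is exactly when the support volume, proportional to |det V|^{-1}, shrinks.
```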
The evaluation of the optimized volume in the general case $\mathring V > \mathring\Psi$ is harder, cf.
[18]. We need, however, to find only its dependence on the possible variants iVt.
For it, let us temporarily drop the subscript t and consider singular value
decompositions ${}^iV \equiv {}^iS\,{}^i\Lambda\,{}^iT$ [21] of the full-rank $(\mathring V, \mathring\Psi)$ matrices iV. In
them, iS, iT are orthogonal square matrices of appropriate dimensions and
iΛ is the $(\mathring V, \mathring\Psi)$ matrix whose only non-zero entries are the singular values
${}^i\Lambda_{kk}$, k = 1, . . . , $\mathring\Psi$. For an arbitrary ω > 0, let us consider the extensions of
these matrices to the full-rank square $(\mathring V, \mathring V)$ matrices

$$
{}^iV_\omega = {}^iS \begin{bmatrix} {}^i\Lambda & \begin{matrix} 0_{(\mathring\Psi,\, \mathring V - \mathring\Psi)} \\ \omega I_{(\mathring V - \mathring\Psi,\, \mathring V - \mathring\Psi)} \end{matrix} \end{bmatrix} \underbrace{\begin{bmatrix} {}^iT & 0 \\ 0 & {}^iU \end{bmatrix}}_{{}^i\mathcal T} \qquad (14)
$$

where iU are arbitrary orthogonal matrices and $0_{(m,n)}$ is the (m, n) matrix
with zero entries. Consequently, the matrices ${}^i\mathcal T$ in (14) are orthogonal. Let
us inspect the volumes of the sets

$$
{}^i(x, \kappa)^*(V_\omega) \equiv \{x, \kappa :\ -\mathbf 1_{\mathring V} \le {}^iV_\omega[x', \kappa']' \le \mathbf 1_{\mathring V}\}, \qquad (15)
$$

with the “dummy” variable κ being a $(\mathring V - \mathring\Psi)$-dimensional vector. The volume
$\mathcal V({}^iV_\omega)$ of the set (15) equals $c\,|{}^iV_\omega|^{-1}$ with a common positive constant c. For
arbitrary i, j, ω, the volume ratio ${}^{ij}R_\omega \equiv \mathcal V({}^jV_\omega)/\mathcal V({}^iV_\omega)$ is well defined
and independent of ω:

$$
{}^{ij}R_\omega = \frac{|{}^iV_\omega|}{|{}^jV_\omega|} = \frac{\overbrace{|{}^iS|}^{1}\ \prod_{k=1}^{\mathring\Psi} {}^i\Lambda_{kk}\ |\omega I|\ \overbrace{|{}^i\mathcal T|}^{1}}{\underbrace{|{}^jS|}_{1}\ \prod_{k=1}^{\mathring\Psi} {}^j\Lambda_{kk}\ |\omega I|\ \underbrace{|{}^j\mathcal T|}_{1}} = \prod_{k=1}^{\mathring\Psi} \frac{{}^i\Lambda_{kk}}{{}^j\Lambda_{kk}}. \qquad (16)
$$

The sets $(x, \kappa)^*(V_\omega)$ extend the sets of interest $x^*(V) \equiv \{x :\ -\mathbf 1_{\mathring V} \le Vx \le \mathbf 1_{\mathring V}\}$
by the dummy variables κ. The volumes of these extensions vary with ω between
zero (for ω → ∞) and infinity (for ω → 0). The ratio of the overall volumes
is, however, constant and well defined for all ω > 0. Thus, it makes sense to
interpret the ratio (16) of the extended sets as the ratio of the volumes of the
corresponding original sets $x^*(V)$, $V \in V^*$. It holds that

$$
|{}^iV'\,{}^iV| = |{}^iT'\,{}^i\Lambda'\,\underbrace{{}^iS'\,{}^iS}_{I}\,{}^i\Lambda\,{}^iT| = \prod_{k=1}^{\mathring\Psi} {}^i\Lambda_{kk}^2 \qquad (17)
$$

and thus $|{}^iV'\,{}^iV|$ equals the square of the expression entering the inspected
ratio of volumes. Similarly as in the simpler case $\mathring V = \mathring\Psi$, it can be found that

$$
|{}^iV_t'\,{}^iV_t| = |V_{t-1}'V_{t-1}|\,\underbrace{\big[(1 + F'F)(1 - {}^iG'\,{}^iG) + ({}^iG'F)^2\big]}_{a_i} \qquad (18)
$$

$$
F \equiv (V_{t-1}'V_{t-1})^{-0.5}\,\Psi_t, \qquad {}^iG \equiv (V_{t-1}'V_{t-1})^{-0.5} \times i\text{th column of } V_{t-1}'.
$$

Thus, the replacement decreases the inspected volume iff $\max_{i=1,\dots,\mathring V} |a_i| > 1$.
The maximizing argument points to the row of Vt−1 to be replaced by Ψ′t to get
the best Vt. Straightforward algebra confirms that the derived rule reduces
to the former one (up to the unimportant squaring) if $\mathring V = \mathring\Psi$.
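Formula (18) can be verified against directly computed Gram determinants. The following numerical check uses arbitrary full-rank test matrices and a symmetric inverse square root built from an eigendecomposition:

```python
import numpy as np

rng = np.random.default_rng(3)

# Numerical check of (18) in the tall case (more rows in V than entries in Psi).
V = rng.standard_normal((6, 3))          # full column rank with probability 1
Psi = rng.standard_normal(3)

# Symmetric inverse square root (V'V)^{-1/2} via eigendecomposition.
w, U = np.linalg.eigh(V.T @ V)
H = U @ np.diag(w ** -0.5) @ U.T

F = H @ Psi
gamma = 1.0 + F @ F
checks = []
for i in range(V.shape[0]):
    G = H @ V[i]                         # H times the i-th column of V'
    a_i = gamma * (1.0 - G @ G) + (G @ F) ** 2
    iV = V.copy(); iV[i] = Psi           # candidate: row i replaced by Psi'
    ratio = np.linalg.det(iV.T @ iV) / np.linalg.det(V.T @ V)
    checks.append(np.isclose(ratio, a_i))
print(all(checks))                       # True: (18) holds for every row
```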
4 PRACTICAL ASPECTS
4.1 Choice of V0
The initial V0 = W0 is supposed to be chosen so that the set {Θ : $\bar r \ge r \ge 0$,
$-\mathbf 1 r \le V_0[-1, \theta']' \le \mathbf 1 r$} is bounded. Typically, we know that $r \ge \underline r \ge 0$
and $\theta \in [\underline\theta, \bar\theta]$ with some $\mathbf 1\infty > \bar\theta > \underline\theta > -\mathbf 1\infty$. Throughout, inequalities
between vector quantities are understood entry-wise. Let us take $\mathring V = \mathring\Psi$ and

$$
V_0 \equiv
\begin{bmatrix}
\underline r & 0_{(1,\mathring\psi)} \\[4pt]
\dfrac{\underline r\,(\underline\theta_1+\bar\theta_1)}{\bar\theta_1-\underline\theta_1} & \dfrac{2\underline r}{\bar\theta_1-\underline\theta_1} & & \\[4pt]
\vdots & & \ddots & \\[4pt]
\dfrac{\underline r\,(\underline\theta_{\mathring\psi}+\bar\theta_{\mathring\psi})}{\bar\theta_{\mathring\psi}-\underline\theta_{\mathring\psi}} & & & \dfrac{2\underline r}{\bar\theta_{\mathring\psi}-\underline\theta_{\mathring\psi}}
\end{bmatrix}. \qquad (19)
$$

Then, the first row guarantees that $r \ge \underline r$. The upper bound $\bar r \ge r$ is specified
independently in (7). Entries of θ are bounded individually: for the ith entry,

$$
-\frac{\underline r\,(\underline\theta_i+\bar\theta_i)}{\bar\theta_i-\underline\theta_i} + \theta_i\,\frac{2\underline r}{\bar\theta_i-\underline\theta_i} \;\le\; r \quad\Leftrightarrow\quad \theta_i \le \bar\theta_i.
$$

The correctness of the lower bound can be verified similarly.
Notice that the proposed construction of the matrices Vt, which are updated
only when the product of their singular values increases, guarantees that the
estimation task remains well-conditioned for all t.
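A possible reading of the construction (19) in code, together with the membership test it induces, is sketched below. The sign convention of the first column is this sketch's assumption and should be checked against (19); the bounds are made-up numbers:

```python
import numpy as np

# Sketch of the prior-statistic construction of type (19); the sign convention
# of the offset column is an assumption of this sketch.
r_lo = 1e-3                       # sure lower bound on the half-width r
th_lo = np.array([-2.0, -1.0])    # entry-wise lower bounds on theta
th_hi = np.array([ 3.0,  1.5])    # entry-wise upper bounds on theta

d = th_hi - th_lo
V0 = np.zeros((1 + len(d), 1 + len(d)))
V0[0, 0] = r_lo                                # first row enforces r >= r_lo
V0[1:, 0] = r_lo * (th_lo + th_hi) / d         # offset column (assumed sign)
V0[1:, 1:] = np.diag(2.0 * r_lo / d)           # scaling of individual entries

def in_support(theta, r):
    """Entry-wise check of -1*r <= V0 [-1, theta']' <= 1*r."""
    v = V0 @ np.concatenate(([-1.0], theta))
    return bool(np.all(-r <= v) and np.all(v <= r))

print(in_support(np.array([0.0, 0.0]), r_lo))     # True: inside the box
print(in_support(th_hi + 0.1, r_lo))              # False: above upper bounds
print(in_support(np.array([0.0, 0.0]), r_lo/2))   # False: r below r_lo
```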
4.2 Choice of V
The developed recursive estimation uses the projected (approximated) results
obtained at time t as the statistic of the prior pdf for time t + 1. It means that
the precious information contained in the observed data is corrupted by this
processing, cf. the discussion in [22,23]. Essentially, the correct commutative
scheme of recursive estimation with projection

sufficient statistic_{t-1} → Bayesian update by Ψt → sufficient statistic_t
      ↓ projection                                        ↓ projection
projected statistic_{t-1} → correct approximate update by Ψt → projected statistic_t

is replaced by the scheme

statistic_{t-1} → Bayesian update by Ψt → updated statistic_t
                                                ↓ projection
                                          statistic_t → Bayesian update by Ψ_{t+1} → · · ·

where statistic_t describes the approximate posterior pdf. Thus, it is generally
desirable to choose $\mathring V$ as large as possible; its practical choice is determined
by the induced computational demands, see Section 4.3.
4.3 Computational demands and robustness
The minimization of the upper bound (12) over V ∈ V ∗t requires solution
of V linear-programming (LP) tasks (13) of the size (V , Ψ). This represents
the main computational burden connected with the updating Vt−1 → Vt. It
can be substantially decreased by exploiting closeness of respective LP’s with
matrices V ∈ V ∗t (10) differing at most in two rows. Moreover, it is possible
and reasonable to restrict the search for the optimum of (12) to a subset
12
of V ∗t . It has an added advantage as it allows preserving a selected part of
Vt and to make the updating robust. Specifically, if we optimize only over
{Vt−1,iV, i = Ψ + 1, . . . , V } the prior “safe” bounds stored in V0 ≡ W0 are
unchanged. We use this option.
The prior V0 usually expresses box-type restrictions, see Section 4.1. If the
stabilized estimation results indicate that this box is too conservative, we can
replace it in one shot by the box covering the convex set determined by Vt without
the original V0. The construction of this box requires solving $2\mathring\Psi$ LP tasks of the
type $(\mathring V, \mathring\Psi)$ [18]. Detecting when this replacement is suitable represents
an independent non-trivial task; this makes us avoid this option.

Moreover, we conjecture that changes of the optimized upper bound (12) follow
predominantly the changes of the volume $\mathcal V(V)$. This brings an additional
simplification, which we use: the replacement is made according to the volume
changes. A single LP task is then solved only for the selected Vt and provides the
point estimate $\hat\Theta$ of Θ.
The form of the supports of the considered posterior pdfs implies that the solved LP
may fail only when the upper bound $\bar r$ on r is too small. When facing such a
failure, the restriction $r \le \bar r$ should be relaxed. The point estimate $\hat r$ found
without this restriction defines a new, larger upper bound $\bar r$.
4.4 Relationships to least squares
The derived algorithm is tightly related to least squares (LS). The matrix
W′tWt coincides with the so-called extended information matrix, which, together
with νt, forms the sufficient statistic of the normal ARX model. Its updating is
algorithmically equivalent to recursive LS. The matrix V′tVt, used in the
construction of Vt+1, can be taken as an approximation of the information matrix
that preserves only $\mathring V$ data vectors among those observed up to time t. Thus, the
proposed algorithm provides as a byproduct an LS version that deals with a
restricted and varying choice of informative data vectors. This complements, if not
competes with, classical forgetting based on age-weighting. The preservation of V0
proposed in Section 4.3 even avoids the known instability of exponential forgetting
under a lack of information. Thus, it is close to the advanced versions of forgetting
that counteract this problem, see [13,24].
The observed relationship also helps in implementation: the approximate in-
formation matrices should be updated in the factorized way mastered in con-
nection with LS, see [25].
5 ALGORITHMIC SUMMARY
Initial mode
• Select the structure of the ARX model, i.e., the structure of the data vectors Ψt.
• Select the dimension $\mathring V \ge \mathring\Psi$ of the statistic Vt.
• Select lower and upper bounds on the estimated parameters and construct
the matrix V0 according to (19).
• Collect the data up to the moment when the data vectors Ψt complete Vt to
its full chosen dimension $\mathring V$.
• Fill the initial data into the first regression vector ψ1, choose ν0 and set t = 0.
Recursive mode
• Set t = t + 1, acquire the data dt and create the data vector Ψt = [yt, ψ′t]′.
• Update the matrix Vt−1 by Ψt to the matrix Vt as follows.
If the number of rows of Vt−1 is smaller than $\mathring V$, set V′t = [V′t−1, Ψt]; otherwise:
· Evaluate $H = (V_{t-1}'V_{t-1})^{-0.5}$.
· Compute the vector F = HΨt, the matrix G = HV′t−1 – with the ith column
iG, cf. (18) – and evaluate the scalar γ = 1 + F′F.
· For i = 1, . . . , $\mathring V$, evaluate $a_i \equiv \gamma(1 - {}^iG'\,{}^iG) + ({}^iG'F)^2$ and then find the
entry ak with the largest absolute value, i.e., |ak| ≥ |ai|.
· Set Vt = Vt−1 if |ak| ≤ 1; else replace the kth row of Vt−1 by Ψ′t to get Vt.
• Set νt = νt−1 and preserve the point estimate $\hat\Theta_t \equiv \hat\Theta_{t-1}$ of the parameters Θ if
Vt = Vt−1; otherwise set νt = νt−1 + 1 and update the point estimate

$$
\hat\Theta_t = \arg\min_{\{\Theta:\ -\mathbf 1_{\mathring V} r \le V_t[-1, \theta']' \le \mathbf 1_{\mathring V} r,\ r \in [\underline r, \bar r]\}} r. \qquad (20)
$$

Increase $\bar r$ appropriately if the above LP fails.
• Go to the beginning of Recursive mode while data are available.
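The volume-driven part of the Recursive mode can be sketched as follows; the LP step (20) is omitted and random vectors stand in for real data vectors Ψt:

```python
import numpy as np

def update_V(V, Psi):
    """One recursive-mode update of the statistic V by a data vector Psi,
    following the row-replacement rule based on the a_i of (18).
    Returns (V_new, informative_flag)."""
    w, U = np.linalg.eigh(V.T @ V)
    H = U @ np.diag(w ** -0.5) @ U.T          # (V'V)^{-1/2}
    F = H @ Psi
    gamma = 1.0 + F @ F
    a = []
    for v in V:                               # a_i for each row v of V
        G = H @ v
        a.append(gamma * (1.0 - G @ G) + (G @ F) ** 2)
    k = int(np.argmax(np.abs(a)))
    if abs(a[k]) <= 1.0:
        return V, False                       # Psi not informative enough
    V_new = V.copy(); V_new[k] = Psi
    return V_new, True

rng = np.random.default_rng(4)
V = rng.standard_normal((8, 3))               # stand-in initial statistic
gram_dets = [np.linalg.det(V.T @ V)]
for _ in range(50):
    V, used = update_V(V, rng.standard_normal(3))
    gram_dets.append(np.linalg.det(V.T @ V))

# A replacement is accepted only when a_k > 1, so det(V'V), i.e. the
# reciprocal squared support volume, never decreases along the recursion.
print(all(d2 >= d1 - 1e-9 for d1, d2 in zip(gram_dets, gram_dets[1:])))  # True
```

Because a replacement is accepted only when |ak| > 1, the support volume can only shrink or stay fixed, which is the tracking mechanism the paper proposes in place of forgetting.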
6 ILLUSTRATIVE EXAMPLES
This section illustrates basic features of the proposed algorithm on a simulated
example and on prediction of real transportation data.
Fig. 1. Simulated data (left); prediction errors for V = 1 × Ψ = 5 (right).
Fig. 2. Estimates θ̂ of θ for V = 1 × Ψ = 5; ∗ marks the simulated values oθ.
6.1 Simulated case
The simple single-output system was simulated by the uniform AR model

$$
f(y_t\,|\,\underbrace{y_{t-1},\, y_{t-2},\, y_{t-3},\, 1}_{\psi_t'}) = 0.5\,\chi_{y_t}\Big(-\underbrace{1}_{{}^or} \le y_t - \underbrace{[2.85,\, -2.7075,\, 0.8574,\, 0]}_{{}^o\theta'}\,\psi_t \le 1\Big).
$$

Parameters of the uniform ARX model with the structure enriched by an offset
were estimated using 100 data samples for various storage lengths V. The
prior pdf was selected as recommended in Section 4.1 with $\bar r = 10$, $\underline r = 10^{-6}$,
$\bar\theta = -\underline\theta = 10^2 \times \mathbf 1_5$. The influence of the storage length V, taken as integer
multiples of Ψ = 5, was inspected on this simple example.
The processed data are on Figure 1 together with prediction errors obtained
for V = Ψ = 5. These errors are rather close to the noise realization. Typical
trajectories of the parameter estimates θ̂ for V = 5 are in Figure 2.

Fig. 3. Estimates r̂ of r for V = 1, 2, 3, 4 × Ψ = 5; ∗ marks the simulated value or.

Fig. 4. Trajectories of ρt ≡ ||θ̂t − oθ||² for V = 1, 2, 3, 4 × Ψ.

The character of these trajectories is similar for the tested dimensions V = 5, 10, 15, 20. The main
differences are in the estimates r̂ of the half-width r, see Figure 3. Also, the
trajectories of the squared norms ρt ≡ ||θ̂t − oθ||² ≡ (θ̂t − oθ)′(θ̂t − oθ) exhibit a
predictable dependence on V, see Figure 4. Table 1, containing an elementary
statistical evaluation, complements the overall picture of the properties of the
algorithm. It is worth noticing that only about 50-60% of the data is taken as
informative, which implies a corresponding reduction in computational time and
relatively infrequent changes of the point estimates.
Table 1
Elementary sample statistics of the estimation; e ≡ prediction error, ρ ≡ ||θ̂ − oθ||² ≡ squared norm of the estimation error
storage length V 5 10 15 20
mean of e -0.237 -0.051 -0.031 -0.102
minimum of e -1.020 -1.032 -1.091 -1.099
maximum of e 1.000 1.000 1.000 1.000
standard deviation of e 0.474 0.549 0.556 0.552
ratio of standard deviations of e and data 0.005 0.006 0.006 0.006
mean of ρ 4.712 2.750 2.630 2.721
minimum of ρ 0.039 0.029 0.005 0.014
maximum of ρ 16.189 17.371 17.368 17.368
standard deviation of ρ 4.064 4.852 4.875 4.843
portion of informative data [ % ] 55.000 56.000 61.000 63.000
elapsed time [s] 5.187 5.734 6.875 7.478
6.2 Transportation data
The research reported here has been motivated by practical problems in esti-
mation of urban-traffic models. Thus, the data in this domain were taken for
inspecting properties of the proposed algorithm. In a particular sub-problem,
the counts of cars crossing a cross-road per hour were recorded. To get the
required predictor, logarithms of these counts are modelled by a fourth-order
uniform AR model. The results are presented in a similar form as for the
simulated example. Predictions and prediction errors are expressed in the
original counts.

Fig. 5. Transportation data (left); prediction errors for V = 7 × Ψ = 42 (right).

Fig. 6. Estimates θ̂ of θ for V = 7 × Ψ = 42.
The processed data are shown in Figure 5 together with the prediction errors
obtained for V = 7 × Ψ = 42. These errors are typical.

Typical trajectories of the parameter estimates θ̂ and r̂ for V = 42 are in
Figure 6. The character of these trajectories is similar for the tested dimensions
V = 12, 18, 24, 30, 36, 42, 60, 180.

Table 2, containing an elementary statistical evaluation, complements the overall
picture of the properties of the algorithm. It is worth noticing that the portion
of informative data ranges over 11-59%. The extension of the storage length V
essentially ceases to influence the result quality for V > 24 = 4 × Ψ.
Table 2
Elementary sample statistics of estimation; e ≡ prediction error
storage length V 12 24 42 180
mean of e 114 100 99 96
minimum of e -320 -450 -467 -462
maximum of e 828 773 772 774
standard deviation of e 192 190 189 188
ratio of standard deviations of e and data 0.74 0.73 0.73 0.73
portion of informative data [ % ] 43 11 18 59
elapsed time [s] 4.2 11.1 22.6 702.4
7 CONCLUDING REMARKS
The paper provides a relatively straightforward approximation of the recursive
Bayesian estimation of the ARX model with uniform innovations.
With respect to practice:
• It respects hard bounds on all model parameters.
• It provides a description of possible models via the convex support of the
(approximate) posterior pdf. Such a description is often presupposed in subsequent
robust-control design.
• It makes the behavior of the point parameter estimates much quieter than is
usual with standard least squares. This may decrease the demands on repetitions
of the subsequent control design and suits well the successful approximate
design strategy of iterations spread in time [26,27]. Altogether, it makes the
resulting adaptive system applicable to high-dimensional systems in spite of
the increased computational demands (compared to least squares) of the
recursive estimation.
With respect to theory:
• It offers an interesting complement to parameter tracking based on forget-
ting (age-weighting). It is expected to serve well in cases when newer data
are not automatically more informative. Exactly the same idea can be applied
to other model types, even to normal regression models.
• It contributes to the unknown-but-bounded methodology by avoiding arti-
ficial approximations of the involved convex supports by ellipsoids.
With respect to further research, it offers interesting problems, like:
• It is necessary to address model structure estimation.
• A filtering counterpart is practically needed and its derivation represents a
real challenge.
• The use of the covering of the found support by a union of non-overlapping
boxes could and should be developed [18].
Acknowledgements
This research has been partially supported by GA CR grant 102/03/0049, by
AV CR project BADDYR 1ET100750401 and by MD CR 1F43A/003/120.
References
[1] V. Peterka, “Bayesian system identification”, in Trends and Progress in System
Identification, P. Eykhoff, Ed., pp. 239–304. Pergamon Press, Oxford, 1981.
[2] L. Ljung, System Identification: Theory for the User, Prentice-Hall, London,
1987.
[3] P.E. Wellstead and M.B. Zarrop, Self-tuning Systems, John Wiley, Chichester,
1991.
[4] IFAC, On-line fault detection and supervision in the chemical process industries,
University of Delaware, Delaware, 1992, 366 p.
[5] E. Mosca, Optimal, Predictive, and Adaptive Control, Prentice Hall, 1994.
[6] F. Doymaz, J. Chen, J.A. Romagnoli, and A. Palazoglu, “A robust strategy for
real-time process monitoring”, J. Process Control, vol. 11, no. 4, pp. 343–359,
2001.
[7] T. Bohlin, Interactive System Identification: Prospects and Pitfalls, Springer-
Verlag, New York, 1991.
[8] M. Karny, “Adaptive systems: Local approximators?”, in Preprints of the IFAC
Workshop on Adaptive Systems in Control and Signal Processing, Glasgow,
August 1998, pp. 129–134, IFAC.
[9] K.K. Srinivasan and Z.Y. Guo, “Day-to-day evolution of network flows under
departure time dynamics in commuter decisions”, Travel Demand and Land
Use 2003 Transportation Research Record, vol. 1831, pp. 47–56, 2003.
[10] M. Green and D.J.N. Limebeer, Linear robust control, Prentice Hall, New
Jersey, 1995, ISBN 0-13-102278-4.
[11] J. Roll, A. Nazin, and L. Ljung, “Nonlinear system identification via direct
weight optimization”, Automatica, vol. 41, no. 3, pp. 475–490, 2004.
[12] B.M. Ninness and G.C. Goodwin, “Rapprochement between bounded-error and
stochastic estimation theory”, International Journal of Adaptive Control and
Signal Processing, vol. 9, no. 1, pp. 107–132, 1995.
[13] R. Kulhavy and M. B. Zarrop, “On a general concept of forgetting”,
International Journal of Control, vol. 58, no. 4, pp. 905–924, 1993.
[14] S. Kullback and R. Leibler, “On information and sufficiency”, Annals of
Mathematical Statistics, vol. 22, pp. 79–87, 1951.
[15] M. Karny, “Parametrization of multi-output multi-input autoregressive-
regressive models for self-tuning control”, Kybernetika, vol. 28, no. 5, pp. 402–
412, 1992.
[16] B.T. Polyak, S.A. Nazin, C. Durieu, and E. Walter, “Ellipsoidal parameter
or state estimation under model uncertainty”, Automatica, vol. 40, no. 7, pp.
1171–1179, 2004.
[17] M. Milanese and G. Belforte, “Estimation theory and uncertainty intervals
evaluation in presence of unknown but bounded errors linear families of models
and estimators”, IEEE Transactions on Automatic Control, vol. 27, no. 2, pp.
408–414, 1982.
[18] A. Bemporad, C. Filippi, and F. Torrisi, “Inner and outer approximations of
polytopes using boxes”, Computational Geometry, vol. 27, pp. 151–178, 2004.
[19] J. Andrysek, “Approximate recursive Bayesian estimation of dynamic
probabilistic mixtures”, in Multiple Participant Decision Making, J. Andrysek,
M. Karny, and J. Kracık, Eds., pp. 39–54. Advanced Knowledge International,
Adelaide, 2004.
[20] J. M. Bernardo, “Expected information as expected utility”, The Annals of
Statistics, vol. 7, no. 3, pp. 686–690, 1979.
[21] G.H. Golub and C.F. Van Loan, Matrix Computations, The Johns Hopkins
University Press, Baltimore – London, 1989.
[22] M. Karny, “Recursive parameter estimation of regression model when the
interval of possible values is given”, Kybernetika, vol. 18, no. 2, pp. 164–178,
1982.
[23] R. Kulhavy, “Can we preserve the structure of recursive Bayesian estimation
in a limited-dimensional implementation?”, in Systems and Networks:
Mathematical Theory and Applications, U. Helmke, R. Mennicken, and
J. Saurer, Eds., vol. I, pp. 251–272. Akademie Verlag, Berlin, 1994.
[24] R. Kulhavy and M. Karny, “Tracking of slowly varying parameters by
directional forgetting”, in Preprints of the 9th IFAC World Congress, vol. X,
pp. 178–183. IFAC, Budapest, 1984.
[25] G.J. Bierman, Factorization Methods for Discrete Sequential Estimation,
Academic Press, New York, 1977.
[26] M. Karny, A. Halouskova, J. Bohm, R. Kulhavy, and P. Nedoma, “Design
of linear quadratic adaptive control: Theory and algorithms for practice”,
Kybernetika, vol. 21, 1985, Supplement to Nos. 3, 4, 5, 6.
[27] M. Karny, J. Bohm, T.V. Guy, L. Jirsa, I. Nagy, P. Nedoma, and L. Tesar,
Optimized Bayesian Dynamic Advising: Theory and Algorithms, Springer,
London, 2005, ISBN 1-85233-928-4, 552 pp.