Artificial Neural Networks (Spring 2007)
Neural Networks for Solving Systems of Linear Equations
Seyed Jalal Kazemitabar, Reza Sadraei
Instructor: Dr. Saeed Bagheri
Artificial Neural Networks Course (Spring 2007)
Reza Sadraei, Jalal Kazemitabar. Artificial Neural Networks (Spring 2007)
Outline
Historical Introduction
Problem Formulation
Standard Least Squares Solution
General ANN Solution
Minimax Solution
Least Absolute Value Solution
Conclusion
History
70's: Kohonen solved optimization problems using neural networks.
80's: Hopfield used the Lyapunov function (energy function) to prove the convergence of iterative methods in optimization problems.
Mapping: differential equations → neural networks
History
Many problems in science and engineering involve solving a large system of linear equations:
machine learning, physics, image processing, statistics, ...
In many applications an on-line solution of a set of linear equations is desired.
History
40's: Kaczmarz introduced a method to solve linear equations.
50's-80's: Different methods based on Kaczmarz's have been proposed in different fields, e.g., the conjugate gradient method.
Still, there was no good method for the on-line solution of large systems.
1990: Andrzej Cichocki, a mathematician who received his PhD in electrical engineering, proposed a neural network for solving systems of linear equations in real time.
Problem Formulation
Linear parameter estimation model:

Ax = b + r = b_true

where
A = [a_ij] ∈ R^(m×n): model matrix
x = [x_1, x_2, ..., x_n]^T ∈ R^n: unknown vector of the system parameters to be estimated
b ∈ R^m: vector of observations
r ∈ R^m: unknown measurement errors
b_true ∈ R^m: vector of true values (usually unknown)

In matrix form:

[ a_11 a_12 … a_1n ] [ x_1 ]   [ b_1 ]   [ r_1 ]   [ b_1,true ]
[ a_21 a_22 … a_2n ] [ x_2 ] = [ b_2 ] + [ r_2 ] = [ b_2,true ]
[  ⋮    ⋮   ⋱  ⋮  ] [  ⋮  ]   [  ⋮  ]   [  ⋮  ]   [    ⋮     ]
[ a_m1 a_m2 … a_mn ] [ x_n ]   [ b_m ]   [ r_m ]   [ b_m,true ]
Types of Equations
A set of linear equations Ax = b + r = b_true, A = [a_ij] ∈ R^(m×n), is said to be overdetermined if m > n.
Usually inconsistent due to noise and errors.
E.g., linear parameter estimation problems arising in signal processing, biology, medicine and automatic control.
A set of linear equations is said to be underdetermined if m < n (due to a lack of information).
Inverse and extrapolation problems.
Involves far fewer problems than the overdetermined case.
Mathematical Solutions
Why not use x = A⁻¹b?
It is not applicable, since most of the time m ≠ n, so A is not invertible.

What if we use the least-squares method?

y = (Ax − b)^T (Ax − b),   y' = A^T (Ax − b) = 0,
A^T A x = A^T b,   x = (A^T A)⁻¹ A^T b

Inverting A^T A is considered too time-consuming for large A in real-time systems.
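For reference, the normal-equations solution above can be computed directly; a minimal NumPy sketch (the 3×2 system here is made up for illustration):

```python
import numpy as np

# Hypothetical overdetermined system (m = 3 equations, n = 2 unknowns).
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

# Normal equations: A^T A x = A^T b  ->  x = (A^T A)^{-1} A^T b.
# np.linalg.solve avoids forming the explicit inverse.
x = np.linalg.solve(A.T @ A, A.T @ b)

# Cross-check against NumPy's built-in least-squares solver.
x_ref, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x, np.allclose(x, x_ref))
```

For large or ill-conditioned A, forming A^T A squares the condition number, which is one more reason the direct normal-equations route is avoided in practice.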
Least Squares Error Function
Find the vector x* ∈ R^n that minimizes the least-squares function

E(x) = (1/2)(Ax − b)^T (Ax − b) = (1/2) Σ_{i=1}^m r_i²(x)

where

r_i(x) = A_i x − b_i = Σ_{j=1}^n a_ij x_j − b_i

represents the i-th component of the residual vector

r(x) = [r_1(x), r_2(x), ..., r_m(x)]^T = Ax − b
Gradient Descent Approach

Basic idea: compute a trajectory x(t), starting at an initial point x(0), that has the solution x* as a limit point (x(t) → x* for t → ∞).

General gradient approach for the minimization of a function:

dx/dt = −μ ∇E(x)

μ is chosen in a way that ensures the stability of the differential equations and an appropriate convergence speed.

In matrix form:

[ dx_1/dt ]     [ μ_11 μ_12 … μ_1n ] [ ∂E/∂x_1 ]
[ dx_2/dt ] = − [ μ_21 μ_22 … μ_2n ] [ ∂E/∂x_2 ]
[    ⋮    ]     [  ⋮    ⋮   ⋱  ⋮  ] [    ⋮    ]
[ dx_n/dt ]     [ μ_n1 μ_n2 … μ_nn ] [ ∂E/∂x_n ]
Solving LE Using Least Squares Criterion
Gradient of the energy function:

∇E = [∂E/∂x_1, ∂E/∂x_2, ..., ∂E/∂x_n]^T = A^T (Ax − b)

So

dx/dt = −μ A^T (Ax − b)

Scalar representation:

dx_j/dt = −Σ_{p=1}^n μ_jp Σ_{i=1}^m a_ip ( Σ_{k=1}^n a_ik x_k − b_i ),   x_j(0) = x_j^(0),   j = 1, 2, ..., n
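The gradient flow dx/dt = −μ A^T(Ax − b) can be sketched numerically; a simple forward-Euler integration in NumPy (A, b, μ and the step size are made-up illustrative values):

```python
import numpy as np

# Euler integration of the gradient flow dx/dt = -mu * A^T (A x - b),
# a sketch of the least-squares network dynamics on a made-up system.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

mu, dt = 1.0, 0.01          # learning rate and time step (assumed values)
x = np.zeros(2)             # x(0)
for _ in range(10000):
    x = x - dt * mu * A.T @ (A @ x - b)

# The trajectory converges to the least-squares solution x*.
x_star, *_ = np.linalg.lstsq(A, b, rcond=None)
print(x, np.allclose(x, x_star, atol=1e-6))
```

Stability requires dt·μ·λ_max(A^T A) < 2 for this explicit scheme; the analog network integrates the same equations continuously and has no such step-size restriction.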
dx_j/dt = −Σ_{p=1}^n μ_jp Σ_{i=1}^m a_ip ( Σ_{k=1}^n a_ik x_k − b_i )
ANN With Identity Activation Function
General ANN Solution
The key step in designing an algorithm for neural networks:
Construct an appropriate computational energy function E(x) (Lyapunov function).
The lowest energy state will correspond to the desired solution x*.
By differentiation, the energy-function minimization problem is transformed into a set of ordinary differential equations.
General ANN Solution
In general, the optimization problem can be formulated as:
Find the vector x* ∈ R^n that minimizes the energy function

E(x) = Σ_{i=1}^m σ(r_i(x)) = Σ_{i=1}^m σ(A_i x − b_i)

σ(r_i(x)) is called the weighting function; its derivative is called the activation function:

g(r_i) = ∂σ(r_i)/∂r_i = ∂E/∂r_i
General ANN Solution
Gradient descent approach:

dx/dt = −μ ∇E(x)

The minimization of the energy function leads to the set of differential equations

dx_j/dt = −Σ_{p=1}^n μ_jp ∂E/∂x_p
        = −Σ_{p=1}^n μ_jp Σ_{i=1}^m (∂r_i/∂x_p)(∂E/∂r_i)
        = −Σ_{p=1}^n μ_jp Σ_{i=1}^m a_ip g( Σ_{k=1}^n a_ik x_k − b_i )
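The dynamics above can be sketched with a pluggable activation function g; a NumPy illustration with μ taken as a scalar multiple of the identity (a simplifying assumption) and made-up data. With the identity activation the scheme reduces to the least-squares network:

```python
import numpy as np

# Sketch of the general network dynamics
#   dx_j/dt = -sum_p mu_jp * sum_i a_ip * g(r_i(x)),
# with the mu matrix simplified to a scalar and g pluggable.
def simulate(A, b, g, mu=1.0, dt=0.01, steps=20000):
    x = np.zeros(A.shape[1])
    for _ in range(steps):
        r = A @ x - b                  # residual vector r(x)
        x = x - dt * mu * A.T @ g(r)   # gradient flow step
    return x

A = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

identity = lambda r: r                 # g(r) = r  ->  least-squares criterion
x_ls = simulate(A, b, identity)
print(x_ls)
```

Swapping in a different g (e.g. a clipped or saturating function, as in the robust criteria discussed next) changes which error criterion the same architecture minimizes.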
General ANN Architecture

dx_j/dt = −Σ_{p=1}^n μ_jp Σ_{i=1}^m a_ip g( Σ_{k=1}^n a_ik x_k − b_i )

(Architecture figure: g_1, g_2, ..., g_m are the activation functions applied to the residuals.)
Drawbacks of Least Square Error Criterion
Why not always use the least-squares energy function?
It is not robust in the presence of large outliers, and it is only optimal for a Gaussian distribution of the error.
The proper choice of criterion depends on:
the specific application;
the distribution of the errors in the measurement vector b.
Gaussian distribution* → least-squares criterion
Uniform distribution → Chebyshev norm criterion
*However, the assumption that the set of measurements or observations has a Gaussian error distribution is frequently unrealistic, due to different sources of errors such as instrument errors, modeling errors, sampling errors, and human errors.
Special Energy Functions

Huber's function:

ρ_H(e) = e²/2           for |e| ≤ β
ρ_H(e) = β|e| − β²/2    for |e| > β

(Plots: weighting function and activation function.)
Special Energy Functions
Talwar's function:

ρ_T(e) = e²/2    for |e| ≤ β
ρ_T(e) = β²/2    for |e| > β

This function has a direct implementation.
(Plots: weighting function and activation function.)
Special Energy Functions
Logistic function:

ρ_L(e) = β² ln( cosh(e/β) )

The iteratively reweighted method uses this activation function.
(Plots: weighting function and activation function.)
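The three robust weighting functions above and their derived activation functions g(e) = ρ'(e) can be written down directly; a NumPy sketch (β = 1 is an assumed value):

```python
import numpy as np

beta = 1.0  # threshold parameter (assumed value)

# Weighting functions rho(e) and their derivatives g(e) = rho'(e).
def rho_huber(e):
    return np.where(np.abs(e) <= beta, e**2 / 2, beta * np.abs(e) - beta**2 / 2)

def g_huber(e):
    return np.where(np.abs(e) <= beta, e, beta * np.sign(e))   # clips outliers

def rho_talwar(e):
    return np.where(np.abs(e) <= beta, e**2 / 2, beta**2 / 2)

def g_talwar(e):
    return np.where(np.abs(e) <= beta, e, 0.0)                 # ignores outliers

def rho_logistic(e):
    return beta**2 * np.log(np.cosh(e / beta))

def g_logistic(e):
    return beta * np.tanh(e / beta)                            # squashes outliers

# A large residual is clipped (Huber), ignored (Talwar), or squashed (logistic).
print(g_huber(5.0), g_talwar(5.0), g_logistic(5.0))
```

All three agree with the least-squares activation g(e) = e for small residuals and differ only in how they damp large ones, which is exactly what makes them outlier-robust.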
Special Energy Functions
Lp-normed function:

E_p(x) = (1/p) Σ_{i=1}^m |r_i|^p

(Plot: activation function.)
Lp-Norm Energy Functions
A well-known criterion is the L1-norm energy function

E_1(x) = Σ_{i=1}^m |r_i(x)|

(Plots: weighting function and activation function.)
Special Energy Functions
Another well-known criterion is the L∞-norm (Chebyshev) criterion, which can be formulated as the minimax problem:

min_{x ∈ R^n} max_{1 ≤ i ≤ m} |r_i(x)|

This criterion is optimal for a uniform distribution of the error.
Minimax (L∞-Norm) Criterion
For the case p = ∞ of the Lp-norm problem, the activation function g[r_i(x)] cannot be explicitly expressed through |r_i(x)|^(p−1).
The error function can be defined as

E_∞(x) = max_{1 ≤ i ≤ m} |r_i(x)|

resulting in the following activation function:

g[r_i(x)] = sign[r_i(x)]   if |r_i(x)| = max_{1 ≤ k ≤ m} |r_k(x)|
g[r_i(x)] = 0              otherwise
Minimax (L∞-Norm) Criterion
Although straightforward, some problems arise in practical implementations of this system of differential equations:
Exact realization of the signum functions is rather difficult (electrically).
E∞ has a derivative discontinuity at x if |r_i(x)| = |r_k(x)| = E_∞(x) for some i ≠ k.*
*This is often responsible for various anomalous results (e.g. hysteresis phenomena).
Transforming the problem to an equivalent one
Rather than directly implementing the proposed system, we transform the minimax problem

min_{x ∈ R^n} max_{1 ≤ i ≤ m} |r_i(x)|

into an equivalent one:

Minimize ε subject to the constraints |r_i(x)| ≤ ε, ε ≥ 0.

Thus the problem can be viewed as finding the smallest non-negative value

ε* = E_∞(x*) ≥ 0

where x* is a vector of the optimal values of the parameters.
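The meaning of the smallest feasible ε is easy to see in the simplest case; a NumPy sketch fitting a single constant c to observations (so A is a column of ones, an illustrative special case). The L∞ minimizer is then the midrange of the data, and ε* is half the data spread:

```python
import numpy as np

# Chebyshev (minimax) criterion on the simplest model: fit one constant c
# to observations b (A = column of ones, made-up data).
b = np.array([1.0, 2.0, 1.0, -1.0, 0.0])

c_minimax = (b.min() + b.max()) / 2     # midrange: L-infinity minimizer
eps_star = (b.max() - b.min()) / 2      # smallest eps with |b_i - c| <= eps

# Brute-force cross-check over a grid of candidate c values.
grid = np.linspace(-3.0, 3.0, 6001)
worst = np.abs(b[None, :] - grid[:, None]).max(axis=1)
print(c_minimax, eps_star, grid[worst.argmin()])
```

Just as the L2 minimizer here would be the mean and the L1 minimizer the median, the minimax criterion picks the midrange, which is the optimal estimator for uniformly distributed errors.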
New Energy Function
Applying the standard quadratic penalty function, we can consider the cost function

E(x, ε) = νε + (κ/2) Σ_{i=1}^m { ([r_i(x) + ε]⁻)² + ([ε − r_i(x)]⁻)² }

where ν > 0, κ > 0 are coefficients and [y]⁻ = min{0, y}.
New Energy Function
Applying now the gradient strategy, we obtain the associated system of differential equations

dε/dt = −μ_0 ( ν/κ + Σ_{i=1}^m [ (r_i(x) + ε) S_i1 − (r_i(x) − ε) S_i2 ] )

dx_j/dt = −μ_j Σ_{i=1}^m a_ij [ (r_i(x) + ε) S_i1 + (r_i(x) − ε) S_i2 ]   (j = 1, 2, ..., n)

where

S_i1 = 0 if r_i(x) + ε ≥ 0;  1 otherwise
S_i2 = 0 if ε − r_i(x) ≥ 0;  1 otherwise
Simplifying architecture
It is interesting to note that the system of differential equations can be simplified by introducing:

φ(r_i(x), ε) = r_i + ε    if r_i < −ε
φ(r_i(x), ε) = 0          if −ε ≤ r_i ≤ ε
φ(r_i(x), ε) = r_i − ε    if r_i > ε

This nonlinear function represents a typical dead-zone function.
Simplifying architecture

It is easy to check:

(r_i(x) + ε) S_i1 − (r_i(x) − ε) S_i2 = −φ(|r_i(x)|, ε)
(r_i(x) + ε) S_i1 + (r_i(x) − ε) S_i2 = φ(r_i(x), ε)

Thus the system of differential equations can be simplified to the form:

dε/dt = −μ_0 ( ν/κ − Σ_{i=1}^m φ(|r_i(x)|, ε) ),   ε(0) = ε^(0)

dx_j/dt = −μ_j Σ_{i=1}^m a_ij φ(r_i(x), ε),   x_j(0) = x_j^(0)   (j = 1, 2, ..., n)
dx_j/dt = −μ_j Σ_{i=1}^m a_ij φ(r_i(x), ε)
dε/dt = −μ_0 ( ν/κ − Σ_{i=1}^m φ(|r_i(x)|, ε) )
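A rough numerical sketch of these simplified dynamics (forward-Euler integration; A, b, and all rate constants are made-up values, while the analog network integrates the same equations continuously). For a single-constant fit the trajectory should approach the midrange of the data, with ε approaching half the spread:

```python
import numpy as np

# Simplified minimax network dynamics with the dead-zone nonlinearity phi.
def phi(r, eps):
    # dead zone: passes only the part of r outside [-eps, eps]
    return np.where(r > eps, r - eps, np.where(r < -eps, r + eps, 0.0))

A = np.array([[1.0], [1.0], [1.0]])
b = np.array([-1.0, 0.0, 2.0])     # best constant fit: midrange 0.5, eps* = 1.5

mu, mu0, nu_over_kappa, dt = 1.0, 50.0, 0.01, 1e-3   # assumed rate constants
x, eps = np.zeros(1), 3.0
for _ in range(200000):
    r = A @ x - b
    x = x - dt * mu * A.T @ phi(r, eps)
    eps = eps - dt * mu0 * (nu_over_kappa - phi(np.abs(r), eps).sum())
print(x, eps)
```

The small positive ν/κ keeps ε shrinking until the penalty terms balance it, so the equilibrium sits slightly inside the true ε*; shrinking ν/κ tightens that gap at the cost of slower convergence.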
Least Absolute Values (L1-Norm) Energy Function

Find the design vector x* that minimizes the error function

E_1(x) = Σ_{i=1}^m |r_i(x)|

where

r_i(x) = Σ_{j=1}^n a_ij x_j − b_i

Why should one choose this function, knowing that it has differentiation problems?
Important L1-Norm Properties

1. Least absolute value problems are equivalent to linear programming problems, and vice versa.
2. Although the energy function E_1(x) is not differentiable, the terms |r_i(x)| can be approximated very closely by smoothly differentiable functions.
3. For a full-rank* matrix A, there always exists a minimum L1-norm solution that passes through at least n of the m data points. The L2-norm solution does not in general interpolate any of the points.
These properties are not shared by the L2-norm.
* Matrix A is said to be of full rank if all its rows or columns are linearly independent.
Important L1-Norm Properties
Theorem: There is a minimizer x* ∈ R^n of the energy function E_1(x) = Σ_{i=1}^m |r_i(x)| for which the residuals r_i(x*) = 0 for at least n values of i, say i_1, i_2, ..., i_n, where n denotes the rank of the matrix A.
We can say that the L1-norm solution is the median solution, while the L2-norm solution is the mean solution.
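The median/mean contrast is clearest in the simplest case, fitting a single constant c to observations (A a column of ones, an illustrative choice):

```python
import numpy as np

# The L1 solution behaves like a median, the L2 solution like a mean.
b = np.array([0.0, 1.0, 2.0, 3.0, 100.0])   # one large outlier

c_l2 = b.mean()       # minimizes sum (b_i - c)^2 : dragged toward the outlier
c_l1 = np.median(b)   # minimizes sum |b_i - c|   : ignores the outlier
print(c_l2, c_l1)     # 21.2 2.0
```

Note the median also lands exactly on one of the data points, matching the interpolation property above for n = 1.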
Least Absolute Error Implementation
The algorithm is as follows:
1. First phase:
Solve the problem using the ordinary least-squares technique and compute all m residuals.
Select from them the n residuals that are smallest in absolute value.
2. Second phase:
Discarding the rest of the equations, the n equations related to the selected residuals are solved by driving their residuals to zero.
The ANN implementation is done in three layers using an inhibition control circuit.
(Figure: ANN architecture for solving the L1-norm estimation problem, Phase #1 and Phase #2.)
Example
Consider the matrix A and observation vector b below. Find the solution to Ax = b using the least absolute error energy function.

A = [ 1  0   0
      1  1   1
      1  2   4
      1  3   9
      1  4  16 ],   b = [ 1, 2, 1, −1, −10 ]^T,   Ax − b = 0
In the first phase all the switches (S1-S5) were closed and the network was able to find the following standard least-squares solution:

x*_I = [0.6, 3.5, −1.5]^T,   r(x*_I) = [−0.4, 0.6, 0.6, −1.4, 0.6]^T

In this case it is impossible to select the two largest residuals in absolute value, because r_2 = r_3 = r_5 = 0.6. Phase one was therefore rerun with switch S4 opened, and the network then found

x*_II = [0.9182, 2.6404, −1.3409]^T,   r(x*_II) = [−0.0818, 0.2182, −0.1636, −2.2273, −0.0273]^T
Cichocki's Circuit Simulation Results

In the second phase (and third run of the network) the inhibitive control network opened the switch S2. So in the third run only switches S1, S3, S5 were closed, and the network found the equilibrium point:

x* = [1, 2.750, −1.375]^T,   r(x*) = [0, 0.375, 0, −2.125, 0]^T
Cichocki’s Circuit Simulation Results
The residuals for n = 3 of the m = 5 equations converge to zero in 50 nanoseconds.
Using MATLAB, we observed that zeroing r_1, r_3 and r_5 results in the minimum value of E_1(x) = Σ_{i=1}^m |r_i(x)|.
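The three runs above can be reproduced numerically; a NumPy sketch using ordinary least squares in place of the analog network:

```python
import numpy as np

# Numerical check of the example: the two-phase least-absolute-value procedure.
A = np.array([[1, 0, 0],
              [1, 1, 1],
              [1, 2, 4],
              [1, 3, 9],
              [1, 4, 16]], dtype=float)
b = np.array([1.0, 2.0, 1.0, -1.0, -10.0])

# Run 1: least squares on all five equations.
x1, *_ = np.linalg.lstsq(A, b, rcond=None)
r1 = A @ x1 - b                   # [-0.4, 0.6, 0.6, -1.4, 0.6]

# Run 2: drop equation 4 (largest residual), least squares on the rest.
keep = [0, 1, 2, 4]
x2, *_ = np.linalg.lstsq(A[keep], b[keep], rcond=None)
r2 = A @ x2 - b                   # among the kept equations, eq. 2 is now largest

# Phase 2: solve the three equations (1, 3, 5) with the smallest residuals exactly.
x_star = np.linalg.solve(A[[0, 2, 4]], b[[0, 2, 4]])
print(x_star)                     # x* = [1, 2.75, -1.375]
```

The exact solve in phase 2 drives r_1, r_3 and r_5 to zero, reproducing the equilibrium point found by the circuit simulation.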
Conclusion
There is a great need for real-time solution of linear equations.
Cichocki's proposed ANN is different from classical ANNs.
The idea: choose a proper energy function whose minimization yields the optimal solution to Ax = b.
'Proper function' may have different meanings in different applications.
The standard least-squares error function gives the optimal answer for a Gaussian distribution of the error.
Conclusion (Cont.)

The least-squares function does not behave well when there are large outliers in the observations.
Various energy functions have been proposed to solve the outlier problem (e.g. the logistic function).
Minimax yields the optimal answer for a uniform distribution of the error. It also has some implementation and mathematical problems that lead to an indirect approach to solving the problem.
The least absolute error function has some properties that distinguish it from other error functions.