adaptive NN background & applications – Steve Rogers

Artificial Neural Network tutorial with a few applications.
Overview
• Adaptive Neural Networks have become more popular due to their ability to approximate a large array of dynamics. The ability to adapt is accomplished by means of a set of tuning rules.
• Adaptive Neural Networks are used for control & system identification/prediction, table look-ups, fault detection, and optimization due to their generalization ability. Tuning rules for adaptive neural networks have featured Lyapunov-based approaches in recent years. Although these have some desirable qualities, they have led to complex tuning procedures. Tuning rules should be simple and provide for rapid, reliable convergence.
• Adaptive Neural Networks possess learning, adaptation, and classification capabilities.
Neural Networks Decision Points
• Advantages
 - Capable of learning complex nonlinear systems
 - Code/algorithms available
 - Can be used for either fixed or adaptive applications, including control & system identification/prediction, table look-ups, and fault detection
 - May handle arbitrary inputs, unlike linear systems
 - Can treat the system to be identified as a ‘black box’, i.e., doesn’t require knowledge of 1st principles
 - Can be used in conjunction with other conventional methods
• Disadvantages
 - Requires specialized knowledge related to the algorithm
 - Difficult to validate, especially adaptive systems, because the weights are not deterministic
 - Solution may be available from other conventional methods
 - Convergence may be to a local minimum & not the global solution
Neural Network Components
• Neurons: also known as nodes, the basic computing elements of a network
• Connections: define the relationships of neurons within the network
• Weights: used to determine whether a neuron activates; can also be used as the activation of a neuron
• Activation Function: the function used to determine the output of a neuron
• Adaptive Algorithm: controls the learning process of the network
A key point is that arbitrary measurements & derived measurements may be inputs.
Neural Network Structure
Common Activation Functions

Binary Sigmoid: $f(x) = \frac{1}{1 + e^{-x}}$

Bipolar Sigmoid: $f(x) = \frac{2}{1 + e^{-x}} - 1$

Gaussian Radial: $f(x) = e^{-x^2}$
A neuron in the NN takes the weighted sum of its inputs and uses it as the input signal to the neuron's activation function, which then produces an output signal.
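As a minimal Matlab sketch of this computation (the weights, bias, and input values below are illustrative, not from the slides):

% single neuron: weighted sum of inputs passed through an activation function
x = [0.5; -1.2; 2.0];          % input vector (illustrative)
w = [0.1; 0.4; -0.3];          % connection weights (illustrative)
b = 0.2;                       % bias term
n = w'*x + b;                  % weighted sum (net input)
f = @(n) 1./(1 + exp(-n));     % binary sigmoid activation
out = f(n)                     % neuron output signal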
Adaptive Radial Basis Function Neural Networks (RBFN)
• RBFNs are two-layer networks whose output is a linear combination of the hidden layer functions
• Typical RBFN equations are:

$$ f(x) = w_0 + \sum_{k=1}^{h} w_k \exp\left( -\frac{\| x - \mu_k \|^2}{2\sigma_k^2} \right) $$

• where x is the input vector of the network, h indicates the total number of hidden neurons, and μk and σk refer to the center and width of the kth hidden neuron. ||…|| is the Euclidean norm. The function f(.) is the output of the RBFN, which represents the network approximation to the actual output. The coefficient wk is the connection weight of the kth hidden neuron to the output neurons and w0 is the bias term.
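A minimal Matlab sketch of this forward pass (the sizes, centers, widths, and weights below are illustrative):

% RBFN forward pass: f(x) = w0 + sum_k wk*exp(-||x - mu_k||^2/(2*sigma_k^2))
x = [0.3; -0.7];               % input vector
h = 4;                         % total number of hidden neurons
mu = randn(2, h);              % centers of the hidden neurons
sigma = ones(1, h);            % widths of the hidden neurons
w = randn(1, h); w0 = 0.1;     % connection weights & bias term
phi = zeros(h, 1);
for k = 1:h
  phi(k) = exp(-norm(x - mu(:,k))^2/(2*sigma(k)^2));
end
f = w0 + w*phi                 % network approximation to the actual output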
Typical RBFN Architecture
Adaptive Update Schemes
• Any good identification scheme that utilizes the RBFN should satisfy two criteria:
1) the parameters of the RBFN are tuned properly to satisfy stability and performance needs
2) the parameter adaptive law should be efficient enough to allow real-time operation
• The RAN (resource allocating network) was developed to tune all the RBF parameters and incorporated a growth feature; MRAN also includes a pruning feature. Other tuning rules only adjusted the connection weight vector and left the center and width vectors fixed.
• A Lyapunov-derived tuning rule is:

$$ \theta(n+1) = \theta(n) + \gamma\, \zeta(n)\, P\, e(n) $$

• where θ is the vector of parameters to be tuned, including the connection weights, centers, and widths; γ is the user-selected learning rate (a positive scalar); and ζ(n) is the gradient of the function with respect to the parameter vector evaluated at θ(n).
• P is the solution of the Lyapunov equation $A^T P + P A = -Q$. Q is a user-selected positive definite matrix; A is a user-selected Hurwitz (stable, i.e., all eigenvalues in the open left half plane) matrix. e(n) is the error vector.
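If the Control System Toolbox is available, P can be computed with Matlab's lyap; a small sketch with illustrative A and Q:

% solve A'*P + P*A = -Q for the tuning rule above
A = [-1 0.5; 0 -2];    % user-selected Hurwitz (stable) matrix (illustrative)
Q = eye(2);            % user-selected positive definite matrix
P = lyap(A', Q);       % lyap(M,Q) solves M*X + X*M' + Q = 0; M = A' gives A'*P + P*A = -Q
norm(A'*P + P*A + Q)   % check: should be near zero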
Update Schemes

• Another common approach for tuning rules is a gradient update of all the parameters (widths, centers, & connection weights):

$$ \sigma_k(n+1) = \sigma_k(n) + \gamma\, e(n)\, \zeta_{\sigma_k}(n) $$
$$ \mu_k(n+1) = \mu_k(n) + \gamma\, e(n)\, \zeta_{\mu_k}(n) $$
$$ w_k(n+1) = w_k(n) + \gamma\, e(n)\, \zeta_{w_k}(n) - \rho\, w_k(n) $$

where the hidden neuron outputs are $\exp(-\| x - \mu_k \|^2 / (2\sigma_k^2))$ and the error is $e_i = y_i - \hat{y}_i$.

• The 3rd term in the weight update moves the discrete pole away from the unit circle, i.e., away from being a pure integrator. Although this may slow down convergence, it improves stability, and should remove oscillations.
• Note that all parameters are tuned in the above gradient approach.
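A one-step Matlab sketch of the weight update with the leakage term (all values are illustrative):

% one gradient step with the stabilizing 3rd (leakage) term
w = [0.2 -0.4 0.7];            % current connection weights (illustrative)
phi = [0.9; 0.1; 0.5];         % hidden neuron (basis function) outputs
y = 1.0; yhat = w*phi;         % measured & estimated outputs
e = y - yhat;                  % error e = y - yhat
gamma = 0.05; rho = 0.01;      % learning rate & leakage gain (illustrative)
w = w + gamma*e*phi' - rho*w;  % -rho*w moves the discrete pole from 1 to 1 - rho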
Radial Basis Function Block Diagrams
• The bottom part of the figure shows how a control structure may be inserted into the linear combiner (LC) update. The simplest control structure is the standard learning rate γ. A proportional integral (PI) structure is the next simplest controller. It has the form:

$$ K_p \frac{s + a}{s} $$

which gives another integrator plus a zero. Note also that Kp may be combined with the learning rate γ.
• Any control structure may be used, including lead-lag, PID, servo-type PID, etc.

Control Circuit for LC update
RBF With Controller Update Mechanism
[Figure: RBF network with a controller-based update mechanism. The input x passes through the sigmoid hidden layer; the error e = y − ŷ is fed through the update controller (e.g., the PI block Kp(s+a)/s or a lead-lag Kl(s+a)/(s+b)) and an integrator 1/s to adjust the linear combiner weights.]
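A discrete-time Matlab sketch of driving the weight update through a PI structure rather than a plain learning rate (the gains and target weights are illustrative):

% weight update through a PI structure instead of a plain learning rate
Kp = 0.05; Ki = 0.005;           % proportional & integral gains (illustrative)
b = [0.5 0.2 -0.1];              % target combiner weights (illustrative)
w = zeros(1,3); integ = zeros(1,3);
for n = 1:500
  phi = rand(3,1);               % basis (hidden layer) outputs at step n
  e = b*phi - w*phi;             % linear combiner error
  g = e*phi';                    % error-driven gradient signal
  integ = integ + Ki*g;          % integral path of the PI controller
  w = w + Kp*g + integ;          % PI-shaped step into the weight integrator
end
disp(w)                          % w approaches b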
Optimization Gain Results
[Figure: optimization gain results for the controller structures K(s+a)/(s+b) and Kp + Ki/s.]
Data Plots
[Figure: data plots for the Kp + Ki/s and Kp(s+a)/(s+b) controller structures.]
Use of Neural Networks for Control Enhancements of Existing Systems
[Diagram: an NN add-on attached around an existing controller and gas turbine, tapping the existing set points & measurements.]
A neural network may be added to an existing system & make use of the current data stream to enhance it. Because the add-on is non-intrusive to the existing system, it takes advantage of the existing control system's capabilities while focusing on any deficiencies of the existing system. Most current research NN prototypes are handled in this fashion.
NN Control of Systems with Jumps: friction, deadzones, backlash, & hysteresis

• Add-on to existing continuous controller
• Modify the usual activation function by adding a jump function
Common Activation Functions (continuous)

Binary Sigmoid: $f(x) = \frac{1}{1 + e^{-x}}$

Bipolar Sigmoid: $f(x) = \frac{2}{1 + e^{-x}} - 1$

Gaussian Radial: $f(x) = e^{-x^2}$
Jump functions (one per activation function above, zero for x < 0):

$$ g_1(x) = \begin{cases} 0 & x < 0 \\ \frac{1}{1+e^{-x}} & x \ge 0 \end{cases} \qquad g_2(x) = \begin{cases} 0 & x < 0 \\ \frac{2}{1+e^{-x}} - 1 & x \ge 0 \end{cases} \qquad g_3(x) = \begin{cases} 0 & x < 0 \\ e^{-x^2} & x \ge 0 \end{cases} $$
System Identification
[Block diagram: the input x[n] feeds both the unknown system, producing d[n], and the adaptive component, producing y[n]; the error e[n] = d[n] − y[n] drives the adaptation.]
The adaptive component successfully models the system when e[n] converges to a small value. If the model coefficients change drastically, an anomaly may be declared.
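A minimal Matlab sketch of this loop, with an LMS-adapted FIR filter standing in for the adaptive component (the "unknown" system and all sizes/gains are illustrative):

% adaptive system identification: e[n] = d[n] - y[n] drives adaptation
N = 1000; M = 4;               % samples & adaptive filter length
b_true = [0.5 -0.3 0.2 0.1];   % unknown system (FIR for illustration)
w = zeros(1, M); mu = 0.05;    % adaptive component weights & step size
x = randn(1, N); xbuf = zeros(1, M);
for n = 1:N
  xbuf = [x(n) xbuf(1:end-1)]; % tapped delay line of the input x[n]
  d = b_true*xbuf';            % unknown system output d[n]
  y = w*xbuf';                 % adaptive component output y[n]
  e = d - y;                   % error e[n]
  w = w + mu*e*xbuf;           % LMS update; w -> b_true as e[n] shrinks
end
disp(w)                        % converged model coefficients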
System ID with Adaptive Neural Networks

• Adaptive components are usually used in conjunction with conventional components because of instability concerns. They are used to ‘pick up the slop’ remaining from the conventional component.
• Multi-Layer Perceptrons (MLP) may be used for system identification or prediction.
• Numerous structures & update law options exist.
• Sigma-Pi structure: the Ci are input vectors, β is a Kronecker product, W is a set of weights, Ue is an error function (in this case a PI control output), and G/B are defined by the application.
• Single Hidden Layer structure: W/V are weights to be updated adaptively, & the other parameters are defined by the application system.
• The MLP used here is explained in the following sheets.
Sigma-Pi structure:

$$ \text{output} = W^T \beta, \qquad \beta = C_1 \otimes C_2 \otimes C_3 $$
$$ \dot{W} = G\, \beta\, U_e - B\, W, \qquad U_e = K_p e + K_i \int e\, dt $$

Single Hidden Layer structure:

$$ z_j = b_v + \sum_{i=1}^{n_1} \hat{v}_{ij} x_i, \qquad \text{output} = b_w + \sum_{k=1}^{n_2} \hat{w}_k \sigma(z_k) $$

where the hatted weights $\hat{W}$, $\hat{V}$ are the adaptively updated estimates.
System ID Example with MLP

MLP equations (two inputs, three hidden nodes, one output):

$$ c_{1j} = f(n_{1j}) = f(v_{j1} p_1 + v_{j2} p_2 + \theta_j), \quad j = 1, 2, 3 $$
$$ a = f(n_{21}) = f(w_1 c_{11} + w_2 c_{12} + w_3 c_{13} + \lambda) $$

MLP general equations:

$$ a = f\big( w^T f(V p + \theta) + \lambda \big) $$

MLP general update laws:

$$ F(k) = \text{error}(k) = t(k) - a(k) $$
$$ w_i(k+1) = w_i(k) + \alpha F(k)\, \frac{\partial a}{\partial w_i}, \qquad v_{ij}(k+1) = v_{ij}(k) + \alpha F(k)\, \frac{\partial a}{\partial v_{ij}} $$

with analogous updates for the biases θj and λ.

[Figure: MLP (Multi-Layer Perceptron) diagram. Inputs p1 and p2 feed three hidden nodes n11, n12, n13 through weights v11..v32 with biases θ1, θ2, θ3; the hidden outputs c11, c12, c13 feed the output node n21 through weights w1, w2, w3 with bias λ, producing the output a.]

The equations completely define the MLP. Note that α is a scalar learning rate. The Matlab implementation is shown in the following sheet.
Matlab Code

% mlp_example.m
%clear *
N = 500;
cycles = 4;
x = sin(cycles*2*pi*[0:N-1]/N);
lb = -0.7;
ub = 0.6;
gain = 2;
init = 1;
for i = 1:N
  if x(i) > ub, y(i) = gain*x(i)^5;
  elseif x(i) < lb, y(i) = gain*x(i)^5;
  else y(i) = sign(x(i))*x(i)^2;
  end
  yhat(i) = MLP_recurArray([init, x(i), y(i)]);
  init = 0;
end
figure(1)
subplot(211)
err = y(:) - yhat(:);
errnorm = norm(err);
plot([x(:), y(:), yhat(:)]), grid on
title(['MLP estimation of sinusoid, error = ', num2str(errnorm)])
subplot(212)
plot(err), grid on
ylabel('error')
function yout = MLP_recurArray(in)
%
% MLP backpropagation learning for a single hidden layer
% W is the output layer weight vector
% V holds the hidden layer weights
% With N interior nodes the MLP NN equations are:
%   O = W*tanh(V*I);
% and the two (gradient) update equations are:
%   W = W + mu*err*tanh(V*I)';
%   V = V + mu*err*(sech(V*I).^2).*W'*I';
% N is the number of interior nodes
% m is the number of inputs including the bias signal
persistent X
N = 10;
m = 5;
my = 5;
init = in(1);
u = in(2);
y = in(3);
% Initialize W & V
if init == 1 || isempty(X)
  X.W = zeros(1,N);
  X.dW = X.W;
  X.V = rand(N, m+my+N)/10000;
  X.dV = zeros(size(X.V));
  X.in = [1; u*ones(m-1,1); y*ones(my,1); zeros(N,1)];
  X.predslow = y;
end
mu = .09;
bet = .1;
G = tanh(X.V*X.in);                 % hidden layer outputs
out = X.W*G;                        % network output
err = y - out;
nextW = X.W + mu*err*G' + bet*X.dW; % output weight update with momentum
sec2h = sech(X.V*X.in);
sec2h = sec2h.*sec2h;               % tanh derivative
nextV = X.V + mu*err*sec2h.*X.W'*X.in' + bet*X.dV; % hidden weight update
% shift the input/output delay lines & feed the hidden outputs back
X.in = [1; u; X.in(2:m-1); y; X.in(2+m+1:2+m+my-1); G];
X.dW = nextW - X.W;
X.dV = nextV - X.V;
X.W = nextW;
X.V = nextV;
yout = out;
MLP function code
Results

[Figure: top plot - MLP estimation of sinusoid, error = 10.3689, showing x, y, & yhat over 500 samples; bottom plot - errornorm & wtnorm histories.]
x is the input sinusoid, y is the output signal, which is a nonlinear combination of sinusoids, & yhat is the MLP tracking signal. The bottom plot shows the stability & error performance. Fluctuation of the weights indicates that a better model structure is needed.
Predictive Filters
The block entitled adaptive filter may be replaced by an arbitrarily structured filter. The adaptive filter copy is updated each time step. This same concept can be applied to an adaptive neural network. Note that many adaptive components (unless otherwise guaranteed stable) are used in conjunction with conventional components to ensure the stability of the adaptive component.
[Block diagram: the signal passes through a delay Z^(−n) into the adaptive filter, whose output is the signal estimate; the error (signal − estimate) drives adaptation. The adaptive filter copy runs on the undelayed signal to produce the signal prediction. Z is the discrete delay operator; n is the number of delays.]
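A Matlab sketch of this arrangement with an LMS filter as the adaptive block (the delay, sizes, and test signal are illustrative):

% n-step-ahead prediction: adapt on the delayed input, predict with a copy
N = 2000; M = 8; nd = 5;       % samples, filter length, prediction delay
mu = 0.01; w = zeros(1, M);
s = sin(2*pi*0.01*(1:N)) + 0.05*randn(1, N);  % signal to predict
pred = zeros(1, N);
for k = M+nd:N
  xdel = s(k-nd:-1:k-nd-M+1);  % delayed samples into the adaptive filter
  e = s(k) - w*xdel';          % error vs. the current signal
  w = w + mu*e*xdel;           % update the adaptive filter
  xcur = s(k:-1:k-M+1);        % current samples into the filter copy
  pred(k) = w*xcur';           % prediction of s(k+nd)
end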
Fault Detection Concepts

• Actuator nonlinearities – deadband, backlash, & hysteresis. Conventional and adaptive neural networks can estimate the jump discontinuities.
• Instrument faults – excessive noise, dead sensor, drift, and bias. Simple statistics for the 1st two & system ID for the last two.
• Parameter estimation for process fault detection. Changes in coefficients may be used for fault detection.
• Hopfield neural networks may be used for principal component analyses (PCA), which is used in data driven fault detection.
Continuous Instrumentation Diagnostics for Accuracy/Precision & Life-Cycle Maintenance
• Sensor faults. Monitoring consists of data validation, or cross-checking, of sensor data. There are 4 types of anomalies from typical analog sensors: dead, excessive noise, drift, & offset.
• A dead sensor or excessive noise can be detected & isolated using the standard deviation of the individual sensor data stream. The standard deviation is compared to the statistics of common sensors throughout the plant.
• Drift or offset may also be caused by something in the process being measured; therefore, detection/isolation must be model based.
• Drift or offset fault detection model equations can be based on performance criteria, heat/mass balance equations, or other model structures. Fault detection parameters are derived from the equations. Any change indicates an anomaly which can then be investigated. Kalman filters are frequently used to estimate the fault parameters in stochastic systems, although other nonlinear systems including neural networks may be used as well. Typical equations and fault indicators derived from an electric pump system & heat exchanger follow.
Instrument Fault Types: Excessive Noise & Dead Sensor
[Figure: sensor value vs. time in seconds; bottom - excessive noise fault, top - dead sensor fault.]
Technical Approach for dead/noisy sensors: Sensor Fault Detection Filter Banks

[Block diagram: raw signal → low pass filter → smoothed signal; Abs(raw − smoothed) residual → low pass filter → 's'. A filter bank of such channels turns the raw data into fault indicators for the fault decision logic.]

[Figure: typical distribution of 's' for a group of sensors, with thresholds Sds (possible dead sensor) and Snf (possible noise failure).]

This algorithm will process the raw engineering-converted data that comes from each sensor. ‘s’ is the output signal that is sent to the decision logic. Sensors will be grouped by type, service, criticality, etc., as appropriate. The fault indicator thresholds Sds & Snf will be determined & refined by operational experience. Note that the low pass filter blocks may be of arbitrary structure and may be fixed or adaptive neural networks or linear networks.
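A Matlab sketch of one channel of the filter bank, using first-order low-pass filters (the smoothing constant and simulated data are illustrative; Sds & Snf come from operational experience):

% one sensor channel: smoothed signal, residual, and noise statistic 's'
alpha = 0.05;                   % low-pass filter constant (illustrative)
raw = 2 + 0.3*randn(1, 500);    % raw engineering-converted sensor data
smoothed = zeros(size(raw)); s = zeros(size(raw));
for k = 2:length(raw)
  smoothed(k) = smoothed(k-1) + alpha*(raw(k) - smoothed(k-1));
  resid = abs(raw(k) - smoothed(k));        % rectified residual
  s(k) = s(k-1) + alpha*(resid - s(k-1));   % low-pass filtered -> 's'
end
% decision logic: 's' below Sds suggests a dead sensor,
% 's' above Snf suggests a noise failure (thresholds set operationally)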
Instrument Fault Types: Drift & Offset
[Figure: sensor value vs. time in seconds; bottom - offset fault, top - drift fault.]
Proposed Observer Solution for Drift/Offset Sensor Fault
[Block diagram: the input u drives a process with q sensors y1..yq. A bank of observers, each fed u and a different subset of the outputs, produces estimates yij of all the measurements; per-sensor logic blocks compare the estimates and feed decision blocks.]
• The observers will process the raw engineering data ‘yi’ (output measurements) and ‘u’ (input measurements) that come from each sensor.
• An estimated value of all the output measurements is sent to a set of rules for decision making.
• If a residual is greater than a threshold, a fault is indicated.
• This is the basis for an approach using neural networks. Note that the observer blocks may have arbitrary structures. Each observer is made unique by varying the input vectors; therefore, the differences between them become fault indicators.
Possible States of Sensor Health: nominal health, suspect health, failed health.
Pump 1 Fluid Schematic with sensors & formulas
[Schematic: pump 1 fluid loop with inlet, filter (dpf), gas trap (dpg), accumulator, quantity sensor, pump (dpp), temperature sensor (T), check valve, flowmeter (fm), outlet, and absolute pressure sensor (psia). Sensor tags: LATI02SR0201P, LATI02SR0501Q, LATI02SR0401P, LATI02SR0001T, LATI02SR0101P, LATI02FM0001R, LATI02FM0002R, LATI02SR0301P.]
Pump Indicators:
1) Zf = dpf/pph^2 (filter resistance)
2) Zg = dpg/pph^2 (gas trap resistance)
3) Impeller specific speed = rpm*pph^0.5/(dpp^0.75)
4) Suction specific speed = rpm*pph^0.5/(psia^0.75)
   electric watts1 = amps*volts
   electric watts2 = amps*4.3825*krpm
   hydraulic watts = pph*psid/(60*8.34*2.298)
5) pump efficiency = hydraulic watts/electric watts
6) a1 = dpp - function(Impeller specific speed)*pph
   a1 should be close to zero except in a fault condition.
7) vc = Amps/krpm (pump ratio)
8) load = pph*dpp/(krpm*krpm) (pump load ratio)
where the left hand sides of the above 8 equations are indicator parameters.
Amps = LATI21FC0001C/10
volts = LATI21FC0001V
krpm = LATI21FC0003U/(255*20000)
Pump Dynamic Equations are used for estimation:
1) Ampsdot = -(R2/L2)*Amps - (psi/L2)*krpm
2) krpmdot = (psi/J)*Amps - (hth/J)*krpm
3) dppdot = hnn*pph^2 + hww*krpm^2
4) pphdot = -(hrr/ab)*pph^2 + dpp/ab
where R2, L2, psi, J, hth, hnn, hww, hrr, and ab are indicator parameters which can be determined.
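A Matlab sketch computing a few of the static indicator parameters from one record of sensor data (the sensor values are placeholders; the constants are from the formulas above):

% pump indicator parameters from one data record (placeholder values)
dpf = 1.2; dpg = 0.4; dpp = 30; psia = 14.7;  % pressures
pph = 500; rpm = 12000; amps = 3.2; volts = 115;
psid = dpp;                                   % pump differential pressure
Zf  = dpf/pph^2;                              % filter resistance
Zg  = dpg/pph^2;                              % gas trap resistance
iss = rpm*pph^0.5/dpp^0.75;                   % impeller specific speed
sss = rpm*pph^0.5/psia^0.75;                  % suction specific speed
ew  = amps*volts;                             % electric watts
hw  = pph*psid/(60*8.34*2.298);               % hydraulic watts
pe  = hw/ew;                                  % pump efficiency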
ISS MTL & LTL PPA Equations Table

Pump Indicators               | Algorithm                   | PPA Area of fault detection | Sensors
1) Zf                         | Adaptive or low pass filter | filter performance          | dpf, pph
2) Zg                         | Adaptive or low pass filter | gas trap performance        | dpg, pph
3) impeller spec. speed (iss) | Adaptive or low pass filter | pump performance            | krpm, pph, dpp
4) suction spec. speed (sss)  | Adaptive or low pass filter | pump performance            | krpm, pph, psia
5) pump efficiency (pe)       | Adaptive or low pass filter | pump performance            | Amps, volts, pph, dpp
6) a1                         | Adaptive or low pass filter | pump performance            | dpp, krpm, pph
7) vc                         | Adaptive or low pass filter | pump motor performance      | amps, krpm
8) load                       | Adaptive or low pass filter | pump motor performance      | pph, dpp, krpm

Pump Dynamic Indicators       | Algorithm                   | Area of fault detection     | Sensors
1) R2, L2, psi                | Adaptive filter             | motor performance           | amps, krpm
2) psi, J, hth                | Adaptive filter             | motor performance           | amps, krpm
3) hnn, hww                   | Adaptive filter             | pump performance            | pph, krpm
4) hrr, ab                    | Adaptive filter             | pump performance            | pph, dpp

[Table: Sensor Fault Matrix - PPA equations; asterisks mark which sensors (dpf, pph, dpg, krpm, dpp, psia, Amps, volts, T) each indicator parameter (Zf, Zg, iss, sss, pe, a1, vc, load, R2, L2, psi, J, hth, hnn, hww, hrr, ab) depends on.]
Note that the algorithms may be adaptive neural networks as well as linear adaptive filters.
On-Line Estimation of deadband, backlash, & hysteresis In Control Element
[Deadband schematic: v → deadband (break points bl, br; slopes ml, mr) → u → Control Element & Plant → y]

Deadband Equations:
u(t) = mr(v(t) - br)   if v(t) >= br
u(t) = 0               if bl < v(t) < br
u(t) = ml(v(t) - bl)   if v(t) <= bl
[Backlash schematic: v → backlash (break points cl, cr; slope m) → u → Control Element & Plant → y]

Backlash Equations:
u(t) = m(v(t) - cl)    if v(t) <= cl
u(t) = m(v(t) - cr)    if v(t) >= cr
u(t - 1)               if cl < v(t) < cr
The hysteresis schematic is more complicated than deadband or backlash & is not shown here. The general approach for parameter estimation is shown below. The types of nonlinearities are usually known by inspection.

[Block diagram: v → nonlinearity → u → Control Element & Plant → y; a parameter estimator (Kalman filter) processes v & y to estimate mr, ml, m, br, bl, cl, cr, etc.]

The estimated deadband parameters may be used for 2 purposes:
• on-line control loop audits
• plant control
Deadband Model Parameter Estimation
Backlash Model Parameter Estimation
Matlab code
% matlab deadband code
if udb(i) > 0
  [p1,Pdb,err(i)] = KalmanF(p1,Pdb,udb(i),[v(i) -1 0 0]');
end
if udb(i) < 0
  [p1,Pdb,err(i)] = KalmanF(p1,Pdb,udb(i),[0 0 v(i) -1]');
end

function [param,P,err] = KalmanF(param,P,y,x)
% recursive (Kalman filter) parameter estimation update
niter = 10;
Q = 0.05*eye(size(P));
for i = 1:niter
  err = y - x'*param;               % innovation
  k = P*x/(1 + x'*P*x);             % Kalman gain
  P = (eye(size(P)) - k*x')*P + Q;  % covariance update
  param = param + k*err;            % parameter update
end
Examples of applications for active control of noise and vibration
• Control of aircraft interior noise by use of lightweight vibration sources on the fuselage and acoustic sources inside the fuselage.
• Reduction of helicopter cabin noise by active vibration isolation of the rotor and gearbox from the cabin.
• Reduction of noise radiated by ships and submarines by active vibration isolation of interior mounted machinery (using active elements in parallel with passive elements) and active reduction of vibratory power transmission along the hull, using vibration actuators on the hull.
• Reduction of internal combustion engine exhaust noise by use of acoustic control sources at the exhaust outlet or by use of high intensity acoustic sources mounted on the exhaust pipe and radiating into the pipe at some distance from the exhaust outlet.
• Reduction of low frequency noise radiated by industrial noise sources such as vacuum pumps, forced air blowers, cooling towers and gas turbine exhausts, by use of acoustic control sources.
• Lightweight machinery enclosures with active control for low frequency noise reduction.
• Control of tonal noise radiated by turbo-machinery (including aircraft engines).
• Reduction of low frequency noise propagating in air conditioning systems by use of acoustic sources radiating into the duct airway.
• Reduction of electrical transformer noise, either by using a secondary, perforated lightweight skin surrounding the transformer and driven by vibration sources, or by attaching vibration sources directly to the transformer tank. Use of acoustic control sources for this purpose is also being investigated, but a large number of sources is required to obtain global control.
• Reduction of noise inside automobiles using acoustic sources inside the cabin and lightweight vibration actuators on the body panels.
• Active headsets and earmuffs.
Acoustic Concept 1

[Diagram: a noise source emits the primary noise; a reference microphone supplies x(n) to the ANC, which drives the canceling loudspeaker with y(n); the error microphone measures the residual e(n).]
ANC is active noise control, which includes an adaptive component. Main components are:
• an error microphone for each direction
• a reference microphone
• a canceling loudspeaker for each direction
y(n) is the loudspeaker signal that minimizes the e(n) signal.
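A Matlab sketch of the adaptive component as a plain LMS canceller (a fielded ANC would typically use filtered-x LMS with a secondary-path model; the paths and sizes here are illustrative):

% feedforward cancellation: x(n) from reference mic, e(n) from error mic
N = 4000; M = 16; mu = 0.005;
x = randn(1, N);                      % reference microphone signal x(n)
prim = filter([0 0 0.8 0.4], 1, x);   % primary noise path (illustrative)
w = zeros(1, M); xbuf = zeros(1, M);
for n = 1:N
  xbuf = [x(n) xbuf(1:end-1)];        % tapped delay line of x(n)
  y = w*xbuf';                        % canceling loudspeaker signal y(n)
  e = prim(n) - y;                    % error microphone signal e(n)
  w = w + mu*e*xbuf;                  % adapt to minimize e(n)
end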
Acoustic Concept 2

[Diagram: a noise source emits the primary noise; the ANC drives the canceling loudspeaker with y(n) using only the error microphone signal e(n); there is no reference microphone.]
ANC is active noise control. The ANC includes an adaptive algorithm that learns the system in order to create an ‘anti-noise’ signal for the canceling loudspeaker. Components are:
• an error microphone for each direction
• a canceling loudspeaker for each direction
y(n) is the loudspeaker signal that minimizes the e(n) signal.