Adaptive Resonance Theory (ART)
Shahid Rajaee Teacher Training University, Faculty of Computer Engineering
PRESENTED BY:
Amir Masoud Sefidian
OUTLINE
• Introduction (Stability-Plasticity Dilemma)
• ART Network
• ART Types
• Basic ART network Architecture
• ART Algorithm and Learning
• ART Computational Example
• ART Application
• Conclusion
• Main References
Stability-Plasticity Dilemma (SPD)
• Stability: the system's behaviour doesn't change after irrelevant events.
• Plasticity: the system adapts its behaviour in response to significant events.
• The dilemma: how to achieve stability without rigidity and plasticity without chaos?
▫ Ongoing learning capability
▫ Preservation of learned knowledge
The real world confronts a learning system with continuously changing data, so every learning system faces the plasticity-stability dilemma: how can we create a machine that can act and navigate in a world that is constantly changing?
SPD (Contd.)
Every learning system faces the plasticity-stability dilemma, which poses a few questions:
• How can we continue to quickly learn new things about the environment without forgetting what we have already learned?
• How can a learning system remain plastic (adaptive) in response to significant input yet stable in response to irrelevant input?
• How can a neural network remain plastic enough to learn new patterns and yet maintain the stability of the already learned patterns?
• How does the system know when to switch between its plastic and stable modes?
Back-propagation Drawbacks
• The back-propagation algorithm suffers from a stability problem.
• Once a back-propagation network is trained, the number of hidden neurons and the weights are fixed.
• The network cannot learn from new patterns unless it is retrained from scratch; back-propagation networks therefore lack plasticity.
• Assuming the number of hidden neurons is kept constant, the plasticity problem can be addressed by retraining the network on the new patterns using an on-line learning rule.
• However, this causes the network to rapidly forget old knowledge; such an algorithm is not stable.
• Gail Carpenter and Stephen Grossberg (Boston University) developed Adaptive Resonance Theory in 1976 as a learning model that answers this dilemma.
• ART networks tackle the stability-plasticity dilemma:
▫ They maintain the plasticity required to learn new patterns while preventing the modification of patterns that have been learned previously.
▫ No stored pattern is modified if the input does not match any stored pattern.
▫ Plasticity: they can always adapt to unknown inputs (by creating a new cluster with a new weight vector) if the given input cannot be classified by the existing clusters (a computational corollary of the biological model of neural plasticity).
▫ Stability: existing clusters are not deleted by the introduction of new inputs (new clusters are simply created in addition to the old ones).
▫ The basic ART system is an unsupervised learning model.
The key innovation: "expectations"
As each input is presented to the network, it is compared with the prototype vector that most closely matches it (the expectation). If the match between the prototype and the input vector is NOT adequate, a new prototype is selected. In this way, previously learned memories (prototypes) are not eroded by new learning.
ART Networks (Grossberg, 1976)

Unsupervised ART learning:
• ART1, ART2 (Carpenter & Grossberg, 1987)
• Fuzzy ART (Carpenter, Grossberg, et al., 1991)
• Simplified ART (Baraldi & Alpaydin, 1998)

Supervised ART learning:
• ARTMAP (Carpenter, Grossberg, et al., 1991)
• Fuzzy ARTMAP (Carpenter, Grossberg, et al., 1991)
• Gaussian ARTMAP (Williamson, 1992)
• Simplified ARTMAP (Kasuba, 1993)
• Mahalanobis Distance Based ARTMAP (Vuskovic & Du, 2001; Vuskovic, Xu & Du, 2002)
• ART1: the simplest variety of ART networks, accepting only binary inputs.
• ART2: supports continuous inputs.
• ART3: a refinement of both models.
• Fuzzy ART: incorporates fuzzy logic into ART's pattern recognition.
• ARTMAP, also known as Predictive ART, combines two slightly modified ART1 or ART2 units into a supervised learning structure.
• Fuzzy ARTMAP is ARTMAP using fuzzy ART units, resulting in a corresponding increase in efficacy.
Basic ART Network Architecture
The basic ART system is an unsupervised learning model. It typically consists of:
1. a comparison field (F1),
2. a recognition field (F2) composed of neurons,
3. a vigilance parameter, and
4. a reset module.
Each F2 node is in one of three states:
• Active
• Inactive, but available to participate in competition
• Inhibited, and prevented from participating in competition
ART Subsystems
Layer 1: comparison of the input pattern and the expectation.
L1-L2 connections: perform the clustering operation; each row of weights is a prototype pattern.
Layer 2: competition (contrast enhancement) with a winner-take-all learning strategy.
L2-L1 connections: perform pattern recall (the expectation).
Orienting subsystem: causes a reset when the expectation does not match the input pattern.
The degree of similarity required for patterns to be assigned to the same cluster unit is controlled by a user-defined gain control, known as the vigilance parameter.
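The two weight pathways can be sketched in a few lines of NumPy (a sketch with made-up prototype values; the matrices B and T are illustrative, not taken from the slides):

```python
import numpy as np

# Sketch of the two weight pathways: columns of B are bottom-up
# (L1 -> L2) weights used for clustering, and rows of T are top-down
# (L2 -> L1) prototypes recalled as "expectations".
B = np.array([[0.4, 0.0],
              [0.4, 0.0],
              [0.0, 0.4],
              [0.0, 0.4]])          # one column per Layer 2 node
T = np.array([[1, 1, 0, 0],
              [0, 0, 1, 1]])        # one prototype row per Layer 2 node

x = np.array([1, 1, 0, 1])          # binary input presented to Layer 1
y = x @ B                           # L1 -> L2: inner product per prototype
winner = int(np.argmax(y))          # Layer 2 competition (winner-take-all)
expectation = T[winner]             # L2 -> L1: recalled prototype pattern
```

The orienting subsystem would then compare `expectation` with `x` to decide whether to reset.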
ART Algorithm
A new pattern enters the categorisation loop (recognition, then comparison):
• The incoming pattern is matched against the stored cluster templates.
• If it is close enough to a stored template (known), it joins the best-matching cluster and the winner node's weights are adapted.
• If not (unknown), a new uncommitted node is initialised with the pattern as its template.
Recognition Phase (in Layer 2)
• Forward transmission via bottom-up weights (inner product).
• The best-matching node fires (winner-take-all layer).

Comparison Phase (in Layer 1)
• Backward transmission via top-down weights.
• Vigilance test: the class template is matched against the input pattern.
• If the pattern is close enough to the template, categorisation was successful and "resonance" is achieved.
• If it is not close enough, the winner neuron is reset and the next-best match is tried (the reset inhibits the current winning neuron, and the current expectation is removed).
• A new competition is then performed in Layer 2, while the previous winning neuron is disabled.
• The new winning neuron in Layer 2 projects a new expectation onto Layer 1 through the L2-L1 connections.
• This process continues until the L2-L1 expectation provides a close enough match to the input pattern.
• The process of matching, and subsequent adaptation, is referred to as resonance.
Step 1: Send the input from the F1 layer to the F2 layer for processing. The first node within the F2 layer is chosen as the closest match to the input and a hypothesis is formed. This hypothesis represents what the node will look like after learning has occurred, assuming it is the correct node to be updated.
Step 2: Assume node j is the winner. If T_j(I*) > ρ, the hypothesis is accepted and the input is assigned to that node. Otherwise, the process moves on to Step 3.
Step 3: If the hypothesis is rejected, a "reset" command is sent back to the F2 layer. In this situation, the jth node within F2 is no longer a candidate, so the process repeats for node j+1.
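The recognition-comparison-reset cycle can be sketched as follows (a minimal NumPy sketch; the function name `art1_categorise` and the array layout are assumptions, not from the original slides):

```python
import numpy as np

def art1_categorise(x, B, T, rho):
    """One ART1 search cycle (sketch): bottom-up recognition,
    top-down comparison, and reset until resonance or exhaustion.
    Returns the index of the resonating F2 node, or None if every
    committed node fails the vigilance test (a new cluster is needed)."""
    active = list(range(T.shape[0]))           # F2 nodes allowed to compete
    while active:
        y = x @ B[:, active]                   # recognition: inner products
        j = active[int(np.argmax(y))]          # winner-take-all in Layer 2
        if (T[j] * x).sum() / x.sum() >= rho:  # comparison: |x AND t_j| / |x|
            return j                           # vigilance passed: resonance
        active.remove(j)                       # reset: inhibit winner, retry
    return None
```

A `None` result corresponds to the "unknown" branch of the algorithm: no committed node resonates, so a new node must be initialised with the input as its template.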
Learning in ART1
Updates for both the bottom-up and top-down weights are controlled by differential equations; in the fast-learning limit they reduce to simple algebraic rules. Assuming the Jth node is the winner, so that only its connected weights are updated, there are two separate learning laws.

L2-L1 (top-down) connections:

t_{J,l}(new) = t_{J,l}(old) · x_l

L1-L2 (bottom-up) connections; the bottom-up weights are normalised so that they are smaller than or equal to this value:

b_{l,J}(new) = t_{J,l}(new) / (0.5 + Σ_k t_{J,k}(new))

Initial weights for ART1:
• top-down weights are initialized to 1: t_{j,l}(0) = 1
• bottom-up weights are initialized to b_{l,j}(0) = 1/(1 + n), where n is the number of input neurons.
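In the fast-learning limit these laws reduce to a couple of lines (a sketch; the helper names are illustrative, and the 0.5 offset in the denominator follows the update rule used in the worked example later in the deck):

```python
import numpy as np

def art1_init(n_inputs, n_nodes):
    """Initial ART1 weights: top-down t_{j,l}(0) = 1 and
    bottom-up b_{l,j}(0) = 1/(1 + n_inputs)."""
    T = np.ones((n_nodes, n_inputs))
    B = np.full((n_inputs, n_nodes), 1.0 / (1 + n_inputs))
    return B, T

def art1_update(x, j, B, T):
    """Fast-learning update for winning node j: the top-down template
    becomes its intersection with the input, and the bottom-up weights
    are that template renormalised by 0.5 + |template|."""
    T[j] = T[j] * x                        # t_{J,l} <- t_{J,l} * x_l
    B[:, j] = T[j] / (0.5 + T[j].sum())    # b_{l,J} <- t_{J,l} / (0.5 + sum)
```

With n = 7 this reproduces the first learning step of the example below: presenting (1, 1, 0, 0, 0, 0, 1) to a fresh node leaves top-down weights equal to the input and sets the bottom-up weights to 1/3.5 at the active positions.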
Vigilance Threshold
• A vigilance parameter ρ determines the tolerance of the matching process.
• The vigilance threshold sets the granularity of clustering: it defines the amount of attraction of each prototype.
• Low threshold:
▫ Large mismatches are accepted.
▫ Few large clusters.
▫ Misclassifications are more likely.
▫ The algorithm is more willing to accept input vectors into clusters (i.e., the definition of similarity is LESS strict).
• High threshold:
▫ Only small mismatches are accepted.
▫ Many small clusters.
▫ Higher precision.
▫ The algorithm is more "picky" about assigning input vectors to clusters (i.e., the definition of similarity is MORE strict).
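The effect of ρ can be seen directly on the match ratio |x AND t| / |x| used in the vigilance test (the template and input here mirror the worked example below; the two threshold values are illustrative):

```python
import numpy as np

t = np.array([0, 0, 1, 1, 1, 1, 0])   # a stored prototype (template)
x = np.array([1, 0, 1, 1, 1, 1, 0])   # an input differing in one position

match = (t & x).sum() / x.sum()       # vigilance ratio |x AND t| / |x| = 4/5

low_rho, high_rho = 0.7, 0.9
accepted_low = match >= low_rho       # tolerant: input joins the cluster
accepted_high = match >= high_rho     # strict: rejected, new cluster formed
```

The same input is thus clustered coarsely or finely depending solely on ρ.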
Example:
For this example, let us assume that we have an ART1 network with 7 input neurons (n = 7) and initially one output neuron.
Our input vectors are
{(1, 1, 0, 0, 0, 0, 1), (0, 0, 1, 1, 1, 1, 0), (1, 0, 1, 1, 1, 1, 0), (0, 0, 0, 1, 1, 1, 0), (1, 1, 0, 1, 1, 1, 0)}
and the vigilance parameter is ρ = 0.7.
Initially, all top-down weights are set to t_{1,l}(0) = 1, and all bottom-up weights are set to b_{l,1}(0) = 1/8.
ART Example Computation
For the first input vector, (1, 1, 0, 0, 0, 0, 1), we get:

y_1 = Σ_{l=1}^{7} b_{l,1}(0) x_l = (1/8)(1 + 1 + 0 + 0 + 0 + 0 + 1) = 3/8

Clearly, y1 is the winner (there are no competitors). Since we have:

(Σ_{l=1}^{7} t_{1,l}(0) x_l) / (Σ_{l=1}^{7} x_l) = 3/3 = 1 ≥ 0.7,

the vigilance condition is satisfied and we get the following new weights:

b_{1,1}(1) = b_{2,1}(1) = b_{7,1}(1) = 1/(0.5 + 3) = 1/3.5
b_{3,1}(1) = b_{4,1}(1) = b_{5,1}(1) = b_{6,1}(1) = 0

Also, we have:

t_{1,l}(1) = t_{1,l}(0) x_l

We can express the updated weights as matrices:

B(1)^T = [ 1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5 ]
T(1) = [ 1, 1, 0, 0, 0, 0, 1 ]

Now we have finished the first learning step and proceed by presenting the next input vector.
For the second input vector, (0, 0, 1, 1, 1, 1, 0), we get:

y_1 = 0·(1/3.5) + 0·(1/3.5) + 1·0 + 1·0 + 1·0 + 1·0 + 0·(1/3.5) = 0

Of course, y1 is still the winner. However, this time we do not reach the vigilance threshold:

(Σ_{l=1}^{7} t_{1,l} x_l) / (Σ_{l=1}^{7} x_l) = 0/4 = 0 < 0.7.

This means that we have to generate a second node in the output layer that represents the current input. Therefore, the top-down weights of the new node will be identical to the current input vector.

The new unit's bottom-up weights are set to zero in the positions where the input has zeroes as well. The remaining weights are set to:

1/(0.5 + 0 + 0 + 1 + 1 + 1 + 1 + 0) = 1/4.5

This gives us the following updated weight matrices:

B(2)^T = [ 1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5 ;
           0, 0, 1/4.5, 1/4.5, 1/4.5, 1/4.5, 0 ]
T(2) = [ 1, 1, 0, 0, 0, 0, 1 ;
         0, 0, 1, 1, 1, 1, 0 ]
For the third input vector, (1, 0, 1, 1, 1, 1, 0), we have:

y_1 = 1/3.5 ;  y_2 = 4/4.5

Here, y2 is the clear winner. This time we exceed the vigilance threshold again:

(Σ_{l=1}^{7} t_{2,l} x_l) / (Σ_{l=1}^{7} x_l) = 4/5 = 0.8 ≥ 0.7.

Therefore, we adapt the second node's weights. Each top-down weight is multiplied by the corresponding element of the current input, and the node's bottom-up weights are set to the top-down weights divided by (0.5 + 0 + 0 + 1 + 1 + 1 + 1 + 0) = 4.5.

It turns out that, in the current case, these updates do not result in any weight changes at all:

B(3)^T = [ 1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5 ;
           0, 0, 1/4.5, 1/4.5, 1/4.5, 1/4.5, 0 ]
T(3) = [ 1, 1, 0, 0, 0, 0, 1 ;
         0, 0, 1, 1, 1, 1, 0 ]
For the fourth input vector, (0, 0, 0, 1, 1, 1, 0), it is:

y_1 = 0 ;  y_2 = 3/4.5

Again, y2 is the winner. The vigilance test succeeds once again:

(Σ_{l=1}^{7} t_{2,l} x_l) / (Σ_{l=1}^{7} x_l) = 3/3 = 1 ≥ 0.7.

Therefore, we adapt the second node's weights. As usual, each top-down weight is multiplied by the corresponding element of the current input, and the node's bottom-up weights are set to the top-down weights divided by (0.5 + 0 + 0 + 0 + 1 + 1 + 1 + 0) = 3.5. This gives us the following new weight matrices:

B(4)^T = [ 1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5 ;
           0, 0, 0, 1/3.5, 1/3.5, 1/3.5, 0 ]
T(4) = [ 1, 1, 0, 0, 0, 0, 1 ;
         0, 0, 0, 1, 1, 1, 0 ]
Finally, the fifth input vector, (1, 1, 0, 1, 1, 1, 0), gives us:

y_1 = 2/3.5 ;  y_2 = 3/3.5

Once again, y2 is the winner. The vigilance test fails this time:

(Σ_{l=1}^{7} t_{2,l} x_l) / (Σ_{l=1}^{7} x_l) = 3/5 = 0.6 < 0.7.

This means that the active set A is reduced to contain only the first node, which becomes the uncontested winner. The vigilance test fails for the first unit as well:

(Σ_{l=1}^{7} t_{1,l} x_l) / (Σ_{l=1}^{7} x_l) = 2/5 = 0.4 < 0.7.

We thus have to create a third output neuron, which gives us the following new weight matrices:

B(5)^T = [ 1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5 ;
           0, 0, 0, 1/3.5, 1/3.5, 1/3.5, 0 ;
           1/5.5, 1/5.5, 0, 1/5.5, 1/5.5, 1/5.5, 0 ]
T(5) = [ 1, 1, 0, 0, 0, 0, 1 ;
         0, 0, 0, 1, 1, 1, 0 ;
         1, 1, 0, 1, 1, 1, 0 ]
In the second epoch, the first input vector, (1, 1, 0, 0, 0, 0, 1), gives us:

y_1 = 3/3.5 ;  y_2 = 0 ;  y_3 = 2/5.5

Here, y1 is the winner, and the vigilance test succeeds:

(Σ_{l=1}^{7} t_{1,l} x_l) / (Σ_{l=1}^{7} x_l) = 3/3 = 1 ≥ 0.7.

Since the current input is identical to the winner's top-down weights, no weight update happens.
The second input vector, (0, 0, 1, 1, 1, 1, 0), results in:

y_1 = 0 ;  y_2 = 3/3.5 ;  y_3 = 3/5.5

Now y2 is the winner, and the vigilance test succeeds:

(Σ_{l=1}^{7} t_{2,l} x_l) / (Σ_{l=1}^{7} x_l) = 3/4 = 0.75 ≥ 0.7.

Because the intersection of the current input with the winner's top-down weights equals the existing template (0, 0, 0, 1, 1, 1, 0), no weight update occurs.
The third input vector, (1, 0, 1, 1, 1, 1, 0), gives us:

y_1 = 1/3.5 ;  y_2 = 3/3.5 ;  y_3 = 4/5.5

Once again, y2 is the winner, but this time the vigilance test fails:

(Σ_{l=1}^{7} t_{2,l} x_l) / (Σ_{l=1}^{7} x_l) = 3/5 = 0.6 < 0.7.

This means that the active set is reduced to A = {1, 3}. Since y3 > y1, the third node is the new winner. The third node does satisfy the vigilance threshold:

(Σ_{l=1}^{7} t_{3,l} x_l) / (Σ_{l=1}^{7} x_l) = 4/5 = 0.8 ≥ 0.7.

This gives us the following updated weight matrices:

B(8)^T = [ 1/3.5, 1/3.5, 0, 0, 0, 0, 1/3.5 ;
           0, 0, 0, 1/3.5, 1/3.5, 1/3.5, 0 ;
           1/4.5, 0, 0, 1/4.5, 1/4.5, 1/4.5, 0 ]
T(8) = [ 1, 1, 0, 0, 0, 0, 1 ;
         0, 0, 0, 1, 1, 1, 0 ;
         1, 0, 0, 1, 1, 1, 0 ]
For the fourth vector, (0, 0, 0, 1, 1, 1, 0), the second node wins and passes the vigilance test, but no weight changes occur.
For the fifth vector, (1, 1, 0, 1, 1, 1, 0), the third output neuron now wins (y_3 = 4/4.5 exceeds y_2 = 3/3.5); it passes the vigilance test but does not lead to any weight modifications.
Further presentations of the five sample vectors do not lead to any weight changes; the network has thus stabilized.
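The whole run can be reproduced with a compact ART1 implementation (a sketch assembled from the update rules above; the function name `art1` and the data layout are assumptions, not from the slides):

```python
import numpy as np

def art1(inputs, n_inputs, rho=0.7, epochs=2):
    """Minimal ART1 clusterer (sketch): winner-take-all recognition,
    vigilance test with reset, fast-learning weight updates, and a
    new committed node whenever no existing node passes the test."""
    T = np.zeros((0, n_inputs))               # top-down templates (rows)
    B = np.zeros((n_inputs, 0))               # bottom-up weights (columns)
    labels = []
    for _ in range(epochs):
        labels = []
        for pattern in inputs:
            x = np.asarray(pattern, dtype=float)
            active = list(range(T.shape[0]))
            winner = None
            while active:
                j = active[int(np.argmax(x @ B[:, active]))]
                if (T[j] * x).sum() / x.sum() >= rho:
                    winner = j                # resonance achieved
                    break
                active.remove(j)              # reset: inhibit and re-compete
            if winner is None:                # commit a new node for x
                T = np.vstack([T, x])
                B = np.hstack([B, (x / (0.5 + x.sum()))[:, None]])
                winner = T.shape[0] - 1
            else:                             # fast-learning update
                T[winner] = T[winner] * x
                B[:, winner] = T[winner] / (0.5 + T[winner].sum())
            labels.append(winner)
    return labels, B, T
```

On the five example vectors with ρ = 0.7, two epochs produce three output nodes whose templates match T(8) above, after which further epochs change nothing.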
[Figure: illustration of the categories (clusters) formed by ART networks in the input space.]
Note: increasing ρ leads to narrower cones, not to wider ones as suggested by the figure.
• A problem with ART1 is the need to determine the vigilance parameter for a given problem, which can be tricky.
• Furthermore, ART1 always builds clusters of the same size, regardless of the distribution of samples in the input space.
• Nevertheless, ART is one of the most important and successful attempts at simulating incremental learning in biological systems.
Applications of ART
• Face recognition
• Image compression
• Mobile robot control
• Target recognition
• Medical diagnosis
• Signature verification
Conclusion
An artificial neural network system must be able to adapt to a changing environment, but constant change can make the system unstable, because the system may learn new information only by forgetting everything it has learned so far. ART resolves this stability-plasticity dilemma.
Main References:
• S. Rajasekaran, G. A. V. Pai, "Neural Networks, Fuzzy Logic and Genetic Algorithms", Prentice Hall of India, Chapter 5: Adaptive Resonance Theory.
• Jacek M. Zurada, "Introduction to Artificial Neural Systems", West Publishing Company, Chapter 7: Matching & Self-Organizing Maps.
• Carpenter, G. A., & Grossberg, S. (1987). A massively parallel architecture for a self-organizing neural pattern recognition machine. Computer Vision, Graphics, and Image Processing, 37(1), 54-115.
• Adaptive Resonance Theory, Soft Computing lecture notes, http://www.myreaders.info/html/soft_computing.html
• Fausett, L. V. (1994). Fundamentals of Neural Networks: Architectures, Algorithms, and Applications. Englewood Cliffs, NJ: Prentice-Hall.