MAPLD2005/P249
FPGA Co-Processor Enhanced Ant Colony Systems Data Mining
Jason Isaacs and Simon Y. Foo
Machine Intelligence Laboratory
FAMU-FSU College of Engineering
Department of Electrical and Computer Engineering


Presentation Outline
• Introduction
• Significance of Research
• Concise Background on ACS
• Summary of Data Mining, Focused on Clustering
• Discussion of ACS-based Data Mining
• FPGA Co-processor Enhancement
• Conclusions
• Future Work

Project Goal: to design and implement an Ant Colony Systems toolbox for non-combinatorial problem solving. This toolbox will comprise both hardware- and software-based solutions.

Ant Colony Systems Project Overview

This work aims to advance fundamental research in Ant Colony Systems.

The major objectives of this project are to:
• Develop a set of behavior models
• Design ACS algorithms for solutions to non-combinatorial problems
• Analyze algorithms for hardware implementations
• Implement FPGA modules (current stage)
• Incorporate all modules into a cohesive toolbox

Introduction to Ant Colony Systems

Ants are model organisms for bio-simulations due to both their relative individual simplicity and their complex group behaviors.

Colonies have evolved means of collectively performing tasks that are far beyond the capacities of individual ants. They do so without direct communication or centralized control, coordinating instead through the changes they make to their shared environment; this mechanism is known as stigmergy.

Previous research: our use of simulated ants to generate random numbers was a novel application of ACS. Prior to 1992, ACS was used exclusively to study real ant behavior. In the last decade, however, beginning with Marco Dorigo’s 1992 PhD dissertation, “Optimization, Learning and Natural Algorithms,” which modeled the way real ants solve problems using pheromones, ant colony simulations have provided solutions to a variety of NP-hard combinatorial optimization problems.

ACS Application Area: Data Mining

Ant colony real-world behaviors applicable to data mining:
• Ant foraging
• Cemetery organization and brood sorting
• Division of labor and task allocation
• Self-organization and templates
• Co-operative transport
• Nest building

[Figure: Flowchart for the ACS Data Mining System. Blocks: Data; Feature/Object Classification; Recognize (YES/NO decision); Clustering; Connection Topology; Store New Object; NEST (Data Warehouse); ACS Data Mining; Update Cognitive Map.]

Knowledge Discovery and Data Mining

What is data mining? “Discovery of useful summaries of data.” More broadly, data mining refers to a collection of techniques for extracting interesting relationships and knowledge hidden in data.

It is best described as “the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data” (Fayyad et al., 1996).

Knowledge Discovery in Databases

[Figure: the knowledge discovery process. Stages shown: Data; Cleaning/Integration; Data Warehouse; Selection/Transformation; Prepared data; Data Mining; Patterns; Evaluation/Visualization; Knowledge; Knowledge Base.]

Clustering

What is Clustering?

Given points in some space, often a high-dimensional space, group the points into a small number of clusters, each cluster consisting of points that are “near” in some sense.

The k-Means Algorithm

k-means picks k cluster centroids and assigns points to the clusters by picking the closest centroid to the point in question. As points are assigned to clusters, the centroid of the cluster may migrate.

Consider a very simple example of five points in two dimensions. Suppose we assign the points 1, 2, 3, 4, and 5 in that order, with k = 2. Then points 1 and 2 are assigned to the two clusters and become their centroids for the moment.

When we consider point 3, suppose it is closer to 1, so 3 joins the cluster of 1, whose centroid moves to the point indicated as a. Suppose that when we assign 4, we find that 4 is closer to 2 than to a, so 4 joins 2 in its cluster, whose center thus moves to b. Finally, 5 is closer to a than to b, so it joins the cluster {1,3}, whose centroid moves to c.
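The sequential assignment in this example can be sketched in Python. This is a minimal illustration under stated assumptions: points are numeric tuples, distance is Euclidean, and the first k points seed the clusters as in the walk-through; the function name `incremental_kmeans` is hypothetical.

```python
import math


def incremental_kmeans(points, k):
    """Sequential (on-line) k-means: the first k points seed the clusters, and each
    later point joins the cluster whose current centroid is nearest; that centroid
    then moves to the mean of the cluster's members."""
    clusters = [[p] for p in points[:k]]
    centroids = [tuple(p) for p in points[:k]]
    for p in points[k:]:
        j = min(range(k), key=lambda i: math.dist(p, centroids[i]))
        clusters[j].append(p)
        n = len(clusters[j])
        centroids[j] = tuple(sum(coords) / n for coords in zip(*clusters[j]))
    return clusters, centroids


# Points 1..5 in assignment order, k = 2
clusters, centroids = incremental_kmeans([(0, 0), (10, 0), (1, 0), (9, 0), (2, 0)], 2)
print(centroids)  # [(1.0, 0.0), (9.5, 0.0)]
```

With these coordinates, points 1, 3 and 5 end up together and points 2 and 4 together, mirroring the walk-through, and each centroid drifts as members join.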

The k-Means Algorithm

Having located the centroids of the k clusters, we can reassign all points, since some points that were assigned early may actually wind up closer to another centroid, as the centroids move about. If we are not sure of k, we can try different values of k until we find the smallest k such that increasing k does not much decrease the average distance of points to their centroids.
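The full reassignment pass and the rule for choosing k can be sketched as follows. The helpers (`reassign`, `kmeans`, `average_distance`, `choose_k`) are hypothetical, seeding with the first k points is an assumption, and the 50% threshold standing in for "does not much decrease" is an illustrative choice, not a value from the slides.

```python
import math


def reassign(points, centroids):
    """One full pass: every point goes to its nearest centroid, then centroids are
    recomputed from their members (empty clusters are dropped)."""
    groups = [[] for _ in centroids]
    for p in points:
        j = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
        groups[j].append(p)
    groups = [g for g in groups if g]
    return groups, [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]


def kmeans(points, k, iters=10):
    centroids = [tuple(p) for p in points[:k]]  # seed with the first k points
    groups = []
    for _ in range(iters):
        groups, centroids = reassign(points, centroids)
    return groups, centroids


def average_distance(groups, centroids):
    total = sum(math.dist(p, c) for g, c in zip(groups, centroids) for p in g)
    return total / sum(len(g) for g in groups)


def choose_k(points, k_max, tol=0.5):
    """Smallest k such that moving to k + 1 lowers the average point-to-centroid
    distance by less than the fraction `tol` of its current value."""
    prev = None
    for k in range(1, k_max + 1):
        d = average_distance(*kmeans(points, k))
        if prev is not None and prev - d < tol * prev:
            return k - 1
        prev = d
    return k_max


pts = [(0,), (10,), (1,), (11,), (2,), (12,)]
print(choose_k(pts, 4))  # 2
```

For these six points forming two tight groups, the average distance drops from 5.0 at k = 1 to about 0.67 at k = 2, but only to 0.5 at k = 3, so the rule settles on k = 2.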

ACS Notation and Heuristics

$E = \{O_1, \dots, O_n\}$: the set of n data objects collected.

$O_i = \{v_1, \dots, v_k\}$: each object is a vector of k numerical attributes.

Vector similarity is measured by the Euclidean distance (other measures can be used: Minkowski, Hamming, or Mahalanobis).

$D_{\max} = \max_{O_i, O_j \in E} D(O_i, O_j)$
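The distance D and the global maximum D_max can be sketched in Python. The Minkowski distance of order p is shown because it covers the Euclidean case (p = 2) as well as the Manhattan case (p = 1); the function names are hypothetical, and the brute-force pairwise scan is O(n²), which is practical only for data sets of the size used here.

```python
import math
from itertools import combinations


def minkowski(a, b, p=2):
    """Minkowski distance of order p; p = 2 is the Euclidean distance used by ACS."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)


def max_distance(objects, p=2):
    """D_max: the largest pairwise distance over the object set E."""
    return max(minkowski(a, b, p) for a, b in combinations(objects, 2))


E = [(0, 0), (3, 4), (1, 1)]
print(max_distance(E))                 # 5.0
print(minkowski((0, 0), (3, 4), p=1))  # 7.0
```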

ACS Notation and Heuristics

• The 2-D search area is an m × m grid. In general it must satisfy $m^2 \geq n$, but experiments have shown that $m^2 \approx 4n$ provides good results.

• A heap (pile) H is a collection of two or more objects located on a single cell, rather than a group of objects that are merely spatially connected. This restriction prevents overlaps.

[Figure: objects O1 to O5 arranged as a spatial pattern cluster versus a single-cell ranked cluster.]
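A sketch of the grid-sizing rule and of a single-cell heap representation follows. Rounding m² ≈ 4n up to the next perfect square, storing each cell as a list, and scattering objects on distinct cells at start-up are assumptions made for illustration; `grid_side` and `scatter` are hypothetical names.

```python
import math
import random


def grid_side(n, factor=4):
    """Smallest m with m * m >= factor * n (factor = 4 follows the m^2 ≈ 4n guideline)."""
    m = math.isqrt(factor * n)
    return m if m * m >= factor * n else m + 1


def scatter(n, m, seed=0):
    """Place objects 0..n-1 on distinct random cells of an m x m grid.
    Each cell maps to a list; a heap is simply a list with two or more entries."""
    rng = random.Random(seed)
    cells = rng.sample(range(m * m), n)
    return {divmod(c, m): [i] for i, c in enumerate(cells)}


m = grid_side(150)   # Iris: 150 objects
grid = scatter(150, m)
print(m, len(grid))  # 25 150
```

For Iris (n = 150) this gives a 25 × 25 grid, about 4.2 cells per object. Because a heap lives on one cell, stacking objects never requires checking spatial connectivity.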

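ACS Distance Measures

• $D_{\max}(H)$ is the maximum distance between two objects of H:
  $D_{\max}(H) = \max_{O_i, O_j \in H} D(O_i, O_j)$

• $O_{\text{center}}(H)$ is the center of mass of all objects in H (not necessarily a real object), where $n_H$ is the number of objects in H:
  $O_{\text{center}}(H) = \frac{1}{n_H} \sum_{O_i \in H} O_i$

• $O_{\text{dissim}}(H)$ is the most dissimilar object in H, i.e., the one that maximizes $D(O_i, O_{\text{center}}(H))$.

• $D_{\text{mean}}(H)$ is the mean distance between the objects of H and the center of mass $O_{\text{center}}(H)$:
  $D_{\text{mean}}(H) = \frac{1}{n_H} \sum_{O_i \in H} D(O_i, O_{\text{center}}(H))$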
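The four heap measures can be computed directly from their definitions. This is a minimal Python sketch assuming a heap is a list of equal-length numeric tuples and D is Euclidean; `heap_stats` is a hypothetical helper.

```python
import math


def heap_stats(heap):
    """Return (D_max, O_center, O_dissim, D_mean) for a heap given as a list of vectors."""
    n = len(heap)
    center = tuple(sum(c) / n for c in zip(*heap))                 # O_center(H)
    d_max = max((math.dist(a, b) for i, a in enumerate(heap) for b in heap[i + 1:]),
                default=0.0)                                        # D_max(H)
    dissim = max(heap, key=lambda o: math.dist(o, center))          # O_dissim(H)
    d_mean = sum(math.dist(o, center) for o in heap) / n            # D_mean(H)
    return d_max, center, dissim, d_mean


d_max, center, dissim, d_mean = heap_stats([(0, 0), (2, 0), (1, 3)])
print(center, dissim)  # (1.0, 1.0) (1, 3)
```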

ACS Unsupervised Learning and Clustering Algorithm

Initialize the ant positions randomly
Repeat
    For each ant_i Do
        Move ant_i
        If ant_i does not carry any object Then
            look at the 8-cell neighborhood and pick up an object according to the pick-up algorithm
        Else (ant_i is already carrying an object O)
            look at the 8-cell neighborhood and drop O according to the drop-off algorithm
Until stopping criterion
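The control flow of this algorithm can be sketched as a Python skeleton. It is not the authors' implementation: the toroidal (wrap-around) grid, a move of one cell per step (Speed = 1), keeping the current heading with probability P_direction, the fixed step budget used as the stopping criterion, and the `pick_up`/`drop_off` callback signatures are all assumptions, and the names are hypothetical.

```python
import random

# Unit moves for the eight headings, clockwise from north-west, as (row, column)
HEADINGS = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]


def run_colony(m, n_ants, steps, pick_up, drop_off, p_direction=0.8, seed=0):
    """Skeleton of the ACS clustering loop on an m x m toroidal grid.
    pick_up(ant, rng) returns an object or None; drop_off(ant, rng) returns True
    if the carried object was dropped. Both are caller-supplied hooks."""
    rng = random.Random(seed)
    ants = [{"pos": (rng.randrange(m), rng.randrange(m)),
             "heading": rng.randrange(8), "load": None} for _ in range(n_ants)]
    for _ in range(steps):                       # stopping criterion: fixed step budget
        for ant in ants:
            if rng.random() >= p_direction:      # otherwise keep the current heading
                ant["heading"] = rng.randrange(8)
            dr, dc = HEADINGS[ant["heading"]]
            r, c = ant["pos"]
            ant["pos"] = ((r + dr) % m, (c + dc) % m)
            if ant["load"] is None:
                ant["load"] = pick_up(ant, rng)
            elif drop_off(ant, rng):
                ant["load"] = None
    return ants
```

The hooks receive the whole ant record, so a pick-up or drop-off rule can read `ant["pos"]` to locate the 8-cell neighborhood on its own grid structure.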

ACS Data Mining Algorithm: Top Level

1. Load Database
2. Data Compression
3. Object Clustering
4. Clustering of Similar Groups
5. Reevaluate Objects in Groups

ACS Data Mining Algorithm: Top Level

Load Database
Select Compression Method: Wavelets, Principal Component Analysis, or None
Repeat for Max_Iterations1 (Object Clustering)
    Ants redistribute objects
    K-means
Repeat for Max_Iterations2 (Clustering of Similar Groups)
    Ants redistribute piles (clusters) of objects
    K-means
Repeat for Max_Iterations3 (Reevaluate Objects in Groups)
    Ants redistribute objects in clusters with a probability based on the least similar objects’ distance from the mean of the cluster
    K-means

ACS Object Pick-up Algorithm

1. Label the 8-cell neighborhood as “unexplored”.
2. Repeat
   1. Consider the next unexplored cell c around ant_i in the following order: cell 1 is NW, cell 2 is N, cell 3 is NE, …, where N is the direction the ant is facing.
   2. If c is not empty, then do one of the following:
      1. If c contains a single object O, then load O with probability Pload; else
      2. If c contains a heap of two objects, then remove one of the two with probability Pdestroy; else
      3. If c contains a heap H of more than 2 objects, then remove the most dissimilar object $O_{\text{dissim}}(H)$ from H provided that $\frac{D(O_{\text{dissim}}(H), O_{\text{center}}(H))}{D_{\text{mean}}(H)} > T_{\text{remove}}$.
   3. Label c as “explored”.
3. Until all 8 cells have been explored or one object has been loaded.
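The pick-up rules can be sketched for a single cell and then applied across the neighborhood. Assumptions: a cell is a Python list of vectors, distances are Euclidean, the object removed when destroying a two-object heap is chosen uniformly at random, and the T_remove test is coded exactly as transcribed above (the ratio of O_dissim's distance from the center to D_mean(H)). `try_pick_up` and `pick_up` are hypothetical names.

```python
import math
import random


def center(heap):
    return tuple(sum(c) / len(heap) for c in zip(*heap))


def try_pick_up(cell, rng, p_load, p_destroy, t_remove):
    """Apply the pick-up rules to one cell (a list of vectors, modified in place).
    Returns the object loaded by the ant, or None."""
    if not cell:
        return None
    if len(cell) == 1:                                  # single object: load with Pload
        return cell.pop() if rng.random() < p_load else None
    if len(cell) == 2:                                  # heap of two: destroy with Pdestroy
        return cell.pop(rng.randrange(2)) if rng.random() < p_destroy else None
    c = center(cell)                                    # heap of three or more objects
    dists = [math.dist(o, c) for o in cell]
    d_mean = sum(dists) / len(cell)
    i = max(range(len(cell)), key=dists.__getitem__)    # index of O_dissim(H)
    if d_mean > 0 and dists[i] / d_mean > t_remove:
        return cell.pop(i)
    return None


def pick_up(neighborhood, rng, **params):
    """Scan the cells in order (NW, N, NE, ... relative to the heading) until one load."""
    for cell in neighborhood:
        obj = try_pick_up(cell, rng, **params)
        if obj is not None:
            return obj
    return None
```

For the heap {(0, 0), (0, 0.1), (5, 5)}, the outlier (5, 5) has a ratio of about 1.5, so it is removed when T_remove = 1.0 but kept when T_remove = 2.0.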

ACS Object Drop-off Algorithm

1. Label the 8-cell neighborhood as “unexplored”.
2. Repeat
   1. Consider the next unexplored cell c around ant_i in the following order: cell 1 is NW, cell 2 is N, cell 3 is NE, …, where N is the direction the ant is facing.
      1. If c is empty, then drop O in c with probability Pdrop; else
      2. If c contains a single object O’, then drop O to create a heap H provided that $\frac{D(O, O')}{D_{\max}} < T_{\text{create}}$; else
      3. If c contains a heap H, then drop O on H provided that $D(O, O_{\text{center}}(H)) < D(O_{\text{dissim}}(H), O_{\text{center}}(H))$.
   2. Label c as “explored”.
3. Until all 8 cells have been explored or the carried object has been dropped.
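The drop-off rules admit the same per-cell sketch. Assumptions: cells are lists of vectors, D is Euclidean, D_max is the global maximum distance over E computed beforehand, and the neighborhood is passed in the NW, N, NE, … scan order; `try_drop` and `drop_off` are hypothetical names.

```python
import math
import random


def try_drop(obj, cell, d_max, rng, p_drop, t_create):
    """Apply the drop-off rules to one cell (a list of vectors, modified in place).
    Returns True if the carried object was dropped there."""
    if not cell:                                     # empty cell: drop with Pdrop
        ok = rng.random() < p_drop
    elif len(cell) == 1:                             # single object O': create heap if similar
        ok = math.dist(obj, cell[0]) / d_max < t_create
    else:                                            # heap H: closer to center than O_dissim(H)
        n = len(cell)
        center = tuple(sum(c) / n for c in zip(*cell))
        worst = max(math.dist(o, center) for o in cell)
        ok = math.dist(obj, center) < worst
    if ok:
        cell.append(obj)
    return ok


def drop_off(obj, neighborhood, d_max, rng, **params):
    """Scan the cells in order; any() stops at the first successful drop."""
    return any(try_drop(obj, cell, d_max, rng, **params) for cell in neighborhood)
```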

Parameter Table

Parameter   Role                                                                Value (or Range)
Speed       Distance an ant can travel in one time step                         [1, 10]
Pdirection  Probability of moving in the same direction                         [0.5, 1]
Maxcarry    Maximum object carry time                                           [20, 200]
Pload       Probability of picking up an object                                 [0.4, 0.8]
Pdestroy    Probability of destroying a heap of two objects                     [0, 0.6]
Pdrop       Probability of dropping an object                                   [0.4, 0.8]
Tremove     Minimum dissimilarity for removing an object from a heap            [0.1, 0.2]
Tcreate     Maximum dissimilarity permitted for creating a heap of two objects  [0.05, 0.2]
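These ranges can be captured as a configuration record with a range check. The default values below are illustrative picks inside each range, not settings reported by the authors, and the field names are hypothetical.

```python
from dataclasses import dataclass, fields

# Recommended ranges from the parameter table: name -> (low, high)
RANGES = {
    "speed": (1, 10), "p_direction": (0.5, 1.0), "max_carry": (20, 200),
    "p_load": (0.4, 0.8), "p_destroy": (0.0, 0.6), "p_drop": (0.4, 0.8),
    "t_remove": (0.1, 0.2), "t_create": (0.05, 0.2),
}


@dataclass
class ACSParams:
    speed: int = 1
    p_direction: float = 0.8
    max_carry: int = 100
    p_load: float = 0.6
    p_destroy: float = 0.3
    p_drop: float = 0.6
    t_remove: float = 0.15
    t_create: float = 0.1


def out_of_range(params):
    """Names of parameters that fall outside the table's recommended ranges."""
    return [f.name for f in fields(params)
            if not RANGES[f.name][0] <= getattr(params, f.name) <= RANGES[f.name][1]]


print(out_of_range(ACSParams()))            # []
print(out_of_range(ACSParams(p_load=0.9)))  # ['p_load']
```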

K-means Algorithm

1. Take as input the partition P of the data set found by the ants, in the form of k heaps $H_1, \dots, H_k$.
2. Repeat
   1. Compute $O_{\text{center}}(H_1), \dots, O_{\text{center}}(H_k)$.
   2. Remove all objects from the heaps.
   3. For each object $O_i \in E$:
      1. Let $H_j$, $j \in [1, k]$, be the heap whose center is closest to $O_i$.
      2. Assign $O_i$ to $H_j$.
   4. Compute the resulting new partition $P' = \{H_1, \dots, H_{k'}\}$ by removing all empty clusters.
3. Until stopping criterion
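The k-means refinement of the ants' heaps can be sketched in Python. Assumptions: heaps arrive as non-empty lists of vectors, D is Euclidean, and the stopping criterion is "partition unchanged, or max_iter passes"; the default of 15 passes mirrors the Max_K-Means column of the result tables, on the assumption that this column counts k-means iterations. `refine_heaps` is a hypothetical name.

```python
import math


def refine_heaps(heaps, max_iter=15):
    """k-means refinement of the ants' partition. `heaps` is a list of non-empty
    lists of vectors; returns the refined partition with empty clusters removed."""
    objects = [o for h in heaps for o in h]
    for _ in range(max_iter):
        centers = [tuple(sum(c) / len(h) for c in zip(*h)) for h in heaps]
        new = [[] for _ in centers]
        for o in objects:
            j = min(range(len(centers)), key=lambda i: math.dist(o, centers[i]))
            new[j].append(o)
        new = [h for h in new if h]   # drop empty clusters, so k may shrink to k'
        if new == heaps:              # stopping criterion: partition unchanged
            break
        heaps = new
    return heaps


print(refine_heaps([[(0, 0), (10, 0)], [(1, 0)], [(11, 0)]]))
# [[(0, 0), (1, 0)], [(10, 0), (11, 0)]]
```

In this example the first heap's center falls between the two groups, so every object moves to a closer heap, the first heap empties, and the partition shrinks from k = 3 to k' = 2.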

Benchmark Databases

The following public-domain data sets were obtained from the UCI (University of California, Irvine) Machine Learning Repository. They have been used extensively for classification tasks with different paradigms. The main characteristics of each domain are described in the following slides.

Tested Databases
• Golf: very simple database, 4 attributes, 2 classes
• Balloons: the influence of prior knowledge on concept acquisition; 4 data sets, 4 attributes, 2 classes
• Wine: well-behaved class structure, 178 instances, 13 attributes, 3 classes
• Hepatitis: poorly distributed database, 155 instances, 19 attributes, 2 classes
• Iris (plant): very popular database, 150 instances, 4 attributes, 3 classes
• Wisconsin Breast Cancer: high-dimensional database, 198 instances, 32 attributes, 2 classes

Golf Data Results

Attributes: Outlook (sunny, overcast, rain); Temperature (continuous); Humidity (continuous); Windy (true/false); Decision (play/don't play)

Each row lists the object number, the given data, its numerical equivalent, and the normalized values, in the column order Outlook, Temperature, Humidity, Windy, Decision.

#  Given Data                          | Numerical Equivalent | Normalized
1  sunny    85  85  FALSE  Don't Play | 1  85  85  0  0 | 0.3333  1       0.8854  0  0
2  sunny    80  90  TRUE   Don't Play | 1  80  90  1  0 | 0.3333  0.9412  0.9375  1  0
3  overcast 83  78  FALSE  Play       | 2  83  78  0  1 | 0.6667  0.9765  0.8125  0  1
4  rain     70  96  FALSE  Play       | 3  70  96  0  1 | 1       0.8235  1       0  1
5  rain     68  80  FALSE  Play       | 3  68  80  0  1 | 1       0.8     0.8333  0  1
6  rain     65  70  TRUE   Don't Play | 3  65  70  1  0 | 1       0.7647  0.7292  1  0
7  overcast 64  65  TRUE   Play       | 2  64  65  1  1 | 0.6667  0.7529  0.6771  1  1
8  sunny    72  95  FALSE  Don't Play | 1  72  95  0  0 | 0.3333  0.8471  0.9896  0  0
9  sunny    69  70  FALSE  Play       | 1  69  70  0  1 | 0.3333  0.8118  0.7292  0  1
10 rain     75  80  FALSE  Play       | 3  75  80  0  1 | 1       0.8824  0.8333  0  1
11 sunny    75  70  TRUE   Play       | 1  75  70  1  1 | 0.3333  0.8824  0.7292  1  1
12 overcast 72  90  TRUE   Play       | 2  72  90  1  1 | 0.6667  0.8471  0.9375  1  1
13 overcast 81  75  FALSE  Play       | 2  81  75  0  1 | 0.6667  0.9529  0.7813  0  1
14 rain     71  80  TRUE   Don't Play | 3  71  80  1  0 | 1       0.8353  0.8333  1  0
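The normalized columns are consistent with dividing each numerical column by its maximum (temperature by 85, humidity by 96, the outlook code by 3). A minimal sketch of that encoding follows; the outlook codes (sunny = 1, overcast = 2, rain = 3) are read off the table, while the function names and the four-digit rounding are assumptions.

```python
OUTLOOK = {"sunny": 1, "overcast": 2, "rain": 3}


def encode(rows):
    """(outlook, temperature, humidity, windy, decision) -> numerical equivalent."""
    return [(OUTLOOK[o], t, h, int(w), int(d == "Play")) for o, t, h, w, d in rows]


def normalize(vectors, digits=4):
    """Divide each column by its maximum (columns with a zero maximum map to 0.0)."""
    maxima = [max(col) for col in zip(*vectors)]
    return [tuple(round(v / m, digits) if m else 0.0 for v, m in zip(vec, maxima))
            for vec in vectors]


rows = [("sunny", 85, 85, False, "Don't Play"),
        ("rain", 70, 96, False, "Play"),
        ("sunny", 80, 90, True, "Don't Play")]
print(normalize(encode(rows))[0])  # (0.3333, 1.0, 0.8854, 0.0, 0.0)
```

The three sample rows include the column maxima of the full table, so their normalized values match the corresponding table rows.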

Golf Data Results

Number in Cluster | Position of Cluster | Cluster Objects 1-14 (1 = member) | Label
4                 | 3 1                 | 0 1 0 0 0 1 0 0 0 0 1 0 0 1       | Don't Play (ERROR: object 11 is Play)
8                 | 4 3                 | 0 0 1 1 1 0 1 0 1 1 0 1 1 0       | Play
2                 | 2 4                 | 1 0 0 0 0 0 0 1 0 0 0 0 0 0       | Don't Play

Golf Data Results

Number in Cluster | Position of Cluster | Cluster Objects 1-14 (1 = member) | Label
9                 | 3 3                 | 0 0 1 1 1 0 1 0 1 1 1 1 1 0       | Play
3                 | 4 3                 | 0 1 0 0 0 1 0 0 0 0 0 0 0 1       | Don't Play
2                 | 5 5                 | 1 0 0 0 0 0 0 1 0 0 0 0 0 0       | Don't Play

No errors.
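The error count in these golf tables can be reproduced by labeling each cluster with the majority decision of its members. This sketch assumes that majority-vote labeling, which the slides do not state; `cluster_errors`, `LABELS`, `FIRST_RUN` and `SECOND_RUN` are hypothetical names, with the labels taken from the golf data table and the membership rows from the two results tables.

```python
from collections import Counter

# Decision for golf objects 1-14 (P = Play, D = Don't Play), from the data table
LABELS = list("DDPPPDPDPPPPPD")

FIRST_RUN = [
    [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1],
    [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 0, 1, 1, 0],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
]
SECOND_RUN = [
    [0, 0, 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0],
]


def cluster_errors(membership, labels):
    """membership: one 0/1 row per cluster over the objects. Each cluster takes its
    majority label; returns the 1-based numbers of objects that disagree with it."""
    errors = []
    for row in membership:
        members = [i for i, bit in enumerate(row) if bit]
        majority, _ = Counter(labels[i] for i in members).most_common(1)[0]
        errors += [i + 1 for i in members if labels[i] != majority]
    return sorted(errors)


print(cluster_errors(FIRST_RUN, LABELS))   # [11]
print(cluster_errors(SECOND_RUN, LABELS))  # []
```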

Wine Database

Error: 0.050562 (9 of 178 instances)
• 5 class 1 mislabeled as class 2
• 3 class 2 mislabeled as class 3
• 1 class 3 mislabeled as class 2

The attributes are: 1) Alcohol, 2) Malic acid, 3) Ash, 4) Alcalinity of ash, 5) Magnesium, 6) Total phenols, 7) Flavanoids, 8) Nonflavanoid phenols, 9) Proanthocyanins, 10) Color intensity, 11) Hue, 12) OD280/OD315 of diluted wines, 13) Proline.

The data are the results of a chemical analysis of wines grown in the same region of Italy but derived from three different cultivars.

Number of instances: class 1, 59; class 2, 71; class 3, 48
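The reported error rate is the number of misclassified instances divided by the total: (5 + 3 + 1) / 178 ≈ 0.050562. A small helper that produces both the rate and the per-class lines is sketched below; `summarize` and its dictionary input format are hypothetical.

```python
def summarize(mislabeled, total):
    """mislabeled: {(true_class, assigned_class): count} for misclassified objects only.
    Returns the error rate and one description line per confusion."""
    lines = [f"{n} class {t} mislabeled as class {p}"
             for (t, p), n in sorted(mislabeled.items())]
    return sum(mislabeled.values()) / total, lines


rate, lines = summarize({(1, 2): 5, (2, 3): 3, (3, 2): 1}, 178)
print(round(rate, 6))  # 0.050562
```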

Iris (Plant) Database

Attribute information:
1. sepal length in cm
2. sepal width in cm
3. petal length in cm
4. petal width in cm

This is perhaps the best-known database to be found in the pattern recognition literature.

Number of instances: 150 (50 in each of three classes: Iris Setosa, Iris Versicolour, Iris Virginica)

Results:
• Error 0.046667 (7 of 150): 4 mislabeled as type 2; 3 mislabeled as type 3
• Error 0.04 (6 of 150): 2 mislabeled as type 3; 4 mislabeled as type 2

ACS DM: Optimization of Parameters
• Number of total iterations
• Compression method (PCA, wavelet, none)
• Cluster method:
  - Objects only
  - Objects and groups of objects
  - Objects, groups, then objects again
• Number of ants
• K-means iterations
• Distance measure (Euclidean, Minkowski, Hamming, or Mahalanobis)
• Others (RNG, ant movement distance, ant carrying capacity)

ACS DM: Object Grouping Only (Num ADM = 1)

All runs: Ant Count = 80, Max_ACS = 30, Max_K-Means = 15. No error value is reported for WPBC.

Database   Compression  Vector Length  Hybrid Iterations  Data Points  Resultant Groups  Error (%)
WPBC       None         33             50                 198          34                n/a
WPBC       Wavelet DB3  12             50                 198          29                n/a
WPBC       PCA          9              50                 198          47                n/a
WPBC       None         33             100                198          32                n/a
WPBC       Wavelet DB3  12             100                198          18                n/a
WPBC       PCA          9              100                198          34                n/a
Wine       None         13             50                 178          36                1.6854
Wine       Wavelet DB3  7              50                 178          25                7.3034
Wine       PCA          7              50                 178          33                1.6854
Wine       None         13             100                178          20                2.809
Wine       Wavelet DB3  7              100                178          22                7.3034
Wine       PCA          7              100                178          23                1.6854
Hepatitis  None         19             50                 80           22                8.75
Hepatitis  Wavelet DB3  8              50                 80           18                13.75
Hepatitis  PCA          13             50                 80           24                11.25
Hepatitis  None         19             100                80           18                8.75
Hepatitis  Wavelet DB3  8              100                80           16                16.25
Hepatitis  PCA          13             100                80           14                13.75
Iris       None         4              50                 150          5                 4
Iris       Wavelet DB3  4              50                 150          14                3.3333
Iris       PCA          1              50                 150          6                 6
Iris       None         4              100                150          5                 4
Iris       Wavelet DB3  4              100                150          14                3.3333
Iris       PCA          1              100                150          6                 6

ACS DM: Object and Cluster Grouping Only (Num ADM = 2)

All runs: Ant Count = 80, Max_ACS = 30, Max_K-Means = 15. No error value is reported for WPBC.

Database   Compression  Vector Length  Hybrid Iterations  Data Points  Resultant Groups  Error (%)
WPBC       None         33             50                 198          21                n/a
WPBC       Wavelet DB3  12             50                 198          18                n/a
WPBC       PCA          9              50                 198          43                n/a
WPBC       None         33             100                198          13                n/a
WPBC       Wavelet DB3  12             100                198          16                n/a
WPBC       PCA          9              100                198          29                n/a
Wine       None         13             50                 178          33                1.6854
Wine       Wavelet DB3  7              50                 178          21                6.7416
Wine       PCA          7              50                 178          30                1.6854
Wine       None         13             100                178          19                2.809
Wine       Wavelet DB3  7              100                178          14                8.9888
Wine       PCA          7              100                178          21                1.6854
Hepatitis  None         19             50                 80           22                8.75
Hepatitis  Wavelet DB3  8              50                 80           17                13.75
Hepatitis  PCA          13             50                 80           24                11.25
Hepatitis  None         19             100                80           18                8.75
Hepatitis  Wavelet DB3  8              100                80           11                16.25
Hepatitis  PCA          13             100                80           14                13.75
Iris       None         4              50                 150          4                 13.333
Iris       Wavelet DB3  4              50                 150          4                 13.333
Iris       PCA          1              50                 150          3                 3.333
Iris       None         4              100                150          3                 4
Iris       Wavelet DB3  4              100                150          6                 8
Iris       PCA          1              100                150          5                 6

ACS DM: Object, Cluster, and Object (Num ADM = 3)

All runs: Ant Count = 80, Max_ACS = 30, Max_K-Means = 15. No error value is reported for WPBC.

Database   Compression  Vector Length  Hybrid Iterations  Data Points  Resultant Groups  Error (%)
WPBC       None         33             50                 198          19                n/a
WPBC       Wavelet DB3  12             50                 198          18                n/a
WPBC       PCA          9              50                 198          38                n/a
WPBC       None         33             100                198          12                n/a
WPBC       Wavelet DB3  12             100                198          14                n/a
WPBC       PCA          9              100                198          27                n/a
Wine       None         13             50                 178          31                1.6854
Wine       Wavelet DB3  7              50                 178          21                6.7416
Wine       PCA          7              50                 178          29                1.6854
Wine       None         13             100                178          7                 2.2472
Wine       Wavelet DB3  7              100                178          11                6.7416
Wine       PCA          7              100                178          21                2.2472
Hepatitis  None         19             50                 80           22                8.75
Hepatitis  Wavelet DB3  8              50                 80           16                12.5
Hepatitis  PCA          13             50                 80           22                11.25
Hepatitis  None         19             100                80           16                10
Hepatitis  Wavelet DB3  8              100                80           8                 16.25
Hepatitis  PCA          13             100                80           11                16.25
Iris       None         4              50                 150          3                 4
Iris       Wavelet DB3  4              50                 150          4                 13.333
Iris       PCA          1              50                 150          3                 3.3333
Iris       None         4              100                150          3                 4
Iris       Wavelet DB3  4              100                150          6                 8
Iris       PCA          1              100                150          5                 6

Why Move to Hardware?

For such large datasets the ACS classifier performs remarkably well. However, classification speed in software is very limited. The computational bottleneck lies in the number of multiplies and adds that must be performed for each object. In addition, the square root required for each distance measurement adds complexity.
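The size of this bottleneck can be estimated by counting operations. The sketch below counts, for one pass that compares every object with every cluster center, k multiplies, 2k - 1 additions/subtractions, and one square root per k-attribute distance. It also illustrates a general observation that the slides do not make: because the square root is monotonic, nearest-center comparisons give the same answer on squared distances. The helper names are hypothetical.

```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance: k subtractions, k multiplies, k - 1 additions, no root."""
    return sum((x - y) * (x - y) for x, y in zip(a, b))


def nearest(obj, centers, metric):
    return min(range(len(centers)), key=lambda i: metric(obj, centers[i]))


def op_count(k, n_objects, n_centers):
    """Arithmetic for one pass assigning every object to its nearest of n_centers."""
    pairs = n_objects * n_centers
    return {"mul": k * pairs, "add_sub": (2 * k - 1) * pairs, "sqrt": pairs}


print(op_count(13, 178, 3))  # {'mul': 6942, 'add_sub': 13350, 'sqrt': 534}
```

The Wine-sized example (13 attributes, 178 objects, 3 centers) needs 534 square roots per pass; only threshold tests against an absolute distance, such as D(O, O')/D_max, would still need the root or a squared threshold.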

Target Hardware:

Avnet’s Virtex-II Pro Board

Uses a Virtex-II Pro XC2VP20, with many options for I/O. The 32-bit PCI bus has a data throughput of over 100 MB per second.

ACS-DM System Top-Level HW

[Figure: top-level hardware block diagram. Blocks: K-Means Data Comparison Module; Data Actions and Information; Ant Colony Actions and Data Module; Database Actions and Information Module; ACS; Cellular Automata Random Number Generator.]

Device Utilization Summary

Selected device: 2vp20ff896-6

Number of Slices:            6600 out of  9280   71%
Number of Slice Flip Flops:  8312 out of 18560   44%
Number of 4 input LUTs:      7661 out of 18560   41%
Number of bonded IOBs:        266 out of   556   48%
Number of BRAMs:                3 out of    88    3%
Number of MULT18X18s:           8 out of    88    9%
Number of GCLKs:                1 out of    16    6%

Timing report, clock information:
Clock signal: clk | Clock buffer (FF name): BUFGP | Load: 1419

Timing summary:
Minimum period: 16.499 ns (maximum frequency: 60.611 MHz)
Minimum input arrival time before clock: 4.491 ns
Maximum output required time after clock: 6.087 ns
Maximum combinational path delay: 5.102 ns

The CORDIC square-root data path is the greatest bottleneck and causes the long clock period.

Hardware Euclidean Distance Result

Result from Matlab = 1.5058; result from hardware = 1.5172.

Data path word formats (Fix W_F denotes a W-bit fixed-point word with F fractional bits):
• Input vectors: Fix 8_7
• After add: Fix 9_7
• After multiply: Fix 18_14
• After accumulate: Fix 20_14
• After CORDIC square root: Fix 42_36

Error is introduced by round-off and by the CORDIC square root.

$\mathrm{Dist} = \sqrt{\sum_{n=1}^{N} \left(V1_n - V2_n\right)^2}$

Test vectors (N = 20):
V1: 0.83812 0.01964 0.68128 0.37948 0.8318 0.50281 0.70947 0.42889 0.30462 0.18965 0.19343 0.68222 0.30276 0.54167 0.15087 0.6979 0.37837 0.86001 0.85366 0.59356
V2: 0.49655 0.89977 0.82163 0.64491 0.81797 0.66023 0.34197 0.28973 0.34119 0.53408 0.72711 0.30929 0.8385 0.56807 0.37041 0.70274 0.54657 0.44488 0.69457 0.62131
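The reference result and the effect of input quantization can be checked in software. This sketch assumes Fix 8_7 inputs are formed by rounding to the nearest multiple of 2^-7 without saturation, and it uses an ordinary floating-point square root, so it models the input round-off only and not the CORDIC error; the function names are hypothetical.

```python
import math

V1 = [0.83812, 0.01964, 0.68128, 0.37948, 0.8318, 0.50281, 0.70947, 0.42889, 0.30462, 0.18965,
      0.19343, 0.68222, 0.30276, 0.54167, 0.15087, 0.6979, 0.37837, 0.86001, 0.85366, 0.59356]
V2 = [0.49655, 0.89977, 0.82163, 0.64491, 0.81797, 0.66023, 0.34197, 0.28973, 0.34119, 0.53408,
      0.72711, 0.30929, 0.8385, 0.56807, 0.37041, 0.70274, 0.54657, 0.44488, 0.69457, 0.62131]


def distance(a, b):
    return math.sqrt(sum((x - y) * (x - y) for x, y in zip(a, b)))


def quantize(x, frac_bits=7):
    """Round to the nearest multiple of 2**-frac_bits, like a Fix 8_7 input (no saturation)."""
    scale = 1 << frac_bits
    return round(x * scale) / scale


def fixed_point_distance(a, b, frac_bits=7):
    # After input rounding, the differences, products and running sum are exact in
    # double precision, mirroring the widened Fix 9_7 / 18_14 / 20_14 stages; the
    # final root is an ordinary math.sqrt, not a CORDIC.
    qa = [quantize(x, frac_bits) for x in a]
    qb = [quantize(x, frac_bits) for x in b]
    return distance(qa, qb)


print(round(distance(V1, V2), 4))  # 1.5058
print(fixed_point_distance(V1, V2))
```

The floating-point result reproduces the Matlab value; comparing it with `fixed_point_distance(V1, V2)` isolates how much of the deviation comes from input rounding alone.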

Ant Colony Actions: Movement

[Figure: movement data path. Blocks: Ant Colony Data; RNG Ant(1), RNG Ant(2), …, RNG Ant(N); Ant Move-Direction Filter; Ant Change Location; Current Location Data; New Location Data. Per-ant fields: Current Location, Last Location, Have Data Status.]

The CARNG (cellular automata random number generator) is a simple 32-bit rule 30 automaton that is user-initialized for reproducibility.
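Rule 30 updates each cell from its left neighbor, itself and its right neighbor as new = left XOR (center OR right). A minimal 32-bit sketch follows, assuming a ring (wrap-around) boundary and that the whole 32-bit state is used as the random word at each step; the slides do not say which bits the hardware taps, and the function names are hypothetical.

```python
MASK = 0xFFFFFFFF  # 32 cells, one per bit


def rule30_step(state):
    """One synchronous rule 30 update on a 32-cell ring:
    new cell = left XOR (center OR right), where bit i+1 is the left neighbor of bit i."""
    left = ((state >> 1) | (state << 31)) & MASK    # brings bit i+1 to position i
    right = ((state << 1) | (state >> 31)) & MASK   # brings bit i-1 to position i
    return left ^ (state | right)


def carng(seed, count):
    """Return `count` successive 32-bit states; the same seed reproduces the same stream."""
    state = seed & MASK
    out = []
    for _ in range(count):
        state = rule30_step(state)
        out.append(state)
    return out
```

Seeding with a single set bit reproduces the familiar rule 30 triangle (111, then 11001), and the same seed always yields the same stream, which is what makes runs reproducible.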

Ant Colony Actions: Object Load/Drop

[Figure: object load/drop data path. Blocks: Ant Colony Data; Ant Change Have Data Status (Current Have Data Status in, New Have Data Status out); Object Information; decision "Were probabilities and thresholds met?" driving Enable Drop/Load (Y/N). Fields shown: Current Location, Last Location, Have Data Status, Carried Status.]

ACS DM Hardware: Storage Requirements
• Preprocessed data (number of objects × vector length, 8- to 32-bit fixed-point): object vectors, object locations, object status
• Parameter values (16 × 32-bit fixed-point): probabilities, thresholds, limits
• Max distance (1 × 32-bit fixed-point)
• Groups (number of objects × number of groups at 1 bit, and 3 × number of groups at 8 bits): members; means (object vector length × 32-bit fixed-point); locations
• Ant locations and have-object status (number of ants × 8-bit, plus 1-bit status)
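A rough budget for these items can be computed from the data set dimensions. Every width not stated on the slide is an assumption here: 16-bit object vectors, one 32-bit mean vector per group, three 8-bit location fields per group, and 8 + 1 bits per ant; object locations and status are omitted because their widths are not given. The 18-Kbit block RAM size is the Virtex-II Pro figure, and `storage_bits` is a hypothetical helper.

```python
import math

BRAM_BITS = 18 * 1024  # one Virtex-II Pro block RAM: 18 Kbit including parity


def storage_bits(n_objects, vec_len, n_groups, n_ants, data_bits=16):
    """Rough on-chip storage estimate following the slide's list (widths are assumptions)."""
    return {
        "object_vectors": n_objects * vec_len * data_bits,  # 8- to 32-bit fixed point
        "parameters": 16 * 32,
        "max_distance": 32,
        "group_members": n_objects * n_groups,              # 1 bit per (object, group)
        "group_locations": 3 * n_groups * 8,
        "group_means": n_groups * vec_len * 32,             # assumes one mean per group
        "ants": n_ants * (8 + 1),                           # location byte + have-object bit
    }


est = storage_bits(n_objects=150, vec_len=4, n_groups=3, n_ants=80)  # Iris-sized run
total = sum(est.values())
print(total, math.ceil(total / BRAM_BITS))  # 11770 1
```

Under these assumptions an Iris-sized run needs about 11,770 bits, which fits in a single block RAM; the object vectors dominate and grow with the number of objects times the vector length.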

Block Diagram

• The Virtex-II Pro is the focal point.
• A Spartan acts as the bridge to PCI.
• On-board memory: 32 MB SDRAM, 2 MB SRAM, 16 MB flash, 128 MB DDR SDRAM, 64 MB Compact Flash
• Ethernet, RS232, 4 AvBus connectors, 2 PMC connectors

Conclusions/Future Work

• Continue to design the ACS Data Mining System:
  - Implement an improved memory manager.
  - Correct the errors associated with round-off and the CORDIC square root.
  - Implement the group clustering algorithm.
• Optimize the PC/FPGA interfacing to create our own low-cost integrated system. Our current problems reside in the PCI interface design shipped with the Avnet development board. We are working hard to resolve this issue, but in the end we may have to consider another board. This issue is also discussed in presentation P248.
• Improve the speed: 60 MHz is too slow. Optimize the data throughput and the computational efficiency of the distance-metric algorithm, e.g., by using a multi-stage pipeline or more look-up tables.

The ultimate goal is to demonstrate that ACS algorithms can perform as well as other well-known techniques while achieving computational speed-up by using FPGAs as co-processors.