
Applying a Genetic Algorithm to Reconfigurable Hardware – a Case Study


Wells 1 E169/MAPLD 2004

Applying a Genetic Algorithm to Reconfigurable Hardware – a Case Study

B. Earl Wells*, Clint Patrick, Luis Trevino, John Weir and Jim Steincamp

NASA Marshall Space Flight Center, Huntsville, Alabama

*University of Alabama in Huntsville, Huntsville, Alabama

Wells 2 E169/MAPLD 2004

Project Motivation & Objectives

• To evaluate the technology of reconfigurable computing: determine its level of maturity and suitability for use in future NASA applications

• To implement a nontrivial test-bed application on a Star Bridge Hypercomputer Model 36

• Chosen application: a simple genetic algorithm

Wells 3 E169/MAPLD 2004

Targeted Hardware Platform

• Star Bridge HC-36 Hypercomputer System

• Employs Xilinx Virtex-II 6000 series FPGAs

Wells 4 E169/MAPLD 2004

Development Environment

• VIVA™ graphical user interface

• Structural design philosophy with behavioral attributes:
– Polymorphism
– Object overloading
– Recursion
– Data-flow, data-driven synchronization between objects (Go, Done, Busy, Wait protocol)

• Large library of high-end objects

• Environment falls somewhere between hardware description languages and schematic capture packages

Wells 5 E169/MAPLD 2004

Polymorphism, Overloading, Recursion, and Synchronization

Example: Object to Determine the Number of 1's in a Binary Number

[Figures: Terminal Case; Recursive Case]
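The VIVA object diagrams for this example are not reproduced in this transcript. As a rough software analogue only, the sketch below counts 1s by recursive halving, assuming the terminal case is a single bit and the recursive case splits the word into two halves whose counts are added. The function names `count_ones` and `to_bits` are illustrative and do not come from VIVA.

```python
def count_ones(bits):
    """Count the 1s in a list of bits by recursive halving."""
    if len(bits) == 0:
        return 0
    if len(bits) == 1:
        # Terminal case: a single bit is its own count.
        return bits[0]
    # Recursive case: split the word, count each half, add the results.
    mid = len(bits) // 2
    return count_ones(bits[:mid]) + count_ones(bits[mid:])


def to_bits(value, width):
    """Little-endian list of the low `width` bits of `value`."""
    return [(value >> i) & 1 for i in range(width)]


print(count_ones(to_bits(0b10110010, 8)))  # prints 4
```

In hardware the two halves would be evaluated concurrently, so the recursion unrolls into an adder tree rather than a sequence of calls.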

Wells 6 E169/MAPLD 2004

Genetic Algorithms

• Biologically Inspired Search Techniques

• Employ Selection, Replication (crossover), Mutation, and Replacement

• Iterative method, very time intensive

• Regularly Structured

• Large Amounts of Concurrency Present that can be Exploited

Wells 7 E169/MAPLD 2004

Genetic Algorithm Implementation

Top Level View

Run Time Environment

Wells 8 E169/MAPLD 2004

GA Characteristics

• 2-Way Tournament Selection

• No Elitism

• Single-Point Crossover with bit-wise mutation

• Weight-Encoded Chromosome (weights translated into a rank ordering of cities)

• Adjustable Parameters
– Population size: 2 to 512 (powers of 2)
– Number of generations
– Probability of mutation
– Probability of crossover
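A minimal software sketch of 2-way tournament selection without elitism, under stated assumptions: lower cost wins (as for the TSP), the two contestants are drawn with replacement, and the placeholder cost used here is not the TSP objective. The names `tournament_select` and `select_parents` are illustrative.

```python
import random


def tournament_select(population, costs, rng):
    """2-way tournament: draw two individuals at random, keep the cheaper one."""
    a = rng.randrange(len(population))
    b = rng.randrange(len(population))
    return population[a] if costs[a] <= costs[b] else population[b]


def select_parents(population, costs, rng):
    """Build a full-size mating pool. With no elitism, nothing from the old
    population is guaranteed to survive into the next generation."""
    return [tournament_select(population, costs, rng) for _ in population]


rng = random.Random(0)
# Population size 8 (a power of 2), 8 weights per chromosome.
population = [[rng.randrange(256) for _ in range(8)] for _ in range(8)]
costs = [sum(ch) for ch in population]  # placeholder cost, not the TSP objective
pool = select_parents(population, costs, rng)
print(len(pool))  # prints 8
```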

Wells 9 E169/MAPLD 2004

Block Diagram Level View of Genetic Algorithm Implementation

Wells 10 E169/MAPLD 2004

Replacement & Chromosome Storage

Wells 11 E169/MAPLD 2004

Selection

Wells 12 E169/MAPLD 2004

Standard Single Point Crossover Operation (Weighted Chromosomes)

Chromosome 1: {25,17,10,20,33,14,7,29}

Chromosome 2: {44,12,17,38,20,5,70,13}

Crossover Point = 4

Offspring Chromosome: {25,17,10,20,20,5,70,13}
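The slide's example can be reproduced with a short sketch. The crossover point is taken to mean the number of leading elements copied from the first parent, which matches the values shown. The function name is illustrative.

```python
def single_point_crossover(parent1, parent2, point):
    """Copy the first `point` weights from parent1 and the rest from parent2."""
    return parent1[:point] + parent2[point:]


chromosome1 = [25, 17, 10, 20, 33, 14, 7, 29]
chromosome2 = [44, 12, 17, 38, 20, 5, 70, 13]
offspring = single_point_crossover(chromosome1, chromosome2, 4)
print(offspring)  # prints [25, 17, 10, 20, 20, 5, 70, 13]
```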

Wells 13 E169/MAPLD 2004

Standard Single Point Crossover Operation (Weighted Chromosomes)

Wells 14 E169/MAPLD 2004

Single Point Mutation (Weighted Chromosomes)

Original Chromosome: {25,17,10,20,20,5,70,13}

Mutated Element = 5

Mutated Chromosome: {25,17,10,20,55,5,70,13}
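A sketch of the mutation shown above, treating "Mutated Element" as a 1-based position. The second function illustrates one way a bit-wise mutation could be applied to a single weight. The 7-bit width and the per-bit probability model are assumptions for illustration, not details given in the slides.

```python
import random


def mutate_element(chromosome, element, new_weight):
    """Return a copy with the 1-based `element` replaced by `new_weight`."""
    mutated = list(chromosome)
    mutated[element - 1] = new_weight
    return mutated


def bitwise_mutate(weight, p_mutation, bits, rng):
    """Flip each of the low `bits` bits independently with probability p_mutation.
    The bit width is a hypothetical parameter, not taken from the slides."""
    for i in range(bits):
        if rng.random() < p_mutation:
            weight ^= 1 << i
    return weight


original = [25, 17, 10, 20, 20, 5, 70, 13]
print(mutate_element(original, 5, 55))  # prints [25, 17, 10, 20, 55, 5, 70, 13]
```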

Wells 15 E169/MAPLD 2004

Traveling Salesman Problem (TSP)

• Given a specified number of “cities” along with the cost of travel between each pair of them, find the cheapest way of visiting all the cities and returning to the first city visited

• Asymmetric Case – direction traveled between any two cities matters (i.e. cost is different)

• Number of possible tours: (n-1)!, where n is the number of cities
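To give a sense of scale, the count can be evaluated directly. The sketch below applies the (n-1)! formula to a small case and to n = 65, the size of the test problem used later in this deck. The function name is illustrative.

```python
import math


def asymmetric_tour_count(n):
    """Number of distinct tours for an asymmetric TSP with n cities: (n-1)!"""
    return math.factorial(n - 1)


print(asymmetric_tour_count(4))   # prints 6
print(asymmetric_tour_count(8))   # prints 5040
# For n = 65 the count is 64!, a 90-digit number (about 1.27e89).
print(len(str(asymmetric_tour_count(65))))  # prints 90
```

Fixing the starting city removes the n rotations of each tour, which is why the count is (n-1)! rather than n!.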

Wells 16 E169/MAPLD 2004

Traveling Salesman Problem (TSP)

• Well-understood NP-hard optimization problem (its decision version is NP-complete)

• Academic literature contains many test problems

• Chose for test purposes an asymmetric TSP with 65 cities (TSP 65)*

• Used a modified weight-encoded chromosome representation

*University of Heidelberg, http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95

Wells 17 E169/MAPLD 2004

Equivalent TSP Chromosome Representations

Weighted Chromosome

City No.:   0   1   2   3   4   5   6   7
Weights:  {25, 17, 10, 20, 55,  5, 70, 13}

Rank Ordering: [ 5, 3, 1, 4, 6, 0, 7, 2 ]

Visit Order Permutation Chromosome

City Visit Order: 1st 2nd 3rd 4th 5th 6th 7th 8th
City numbers:    { 5,  2,  7,  1,  3,  0,  4,  6}
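The relationship between the three representations can be checked in a few lines. Sorting the cities by ascending weight gives the visit order, and each city's position in that order is its rank. Breaking ties by city number is an assumption, since the slide's example has no equal weights. Function names are illustrative.

```python
def visit_order(weights):
    """Cities listed in ascending order of weight (ties broken by city number)."""
    return sorted(range(len(weights)), key=lambda city: (weights[city], city))


def rank_ordering(weights):
    """For each city, its position (0-based) in the visit order."""
    ranks = [0] * len(weights)
    for position, city in enumerate(visit_order(weights)):
        ranks[city] = position
    return ranks


weights = [25, 17, 10, 20, 55, 5, 70, 13]
print(visit_order(weights))    # prints [5, 2, 7, 1, 3, 0, 4, 6]
print(rank_ordering(weights))  # prints [5, 3, 1, 4, 6, 0, 7, 2]
```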

Wells 18 E169/MAPLD 2004

TSP Objective Function

• Systolic sort of chromosome weights

• Summation of segments

• Replacement of weights with rank orderings

Wells 19 E169/MAPLD 2004

Single Point Permutation Preserving Crossover Operation

Chromosome 1: {1,7,3,2,5,6,0,4}

Chromosome 2: {0,2,4,1,6,5,7,3}

Crossover Point = 4

Offspring Chromosome: {1,7,3,2,0,4,6,5}
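One rule that reproduces the offspring above is to copy the first four cities from Chromosome 1 and then append the remaining cities in the order they appear in Chromosome 2. The sketch below implements that rule. It is inferred from the example values only, so the authors' actual operator may differ in detail.

```python
def permutation_crossover(parent1, parent2, point):
    """Keep parent1's first `point` cities, then fill in the missing cities
    in the order in which they occur in parent2."""
    head = parent1[:point]
    used = set(head)
    tail = [city for city in parent2 if city not in used]
    return head + tail


chromosome1 = [1, 7, 3, 2, 5, 6, 0, 4]
chromosome2 = [0, 2, 4, 1, 6, 5, 7, 3]
offspring = permutation_crossover(chromosome1, chromosome2, 4)
print(offspring)  # prints [1, 7, 3, 2, 0, 4, 6, 5]
```

Unlike the standard single-point operator, the result is always a valid permutation: every city appears exactly once.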

Wells 20 E169/MAPLD 2004

Modified Crossover Operator

Wells 21 E169/MAPLD 2004

Permutation Altering Mutation

Original Chromosome: {1,7,3,2,0,4,6,5}

Mutation Removal Point = 6

Insertion Point = 3

Mutated Chromosome: {1,7,4,3,2,0,6,5}

Note: No change in Mutation Operator Needed
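The example can be reproduced by removing the city at the (1-based) removal point and reinserting it at the insertion point. The second half of the sketch shows one reading of the note: in the weight encoding, changing a single city's weight moves only that city within the decoded visit order, which is the same remove-and-insert effect. The new weight of 11 is a made-up value for illustration.

```python
def remove_and_insert(chromosome, removal_point, insertion_point):
    """Remove the city at the 1-based removal point and reinsert it at the
    1-based insertion point."""
    mutated = list(chromosome)
    city = mutated.pop(removal_point - 1)
    mutated.insert(insertion_point - 1, city)
    return mutated


print(remove_and_insert([1, 7, 3, 2, 0, 4, 6, 5], 6, 3))
# prints [1, 7, 4, 3, 2, 0, 6, 5]


def visit_order(weights):
    """Cities in ascending order of weight (ties broken by city number)."""
    return sorted(range(len(weights)), key=lambda c: (weights[c], c))


weights = [25, 17, 10, 20, 55, 5, 70, 13]
before = visit_order(weights)   # [5, 2, 7, 1, 3, 0, 4, 6]
weights[4] = 11                 # hypothetical mutation of city 4's weight
after = visit_order(weights)    # [5, 2, 4, 7, 1, 3, 0, 6]
print(after == remove_and_insert(before, 7, 3))  # prints True
```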

Wells 22 E169/MAPLD 2004

Wells 23 E169/MAPLD 2004

Comparison with Instruction Set Processor, ISP, Implementations

• Implemented TSP using a high-end 3.2 GHz Intel Xeon Processor with 3-level Cache

• Encoded Problem in C using pointers for maximum efficiency

• OS: Red Hat Enterprise Linux v3 (kernel 2.4.21 SMP), single user

• Basic methodology required ~1.6 ms per generation (population size 512)

• Optimized version required ~0.8 ms per generation (population size 512)

Wells 24 E169/MAPLD 2004

Parallelization Strategies

• Initial basic reconfigurable implementation on the Star Bridge system required ~1.1 ms per generation (population size = 512, clock speed 66 MHz), which is slower than the optimized ISP implementation

• MORE PARALLELIZATION WAS NEEDED!
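The comparison can be made concrete with a little arithmetic on the figures quoted in the slides (about 1.6 ms and 0.8 ms per generation for the two ISP versions, about 1.1 ms at 66 MHz for the initial reconfigurable version). The variable names are illustrative, and the inputs are the approximate values from the deck, so the results are rough as well.

```python
isp_basic_ms = 1.6       # basic C implementation, per generation
isp_optimized_ms = 0.8   # optimized C implementation, per generation
fpga_basic_ms = 1.1      # initial reconfigurable implementation, per generation
clock_hz = 66e6          # reconfigurable system clock
population_size = 512

speedup_vs_basic_isp = isp_basic_ms / fpga_basic_ms          # about 1.45
speedup_vs_optimized_isp = isp_optimized_ms / fpga_basic_ms  # about 0.73
cycles_per_generation = fpga_basic_ms * 1e-3 * clock_hz      # about 72,600
cycles_per_chromosome = cycles_per_generation / population_size

print(round(speedup_vs_basic_isp, 2))      # prints 1.45
print(round(speedup_vs_optimized_isp, 2))  # prints 0.73
print(round(cycles_per_generation))        # prints 72600
print(round(cycles_per_chromosome))        # prints 142
```

At roughly 142 clock cycles per chromosome per generation, the clock-rate disadvantage (66 MHz against 3.2 GHz) has to be recovered through concurrency.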

Wells 25 E169/MAPLD 2004

Parallelization Strategies

• Exploiting Concurrency in a Common Population
– Temporal parallelism via pipelining
– Spatial parallelism via replicating functional units

• Processing Isolated Subpopulations
– With chromosome migration (very promising for the Star Bridge system but not yet completed)
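The slides do not describe how migration would work, and the approach is listed as not yet completed. Purely as an illustration of the general idea, the sketch below performs one migration step between isolated subpopulations arranged in a ring. The ring topology, the "best replaces worst" policy, and all names are assumptions.

```python
def migrate_ring(islands, cost_fn):
    """One migration step: each island's best chromosome is copied over the
    worst chromosome of the next island in the ring (modifies islands in place)."""
    bests = [min(island, key=cost_fn) for island in islands]
    for i, island in enumerate(islands):
        incoming = bests[(i - 1) % len(islands)]  # from the previous island
        worst = max(range(len(island)), key=lambda k: cost_fn(island[k]))
        island[worst] = list(incoming)
    return islands


# Toy example: one-gene chromosomes whose cost is the gene value itself.
islands = [[[1], [9]], [[5], [7]], [[3], [8]]]
migrate_ring(islands, cost_fn=lambda ch: ch[0])
print(islands)  # prints [[[1], [3]], [[5], [1]], [[3], [5]]]
```

Between migration steps each subpopulation would evolve independently, which is what makes the scheme attractive for replicated hardware units.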

Wells 26 E169/MAPLD 2004

Applying Temporal Parallelism

Wells 27 E169/MAPLD 2004

Applying Spatial Parallelism

Wells 28 E169/MAPLD 2004

Wells 29 E169/MAPLD 2004

Resource Requirements

• Non-pipelined 1 TSP Implementation
– Number of slices: 10910 out of 33792 (32%)
– Number of block RAMs: 40 out of 144 (27%)
– Total equivalent gate count: 2,767,231

• Pipelined 1 TSP Implementation
– Number of slices: 10957 out of 33792 (32%)
– Number of block RAMs: 40 out of 144 (27%)
– Total equivalent gate count: 2,770,741

Wells 30 E169/MAPLD 2004

Resource Requirements

• Pipelined 2 TSP Implementation
– Number of slices: 13738 out of 33792 (40%)
– Number of block RAMs: 45 out of 144 (31%)
– Total equivalent gate count: 3,149,966

• Pipelined 4 TSP Implementation
– Number of slices: 19685 out of 33792 (58%)
– Number of block RAMs: 55 out of 144 (38%)
– Total equivalent gate count: 3,908,362

• Pipelined 6 TSP Implementation
– Number of slices: 25728 out of 33792 (76%)
– Number of block RAMs: 65 out of 144 (45%)
– Total equivalent gate count: 4,664,262
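The slice counts for the pipelined builds (1, 2, 4, and 6 TSP units) allow a rough estimate of the incremental cost of each added unit. The sketch below computes that marginal cost and a naive linear extrapolation of how many more units might fit. The extrapolation is an assumption for illustration only; it ignores routing congestion, timing closure, block RAM, and I/O limits.

```python
# (parallel TSP units, slices used) for the pipelined implementations
pipelined = [(1, 10957), (2, 13738), (4, 19685), (6, 25728)]
TOTAL_SLICES = 33792


def marginal_slices(points):
    """Slices added per extra TSP unit between consecutive builds."""
    result = []
    for (units0, slices0), (units1, slices1) in zip(points, points[1:]):
        result.append((slices1 - slices0) / (units1 - units0))
    return result


marginal = marginal_slices(pipelined)
print(marginal)  # prints [2781.0, 2973.5, 3021.5]

# Naive linear extrapolation from the most recent marginal cost.
free_slices = TOTAL_SLICES - pipelined[-1][1]
extra_units = int(free_slices // marginal[-1])
print(free_slices, extra_units)  # prints 8064 2
```

Each added unit costs roughly 2,800 to 3,000 slices, well below the roughly 11,000 slices of the single-unit build, so a large share of the design appears to be shared infrastructure.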

Wells 31 E169/MAPLD 2004

Problems Encountered

• Synthesis Time Issues (within Viva and within Xilinx)

• Maturity/Robustness of CAD Tools

• Learning Curve

• Timing Issues

• I/O Pin Limitations

Wells 32 E169/MAPLD 2004

Summary & Conclusion

• A simple genetic algorithm was implemented on reconfigurable hardware using the Viva paradigm

• Significant but not spectacular speedups have been obtained for the TSP using a combination of temporal and spatial parallel processing methods

• Many other opportunities exist to improve processing throughput

• The concept of isolated subpopulations is a very promising method to further improve performance