Applying a Genetic Algorithm to Reconfigurable Hardware – a Case Study. B. Earl Wells*, Clint Patrick, Luis Trevino, John Weir and Jim Steincamp, NASA Marshall Space Flight Center, Huntsville, Alabama. *University of Alabama in Huntsville, Huntsville, Alabama.
Wells 1 E169/MAPLD 2004
Applying a Genetic Algorithm to Reconfigurable Hardware – a Case Study
B. Earl Wells*, Clint Patrick, Luis Trevino, John Weir
and Jim Steincamp
NASA Marshall Space Flight Center
Huntsville, Alabama
*University of Alabama in Huntsville, Huntsville, Alabama
Project Motivation & Objectives
• To evaluate the technology of reconfigurable computing: determine its level of maturity and suitability for use in future NASA applications
• To implement a nontrivial test-bed application on a Star Bridge Hypercomputer Model 36
• Chosen Application: a simple Genetic Algorithm
Targeted Hardware Platform
• Star Bridge HC-36 Hypercomputer System
• Employs Xilinx Virtex II 6000 Series FPGAs
Development Environment
• Development Environment: VIVA™
• Graphical User Interface
• Structural Design Philosophy with Behavioral Attributes:
– Polymorphism
– Object Overloading
– Recursion
• Data-flow and data-driven synchronization between objects (Go, Done, Busy, Wait protocol)
• Large library of high-end objects
• Environment falls somewhere between hardware description languages and schematic capture packages
Polymorphism, Overloading, Recursion, and Synchronization
Example: Object to Determine Number of 1’s in a Binary Number
Terminal Case
Recursive Case
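The terminal-case / recursive-case structure of this example can be mimicked in software. The sketch below is only a Python analogy, not the Viva object: it assumes the terminal case is a single bit and the recursive case splits the word in half and adds the two counts. The names `count_ones` and `to_bits` and the halving scheme are illustrative assumptions.

```python
def count_ones(bits):
    """Count the 1s in a sequence of bits (each 0 or 1) by recursive halving."""
    if len(bits) == 0:
        return 0
    if len(bits) == 1:      # terminal case: a single bit is its own count
        return bits[0]
    mid = len(bits) // 2    # recursive case: split the word and add the counts
    return count_ones(bits[:mid]) + count_ones(bits[mid:])


def to_bits(value, width):
    """Unpack a non-negative integer into `width` bits, least significant first."""
    return [(value >> i) & 1 for i in range(width)]
```

For example, `count_ones(to_bits(0b10110010, 8))` counts the four 1 bits of that byte. In hardware the two halves would be evaluated concurrently, which is what makes the recursive formulation attractive there.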
Genetic Algorithms
• Biologically Inspired Search Techniques
• Employs Selection, Replication (crossover), Mutation, and Replacement
• Iterative method -- very time intensive
• Regularly Structured
• Large Amounts of Concurrency Present that can be Exploited
GA Characteristics
• 2-Way Tournament Selection
• No Elitism
• Single-Point Crossover with bit-wise mutation
• Weight-Encoded Chromosome (weights translated into rank ordering of cities)
• Adjustable Parameters: Population Size 2 to 512 (powers of 2), Number of Generations, Probability of Mutation, Probability of Crossover
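The characteristics listed above can be put together in a short software sketch. This is a hedged illustration, not the hardware design: the 8-bit weight width, the lower-is-better `cost` callback, the default parameter values, and all function names are assumptions introduced here.

```python
import random


def tournament(pop, cost, rng):
    """2-way tournament: pick two members at random, return the cheaper one."""
    a, b = rng.choice(pop), rng.choice(pop)
    return a if cost(a) <= cost(b) else b


def crossover(p1, p2, rng):
    """Single-point crossover: head of p1 joined to tail of p2."""
    point = rng.randrange(1, len(p1))
    return p1[:point] + p2[point:]


def mutate(chrom, p_mut, bits, rng):
    """Bit-wise mutation: flip each bit of each weight with probability p_mut."""
    out = []
    for w in chrom:
        for b in range(bits):
            if rng.random() < p_mut:
                w ^= 1 << b
        out.append(w)
    return out


def run_ga(cost, n_genes, pop_size=64, generations=50,
           p_cross=0.8, p_mut=0.01, bits=8, seed=0):
    """Generational GA without elitism; returns the final population."""
    rng = random.Random(seed)
    pop = [[rng.randrange(1 << bits) for _ in range(n_genes)]
           for _ in range(pop_size)]
    for _ in range(generations):
        nxt = []
        while len(nxt) < pop_size:
            p1 = tournament(pop, cost, rng)
            p2 = tournament(pop, cost, rng)
            child = crossover(p1, p2, rng) if rng.random() < p_cross else list(p1)
            nxt.append(mutate(child, p_mut, bits, rng))
        pop = nxt   # full replacement: the best member is not carried over
    return pop
```

A call such as `run_ga(sum, n_genes=8, pop_size=16, generations=5)` uses the sum of the weights as a stand-in cost; a TSP run would pass a tour-cost function instead.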
Standard Single Point Crossover Operation (Weighted Chromosomes)
Chromosome 1: {25,17,10,20,33,14,7,29}
Chromosome 2: {44,12,17,38,20,5,70,13}
Crossover Point = 4
Offspring Chromosome: {25,17,10,20,20,5,70,13}
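The crossover example on this slide can be reproduced in a few lines. The function name `single_point_crossover` is illustrative, and the crossover point is read as the number of genes taken from the first parent, which matches the slide's values.

```python
def single_point_crossover(parent1, parent2, point):
    """Return the first `point` genes of parent1 followed by the rest of parent2."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    return parent1[:point] + parent2[point:]


chromosome_1 = [25, 17, 10, 20, 33, 14, 7, 29]
chromosome_2 = [44, 12, 17, 38, 20, 5, 70, 13]
offspring = single_point_crossover(chromosome_1, chromosome_2, 4)
```

Because each gene is an independent weight, any offspring produced this way is still a valid chromosome.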
Single Point Mutation (Weighted Chromosomes)
Original Chromosome: {25,17,10,20,20,5,70,13}
Mutated Element = 5
Mutated Chromosome: {25,17,10,20,55,5,70,13}
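A minimal sketch of this mutation follows. It reads "Mutated Element = 5" as a 1-based position, which is consistent with the slide's before/after values. The random variant and its `max_weight` bound are assumptions; the slides do not state the weight range.

```python
import random


def single_point_mutation(chrom, position, new_weight):
    """Return a copy of chrom with the gene at 1-based `position` replaced."""
    mutated = list(chrom)
    mutated[position - 1] = new_weight
    return mutated


def random_single_point_mutation(chrom, max_weight, rng=random):
    """Pick the position and the new weight at random (max_weight is assumed)."""
    position = rng.randrange(1, len(chrom) + 1)
    return single_point_mutation(chrom, position, rng.randrange(max_weight + 1))


original = [25, 17, 10, 20, 20, 5, 70, 13]
mutated = single_point_mutation(original, 5, 55)
```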
Traveling Salesman Problem (TSP)
• Given a specified number of “cities” along with the cost of travel between each pair of them, find the cheapest way of visiting all the cities and returning to the first city visited
• Asymmetric Case – direction traveled between any two cities matters (i.e. cost is different)
• Possible solutions (n-1)! – where n is the number of cities
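To make the problem statement concrete, the sketch below evaluates a closed tour on an asymmetric cost matrix and computes the (n-1)! tour count. The 3-city matrix `COST` is a made-up example, not data from the study, and the function names are illustrative.

```python
from math import factorial


def tour_cost(cost, tour):
    """Sum cost[a][b] over consecutive cities, including the return leg."""
    total = 0
    for i, city in enumerate(tour):
        nxt = tour[(i + 1) % len(tour)]   # wraps to the first city at the end
        total += cost[city][nxt]
    return total


def tour_count(n):
    """Number of distinct tours from a fixed starting city: (n-1)!."""
    return factorial(n - 1)


# Hypothetical 3-city asymmetric cost matrix: cost[a][b] != cost[b][a].
COST = [
    [0, 2, 9],
    [7, 0, 4],
    [3, 8, 0],
]
```

With this matrix the tour 0, 1, 2 costs 2 + 4 + 3 = 9, while the reverse direction 0, 2, 1 costs 9 + 8 + 7 = 24, which is the asymmetry the slide describes.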
Traveling Salesman Problem (TSP)
• Well-understood NP-hard optimization problem (its decision version is NP-complete)
• Academic literature contains many test problems
• For test purposes, chose an Asymmetric TSP with 65 cities (TSP 65)*
• Used a modified weight-encoded chromosome representation
*University of Heidelberg, http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95
Equivalent TSP Chromosome Representations
City No.: 0 1 2 3 4 5 6 7
Weighted Chromosome (weights): {25,17,10,20,55,5,70,13}
Rank Ordering: [5,3,1,4,6,0,7,2]
City Visit Order: 1st 2nd 3rd 4th 5th 6th 7th 8th
Visit Order Permutation Chromosome (city numbers): {5,2,7,1,3,0,4,6}
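The relationship between the three representations can be checked with a short sketch: sorting the cities by ascending weight gives the visit order, and each city's position in that order is its rank. Breaking ties by city number is an assumption made here, since the slides do not say how equal weights are handled.

```python
def visit_order(weights):
    """City numbers sorted by ascending weight (ties broken by city number)."""
    return sorted(range(len(weights)), key=lambda city: (weights[city], city))


def rank_ordering(weights):
    """For each city, its 0-based position in the visit order."""
    ranks = [0] * len(weights)
    for position, city in enumerate(visit_order(weights)):
        ranks[city] = position
    return ranks


weights = [25, 17, 10, 20, 55, 5, 70, 13]
order = visit_order(weights)     # visit order permutation chromosome
ranks = rank_ordering(weights)   # rank ordering
```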
TSP Objective Function
• Systolic sort of chromosome weights
• Summation of segments
• Replacement of weights with rank orderings
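A software sketch of the first two steps follows. Odd-even transposition sort is used as a stand-in because it is a common systolic-style sort built from neighbor compare-exchange cells; the slides do not say which systolic sort the hardware uses, so treat that choice, the function names, and the (weight, city) pairing as assumptions.

```python
def odd_even_transposition_sort(items):
    """Sort a copy of items with n phases of neighbor compare-exchange."""
    a = list(items)
    n = len(a)
    for phase in range(n):
        # Even phases compare pairs (0,1),(2,3),...; odd phases (1,2),(3,4),...
        for i in range(phase % 2, n - 1, 2):
            if a[i] > a[i + 1]:
                a[i], a[i + 1] = a[i + 1], a[i]
    return a


def tsp_objective(weights, cost):
    """Tour cost of the visit order implied by a weight-encoded chromosome."""
    pairs = odd_even_transposition_sort(
        (w, city) for city, w in enumerate(weights))
    tour = [city for _, city in pairs]
    # Sum every segment, including the closing segment back to the start.
    return sum(cost[tour[i]][tour[(i + 1) % len(tour)]]
               for i in range(len(tour)))
```

In hardware, all compare-exchange operations within one phase can run concurrently, so n weights sort in n phases rather than the O(n log n) sequential steps of a software sort.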
Single Point Permutation Preserving Crossover Operation
Chromosome 1: {1,7,3,2,5,6,0,4}
Chromosome 2: {0,2,4,1,6,5,7,3}
Crossover Point = 4
Offspring Chromosome: {1,7,3,2,0,4,6,5}
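One rule that reproduces this example is: keep the genes of Chromosome 1 up to the crossover point, then append the cities not yet used in the order they appear in Chromosome 2. The slides do not spell the rule out, so the sketch below is an inference from the example, and the function name is illustrative.

```python
def permutation_crossover(parent1, parent2, point):
    """Keep parent1[:point]; append the missing cities in parent2's order."""
    head = parent1[:point]
    used = set(head)
    tail = [city for city in parent2 if city not in used]
    return head + tail


chromosome_1 = [1, 7, 3, 2, 5, 6, 0, 4]
chromosome_2 = [0, 2, 4, 1, 6, 5, 7, 3]
offspring = permutation_crossover(chromosome_1, chromosome_2, 4)
```

Unlike plain single-point crossover, this always yields a valid permutation: every city appears exactly once in the offspring.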
Permutation Altering Mutation
Original Chromosome: {1,7,3,2,0,4,6,5}
Mutation Removal Point = 6
Insertion Point = 3
Mutated Chromosome: {1,7,4,3,2,0,6,5}
Note: No change in Mutation Operator Needed
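The permutation-altering mutation example can be reproduced by removing one gene and reinserting it elsewhere. The sketch reads both points as 1-based positions, which matches the slide's values; the function name is illustrative.

```python
def remove_and_insert(chrom, removal_point, insertion_point):
    """Move the gene at 1-based removal_point to 1-based insertion_point."""
    mutated = list(chrom)
    gene = mutated.pop(removal_point - 1)
    mutated.insert(insertion_point - 1, gene)
    return mutated


original = [1, 7, 3, 2, 0, 4, 6, 5]
mutated = remove_and_insert(original, 6, 3)
```

The result is still a permutation of the same cities; only the visiting order changes.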
Comparison with Instruction Set Processor (ISP) Implementations
• Implemented TSP on a high-end 3.2 GHz Intel Xeon processor with 3-level cache
• Encoded problem in C using pointers for maximum efficiency
• OS: Red Hat Enterprise Linux v3 (kernel 2.4.21 SMP), single user
• Basic methodology required ~1.6 ms per generation (population size 512)
• Optimized version required ~0.8 ms per generation (population size 512)
Parallelization Strategies
• Initial basic reconfigurable implementation on the Star Bridge system required ~1.1 ms per generation!
[slower than the optimized ISP implementation]
(population size = 512, clock speed 66 MHz)
• MORE PARALLELIZATION WAS NEEDED!
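A rough back-of-the-envelope calculation puts these timings in terms of clock cycles. It uses only the approximate figures quoted in the slides (~1.1 ms at 66 MHz for the reconfigurable version, ~0.8 ms at 3.2 GHz for the optimized ISP version), so the results are estimates; the variable names are illustrative.

```python
def cycles_per_generation(seconds_per_generation, clock_hz):
    """Clock cycles elapsed during one generation."""
    return seconds_per_generation * clock_hz


fpga_cycles = cycles_per_generation(1.1e-3, 66e6)    # about 72,600 cycles
isp_cycles = cycles_per_generation(0.8e-3, 3.2e9)    # about 2,560,000 cycles
fpga_cycles_per_member = fpga_cycles / 512            # about 142 cycles
speedup_vs_optimized_isp = 0.8e-3 / 1.1e-3            # about 0.73, i.e. slower
```

The reconfigurable version already needs far fewer cycles per generation; it loses on wall-clock time only because its clock is roughly 48 times slower, which is why additional parallelism is the natural next step.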
Parallelization Strategies
• Exploiting Concurrency in a Common Population
– Temporal Parallelism via pipelining
– Spatial Parallelism via replicating functional units
• Processing Isolated Subpopulations
– With chromosome migration (very promising for the Star Bridge system but not yet completed)
Resource Requirements
• Non-pipelined 1 TSP Implementation
– Number of SLICES: 10910 out of 33792 (32%)
– Number of Block RAMs: 40 out of 144 (27%)
– Total equivalent gate count: 2,767,231
• Pipelined 1 TSP Implementation
– Number of SLICES: 10957 out of 33792 (32%)
– Number of Block RAMs: 40 out of 144 (27%)
– Total equivalent gate count: 2,770,741
• Pipelined 2 TSP Implementation
– Number of SLICES: 13738 out of 33792 (40%)
– Number of Block RAMs: 45 out of 144 (31%)
– Total equivalent gate count: 3,149,966
• Pipelined 4 TSP Implementation
– Number of SLICES: 19685 out of 33792 (58%)
– Number of Block RAMs: 55 out of 144 (38%)
– Total equivalent gate count: 3,908,362
• Pipelined 6 TSP Implementation
– Number of SLICES: 25728 out of 33792 (76%)
– Number of Block RAMs: 65 out of 144 (45%)
– Total equivalent gate count: 4,664,262
Problems Encountered
• Synthesis Time Issues
(within Viva and within Xilinx)
• Maturity/Robustness of CAD Tools
• Learning Curve
• Timing Issues
• I/O Pin Limitations
Summary & Conclusion
• A simple genetic algorithm was implemented on reconfigurable hardware using the Viva paradigm
• Significant but not spectacular speedups have been obtained for the TSP using a combination of temporal and spatial parallel processing methods
• Many other opportunities exist to improve processing throughput
• The concept of isolated subpopulations is a very promising method to further improve performance