50
A Parallel Implementation of a Multi-Objective Evolutionary Algorithm Christos Kannas Constantinos Pattichis University of Cyprus

2009 MSc Presentation for Parallel-MEGA

Embed Size (px)

DESCRIPTION

MSc presentation for "A Parallel Implementation of a Multi-objective Evolutionary Algorithm"

Citation preview

Page 1: 2009 MSc Presentation for Parallel-MEGA

A Parallel Implementation of a Multi-Objective Evolutionary

AlgorithmChristos Kannas

Constantinos Pattichis

University of Cyprus

Page 2: 2009 MSc Presentation for Parallel-MEGA

2

Outline Introduction Goals Evolutionary Algorithms

Background Single & Multi Objective Optimization The family

Parallel Evolutionary Algorithms Algorithms

Multi-Objective Evolutionary Graph Algorithm Parallel Multi-Objective Evolutionary Graph Algorithm

Implementation Results Discussion Conclusion and Future Work Bibliography

Page 3: 2009 MSc Presentation for Parallel-MEGA

3

Introduction The majority of software programs are written for serial computation, which does not

take advantage of the multi-/many-core technology available in current CPUs. Parallel computing can use multi-/many-cores CPUs and/or General Programming

enabled GPUs. In general, parallel programs are very dependent on the specific hardware architecture

for which they were developed. General parallel programming turns out to be extremely difficult to implement due to the existence of different hardware architectures and to the limitations of current programming paradigms and languages.

Evolutionary Algorithms are stochastic search and optimization techniques which were inspired by natural evolution and population genetics. EAs are an interdisciplinary research field with relationships to biology, artificial intelligence, numerical optimization and applications for decision support in almost any engineering discipline.

Parallel Evolutionary Algorithms (PEAs) emerged for two main reasons: one is the need to achieve time-savings by distributing the computational effort and the second is to benefit from a parallel environment from an algorithmic point of view.

Page 4: 2009 MSc Presentation for Parallel-MEGA

4

Goals

The goals of this thesis are to: Explore the possibility of using a Parallel Evolutionary Algorithm on a

multi-/many-core CPU environment. Implement a Parallel Evolutionary Algorithm that exploits multi-/many-core

architectures, so as to produce solutions of the same quality as an ordinary Evolutionary Algorithm in much less time.

Learn more about parallel programming in multi-/many-core environment with the use of high level programming languages, such as Python.

Gain experiences in Parallel Evolutionary Algorithms, their strengths, weakness and potential applications.

Page 5: 2009 MSc Presentation for Parallel-MEGA

5

Evolutionary AlgorithmsBackground

Evolutionary Algorithms are based on a model of natural, biological evolution, which was formulated for the first time by Charles Darwin [4]. Darwin’s “Theory of Evolution” explains the adaptive change of species by the principle of natural selection, which favours those species for survival and further evolution that are best adapted to their environmental conditions. In addition to selection, Darwin recognized the occurrence of small, apparently random and undirected variations between the phenotypes (any observable characteristic or trait of an organism).

Modern biochemistry and genetics has extended the Darwinian Theory by microscopic findings concerning the mechanisms of heredity, the resulting theory is called the “Synthetic Theory of Evolution” [6]. This theory is based on genes as transfer units of heredity. Genes are occasionally changed by mutations. Selection acts on the individual, which expresses in its phenotype the complex interactions within its genotype, its total genetic information, as well as the interaction of the genotype with the environment in determining the phenotype. The evolving unit is the population which consists of a common gene pool included in the genotypes of the individuals.

Page 6: 2009 MSc Presentation for Parallel-MEGA

6

Evolutionary AlgorithmsSingle & Multi Objective Optimization

EAs are search algorithms aiming to find an optimal solution to the problem.Optimization can be classified in two major families.

Single Objective OPtimization (SOOP).For problems that have to satisfy one objective function.A single optimal solution is found.

Multi Objective OPtimization (MOOP).For problems that have to satisfy many objective functions.A list of equally optimal solutions are found, called Pareto front.

Page 7: 2009 MSc Presentation for Parallel-MEGA

7

Evolutionary AlgorithmsThe family

EAs are a big family of algorithms.Genetic Algorithms (GAs).

Chromosome string.Genetic Programming (GP).

Tree structure.Evolutionary Strategies (ES).

Real value vectors.Evolutionary Programming (EP).

Finite State Machines.

Population

Fitness Evaluation

Parent Selection

Reproduction

Page 8: 2009 MSc Presentation for Parallel-MEGA

8

Parallel Evolutionary Algorithms

Coarse grained.Divide the population into several subpopulations.Distribution of the subpopulations.

Fine grained.Inherent parallelism.

Individual level parallelism.Fitness level parallelism.

Subpop 1

Subpop 2 Subpop 3

Subpop 4Subpop 5

MasterFitness Assignment

Slave 1Reproduction

Fitness Evaluation

Slave 2Reproduction

Fitness Evaluation

Slave NReproduction

Fitness Evaluation

Page 9: 2009 MSc Presentation for Parallel-MEGA

9

Algorithms

Multi-Objective Evolutionary Graph Algorithm (MEGA) description. Chromosomes are graphs.

Working Population

Calculate Fitness

Hard Filter

Pareto Ranking

Calculate Efficiency

Select Parents

Evolve Parents

Page 10: 2009 MSc Presentation for Parallel-MEGA

10

Algorithms

Multi-Objective Evolutionary Graph Algorithm (MEGA) description. Chromosomes are graphs.

Working Population

Calculate Fitness

Hard Filter

Pareto Ranking

Calculate Efficiency

Select Parents

Evolve Parents

Page 11: 2009 MSc Presentation for Parallel-MEGA

11

Algorithms

Multi-Objective Evolutionary Graph Algorithm (MEGA) description. Chromosomes are graphs.

Working Population

Calculate Fitness

Hard Filter

Pareto Ranking

Calculate Efficiency

Select Parents

Evolve Parents

Page 12: 2009 MSc Presentation for Parallel-MEGA

12

Algorithms

Multi-Objective Evolutionary Graph Algorithm (MEGA) description. Chromosomes are graphs.

Working Population

Calculate Fitness

Hard Filter

Pareto Ranking

Calculate Efficiency

Select Parents

Evolve Parents

Page 13: 2009 MSc Presentation for Parallel-MEGA

13

Algorithms

Multi-Objective Evolutionary Graph Algorithm (MEGA) description. Chromosomes are graphs.

Working Population

Calculate Fitness

Hard Filter

Pareto Ranking

Calculate Efficiency

Select Parents

Evolve Parents

Page 14: 2009 MSc Presentation for Parallel-MEGA

14

Algorithms

Multi-Objective Evolutionary Graph Algorithm (MEGA) description. Chromosomes are graphs.

Working Population

Calculate Fitness

Hard Filter

Pareto Ranking

Calculate Efficiency

Select Parents

Evolve Parents

Page 15: 2009 MSc Presentation for Parallel-MEGA

15

Algorithms

Multi-Objective Evolutionary Graph Algorithm (MEGA) description. Chromosomes are graphs.

Working Population

Calculate Fitness

Hard Filter

Pareto Ranking

Calculate Efficiency

Select Parents

Evolve Parents

Page 16: 2009 MSc Presentation for Parallel-MEGA

16

Algorithms

Multi-Objective Evolutionary Graph Algorithm (MEGA) description. Chromosomes are graphs.

Working Population

Calculate Fitness

Hard Filter

Pareto Ranking

Calculate Efficiency

Select Parents

Evolve Parents

Page 17: 2009 MSc Presentation for Parallel-MEGA

17

Algorithms

Parallel Multi-Objective Evolutionary Graph Algorithm (PMEGA) description. Divide population into several subpopulations.

Number of subpopulations sets the number of tasks. Evolve subpopulations independently and isolated.

Isolation time. Merge results.

Page 18: 2009 MSc Presentation for Parallel-MEGA

18

Implementation

Programming language Python. Use of processes. No threads due to Global Interpreter Lock(GIL). Task scheduling with the use of a Pool of Processes.

Page 19: 2009 MSc Presentation for Parallel-MEGA

19

ImplementationWorking Population

Subpopulation 1

Subpopulation 2

Subpopulation X

Pool of processes, N processes for an N core CPU.

Each process is assigned a task to evolve a subpopulation, using MEGA algorithm.

Process 1

Process 2

Process N

Processes return their results.

Merge the results.

Page 20: 2009 MSc Presentation for Parallel-MEGA

20

ImplementationWorking Population

Subpopulation 1

Subpopulation 2

Subpopulation X

Pool of processes, N processes for an N core CPU.

Each process is assigned a task to evolve a subpopulation, using MEGA algorithm.

Process 1

Process 2

Process N

Processes return their results.

Merge the results.

Page 21: 2009 MSc Presentation for Parallel-MEGA

21

ImplementationWorking Population

Subpopulation 1

Subpopulation 2

Subpopulation X

Pool of processes, N processes for an N core CPU.

Each process is assigned a task to evolve a subpopulation, using MEGA algorithm.

Process 1

Process 2

Process N

Processes return their results.

Merge the results.

Page 22: 2009 MSc Presentation for Parallel-MEGA

22

ImplementationWorking Population

Subpopulation 1

Subpopulation 2

Subpopulation X

Pool of processes, N processes for an N core CPU.

Each process is assigned a task to evolve a subpopulation, using MEGA algorithm.

Process 1

Process 2

Process N

Processes return their results.

Merge the results.

Page 23: 2009 MSc Presentation for Parallel-MEGA

23

ImplementationWorking Population

Subpopulation 1

Subpopulation 2

Subpopulation X

Pool of processes, N processes for an N core CPU.

Each process is assigned a task to evolve a subpopulation, using MEGA algorithm.

Process 1

Process 2

Process N

Processes return their results.

Merge the results.

Page 24: 2009 MSc Presentation for Parallel-MEGA

24

ImplementationWorking Population

Subpopulation 1

Subpopulation 2

Subpopulation X

Pool of processes, N processes for an N core CPU.

Each process is assigned a task to evolve a subpopulation, using MEGA algorithm.

Process 1

Process 2

Process N

Processes return their results.

Merge the results.

Page 25: 2009 MSc Presentation for Parallel-MEGA

25

Results

MEGA vs. PMEGA on 2 cores CPU CPU: Intel Core 2 Duo E8400 @ 3.0 GHz (2 Physical Cores) Memory: 4 Gbytes

MEGAPMEGA 2

SubpopulationsPMEGA 4

Subpopulations

1st Experiment (population 100, iterations 200)

Average Execution Time

0:39:15 0:22:44 0:25:50

Speedup 1.727206062 1.519243174

2nd Experiment (population 150, iterations 200)

Average Execution Time

1:04:44 0:34:48 0:40:33

Speedup 1.859696728 1.596464785

3rd Experiment (population 200, iterations 200)

Average Execution Time

1:22:20 0:49:22 0:54:22

Speedup 1.667604366 1.514253602

Summary Average Speedup 1.8

Page 26: 2009 MSc Presentation for Parallel-MEGA

26

Results

MEGA Blue dots: 1st Pareto front Red dots: Pareto fronts at 25%, 50%, 75% and 100%.

Page 27: 2009 MSc Presentation for Parallel-MEGA

27

Results

PMEGA 2 Subpopulations Blue dots: 1st Pareto front Red dots: Pareto front at 100%.

Page 28: 2009 MSc Presentation for Parallel-MEGA

28

Results

PMEGA 4 Subpopulations Blue dots: 1st Pareto front Red dots: Pareto front at 100%.

Page 29: 2009 MSc Presentation for Parallel-MEGA

29

Results Genotype

diversity: Clustering at parameter space.

Phenotype diversity: Clustering at objective space.

Page 30: 2009 MSc Presentation for Parallel-MEGA

30

Results

ER-alpha active ligands (left). ER-beta active ligands (right).

Page 31: 2009 MSc Presentation for Parallel-MEGA

31

Results MEGA vs. PMEGA 2 Subpopulations vs. PMEGA 4 Subpopulations

Page 32: 2009 MSc Presentation for Parallel-MEGA

32

Results

MEGA vs. PMEGA on 4 cores CPU CPU: Inter Core 2 Duo Quad Q6600 @ 2.4 GHz (4 Physical Cores) Memory: 4 Gbytes

MEGAPMEGA 2

SubpopulationsPMEGA 4

Subpopulations

1st Experiment (population 100, iterations 200)

Average Execution Time

0:55:57 0:20:53 0:21:27

Speedup 2.678191489 2.60880829

2nd Experiment (population 150, iterations 200)

Average Execution Time

1:26:18 0:32:02 0:32:12

Speedup 2.693895248 2.679489391

3rd Experiment (population 200, iterations 200)

Average Execution Time

1:56:45 0:42:39 0:41:45

Speedup 2.73752768 2.796540253

Summary Average Speedup 2.7

Page 33: 2009 MSc Presentation for Parallel-MEGA

33

Results

MEGA Blue dots: 1st Pareto front Red dots: Pareto fronts at 25%, 50%, 75% and 100%.

Page 34: 2009 MSc Presentation for Parallel-MEGA

34

Results

PMEGA 4 Subpopulations Blue dots: 1st Pareto front Red dots: Pareto front at 100%.

Page 35: 2009 MSc Presentation for Parallel-MEGA

35

Results

PMEGA 8 Subpopulations Blue dots: 1st Pareto front Red dots: Pareto front at 100%.

Page 36: 2009 MSc Presentation for Parallel-MEGA

36

Results Genotype

diversity: Clustering at parameter space.

Phenotype diversity: Clustering at objective space.

Page 37: 2009 MSc Presentation for Parallel-MEGA

37

Results

ER-alpha active ligands (left). ER-beta active ligands (right).

Page 38: 2009 MSc Presentation for Parallel-MEGA

38

Results MEGA vs. PMEGA 4 Subpopulations vs. PMEGA 8 Subpopulations

Page 39: 2009 MSc Presentation for Parallel-MEGA

39

Discussion

Amdahl’s Law.

Speedup

Maximum Speedup

Page 40: 2009 MSc Presentation for Parallel-MEGA

40

Discussion

N Sest E

2 1.8 0.9

4 2.8 0.7

8 4.0 0.5

16 5.1 0.3

32 5.9 0.2

64 6.4 0.1Pest

0.86Estimated parallel percentage of the

algorithm.Sest

Estimated SpeedupN

Number of coresE

EfficiencySmax

7.0

Calculate estimated Parallel % and estimated maximum Speedup from results based on 2 cores CPU.

Page 41: 2009 MSc Presentation for Parallel-MEGA

41

Discussion

N Sest E

4 2.7 0.7

8 3.8 0.5

16 4.7 0.3

32 5.4 0.2

64 6.4 0.1Pest

0.84Estimated parallel percentage of the

algorithm.Sest

Estimated SpeedupN

Number of coresE

EfficiencySmax

6.3

Calculate estimated Parallel % and estimated maximum Speedup from results based on 4 cores CPU.

Page 42: 2009 MSc Presentation for Parallel-MEGA

42

Discussion

The estimated parallelized part of the algorithm is 85% so by applying Amdahl’s Law and setting the number of CPUs to infinite, then the maximum estimated speedup is 6.6.

When comparing PMEGA with itself, using different number of subpopulations at the same problem, a minor reduction to the speedup gained is observed.

PMEGA can provide similar solutions with MEGA. PMEGA’s subpopulations can provide solutions that are different

among subpopulations. Despite the fact that subpopulations were selected at random without the use of any knowledge.

Page 43: 2009 MSc Presentation for Parallel-MEGA

43

Conclusions

MEGA and PMEGA behave comparably. The differences observed between the final Pareto-front approximations produced, are partly due to the way PMEGA splits the working population into subpopulations.

PMEGA can achieve a maximum speedup of 6.6. On the other hand is stated clearly that when using a two cores CPU, PMEGA completes in the half time that MEGA needs and when using a four cores CPU, PMEGA completes almost in one third of the time that MEGA needs.

Number of subpopulations has an effect on the quality of the results obtained. With more subpopulations the more uniform the resulting solutions are, at least in objective space.

Page 44: 2009 MSc Presentation for Parallel-MEGA

44

Future Work

Algorithmic improvements in the way subpopulations are selected with the aid of knowledge-driven approaches in order to improve the quality of the optimization search and reduce the number of iterations needed for convergence.

Also an investigation if further parallelization is possible in order to achieve a better maximum speedup.

Page 45: 2009 MSc Presentation for Parallel-MEGA

45

Bibliography

1. M. Tomassini, Parallel and Distributed Evolutionary Algorithms: A Review, Institute of Computer Science, University of Lausanne.

2. Bäck T., Evolutionary Algorithms in Theory and Practice. Oxford University Press 1996

3. http://en.wikipedia.org/wiki/Optimization_%28mathematics%29 (accessed December 18, 2009)

4. http://en.wikipedia.org/wiki/Charles_Darwin (accessed December 1, 2009)

5. Darwin, C. R. 1876, The origin of species by means of natural selection, or the preservation of favoured races in the struggle for life. London: John Murray. 6th ed, with additions and corrections.

6. http://anthro.palomar.edu/synthetic/default.htm (accessed December 1, 2009)

7. C. A. Nicolaou, J. Apostolakis,, and C. S. Pattichis, “De Novo Drug Design Using Multiobjective Evolutionary Graphs”, J. Chem. Inf. Model., vol. 49(2), pp. 295-307, 2009.

Page 46: 2009 MSc Presentation for Parallel-MEGA

46

Bibliography

8. Baringhaus, K.-H.; Matter, H. Efficient strategies for lead optimization by simultaneously addressing affinity, selectivity and pharmacokinetic parameters. In Chemoinformatics in Drug DiscoVery; Oprea, T., Ed.; Wiley-VCH: Weinheim, Germany, 2004; pp 333-379.

9. Schneider, G.; Fechner, U. Computer-based de novo design of druglike molecules. Nat. ReV. Drug DiscoVery 2005, 4, 649–663.

10. Xu, J.; Hagler, A. Chemoinformatics and drug discovery. Molecules 2002, 7, 566–600.

11. Bohacek, R. S.; Martin, C.; Guida, W. C. The Art and Practice of Structure-Based Drug Design: A Molecular Modelling Approach. Med. Res. ReV. 1996, 16, 3–50.

12. Evolutionary Algorithms 8 Population models – Parallel Implementations, http://www.geatbx.com/docu/algindex-07.html (accessed August 20, 2009).

13. P. Adamidis, Parallel Evolutionary algorithms: A Review, Dept. of Applied Informatics, University of Macedonia.

Page 47: 2009 MSc Presentation for Parallel-MEGA

47

Bibliography

14. http://en.wikipedia.org/wiki/Speedup (accessed December 1, 2009).

15. http://en.wikipedia.org/wiki/Gene_Amdahl (accessed December 1, 2009).

16. http://en.wikipedia.org/wiki/ Amdahl's_law (accessed December 1, 2009).

17. Fonseca, C. M.; Fleming, P. J. Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization. In Proceedings of the Fifth International Conference on Genetic Algorithms; Forrest, S., Ed.; Morgan Kaufmann: San Mateo, CA, 1993; pp 416-423.

18. Y. Colette, P. Siarry, Multiobjective Optimization: Principles and Case Studies; Springer-Verlag: Berlin, Germany, 2004.

19. http://www.python.org (accessed August 20, 2009).

20. http://www.parallelpython.com (accessed September 23, 2009).

21. http://code.google.com/p/mpi4py (accessed September 23, 2009).

22. http://en.wikipedia.org/wiki/CUDA (accessed September 23, 2009).

Page 48: 2009 MSc Presentation for Parallel-MEGA

48

Bibliography

23. http://en.wikipedia.org/wiki/OpenCL (accessed September 23, 2009).

24. http://pypi.python.org/pypi/pycuda (accessed September 23, 2009).

25. http://pypi.python.org/pypi/pyopencl (accessed September 23, 2009).

26. http://en.wikipedia.org/wiki/Global_Interpreter_Lock (accessed September 23, 2009).

27. http://docs.python.org/library/multiprocessing.html (accessed September 23, 2009).

28. http://docs.python.org/library/threading.html #module-threading (accessed September 23, 2009).

29. J. Noller, R. Oudkerk, PEP 371 -- Addition of the multiprocessing package to the standard library, (accessed Mar 19, 2009).

30. http://en.wikipedia.org/wiki/Cluster_analysis (accessed December 16 2009)

31. http://www.cs.ucy.ac.cy/itab2009/

32. http://en.wikipedia.org/wiki/Jaccard_index (accessed September 28, 2009).

Page 49: 2009 MSc Presentation for Parallel-MEGA

49

Bibliography

34. Wheeler, D. L.; Barrett, T.; Benson, D. A.; Bryant, S. H.; Canese, K.; Chetvernin, V.; Church, D. M.; DiCuccio, M.; Edgar, R.; Federhen, S.; Geer, L. Y.; Helmberg, W.; Kapustin, Y.; Kenton, D. L.; Khovayko, O.; Lipman, D. J.; Madden, T. L.; Maglott, D. R.; Ostell, J.; Pruitt, K. D.; Schuler, G. D.; Schriml, L. M.; Sequeira, E.; Sherry, S. T.; Sirotkin, K.; Souvorov, A.; Starchenko, G.; Suzek, T. O.; Tatusov, R.; Tatusova, T. A.; Wagner, L.; Yaschenko, E. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2006, 34, D173–D180.

35. http://www.noesisinformatics.com (accessed September 28, 2009).

36. http://molinspiration.com (accessed December 16 2009)

37. Handl, J.; Kell, D. B.; Knowles, J. Multiobjective Optimization in Bioinformatics and Computational Biology. IEEE/ACM Trans. Comput. Biol. Bioinf. 2007, 4, 279–292.

38. Alberto, I.; Azcarate, C.; Mallor, F. & Mateo, P.M., Multiobjective Evolutionary Algorithms. Pareto Rankings.

Page 50: 2009 MSc Presentation for Parallel-MEGA

50

Bibliography

39. Yong Liang, Kwong-Sak Laung, Tony Shu am Mok, A Novel Evolutionary Drug Scheduling Model in Cancer Cemotherapy, 2004

40. H. S. Lopes, Evolutionary Algorithms for the Protein Folding Problem: A Review and Current Trends, 2008

41. C. M. V. Benitez, H. S. Lopes, A Parallel Genetic Algorithm for Protein Folding Prediction Using the 3D-HP Side Chain Model, 2009

42. E. Alba, Parallel Evolutionary Algorithms can achieve super-linear Performance, University of Malaga.

43. N. Matloff, F. Hsu Tutorial on Threads Programming with Python, 2007.

44. C. Nicolaou, C. Kannas, C. Pattichis, Optimal graph Design Using A Knowledge-driven Multi-objective Evolutionary Graph Algorithm In Proceedings of the Ninth International Conference on Information Techonology and Applications in Biomedicine, Larnaca Cyprus, November 5 - 7, 2009