
Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)



Applications of Multi-Objective Evolutionary Algorithms


ADVANCES IN NATURAL COMPUTATION

Series Editor: Xin Yao (The University of Birmingham, UK)

Published

Vol. 2: Recent Advances in Simulated Evolution and Learning
Eds: Kay Chen Tan, Meng Hiot Lim, Xin Yao & Lipo Wang


Applications of Multi-Objective Evolutionary Algorithms

Advances in Natural Computation - Vol. 1

editors

Carlos A Coello Coello (CINVESTAV-IPN, Mexico)

Gary B Lamont (Air Force Institute of Technology, Wright-Patterson AFB, USA)

World Scientific
NEW JERSEY • LONDON • SINGAPORE • BEIJING • SHANGHAI • HONG KONG • TAIPEI • CHENNAI


Published by

World Scientific Publishing Co. Pte. Ltd.

5 Toh Tuck Link, Singapore 596224

USA office: 27 Warren Street, Suite 401-402, Hackensack, NJ 07601

UK office: 57 Shelton Street, Covent Garden, London WC2H 9HE

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.

APPLICATIONS OF MULTI-OBJECTIVE EVOLUTIONARY ALGORITHMS
Advances in Natural Computation — Vol. 1

Copyright © 2004 by World Scientific Publishing Co. Pte. Ltd.

All rights reserved. This book, or parts thereof, may not be reproduced in any form or by any means, electronic or mechanical, including photocopying, recording or any information storage and retrieval system now known or to be invented, without written permission from the Publisher.

For photocopying of material in this volume, please pay a copying fee through the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, USA. In this case permission to photocopy is not required from the publisher.

ISBN 981-256-106-4

Printed in Singapore by World Scientific Printers (S) Pte Ltd


FOREWORD

Computer science is playing an increasingly important role in many other fields, such as biology, physics, chemistry, economics and sociology. Many developments in these classical fields rely crucially on advanced computing technologies. At the same time, these fields have inspired new ideas and novel paradigms of computing, such as evolutionary, neural, molecular and quantum computation. Natural computation refers to the study of computational systems that use ideas and draw inspiration from natural systems, whether they be biological, ecological, molecular, neural, quantum or social.

World Scientific Publishing Co. publishes an exciting book series on "Advances in Natural Computation." The series aims at serving as a central source of reference for the theory and applications of natural computation, establishing the state of the art, disseminating the latest research discoveries, and providing potential textbooks to senior undergraduate and postgraduate students. The series publishes monographs, edited volumes and lecture notes on a wide range of topics in natural computation. Examples of possible topics include, but are not limited to, evolutionary algorithms, evolutionary games, evolutionary economics, evolvable hardware, neural networks, swarm intelligence, quantum computation, molecular computation, ecological informatics, etc.

This volume edited by Drs. Carlos A. Coello Coello and Gary B. Lamont represents an excellent start to this leading book series. It addresses the fast-growing field of multi-objective optimization, especially its applications in many disciplines: from engineering design to spectroscopic data analysis, from groundwater monitoring to regional planning, from autonomous vehicle navigation to polymer extrusion, and from bioinformatics to computational


finance. It complements the journal papers^a beautifully and should be on the bookshelf of anyone who is interested in multi-objective optimization and its diverse applications.

The two editors of this volume are both leading experts on evolutionary multi-objective optimization. They have done an outstanding job in putting together the most comprehensive book on the applications of evolutionary multi-objective optimization. I hope all readers will enjoy this volume as much as I do!

The upcoming volume in this book series on "Recent Advances in Simulated Evolution and Learning" will appear soon. If you are interested in writing or editing a volume for this book series, please get in touch with the Series Editor.

Xin Yao
Series Editor
Advances in Natural Computation
August 2004

^a Carlos A. Coello Coello, Special Issue on Evolutionary Multi-objective Optimization, IEEE Transactions on Evolutionary Computation, 7(2), April 2003.


PREFACE

The intent of this book is to present a variety of multi-objective problems (MOPs) which have been solved using multi-objective evolutionary algorithms (MOEAs). Due to obvious space constraints, the set of applications included in the book is relatively small. However, the editors believe that such a set is representative of the current trends among both researchers and practitioners across many disciplines.

This book aims not only to present a representative sampling of real-world MOPs currently being tackled by practitioners, but also to provide some insights into the different aspects related to the use of MOEAs in real-world applications. The reader should find the material particularly useful in analyzing the pragmatic (and sometimes peculiar) point of view of each contributor regarding how to choose a certain MOEA and how to validate the results using metrics and statistics.

Another aspect worth addressing is the limited variety of MOEAs adopted throughout this book, which is not as diverse as that presented in the literature. This indicates a certain degree of maturity within this research community, and at the same time defines some important current trends among practitioners. By reading the chapters, it is evident that certain MOEAs that some researchers in the field might consider "old-fashioned" (e.g., the Niched Pareto Genetic Algorithm) continue to be used by various practitioners. At the same time, it is evident that other "modern" MOEAs (e.g., the Non-dominated Sorting Genetic Algorithm II) with available software are becoming increasingly popular. As MOEA software evolves and each incorporates an increasingly larger variety of operators, generic MOEA software should become available. For example, such software is currently being integrated into various optimization packages incorporating a variety of search techniques. Of course, the MOEA discipline continues to evolve more sophisticated variants, hybridization techniques, unique methodologies depending on the problem domain, and use of efficient parallel computation, with application to an increasingly broader class of high-dimensional complex problems.
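The MOEAs named above all build on the same basic notion of Pareto dominance. As a minimal illustrative sketch (not taken from the book), the following Python fragment shows dominance between objective vectors and a naive non-dominated filter; all names here are hypothetical, and minimization of every objective is assumed:

```python
# Illustrative sketch of Pareto dominance, the notion shared by MOEAs
# such as the NPGA and NSGA-II mentioned above. Minimization assumed.

def dominates(a, b):
    """True if objective vector a Pareto-dominates b: a is no worse in
    every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and \
           any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Return the points not dominated by any other point (the
    approximation of the Pareto front held in a population)."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Example with two conflicting objectives, e.g. (cost, weight):
front = non_dominated([(1, 5), (2, 3), (4, 4), (3, 2), (5, 1)])
print(sorted(front))  # (4, 4) is dominated by (2, 3) and drops out
```

Real MOEAs add selection pressure and diversity maintenance on top of this filter; the sketch only shows the dominance relation itself.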

The spectrum of real-world optimization MOPs dealt with in this book includes, among others, aircraft design, robot planning, identification of interesting qualitative features in biological sequences, circuit design, production system control, city planning, ecological system management, and bioremediation of chemical pollution. Thus the organization of the book is structured around engineering, biology, chemistry, physics, and management disciplines. Throughout this book, the reader should find not only problems with different degrees of complexity, but also with different practical requirements, user constraints, and a variety of MOEA solution approaches.

We would like to thank all the contributors for providing their insights regarding the use of MOEAs in solving real-world multi-objective problems. Without their serious consideration, contemplation, and devoted efforts in general, the discipline of MOEAs would not have evolved as well as this book. Such activity makes MOEAs a viable approach in finding effective and efficient solutions to complex MOPs. Observe that the contributors are from many countries, reflecting the international interest in MOEA applications and the interdisciplinary nature of optimization research.

As indicated, this book presents a collection of MOEA applications which provide the professional and the practitioner with direction to achieving "good" results in their selected problem domain. For the beginner, the introductory chapter and the variety of MOEA application chapters should provide an understanding of generic MOPs and of MOEA parameter and operator selection leading to acceptable results. For the expert, the variety of MOP applications generates a wider understanding of MOEA operator selection and insight into the path leading to problem solutions.

Additional applications and theoretical MOEA papers can be found at the Evolutionary Multi-Objective Optimization (EMOO) Repository web site at http://delta.cs.cinvestav.mx/~ccoello/EMOO/ with mirrors at http://www.lania.mx/~ccoello/EMOO/ and at http://neo.lcc.uma.es/emoo/. As of mid 2004, the EMOO Repository contained over 1700 bibliographic references, including more than 100 PhD theses, over 1000 conference papers and 400 journal papers. However, the EMOO Repository is continually being updated. It holds not only a large collection of bibliographic references (many of them electronically available), but also public-domain MOP and MOEA software and sample test problems, as well as some other useful information which allows one to start working in this exciting research field.

The general organization of the book is based on the types of applications considered. Chapter 1 provides some preliminary material intended for those not familiar with the basic concepts and terminology adopted in evolutionary multi-objective optimization. This first chapter also provides a brief description of each of the other 29 chapters that make up this book. These 29 chapters are divided into four parts. The first part is the largest; it consists of engineering applications (e.g., civil, mechanical, aeronautical, and chemical engineering, among others) and includes chapters 2 to 13. The second part consists of scientific applications (e.g., computer science, bioinformatics and physics, among others) and includes chapters 14 to 19. The third part consists of industrial applications (e.g., design, manufacture, packing and scheduling, among others) and includes chapters 20 to 24. The fourth and last part consists of miscellaneous applications such as data mining, finance and management; it includes chapters 25 to 30.

The first editor gratefully acknowledges the support obtained from CINVESTAV-IPN and from the NSF-CONACyT project 42435-Y. He also thanks Dr. Luis Gerardo de la Fraga for his continuous support and Erika Hernandez Luna for her valuable help during the preparation of this book. The second editor acknowledges the support of his graduate students, including Rick Day and Mark Kleeman. We also acknowledge the use of an academic license of the Word2TeX™ converter (developed by Chikrii Softlab) to convert some of the chapters submitted in MS Word™ to LaTeX 2ε, which is the tool that we adopted to process the entire manuscript.

The editors thank Steven Patt, from World Scientific, who was very professional, incredibly helpful, and always replied promptly to all of the editors' queries. The editors also thank Prof. Xin Yao for deciding to include this book within his Advances in Natural Computation series.


Finally, the editors would like to thank their spouses for letting them spend the many hours on the preparation of the manuscript. Also, we thank the many students, researchers and practitioners working in this field. They have contributed directly and indirectly a large variety of ideas that continue to expand this interdisciplinary field of multi-objective evolutionary computation. Their constant efforts and contributions have made possible this book and the innovative publications that we expect to see in the years to come.

Carlos A. Coello Coello
Gary B. Lamont
August 2004


CONTENTS

Foreword v

Preface vii

1 An Introduction to Multi-Objective Evolutionary Algorithms and Their Applications 1
1.1 Introduction 1
1.2 Basic Concepts 3
1.3 Basic Operation of a MOEA 4
1.4 Classifying MOEAs 6

1.4.1 Aggregating Functions 6
1.4.2 Population-Based Approaches 7
1.4.3 Pareto-Based Approaches 8

1.5 MOEA Performance Measures 11
1.6 Design of MOEA Experiments 14

1.6.1 Reporting MOEA Computational Results 15
1.7 Layout of the Book 16

1.7.1 Part I: Engineering Applications 16
1.7.2 Part II: Scientific Applications 19
1.7.3 Part III: Industrial Applications 20
1.7.4 Part IV: Miscellaneous Applications 21

1.8 General Comments 22
References 23

2 Applications of Multi-Objective Evolutionary Algorithms in Engineering Design 29
2.1 Introduction 29
2.2 Multi-Objective Evolutionary Algorithm 31

2.2.1 Algorithms 33



2.3 Examples 36
2.3.1 Design of a Welded Beam 36
2.3.2 Preliminary Design of Bulk Carrier 40
2.3.3 Design of Robust Airfoil 46

2.4 Summary and Conclusions 50
References 52

3 Optimal Design of Industrial Electromagnetic Devices: A Multiobjective Evolutionary Approach 53
3.1 Introduction 53
3.2 The Algorithms 54

3.2.1 Non-Dominated Sorting Evolution Strategy Algorithm (NSESA) 55
3.2.1.1 Pareto Gradient Based Algorithms (PGBA) and Hybrid Strategies 58
3.2.1.2 Pareto Evolution Strategy Algorithm (PESTRA) 59
3.2.1.3 Multi Directional Evolution Strategy Algorithm (MDESTRA) 61
3.3 Case Studies 61

3.3.1 Shape Design of a Shielded Reactor 62
3.3.1.1 Direct Problem 62
3.3.1.2 Inverse Problem 63
3.3.1.3 Sample-and-Rank Approach 65
3.3.1.4 Optimization Results 67

3.3.2 Shape Design of an Inductor for Transverse-Flux-Heating of a Non-Ferromagnetic Strip 69
3.3.2.1 Direct Problem 70
3.3.2.2 Inverse Problem 72
3.3.2.3 Sample-and-Rank Approach 73
3.3.2.4 Optimization Results 73

3.4 Conclusions 75
References 75

4 Groundwater Monitoring Design: A Case Study Combining Epsilon Dominance Archiving and Automatic Parameterization for the NSGA-II 79
4.1 Introduction 79
4.2 Prior Work 81


4.3 Monitoring Test Case Problem 83
4.3.1 Test Case Overview 83
4.3.2 Problem Formulation 83

4.4 Overview of the ε-NSGA-II Approach 84
4.4.1 Searching with the NSGA-II 86
4.4.2 Archive Update 87
4.4.3 Injection and Termination 89

4.5 Results 91
4.6 Discussion 97
4.7 Conclusions 97
References 98

5 Using a Particle Swarm Optimizer with a Multi-Objective Selection Scheme to Design Combinational Logic Circuits 101
5.1 Introduction 101
5.2 Problem Statement 102
5.3 Our Proposed Approach 104
5.4 Use of a Multi-Objective Approach 107
5.5 Comparison of Results 109

5.5.1 Example 1 109
5.5.2 Example 2 110
5.5.3 Example 3 112
5.5.4 Example 4 114
5.5.5 Example 5 117
5.5.6 Example 6 118

5.6 Conclusions and Future Work 120
Acknowledgements 122
References 122

6 Application of Multi-Objective Evolutionary Algorithms in Autonomous Vehicles Navigation 125
6.1 Introduction 126
6.2 Autonomous Vehicles 127

6.2.1 Experimental Setup 127
6.2.2 Vehicle Model 128
6.2.3 Relative Sensor Models 129

6.2.3.1 Steering Encoder 129
6.2.3.2 Wheel Encoder 129


6.2.4 Absolute Sensor Models 130
6.2.4.1 Global Positioning Systems 130
6.2.4.2 Inertial Measurement Unit 131

6.2.5 Simulation and Measurement of the Vehicle State 131
6.2.6 Prediction of the Vehicle State 131

6.3 Parameter Identification of Autonomous Vehicles 133
6.3.1 Problem Formulation 133
6.3.2 A General Framework for Searching Pareto-Optimal Solutions 134
6.3.3 Selection of a Single Solution by CoGM 136

6.4 Multi-Objective Optimization 138
6.4.1 Evaluation of Functions 138

6.4.1.1 Rank Function 138
6.4.1.2 Fitness Function 139

6.4.2 Search Methods 139
6.4.2.1 MCEA 139
6.4.2.2 MOGM 140

6.5 Application of Parameter Identification of an Autonomous Vehicle 141

6.6 Conclusions 148
6.7 Acknowledgement 151
References 151

7 Automatic Control System Design via a Multiobjective Evolutionary Algorithm 155
7.1 Introduction 155
7.2 Performance Based Design Unification and Automation 158

7.2.1 The Overall Design Architecture 159
7.2.2 Control System Formulation 160
7.2.3 Performance Specifications 161

7.2.3.1 Stability 161
7.2.3.2 Step Response Specifications 162
7.2.3.3 Disturbance Rejection 162
7.2.3.4 Robust Stability 162
7.2.3.5 Actuator Saturation 163
7.2.3.6 Minimal Controller Order 164

7.3 An Evolutionary ULTIC Design Application 165
7.4 Conclusions 172
References 174


8 The Use of Evolutionary Algorithms to Solve Practical Problems in Polymer Extrusion 177
8.1 Introduction 177
8.2 Polymer Extrusion 178

8.2.1 Single Screw Extrusion 178

8.2.2 Co-Rotating Twin-Screw Extrusion 179
8.2.3 Optimization Characteristics 183

8.3 Optimization Algorithm 184

8.3.1 Multi-Objective Optimization 184
8.3.2 Reduced Pareto Set Genetic Algorithm with Elitism (RPSGAe) 186

8.3.3 Travelling Salesman Problem 187

8.4 Results and Discussion 189
8.4.1 Single Screw Extrusion 189
8.4.2 Twin-Screw Extrusion 194

8.5 Conclusions 196
Acknowledgments 197
References 197

9 Evolutionary Multi-Objective Optimization of Trusses 201
9.1 Introduction 201
9.2 Related Work 202
9.3 ISPAES Algorithm 204

9.3.1 Inverted "ownership" 207
9.3.2 Shrinking the Objective Space 207

9.4 Optimization Examples 212

9.4.1 Optimization of a 49-bar Plane Truss 212
9.4.1.1 The 49-bar Plane Truss as a Single-Objective Optimization Problem with Constraints 212
9.4.1.2 The 49-bar Plane Truss as a Multi-Objective Optimization Problem with Constraints 215
9.4.2 Optimization of a 10-bar Plane Truss 215

9.4.2.1 The 10-bar Plane Truss as a Single-Objective Optimization Problem with Constraints 216

9.4.2.2 The 10-bar Plane Truss as a Multi-Objective Optimization Problem with Constraints 217

9.4.3 Optimization of a 72-bar 3D Structure 217


9.4.3.1 The 72-bar 3D Structure in Continuous Search Space as a Single-Objective Optimization Problem with Constraints 219

9.4.3.2 The 72-bar 3D Structure in Discrete Search Space as a Single-Objective Optimization Problem with Constraints 222

9.5 Final Remarks and Future Work 222
Acknowledgments 223
References 223

10 City and Regional Planning via a MOEA: Lessons Learned 227
10.1 The Traditional Approach 227
10.2 The MOEA Approach 229
10.3 City Planning: Provo and Orem 231
10.4 Regional Planning: The WFMR 235
10.5 Coordinating Regional and City Planning 238
10.6 Conclusions 239
Acknowledgments 240
References 240

11 A Multi-Objective Evolutionary Algorithm for the Covering Tour Problem 247
11.1 Introduction 248
11.2 The Covering Tour Problem 251

11.2.1 The Mono-Objective Covering Tour Problem 251
11.2.2 The Bi-Objective Covering Tour Problem 252
11.2.3 Optimization Methods 253

11.2.3.1 A Heuristic Method 253
11.2.3.2 An Exact Method 254

11.3 A Multi-Objective Evolutionary Algorithm for the Bi-Objective Covering Tour Problem 255
11.3.1 General Framework 255
11.3.2 Solution Coding 256
11.3.3 Genetic Operators 257

11.3.3.1 The Crossover Operator 257
11.3.3.2 The Mutation Operator 258

11.4 Computational Results 258
11.5 Conclusions and Outlooks 260


Acknowledgement 261
References 261

12 A Computer Engineering Benchmark Application for Multiobjective Optimizers 269
12.1 Introduction 269
12.2 Packet Processor Design 271

12.2.1 Design Space Exploration 272
12.2.2 Basic Models and Methods 274

12.3 Software Architecture 281
12.3.1 General Considerations 281
12.3.2 Interface Description 283

12.4 Test Cases 284
12.4.1 Problem Instances 284
12.4.2 Simulation Results 286

12.5 Summary 289
Acknowledgments 292
References 292

13 Multiobjective Aerodynamic Design and Visualization of Supersonic Wings by Using Adaptive Range Multiobjective Genetic Algorithms 295
13.1 Introduction 295
13.2 Adaptive Range Multiobjective Genetic Algorithms 297
13.3 Multiobjective Aerodynamic Optimization 300

13.3.1 Formulation of Optimization 300
13.3.2 CFD Evaluation 302
13.3.3 Overview of Non-Dominated Solutions 303

13.4 Data Mining by Self-Organizing Map 305
13.4.1 Neural Network and SOM 305
13.4.2 Cluster Analysis 307
13.4.3 Visualization of Design Tradeoffs: SOM of Tradeoffs 308
13.4.4 Data Mining of Design Space: SOM of Design Variables 310
13.5 Conclusions 311
Acknowledgements 312
References 313


14 Applications of a Multi-Objective Genetic Algorithm in Chemical and Environmental Engineering 317
14.1 Introduction 317
14.2 Physical Problem 319
14.3 Genetic Algorithm 320
14.4 Problem Formulation 322
14.5 Conclusions 337
References 338

15 Multi-Objective Spectroscopic Data Analysis of Inertial Confinement Fusion Implosion Cores: Plasma Gradient Determination 341
15.1 Introduction 342
15.2 Self-Consistent Analysis of Data from X-ray Images and Line Spectra 344
15.3 A Niched Pareto Genetic Algorithm for Multi-Objective Spectroscopic Data Analysis 347
15.4 Test Cases 349
15.5 Application to Direct-Drive Implosions at GEKKO XII 354
15.6 Application to Indirect-Drive Implosions at OMEGA 357
15.7 Conclusions 359
Acknowledgments 361
References 361

16 Application of Multiobjective Evolutionary Optimization Algorithms in Medicine 365
16.1 Introduction 365
16.2 Medical Image Processing 366

16.2.1 Medical Image Reconstruction 367
16.3 Computer Aided Diagnosis 369

16.3.1 Optimization of Diagnostic Classifiers 370
16.3.2 Rules-Based Atrial Disease Diagnosis 370

16.4 Treatment Planning 372
16.4.1 Brachytherapy 373

16.4.1.1 Dose Optimization for High Dose Rate Brachytherapy 373

16.4.1.2 Inverse Planning for HDR Brachytherapy 374
16.4.2 External Beam Radiotherapy 376


16.4.2.1 Geometrical Optimization of Beam Orientations 377

16.4.2.2 Intensity Modulated Beam Radiotherapy Dose Optimization 379

16.4.3 Cancer Chemotherapy 381
16.5 Data Mining 382

16.5.1 Partial Classification 383
16.5.2 Identification of Multiple Gene Subsets 385

16.6 Conclusions 386
References 387

17 On Machine Learning with Multiobjective Genetic Optimization 393
17.1 Introduction 393
17.2 An Overview 396

17.2.1 Machine Learning 396
17.2.2 Generalization 398
17.2.3 Multiobjective Evolutionary Algorithms (MOEA) & Real-World Applications (RWA) 401
17.2.3.1 Achieving Diversity 403
17.2.3.2 Monitoring Convergence 403
17.2.3.3 Avoiding Local Convergence 405

17.3 Problem Formulation 406
17.4 MOEA for Partitioning 410

17.4.1 The Algorithm 411
17.4.2 Chromosome Representation 412
17.4.3 Genetic Operators 412
17.4.4 Constraints & Heuristics 412
17.4.5 Convergence 413

17.5 Results and Discussion 415
17.6 Summary & Future Work 419
Acknowledgements 421
References 421

18 Generalized Analysis of Promoters: A Method for DNA Sequence Description 427
18.1 Introduction 428
18.2 Generalized Clustering 429
18.3 Problem: Discovering Promoters in DNA Sequences 432


18.4 Biological Sequence Description Methods 434
18.5 Experimental Algorithm Evaluation 438
18.6 Concluding Remarks 442
Appendix 443
References 443

19 Multi-Objective Evolutionary Algorithms for Computer Science Applications 451
19.1 Introduction 451
19.2 Combinatorial MOP Functions 452
19.3 MOP NPC Examples 453

19.3.1 Multi-Objective Quadratic Assignment Problem 453
19.3.1.1 Literary QAP Definition 455
19.3.1.2 Mathematical QAP Definition 455
19.3.1.3 General mQAP 455
19.3.1.4 Mathematical mQAP 455
19.3.1.5 Mapping QAP to MOEA 457

19.3.2 MOEA mQAP Results and Analysis 459
19.3.2.1 Design of mQAP Experiments and Testing 459
19.3.2.2 QAP Analysis 460

19.3.3 Modified Multi-Objective Knapsack Problem (MMOKP) 465

19.3.4 MOEA MMOKP Testing and Analysis 471
19.4 MOEA BB Conjectures for NPC Problems 476
19.5 Future Directions 478
References 478

20 Design of Fluid Power System Using a Multi-Objective Genetic Algorithm 483
20.1 Introduction 483
20.2 The Multi-Objective Optimization Problem 485
20.3 Multi-Objective Genetic Algorithms 486

20.3.1 The Multi-Objective Struggle GA 487
20.3.2 Genome Representation 489
20.3.3 Similarity Measures 489

20.3.3.1 Attribute Based Distance Function 490
20.3.3.2 Phenotype Based Distance Function 490
20.3.3.3 Real Number Distance 490
20.3.3.4 Catalog Distance 491


20.3.3.5 Overall Distance 492
20.3.4 Crossover Operators 493

20.4 Fluid Power System Design 494
20.4.1 Optimization Results 496

20.5 Mixed Variable Design Problem 498
20.5.1 Component Catalogs 499
20.5.2 Optimization Results 499

20.6 Discussion and Conclusions 500
References 502

21 Elimination of Exceptional Elements in Cellular Manufacturing Systems Using Multi-Objective Genetic Algorithms 505
21.1 Introduction 506
21.2 Multiple Objective Optimization 510
21.3 Development of the Multi-Objective Model for Elimination of EEs 511
21.3.1 Assumptions 511
21.3.2 The Set of Decision Criteria 511
21.3.3 Problem Formulation 511

21.3.3.1 Notation 511
21.3.3.2 The Objective Functions 512
21.3.3.3 The Constraints 514
21.3.3.4 The Multi-Objective Optimization Problem (MOP) 514
21.3.4 A Numerical Example 515

21.4 The Proposed MOGA 517
21.4.1 Pseudocode for the Proposed MOGA 518
21.4.2 Fitness Calculation 519
21.4.3 Selection 520
21.4.4 Recombination 520
21.4.5 Updating the Elite Set 520
21.4.6 Stopping Criteria 520

21.5 Parameter Setting 521
21.6 Experimentation 522
21.7 Conclusion 525
References 526


22 Single-Objective and Multi-Objective Evolutionary Flowshop Scheduling 529
22.1 Introduction 529
22.2 Permutation Flowshop Scheduling Problems 531
22.3 Single-Objective Genetic Algorithms 532

22.3.1 Implementation of Genetic Algorithms 532
22.3.2 Comparison of Various Genetic Operations 535
22.3.3 Performance Evaluation of Genetic Algorithms 539

22.4 Multi-Objective Genetic Algorithms 541
22.4.1 NSGA-II Algorithm 542
22.4.2 Performance Evaluation of the NSGA-II Algorithm 544
22.4.3 Extensions to Multi-Objective Genetic Algorithms 548

22.5 Conclusions 551
References 552

23 Evolutionary Operators Based on Elite Solutions for Bi-Objective Combinatorial Optimization 555
23.1 Introduction 556
23.2 MOCO Problems and Solution Sets 557
23.3 An Evolutionary Heuristic for Solving biCO Problems 559

23.3.1 Overview of the Heuristic 559
23.3.2 The Initial Population 561
23.3.3 Bound Sets and Admissible Areas 562
23.3.4 The Genetic Map 563
23.3.5 The Crossover Operator 564
23.3.6 The Path-Relinking Operator 565
23.3.7 The Local Search Operator 566

23.4 Application to Assignment and Knapsack Problems with Two Objectives 567
23.4.1 Problem Formulation 567
23.4.2 Experimental Protocol 568

23.5 Numerical Experiments with the Bi-Objective Assignment Problem 569
23.5.1 Minimal Complete Solution Sets and Initial Elite Solution Set 569
23.5.2 Our Results Compared with Those Existing in the Literature 571
23.6 Numerical Experiments with the Bi-Objective Knapsack Problem 573


23.6.1 Minimal Complete Solution Sets and the Initial Elite Solution Set 574

23.6.2 Our Results Compared with Those Existing in the Literature 575

23.7 Conclusion and Perspectives 575
References 577

24 Multi-Objective Rectangular Packing Problem 581
24.1 Introduction 582
24.2 Formulation of Layout Problems 583

24.2.1 Definition of RP 583
24.2.2 Multi-Objective RP 583

24.3 Genetic Layout Optimization 584
24.3.1 Representations 585

24.3.1.1 Sequence-Pair 586
24.3.1.2 Encoding System 586

24.3.2 GA Operators 587
24.3.2.1 Placement-Based Partially Exchanging Crossover 589
24.3.2.2 Mutation Operator 589

24.4 Multi-Objective Optimization Problems by Genetic Algorithms and Neighborhood Cultivation GA 589
24.4.1 Multi-Objective Optimization Problems and Genetic Algorithm 589
24.4.2 Neighborhood Cultivation Genetic Algorithm 591

24.5 Numerical Examples 593
24.5.1 Parameters of GAs 594
24.5.2 Evaluation Methods 594

24.5.2.1 Sampling of the Pareto Frontier Lines of Intersection (ILI) 594

24.5.2.2 Maximum, Minimum and Average Values of Each Object of Derived Solutions (IMMA) 595

24.5.3 Results 595
24.5.3.1 Layout of the Solution 596
24.5.3.2 AMI33 597
24.5.3.3 rdm500 598

24.6 Conclusion 600
References 600


25 Multi-Objective Algorithms for Attribute Selection in Data Mining 603
25.1 Introduction 603
25.2 Attribute Selection 605
25.3 Multi-Objective Optimization 606
25.4 The Proposed Multi-Objective Methods for Attribute Selection 608
25.4.1 The Multi-Objective Genetic Algorithm (MOGA) 609

25.4.1.1 Individual Encoding 610
25.4.1.2 Fitness Function 610
25.4.1.3 Selection Methods and Genetic Operators 610

25.4.2 The Multi-Objective Forward Sequential Selection Method (MOFSS) 611

25.5 Computational Results 612
25.5.1 Results for the "Return All Non-Dominated Solutions" Approach 615
25.5.2 Results for the "Return the 'Best' Non-Dominated Solution" Approach 616
25.5.3 On the Effectiveness of the Criterion to Choose the "Best" Solution 620
25.6 Conclusions and Future Work 623
References 624

26 Financial Applications of Multi-Objective Evolutionary Algorithms: Recent Developments and Future Research Directions 627
26.1 Introduction 627
26.2 A Justification for MOEAs in Financial Applications 628
26.3 Selected Financial Applications of MOEAs 631

26.3.1 Portfolio Selection Problems 631
26.3.2 Vederajan et al. 633
26.3.3 Lin et al. 636
26.3.4 Fieldsend & Singh 639
26.3.5 Schlottmann & Seese 642

26.4 Conclusion and Future Research Directions 646
26.5 Acknowledgement 649
References 649


27 Evolutionary Multi-Objective Optimization Approach to Constructing Neural Network Ensembles for Regression 653
27.1 Introduction 653
27.2 Multi-Objective Optimization of Neural Networks 655
27.2.1 Parameter and Structure Representation of the Network 655

27.2.2 Objectives in Network Optimization 656
27.2.3 Mutation and Learning 658
27.2.4 Elitist Non-Dominated Sorting and Crowded Tournament Selection 659
27.3 Selecting Ensemble Members 659
27.4 Case Studies 661

27.4.1 Experimental Settings 66127.4.2 Results on the Ackley Function 66127.4.3 Results on the Macky-Glass Function 666

27.5 Discussions and Conclusions 669Acknowledgements 672References 672

28 Optimizing Forecast Model Complexity Using Multi-Objective Evolutionary Algorithms 675
28.1 Introduction 675
28.2 Artificial Neural Networks 677
28.3 Optimal Model Complexity 681
28.3.1 Early Stopping 681
28.3.2 Weight Decay Regularization and Summed Penalty Terms 681
28.3.3 Node and Weight Addition/Deletion 682
28.3.4 Problems with These Methods 682
28.4 Using Evolutionary Algorithms to Discover the Complexity/Accuracy Trade-Off 684
28.4.1 Pareto Optimality 684
28.4.2 Extent, Resolution and Density of Estimated Pareto Set 685
28.4.3 The Use of EMOO 687
28.4.4 A General Model 689
28.4.4.1 mutate() 689
28.4.4.2 weightadjust() 691


28.4.4.3 unitadjust() 691
28.4.4.4 The Elite Archive 691
28.4.4.5 replace() 692
28.4.5 Implementation and Generalization 692
28.5 Empirical Validation 693
28.5.1 Data 694
28.5.2 Model Parameters 694
28.6 Results 695
28.7 Discussion 697
Acknowledgments 697
References 698

29 Even Flow Scheduling Problems in Forest Management 701
29.1 Benchmark Problem 701
29.1.1 Introduction 701
29.1.2 Methodology 703
29.1.3 Results and Discussion 703
29.1.3.1 Visual Interpretation 703
29.1.3.2 Performance Indices 704
29.1.3.3 Statistical Approaches 706
29.1.3.4 Implications for Forest Management Problems 707
29.2 Applying Single Objective Genetic Algorithms to a Real-World Problem 708
29.2.1 Introduction 708
29.2.2 Methodology 709
29.2.2.1 Input Data 709
29.2.2.2 Implementation 709
29.2.3 Results and Discussion 710
29.2.4 Conclusion 712
29.3 Applying NSGA-II: A Truly Bi-Objective Approach 715
29.3.1 Introduction 715
29.3.2 Methodology 715
29.3.3 Results 715
29.3.3.1 Effect of Encoding on the Spread and Pareto-Optimality 715
29.3.3.2 Comparing the Single and Multiple Objective Genetic Algorithm 716
29.3.3.3 Effect of Population Size on Solution Quality 718


29.3.3.4 Validity of the Plans 719
29.3.4 Conclusion 722
29.4 Speeding Up the Optimization Process 723
29.4.1 Introduction 723
29.4.2 Methodology 723
29.4.3 Results and Discussion 723
29.4.4 Conclusions 724
Acknowledgements 724
References 724

30 Using Diversity to Guide the Search in Multi-Objective Optimization 727
30.1 Introduction 727
30.2 Diversity in Multi-Objective Optimization 729
30.3 Maintaining Diversity in Multi-Objective Optimization 730
30.3.1 Weighted Vectors 731
30.3.2 Fitness Sharing 732
30.3.3 Crowding/Clustering Methods 732
30.3.4 Restricted Mating 733
30.3.5 Relaxed Forms of Dominance 733
30.3.6 Helper Objectives 735
30.3.7 Objective Oriented Heuristic Selection 735
30.3.8 Using Diversity to Guide the Search 736
30.4 The Two-Objective Space Allocation Problem 736
30.4.1 Problem Description 737
30.4.2 Measuring Diversity of Non-Dominated Sets 739
30.5 Using Diversity to Guide the Search 740
30.5.1 Diversity as a Helper Objective 740
30.5.2 Diversity to Control Exploration and Exploitation 741
30.5.3 The Population-Based Hybrid Annealing Algorithm 742
30.6 Experiments and Results 744
30.6.1 Experimental Setting 744
30.6.2 Discussion of Obtained Results 745
30.7 Summary 747
References 748

Index 753


CHAPTER 1

AN INTRODUCTION TO MULTI-OBJECTIVE EVOLUTIONARY ALGORITHMS AND THEIR APPLICATIONS

Carlos A. Coello Coello^{1} and Gary B. Lamont^{2}

1 CINVESTAV-IPN
Evolutionary Computation Group
Dpto. de Ing. Elect./Secc. Computation
Av. IPN No. 2508, Col. San Pedro Zacatenco
Mexico, D.F. 07300, MEXICO
E-mail: [email protected]

2 Department of Electrical and Computer Engineering
Graduate School of Engineering and Management
Air Force Institute of Technology
WPAFB, Dayton, Ohio, 45433, USA
E-mail: [email protected]

This chapter provides the basic concepts necessary to understand the rest of this book. The introductory material provided here includes some basic mathematical definitions related to multi-objective optimization, a brief description of the most representative multi-objective evolutionary algorithms in current use, and some of the most representative work on performance measures used to validate them. In the final part of this chapter, we provide a brief description of each of the chapters contained within this volume.

1.1. Introduction

Early analogies between the mechanism of natural selection and a learning (or optimization) process led to the development of the so-called "evolutionary algorithms" (EAs)^{3}, in which the main goal is to simulate the evolutionary process in a computer. The use of EAs for optimization tasks has become very popular in the last few years, spanning virtually every application domain^{22,44,25,4}.

From the several emergent research areas in which EAs have become increasingly popular, multi-objective optimization has had one of the fastest growth rates in recent years^{12}. A multi-objective optimization problem (MOP) differs from a single-objective optimization problem because it contains several objectives that require optimization. When optimizing a single-objective problem, the best single design solution is the goal. But for multi-objective problems, with several (possibly conflicting) objectives, there is usually no single optimal solution. Therefore, the decision maker is required to select a solution from a finite set by making compromises. A suitable solution should provide acceptable performance over all objectives^{40}. Many fields continue to address complex real-world multi-objective problems using search techniques developed within computer engineering, computer science, decision sciences, and operations research^{10}. The potential of evolutionary algorithms for solving multi-objective optimization problems was hinted at as early as the late 1960s by Rosenberg^{47}. However, the first actual implementation of a multi-objective evolutionary algorithm (MOEA) was not produced until the mid-1980s^{48,49}. Since then, a considerable amount of research has been done in this area, now known as evolutionary multi-objective optimization (EMOO)^{12}. The growing importance of this field is reflected by a significant increase (mainly during the last ten years) in technical papers in international conferences and peer-reviewed journals, special sessions in international conferences, and interest groups on the Internet^{13}.

The main motivation for using EAs to solve multi-objective optimization problems is that EAs deal simultaneously with a set of possible solutions (the so-called population), which allows us to find several members of the Pareto optimal set in a single run of the algorithm, instead of having to perform a series of separate runs, as in the case of the traditional mathematical programming techniques^{40}. Additionally, EAs are less susceptible to the shape or continuity of the Pareto front (e.g., they can easily deal with discontinuous and concave Pareto fronts), whereas these two issues are known problems for mathematical programming techniques^{7,18,12,61}.

This monograph attempts to present an extensive variety of high-dimensional MOPs and their acceptable statistical solutions using MOEAs as exercised by numerous researchers. The intent of our discussion then is to promote a wider understanding and an ability to use MOEAs in order to find "good" solutions in a wide spectrum of high-dimensional real-world applications.

b The first author maintains an EMOO repository with over 1700 bibliographical entries at: http://delta.cs.cinvestav.mx/~ccoello/EMOO, with a mirror at http://www.lania.mx/~ccoello/EMOO/

1.2. Basic Concepts

In order to provide a common basis for understanding the rest of this book, we next provide a set of basic definitions normally adopted both in single-objective and in multi-objective optimization^{12}:

Definition 1 (Global Minimum): Given a function f : Ω ⊆ S = ℝⁿ → ℝ, Ω ≠ ∅, for x ∈ Ω the value f* ≜ f(x*) > −∞ is called a global minimum if and only if

$$\forall x \in \Omega : \quad f(x^*) \leq f(x). \tag{1}$$

Then, x* is the global minimum solution(s), f is the objective function, and the set Ω is the feasible region (Ω ⊆ S). The problem of determining the global minimum solution(s) is called the global optimization problem. □

Although single-objective optimization problems may have a unique optimal solution, MOPs (as a rule) present a possibly uncountable set of solutions which, when evaluated, produce vectors whose components represent trade-offs in objective space. A decision maker then implicitly chooses an acceptable solution (or solutions) by selecting one or more of these vectors. MOPs are mathematically defined as follows:

Definition 2 (General MOP): In general, an MOP minimizes F(x) = (f_1(x), ..., f_k(x)) subject to g_i(x) ≤ 0, i = 1, ..., m, x ∈ Ω. An MOP solution minimizes the components of the vector F(x), where x is an n-dimensional decision variable vector (x = (x_1, ..., x_n)) from some universe Ω. □

Definition 3 (Pareto Dominance): A vector u = (u_1, ..., u_k) is said to dominate v = (v_1, ..., v_k) (denoted by u ≼ v) if and only if u is partially less than v, i.e., ∀i ∈ {1, ..., k}, u_i ≤ v_i ∧ ∃i ∈ {1, ..., k} : u_i < v_i. □
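As an aside (our illustration, not part of the original text): under this minimization convention, dominance between two objective vectors can be checked directly in a few lines of Python:

```python
def dominates(u, v):
    """True if objective vector u Pareto-dominates v (minimization):
    u is no worse than v in every component and strictly better in at
    least one, exactly as in Definition 3."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))
```

For example, (1, 2) dominates (2, 2), while a vector never dominates an identical copy of itself (the strict-inequality clause fails).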

Definition 4 (Pareto Optimality): A solution x ∈ Ω is said to be Pareto optimal with respect to Ω if and only if there is no x′ ∈ Ω for which v = F(x′) = (f_1(x′), ..., f_k(x′)) dominates u = F(x) = (f_1(x), ..., f_k(x)). The phrase "Pareto optimal" is taken to mean with respect to the entire decision variable space unless otherwise specified. □


Definition 5 (Pareto Optimal Set): For a given MOP F(x), the Pareto optimal set (P*) is defined as:

$$\mathcal{P}^* := \{x \in \Omega \mid \neg\exists\, x' \in \Omega : F(x') \preceq F(x)\}. \tag{2}$$

□

Definition 6 (Pareto Front): For a given MOP F(x) and Pareto optimal set P*, the Pareto front (PF*) is defined as:

$$\mathcal{PF}^* := \{u = F(x) = (f_1(x), \ldots, f_k(x)) \mid x \in \mathcal{P}^*\}. \tag{3}$$

□

The Pareto optimal solutions are the ones within the search space whose corresponding objective vector components cannot all be improved simultaneously. These solutions are also known as non-inferior, admissible, or efficient solutions, with the entire set represented by P* or P_true. Their corresponding vectors are known as nondominated; selecting a vector (or vectors) from this vector set (the Pareto front PF* or PF_true) implicitly indicates acceptable Pareto optimal solutions (genotypes). These are the set of all solutions whose vectors are nondominated; these solutions are classified based on their phenotypical expression. Their expression (the nondominated vectors), when plotted in criterion (phenotype) space, is known as the Pareto front. With these basic MOP definitions, we are now ready to delve into the structure of MOPs and the specifics of various MOEAs.
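Definitions 5 and 6 suggest a direct (if quadratic-time) way to extract the nondominated vectors of a finite set; the following sketch is our own illustration, not code from the chapter:

```python
def dominates(u, v):
    """u Pareto-dominates v under minimization (Definition 3)."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))

def nondominated(vectors):
    """Return the members of `vectors` (tuples of objective values)
    that are not dominated by any other member, i.e. the nondominated
    front of the set."""
    return [v for v in vectors
            if not any(dominates(u, v) for u in vectors if u != v)]
```

Applied to {(1, 3), (2, 2), (3, 1), (2, 3), (3, 3)}, this keeps the trade-off points (1, 3), (2, 2), and (3, 1) and discards the two dominated ones.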

1.3. Basic Operation of a MOEA

The objective of a MOEA is to converge to the true Pareto front of a problem, which normally consists of a diverse set of points. MOPs (as a rule) can present an uncountable set of solutions which, when evaluated, produce vectors whose components represent trade-offs in objective space. During MOEA execution, a "local" set of Pareto optimal solutions (with respect to the current MOEA generational population) is determined at each EA generation and termed P_current(t), where t represents the generation number. Many MOEA implementations also use a secondary population that stores all or some of the Pareto optimal solutions found through the generations^{55}. This secondary population is termed P_known(t), also annotated with t (representing the completion of t generations) to reflect possible changes in its membership during MOEA execution. P_known(0) is defined as ∅ (the empty set), and P_known alone denotes the final, overall set of Pareto optimal solutions returned by a MOEA. Of course, the true Pareto optimal solution set (termed P_true) is not explicitly known for MOPs of any difficulty. P_true is defined by the functions composing an MOP; it is fixed and does not change.

P_current(t), P_known, and P_true are sets of MOEA genotypes where each set's phenotypes form a Pareto front. We term the associated Pareto front for each of these solution sets PF_current(t), PF_known, and PF_true. Thus, when using a MOEA to solve MOPs, one implicitly assumes that one of the following conditions holds: PF_known ⊆ PF_true, or that over some norm (Euclidean, RMS, etc.), PF_known ∈ [PF_true, PF_true + ε], where ε is a small value.

Generally speaking, a MOEA is an extension of an EA in which two main issues are considered:

• How to select individuals such that nondominated solutions are preferred over those which are dominated.

• How to maintain diversity so as to be able to keep in the population as many elements of the Pareto optimal set as possible.

Regarding selection, most current MOEAs use some form of Pareto ranking. This approach was originally proposed by Goldberg^{25}; it sorts the population of an EA based on Pareto dominance, such that all nondominated individuals are assigned the same rank (or importance). The idea is that all nondominated individuals get the same probability of reproducing, and that such probability is higher than the one corresponding to individuals which are dominated. Although conceptually simple, several possible ways exist to implement a MOEA using Pareto ranking^{18,12}.
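A minimal sketch of this ranking idea (our illustration, not code from the chapter): peel off the nondominated front, give it the best rank, remove it, and repeat on the remainder:

```python
def dominates(u, v):
    """u Pareto-dominates v under minimization."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))

def pareto_rank(vectors):
    """Goldberg-style Pareto ranking: rank 1 for the nondominated front,
    rank 2 for the front that becomes nondominated once rank 1 is removed,
    and so on.  Returns a dict mapping each objective vector (a tuple)
    to its rank."""
    ranks, remaining, rank = {}, list(vectors), 1
    while remaining:
        front = [v for v in remaining
                 if not any(dominates(u, v) for u in remaining if u != v)]
        for v in front:
            ranks[v] = rank
        remaining = [v for v in remaining if v not in front]
        rank += 1
    return ranks
```

In a selection scheme, reproduction probability would then decrease with rank, so every member of the first front is treated as equally good.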

The issue of how to maintain diversity in an EA has been addressed by an extensive number of researchers^{39,27}. The approaches proposed include fitness sharing and niching^{19}, clustering^{54,65}, the use of geographically-based schemes to distribute solutions^{36,14,13}, and the use of entropy^{32,16}, among others. Additionally, some researchers have also adopted mating restriction schemes^{51,63,41}. More recently, the use of relaxed forms of Pareto dominance has been adopted as a mechanism to encourage more exploration and, therefore, to provide more diversity. Among these mechanisms, ε-dominance has become increasingly popular, not only because of its effectiveness, but also because of its sound theoretical foundation^{38}.
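The chapter does not give a formula for ε-dominance, but one common additive formulation for minimization (a sketch under our own choice of variant; multiplicative forms also exist) relaxes the comparison by a tolerance ε:

```python
def epsilon_dominates(u, v, eps):
    """Additive epsilon-dominance sketch (minimization): u eps-dominates v
    when u_i - eps <= v_i for every objective i, so v is discarded even if
    it is only within eps of being dominated.  This is one common
    formulation, not the only one in the literature."""
    return all(a - eps <= b for a, b in zip(u, v))
```

With eps > 0 an archive using this test keeps roughly one representative per ε-sized box of objective space, which is how ε-dominance trades a little accuracy for a well-spread approximation of the front.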

In the last few years, the use of elitist schemes has also become common among MOEA researchers. Such schemes tend to consist of the use of an external archive (normally called "secondary population") that may interact in different ways with the main (or "primary") population of the MOEA. Besides storing the nondominated solutions found along the evolutionary process, secondary populations have also been used to improve the distribution of the solutions^{35} and to regulate the selection pressure of a MOEA^{65}. Alternatively, a few algorithms use a plus (+) selection mechanism by which parents are combined with their offspring in a single population from which a subset of the "best" individuals is retained. The most popular of these algorithms is the Nondominated Sorting Genetic Algorithm-II (NSGA-II)^{21}.

1.4. Classifying MOEAs

There are several possible ways to classify MOEAs. The following taxonomy is perhaps the simplest and is based on the type of selection mechanism adopted:

• Aggregating Functions
• Population-based Approaches
• Pareto-based Approaches

We will briefly discuss each of them in the following subsections.

1.4.1. Aggregating Functions

Perhaps the most straightforward approach to deal with multi-objectiveproblems is to combine them into a single scalar value (e.g., adding themtogether). These techniques are normally known as "aggregating functions",because they combine (or "aggregate") all the objectives of the problem intoa single one. An example of this approach is a fitness function in which weaim to solve the following problem:

$$\min \sum_{i=1}^{k} w_i f_i(x) \tag{4}$$

where w_i ≥ 0 are the weighting coefficients representing the relative importance of the k objective functions of our problem. It is usually assumed that

$$\sum_{i=1}^{k} w_i = 1. \tag{5}$$

Aggregating functions may be linear (as in the previous example) or nonlinear^{46,59,28}. Aggregating functions have been largely underestimated by MOEA researchers, mainly because of the well-known limitation of linear aggregating functions (i.e., they cannot generate non-convex portions of the Pareto front regardless of the weight combination used^{17}). Note, however, that nonlinear aggregating functions do not necessarily present such a limitation^{12}, and they have been quite successful in multi-objective combinatorial optimization^{30}.
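As a sketch of how a linear aggregating function is used in practice (our example with two toy objectives, not the chapter's): build the scalar fitness of equation 4 and optimize that single value:

```python
def weighted_sum(objectives, weights):
    """Return a scalar fitness sum_i w_i * f_i(x) built from a list of
    objective callables and nonnegative weights (assumed to sum to 1,
    as in equations 4 and 5)."""
    def scalarized(x):
        return sum(w * f(x) for w, f in zip(weights, objectives))
    return scalarized

# Two toy objectives in conflict: x**2 is minimized at 0, (x - 2)**2 at 2.
f = weighted_sum([lambda x: x ** 2, lambda x: (x - 2) ** 2], [0.5, 0.5])
best = min([0.0, 0.5, 1.0, 1.5, 2.0], key=f)
```

With equal weights the compromise x = 1.0 wins; shifting weight toward one objective moves the optimum toward that objective's own minimum. As noted above, though, no choice of weights in a linear aggregation can reach non-convex portions of the Pareto front.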

1.4.2. Population-Based Approaches

In this type of approach, the population of an EA is used to diversify the search, but the concept of Pareto dominance is not directly incorporated into the selection process. The classical example of this sort of approach is the Vector Evaluated Genetic Algorithm (VEGA), proposed by Schaffer^{49}. VEGA basically consists of a simple genetic algorithm with a modified selection mechanism. At each generation, a number of sub-populations are generated by performing proportional selection according to each objective function in turn. Thus, for a problem with k objectives, k sub-populations of size M/k each are generated (assuming a total population size of M). These sub-populations are then shuffled together to obtain a new population of size M, on which the genetic algorithm applies the crossover and mutation operators. VEGA has several problems, of which the most serious is that its selection scheme is opposed to the concept of Pareto dominance. If, for example, there is an individual that encodes a good compromise solution for all the objectives (i.e., a Pareto optimal solution), but it is not the best in any of them, it will be discarded. Schaffer suggested some heuristics to deal with this problem: for example, using a heuristic selection preference for nondominated individuals in each generation, to protect individuals that encode Pareto optimal solutions but are not the best in any single objective function. Also, crossbreeding among the "species" could be encouraged by adding some mate selection heuristics instead of using the random mate selection of the traditional genetic algorithm. Nevertheless, the fact that Pareto dominance is not directly incorporated into the selection process of the algorithm remains its main disadvantage.
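A rough sketch of VEGA's selection step (our own simplification: proportional selection here assumes nonnegative objective values that are to be maximized, and that k divides M):

```python
import random

def vega_select(population, objective_values, k):
    """Build a VEGA-style mating population: k passes of proportional
    selection, where pass j draws M/k parents using only objective j,
    and the resulting sub-populations are shuffled together."""
    m = len(population)
    mating_pool = []
    for j in range(k):
        weights = [objs[j] for objs in objective_values]
        mating_pool.extend(
            random.choices(population, weights=weights, k=m // k))
    random.shuffle(mating_pool)
    return mating_pool
```

Because each pass looks at a single objective, a balanced compromise individual that is mediocre on every single objective gets low weight in every pass, which is exactly the "speciation" weakness described above.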

One interesting aspect of VEGA is that, despite its drawbacks, it remains in current use by some researchers, mainly because it is appropriate for problems in which we want the selection process to be biased and in which we have to deal with a large number of objectives (e.g., when handling constraints as objectives in single-objective optimization^{9} or when solving problems in which the objectives are conceptually identical^{11}).


1.4.3. Pareto-Based Approaches

Under this category, we consider MOEAs that incorporate the concept of Pareto optimality in their selection mechanism. A wide variety of Pareto-based MOEAs have been proposed in the last few years, and it is not the intent of this section to provide a comprehensive survey of them, since such a review is available elsewhere^{12}. Instead, this section provides a brief discussion of a relatively small set of Pareto-based MOEAs that are representative of the research being conducted in this area.

Goldberg's Pareto Ranking: Goldberg suggested moving the population toward PF_true by using a selection mechanism that favors solutions that are nondominated with respect to the current population^{25}. He also suggested the use of fitness sharing and niching as a diversity maintenance mechanism^{19}.

Multi-Objective Genetic Algorithm (MOGA): Fonseca and Fleming^{23} proposed a ranking approach different from Goldberg's scheme. In this case, each individual in the population is ranked based on how many other points dominate it. All the nondominated individuals in the population are assigned the same rank and obtain the same fitness, so that they all have the same probability of being selected. MOGA uses a niche-formation method in order to diversify the population, and a relatively simple methodology is proposed to compute the similarity threshold (called σ_share) required to determine the radius of each niche.

The Nondominated Sorting Genetic Algorithm (NSGA): This method^{53} is based on several layers of classifications of the individuals, as suggested by Goldberg^{25}. Before selection is performed, the population is ranked on the basis of nondomination: all nondominated individuals are classified into one category with a dummy fitness value, which is proportional to the population size, to provide an equal reproductive potential for these individuals. To maintain the diversity of the population, these classified individuals are shared with their dummy fitness values. Then this group of classified individuals is ignored and another layer of nondominated individuals is considered. The process continues until all individuals in the population are classified. Stochastic remainder proportionate selection is adopted for this technique. Since individuals in the first front have the maximum fitness value, they always get more copies than the rest of the population. An offshoot of this approach, the NSGA-II^{21}, uses elitism and a crowded comparison operator that ranks the population based on both Pareto dominance and region density. This crowded comparison operator makes the NSGA-II considerably faster than its predecessor while producing very good results.

Niched Pareto Genetic Algorithm (NPGA): This method employs an interesting form of tournament selection called Pareto domination tournaments. Two members of the population are chosen at random and they are each compared to a subset of the population. If one is nondominated and the other is not, then the nondominated one is selected. If there is a tie (both are either dominated or nondominated), then fitness sharing decides the tournament results^{28}.

Strength Pareto Evolutionary Algorithm (SPEA): This method attempts to integrate different MOEAs^{65}. The algorithm uses a "strength" value that is computed in a similar way to the MOGA ranking system. Each member of the population is assigned a fitness value according to the strengths of all nondominated solutions that dominate it. Diversity is maintained through the use of a clustering technique called the "average linkage method."

A revision of this method, called SPEA2^{62}, slightly adjusts the fitness strategy and uses nearest neighbor techniques for clustering. In addition, archiving mechanism enhancements allow for the preservation of boundary solutions that are missed with SPEA.

Multi-Objective Messy Genetic Algorithm (MOMGA): This method extends the mGA^{20} to solve multi-objective problems. The MOMGA^{55} is an explicit building-block GA that produces all building blocks of a user-specified size. The algorithm has three phases: Initialization, Primordial, and Juxtapositional. The MOMGA-II algorithm was developed by Zydallis as an extension of the MOMGA^{67}. It was developed in order to expand the state of the art of explicit building-block MOEAs. While there has been a lot of research done on single-objective explicit building-block EAs, this was a first attempt at using the concept for MOPs. Exponential growth of the population as the building block size grows may be a disadvantage of this approach in some applications.


Multi-Objective Hierarchical Bayesian Optimization Algorithm (hBOA): This search technique is a conditional model builder. It expands the idea of the compact genetic algorithm and the stud genetic algorithm. The hBOA defines a Bayesian model that represents "small" building blocks (BBs) reflecting genotypical epistasis using a hierarchical Bayesian network^{45}. The mhBOA^{31} is in essence a linkage learning algorithm that extends the hBOA and attempts to define tight and loose linkages to building blocks in the chromosome over a Pareto front. In particular, this method uses a Bayesian network (a conditional probabilistic model) to guide the search toward a solution. A disadvantage of this algorithm is the time it takes to generate results for a relatively small number of linkages.

Pareto Archived Evolution Strategy (PAES): This method, formulated by Knowles and Corne^{34}, uses a (1+1) evolution strategy in which each parent generates one offspring through mutation. The method uses an archive of nondominated solutions to compare with individuals in the current population. For diversity, the algorithm overlays a grid on the search space and counts the number of solutions in each grid cell. A disadvantage of this method is its performance on disconnected Pareto fronts.

Micro-Genetic Algorithm for Multi-Objective Optimization: The micro-genetic algorithm was introduced by Coello Coello and Toscano Pulido^{10} and, by definition, has a small population requiring a reinitialization technique. An initial random population flows into a population memory which has two parts: a replaceable and a non-replaceable portion. The non-replaceable part provides the population diversity. The replaceable portion, of course, changes at the end of each generation, where this population undergoes crossover and mutation. Using various elitism selection operators, the non-dominated individuals compose the replaceable portion.

General Multi-Objective Program (GENMOP): This method is a parallel, real-valued MOEA initially used for bioremediation research^{33}. This method archives all previous population members and ranks them. Archived individuals with the highest ranks are used as a mating pool to mate with the current generation. The method uses equivalence class sharing for niching to allow for diversity in the mating pool. A disadvantage of this algorithm is the cost of Pareto ranking the archived individuals at each generation.


Other researchers have combined elements of these MOEAs to develop unique MOEAs for their specific problem domains, with excellent results.

1.5. MOEA Performance Measures

The use of performance measures (or metrics) allows a researcher or computational scientist to assess (in a quantitative way) the performance of their algorithms. The MOEA field is no different. MOEA performance measures tend to focus on the phenotype or objective domain when assessing the accuracy of results. This differs from common practice in operations research, where metrics tend to be defined in the genotype domain. However, since there is an explicit mapping between the two, the domain in which the metrics are defined is not critical^{12,57}.

MOEA metrics can be used to measure final performance or to track the generational performance of the algorithm. The latter is important because it allows the researcher to monitor the convergence process of the algorithm during execution. This section presents a variety of MOEA metrics, yet no attempt is made to be comprehensive. For a more detailed treatment of this topic, the interested reader should consult additional references^{12,60,66}.

Error Ratio (ER): This metric reports the proportion of vectors in PF_known that are not members of PF_true. This metric requires that the researcher know PF_true. The mathematical representation of this metric is shown in equation 6:

$$ER \triangleq \frac{\sum_{i=1}^{n} e_i}{n} \tag{6}$$

where n is the number of vectors in PF_known and e_i is zero when the i-th vector is an element of PF_true, or 1 when it is not. Thus, when ER = 0, PF_known is the same as PF_true; but when ER = 1, none of the points in PF_known are in PF_true.
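Equation 6 transcribes directly into code (our sketch; fronts are represented as lists of objective-vector tuples):

```python
def error_ratio(pf_known, pf_true):
    """ER: fraction of vectors in PF_known that do not appear in PF_true
    (equation 6); 0 means every point found lies on the true front."""
    true_set = set(pf_true)
    return sum(1 for v in pf_known if v not in true_set) / len(pf_known)
```

Note that this membership test is exact, so in floating-point practice the comparison is usually done up to a tolerance.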

Two Set Coverage (CS): This metric^{60} compares the coverage of two competing sets and outputs the fraction of individuals in one set dominated by the individuals of the other set. This metric does not require that the researcher know PF_true. The equation for this metric is shown in equation 7:

$$CS(X', X'') \triangleq \frac{\left|\{a'' \in X''\,;\ \exists\, a' \in X' : a' \preceq a''\}\right|}{|X''|} \tag{7}$$


where X', X'' ⊆ X are two sets of phenotype decision vectors, and CS(X', X'') is mapped to the interval [0, 1]. This means that CS = 1 when X' dominates or equals X''.
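A sketch of the coverage computation (minimization convention, with ≼ read as "dominates or equals", i.e. weak dominance):

```python
def coverage(x1, x2):
    """CS(X', X''): fraction of vectors in x2 weakly dominated by at
    least one vector in x1 (equation 7)."""
    def weakly_dominates(u, v):
        return all(a <= b for a, b in zip(u, v))
    return sum(1 for v in x2
               if any(weakly_dominates(u, v) for u in x1)) / len(x2)
```

Since CS is not symmetric, both CS(X', X'') and CS(X'', X') are usually reported when comparing two algorithms.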

Generational Distance (GD): This metric was proposed by Van Veldhuizen and Lamont^{56}. It reports how far, on average, PF_known is from PF_true. This metric requires that the researcher know PF_true. It is mathematically defined in equation 8:

$$GD \triangleq \frac{\left(\sum_{i=1}^{n} d_i^p\right)^{1/p}}{n} \tag{8}$$

where n is the number of vectors in PF_known, p = 2, and d_i is the Euclidean distance between each member and the closest member of PF_true, in the phenotype space. When GD = 0, PF_known = PF_true.
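Equation 8 in code form (our sketch, with p = 2 as in the text):

```python
import math

def generational_distance(pf_known, pf_true, p=2):
    """GD: (sum_i d_i**p)**(1/p) / n, where d_i is the Euclidean distance
    from the i-th member of PF_known to its closest member of PF_true
    (equation 8)."""
    n = len(pf_known)
    total = sum(min(math.dist(v, t) for t in pf_true) ** p
                for v in pf_known)
    return total ** (1.0 / p) / n
```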

Hyperarea and Ratio (H, HR): These metrics, introduced by Zitzler & Thiele^{64}, define the area of coverage that PF_known has with respect to the objective space. For a two-objective MOEA, this equates to the summation of all the areas of the rectangles bounded by the origin and (f_1(x), f_2(x)). Mathematically, this is described in equation 9:

$$H \triangleq \left\{\bigcup_i a_i \;\middle|\; v_i \in PF_{known}\right\} \tag{9}$$

where v_i is a nondominated vector in PF_known and a_i is the hyperarea calculated between the origin and vector v_i. But if PF_known is not convex, the results can be misleading. It is also assumed in this model that the origin is (0, 0).

The hyperarea ratio metric definition can be seen in equation 10:

$$HR \triangleq \frac{H_1}{H_2} \tag{10}$$

where H_1 is the hyperarea of PF_known and H_2 is the hyperarea of PF_true. This results in HR ≥ 1 for minimization problems and HR ≤ 1 for maximization problems. For either type of problem, PF_known = PF_true when HR = 1. This metric requires that the researcher know PF_true.
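For two objectives the hyperarea can be computed exactly with a sweep over the front. The sketch below is our illustration and uses a minimization convention with an explicit reference point rather than the origin-anchored maximization form in the text, so each point's rectangle is spanned toward the reference point and overlaps are counted once:

```python
def hyperarea_2d(front, ref):
    """Area of the union of rectangles between each point of a
    two-objective minimization front and a reference point `ref`,
    which must be weakly worse than every point in both objectives."""
    area, prev_f2 = 0.0, ref[1]
    for f1, f2 in sorted(set(front)):   # ascending f1
        if f2 < prev_f2:                # dominated points add no area
            area += (ref[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return area
```

Sorting by the first objective makes the second objective strictly decrease along the nondominated front, so each point contributes exactly one horizontal slab of the dominated region.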


Spacing (S): This metric was proposed by Schott^{50} and it measures the distance variance of neighboring vectors in PF_known. Equations 11 and 12 define this metric:

$$S \triangleq \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(\bar{d} - d_i\right)^2} \tag{11}$$

and

$$d_i = \min_j\left(|f_1^i(x) - f_1^j(x)| + |f_2^i(x) - f_2^j(x)|\right) \tag{12}$$

where i, j = 1, ..., n (i ≠ j), d̄ is the mean of all d_i, and n is the number of vectors in PF_known. When S = 0, all members are spaced evenly apart. This metric does not require the researcher to know PF_true.
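Equations 11 and 12 in code form (our sketch; the L1 nearest-neighbour distance of equation 12 is written for any number of objectives):

```python
import math

def spacing(pf_known):
    """Schott's S: sample standard deviation of each point's L1 distance
    to its nearest neighbour in PF_known (equations 11 and 12).
    Returns 0 when the members are evenly spaced."""
    n = len(pf_known)
    d = [min(sum(abs(a - b) for a, b in zip(u, v))
             for j, v in enumerate(pf_known) if j != i)
         for i, u in enumerate(pf_known)]
    d_bar = sum(d) / n
    return math.sqrt(sum((d_bar - di) ** 2 for di in d) / (n - 1))
```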

Overall Nondominated Vector Generation Ratio (ONVGR): This metric takes the total number of nondominated vectors found during MOEA execution and divides it by the number of vectors found in PF_true. It is defined as shown in equation 13:

$$ONVGR \triangleq \frac{|PF_{known}|}{|PF_{true}|} \tag{13}$$

When ONVGR = 1, this states only that the same number of points have been found in both PF_true and PF_known. It does not imply that PF_true = PF_known. This metric requires that the researcher know PF_true.

Progress Measure RP: For single-objective EAs, Bäck3 defines a metric that measures convergence velocity. This single-objective metric is adapted to MOEAs55, and is reflected in equation 14:

RP = ln sqrt( G_1 / G_T ) (14)

where G_1 is the generational distance for the first generation and G_T is the distance for generation T. Recall that generational distance was defined in equation 8 and it measures the average distance from PFknown to PFtrue. This metric requires that the researcher knows PFtrue.
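Reusing a generational-distance helper (our own sketch of equation 8), the progress measure of equation 14 might look like this:

```python
import math

def generational_distance(pf_known, pf_true, p=2):
    total = sum(min(math.dist(v, u) for u in pf_true) ** p for v in pf_known)
    return (total ** (1.0 / p)) / len(pf_known)

def progress_measure(pf_gen1, pf_genT, pf_true):
    """RP = ln(sqrt(G1 / GT)) (equation 14): positive when the known
    front has moved closer to PF_true between generation 1 and T."""
    g1 = generational_distance(pf_gen1, pf_true)
    gt = generational_distance(pf_genT, pf_true)
    return math.log(math.sqrt(g1 / gt))

# A front that halves its distance to PF_true gives RP = ln(sqrt(2)) > 0
print(progress_measure([(2.0, 0.0)], [(1.0, 0.0)], [(0.0, 0.0)]))
```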


14 Carlos A. Coello Coello and Gary B. Lamont

Generational Nondominated Vector Generation (GNVG): This is a simple metric, introduced by Van Veldhuizen55, that lists the number of nondominated vectors produced for each generation. This is defined in equation 15:

GNVG = |PFcurrent(t)| (15)

This metric does not require that the researcher knows PFtrue.

Nondominated Vector Addition (NVA): This metric, introduced by Van Veldhuizen55, calculates the number of nondominated vectors gained or lost between the previous PFknown generation and the current one. Equation 16 defines this metric:

NVA = |PFknown(t)| - |PFknown(t - 1)| (16)

But this metric can be misleading when a new vector dominates two or more vectors from the previous generation. In addition, this metric may remain static over the course of several generations while new points are added that dominate others from the previous generation. This metric does not require that the researcher knows PFtrue.
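Both counting metrics, and the caveat just noted, are easy to see in code (our sketch):

```python
def gnvg(pf_current):
    """GNVG (equation 15): nondominated vectors produced this generation."""
    return len(pf_current)

def nva(pf_known_t, pf_known_prev):
    """NVA (equation 16): vectors gained (or lost, if negative) with
    respect to the previous generation's PF_known."""
    return len(pf_known_t) - len(pf_known_prev)

# One new vector that dominates both old ones: the front improved,
# yet NVA is negative -- the misleading case described above.
print(nva([(0.0, 0.0)], [(1.0, 0.5), (0.5, 1.0)]))  # -1
```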

As to which metrics are appropriate, it of course depends upon the MOEA application to the given MOP. Since in real-world applications the true Pareto front is unknown, relative metrics are usually selected. It is also worth observing that recent research has shed light on the limitations of unary metrics (i.e., performance measures that assign each approximation of the Pareto optimal set a number that reflects a certain quality aspect)66. Such a study favors the use of binary metrics. As a consequence of this study, it is expected that in the next few years MOEA researchers will eventually adopt binary metrics on a regular basis; today, however, the use of unary metrics (such as the error ratio and many of the others discussed in this section) is still common.

1.6. Design of MOEA Experiments

To conduct a thorough evaluation of the performance of any MOEA, a design of experiments or methodology should be common practice prior to testing and evaluating the search results. The main goal of MOEA research is the creation of an effective and efficient algorithm that renders good solutions. But to achieve that goal, several smaller goals need to be addressed, and they can be classified under two categories: effectiveness goals and efficiency goals. A report should state the effectiveness goals and the experimental design employed to validate that they are met, and likewise the experimental design employed to validate the efficiency goals. A section on the computing environment should also be included for ease of repeatability. Finding good solutions is the top priority for any MOEA application research; therefore, one has to validate that the algorithm does indeed find good solutions. Benchmarks can also be used, and comparison with current MOP designs is appropriate for validation. Once a baseline set of runs is completed and analyzed, algorithm parameters can be tweaked to possibly improve effectiveness. The various application chapters in this text have attempted to adhere to such an experimental design and to follow the reporting techniques of the next section.

1.6.1. Reporting MOEA Computational Results

Before the advent of the "Scientific Method", many engineers and scientists merely used trial and error in an attempt to gain insight into a particular problem. The scientific method is the process by which engineers and scientists, collectively and over time, endeavor to construct an accurate (that is, reliable, consistent and non-arbitrary) representation of the world or the problem which they study. Recognizing that personal and cultural beliefs influence both our perception and our interpretation of natural phenomena, we aim through the use of standard procedures and criteria to minimize those influences when developing a conjecture, a theory or a qualification. In summary, the scientific method attempts to minimize the influence of bias or prejudice in the experimenter.

Each application chapter reporting computational experiments attempts to follow the above objective, since its authors use computer-generated evidence to compare or rank competing MOEA software techniques and Pareto front solutions. Chapter authors consider various classical references that can direct computational experimentation6,15,29. According to Jackson et al.29, the researcher should always keep in mind various elements identified as "What to Keep in Mind When Testing:"

• Are the results presented statistically sufficient to justify the claims made?

• Is there sufficient detail to reproduce the results?
• When should a statistically-based experiment be done? Usually when a claim is made such as "this method is better (i.e., faster, more accurate, more efficient, easier to use, etc.)".
• Are the proper test problems being used?
• Are all possible performance measures (efficiency, robustness, reliability, ease-of-use, etc.) addressed?
• Is enough information provided with respect to the architecture of the hardware being used?

One should organize the design of experiments. For example, one should discuss input and output data, the identification of all parameters available during testing (for all tests the parameters are the same unless otherwise indicated), the random number generators and seeds used, and other topics that are pertinent to the set of experiments. Following the general information, each individual experiment is presented with the objective and methodology of the experiment identified. For each experiment, any parameter settings or environmental settings that differ from the generalized discussion are duly noted. Various statistical methods should be addressed, such as the mean, average, maximum, minimum, Student's t-test, Kruskal-Wallis test, and others as appropriate for the computational experiment.

1.7. Layout of the Book

After presenting some basic concepts, terminology and a brief discussion of methodological aspects related to the use of MOEAs, we devote this last section to a brief discussion of each of the remaining chapters of this book. As indicated in the preface, these 29 chapters are divided into four application collections. The specific chapters that compose each of these parts are summarized in the following subsections. Note that many authors use specific MOEAs that are summarized in Section 1.4. Also, observe that some of the various metrics discussed in Section 1.5 are employed in statistical MOEA evaluation using the experimental testing techniques of Section 1.6.

1.7.1. Part I: Engineering Applications

Considering that the use of MOEAs in engineering has been very extensive, this first part is the largest in the book, as it includes Chapters 2 to 13.

In Chapter 2, Ray adopts a scheme that handles objectives and constraints separately. Nondominance is used not only for selecting individuals but also to handle constraints. The MOEA adopted in this work is the NSGA53 with elitism. The approach is applied to some engineering design problems (a welded beam, a bulk carrier and an airfoil).

Farina and Di Barba apply in Chapter 3 several approaches to the design of industrial electromagnetic devices (the case studies consist of a magnetic reactor and an inductor for transverse-flux heating of a metal strip). The authors consider the use of the Non-dominated Sorting Evolutionary Strategy Algorithmc (NSESA), a Pareto Gradient Based Algorithm (PGBA), a Pareto Evolution Strategy Algorithm (PESTRA), and a Multi Directional Evolution Strategy Algorithm (MDESTRA). At the end, they decide to adopt hybrid approaches in which NSESA is combined with both a deterministic and a local-global strategy.

Reed & Devireddy use in Chapter 4 the NSGA-II21 enhanced with the ε-dominance archiving and automatic parameterization techniques38 to optimize ground water monitoring networks. The authors indicate that the use of ε-dominance not only eliminated the empirical fine-tuning of parameters of their MOEA, but also reduced the computational demands by more than 70% with respect to some of their previous work.

In Chapter 5, Hernandez Luna and Coello Coello use a particle swarm optimizer with a population-based selection scheme (similar to VEGA49) to design combinational logic circuits. One of the relevant aspects of this work is that the problem to be solved is actually mono-objective. However, the use of a multi-objective selection scheme improves both the robustness and the quality of the results obtained.

Furukawa et al. present in Chapter 6 the application of two MOEAs to sensor and vehicle parameter determination for successful autonomous vehicle navigation. The MOEAs adopted are: (1) the Multi-objective Continuous Evolutionary Algorithm (MCEA) and (2) the Multi-Objective Gradient-based Method (MOGM). Due to space limitations, only the results produced by the MCEA are presented in the chapter, although the authors indicate that both MOEAs reach the same final results. It is worth noticing the use of the so-called Center-of-Gravity Method (CoGM) to select a single solution from the Pareto optimal set produced by the MCEA.

In Chapter 7, Tan and Li use a MOEA to design optimal unified linear time-invariant control (ULTIC) systems. The proposal consists of a methodology for performance-prioritized computer aided control system design in which a MOEA toolbox previously designed by the authors is used as an optimization engine. An interesting aspect of this work is that the user is allowed to set his/her goals on-line (without having to restart the entire design cycle) and can visualize (in real time) the effect of such goal setting on the results. The proposed methodology is applied to a non-minimal phase plant control system.

c This algorithm is a variation of the NSGA53.

Gaspar-Cunha and Covas in Chapter 8 apply a MOEA to solve polymer extrusion problems. The authors optimize the performance of both single-screw and co-rotating twin-screw extruders. The MOEA adopted is called the Reduced Pareto Set Genetic Algorithm with Elitism (RPSGAe) and was previously proposed by the same authors24. An interesting aspect of this work is that the RPSGAe uses a clustering technique not to maintain diversity, as is normally done, but to reduce the number of Pareto optimal solutions. The problems solved are formulated as multi-objective traveling salesperson problems (i.e., they are actually dealing with multi-objective combinatorial optimization problems).

In Chapter 9, Hernandez Aguirre and Botello Rionda propose an extension of the Pareto Archived Evolution Strategy (PAES)36 which is able to deal with both single-objective and multi-objective optimization problems. The proposed approach is called the Inverted and Shrinkable Pareto Archived Evolutionary Strategy (ISPAES), and is used to solve several truss optimization problems (a common problem in structural and mechanical engineering). The main differences between ISPAES and PAES are in the selection mechanism and the implementation of the adaptive grid. The test problems adopted include both single and multiple objective problems as well as discrete and continuous search spaces.

Balling presents in Chapter 10 an interesting application of MOEAs to city and regional planning. The MOEA adopted uses the maximin fitness function previously proposed by the author5. The approach has been applied to plan the Wasatch Front Metropolitan Region in Utah (in the USA). An interesting aspect of this work is the discussion presented by the author regarding the reluctance of the authorities to actually implement some of the plans produced by the MOEA. The author attributes this reluctance both to the high number of (nondominated) plans produced (no scheme to incorporate the user's preferences8 was adopted by Balling) and to the psychological impact that this sort of (radically new) approach has on people.

Jozefowiez et al. present in Chapter 11 a MOEA to solve the bi-objective covering tour problem. The MOEA adopted is the NSGA-II21, and the results are compared with respect to an exact algorithm based on a branch-and-bound approach which can be applied only to relatively small instances of the problem. The chapter also presents a thorough review of multi-objective routing problems reported in the specialized literature.

Chapter 12, by Künzli et al., presents a benchmark problem in computer engineering (the design space exploration of packet processor architectures). Besides describing several details related to the proposed benchmark problem, the authors also refer to the text-based interface developed by them, which is platform and programming language independent. This aims to facilitate the use of different MOEAs (across different platforms) to solve such a problem.

In the last chapter of the first part (Chapter 13), Obayashi and Sasaki present the use of a MOEA for the aerodynamic design of supersonic wings. The MOEA adopted is the Adaptive Range Multiobjective Genetic Algorithm (ARMOGA), which is based on an approach originally developed by Arakawa and Hagiwara2. The multi-objective extensions are based on MOGA23. An interesting aspect of this work is the use of Self-Organizing Maps (SOMs) both to visualize trade-offs among the objectives of the problem and to perform some sort of data mining of the designs produced.

1.7.2. Part II: Scientific Applications

The second part of the book, which focuses on scientific applications of MOEAs, includes Chapters 14 to 19.

In Chapter 14, Ray presents the use of a MOEA to optimize gas-solid separation devices used for particulate removal from air (namely, the design of cyclone separators and venturi scrubbers). The author used the NSGA53, mainly because of her previous experience with such an algorithm.

Mancini et al. present in Chapter 15 the use of a MOEA for an application in physics: the spectroscopic data analysis of inertial confinement fusion implosion cores based on the self-consistent analysis of simultaneous narrow-band X-ray images and X-ray line spectra. The MOEA adopted is the Niched-Pareto Genetic Algorithm (NPGA)28.

Chapter 16, by Lahanas, presents a survey of the use of MOEAs in medicine. The types of problems considered include medical image processing, computer-aided diagnosis, treatment planning, and data mining.

In Chapter 17, Kumar describes the use of a MOEA in the solution of high-dimensional and complex domains of machine learning. The MOEA is used as a pre-processor for partitioning these complex learning tasks into simpler domains that can then be solved using traditional machine learning approaches. The MOEA adopted is the Pareto Converging Genetic Algorithm (PCGA), which was proposed by the author37.

Romero Zaliz et al. describe in Chapter 18 an approach for identifying interesting qualitative features in biological sequences. The approach is called Generalized Analysis of Promoters (GAP) and is based on the use of generalized clustering techniques where the features being sought correspond to the solutions of a multiobjective optimization problem. A MOEA is then used to identify multiple promoter occurrences within genomic regulatory regions. The MOEA adopted is a Multi-Objective Scatter Search (MOSS) algorithm.

Lamont et al. present in Chapter 19 an application of the multi-objective messy genetic algorithm-II (MOMGA-II) to two NP-complete problems: the multi-objective Quadratic Assignment Problem (mQAP) and the Modified Multi-objective Knapsack Problem (MMOKP).

1.7.3. Part III: Industrial Applications

The third part of the book, which focuses on real-world industrial applications of MOEAs, includes Chapters 20 to 24.

In Chapter 20, Anderson uses a MOEA to design fluid power systems. The MOEA adopted is called the multi-objective struggle genetic algorithm (MOSGA) and was proposed by the same author1. The approach is further extended so that it can deal with mixed variable design problems (i.e., with both continuous and discrete variables).

Mansouri presents in Chapter 21 the application of a MOEA in cellular manufacturing systems. The problem tackled consists of deciding which parts to subcontract and which machines to duplicate in a cellular manufacturing system wherein some exceptional elements exist. The MOEA adopted is the NSGA53.

Chapter 22, by Ishibuchi and Shibata, presents the solution of flowshop scheduling problems (both single- and multi-objective) using genetic algorithms. The multi-objective instances are solved using the NSGA-II21. The authors recommend the use of mating restrictions and a hybridization with local search in order to improve the performance of the MOEA adopted.

Gandibleux et al. deal in Chapter 23 with multi-objective combinatorial optimization problems. The approach adopted in this case is peculiar, since it is a population-based heuristic that uses three operators: crossover, path-relinking and a local search on elite solutions. However, this approach differs from a MOEA in two main aspects: (1) it does not use Pareto ranking, and (2) it performs no direction searches to drive the approximation process. The authors apply their approach to the bi-objective assignment problem and to the bi-objective knapsack problem.

In Chapter 24, Watanabe and Hiroyasu apply a MOEA to the solution of the multi-objective rectangular packing problem, which is a discrete combinatorial optimization problem that arises in many applications (e.g., truck packing and floor planning, among others). The MOEA adopted is the Neighborhood Cultivation Genetic Algorithm (NCGA), which was proposed by the authors58.

1.7.4. Part IV: Miscellaneous Applications

The fourth and last part of the book deals with miscellaneous applications of MOEAs in a variety of domains, and includes Chapters 25 to 30.

Pappa et al. present in Chapter 25 the use of MOEAs to select attributes in data mining. The authors use two approaches that were previously proposed by them: (1) an elitist multi-objective genetic algorithm (which uses Pareto dominance) in which all the nondominated solutions found pass unaltered to the next generation42, and (2) a multi-objective forward sequential selection method43.

In Chapter 26, Schlottmann and Seese present a fairly detailed survey of the use of MOEAs in portfolio management problems. The authors emphasize the importance of incorporating problem-specific knowledge into a MOEA to improve its performance in such financial applications. The authors also identify some other potential applications of MOEAs in finance.

Chapter 27, by Jin et al., describes the application of a MOEA to the evolution of both the weights and the structure of neural networks used for regression and prediction. The MOEA adopted is the NSGA-II21, expanded with Lamarckian inheritance. The authors report that the MOEA successfully generates diverse neural network ensemble members, which significantly improves the regression accuracy, particularly in cases in which a single network is not able to predict reliably.

In Chapter 28, Fieldsend and Singh use a MOEA to train neural networks used for time series forecasting. The MOEA adopted is a variation of PAES36. The most interesting aspect of this work is that the use of a multi-objective approach allows the user to get a good representation of the complexity/accuracy trade-off of the problem being solved. This may lead to the selection of neural networks with very low complexity.

Chapter 29, by Ducheyne et al., presents the application of MOEAs to forest management problems (particularly forest scheduling problems). Two MOEAs are studied by the authors: MOGA23 and the NSGA-II21. An interesting aspect of this work is the use of fitness inheritance52 to speed up the optimization process.

Finally, in Chapter 30, Landa Silva and Burke propose the use of diversity measures to guide a MOEA's search. Such an approach is used to solve space allocation problems arising in academic institutions. The MOEA adopted is called the Population-based Hybrid Annealing Algorithm and was previously proposed by the same authors. In this approach, each individual is evolved by means of local search and a specialized mutation operator. This MOEA combines concepts of simulated annealing, tabu search, evolutionary algorithms and hill climbing.

1.8. General Comments

As has been seen in the previous presentation, this book includes a wide variety of applications of MOEAs. Nevertheless, if we consider the important growth of the number of publications related to MOEAs in the last few years, it is likely that we will see more novel applications in the near future. As a matter of fact, there are still several areas in which applications of MOEAs are rare (e.g., computer vision, operating systems, compiler design, computer architecture, and business activities, among others).

The application of MOEAs to increasingly challenging problems is triggering more research on MOEA algorithmic design as well as influencing developmental trends. For example, the hybridization of MOEAs with other mechanisms (e.g., local search) may become standard practice in complex MOP application domains.

This volume constitutes an initial attempt to collect a representative sample of contemporary MOEA applications, thus providing insight into their efficient and effective use. Of course, it is expected that more and more specialized monographs and textbooks will include the use of MOEAs in diverse problem domains because of the expanding understanding and utility of MOEA concepts in solving complex high-dimensional MOPs.

Page 52: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

An Introduction to MOEAs and Their Applications 23

References

1. Johan Andersson. Multiobjective Optimization in Engineering Design — Applications to Fluid Power Systems. PhD thesis, Division of Fluid and Mechanical Engineering Systems, Department of Mechanical Engineering, Linköping University, Linköping, Sweden, 2001.

2. Masao Arakawa and Ichiro Hagiwara. Development of Adaptive Real Range (ARRange) Genetic Algorithms. JSME International Journal, Series C, 41(4):969-977, 1998.

3. Thomas Bäck. Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York, 1996.

4. Thomas Bäck, David B. Fogel, and Zbigniew Michalewicz, editors. Handbook of Evolutionary Computation. Institute of Physics Publishing and Oxford University Press, New York, 1997.

5. Richard Balling. The Maximin Fitness Function; Multiobjective City and Regional Planning. In Carlos M. Fonseca, Peter J. Fleming, Eckart Zitzler, Kalyanmoy Deb, and Lothar Thiele, editors, Evolutionary Multi-Criterion Optimization. Second International Conference, EMO 2003, pages 1-15, Faro, Portugal, April 2003. Springer. Lecture Notes in Computer Science Volume 2632.

6. Richard S. Barr, Bruce L. Golden, James P. Kelly, Mauricio G. C. Resende, and William R. Stewart, Jr. Designing and Reporting on Computational Experiments with Heuristic Methods. Journal of Heuristics, 1:9-32, 1995.

7. Carlos A. Coello Coello. A Comprehensive Survey of Evolutionary-Based Multiobjective Optimization Techniques. Knowledge and Information Systems. An International Journal, 1(3):269-308, August 1999.

8. Carlos A. Coello Coello. Handling Preferences in Evolutionary Multiobjective Optimization: A Survey. In 2000 Congress on Evolutionary Computation, volume 1, pages 30-37, Piscataway, New Jersey, July 2000. IEEE Service Center.

9. Carlos A. Coello Coello. Treating Constraints as Objectives for Single-Objective Evolutionary Optimization. Engineering Optimization, 32(3):275-308, 2000.

10. Carlos A. Coello Coello. A Short Tutorial on Evolutionary Multiobjective Optimization. In Eckart Zitzler, Kalyanmoy Deb, Lothar Thiele, Carlos A. Coello Coello, and David Corne, editors, First International Conference on Evolutionary Multi-Criterion Optimization, pages 21-40. Springer-Verlag. Lecture Notes in Computer Science No. 1993, 2001.

11. Carlos A. Coello Coello and Arturo Hernandez Aguirre. Design of Combinational Logic Circuits through an Evolutionary Multiobjective Optimization Approach. Artificial Intelligence for Engineering, Design, Analysis and Manufacture, 16(1):39-53, January 2002.

12. Carlos A. Coello Coello, David A. Van Veldhuizen, and Gary B. Lamont. Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, New York, May 2002.

13. David W. Corne, Nick R. Jerram, Joshua D. Knowles, and Martin J. Oates. PESA-II: Region-based Selection in Evolutionary Multiobjective Optimization. In Lee Spector, Erik D. Goodman, Annie Wu, W.B. Langdon, Hans-Michael Voigt, Mitsuo Gen, Sandip Sen, Marco Dorigo, Shahram Pezeshk, Max H. Garzon, and Edmund Burke, editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO'2001), pages 283-290, San Francisco, California, 2001. Morgan Kaufmann Publishers.

14. David W. Corne, Joshua D. Knowles, and Martin J. Oates. The Pareto Envelope-based Selection Algorithm for Multiobjective Optimization. In Marc Schoenauer, Kalyanmoy Deb, Günter Rudolph, Xin Yao, Evelyne Lutton, Juan Julian Merelo, and Hans-Paul Schwefel, editors, Proceedings of the Parallel Problem Solving from Nature VI Conference, pages 839-848, Paris, France, 2000. Springer. Lecture Notes in Computer Science No. 1917.

15. H. Crowder, R. S. Dembo, and J. M. Mulvey. On Reporting Computational Experiments with Mathematical Software. ACM Transactions on Mathematical Software, 5(2):193-203, June 1979.

16. Xunxue Cui, Miao Li, and Tingjian Fang. Study of Population Diversity of Multiobjective Evolutionary Algorithm Based on Immune and Entropy Principles. In Proceedings of the Congress on Evolutionary Computation 2001 (CEC'2001), volume 2, pages 1316-1321, Piscataway, New Jersey, May 2001. IEEE Service Center.

17. Indraneel Das and John Dennis. A Closer Look at Drawbacks of Minimizing Weighted Sums of Objectives for Pareto Set Generation in Multicriteria Optimization Problems. Structural Optimization, 14(1):63-69, 1997.

18. Kalyanmoy Deb. Multi-Objective Optimization Using Evolutionary Algorithms. John Wiley & Sons, Chichester, UK, 2001.

19. Kalyanmoy Deb and David E. Goldberg. An Investigation of Niche and Species Formation in Genetic Function Optimization. In J. David Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms, pages 42-50, San Mateo, California, June 1989. George Mason University, Morgan Kaufmann Publishers.

20. Kalyanmoy Deb and David E. Goldberg. mGA in C: A Messy Genetic Algorithm in C. Technical Report 91008, Illinois Genetic Algorithms Laboratory (IlliGAL), September 1991.

21. Kalyanmoy Deb, Amrit Pratap, Sameer Agarwal, and T. Meyarivan. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182-197, April 2002.

22. David B. Fogel. Evolutionary Computation. Toward a New Philosophy of Machine Intelligence. The Institute of Electrical and Electronic Engineers, New York, 1995.

23. Carlos M. Fonseca and Peter J. Fleming. Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization. In Stephanie Forrest, editor, Proceedings of the Fifth International Conference on Genetic Algorithms, pages 416-423, San Mateo, California, 1993. University of Illinois at Urbana-Champaign, Morgan Kaufmann Publishers.

24. Antonio Gaspar-Cunha and Jose A. Covas. RPSGAe - Reduced Pareto Set Genetic Algorithm: Application to Polymer Extrusion. In Xavier Gandibleux, Marc Sevaux, Kenneth Sörensen, and Vincent T'kindt, editors, Metaheuristics for Multiobjective Optimisation, pages 221-249, Berlin, 2004. Springer. Lecture Notes in Economics and Mathematical Systems Vol. 535.

25. David E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Company, Reading, Massachusetts, 1989.

26. P. Hajela and C. Y. Lin. Genetic search strategies in multicriterion optimal design. Structural Optimization, 4:99-107, 1992.

27. Jeffrey Horn. The Nature of Niching: Genetic Algorithms and the Evolution of Optimal, Cooperative Populations. PhD thesis, University of Illinois at Urbana-Champaign, Urbana, Illinois, 1997.

28. Jeffrey Horn, Nicholas Nafpliotis, and David E. Goldberg. A Niched Pareto Genetic Algorithm for Multiobjective Optimization. In Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, volume 1, pages 82-87, Piscataway, New Jersey, June 1994. IEEE Service Center.

29. Richard H. F. Jackson, Paul T. Boggs, Stephen G. Nash, and Susan Powell. Guidelines for Reporting Results of Computational Experiments - Report of the Ad Hoc Committee. Mathematical Programming, 49:413-425, 1991.

30. Andrzej Jaszkiewicz. On the performance of multiple-objective genetic local search on the 0/1 knapsack problem - a comparative experiment. IEEE Transactions on Evolutionary Computation, 6(4):402-412, August 2002.

31. Nazan Khan. Bayesian optimization algorithms for multiobjective and hierarchically difficult problems. Master's thesis, University of Illinois at Urbana-Champaign, Urbana, IL, July 2003.

32. Hajime Kita, Yasuyuki Yabumoto, Naoki Mori, and Yoshikazu Nishikawa. Multi-Objective Optimization by Means of the Thermodynamical Genetic Algorithm. In Hans-Michael Voigt, Werner Ebeling, Ingo Rechenberg, and Hans-Paul Schwefel, editors, Parallel Problem Solving from Nature - PPSN IV, Lecture Notes in Computer Science, pages 504-512, Berlin, Germany, September 1996. Springer-Verlag.

33. Mark R. Knarr, Mark N. Goltz, Gary B. Lamont, and Junqi Huang. In Situ Bioremediation of Perchlorate-Contaminated Groundwater using a Multi-Objective Parallel Evolutionary Algorithm. In Congress on Evolutionary Computation (CEC'2003), volume 1, pages 1604-1611, Piscataway, New Jersey, December 2003. IEEE Service Center.

34. Joshua Knowles and David Corne. M-PAES: A Memetic Algorithm for Multiobjective Optimization. In 2000 Congress on Evolutionary Computation, volume 1, pages 325-332, Piscataway, New Jersey, July 2000. IEEE Service Center.

35. Joshua Knowles and David Corne. Properties of an Adaptive Archiving Algorithm for Storing Nondominated Vectors. IEEE Transactions on Evolutionary Computation, 7(2):100-116, April 2003.

36. Joshua D. Knowles and David W. Corne. Approximating the Nondominated Front Using the Pareto Archived Evolution Strategy. Evolutionary Computation, 8(2):149-172, 2000.

37. Rajeev Kumar and Peter Rockett. Improved Sampling of the Pareto-Front in Multiobjective Genetic Optimizations by Steady-State Evolution: A Pareto Converging Genetic Algorithm. Evolutionary Computation, 10(3):283-314, Fall 2002.

38. Marco Laumanns, Lothar Thiele, Kalyanmoy Deb, and Eckart Zitzler. Combining Convergence and Diversity in Evolutionary Multi-objective Optimization. Evolutionary Computation, 10(3):263-282, Fall 2002.

39. Samir W. Mahfoud. Niching Methods for Genetic Algorithms. PhD thesis, University of Illinois at Urbana-Champaign, Department of General Engineering, Urbana, Illinois, May 1995.

40. Kaisa M. Miettinen. Nonlinear Multiobjective Optimization. Kluwer Academic Publishers, Boston, Massachusetts, 1999.

41. Tadahiko Murata, Hisao Ishibuchi, and Mitsuo Gen. Specification of Genetic Search Directions in Cellular Multi-objective Genetic Algorithms. In Eckart Zitzler, Kalyanmoy Deb, Lothar Thiele, Carlos A. Coello Coello, and David Corne, editors, First International Conference on Evolutionary Multi-Criterion Optimization, pages 82-95. Springer-Verlag. Lecture Notes in Computer Science No. 1993, 2001.

42. Gisele L. Pappa, Alex A. Freitas, and Celso A. A. Kaestner. Attribute Selection with a Multiobjective Genetic Algorithm. In G. Bittencourt and G. L. Ramalho, editors, Proceedings of the 16th Brazilian Symposium on Artificial Intelligence (SBIA-2002), pages 280-290. Springer-Verlag. Lecture Notes in Artificial Intelligence Vol. 2507, 2002.

43. Gisele L. Pappa, Alex A. Freitas, and Celso A. A. Kaestner. A Multiobjective Genetic Algorithm for Attribute Selection. In Proceedings of the 4th International Conference on Recent Advances in Soft Computing (RASC-2002), pages 116-121, Nottingham, UK, December 2002. Nottingham Trent University.
43. Gisele L. Pappa, Alex A. Freitas, and Celso A. A. Kaestner. A MultiobjectiveGenetic Algorithm for Attribute Selection. In Proceedings of the 4th Inter-national Conference on Recent Advances in Soft Computing (RASC-2002),pages 116-121, Nottingham, UK, December 2002. Nottingham Trent Univer-sity.

44. Ian C. Parmee. Evolutionary and Adaptive Computing in Engineering Design.Springer, London, 2001.

45. Martin Pelikan and David E. Goldberg. Heirarchical problem solving and thebayesian optimization algorithm. In Proceedings of the Genetic and Evolu-tionary Computation Conference (GECCO-2000), pages 267-274, 2000.

46. Jacques Periaux, Mourad Sefrioui, and Bertrand Mantel. GA MultipleObjective Optimization Strategies for Electromagnetic Backscattering. InD. Quagliarella, J. Periaux, C. Poloni, and G. Winter, editors, Genetic Algo-rithms and Evolution Strategies in Engineering and Computer Science. Re-cent Advances and Industrial Applications, chapter 11, pages 225-243. JohnWiley and Sons, West Sussex, England, 1997.

47. R. S. Rosenberg. Simulation of genetic populations with biochemical proper-ties. PhD thesis, University of Michigan, Ann Harbor, Michigan, 1967.

48. J. David Schaffer. Multiple Objective Optimization with Vector Evaluated Ge-netic Algorithms. PhD thesis, Vanderbilt University, 1984.

49. J. David Schaffer. Multiple Objective Optimization with Vector EvaluatedGenetic Algorithms. In Genetic Algorithms and their Applications: Proceed-ings of the First International Conference on Genetic Algorithms, pages 93-

Page 56: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

An Introduction to MOEAs and Their Applications 27

100. Lawrence Erlbaum, 1985.50. Jason R. Schott. Fault Tolerant Design Using Single and Multicriteria Ge-

netic Algorithm Optimization. Master's thesis, Department of Aeronauticsand Astronautics, Massachusetts Institute of Technology, Cambridge, Mas-sachusetts, May 1995.

51. K. J. Shaw and P. J. Fleming. Initial Study of Practical Multi-ObjectiveGenetic Algorithms for Scheduling the Production of Chilled Ready Meals.In Proceedings of Mendel'96, the 2nd International Mendel Conference onGenetic Algorithms, Brno, Czech Republic, September 1996.

52. Robert E. Smith, Bruce A. Dike, and S. A. Stegmann. Fitness Inheritance inGenetic Algorithms. In Proceedings of the 1995 ACM Symposium on AppliedComputing, pages 345-350, Nashville, Tennessee, USA, February 1995. ACM.

53. N. Srinivas and Kalyanmoy Deb. Multiobjective Optimization Using Non-dominated Sorting in Genetic Algorithms. Evolutionary Computation,2(3):221-248, Fall 1994.

54. Gregorio Toscano Pulido and Carlos A. Coello Coello. Using Clustering Tech-niques to Improve the Performance of a Particle Swarm Optimizer. In Kalyan-moy Deb et al., editor, Genetic and Evolutionary Computation-GECCO2004- Proceedings of the Genetic and Evolutionary Computation Conference,pages 225-237, Seattle, Washington, USA, June 2004. Springer-Verlag, Lec-ture Notes in Computer Science Vol. 3102.

55. David A. Van Veldhuizen. Multiobjective Evolutionary Algorithms: Classifica-tions, Analyses, and New Innovations. PhD thesis, Department of Electricaland Computer Engineering. Graduate School of Engineering. Air Force In-stitute of Technology, Wright-Patterson AFB, Ohio, May 1999.

56. David A. Van Veldhuizen and Gary B. Lamont. Evolutionary Computationand Convergence to a Pareto Front. In John R. Koza, editor, Late BreakingPapers at the Genetic Programming 1998 Conference, pages 221-228, Stan-ford University, California, July 1998. Stanford University Bookstore.

57. David A. Van Veldhuizen and Gary B. Lamont. On Measuring Multiobjec-tive Evolutionary Algorithm Performance. In 2000 Congress on EvolutionaryComputation, volume 1, pages 204-211, Piscataway, New Jersey, July 2000.IEEE Service Center.

58. Shinya Watanabe, Tomoyuki Hiroyasu, and Mitsunori Miki. NeighborhoodCultivation Genetic Algorithm for Multi-Objective Optimization Problems.In Lipo Wang, Kay Chen Tan, Takeshi Furuhashi, Jong-Hwan Kim, and XinYao, editors, Proceedings of the l^th Asia-Pacific Conference on SimulatedEvolution and Learning (SEAL'02), volume 1, pages 198-202, Orchid Coun-try Club, Singapore, November 2002. Nanyang Technical University.

59. R. S. Zebulum, M. A. Pacheco, and M. Vellasco. A multi-objective optimi-sation methodology applied to the synthesis of low-power operational ampli-fiers. In Ivan Jorge Cheuri and Carlos Alberto dos Reis Filho, editors, Pro-ceedings of the XIII International Conference in Microelectronics and Pack-aging, volume 1, pages 264-271, Curitiba, Brazil, August 1998.

60. Eckart Zitzler, Kalyanmoy Deb, and Lothar Thiele. Comparison of Multiob-jective Evolutionary Algorithms: Empirical Results. Evolutionary Computa-

Page 57: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

28 Carlos A. Coello Coello and Gary B. Lamont

tion, 8(2):173-195, Summer 2000.61. Eckart Zitzler, Marco Laumanns, and Stefan Bleuler. A Tutorial on Evolu-

tionary Multiobjective Optimization. In Xavier Gandibleux, Marc Sevaux,Kenneth Sorensen, and Vincent T'kindt, editors, Metaheuristics for Multi-objective Optimisation, pages 3-37, Berlin, 2004. Springer. Lecture Notes inEconomics and Mathematical Systems Vol. 535.

62. Eckart Zitzler, Marco Laumanns, and Lothar Thiele. SPEA2: Improving theStrength Pareto Evolutionary Algorithm. In K. Giannakoglou, D. Tsahalis,J. Periaux, P. Papailou, and T. Fogarty, editors, EUROGEN 2001. Evolu-tionary Methods for Design, Optimization and Control with Applications toIndustrial Problems, Athens, Greece, September 2001.

63. Eckart Zitzler and Lothar Thiele. An Evolutionary Algorithm for Multiob-jective Optimization: The Strength Pareto Approach. Technical Report 43,Computer Engineering and Communication Networks Lab (TIK), Swiss Fed-eral Institute of Technology (ETH), Zurich, Switzerland, May 1998.

64. Eckart Zitzler and Lothar Thiele. Multiobjective Optimization Using Evolu-tionary Algorithms—A Comparative Study. In A. E. Eiben, editor, ParallelProblem Solving from Nature V, pages 292-301, Amsterdam, September 1998.Springer-Verlag.

65. Eckart Zitzler and Lothar Thiele. Multiobjective Evolutionary Algorithms:A Comparative Case Study and the Strength Pareto Approach. IEEE Trans-actions on Evolutionary Computation, 3(4):257-271, November 1999.

66. Eckart Zitzler, Lothar Thiele, Marco Laumanns, Carlos M. Fonseca, and Vi-viane Grunert da Fonseca. Performance Assessment of Multiobjective Opti-mizers: An Analysis and Review. IEEE Transactions on Evolutionary Com-putation, 7(2):117-132, April 2003.

67. Jesse Zydallis. Explicit Building-Block Multiobjective Genetic Algorithms:Theory, Analysis, and Development. PhD thesis, Air Force Institute of Tech-nology, Wright Patterson AFB, OH, March 2003.

68. Jesse B. Zydallis, David A. Van Veldhuizen, and Gary B. Lamont. A Sta-tistical Comparison of Multiobjective Evolutionary Algorithms Includingthe MOMGA-II. In Eckart Zitzler, Kalyanmoy Deb, Lothar Thiele, CarlosA. Coello Coello, and David Corne, editors, First International Conference onEvolutionary Multi-Criterion Optimization, pages 226-240. Springer-Verlag.Lecture Notes in Computer Science No. 1993, 2001.

Page 58: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

CHAPTER 2

APPLICATIONS OF MULTI-OBJECTIVE EVOLUTIONARY ALGORITHMS IN ENGINEERING DESIGN

Tapabrata Ray

Temasek Laboratories, National University of Singapore, 5 Sports Drive 2, Singapore 117508

E-mail: [email protected]

Engineering design is a multidisciplinary and multifaceted activity that requires simultaneous consideration of various design requirements and resource constraints. Such problems are inherently multiobjective in nature and involve highly nonlinear objectives and constraints, often with functional and slope discontinuities that limit the effective use of gradient-based optimization methods for their solution. Furthermore, in the absence of preference information among the objectives, the goal in a multi-objective optimization problem is to arrive at a set of Pareto optimal designs. Evolutionary algorithms are particularly attractive for solving such problems as they are essentially stochastic, zero-order methods which maintain a set of solutions as a population and improve them over generations. In order for an optimization algorithm to be an effective design tool, it should be computationally efficient and easy to use, with a minimal number of user inputs. A number of engineering design optimization examples are presented here and solved using a multi-objective evolutionary algorithm. The examples clearly demonstrate the benefits offered by multi-objective optimization and highlight the key features of the evolutionary algorithm.

2.1. Introduction

Real-life problems in design optimization involve the maximization or minimization of multiple objectives, most of which are often in conflict. Unlike a single-objective optimization problem, where the aim is to find the best solution, which is often unique, the aim in a multiple-objective optimization problem is to arrive at a set of Pareto optimal designs. A design x* ∈ F is termed Pareto optimal if there does not exist another x ∈ F such that f_i(x) ≤ f_i(x*) for all i = 1, ..., k objectives and f_j(x) < f_j(x*) for at least one j. Here, F denotes the feasible space (i.e., the region where the constraints are satisfied) and f_j(x) denotes the jth objective corresponding to the design x. If the design space is limited to M solutions instead of the entire F, the set of such solutions is termed the set of nondominated solutions. Since, in practice, all the solutions in F cannot be evaluated exhaustively, the goal of multi-objective optimization is to arrive at a set of nondominated solutions with the hope that it is sufficiently close to the set of Pareto solutions. Diversity among this set of solutions is also a desirable feature, as it means making a selection from a wider set of design alternatives.
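The dominance test in this definition is mechanical enough to state as code. The sketch below is a minimal Python illustration; the helper names `dominates` and `nondominated` are assumptions made for the example (they are not from the chapter), and every objective is assumed to be minimized:

```python
def dominates(fa, fb):
    """True if objective vector fa Pareto-dominates fb (minimization):
    fa is no worse in every objective and strictly better in at least one."""
    return (all(a <= b for a, b in zip(fa, fb))
            and any(a < b for a, b in zip(fa, fb)))

def nondominated(points):
    """Return the subset of objective vectors dominated by no other vector."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

For instance, among the vectors (1, 5), (2, 2), (3, 1) and (4, 4), the last is dominated by (2, 2) and the first three form the nondominated set.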

Classical gradient-based methods are not efficient for multiobjective problems, as they often lead to a single solution instead of a set of nondominated solutions. Multiple runs are also not guaranteed to reach a different nondominated solution each time. Population-based methods like evolutionary algorithms are particularly attractive for such classes of problems, as they maintain a set of solutions as a population and improve them over time. Thus, they are capable of arriving at the set of nondominated solutions in a single run.

The objective and constraint functions of a design optimization problem are known to be highly nonlinear and computationally expensive to evaluate. These functions often possess functional and slope discontinuities that limit the efficient use of classical gradient-based optimization methods. Zero-order methods like the evolutionary algorithm and its variants have an edge over gradient-based approaches in this respect. Evolutionary algorithms require the evaluation of numerous designs at every generation, and hence the total computational time required to solve a design optimization problem is usually high. This is a cause of concern for real-life applications, and multiple processors and novel learning schemes are typically employed to contain the total computational time within affordable limits.

Another important feature of a design optimization problem is the presence of a large number of constraints. Such constraints typically arise out of the designer's preferences, resource limitations, physical laws, and performance or statutory requirements. The presence of constraints is known to significantly affect the performance of all optimization algorithms, including evolutionary search methods. There have been a number of approaches to handling constraints in the domain of mathematical programming, including penalty functions and their variants, repair methods, the use of decoders, separate treatment of constraints and objectives, and hybrid methods incorporating constraint satisfaction methods. An excellent description of various constraint handling schemes appears in Michalewicz and Schoenauer1 and Coello2. An ideal constraint handling scheme should not require additional user inputs and should preferably avoid scaling and aggregation of constraint violations, while at the same time making the best use of all computed information.

The variables of a design optimization problem usually have an underlying physical significance, and hence their range of variation can be decided a priori. The total number of variables for a design problem is usually large, some of which assume continuous, integer or discrete values. This means that an optimization algorithm for engineering design should be able to deal with mixed variables, and its performance should not degrade greatly with an increase in problem size.

Various aspects of engineering design optimization problems have been discussed by Rao3 and Deb4. Both of these texts focus more on modeling the problem and subsequently discuss gradient-based methods for its solution. A comprehensive discussion of various multiobjective optimization techniques is presented by Deb5 and Coello et al.6. Once again, these texts focus more on the various mechanisms within a multiobjective algorithm and their effects on the solution.

The above discussion provides an overview of design optimization problems in general and outlines some of the features that an optimization algorithm should possess to solve these classes of problems effectively and efficiently. Section 2 provides the motivation and necessary details of the evolutionary algorithm that has been used in this study. Three design examples are discussed in detail in Section 3, while Section 4 summarizes and lists the major conclusions.

2.2. Multi-Objective Evolutionary Algorithm

The evolutionary algorithm presented in this text is designed to effectively and efficiently solve constrained, multiobjective problems from the domain of engineering design. Unlike most of its counterparts, the algorithm handles objectives and constraints separately using two fitness measures. The fitness measures are derived through nondominance and hence do not rely on scaling and aggregation of constraint violations or objectives. Fundamentally, the algorithm is built upon the following generic notions:

• The algorithm drives the set of solutions towards feasibility first, before trying to improve an infeasible individual's objective value.

• A feasible solution is preferred over an infeasible solution.

• Between two feasible solutions, the one with a better nondominated rank based on the objective matrix is preferred over the other.

• Between two infeasible solutions, the one with a lower nondominated rank based on the constraint matrix is preferred over the other.

Ray et al.7 first proposed the use of the nondominated rank of an individual to compare infeasible solutions. The Nondominated Sorting Genetic Algorithm (NSGA) introduced by Srinivas and Deb8 has been used in this study to rank the individuals. Although the process of nondominated sorting based on the constraint or the objective matrix is computationally expensive, it eliminates the need for the scaling and weighting factors that are otherwise required to derive a single scalar measure of fitness. Furthermore, the information from all constraint violations is used by the algorithm, rather than an aggregate or only the maximum violation as used by most penalty function based approaches. The details of the algorithm are explained in the context of a multi-objective, constrained minimization problem.

Minimize:

f = [f_1(x) f_2(x) ... f_m(x)]. (1)

Subject to:

g_i(x) ≥ a_i, i = 1, 2, ..., q. (2)

h_j(x) = b_j, j = 1, 2, ..., r. (3)

Where there are q inequality and r equality constraints, and x = [x1 x2 ... xn] is the vector of n design variables. It is a common practice to transform the equality constraints (with a tolerance δ) to a set of inequalities and use a unified formulation for all constraints: −h_j(x) ≥ −b_j − δ and h_j(x) ≥ b_j − δ. Thus the r equality constraints give rise to 2r inequalities, and the total number of inequalities for the problem is denoted by s, where s = q + 2r. For each individual, c denotes the constraint satisfaction vector given by c = [c_1 c_2 ... c_s], where

c_i = 0                 if satisfied, i = 1, 2, ..., s
c_i = a_i − g_i(x)      if violated, i = 1, 2, ..., q
c_i = b_i − δ − h_i(x)  if violated, i = q + 1, q + 2, ..., q + r
c_i = −b_i − δ + h_i(x) if violated, i = q + r + 1, q + r + 2, ..., s.    (4)

For the above c_i's, c_i = 0 indicates that the ith constraint is satisfied, whereas c_i > 0 indicates the violation of the constraint. The CONSTRAINT matrix for a population of M individuals assumes the form

CONSTRAINT = [ c_11 c_12 ... c_1s
               c_21 c_22 ... c_2s
               ...
               c_M1 c_M2 ... c_Ms ].    (5)

The objective matrix assumes the form

OBJECTIVE = [ f_11 f_12 ... f_1k
              f_21 f_22 ... f_2k
              ...
              f_M1 f_M2 ... f_Mk ].    (6)
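As an illustration of the constraint satisfaction vector of Eqn (4), the sketch below builds c for one design; the function name and the callable-based interface are assumptions made for the example, not part of the chapter's algorithm:

```python
def constraint_vector(x, gs, a, hs, b, delta=1e-4):
    """Build c = [c_1 ... c_s] of Eqn (4): 0 for a satisfied constraint,
    the magnitude of the violation otherwise (s = q + 2r entries)."""
    c = []
    # q inequality constraints g_i(x) >= a_i
    for g, ai in zip(gs, a):
        c.append(max(0.0, ai - g(x)))
    # each equality h_j(x) = b_j becomes two inequalities with tolerance delta
    for h, bj in zip(hs, b):
        c.append(max(0.0, bj - delta - h(x)))   # h_j(x) >= b_j - delta
        c.append(max(0.0, h(x) - bj - delta))   # h_j(x) <= b_j + delta
    return c
```

With one inequality g(x) = x ≥ 1 and one equality h(x) = x = 2, evaluating at x = 0.5 yields a vector whose first two entries are positive (both constraints violated) and whose third is zero.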

In a population of M individuals, all nondominated individuals are assigned a rank of 1. The rank 1 individuals are removed from the population, and the new set of nondominated individuals is assigned a rank of 2. The process continues until every individual in the population has been assigned a rank. A rank of 1 in the objective or the constraint matrix indicates that the individual is nondominated. It can be observed from the constraint matrix that when all the individuals in the population are infeasible, the rank 1 solutions are the best in terms of minimal constraint violation. Whenever there are one or more feasible individuals in the population, the feasible solutions assume the rank of 1. The pseudo code of the algorithm is presented below.

2.2.1. Algorithms

(1) t ← 0.
(2) Generate M individuals representing a population: Pop(t) = I_1, ..., I_M, uniformly in the parametric space.
(3) Evaluate each individual: compute the objectives and constraints, i.e., f_k(I_i) and c_j(I_i), for i = 1, 2, ..., M individuals, k = 1, 2, ..., P objectives and j = 1, 2, ..., S constraints.
(4) Identify Elites: E(t) ⊂ Pop(t), where E(t) is the set of elites. The remaining individuals are referred to as R(t), such that Pop(t) = E(t) ∪ R(t).
(5) Preserve the Elites:
    Pop(t + 1) ← ∅.
    Pop(t + 1) ← Pop(t + 1) ∪ E(t).
(6) Select a Partner for Mating: for each parent I_j ∈ R(t), select its partner from E(t).
(7) Mate the two parents to create a child I′_j:
    Pop(t + 1) ← Pop(t + 1) ∪ I′_j.
(8) t ← t + 1.
(9) If t < T_max then go to step (3), else stop.
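The generational loop of steps (1)-(9) can be sketched as a generic skeleton. Everything problem-specific (evaluation, elite identification, partner selection, mating) is passed in as a function; this is an illustrative scaffold, not the chapter's implementation, and the helper names are assumptions:

```python
import random

def evolve(evaluate, identify_elites, select_partner, mate,
           M=20, T_max=50, n_var=3):
    """Skeleton of steps (1)-(9): initialize uniformly, then repeatedly
    preserve elites and mate every remaining individual with an elite."""
    pop = [[random.random() for _ in range(n_var)] for _ in range(M)]  # step (2)
    for t in range(T_max):                                            # steps (8)-(9)
        scores = [evaluate(ind) for ind in pop]                       # step (3)
        elites = identify_elites(pop, scores)                         # step (4)
        rest = [ind for ind in pop if ind not in elites]
        children = [mate(p, select_partner(elites)) for p in rest]    # steps (6)-(7)
        pop = elites + children                                       # step (5)
    return pop
```

A toy single-objective stand-in (score = sum of variables, elites = best half, partner = random elite, child = parent average) exercises the loop; the chapter's actual rank-based machinery would slot into the same four hooks.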

The pseudo code of the Elite Identification process is as follows:

(1) Compute the nondominated rank of every solution: RC_i based on the constraint matrix and RO_i based on the objective matrix.

(2) The set of elites E(t) ⊆ Pop(t) is formed via the following steps:

E(t) ← ∅.
E(t) ← I_i: if RC_i = 1.

If size of E(t) > M/2 and size of E(t) < M:

(i) E(t) ← ∅.
(ii) E(t) ← I_i: if RO_i ≤ (1/M) Σ_{j=1}^{M} RO_j.

Else if size of E(t) < M/2:

(i) E(t) ← ∅.
(ii) E(t) ← I_i: if RC_i ≤ (1/M) Σ_{j=1}^{M} RC_j.

Else if size of E(t) = M and max_i RO_i = 1:

(i) E(t) ← ∅.
(ii) E(t) ← I_i: if CO_i ≤ (1/M) Σ_{j=1}^{M} CO_j and CV_i ≥ (1/M) Σ_{j=1}^{M} CV_j, or vice versa.

Where CV_i is the distance to the closest neighbor of the ith individual in the variable space and CO_i is the distance to the closest neighbor of the ith individual in the objective space. These CV_i and CO_i values are transformed to ranks. When all the solutions of the population turn out to be nondominated, the ones that have close neighbors in both the variable and the objective space are dropped from the list of elites. This process is necessary to create room for solutions that are diverse in both the parametric and the objective space.

The pseudo code of the Partner Selection process is as follows:

(1) If the number of feasible solutions = 0: an individual is selected from the elite list using roulette wheel selection based on the constraint ranks of the elites.

(2) If the number of objectives > 1 and the number of feasible solutions > 0: an individual is selected from the elite list using roulette wheel selection based on its crowding rank in the objective space (CO_i).

(3) If the number of objectives = 1 and the number of feasible solutions > 0: an individual is selected from the elite list using roulette wheel selection based on the objective ranks of the elites.
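All three cases rely on roulette wheel selection over ranks. One simple way to turn ranks (lower is better) into wheel slices is the linear weighting sketched below; the weighting choice is an assumption made for illustration, since the chapter does not specify one:

```python
import random

def roulette_on_ranks(individuals, ranks):
    """Roulette-wheel selection where a lower (better) rank gets a larger
    slice of the wheel; slice = (worst rank + 1 - own rank) is one simple
    choice among many."""
    worst = max(ranks)
    weights = [worst + 1 - r for r in ranks]
    return random.choices(individuals, weights=weights, k=1)[0]
```

Over many draws, a rank-1 elite is selected about three times as often as a rank-3 elite under this weighting.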

The pseudo code of the Crossover operator is as follows:

(1) Scale every variable between 0 and 1 using the maximum and minimum values of the variables in the population.

(2) D = Σ_{j=1}^{N} (x_j^P1 − x_j^P2)², j = 1, 2, ..., N variables, where x_j^P1 denotes the jth variable of the parent P1 and x_j^P2 denotes the jth variable of the parent P2.

(3) P is the parent that is randomly chosen between P1 and P2. C(i) = P(i) + N(μ = 0, σ) · D, where σ is the variance of the normal distribution, i = 1, ..., N variables. σ = 1 has been used in this study.

(4) Transform the C(i)'s back to the original scale to get the new child.
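A direct transcription of steps (1)-(4) might look as follows; the guard for a degenerate (constant) variable and the function name are added assumptions, and `random.gauss` treats σ as a standard deviation (identical here, since σ = 1):

```python
import random

def crossover(p1, p2, pop, sigma=1.0):
    """Sketch of crossover steps (1)-(4): scale to [0, 1] using the
    population's per-variable extremes, perturb one randomly chosen parent
    by a normal step weighted by the parents' squared distance D, and map
    the child back to the original scale."""
    n = len(p1)
    lo = [min(ind[j] for ind in pop) for j in range(n)]
    hi = [max(ind[j] for ind in pop) for j in range(n)]
    span = [h - l if h > l else 1.0 for l, h in zip(lo, hi)]  # guard: constant variable
    s1 = [(v - l) / s for v, l, s in zip(p1, lo, span)]       # step (1)
    s2 = [(v - l) / s for v, l, s in zip(p2, lo, span)]
    D = sum((u - w) ** 2 for u, w in zip(s1, s2))             # step (2)
    base = random.choice([s1, s2])                            # step (3)
    child = [v + random.gauss(0.0, sigma) * D for v in base]
    return [v * s + l for v, s, l in zip(child, span, lo)]    # step (4)
```

Because the perturbation is proportional to D, parents that are far apart in the scaled space produce children that explore a wider neighborhood.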

The algorithm presented above attempts to improve the performance of all individuals of a population, unlike some evolutionary models where only the good parents participate in mating. On one hand this behavior leads to expensive evaluations, while on the other it provides scope for the wider exploration that is useful for problems that are highly nonlinear. The greedy element of the algorithm arises from the stringent elite selection procedure and the crossover operator that explores the neighborhood of the elites. The algorithm is particularly attractive as a design tool, since it does not use scaling and aggregation of constraints and objectives, and does not require additional inputs, unlike most of its counterparts. Thus, it allows the designer to solve the actual design problem and not a modified form driven by the scope of the optimization algorithm. Although the use of nondominated sorting to handle constraints is an expensive operation, it is meaningful for problems where the objective and constraint functions are equally or even more expensive.

2.3. Examples

Three different classes of engineering design problems are presented and solved in this section. The first example is a well-studied multiobjective, constrained optimization problem that aims to design a welded beam with minimum cost and minimum end deflection. This example allows comparison of the evolutionary algorithm with its counterparts. The second example deals with the preliminary design of a bulk carrier. It is an interesting example which is used here to illustrate the concepts of problem-specific initialization and crossover design for efficient constraint handling. The third example deals with a robust airfoil design problem with an equality constraint. This is a typical shape optimization problem where the design variables are shape parameters. The concept of hybridization and solution repair is presented as a means to deal effectively with the equality constraint of the problem.

2.3.1. Design of a Welded Beam

This problem has been well studied in the context of single-objective optimization. A multiobjective formulation is presented here which aims to minimize the cost of the beam and also to minimize its maximum end deflection, subject to constraints on shear stress, bending stress and buckling load. There are four continuous design variables h, l, t and b that correspond to x1, x2, x3 and x4 and are shown in Figure 2.1. It is evident that minimization of cost will lead to smaller dimensions of the beam. When the beam dimensions get smaller, the end deflection gets bigger, and thus the conflicting objectives interact with each other to yield the set of nondominated solutions. The mathematical formulation of the problem is presented below.

Minimize:

f1(x) = 1.10471 x1² x2 + 0.04811 x3 x4 (14.0 + x2). (7)

f2(x) = δ(x). (8)

Subject to:

τ(x) − τmax ≤ 0. (9)

σ(x) − σmax ≤ 0. (10)

x1 − x4 ≤ 0. (11)

P − Pc(x) ≤ 0. (12)

τ(x) = √((τ′)² + 2 τ′ τ″ x2/(2R) + (τ″)²). (13)

τ′ = P/(√2 x1 x2). (14)

τ″ = M R/J. (15)

M = P (L + x2/2). (16)

R = √(x2²/4 + ((x1 + x3)/2)²). (17)

J = 2 {√2 x1 x2 [x2²/12 + ((x1 + x3)/2)²]}. (18)

σ(x) = 6 P L/(x4 x3²). (19)

δ(x) = 4 P L³/(E x3³ x4). (20)

Pc(x) = (4.013 √(E G x3² x4⁶/36)/L²) (1 − (x3/(2L)) √(E/(4G))). (21)

Where P = 6000 lb, L = 14 in, δmax = 0.25 in, E = 30×10⁶ psi, G = 12×10⁶ psi, τmax = 13,600 psi, σmax = 30,000 psi, 0.125 ≤ x1 ≤ 10.0, 0.1 ≤ x2 ≤ 10.0, 0.1 ≤ x3 ≤ 10.0 and 0.125 ≤ x4 ≤ 10.0.
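The formulation can be wrapped in a small evaluation routine. The sketch below follows the standard welded-beam expressions matching Eqns (7)-(21); since the printed equations are partly illegible, treat the exact constants as reconstructed rather than authoritative, and the function name as an assumption:

```python
from math import sqrt

# problem constants from the formulation above
P, L, E, G = 6000.0, 14.0, 30e6, 12e6
TAU_MAX, SIG_MAX = 13600.0, 30000.0

def welded_beam(x):
    """Return the objectives (cost, end deflection) and the constraint
    values of Eqns (9)-(12); a negative constraint value means satisfied."""
    x1, x2, x3, x4 = x
    cost = 1.10471 * x1**2 * x2 + 0.04811 * x3 * x4 * (14.0 + x2)   # Eqn (7)
    defl = 4.0 * P * L**3 / (E * x3**3 * x4)                        # Eqn (20)
    tp = P / (sqrt(2.0) * x1 * x2)                                  # tau'
    M = P * (L + x2 / 2.0)
    R = sqrt(x2**2 / 4.0 + ((x1 + x3) / 2.0) ** 2)
    J = 2.0 * (sqrt(2.0) * x1 * x2 * (x2**2 / 12.0 + ((x1 + x3) / 2.0) ** 2))
    tpp = M * R / J                                                 # tau''
    tau = sqrt(tp**2 + 2.0 * tp * tpp * x2 / (2.0 * R) + tpp**2)    # Eqn (13)
    sig = 6.0 * P * L / (x4 * x3**2)                                # Eqn (19)
    pc = (4.013 * sqrt(E * G * x3**2 * x4**6 / 36.0) / L**2) \
         * (1.0 - (x3 / (2.0 * L)) * sqrt(E / (4.0 * G)))           # Eqn (21)
    g = [tau - TAU_MAX, sig - SIG_MAX, x1 - x4, P - pc]             # Eqns (9)-(12)
    return (cost, defl), g
```

Evaluating the minimum-cost design of Table 2.1 reproduces its cost and deflection to within rounding of the tabulated variables, with all four constraints satisfied.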


Fig. 2.1. Variables of the Welded Beam Design Problem.

Fig. 2.2. Nondominated Front of the Welded Beam Design Problem.


Fig. 2.3. Progress of the Optimization Algorithm for the Welded Beam Design Problem.

A population size of 100 has been used in this study, and the algorithm has been allowed to evaluate a maximum of 20,000 designs. The nondominated front is presented in Figure 2.2, while the progress of the optimization algorithm is presented in Figure 2.3. The final population has 85 feasible solutions, out of which 50 are on the nondominated front. The algorithm evolved over 850 generations. The minimum cost and the minimum end deflection designs are presented in Table 2.1.

Table 2.1. Minimum Cost and Minimum End Deflection Designs.

Cost      Deflection  x1      x2      x3      x4
3.632894  0.013561    0.4532  3.71    7.0297  0.4665
5.201977  0.000446    0.9366  6.5903  9.9965  4.9296

The same multiobjective version of this welded beam design problem has been solved by Deb9 using a genetic algorithm coupled with a simulated binary crossover operator, and by Ray and Liew10 using their swarm algorithm. The results obtained in this study are similar to those reported in the abovementioned references.

2.3.2. Preliminary Design of Bulk Carrier

The second example is a bulk carrier design problem that was originally proposed by Sen and Yang11. Sen and Yang11 considered the minimization of transportation cost, the minimization of lightship mass and the maximization of annual cargo transport capacity as the three design objectives. The formulation presented here considers the minimization of transportation cost and the maximization of annual cargo transport capacity as the two objectives of the design optimization problem. There are 6 continuous variables (L, T, D, CB, B and V). The bounds for the variables are 0 < L < 500, 0 < T < 50, 0 < D < 50, 0.63 < CB < 0.75, 0 < B < 100 and 14 < V < 18. The formulation of the problem is presented below:

Minimize:

Transportation Cost = A_COST/(C_DWT × RTPA). (22)

Maximize:

Annual Cargo Transport Capacity = C_DWT × RTPA. (23)

Subject to:

3,000 ≤ DWT ≤ 500,000. (24)

V/√(g × L) ≤ 0.32. (25)

T ≤ 0.45 × DWT^0.31. (26)

T ≤ 0.7 × D + 0.7. (27)

L/B ≥ 6. (28)

L/D ≤ 15. (29)

L/T ≤ 19. (30)

GM ≥ 0.07 × B. (31)


A_COST = CAP_COST + RUN_COST + VOY_COST × RTPA. (32)

CAP_COST = 0.2 × SHIP_COST. (33)

RUN_COST = 40,000 × DWT^0.3. (34)

VOY_COST = F_COST + P_COST. (35)

F_COST = 1.05 × DC × SD × FP. (36)

DC = P × 0.19 × 24/1000 + 0.2. (37)

DWT = DISP − LSM. (38)

LSM = STEEL_MASS + OUT_MASS + MAC_MASS. (39)

STEEL_MASS = 0.034 × L^1.7 × B^0.7 × D^0.4 × CB^0.5. (40)

OUT_MASS = L^0.8 × B^0.6 × D^0.3 × CB^0.1. (41)

DISP = L × B × T × CB × 1.025. (42)

C1 = −10847.2 × CB² + 12817 × CB − 6960.32. (43)

C2 = 4977.06 × CB² − 8105.61 × CB + 4465.51. (44)

P = DISP^(2/3) × V³/(C1 × V/(g × L)^0.5 + C2). (45)

MAC_MASS = 0.17 × P^0.9. (46)

SHIP_COST = 1.3 × (2000 × STEEL_MASS^0.85 + 3500 × OUT_MASS + 2400 × P^0.8). (47)

C_DWT = DWT − FL − CSW. (51)

FL = DC × (SD + 5). (52)

CSW = 2 × DWT^0.5. (53)

P_COST = 6.3 × DWT^0.8. (54)

GM = 0.53 × T + ((0.085 × CB − 0.002) × B²)/(T × CB) − 1.0 − 0.52 × D. (55)

Where RTM = 5,000 nautical miles, FP = 100 pounds/tonne, C_RATE = 8,000 tonnes/day, and g = 9.8065 m/s².


List of Symbols

A_CARGO  Annual cargo carrying capacity (tonnes/year).
A_COST  Annual cost (pounds/year).
B  Breadth of the ship (m).
C_DWT  Cargo deadweight (tonnes).
C_RATE  Cargo handling rate (tonnes/day).
C1, C2  Coefficients.
CAP_COST  Capital cost (pounds/year).
CB  Block coefficient.
CSW  Weight of crew, stores and water (tonnes).
D  Depth of the ship (m).
DC  Daily consumption of fuel (tonnes/day).
DISP  Displacement (tonnes).
DWT  Deadweight (tonnes).
F_COST  Fuel cost (pounds).
FL  Fuel carried (tonnes).
FP  Fuel price (pounds/tonne).
g  Acceleration due to gravity (m/s²).
GM  Metacentric height (m).
L  Length of the ship (m).
LSM  Lightship mass (tonnes).
MAC_MASS  Machinery mass (tonnes).
OUT_MASS  Outfit mass (tonnes).
P  Shaft power (HP).
P_COST  Port cost (pounds).
RTM  Round trip (miles).
RTPA  Number of round trips per year.
RUN_COST  Running cost (pounds/year).
SD  Number of sea days per year.
SHIP_COST  Cost of the ship (pounds).
STEEL_MASS  Steel mass (tonnes).
T  Draft of the ship (m).
V  Speed of the ship (knots).
VOY_COST  Voyage cost (pounds/voyage).

This example is interesting as one can observe the following:

(1) Objective 1 needs to be minimized while objective 2 needs to be maximized. Since maximization of f is equivalent to minimization of -f, the problem is transformed into minimization of transportation cost and minimization of (-1) × annual cargo transport capacity.

(2) Since the constraints presented in Eqn 25-Eqn 30 deal with the design variables directly, it is meaningful to design specific initialization and crossover mechanisms to ensure feasibility. This is an important aspect of real life problem modeling which is less often discussed. The following simple strategy is used in this study for initialization:

(a) Randomly create variables x3 and x5 using their upper and lower bounds.

(b) Generate x6 using the upper bound of x6 and a lower bound computed using Eqn 25 and x5.

(c) Generate x4 using the upper bound of x4 and a lower bound computed using Eqn 28 and x6.

(d) Generate x2 using the upper bound of x2 and a lower bound computed using Eqn 29 and x6.

(e) Generate x1 using the lower bound of x6 and a lower bound computed using Eqn 30, while the upper bound is computed using x2 and Eqn 27.

(f) Repeat steps (a)-(e) to create an individual if Eqn 24 is not satisfied.
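The step-wise sampling in (a)-(f) can be sketched as below. Since Eqns 24-30 are not reproduced in this excerpt, the bound helpers, variable ranges and the bounds used for x1 are hypothetical placeholders for illustration only, not the chapter's actual constraint expressions.

```python
import random

# Hypothetical stand-ins for Eqns 24-30 (the real expressions couple the
# ship's dimensions and are not reproduced in this excerpt).
def lower_x6(x5): return 0.5 * x5      # placeholder for Eqn 25
def lower_x4(x6): return 0.3 * x6      # placeholder for Eqn 28
def lower_x2(x6): return 0.2 * x6      # placeholder for Eqn 29
def feasible(x):  return True          # placeholder for Eqn 24

BOUNDS = {2: (10.0, 70.0), 3: (0.63, 0.75), 4: (5.0, 60.0),
          5: (14.0, 18.0), 6: (10.0, 50.0)}  # illustrative ranges only

def init_individual():
    """Steps (a)-(f): sample the free variables first, then derive bounds
    for the dependent ones so every created individual is feasible."""
    while True:
        x = {}
        x[3] = random.uniform(*BOUNDS[3])                    # step (a)
        x[5] = random.uniform(*BOUNDS[5])
        x[6] = random.uniform(lower_x6(x[5]), BOUNDS[6][1])  # step (b)
        x[4] = random.uniform(lower_x4(x[6]), BOUNDS[4][1])  # step (c)
        x[2] = random.uniform(lower_x2(x[6]), BOUNDS[2][1])  # step (d)
        x[1] = random.uniform(0.1 * x[6], 2.0 * x[2])        # step (e), placeholder bounds
        if feasible(x):                                      # step (f)
            return x
```

The design choice worth noting is the ordering: the free variables are drawn first and the remaining bounds are derived from them, so feasibility of the coupling constraints is guaranteed by construction rather than checked after the fact.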

The crossover mechanism is also designed along similar lines, where the variables x3 and x5 are inherited from parents P1 and P2 and the rest of the steps are similar. A population size of 100 has been used in this study and the algorithm has been allowed to evaluate a maximum of 20,000 designs. The nondominated front is presented in Figure 2.4 and the progress of the optimization algorithm is presented in Figure 2.5. The final population has 62 feasible solutions, out of which 36 are on the nondominated front. The algorithm evolved over 328 generations. The final set of nondominated solutions is presented in Table 2.2.

This example highlights how to design a problem specific initialization and crossover scheme to deal with constraints. Only the constraint presented in Eqn 31 is handled as an explicit constraint, while all others are taken care of by the initialization and the crossover operator. Such a scheme ensures feasibility of every design that is generated and evaluated during the course of the search.


Fig. 2.4. Nondominated Solutions for the Bulk Carrier Design Example.

Fig. 2.5. Progress of the Optimization Algorithm for the Bulk Carrier Design Example.


Table 2.2. Nondominated Solutions.

L          T         D         CB       B         V
265.0703   15.1403   21.7275   0.7334   34.4538   16.3101
327.4603   18.5278   26.4749   0.7499   41.8770   17.9999
435.3247   26.2185   42.6520   0.7499   66.0194   17.9999
326.2294   18.5649   26.9896   0.7499   41.5569   17.9999
459.9728   24.3553   47.8616   0.7499   69.1481   17.9999
329.4331   18.5606   26.5699   0.7499   41.9282   17.9999
308.8997   16.5381   23.1810   0.7499   36.9305   17.3352
327.0127   17.4201   24.1052   0.7499   36.8523   17.9999
382.6671   23.9601   39.6441   0.7499   61.7968   17.9999
324.4934   20.4814   32.9082   0.7499   51.3516   17.9579
333.7987   17.7141   23.6796   0.7499   37.4533   17.9999
325.9471   17.1620   23.5674   0.7499   35.9492   17.9999
325.9119   17.4712   24.3110   0.7499   38.2992   17.9999
316.8036   17.1316   23.8424   0.7499   36.7042   17.2910
325.3925   17.9019   25.2768   0.7499   40.3176   17.9999
322.6850   20.2319   32.1511   0.7499   50.1056   17.9996
322.7691   17.6337   24.7932   0.7499   39.3969   17.9999
331.4340   20.9741   34.0957   0.7499   53.2014   17.9999
314.8510   18.1938   27.8552   0.7499   42.1150   17.5574
334.6981   18.2687   25.2962   0.7499   40.0075   17.9555
364.3878   22.9444   37.4294   0.7499   58.7262   17.9999
324.4286   17.7842   25.1372   0.7499   40.0405   17.9999
332.6913   21.1350   36.6322   0.7499   53.6001   17.9999
286.9063   15.1007   20.5941   0.7499   32.1932   16.6399
323.1446   17.0824   24.0715   0.7499   38.1472   17.9999
334.4517   18.9468   27.9699   0.7499   43.2734   17.9999
335.6447   21.3705   35.9064   0.7499   55.0913   17.9999
324.3072   17.2151   24.2747   0.7499   38.2882   17.9999
346.2387   21.0877   33.6655   0.7499   51.9920   17.9999
324.9371   17.1112   23.4540   0.7499   36.9022   17.9999
305.1605   16.4044   22.5431   0.7499   35.3416   17.2129
389.8927   23.1705   36.0575   0.7499   56.1758   17.9999
425.9697   26.0540   45.0071   0.7499   68.1254   17.9999
330.7853   20.4269   34.0816   0.7499   52.4444   17.9999
455.3675   24.2495   39.5678   0.7499   58.2831   17.9999
422.5550   22.7782   42.1548   0.7499   63.6935   17.9999

2.3.3. Design of Robust Airfoil

The third example deals with the design of robust airfoils. Unlike the first two design problems, where the variables are directly related to the objectives and constraints, this is a shape optimization problem where the objectives and constraints are dependent on the shape, which in turn is controlled by the design variables. The aim is to design an airfoil shape that generates


Table 2.3. Extreme Designs.

                          Min. Transport    Max. Annual Cargo Transport
                          Cost Design       Capacity Design

Transportation Cost       6.211             12.540
Cargo Transport Amount    853027.78         1271678.87
L                         265.0703          459.9728
T                         15.1403           24.3553
D                         21.7275           47.8616
CB                        0.7334            0.7499
B                         34.4538           69.1481
V                         16.3101           17.9999

a lift coefficient CL = 0.7 at a design Mach number (M) of 0.73 with an angle of attack between 1 and 3 degrees. The drag coefficient and the pitching moment coefficient need to be minimized at the operating point and its 4 neighbors. The formulation of the design problem is presented as follows:

Minimize:

f1(x) = (1/5) [CD(M - 0.05, α) + CD(M + 0.05, α) + CD(M, α - 0.5) + CD(M, α + 0.5) + CD(M, α)]. (56)

f2(x) = (1/5) [CM(M - 0.05, α)^2 + CM(M + 0.05, α)^2 + CM(M, α - 0.5)^2 + CM(M, α + 0.5)^2 + CM(M, α)^2]. (57)

Subject to:

CL = 0.7 at (M, α). (58)
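The averaging of Eqns 56-57 over the operating point and its four neighbours is easy to encode. In this sketch, `cd` and `cm` are caller-supplied evaluators standing in for the flow-solver-based CD and CM computations; only the averaging itself is taken from the text.

```python
def robust_objectives(cd, cm, M, alpha):
    """Eqns 56-57: mean drag and mean squared pitching moment over the
    operating point (M, alpha) and its four neighbours.  cd and cm are
    caller-supplied aerodynamic evaluators (the chapter uses an Euler
    flow code); this sketch only encodes the averaging."""
    points = [(M - 0.05, alpha), (M + 0.05, alpha),
              (M, alpha - 0.5), (M, alpha + 0.5), (M, alpha)]
    f1 = sum(cd(m, a) for m, a in points) / 5.0
    f2 = sum(cm(m, a) ** 2 for m, a in points) / 5.0
    return f1, f2
```

Squaring CM in f2 penalises pitching moments of either sign, which is why the two objectives are averaged differently.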

Unlike examples 1 and 2, where the number and nature of the variables are fixed, the number of variables of an airfoil optimization problem is influenced by the choice of the airfoil representation scheme. A PARSEC representation has been used in this study. The shape representation scheme dictates the number of variables for the optimization problem and also the range of shapes that can evolve using the scheme.

The PARSEC representation scheme parameterizes the upper and the lower airfoil surfaces using polynomials in coordinates X and Z as:

Z = Σ_{n=1}^{6} a_n X^(n - 1/2), (59)

where a_n are real coefficients. The parameters of a PARSEC representation are the leading-edge radius (r_le), upper and lower crest heights (Z_UP,


Fig. 2.6. Variables in the PARSEC Representation Scheme.

Table 2.4. Upper and Lower Bounds of the Variables.

Variable Name   Upper Bound   Lower Bound

X_UP            0.5           0.3
Z_UP            0.075         0.05
Z_TE            0             0
r_le            0.0085        0.0055
Z_XXUP          -0.4          -0.6
α_TE            -8            -12
X_LO            0.42          0.28
Z_LO            -0.050        -0.075
Z_XXLO          0.85          0.55
β_TE            -9.5          -14.5
ΔZ_TE           0             0

Z_LO) and locations (X_UP, X_LO), curvatures at the upper and lower crests (Z_XXUP, Z_XXLO), trailing-edge thickness (ΔZ_TE) and ordinate (Z_TE), direction and wedge angle (α_TE, β_TE). The parameters are schematically shown in Figure 2.6 while the variable bounds used in this study are listed in Table 2.4. In this study an Euler code has been used to compute the flow around the airfoil. This problem is particularly interesting as it involves an equality constraint. Handling equality constraints is known to be difficult using zero order stochastic algorithms like genetic and evolutionary algorithms. A hybrid scheme, also referred to as a solution repair scheme, is employed here to deal with the equality constraint. The following


flowchart outlines the concept of repair.

(1) Generate a random airfoil shape using the PARSEC variable bounds.

(2) Assume an angle of attack (α) within the allowable bounds.

(3) Initial solution: the angle of attack (α) and the PARSEC variables.

(4) Compute the performance of the airfoil at the angle of attack (α) and the given Mach number (M).

(5) Minimize (CL - 0.7)^2 using Levenberg-Marquardt to arrive at the optimum angle of attack (α*) for the given Mach number (M) and the PARSEC variables.

(6) Repaired solution: the angle of attack (α*) and the PARSEC variables that satisfy the equality constraint.

Fig. 2.7. Solution Repair Mechanism.
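A minimal sketch of the repair loop of Figure 2.7 follows. A damped Gauss-Newton step with a finite-difference slope stands in for the Levenberg-Marquardt solver used in the chapter, and `cl` is a caller-supplied lift evaluator standing in for the Euler flow code; any smooth lift model works for the sketch.

```python
def repair_alpha(cl, alpha0, target=0.7, tol=1e-8, max_iter=50):
    """Adjust the angle of attack until (CL - target)^2 is minimised for
    a fixed shape and Mach number (the repair step of Fig. 2.7)."""
    alpha, h, damping = alpha0, 1e-4, 1e-8
    for _ in range(max_iter):
        r = cl(alpha) - target                       # residual CL - 0.7
        if abs(r) < tol:
            break
        j = (cl(alpha + h) - cl(alpha - h)) / (2*h)  # dCL/dalpha
        alpha -= j * r / (j*j + damping)             # damped Gauss-Newton step
    return alpha
```

Because every candidate shape is repaired before evaluation, the equality constraint of Eqn 58 is satisfied (to solver tolerance) by every design the evolutionary search ever sees.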

The results are presented in Figure 2.8. One can observe that the initial set of solutions clearly migrated towards the nondominated front. The performance of the final set of designs at the operating point (M, α), along with the average performances of the initial and final sets of solutions, is presented in Figure 2.8. The study was conducted using 30 initial solutions which were allowed to evolve over 10 generations. The optimization process evaluated 128 distinct airfoil shapes and required a total of 36,503 CFD computations (to iteratively satisfy the CL constraint). The designed angle of attack of the nondominated designs varied between 1.32 and 2.98 degrees.


Fig. 2.8. Nondominated Solutions for the Robust Airfoil Design Problem.

2.4. Summary and Conclusions

Engineering design in reality is a multidisciplinary task and inherently requires the solution of a multiobjective constrained optimization problem. The objectives and constraints are highly nonlinear and computationally expensive to evaluate. Stochastic algorithms like evolutionary algorithms are ideally suited to solve such problems as they are zero order, population based optimization strategies that can arrive at the desired set of nondominated solutions in a single run.

The evolutionary algorithm presented in Section 2.2 is capable of handling unconstrained and constrained multiobjective problems with mixed variables, without any restriction on the number of variables, constraints or objectives. The use of nondominance to handle constraints eliminates the problem of scaling and aggregation at the expense of nondominated sorting. Nondominated sorting is a computationally expensive process. However, for engineering design problems, it is meaningful to make use of all computed information to better guide the search, as objective and constraint evaluations are far more expensive. Furthermore, such a scheme improves the usability of the algorithm, as a designer does not need to specify additional parameters for scaling and aggregating constraint violations.


The evolutionary algorithm presented in this study employs a stringent, adaptive, elite preservation strategy which ensures that good solutions are never lost during the course of the search. Unlike most evolutionary algorithms, where only the good parents participate in mating, this algorithm ensures that all solutions participate in mating, which is useful for exploring highly nonlinear search spaces. The diversity of the solutions is controlled by the partner selection scheme that relies on a crowding measure. The adaptive crowding measure ensures that elites with distant neighbors are preferred over elites with close neighbors.

The design examples presented in this study are carefully chosen to illustrate various aspects of engineering design optimization. The first example of welded beam design is used to illustrate the basic behavior of the evolutionary algorithm and to instill confidence that its performance is comparable with other existing stochastic techniques. The bulk carrier design example outlines the steps involved in designing a problem specific initialization and a crossover scheme to ensure that only feasible solutions are created and evaluated. It also highlights the fact that for some problems, constraint satisfaction can be ensured using such schemes. The third example deals with a shape optimization problem where the design variables are shape variables. The problem is also interesting as it involves an equality constraint. The concept of solution repair using a gradient based technique is presented. Handling equality constraints within an evolutionary algorithm is known to be nontrivial and the above hybrid scheme is a possible alternative for handling such equalities.

It is interesting to observe that in all the design examples, multiobjective optimization resulted in a set of competitive solutions. The solutions are parametrically diverse and their range of performance also varied significantly. Typically, a few representative solutions are selected from these nondominated solutions and additional selection criteria are introduced in order to select a single design for possible implementation.

We are currently using multiobjective evolutionary algorithms to solve a wide range of problems which include dielectric filter design, Yagi-Uda antenna design, conceptual configuration design of aircraft, aircraft intake shape optimization and redesign of wings and airfoils. We are also working towards the use of surrogate assisted models for real life multiobjective design optimization problems.


References

1. Z. Michalewicz and M. Schoenauer, Evolutionary Algorithms for Constrained Parameter Optimization Problems, Evolutionary Computation, 4(1), 1 (1996).

2. C. A. Coello Coello, Theoretical and Numerical Constraint-Handling Techniques used with Evolutionary Algorithms: A Survey of the State of the Art, Computer Methods in Applied Mechanics and Engineering, 191(11-12), 1245 (2002).

3. S. S. Rao, Engineering Optimization: Theory and Practice, John Wiley & Sons, 3rd ed., (1996).

4. K. Deb, Optimization for Engineering Design: Algorithms and Examples, Prentice Hall, New Delhi, (1995).

5. K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms, John Wiley & Sons, (2001).

6. C. A. Coello Coello, D. A. Van Veldhuizen and G. B. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems, Kluwer Academic Publishers, New York, (2002).

7. T. Ray, K. Tai and K. C. Seow, An Evolutionary Algorithm for Multiobjective Optimization, Engineering Optimization, 33(4), 399 (2001).

8. N. Srinivas and K. Deb, Multiobjective Optimization Using Nondominated Sorting in Genetic Algorithms, Evolutionary Computation, 2(3), 221 (1994).

9. K. Deb, Evolutionary Multi-Criterion Optimization, in Evolutionary Algorithms in Engineering and Computer Science, K. Miettinen, P. Neittaanmäki, M. M. Mäkelä and J. Périaux (Eds), John Wiley & Sons, 135 (1999).

10. T. Ray and K. M. Liew, A Swarm Metaphor for Multiobjective Optimization, Engineering Optimization, 34(2), 141 (2002).

11. P. Sen and J. B. Yang, Multiple Criteria Decision Support in Engineering Design, Springer-Verlag, London, (1998).


CHAPTER 3

OPTIMAL DESIGN OF INDUSTRIAL ELECTROMAGNETIC DEVICES: A MULTIOBJECTIVE EVOLUTIONARY APPROACH

"Marco Farina and ''Paolo Di Barba8STMicroelectronics, SST Corporate R&D

Agrate Brianza (MI), ItalyE-mail: [email protected]

University of Pavia, Department of Electrical EngineeringPavia, Italy

E-mail: [email protected]

3.1. Introduction

A wide variety of stochastic methods are available in the literature for Pareto multiobjective optimization 11,19,5,14,7 and a comprehensive comparative study and review of the state-of-the-art is presented by, e.g., Zitzler and Thiele 20. All these algorithms have been developed for solving optimization problems where the cost of objective function evaluation is moderate. As a consequence the number of individuals used is usually very high, because the majority of such methods derive from standard GAs, where the typical population size is 30 through 500. When dealing with design optimization of electromagnetic devices 2,12 the evaluation of the objective functions usually requires a Finite Element Method (FEM) field computation that may be coupled and non-linear; the typical duration of such a computation, even on powerful computers, makes the use of these methods unpractical 13,15.

Let us consider as an example a real life case where one of the objectives requires a FEM based torque computation of 5 minutes; if the population size is 50 and 100 iterations are necessary for convergence, we need at least 25,000 minutes, which is an unaffordable time for industrial design schedules. This is why, having discarded deterministic methods for the problem of local

53


54 M. Farina and P. Di Barba

minima, the development of stochastic methods for Pareto optimization that reduce the number of objective function evaluations is necessary.

From a general point of view a wide variety of strategies for Pareto Optimal Front (POF) approximation can be considered 17,10,18,8. We focus our attention on cost effective algorithms, where the number of objective function calls is reduced at a given degree of POF approximation accuracy (where multiobjective optimization is concerned, accuracy = diversity + convergence). When building cost effective algorithms for multiobjective optimization, the simplest idea would be to consider deterministic search algorithms on different scalarized formulations and thus consider Non Pareto-sorting Algorithms (NPSA), where individuals do not interact with each other during optimization. This is why a stochastic search engine is not strictly necessary (when the problem of local fronts is not considered) for convergence towards the front, as is the case for Pareto sorting algorithms (PSA).

3.2. The Algorithms

Before going into the details of the description of the developed algorithms, we give in Figure 3.1 a schematic classification of the different algorithms that have been developed and used in this work, in order to make the terminology clear from the beginning.

Cost-effective Pareto optimization algorithms:
• Deterministic search — Non Pareto-sorting algorithms: PGBA
• Stochastic search — Pareto-sorting algorithms: NSESA; Non Pareto-sorting algorithms: MDESTRA, PESTRA

Fig. 3.1. Classification of developed algorithms. The meaning of the acronyms is the following: PGBA, Pareto Gradient Based Algorithm; NSESA, Non-dominated Sorting Evolution Strategy Algorithm; PESTRA, Pareto Evolution STRAtegy; MDESTRA, Multi Direction Evolution STRAtegy.


Design of Electromagnetic Devices: A Multiobjective Evolutionary Approach 55

3.2.1. Non-Dominated Sorting Evolution Strategy Algorithm (NSESA)

The algorithm is derived from Srinivas and Deb's NSGA 16 from the point of view of its general structure. The differences are mainly in the fitness assignment strategy and in the evolutionary operators, which are the generation, mutation and annealing of a (1+1)-ES algorithm 1. Often the use of GA-based strategies is computationally unaffordable or highly unpractical from an industrial point of view. Therefore we have decided to adopt a (1+1)-ES algorithm as the optimization engine of the multiobjective strategy because, in our experience, it is robust and provides good convergence even when few individuals are considered. It should be noted that the generation, mutation and annealing steps are implemented in parallel; this is possible because in our implementation individuals do not interact with each other during the whole process, apart from the steps of Pareto ranking and fitness evaluation.

As can be seen from the flow-chart (see Figure 3.2), in the first step of the algorithm we generate, in a random way, an initial population of individuals in the design domain search space. In the second step we classify individuals into Pareto sets. This means that we first apply the dominance region criterion to the whole population and thus collect all non-dominated individuals in the first front; then we remove these individuals from the population and apply the same criterion in order to obtain the second front, and so on. The third step consists of assigning a fitness value to each individual; two criteria must be followed in this step: forcing convergence to the Pareto optimal set and forcing diversity among solutions.
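The front-by-front classification described above can be sketched as a repeated peeling of non-dominated individuals. This is a plain O(n²)-per-front illustration for minimization objectives, not the authors' implementation:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_fronts(objs):
    """Classify indices of objective vectors into successive Pareto sets:
    collect the non-dominated individuals, remove them, repeat."""
    remaining = list(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts
```

For example, `pareto_fronts([(1, 5), (2, 2), (5, 1), (3, 3), (4, 4)])` places the first three points on front 1 and peels the dominated points into fronts 2 and 3.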

In order to do this, the fitness value for each individual has to depend on the Pareto set which the individual belongs to, and a sharing procedure is to be implemented in order to favor isolated solutions and thus avoid clustering of solutions.

When implementing a fitness sharing procedure, diversity of individuals in either design space or objective space can be considered. Moreover, solutions with strong diversity in shape can be characterized by weak diversity in objective value (and the opposite as well). Both procedures can lead to results useful for an industrial designer who is interested in both shape and performance diversity of optimal solutions. This is why a sharing procedure in only one of the two spaces cannot guarantee a satisfactory approximation of the Pareto optimal front in the other space. Once the current population has been divided into Pareto sets, all relative distances among individuals


in each front are evaluated in both spaces. A bigger fitness value is assigned to isolated individuals either in design or in objective space, while a smaller fitness value is assigned to close individuals, reducing their survival probability. In more detail, we first consider the first front and assign a dummy fitness dfit_1 to the individuals as follows:

dfit_1 = ||U - cw^1||^{-1}. (1)

where cw^1 is the center of gravity of the first front. As previously mentioned, in order to build the sharing procedure we then evaluate the normalized average distances d_ij among elements in both the design and the objective domains as follows (dx_ij or df_ij when shape or performance diversity has to be enhanced, respectively):

dx_ij^k = [ Σ_{p=1}^{N} (x_ip^k - x_jp^k)^2 / (xmax_p^k - xmin_p^k)^2 ]^{1/2},   df_ij^k = [ Σ_{p=1}^{M} (f_ip^k - f_jp^k)^2 / (fmax_p^k - fmin_p^k)^2 ]^{1/2}. (2)

where i, j = 1 : nset_k and k stands for the k-th front; nset_k is the number of individuals in the set, [x]^k the set of individual values (an nset_k × N matrix), and [f]^k the set of objective values (an nset_k × M matrix). Moreover xmax_p^k and xmin_p^k are the maximum and minimum values of the p-th column of the [x]^k matrix respectively, while fmax_p^k and fmin_p^k are the maximum and minimum values of the p-th column of the [f]^k matrix respectively. Afterwards we evaluate for each couple of individuals in the front the following value:

sh_ij^k = 1 - (d_ij^k / σ^k)^2   if d_ij^k ≤ σ^k;   sh_ij^k = 0 otherwise. (3)

where d_ij^k can assume one of the previous values (dx_ij^k or df_ij^k), and where σ^k can assume one of the following values when shape or performance diversity has to be enhanced, respectively:

σ_x^k = [ Σ_{p=1}^{N} (xmax_p^k - xmin_p^k)^2 ]^{1/2} / (nset_k - 1),   σ_f^k = [ Σ_{p=1}^{M} (fmax_p^k - fmin_p^k)^2 ]^{1/2} / (nset_k - 1). (4)

σ^k is the threshold value defining numerically whether points are near to or far away from each other (in either of the two domains). After this, the following parameter m_i^k, measuring how much the fitness of the i-th individual has to be


reduced with respect to the dummy fitness dfit_k, is evaluated:

m_i^k = Σ_{j=1}^{nset_k} sh_ij^k. (5)

Finally, the fitness value for the i-th individual is evaluated:

fit_i^k = dfit_k / m_i^k,   i = 1 : nset_k. (6)

Before moving to the (k+1)-th front a new dummy fitness dfit_{k+1} has to be evaluated:

dfit_{k+1} = min_{i=1:nset_k} fit_i^k - ||cw^{k+1} - cw^k||, (7)

where cw^k and cw^{k+1} are the centers of gravity of the current and the next front respectively. The procedure is repeated for all successive fronts. We point out that convergence towards the front is always performed in the objective space, while the sharing procedure can be performed either in the design space or in the objective one.
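For a single front, Eqns 2-6 can be sketched as below. One simplifying assumption of this sketch: the threshold of Eqn 4 is taken in the same normalised space as the distances of Eqn 2 (so its numerator reduces to the square root of the dimension), whereas the text computes it from the raw spans.

```python
import math

def shared_fitness(front, dfit):
    """Eqns 2-6 for one front: normalised pairwise distances, a sharing
    function with threshold sigma, niche counts m_i, and fitness
    dfit / m_i.  front is a list of equal-length vectors in whichever
    space (design or objective) sharing is applied to."""
    n, dim = len(front), len(front[0])
    lo = [min(p[k] for p in front) for k in range(dim)]
    hi = [max(p[k] for p in front) for k in range(dim)]
    span = [max(hi[k] - lo[k], 1e-12) for k in range(dim)]

    def dist(a, b):  # Eqn 2: span-normalised Euclidean distance
        return math.sqrt(sum(((a[k] - b[k]) / span[k]) ** 2 for k in range(dim)))

    sigma = math.sqrt(dim) / max(n - 1, 1)  # Eqn 4 in normalised space (assumption)
    fits = []
    for i in range(n):
        niche = 0.0
        for j in range(n):  # Eqns 3 and 5; the j = i term contributes 1
            d = dist(front[i], front[j])
            if d < sigma:
                niche += 1.0 - (d / sigma) ** 2
        fits.append(dfit / niche)  # Eqn 6
    return fits
```

An isolated individual keeps (almost) the full dummy fitness, while tightly clustered individuals see theirs divided by their niche count, which is exactly the pressure toward diversity described above.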

Fig. 3.2. General flowchart of the NSESA algorithm: a zoom on fitness assignment is shown in Figure 3.3.


Fig. 3.3. Flowchart of the NSESA fitness assignment, where upoint is the utopia point U, n_k the number of individuals and nf the number of objectives.

3.2.1.1. Pareto Gradient Based Algorithms (PGBA) and Hybrid Strategies

One alternative possibility is to approximate the POF via deterministic search; a standard gradient based algorithm can be run for several individuals according to one of the scalarized formulations previously presented, with different weights or different threshold values. The logical sequence of operations of such an algorithm is shown in Table 3.5. Though this is the simplest and most immediate strategy, the risk of being trapped in local minima is remarkable. PGBA can hardly be used as a global POF approximation strategy, but it may be used as the second part of a hybrid stochastic-deterministic and global-local strategy.

NSESA has been used for global search and PGBA for local search. The strategy is outlined in Table 3.6 and compared with a fully stochastic strategy (Table 3.7).

There are two key points in such a strategy. The first is the switching criterion to be used for stopping the evolutionary search and moving to local search, i.e. setting the value of K1 (see Tables 3.6 and 3.7, where εSTOP is the maximum normalized distance between individuals for successive iterations, being assigned the value K1 or K2 depending on the strategy). The second is the metric to be used in order to assign npop search directions to individuals when moving to local search. When convergence towards the POF has to be represented, the convergence indexes C_x and C_f can be profitably used. The two indexes monitor the convergence toward the POS (in the design variable space) and toward the POF (in the objective function space) respectively. The global search is based on Pareto-ranking and thus does not require preference functions.

Table 3.5. PGBA sequence of operations.

Start
1. Build a random starting population of npop individuals
2. Compute the npop × M starting objective values
3. Build a uniform distribution of npop × M weights
4. Build scalarized functions with the different weights
5. Run npop deterministic searches with the different scalarized functions
End

Table 3.6. Combined global-local strategy.

BEGIN
o Build a random starting population (npop indiv.)
o Run NSESA up to partial conv. (εSTOP = K2 < K1)
o Build npop scalar preference functions
o Run npop CGA or NMA up to full conv. (εSTOP = K1)
END

Table 3.7. Conventional strategy (Fully global).

BEGIN
o Build a random starting population (npop indiv.)
o Run NSESA up to full conv. (εSTOP = K1)
END

3.2.1.2. Pareto Evolution Strategy Algorithm (PESTRA)

PESTRA is a very simple algorithm in which a (1+1)-ES is adopted and a new design vector is accepted if it dominates its parent in the Pareto sense. Starting from an initial population of individuals that span the feasible region in a random way, the aforementioned criterion of optimization is


Fig. 3.4. General flowchart of PESTRA algorithm.

applied to each individual; the result is a final population that gives a first approximation of the Pareto optimal front. The main advantage of the method is the reduced computational cost, not in terms of the number of objective function evaluations but in terms of algorithm complexity, since there is no need to sort the current population into Pareto sets at each iteration and then to assign a suitable fitness to the individuals of each set. On the other hand, individuals do not interact during evolution and therefore a clustering of solutions could occur.

In Figure 3.4 the simplified flow-chart of the algorithm is reported; it can be noted that each step of the procedure has been implemented in parallel. The major drawback of the algorithm is the lack of a mechanism forcing the spread of non-inferior solutions; as a consequence, they may cluster around a small sub-region of the Pareto optimal front. To prevent this occurrence, a large number of individuals in the initial population is in order.
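The acceptance rule at the heart of PESTRA — a (1+1)-ES move kept only if the child Pareto-dominates its parent — can be sketched per individual as follows. Step-size adaptation and the parallel loop over the population are omitted, and the Gaussian mutation with bound clipping is an illustrative choice rather than the authors' exact operator.

```python
import random

def dominates(a, b):
    """a Pareto-dominates b (all objectives to be minimised)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pestra_step(x, f, sigma, bounds):
    """One PESTRA move for one individual: mutate every variable with a
    Gaussian perturbation, clip to bounds, and accept the child only if
    it dominates its parent in the Pareto sense."""
    child = [min(max(xi + random.gauss(0.0, sigma), lo), hi)
             for xi, (lo, hi) in zip(x, bounds)]
    return child if dominates(f(child), f(x)) else x
```

Because a child is kept only when it dominates its parent, no accepted move can worsen any objective, which is why the final population drifts toward (a possibly clustered portion of) the nondominated front.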


3.2.1.3. Multi Directional Evolution Strategy Algorithm (MDESTRA)

Table 3.8. MDESTRA scheme.

Start
1. Build a starting population of npop individuals
2. Build a uniform weight distribution on the unitary M-cube with npop values
3. Build npop scalar formulations using the previously defined weight vectors
4. Run npop independent evolution strategy searches
Stop

Another very simple strategy is outlined in Table 3.8. This algorithm is equal to PGBA in terms of the construction of multiple search directions, but the deterministic search is substituted by a stochastic one. This similarity will be fully exploited in order to compare deterministic search with stochastic search when the same scalarized formulations are considered. Moreover, we have considered the normalized weighted sum as an example, but any scalar formulation with different weights or different threshold values may be considered, taking into account what has been shown about the limits and drawbacks of scalar formulations 6.
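Steps 2-3 of Table 3.8 might look as follows for M = 2 objectives; the normalised weighted sum is used, as in the text, and generalising the weight generation to M > 2 would require a grid over the M-cube.

```python
def uniform_weights(npop):
    """Step 2 for M = 2: npop weight vectors spread uniformly along
    w1 + w2 = 1, one per independent search."""
    return [(i / (npop - 1), 1.0 - i / (npop - 1)) for i in range(npop)]

def scalarize(weights, objs):
    """Step 3: weighted sum of the objective vector; other scalar
    formulations (e.g. threshold-based ones) could be substituted."""
    return sum(w * f for w, f in zip(weights, objs))
```

Each weight vector then drives one independent evolution strategy search (step 4), so the individuals never interact, which is exactly what distinguishes MDESTRA from Pareto-sorting algorithms such as NSESA.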

Validation and performance measurement for evolutionary multiobjective optimization algorithms is much more complex than for the single-objective counterpart, mainly because convergence is no longer towards a point but towards the POF, which is a curve for 2-objective problems, a surface for 3-objective problems and an (M - 1)-dimensional hyper-surface in an M-dimensional space for M-objective problems; moreover, even if the equations of all the single objectives are known and simple, it is normally impossible to analytically compute the POF equation. This is why specially devoted convergence indexes and error evaluation formulas are necessary. Here some solutions to this problem are proposed and applied to validate the algorithms. To this end, several analytical test cases are available in the literature presenting one or more of the pathological behaviors that have been addressed 9; some more test cases have been developed, specially devoted to testing algorithms for multiobjective optimization of electromagnetic devices 3.

3.3. Case Studies

The methodologies of multi-objective optimization developed by the authors are applied to two case studies that represent realistic examples of


industrial applications:

• a magnetic reactor rating 5.9 [MVA] at a nominal current of 893 [A] at a frequency of 50 Hz, used in power systems to mitigate the short-circuit current;

• an inductor for transverse-flux heating of a metal strip, supplied by a current equal to 700 [A] with a frequency variable between 250 Hz and 3 kHz, that is used for annealing flat metal work pieces at a temperature up to 700°C.

In both cases, geometric variables are assumed to be design variables, subject to a set of constraints. In the former case study, the cost of active materials and the leakage field in the winding are the objective functions, both to be minimized; in the latter case study, the efficiency of the inductor and the temperature uniformity in the strip are the objective functions, both to be maximized. For each case study, the conflict between objectives is justified according to physical principles and fully investigated in a numerical way.

3.3.1. Shape Design of a Shielded Reactor

The shape optimization of a single-phase series reactor for power applications is considered first. The reactor is employed to reduce the peak value of the short-circuit current and so to mitigate its electrodynamical effects.

The reactor, the cross-section of which is shown in Figure 3.5, is characterized by a coreless winding with cylindrical shape (foil winding); it is boxed in a laminated magnetic shield with rectangular shape in order to protect the surrounding environment against the strong stray field. The latter, in turn, gives rise to power losses in the winding that limit the operation of the device. The higher the winding, the lower the stray field; on the other hand, the realization of a higher winding and shield, though reducing the effect of leakage, causes an increase of the volume and cost of the reactor, so that a conflict of design criteria originates. For a prototype reactor, rated 5.9 [MVA] at a nominal current of 893 [A], the following values hold: h = 500 [mm], dm = 590 [mm], a = 210 [mm], d = 80 [mm], t = 40 [mm], N = 212, filling factor of the winding ks = 0.504.

3.3.1.1. Direct Problem

The distribution of the magnetostatic field in the reactor, for which Cartesian symmetry is assumed to be valid, is governed by Poisson's equation for the vector potential A = (0, 0, A):


Fig. 3.5. Cross-section of the reactor (one quarter) and design variables.

−∇ · ((1/μ) ∇A) = J
A = 0 along x = 0        (8)
∂A/∂n = 0 elsewhere

where J = 3.57 [A/mm²] is the current density in the winding, while μr = 1 and μr = 10⁴ are the values of relative permeability of the non-magnetic materials and of the iron, respectively. To solve the problem stated in equation (8) numerically, the two-dimensional field region shown in Figure 7, including an external layer of air, has been discretized by means of a regular grid of finite elements, namely triangles with quadratic variation of the potential; the total number of elements is approximately ne = 950. The evolutionary optimizer calls a

finite-element solver for performing the field analysis and then updates the

grid at each iteration.

3.3.1.2. Inverse Problem

The shape of the device can be described by means of seven design variables:

geometric height h, mean diameter dm, radial thickness a of the winding,

number of turns N, axial distance d between winding and magnetic shield,


thickness s of the shield, and radial distance t between winding and shield. Two conflicting criteria can be defined:

• the material cost f1 of the reactor, namely the weighted sum of the copper and iron weights, to be minimized:

F1 = 4 ki wi [ s (dm + a + t) ℓ + s (h/2 + d + s) ℓ ] + kc wc ks ℓ a h        (9)

with ki = 1 and kc = 3, while wi = 7860 [kg m⁻³] and wc = 8930 [kg m⁻³] are the specific weights of iron and copper, respectively;

• the fringing field f2 inside the winding, i.e. the mean radial component of the magnetic induction in the cross-section of the winding, to be minimized as well:

F2 = (1/NW) Σk=1..NW |Br,k|        (10)

where NW = 64 is the number of points of a grid sampling the radial induction in the winding.
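Equation (10) is simply the mean of |Br| over the NW sampling points; a minimal numerical sketch follows, where the sample values are invented for illustration (the real ones come from the finite-element field solution):

```python
def mean_abs_radial_induction(br_samples):
    """F2 of Eq. (10): mean absolute radial induction over the grid points."""
    return sum(abs(b) for b in br_samples) / len(br_samples)

# Hypothetical Br samples in tesla, NOT values from the actual FE analysis.
br = [0.031, -0.018, 0.004, -0.027]
f2 = mean_abs_radial_induction(br)   # about 0.02 T
```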

The following constraints have been prescribed:

• the rated value of inductance, L = 23.57 [mH];
• the induction in the core, not exceeding 0.8 T when the current per turn is equal to IN = 893 A;
• the insulation gap between winding and core.

Three independent design variables have been selected, i.e.

• height h
• mean diameter dm
• number of turns N of the winding

Finally, a set of bounds preserves the geometrical congruency of themodel, namely:

0.5 ≤ h ≤ 1.5 [m]
0.1 + 2a ≤ dm ≤ 1.8 [m]        (11)
162 ≤ N ≤ 262

The multiobjective optimization problem can then be cast as follows:given an initial population of npop individuals randomly distributed in the


3D design space (h, dm, N), find npop non-dominated individuals representing an approximation of the Pareto optimal front in the 2D objective space (f1, f2).
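The notion of non-dominance used to cast the problem can be made concrete in a few lines. Both objectives are treated as minimized; the points are arbitrary illustrative values, not results from the reactor model:

```python
def dominates(a, b):
    """True if a dominates b: no worse in every objective and strictly
    better in at least one (both objectives to be minimized)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Return the subset of points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

pts = [(3.0, 1.0), (2.0, 2.0), (1.0, 3.0), (2.5, 2.5)]
print(non_dominated(pts))  # (2.5, 2.5) is dominated by (2.0, 2.0)
```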

For the sake of comparison, two scalar formulations of the multiobjective problem have been considered too, namely objective weighting and far-from-worst programming, for which the following preference functions have been defined, respectively:

f = c1 f1 + c2 f2        (12)

to be minimized, where c1 = 10⁻⁴ and c2 = 25 are dimensional coefficients ensuring, in a heuristic way, equal preference to each objective, and

f̄ = (1 − c1 f1)(1 − c2 f2)        (13)

to be maximized, subject to the prescribed constraints. In all cases, an evolutionary algorithm, namely a (1+1)-ES, has been used as the optimization engine.
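Under a minimization convention for f1 and f2, the two preference functions (12) and (13) can be sketched as below. The coefficient and objective values in the usage lines are illustrative only, not taken from the actual reactor runs:

```python
def objective_weighting(f1, f2, c1, c2):
    """Weighted-sum preference function of Eq. (12), to be minimized."""
    return c1 * f1 + c2 * f2

def far_from_worst(f1, f2, c1, c2):
    """Far-from-worst preference function of Eq. (13), to be maximized:
    product of the normalized distances from the worst objective values."""
    return (1.0 - c1 * f1) * (1.0 - c2 * f2)

# Illustrative values: f1 in kg (material cost), f2 in T (mean radial field).
c1, c2 = 1e-4, 25.0
print(objective_weighting(5000.0, 0.02, c1, c2))  # 0.5 + 0.5 = 1.0
print(far_from_worst(5000.0, 0.02, c1, c2))       # 0.5 * 0.5 = 0.25
```

Note that a pure weighted sum can miss points on non-convex parts of a Pareto front, one reason the chapter compares these scalar formulations against the population-based vector approach.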

3.3.1.3. Sample-and-Rank Approach

Prior to tackling the optimization procedure, a preliminary identification of the Pareto optimal front has been achieved by randomly sampling the feasible region in both the design and objective spaces. To this end, a distribution of 1000 samples fulfilling a uniform probability density in the design space is considered; constraints and bounds are taken into account.
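A sample-and-rank pass of this kind is straightforward to sketch. Everything below is a toy stand-in: the bounds mirror the form of the box constraints with a hypothetical fixed a, and the objective evaluation is a placeholder lambda, not the finite-element analysis:

```python
import random

def dominates(p, q):
    """p dominates q: no worse in every objective, different in at least one."""
    return all(a <= b for a, b in zip(p, q)) and p != q

def sample_and_rank(n, bounds, evaluate, seed=0):
    """Uniformly sample the box-bounded design space and keep the
    non-dominated objective vectors (both objectives minimized)."""
    rng = random.Random(seed)
    designs = [tuple(rng.uniform(lo, hi) for lo, hi in bounds) for _ in range(n)]
    objs = [evaluate(x) for x in designs]
    return [f for f in objs if not any(dominates(g, f) for g in objs)]

# Toy stand-ins: a = 0.21 m assumed fixed; placeholder objectives.
a = 0.21
bounds = [(0.5, 1.5), (0.1 + 2 * a, 1.8), (162, 262)]
toy = lambda x: (x[0] + x[1], (x[0] - 1.5) ** 2 + (x[1] - 1.8) ** 2)
front = sample_and_rank(1000, bounds, toy)
print(len(front), "non-dominated samples out of 1000")
```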

In Figure 3.6 the corresponding distribution of samples in the objective space is represented; it can be noted that the design criteria defined above transform a quasi-uniform distribution in the design space into a non-uniform one in the objective space. The Pareto optimal front is approximated by the lower boundary of the latter distribution; it links two incommensurable objectives and appears to be convex and connected. In particular, the front is characterized by two branches. The first one corresponds to devices exhibiting a nearly constant cost, while the second branch is characterized by a remarkable variability and corresponds to devices for which both stray field and cost are actually in mutual conflict. Moreover, it can be noted that, though starting from a quasi-uniform sampling of the design space, the approximation of the Pareto optimal front is poorer in the branch of mutual conflict than in the branch of constant cost; this fact confirms the physical expectation that building a low-stray winding is a non-trivial task.

Fig. 3.6. Sampling of the objective space (1000 points): a subset of non-dominated samples (o) is highlighted.

As a consequence, the approximation of the branch of mutual conflict represents a moderately stiff problem from the numerical viewpoint. In Figure 3.6 a subset of non-dominated samples is highlighted along the branch of the front exhibiting a more pronounced variability.

Fig. 3.7. Distribution of non-dominated points (o) in the sampled design space (dm-h plane on the left and dm-N plane on the right).

In Figure 3.7 the points corresponding to the non-dominated samples shown in Figure 3.6 are represented in the (h, dm) and (dm, N) planes, respectively; in particular, Figure 3.7 (right) shows a wide set of non-trivial designs differing both in shape and in performance.

3.3.1.4. Optimization Results

The described procedure of multiobjective evolutionary optimization has been run in two cases, considering 10 and 20 individuals, respectively. After overlapping, in the objective space, the results obtained in the two cases, a few solutions appeared to be weakly dominated, and it has therefore been decided to filter them out.
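Merging the 10-individual and 20-individual runs and filtering out the weakly dominated solutions amounts to the following sketch, where the objective vectors are invented for illustration:

```python
def merge_and_filter(*fronts):
    """Merge non-dominated sets from several runs, drop duplicates, and
    remove every point weakly dominated by another merged point, i.e.
    equalled-or-beaten in all objectives by a distinct point (minimization)."""
    merged = sorted({p for front in fronts for p in front})
    def weakly_dominated(p):
        return any(q != p and all(a <= b for a, b in zip(q, p)) for q in merged)
    return [p for p in merged if not weakly_dominated(p)]

run_10 = [(3.0, 1.0), (2.0, 2.0)]               # invented (f1, f2) values
run_20 = [(2.0, 2.0), (1.0, 3.0), (2.0, 2.4)]   # (2.0, 2.4) is weakly dominated
print(merge_and_filter(run_10, run_20))         # [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
```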

Fig. 3.8. Approximation of the Pareto optimal front by means of 24 individuals after filtering the 10+20 solutions.

As a result, 24 non-dominated solutions distributed along the Pareto optimal front have been finally obtained; they are shown in Figure 3.8, where the prototype design is also reported. It can be noted that the solutions found represent a subset of the Pareto optimal front, characterized by a stray variation of 62.5% and a corresponding cost variation of approximately 46.7%. The objective values for the prototype are also shown. As can be seen, some solutions in the front dominate the prototype, and some other solutions are equivalent to it in the Pareto sense.

Fig. 3.9. Non-dominated solutions: final shapes of the reactor (24 individuals, sizes in m). Shapes are ranked from the minimum stray (upper left) to minimum cost (lower right).

In Figure 3.9 the shapes of the 24 non-dominated solutions are shown; they are ranked starting from the minimum-stray (upper left) device to the minimum-cost (lower right) device; the variability of the geometry is evident.

In Figure 3.10 the distribution of magnetic induction in the reactor is shown for the two extremal solutions belonging to the set of 24 non-dominated solutions.

For the sake of comparison, two scalar optimizations have been run, considering the preference functions p1 and p2, respectively; the standard (1+1)-evolution strategy has been applied. The corresponding optima are reported in Figure 3.11; it can be noted that the two scalar solutions are practically coincident and represent a non-dominated point located near the center of gravity of the Pareto optimal front.


Fig. 3.10. Contour plot of magnetic induction for the minimum stray field (A), minimum cost (C) and intermediate (B) non-dominated solutions.

Fig. 3.11. Final results: comparison of scalar and vector optimization.

3.3.2. Shape Design of an Inductor for Transverse-Flux Heating of a Non-Ferromagnetic Strip

The shape design of an inductor for induction heating applications is considered next 4. Transverse flux induction heating (TFH) has been well known for many years and is suited to heating non-ferrous metal strips using low frequencies. An important design criterion for a TFH system is the electrical efficiency of the inductor; however, a high efficiency can be obtained even if the transient temperature distribution in the strip is not uniform. On the other hand, non-uniformity of temperature may cause deformation of the strip. For these reasons, the design of a TFH system should fulfill both requirements (i.e. efficiency and uniformity) at the same time. The problem can thus be stated as a multiobjective optimization problem. A typical inductor for TFH systems, shown in Figure 3.12, is composed of two parts, each facing one side of the workpiece (metal strip), and is characterized by an arbitrary number of poles, with different dimensions and supplies.

Fig. 3.12. Typical TFH geometry configuration with 6 sections.

A typical section of the inductor is presented in Figure 3.13 with the main parameters used for the implementation of the analytical-numerical code for field analysis.

3.3.2.1. Direct Problem

The following assumptions have been made in order to develop the electro-magnetic analysis of the device:

• all the materials have linear and isotropic characteristics;
• the inductor thickness is neglected (a sheet current distribution in the inductor coils is considered);
• infinite magnetic permeability in the magnetic yoke is assumed.


Fig. 3.13. Top view and cross section of the model for a TFH device.

Thanks to these assumptions the problem formulation becomes:

rot H = J
B = rot A
J = −(1/ρ) ∂A/∂t        (14)
div A = 0

In terms of the corresponding phasors the problem becomes:

∇²Ax − (jωμ/ρ) Ax = 0
∇²Ay − (jωμ/ρ) Ay = 0        (15)
Hx(x, y, h) = −Ky(x, y)
Hy(x, y, h) = Kx(x, y)

where A, H, K, J, μ and ρ are the magnetic vector potential, the magnetic field strength, the laminar (sheet) current density, the current density, the magnetic permeability and the electrical resistivity, respectively. To make the integration possible, the magnetomotive force (MMF) produced by the exciting currents has been decomposed in a two-dimensional Fourier series along the x and y directions, assuming it periodical along both directions:

MMF = g(x, y) N I, where

g(x, y) = Σh=1,3,... Σk=1,3,... ahk cos((2π/σ) h x) cos((2π/τ) k y)        (16)

ahk = (16/(στ)) ∫∫S g(x, y) cos((2π/σ) h x) cos((2π/τ) k y) dx dy, with

σ = 2[2b + 2d + 2c],   τ = 2[2a + 2d + 2f]

The analytical solution of Helmholtz's equation in three dimensions gives the expressions of the electric field in the strip and in the air; knowing the former, the power density distribution and all the integral parameters of the system can be calculated. The analysis of the thermal transient is based on a finite-difference method, starting from the solution of the electromagnetic problem described above.

3.3.2.2. Inverse Problem

In order to perform the design optimization, as reported in Figure 3.13, the design variables are a, the half internal width of the coil in the longitudinal direction, and b, the half internal height in the transversal direction; the width of the coil conductor is fixed and equal to 22 mm. The inductor, composed of four coils, is supplied with a current equal to 700 A at a frequency of 1200 Hz, and the velocity of the strip is v = 0.4 [cm s⁻¹]. The material of the strip is silver; the width of the strip to be heated is fixed and equal to 100 mm. The following two objective functions have thus been defined:

• the electrical efficiency (to be maximized) of the inductor, defined as the ratio between the power transferred to the workpiece and the power supplied to the inductor;

• the maximum temperature gap T (to be minimized) in the y direction at the same instant.

Congruence bounds have been imposed on the design variables (a, b).

It is interesting to remark that the simultaneous maximization of efficiency and minimization of temperature gap gives rise to a conflict between the two objectives. Therefore the multiobjective optimization problem can be cast as follows: given an initial population of npop individuals randomly distributed in the 2D design space (a, b), find npop non-dominated individuals representing an approximation of the Pareto optimal front in the 2D objective space (f1, f2).


Before tackling the optimization, several objective-function surface analyses have been performed, one for each pair of design variables. In Figure 3.14 an example of conflict between the objective functions is reported; the values of a and b have been varied on a 100-point grid.

Fig. 3.14. Objective function surfaces for the TFH problem versus two design variables.

3.3.2.3. Sample-and-Rank Approach

Following the sampling-and-ranking approach, the results shown in Figure 3.15 have been obtained. The Pareto optimal set is represented by 33 out of 1024 points; they form a subset of the feasible region, characterized by values of variable a close to its upper bound and values of variable b distributed between its central value and its upper bound.

On the other hand, the Pareto optimal front is approximated by means of 33 non-dominated solutions over 1024 samples; the front exhibits a variation of about 4% in terms of electrical efficiency of the inductor and a variation of about 9% in terms of maximum temperature gap between the centre and the edge of the strip to be heated.

3.3.2.4. Optimization Results

Because of the high computational cost of each objective-function evaluation (approximately 5 minutes on a Pentium IV processor running at 1.8 GHz) and of the precision required in approximating the POF, the optimization has been tackled via the hybrid stochastic-deterministic, local-global strategies NSESA+PCGA and NSESA+PSA.

Fig. 3.15. Mapping of the objective space (1024 points; non-dominated solutions highlighted by circles).

Three different solutions are shown in Figure 3.16, corresponding to the columns of Table 3.9. As can be seen, when a fully stochastic strategy is run at the same cost as the hybrid one, the solution is highly unsatisfactory, demonstrating the validity of the hybrid strategy. On the other hand, the hybrid strategy with the simplex method as local search is able to yield greater diversity in the solutions but, from an industrial point of view, some solutions are to be discarded because of their too low efficiency values. Finally, a set of 8 solutions has been selected; their shapes are shown in Figure 3.17.

Table 3.9. Number of objective function calls and CPU time for the TFH inductor optimization.

Figure          A            B            C
Strategy        NSESA+PSA    NSESA+PCGA   NSESA
global s.       128          150          206
local s.        422          56           -
Total cost      550          206          206
cpu-time [h]    17           7            7


3.4. Conclusions

The optimal shape design of several electromagnetic devices has been achieved by means of multiobjective evolutionary optimization based on the concept of non-dominated solutions. A large number of configurations belonging to the Pareto optimal front have been identified, offering the designer an effective choice among devices that range from the best-performing one to the least costly one. The computational cost of the methodology developed is light and compatible with the resources of PC-based platforms. More generally, the following remarks can be drawn.

• A wide choice among optimal solutions implies a better compatibility of the design with industrial normalization and technological constraints.

• Multi-objective optimization enhances the diversity of performances of optimal designs and therefore could highlight non-trivial solutions that are a priori unpredictable.

• Having a set of optimal solutions makes it easy to fulfill a posteriori the time-varying constraints that are typical of real-life engineering (e.g. those imposed by suppliers of materials), whereas in scalar optimization such constraints have to be carefully prescribed in advance so that the unique solution is feasible with respect to them as well.

References

1. T. Bäck. Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York, 1996.

2. P. Di Barba and M. Farina. Multiobjective shape optimisation of air-cored solenoids. COMPEL International Journal for Computation and Mathematics in Electrical and Electronic Engineering, 21(1):45-57, 2002.

3. P. Di Barba, M. Farina, and A. Savini. An improved technique for enhancing diversity in Pareto evolutionary optimization of electromagnetic devices. COMPEL International Journal for Computation and Mathematics in Electrical and Electronic Engineering, 20(2):482-496, 2001.

4. M. Battistetti, F. Colaone, P. Di Barba, F. Dughiero, M. Farina, S. Lupi, and A. Savini. Optimal design of an inductor for transverse flux heating using a combined evolutionary-simplex method. COMPEL International Journal for Computation and Mathematics in Electrical and Electronic Engineering, 20(2):507-522, 2001.

5. Carlos Coello Coello. Complete and continuously updated bibliography collection on EMO, http://delta.cs.cinvestav.mx/~ccoello/emoo/emoobib.html.

Fig. 3.16. Optimization results for the TFH device (cases C, A, B as described in Table 3.9).

6. Indraneel Das and John Dennis. A Closer Look at Drawbacks of Minimizing Weighted Sums of Objectives for Pareto Set Generation in Multicriteria Optimization Problems. Structural Optimization, 14(1):63-69, 1997.

7. K. Deb. A web-site with important references and several downloadable software, http://www.iitk.ac.in/kangal/deb.htm.

8. K. Deb. Nonlinear goal programming using multi-objective genetic algorithms. Journal of the Operational Research Society, 52(3):291-302, 2001.

9. Kalyanmoy Deb. Multi-Objective Genetic Algorithms: Problem Difficulties and Construction of Test Problems. Evolutionary Computation, 7(3):205-230, Fall 1999.

Fig. 3.17. Cross-sections of the Pareto optimal solutions, ranked in order of increasing F1.

10. J. Knowles and D. Corne. Approximating the nondominated front using the Pareto archived evolution strategy. Evolutionary Computation, 8(2):149-172, 2000.

11. Joshua D. Knowles, Martin J. Oates, and David W. Corne. Multiobjective Evolutionary Algorithms Applied to two Problems in Telecommunications. BT Technology Journal, 18(4):51-64, October 2000.

12. M. Farina, P. Di Barba, and A. Bramanti. A GRS method for Pareto-optimal front identification in electromagnetic multiobjective synthesis. IEE Proceedings Science, Measurement and Technology, 149(5):207-213, 2002.

13. M. Farina and J. K. Sykulski. Comparative study of evolution strategies combined with approximation techniques for practical electromagnetic optimisation problems. IEEE Transactions on Magnetics, 37:3216-3220, 2001.

14. Kaisa Miettinen. A web-site with important references and several downloadable software, http://www.mit.jyu.fi/~miettine.

15. K. Rashid, M. Farina, J. A. Ramirez, J. K. Sykulski, and E. M. Freeman. A comparison of two generalized response surface methods for optimisation in electromagnetics. COMPEL International Journal for Computation and Mathematics in Electrical and Electronic Engineering, 20(3):740-753, 2001.


Highly Commended Award, Emerald Literati Club, 2002.

16. N. Srinivas and K. Deb. Multiobjective optimization using nondominated sorting in genetic algorithms. Evolutionary Computation, 2(3):221-248, 1994.

17. A. M. Sultan and A. B. Templeman. Generation of Pareto solutions by entropy-based methods. In M. Tamiz, editor, Multiobjective Programming and Goal Programming: Theories and Applications, pages 164-195. Springer-Verlag, Berlin, 1996.

18. M. Zeleny. Multiple criteria decision making: Eight concepts of optimality. Human Systems Management, 17:97-107, 1998.

19. Eckart Zitzler, Marco Laumanns, and Stefan Bleuler. A Tutorial on Evolutionary Multiobjective Optimization. In Xavier Gandibleux, Marc Sevaux, Kenneth Sörensen, and Vincent T'kindt, editors, Metaheuristics for Multiobjective Optimisation, pages 3-37. Springer, Berlin, 2004. Lecture Notes in Economics and Mathematical Systems Vol. 535.

20. E. Zitzler and L. Thiele. Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach. IEEE Transactions on Evolutionary Computation, 3(4):257-271, 1999.


CHAPTER 4

GROUNDWATER MONITORING DESIGN: A CASE STUDY COMBINING EPSILON DOMINANCE ARCHIVING AND AUTOMATIC PARAMETERIZATION FOR THE NSGA-II

Patrick Reed and Venkat Devireddy

Department of Civil and Environmental Engineering, The Pennsylvania State University

212 Sackett Building, University Park, PA 16802-1408

E-mail: [email protected]

The monitoring design problem is extremely challenging because it requires environmental engineers to capture an impacted system's governing processes, elucidate human and ecologic risks, limit monitoring costs, and satisfy the interests of multiple stakeholders (e.g., site owners, regulators, and public advocates). Evolutionary multiobjective optimization (EMO) has tremendous potential to help resolve these issues by providing environmental stakeholders with a direct understanding of their monitoring tradeoffs. This chapter demonstrates the use of the Nondominated Sorted Genetic Algorithm-II (NSGA-II) to optimize groundwater monitoring networks for conflicting objectives. Additionally, this chapter demonstrates how ε-dominance archiving and automatic parameterization techniques for the NSGA-II can be used to significantly improve the algorithm's ease-of-use and efficiency for computationally intensive applications. Results are presented for a 2-objective groundwater monitoring case study in which the archiving and parameterization techniques for the NSGA-II combined to reduce computational demands by greater than 70 percent relative to prior published results. The methods of this chapter can be easily generalized to other multiobjective applications to minimize computational times as well as trial-and-error EMO parameter analysis.

4.1. Introduction

Environmental engineers address a broad array of pollution problems thatrange in scale from global emissions of greenhouse gases to micro-scale


strategies for using bacteria to mitigate water pollution. Although environmental engineers work on a broad array of problems, they share the challenge of developing engineered systems that successfully protect human and ecologic health while seeking to limit the financial burden placed on society. Balancing these conflicting objectives is extremely challenging because it requires environmental engineers to capture an impacted system's governing processes, elucidate human and ecologic risks, limit design costs, and satisfy the interests of multiple stakeholders. In the past decade, researchers have recognized that evolutionary multiobjective optimization (EMO) has significant potential as a decision support tool that can be used to help resolve these issues by providing stakeholders with a direct understanding of the tradeoffs for environmental engineering systems 1-7.

This chapter demonstrates the use of EMO to design groundwater monitoring networks for conflicting objectives. Long-term groundwater monitoring (LTM) can be defined as the sampling of groundwater quality over long time-scales to provide "sufficient and appropriate information" to assess if current mitigation or contaminant control measures are performing adequately to be protective of human and ecological health 8. The LTM problem is ideal for demonstrating how EMO can aid environmental engineers because of the tremendous expense and complexity of characterizing groundwater contamination sites over long time periods. Projected federal expenditures within the United States on LTM of contaminated groundwater for the decade beginning in the year 2000 are expected to exceed $5 billion 8,9,10.

The multiobjective LTM design problem presented in this chapter is solved using a modified version of the Nondominated Sorted Genetic Algorithm-II (NSGA-II) 11, which will be termed the ε-dominance NSGA-II in this chapter, using the abbreviated notation ε-NSGA-II. The ε-NSGA-II demonstrates how ε-dominance archiving 12,13 can be combined with a parameterization strategy for the NSGA-II 14 to accomplish the following goals: (1) ensure the algorithm will maintain diverse solutions, (2) eliminate the need for trial-and-error analysis for parameter settings (i.e., population size, crossover and mutation probabilities), and (3) allow users to sufficiently capture tradeoffs using a minimum number of design evaluations. A sufficiently quantified tradeoff can be defined as a subset of nondominated solutions that provides an adequate representation of the Pareto frontier that can be used to inform decision making.

In this chapter, Section 4.2 overviews prior studies used in the development of the ε-NSGA-II. Section 4.3 discusses the groundwater monitoring test case used to demonstrate the ε-NSGA-II. Sections 4.4 and 4.5 provide a more detailed description of the ε-NSGA-II and its performance for the groundwater monitoring test case, respectively. Sections 4.6 and 4.7 discuss how the ε-NSGA-II and future extensions of this work have significant potential to help environmental engineers address computationally intensive applications where stakeholders must balance more than two performance objectives (i.e., high-order Pareto optimization problems).

4.2. Prior Work

The ε-NSGA-II combines the external archiving techniques recommended by Laumanns et al. 12 with automatic parameterization techniques 15,16 developed to eliminate trial-and-error analysis for setting the NSGA-II's parameters. A primary drawback of using EMO methods for environmental applications lies in the large costs associated with assessing performance (i.e., algorithmic reliability and solution quality). The common practice in the EMO literature of assessing performance for a distribution of random seeds is often prohibitively expensive, both in terms of computational costs and in terms of the time that must be invested by users. The goal of the automated parameterization approaches developed by Reed et al. 15 is to eliminate the need to assess algorithmic performance for a distribution of random number seeds and instead focus on the NSGA-II's reliability and efficiency for a single random seed. Reliability is addressed in the approach by adaptively increasing the size of the population. The method uses multiple runs in which the nondominated solutions are accumulated from search performed with successively doubled population sizes. The runs (i.e., searches with the successively doubled population sizes) continue until either the user-defined maximum run-time is reached or sufficient solution accuracy has been attained.
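The successive-doubling logic described above can be sketched as a small driver loop. `run_moea` and `good_enough` below are hypothetical callbacks standing in for one MOEA run (seeded with the current archive) and the user's accuracy test; they are not part of the published algorithm's API.

```python
import time

def doubling_search(run_moea, good_enough, max_seconds, initial_pop=8):
    """Repeat MOEA runs with doubled population sizes, accumulating the
    non-dominated archive, until it is good enough or the time budget ends."""
    archive, pop = [], initial_pop
    deadline = time.monotonic() + max_seconds
    while time.monotonic() < deadline:
        archive = run_moea(pop, archive)   # next run is pre-conditioned by archive
        if good_enough(archive):
            break
        pop *= 2
    return archive, pop

# Fake single run: bigger populations contribute more non-dominated points.
fake_run = lambda pop, archive: archive + [(i, pop) for i in range(pop)]
archive, pop = doubling_search(fake_run, lambda a: len(a) >= 20, max_seconds=5)
print(pop)  # stops after the 16-individual run in this toy setup
```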

The NSGA-II parameterization approach presented by Reed et al. 14 was demonstrated on the same case study that will be discussed in Section 4.3 of this chapter. Their approach required a total of 38,000 function evaluations, which is an 80-percent reduction from prior published results 4. Moreover, the method enabled Reed and Minsker 7 to solve a 4-objective monitoring application (i.e., a high-order Pareto optimization problem), which represents a new problem class within the environmental literature that has historically been dismissed as intractable (e.g., see pages 197-198 of Sun 17). Although Reed et al. 14 helped to demonstrate how EMO can help environmental engineers step beyond 2-objective applications, the method employs an inefficient form of archiving, the approach fails to allow users to bias search towards important objectives, and the method does not take advantage of early run results to guide subsequent search.

Reed et al. 14 recommended offline analysis for accumulating nondominated solutions across multiple runs. Offline analysis can be viewed as an unbounded archive (i.e., the number of solutions stored in memory is not limited) of the nondominated solutions found by the NSGA-II in every generation of every run. Laumanns et al. 12 highlight that unbounded archiving leads to memory and nondomination-sorting inefficiencies. The ε-NSGA-II approach discussed in this chapter was specifically developed following Laumanns et al.'s theoretical recommendations for bounding archive size and improving solution diversity using the principle of ε-domination. ε-domination requires users to specify the precision with which they wish to quantify each objective. User-specified precisions can be used to bias search towards regions of an application's objective space with the highest precision requirements (see Section 4.4 for more details). The ε-domination archive was used in this study to maintain a diverse representation of the Pareto optimal set; moreover, the archived solutions found with small populations are used to pre-condition search with larger populations and minimize the number of design evaluations required to solve an application.
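An ε-dominance archive in the spirit of Laumanns et al. 12 can be sketched by mapping each objective vector to a box index: one representative survives per box, and entries whose boxes are dominated are discarded. This is a simplified illustration (minimization assumed, naive tie handling within a box), not the chapter's actual implementation.

```python
import math

def box(point, eps):
    """Epsilon-box index of an objective vector (objectives minimized);
    eps holds the user-specified precision for each objective."""
    return tuple(math.floor(f / e) for f, e in zip(point, eps))

def archive_insert(archive, p, eps):
    """Try to add p to an epsilon-dominance archive (list, edited in place)."""
    bp = box(p, eps)
    for q in list(archive):
        bq = box(q, eps)
        if all(x <= y for x, y in zip(bq, bp)):      # q's box weakly dominates p's
            if bq == bp and all(x <= y for x, y in zip(p, q)):
                archive.remove(q)                    # same box, p at least as good
                break
            return archive                           # p is rejected
    # p is accepted: evict every entry whose box is weakly dominated by p's box.
    archive[:] = [q for q in archive
                  if not all(x <= y for x, y in zip(bp, box(q, eps)))]
    archive.append(p)
    return archive

eps = (0.5, 0.5)
arch = []
for p in [(1.0, 1.0), (1.1, 1.1), (0.2, 0.2), (0.2, 5.0)]:
    archive_insert(arch, p, eps)
print(arch)  # [(0.2, 0.2)]
```

Coarser ε values bound the archive more tightly and bias it away from near-duplicate solutions, which is exactly the diversity/memory tradeoff the text describes.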

The reader should note that beyond the ε-NSGA-II, Deb et al.13 have also proposed an extension of the NSGA-II to improve its diversity operator, termed the Clustered NSGA-II (C-NSGA-II), as well as a steady-state ε-dominance multiobjective evolutionary algorithm (MOEA) that balances convergence speed and diversity. The C-NSGA-II replaces the crowding distance procedure with the clustering technique that was used in the Strength Pareto Evolutionary Algorithm18. Though the C-NSGA-II's results were better than the NSGA-II's, the large computational time for implementing the method's clustering algorithm eliminated the algorithm from consideration. The steady-state ε-MOEA13 helped to encourage our usage of ε-dominance archives in this chapter. We did not use the steady-state ε-MOEA itself in this chapter because (1) the algorithm is limited to the real-coded representation and (2) the algorithm's small generation gap (i.e., only 1 population member is replaced during each iteration) limited its ability to take advantage of small population runs to reduce the overall number of function evaluations required to solve an application. Readers interested in other online adaptive strategies beyond the ε-NSGA-II should reference the micro-GA19 and its successor, the micro-GA2 20, which use the concept of small population sizing and automatic parameterization. Tan et al.21 demonstrate a dynamic population sizing scheme for the incrementing multiobjective evolutionary algorithm (IMOEA). Kursawe22 and Abbas23 utilize online adaptive strategies to enhance Pareto optimization results attained from an evolution strategy and differential evolution, respectively.

4.3. Monitoring Test Case Problem

4.3.1. Test Case Overview

The ε-NSGA-II's performance is demonstrated on a 2-objective test case originally modeled by Reed et al.4. The test case is based on an actual site located at the Lawrence Livermore National Laboratory in Livermore, California, which has been historically contaminated with a large spill of the solvent perchloroethylene (PCE). The goal of this application is to monitor the site using groundwater monitoring wells (i.e., wells drilled to sample subsurface water) to track the PCE's migration. PCE is a human health concern because the solvent is known to cause cancer in exposed human beings. The site is undergoing long-term monitoring, in which groundwater samples are used to assess the effectiveness of current efforts to reduce the site's contamination. During this long-term monitoring phase for a contaminated site, sampling and laboratory analysis can be a controlling factor in the costs of managing a site. The monitoring wells can sample from 1 to 3 locations along their vertical axes and have a minimum spacing of 60 m between wells in the horizontal plane. Quarterly sampling of the entire network of wells has a potential cost of over $70,000 annually for PCE testing alone, which could translate into millions of dollars because the site's life span will be several decades and potentially even centuries.

4.3.2. Problem Formulation

Equation 1 gives the multiobjective problem formulation for quantifying the tradeoff between minimizing sampling costs and maintaining a high quality interpolated picture of the PCE contamination.

Minimize F(x_K) = [f_1(x_K), f_2(x_K)], \quad \forall K \in \Omega

f_1(x_K) = \sum_{i=1}^{n_{well}} C_s(i) \, x_{K,i}    (1)

f_2(x_K) = \sum_{j=1}^{n_{est}} \left( c^*_{all}(u_j) - c^*_{est}(u_j) \right)^2


F(x_K) is a vector-valued objective function whose components [f_1(x_K), f_2(x_K)] represent the cost and squared relative estimation error (SREE), respectively, for the Kth monitoring scheme x_K taken from the collection of all possible sampling designs Ω. Equation 2 defines the binary decision variables representing the Kth monitoring scheme.

x_{K,i} = \begin{cases} 1, & \text{if the ith well is sampled} \\ 0, & \text{otherwise} \end{cases}, \quad \forall K, i    (2)

If the ith well is sampled, it is assumed that all available locations along the vertical axis of that well will be sampled at a cost of C_s(i). C_s(i) ranged from $365 to $1095 for 1 to 3 samples analyzed solely for PCE. Sampling all available levels within each well reduces the size of Ω from 2^50 to 2^20, where 50 and 20 represent the total number of sampling locations and monitoring wells (nwell), respectively. Reducing the size of Ω enabled the entire decision space of this application to be enumerated. Enumeration was employed to identify the true Pareto frontier so that the performance of the ε-NSGA-II could be rigorously tested. In particular, the enumerated Pareto frontier is used in this chapter to show the algorithm's efficiency and reliability.

The SREE objective provides a measure of how the interpolated picture of the plume using data only from wells included in the Kth sampling plan compares to the result attained using data from all available sampling locations. The measure is computed by summing the squared deviations between the PCE estimates using data from all available sampling locations, c*_all(u_j), and the estimates based on the Kth sampling plan, c*_est(u_j), at each location u_j in the interpolation domain. Each u_j specifies the coordinates for the jth grid point in the interpolation domain. The interpolation domain consisted of a total of 3300 grid points (nest in equation 1). The PCE estimates used in the calculation of the SREE for each of the sampling designs were attained using a nonlinear spatial interpolation method (see 4 for more details).

4.4. Overview of the ε-NSGA-II Approach

The ε-NSGA-II algorithm proposed in this chapter aims at reducing user interaction requirements and the computational complexity associated with solving multiobjective optimization problems. EMO algorithms require the user to specify the following parameters:


• Population Size
• Run length
• Probability of Crossover
• Probability of Mutation

The specification of these parameters is typically done using multiple trial-and-error runs of an EMO algorithm, wasting user time and computational resources. The ε-NSGA-II enables the user to specify the precision with which they want to quantify the Pareto optimal set, and all other parameters are automatically specified within the algorithm. A brief description of the algorithm is given below in Figure 4.1.

Fig. 4.1. Schematic overview of the ε-NSGA-II.

The proposed algorithm consists of three steps. The first step utilizes the NSGA-II with a starting population of 5 individuals to initiate EMO search. The initial population size is set arbitrarily small to ensure the algorithm's initial search is done using a minimum number of function evaluations. Subsequent increases in the population size adjust the population to the appropriate size based on problem difficulty. In the second step, the ε-NSGA-II uses a fixed-size archive to store the nondominated solutions generated in every generation of the NSGA-II runs. The archive is updated using the concept of ε-dominance, which has the benefit of ensuring that the archive maintains a diverse set of solutions. ε-dominance requires the user to define the precision with which they want to evaluate each objective (e.g., quantify costs in thousands, hundreds, or tens of dollars) by specifying an appropriate ε value for each objective.

The third step checks if user-specified termination performance criteria are satisfied and the Pareto optimal set has been sufficiently quantified. If the criteria are not satisfied, the population size is doubled and search is continued. When the population is increased, the initial population of the new run has solutions injected from the archive at the end of the previous run. The algorithm terminates either if a maximum user time is reached or if doubling the population size fails to significantly increase the number of nondominated solutions found across two runs. The following sections discuss the ε-NSGA-II in greater detail.

4.4.1. Searching with the NSGA-II

The ε-NSGA-II was motivated by the authors' goal of minimizing the total number of function evaluations required to solve computationally intensive environmental applications and eliminating trial-and-error analysis for setting the NSGA-II's parameters. Population size has been the key parameter controlling the performance and efficiency of our prior applications4,7,14. The dynamic population sizing and injection approach applied in the ε-NSGA-II simply exploits computationally inexpensive small populations to expedite search while increasing population size commensurate with problem difficulty to ensure the Pareto optimal set can be reliably approximated.

The initial population size, N0, is set to some arbitrarily small value (e.g., 5), as it is expected that subsequent multi-population runs will adjust for an undersized population. A randomly selected subset of the solutions obtained using the small population sizes is injected into subsequent larger populations, aiding faster convergence to the Pareto front. This can be viewed as using a series of "connected" NSGA-II runs that share results so that the Pareto optimal set can be reliably approximated. Computational savings should be viewed in two contexts: (1) the use of minimal population sizes and (2) elimination of random seed analysis. Note that the number of times the population size will be doubled varies with different random seeds, though exploiting search with small populations will on average dramatically reduce computational times. Moreover, our approach eliminates the need to repeatedly solve an application for a distribution of random seeds.


The NSGA-II's remaining parameters are set automatically based on whether an application is being solved using a real or binary coding. The results shown in this chapter are for a binary-coded application where the NSGA-II's parameters are set following the approach recommended by Reed et al.14. The initial and all subsequent populations are allowed to search for a fixed run length, t. The run length is set using the domino convergence model developed by Thierens et al.24 such that each population is allowed to search for 2l generations, where l is the binary string length. For the monitoring application presented in this chapter, the maximum run length was specified to be 40 generations. The uniform crossover operator was used to minimize positional bias25, with the probability of crossover Pc set to 0.5 based on prior empirical results25 as well as a theoretical disruption boundary relation derived by Thierens26. The probability of mutation Pm is set to 1/N, where N is the current population size. This relationship is based on the recommendations of DeJong27 and Schaffer28 and has the advantage of increasing the diversity of small populations and preserving solutions in large population runs. The proposed ε-NSGA-II algorithm can be easily adapted for real-coded problems by using Deb's29 recommended run length (i.e., 250 generations), as well as his recommended settings for the crossover and mutation operators used in the real-coded version of the NSGA-II.
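The automatic parameter rules above for the binary-coded case can be collected into a single helper. A minimal sketch assuming only the settings quoted in the text (run length 2l, Pc = 0.5, Pm = 1/N); the function name and dictionary layout are our own, not part of the published method.

```python
def nsga2_parameters(string_length: int, population_size: int) -> dict:
    """Auto-set NSGA-II parameters for a binary coding, per the rules in the text."""
    return {
        "run_length": 2 * string_length,       # domino convergence model: 2*l generations
        "p_crossover": 0.5,                    # uniform crossover, Pc = 0.5
        "p_mutation": 1.0 / population_size,   # Pm = 1/N for the current population size N
    }
```

For the monitoring problem (l = 20), this yields the 40-generation maximum run length quoted above, and Pm shrinks automatically as the population is doubled.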

4.4.2. Archive Update

Recent studies in the EMO literature have highlighted the importance of balancing algorithmic convergence speed and solution diversity12,13. These studies have highlighted that the NSGA-II remains one of the fastest converging methods available, but its crowding operator fails to promote diversity for challenging EMO problems. The ε-NSGA-II overcomes this failure using ε-dominance archives12. The ε-dominance archiving approach is particularly attractive for environmental applications because it allows the user to define the precision with which they want to quantify their tradeoffs while bounding the size of the archive and maintaining a diverse set of solutions. Figure 4.2, adapted from Deb et al.13, illustrates the ε-dominance approach.

The concept of ε-dominance requires the user to define the precision they want to use to evaluate each objective. The user-specified precision or tolerance vector ε defines a grid for a problem's objective space [see Figure 4.2], which biases the NSGA-II's search towards the portions of a problem's objective space that have the highest precision requirements. Figure 4.2 illustrates how ε-domination allows decision makers to extend a solution's zone of domination based on their required precision for each objective (i.e., ε1 and ε2). Under traditional nondomination sorting, solution P dominates region PECF, whereas using ε-domination the solution dominates the larger region ABCD. The ε-dominance archive improves the NSGA-II's ability to maintain a diverse set of nondominated solutions by only allowing 1 archive member per grid cell. In the case when multiple nondominated points reside in a single grid cell, only the point closest to the lower left corner of the cell (assuming minimization) will be added to the online archive, thereby ensuring convergence to the true Pareto optimal set12. For example, solution 1 in Figure 4.2 would be stored in the archive because it is closer to point G than solution 2.

Fig. 4.2. Illustration of ε-dominance (adapted from Deb et al.13).

The archive is updated in every generation of the ε-NSGA-II runs with a diverse set of "ε-nondominated" solutions, which are guaranteed to be separated by a minimum distance of ε in each objective. The values specified for ε also directly impact the algorithm's convergence speed. A high precision representation of the Pareto optimal set can be captured by specifying very small precision tolerances ε. Small precision tolerances will increase the number of Pareto optimal solutions that are ε-nondominated, increase the archive size, and increase population sizing requirements (30, p. 74). The ε-NSGA-II has the advantage of allowing users to dramatically reduce computation times by accepting a lower resolution (i.e., specifying higher values of ε) representation of the Pareto frontier. Note that lower resolution approximate representations of the Pareto frontier can be helpful by reducing the number of designs decision makers must consider. In environmental applications, decision makers can benefit from a small set of Pareto optimal solutions that allow them to interpret the general shape or inflection of their design tradeoffs to support a diminishing returns analysis for their design criteria (e.g., how much can contaminant map uncertainty be reduced with additional groundwater samples?).
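The grid-based update described above can be sketched as follows for a minimization problem. This is a simplified illustration of ε-dominance archiving in the spirit of Laumanns et al.12, not the authors' code; the dictionary-keyed-by-grid-cell representation and the function names are our own assumptions.

```python
import math

def box_index(f, eps):
    """Grid cell of objective vector f under tolerance vector eps (minimization)."""
    return tuple(math.floor(fi / ei) for fi, ei in zip(f, eps))

def dominates(a, b):
    """Standard Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive, f, eps):
    """Insert objective vector f into an eps-dominance archive (dict: cell -> point).

    At most one point is kept per grid cell; within a cell, the point closest
    to the cell's lower-left corner wins. A candidate is rejected if its cell
    is dominated by an archived point's cell, and evicts any cells it dominates.
    """
    b = box_index(f, eps)
    for bb in list(archive):
        if dominates(bb, b):            # f is eps-dominated: reject
            return archive
        if dominates(b, bb):            # f eps-dominates an archived point: evict it
            del archive[bb]
    if b in archive:                    # same cell: keep the point nearer the corner
        corner = [bi * ei for bi, ei in zip(b, eps)]
        d_old = sum((x - c) ** 2 for x, c in zip(archive[b], corner))
        d_new = sum((x - c) ** 2 for x, c in zip(f, corner))
        if d_new < d_old:
            archive[b] = f
    else:
        archive[b] = f
    return archive
```

With eps = [1.0, 1.0], the point [2.3, 4.2] occupies cell (2, 4); a later [2.6, 4.6] in the same cell is discarded because it lies farther from the cell corner, which is exactly the tie-break illustrated with solutions 1 and 2 in Figure 4.2.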

4.4.3. Injection and Termination

The ε-NSGA-II also seeks to speed convergence by pre-conditioning larger population runs with the prior search results attained using small populations. In prior efforts14, any attempts to inject solutions found using small populations into subsequent runs made the NSGA-II prematurely converge to poor representations of the Pareto optimal set, especially for problems with greater than 2 objectives. The ε-domination archive's ability to preserve diversity plays a crucial role in overcoming this limitation. As described previously in Section 4.4.1, the ε-NSGA-II begins search with an initial population of 5 individuals for 2l generations, from which the ε-nondominated solutions identified in this initial run are stored in the archive. A minimum of two successive runs must be used to determine if further search is justified.

Search progress is rated in terms of a user-defined criterion that specifies the minimum percentage change in the number of ε-nondominated individuals, ΔND, found in two successive runs. For example, consider two successive runs of the ε-NSGA-II in which the first run uses a population of N sampling designs to evolve an ε-nondominated set composed of A individuals, while the second run uses a population of 2N designs to evolve an ε-nondominated set of K individuals. The results of these runs are used in equation 3 to define which of the two following courses of action will be taken: (1) the population size is again doubled, resulting in 4N individuals to be used in an additional run of the ε-NSGA-II, or (2) the algorithm stops to allow the user to assess if the ε-nondominated set has been quantified to sufficient accuracy.

if ΔND < ((K − A)/A) × 100, then double N and continue search; else stop search    (3)
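The check in equation 3 reduces to a one-line predicate. A minimal sketch, assuming (our naming) that `a` and `k` are the ε-nondominated set sizes from the runs with populations N and 2N:

```python
def continue_search(a: int, k: int, delta_nd: float) -> bool:
    """True if the percent growth in the eps-nondominated set exceeds the
    user threshold delta_nd (equation 3), i.e., double N and keep searching."""
    return delta_nd < (k - a) / a * 100.0
```

With delta_nd = 10, growing the set from 20 to 26 solutions (a 30-percent increase) continues search, while growing it from 20 to 21 (5-percent) terminates the run.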

The archive at the end of each run contains ε-nondominated solutions that can be used to guide search in future runs and speed convergence to the Pareto front. This is achieved by injecting ε-nondominated solutions from the archive at the end of the run with population size N into the initial population of the next run, which has a population size 2N. Figure 4.3 illustrates the two scenarios that arise when the ε-NSGA-II injects solutions from the archive generated with a population size N into the initial generation of a run with a population size 2N.

In scenario 1, shown in Figure 4.3a, the archive size A is smaller than the subsequent population size 2N. In this case, 100-percent of the ε-nondominated archive solutions are injected into the first generation of the subsequent run with 2N individuals. We have found that the number of injected solutions should be maximized to aid rapid convergence. The ε-dominance archive in combination with successive doubling of the population size guarantees the ε-NSGA-II will maintain sufficient solution diversity. Figure 4.3b shows the second injection scenario, which occurs when the archive size A is greater than the next population size 2N. In this case, 2N ε-nondominated archive solutions are selected randomly and injected into the first generation of the next run, again maximizing the impact of injected solutions.
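The two injection scenarios can be sketched as a single selection step; `random.sample` stands in for the random subset selection the text describes, and the function name is our own.

```python
import random

def inject(archive_solutions, next_population_size, rng=random):
    """Seed the next run's initial generation (population 2N) from the archive.

    Scenario (a): the archive fits within 2N, so all of it is injected.
    Scenario (b): the archive exceeds 2N, so a random subset of size 2N is used.
    """
    pool = list(archive_solutions)
    if len(pool) <= next_population_size:
        return pool
    return rng.sample(pool, next_population_size)
```

In scenario (a) the remaining 2N − A slots would presumably be filled by the run's normal random initialization; the text does not specify this detail.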

Fig. 4.3. Schematic representation of injection when (a) the archive size A is smaller than the next population size 2N and (b) the archive size A is larger than the next population size 2N.


4.5. Results

The following experiments were designed to validate that the ε-NSGA-II can efficiently and reliably approximate the monitoring test case's enumerated Pareto front [shown in Figure 4.4]. The experiments show the ε-NSGA-II's efficiency relative to the NSGA-II using both Deb's29 recommended parameter settings and the parameter settings recommended by Reed et al.14. An additional goal of these experiments is to clearly demonstrate how user termination and precision criteria impact the ε-NSGA-II's performance.

Fig. 4.4. The enumerated tradeoff between cost and mapping error designated as SREE.

Table 4.10 compares the performance of the proposed algorithm with a fixed-population-size NSGA-II using Deb's recommended parameter settings and the adaptive population sizing approach of Reed et al.14. These experiments were designed to be a conservative performance test for the ε-NSGA-II by favoring the fixed and adaptive population NSGA-II runs. The runs based on Deb's and Reed's prior recommendations both use unbounded offline archives, in which all identified nondominated solutions were stored in memory and used to generate their final results. Additionally, since the enumerated tradeoff is discrete with only 36 solutions, the ε-NSGA-II's injection has a reduced role in enhancing performance relative to problems with larger Pareto optimal sets. The problem was solved with 50 random seeds, and the average number of nondominated solutions obtained as well as the average number of function evaluations taken is reported. The NSGA-II parameter settings based on Deb's recommendations used a population size of 100 and a run length of 250 generations, while the probabilities of crossover and mutation were set to 0.5 and 0.01, respectively. Reed et al.14 recommended that users specify an initial population size equal to twice the number of Pareto optimal solutions they expect to find. For the monitoring case study solved in this chapter, Reed et al.14 used an initial population of 60. In this comparison, the initial population is set to 60 and the rest of the parameters are the same as those described in Section 4.4.1.

Table 4.10. Comparison of the proposed algorithm with the NSGA-II and the previous design methodology proposed by Reed et al.14.

                                    Deb et al.   Reed et al.   ε-NSGA-II      ε-NSGA-II
                                                               (ΔND = 10%)    (ΔND = 5%)
Average no. of solutions found          30           32            25             27
Min. no. of function evaluations     25000        16800          1400           1400
Ave. no. of function evaluations     25000        39889          7992          15512
Max. no. of function evaluations     25000        74400         25400          25400

The fixed-population NSGA-II was able to find an average of 30 nondominated solutions using 25000 function evaluations. The adaptive population sizing runs based on Reed et al.'s parameter settings took an average of nearly 40000 function evaluations to obtain an average of 32 nondominated solutions per run. As expected, the additional 15000 design evaluations are the primary reason why more nondominated solutions were found per run relative to the fixed population runs using Deb's settings. It should be noted that these runs seek an enumerated front that was quantified using 6-digit precision (10^-6).

For the ε-NSGA-II runs, the cost (εcost) and SREE (εSREE) precision limits were set equal to 0.001 and 10^-5, respectively. In test runs performed for this chapter, it was observed that the SREE objective was the most sensitive to precision limits and that there were no substantial impacts on performance when εSREE was set below 10^-5 [see Figure 4.8]. The ε-NSGA-II required an average of 7992 function evaluations to obtain an average of 25 solutions. Although the ε-NSGA-II found fewer solutions on average than the other methods, the method generally found a more diverse set of solutions representing the full extent of the Cost-SREE tradeoff. The ε-NSGA-II results represent computational savings of 70 and 80-percent relative to using Deb's and Reed et al.'s recommended parameter settings, respectively. Moreover, it should be emphasized that the ε-dominance archive is dramatically more efficient than using offline analysis. Table 4.10 also shows that reducing the ε-NSGA-II's termination threshold ΔND does result in an increase in the number of nondominated solutions found, but on average the user must expend twice the number of function evaluations. Figure 4.5 shows a run result representative of the ε-NSGA-II's average performance, in which 6200 function evaluations were used to evolve 26 nondominated points (setting ΔND equal to 10-percent).

Fig. 4.5. Typical performance of the ε-NSGA-II when ΔND is set equal to 10-percent.

For this application, setting ΔND to 10-percent produces a sufficient representation of the Cost-SREE tradeoff; all subsequent results use this termination criterion.

Figures 4.6 and 4.7 graphically compare the results obtained from the best and the worst runs of the three tested variants of the NSGA-II. Figure 4.6 shows the best results obtained from the 50 runs of each version of the algorithm, measured in terms of the number of nondominated solutions found. All of the algorithms were able to capture the true front. The fixed-size NSGA-II as well as the ε-NSGA-II required 25000 function evaluations to capture the entire enumerated front, while the adaptive population NSGA-II using the offline analysis described by Reed et al.14 used 74400 function evaluations.

Fig. 4.6. Comparison of the solutions obtained from the best run of each algorithm rated in terms of the number of nondominated solutions found.

The worst results, defined in terms of the number of nondominated solutions obtained, are plotted in Figure 4.7. Initial review of the plot may lead the reader to conclude that the ε-NSGA-II did not perform as well as the other methods. This particular run of the ε-NSGA-II highlights a potential side-effect of using the ΔND termination criterion. In this case, doubling the population size from 10 to 20 failed to produce more than a 10-percent increase in the number of nondominated solutions found, leading to a premature end to the run. Figure 4.7 shows that the ε-NSGA-II found 18 solutions after just 1400 function evaluations. Moreover, the 18 solutions are distributed over the entire Cost-SREE tradeoff; if performance is measured in terms of solution diversity and convergence speed, the ε-NSGA-II's performance is far superior to the other methods. If the user wants a more accurate representation of the tradeoff, they would only have to lower the ΔND value and continue the run.

Fig. 4.7. Comparison of the solutions obtained from the worst run of each algorithm rated in terms of the number of nondominated solutions found.

The values of εcost and εSREE indicate the degree of precision that the user expects to use when evaluating each of the two objectives. The user can bias the search towards a certain objective by increasing the precision requirements for that objective. For the Cost-SREE monitoring problem, the SREE objective is the most sensitive to precision requirements. The effects of varying the value of εSREE on the number of solutions obtained by the ε-NSGA-II as well as the total number of function evaluations required are demonstrated in Figure 4.8 for an εcost and ΔND of 0.001 and 10-percent, respectively. The value of εSREE is varied between 10^-3 and 10^-6. The results shown in Figure 4.8 are averaged over 50 random seeds. The number of nondominated solutions found by the ε-NSGA-II does not increase significantly as εSREE is decreased below 10^-5. Figure 4.8 indicates that users can attain approximations to the Cost-SREE tradeoff using fewer than 4000 function evaluations, which represents an order-of-magnitude decrease relative to our prior solution approaches for this problem.

Fig. 4.8. Average variation of the number of nondominated solutions found and the number of function evaluations required with different values of εSREE.


4.6. Discussion

The ε-NSGA-II gives users more direct control to balance their accuracy needs and the computational demands associated with evolving the Pareto frontiers for their applications. A key result presented in this chapter is that with as few as 1400 function evaluations, the algorithm was able to approximate the Cost-SREE tradeoff's general shape by identifying a diverse set of solutions along the entire extent of the curve. The approximate representation could be used by environmental decision makers to make reasonable assessments of the diminishing returns of using more than 35 samples [corresponding to a scaled cost of 0.6 in Figure 4.7].

The computational efficiency of the ε-NSGA-II will aid our future efforts in exploring the use of EMO to solve high-order Pareto optimization problems. Reed and Minsker7 introduced the value of considering more than 2 objectives for water resources and environmental design applications. High-order Pareto frontiers allow decision makers to better understand interactions between their objectives. As an example, Reed and Minsker's 4-objective monitoring application highlighted previously unknown objective conflicts that significantly impact the design of LTM systems.

The ε-NSGA-II has significant potential for dramatically reducing the computational costs of evolving high-order Pareto fronts. The algorithm's ε-dominance archive will also enhance decision making by bounding the size of the Pareto optimal set that stakeholders must consider. For monitoring applications, the value of mapping accuracy (SREE) can be visualized in space. Stakeholders can visualize members of the reduced set of solutions evolved by the ε-NSGA-II to better understand how their objectives impact designs and to exploit low cost improvements in their design objectives.

4.7. Conclusions

The ε-NSGA-II demonstrates how ε-dominance archiving can be combined with a parameterization strategy for the NSGA-II to accomplish the following goals: (1) ensure the algorithm will maintain diverse solutions, (2) eliminate the need for trial-and-error analysis of parameter settings (i.e., population size, crossover and mutation probabilities), and (3) allow users to sufficiently capture tradeoffs using a minimum number of design evaluations. A sufficiently quantified tradeoff can be defined as a subset of nondominated solutions that provides an adequate representation of the Pareto frontier that can be used to inform decision making. Results are presented for a 2-objective groundwater monitoring case study in which the archiving and parameterization techniques for the NSGA-II combined to reduce computational demands by greater than 70-percent relative to prior published results. The methods of this chapter can be easily generalized to other multiobjective applications to minimize computational times as well as trial-and-error parameter analysis.

References

1. B. J. Ritzel, J. W. Eheart, and S. R. Ranjithan, Using genetic algorithms to solve a multiple objective groundwater pollution containment problem. Water Resources Research, 1994. 30(5): p. 1589-1603.
2. D. Halhal, G. A. Walters, D. Ouazar and D. A. Savic, Water network rehabilitation with structured messy genetic algorithm. Journal of Water Resources Planning and Management, 1997. 123(2): p. 137-146.
3. D. H. Loughlin, S. R. Ranjithan, J. W. Baugh Jr. and E. D. Brill Jr., Application of Genetic Algorithms for the Design of Ozone Control Strategies. Journal of the Air and Waste Management Association, 2000. 50: p. 1050-1063.
4. P. Reed, B. S. Minsker, and D. E. Goldberg, A multiobjective approach to cost effective long-term groundwater monitoring using an Elitist Nondominated Sorted Genetic Algorithm with historical data. Journal of Hydroinformatics, 2001. 3(2): p. 71-90.
5. M. A. Erickson, A. Mayer, and J. Horn, Multi-objective optimal design of groundwater remediation systems: application of the niched Pareto genetic algorithm (NPGA). Advances in Water Resources, 2002. 25(1): p. 51-56.
6. Z. Kapelan, D. A. Savic, and G. A. Walters, Multiobjective Sampling Design for Water Distribution Model Calibration. Journal of Water Resources Planning and Management, 2003. 129(6): p. 466-479.
7. P. Reed and B. S. Minsker, Striking the Balance: Long-Term Groundwater Monitoring Design for Conflicting Objectives. Journal of Water Resources Planning and Management, 2004. 130(2): p. 140-149.
8. Task Committee on Long-Term Groundwater Monitoring Design, Long-Term Groundwater Monitoring: The State of the Art. 2003, Reston, VA: American Society of Civil Engineers.
9. National Research Council, Environmental Cleanup at Navy Facilities: Adaptive Site Management. 2003, Washington, D.C.: The National Academies Press.
10. Department of Energy, DOE/EM-0563 A report to Congress on long-term stewardship: Volume I Summary Report. 2001, Office of Environmental Management: Washington, D.C.
11. K. Deb, A. Pratap, S. Agarwal and T. Meyarivan, A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 2002. 6(2): p. 182-197.
12. M. Laumanns, L. Thiele, K. Deb and E. Zitzler, Combining Convergence and Diversity in Evolutionary Multiobjective Optimization. Evolutionary Computation, 2002. 10(2): p. 263-282.
13. K. Deb, M. Mohan, and S. Mishra, A Fast Multi-objective Evolutionary Algorithm for Finding Well-Spread Pareto-Optimal Solutions. In Fonseca et al., editors, Proceedings of the Evolutionary Multi-Criterion Optimization. Second International Conference, EMO 2003, Faro, Portugal, 2003, Springer. Lecture Notes in Computer Science, Volume 2632: p. 222-236.
14. P. Reed, B. S. Minsker, and D. E. Goldberg, Simplifying Multiobjective Optimization: An Automated Design Methodology for the Nondominated Sorted Genetic Algorithm-II. Water Resources Research, 2003. 39(7): p. 1196, doi:10.1029/2002WR001483.
15. P. Reed, T. Ellsworth, and B. S. Minsker, Spatial Interpolation Methods for Nonstationary Plume Data. Ground Water, 2004. 42(2): p. 190-202.
16. V. Devireddy and P. Reed, An Efficient Design Methodology for the Nondominated Sorted Genetic Algorithm-II. In Late Breaking Papers within the Proceedings for the 2003 Genetic and Evolutionary Computation Conference (GECCO 2003). 2003. Chicago, IL: p. 67-71.
17. N.-Z. Sun, Inverse Problems in Groundwater Modeling. Theory and Applications of Transport in Porous Media, ed. J. Bear. Vol. 6. 1994, New York, NY: Kluwer Academic Publishers.
18. E. Zitzler, M. Laumanns, and L. Thiele, SPEA2: Improving the Strength Pareto Evolutionary Algorithm. 2001, Department of Electrical Engineering, Swiss Federal Institute of Technology: Zurich, Switzerland.
19. Carlos A. Coello Coello and Gregorio Toscano Pulido, Multiobjective Optimization using a Micro-Genetic Algorithm. In Lee Spector et al., editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO'2001), San Francisco, California, 2001. Morgan Kaufmann Publishers: p. 274-282.
20. Gregorio Toscano Pulido and Carlos A. Coello Coello, The Micro Genetic Algorithm 2: Towards Online Adaptation in Evolutionary Multiobjective Optimization. In Fonseca et al., editors, Proceedings of the Evolutionary Multi-Criterion Optimization.
Second International Conference, EMO 2003,Faro, Portugal, 2003, Springer. Lecture Notes in Computer Science. Volume2632: p 252-266.21. K.C. Tan, T.H. Lee, and E.F. Khor. Evolutionary Algorithms with Dy-namic Population Size and Local Exploration for Multiobjective Optimiza-tion. IEEE Transactions on Evolutionary Computation, 5(6):565-588, De-cember 2001.22. F. Kursawe. A Variant of Evolution Strategies for Vector Optimization. InH. P. Schwefel and R.Manner, editors, Parallel Problem Solving from Nature.1st Workshop, PPSN I, volume 496 of Lecture Notes in Computer Science,pages 193-197, Berlin, Germany, Oct 1991.Springer-Verlag.23. H. A. Abbass. The Self-Adaptive Pareto Differential Evolution Algorithm.In Congress on Evolutionary Computation (CEC'2002), volume 1, pages 831-836, Piscataway, New Jersey,May 2002. IEEE Service Center.24. D. Thierens, D.E. Goldberg, and A.G. Pereira. Domino Convergence,Drift, and the Temporal-Salience Structure of Problems. In The 1998 IEEE

Page 129: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

100 Patrick Reed and Venkat Devireddy

International Conference on Evolutionary Computation. 1998: IEEE Press.25. T. Back, D. Fogel, and Z. Michalewicz, Handbook of Evolutionary Com-putation. 2000, Bristol, UK.26. D. Thierens, Analysis and design of genetic algorithms. 1995, KatholiekeUniversiteit: Leuven, Belgium.27. K. DeJong, An analysis of the behavior of a class of genetic adaptivesystems. 1975, University of Michigan: Ann Arbor, ML28. J. D. Schaffer, R. A. Caruana, L. J. Eshelman and R. Das, A studyof control parameters affecting online performance of genetic algorithms forfunction optimization. In Proceedings of the Third International Conferenceon Genetic Algorithms. 1989: Morgan Kaufmann.29. K. Deb, Multi-Objective Optimization using Evolutionary Algorithms.2001, New York, NY: John Wiley L Sons LTD.30. N. Khan, Bayesian Optimization Algorithms for Multiobjective and Hi-erarchically Difficult Problems. Masters Thesis, 2003, University of Illinoisat Urbana-Champaign: Urbana

Page 130: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

CHAPTER 5

USING A PARTICLE SWARM OPTIMIZER WITH A MULTI-OBJECTIVE SELECTION SCHEME TO DESIGN COMBINATIONAL LOGIC CIRCUITS

Erika Hernandez Luna and Carlos A. Coello Coello
CINVESTAV-IPN
Evolutionary Computation Group
Dpto. de Ing. Elect./Secc. Computación
Av. IPN No. 2508, Col. San Pedro Zacatenco
Mexico, D.F. 07300, MEXICO
E-mail: [email protected]

In this chapter, we propose the introduction of a multi-objective selection scheme in a particle swarm optimizer used for designing combinational logic circuits. The proposed selection scheme is based on the use of sub-populations to distribute the search effort among the particles of the population, so as to accelerate convergence while improving the robustness of the algorithm. For our study, we compare six PSO-based approaches, combining different encodings (integer and binary) with both single- and multi-objective selection schemes. The comparative study performed indicates that the use of a population-based approach combined with an integer encoding improves both the robustness and the quality of results of PSO when designing combinational logic circuits.

5.1. Introduction

The Particle Swarm Optimization (PSO) algorithm is a biologically-inspired technique originally proposed by James Kennedy and Russell Eberhart 18,19. PSO has been successfully used as a (mainly nonlinear) optimization technique and has become increasingly popular mainly due to its simplicity (in terms of its implementation), its low computational cost and its good overall performance 19.

The main idea behind PSO is to simulate the movement of a flock of birds seeking food. In this simulation, the behavior of each individual gets affected by both an individual and a social factor. Each individual (or particle) contains its current position in the search space as well as its velocity and the best position found by the individual so far 19. As many other biologically-inspired heuristics, PSO is a population-based approach that can be defined as P' = m(f(P)), where P is the population, which consists of a set of positions in search space, f is the fitness function that returns a vector of values that indicate the goodness of each individual, and m is a manipulation function that generates a new population from the current population. Such a manipulation function is based on the behavioral model of insect colonies 1.

PSO can be seen as a distributed behavioral algorithm that performs (in its more general version) a multidimensional search. In the simulation, the behavior of each individual is affected by either the best local or the best global individual. The approach uses a population of potential solutions (called "particles") and a measure of performance similar to the fitness value used with evolutionary algorithms. Also, the adjustments of individuals are analogous to the use of a crossover operator. However, this approach introduces the use of flying potential solutions through hyperspace (used to accelerate convergence). Additionally, PSO allows individuals to benefit from their past experiences 19.
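The P' = m(f(P)) view above can be sketched for a continuous toy problem. The following is a minimal global-best PSO written by us for illustration only; the parameter names (phi1, phi2) and the inertia term are our own conventional choices, not the exact variant used in this chapter:

```python
import random

def pso_minimize(f, dim=2, particles=20, iters=200,
                 phi1=1.5, phi2=1.5, inertia=0.7, bounds=(-5.0, 5.0)):
    """Minimal global-best PSO sketch: evaluate f(P), then manipulate the swarm."""
    lo, hi = bounds
    pos = [[random.uniform(lo, hi) for _ in range(dim)] for _ in range(particles)]
    vel = [[0.0] * dim for _ in range(particles)]
    pbest = [p[:] for p in pos]                 # best position seen by each particle
    pbest_val = [f(p) for p in pos]
    g = min(range(particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(particles):
            for d in range(dim):
                # velocity: inertia + pull toward pbest + pull toward gbest
                vel[i][d] = (inertia * vel[i][d]
                             + phi1 * random.random() * (pbest[i][d] - pos[i][d])
                             + phi2 * random.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:              # update the particle's own memory
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:             # and, if improved, the social memory
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

random.seed(0)
best, val = pso_minimize(lambda x: sum(c * c for c in x))  # sphere function
print(val)
```

With these convergent settings the swarm quickly drives the sphere function near zero.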

In this chapter, we propose the use of a multi-objective selection scheme to design combinational circuits. Our approach is based on some of our previous research on circuit design using genetic algorithms 6. The proposal consists of handling each of the matches between a solution generated by our PSO approach and the values specified by the truth table as equality constraints. To avoid the dimensionality problems associated with conventional multi-objective optimization techniques, we use a population-based approach similar to the Vector Evaluated Genetic Algorithm (VEGA) 26.

5.2. Problem Statement

The main goal of logic circuit simplification is normally the minimization of the amount of hardware necessary to build a certain particular system, since less hardware will normally imply a lower final cost. The problem of interest to us consists of designing a circuit that performs a desired function (specified by a truth table), given a certain specified set of available logic gates. The complexity of a logic circuit is a function of the number of gates in the circuit. The complexity of a gate is generally a function of the number of inputs to it. Because a logic circuit is a realization (implementation) of a Boolean function in hardware, reducing the number of literals in the function should reduce the number of inputs to each gate and the number of gates in the circuit, thus reducing the complexity of the circuit. Our overall measure of circuit optimality is the total number of gates used, regardless of their kind. This is approximately proportional to the total part cost of the circuit. Obviously, this sort of analysis must be performed only for fully functional circuits.

Boolean functions can be simplified through algebraic manipulations. However, the process is tedious and requires considerable experience from the human designer in order to achieve compact circuits.

As is well known, there are several standard graphical design aids such as Karnaugh Maps 17,29, which are widely used by human designers. There are also other tools more suitable for computer implementation, such as the Quine-McCluskey method 25,22, Espresso 2 and MisII 3.

Evolutionary algorithms have been applied to the design of circuits of different types, and have been found very useful in a wide variety of applications due to their robustness and exploratory power. The area devoted to the study and application of evolutionary algorithms to the design of electronic circuits is called evolvable hardware 27,16,30. This area has been subdivided by some authors into two sub-areas 31:

(1) intrinsic evolution: deals with the design and validation of the circuits directly in hardware.

(2) extrinsic evolution: deals only with computer simulations of the circuits, without reaching their actual implementation in hardware.

Within extrinsic evolution, several types of heuristics have been applied to design combinational logic circuits, for example: genetic programming 23,20,11,4, ant colony optimization 10, genetic algorithms 5, and, only recently, particle swarm optimization 13,8.

Despite the drawbacks of classical combinational circuit design techniques, some of them can handle truth tables with hundreds of inputs, whereas evolutionary algorithms are restricted to relatively small truth tables 23. However, the most interesting aspect of evolutionary design is the possibility of studying its emergent patterns 23,5. The goals are, therefore, different when we design circuits using evolutionary algorithms. First, we aim to optimize circuits (using a certain metric) in a different way and, intuitively, we can think of producing novel designs (since there is no human intervention). Such novel designs have been shown in the past 23,24,5,15.


Second, it would be extremely useful to extract design patterns from such evolutionary-generated solutions. This could lead to a practical design process in which a small (optimal) circuit is used as a building block to produce complex circuits. Such a divide-and-conquer approach has also been suggested in the past 28,23.

5.3. Our Proposed Approach

The first important component of the algorithm proposed in this chapter is the representation adopted to encode a circuit. In our case, we used a bidimensional matrix as in our previous work 5 (see Figure 5.1). More formally, we can say that any circuit can be represented as a bidimensional array of gates S_ij, where j indicates the level of a gate, so that those gates closer to the inputs have lower values of j. (Level values are incremented from left to right in Figure 5.1.) For a fixed j, the index i varies with respect to the gates that are "next" to each other in the circuit, but without being necessarily connected. Each matrix element is a gate (there are 5 types of gates: AND, NOT, OR, XOR and WIRE*) that receives its 2 inputs from any gate at the previous column, as shown in Figure 5.1. This sort of encoding was originally proposed by Louis 21. The so-called "cartesian genetic programming" 23 also adopts an encoding similar to the matrix previously described.
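As a rough illustration of how such a matrix genotype is simulated, the sketch below (our own Python rendering; the triplet layout, gate constants and function names are assumptions, not the authors' implementation) evaluates a circuit column by column, with each gate reading its two inputs from the outputs of the previous column:

```python
# Hedged sketch: a circuit encoded as columns of (gate, in1, in2) triplets,
# where column 0 reads the circuit inputs and column j feeds column j+1.
AND, OR, NOT, XOR, WIRE = range(5)

GATE_FN = {
    AND:  lambda a, b: a & b,
    OR:   lambda a, b: a | b,
    XOR:  lambda a, b: a ^ b,
    NOT:  lambda a, b: 1 - a,      # NOT ignores its second input
    WIRE: lambda a, b: a,          # WIRE passes its first input through
}

def evaluate(matrix, inputs, out_row=0):
    """matrix[j][i] = (gate, src1, src2); sources index the previous column."""
    signals = list(inputs)                      # outputs of the "previous column"
    for column in matrix:
        signals = [GATE_FN[g](signals[s1 % len(signals)],
                              signals[s2 % len(signals)])
                   for (g, s1, s2) in column]
    return signals[out_row]

# A toy 2-column circuit computing (A AND B) XOR C on inputs (A, B, C):
toy = [
    [(AND, 0, 1), (WIRE, 2, 2), (WIRE, 2, 2)],   # column 0
    [(XOR, 0, 1), (WIRE, 0, 0), (WIRE, 0, 0)],   # column 1
]
print(evaluate(toy, (1, 1, 0)))  # (1 AND 1) XOR 0 = 1
```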

Using the aforementioned matrix, a logic circuit can be encoded using either binary or integer strings. PSO, however, tends to deal with either binary or real-numbers representations. For our comparative study, we will adopt two integer representations:

(1) Integer A: This encoding was proposed by Hu et al. 14.

(2) Integer B: This encoding is proposed by us.

In the PSO algorithm, the individual factor Pbest refers to the decisions that the individual has made so far and that have worked best (in terms of its performance measure). This value has an impact on its future decisions. Additionally, the social factor Nbest refers to the decisions that the other individuals (within a certain neighborhood) have made so far and that have worked best for them. This value will also affect the future decisions of the individuals in the given neighborhood.

*WIRE basically indicates a null operation or, in other words, the absence of a gate, and it is used just to keep regularity in the representation used by our approach, which otherwise would have to use variable-length strings.


Fig. 5.1. Encoding used for each of the matrix elements that represent a circuit.

Figure 5.2 shows the pseudocode of the PSO algorithm that we propose for the design of combinational logic circuits. Its main difference with respect to traditional PSO has to do with the update of the position of the particle in each of its dimensions (marked with ** in Figure 5.2). The main procedure for updating each dimension d of the particle for a traditional binary approach, an integer A approach and an integer B approach is shown next:

• Binary approach

if flip[sig(Vd)] = 1 then
    Copy into the d position of the particle the value 1
else
    Copy into the d position of the particle the value 0

• Integer A approach

if flip[sig(Vd)] = 1 then
    Copy into the d position of the particle the corresponding value of Nbest.

• Integer B approach

if flip[sig(Vd)] = 1 then
    Copy to the particle the value of Nbest in the position d
else if flip[1 - sig(Vd)] = 1 then


Randomly initialize the population of particles, P.
Repeat {
    For each particle i in the population P {
        Compute the fitness of the particle P[i]
        If the fitness of P[i] is better than the fitness of
        the best particle found so far, Pbest[i],
        then update Pbest[i] using P[i].
    }
    For each particle i in P {
        Select the particle with the best fitness in the
        topological neighborhood of P[i]
        and update the value of Nbest[i]
    }
    For each particle i in the population P {
        Compute the new velocity for each dimension of the particle:
        V[i]_new = V[i]_old + φ1(Pbest[i] - P[i]) + φ2(Nbest[i] - P[i])
        ** Update the position of the particle P[i]
    }
    Apply uniform mutation with a (user given) rate.
} Until reaching the stop condition

Fig. 5.2. Pseudocode of the PSO algorithm adopted in this work. Note the addition of a mutation operator.

    Copy into the d position of the particle the corresponding value of Pbest.

In all cases, flip[p] returns 1 with a given probability p. The variable Vd refers to the velocity of the particle in the d dimension. The function sig normalizes the variable Vd and is defined as follows:

    sig(Vd) = 1 / (1 + e^(-Vd))                                    (1)

Both the Integer A and the Integer B approaches normalize the velocity of each dimension of the particle into the range 0 to 1, so that we can then determine (in a random way) whether we need to change the current position or not (this is done with the probability given by the velocity). If the change is required, then we copy to the particle the value of Nbest in the current position. Otherwise, the Integer A approach leaves the particle intact. When the change is not required, the Integer B approach checks again whether it is necessary to change the current position, but now using a probability of 1 - sig(Vd), where Vd is the current velocity. If the change is required, then we copy to the particle the value of Pbest in the position that we are updating. Otherwise, we leave the particle intact. These two integer representations are exemplified in Figure 5.3. As in our previous work 8, we introduce here a mutation operator in our PSO algorithm in order to improve its exploratory power, since this seems necessary when applying this approach to the design of circuits. Furthermore, in this case, the particles try to follow the same characteristics of Nbest and Pbest and could get stuck in their current position. Thus, the use of a mutation operator is vital in order to avoid this problem.

Fig. 5.3. Example of the two integer representations used for our PSO algorithm.
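The three per-dimension update procedures above can be sketched in Python (our own rendering; we assume sig is the usual sigmoid of equation (1), and the list and argument names are hypothetical):

```python
import math
import random

def sig(v):
    # normalize a velocity into [0, 1]; the usual sigmoid choice
    return 1.0 / (1.0 + math.exp(-v))

def flip(p):
    # return 1 with probability p, 0 otherwise
    return 1 if random.random() < p else 0

def update_binary(particle, d, v_d):
    particle[d] = 1 if flip(sig(v_d)) else 0

def update_integer_a(particle, d, v_d, nbest):
    if flip(sig(v_d)):
        particle[d] = nbest[d]          # move toward the neighborhood best

def update_integer_b(particle, d, v_d, nbest, pbest):
    if flip(sig(v_d)):
        particle[d] = nbest[d]          # first chance: copy from Nbest
    elif flip(1.0 - sig(v_d)):
        particle[d] = pbest[d]          # second chance: copy from Pbest

p = [0, 0, 0]
update_integer_a(p, 1, 100.0, nbest=[4, 4, 4])  # sig(100) ~ 1, so the copy happens
print(p)  # → [0, 4, 0]
```

Note how Integer B differs from Integer A only in the second, complementary coin flip that pulls the particle toward its own Pbest.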

5.4. Use of a Multi-Objective Approach

The objective function in our case is defined as in our previous work 5: it is the total number of matches (between the outputs produced by an encoded circuit and the intended values defined by the user in the truth table). For each match, we increase the value of the objective function by one. If the encoded circuit is feasible (i.e., it matches the truth table completely), then we add one (the so-called "bonus") for each WIRE present in the solution. Note, however, that in this case we use a multi-objective approach to assign fitness. The main idea behind our proposed approach is to use a population-based multi-objective optimization technique similar to VEGA 26 to handle each of the outputs of a circuit as an objective (see Figure 5.4). In other words, we would have an optimization problem with m equality constraints, where m is the number of values (i.e., outputs) of the truth table that we aim to match. So, for example, a circuit with 3 inputs and a single output would have m = 2^3 = 8 values to match. At each generation, the population is split into m + 1 sub-populations, where m is defined as indicated before (we have to add one to consider also the objective function). Each sub-population optimizes a separate constraint (in this case, an output of the circuit). Therefore, the main mission of each sub-population is to match its corresponding output with the value indicated by the user in the truth table.

Fig. 5.4. Graphical representation of the selection scheme approach adopted.
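The split sketched in Figure 5.4 can be illustrated as follows. This is a hedged sketch: the chapter does not specify how particles are dealt into groups, so the round-robin scheme below is our own assumption:

```python
# VEGA-style split of the swarm into m+1 sub-populations: group 0 is scored
# on the overall objective f(X); group j (1..m) is responsible for output j.
def split_subpopulations(population, m):
    """Deal particles round-robin into m+1 groups (layout is our assumption)."""
    groups = [[] for _ in range(m + 1)]
    for k, particle in enumerate(population):
        groups[k % (m + 1)].append(particle)
    return groups

pop = list(range(12))                     # stand-ins for 12 particles
groups = split_subpopulations(pop, m=3)   # 3 outputs -> 4 sub-populations
print([len(g) for g in groups])  # → [3, 3, 3, 3]
```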

The main issue here is how to handle the different situations that could arise. Our proposal is the following:

if o_j(X) ≠ t_j then fitness(X) = 0
else if v ≠ 0 AND X ∈ R then fitness(X) = -v
else fitness(X) = f(X)

where o_j(X) refers to the value of output j for the encoded circuit X; t_j is the value specified for output j in the truth table; v is the number of outputs that are not matched by the circuit X (v < m); and R is the sub-population whose objective is to match all the output values from the truth table. Finally, f(X) is the fitness function, defined as:

    f(X) = h(X)           if X is infeasible
    f(X) = h(X) + w(X)    otherwise                                (2)

In this equation, h(X) refers to the number of matches between the circuit X and the values defined in the truth table, and w(X) is the number of WIREs in the circuit X. As can be seen, the scheme adopted in this work is slightly different from the one used by our MGA reported in 6. The main reason for adopting this approach is that, in our experiments, it produced more competitive results, improving in most cases the results obtained with our single-objective PSO, as we will see in the next section.
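Our reading of the fitness-assignment rule above, as a Python sketch (the function and argument names are our own; `in_R` marks membership in the sub-population R that optimizes the overall objective):

```python
def assign_fitness(outputs, targets, j, in_R, wires):
    """Three-case rule for a circuit X inside sub-population j:
    miss your own output -> 0; partially matched circuit in R -> -v;
    otherwise the objective f(X) of equation (2)."""
    m = len(targets)
    v = sum(1 for o, t in zip(outputs, targets) if o != t)  # unmatched outputs
    if outputs[j] != targets[j]:
        return 0                       # circuit misses the output this subpop guards
    if v != 0 and in_R:
        return -v                      # in R, penalize by the number of mismatches
    h = m - v                          # h(X): number of matches
    return h + wires if v == 0 else h  # WIRE bonus only for fully functional circuits

print(assign_fitness([1, 0, 0, 1], [1, 0, 1, 1], j=0, in_R=False, wires=5))  # → 3
```

A fully functional circuit (v = 0) in any sub-population collects the WIRE bonus, which rewards smaller circuits among the feasible ones.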

5.5. Comparison of Results

The truth tables used to validate our PSO approach were taken from the specialized literature. In our experimental study, we compared the following approaches: a binary multi-objective PSO approach (BMPSO), a multi-objective PSO approach using an integer A encoding (EAMPSO), a multi-objective PSO approach using an integer B encoding (EBMPSO), a binary single-objective PSO (BPSO), a single-objective PSO approach using an integer A encoding (EAPSO), a single-objective PSO approach using an integer B encoding (EBPSO) and the multi-objective genetic algorithm for circuit design (MGA) 6. For each of the examples shown, we performed 20 independent runs, and the available set of gates considered was the following: AND, OR, NOT, XOR and WIRE. We used a matrix of size 5 × 5 in all cases, except for the second example, for which a 6 × 6 matrix was adopted. The parameters adopted by both BPSO and BMPSO were the following: φ1 = φ2 = 0.8, Vmax = 3.0, mutation rate Pm = 0.1 and neighborhood size = 3. EAPSO, EAMPSO, EBPSO and EBMPSO used: φ1 = φ2 = 0.2, Vmax = 0.4, Pm = 0.1 and neighborhood size = 3. The MGA used Pm = 0.00667 and a crossover rate of 0.5 (as suggested in 6).

5.5.1. Example 1

Our first example has 4 inputs and 1 output, as shown in Table 5.11. The additional parameters adopted by each approach are shown in Table 5.12. Note that we attempted to perform the same number of fitness function evaluations with all the approaches compared. In Table 5.13, we show a comparison of the results of all the approaches adopted. The best solution found for this example has 6 gates and is graphically shown in Figure 5.5. Note that both BMPSO and EBMPSO were able to find a circuit that uses one gate less than their single-objective counterparts (i.e., BPSO and EBPSO). Nevertheless, the average fitness values of both BMPSO and EBMPSO were lower than those of their single-objective counterparts. Also note that although EAMPSO was not able to improve the solutions obtained by EAPSO, its percentage of feasible circuits increased from 65% to 85%. Also, the average fitness of EAMPSO was 30.25, compared to the 26.75 value produced by EAPSO. In this example, the MGA did not perform too well when compared with any of our PSO versions. Its percentage of feasible circuits was low (35%) and it was not able to find the solution with only 6 gates produced by some of the PSO approaches. Another interesting fact was that EBPSO had the best average fitness (31.2), but was not able to produce circuits with 6 gates. EAMPSO, in contrast, had the second best average fitness (30.25), but was able to find circuits with only 6 gates 5% of the time. Thus, EAMPSO can be considered the best overall performer in this example.

S = (B + (D ⊕ A))' + (C ⊕ D(D ⊕ A))

Fig. 5.5. Diagram and Boolean expression corresponding to the best solution found by our multi-objective PSO approaches for example 1.

The Boolean expression corresponding to the best solution found by a human designer is: S = ((A ⊕ B) ⊕ ((AD)(B + C))) + ((A + C) + D)'. This solution has 9 gates and was generated using Karnaugh maps and Boolean algebra. This solution has been reported before in the specialized literature (see 7) and can be used as a reference to compare the results obtained by our PSO approach. The best solution found by our PSO approaches only requires 6 gates.
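The evolved 6-gate expression of Figure 5.5 can be checked mechanically against Table 5.11 (rows ordered by D C B A from 0000 to 1111); the sketch below is our own verification:

```python
# S = (B + (D xor A))' + (C xor D(D xor A)), the evolved 6-gate solution.
TABLE_5_11 = [1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 1, 0, 1]  # S column

def s_evolved(d, c, b, a):
    x = d ^ a
    return (1 - (b | x)) | (c ^ (d & x))

mismatches = [row for row in range(16)
              if s_evolved((row >> 3) & 1, (row >> 2) & 1,
                           (row >> 1) & 1, row & 1) != TABLE_5_11[row]]
print(mismatches)  # → [] : the circuit reproduces all 16 rows
```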

5.5.2. Example 2

Our second example has 4 inputs and 1 output, and its truth table is shown in Table 5.14. The additional parameters adopted by each approach are shown in Table 5.15.


Table 5.11. Truth table for example 1.

D C B A | S
0 0 0 0 | 1
0 0 0 1 | 0
0 0 1 0 | 0
0 0 1 1 | 0
0 1 0 0 | 1
0 1 0 1 | 1
0 1 1 0 | 1
0 1 1 1 | 1
1 0 0 0 | 1
1 0 0 1 | 1
1 0 1 0 | 1
1 0 1 1 | 0
1 1 0 0 | 0
1 1 0 1 | 1
1 1 1 0 | 0
1 1 1 1 | 1

Table 5.12. Parameters adopted for example 1.

Technique | Population size | Iterations | Fitness function evaluations
MPSO      | 68              | 1,471      | 100,028
PSO       | 50              | 2,000      | 100,000
MGA       | 170             | 600        | 102,000

Table 5.13. Comparison of the results obtained by our multi-objective versions of PSO, our single-objective PSO versions, the MGA and a human designer for the first example. b.s. = best solution.

approach       | gates b.s. | freq. b.s. | feas. circs. | avg. # gates | avg. fitn. | std. dev.
BMPSO          | 8          | 5%         | 20%          | 22.8         | 18.2       | 6.622
EAMPSO         | 6          | 5%         | 85%          | 10.75        | 30.25      | 6.680
EBMPSO         | 6          | 5%         | 75%          | 12.75        | 28.25      | 7.953
BPSO           | 9          | 15%        | 45%          | 19.1         | 21.9       | 7.887
EAPSO          | 6          | 5%         | 65%          | 14.25        | 26.75      | 8.902
EBPSO          | 7          | 30%        | 90%          | 9.8          | 31.2       | 5.616
MGA            | 7          | 15%        | 35%          | 19.95        | 21.05      | 8.929
Human designer | 9          | -          | -            | -            | -          | -

In Table 5.16, we show a comparison of the results of all the approaches adopted. The best solution found for this example has 6 gates and is graphically shown in Figure 5.6. Note that in this example, BPSO had a slightly better performance than BMPSO (both in terms of average fitness and in terms of the frequency with which the best solution was found). The two multi-objective algorithms that adopted an integer encoding (EAMPSO and EBMPSO) showed an excellent performance, being able to find a circuit with 6 gates (fitness of 35) in every single run (the standard deviation was zero). This performance is significantly better than that of the single-objective versions of these two algorithms (EAPSO and EBPSO). Again, the MGA did not perform too well when compared with any of our PSO versions. The MGA was not able to produce feasible circuits in all of its runs, and the best circuit was found only 30% of the time. In this case, both EAMPSO and EBMPSO were the best overall performers, with an average fitness of 35 and a standard deviation of zero.

S = (C ⊕ D)(B ⊕ C) + ((B ⊕ A) ⊕ (C ⊕ D))

Fig. 5.6. Diagram and Boolean expression corresponding to the best solution found by our multi-objective PSO approaches for example 2.

The Boolean expression corresponding to the best solution found by a human designer is: S = (A ⊕ B) ⊕ (C ⊕ D) + D'(CA) + B(A'D). This solution has 11 gates and was generated using Karnaugh maps and Boolean algebra. It is worth contrasting the best solution produced by the human designer with the best solution found by our PSO approaches, which only requires 6 gates.
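As for the first example, the evolved 6-gate expression of Figure 5.6 can be checked against every row of Table 5.14 (rows ordered by D C B A); this verification sketch is ours:

```python
# S = (C xor D)(B xor C) + ((B xor A) xor (C xor D)), the evolved 6-gate solution.
TABLE_5_14 = [0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0]  # S column

def s_evolved2(d, c, b, a):
    u, v, w = c ^ d, b ^ c, b ^ a
    return (u & v) | (w ^ u)

ok = all(s_evolved2((r >> 3) & 1, (r >> 2) & 1, (r >> 1) & 1, r & 1)
         == TABLE_5_14[r] for r in range(16))
print(ok)  # → True
```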

5.5.3. Example 3

Our third example has 5 inputs and 1 output, as shown in Table 5.17. The additional parameters adopted by each approach are shown in Table 5.18. In Table 5.19, we show a comparison of the results of all the approaches adopted. The best solution found for this example has 7 gates and is graphically shown in Figure 5.7. In this case, none of the binary versions of PSO was able to produce feasible circuits, which exemplifies the usefulness of


Table 5.14. Truth table for example 2.

D C B A | S
0 0 0 0 | 0
0 0 0 1 | 1
0 0 1 0 | 1
0 0 1 1 | 0
0 1 0 0 | 1
0 1 0 1 | 1
0 1 1 0 | 0
0 1 1 1 | 1
1 0 0 0 | 1
1 0 0 1 | 0
1 0 1 0 | 1
1 0 1 1 | 1
1 1 0 0 | 0
1 1 0 1 | 1
1 1 1 0 | 1
1 1 1 1 | 0

Table 5.15. Parameters adopted for example 2.

Technique | Population size | Iterations | Fitness function evaluations
MPSO      | 68              | 1,471      | 100,028
PSO       | 50              | 2,000      | 100,000
MGA       | 170             | 600        | 102,000

Table 5.16. Comparison of the results obtained by our multi-objective versions of PSO, our single-objective PSO versions, the MGA and a human designer for the second example. b.s. = best solution.

approach       | gates b.s. | freq. b.s. | feas. circs. | avg. # gates | avg. fitn. | std. dev.
BMPSO          | 6          | 65%        | 100%         | 6.9          | 34.1       | 1.3338
EAMPSO         | 6          | 100%       | 100%         | 6            | 35         | 0
EBMPSO         | 6          | 100%       | 100%         | 6            | 35         | 0
BPSO           | 6          | 75%        | 100%         | 6.75         | 34.25      | 1.6181
EAPSO          | 6          | 75%        | 95%          | 7.3          | 33.7       | 4.4615
EBPSO          | 6          | 85%        | 100%         | 6.15         | 34.85      | 0.3664
MGA            | 6          | 30%        | 90%          | 9.3          | 31.7       | 6.2669
Human designer | 11         | -          | -            | -            | -          | -

adopting integer encodings in PSO. There were mixed results for the other approaches. Both EAMPSO and EAPSO found the best solution with the same frequency (15%), but EAMPSO found feasible circuits 40% of the time (versus 35% for EAPSO). In terms of average fitness, both EAMPSO and EAPSO had similar results (41.7 vs. 40.45). Thus, we can conclude that EAMPSO was the best overall performer in this example. Interestingly, EBPSO had both the highest average fitness (41.9) and the highest percentage of feasible circuits (45%), but was not able to find a circuit with 7 gates. The MGA was able to find circuits with 7 gates, but both its percentage of feasible circuits (20%) and its average fitness (36) were low in comparison with the multi-objective PSO approaches.

S = (((E + D)(B ⊕ A))(C + (ED))) ⊕ B

Fig. 5.7. Diagram and Boolean expression corresponding to the best solution found by our multi-objective PSO approaches for example 3.

The Boolean expression corresponding to the best solution found by a human designer is: S = B(D'C + E'(D ⊕ C)) + A(DC + E(D ⊕ C)). This solution has 13 gates and was generated using Karnaugh maps and Boolean algebra. It is worth contrasting the best solution produced by the human designer with the best solution found by our PSO approaches, which only requires 7 gates.

5.5.4. Example 4

Our fourth example has 4 inputs and 2 outputs, as shown in Table 5.20. The additional parameters adopted by each approach are shown in Table 5.21. In Table 5.22, we show a comparison of the results of all the approaches adopted. The best solution found for this example has 7 gates and is graphically shown in Figure 5.8. In this case, BPSO produced considerably better results than its multi-objective counterpart (BMPSO), both in terms of average fitness (46.95 vs. 38.60) and in terms of the percentage of feasible circuits produced (95% vs. 50%). EAMPSO, however, was able to considerably improve the results produced by its single-objective counterpart (EAPSO), also in terms of both average fitness (49.25 vs. 43.55) and percentage of feasible circuits produced (100% vs. 70%). Note that both EBMPSO and EBPSO were able to find feasible circuits in all their runs and had similar average fitnesses (49.85 vs. 49.25), but the former converged more often


Table 5.17. Truth table for example 3.

E D C B A | S
0 0 0 0 0 | 0
0 0 0 0 1 | 0
0 0 0 1 0 | 1
0 0 0 1 1 | 1
0 0 1 0 0 | 0
0 0 1 0 1 | 0
0 0 1 1 0 | 1
0 0 1 1 1 | 1
0 1 0 0 0 | 0
0 1 0 0 1 | 0
0 1 0 1 0 | 1
0 1 0 1 1 | 1
0 1 1 0 0 | 0
0 1 1 0 1 | 1
0 1 1 1 0 | 0
0 1 1 1 1 | 1
1 0 0 0 0 | 0
1 0 0 0 1 | 0
1 0 0 1 0 | 1
1 0 0 1 1 | 1
1 0 1 0 0 | 0
1 0 1 0 1 | 1
1 0 1 1 0 | 0
1 0 1 1 1 | 1
1 1 0 0 0 | 0
1 1 0 0 1 | 1
1 1 0 1 0 | 0
1 1 0 1 1 | 1
1 1 1 0 0 | 0
1 1 1 0 1 | 1
1 1 1 1 0 | 0
1 1 1 1 1 | 1

Table 5.18. Parameters adopted for example 3.

Technique | Population size | Iterations | Fitness function evaluations
MPSO      | 99              | 20,000     | 1,980,000
PSO       | 50              | 39,600     | 1,980,000
MGA       | 330             | 6,000      | 1,980,000

to the best solution found (90% vs. 60%). In fact, EBMPSO was the best overall performer in this example. Again, the MGA had a poor performance with respect to the PSO-based multi-objective approaches (EAMPSO and EBMPSO), although it had a better average fitness than both BMPSO and
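The parameter settings in Table 5.18 equalize the computational budget across the compared techniques; this can be verified in one line (our sketch, not the chapter's):

```python
# population size x iterations = number of fitness function evaluations
budgets = {"MPSO": (99, 20_000), "PSO": (50, 39_600), "MGA": (330, 6_000)}
evals = {name: pop * iters for name, (pop, iters) in budgets.items()}
print(evals)  # every technique gets 1,980,000 evaluations
```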


116 Hernandez Luna and Coello Coello

Table 5.19. Comparison of the results obtained by our multi-objective versions of PSO, our single-objective PSO versions, MGA and a human designer for the third example. b.s. = best solution.

approach       | gates b.s. | freq. b.s. | feas. circs. | avg. # gates | avg. fitn. | std. dev.
BMPSO          | *          | 0%         | 0%           | *            | 29.8       | 0.410
EAMPSO         | 7          | 15%        | 40%          | 26.3         | 41.7       | 14.543
EBMPSO         | 7          | 5%         | 20%          | 32.25        | 35.75      | 11.461
BPSO           | *          | 0%         | 0%           | *            | 29.9       | 0.308
EAPSO          | 7          | 15%        | 35%          | 27.55        | 40.45      | 14.529
EBPSO          | 8          | 20%        | 45%          | 26.1         | 41.9       | 13.619
MGA            | 8          | 5%         | 20%          | 38           | 36         | 13.322
Human designer | 13         | -          | -            | -            | -          | -

Table 5.20. Truth table for example 4.

D C B A | S0 | S1
0 0 0 0 | 1  | 0
0 0 0 1 | 1  | 0
0 0 1 0 | 1  | 0
0 0 1 1 | 0  | 0
0 1 0 0 | 1  | 0
0 1 0 1 | 1  | 0
0 1 1 0 | 0  | 0
0 1 1 1 | 0  | 0
1 0 0 0 | 1  | 0
1 0 0 1 | 0  | 0
1 0 1 0 | 0  | 0
1 0 1 1 | 0  | 1
1 1 0 0 | 0  | 0
1 1 0 1 | 0  | 0
1 1 1 0 | 0  | 1
1 1 1 1 | 0  | 1

EAPSO, and was also able to find the circuit of 7 gates generated by the PSO-based approaches.

The Boolean expression corresponding to the best solution found by a human designer is: S0 = B'D' + C'A'(D' + B') and S1 = BD(A + C). This solution has 12 gates and was generated using Karnaugh maps and Boolean algebra. Note that the outputs were solved separately (as traditionally done when using Karnaugh maps). It is worth contrasting the best solution produced by the human designer with the best solution found by our PSO approaches, which requires only 7 gates.
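The 7-gate circuit of Figure 5.8 and the 12-gate human design can be checked for functional equivalence by brute force. The sketch below is ours, not the chapter's; it writes both designs with Python bitwise operators, reading the Fig. 5.8 subexpressions as the OR terms (C + A) and (B + D):

```python
from itertools import product

def pso(d, c, b, a):                 # Fig. 5.8, 7 gates
    inner = (c | a) & (b | d) | b & d
    return 1 - inner, (c | a) & (b | d) & (b & d)   # S0, S1

def human(d, c, b, a):               # Karnaugh-map design, 12 gates
    s0 = (1 - b) & (1 - d) | (1 - c) & (1 - a) & ((1 - d) | (1 - b))
    s1 = b & d & (a | c)
    return s0, s1

assert all(pso(*r) == human(*r) for r in product((0, 1), repeat=4))
print("functionally equivalent on all 16 inputs")
```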



S0 = ((C + A)(B + D) + BD)'
S1 = (C + A)(B + D)(BD)

Fig. 5.8. Diagram and Boolean expression corresponding to the best solution found by our multi-objective PSO approaches for example 4.

Table 5.21. Parameters adopted for example 4.

Technique | Population size | Iterations | Fitness function evaluations
MPSO      | 99              | 2,000      | 198,000
PSO       | 50              | 4,000      | 200,000
MGA       | 330             | 610        | 201,300

Table 5.22. Comparison of the results obtained by our multi-objective versions of PSO, our single-objective PSO versions, MGA and a human designer for the fourth example. b.s. = best solution.

approach       | gates b.s. | freq. b.s. | feas. circs. | avg. # gates | avg. fitn. | std. dev.
BMPSO          | 7          | 10%        | 50%          | 18.4         | 38.6       | 11.210
EAMPSO         | 7          | 65%        | 100%         | 7.75         | 49.25      | 1.333
EBMPSO         | 7          | 90%        | 100%         | 7.15         | 49.85      | 0.489
BPSO           | 7          | 30%        | 95%          | 10.05        | 46.95      | 4.330
EAPSO          | 7          | 40%        | 70%          | 13.45        | 43.55      | 8.530
EBPSO          | 7          | 60%        | 100%         | 7.75         | 49.25      | 1.160
MGA            | 7          | 25%        | 75%          | 13.4         | 43.6       | 8.090
Human designer | 12         | -          | -            | -            | -          | -

5.5.5. Example 5

Our fifth example has 4 inputs and 3 outputs, as shown in Table 5.23. The additional parameters adopted by each approach are shown in Table 5.24. In Table 5.25, we show a comparison of the results of all the approaches adopted. The best solution found for this example has 7 gates and is graphically shown in Figure 5.9. In this case, none of the binary versions of PSO was able to generate feasible circuits. Note that the performance of EAPSO



S0 = (AC ⊕ (B ⊕ D)) ⊕ ((D ⊕ AC) + (B ⊕ D))
S1 = AC ⊕ (B ⊕ D); S2 = C ⊕ A

Fig. 5.9. Diagram and Boolean expression corresponding to the best solution found by our multi-objective PSO approaches for example 5.

was better than that of EAMPSO, both in terms of average fitness (55.85 vs. 53.30) and in terms of the frequency with which the best solution was found (10% vs. 5%). However, EBMPSO had a slightly better performance than EBPSO, both in terms of average fitness (58.90 vs. 58.75) and in terms of the frequency with which the best solution was found (35% vs. 15%). Nevertheless, EBPSO had a slightly better percentage of feasible circuits found than EBMPSO (70% vs. 65%). Although marginally, we conclude that EBMPSO was the best overall performer in this example. The MGA was not able to generate circuits with 7 gates, but it found feasible circuits more consistently than most of the PSO-based approaches.

The Boolean expression corresponding to the best solution found by a human designer is: S0 = (AC)(B ⊕ D) + BD, S1 = C'(B ⊕ D) + C(A ⊕ (B ⊕ D)) and S2 = A ⊕ C. This solution has 11 gates and was generated using Karnaugh maps and Boolean algebra. Note that the outputs were solved separately. It is worth contrasting the best solution produced by the human designer with the best solution found by our PSO approaches, which requires only 7 gates.
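As in the previous example, the two designs can be compared exhaustively. This sketch (ours, with `^` standing for XOR) confirms that the 7-gate circuit of Fig. 5.9 and the 11-gate human design compute the same three outputs:

```python
from itertools import product

def pso(d, c, b, a):                         # Fig. 5.9, 7 gates
    ac, bd = a & c, b ^ d
    return (ac ^ bd) ^ ((d ^ ac) | bd), ac ^ bd, c ^ a   # S0, S1, S2

def human(d, c, b, a):                       # Karnaugh-map design, 11 gates
    bd = b ^ d
    return (a & c & bd) | (b & d), (1 - c) & bd | c & (a ^ bd), a ^ c

assert all(pso(*r) == human(*r) for r in product((0, 1), repeat=4))
```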

5.5.6. Example 6

Our sixth example has 4 inputs and 4 outputs, as shown in Table 5.26. The additional parameters adopted by each approach are shown in Table 5.27. In Table 5.28, we show a comparison of the results of all the approaches adopted. The best solution found for this example has 7 gates



Table 5.23. Truth table for example 5.

D C B A | S0 | S1 | S2
0 0 0 0 | 0  | 0  | 0
0 0 0 1 | 0  | 0  | 1
0 0 1 0 | 0  | 1  | 0
0 0 1 1 | 0  | 1  | 1
0 1 0 0 | 0  | 0  | 1
0 1 0 1 | 0  | 1  | 0
0 1 1 0 | 0  | 1  | 1
0 1 1 1 | 1  | 0  | 0
1 0 0 0 | 0  | 1  | 0
1 0 0 1 | 0  | 1  | 1
1 0 1 0 | 1  | 0  | 0
1 0 1 1 | 1  | 0  | 1
1 1 0 0 | 0  | 1  | 1
1 1 0 1 | 1  | 0  | 0
1 1 1 0 | 1  | 0  | 1
1 1 1 1 | 1  | 1  | 0

Table 5.24. Parameters adopted for example 5.

Technique | Population size | Iterations | Fitness function evaluations
MPSO      | 147             | 5,000      | 735,000
PSO       | 50              | 14,700     | 735,000
MGA       | 490             | 1,500      | 735,000

Table 5.25. Comparison of the results obtained by our multi-objective versions of PSO, our single-objective PSO versions, MGA and a human designer for the fifth example. b.s. = best solution.

approach       | gates b.s. | freq. b.s. | feas. circs. | avg. # gates | avg. fitn. | std. dev.
BMPSO          | *          | 0%         | 0%           | *            | 44.5       | 1.100
EAMPSO         | 7          | 5%         | 45%          | 19.7         | 53.3       | 8.053
EBMPSO         | 7          | 35%        | 65%          | 14.1         | 58.9       | 8.985
BPSO           | *          | 0%         | 0%           | *            | 45.65      | 1.089
EAPSO          | 7          | 10%        | 55%          | 17.15        | 55.85      | 8.610
EBPSO          | 7          | 15%        | 70%          | 14.25        | 58.75      | 8.123
MGA            | 8          | 10%        | 70%          | 15.9         | 57.1       | 7.490
Human designer | 11         | -          | -            | -            | -          | -

and is graphically shown in Figure 5.10. In this case, none of the binary versions of PSO was able to produce feasible circuits. The performance of EAMPSO was considerably better than that of its single-objective counterpart (EAPSO), both in terms of frequency of the best solution found



S0 = (CA)(DB); S3 = CA
S1 = DB ⊕ (CA)(DB); S2 = DA ⊕ BC

Fig. 5.10. Diagram and Boolean expression corresponding to the best solution found by our multi-objective PSO approaches for example 6.

(30% vs. 10%) and in terms of the percentage of feasible circuits found (80% vs. 35%). EBMPSO also had a better performance than its single-objective counterpart (EBPSO), both in terms of frequency of the best solution found (25% vs. 15%) and in terms of the percentage of feasible circuits found (75% vs. 35%). In this case, the MGA performed better than any of the PSO-based approaches, producing the highest average fitness (80.4) with the lowest number of fitness function evaluations. Thus, the MGA was the best overall performer in this example.

The Boolean expression corresponding to the best solution found by a human designer is: S0 = (DC)(BA), S1 = (DB)(CA)', S2 = CB ⊕ DA and S3 = CA. This solution has 8 gates and was reported in 7, where a multi-objective genetic algorithm was used. It is worth noticing that the best solution found by our PSO approaches uses only 7 gates.
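Once more the two designs can be compared exhaustively. Incidentally, Table 5.26 is recognizable as a 2-bit multiplier — (2D + C) × (2B + A), with S0 the most significant product bit — which the sketch below (ours, not the chapter's) also checks:

```python
from itertools import product

def pso(d, c, b, a):                       # Fig. 5.10, 7 gates
    ca, db = c & a, d & b
    return ca & db, db ^ (ca & db), (d & a) ^ (b & c), ca   # S0..S3

def human(d, c, b, a):                     # design reported in reference 7, 8 gates
    return d & c & b & a, (d & b) & (1 - (c & a)), (c & b) ^ (d & a), c & a

for r in product((0, 1), repeat=4):
    d, c, b, a = r
    s0, s1, s2, s3 = pso(*r)
    assert pso(*r) == human(*r)                              # same functions
    assert 8*s0 + 4*s1 + 2*s2 + s3 == (2*d + c) * (2*b + a)  # 2-bit multiplier
```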

5.6. Conclusions and Future Work

In this chapter, we have introduced a population-based PSO approach (similar to VEGA 26) to design combinational logic circuits. We have also presented a study in which six PSO-based algorithms were compared (using both single- and multi-objective schemes and different encodings). Also, a population-based genetic algorithm (MGA) was included in the comparison, since we were interested in analyzing the effect of the search engine adopted on the quality and consistency of the results obtained. The results obtained clearly indicate that the population-based PSO approaches proposed perform better than the MGA.



Table 5.26. Truth table for example 6.

D C B A | S0 | S1 | S2 | S3
0 0 0 0 | 0  | 0  | 0  | 0
0 0 0 1 | 0  | 0  | 0  | 0
0 0 1 0 | 0  | 0  | 0  | 0
0 0 1 1 | 0  | 0  | 0  | 0
0 1 0 0 | 0  | 0  | 0  | 0
0 1 0 1 | 0  | 0  | 0  | 1
0 1 1 0 | 0  | 0  | 1  | 0
0 1 1 1 | 0  | 0  | 1  | 1
1 0 0 0 | 0  | 0  | 0  | 0
1 0 0 1 | 0  | 0  | 1  | 0
1 0 1 0 | 0  | 1  | 0  | 0
1 0 1 1 | 0  | 1  | 1  | 0
1 1 0 0 | 0  | 0  | 0  | 0
1 1 0 1 | 0  | 0  | 1  | 1
1 1 1 0 | 0  | 1  | 1  | 0
1 1 1 1 | 1  | 0  | 0  | 1

Table 5.27. Parameters adopted for example 6.

Technique | Population size | Iterations | Fitness function evaluations
MPSO      | 195             | 5,000      | 975,000
PSO       | 50              | 19,500     | 975,000
MGA       | 650             | 500        | 325,000

Table 5.28. Comparison of the results obtained by our multi-objective versions of PSO, our single-objective PSO versions, MGA and a human designer for the sixth example. b.s. = best solution.

approach       | gates b.s. | freq. b.s. | feas. circs. | avg. # gates | avg. fitn. | std. dev.
BMPSO          | *          | 0%         | 0%           | *            | 60.35      | 0.7452
EAMPSO         | 7          | 30%        | 80%          | 11.8         | 77.2       | 7.7432
EBMPSO         | 7          | 25%        | 75%          | 13.15        | 75.85      | 8.0934
BPSO           | *          | 0%         | 0%           | *            | 60.75      | 0.6387
EAPSO          | 7          | 10%        | 35%          | 21.2         | 67.8       | 8.9713
EBPSO          | 7          | 15%        | 35%          | 22.05       | 66.95      | 8.64
MGA            | 7          | 15%        | 100%         | 8.6          | 80.4       | 1.14
Human designer | 8          | -          | -            | -            | -          | -

Within the six PSO-based techniques compared, it was clear that the approaches that adopted both a multi-objective selection scheme and an Integer B encoding 8 were the best overall performers. The results also suggest that the use of binary PSO for designing combinational logic circuits



is not advisable, since this sort of approach had difficulties even reaching the feasible region in some cases. An interesting outcome of our study is the finding that PSO acts as a better search engine than a genetic algorithm when adopting a population-based selection scheme for designing combinational logic circuits.

As part of our future work, we are interested in exploring alternative encodings (e.g., graphs and trees) that have not been used so far with particle swarm optimizers 19. We are also interested in studying some alternative multi-objective selection schemes (e.g., Pareto ranking 12) in the context of combinational circuit design using PSO 9.

Acknowledgements

The first author acknowledges support from CONACyT through a scholarship to pursue graduate studies at the Computer Science Section of the Electrical Engineering Department at CINVESTAV-IPN. The second author gratefully acknowledges support from CONACyT through project 42435-Y.

References

1. Peter J. Angeline. Evolutionary optimization versus particle swarm optimization: philosophy and performance differences. In V. W. Porto, N. Saravanan, D. Waagen, and A. E. Eiben, editors, Evolutionary Programming VII: Proceedings of the Seventh Annual Conference on Evolutionary Programming, pages 611-618. Springer, 1998.

2. R. K. Brayton, G. D. Hachtel, C. T. McMullen, and A. L. Sangiovanni-Vincentelli. Logic Minimization Algorithms for VLSI Synthesis. Kluwer Academic Publishers, Dordrecht, The Netherlands, 1984.

3. R. K. Brayton, R. Rudell, A. Sangiovanni-Vincentelli, and A. R. Wang. MIS: A multiple-level logic optimization system. IEEE Transactions on Computer-Aided Design, CAD-6(6):1062-1081, November 1987.

4. Bill P. Buckles, Arturo Hernandez Aguirre, and Carlos Coello Coello. Circuit design using genetic programming: An illustrative study. In Proceedings of the 10th NASA Symposium on VLSI Design, pages 4.1-1-4.1-10, Albuquerque, NM, 2002.

5. Carlos A. Coello Coello, Alan D. Christiansen, and Arturo Hernandez Aguirre. Use of Evolutionary Techniques to Automate the Design of Combinational Circuits. International Journal of Smart Engineering System Design, 2(4):299-314, June 2000.

6. Carlos A. Coello Coello and Arturo Hernandez Aguirre. Design of combinational logic circuits through an evolutionary multiobjective optimization approach. Artificial Intelligence for Engineering, Design, Analysis and Manufacture, 16(1):39-53, January 2002.



7. Carlos A. Coello Coello, Arturo Hernandez Aguirre, and Bill P. Buckles. Evolutionary Multiobjective Design of Combinational Logic Circuits. In Jason Lohn, Adrian Stoica, Didier Keymeulen, and Silvano Colombano, editors, Proceedings of the Second NASA/DoD Workshop on Evolvable Hardware, pages 161-170. IEEE Computer Society, Los Alamitos, California, July 2000.

8. Carlos A. Coello Coello, Erika Hernandez Luna, and Arturo Hernandez Aguirre. Use of particle swarm optimization to design combinational logic circuits. In Andy M. Tyrrell, Pauline C. Haddow, and Jim Torresen, editors, Evolvable Systems: From Biology to Hardware. 5th International Conference, ICES 2003, pages 398-409, Trondheim, Norway, 2003. Springer, Lecture Notes in Computer Science Vol. 2606.

9. Carlos A. Coello Coello, David A. Van Veldhuizen, and Gary B. Lamont. Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, New York, May 2002. ISBN 0-3064-6762-3.

10. Carlos A. Coello Coello, Rosa Laura Zavala Gutierrez, Benito Mendoza Garcia, and Arturo Hernandez Aguirre. Automated Design of Combinational Logic Circuits using the Ant System. Engineering Optimization, 34(2):109-127, March 2002.

11. Edgar Galvan Lopez, Riccardo Poli, and Carlos A. Coello Coello. Reusing Code in Genetic Programming. In Genetic Programming, 7th European Conference, EuroGP'2004, pages 359-368, Coimbra, Portugal, April 2004. Springer, Lecture Notes in Computer Science Volume 3003.

12. David E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison Wesley, Reading, MA, 1989.

13. Venu G. Gudise and Ganesh K. Venayagamoorthy. Evolving digital circuits using particle swarm. In Proceedings of the INNS-IEEE International Joint Conference on Neural Networks, pages 468-472, Portland, OR, USA, 2003.

14. Xiaohui Hu, Russell C. Eberhart, and Yuhui Shi. Swarm intelligence for permutation optimization: a case study on n-queens problem. In Proceedings of the IEEE Swarm Intelligence Symposium 2003 (SIS 2003), pages 243-246, Indianapolis, Indiana, USA, 2003.

15. Eduardo Islas Perez, Carlos A. Coello Coello, and Arturo Hernandez Aguirre. Extraction of Design Patterns from Evolutionary Algorithms using Case-Based Reasoning. In Yong Liu, Kiyoshi Tanaka, Masaya Iwata, Tetsuya Higuchi, and Moritoshi Yasunaga, editors, Evolvable Systems: From Biology to Hardware (ICES'2001), pages 244-255. Springer-Verlag, Lecture Notes in Computer Science No. 2210, October 2001.

16. Tatiana Kalganova. A new evolutionary hardware approach for logic design. In Annie S. Wu, editor, Proc. of the GECCO'99 Student Workshop, pages 360-361, Orlando, Florida, USA, 1999.

17. M. Karnaugh. A map method for synthesis of combinational logic circuits. Transactions of the AIEE, Communications and Electronics, 72(1):593-599, November 1953.

18. James Kennedy and Russell C. Eberhart. Particle Swarm Optimization. In Proceedings of the 1995 IEEE International Conference on Neural Networks, pages 1942-1948, Piscataway, New Jersey, 1995. IEEE Service Center.



19. James Kennedy and Russell C. Eberhart. Swarm Intelligence. Morgan Kaufmann Publishers, San Francisco, California, 2001.

20. John R. Koza. Genetic Programming. On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, Massachusetts, 1992.

21. Sushil J. Louis. Genetic Algorithms as a Computational Tool for Design. PhD thesis, Department of Computer Science, Indiana University, August 1993.

22. E. J. McCluskey. Minimization of boolean functions. Bell Systems Technical Journal, 35(5):1417-1444, November 1956.

23. Julian F. Miller, Dominic Job, and Vesselin K. Vassilev. Principles in the Evolutionary Design of Digital Circuits—Part I. Genetic Programming and Evolvable Machines, 1(1/2):7-35, April 2000.

24. Julian F. Miller, Tatiana Kalganova, Natalia Lipnitskaya, and Dominic Job. The Genetic Algorithm as a Discovery Engine: Strange Circuits and New Principles. In Proceedings of the AISB Symposium on Creative Evolutionary Systems (CES'99), Edinburgh, Scotland, April 1999.

25. W. V. Quine. A way to simplify truth functions. American Mathematical Monthly, 62(9):627-631, 1955.

26. J. David Schaffer. Multiple Objective Optimization with Vector Evaluated Genetic Algorithms. In Genetic Algorithms and their Applications: Proceedings of the First International Conference on Genetic Algorithms, pages 93-100. Lawrence Erlbaum, 1985.

27. Timothy G. W. Gordon and Peter J. Bentley. On evolvable hardware. In Seppo J. Ovaska and Les M. Sztandera, editors, Soft Computing in Industrial Electronics, pages 279-323, Heidelberg, 2002. Physica-Verlag.

28. Jim Torresen. A Divide-and-Conquer Approach to Evolvable Hardware. In Moshe Sipper, Daniel Mange, and Andres Perez-Uribe, editors, Proceedings of the Second International Conference on Evolvable Systems (ICES'98), pages 57-65, Lausanne, Switzerland, 1998. Springer-Verlag.

29. E. W. Veitch. A Chart Method for Simplifying Boolean Functions. Proceedings of the ACM, pages 127-133, May 1952.

30. Xin Yao and Tetsuya Higuchi. Promises and Challenges of Evolvable Hardware. In Tetsuya Higuchi, Masaya Iwata, and W. Liu, editors, Proceedings of the First International Conference on Evolvable Systems: From Biology to Hardware (ICES'96), Lecture Notes in Computer Science, Vol. 1259, pages 55-78, Heidelberg, Germany, 1997. Springer-Verlag.

31. Ricardo S. Zebulum, M. A. Pacheco, and M. Vellasco. Evolvable Systems in Hardware Design: Taxonomy, Survey and Applications. In T. Higuchi and M. Iwata, editors, Proceedings of the First International Conference on Evolvable Systems (ICES'96), pages 344-358, Berlin, Germany, 1997. Springer-Verlag.


CHAPTER 6

APPLICATION OF MULTI-OBJECTIVE EVOLUTIONARY ALGORITHMS IN AUTONOMOUS VEHICLES NAVIGATION

Tomonari Furukawa*, Gamini Dissanayake† and Hugh F. Durrant-Whyte‡

*ARC Centre of Excellence in Autonomous Systems
School of Mechanical and Manufacturing Engineering
The University of New South Wales, Sydney 2052, Australia
E-mail: [email protected]

†Faculty of Engineering
The University of Technology, Sydney 2007, Australia
E-mail: [email protected]

‡Australian Centre for Field Robotics
The University of Sydney, Sydney 2006, Australia
E-mail: [email protected]

The successful navigation of an autonomous vehicle heavily depends on the accuracy of the parameters of the vehicle and sensor models, which are determined before the vehicle is in use. One of the main sources of error is this prior way of determining parameters, partly because the currently accepted procedure for determining the parameters is not sufficiently accurate and partly because the parameters vary as the vehicle is driven.

This chapter presents an application of multi-objective evolutionary algorithms to sensor and vehicle parameter determination for successful autonomous vehicle navigation. Following the multi-objective formulation, a general framework for multi-objective optimization, two types of search methods to find solutions efficiently, and a technique for selecting a final solution from the multiple solutions are proposed. The proposed parameter determination technique was applied to an autonomous vehicle developed by the authors, and an appropriate parameter set has been obtained.



126 T. Furukawa, et al.

6.1. Introduction

Driving a vehicle in an unstructured outdoor environment, involving skillful manoeuvres, often exposes a human driver to significant danger. Autonomous vehicles, which carry a navigation system to provide knowledge of vehicle position and trajectory and subsequently control the vehicle along a desired path, have received considerable attention in the last decade 1-3.

Sensors used in the navigation system can be classified into two types, namely absolute sensors and relative sensors. The absolute sensors directly measure the position and/or orientation of the vehicle with respect to its environment. The laser range finder that observes beacons present in the environment 4, the Inertial Measurement Unit (IMU) that measures the angular velocities and accelerations of the vehicle in three orthogonal axes, the Global Positioning System (GPS), the compass and the gyroscope belong to this class of sensors. The relative sensors, usually known as dead-reckoning sensors, measure the vehicle state internally from the vehicle drive train. The wheel and steering encoders are typical examples of this class.

The role of dead-reckoning sensors in navigation, together with a kinematic model, is to predict the position and orientation of the vehicle. Absolute sensors are, meanwhile, employed to reset the errors that inevitably accumulate due to the integration present in the prediction step. As absolute information is usually not available at high enough rates to be useful for control purposes, it is important that the dead-reckoning sensors provide accurate information between such updates.

Despite the dramatic theoretical progress, the bottleneck for successful vehicle autonomy is the inaccuracy of the kinematic parameters and calibration factors used in the kinematic vehicle and sensor models. The parameters associated with the kinematic vehicle and sensor models are measured or computed only when the vehicle is designed or commissioned 1, although the characteristics of electro-mechanical systems change gradually with time. Further, the encoders attached to the wheels and steering are calibrated using only specific manoeuvres, such as moving along straight lines and circular paths 5,6, and correlating the distance travelled as measured by the encoders and an external measuring device, typically a tape measure.

The solution to this accuracy problem is to compute the kinematic parameters and calibration constants using data gathered during the normal operation of an autonomous vehicle 7. This makes it possible to check whether the parameters used in the navigation algorithms are accurate and


Application of MOEAs in Autonomous Vehicles Navigation 127

to make any necessary changes without resorting to specific test manoeuvres or modifications to sensor configurations.

This chapter describes an application of multi-objective evolutionary algorithms 8-12 (MOEAs) to the identification of such kinematic parameters and calibration factors used in autonomous navigation. Since the difference between the vehicle state computed using the kinematic equations and the relative sensors and that computed using an absolute sensor is the criterion used to find the parameters, one may think of applying any of the conventional single-objective optimization methods 13-20. The primary reason for the use of a multi-objective optimization technique stems from the fact that the difference is represented in terms of two different types of error, the position error and the orientation error, each of which is derived from a different set of sensor readings 21-23.

In accordance with the multi-objective problem formulation, a general framework for multi-objective optimization is presented, and, further, Multi-objective Continuous Evolutionary Algorithms (MCEAs) and a Multi-objective Gradient-based Method (MOGM) are proposed to solve this class of multi-objective optimization problems efficiently. The solution to a multi-objective optimization problem is a solution space rather than a single point solution, so the multi-objective optimization method results in finding Pareto-optimal solutions, whose distribution describes the solution space. The Center-of-Gravity Method (CoGM) is thus proposed to select an appropriate final solution from the Pareto-optimal solutions.

This chapter is organized as follows. Section 6.2 provides the background material on autonomous vehicles and describes the experimental setup used for obtaining the data. The parameter identification problem for autonomous vehicles is formulated as a multi-objective optimization problem in Section 6.3. A general framework for finding Pareto-optimal solutions and CoGM for selecting a final solution are also described. Section 6.4 describes the MCEA and MOGM, whereas the solution to the problem of parameter identification in autonomous vehicles is provided in Section 6.5. Conclusions are summarized in Section 6.6.

6.2. Autonomous Vehicles

6.2.1. Experimental Setup

Fig. 6.1 shows a vehicle used as a test bed for research into the navigation of autonomous vehicles. This is a rear-wheel-driven vehicle that is steered using an Ackerman-type steering linkage driving the front wheels. Four sensors



are mounted on the vehicle. As dead-reckoning sensors, an encoder fitted to the rear left wheel gives a measure of the vehicle's speed, and a linear variable differential transformer (LVDT) on the steering rack provides a measurement proportional to the steering angle. The encoder and the LVDT are read at a rate of 20 Hz. A carrier-phase differential GPS unit, with a rated accuracy of 0.02 m in position and 0.02 m/s in velocity when at least six satellites are in view, is used to measure the absolute position of the vehicle at a sample rate of 4 Hz. An inertial measurement unit comprising three orthogonal gyroscopes and three accelerometers is also mounted on the vehicle. In the work described in this chapter, only one of these gyroscopes is used, to measure the angular velocity of the vehicle about a vertical axis. The inertial measurement unit provides information at a sample rate of 125 Hz.

Fig. 6.1. Autonomous vehicle.

6.2.2. Vehicle Model

The kinematic model of a vehicle moving in the horizontal plane is shown in Fig. 6.2. The location of the vehicle is given by the state variables [x, y, φ], where x and y are the coordinates of the center of the rear axle, and φ is the orientation of the vehicle body as shown. The inputs that are used to control the vehicle are the velocity at the center of the rear axle, v, and the average



steering angle γ. The equations of motion for this vehicle at any time instant k are given by

ẋ(k) = v(k) · cos φ(k),

ẏ(k) = v(k) · sin φ(k),

φ̇(k) = (v(k)/l) · tan γ(k),    (1)

where l is the vehicle wheel base.
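Eq. (1) can be integrated numerically to dead-reckon the vehicle state. The following forward-Euler sketch is illustrative only; the wheel base l = 2.5 m and the 20 Hz time step are assumed values, not taken from the chapter:

```python
import math

def step(state, v, gamma, l=2.5, dt=0.05):
    """One forward-Euler step of the kinematic model of Eq. (1).
    l (wheel base, m) and dt (20 Hz sampling) are illustrative values."""
    x, y, phi = state
    return (x + v * math.cos(phi) * dt,
            y + v * math.sin(phi) * dt,
            phi + v / l * math.tan(gamma) * dt)

state = (0.0, 0.0, 0.0)
for _ in range(20):                      # 1 s at 2 m/s, wheels straight
    state = step(state, v=2.0, gamma=0.0)
print(state)  # ≈ (2.0, 0.0, 0.0): the vehicle has moved 2 m along x
```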

Fig. 6.2. State and control of the vehicle.

6.2.3. Relative Sensor Models

6.2.3.1. Steering Encoder

The steering encoder measures the displacement of the steering rack, pSTE(k), which is linearly proportional to the steering angle. The steering angle γ(k) can be expressed as

γ(k) = c1 · pSTE(k) + c2,    (2)

where c1 and c2 are the gain and the offset of the encoder.

6.2.3.2. Wheel Encoder

The wheel encoder provides the angular position of the left rear wheel of the vehicle. The difference between successive position measurements can be used to determine the velocity of the left rear wheel. The velocity of the vehicle



v(k) is related to the velocity measured by the encoder, pVEL(k), through the following kinematic transformation:

v(k) = c3 · pVEL(k) + φ̇(k) · b,    (3)

where c3 is the gain of the encoder. Note that the substitution of Eq. 3 into Eq. 1 introduces another velocity term v(k). Assembling the velocity terms, the resultant velocity v(k) is described as

v(k) = c3 · pVEL(k) / (1 − (b/l) · tan γ(k)).    (4)
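The closed form of Eq. (4) simply solves the implicit relation obtained by substituting Eq. 1's heading rate into Eq. 3. A numerical sanity check with made-up constants (all values below are illustrative, not the vehicle's):

```python
import math

# v = c3*p_vel + (v/l)*tan(gamma)*b  rearranged to Eq. (4):
# v = c3*p_vel / (1 - (b/l)*tan(gamma))
c3, p_vel, b, l, gamma = 0.95, 1.8, 0.8, 2.5, 0.3   # illustrative values
v = c3 * p_vel / (1.0 - (b / l) * math.tan(gamma))
residual = v - (c3 * p_vel + (v / l) * math.tan(gamma) * b)
print(abs(residual))  # ~0: the closed form satisfies the implicit relation
```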

6.2.4. Absolute Sensor Models

6.2.4.1. Global Positioning Systems

The GPS mounted on the vehicle directly provides the absolute position [xGPS, yGPS] at which the sensor is mounted. The position of the vehicle obtained from the GPS sensor, [x̄, ȳ], is given by the following equations:

x̄(k) = xGPS(k) − r · cos{φ(k) + θ},

ȳ(k) = yGPS(k) − r · sin{φ(k) + θ},    (5)

where r and θ are the location of the GPS unit, in polar coordinates, with respect to the local coordinate frame on the vehicle, as shown in Fig. 6.3. Note that [x̄, ȳ, φ̄] represents the position/orientation obtained from the absolute sensor measurements.

Fig. 6.3. Location and notation of sensors mounted on the vehicle.



6.2.4.2. Inertial Measurement Unit

The rate of change of orientation of the vehicle, φ̇, is related to the reading of the gyroscope, pINS, by

φ̇(k) = pINS(k) + pOFF,    (6)

where the initial offset pOFF is the average value of the gyroscope measurements obtained a priori when the vehicle is stationary:

pOFF = (1/kf) · Σ_{k=1}^{kf} pINS(k).    (7)

6.2.5. Simulation and Measurement of the Vehicle State

The sensors described in the last section can be used to both simulate and measure the state of the vehicle at any instant. In the simulation, the state of the vehicle [x(k+1), y(k+1), φ(k+1)] at time instant k+1 is iteratively computed from the state of the vehicle [x(k), y(k), φ(k)] as shown in Fig. 6.4, so that the data to be prepared a priori are the control inputs [v(k), γ(k)] for all time (k = 1, 2, ..., kf) and the initial state of the vehicle [x(0), y(0), φ(0)]. With the navigation data [pSTE(k), pVEL(k)] from the sensors, the control inputs are derived from Eqs. 2 and 3 at any time instant k, and the initial position of the vehicle can also be obtained from measurements using Eq. 5 by setting [x(0), y(0), φ(0)] = [x̄(0), ȳ(0), φ̄(0)]. The state of the vehicle at any time instant can thus be computed as long as the initial orientation φ(0) is specified.

The location of the vehicle used as measurement data, [x̄(k), ȳ(k), φ̄(k)], can also be computed from the absolute sensor readings at all times. Since the data obtainable from the gyroscope is the rate of change of the vehicle orientation, φ̇(k), the orientation of the vehicle φ̄(k) can be computed by iteratively deriving φ̄(k+1) from φ̄(k), given its initial state φ̄(0). As shown in Eq. 5, the position is also governed by φ(0) in addition to the measurements from the GPS. The location of the vehicle at all times is thus determined by obtaining measurements from the GPS and the gyroscope and applying these equations.

6.2.6. Prediction of the Vehicle State

Fig. 6.5 shows a pictorial view of the process used for estimating the location of an autonomous vehicle. Information obtained from internal sensors is used together with a kinematic model of the vehicle to obtain an estimate



Fig. 6.4. Flowchart of simulation.

of the vehicle state. Due to noise present in the sensors as well as inaccuracies of the vehicle model, the error in this estimate gradually increases. Thus, information from external sensors is periodically measured, and the errors accumulated during each period are incorporated into the state estimate using a Kalman filter based estimator, to correct errors and obtain a more accurate state estimate. Note that this information may not be available for extended periods of time, depending on the environment in which the vehicle operates. GPS signals are often prone to blackout near buildings and other structures that obstruct or reflect radio signals. In such situations, vehicle navigation relies purely on the estimates obtained from the internal sensors and the vehicle model. Therefore, the availability of an accurate vehicle model with accurate kinematic parameters is extremely valuable for the proper functioning of an autonomous vehicle navigation system.
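The predict/correct cycle described above can be caricatured in one dimension: dead reckoning propagates the model while the variance grows, and an absolute fix shrinks it again. This is a generic scalar Kalman-filter sketch with invented numbers, not the chapter's estimator:

```python
def predict(x, p, u, q):
    """Model update: propagate the state, inflate the variance."""
    return x + u, p + q

def correct(x, p, z, r):
    """Measurement update with an absolute fix z of variance r."""
    k = p / (p + r)                  # Kalman gain
    return x + k * (z - x), (1 - k) * p

x, p = 0.0, 1.0
for _ in range(5):                   # five dead-reckoning steps
    x, p = predict(x, p, u=1.0, q=0.5)
x, p = correct(x, p, z=5.2, r=0.2)   # GPS-like correction
print(round(x, 3), round(p, 3))      # estimate pulled toward the fix, variance reduced
```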


Application of MOEAs in Autonomous Vehicles Navigation 133

Fig. 6.5. Navigation system of an autonomous vehicle.

6.3. Parameter Identification of Autonomous Vehicles

6.3.1. Problem Formulation

By observing the autonomous vehicles in the last section, the parameter identification problem of concern can be characterized as follows:

• The parameters to be identified are x^T = [c_1, c_2, c_3, l, b, φ(0), r, θ] ∈ ℝ⁸.
• Errors in the position and orientation of the vehicle must be minimized to identify the parameters.
• For the Kalman filter based estimator, the predictor model only needs to be accurate over each short time period between receipts of external sensor readings.

The consideration of all these characteristics brings the following formulation of a multi-objective optimization problem:

f(x)^T = [f_pos(x), f_ori(x)] → min, (8)

where, to be accurate over each short time period, the objective functions f(x): ℝⁿ → ℝ² are given by

f_pos(x) = Σ_{i=1}^{n_p} Σ_{j=1}^{k'_f} ‖x̂(i·k'_f + j) − x(i·k'_f + j)‖² + ‖ŷ(i·k'_f + j) − y(i·k'_f + j)‖²,

f_ori(x) = Σ_{i=1}^{n_p} Σ_{j=1}^{k'_f} ‖φ̂(i·k'_f + j) − φ(i·k'_f + j)‖², (9)


and

x̂(i·k'_f) = x(i·k'_f), ŷ(i·k'_f) = y(i·k'_f), i = 1, ..., n_p,
φ̂(i·k'_f) = φ(i·k'_f), i = 1, ..., n_p, (10)

where k'_f is the number of iterations in each period, which is used for further autonomous navigation, and n_p is the number of partitions in the vehicle operation. The total number of iterations is given by k_f = k'_f · n_p.
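Assuming the simulated and measured trajectories are available as lists of (x, y, φ) tuples, with the simulated state already reset to the measurement at each partition boundary (Eq. 10), the two objectives of Eq. 9 can be sketched as below; the names and data layout are illustrative.

```python
def objectives(sim, meas, kf_prime, n_p):
    """Position and orientation error objectives, following Eq. 9.
    sim/meas are lists of (x, y, phi) over kf = kf_prime * n_p steps."""
    f_pos = 0.0
    f_ori = 0.0
    for k in range(kf_prime * n_p):
        (xs, ys, ps), (xm, ym, pm) = sim[k], meas[k]
        f_pos += (xs - xm) ** 2 + (ys - ym) ** 2   # squared position error
        f_ori += (ps - pm) ** 2                    # squared orientation error
    return f_pos, f_ori
```

The two returned values are the objective vector f(x) to be minimized in Eq. 8.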

6.3.2. A General Framework for Searching Pareto-Optimal Solutions

Fig. 6.6 shows the flowchart of the framework of the multi-objective optimization proposed in this chapter. In order to find multiple solutions, the multi-objective optimization searches with λ multiple points, i.e.,

X(K) = {x_1^K, ..., x_λ^K} ∈ (ℝⁿ)^λ, (11)

where x_i^K is the ith search point at the Kth generation. The initial population, X(0), is generated randomly within a specified range [x_min, x_max]. Each objective function value f_j(x_i^K) is then calculated with each parameter set x_i^K, finally yielding

F(K) = {f(x_1^K), ..., f(x_λ^K)}. (12)

Unlike the other MOEAs, two scalar criteria are evaluated for each search point in the proposed framework. One is the rank in Pareto-optimality as usual,

Θ(K) = {θ(x_1^K), ..., θ(x_λ^K)}, (13)

where θ: ℝⁿ → ℕ, and the other is a positive real-valued scalar objective function, or fitness, which is derived by taking the rank into account:

Φ(K) = {φ(x_1^K), ..., φ(x_λ^K)}, (14)

where φ: ℝⁿ → ℝ₊. The rank is evaluated to check the degree of Pareto-optimality of each search point. The fitness, on the other hand, is used to create the next search points x_i^{K+1}, and the creation depends upon the search method to be used. The next population is thus written in canonical form as

X(K+1) = s(X(K), Φ(K), ∇Φ(K), ∇²Φ(K)), (15)


Fig. 6.6. Flowchart of multi-objective evolutionary algorithms.

where s is the search operator.

Once the iterative computation is enabled, we want to find as many effective Pareto-optimal solutions as possible, such that the solution space can be configured. Another technique proposed here is a Pareto pooling strategy, where the set of Pareto-optimal solutions created in the past is pooled as P(K) besides the population of search points X(K).

The process of the Pareto pooling technique is as follows. The whole set of Pareto-optimal solutions obtained in the first generation is saved in this storage, i.e., P(0) = X(0). From the second generation onwards, the newly created Pareto-optimal solutions in the optimization loop, X(K+1), are compared to the stored Pareto-optimal solutions P(K), and the new set of Pareto-optimal solutions P(K+1) is saved in the storage as illustrated in Fig. 6.7. Some Pareto-optimal solutions may be identical or very close to an existing point. The storage of such solutions is simply a waste of memory, so they are discarded if they are closer than a resolution set a priori. The creation of the new population and of the Pareto-optimal solutions is repeated until a terminal condition is satisfied.
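The pooling step described above can be sketched as follows. The data structure ((x, f) pairs), the dominance helper and the resolution parameter `eps` are illustrative assumptions, not the authors' code.

```python
def dominates(f, g):
    """True if objective vector f Pareto-dominates g (minimization)."""
    return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))

def update_pool(pool, candidates, eps=1e-6):
    """Merge newly found points into the stored Pareto pool P(K), keeping
    only non-dominated points and discarding points closer than the
    resolution eps to an entry already kept. Points are (x, f) pairs."""
    merged = pool + candidates
    new_pool = []
    for x, f in merged:
        if any(dominates(g, f) for _, g in merged if g != f):
            continue  # dominated: not Pareto-optimal
        if any(max(abs(a - b) for a, b in zip(f, g)) < eps for _, g in new_pool):
            continue  # near-duplicate: storing it would waste memory
        new_pool.append((x, f))
    return new_pool
```

Calling `update_pool(P, X_new)` once per generation maintains P(K+1) as described.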

Fig. 6.7. Creation of Pareto-optimal solutions.

6.3.3. Selection of a Single Solution by CoGM

Fig. 6.8 illustrates Pareto-optimal solutions where two objective functions f = [f_1, f_2]^T are minimized to identify three parameters x = [x_1, x_2, x_3]^T. As a two-dimensional function space and a three-dimensional parameter space are still easy to visualize, one may incorporate human knowledge into computational knowledge-based techniques such as expert systems and fuzzy logic24 for the automatic selection of a single solution. However, if the numbers of objective functions and parameters are considerably large, the knowledge to be constructed is immense, and such techniques are no longer practical. In this case, one prominent way is to select the solution residing in the center of the solution space, since this solution is robust. The authors here propose a technique where the solution closest to the center-of-gravity is chosen. Let the Pareto-optimal solutions finally obtained be x^i, ∀i ∈ {1, ..., q}. If each solution is evaluated in a scalar manner, i.e., ψ(x^i), the center-of-gravity is in general given by

x̄ = Σ_{i=1}^{q} ψ(x^i) x^i / Σ_{i=1}^{q} ψ(x^i). (16)

As the Pareto-optimal solutions must be evaluated equally, we can consider that all the Pareto-optimal solutions possess the same scalar value, i.e., ψ(x^1) = ··· = ψ(x^q). No matter what the value is, the center-of-gravity then takes the form:

x̄ = (1/q) Σ_{i=1}^{q} x^i. (17)

The effectiveness of the center-of-gravity method cannot be proved theoretically, but it is highly acceptable, as it has been commonly used in fuzzy logic24 to find a solution from the solution space described by fuzzy sets. The adoption of a clustering algorithm will increase the reliability of the solution25.
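Under equal weighting, Eq. 17 reduces to the plain mean of the Pareto-optimal parameter sets, and the chosen solution is the one nearest to it. A minimal sketch (function name and distance choice are illustrative):

```python
def cogm(solutions):
    """Center-of-gravity method (Eq. 17): with all Pareto-optimal
    solutions weighted equally, the centroid is the component-wise mean;
    the solution closest to it (squared Euclidean distance) is returned."""
    n = len(solutions[0])
    centroid = [sum(x[d] for x in solutions) / len(solutions) for d in range(n)]
    return min(solutions,
               key=lambda x: sum((xi - ci) ** 2 for xi, ci in zip(x, centroid)))
```

Because only an existing Pareto-optimal point is returned, the result is always a feasible identified parameter set.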

Fig. 6.8. Process of deriving a single solution.



6.4. Multi-Objective Optimization

6.4.1. Evaluation of Functions

6.4.1.1. Rank Function

Fig. 6.9 depicts the process used to rank the search points and thereby derive Θ(K) in Eq. 13. The process is purely based on an elimination rule. Under the rule, every objective function at every search point, f_j(x_i^K), ∀i ∈ {1, ..., λ}, ∀j ∈ {1, ..., m}, is first calculated, and the Pareto-optimal set in the population is ranked No. 1, i.e., θ(x_i^K) = 1 if the search point x_i^K is in the Pareto-optimal set. The group of search points ranked No. 1 is denoted as G(1) in the figure. The points with rank No. 1 are then eliminated from the population, and the Pareto-optimal set in the remaining population is ranked No. 2, θ(x_i^K) = 2. Ranking is continued in the same fashion until all the points are ranked26.
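The elimination rule can be sketched directly (a straightforward, unoptimized rendering of the procedure above):

```python
def rank_population(F):
    """Elimination-rule ranking: the Pareto-optimal set of the current
    population gets rank 1, is removed, the next front gets rank 2, and
    so on. F is a list of objective vectors (minimization)."""
    def dominates(f, g):
        return all(a <= b for a, b in zip(f, g)) and any(a < b for a, b in zip(f, g))
    ranks = [0] * len(F)
    remaining = set(range(len(F)))
    r = 1
    while remaining:
        # the current front: points not dominated within the remainder
        front = {i for i in remaining
                 if not any(dominates(F[j], F[i]) for j in remaining if j != i)}
        for i in front:
            ranks[i] = r        # theta(x_i) = current front number
        remaining -= front
        r += 1
    return ranks
```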

Fig. 6.9. Ranking process.


6.4.1.2. Fitness Function

The evaluation of the fitness of each search point starts with finding the best and worst values of each objective function among the population:

f_best,j = min{f_j(x_i^K) | ∀i ∈ {1, ..., λ}},
f_worst,j = max{f_j(x_i^K) | ∀i ∈ {1, ..., λ}}. (18)

If we temporarily define the fitness as

φ'_j(x_i^K) = (f_worst,j − f_j(x_i^K)) / (f_worst,j − f_best,j), (19)

we get the normalized condition

0 ≤ φ'_j(x_i^K) ≤ 1, (20)

and this allows us to treat the fitness of each objective function on the same scale. The fitness of points with the same rank has to be the same, and the true fitness of each objective function is thus defined as:

φ_j(x_i^K) = max{φ'_j(x_l^K) | θ(x_l^K) = θ(x_i^K), l ≠ i, ∀l ∈ {1, ..., λ}}. (21)

The fitness of each individual can be conclusively calculated as:

φ(x_i^K) = Σ_{j=1}^{m} w_j φ_j(x_i^K), (22)

where w_j ∈ [0, 1] is a weighting factor whose value varies depending on the search methods presented in the next subsection. The fitness value thus lies within the range:

0 ≤ φ(x_i^K) ≤ m. (23)
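Eqs. 18-22 can be sketched together as one function; the guard for a flat objective column and the group-maximum convention (all members of a rank share the maximum normalized value of their group) are the only implementation choices added here.

```python
def fitness(F, ranks, weights):
    """Rank-based normalized fitness (Eqs. 18-22): each objective is
    normalized to [0, 1] against the population's worst/best values
    (Eq. 19), members of the same rank share their group's maximum
    (Eq. 21), and a weighted sum combines the m objectives (Eq. 22)."""
    m = len(F[0])
    phi = []
    for j in range(m):
        col = [f[j] for f in F]
        best, worst = min(col), max(col)
        span = (worst - best) or 1.0           # guard for a flat objective
        prelim = [(worst - v) / span for v in col]            # Eq. 19
        # Eq. 21: members of the same rank take the group's maximum
        phi.append([max(p for p, r in zip(prelim, ranks) if r == ranks[i])
                    for i in range(len(F))])
    # Eq. 22: weighted sum; 0 <= fitness <= m when all weights are in [0, 1]
    return [sum(weights[j] * phi[j][i] for j in range(m)) for i in range(len(F))]
```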

6.4.2. Search Methods

6.4.2.1. MCEA

The process to find x_i^{K+1} from x_i^K in this approach is conducted algorithmically in an evolutionary manner through two evolutionary operators, reproduction and selection23. The reproduction, consisting of recombination and mutation, contributes to the creation of new search points that inherit some information from the old search points, whereas the selection guarantees that the search points on the whole move towards Pareto-optimal solutions.


As the approach is concerned with the identification of continuous search points, the recombination and mutation adopt a continuous formulation. After all the search points are paired randomly, a pair of search points, x_α^K and x_β^K, goes through the following recombination operation:

x_α^K := (1 − μ) x_α^K + μ x_β^K,
x_β^K := (1 − μ) x_β^K + μ x_α^K, (24)

where the parameter μ may be defined by the normal distribution with mean 0 and standard deviation σ:

μ = N(0, σ²), (25)

or simply by a uniform distribution:

μ = rand(−μ_max, μ_max), (26)

with often 0 < μ_max < 0.3. The 'rand' operator in the equation returns a uniformly random value within the range specified in the input. The mutation can also be achieved simply by implementing

x_i^K := rand(x_min, x_max) (27)

with a small probability P_m27. Note that the mutation may not be necessary for the parameter μ with normal distribution, since it can allow individuals to alter largely with a small probability when the coefficient μ is large.
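Eqs. 24, 26 and 27 can be sketched as one reproduction operator; the function name, list representation and defaults are illustrative assumptions.

```python
import random

def reproduce(xa, xb, x_min, x_max, mu_max=0.3, p_m=0.1):
    """Continuous recombination (Eq. 24) with a uniformly drawn blending
    coefficient mu (Eq. 26), followed by reinitialization mutation
    (Eq. 27) applied with small probability p_m."""
    mu = random.uniform(-mu_max, mu_max)
    ya = [(1 - mu) * a + mu * b for a, b in zip(xa, xb)]
    yb = [(1 - mu) * b + mu * a for a, b in zip(xa, xb)]
    children = []
    for y in (ya, yb):
        if random.random() < p_m:
            # mutation: redraw the whole point uniformly in the search range
            y = [random.uniform(lo, hi) for lo, hi in zip(x_min, x_max)]
        children.append(y)
    return children
```

With `mu_max=0` and `p_m=0` the operator degenerates to copying the parents, which makes it easy to check.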

The new search points x_i^{K+1} are finally determined through the selection operation, which selects individuals of higher fitness proportionally more often than those of lower fitness for the next iteration. As φ(x_i^K) ≥ 0 is satisfied by Eq. 23, proportional selection26, which is reported to be faster in convergence than the other popular scheme of ranking selection28, can be directly used in the proposed algorithm. In this selection, the reproduction probabilities of individuals are given by their relative fitness:

p_i^K = φ(x_i^K) / Σ_{l=1}^{λ} φ(x_l^K). (28)
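Proportional (roulette-wheel) selection with probabilities equal to relative fitness can be sketched as follows; the implementation details are illustrative.

```python
import random

def proportional_select(population, fitnesses):
    """Roulette-wheel (proportional) selection: each individual is drawn
    with probability equal to its relative fitness (Eq. 28). Returns a
    new population of the same size, sampled with replacement."""
    total = sum(fitnesses)
    chosen = []
    for _ in population:
        pick = random.uniform(0.0, total)   # a point on the wheel
        acc = 0.0
        for individual, fit in zip(population, fitnesses):
            acc += fit
            if pick <= acc:
                chosen.append(individual)
                break
    return chosen
```

An individual with zero fitness can still appear only with vanishing probability, so high-fitness points dominate the next generation, as the proportional scheme intends.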

6.4.2.2. MOGM

The strength of the proposed framework is the introduction of the fitness Φ(K), with which gradient-based search methods, which are based on a continuous function value and search much more efficiently than conventional evolutionary algorithms, can be implemented. In order to yield a well-distributed solution, the weighting factor w_j in Eq. 22 is chosen randomly in the range [0, 1]:

w_j = rand(0, 1). (29)

With this φ(x_i^K), the next state of a search point is given by

x_i^{K+1} = x_i^K + Δx_i^K, (30)

where the step of the search point is determined by

Δx_i^K = α d(x_i^K, φ(x_i^K), ∇φ(x_i^K)). (31)

In the equation, α is the search step length, iteratively searched as a sub-problem by Wolfe's algorithm29, whereas the mapping d outputs the direction of the search step. In the steepest descent (SD) method, the mapping is defined as

d_SD(x_i^K, ∇φ(x_i^K)) = ∇φ(x_i^K). (32)

In the quasi-Newton (QN) method, the mapping is defined as

d_QN(x_i^K, ∇φ(x_i^K)) = A_K⁻¹ ∇φ(x_i^K), (33)

where A_K ≈ ∇²φ(x_i^K).

The effectiveness of MCEA and MOGM is not investigated in this chapter due to limitations of chapter length; it was previously demonstrated with various numerical examples. The reader is referred to the report by Furukawa et al.30 for the details.
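A minimal numerical sketch of the steepest-direction step of Eqs. 30-32 is shown below. The forward-difference gradient and the fixed step length `alpha` (standing in for the Wolfe line search) are simplifying assumptions; since the fitness φ is to be increased, the step follows the gradient direction as in Eq. 32.

```python
def gradient_step(x, phi, alpha=0.1, h=1e-6):
    """One steepest-direction step on the scalar fitness phi: the
    gradient is approximated by forward differences and the point is
    moved by alpha along it (Eqs. 30-32)."""
    grad = []
    base = phi(x)
    for d in range(len(x)):
        xp = list(x)
        xp[d] += h
        grad.append((phi(xp) - base) / h)   # finite-difference gradient
    # d_SD = grad(phi); the fitness is to be increased, so step along it
    return [xi + alpha * g for xi, g in zip(x, grad)]
```

The quasi-Newton variant of Eq. 33 would additionally premultiply the gradient by an approximate inverse Hessian A_K⁻¹.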

6.5. Application of Parameter Identification of an Autonomous Vehicle

The proposed technique was applied to the identification of the parameter set of the autonomous vehicle developed by the authors, where the vehicle tracked a path in a flat parking area for 100 seconds. The vehicle path created from GPS readings is shown in Fig. 6.10. Note that the x and y coordinate data are collected independently with respect to time. The gyroscope, steering encoder and velocity encoder readings are shown in Figs. 6.11-6.13. The information from these sensors was sub-sampled at 4 Hz to obtain a synchronous sequence of data for parameter identification.

Although the parameters were well identified by both methods, only the results obtained with MCEA are shown in this section. Table 6.29 lists the parameters used for the simulation and for MCEA to execute the identification, and the initial guess of the search space of the parameters to be identified is listed in Table 6.30 in the form of lower and upper boundaries, i.e., x_min and x_max. The search space was chosen to include the original calibration data of each parameter at the center of the space, and its range was determined based on its reliability.

Fig. 6.10. Vehicle path created from GPS readings (x [m] - y [m]).

Fig. 6.11. Gyroscope readings (x: Time [sec] - y: Rate of change of orientation [rad/s]).

Fig. 6.12. Steering encoder readings (x: Time [sec] - y: Steering encoder counts).

Fig. 6.13. Velocity encoder readings (x: Time [sec] - y: Velocity encoder counts).

Table 6.29. Parameters for autonomous vehicle parameter identification.

Parameter                Value
No. of generations       2500
Population (λ)           10
Mutation rate P_m        0.10
Time step                0.05
No. of partitions (n_p)  20

Fig. 6.14 shows the Pareto-optimal solutions in function space after 100 generations. It is first easily found that the orientation is much smaller than the position in objective function value, but that the solutions are well distributed, showing a smooth convex-shaped curve on such a different scale. Next, the Pareto-optimal solutions in parameter space are depicted in Figs. 6.15-6.18. Although the parameter scales also differ from each other, the solutions in each graph show a characteristic distribution, from upper-left to lower-right in Figs. 6.15-6.17 and from lower-left to upper-right in Fig. 6.18.

Table 6.30. Initial search space of parameters to be identified.

Parameter  l          b      c1          c2
x_min      3.10       0.85   4.50×10⁻⁴   −0.925
x_max      3.20       1.05   4.60×10⁻⁴   −0.900

Parameter  c3          φ_0    r      θ
x_min      4.90×10⁻⁴   1.96   3.65   0.160
x_max      5.00×10⁻⁴   1.98   3.67   0.190

Fig. 6.14. Pareto-optimal solutions of identification in function space (x: f_pos - y: f_ori).

Because of the high dimensionality of the parameter space, the final solution cannot be selected manually in parameter space. The CoGM was used to select the final solution, which is listed in Table 6.31. Together with the parameters identified by the original calibration, the solutions having the minimum position and orientation errors, respectively, are also shown in the table for comparison. Since rather monotonic distributions of solutions in parameter space were obtained for this example, the solution chosen by the center-of-gravity method is well within the solution space.

Fig. 6.15. Pareto-optimal solutions of identification in parameter space (x: l [m] - y: b [m]).

Fig. 6.16. Pareto-optimal solutions of identification in parameter space (x: c3 - y: c1).

Table 6.31. Parameters identified.

Parameter       l       b        c1           c2
Chosen          3.152   0.9441   4.536×10⁻⁴   −0.9149
Min. pos. err.  3.145   0.9641   4.513×10⁻⁴   −0.9203
Min. ori. err.  3.160   0.8831   4.549×10⁻⁴   −0.9154
Original        3.15    0.95     4.55×10⁻⁴    −0.9125

Parameter       c3           φ_0     r       θ
Chosen          4.955×10⁻⁴   1.967   3.675   0.1801
Min. pos. err.  4.963×10⁻⁴   1.973   3.695   0.1807
Min. ori. err.  4.946×10⁻⁴   1.963   3.650   0.1770
Original        4.95×10⁻⁴    1.97    3.66    0.175


Fig. 6.17. Pareto-optimal solutions of identification in parameter space (x: c2 - y: φ_0).

Fig. 6.18. Pareto-optimal solutions of identification in parameter space (x: r [m] - y: θ [rad]).

The simulation result using the chosen parameter set, which was used to calculate the objective function values, is shown in Fig. 6.19 together with the GPS data, denoted as 'Experiment'. The simulated path shows some accumulated errors, but it lies well along the GPS data, clearly indicating that an appropriate parameter set has been identified. The errors may be caused by slip of the vehicle and other inaccuracies of the model rather than by its parameters themselves. Since there is no way to investigate these errors with the current vehicle set-up, we shall not discuss them further in this chapter.

To investigate the appropriateness of this solution relative to the other Pareto-optimal solutions, simulations without correcting the path at every partition were conducted with the three Pareto-optimal solutions in Table 6.31. In order to see how robust the parameter set chosen through the proposed technique is, the simulation was conducted not only for the first 100 seconds, during which GPS data were used to find the solution, but also for the next 100 seconds. The simulation results with the three solutions are depicted in Figs. 6.20-6.25. The solution with the minimum position error and the solution chosen correlate well with the GPS data in comparison to the solution with the minimum orientation error.

Fig. 6.19. Simulation results with parameters chosen (x [m] - y [m]).

The orientation accuracy must also be investigated to find the most appropriate solution, and in order to see the results in more detail, the error values computed with the three Pareto-optimal solutions and with the original parameter set are listed in Table 6.32. It is first seen that, among the three Pareto-optimal solutions, the worst solutions in position error and in orientation error over both the first and second 100 seconds are the solutions with the minimum orientation error and with the minimum position error, respectively. In particular, the orientation error of the minimum position error solution and the position error of the minimum orientation error solution, both in the second 100 seconds, are significantly large compared to the others. This is clearly caused by the fact that the other objective function was not much considered; to get the minimum position error solution, for example, the objective function describing the orientation errors is not taken into account. The parameter set from the original calibration shows the largest errors in almost all the items. This indicates the importance of parameter identification during vehicle operation.


The solution chosen is not the worst in any criterion, and it is even better than the solution with the minimum position error in the position error of the second 100 seconds. This characteristic remained even with different numerical examples. The fact that an accurate orientation of the vehicle at each iteration can contribute to its accurate positioning may have increased the accuracy of the chosen solution in position.

Fig. 6.20. Non-partitioned simulation results with parameters chosen (1st 100 seconds, (x [m] - y [m])).

Table 6.32. Position and orientation errors.

Solution        Position error          Orientation error
                1st 100 s   2nd 100 s   1st 100 s   2nd 100 s
Chosen          710.66      23,361      1.5656      316.52
Min. pos. err.  706.165     37,992      2.5941      2,673.7
Min. ori. err.  719.83      253,920     1.2630      8.5044
Original        721.45      57,351      2.7214      3,172.4

6.6. Conclusions

A technique to identify the parameter set of an autonomous vehicle during normal operation has been proposed. Formulating the parameter identification problem in a multi-objective fashion where position and orientation errors are minimized, a multi-objective optimization method is first used to find Pareto-optimal parameter sets that minimize the two error functions. A framework for multi-objective optimization and two search methods, MCEA and MOGM, which find Pareto-optimal solutions of this class of multi-objective optimization problems efficiently, have been proposed. Finally, CoGM has been proposed to select a final parameter set from the Pareto-optimal solutions.

Fig. 6.21. Non-partitioned simulation results with parameters chosen (2nd 100 seconds, (x [m] - y [m])).

Fig. 6.22. Non-partitioned simulation results with minimum position error parameters (1st 100 seconds, (x [m] - y [m])).

Fig. 6.23. Non-partitioned simulation results with minimum position error parameters (2nd 100 seconds, (x [m] - y [m])).

Fig. 6.24. Non-partitioned simulation results with minimum orientation error parameters (1st 100 seconds, (x [m] - y [m])).

Fig. 6.25. Non-partitioned simulation results with minimum orientation error parameters (2nd 100 seconds, (x [m] - y [m])).

The proposed technique was applied to the parameter identification of an autonomous vehicle developed by the authors, and a solution was chosen from the Pareto-optimal solutions derived by MCEA. The solution was compared to the original parameter set and to the other Pareto-optimal solutions, and the appropriateness of the solution in accuracy has been demonstrated. The parameter set identified by the proposed technique has further proven to increase the accuracy of simultaneous localization and map-building of an autonomous vehicle by an average of 11.3% in comparison to navigation with the original parameter set. This result indicates the overall effectiveness of the proposed technique for the parameter identification of an autonomous vehicle.

6.7. Acknowledgement

This work is supported by the ARC Centre of Excellence program, funded by the Australian Research Council (ARC) and the New South Wales State Government.

References

1. H. F. Durrant-Whyte, International Journal of Robotics Research 15(5), 407-440 (1986).
2. R. Madhavan, G. Dissanayake, H. F. Durrant-Whyte, J. M. Roberts, P. I. Corke and J. Cunningham, Mineral Resources Engineering 8(3), 313-323 (1994).
3. T. Pilarski, M. Happold, H. Pangels, M. Ollis, K. Fitzpatrick and A. Stentz, Proceedings of the 8th International Topical Meeting on Robotics and Remote Systems, April (1999).
4. S. Scheding, G. Dissanayake, E. Nebot and H. F. Durrant-Whyte, IEEE Transactions on Robotics and Automation 15(1), 85-95 (1999).
5. J. Borenstein and L. Feng, IEEE Transactions on Robotics and Automation 12(6), 869-880 (1996).
6. S. Singh and D. H. Shin, Vision and Navigation: The CMU Navlab, C. E. Thorpe, Ed., Kluwer Press, 365 (1990).
7. T. Furukawa and G. Dissanayake, Engineering Optimization 34(4), 22-48 (2002).
8. C. A. C. Coello, International Journal of Knowledge and Information Systems 1(3), 269-308 (1999).
9. D. A. Van Veldhuizen and G. B. Lamont, Evolutionary Computation 8(2), 125-147 (2000).
10. R. Kumar and P. Rockett, Evolutionary Computation 10(3), 283-314 (2002).
11. A. Toffolo and E. Benini, Evolutionary Computation 11(2), 151-167 (2003).
12. L. Costa and P. Oliveira, Evolutionary Computation 11(4), 417-438 (2003).
13. Y. Bard, Nonlinear Parameter Estimation (Academic Press, New York, 1976).
14. L. C. W. Dixon, Nonlinear Optimisation (The English Universities Press, London, 1972).
15. W. H. Press, B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, Numerical Recipes in C (Cambridge University Press, Cambridge, 1988).
16. G. L. Nemhauser, A. H. G. Rinnooy Kan and M. J. Todd, Handbooks in Operations Research and Management Science Vol. 1: Optimization (Elsevier Science Publishers B.V., Amsterdam, 1989).
17. F. Hoffmeister and T. Baeck, Genetic Algorithms and Evolution Strategies: Similarities and Differences (Technical Report, University of Dortmund, Germany, Sys-1/92, 1992).
18. T. Baeck and H.-P. Schwefel, International Journal of Evolutionary Computation 1(1), 1-23 (1993).
19. T. Furukawa and G. Dissanayake, Proceedings of the 71st JSME Annual Meeting 930(71), 509-510 (1993).
20. T. Furukawa and G. Yagawa, International Journal for Numerical Methods in Engineering 40, 1071-1090 (1997).
21. C. M. Fonseca and P. J. Fleming, Proceedings of the Fifth International Conference on Genetic Algorithms (S. Forrest, Ed., Morgan Kaufmann, San Mateo, CA, 416-423, 1993).
22. C. M. Fonseca and P. J. Fleming, International Journal of Evolutionary Computation 3(1), 1-16 (1993).
23. T. Furukawa, International Journal for Numerical Methods in Engineering 52, 219-238 (2001).
24. J. H. Holland, Adaptation in Natural and Artificial Systems (The University of Michigan Press, Michigan, 1975).
25. M. J. Jeong, S. Yoshimura, T. Furukawa, G. Yagawa and Y. J. Kim, Proceedings of Computational Engineering Conference 5, 231-234 (2000).
26. D. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley, Reading, MA, 1989).
27. T. Kowaltczyk, T. Furukawa, S. Yoshimura and G. Yagawa, Inverse Problems in Engineering Mechanics, Eds. M. Tanaka and G. S. Dulikravich (Elsevier Science Publishers B.V., Amsterdam, 1998), 541-550.
28. J. E. Baker, Proceedings of the First International Conference on Genetic Algorithms and Their Applications, J. J. Grefenstette, Ed., 101-111 (1985).
29. P. Wolfe, Econometrica 27, 382-398 (1959).
30. T. Furukawa, S. Yoshimura and H. Kawai, Proceedings of the Fifth World Congress on Computational Mechanics (Eds. H. A. Mang, F. G. Rammerstofer and J. Eberhardsteiner, Vienna University of Technology, 2002), 1-11.


CHAPTER 7

AUTOMATING CONTROL SYSTEM DESIGN VIA A MULTIOBJECTIVE EVOLUTIONARY ALGORITHM

K.C. Tan* and Y. Li**

* Department of Electrical and Computer Engineering
National University of Singapore

4 Engineering Drive 3, Singapore 117576Republic of Singapore

E-mail: eletankc@nus.edu.sg
** Center for Systems and Control &

Dept. of Electronics and Electrical Engineering
University of Glasgow

Glasgow G12 8LT, UK

This chapter presents a performance-prioritized computer aided control system design (CACSD) methodology using a multi-objective evolutionary algorithm. The evolutionary CACSD approach unifies different control laws in both the time and frequency domains based upon performance satisfactions, without the need of aggregating different design criteria into a compromise function. It is shown that control engineers' expertise, as well as settings of goal or priority for different preferences on each performance requirement, can be easily included and modified on-line according to the evolving trade-offs, which makes the controller design interactive, transparent and simple for real-time implementation. Advantages of the evolutionary CACSD methodology are illustrated upon a non-minimum phase plant control system, offering a set of low-order Pareto optimal controllers satisfying all the conflicting performance requirements in the face of system constraints.

7.1. Introduction

With rapid developments in linear time-invariant (LTI) control theories and algorithms in the past few decades, many control schemes, ranging from the most straightforward proportional plus integral plus derivative (PID), phase lead/lag and pole-placement schemes to more sophisticated optimal, adaptive and robust control algorithms, have become available to control engineers. Each of these control schemes, however, employs a different control characteristic or design technique that is often restricted ad hoc to one particular problem or addresses only a limited subset of performance issues. To design an optimal controller using these methods, control engineers need to select an appropriate control law that best suits the application at hand, and to determine a practical control structure with a set of optimal controller parameters that best satisfies the usually conflicting performance specifications in both the time and frequency domains.

An effective design approach is to cast the linear controller synthesis as meeting all types of performance requirements and constraints via numerical optimization, instead of via a specific control scheme or in a narrow problem domain. This approach of simultaneously addressing design specifications in both the time and frequency domains is, however, semi-infinite and generally not everywhere differentiable1-6. Therefore conventional numerical approaches, which often rely on a smooth and differentiable performance index, can only address a small subset of the problem or must limit the type of the design specifications to allow convex optimization7,8, which forms the major obstacle to the development of a generalized numerical optimization package for practical control applications.

In this chapter, a uniform CACSD methodology is presented to accommodate LTI control laws based on performance requirements and practical design constraints in both the time and frequency domains, without the need of linear parameterization or of confining the design to a particular domain for convex optimization. Unlike existing mutually independent and individual LTI control schemes, control engineers can easily address practical performance requirements such as rise time or overshoots in the time domain, and formulate robustness specifications such as disturbance rejection or plant uncertainty according to the well developed robustness theorems in the frequency domain, as desired.

Developing such an optimal unified linear time-invariant control (ULTIC) system, however, requires a powerful and global multi-objective optimization technique to determine the multiple controller parameters simultaneously, in order to satisfy a set of usually conflicting design specifications in a multi-modal multi-objective design space. Complexity, nonlinearity and constraints in practical systems, such as voltage/current limits, saturation, transportation delays, noise or disturbance, cause the design problem space to be discontinuous and difficult to solve using conventional analytical or CACSD software packages. Current numerical methods employed in existing CACSD tools are based upon a-priori gradient-guided approaches, which are often applicable only to a subset of design problems or useful only for control system analysis and simulations2,3. These tools are computationally intractable because in the worst case their computation time grows exponentially with the number of design parameters. They are incapable of delivering a global, high-dimensional and automated multi-objective design solution for an optimal ULTIC system. Since practical design specifications and constraints are often mixed or competing with each other, using such a CACSD package for optimal ULTIC designs often requires control engineers to go through numerous heuristic simulations and analyses before a 'satisfactory' design emerges.

The simulation and analytical power of modern CACSD can, however, be utilized to achieve design automation of ULTIC systems if it is interfaced and coupled with powerful evolution-based intelligent search tools. Sedgewick9 pointed out that one way to extend the power of a digital computer is to endow it with the power of intelligent non-determinism: to assert that when an algorithm is faced with a choice of search options, it has the power to intelligently 'guess' the right one. Artificially emulating the Darwinian principle of 'survival of the fittest' in natural selection and genetics10, the evolutionary algorithm is such a non-deterministic computing technique, with the ability to replace the human 'trial-and-error' based iterative process with intelligent computer-automated designs. Using such an evolutionary design optimization approach, control engineers' expertise can also be easily incorporated into the initial design 'database' for intelligent design reuse to achieve faster convergence11. More importantly, such an evolutionary CACSD approach allows any mixed or sophisticated conflicting specifications and constraints in practical applications to be unified and addressed easily under one design banner: performance satisfaction.

158 K.C. Tan and Y. Li

This chapter presents an MOEA application to CACSD design automation in ULTIC systems, unifying all LTI approaches under performance satisfaction in both the time and frequency domains. Unlike existing multi-objective optimization methods that linearly combine multiple attributes to form a composite scalar objective function, the MOEA incorporates the concept of Pareto domination to evolve a family of non-dominated solutions along the Pareto optimal frontier. Further, each of the objective components can have different priorities or preferences to guide the optimization from individual design specifications, rather than manually pre-weighting the objective functions. Besides the flexibility of specifying a low-order controller structure to simplify the design and implementation tasks, the design approach also allows control engineers to examine and interplay different trade-offs among the multiple performance requirements. Such an evolutionary 'intelligent' CACSD methodology for optimal ULTIC designs has been successfully applied to many control engineering applications2-4.
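The non-dominated filtering at the heart of Pareto-based selection can be sketched in a few lines (a minimal illustration, not the MOEA toolbox's actual implementation; the function names and example cost vectors are hypothetical, and all objectives are assumed to be minimized):

```python
import numpy as np

def dominates(a, b):
    """True if cost vector a Pareto-dominates b (all objectives minimized)."""
    a, b = np.asarray(a), np.asarray(b)
    return bool(np.all(a <= b) and np.any(a < b))

def non_dominated(costs):
    """Indices of the cost vectors on the Pareto optimal frontier."""
    costs = np.asarray(costs, dtype=float)
    return [i for i, ci in enumerate(costs)
            if not any(dominates(cj, ci) for j, cj in enumerate(costs) if j != i)]

# Three candidate controllers scored on (rise time, overshoot):
front = non_dominated([[4.0, 0.05], [2.0, 0.20], [5.0, 0.30]])
print(front)  # [0, 1] -- the third design is dominated by the first
```

Designs 0 and 1 each trade rise time against overshoot, so neither dominates the other; both survive as non-dominated solutions.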

The overall architecture of the evolutionary CACSD methodology for optimal ULTIC systems is presented in Section 7.2, which includes the ULTIC system formulation and the formation of various design specifications commonly adopted in practical applications. Validation of the methodology against a practical ULTIC design problem for a single-input single-output (SISO) non-minimum phase plant is given in Section 7.3. Conclusions are drawn in Section 7.4.

7.2. Performance Based Design Unification and Automation

Almost all types of LTI controllers take the form of a transfer function matrix, or its equivalent state-space representation, when the design is eventually complete. The order and coefficients of the transfer function, however, vary with the control law or with a compromise design objective chosen to satisfy certain design specifications. For example, a controller designed by the linear quadratic regulator (LQR) scheme tends to offer a minimized quadratic error with some minimal control effort, while an H-infinity controller provides robust performance with a minimal value of the mixed sensitivity function. Although the coefficients or orders of these two types of controllers may differ, the common purpose of both control laws is to devise an LTI controller that guarantees a closed-loop performance meeting certain customer specifications in either the time or the frequency domain.

Therefore, a step towards the unification of LTI control laws is to cast the controller design as meeting practical performance specifications via a CACSD optimization approach, instead of via a particular control scheme or within a confined problem domain. This unified CACSD approach should eliminate the need to pre-select a specific control scheme for a given application, so as to form a performance-prioritized unified design that is easily understood by and applicable to practical control engineers. Further, it should be capable of incorporating performance specifications in both the time and frequency domains with which engineers are familiar, and of taking into account various system constraints12-14.


7.2.1. The Overall Design Architecture

The overall evolutionary CACSD paradigm for ULTIC systems is illustrated in Fig. 7.1. As highlighted in the Introduction, design unification of an LTI control system can be formulated as an interactive multi-objective optimization problem that searches for a set of Pareto optimal controllers satisfying the often-conflicting practical performance requirements. Such a design optimization cycle accommodates three different modules: the interactive human decision-making module (control engineer), the optimization module (MOEA toolbox15) and the control module (system and specifications). According to the system performance requirements, as well as any a-priori knowledge of the problem at hand, control engineers may specify or select a set of desired specifications from a template15 to form a multiple-cost function in the control module, which need not be convex or confined to a particular control scheme. These ULTIC design specifications can also incorporate different performances in both the time and frequency domains, or other system characteristics such as poles and zeros, as desired. Based on these performance specifications, the responses of the control system, which consists of the set of input/output signals, the plant model and the candidate controller recommended by the optimization module, are evaluated to determine the cost value of each design specification in the multiple-cost function.

According to the evaluation result of the cost function in the control module, and any design guidance such as goal and priority information from the decision-making module, the optimization module (MOEA toolbox15) automates the ULTIC design process and intelligently searches for the 'optimal' controller parameters that best satisfy the set of performance specifications. On-line optimization progress and simulation results, such as design trade-offs or the convergence trace, can be displayed graphically and fed back to the decision-making module. In this way, the overall ULTIC design environment can be supervised and monitored effectively, which helps control engineers take further actions such as examining the competing design trade-offs, altering the design specifications, adjusting goal settings that are too stringent or too generous, or even modifying the control and system structure if necessary. This man-machine interactive design and optimization process may proceed until all design specifications have been met or the control engineer is satisfied with the control performance. One merit of such an approach is that the design problem, as well as interaction with the optimization process, is closely linked to the environment of that particular application. A control engineer, in most cases, is not required to deal with any details of the optimization algorithm or to worry about any possible ill-conditioning problem in the designs1.

Fig. 7.1. A general CACSD architecture for evolutionary ULTIC systems

7.2.2. Control System Formulation

A general control system configuration for posing performance specifications is shown in the control module of Fig. 7.1. The operator G is a 2x2 block transfer matrix mapping the inputs w and u to the outputs z and y:

\begin{bmatrix} z \\ y \end{bmatrix} = \begin{bmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{bmatrix} \begin{bmatrix} w \\ u \end{bmatrix} \qquad (1)

The actual process or plant is represented by the sub-matrix G22, i.e. the nominal model G0, which is linear time-invariant and may be unspecified except for the constraint of lying within a given set Π ('uncertainty modeling'). H is the ULTIC controller to be designed in order to satisfy all specifications and constraints in the system, as given by

H_{i,j} = \frac{p_{i,j,n}\, s^{\,n-m-1} + \cdots + p_{i,j,m+2}\, s + p_{i,j,m+1}}{p_{i,j,m}\, s^{m} + \cdots + p_{i,j,1}\, s + p_{i,j,0}} \qquad (2)


where i, j denote the respective elements in the transfer matrix and p_{i,j,k} ∈ R+, for all k ∈ {0, 1, ..., n}, are the coefficients to be determined in the design; y is the signal that the controller has access to, and u is the output of the controller, usually subject to a hard saturation constraint such as a limited drive voltage or current. The mapping from the exogenous inputs w (disturbances, noise, reference commands, etc.) to the regulated outputs z (tracking errors, control inputs, measured outputs, etc.) contains all the input-output maps of interest12. As illustrated in Fig. 7.1, the evolutionary CACSD for ULTIC systems is to find an optimal controller H that minimizes a set of performance requirements, in terms of magnitudes or norms of the maps from w to z in both the time and frequency domains, subject to certain constraints on the behavior of the system.
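As an illustration of how a candidate controller element of eqn (2) can be evaluated from its coefficient vector during cost evaluation, the following sketch computes the frequency response on the jω axis (the helper name and example coefficients are hypothetical, not part of the toolbox):

```python
import numpy as np

def controller_freq_response(num_coeffs, den_coeffs, w):
    """Evaluate one controller element H_{i,j}(jw) of eqn (2) from its
    numerator/denominator coefficients p_{i,j,k}, highest power of s first."""
    s = 1j * np.asarray(w, dtype=float)
    return np.polyval(num_coeffs, s) / np.polyval(den_coeffs, s)

# A hypothetical first-order lead element H(s) = (2s + 1) / (0.1s + 1):
H = controller_freq_response([2.0, 1.0], [0.1, 1.0], w=[0.0, 10.0])
print(abs(H[0]))  # 1.0 -- the DC gain
```

In an evolutionary design loop, the genome would simply be the concatenated coefficient vectors, and each frequency-domain cost would be computed from responses obtained this way.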

7.2.3. Performance Specifications

In developing ULTIC systems, a set of objectives or specifications is often formed to reflect the various performance requirements needed in designing a practical control system. Existing CACSD approaches require the performance index for these design objectives to lie within a convex set or to be restricted to a confined problem domain, which may be impractical. In contrast, no hard limitation or objective transformation is needed in evolutionary ULTIC system design. This advantage allows many system constraints or conflicting specifications in both the time and frequency domains to be easily incorporated in the design, which is unmatched by conventional CACSD methods. To guide the a-posteriori non-deterministic evolution towards the global optimum, the evolutionary approach merely requires a performance index that indicates the relative strength of each candidate design, which is naturally available or can be easily formulated for most practical control applications. In order to address the various design specifications commonly accommodated in practical control applications, the design objectives formulated for ULTIC systems should at least reflect the following performance requirements:

7.2.3.1. Stability

Stability is often the first concern in any control system design, and can be determined by solving for the roots of the characteristic polynomial. The cost of stability can then be defined as the total number of unstable closed-loop poles, i.e. the poles on the right-hand side of the s-plane, given by Nr{Re(eig) > 0}; no right-hand poles in the s-plane indicates that the system is stable, and vice versa.
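This stability cost can be computed directly from the closed-loop characteristic polynomial; a minimal sketch (the function name is hypothetical, and poles exactly on the imaginary axis are conservatively counted as unstable here):

```python
import numpy as np

def stability_cost(char_poly):
    """Number of closed-loop poles with Re >= 0 (zero means the loop is stable).
    char_poly holds the characteristic polynomial coefficients, highest power first."""
    poles = np.roots(char_poly)
    return int(np.sum(poles.real >= 0.0))

print(stability_cost([1.0, 3.0, 2.0]))   # 0: poles at -1 and -2, stable
print(stability_cost([1.0, 0.0, -1.0]))  # 1: pole at +1, unstable
```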

7.2.3.2. Step Response Specifications

Practical control engineers often address system transient and steady-state performance in terms of time domain specifications. These time domain performances are specified on the step response, since it gives a good indication of the response of the controlled variable to command inputs that are constant for long periods and occasionally change quickly to a new value. For a SISO system, the performance requirement of steady-state accuracy can be defined as e_ss \le |1 - y(t)|_{t \to \infty}, i.e. the difference between the command and the actual response of the controlled variable after the system has settled.
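The step response metrics discussed in this section can be extracted from a sampled response along the following lines (a sketch with hypothetical names; rise time is taken here as the 10-90% time and settling uses a 5% band):

```python
import numpy as np

def step_metrics(t, y, settle_band=0.05):
    """Rise time (10-90%), overshoot, 5%-band settling time and steady-state
    error, extracted from a sampled unit-step response y(t)."""
    y_final = y[-1]
    ess = abs(1.0 - y_final)                        # steady-state error vs unit command
    overshoot = max(0.0, float(y.max()) - y_final)
    t_rise = t[np.argmax(y >= 0.9 * y_final)] - t[np.argmax(y >= 0.1 * y_final)]
    outside = np.nonzero(np.abs(y - y_final) > settle_band * abs(y_final))[0]
    t_settle = t[min(outside[-1] + 1, len(t) - 1)] if outside.size else t[0]
    return t_rise, overshoot, t_settle, ess

# First-order lag response as a check case: no overshoot, ~2.2 s rise time.
t = np.linspace(0.0, 10.0, 1001)
y = 1.0 - np.exp(-t)
tr, ov, ts, ess = step_metrics(t, y)
```

Each of these scalars would feed one entry of the multiple-cost function described in Section 7.2.1.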

7.2.3.3. Disturbance Rejection

The disturbance rejection problem is defined as follows: find a feedback controller that minimizes the maximum amplitude (H∞ norm) of the regulated output over all possible disturbances of bounded magnitude. A general structure representing disturbance rejection for a broad class of control problems is given in Fig. 7.2, which depicts the particular case where the disturbance enters the system at the plant output. The mathematical representation is given by,

y = z = G_0 u + W_1 d \;\Rightarrow\; y = W_1 (I + G_0 H)^{-1} d = W_1 S\, d \qquad (3)

The matrix S is known as the sensitivity function, and the maximum singular value of S determines the disturbance attenuation, since S is in fact the closed-loop transfer function from the disturbance d to the measured output y. W1 is the desired disturbance attenuation factor, which is often a function of frequency to allow a different attenuation factor at each frequency. The disturbance attenuation specification may thus be given as

\bar{\sigma}(S) < \|W_1^{-1}\|_{\infty} \;\Rightarrow\; \|W_1 S\|_{\infty} < 1 \qquad (4)

where \bar{\sigma} denotes the largest singular value of a matrix.
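A numerical check of the disturbance attenuation specification of eqn (4) for a SISO loop can be sketched by gridding the frequency axis (the plant, controller and weight below are hypothetical stand-ins, not the chapter's design example):

```python
import numpy as np

def disturbance_cost(G0, H, W1, w_grid):
    """Max of |W1(jw) S(jw)| on the grid; a value below 1 meets eqn (4)
    (SISO case, with S = 1 / (1 + G0 H))."""
    s = 1j * w_grid
    S = 1.0 / (1.0 + G0(s) * H(s))
    return float(np.max(np.abs(W1(s) * S)))

# Hypothetical loop: integrator plant, proportional controller of gain 10,
# and a low-frequency attenuation weight W1(s) = 1 / (s + 1).
w = np.logspace(-3, 3, 500)
cost = disturbance_cost(lambda s: 1.0 / s, lambda s: 10.0, lambda s: 1.0 / (s + 1.0), w)
print(cost < 1.0)  # True: the attenuation specification is met
```

A grid-based maximum only approximates the H-infinity norm, but it is the kind of inexpensive, non-convex cost an evolutionary search can minimize directly.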

7.2.3.4. Robust Stability

Fig. 7.2. A disturbance rejection problem

It is important that the designed closed-loop system is stable and provides guaranteed bounds on performance deterioration, even for 'large' plant variations that may occur in practical applications. Roughly speaking, a robust stability specification requires certain design specifications to hold even if the plant G0 is replaced by any Gpert from the specified set Π of possible perturbed plants.

Small Gain Theorem: Suppose the nominal plant G0 in Fig. 7.3 is stable with the multiplicative uncertainty Δ being zero. Then the size of the smallest stable Δ for which the system becomes unstable is16

\bar{\sigma}(\Delta) = \frac{1}{\bar{\sigma}(T)} = \bar{\sigma}\!\left( \frac{I + G_0 H}{G_0 H} \right) \qquad (5)

Therefore the singular value Bode plot of the complementary sensitivity function T can be used to measure the stability margins of the feedback system in the face of multiplicative plant uncertainties. The multiplicative stability margin is, by definition, the 'size' of the smallest stable Δ that destabilizes the system in Fig. 7.3. According to the small gain theorem, the smaller \bar{\sigma}(T) is, the greater the size of the smallest destabilizing multiplicative perturbation will be and, hence, the greater the stability margins of the system. The stability margin of a closed-loop system can thus be specified via singular value inequalities such as

\bar{\sigma}(T) < \|W_2^{-1}\|_{\infty} \;\Rightarrow\; \|W_2 T\|_{\infty} < 1 \qquad (6)

where \|W_2^{-1}\| represents the size of the largest anticipated multiplicative plant uncertainty.
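For a SISO loop, the multiplicative stability margin implied by eqns (5)-(6) can be estimated on a frequency grid (a sketch with a hypothetical plant and controller; a grid maximum only approximates the true peak of |T|):

```python
import numpy as np

def multiplicative_stability_margin(G0, H, w_grid):
    """Estimate 1 / ||T||_inf on a frequency grid: per the small gain theorem,
    this is the size of the smallest destabilizing multiplicative perturbation."""
    s = 1j * w_grid
    L = G0(s) * H(s)                 # loop gain
    T = L / (1.0 + L)                # complementary sensitivity (SISO)
    return 1.0 / np.max(np.abs(T))

# Hypothetical loop: first-order plant with a proportional controller of gain 2.
w = np.logspace(-3, 3, 500)
margin = multiplicative_stability_margin(lambda s: 1.0 / (s + 1.0), lambda s: 2.0, w)
print(round(margin, 2))  # 1.5
```

Here |T| peaks at 2/3 at low frequency, so the loop tolerates multiplicative perturbations of size up to about 1.5 before the small gain condition is violated.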

7.2.3.5. Actuator Saturation

Fig. 7.3. Stability robustness problem with multiplicative perturbation

In a practical control system, the size of actuator signals should be limited, since a large actuator signal may be associated with excessive power consumption or resource usage, apart from acting as a disturbance to other parts of the system if not subject to hardware limitation. A general structure for saturation nonlinearities at the input of the plant is shown in Fig. 7.4. To pose this problem, a saturation function is defined,

\mathrm{Sat}(u) = \begin{cases} u, & |u| \le U_{\max} \\ U_{\max}\,\mathrm{sgn}(u), & |u| > U_{\max} \end{cases} \qquad (7)

Let the plant be described as \hat{G}_0 u = G_0\,\mathrm{Sat}(u); the objective is to design an optimal ULTIC controller H that satisfies all the design specifications with an allowable control effort of max(u) ≤ U_max, so as to stay in the linear region of operation. Note that performances of the closed-loop system, such as tracking accuracy and disturbance attenuation, are bounded by the actuator saturation specification, i.e. a smaller control effort often results in poorer tracking and disturbance rejection performance, due to the limited control gain needed to operate the system in the linear region. In addition, stability for such a system means local stability of the nonlinear system.
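The saturation function of eqn (7) amounts to clipping the control signal; a one-line sketch:

```python
import numpy as np

def sat(u, u_max):
    """Saturation of eqn (7): pass u through unchanged inside the limit,
    clip it to +/- u_max outside."""
    return np.clip(u, -u_max, u_max)

print(sat(np.array([-2.0, 0.3, 1.7]), u_max=0.5))  # [-0.5  0.3  0.5]
```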

Fig. 7.4. Saturation nonlinearities at the plant input

7.2.3.6. Minimal Controller Order

It is often desired that the controller to be designed for a practical control system be as simple as possible, since a simple controller requires less computation and implementation effort than a higher-order controller17. It is thus useful to include the order of the ULTIC controller as one of the design specifications, in order to find the smallest-order controller that satisfies all the performance requirements and system constraints.

The performance and robustness specifications formulated above cover the usual design requirements in many practical control applications. Note that other design specifications, such as phase/gain margin, time delay, noise rejection, etc., could easily be added to the ULTIC system in a similar way, if desired. As addressed in the Introduction, designing an optimal ULTIC system requires simultaneously optimizing multiple controller coefficients to satisfy the set of conflicting design specifications. This leads to a multi-dimensional and multi-modal design problem characterized by multi-objective performance indices, which can be tackled via a multiobjective evolutionary algorithm.

7.3. An Evolutionary ULTIC Design Application

In this section, the control system design of a non-minimum phase SISO plant using an MOEA toolbox15 is presented to illustrate the effectiveness of the evolutionary ULTIC design methodology. Consider the following non-minimum phase plant, as studied in18:

G_0(s) = \frac{-1.3\,(s - 5.5307)(s + 4.9083)}{(s + 0.3565 - 5.27j)(s + 0.3565 + 5.27j)(s + 0.0007)} \qquad (8)
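This plant can be assembled numerically from the gain, zeros and poles read off eqn (8), and its non-minimum phase character verified (a sketch; the factored values are transcribed from the text):

```python
import numpy as np

# Nominal plant of eqn (8), assembled from its gain, zeros and poles:
gain = -1.3
zeros = [5.5307, -4.9083]
poles = [-0.3565 + 5.27j, -0.3565 - 5.27j, -0.0007]

num = gain * np.poly(zeros)        # numerator coefficients
den = np.poly(poles).real          # conjugate pole pair -> real coefficients

# The right-half-plane zero at s = 5.5307 makes the plant non-minimum phase,
# while every pole sits (just) in the left half plane:
print(max(np.roots(num).real) > 0)  # True
print(max(np.roots(den).real) < 0)  # True
```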

This nominal model has a 'non-minimum phase' zero at z = 5.5307 and a nearly unstable pole at p = -0.0007, which makes it an interesting robust control design problem. Here, the aim is to design a ULTIC controller that meets a set of time and frequency domain performance requirements while satisfying certain system constraints such as actuator saturation. Fig. 7.5 shows the overall design block diagram of the ULTIC system, which includes eight design objectives and one hard actuator constraint to be satisfied, as listed in Table 7.33. The underlying aim of setting the priority vector in the second-last column of Table 7.33 is to obtain a controller that first stabilizes the system within the actuator saturation limit for hardware implementation. Note that actuator saturation is set as a hard constraint, reflecting the hard limit of this performance requirement, which requires no further minimization once the control action u is within the saturation limit. Further, the system must be robust to plant uncertainty and attenuate disturbances within the levels of tolerance defined by the weighting functions W1 and W2 in Fig. 7.6 (see ref. 18). Having fulfilled these requirements, the system should also satisfy some time domain specifications as defined by the transient and steady-state responses. Although determination of the objective and priority settings may be a subjective matter that depends on the performance requirements, ranking the priorities may be unnecessary and can be ignored for a 'minimum-commitment' design19. If, however, an engineer commits himself to prioritizing the objectives, it is a much easier task than weighting the different objectives, which is compulsory in objective function aggregation approaches6.

Fig. 7.5. Block diagram of the ULTIC system design

Fig. 7.6. Frequency responses of W1 and W2

The order of the candidate controllers is not fixed, although its maximum is limited to third order. Parameter settings of the MOEA toolbox15 are shown in Fig. 7.7. The design took less than 2 hours on a Pentium II 350 MHz processor, with a population and generation size of 100. At the end of the evolution, all ULTIC controllers recommended by the toolbox had met the nine design specifications listed in Table 7.33. Among these controllers, 88 are of second order and 12 are of third order.

Table 7.33. Time and frequency domain design specifications for the non-minimal phase plant.

    Domain      Design specification      Objective              Goal     Priority  Constraint
    ---------   -----------------------   --------------------   ------   --------  ----------
    Frequency   1. Stability              Nr{Re(eig) > 0}        0        1         soft
    domain                                (closed-loop poles)
                2. Disturbance rejection  S                      1        3         soft
                3. Plant uncertainty      T                      1        3         soft
                4. Controller order       Co                     3rd      5         soft
                5. Actuator saturation    Act                    0.5 V    2         hard
    Time        6. Rise time              Tr                     4 s      4         soft
    domain      7. Overshoot              Mp                     0.05     4         soft
                8. 5% settling time       Ts                     ? s      4         soft
                9. Steady-state error     ess                    0.01     4         soft

The system closed-loop responses for these ULTIC controllers are shown in Fig. 7.8, where all the responses fall within the clear area, showing good performance against the time domain specifications. Fig. 7.9 shows the frequency responses of both W1S and W2T for all the Pareto optimal controllers, in which the gains of the responses are, satisfactorily, less than the required magnitude of 0 dB.

To illustrate the robustness of the evolutionarily designed ULTIC system to disturbances, a sinusoidal disturbance signal was applied to the system, with an amplitude of 1 volt and an angular frequency of 0.05 rad/s. The sinusoid and its attenuated signal for all Pareto optimal ULTIC controllers are shown by the dashed and solid lines in Fig. 7.10, respectively. Clearly, the disturbance has been attenuated successfully as required by the 2nd objective in Table 7.33, resulting in a 10-times gain reduction of the original sinusoidal signal.

Fig. 7.7. Quick setup of the MOEA toolbox for the ULTIC problem

Fig. 7.8. The MOEA optimized output responses for the SISO system

Fig. 7.9. Frequency responses of the non-minimal phase system

Fig. 7.11 shows the output responses for one randomly chosen Pareto optimal controller with a perturbed nominal model of eqn. (8), to study the system robustness in terms of plant uncertainties. The plant is perturbed simultaneously on both the zeros and poles of the nominal model, in the range of

z \le \tilde{z} \le z_1, \qquad p \le \tilde{p} \le p_1 \qquad (9)


Fig. 7.10. The sinusoidal disturbance and its attenuated signal

where z1 = 2z and p1 = 1.1p; z and p are the zeros and poles of the nominal plant, respectively. It was observed that the system is much more sensitive to perturbations of the poles than of the zeros, owing to the 'almost unstable' pole located very near the imaginary axis, i.e. p = -0.0007. As shown in Fig. 7.11, the ULTIC system is able to maintain relatively good response and stability performance despite the various perturbations made to the nominal plant.

Fig. 7.11. Output responses of the ULTIC system with plant uncertainties

Apart from the flexibility in analyzing the control performance, the evolutionary design also allows on-line examination of different trade-offs among the multiple conflicting specifications, modification of existing objectives and constraints, or zooming into any region of interest before selecting one final controller for real-time implementation. The trade-off graph of the resulting 100 ULTIC controllers is shown in Fig. 7.12, where each line represents a solution found by the evolutionary optimization. The x-axis shows the design specifications, the y-axis shows the normalized cost for each objective, and the cross-marks show the desired goal setting for each specification. Clearly, a trade-off between adjacent specifications results in the crossing of the lines between them, whereas concurrent lines that do not cross indicate specifications that do not compete with one another. For example, the specifications of tracking error (ess) and controller order (Co) do not directly compete with each other, whereas the sensitivity function (S) and complementary sensitivity function (T) appear to compete heavily, as expected.
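A trade-off graph of this kind is essentially a parallel-coordinates plot of normalized objective costs; the normalization step behind it can be sketched as follows (function name and example cost values are hypothetical):

```python
import numpy as np

def normalize_tradeoffs(costs):
    """Scale each objective column to [0, 1] so every solution can be drawn
    as one line across the specification axes, as in a trade-off graph."""
    costs = np.asarray(costs, dtype=float)
    lo, hi = costs.min(axis=0), costs.max(axis=0)
    return (costs - lo) / np.where(hi > lo, hi - lo, 1.0)

# Hypothetical costs of three controllers on (S, T, rise time):
norm = normalize_tradeoffs([[0.8, 1.2, 4.0],
                            [1.0, 0.9, 3.0],
                            [0.9, 1.0, 5.0]])
# Lines that cross between adjacent axes indicate competing specifications.
```

Each row of `norm` becomes one polyline across the specification axes; plotting all rows reproduces the crossing-lines reading described above.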

Fig. 7.12. Trade-off graph of the final evolutionary designed ULTIC system

Information contained in the trade-off graph of Fig. 7.12 also suggests that a lower goal setting for rise time and settling time is possible, and these objectives could be further optimized to arrive at even better transient performance if desired. A powerful feature of designing ULTIC systems using the MOEA is that all the goal and priority settings can be conveniently examined and modified at any time during the evolution process. For example, the designer may change his preference and decide to set a goal of 2nd order, instead of 3rd order, for the controller order specification after a certain number of generations. Fig. 7.13 illustrates the behavior of the evolution upon on-line modification of this goal setting after the design in Fig. 7.12. Due to the sudden change to a tighter goal setting, none of the individuals manages to meet all the required specifications, as shown in Fig. 7.13(a). After continuing the evolution for 5 generations, the trade-offs move towards satisfying the controller order specification at the expense of the performance of other objectives, as shown in Fig. 7.13(b). In Fig. 7.13(c), the evolution continues and again leads to the satisfaction of all the goal settings, including the controller order specification, albeit with less room for further improvement of the other design objectives and fewer Pareto optimal solutions compared to the design of Fig. 7.12. Clearly, this man-machine interactive design approach enables control engineers to divert the evolution into any trade-off region of interest, as well as to modify certain specifications or preferences on-line, without the need to restart the entire design cycle as required by conventional methods.

7.4. Conclusions

This chapter has presented an automated CACSD design methodology for unified LTI control systems using an MOEA, which is capable of unifying different LTI design schemes under performance satisfaction and eliminating the need to pre-select a specific control law. Unlike conventional methods, control engineers' expertise, as well as goal or priority settings expressing different preferences for each design specification, can be easily incorporated and modified on-line according to the evolving trade-offs, without the need to repeat the whole design process. In principle, any number or combination of constraints and performance specifications can be included in the evolutionary ULTIC design, if desired. Validation results on a non-minimum phase control system illustrate the efficiency and effectiveness of the methodology.


(a) Reducing the goal setting of the controller order from 3rd to 2nd order

(b) After 5 generations

(c) After another 5 generations

Fig. 7.13. Effects of the evolution upon the on-line modification of goal setting


References

1. W.T. Nye and A.L. Tits, An application-oriented, optimization-based methodology for interactive design of engineering systems, Int. J. Contr., vol. 43, pp. 1693-1721 (1986).
2. Y. Li, K.C. Tan, K.C. Ng and D.J. Murray-Smith, Performance based linear control system design by genetic algorithm with simulated annealing, Proc. 34th Conf. on Decision and Contr., New Orleans, pp. 731-736 (1995).
3. Y. Li, K.C. Tan and C. Marionneau, Direct design of linear control systems from plant I/O data using parallel evolutionary algorithms, Int. Conf. on Control'96, Special Session on Evolutionary Algorithms for Contr. Eng., University of Exeter, UK, pp. 680-686 (1996).
4. K.C. Tan and Y. Li, Multi-objective genetic algorithm based time and frequency domain design unification of control systems, IFAC Int. Sym. on Artificial Intelligence in Real-Time Contr., Kuala Lumpur, Malaysia, pp. 61-66 (1997).
5. P.J. Fleming and A.P. Pashkevich, Application of multi-objective optimization to compensator design for SISO control systems, Electronics Letters, vol. 22, no. 5, pp. 258-259 (1986).
6. W.Y. Ng, Interactive Multi-objective Programming as a Framework for Computer-aided Control System Design, Lecture Notes in Control and Information Sciences (Springer-Verlag, 1989).
7. R.G. Becker, A.J. Heunis and D.Q. Mayne, Computer-aided design of control systems via optimization, IEE Proc. Pt. D, vol. 126, no. 6, pp. 573-578 (1979).
8. E. Polak, D.Q. Mayne and D.M. Stimler, Control system design via semi-infinite optimization: A review, Proc. IEEE, vol. 72, no. 12, pp. 1777-1794 (1984).
9. R. Sedgewick, Algorithms, 2nd Edition (Addison-Wesley, Reading, MA, 1988).
10. Z. Michalewicz, Genetic Algorithms + Data Structures = Evolution Programs, 2nd Edition (Springer-Verlag, Berlin, 1994).
11. K.J. MacCallum, Design reuse - Design concepts in new engineering contexts, Proc. Control, Design and Production Research Conf., Heriot-Watt University, pp. 51-57 (1995).
12. M.A. Dahleh and I. Diaz-Bobillo, Control of Uncertain Systems: A Linear Programming Approach (Prentice Hall, Englewood Cliffs, NJ, 1995).
13. W.S. Levine and M.B. Tischler, CONDUIT - Control Designer's Unified Interface, IEEE Int. Conf. Contr. Appl. and Sys. Design, Hawaii, pp. 422-427 (1999).
14. H.A. Barker, Open environments and object-oriented methods for computer-aided control system design, Contr. Eng. Practice, vol. 3, no. 3, pp. 347-356 (1995).
15. K.C. Tan, T.H. Lee, D. Khoo and E.F. Khor, A multi-objective evolutionary algorithm toolbox for computer-aided multi-objective optimization, IEEE Transactions on Systems, Man and Cybernetics: Part B (Cybernetics), vol. 31, no. 4, pp. 537-556 (2001).
16. G. Zames, On the input-output stability of time-varying nonlinear feedback systems, Parts I and II, IEEE Trans. Auto. Contr., AC-11, 2 & 3, pp. 228-238 & 465-476 (1966).
17. P. Schroder, A.J. Chipperfield, P.J. Fleming and N. Grum, Multi-objective optimization of distributed active magnetic bearing controllers, Conf. on Genetic Algorithms in Engineering Systems: Innovations and Applications, pp. 13-18 (1997).
18. J.C. Doyle, B. Francis and A. Tannenbaum, Feedback Control Theory (Macmillan Publishing Company, New York, 1992).
19. K.X. Guan and K.J. MacCallum, Adopting a minimum commitment principle for computer aided geometric design systems, Artificial Intelligence in Design '96 (Gero, J.S. and Sudweeks, F., eds) (Kluwer Academic Publishers, 1996), pp. 623-639.

Page 205: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)
Page 206: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

CHAPTER 8

THE USE OF EVOLUTIONARY ALGORITHMS TO SOLVE PRACTICAL PROBLEMS IN POLYMER EXTRUSION

António Gaspar-Cunha and José A. Covas

IPC - Institute for Polymers and Composites, Dept. of Polymer Engineering

University of Minho, 4800-058 Guimarães, Portugal
E-mail: gaspar,[email protected]

This work aims at selecting the operating conditions and designing screws that optimize the performance of single-screw and co-rotating twin-screw extruders, which are machines widely used by the polymer processing industry. A special MOEA, denoted as Reduced Pareto Set Genetic Algorithm, RPSGAe, is presented and used to solve these multi-objective combinatorial problems. Twin-screw design is formulated as a Travelling Salesman Problem, TSP, given its discrete nature. Various case studies are analyzed and their validity is discussed, thus demonstrating the potential practical usefulness of this approach.

8.1. Introduction

Polymer extrusion is a major plastics processing technology used for the manufacture of a wide range of plastics products (such as pipes and profiles, film, sheet, filaments, fibers, electrical wires and cables) and also for the production of raw materials (e.g., modified polymers, polymer blends, fiber/polymer matrix composites, biodegradable systems)1,2. The essential unit of an extrusion line is the extruder, which is composed of one (single-screw extruder) or more screws (the most common being the co-rotating twin-screw extruder) rotating at constant speed inside a heated barrel. Solid polymer (in pellet or powder form) is supplied to the screw channel either by gravity flow from a hopper or by a feeder set at a prescribed rate. The solid progresses along the screw and melts due to the combined effect of conducted and dissipated heat. This (highly viscous, non-Newtonian) melt is subsequently homogenized (via both dispersive and distributive mixing), pressurized and forced to pass through the die, where it is shaped into the required cross-section before being quenched1-3. Mathematical modelling of the global process involves coupling a sequence of numerical routines, each valid for a process stage where specific physical/rheological phenomena develop (namely solids conveying, melting, melt conveying, dispersive/distributive mixing and devolatilization)1-3. In other words, each zone is described by the relevant governing equations (conservation of mass, momentum and energy), together with constitutive equations describing the rheological and thermal responses of the material, linked to the adjacent zones through the appropriate boundary conditions.

The relative simplicity of the screw extruder geometry masks the complexity of the flow developed. In practice, setting the operating conditions and/or designing screws for new applications is usually carried out by trial and error, where tentative extrusion experiments, or machining of screws, are performed until satisfactory results (i.e., the desirable performance) are obtained. Since the above targets correspond to multi-objective problems, and given their typology, they can instead be solved by adopting a scientific methodology based on Multi-Objective Evolutionary Algorithms (MOEAs)4,5. The present work focuses on the application of this optimization methodology to single- and twin-screw polymer extrusion. For this purpose, a special MOEA, denoted as the Reduced Pareto Set Genetic Algorithm with elitism (RPSGAe), is proposed6,7. This algorithm uses a clustering technique to reduce the number of solutions on the efficient frontier. Fitness is determined through a ranking function, the individuals being sorted using the same clustering technique.

Thus, Section 8.2 presents the main functional process features and discusses the characteristics of the optimization problems. The RPSGAe is presented and described in detail in Section 8.3, where a specific screw design methodology is also proposed. Evolutionary algorithms are then used in Section 8.4 to set the operating conditions and to design screws for single- and twin-screw extruders.

8.2. Polymer Extrusion

8.2.1. Single Screw Extrusion

A conventional plasticating single-screw extrusion unit uses an Archimedes-type screw (with at least three distinct geometrical zones in terms of channel depth), rotating at constant speed inside a heated barrel. As illustrated in Fig. 8.1.A, intensive experimental research demonstrated that the material deposited in the hopper passes through various sequential functional zones, each inducing a certain thermo-mechanical environment1,7. Flow in the hopper is due to gravity, while that in the first screw turns results from friction dragging (solids conveying). Soon, a melt film forms near the inner barrel wall (delay zone), followed by the creation and growth of a melt pool (melting zone). Eventually, all fluid elements progress along the screw channel following a helicoidal path (melt conveying) and pressure flow takes place in the die.

Figure 8.2 shows the physical assumptions underlying the mathematical model of the global process. Calculations are performed in small screw channel increments, a detailed description being available elsewhere7-9. For a given polymer / system geometry / operating conditions set, the program not only predicts the evolution of important process variables along the screw (as shown in Fig. 8.1.B for pressure and melting rate), but also yields the values of parameters which, altogether, describe the overall process performance (these include mass output, mechanical power consumption, length of screw required for melting, melt temperature, degree of mixing (WATS) and viscous dissipation, the latter quantified by the ratio of maximum temperature to barrel temperature; see Fig. 8.1.C)7.

The process is quite sensitive to changes in geometry and/or operating conditions. As can be observed in the example of Fig. 8.1.C, an increase in screw speed produces an increase in mass output, but at the cost of more power consumption, higher melt temperatures (due to viscous dissipation) and lower mixing quality. In fact, WATS generally decreases with increasing screw speed, as there is less channel length available for mixing (due to lower melting rates) and shorter residence times. Therefore, setting the operating conditions requires establishing a compromise between the relative satisfaction of the above parameters. The same reasoning could be applied to screw design.

8.2.2. Co-Rotating Twin-Screw Extrusion

The limitations of single-screw extruders in terms of the interdependence between output, die resistance and mixing quality, as well as in the capability of producing effective random distributive and dispersive mixing, stimulated the use of co-rotating twin-screw extruders for compounding operations1,2. In these machines two parallel intermeshing screws rotate in the same direction, inside a cavity with a figure-of-eight cross-section. Since the screws are generally of modular construction, it is possible to


Fig. 8.1. Single-screw extruder: A) geometry; B) melt pressure and melting profiles; C) performance measures.

build profiles where the location of melting, mixing intensity and average residence time can be estimated a priori. Also, the barrel can contain apertures for secondary feeding (e.g., additives, fillers), devolatilization (e.g., removal of water vapor or of reaction volatiles), etc. In the case of the extruder of Fig. 8.3.A, the material is supplied at a prescribed rate, so that conveying sections are only partially fed. Melting will occur at the staggering kneading block upstream (by the combined effect of heat conducted and dissipated from the mechanical smearing of the solid pellets), while the third kneading block will provide the adequate seal for devolatilization.

Fig. 8.2. Physical models for single-screw extrusion.

Although these extruders have also attracted a significant amount of experimental and theoretical work in the last decades10-13, the understanding of certain process stages, such as melting, is still far from complete14-16. Consequently, for modelling purposes melting is often considered as instantaneous, taking place before the first restrictive element upstream. From the melting location to the die exit, computations of melt flow are performed separately for each type of screw element (right-handed or left-handed screw elements, staggered kneading disks), as illustrated in Fig. 8.4. This is also the concept of the LUDOVIC software17, whose predictions have been shown to be within 10% of the experimental values17,18. As for single-screw extrusion, for a given polymer / system geometry / operating conditions set,


Fig. 8.3. Twin-screw extruder: A) geometry; B) pressure and cumulative residence time; C) performance measures.

the software predicts the evolution along the screw of variables such as temperature, melt pressure, shear rate, viscosity, residence time, specific energy and filling ratio (Fig. 8.3.B), as well as the values of global performance parameters (e.g., average residence time, average strain, mechanical power consumption, maximum melt temperature and outlet temperature, as in Fig. 8.3.C).

The response of these machines is also sensitive to the operating conditions, in this case output, screw rotation speed and temperature. The effect of output is illustrated in Fig. 8.3. Output influences mainly the number of fully filled channels, hence mechanical power consumption, average residence time and strain. However, the level of shear stresses at the kneading disks remains the same, hence the maximum temperatures attained are not affected.

Fig. 8.4. Physical models for co-rotating twin-screw extrusion.

8.2.3. Optimization Characteristics

As discussed above, for each application the performance of single- and twin-screw extruders is determined by the operating conditions and the machine geometry. The former include screw speed (N) and barrel temperature profile (Tbi) and, in the case of twin-screw extruders, mass output (Q). As illustrated in Fig. 8.5, which identifies the parameters to be optimized for each type of machine, N, Tbi and Q can vary continuously within a prescribed range, which is dictated by the characteristics of the motor and the thermal stability of the polymer. In the case of the twin-screw machine, N and Q are not independent, since for each N there is a maximum attainable Q (as the screws become fully filled along their axis). This limit is detected by LUDOVIC17, which does not converge if the two values are incompatible.

The geometric parameters of single-screw extruders can also vary continuously within a preset interval. As shown in Fig. 8.5, if one is aiming at designing a new screw for an existing extruder, then consideration should be given to the definition of the lengths of the screw feed (L1) and compression (L2) zones, their corresponding internal diameters (D1 and D3, respectively), the flight thickness (e) and the screw pitch (P). The variation intervals are defined by a number of constraints, such as excessive mechanical work on the polymer (maximum D1/D3 ratio), mechanical resistance of the screw (minimum D1) and polymer conveying characteristics (minimum L1).

Conversely, screws for twin-screw extruders are built by selecting the required number of elements from a set of available geometries and then defining their relative positions. As Fig. 8.5 shows, if a screw is made of 14 elements and the aim is to define the relative position of 10 of them (of which 5 are transport elements, 4 are kneading blocks and 1 is a reverse element), there are 10! possible combinations, i.e., a complex discrete combinatorial problem must be solved. Although less common, one could also envisage optimizing the geometry of the individual elements, which would entail the continuous variation of parameters within a prescribed interval.

Despite the obvious practical importance of the topic, there is limited experience in the use of an optimization approach to define the operating conditions or to design screws for polymer extrusion. Most effort has been concentrated on single-screw extrusion19,20, although Potente et al.21 have recently suggested the use of a quality function to optimize the geometry of specific screw elements for twin-screw extruders.

8.3. Optimization Algorithm

8.3.1. Multi-Objective Optimization

Like most real-world optimization problems, the optimization of polymer extrusion is multi-objective. This can be dealt with in two ways, depending on the moment when the decision about the relative importance of the various criteria is to be taken. If it is feasible to establish that importance before the search takes place, then the various individual objectives can be aggregated into a single function, yielding a single-objective optimization problem. However, if the relative weight of each criterion is changed, a new optimization run needs to be carried out.

Fig. 8.5. Parameters to be optimized.

When the relative value of the criteria is not known a priori, it is possible to take advantage of the fact that Genetic Algorithms work with a population of points to optimize all criteria simultaneously. This is performed with a Multi-Objective Evolutionary Algorithm (MOEA). The result will be a set of non-dominated vectors, denoted as Pareto-optimal solutions, evidencing the trade-off between the criteria and the parameters to be optimized. Thus, the decision maker can choose a solution resulting from a specific compromise between the relative satisfaction of the individual criteria.
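The non-domination test behind this Pareto-optimal set is compact enough to state in code. The sketch below is ours (Python chosen for illustration; the chapter itself prescribes no implementation language) and assumes all criteria are expressed as minimization:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (all criteria minimized):
    a is no worse than b everywhere and strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Return the Pareto-optimal subset of a list of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

# Toy example with two minimization criteria:
front = non_dominated([(1, 5), (2, 2), (3, 1), (4, 4)])
# (4, 4) is dominated by (2, 2); the other three trade off against each other.
```

Criteria to be maximized (e.g., output or WATS) can be handled by negating them before applying the test.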

8.3.2. Reduced Pareto Set Genetic Algorithm with Elitism (RPSGAe)

In MOEAs, the selection phase of a traditional Evolutionary Algorithm is replaced by a routine able to deal with multiple objectives. Usually, this is done by applying fitness assignment, density estimation and archiving operators, various methods being available for this purpose4,5. In this work, the Reduced Pareto Set Genetic Algorithm with Elitism (RPSGAe)6 is adopted, which involves the application of a clustering technique to reduce the number of solutions on the efficient frontier, while keeping its characteristics intact. The clustering technique, proposed by Roseman and Gero22 and known as the complete-linkage method, compares the proximity of solutions in the hyper-space using a measure of the distance between them. Solutions closer than a pre-defined distance are aggregated. Fitness is determined through a ranking function, the individuals being sorted with the same clustering technique. In order to incorporate these techniques into the EA, Algorithm 1 was developed. The RPSGAe follows the steps of a traditional EA, except that it defines an external (elitist) population and uses a specific fitness evaluation. It starts with the random definition of an internal population of size N and with the creation of an empty external population. At each generation, the following operations are carried out:

• The internal population is evaluated using the modelling package;
• Fitness is calculated using the clustering technique (see Algorithm 2 below6);
• A fixed number of the best individuals is copied to the external population until the latter becomes full;
• Algorithm 2 is applied again, to sort the individuals of the external population;
• A pre-defined number of the best individuals is incorporated in the internal population, replacing the lowest-fitness individuals;
• Reproduction, crossover and mutation operators are applied.


Algorithm 1 (RPSGAe):

Random initial population (internal)
Empty external population
while not Stop-Condition do
    Evaluate internal population
    Calculate the fitness of all the individuals using Algorithm 2
    Copy the best individuals to the external population
    if the external population becomes full
        Apply Algorithm 2 to this population
        Copy the best individuals to the internal population
    end if
    Select the individuals for reproduction
    Crossover
    Mutation
end while
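The control flow of Algorithm 1 can be paraphrased as a generational loop around injected operators. The Python skeleton below is our reconstruction, not the authors' code: `assign_fitness` stands in for the clustering-based ranking of Algorithm 2 (any function mapping a list of objective vectors to scalar fitnesses, higher being better), and all names and default values are illustrative:

```python
import random

def rpsgae(init, evaluate, assign_fitness, crossover, mutate,
           pop_size=20, archive_cap=40, n_back=5, generations=30, seed=0):
    """Generational skeleton of Algorithm 1 (RPSGAe); a sketch, not the
    authors' implementation."""
    rng = random.Random(seed)
    internal = [init(rng) for _ in range(pop_size)]
    archive = []                                   # external (elitist) population
    for _ in range(generations):
        fit = assign_fitness([evaluate(x) for x in internal])
        order = sorted(range(pop_size), key=lambda i: -fit[i])
        archive += [internal[i] for i in order[:n_back]]     # best -> archive
        if len(archive) >= archive_cap:            # archive full: re-rank it
            afit = assign_fitness([evaluate(x) for x in archive])
            pairs = sorted(zip(afit, archive), key=lambda t: -t[0])
            archive = [x for _, x in pairs][:archive_cap]
            for j, i in enumerate(order[-n_back:]):          # best archived
                internal[i] = archive[j]                     # replace the worst
            fit = assign_fitness([evaluate(x) for x in internal])
        nxt = []                                   # binary tournament selection,
        while len(nxt) < pop_size:                 # then crossover and mutation
            p = max(rng.sample(range(pop_size), 2), key=lambda i: fit[i])
            q = max(rng.sample(range(pop_size), 2), key=lambda i: fit[i])
            nxt.append(mutate(crossover(internal[p], internal[q], rng), rng))
        internal = nxt
    return internal

# Toy run: minimize x^2 and (x - 2)^2 via a scalarized stand-in fitness.
final = rpsgae(init=lambda r: r.uniform(-5.0, 5.0),
               evaluate=lambda x: (x * x, (x - 2.0) ** 2),
               assign_fitness=lambda objs: [-(a + b) for a, b in objs],
               crossover=lambda a, b, r: (a + b) / 2.0,
               mutate=lambda x, r: x + r.gauss(0.0, 0.1))
```

In the real application `evaluate` would be the extrusion modelling package and `assign_fitness` the ranking of Algorithm 2; the toy run above uses a weighted scalarization purely so the sketch executes end to end.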

Algorithm 2 starts with the definition of the number of ranks, NRanks, and the rank of each individual, Rank[i], is set to 0. For each rank, r, the population is reduced to NR individuals (where NR is the number of individuals of each rank), using the clustering technique. Then, rank r is attributed to these NR individuals. The algorithm ends when the pre-defined number of ranks is reached. Finally, the fitness of individual i (Fi) is calculated using the following linear ranking function:

    Fi = 2 - SP + 2 (SP - 1) (NRanks + 1 - Rank[i]) / NRanks        (1)

where SP is the selection pressure (1 < SP < 2). Detailed information on these algorithms can be found elsewhere6,7.
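Eq. (1) translates directly into code; the helper below is our illustration (the function and parameter names are ours), with ranks counted from 1 (best) up to NRanks:

```python
def linear_ranking_fitness(ranks, n_ranks, sp=1.5):
    """Eq. (1): map ranks (1 = best) to fitness under selection pressure SP."""
    if not 1.0 < sp < 2.0:
        raise ValueError("selection pressure must satisfy 1 < SP < 2")
    return [2.0 - sp + 2.0 * (sp - 1.0) * (n_ranks + 1 - r) / n_ranks
            for r in ranks]

# With NRanks = 30 (the value used later in the chapter), the best rank
# receives fitness equal to SP and fitness decays linearly with the rank:
f = linear_ranking_fitness([1, 15, 30], n_ranks=30, sp=1.5)
```

Note that an individual of rank 1 always receives fitness SP, which is what makes SP act as a selection-pressure knob.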

8.3.3. Travelling Salesman Problem

The above RPSGAe can be easily adapted to the various extrusion optimization problems involving continuous variables, i.e., setting the operating conditions for both single- and twin-screw extruders and designing screws for single-screw extruders. When the aim is to optimize the screw configuration of twin-screw extruders, a discrete combinatorial problem must be solved (the Twin-Screw Configuration Problem, TSCP). However, the TSCP can


Algorithm 2 (Clustering):

Definition of NRanks
Rank[i] = 0
r = 1
do
    NR = r (N / NRanks)
    Reduce the population down to NR individuals
    r = r + 1
while (r < NRanks)
Calculate fitness
End

be formulated as a Travelling Salesman Problem (TSP), as illustrated in Fig. 8.6. In the TSP the salesman needs to visit n cities, the aim being to select the visiting sequence that minimizes the distance travelled and/or the total cost (two alternative routes are suggested in the figure). In the TSCP the polymer is the travelling salesman and the screw elements are the cities. In this case, the polymer must flow through the different elements, whose locations along the screw have to be determined in order to maximize the global process performance.

Fig. 8.6. Twin-screw configuration problem (TSCP) formulated as a TSP.

Formulating the TSCP as a TSP opens the possibility of using the vast number of algorithms available to solve the latter. In fact, single-objective TSPs have been solved using EAs23,24 but, apparently, only Zhenyu25 has approached multi-objective TSPs. The difficulty of using a MOEA arises from the fact that the traditional crossover and mutation operators are not sufficiently capable of granting a positive and rapid evolution of the population along the various generations26. Thus, a specific TSP reproduction operator incorporating crossover and mutation, and able to make full use of the heuristic information contained in the population, the inver-over, has been suggested. It has been shown to outperform other evolutionary operators in the resolution of single-objective TSPs26.
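For reference, one inver-over pass can be sketched as follows. This is our reconstruction of the operator as described in the literature (function and parameter names are ours): starting from a randomly chosen city, it repeatedly inverts tour segments, usually borrowing the inversion partner from another individual's adjacency, until the chosen partner is already adjacent:

```python
import random

def inver_over_step(tour, population, p_random=0.02, rng=random):
    """Apply one inver-over pass to `tour`, guided by `population`
    (a list of tours over the same set of cities)."""
    s = list(tour)
    c = rng.choice(s)
    while True:
        i = s.index(c)
        s = s[i:] + s[:i]                       # rotate so that c comes first
        if rng.random() < p_random:
            c2 = rng.choice(s[1:])              # random inversion partner
        else:                                   # partner = the city following c
            other = rng.choice(population)      # in a randomly chosen tour
            c2 = other[(other.index(c) + 1) % len(other)]
        j = s.index(c2)
        if j in (1, len(s) - 1):                # already adjacent to c: stop
            break
        s[1:j + 1] = reversed(s[1:j + 1])       # invert the segment up to c2
        c = c2
    return s

rng = random.Random(1)
pop = [rng.sample(range(8), 8) for _ in range(10)]
child = inver_over_step(pop[0], pop, rng=rng)   # still a permutation of 0..7
```

The published operator also keeps the result only if it is no worse than the parent; in Algorithm 3 below, the offspring instead compete through the non-domination test applied to the elitist population.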

Consequently, a MOEA for solving the multi-objective TSP (or, equivalently, the TSCP) was developed (Algorithm 3). It starts with the random generation of the N individuals of the internal population and an empty external population of size 2N. After evaluating the former using the LUDOVIC routine, the following actions are taken at each generation:

• The individuals are ranked using Algorithm 2;
• The entire internal population is copied to the elitist population;
• The inver-over operator is applied in order to generate the remaining N individuals of the elitist population;
• The new individuals are evaluated;
• The non-domination test and Algorithm 2 are applied to the elitist population to rank its 2N individuals;
• The best N individuals of the elitist population are copied to the main population.

The algorithm concludes when the prescribed number of generations is reached. The solutions are the non-dominated individuals of the last internal population.

8.4. Results and Discussion

The optimization algorithms discussed in the previous section will now be used to solve the situations depicted in Fig. 8.5. Single- and twin-screw extrusion will be studied separately and, for each, the operating conditions and the screw geometry will be optimized.

8.4.1. Single Screw Extrusion

Operating conditions

The aim is to determine the operating conditions, i.e., screw speed (N) and barrel temperature profile (T1, T2 and T3), which may vary continuously within the ranges defined between square brackets in Fig. 8.5, that will maximize the performance described by the six criteria presented in Table 8.34. Thus, the global objective is to maximize mass output and degree of mixing (WATS), while minimizing the length of screw required for melting, melt temperature, power consumption and viscous dissipation, which are obviously conflicting goals. The prescribed range of variation of each criterion is also stated in Table 8.34. The polymer properties (a commercial high-density polyethylene extrusion grade) and the extruder geometry (a Leistritz LSM 36, a laboratory machine) are known7. The following GA parameters were used: 50 generations, crossover rate of 0.8, mutation rate of 0.05, internal and external populations of 100 individuals, limit of the clustering algorithm set at 0.2 and NRanks equal to 30.

Algorithm 3 (MOEA for TSP):

Random initial population (internal)
Empty external population
Evaluate internal population
while not Stop-Condition do
    Calculate the fitness of all the individuals using Algorithm 2
    Copy the N individuals to the external population
    Apply the inver-over operator to generate N new individuals
    Evaluate the N new individuals
    Apply Algorithm 2 to the external population
    Copy the best N individuals to the internal population
end while

Table 8.34. Criteria for optimizing single-screw operating conditions and corresponding range of variation.

Criteria                                        Aim        Range of variation
C1 - Output (kg/hr)                             Maximize   1 - 20
C2 - Length of screw required for melting (m)   Minimize   0.2 - 0.9
C3 - Melt temperature (°C)                      Minimize   150 - 210
C4 - Power consumption (W)                      Minimize   0 - 9200
C5 - WATS                                       Maximize   0 - 1300
C6 - Viscous dissipation (Tmax/Tb)              Minimize   0.5 - 1.5


Figure 8.7 shows some of the optimal Pareto plots obtained for the simultaneous optimization of all six criteria, both in the criteria domain (Fig. 8.7.A) and in the domain of the parameters to optimize (Fig. 8.7.B). As expected, in this six-dimensional space the distinction between dominated and non-dominated solutions is difficult, since points that appear to be dominated in one Pareto frontier are probably non-dominated in another, i.e., selecting a solution is not easy. One alternative consists in quantifying the relative importance of the criteria using a conventional quality function, such as the weighted sum, applied to the final population:

    Fi = Σj wj fj        (2)

Here, Fi is the fitness of individual i, q is the number of criteria, fj is the objective function of criterion j and wj is the corresponding weight (0 < wj < 1). The decision maker defines the weight of each criterion and applies this function to the non-dominated solutions, thus finding the best result. Using output (C1 in Table 8.34) as the basis of comparison, Table 8.35 shows the operating conditions proposed when its weight (w1) varies between 0.1 and 0.5. As output becomes more relevant to the global performance, N increases, due to their direct relationship. However, as illustrated in Fig. 8.1, the remaining criteria will be progressively less satisfied. The results of this methodology have been validated experimentally7.
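Eq. (2) gives a simple post-hoc decision rule over the non-dominated set. A minimal sketch (ours; it assumes each fj has already been normalized to [0, 1], with larger values meaning better satisfaction of criterion j):

```python
def weighted_sum_pick(solutions, weights):
    """Return the candidate maximizing Fi = sum_j wj * fj (Eq. (2)).
    `solutions` maps each candidate name to its normalized criteria vector."""
    def score(item):
        _, objs = item
        return sum(w * f for w, f in zip(weights, objs))
    best, _ = max(solutions.items(), key=score)
    return best

# Three hypothetical non-dominated operating points with two criteria each:
candidates = {"slow": (0.2, 0.9), "medium": (0.5, 0.5), "fast": (0.9, 0.2)}
choice = weighted_sum_pick(candidates, weights=(0.7, 0.3))  # favors criterion 1
```

Changing the weight vector and re-applying the function to the same final population reproduces the sweep over w1 reported in Table 8.35, without any new optimization run.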

Table 8.35. Best operating conditions for single-screw extrusion.

Weights             Operating conditions
w1     w2 to w5     N (rpm)    T1/T2/T3 (°C)
0.1    0.9/4        13.1       207/155/150
0.2    0.8/4        23.0       185/183/153
0.3    0.7/4        23.0       185/183/153
0.4    0.6/4        48.5       161/199/195
0.5    0.5/4        48.5       161/199/195

Screw design

As identified in Fig. 8.5, the aim is to define the values of L1, L2, D1, D3, P and e that, for the same polymer and for fixed operating conditions (N = 50 rpm and Ti = 170 °C), will again optimize the criteria identified in Table 8.34. Since this involves, as above, a six-dimensional space in the criteria or


Fig. 8.7. Optimal Pareto plots: A) criteria domain; B) domain of the parameters to optimize.

in the parameters-to-optimize domain, following the same procedure yields the results shown in Table 8.36. As illustrated in Fig. 8.8, two quite different screw profiles are proposed: one when output is not relevant, the other when it is at least as important as the remaining criteria. The former has a high D3/D1 ratio and a shallow pumping section (L3), favoring melting and mixing but opposing high throughputs. Conversely, the second screw profile possesses a larger channel cross-section, inducing higher flows.

Table 8.36. Best screw geometries for single-screw extrusion.

Weights             Screw geometry (mm)
w1     w2 to w5     L1      L2      D1      D3      P       e
0.1    0.9/4        U D     8^15    22.6    31.9    38.9    3.2
0.2    0.8/4        7.5D    7.1D    25.1    26.9    36.2    3.7
0.3    0.7/4        7.5D    7.1D    25.1    26.9    36.2    3.7
0.4    0.6/4        7.5D    7.1D    25.1    26.9    36.2    3.7
0.5    0.5/4        7.5D    7.1D    25.1    26.9    36.2    3.7


Fig. 8.8. Best screw profiles: A) w1 = 0.1; B) 0.2 ≤ w1 ≤ 0.5 (see Table 8.36).

In industrial practice screws must be flexible, i.e., they must exhibit good performance for a range of materials and operating conditions. This requirement may be included in the design routine by studying the sensitivity of the designs proposed by the optimization algorithm to limited changes in relevant parameters, such as polymer rheology, operating conditions and even the relative importance of the weights9. More specifically, assuming w1 = 0.2, the five best screws proposed by the optimization algorithm are those of Table 8.37. When these are subjected to a sensitivity analysis, the data of Fig. 8.9 are obtained, where the black bars represent the average global performance and the white bars the respective standard deviation. Thus, screw 1 can be chosen if global performance is of paramount importance, while screw 2 may be selected when process stability has priority.

Table 8.37. Best screws considered for a sensitivity analysis (w1 = 0.2).

          L1      L2      L3      D1 (mm)    D3 (mm)
Screw 1   7.5D    7.1D    11.4D   26.9       36.1
Screw 2   6.3D    8.4D    11.3D   31.9       38.9
Screw 3   6.3D    8.4D    11.3D   31.9       39.4
Screw 4   6.3D    8.4D    11.4D   31.8       40.6
Screw 5   5.9D    8.4D    11.6D   30.8       32.3


Fig. 8.9. Global sensitivity to small changes in operating conditions, rheological properties and criteria importance of the 5 best screws of Table 8.37.

8.4.2. Twin-Screw Extrusion

Operating conditions

As shown in Fig. 8.5, this problem involves determining screw speed (N), barrel temperature profile (T1, T2 and T3) and flow rate (Q). The detailed screw geometry is given in Table 8.38, while Table 8.39 presents the criteria and their corresponding aim and range of variation. Since Q is imposed by a volumetric/gravimetric feeder but, simultaneously, it is convenient to maximize it, it is taken both as a parameter and as an optimization criterion. The RPSGAe was applied using the following parameters: 50 generations, crossover rate of 0.8, mutation rate of 0.05, internal and external populations with 100 individuals, limits of the clustering algorithm set at 0.2 and NRanks = 30.

Table 8.38. Screw configuration: L - length (mm); P - pitch (mm).

Element   1      2     3    4     5     6     7      8    9    10     11    12   13
L         97.5   150   60   60    30    120   45     60   60   3TB    120   90   30
P         45     30    20   KB90  -30   30    KB-60  45   30   KB-30  60    30   20

Figure 8.10 shows the Pareto frontiers in the criteria domain, plotted against output, while Table 8.40 presents the results obtained when the set of weights of Table 8.35 is used upon application of equation (2). As the importance of Q increases, the best solutions (represented in Fig. 8.10 from


Table 8.39. Criteria for optimizing twin-screw operating conditions and corresponding range of variation.

Criteria                             Aim                 Range of variation
C1 - Output (kg/hr)                  Maximize            3 - 20
C2 - Average strain                  Maximize            1000 - 15000
C3 - Melt temp. at die exit (°C)     Stay within range   180-210 220-240
C4 - Power consumption (W)           Minimize            0 - 9200
C5 - Average residence time (s)      Minimize            10 - 300

1 to 5) change radically. Therefore, the decision depends entirely on the (somewhat subjective) definition of the relative importance of the criteria.

Fig. 8.10. Pareto frontiers in the criteria domain after the optimization of the operating conditions.

Screw configuration

Finally, Algorithm 3 will be used to optimize the screw configuration, i.e., to define the best location of 10 screw elements (comprising 5 transport elements, 4 kneading blocks and 1 reverse element), as illustrated in Fig. 8.5. Two criteria, melt temperature and mechanical power consumption,


196 A. Gaspar-Cunha and J.A. Covas

Table 8.40. Best operating conditions for twin-screw extrusion.

Weights            Operating Conditions
W1    W2 to W5     N (rpm)   Q (kg/hr)   T1 (°C)   T2 (°C)   T3 (°C)
0.1   0.9/4        184       3           200       167       194
0.2   0.8/4        184       3           200       167       194
0.3   0.7/4        193       25          205       172       205
0.4   0.6/4        193       25          205       172       205
0.5   0.5/4        193       25          205       172       205
0.6   0.4/4        193       25          205       172       205

which are particularly dependent on screw geometry - should be minimized. Output, screw speed and barrel temperature are kept constant at 10 kg/hr, 100 rpm and 200 °C, respectively. The same genetic parameters were used, with the exception of the population size (200 external and 100 internal individuals).

Figure 11 (top) shows the Pareto curves in the criteria's domain for the initial and final populations. The improvement provided by the MOEA is relevant. Since the two criteria are conflicting, solutions 1, 2 and 3, corresponding to different relative degrees of satisfaction of each criterion, are considered, the corresponding screw profiles being represented in Fig. 11 (bottom). Screw 1 produces the highest power consumption, but the lowest outlet temperature. Its kneading and reverse elements are located more upstream, therefore this screw is less restrictive downstream. Thus, the polymer melts earlier (increasing energy consumption, as melt flow requires more power than solids flow) and the melt has time to recover from the early viscous dissipation (low melt temperature). The profile - and thus the behavior - of screw 3 is the opposite, while screw 2 exhibits a geometry that is a compromise between the other two, although more similar to that of screw 1. These results are in general agreement with practical experience, although a formal experimental validation still needs to be carried out.

8.5. Conclusions

An elitist multi-objective genetic algorithm, denoted RPSGAe, was used to select the operating conditions and to design screws that optimize the performance of single-screw and co-rotating twin-screw extrusion, which are important industrial processing technologies. These correspond to complex multi-objective, combinatorial, not always continuous problems. The examples studied demonstrated that the MOEA is sensitive to the type and relative



Fig. 8.11. Twin-screw configuration results: Top - Pareto curve; Bottom - optimal screws.

importance of the individual criteria, that the method proposed yields solutions with physical meaning, and that it is possible to incorporate important empirical knowledge through constraints/prescribed variation ranges of both criteria and process parameters.

Acknowledgments

This work was supported by the Portuguese Fundação para a Ciência e Tecnologia under grant POCTI/34569/CTM/2000.



CHAPTER 9

EVOLUTIONARY MULTI-OBJECTIVE OPTIMIZATION OF TRUSSES

Arturo Hernandez Aguirre and Salvador Botello Rionda
Center for Research in Mathematics (CIMAT)
Department of Computer Science
A.P. 402, Guanajuato, Gto. C.P. 36000 MEXICO
E-mail: artha,[email protected]

In this chapter, we introduce the ISPAES evolutionary computation algorithm for truss optimization. The ISPAES algorithm needs little or no modification to solve single-objective or multi-objective problems with a large number of constraints, in either discrete or continuous search space. Thus, we present a detailed description of the ISPAES algorithm, and solve several truss optimization problems. Different modalities are illustrated, that is, continuous/discrete, single/multiple objective. Pareto fronts in both continuous and discrete space are shown.

9.1. Introduction

Evolutionary algorithms (EAs) are search and optimization techniques inspired by the natural selection principle. Thus, they are global optimization techniques naturally well suited for unconstrained problems. Since most real-life problems involve constraints, a considerable amount of research has recently been triggered to augment EAs with a constraint-handling technique. A common constraint-handling technique in EAs is the use of penalty functions. In this approach, the amount of constraint violation is used to penalize infeasible individuals so that feasible individuals are favored by the selection process 25,28. Nonetheless, the use of multi-objective optimization concepts has proved more promising for constraint handling. In this chapter we introduce the Inverted and Shrinkable Pareto Archived Evolutionary Strategy, ISPAES, which is an extension of the PAES algorithm 17. ISPAES does not present the scalability problems that prevented its


predecessor from solving larger problems. The ISPAES algorithm 2,3 has successfully solved the well-known Michalewicz benchmark 19, which consists of a set of 13 single-objective optimization problems with constraints in continuous search space. It is uncommon to find in the specialized literature an evolutionary optimization algorithm that has been applied to problems of such different nature as those presented in this chapter. We show solutions for single- and multi-objective problems with constraints, in continuous and discrete search space. For multi-objective problems we depict the results as a Pareto front; for discrete optimization problems we use the catalog of Altos Hornos de Mexico.

The organization of the chapter is the following: Section 9.2 presents the three most popular ways in which the Pareto concept has been incorporated into EAs. The constraint-handling approach is also explained. Section 9.3 presents a detailed description of the ISPAES algorithm (for continuous search space), and the simple changes needed for solving discrete search space problems. Section 9.4 describes engineering optimization problems taken from the standard literature. Finally, Section 9.5 draws our conclusions and provides some paths of future research.

9.2. Related Work

Since our approach belongs to the group of techniques in which multiobjective optimization concepts are adopted to handle constraints, we will briefly discuss some of the most relevant work done in this area. The main idea of adopting multiobjective optimization concepts to handle constraints is to redefine the single-objective optimization of f(x) as a multiobjective optimization problem in which we will have m + 1 objectives, where m is the total number of constraints. Then, we can apply any multiobjective optimization technique 11 to the new vector v = (f(x), f1(x), ..., fm(x)), where f1(x), ..., fm(x) are the original constraints of the problem. An ideal solution x would thus have fi(x) = 0 for 1 ≤ i ≤ m and f(x) ≤ f(y) for all feasible y (assuming minimization).
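The construction of the vector v can be sketched as follows. This is a minimal illustration, assuming inequality constraints written as g_i(x) ≤ 0 so that the violation amount max(0, g_i(x)) plays the role of f_i(x); the one-variable toy problem is made up for the example.

```python
def constraint_violations(x, inequality_constraints):
    """Violation amount of each constraint g_i(x) <= 0.

    A satisfied constraint contributes 0, so an ideal solution has
    f_i(x) = 0 for every i, as described above."""
    return [max(0.0, g(x)) for g in inequality_constraints]

def to_multiobjective(f, x, inequality_constraints):
    """Build the vector v = (f(x), f_1(x), ..., f_m(x))."""
    return [f(x)] + constraint_violations(x, inequality_constraints)

# hypothetical toy problem: minimize (x - 2)^2 subject to g1(x) = 1 - x <= 0
f = lambda x: (x - 2.0) ** 2
g1 = lambda x: 1.0 - x
print(to_multiobjective(f, 0.5, [g1]))  # infeasible point: nonzero violation
print(to_multiobjective(f, 1.5, [g1]))  # feasible point: zero violation
```

Any multiobjective technique can then be applied to v instead of the original constrained scalar problem.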

Three mechanisms taken from evolutionary multiobjective optimization are most frequently incorporated into constraint-handling techniques 18:

(1) Use of Pareto dominance as a selection criterion. Examples of this type of approach are given in 6,16,8.

(2) Use of Pareto ranking 14 to assign fitness in such a way that nondominated individuals (i.e., feasible individuals in this case) are


assigned a higher fitness value. Examples of this type of approach are given in 22,23.

(3) Split the population into subpopulations that are evaluated either with respect to the objective function or with respect to a single constraint of the problem. This is the selection mechanism adopted in the Vector Evaluated Genetic Algorithm (VEGA) 26. Examples of this type of approach are given in 29,20,9.

In order to sample the feasible region of the search space widely enough to reach the global optimum, it is necessary to maintain a balance between feasible and infeasible solutions. If this diversity is not maintained, the search will focus on only one area of the feasible region and will thus lead to a local optimum.

A multiobjective optimization technique aims to find a set of trade-off solutions which are considered good in all the objectives to be optimized. In global nonlinear optimization, the main goal is to find the global optimum. Therefore, some changes must be made to those approaches in order to adapt them to the new goal. Our main concern is that feasibility takes precedence, in this case, over nondominance. Therefore, good "trade-off" solutions that are not feasible cannot be considered as good as bad "trade-off" solutions that are feasible. Furthermore, a mechanism to maintain diversity must normally be added to any evolutionary multiobjective optimization technique.
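The feasibility-takes-precedence rule can be sketched as a comparison operator. This is a generic sketch of the principle described above; the function name and tie-breaking details are ours, not ISPAES's exact operator.

```python
def total_violation(violations):
    """Sum of constraint-violation amounts; 0 means feasible."""
    return sum(violations)

def feasibility_first(obj_a, viol_a, obj_b, viol_b):
    """Return True if solution A is preferred over solution B (minimization).

    Feasibility takes precedence over objective quality: a feasible
    solution always beats an infeasible one, however good the latter's
    objective value is."""
    feas_a = total_violation(viol_a) == 0
    feas_b = total_violation(viol_b) == 0
    if feas_a != feas_b:
        return feas_a                 # feasible beats infeasible, always
    if feas_a:                        # both feasible: compare objectives
        return obj_a < obj_b
    return total_violation(viol_a) < total_violation(viol_b)

# a feasible solution with a worse objective still beats an infeasible trade-off
print(feasibility_first(5.0, [0.0], 1.0, [0.3]))  # True
```

Under this rule, a good "trade-off" point that violates a constraint never displaces a feasible point, exactly as argued above.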

Tied to the constraint-handling mechanism of ISPAES, we can find an enhanced selection operator. A desirable selection operator will provide a blend of feasible and infeasible individuals at any generation of the evolutionary process. Higher population diversity enhances exploration and prevents premature convergence. A robust evolutionary algorithm for constrained optimization will provide a selection mechanism with two clear objectives: to keep diversity, and to provide promising individuals (approaching the optimum). These goals are difficult to reach when the selection mechanism is driven by "greedy rules" that fail to cooperate. A poor selection mechanism could undermine the effort of the diversity mechanism if only best-and-feasible individuals are favored. Similarly, a poor diversity preservation mechanism could never provide interesting individuals to the Pareto dominance-based selection operator to create a promising blend of individuals for the next generation.


9.3. ISPAES Algorithm

All of the approaches discussed in the previous section have drawbacks that keep them from producing competitive results with respect to the constraint-handling techniques that represent the state-of-the-art in evolutionary optimization. In a recent technical report 18, four of the existing techniques based on multiobjective optimization concepts (i.e., COMOGA 29, VEGA 9, MOGA 8 and NPGA 7) have been compared using Michalewicz's benchmark 19 and some additional engineering optimization problems. Although inconclusive, the results indicate that the use of Pareto dominance as a selection criterion gives better results than Pareto ranking or the use of a population-based approach. However, in all cases, the approaches analyzed are unable to reach the global optimum of problems with either high dimensionality, large feasible regions or many nonlinear equality constraints 18.

In contrast, the approach proposed in this chapter uses Pareto dominance as the selection criterion, but unlike the previous work in the area, a secondary population is used in this case. The approach, which is a relatively simple extension of PAES 17, provides, however, very good results, which are highly competitive with those generated with an approach that represents the state-of-the-art in constrained evolutionary optimization. The structure of the ISPAES algorithm is shown in Figure 9.1. Notice the two loops operating over the Pareto set (in the external storage). The right loop aims for exploration of the search space, while the left loop aims for population diversity and exploitation.

ISPAES has been implemented as an extension of the Pareto Archived Evolution Strategy (PAES) proposed by Knowles and Corne 17 for multiobjective optimization. PAES's main feature is the use of an adaptive grid on which objective function space is located using a coordinate system. Such a grid is the diversity maintenance mechanism of PAES and it constitutes the main feature of this algorithm. The grid is created by bisecting k times the function space of dimension d (d is the number of objective functions of the problem; in our case, d is given by the total number of constraints plus one. In other words, d = n + p + 1, where n is the number of inequality constraints, and p is the number of equality constraints. Note that we add one to this summation to include the original objective function of the problem). The control of 2^(kd) grid cells means the allocation of a large amount of physical memory for even small problems. For instance, 10 functions and 5 bisections of the space produce 2^50 cells. Thus, the first feature introduced


ISPAES ALGORITHM

[Figure: flowchart. An initial population feeds a loop in which a parent picked from the less crowded area of the grid is mutated; if the Pareto set dominates the child, the child is discarded, otherwise it is added by using procedure test. Periodically the archive is processed by select, getMinMax, trim and adjustparameters, the individuals are relocated on the new grid, and a new Pareto set results.]

Fig. 9.1. The logical structure of the ISPAES algorithm

in ISPAES is the "inverted" part of the algorithm that deals with this space usage problem. ISPAES's fitness function is mainly driven by a feasibility criterion. Global information carried by the individuals surrounding the feasible region is used to concentrate the search effort on smaller areas as the evolutionary process takes place. In consequence, the search space being explored is "shrunk" over time. Eventually, upon termination, the size of the search space being inspected will be very small and will contain the desired solution (in the case of single-objective problems; for multi-objective problems, it will contain the feasible region).

The main algorithm of ISPAES is shown in Figure 9.2. Its goal is the construction of the Pareto front, which is stored in an external memory (called file). The algorithm performs MaxNew loops, generating a child h from a random parent c in every loop. Therefore, the ISPAES algorithm introduced here is based on a (1 + 1)-ES. If the child is better than the


parent, that is, the child dominates its parent, then it is inserted in file, and its position is recorded. A child is generated by introducing random mutations to the parent; thus, h = mutate(c) will alter a parent with increments whose standard deviation is governed by Equation 1.

maxsize:  maximum size of file
c:        current parent ∈ X (decision variable space)
h:        child of c ∈ X
a_h:      individual in file that dominates h
a_d:      individual in file dominated by h
current:  current number of individuals in file
cnew:     number of individuals generated thus far
g:        pick a new parent from the less densely populated
          region every g new individuals
r:        shrink space every r new individuals

current = 1; cnew = 0;
c = newindividual(); add(c);
While cnew < MaxNew do
    h = mutate(c); cnew += 1;
    if (c ≺ h) then Label A
    else if (h ≺ c) then { remove(c); add(h); c = h; }
    else if (∃ a_h ∈ file | a_h ≺ h) then Label A
    else if (∃ a_d ∈ file | h ≺ a_d) then {
        add(h); ∀ a_d { remove(a_d); current −= 1 } }
    else test(h, c, file)
    Label A:
    if (cnew % g == 0) then c = individual in less densely populated region
    if (cnew % r == 0) then shrinkspace(file)
End While

Fig. 9.2. Main algorithm of ISPAES

Most of main and the function test(h,c,file) in ISPAES are devoted to three things: (1) deciding whether a new child should be inserted in file, and if so, (2) how to make room for the new member and (3) who becomes the new parent. Every g new children created, a new parent is randomly


picked from file for this purpose. Also, every r children generated, the space is shrunk around the current Pareto front represented by the individuals of the external memory. Here we introduce the following notation: x1 ◁ x2 means x1 is located in a less populated region of the grid than x2. The pseudo-code of this function is depicted in Figure 9.3.

if (current < maxsize) then {
    add(h);
    if (h ◁ c) then c = h }
else if (∃ a_p ∈ file | h ◁ a_p) then {
    remove(a_p); add(h);
    if (h ◁ c) then c = h; }

Fig. 9.3. Pseudo-code of test(h,c,file) (called by main of ISPAES)

9.3.1. Inverted "ownership"

As noted before, ISPAES keeps the location of every individual in the grid, whereas PAES keeps the occupancy of every grid location. The advantage of the inverted relationship is clear, since in the worst scenario there would be as many grid locations to record as the population size (thus one individual per grid location).
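The inverted bookkeeping can be sketched as follows: each archived individual stores its own tuple of grid coordinates, so memory grows with the archive size rather than with the 2^(kd) cell count. The function names and the toy bounds are illustrative, not ISPAES's actual code.

```python
def grid_coordinates(v, lower, upper, k):
    """Locate an objective vector v on an adaptive grid with 2**k
    divisions per dimension, bounded per dimension by lower/upper."""
    coords = []
    cells = 2 ** k
    for vi, lo, hi in zip(v, lower, upper):
        # clamp to the bounds, then map to a cell index in [0, cells - 1]
        frac = min(max((vi - lo) / (hi - lo), 0.0), 1.0)
        coords.append(min(int(frac * cells), cells - 1))
    return tuple(coords)

# one dict entry per archived individual -- the "inverted" relationship:
archive = {"ind-1": grid_coordinates([0.2, 7.5], [0.0, 0.0], [1.0, 10.0], k=3)}
print(archive["ind-1"])  # (1, 6): cell indices along each of the d dimensions
```

Crowding queries ("which region is less populated?") then reduce to counting how many archive entries share each coordinate tuple.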

9.3.2. Shrinking the Objective Space

Shrinkspace(file) is the most important function of ISPAES, since its task is the reduction of the search space. The space is reduced every r number of generations. The pseudo-code of shrinkspace(file) is shown in Figure 9.4.

In the following we describe the four tasks performed by shrinkspace.

• The function select(file) returns a list whose elements are the best individuals found in file. The size of the list is set to 15% of maxsize. Thus, the goal of select(file) is to create a list with: a) only the best feasible individuals, b) a combination of feasible and partially feasible individuals, or c) the "most promising" infeasible individuals. The selection algorithm is shown in Figure 9.5. Note that validconstraints (a list of indexes to the problem constraints)


x_pob:  vector containing the smallest value of each x_i ∈ X
x̄_pob:  vector containing the largest value of each x_i ∈ X

select(file);
getMinMax(file, x_pob, x̄_pob);
trim(x_pob, x̄_pob);
adjustparameters(file);

Fig. 9.4. Pseudo-code of shrinkspace(file) (called by main of ISPAES)

indicates the order in which the constraints are tested. The loop steps over the constraints, removing only one (the worst) individual for each constraint till there is none to delete (all feasible), or 15% of the file size is reached (in other words, 85% of the Pareto set will be generated anew using the best 15% of individuals as parents). Also, in order to keep diversity, a new parent is randomly chosen from the less populated region of the grid after placing on it g new individuals.

• The function getMinMax(file) takes the list list (last step in Figure 9.5) and finds the extreme values of the decision variables represented by those individuals. Thus, the vectors x_pob and x̄_pob are found.

• The function trim(x_pob, x̄_pob) shrinks the feasible space around the potential solutions enclosed in the hypervolume defined by the vectors x_pob and x̄_pob. Thus, the function trim(x_pob, x̄_pob) (see Figure 9.6) determines the new boundaries for the decision variables.
The value of β is the percentage by which the boundary values of each x_i ∈ X must be reduced such that the resulting hypervolume H is a fraction α of its previous value. The function trim first finds in the population the boundary values of each decision variable: x̄_pob,i and x_pob,i. Then the new vectors x̄_i and x_i are updated by deltaMin_i, which is the reduction in each variable that overall reflects a change in the volume by a factor β. In ISPAES all objective variables are reduced at the same rate β, therefore β can be deduced from α as discussed next. Since we need the new hypervolume to be a fraction α of the previous one,

H_new ≥ α H_old


m: number of constraints
i: constraint index
maxsize: max size of file
listsize: 50% of maxsize
constraintvalue(x,i): value of individual x at constraint i
sortfile(file): sort file by objective function
worst(file,i): worst individual in file for constraint i

validconstraints = {1, 2, 3, ..., m};
i = firstin(validconstraints);
While (size(file) > listsize and size(validconstraints) > 0) {
    x = worst(file, i)
    if (x violates constraint i)
        file = delete(file, x)
    else validconstraints = removeindex(validconstraints, i)
    if (size(validconstraints) > 0) i = nextin(validconstraints)
}
if (size(file) == listsize)
    list = file
else
    file = sortfile(file)
    list = copy(file, listsize)   *pick the best listsize elements*

Fig. 9.5. Pseudo-code of select(file) (called by shrinkspace)

∏_{i=1}^{n} (x̄_i^{t+1} − x_i^{t+1}) = α ∏_{i=1}^{n} (x̄_i^t − x_i^t)

Since every x_i is reduced at the same rate β,

β^n ∏_{i=1}^{n} (x̄_i^t − x_i^t) = α ∏_{i=1}^{n} (x̄_i^t − x_i^t)

β^n = α,   hence   β = α^{1/n}

In short, the new search interval of each decision variable x_i is


n: size of the decision vector
x̄_i^t: actual upper bound of the ith decision variable
x_i^t: actual lower bound of the ith decision variable
x̄_pob,i: upper bound of the ith decision variable in the population
x_pob,i: lower bound of the ith decision variable in the population

∀ i ∈ {1, ..., n}:
    slack_i = 0.05 × (x̄_pob,i − x_pob,i)
    width_pob,i = x̄_pob,i − x_pob,i;   width_i^t = x̄_i^t − x_i^t
    deltaMin_i = (β × width_i^t − width_pob,i) / 2
    delta_i = max(slack_i, deltaMin_i)
    x̄_i^{t+1} = x̄_pob,i + delta_i;   x_i^{t+1} = x_pob,i − delta_i
    if (x̄_i^{t+1} > x̄_original,i) then {
        x_i^{t+1} −= x̄_i^{t+1} − x̄_original,i;   x̄_i^{t+1} = x̄_original,i }
    if (x_i^{t+1} < x_original,i) then {
        x̄_i^{t+1} += x_original,i − x_i^{t+1};   x_i^{t+1} = x_original,i }
    if (x̄_i^{t+1} > x̄_original,i) then x̄_i^{t+1} = x̄_original,i

Fig. 9.6. Pseudo-code of trim (called by shrinkspace)

adjusted as follows (the complete algorithm is shown in Figure 9.4):

width_new ≥ β × width_old

It should be noted that the value of α has an important impact on the performance of ISPAES because it controls the shrinking speed. In order to determine a range within which we could set this parameter for a large variety of problems, we studied the effect of α on the performance of our algorithm for many test problems. From analyzing this effect, we found that in all cases, a range of α between 85% and 97% was always able to generate the best possible solutions to each problem. Values smaller than 0.80 make the algorithm prone to converge to local minima. Values of α too near to 100% slow down convergence, although they increase the probability of success. In order to avoid a fine tuning of α dependent on each test function, we decided to set its value to 0.90, which we considered a good compromise based on our analysis. As we will see later on, this value of α provided good results in all the problems solved.
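The shrinking step above can be sketched numerically for one variable. This sketch, under our reading of the trim rule, derives β = α^(1/n) and widens the population interval by a delta capped below by the 5% slack; the numbers are hypothetical.

```python
def shrink_interval(lower, upper, pop_lower, pop_upper, alpha, n, slack_frac=0.05):
    """One-variable sketch of the interval update performed by trim.

    beta = alpha**(1/n) shrinks each of the n intervals so the total
    hypervolume shrinks by a factor alpha; slack_frac mimics the 5%
    slack that limits how fast an interval may collapse."""
    beta = alpha ** (1.0 / n)
    width = upper - lower                 # current search interval
    width_pob = pop_upper - pop_lower     # interval spanned by the population
    slack = slack_frac * width_pob
    delta_min = (beta * width - width_pob) / 2.0
    delta = max(slack, delta_min)
    return pop_lower - delta, pop_upper + delta

# alpha = 0.9 over n = 5 variables gives beta = 0.9**0.2, about 0.979 per variable
new_lo, new_hi = shrink_interval(0.0, 10.0, 3.0, 7.0, alpha=0.9, n=5)
print(new_lo, new_hi)
```

When the delta_min branch is active, the new width is exactly β times the old one, so per-variable shrinkage is gentle even for an aggressive α.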


Note that the parameter r (see Figure 9.2), which controls the rate at which shrinkspace is invoked, also plays an important role in the algorithm. To set the value of r, we performed an analysis similar to the one previously described for α. In this analysis, we related the behavior of r to that of α and to the performance of ISPAES. Our results indicated that a value of r = 2 × maxsize provided convergence to the optimum in most of the problems (maxsize is the number of elements allowed in the Pareto set, stored in the external file). Thus, we used r = 200 and maxsize = 100 in all the experiments reported in this chapter. The variable slack is computed once every new search interval is determined (usually set to 5% of the interval). The role of slack is simply to prevent (up to some extent) excessively fast decreasing rates of the search interval.

• The last step of shrinkspace() is a call to adjustparameters(file). The goal is to re-start the control variable σ through:

σ_i = (x̄_i − x_i)/√n,   i ∈ (1, ..., n)     (1)

This expression is also used during the generation of the initial population. In that case, the upper and lower bounds take the initial values of the search space indicated by the problem. The variation of the mutation probability follows the exponential behavior suggested by Bäck 4.

Elitism
A special form of elitism is implemented in ISPAES to prevent the loss of the best individual. Elitism is implemented as follows: the best individual of the generation is marked and only replaced by another one if the latter is in the feasible region and has a better objective function value.

ISPAES for Optimizing Problems in Discrete Search Space
Simple modifications are required for discrete optimization problems. The initial value of each objective variable is a random integer drawn from a uniform distribution, bounded by the upper and lower limits stated by the specific problem.
Mutation of objective variables is performed as follows:

x_i^{t+1} = x_i^t + rand(σ_i)

where σ_i is the control variable of the corresponding objective variable, and rand(σ_i) is a random number with uniform distribution in the interval [−σ_i, σ_i].


Control variables σ_i are mutated as follows:

if (random() < 0.45) then σ = σ + 1; else σ = σ − 1;

that is, with slightly more than 0.5 probability, the control variables diminish their value by 1.
The reduction of the search space is performed as shown in Figure 9.6 for the real-space case, except that all results of the computations must be rounded up to the next integer. The variable slack is also computed as depicted in Figure 9.6; it must also be rounded up, and its smallest possible value is 1.

9.4. Optimization Examples

The parameters used by the ISPAES algorithm for solving all the following problems were:

• maxsize = 100, the size of the Pareto set.
• r = 200, the reduction rate: perform shrinkspace each time 200 children have been generated.
• listsize = 50% of maxsize. Thus, when infeasible individuals are removed from the Pareto set, at least 50% of the original size is kept.
• α = 0.9. Thus, the hypervolume preserved after a shrink is at least 90%.
• A total of 50,000 fitness function evaluations are performed.

9.4.1. Optimization of a 49-bar Plane Truss

The first engineering optimization problem chosen is the optimization of the 49-bar plane truss shown in Figure 9.7. The solutions to this problem were computed in discrete search space using the catalog of Altos Hornos de Mexico. Both the single-objective and multi-objective versions of the problem are described next.

9.4.1.1. The 49-bar Plane Truss as a Single-Objective Optimization Problem with Constraints

The goal is to find the cross-sectional area of each member of the truss such that the overall weight is minimized, subject to stress and displacement constraints. The weight of the truss is given by F(x) = Σ_{j=1}^{49} γ L_j A_j, where


A_j is the cross-sectional area of the jth member, L_j is the corresponding length of the bar, and γ is the volumetric density of the material.

Fig. 9.7. 49-bar plane truss used as the first engineering optimization example
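The weight objective F(x) = Σ γ L_j A_j can be sketched directly. The function uses the chapter's γ for steel; the two-member example is made up for illustration and is not the 49-bar truss itself.

```python
def truss_weight(areas, lengths, gamma=7.4250e-3):
    """Weight F(x) = sum_j gamma * L_j * A_j of a pin-jointed truss.

    areas in cm^2, lengths in cm, gamma in kg/cm^3 (the chapter's
    value for the material density)."""
    if len(areas) != len(lengths):
        raise ValueError("one area per member is required")
    return sum(gamma * L * A for A, L in zip(areas, lengths))

# two hypothetical members, 100 cm and 141.42 cm long, both 10 cm^2:
w = truss_weight([10.0, 10.0], [100.0, 141.42])
print(round(w, 3))  # weight in kg
```

In the discrete version of the problem, each A_j is not free but is picked from the 65-entry section catalog, so the decision vector is a list of catalog indices.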

We used the catalog of Altos Hornos de Mexico, S.A., with 65 entries for the cross-sectional areas available for the design. Other relevant information is the following: Young's modulus = 2.1 × 10^6 kg/cm^2, maximum allowable stress = 3500.00 kg/cm^2, γ = 7.4250 × 10^-3 kg/cm^3, and a horizontal load of 4994.00 kg applied to nodes 3, 5, 7, 9, 12, 14, 16, 19, 21, 23, 25 and 27. We solved this problem for three cases:

(1) Case 1. Stress constraints only: Maximum allowable stress = 3500.00 kg/cm^2. A total of 49 constraints, thus 50 objective functions.

(2) Case 2. Stress and displacement constraints: Maximum allowable stress = 3500.00 kg/cm^2, maximum displacement per node = 10 cm. There are 72 constraints, thus 73 objective functions.

(3) Case 3. Real-world problem: The design problem considers traction and compression stress on the bars, as well as their proper weight. Maximum allowable stress = 3500.00 kg/cm^2, maximum displacement per node = 10 cm. A total of 72 constraints, thus 73 objective functions.

The average results of 30 runs for each case are shown in Tables 9.41, 9.42, and 9.43. We compare ISPAES with previous results reported by Botello et al. 5 using other heuristics with a penalty function 25 (SA: Simulated Annealing, GA50: Genetic Algorithm with a population of 50, and GSSA: General Stochastic Search Algorithm with populations of 50 and 5).

Table 9.41. Comparison of different algorithms on the 49-bar truss, case 1

| Algorithm | Average Weight (Kg) |
| ISPAES | 610 |
| SA | 627 |
| GA50 | 649 |
| GSSA50 | 619 |
| GSSA5 | 625 |

Table 9.42. Comparison of different algorithms on the 49-bar truss, case 2

| Algorithm | Average Weight (Kg) |
| ISPAES | 725 |
| SA | 737 |
| GA50 | 817 |
| GSSA50 | 748 |
| GSSA5 | 769 |

Table 9.43. Comparison of different algorithms on the 49-bar truss, case 3

| Algorithm | Average Weight (Kg) |
| ISPAES | 2603 |
| SA | 2724 |
| GA50 | 2784 |
| GSSA50 | 2570 |
| GSSA5 | 2716 |

We can clearly see that in all the cases tried, ISPAES produced the lowest average weight.



9.4.1.2. The 49-bar Plane Truss as a Multi-Objective Optimization Problem with Constraints

The statement of this problem is similar to case 3 in Section 9.4.1.1, but now we consider two objective functions for simultaneous optimization. The first objective is the minimization of the structure weight; the second objective is the minimization of the horizontal displacement of the node at the upper right corner of the structure. The Pareto front of these two objectives subject to 71 constraints is shown in Figure 9.8.

Fig. 9.8. Pareto front for the 49-bar plane truss two-objective optimization problem (see Section 9.4.1.2)

9.4.2. Optimization of a 10-bar Plane Truss

The second engineering optimization problem chosen is the optimization of the 10-bar plane truss shown in Figure 9.9. This problem has been solved by several authors in real search space; thus, we solved it in real space for both single-objective and multi-objective optimization for the sake of comparison.


216 A. Hernandez and S. Botello

9.4.2.1. The 10-bar Plane Truss as a Single-Objective Optimization Problem with Constraints

The goal is to find the cross-sectional area of each bar of this truss such that its weight is minimized, subject to stress and displacement constraints. The weight of the truss is given by:

$$F(x) = \sum_{j=1}^{10} \gamma A_j L_j \qquad (2)$$

where $x$ is a candidate solution, $A_j$ is the cross-sectional area of the $j$th member, $L_j$ is the length of member $j$, and $\gamma$ is the volumetric weight of the material.

The maximum allowable displacement for each node (vertical and horizontal) is assumed as 5.08 cm. There are 10 stress constraints and 8 displacement constraints in total. The minimum and maximum allowable values for the cross-sectional areas are 0.5062 cm^2 and 999.0 cm^2, respectively. The remaining assumed data are: Young's modulus E = 7.3 × 10^5 kg/cm^2, maximum allowable stress = 1742.11 kg/cm^2, γ = 7.4239 × 10^-3 kg/cm^3, and a vertical load of -45454.0 kg applied at nodes 2 and 4.

Table 9.44 shows the minimum value found for this problem by different heuristic algorithms 5: GSSA (general stochastic search algorithm with a population size of five, crossover rate of zero, mutation rate 0…10/(number_of_bars), and simulated annealing with α = 1.001), VGA (variable-length genetic algorithm of Rajeev and Krishnamoorthy 21, with a population size of 50), MC (Monte-Carlo annealing algorithm of Elperin 12), SAARSR (Simulated Annealing with Automatic Reduction of Search Range, proposed by Tzan and Pantelides 30), ISA (Iterated Simulated Annealing, of Ackley 1), and SSO (State Space Optimal 15).

We can see in Table 9.44 that ISPAES found better results than any of the other methods. Note that MC found a solution with a lower weight than ISPAES, but such a solution violates stress and displacement constraints, as can be seen in Tables 9.45 and 9.46.

The convergence of the algorithm is shown in Figure 9.10 (for a random run). Note that the algorithm reaches the neighborhood of the optimum at 25,000 fitness function evaluations.



Fig. 9.9. 10-bar plane truss used as the second engineering optimization example

9.4.2.2. The 10-bar Plane Truss as a Multi-Objective Optimization Problem with Constraints

Now we approach the 10-bar truss design as a multiobjective optimization problem with constraints. The first objective is the minimization of the structure weight, and the second is the vertical displacement of node number 2. The Pareto front of these functions is shown in Figure 9.11.

9.4.3. Optimization of a 72-bar 3D Structure

The next problem is the design of the 72-bar 3D structure shown in Figure 9.12, which has been addressed elsewhere in the literature 10.



Table 9.44. Comparison of weights for the 10-bar plane truss of the second engineering example

| Element | ISPAES | GSSA | VGA | MC | SSO | ISA | SAARSR |
| 1 | 190.53 | 205.17 | 206.46 | 200.01 | 193.75 | 269.48 | 201.35 |
| 2 | 0.6466 | 0.6452 | 0.6452 | 0.6452 | 0.6452 | 79.810 | 0.6452 |
| 3 | 146.33 | 134.20 | 151.62 | 129.04 | 150.15 | 178.45 | 161.55 |
| 4 | 95.07 | 90.973 | 103.23 | 90.328 | 98.62 | 152.90 | 95.68 |
| 5 | 0.6452 | 0.6452 | 0.6452 | 0.6452 | 0.6452 | 70.390 | 0.6452 |
| 6 | 3.0166 | 0.6452 | 0.6452 | 0.6452 | 3.23 | 10.260 | 4.19 |
| 7 | 47.677 | 55.487 | 54.84 | 51.616 | 48.18 | 147.87 | 49.16 |
| 8 | 129.826 | 127.75 | 129.04 | 145.17 | 136.64 | 14.710 | 131.55 |
| 9 | 133.282 | 133.56 | 132.27 | 96.78 | 139.47 | 156.06 | 134.32 |
| 10 | 0.6452 | 0.6452 | 0.6452 | 0.6452 | 0.6452 | 87.740 | 0.6452 |
| Vol. (cm3) | 801624.5 | 805777 | 833258 | 765710 | 828956 | 1313131 | 833258 |
| Weight (kg) | 5951 | 6186 | 6186 | 5685 | 6155 | 9750 | 6187 |

Table 9.45. Comparison of stresses for the 10-bar plane truss of the second engineering example. We indicate in boldface the elements in which the stress constraints are being violated

| Element | IS-PAES | GSSA | VGA | MC | SSO | ISA | SAARSR |
| 1 | 483.27 | -447.65 | -444.75 | -460.10 | -475.31 | -209.75 | -476.58 |
| 2 | -73.37 | 0.41 | 3.41 | -15.30 | 91.98 | -111.35 | 43.99 |
| 3 | -613.26 | 670.31 | 593.43 | 695.72 | 597.46 | 449.90 | 569.04 |
| 4 | -478.62 | 499.60 | 440.30 | 503.06 | 461.46 | 239.13 | 485.80 |
| 5 | 1741.30 | -1464.09 | -1428.68 | -1757.16 | -1754.88 | 362.13 | -1641.04 |
| 6 | -15.72 | 0.41 | 3.41 | -15.30 | 18.37 | -866.13 | 14.83 |
| 7 | 1313.54 | -1134.31 | -1148.24 | -1214.48 | -1299.10 | -763.45 | -1311.60 |
| 8 | -507.89 | 513.60 | 508.24 | 453.71 | 482.74 | 1064.60 | 528.83 |
| 9 | 482.80 | -481.25 | -485.97 | -664.00 | -461.46 | -331.34 | -492.79 |
| 10 | 103.985 | 0.58 | -4.82 | 21.64 | -130.07 | 143.23 | -65.61 |

Table 9.46. Comparison of displacements for the 10-bar plane truss of the second engineering example. We indicate in boldface the elements in which the displacement constraints are being violated

| Element | IS-PAES | GSSA | VGA | MC | SSO | ISA | SAARSR |
| 1 | 0.5134 | 0.5602 | 0.5528 | 0.5954 | 0.4802 | 0.4022 | 0.5419 |
| 2 | -5.080 | -5.0798 | -4.9040 | -5.4352 | -4.9056 | -3.8008 | 5.0889 |
| 3 | -1.368 | -1.4654 | -1.2948 | -1.5016 | -1.3264 | -0.8631 | -1.3213 |
| 4 | -5.060 | -5.0792 | -4.8997 | 5.4543 | -4.8826 | -4.8857 | -5.0746 |
| 5 | 0.6053 | 0.5607 | 0.5571 | 0.5763 | 0.5954 | 0.2627 | 0.5970 |
| 6 | -1.878 | -1.8474 | -1.8303 | -1.7130 | -1.8047 | -2.9298 | -1.9303 |
| 7 | -0.768 | -0.8396 | -0.7433 | -0.8715 | -0.7484 | -0.5636 | -0.7129 |
| 8 | -4.059 | -3.6813 | -3.6199 | -3.9140 | -4.0030 | -2.4762 | -3.9901 |

The truss is subject to two distinct loading conditions and has sixteen independent design variables. All nodes are subject to the displacement constraint



Fig. 9.10. Typical convergence of ISPAES for the 10-bar truss problem as a single-objective optimization (Section 9.4.2.1)

Δ < 0.25 inches in the x and y directions. All bars have a stress constraint -1759.25 kg/cm^2 < (σ_a)_i < 1759.25 kg/cm^2, i = 1, 2, …, 72. The minimum size constraint is 0.254 cm^2 ≤ A_i, i = 1, 2, …, 72. The properties of the material are: modulus of elasticity 7.031 × 10^6 kg/cm^2, volumetric weight 2.77 × 10^-3 kg/cm^3. The first loading condition has a point load at node 1 of 2270 kg in the x direction, 2270 kg in the y direction, and -2270 kg in the z direction. The second loading condition has four load points, at nodes 1, 2, 3, and 4, with -2270 kg in the z direction. The problem consists of designing the truss for both loading conditions. In Table 9.47 we give the group description of the truss.

We solved this problem as a single-objective optimization case in both continuous and discrete search spaces.

9.4.3.1. The 72-bar 3D Structure in Continuous Search Space as a Single-Objective Optimization Problem with Constraints

As noted, the design problem is the minimization of the structure weight subject to both loading conditions. We compare ISPAES against several results



Fig. 9.11. Pareto front for the 10-bar truss optimization as a multiobjective optimization problem

Table 9.47. 72-bar 3D cross sections by group

| Group Number | Members |
| 1 | A1-A4 |
| 2 | A5-A12 |
| 3 | A13-A16 |
| 4 | A17-A18 |
| 5 | A19-A22 |
| 6 | A23-A30 |
| 7 | A31-A34 |
| 8 | A35-A36 |
| 9 | A37-A40 |
| 10 | A41-A48 |
| 11 | A49-A52 |
| 12 | A53-A54 |
| 13 | A55-A58 |
| 14 | A59-A66 |
| 15 | A67-A70 |
| 16 | A71-A72 |



Fig. 9.12. Optimization of 72-bar 3D structure

of other authors in Table 9.48; as can be observed, IS-PAES provides the best solution. In Table 9.49 we show basic statistics for 30 runs.

Table 9.48. ISPAES vs results of several authors for the 72-bar 3D structure in continuous search space

| Algorithm | Best Minimum Weight (Kg) |
| IS-PAES | 172.02 |
| Venkayya 31 | 173.06 |
| Gellatly 13 | 179.77 |
| Renwei 24 | 172.36 |
| Schmit 27 | 176.44 |
| Xicheng 32 | 172.90 |
| GAPS 5 | 173.94 |



Table 9.49. ISPAES statistics for the 72-bar 3D structure in continuous search space

| Parameter | Weight (Kg) |
| Best | 172.02 |
| Worst | 172.09 |
| Mean | 172.05 |
| Std. dev. | 0.015 |
| Median | 172.04 |
| Feas. Sol. | 30 |

9.4.3.2. The 72-bar 3D Structure in Discrete Search Space as a Single-Objective Optimization Problem with Constraints

We solved three cases of this problem using the catalog of Altos Hornos de Mexico, S.A. with 65 entries for the cross-sectional areas: 1) stress constraints only; 2) stress and displacement constraints; 3) displacement constraints, also considering bar traction and compression stress, as well as their proper weight. The values of the material properties and constraints remain unchanged for all three cases. Statistics of the best solutions over 30 runs for the three cases are shown in Table 9.50.

Table 9.50. ISPAES solutions to the 72-bar 3D structure using a catalog with 65 entries (discrete search space)

| Parameter | Case 1 (Kg) | Case 2 (Kg) | Case 3 (Kg) |
| Best | 92.3295 | 192.7194 | 630.400 |
| Worst | 92.3295 | 193.4353 | 640.3640 |
| Mean | 92.3295 | 192.9098 | 633.2354 |
| Std. dev. | 0.0 | 0.3060 | 2.7371 |
| Median | 92.3295 | 192.7194 | 632.9665 |
| Feas. Sol. | 30 | 30 | 30 |

9.5. Final Remarks and Future Work

We have introduced the ISPAES evolutionary algorithm, which combines the following four ideas: 1) a constraint-handling mechanism based on multiobjective optimization concepts; 2) a Pareto dominance-based selection operator which promotes diversity and a desired blend including promising and "best infeasible" individuals; 3) a population-driven, and thus self-adaptive, search-space reduction mechanism that directs the search towards potential areas of the space; and 4) an external memory to store the latest Pareto set.



The algorithm in its basic form is used to solve single- and multi-objective problems, in discrete and continuous search spaces. Pareto fronts in these spaces have been computed during the experiments. This double capability of solving both single- and multi-objective optimization problems is not common for an evolutionary algorithm.

ISPAES requires decisions about four parameters (described in Section 9.4): the size of the Pareto set; the shrinkspace rate; the size of the new hypervolume after reduction; and the percentage of remaining "best infeasible" individuals in the Pareto set. These parameters are not hard to set; they are about as easy or difficult to set as the parameters of a standard genetic algorithm. After a few trials the proper combination comes easily, since the parameters are related in a logical manner. Any approximate set of parameters works nicely for the algorithm, so robustness across several kinds of problems is one advantage of ISPAES. Scaling to large problems is of course required, and it has been a major weakness of evolutionary algorithms. Nonetheless, we showed here how ISPAES, using both the real representation inherent to evolution strategies and multi-objective optimization concepts, is able to handle a large number of constraints.

Future work for this algorithm includes the development of a multiparent approach, which should improve diversity and exploration.

Acknowledgments

The authors acknowledge support for this work from CONACyT project No. 40721-Y, Mexico.

References

1. D. Ackley. An empirical study of bit vector function optimization. In Lawrence Davis, editor, Genetic Algorithms and Simulated Annealing, pages 170-271. Morgan Kaufmann Publishers, Los Altos, California, 1987.

2. Arturo Hernandez Aguirre, S. Botello, C. Coello, and G. Lizarraga. Use of Multiobjective Optimization Concepts to Handle Constraints in Single-Objective Optimization. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2003), pages 573-584, Berlin, Germany, July 2003. Springer-Verlag, Lecture Notes in Computer Science No. 2723.

3. Arturo Hernandez Aguirre, S. Botello, G. Lizarraga, and C. Coello. ISPAES: A Constraint-Handling Technique Based on Multiobjective Optimization Concepts. In Proceedings of the 2nd International Conference on Evolutionary Multi-Criterion Optimization (EMO 2003), pages 73-87, Berlin, Germany, April 2003. Springer-Verlag, Lecture Notes in Computer Science No. 2632.

4. Thomas Back. Evolutionary Algorithms in Theory and Practice. Oxford University Press, New York, 1996.

5. Salvador Botello, Jose Luis Marroquin, Eugenio Oñate, and Johan Van Horebeek. Solving Structural Optimization Problems with Genetic Algorithms and Simulated Annealing. International Journal for Numerical Methods in Engineering, 45(8):1069-1084, July 1999.

6. Eduardo Camponogara and Sarosh N. Talukdar. A Genetic Algorithm for Constrained and Multiobjective Optimization. In Jarmo T. Alander, editor, 3rd Nordic Workshop on Genetic Algorithms and Their Applications (3NWGA), pages 49-62, Vaasa, Finland, August 1997. University of Vaasa.

7. Carlos A. Coello Coello and Efren Mezura-Montes. Handling Constraints in Genetic Algorithms Using Dominance-Based Tournaments. In I.C. Parmee, editor, Proceedings of the Fifth International Conference on Adaptive Computing in Design and Manufacture (ACDM 2002), volume 5, pages 273-284, University of Exeter, Devon, UK, April 2002. Springer-Verlag.

8. Carlos A. Coello Coello. Constraint-handling using an evolutionary multiobjective optimization technique. Civil Engineering and Environmental Systems, 17:319-346, 2000.

9. Carlos A. Coello Coello. Treating Constraints as Objectives for Single-Objective Evolutionary Optimization. Engineering Optimization, 32(3):275-308, 2000.

10. Carlos A. Coello Coello and Alan D. Christiansen. Multiobjective optimization of trusses using genetic algorithms. Computers and Structures, 75(6):647-660, May 2000.

11. Carlos A. Coello Coello, David A. Van Veldhuizen, and Gary B. Lamont. Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, New York, May 2002. ISBN 0-3064-6762-3.

12. T. Elperin. Monte-Carlo structural optimization in discrete variables with annealing algorithm. International Journal for Numerical Methods in Engineering, 26:815-821, 1988.

13. R. A. Gellatly and L. Berke. Optimal structural design. Technical Report AFFDL-TR-70-165, Air Force Flight Dynamics Laboratory, 1971.

14. David E. Goldberg. Genetic Algorithms in Search, Optimization and Machine Learning. Addison-Wesley Publishing Company, Reading, Massachusetts, 1989.

15. Edward J. Haug and Jasbir S. Arora. Applied Optimal Design: Mechanical and Structural Systems. Wiley, New York, 1979.

16. F. Jimenez, A.F. Gomez-Skarmeta, and G. Sanchez. How Evolutionary Multiobjective Optimization can be used for Goals and Priorities based Optimization. In E. Alba, F. Fernandez, J.A. Gomez, F. Herrera, J.I. Hidalgo, J. Lanchares, J.J. Merelo, and J.M. Sanchez, editors, Primer Congreso Español de Algoritmos Evolutivos y Bioinspirados (AEB'02), pages 460-465, Mérida, España, 2002. Universidad de Extremadura, España.

17. Joshua D. Knowles and David W. Corne. Approximating the Nondominated Front Using the Pareto Archived Evolution Strategy. Evolutionary Computation, 8(2):149-172, 2000.

18. Efren Mezura-Montes and Carlos A. Coello Coello. A Numerical Comparison of some Multiobjective-based Techniques to Handle Constraints in Genetic Algorithms. Technical Report EVOCINV-03-2002, Evolutionary Computation Group at CINVESTAV-IPN, Mexico, D.F. 07300, September 2002. Available at: http://www.cs.cinvestav.mx/~EVOCINV/.

19. Zbigniew Michalewicz and Marc Schoenauer. Evolutionary Algorithms for Constrained Parameter Optimization Problems. Evolutionary Computation, 4(1):1-32, 1996.

20. I. C. Parmee and G. Purchase. The development of a directed genetic search technique for heavily constrained design spaces. In I. C. Parmee, editor, Adaptive Computing in Engineering Design and Control '94, pages 97-102, Plymouth, UK, 1994. University of Plymouth.

21. S. Rajeev and C.S. Krishnamoorthy. Genetic Algorithms-Based Methodologies for Design Optimization of Trusses. Journal of Structural Engineering, 123(3):350-358, 1997.

22. Tapabrata Ray, Tai Kang, and Seow Kian Chye. An Evolutionary Algorithm for Constrained Optimization. In Darrell Whitley et al., editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2000), pages 771-777, San Francisco, California, 2000. Morgan Kaufmann.

23. Tapabrata Ray and K.M. Liew. A Swarm Metaphor for Multiobjective Design Optimization. Engineering Optimization, 34(2):141-153, March 2002.

24. X. Renwei and L. Peng. Structural optimization based on second order approximations of functions and dual theory. Computer Methods in Applied Mechanics and Engineering, 65:101-104, 1987.

25. Jon T. Richardson, Mark R. Palmer, Gunar Liepins, and Mike Hilliard. Some Guidelines for Genetic Algorithms with Penalty Functions. In J. David Schaffer, editor, Proceedings of the Third International Conference on Genetic Algorithms (ICGA-89), pages 191-197, San Mateo, California, June 1989. George Mason University, Morgan Kaufmann Publishers.

26. J. David Schaffer. Multiple Objective Optimization with Vector Evaluated Genetic Algorithms. In Genetic Algorithms and their Applications: Proceedings of the First International Conference on Genetic Algorithms, pages 93-100. Lawrence Erlbaum, 1985.

27. L.A. Schmit and B. Farshi. Some Approximation Concepts for Structural Synthesis. Journal of the American Institute of Aeronautics and Astronautics, 12:231-233, 1974.

28. Alice E. Smith and David W. Coit. Constraint Handling Techniques—Penalty Functions. In Thomas Back, David B. Fogel, and Zbigniew Michalewicz, editors, Handbook of Evolutionary Computation, chapter C5.2. Oxford University Press and Institute of Physics Publishing, 1997.

29. Patrick D. Surry and Nicholas J. Radcliffe. The COMOGA Method: Constrained Optimisation by Multiobjective Genetic Algorithms. Control and Cybernetics, 26(3):391-412, 1997.

30. S. Tzan and C.P. Pantelides. Annealing strategy for optimal structural design. Journal of Structural Engineering, 122(7):815-827, 1996.

31. V.B. Venkayya. Design of Optimum Structures. Computers & Structures, 1:265-309, 1971.

32. W. Xicheng and M. Guixu. A parallel iterative algorithm for structural optimization. Computer Methods in Applied Mechanics and Engineering, 96:25-32, 1992.


CHAPTER 10

CITY AND REGIONAL PLANNING VIA A MOEA: LESSONS LEARNED

Richard Balling

Department of Civil and Environmental Engineering, Brigham Young University

Provo, Utah, USA
E-mail: [email protected]

The traditional approach to city and regional land-use and transportation planning is described. The traditional approach totally depends on the preferences and past experiences of planners, and does not objectively optimize the very large search space. Furthermore, cities in a metropolitan region often plan independently, neglecting regional goals. The planning problem is also fraught with multiple competing objectives and interests. This chapter describes a multi-year research project to apply a MOEA to city and regional planning in general, and to planning of the Wasatch Front Metropolitan Region in Utah, USA in particular. The problem formulation at both the city level and the regional level is described. The choice of MOEA is explained. Results were presented to professional planners and elected officials. Based on their suggestions, modifications were made both to the problem formulation and the algorithm. Although non-dominated plans generated by the MOEA have not been adopted per se, the results have influenced planning thinking and trends. The obstacles to adopting this radically new way of city planning are described.

10.1. The Traditional Approach

Seventy-six percent of the population of the state of Utah in the United States lives in the Wasatch Front Metropolitan Region (WFMR). The population of the WFMR in 2000 was 1,702,450. The WFMR encompasses Weber, Davis, Salt Lake, and Utah counties and includes Salt Lake City as well as 70 additional cities. The WFMR is approximately 100 miles long in the north-south direction and 20 miles wide in the east-west direction. The



228 Richard Balling

WFMR is bounded along the eastern side by the Wasatch Front mountain range, which rises abruptly from the valley floor. On the west, the WFMR is bounded by the Great Salt Lake and Utah Lake.

The natural beauty and recreational opportunities of the WFMR have been key reasons for the rapid growth that has occurred in recent decades. The population grew 27% in 1990-2000. The WFMR was brought into world view during the Winter Olympic Games of 2002. This has attracted even more growth to the region. Planners project the population to jump by another 41% by the year 2020 1.

In previous decades, unfettered development and sprawl were allowed to occur in the WFMR. However, the growth surge of the past decade has brought anxiety to the residents of the WFMR. Opinion polls have shown the management of growth to be among the top concerns of the people2. The state of Utah is politically very conservative, and the rights of businesses, including developers, are highly esteemed. Nevertheless, most residents feel that growth cannot be allowed to continue unmanaged. In December 1995, the governor convened a three-day Growth Summit that was aired on all three major television stations. Issues were discussed, and ideas were explored. The Growth Summit further heightened public awareness of this complex problem3.

A local citizens' action group, the Coalition for Utah's Future, obtained studies from other metropolitan regions in the western United States that were experiencing high growth. In particular, studies from Denver, Colorado and Portland, Oregon were obtained. Following the approach used in these studies, the group, under a program named Envision Utah, developed four contrasting land use and transportation plans for the region4. These plans were made available to citizens via the local newspapers and public meetings. The public response was mixed. Many felt that the choice of plans was severely limited and somewhat biased, in the sense that one of the plans was made to seem clearly superior5.

This approach to planning is typical of the traditional approach to regional and city planning. In the traditional approach, a handful of plans are developed by planners and presented to decision-makers for selection. The development of the candidate plans is largely subjective and highly dependent on the experience and preferences of the planners. The subjectivity of the process can induce skepticism among elected officials and the citizenry.

Researchers at Brigham Young University have received two grants from the National Science Foundation to study the possibility of increasing objectivity in the land use and transportation planning process through the


City and Regional Planning via a MOEA: Lessons Learned 229

use of formal optimization algorithms. The first grant was devoted to city planning6, and the second grant was devoted to regional planning7. A Multi-Objective Evolutionary Algorithm (MOEA) was selected as the optimization algorithm because it can:

1) objectively search large spaces,
2) handle discrete variables and discontinuous functions,
3) rationally treat multiple competing objectives.

The number of possible land use and transportation plans for a city or region is huge, making the search space very large. The design variables are discrete-valued choices between different land uses and street classifications. The objectives and constraints are evaluated from empirical models that may be discontinuous functions of the design variables. There are several competing objectives in the problem, and it is very difficult, if not impossible, to get stakeholders to agree on the relative importance of each objective.

10.2. The MOEA Approach

Let nsize be the generation size and let ngener be the number of generations. The MOEA begins by randomly generating a starting generation of nsize plans. The objectives and constraints are then evaluated for each plan.

The maximin fitness8 is calculated for each feasible plan (a plan satisfying all the constraints). Let nobj be the number of objectives, and let $f_k^i$ be the scaled value of the $k$th objective for the $i$th plan in the generation. Assuming that all objectives are minimized, and assuming that all plans in the generation are distinct in objective space, plan $j$ dominates plan $i$ if:

$$f_k^i > f_k^j \quad \text{for all } k \text{ from 1 to } nobj \qquad (1)$$

This is equivalent to:

$$\min_{k=1,\ldots,nobj} \left( f_k^i - f_k^j \right) > 0 \qquad (2)$$

The $i$th plan will be dominated if:

$$\max_{j=1,\ldots,nsize,\; j \neq i} \; \min_{k=1,\ldots,nobj} \left( f_k^i - f_k^j \right) \geq 0 \qquad (3)$$

The maximin fitness of the $i$th plan is:

$$\mathrm{fitness}^i = \max_{j=1,\ldots,nsize,\; j \neq i} \; \min_{k=1,\ldots,nobj} \left( f_k^i - f_k^j \right) \qquad (4)$$

The maximin fitness of dominated plans will be greater than or equal to zero, and the maximin fitness of nondominated plans will be less than zero. The maximin fitness is minimized.

The maximin fitness has another important property: it does not treat all nondominated plans equally. Nondominated plans that are widely separated from other plans in objective space will have a more negative (better) maximin fitness than nondominated plans that are clustered in objective space. In the limit as clustering increases, two nondominated plans that have the same values for all objectives will have zero maximin fitness. The maximin fitness was chosen for its simplicity, and for the fact that it rewards both dominance and diversity.

Initially, infeasible plans were deleted from the generation and replaced with newly generated feasible plans. However, we later decided that it would be better to allow infeasible plans in the generation in order to increase genetic diversity in the search process. We assume that all constraints are scaled so that they are satisfied when less than zero and violated when greater than zero. Let maxfeas be the maximum fitness over all feasible plans in the generation. The fitness of an infeasible plan is taken as maxfeas plus the maximum value of all constraints for the plan. Thus, the fitness of any infeasible plan is always greater than the fitnesses of all feasible plans in the generation.
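The fitness assignment described above can be sketched in Python (our own illustration, not the project code; it assumes objectives are already scaled, constraints are scaled so that values at or below zero are satisfied, and degenerate generations with fewer than two feasible plans are simply given zero fitness):

```python
def maximin_fitness(objs):
    """Maximin fitness (minimized).  objs[i][k] is the scaled value of
    objective k for plan i; nondominated plans score below zero,
    dominated plans score at or above zero."""
    n = len(objs)
    if n < 2:                      # degenerate generation (our simplification)
        return [0.0] * n
    nobj = len(objs[0])
    return [max(min(objs[i][k] - objs[j][k] for k in range(nobj))
                for j in range(n) if j != i)
            for i in range(n)]

def assign_fitness(objs, cons):
    """Fitness for a mixed generation.  cons[i] holds the scaled constraint
    values of plan i (satisfied when <= 0).  Infeasible plans get
    maxfeas + (largest violation), so every infeasible plan ranks worse
    than every feasible plan (fitness is minimized)."""
    feas = [i for i in range(len(objs)) if max(cons[i]) <= 0.0]
    fit = [None] * len(objs)
    for i, f in zip(feas, maximin_fitness([objs[i] for i in feas])):
        fit[i] = f
    maxfeas = max((fit[i] for i in feas), default=0.0)
    for i in range(len(objs)):
        if fit[i] is None:         # infeasible plan: penalize past maxfeas
            fit[i] = maxfeas + max(cons[i])
    return fit
```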

After the fitnesses of all plans in the starting generation are evaluated, the starting generation becomes the parent generation, and the processes of tournament selection, single-point crossover, and gene-wise mutation are employed to produce a child generation of nsize plans. We used a tournament size of 3, a crossover probability of 0.7, and a mutation probability of 0.01. The values of the objectives and constraints are then evaluated for all plans in the child generation.
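These three operators, with the stated settings, can be sketched as follows (illustrative code of our own, not the authors' implementation; plans are lists of base-ten genes with at least two genes each):

```python
import random

def tournament(pop, fitness, size=3):
    """Return the plan with the best (lowest) fitness among `size` entrants."""
    entrants = random.sample(range(len(pop)), size)
    return pop[min(entrants, key=lambda i: fitness[i])]

def crossover(mom, dad, pc=0.7):
    """Single-point crossover, applied with probability pc."""
    if random.random() < pc:
        cut = random.randrange(1, len(mom))
        return mom[:cut] + dad[cut:], dad[:cut] + mom[cut:]
    return mom[:], dad[:]

def mutate(plan, pm=0.01, alleles=10):
    """Gene-wise mutation: each base-ten gene is redrawn with probability pm."""
    return [random.randrange(alleles) if random.random() < pm else g
            for g in plan]
```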

Elitism is employed by combining the parent generation and the child generation into a combined generation of 2*nsize plans. The maximin fitness is evaluated for each feasible plan in this combined generation, maxfeas is evaluated as the maximum fitness over all feasible plans, and the fitness of each infeasible plan is evaluated as maxfeas plus the maximum scaled constraint value for the plan. The nsize plans with the lowest fitnesses



from the combined generation become the next parent generation, and the remaining nsize plans are discarded. The processes of selection, crossover, mutation, fitness evaluation, and elitism are repeated using the new parent generation.
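The elitism step reduces to sorting the combined generation by fitness and keeping the best half; a sketch under the same conventions (fitness is minimized; names are our own):

```python
def elitist_survivors(parents, children, fitness, nsize):
    """Keep the nsize lowest-fitness plans out of the combined generation.

    `fitness` lists the already-evaluated fitness of parents + children,
    in that order; lower is better."""
    combined = parents + children
    order = sorted(range(len(combined)), key=lambda i: fitness[i])
    return [combined[i] for i in order[:nsize]]
```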

10.3. City Planning: Provo and Orem

We began by applying the MOEA approach to two adjacent cities in the WFMR, the cities of Provo and Orem. The combined population of both cities in 2002 was 189,490, and the projected combined population in 2025 is 316,200.

We obtained current zoning maps from the planning departments of both cities. The combined cities were divided into 195 zones. The zoning maps specified one of the following 11 land uses for each zone:

FARM farm land
VLDR very low density residential
LDR low density residential
MDR medium density residential
HDR high density residential
CBD central business district
GC general commercial
SC shopping center
LI light industrial
HI heavy industrial
UNIV university

In our optimization problem, we decided to allow all zones to change land use except for the university zones (there are two universities in Provo and Orem). Thus, we assigned one base-ten gene to each non-university zone. The value of this gene for a particular zone specifies the future land use for that zone from among the ten non-university land uses.
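Decoding such a chromosome back into a land-use plan is then a table lookup; a minimal sketch, where the gene-value ordering is our own assumption (the chapter does not state which value maps to which use):

```python
# The ten non-university land uses, indexed by the base-ten gene value
# (the ordering here is our own illustrative choice).
LAND_USES = ["FARM", "VLDR", "LDR", "MDR", "HDR",
             "CBD", "GC", "SC", "LI", "HI"]

def decode_land_use(chromosome):
    """Map each zone's base-ten gene (0-9) to its future land use."""
    return [LAND_USES[g] for g in chromosome]
```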

We also identified 45 major streets in the cities. Each street is currently assigned one of the following classifications in the status quo plan:

C2  2-lane collector
C3  3-lane collector
C4  4-lane collector
C5  5-lane collector


232 Richard Balling

A2  2-lane arterial
A3  3-lane arterial
A4  4-lane arterial
A5  5-lane arterial
A6  6-lane arterial
A7  7-lane arterial
F6  6-lane freeway

Speeds on arterial streets are generally higher than speeds on collector streets, and access is more limited. Speeds on freeways are significantly higher than speeds on arterial streets, and access is significantly more limited. In our optimization problem, we decided to allow all streets to change classification except for the freeways. Thus, we assigned one base-ten gene to each non-freeway street. The value of this gene for a particular street specifies the future classification for that street from among the ten non-freeway classifications.
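The resulting plan encoding can be sketched as a flat integer vector, one gene per changeable zone followed by one gene per changeable street. The decoding helper and the order of the name lists are our own illustrative assumptions:

```python
# Ten non-university land uses and ten non-freeway street classes; each
# base-ten gene indexes into the corresponding list (ordering assumed).
ZONE_USES = ["FARM", "VLDR", "LDR", "MDR", "HDR",
             "CBD", "GC", "SC", "LI", "HI"]
STREET_CLASSES = ["C2", "C3", "C4", "C5", "A2",
                  "A3", "A4", "A5", "A6", "A7"]

def decode(plan, n_zones):
    """Split a flat gene list into zone land uses and street classifications."""
    zones = [ZONE_USES[g] for g in plan[:n_zones]]
    streets = [STREET_CLASSES[g] for g in plan[n_zones:]]
    return zones, streets
```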

Various objectives and constraints were considered during the course of the research project. From the beginning, it was clear that the minimization of traffic congestion had to be included. Initially, a commercial traffic analysis model, MinUTP 9, was used. The model had many capabilities that were not needed, and lacked other capabilities that were needed. We decided to develop our own traffic analysis model. The model analyzes traffic during the peak commute period as well as during the rest of the day. Based on the land uses assigned to the zones for a particular future plan, the model generates the number of trips originating from each zone. These trips include home-to-work trips, home-to-non-work trips, and business trips between workplaces. The destination zone for each trip is then determined via the gravity model, which takes into account the travel time between origin and destination as well as the relative attractiveness of zones as destinations. Trips are then assigned to streets via a multi-path assignment model. As streets reach their capacity, their average speed is lowered. This may cause trips to be re-routed to other streets. The traffic congestion objective is the minimization of the total travel time of all trips in a 24-hour day. The MinUTP program required 105 seconds to analyze a single plan. Our own traffic model required only 10 seconds.
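The gravity-model step can be sketched as follows. The exponential impedance function and the beta parameter are illustrative assumptions (a common choice for gravity models); the chapter does not give the exact functional form used.

```python
import math

def gravity_distribute(origin_trips, attractiveness, travel_time, beta=0.1):
    """Split the trips leaving each origin zone among destination zones in
    proportion to attractiveness[j] * exp(-beta * travel_time[i][j])."""
    n = len(attractiveness)
    trips = []
    for i, t_out in enumerate(origin_trips):
        weights = [attractiveness[j] * math.exp(-beta * travel_time[i][j])
                   for j in range(n)]
        total = sum(weights)
        # Each origin's trips are fully allocated across destinations.
        trips.append([t_out * w / total for w in weights])
    return trips
```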

The second objective that was considered was the minimization of change from the status quo. The change required by a future plan is a measure of its political acceptability. Change is measured in terms of the number of people affected. The change is the sum over the zones of the number of people currently living in the zone times a change factor, plus the sum over the streets of the number of people currently living on the street times a change factor. Change factors range from zero to one based on the degree of change between the status quo land use / street classification and the future planned land use / street classification. The change factor is zero if there is no change in land use / street classification, close to zero for small changes (e.g. a change from VLDR to LDR or from C3 to C4), and close to one for large changes (e.g. a change from VLDR to HI or from C3 to A7).
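A minimal sketch of the change objective, assuming change factors are stored in a nested table keyed by status quo and planned designations (the data layout is our own):

```python
def change_objective(zones, streets, change_factor):
    """Sum of residents affected, weighted by a 0-to-1 change factor between
    status quo and planned designation; zones and streets are treated alike.
    Each entry is a (residents, status_quo, planned) tuple."""
    return sum(people * change_factor[old][new]
               for people, old, new in zones + streets)
```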

Three constraints were considered: a housing constraint, an employment constraint, and a green space constraint. The housing capacity of the status quo zoning plan is 249,035 people based on housing densities in the residential zones. Since the status quo population is 189,490, the build-out rate for Provo and Orem is currently only 76%. Assuming a 97% build-out rate for the year 2025, when the population is projected to be 316,200, the housing constraint on future plans requires a minimum capacity of 316,200/0.97 = 327,000 people. The employment capacity of the status quo zoning plan is 196,188 jobs based on employment densities in the commercial zones. The ratio of housing capacity to employment capacity for the status quo zoning plan is 249,035/196,188 = 1.27 people per job. Assuming this same ratio in the year 2025, the employment constraint on future plans requires a minimum capacity of 327,000/1.27 = 257,600 jobs. Green space is the amount of land zoned as FARM or UNIV. The status quo zoning plan has 5980 acres of green space. The green space constraint on future plans for the year 2025 requires a minimum of 4000 acres.
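The two capacity thresholds follow from simple arithmetic on the figures above (the chapter rounds the housing result up to 327,000 people):

```python
housing_cap_sq = 249_035   # status quo housing capacity (people)
employ_cap_sq = 196_188    # status quo employment capacity (jobs)
pop_2025 = 316_200         # projected 2025 population

# Housing: capacity must cover the projected population at a 97% build-out
# rate; 316,200 / 0.97 is about 326,000, rounded up to 327,000 in the text.
housing_min = pop_2025 / 0.97

# Employment: hold the status quo people-per-job ratio (about 1.27) constant.
ratio = housing_cap_sq / employ_cap_sq
employ_min = 327_000 / ratio   # about 257,600 jobs
```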

During the course of the research project, other objectives and constraints were considered. The minimization of air pollution was considered, but it was concluded that this objective would be nearly equivalent to the minimization of traffic congestion. The minimization of new infrastructure costs was considered, but this was found to be strongly correlated to the minimization of change from the status quo. Constraints on minimum utilities, schools, and emergency services were deemed equivalent to the minimum housing and employment constraints. At one point we tried to split the housing constraint into three constraints corresponding to low-income housing, medium-income housing, and high-income housing. However, it was too difficult to obtain reliable data.

The MOEA was executed for the city of Provo separately, and then for the combined cities of Provo and Orem simultaneously. The benefits of simultaneous planning over separate planning of the cities became evident. Simultaneous planning led to increased capacities on east-west streets connecting the two cities. It also allowed the cities to develop cooperative roles. For example, traffic congestion was reduced and green space was preserved by assigning more high-density residential land to Provo and more commercial land to Orem. Of course, such a plan has serious property tax implications and may not be politically acceptable to both cities.

After the MOEA was executed for the combined cities, the results were presented to the planning departments of both cities. The results consisted of several feasible non-dominated plans. The planners rejected most of the plans as unrealistic. Even though a plan may satisfy the constraints and have low traffic congestion and low overall change, it may prescribe changes to specific zones and streets that the planners knew would be politically unacceptable. It became clear in our conversations that it was unrealistic to leave the search space completely unrestrained, where any zone could be rezoned to any land use and any street could be reclassified to any street classification. Therefore, we narrowed the search space. We went through each zone in both cities with the planners, and they told us the politically acceptable land uses for that zone. There were several zones where the only acceptable land use was the status quo land use. A new land use titled MIX was also added as an acceptable land use for some zones. The MIX land use is residential with some commercial use permitted. The intent of this land use is to promote walkable communities. It was also decided that the only acceptable street classifications for any street were classifications with higher capacity, in terms of vehicles per hour, than the status quo classification. Thus, streets could be upgraded in capacity, but not downgraded. With these restrictions, the search space dropped from 10^237 possible plans to 10^86 possible plans.
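The quoted search space sizes are products of per-gene choice counts. The sketch below assumes 237 genes in the unrestricted encoding (rezonable zones plus reclassifiable streets), each with ten options, which reproduces the 10^237 figure; restricting each gene's options shrinks the product accordingly.

```python
import math

def search_space_size(options_per_gene):
    """Number of distinct plans: the product of the choices open to each gene."""
    return math.prod(options_per_gene)
```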

The MOEA was re-executed with a generation size of nsize = 100 for ngener = 100 generations. The total execution time was 26 hours on a laptop computer with a Pentium III processor. Results from two executions are shown in Figures 10.1 and 10.2. In the first execution, the starting generation consisted of 100 randomly generated plans. In the second execution, the starting generation consisted of 98 randomly generated plans and 2 seeded plans. The first seeded plan was a plan obtained by executing the MOEA with traffic congestion as the single objective, and the second seeded plan was a plan obtained by executing the MOEA with change as the single objective. Note that the seeded starting generation in Figure 10.2 produced a more diverse final generation than the unseeded starting generation in Figure 10.1.


The results were again shared with the planning departments of both cities. Although they were overwhelmed by 100 non-dominated feasible plans in the final generation, they were able to pick out interesting observations. The most glaring observation was that the status quo zoning plan was infeasible. It did not have nearly enough housing or employment capacity to meet the needs of the projected future population.

The feasible plan with the lowest value of change converted many FARM zones to residential and commercial zones in order to meet the housing and employment constraints, and it left the non-FARM zones and the streets unchanged from the status quo. Since few people currently live on FARM land, this kind of change affects the fewest number of people. This is precisely what has happened in recent decades in both cities as fruit orchards and agricultural fields have been gobbled up by development. Thus, it appears that people have subconsciously opted for the minimization of change.

On the other hand, the feasible plan with the lowest value of traffic congestion rezoned many FARM and non-FARM zones throughout both cities, and reclassified virtually all streets to A7. Most of the land use changes involved mixing residential and commercial land use more uniformly throughout the cities in order to shorten trips.

Although planners from both cities did not select a plan from the results as their master city plan, they did regard our study as objective evidence for the observations that emerged. Such evidence is needed for the persuasion of elected officials and citizens.

10.4. Regional Planning: The WFMR

The MOEA was applied to the WFMR as a whole. To do so, we first identified the developable land. Developable land excluded national forest land, lakes, wetlands, and steep land on mountainsides. The developable land was then divided into 343 districts. Districts are much larger in area than the zones in a city. In fact, a single district may represent an entire town or borough. Because districts are larger than city zones, a future plan cannot assign a single land use to an entire district. Instead, a future plan may assign a single "scenario" to a district. A scenario is a set of land use percentages. Clustering analysis was used to identify 17 scenarios currently existing in the WFMR. These 17 scenarios are listed in Table 10.51. The leftmost column of Table 10.51 lists the following land uses:

R1  single family residential


R2  duplex and four-plex residential
R3  multi-family apartment residential
R4  mobile home residential
R5  high density apartment residential
C1  retail commercial
C2  industrial commercial
C3  warehouse commercial
C4  office commercial
AU  airports and universities
AG  agricultural
PA  parks
VA  vacant

Each column in Table 10.51 lists the land use percentages for a particular scenario. Thus, scenarios 1-3 and 14 are primarily open space scenarios, scenarios 4-7 are primarily residential scenarios, scenarios 8-10 are mixed usage scenarios, scenarios 11-15 are primarily commercial scenarios, and scenarios 16-17 are primarily airport and university scenarios. One integer-valued gene was assigned to each district. The integer value from 1 to 17 corresponds to the planned future scenario for the district.

We learned in the Provo-Orem city planning problem that it was unwise to let every district change to any possible scenario without restraint. Table 10.52 indicates the allowed scenario changes. A row in Table 10.52 corresponds to the status quo scenario, and the X's indicate the allowed future scenarios. Note that scenarios 15, 16, and 17 were not allowed to change from the status quo. Scenario 15 is primarily heavy industrial, scenario 16 is primarily university, and scenario 17 is primarily airport. In general, districts were allowed to change to scenarios that were slightly more developed than their status quo scenarios.

We identified 260 inter-district streets in the WFMR. We used the same street classifications that were used in the Provo-Orem city planning problem. Streets that are currently arterial were allowed to change to arterial classifications with an equal or greater number of lanes. Streets that are currently collector were allowed to change to collector or arterial classifications with an equal or greater number of lanes. Streets that are currently freeways were not allowed to change.

We used the same objectives and constraints that were used in the Provo-Orem city planning problem. Specifically, we minimized the total travel time of all trips in a 24-hour day, and we minimized the change from the status quo. The population of the WFMR in 2000 was 1,702,450. The housing constraint required enough housing capacity for the projected 2,401,000 residents for the year 2020. The employment constraint required enough employment capacity for the projected 1,210,000 jobs needed by the year 2020. The green space constraint required future plans to have at least 165,000 acres of green space, which is 20% of the area of the developable land.

We ignored the change objective and executed the genetic algorithm to find the minimum travel time plan (the constraints were included). Then we ignored the travel time objective and executed the genetic algorithm to find the minimum change plan. These two seed plans were added to 98 randomly generated plans to form the starting generation of 100 plans. The genetic algorithm was then executed for 100 generations with both objectives. The execution required four days on a desktop computer with a 1.7 GHz dual processor and 1 gigabyte of RAM. Objective function values for the starting and final generations are plotted in Figure 10.3.

Table 10.53 gives numerical results for four selected plans: the status quo plan, the minimum change plan, the minimum travel time plan, and a compromise plan selected from the final generation. Note that the status quo plan does not satisfy the minimum constraints for housing and employment, while the other three plans do. Note that the travel time for the minimum change plan is more than double the travel time for the minimum travel time plan. Note that the change for the minimum travel time plan is more than 18 times greater than the change for the minimum change plan. Finally, note that the compromise plan represents a good compromise between travel time and change.

As expected, the minimum change plan did not change any of the streets, and the minimum travel time plan reclassified all streets to A7. The land use of the plans is more interesting. Figure 10.4 shows land use maps for the status quo, minimum change, and minimum travel time plans. The land use is divided into four maps for each plan. The first map shows the districts that are predominantly open space, the second map shows the districts that are predominantly residential, the third map shows the districts that are predominantly commercial, and the fourth map shows the districts that are mixed residential/commercial usage. All three plans have about the same amount of predominantly commercial land. The minimum change and minimum travel time plans have significantly less open space than the status quo. The minimum change plan converted open space land to predominantly residential land, and the minimum travel time plan converted both open space and residential land to mixed usage land. These observations are similar to what was observed in the Provo-Orem city planning problem.

The approach and the results were presented to: 1) planners from the Utah Governor's Office of Planning and Budget, 2) planners from Envision Utah, 3) mayors and officials serving on the Utah Quality Growth Commission, 4) mayors serving on the Mountainlands Association of Governments, and 5) planners from the Wasatch Front Regional Growth Commission. All of these people found the work interesting and relevant. However, none were anxious to select one of the non-dominated plans produced by the work right away. We believe there are two reasons for this. First, we believe they were intimidated by the fact that this approach is so radically different from the traditional approach. Second, we believe that the number of non-dominated plans produced is overwhelming. Further work is needed to reduce the non-dominated plans down to a handful of plans that decision-makers can assimilate. This reduction process must be done objectively rather than subjectively. The resulting handful of plans must represent distinct conceptual ideas. Finally, many of these groups expressed the need to include mass transit in our modeling.

10.5. Coordinating Regional and City Planning

One of the problems we wanted to address in our research project was the issue of coordinating planning at the regional and city levels. Regional planning should not attempt to micromanage city planning by taking over the development of zoning plans and street plans for each city in the region. Cities must be given autonomy to develop their own zoning and street plans. On the other hand, if cities plan independently from one another and ignore regional planning altogether, then regional goals will not be achieved, resulting in a chaotic and inefficient situation.

A proper balance between regional and city planning must be sought. The regional planning approach we have described thus far is consistent with this balance. At the regional level, scenarios are selected for each district, and street classifications are selected for each inter-district street. Inter-district streets are major streets that run through multiple districts, as opposed to intra-district streets that lie within a district. Remember that districts are fairly large in size and may represent an entire town or borough. After a regional plan is selected, the results are sent down to the cities. The cities then subdivide the districts into zones and determine the land use for each zone. The inter-district streets are held fixed to the classifications specified by the regional plan. The cities are allowed to determine the classifications of the intra-district streets.

The objectives and constraints used at the city planning level need not be the same as the objectives and constraints used at the regional planning level. However, the cities must include the minimization of land use deviation among their objective functions. Land use deviation is a measure of the mismatch between the zoning plan of the city and the scenarios specified for each district by the region. Recall that scenarios specify the percentages of each land use in each district. For a particular zoning plan, one can determine the actual percentages of each land use in each district. The land use deviation is the sum over the districts of the differences between specified and actual percentages of land use multiplied by the area of the district. With this objective function, cities try to match the regional land use scenarios passed down to them.
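The land use deviation objective can be sketched directly from this definition; the district data layout (area plus two percentage dictionaries) and the use of absolute differences are our own illustrative assumptions:

```python
def land_use_deviation(districts):
    """Each district is (area, specified_pct, actual_pct), the last two being
    dicts mapping land use -> percentage. The deviation sums, over districts,
    the absolute percentage mismatches weighted by district area."""
    return sum(area * sum(abs(spec[use] - actual[use]) for use in spec)
               for area, spec, actual in districts)
```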

We demonstrated this approach to regional and city coordination by selecting the compromise plan in Table 10.53 as the regional plan and passing the results down to the cities of Provo and Orem. There are 35 districts in Provo and Orem, which are divided into the same 195 zones as before. We re-executed the MOEA for Provo-Orem with the same objectives and constraints as before, but with the addition of land use deviation as the third objective. Table 10.54 matches the land uses from the city level to the land uses from the regional level. Note that each row of the table adds up to 100%. Twenty-seven of the 45 major streets are inter-district streets, while the rest are intra-district streets. The inter-district streets were fixed to the classifications specified in the regional plan. The MOEA was executed for 100 generations with 100 plans in each generation. The starting generation was again seeded with plans where each of the three objectives was minimized individually. The MOEA produced a non-dominated set of plans for the decision-makers. We noted that the land use deviation could not be reduced to zero because the discreteness of the zones and the limitations on allowable land uses made it impossible to match the regional percentages exactly.

10.6. Conclusions

An MOEA was developed and applied to city and regional planning. At the city level, the MOEA determined land uses for each zone and classifications for each street. At the regional level, the MOEA determined land use percentages for each district and classifications for each inter-district street.


At both levels, traffic congestion and change from the status quo were minimized while constraints on housing capacity, employment capacity, and green space capacity were satisfied. Coordination between city and regional levels was demonstrated by re-executing the MOEA at the city level with a third objective, which minimized land use deviation from the regional plan. Inter-district streets were also fixed to the classifications specified by the regional plan, but the city was permitted to determine the classifications of intra-district streets.

The MOEA was executed for the Wasatch Front Metropolitan Region (WFMR) in the state of Utah in the USA. It was also executed for the cities of Provo and Orem in the WFMR. Results were presented to both planners and elected officials. After discussions with these people, the MOEA approach was modified and re-executed. Because the MOEA approach is so radically different from the traditional approach to planning, and because it produced an overwhelming number of non-dominated plans, the planners and elected officials were reluctant to select one of the plans produced by the MOEA approach. Nevertheless, they recognized the objectivity of the MOEA approach, and they were able to utilize the planning trends and ideas produced by the MOEA approach. Many of these people view the MOEA approach as the future way to plan, and encourage its further development.

Acknowledgments

This work was funded by the USA National Science Foundation underGrant CMS-9817690, for which the author is grateful.

References

1. Utah Governor's Office of Planning and Budget, http://www.governor.utah.gov/dea/LongTermProjections.html.
2. Growth Summit Survey Results, Dan Jones and Associates, Salt Lake City, Utah, 1995.
3. Fouhy, E., "Utah Growth Summit Attracts Half-Million Citizens", Civic Catalyst Newsletter, Winter, 1996.
4. Envision Utah Quality Growth Strategy, Envision Utah, Salt Lake City, Utah, November, 1999.
5. Simmons, D.R., Simmons, R.T., and Staley, S.R., Growth Issues in Utah: Facts, Fallacies, and Recommendations for Quality Growth, The Sutherland Institute, Salt Lake City, Utah, October, 1999.
6. Balling, R.J., Taber, J.T., Day, K., and Wilson, S., "Land-Use and Transportation Planning for Twin Cities Using a Genetic Algorithm", Transportation Research Record, Vol. 1722, pp. 67-74, December, 2000.
7. Balling, R.J., Lowry, M., and Saito, M., "Regional Land-Use and Transportation Planning Using a Genetic Algorithm", Transportation Research Board Annual Meeting, Washington, DC, January 12-16, 2003.
8. Balling, R.J., "The Maximin Fitness Function; Multi-Objective City and Regional Planning", Second International Conference on Evolutionary Multi-Criterion Optimization, Faro, Portugal, April 8-11, 2003.
9. MinUTP, Comsis Corporation, Silver Springs, Maryland, 1994.

Table 10.51. Land Use Percentages for Each Scenario.

Scenario      1     2     3     4     5     6     7     8     9
R1         11.3  34.0  41.0  47.3  86.1  68.2  41.7  48.7  57.4
R2          0.2   0.3   0.7   0.7   2.0   1.8  46.4   2.9   0.6
R3          0.1   0.0   0.2   0.0   0.8   1.0   0.0   4.8   2.4
R4            ?   0.0   0.7   0.0   0.1   0.5   0.0   0.9   0.0
R5          0.0   0.0   0.0  13.9   0.0   0.1   0.0   0.0   0.0
C1          1.7   0.0   3.1   0.0   4.3   3.6   2.8  11.9  28.7
C2            ?   0.0   2.4   0.0   1.0   0.8   0.0   4.0     ?
C3          0.2   0.0   0.5   0.0   0.1   0.2   0.3   0.3   0.0
C4          0.1   0.0   0.1   0.0   0.2   0.3   0.2   0.5   0.0
AU          2.3   0.1   4.1   0.0   0.7   9.5   0.1   8.5     ?
AG         78.4   0.0  39.5  36.2   2.0   7.0   8.6   3.1   1.2
PA          0.2  64.6   1.0   0.0   0.4   1.9   0.0   7.8   0.6
VA          3.5   0.9   4.7   1.9   2.2   4.2   0.0   6.1   5.1

Scenario     10    11    12    13    14    15    16    17
R1         10.6  12.9  10.8  21.3   4.2   3.1  21.1  10.8
R2          0.0   1.9   0.0   1.0   0.3   0.9   0.8   0.0
R3         38.4   2.2   1.1   2.3   0.2   0.0   8.8   1.1
R4          0.1   0.0   0.6   1.4   0.1   0.2   0.0   0.4
R5          0.0   0.0   0.0   0.1   0.0   0.0   0.0   0.0
C1         25.4  31.6  65.6   7.9   5.0   1.9   3.5   3.2
C2          5.7  22.2   5.0   7.9  17.3  77.2   0.0   0.6
C3          0.0   2.0   0.0   0.4   0.9   0.4   0.0   0.0
C4          0.1   0.8   0.0   0.5   0.1   0.1   9.9   0.0
AU          0.0   7.6   2.9  24.5   4.1   0.3  47.7  76.1
AG          0.0   0.3   4.9   4.4   9.3  15.5   0.2   1.8
PA          0.0   1.7   0.0   2.4   5.7   0.0   4.6   2.4
VA         19.8  14.6   9.1  24.4  52.7   0.1   1.1   3.3

("?" marks entries that are not legible in this extraction.)


Table 10.52. Allowed Scenario Changes.

(Rows are status quo scenarios 1-17, columns are future scenarios 1-17, and an X marks an allowed change; the individual X entries are not legible in this extraction. Scenarios 15, 16, and 17 may only retain their status quo scenario.)

Table 10.53. Data for Four Selected Regional Plans.

                               status quo   min. change   min. travel time   compromise
change in persons affected              0        59,934          1,119,385      273,753
travel time in hours            1,349,617     2,025,681            984,436    1,493,006
housing capacity in people      1,742,914     2,401,937          2,401,360    2,410,032
employment capacity in jobs       995,293     1,210,048          1,466,150    1,376,804
green space in acres              349,583       248,541            247,840      228,256

Table 10.54. Regional and City Land Use Match.

        R1   R2   R3   R4   R5    C1    C2    C3    C4    AU   AG   PA
FARM     0    0    0    0    0     0     0     0     0     0   90   10
VLDR    95    0    0    0    0     0     0     0     0     0    0    5
LDR     80   10    0   10    0     0     0     0     0     0    0    0
MDR     20   10   70    0    0     0     0     0     0     0    0    0
HDR      0   10   20    0   70     0     0     0     0     0    0    0
CBD      0    0    0    0    0    30   2.5   2.5    50    15    0    0
SC       0    0    0    0    0    75     0     0    25     0    0    0
GC       0    0    0    0    0    35    10    10    35    10    0    0
LI       0    0    0    0    0    10    60    30     0     0    0    0
HI       0    0    0    0    0     0    95     0     5     0    0    0
MIX      0   10   20    0 27.5  22.5     0     0  17.5   2.5    0    0
UNIV     0    0    0    0    0     0     0     0     0   100    0    0


Fig. 10.1. Provo-Orem Results with Random Starting Generation.

Fig. 10.2. Provo-Orem Results with Seeded Starting Generation.


Fig. 10.3. WFMR Results with Seeded Starting Generation.


Fig. 10.4. WFMR Land Uses for Three Plans.


CHAPTER 11

A MULTI-OBJECTIVE EVOLUTIONARY ALGORITHM
FOR THE COVERING TOUR PROBLEM

Nicolas Jozefowiez
Laboratoire d'Informatique Fondamentale de Lille,
Université de Lille, France
E-mail: [email protected]

Frédéric Semet
Laboratoire d'Automatique, de Mécanique et d'Informatique Industrielles et Humaines,
Université de Valenciennes, France
E-mail: frederic.semet@univ-valenciennes.fr

El-Ghazali Talbi
Laboratoire d'Informatique Fondamentale de Lille,
Université de Lille, France
E-mail: [email protected]

In this chapter, we present a multi-objective evolutionary algorithm for the bi-objective covering tour problem, which is a generalization of the single-objective covering tour problem. In the latter, the objective is to find a minimal length tour on a set of vertices V so that every vertex in a set W lies within a given distance c of a visited node. In the bi-objective covering tour problem, the parameter c is omitted and replaced by an objective. Our evolutionary algorithm employs specially designed genetic operators, and exploits special features of the problem to improve its efficiency. The evolutionary algorithm is compared with an exact algorithm on generated benchmarks.
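The coverage condition, and the objective that replaces the parameter c in the bi-objective variant, can be stated in a few lines; the distance-table layout here is an illustrative assumption:

```python
def covers(tour, W, dist, c):
    """True iff every vertex in W lies within distance c of some visited node.
    dist[w][v] gives the distance from w to tour vertex v."""
    return all(min(dist[w][v] for v in tour) <= c for w in W)

def cover_distance(tour, W, dist):
    """The bi-objective variant minimizes this instead of fixing c: the
    largest distance from any vertex in W to its nearest tour vertex."""
    return max(min(dist[w][v] for v in tour) for w in W)
```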



248 N. Jozefowiez, F. Semet and E-G. Talbi

11.1. Introduction

Problems related to vehicle routing are among the most studied combinatorial optimization problems. They deal with the need to identify a tour, or a set of tours, on a set of nodes, or a set of arcs, while taking resources into account. Famous examples of this class of problems are the well-known traveling salesman problem (TSP) and the vehicle routing problem (VRP) 32 when the routing is realized on the nodes, or the arc routing problem (ARP) when it is done on the arcs.

Academic routing problems often need adaptations for practical applications. These adaptations usually take the form of new constraints incorporated into the model of the problem. For instance, the basic version of the VRP deals with the construction of a minimum length collection of routes for a fleet of vehicles among customers, who demand goods to be delivered from a depot, so that the demands on a route do not exceed the vehicle capacity. To consider practical aspects, several variants of this problem, including additional constraints, have been proposed 32, such as the VRP with time windows. In this case, each customer must be served during its time window.

Another strategy to improve the practical aspect of a problem is to consider several objectives. In the last decade, a growing number of studies have explored this opportunity in different application areas. In the case of routing problems, the objectives can be classified according to three aspects: the route (work, profit, makespan, balance ...), the nodes or arcs (time windows, customer service ...), and the resources (vehicles, products ...). Tables 11.55 and 11.56 present different papers devoted to multi-objective routing problems with their objectives classified according to these three categories.

The problems presented in Tables 11.55 and 11.56 are clearly stated and solved as multi-objective problems by the authors. However, according to Boffey 4, some single-objective routing problems are used as surrogates for multi-objective routing problems. In those problems, either only one objective is solved while the others appear as constraints, or the different objectives are combined into a single objective, for example by means of a weighted aggregating method. Feillet et al. 10 survey the class of problems called traveling salesman problems with profit, in which all the customers may not be visited, but where a profit is associated with each customer, and can be collected by visiting it. Clearly, two opposite objectives can be defined:

Table 11.55. Multi-objective routing problems in the literature (part 1).

Authors | Problem | Method | Route | Nodes | Resources
Keller 18; Keller and Goodchild 19 | TSP with profit | Lexicographic method, heuristic | Min. tour length | Max. profit | -
Park and Koelling 27,28 | VRP | Weighted goal programming | Min. total travel distance; min. deterioration of goods | Max. fulfillment of urgent demand; max. conditional dependencies of stations | -
Sutcliffe and Board 31 | VRP | Constrained MOLP | Min. total distance; max. equalization of vehicle trip times | - | Max. capacity utilization
Lee and Ueng 22 | VRP | Heuristic | Min. total distance; balance workload (length) | - | -
Sessomboon et al. 30 | VRPTW | MOEA | Min. total distance; min. vehicle waiting time | Max. customer satisfaction | Min. number of vehicles
Hong and Park 16 | VRPTW | Heuristic, goal programming | Min. total travel time | Min. total customer waiting time | -
El-Sherbeny 9 | VRPTW for a Belgian firm | Multiobjective simulated annealing | Min. total duration; min. waiting time; balancing (length); max. flexibility; min. working time not used | - | Min. number of trucks; min. number of covered trucks; min. number of uncovered trucks

Table 11.56. Multi-objective routing problems in the literature (part 2).

Authors | Problem | Method | Route | Nodes | Resources
Geiger 11 | VRPTW | MOEA | Min. total distance | Min. time window violation; min. number of violations | Min. number of vehicles
Rahoual et al. | VRPTW | MOEA | Min. total distance | Min. number of violated constraints | Min. number of vehicles
Ribeiro and Lourenço 29 | Periodic VRP | Iterated local search | Min. total distance; balancing (nb. of visited customers) | Marketing objective | -
Corberán et al. 6 | School bus routing | Scatter search | Min. makespan | - | Min. number of vehicles
Jozefowiez et al. 17 | VRP | MOEA | Min. total distance; balancing (length) | - | -
Barán and Schaerer 2 | VRPTW | Ant colony system | Min. total traveling times; min. total delivery times | - | Min. number of vehicles
Lacomme et al. 21 | CARP | MOEA | Min. total length; min. makespan | - | -
Pacheco and Martí 25 | School bus routing | Tabu search | Min. makespan | - | Min. number of vehicles
Paquete and Stützle 26 | MOTSP | Local search | Min. total lengths | - | -
Yan et al. 34 | MOTSP | MOEA | Min. total lengths | - | -
This study | Covering tour problem | ε-constraint method; MOEA | Min. length | Min. cover | -

A Multi-Objective Evolutionary Algorithm for the Covering Tour Problem 251

(1) Collect the maximum profit, and thus visit the maximum number of customers, which increases the length of the tour.

(2) Minimize the travel length by excluding customers, and thus minimize the profit generated by those customers.

When the objective is the maximization of profit while restricting the length of the route, the problem is called the selective traveling salesman problem. On the other hand, when the objective is the minimization of the length while ensuring a minimal profit, the problem is called the quota traveling salesman problem. However, when it is considered from a bi-objective point of view, a unique problem is defined. The only attempt to solve the bi-objective problem, by means of a lexicographic method, was made by Keller 18 and by Keller and Goodchild 19.

In the present chapter, we are interested in another problem identified by Boffey 4 as an implicit multi-objective routing problem: the covering tour problem (CTP) 13. We define a multi-objective evolutionary algorithm (MOEA) for the bi-objective model of this problem. MOEAs have generated great interest among researchers for solving multi-objective problems (MOPs), since they possess useful features such as working on a population of solutions, which is helpful when the solution of a MOP is itself a set of solutions. Several works and applications 5, including studies on multi-objective routing problems 30,11,17, use MOEAs.

This chapter is organized as follows. Section 11.2 presents the CTP, its bi-objective generalization, as well as a heuristic and an exact method from the literature 13 used in our meta-heuristic and for the experiments. Section 11.3 introduces our MOEA for the CTP. In Section 11.4, we assess the efficiency of the MOEA on generated benchmarks. Conclusions are drawn in Section 11.5.

11.2. The Covering Tour Problem

11.2.1. The Mono-Objective Covering Tour Problem

The covering tour problem (CTP) is defined as follows. Let G = (V ∪ W, E) be an undirected graph, where V ∪ W is the vertex set, and E = {(v_i, v_j) | v_i, v_j ∈ V ∪ W, i < j} is the edge set. Vertex v_1 is a depot, V is the set of vertices that can be visited, T ⊆ V is the set of vertices that must be visited (v_1 ∈ T), and W is the set of vertices that must be covered. A distance matrix C = (c_ij) satisfying the triangle inequality is defined on E. The CTP consists in determining a minimum length tour, or Hamiltonian cycle, over a subset of V so that the tour contains all vertices of T, and


252 N. Jozefowiez, F. Semet and E-G. Talbi

every vertex of W is covered by the tour, i.e., it lies within a distance c of a vertex of the tour. Such a tour may not always exist.

A generic application of the CTP is the design of a tour on a network where the vertices of V represent points that can be reached by a vehicle, and all the points not on the route are easily reachable from it 7. An example is the selection of locations for post boxes among a set of candidate sites, so that all users are located within a reasonable distance of a post box and the cost of a collection route is minimized 20. Another application, proposed by Hodgson et al. 15, is the provision of adequate primary health care in developing countries. In this study, a mobile medical facility cannot reach all the villages, and hence the whole population. Therefore, the goal is to determine a minimal length tour for the mobile facility along the practicable roads, while ensuring that the maximal distance a patient, who often walks, must travel to the nearest visited point does not exceed a given length. This problem was applied to real data from the Suhum district in Ghana.

Few papers on the CTP can be found in the literature. Gendreau et al. 13 proposed a model, a heuristic, and a branch-and-cut algorithm. The branch-and-cut algorithm was used in the study mentioned above on the routing of a mobile medical facility in Ghana 15. Another model and three scatter search algorithms were proposed by Maniezzo et al. 23. Hachicha et al. 14 studied an extension of the CTP, the m-CTP, where m tours must be defined so that every vertex of V belongs to at most one tour, and the length of a tour does not exceed a given bound. The authors defined three heuristics for the m-CTP and applied them to generated instances and to the Suhum district data. Finally, Motta et al. 24 proposed a GRASP meta-heuristic for a generalized covering tour problem in which the nodes of W can also be visited.

11.2.2. The Bi-Objective Covering Tour Problem

In the present chapter, we are interested in the solution of the CTP not as described above, but as a bi-objective problem. The bi-objective covering tour problem (BOCTP) corresponds to the CTP in which the constraints imposing that, for every node w ∈ W, there exists at least one visited node v such that c_vw is smaller than c, have been removed and replaced by an objective. All the other constraints are maintained. The objectives of the BOCTP are the minimization of:

(1) the tour length;
(2) the cover.


We define the cover of a solution as the greatest distance between a node w ∈ W and the nearest visited node v ∈ V from w.
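Computed directly from the distance matrix, the cover is an inner minimum followed by an outer maximum. A minimal sketch; the node labels and distances of the toy instance are invented for illustration:

```python
def cover(visited, W, c):
    """Cover of a solution: the greatest distance from a node w of W
    to the visited node of V nearest to it (the second BOCTP objective)."""
    return max(min(c[v][w] for v in visited) for w in W)

# Invented toy instance: nodes 0-2 can be visited, "a" and "b" must be covered.
c = {0: {"a": 4.0, "b": 9.0},
     1: {"a": 7.0, "b": 2.0},
     2: {"a": 5.0, "b": 5.0}}
print(cover([0, 1], ["a", "b"], c))  # nearest distances are 4.0 and 2.0 -> 4.0
```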

We will now present a criterion to compute the covers that correspond to feasible solutions. This criterion will be used later in our MOEA. From the definition of the cover, for every couple (v, w) ∈ V × W, c_vw is a candidate cover. However, not every candidate cover corresponds to a feasible solution. To evaluate the feasibility of a cover, we use the following criterion. Given v ∈ V and w ∈ W, we have:

c_vw is a feasible cover ⟺ (1) ∀v' ∈ T, v' ≠ v: c_v'w ≥ c_vw, and
(2) ∀w' ∈ W, w' ≠ w: ∃v' ∈ V such that c_v'w' ≤ c_vw ≤ c_v'w.

The validity of this criterion is immediate: the forward implication follows from the definition of the cover, and, in the other direction, following the rules is equivalent to building a solution.
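The criterion can be transcribed almost verbatim into code. A sketch over an invented toy instance (the sets and distances are ours, not from the chapter):

```python
def is_feasible_cover(v, w, V, W, T, c):
    """c[v][w] is a feasible cover iff (1) no mandatory vertex of T lies
    closer to w than v does, and (2) every other node w' of W can be
    covered within c[v][w] by some v' that is no closer to w than c[v][w]."""
    cvw = c[v][w]
    # Condition (1): mandatory vertices must not spoil w's cover.
    if any(vp != v and c[vp][w] < cvw for vp in T):
        return False
    # Condition (2): every other w' must be coverable within cvw.
    return all(any(c[vp][wp] <= cvw <= c[vp][w] for vp in V)
               for wp in W if wp != w)

# Invented toy instance.
V, W, T = [0, 1], ["a", "b"], [0]
c = {0: {"a": 4.0, "b": 9.0}, 1: {"a": 7.0, "b": 2.0}}
print(is_feasible_cover(0, "a", V, W, T, c))  # True: visiting {0, 1} yields cover 4.0
```

The candidate cover c[1]["b"] = 2.0 fails the check, since node "a" cannot then be covered within 2.0 by any vertex.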

11.2.3. Optimization Methods

11.2.3.1. A Heuristic Method

In this paragraph, we describe the heuristic designed by Gendreau et al. 13, which is used in our meta-heuristic. This heuristic combines the GENIUS heuristic for the TSP 12 with the PRIMAL1 heuristic by Balas and Ho 1 for the set covering problem (SCP).

The PRIMAL1 heuristic gradually includes nodes v in the solution according to a greedy criterion, in order to minimize a function f(c_v, b_v), where, at each step, c_v is the cost of including node v ∈ V in the solution, and b_v is the number of nodes of W covered by v. c_v is given by the value of the minimum spanning tree built upon the edges defined by the vertices present in the solution and v. The minimum spanning tree is built using Prim's algorithm. The following three functions suggested by Balas and Ho are used:

(1) f(c_v, b_v) = c_v / b_v
(2) f(c_v, b_v) = c_v / log2(b_v)
(3) f(c_v, b_v) = c_v

PRIMAL1 first applies criterion 1 in a greedy fashion until all the vertices of W are covered. Then, the nodes which cover an over-covered node of W are removed from the solution. After that, the solution is completed using criterion 2, and nodes which cover over-covered nodes of W are removed.


The process is iterated with criterion 3. A second solution is constructed by applying the criteria, this time in the order 1, 3, 2. The best of these two solutions is retained.

The following heuristic is run twice, as in PRIMAL1, with the two sequences of criteria.

STEP 1 (Initialization). Set H ← T, z̄ = ∞. The current covering criterion is 1.
STEP 2 (Termination rules). If at least one vertex of W is not covered by a vertex of H, go to STEP 3. Otherwise, construct a Hamiltonian cycle over all the vertices of H using GENIUS. Let z be the length of the cycle. If z < z̄, then z̄ ← z and H̄ ← H. If the current covering criterion is the last one, stop: the best solution is given by the tour on H̄, and its cost is z̄. Otherwise, remove the vertices of H associated with over-covered nodes of W and consider the next covering criterion.
STEP 3 (Vertex selection). Compute, for every v ∈ V \ H, the coefficients c_v and b_v with respect to the current set H. Determine the best vertex v* to include in H according to the current covering criterion. Set H ← H ∪ {v*}. Go to STEP 2.
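The vertex-selection step can be sketched as follows; `mst_cost` stands in for the Prim computation and `criterion` for the chosen f(c_v, b_v), both supplied by the caller (all names are ours, not the chapter's):

```python
def select_vertex(H, V, W, c, radius, mst_cost, criterion):
    """STEP 3 sketch: pick v in V \\ H minimizing criterion(c_v, b_v),
    where c_v is the MST value over H + [v] and b_v is the number of
    still-uncovered nodes of W that v would cover within `radius`."""
    uncovered = [w for w in W if all(c[h][w] > radius for h in H)]
    best, best_val = None, float("inf")
    for v in V:
        if v in H:
            continue
        bv = sum(1 for w in uncovered if c[v][w] <= radius)
        if bv == 0:
            continue  # v covers nothing new at this radius
        val = criterion(mst_cost(H + [v]), bv)
        if val < best_val:
            best, best_val = v, val
    return best

# Toy run with a dummy MST cost and the cost-per-covered-node criterion.
c = {0: {"a": 5, "b": 5}, 1: {"a": 1, "b": 5}, 2: {"a": 1, "b": 1}}
pick = select_vertex([0], [0, 1, 2], ["a", "b"], c, 3,
                     mst_cost=lambda nodes: float(len(nodes)),
                     criterion=lambda cv, bv: cv / bv)
print(pick)  # 2: it covers both "a" and "b" within the radius
```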

11.2.3.2. An Exact Method

We propose a multi-objective exact method based on the mono-objective branch-and-cut algorithm for the CTP developed by Gendreau et al. 13. It is able to solve instances where the size of V is up to 100 and the size of W up to 500. We employ the branch-and-cut algorithm in an ε-constraint strategy to generate the optimal Pareto set of the BOCTP. It is easy to design the ε-constraint method, since the problem is usually solved with the cover treated as a constraint, and since we are able to compute all the possible cover values by means of the criterion described earlier. The method is as follows:

STEP 1 Compute the feasible covers using the previous criterion, and sort them in decreasing order in a list l = (c_0, c_1, ..., c_k). Set current_cover ← c_0.
STEP 2 Solve the CTP with current_cover as a parameter using the branch-and-cut algorithm.
STEP 3 Compute the cover c_s of the solution s provided by the branch-and-cut algorithm. Save s as a solution of the optimal Pareto set. Search for c_i ∈ l such that c_i = max_{c_j ∈ l} (c_j < c_s). If such a c_i exists, then set current_cover ← c_i and go to STEP 2; otherwise stop.
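The loop structure of this ε-constraint method can be sketched as follows, with `solve_ctp` standing in for the branch-and-cut solver and `cover_of` for the cover computation (both mocked here; the names are ours):

```python
def epsilon_constraint(feasible_covers, solve_ctp, cover_of):
    """Generate the optimal Pareto set by repeatedly solving the CTP
    with a decreasing bound on the cover (STEPs 1-3 of the text)."""
    covers = sorted(feasible_covers, reverse=True)
    current, pareto = covers[0], []
    while True:
        s = solve_ctp(current)      # min-length tour with cover <= current
        pareto.append(s)
        tighter = [ci for ci in covers if ci < cover_of(s)]
        if not tighter:
            return pareto
        current = max(tighter)      # largest feasible cover below c_s

# Mocked solver: maps the imposed bound to the cover actually achieved.
achieved = {10: 8, 5: 3}
result = epsilon_constraint([10, 8, 5, 3],
                            solve_ctp=lambda b: {"cover": achieved[b]},
                            cover_of=lambda s: s["cover"])
print([s["cover"] for s in result])  # [8, 3]
```

Note how skipping directly below the achieved cover c_s (rather than to the next value in the list) avoids re-solving bounds that would return the same solution.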

11.3. A Multi-Objective Evolutionary Algorithm for the Bi-Objective Covering Tour Problem

This section is organized as follows. In Subsection 11.3.1, we present the general framework of our MOEA. In Subsection 11.3.2, we explain the choices we made for the encoding of the solutions. Finally, the genetic operators are described in Subsection 11.3.3.

11.3.1. General Framework

Our approach is based on a steady-state variant of NSGA-II 8. The main mechanisms and features of our MOEA are:

(1) The ranking function is the same as in NSGA-II. The population is sorted into different non-domination levels. The non-dominated individuals obtain rank 1 and form the subset E_1. Rank k is given to the solutions dominated only by individuals belonging to the subset E_1 ∪ E_2 ∪ ... ∪ E_{k-1}. A fitness equal to its rank (1 is the best level) is then assigned to each solution. Since we work on a bi-objective problem, this phase can be done efficiently in O(n log n).

(2) A crowding distance metric is used to provide diversity during the search. This metric gives an estimate of the density of the solutions surrounding a solution i in the population. This estimate is expressed by an approximation of the perimeter of the cuboid formed by the nearest neighbors of i.

(3) We added an archive whose purpose is to store the non-dominated solutions as they are found. Doing so ensures that no non-dominated solution will be lost due to the stochasticity of the algorithm. This archive is also used for the stopping criterion of the MOEA: if it is not updated for M generations in a row, the MOEA is stopped.

(4) The initial population is built as follows. First, the feasible cover values are computed using the criterion explained earlier. Then, we select several values among the feasible covers, and apply the heuristic for the CTP with the selected covers as parameters. By using the heuristic, we begin with good quality solutions. Furthermore, by selecting the starting covers, and notably the highest and lowest values, we can obtain information about the frontier, such as a first approximation of the extremities of the Pareto set. N + 1 solutions are generated during this phase.

(5) Our approach is steady-state. A generation runs as follows. The rank and crowding distance are computed for the N + 1 solutions belonging to the population. The solution with the worst rank and the worst crowding distance is discarded. Two parents are chosen from the N remaining solutions by means of a binary tournament, during which two solutions are compared using the crowded tournament selection operator. According to this operator, a solution i wins a tournament against another solution j if any of the following conditions is true:

(a) r_i < r_j, where r_i is the rank of solution i.
(b) r_i = r_j and d_i > d_j, where d_i is the crowding distance of solution i.

The first condition ensures that the chosen solution lies in a better non-dominated set. The second condition breaks ties when both solutions belong to the same non-domination level by selecting the less crowded individual. An offspring is generated from the two selected parents by means of the genetic operators described in Subsection 11.3.3. We do not allow multiple occurrences of a solution in the population. If the offspring already appears in the population, we generate a new offspring from the same parents. This process is repeated until an offspring not already present in the population is created, or until 50 offspring have been generated unsuccessfully. If an offspring is successfully generated, it is inserted into the population in replacement of the discarded individual; otherwise the population does not change.
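Conditions (a) and (b) translate directly into a comparison function; `rank` and `dist` are per-solution lists, and the names are ours:

```python
def crowded_winner(i, j, rank, dist):
    """Return the index winning the crowded tournament: lower rank wins;
    on equal ranks, the larger crowding distance wins."""
    if rank[i] != rank[j]:
        return i if rank[i] < rank[j] else j
    return i if dist[i] >= dist[j] else j

rank = [1, 1, 2]
dist = [0.4, 0.9, 5.0]
print(crowded_winner(0, 1, rank, dist))  # 1: same rank, larger crowding distance
print(crowded_winner(1, 2, rank, dist))  # 1: the better rank beats the distance
```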

11.3.2. Solution Coding

Two components can be identified in a CTP solution. The first one corresponds to a set covering problem (SCP) solution, i.e., the vertices that will appear in the tour; the other component is the tour over the vertices that have been chosen. It may be noted that the second objective is independent of the tour, and only depends on the vertices appearing in the solution. In the genetic operators, we are only interested in the SCP aspect of the solution. Indeed, problems are encountered with the TSP aspect when designing the genetic operators: the two parents may be so different with regard to the nodes visited and the edges used that not enough


information can be passed from the parents to build a tour for the offspring without resorting to a method providing a solution of the TSP. Therefore, it has been decided that the genetic operators choose the nodes to visit, and that the tour is built by an embedded method dedicated to the TSP. In our implementation, we use the GENIUS heuristic 12.

Since GENIUS is a heuristic, it will not always solve the TSP over the selected nodes optimally. However, our hypothesis is the following: if V_1 and V_2 are two subsets of V, and the optimal tour on V_1 is better than the optimal tour on V_2, then, in most cases, the tour generated by GENIUS on V_1 will be better than the tour generated by GENIUS on V_2. Based on this hypothesis, we designed the MOEA to identify covers, and their associated sets of vertices, which are good candidates for optimal Pareto solutions. Then, if needed, TSP-dedicated methods may be applied to all the solutions of the generated approximation, or to the solutions selected by a decision maker, to improve their lengths. The task of the retained method will be simplified, since good candidate sets will have been identified, and since it appears from the experiments conducted by Gendreau et al. 13 that the number of vertices in the optimal solutions is usually small compared to |V|.

11.3.3. Genetic Operators

11.3.3.1. The Crossover Operator

The crossover we have developed builds a solution by inserting one vertex at a time. The goal is to minimize the number of vertices in the solution at the end of the crossover. To do so, we avoid adding a vertex that has no effect on the cover value. Indeed, the insertion of a vertex in a solution can only have two consequences: either the cover value decreases, or it stays unchanged. The first case occurs if the cover value is given by a couple (v, w), and a vertex v' such that c_v'w < c_vw is added. Taking this remark into account, the crossover was designed as follows:

STEP 1 (Initialization). Set H ← T.
STEP 2 Identify the couple (v, w) that gives the current cover value. Build the set H' ← {v' ∈ V \ H | c_v'w < c_vw}.
STEP 3 If H' = ∅, go to STEP 4. Choose a node v' ∈ H' and remove it from H'. Include v' in H with a probability p computed according to the parents' sets of visited nodes. If v' is included in H, go to STEP 2; otherwise go to STEP 3.


STEP 4 Build the subset U of the vertices of H such that the cover is unchanged by the removal of an element of U from H. If U is empty, exit. Otherwise, select u ∈ U so that the value of the minimum spanning tree on H \ {u} is minimal, and set H ← H \ {u}. Reiterate STEP 4.

The probability p in STEP 3 is computed by the same rules as in the fusion crossover 3 for the SCP. Let i and j be the two parents, and let v be the vertex we wish to include in the offspring; then p is computed as follows:

(1) if v appears in both i and j, then p = 1;
(2) if v does not appear in either parent, then p = 0;
(3) if v appears in one parent only, let p' = r_j / (r_i + r_j) if r_i ≠ r_j, and otherwise let p' = d_i / (d_i + d_j). If v is used in i, then we set p = p'; otherwise we set p = 1 - p'.
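These rules can be transcribed as follows; the boolean arguments indicate whether v is visited by each parent, and the argument names are ours:

```python
def fusion_probability(in_i, in_j, ri, rj, di, dj):
    """Inclusion probability of vertex v: 1 if v is in both parents,
    0 if in neither; otherwise biased toward the parent with the
    better rank (or the larger crowding distance on a rank tie)."""
    if in_i and in_j:
        return 1.0
    if not in_i and not in_j:
        return 0.0
    p = rj / (ri + rj) if ri != rj else di / (di + dj)
    return p if in_i else 1.0 - p

# Parent i has the better rank, so its vertex is kept with probability 3/4.
print(fusion_probability(True, False, ri=1, rj=3, di=0.0, dj=0.0))  # 0.75
```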

11.3.3.2. The Mutation Operator

During the mutation phase, we change the status of each vertex v ∈ V \ T with a probability 1/|V \ T|. To change the status of a vertex means that if the vertex is in the solution, we remove it, even if this increases the cover value; and if the vertex is not in the solution, we add it, even if this does not improve the cover value.
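A sketch of the operator; the flip probability 1/|V \ T| follows our reading of the text, and the seeded RNG is only for reproducibility:

```python
import random

def mutate(solution, V, T, rng):
    """Flip the status of each optional vertex v in V \\ T with
    probability 1/|V \\ T|; mandatory vertices of T are never touched."""
    optional = [v for v in V if v not in T]
    p = 1.0 / len(optional)
    out = set(solution)
    for v in optional:
        if rng.random() < p:
            out ^= {v}  # remove v if present, add it if absent
    return out

child = mutate({0, 1}, V=[0, 1, 2, 3], T={0}, rng=random.Random(1))
print(sorted(child))
```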

11.4. Computational Results

The MOEA was tested on a series of randomly generated instances. To generate the vertex set, |V| + |W| points were generated in a [0, 100] × [0, 100] square according to a uniform distribution. The sets T and V were defined by taking the first |T| and |V| points, respectively, and W was defined as the set of the remaining points. |V| was set to 50, 75, 100, and 120; |T| to 1, ⌊0.10|V|⌋, and ⌊0.20|V|⌋; and |W| to |V|, 2|V|, and 3|V|. For each combination, 5 instances were generated. The MOEA was run 5 times on each instance. The parameters of the MOEA were the following: the population size N was set to 256, and the stopping criterion parameter M to 5000.

The MOEA was coded in C. Optimal Pareto sets were generated by means of the ε-constraint method. The branch-and-cut algorithm was implemented in C with CPLEX 8.1. Runs were executed on a Pentium IV at 2.67 GHz with 512 MB of RAM, running Debian Linux 3.0.

Tables 11.57, 11.58, 11.59, and 11.60 provide three types of information. First, they give the computational times for the ε-constraint method and the MOEA. They also indicate the number of optimal Pareto solutions found by the exact method, and the maximum, average, minimum, and standard deviation of the ratio of optimal Pareto solutions reached by the MOEA. The maximum, average, minimum, and standard deviation of the generational distance 33 of the approximation generated by the MOEA with respect to the optimal Pareto set are also provided. The generational distance is expressed as follows: GD = √(∑_{i=1}^{n} d_i²) / n, where n is the number of solutions in the approximation set, and d_i is the distance in objective space between solution i and the nearest solution of the optimal Pareto set. The coordinates of the solutions in the objective space were normalized; therefore, all values x in the tables related to this metric must be read as x × 10⁻⁴.
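In code, the metric reads as follows (points are pairs in the normalized objective space; the instance below is illustrative):

```python
import math

def generational_distance(approx, pareto):
    """GD = sqrt(sum of squared nearest-neighbor distances) / n,
    where n is the size of the approximation set."""
    nearest = [min(math.dist(a, p) for p in pareto) for a in approx]
    return math.sqrt(sum(d * d for d in nearest)) / len(approx)

# One exact point and one point at distance 1 from the front.
print(generational_distance([(0, 0), (1, 2)], [(0, 0), (1, 1)]))  # 0.5
```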

The column headings are the following:

NB: number of optimal Pareto solutions.
time ε: computational time for the ε-constraint method.
Max %: maximal ratio.
Avg %: average ratio.
Min %: minimal ratio.
S.d. %: standard deviation of the ratio.
Max GD: maximal generational distance.
Avg GD: average generational distance.
Min GD: minimal generational distance.
S.d. GD: standard deviation of the generational distance.
time EA: average computational time of the MOEA.

We now discuss these results from two points of view: the quality of the approximations and the computational times. First, the MOEA is able to generate good quality approximations. As a matter of fact, it is almost always able to generate a significant part of the optimal Pareto sets. Furthermore, the generational distance indicates that the non-optimal solutions found are not far from the optimal solutions. When the size of V increases, the approximations remain of good quality. The sizes of T and W do not seem to have a significant impact on the quality of the approximations.

Concerning the computational times, the exact method is faster for |V| = 50. This can be explained mainly by two facts. First, it only runs a number of times equal to the number of optimal Pareto solutions, which, on average, is small, whereas the MOEA must run for at least 5000 generations before stopping. Furthermore, for each run of the branch-and-cut algorithm, the cover is fixed, which allows the application of rules that simplify the problem 13. Due to these rules, the problems to solve with the branch-and-cut algorithm are simpler, notably when |T| is large, as can be seen in Table 11.57. For the MOEA, on the other hand, since we search for the complete optimal Pareto set, the cover is not fixed, and the problem must be solved without simplifications. For |V| = 75, the MOEA is significantly faster than the exact approach when |T| = 1. This can be explained by the fact that the simplification rules do not reduce the problem size enough to provide a significant advantage to the exact algorithm. When |T| increases, the simplification rules are more efficient, and the exact algorithm is faster. Note that for some results with |T| = 7, and notably |W| = 225, the simplifications are not important enough. These remarks are confirmed when |V| = 100. As a matter of fact, when |T| = 1, the difference between the two methods is really significant in favor of the MOEA. Furthermore, for |T| = 10 and |W| = 200 or 300, the MOEA is always faster, because the reductions are less important. For |T| = 10 and |W| = 100, the computational times are roughly the same. For |T| = 20, the reductions are still important enough for the exact algorithm to be faster, but it may be noted that when |W| increases, the advantage of the exact method is not as marked as previously. From these remarks, we can deduce that, as the size of V grows, the reduction rules will be less and less able to improve the computational times of the exact algorithm, whereas the computational times of the MOEA increase moderately. This is confirmed when |V| = 120. Note that results for |V| = 120 and |T| = 1 are not reported, due to the prohibitive computational times required by the ε-constraint method.

11.5. Conclusions and Outlooks

In this chapter, we have proposed a study of the bi-objective covering tour problem, which is a generalization of the covering tour problem. In this generalization, a constraint of the original problem is expressed quite naturally as a second objective. For the bi-objective covering tour problem, a multi-objective evolutionary algorithm incorporating special features for the problem has been designed and implemented. This meta-heuristic has been tested on a set of generated instances, and compared with an exact algorithm able to generate the complete optimal Pareto set. Results show that the multi-objective evolutionary algorithm is able to generate good quality approximations, while its execution time does not increase dramatically when the size of the instances increases, as is the case for the exact method. Future work includes the study of other genetic operators to decrease the computational time while keeping good quality approximations, and the adaptation of the meta-heuristic to the real case of the Suhum district in Ghana.

Acknowledgement

This work was partially supported by the Nord-Pas-de-Calais Region. This support is gratefully acknowledged. Thanks are also due to the referee for valuable comments.

References

1. E. Balas and A. Ho. Set covering algorithms using cutting planes, heuristics, and subgradient optimization: A computational study. Mathematical Programming, 12:37-60, 1980.

2. B. Barán and M. Schaerer. A multiobjective ant colony system for vehicle routing problem with time windows. In IASTED International Conference on Applied Informatics, pages 97-102, Innsbruck, Austria, 2003.

3. J. E. Beasley and P. C. Chu. A genetic algorithm for the set covering problem. European Journal of Operational Research, 94:392-404, 1996.

4. B. Boffey. Multiobjective routing problems. Top, 3(2):167-220, 1995.

5. C. A. Coello Coello, D. A. Van Veldhuizen, and G. B. Lamont. Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, New York, May 2002. ISBN 0-3064-6762-3.

6. A. Corberán, E. Fernández, M. Laguna, and R. Martí. Heuristic solutions to the problem of routing school buses with multiple objectives. Journal of the Operational Research Society, 53:427-435, 2002.

7. J. R. Current and D. A. Schilling. The median tour and maximal covering tour problems. European Journal of Operational Research, 73:114-126, 1994.

8. K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6:182-197, 2002.

9. N. El-Sherbeny. Resolution of a vehicle routing problem with multi-objective simulated annealing method. PhD thesis, Faculté Polytechnique de Mons, 2001.

10. D. Feillet, P. Dejax, and M. Gendreau. Traveling salesman problems with profits. Transportation Science, 2003. (to be published).

11. M. J. Geiger. Genetic algorithms for multiple objective routing. In MIC'2001 - 4th Metaheuristics International Conference, pages 349-353, Porto, Portugal, July 2001.

12. M. Gendreau, A. Hertz, and G. Laporte. New insertion and postoptimization procedures for the traveling salesman problem. Operations Research, 40:1086-1094, 1992.


13. M. Gendreau, G. Laporte, and F. Semet. The covering tour problem. Operations Research, 45:568-576, 1997.

14. M. Hachicha, M. J. Hodgson, G. Laporte, and F. Semet. Heuristics for the multi-vehicle covering tour problem. Computers and Operations Research, 27:29-42, 2000.

15. M. J. Hodgson, G. Laporte, and F. Semet. A covering tour model for planning mobile health care facilities in Suhum District, Ghana. Journal of Regional Science, 38:621-638, 1998.

16. S-C. Hong and Y-B. Park. A heuristic for a bi-objective vehicle routing with time window constraints. International Journal of Production Economics, 62:249-258, 1999.

17. N. Jozefowiez, F. Semet, and E-G. Talbi. Parallel and hybrid models for multi-objective optimization: Application to the vehicle routing problem. In J. J. Merelo Guervós et al., editors, PPSN VII, volume 2439 of Lecture Notes in Computer Science, pages 271-280. Springer-Verlag, September 2002.

18. C. P. Keller. Multiobjective routing through space and time: The MVP and TDVP problems. PhD thesis, Department of Geography, The University of Western Ontario, London, Ontario, Canada, 1985. Unpublished thesis.

19. C. P. Keller and M. Goodchild. The multiobjective vending problem: A generalization of the traveling salesman problem. Environment and Planning B: Planning and Design, 15:447-460, 1988.

20. M. Labbé and G. Laporte. Maximizing user convenience and postal service efficiency in post box location. Belgian Journal of Operations Research, Statistics and Computer Science, 26:21-35, 1986.

21. P. Lacomme, C. Prins, and M. Sevaux. Multiobjective capacitated arc routing problem. In C. M. Fonseca et al., editors, Evolutionary Multi-Criterion Optimization, volume 2632 of LNCS, pages 550-564. Springer, 2003.

22. T-R. Lee and J-H. Ueng. A study of vehicle routing problem with load balancing. International Journal of Physical Distribution and Logistics Management, 29:646-648, 1998.

23. V. Maniezzo, R. Baldacci, M. Boschetti, and M. Zamboni. Scatter search methods for the covering tour problem. Technical report, Scienze dell'Informazione, University of Bologna, Italy, June 1999.

24. L. Motta, L. S. Ochi, and C. Martinhon. GRASP metaheuristics for the generalized covering tour problem. In MIC'2001 - 4th Metaheuristics International Conference, pages 387-391, Porto, Portugal, July 2001.

25. J. Pacheco and R. Martí. Tabu search for a multi-objective routing problem. Technical Report TR09-2003, University of Valencia, 2003.

26. L. Paquete and T. Stützle. A two-phase local search for the biobjective traveling salesman problem. In C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb, and L. Thiele, editors, Evolutionary Multi-Criterion Optimization, Second International Conference, EMO 2003, volume 2632 of Lecture Notes in Computer Science, pages 479-493, Faro, Portugal, April 2003. Springer-Verlag.

27. Y. Park and C. Koelling. A solution of vehicle routing problems in a multiple objective environment. Engineering Costs and Production Economics,

Page 292: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

A Multi-Objective Evolutionary Algorithm for the Covering Tour Problem 263



Table 11.57. Results for |V| = 50.

|T| |W| NB time Max% Avg% Min% S.d.% MaxGD AvgGD MinGD S.d.GD timeEA
1 50 52 28 0.90 0.89 0.87 0.01 1.85 0.80 0.23 0.55 61
1 50 47 15 0.98 0.98 0.98 0.00 0.00 0.00 0.00 0.00 37
1 50 36 39 0.94 0.93 0.89 0.02 1.25 0.25 0.00 0.50 51
1 50 40 23 0.95 0.93 0.93 0.01 2.26 0.81 0.38 0.73 47
1 50 63 65 0.95 0.93 0.89 0.02 0.67 0.18 0.00 0.25 93
1 100 73 86 0.90 0.85 0.71 0.07 1.71 1.52 1.15 0.23 150
1 100 40 46 0.95 0.95 0.95 0.00 0.00 0.00 0.00 0.00 38
1 100 47 28 0.91 0.90 0.87 0.02 3.94 3.54 3.41 0.20 56
1 100 94 137 0.86 0.82 0.72 0.05 2.84 0.90 0.08 0.99 178
1 100 41 34 0.98 0.95 0.90 0.02 1.66 0.33 0.00 0.66 56
1 150 55 60 0.96 0.96 0.95 0.01 0.11 0.02 0.00 0.04 79
1 150 50 39 1.00 0.97 0.90 0.03 1.27 0.51 0.00 0.48 81
1 150 44 33 0.95 0.93 0.89 0.03 4.43 4.40 4.38 0.02 61
1 150 85 97 0.85 0.77 0.70 0.04 2.53 2.03 0.66 0.71 228
1 150 60 69 0.93 0.89 0.85 0.03 1.25 1.17 1.11 0.05 95
5 50 46 26 0.93 0.91 0.87 0.02 1.58 1.39 0.77 0.31 131
5 50 30 11 0.97 0.88 0.83 0.05 4.07 3.81 3.03 0.39 68
5 50 49 41 1.00 0.97 0.94 0.02 0.12 0.07 0.00 0.06 111
5 50 45 22 0.82 0.77 0.71 0.04 3.88 2.74 2.09 0.61 124
5 50 25 5 0.96 0.93 0.92 0.02 1.72 1.38 0.00 0.69 30
5 100 34 21 0.94 0.91 0.91 0.01 0.00 0.00 0.00 0.00 76
5 100 33 10 1.00 0.98 0.94 0.02 0.00 0.00 0.00 0.00 75
5 100 31 16 0.90 0.90 0.87 0.01 0.00 0.00 0.00 0.00 40
5 100 29 18 1.00 0.99 0.97 0.02 1.65 0.33 0.00 0.66 61
5 100 33 10 0.94 0.93 0.91 0.01 0.82 0.33 0.00 0.40 39
5 150 38 18 0.97 0.96 0.95 0.01 0.00 0.00 0.00 0.00 91
5 150 29 7 1.00 0.96 0.93 0.02 1.16 0.46 0.00 0.57 44
5 150 38 49 0.97 0.96 0.94 0.01 0.13 0.03 0.00 0.05 101
5 150 29 7 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 26
5 150 65 37 1.00 0.97 0.94 0.02 0.47 0.11 0.00 0.18 181
10 50 17 3 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 74
10 50 19 4 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 78
10 50 16 12 0.75 0.69 0.63 0.04 4.26 4.25 3.78 0.71 83
10 50 23 8 1.00 0.94 0.87 0.06 8.54 3.42 0.00 4.10 127
10 50 22 6 0.95 0.95 0.95 0.00 2.03 2.03 2.03 0.00 70
10 100 21 5 1.00 0.99 0.95 0.02 0.00 0.00 0.00 0.00 93
10 100 12 2 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 72
10 100 29 17 0.97 0.97 0.97 0.00 0.00 0.00 0.00 0.00 102
10 100 29 10 0.86 0.86 0.86 0.00 0.00 0.00 0.00 0.00 74
10 100 20 8 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 64
10 150 19 4 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 82
10 150 18 3 1.00 0.92 0.83 0.08 0.99 0.40 0.00 0.48 96
10 150 27 8 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 47
10 150 19 5 0.90 0.90 0.90 0.00 0.00 0.00 0.00 0.00 60
10 150 39 17 0.85 0.84 0.79 0.02 0.48 0.28 0.09 0.12 144


Table 11.58. Results for |V| = 75.

|T| |W| NB time Max% Avg% Min% S.d.% MaxGD AvgGD MinGD S.d.GD timeEA
1 75 74 368 0.89 0.86 0.84 0.02 1.17 0.76 0.60 0.21 128
1 75 82 804 0.79 0.73 0.67 0.04 3.95 2.42 1.23 0.96 183
1 75 80 542 0.94 0.91 0.82 0.03 4.95 4.82 4.72 0.08 199
1 75 92 962 0.83 0.79 0.76 0.02 1.74 1.24 0.33 0.52 274
1 75 96 629 0.89 0.80 0.71 0.06 2.11 1.73 1.44 0.26 243
1 150 68 457 0.88 0.87 0.82 0.02 3.10 2.82 2.68 0.15 123
1 150 81 1859 0.69 0.67 0.60 0.03 4.01 2.71 1.70 0.80 267
1 150 98 1449 0.87 0.83 0.81 0.02 3.22 2.46 0.76 0.92 508
1 150 105 2606 0.69 0.66 0.60 0.03 2.79 2.19 1.71 0.35 336
1 150 82 752 0.80 0.72 0.66 0.05 1.84 1.48 1.22 0.27 171
1 225 124 7701 0.55 0.51 0.48 0.03 2.42 2.03 1.76 0.24 526
1 225 100 1874 0.83 0.78 0.70 0.05 2.93 1.64 0.99 0.71 305
1 225 67 292 0.90 0.87 0.85 0.01 1.42 0.71 0.26 0.50 105
1 225 103 912 0.83 0.76 0.71 0.04 0.86 0.53 0.31 0.20 268
1 225 108 2493 0.76 0.70 0.65 0.04 2.88 1.74 1.24 0.59 350
7 75 58 196 0.97 0.91 0.83 0.05 2.52 0.99 0.00 0.85 258
7 75 72 164 0.63 0.56 0.47 0.07 4.36 3.38 2.36 0.72 464
7 75 91 5027 0.71 0.63 0.55 0.05 3.02 2.15 1.20 0.58 697
7 75 60 173 0.85 0.84 0.83 0.01 0.93 0.42 0.19 0.26 269
7 75 44 74 1.00 0.93 0.82 0.07 4.86 1.29 0.00 1.82 190
7 150 63 144 0.92 0.85 0.79 0.04 2.10 1.56 1.10 0.36 287
7 150 40 39 0.58 0.50 0.43 0.05 1.40 1.08 0.66 0.28 83
7 150 56 427 0.98 0.87 0.82 0.06 2.34 0.76 0.00 0.85 265
7 150 26 19 1.00 0.99 0.96 0.02 0.00 0.00 0.00 0.00 60
7 150 62 546 0.89 0.87 0.85 0.02 0.87 0.54 0.16 0.32 160
7 225 45 84 1.00 0.98 0.98 0.01 4.49 1.82 0.00 2.18 183
7 225 52 323 0.58 0.56 0.51 0.02 6.31 5.86 5.57 0.25 134
7 225 42 93 0.98 0.96 0.93 0.02 1.60 0.68 0.18 0.52 176
7 225 64 1611 0.75 0.71 0.63 0.04 2.64 2.28 2.03 0.23 395
7 225 77 805 0.78 0.72 0.69 0.03 6.27 4.71 3.83 0.85 560
15 75 14 4 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 85
15 75 33 56 0.88 0.76 0.70 0.07 0.39 0.30 0.12 0.11 205
15 75 29 14 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 146
15 75 43 123 0.65 0.65 0.63 0.01 5.11 3.94 2.78 0.75 319
15 75 28 21 0.79 0.76 0.75 0.01 1.97 1.91 1.83 0.46 147
15 150 19 12 1.00 0.92 0.79 0.09 3.18 1.24 0.00 1.52 116
15 150 34 54 0.91 0.88 0.85 0.02 1.99 1.26 0.37 0.72 221
15 150 28 59 0.89 0.86 0.82 0.03 0.69 0.40 0.21 0.23 156
15 150 25 215 0.92 0.92 0.92 0.00 0.48 0.48 0.48 0.00 141
15 150 36 60 0.94 0.92 0.89 0.03 1.64 1.35 0.91 0.24 182
15 225 19 6 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 103
15 225 25 22 0.88 0.82 0.76 0.04 4.62 3.43 2.93 0.61 215
15 225 50 85 0.72 0.68 0.64 0.03 5.77 5.38 4.71 0.36 354
15 225 38 56 0.97 0.94 0.89 0.03 4.27 4.01 3.94 0.13 272
15 225 39 43 1.00 0.97 0.95 0.02 2.27 0.92 0.00 1.10 149


Table 11.59. Results for |V| = 100.

|T| |W| NB time Max% Avg% Min% S.d.% MaxGD AvgGD MinGD S.d.GD timeEA
1 100 129 29776 0.62 0.60 0.57 0.02 4.95 3.28 2.15 0.99 623
1 100 98 6592 0.59 0.59 0.58 0.01 6.07 5.49 4.75 0.57 415
1 100 129 35743 0.60 0.57 0.55 0.02 2.76 2.01 1.49 0.51 740
1 100 127 6758 0.78 0.73 0.69 0.04 2.03 1.76 0.90 0.43 446
1 100 141 19242 0.63 0.59 0.54 0.04 3.74 3.12 2.44 0.51 572
1 200 128 28448 0.73 0.63 0.55 0.06 3.81 2.96 1.69 0.76 703
1 200 159 39325 0.54 0.53 0.52 0.01 4.41 3.08 2.29 0.78 842
1 200 112 15690 0.83 0.76 0.66 0.07 4.91 3.56 1.99 1.09 534
1 200 116 6995 0.78 0.74 0.69 0.04 3.94 2.32 1.12 0.94 447
1 200 142 19426 0.61 0.56 0.50 0.04 2.65 2.12 1.81 0.30 803
1 300 143 15347 0.62 0.52 0.43 0.08 3.35 2.72 1.96 0.58 1414
1 300 98 8750 0.79 0.73 0.68 0.04 2.44 1.83 0.73 0.61 500
1 300 102 6335 0.82 0.77 0.67 0.06 2.49 1.04 0.31 0.91 389
1 300 123 37622 0.76 0.73 0.69 0.02 2.57 2.14 1.94 0.23 773
1 300 146 41065 0.56 0.54 0.50 0.02 4.30 2.98 2.01 0.76 1121
10 100 63 3193 0.73 0.69 0.65 0.03 3.02 2.63 1.42 0.61 414
10 100 50 157 0.50 0.56 0.48 0.04 9.34 7.39 6.36 1.10 176
10 100 70 410 0.80 0.79 0.77 0.01 4.56 3.04 1.58 0.98 543
10 100 38 103 0.97 0.97 0.97 0.00 0.00 0.00 0.00 0.00 157
10 100 86 896 0.80 0.78 0.76 0.02 3.15 1.97 0.85 0.83 856
10 200 60 495 0.98 0.79 0.62 0.13 4.23 2.02 0.00 1.79 400
10 200 61 492 0.84 0.82 0.75 0.03 5.98 2.80 0.48 1.87 447
10 200 48 555 0.90 0.77 0.73 0.06 3.37 2.55 0.07 1.25 375
10 200 72 2582 0.81 0.76 0.72 0.03 1.95 1.23 0.74 0.40 678
10 200 42 556 0.95 0.90 0.85 0.03 5.42 2.12 0.00 2.06 287
10 300 60 649 0.95 0.91 0.85 0.04 0.61 0.49 0.43 0.07 415
10 300 72 1897 0.88 0.80 0.74 0.05 1.84 1.24 0.55 0.43 633
10 300 107 17554 0.63 0.59 0.56 0.02 2.19 1.79 1.44 0.29 1086
10 300 98 7426 0.63 0.58 0.52 0.04 3.61 3.39 3.12 0.16 1089
10 300 75 2020 0.84 0.80 0.76 0.03 3.61 0.94 0.21 1.34 714
20 100 44 362 0.82 0.75 0.61 0.08 4.16 2.48 1.68 0.90 710
20 100 12 4 1.00 1.00 1.00 0.00 0.00 0.00 0.00 0.00 152
20 100 48 149 0.60 0.56 0.48 0.05 4.00 2.87 1.43 0.94 700
20 100 12 6 0.50 0.50 0.50 0.00 5.90 5.25 4.43 0.62 519
20 100 41 157 0.88 0.77 0.61 0.10 3.73 3.08 2.50 0.46 574
20 200 16 13 0.94 0.94 0.94 0.00 0.00 0.00 0.00 0.00 135
20 200 49 323 0.67 0.63 0.59 0.03 8.56 8.30 8.11 0.15 632
20 200 44 144 0.61 0.59 0.55 0.03 5.20 4.70 4.55 0.25 355
20 200 71 1507 0.66 0.62 0.55 0.04 4.12 3.07 2.28 0.62 1500
20 200 41 400 0.68 0.67 0.66 0.01 6.39 6.36 6.31 0.04 324
20 300 38 98 0.79 0.70 0.58 0.09 7.01 2.87 1.15 2.11 835
20 300 35 812 0.80 0.75 0.69 0.04 4.97 3.82 2.31 0.95 557
20 300 29 390 0.38 0.38 0.38 0.00 13.11 8.40 6.83 2.36 470
20 300 54 148 0.59 0.57 0.54 0.02 2.28 1.65 1.13 0.47 526
20 300 16 17 0.88 0.88 0.88 0.00 1.03 1.03 1.03 0.00 240


Table 11.60. Results for |V| = 120.

|T| |W| NB time Max% Avg% Min% S.d.% MaxGD AvgGD MinGD S.d.GD timeEA

12 120 50 609 0.88 0.80 0.70 0.07 4.94 1.12 0.04 1.92 569

12 120 69 1814 0.75 0.70 0.64 0.05 3.58 2.47 1.95 0.58 489

12 120 19 53 1.00 0.97 0.84 0.06 0.08 0.02 0.00 0.03 98

12 120 47 2072 0.91 0.85 0.81 0.05 3.96 1.98 0.66 1.19 368

12 120 76 24776 0.72 0.64 0.54 0.06 5.12 3.90 2.02 1.03 673

12 240 85 5192 0.69 0.67 0.65 0.02 5.28 4.64 2.94 0.88 1129

12 240 70 777 0.83 0.77 0.73 0.03 3.57 1.16 0.25 1.23 502

12 240 88 11446 0.72 0.64 0.57 0.05 2.73 2.48 2.22 0.20 963

12 240 89 9467 0.75 0.69 0.66 0.03 7.33 3.76 1.65 2.10 1424

12 240 84 32239 0.81 0.72 0.62 0.08 1.75 1.40 0.77 0.36 1149

12 360 99 11154 0.39 0.35 0.31 0.03 4.82 4.42 3.94 0.35 1233

12 360 102 34350 0.55 0.46 0.35 0.06 3.30 2.17 1.19 0.76 1881

12 360 117 8083 0.53 0.49 0.42 0.04 5.59 4.29 2.66 1.06 1746

12 360 82 10009 0.55 0.52 0.49 0.02 2.57 2.79 2.29 1.77 744

12 360 68 13811 0.71 0.65 0.57 0.05 4.44 3.89 3.49 0.32 613

24 120 46 566 0.87 0.83 0.80 0.02 7.87 4.53 0.28 2.78 590

24 120 60 829 0.77 0.72 0.68 0.03 3.71 2.49 2.04 0.62 1135

24 120 9 29 0.11 0.11 0.11 0.00 10.85 10.62 9.71 4.53 182

24 120 38 85 0.89 0.83 0.68 0.08 6.91 4.04 1.48 2.40 365

24 120 23 163 0.91 0.87 0.83 0.03 1.40 0.32 0.05 0.54 229

24 240 35 257 0.86 0.72 0.66 0.07 4.43 2.38 0.91 1.67 437

24 240 57 1498 0.95 0.89 0.74 0.08 3.42 1.59 0.37 1.12 748

24 240 59 12264 0.71 0.62 0.54 0.06 1.55 1.26 0.86 0.30 1038

24 240 48 194 0.71 0.68 0.65 0.03 8.16 7.72 7.39 0.29 634

24 240 64 1354 0.41 0.40 0.39 0.01 7.04 5.70 4.76 0.83 1145

24 360 59 2557 0.61 0.58 0.54 0.02 3.57 3.47 3.35 0.08 1276

24 360 62 839 0.82 0.76 0.68 0.05 1.81 1.43 1.07 0.32 1208

24 360 30 1204 1.00 0.99 0.97 0.02 0.55 0.11 0.00 0.22 407

24 360 48 406 0.77 0.68 0.64 0.05 3.08 1.93 1.33 0.62 744

24 360 51 313 0.88 0.83 0.76 0.05 1.83 0.89 0.38 0.52 572


CHAPTER 12

A COMPUTER ENGINEERING BENCHMARK APPLICATION FOR MULTIOBJECTIVE OPTIMIZERS

Simon Künzli, Stefan Bleuler, Lothar Thiele, and Eckart Zitzler

Department of Information Technology and Electrical Engineering, Swiss Federal Institute of Technology (ETH) Zurich

Gloriastrasse 35, CH-8092 Zurich, Switzerland
E-mail: {kuenzli,bleuler,thiele,zitzler}@tik.ee.ethz.ch

Among the various benchmark problems designed to compare and evaluate the performance of multiobjective optimizers, there is a lack of real-world applications that are commonly accepted and, even more important, are easy to use by different research groups. The main reason is, in our opinion, the high effort required to re-implement or adapt the corresponding programs.

This chapter addresses this problem by presenting a demanding packet processor application with a platform- and programming-language-independent interface. The text-based interface has two advantages: it allows (i) to distribute the application as a binary executable pre-compiled for different platforms, and (ii) to easily couple the application with arbitrary optimization methods without any modifications on the application side. Furthermore, the design space exploration application presented here is a complex optimization problem that is representative for many other computer engineering applications. For these reasons, it can serve as a computer engineering benchmark application for multiobjective optimizers. The program can be downloaded together with different multiobjective evolutionary algorithms and further benchmark problems from http://www.tik.ee.ethz.ch/pisa/.

12.1. Introduction

The field of evolutionary multiobjective optimization (EMO) has been growing rapidly since the first pioneering works in the mid-1980s and the early 1990s. Meanwhile numerous methods and algorithmic components are available, and accordingly there is a need for representative benchmark problems to compare and evaluate the different techniques.

Most test problems that have been suggested in the literature are artificial and abstract from real-world scenarios. Some authors considered multi-objective extensions of NP-hard problems such as the knapsack problem 27, the set covering problem 13, and the quadratic assignment problem 14. Other benchmark problem examples are the Pseudo-Boolean functions introduced by Thierens 23 and Laumanns et al. 15 that were designed mainly for theoretical investigations. Most popular, though, are real-valued functions 7,6. For instance, several test functions representing different types of problem difficulties were proposed by Zitzler et al. 24 and Deb et al. 9.

Although there exists no commonly accepted set of benchmark problems as, e.g., the SPEC benchmarks in computer engineering, most of the aforementioned functions are used by different researchers within the EMO community. The reason is that the corresponding problem formulations are simple, which in turn keeps the implementation effort low. However, the simplicity and the high abstraction level come along with a loss of information: various features and characteristics of real-world applications cannot be captured by these artificial optimization problems. As a consequence, one has to test algorithms also on actual applications in order to obtain more reliable results. Complex problems in various areas have been tackled using multiobjective evolutionary algorithms, and many studies even compare two or several algorithms on a specific application 7,6. The restricted reusability, though, has so far prevented one or several applications from establishing themselves as benchmark problems that are used by different research groups. Re-implementation is usually too labor-intensive and error-prone, while re-compilation is often not possible because either the source code is not available, e.g., due to intellectual property issues, or particular software packages are needed that are not publicly available.

To solve this problem, we present a computer engineering application,namely the design space exploration of packet processor architectures, that

• provides a platform- and programming-language-independent interface that allows the usage of pre-compiled and executable programs and therefore circumvents the problem mentioned above,

• is scalable in terms of complexity, i.e., problem instances of different levels of difficulty are available, and

• is representative for several other applications in the area of computer design 3,10,26.
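The decoupling that such a text-based interface provides can be illustrated with a small sketch of the optimizer side. The file names and the binary name below are hypothetical placeholders for illustration only; the actual PISA interface defines its own set of state and data files:

```python
import pathlib
import subprocess

# Hypothetical file names; the real PISA interface specifies its own
# state and data files (see http://www.tik.ee.ethz.ch/pisa/).
CAND_FILE = pathlib.Path("candidates.txt")   # optimizer -> application
OBJ_FILE = pathlib.Path("objectives.txt")    # application -> optimizer

def encode_population(population):
    """One candidate per line, decision variables separated by spaces."""
    return "\n".join(" ".join(str(v) for v in ind) for ind in population) + "\n"

def decode_objectives(text):
    """One evaluated candidate per line, objective values as floats."""
    return [tuple(float(x) for x in line.split())
            for line in text.splitlines() if line.strip()]

def evaluate(population):
    """Hand a population to a pre-compiled benchmark binary via plain
    text files and read back the objective vectors. Schematic only."""
    CAND_FILE.write_text(encode_population(population))
    subprocess.run(["./packet_processor_benchmark"], check=True)  # hypothetical binary
    return decode_objectives(OBJ_FILE.read_text())
```

Because only plain text crosses the boundary, the optimizer never has to link against or recompile the application, which is exactly what makes distributing pre-compiled binaries workable.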

The application will be described in terms of the underlying optimization model in Section 12.2, while Section 12.3 focuses on the overall software architecture and in particular on the interface. Section 12.4 demonstrates the application of four EMO techniques on several problem instances and compares their performance on the basis of a recently proposed quality measure 28. The last section summarizes the main results of this chapter.

12.2. Packet Processor Design

Packet processors are high-performance, programmable devices with special architectural features that are optimized for network packet processing [e]. They are mostly embedded within network routers and switches and are designed to implement complex packet processing tasks at high line speeds such as routing and forwarding, firewalls, network address translators, means for implementing quality-of-service (QoS) guarantees to different packet flows, and also pricing mechanisms.

Other examples of packet processors would be media processors which have network interfaces. Such processors have audio, video and packet-processing capabilities and serve as a bridge between a network and a source/sink audio/video device. They are used to distribute (real-time) multimedia streams over a packet network like wired or wireless Ethernet. This involves receiving packets from a network, followed by processing in the protocol stack, forwarding to different audio/video devices and applying functions like decryption and decompression of multimedia streams. Similarly, at the source end, this involves receiving multimedia streams from audio/video devices (e.g. video camera, microphone, stereo systems), probably encrypting, compressing and packetizing them, and finally sending them over a network.

Following the above discussion, there are major constraints to satisfy and conflicting goals to optimize in the design of packet processors:

• Delay Constraints: In the case of packets belonging to a multimedia stream, there is very often a constraint on the maximal time a packet is allowed to stay within the packet processor. This upper bound on the delay must be satisfied under all possible load conditions imposed by other packet streams that are processed simultaneously by the same device.

• Throughput Maximization: The goal is to maximize the maximum possible throughput of the packet processing device in terms of the number of packets per second.

• Cost Minimization: One is interested in a design that uses a small amount of resources, e.g., single processing units, memory and communication networks.

• Conflicting Usage Scenarios: Usually, a packet processor is used in several, different systems. For example, one processor will be implemented within a router, another one is built into a consumer device for multimedia processing. The requirements from these different applications in terms of throughput and delay are typically in conflict with each other.

All of the above constraints and conflicting goals will be taken into account in the benchmark application.

[e] In this area of application, also the term network processor is used.

12.2.1. Design Space Exploration

Complex embedded systems like packet processors are often comprised of a heterogeneous combination of different hardware and software components such as CPU cores, dedicated hardware blocks, different kinds of memory modules and caches, various interconnections and I/O interfaces, and the run-time environment and drivers; see, e.g., Figure 12.1. They are integrated on a single chip and they run specialized software to perform the application.

Fig. 12.1. Template of a packet processor architecture as used in the benchmark application

Typically, the analysis questions faced by a designer during a system-level design process are:

• Allocation: Determine the hardware components of the packet processor like microprocessors, dedicated hardware blocks for computationally intensive application tasks, memory and busses.


• Binding: For each task of the software application choose an allocated hardware unit which executes it.

• Scheduling Policy: For the set of tasks that are mapped onto a specific hardware resource choose a scheduling policy from the available run-time environment, e.g., a fixed priority.

Most of the available design methodologies start with an abstract specification of the application and the performance requirements. These specifications are used to drive a system-level design space exploration 17, which iterates between performance evaluation and exploration steps; see also Thiele et al. 19,20 and Blickle et al. 3. Finally, appropriate allocations, bindings, and scheduling strategies are identified. The methodology used in the benchmark application of this chapter is shown in Figure 12.2.

Fig. 12.2. Design space exploration methodology used in the benchmark application

One of the major challenges in design exploration is to estimate the essential characteristics of the final implementation in an early design stage. In the case of packet processor design, the performance analysis has to cope with two major problems which make any kind of compositional analysis difficult: (1) the architecture of such systems is highly heterogeneous: the different architectural components have different computing capabilities and use different arbitration and resource sharing strategies; (2) the packets of one or different packet streams interact on the various resources, i.e., if a resource is busy in processing one packet, others have to wait.


This interaction between packet streams is of a tremendous complexity and influences packet delays and memory usage.

There is a large body of work devoted to system-level performance analysis of embedded system architectures; see Gajski et al. 12 and the references therein. Currently, the analysis of such heterogeneous systems is mainly based on simulation. The main advantage of using simulation as a means for performance evaluation is that many dynamic and complex interactions in an architecture can be taken into account, which are otherwise difficult to model analytically. On the other hand, simulation-based tools suffer from high running times, incomplete coverage, and failure to identify corner cases.

Analytical performance models for DSP systems and embedded processors were proposed in, e.g., Agarwal 1, or Franklin and Wolf 11. These models may be classified under what can be called a "static analytical model". Here, the computation, communication, and memory resources of a processor are all described using simple algebraic equations that do not take into account the dynamics of the application, i.e., variations in resource loads and shared resources. In contrast to this class of approaches, the models we will use here may be classified under "dynamic analytical models", where the dynamic behavior of the computation and communication resources (such as the effects of different scheduling or bus arbitration schemes) is also modeled; see, e.g., Thiele and co-workers 16,22,4. Applications to stream processing have been reported in different publications 21,18,19,5.

12.2.2. Basic Models and Methods

According to Figure 12.2, basic prerequisites of the design space exploration are models for the architecture, the application, the run-time scheduling, and the application scenarios. Based on these models, we will describe the method for performance analysis.

Architecture Template and Allocation Following Figure 12.1, the model for a packet processor consists of a set of computation units or processing elements which perform operations on the individual packets. In order to simplify the discussion and the benchmark application, we will not model the communication between the processing elements, i.e., packets can be moved from one memory element to the next one without constraints.

Definition 1: We define a set of resources R. To each resource r ∈ R we associate a relative implementation cost cost(r) > 0. The allocation of resources is described by the function alloc(r) ∈ {0, 1}. To each resource r there are associated two functions β_r^u(Δ) ≥ 0 and β_r^l(Δ) ≥ 0, denoted as upper and lower service curves, respectively.

Initially, we specify all available processing units as our resource set R and associate the corresponding costs to them. For example we may have the resources R = {ARM9, MEngine, Classifier, DSP, Cipher, LookUp, CheckSum, PowerPC}. During the allocation step (see Figure 12.2), we select those which will be in a specific architecture, i.e., if alloc(r) = 1, then resource r ∈ R will be implemented in the packet processor architecture.

The upper and lower service curves specify the available computing units of a resource r in a relative measure, e.g., processor cycles or instructions. In particular, β_r^u(Δ) and β_r^l(Δ) are the maximum and minimum number of available processor cycles in any time interval of length Δ. In other words, the service curves of a resource determine the best case and worst case computing capabilities. For details, see, e.g., Thiele et al. 18.
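Definition 1 can be mirrored in a few lines of code. The resource names follow the example in the text, but the cost figures and cycle rates below are invented for illustration; the benchmark ships its own resource data:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Resource:
    cost: float                       # relative implementation cost(r)
    beta_u: Callable[[float], float]  # upper service curve: max cycles in any window of length delta
    beta_l: Callable[[float], float]  # lower service curve: min cycles in any window of length delta

def constant_rate(rate):
    """A fully available unit delivering `rate` cycles per time unit has
    identical upper and lower service curves beta(delta) = rate * delta."""
    return lambda delta: rate * delta

# Illustrative numbers only, not the benchmark's actual values.
R = {
    "ARM9": Resource(cost=1.0, beta_u=constant_rate(200), beta_l=constant_rate(200)),
    "DSP":  Resource(cost=0.8, beta_u=constant_rate(150), beta_l=constant_rate(150)),
}

# The allocation step picks a 0/1 value per resource:
# alloc[r] = 1 means r is built into the architecture.
alloc = {"ARM9": 1, "DSP": 0}
```

A shared or periodically interrupted processor would instead get distinct upper and lower curves, capturing its best-case and worst-case availability.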

Software Application and Binding The purpose of a packet processor is to simultaneously process several streams of packets. For example, one stream may contain packets that store audio samples and another one contains packets from an FTP application. Whereas the different streams may be processed differently, each packet of a particular stream is processed identically, i.e., each packet is processed by the same sequence of tasks.

Definition 2: We define a set of streams s ∈ S and a set of tasks t ∈ T. To each stream s there is associated an ordered sequence of tasks V(s) = [t0, ..., tn]. Each packet of the stream is first processed by task t0 ∈ T, then successively by all other tasks until tn ∈ T.

As an example we may have five streams S = {RTSend, NRTDecrypt, NRTEncrypt, RTRecv, NRTForward}. According to Figure 12.3, the packets of these streams when entering the packet processor undergo different sequences of tasks, i.e., the packets follow the paths shown. For example, for stream s = NRTForward we have the sequence of tasks V(s) = [LinkRX, VerifyIPHeader, ProcessIPHeader, Classify, RouteLookUp, ... , Schedule, LinkTx].

Definition 3: The mapping relation M ⊆ T × R defines all possible bindings of tasks, i.e., if (t, r) ∈ M, then task t could be executed on resource r. This execution of t for one packet would use w(r, t) > 0 computing units of r. The binding B ⊆ M of tasks to resources is a subset of the mapping such that every task t ∈ T is bound to exactly one allocated resource r ∈ R, alloc(r) = 1. We also write r = bind(t) in a functional notation.

Fig. 12.3. Task graph of a packet processing application

In a similar way as alloc describes the selection of architectural components, bind defines a selection of the possible mappings. Both alloc and bind will be encoded using an appropriate representation described later. The 'load' that a task t puts onto its resource r = bind(t) is denoted as w(r, t).

Figure 12.4 represents an example of a mapping between tasks and resources. For example, task 'Classify' could be bound to resource 'ARM9' or 'DSP'. In a particular implementation of a packet processor we may have bind(Classify) = DSP, i.e., the task 'Classify' is executed on the resource 'DSP' and the corresponding execution requirement for each packet is w(DSP, Classify) = 2.9. Of course, this is possible only if the resource is allocated, i.e., alloc(DSP) = 1.
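Definitions 1 and 3 together give a feasibility check for bindings and the cost objective. The mapping fragment below merely echoes the Classify/DSP example; all load values other than w(DSP, Classify) = 2.9 are made up for illustration:

```python
# A fragment of the mapping relation M (admissible resources per task)
# and the loads w(r, t); only w(DSP, Classify) = 2.9 comes from the text.
mapping = {"Classify": {"ARM9", "DSP"}, "RouteLookUp": {"ARM9"}}
w = {("DSP", "Classify"): 2.9,
     ("ARM9", "Classify"): 4.1,
     ("ARM9", "RouteLookUp"): 1.5}

def binding_is_valid(bind, alloc):
    """Every task must sit on an allocated resource admitted by M."""
    return all(r in mapping[t] and alloc.get(r, 0) == 1
               for t, r in bind.items())

def architecture_cost(alloc, cost):
    """Cost objective: summed cost of all allocated resources."""
    return sum(cost[r] for r, a in alloc.items() if a == 1)
```

With alloc = {"ARM9": 1, "DSP": 1} and bind = {"Classify": "DSP", "RouteLookUp": "ARM9"}, the binding is valid and each packet of a stream containing Classify loads the DSP by w[("DSP", "Classify")] = 2.9 computing units.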

Run-time Environment and Scheduling According to Figure 12.1, there is a memory associated to each processing element. Within this memory, all packets are stored that need to be processed by the respective resource. The run-time environment now has different scheduling policies available that determine which of the waiting packets will be processed next.

Definition 4: To each stream s there is associated an integer priority prio(s) > 0. There are no streams with equal priority.

Page 306: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

A Computer Engineering Benchmark Application 277

Fig. 12.4. Example of a mapping of tasks to resources

In the benchmark application, we suppose that only preemptive fixed-priority scheduling is available on each resource. To this end, we associate with each stream s a fixed priority prio(s) > 0, i.e., all packets of s receive this priority. From all packets waiting to be executed in a memory, the run-time environment chooses for processing one that has the highest priority among all waiting packets. If several packets from one stream are waiting, it prefers those that are earlier in the task chain V(s).
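This selection rule can be sketched in a few lines. The data layout and the convention that a larger prio value means higher priority are assumptions made for illustration:

```python
def next_packet(waiting, prio):
    """Pick the next packet under preemptive fixed-priority scheduling:
    waiting is a list of (stream, chain_position) pairs; the packet of
    the highest-priority stream wins, and within one stream the packet
    earlier in the task chain V(s) is preferred."""
    # Assumption of this sketch: larger prio[stream] = higher priority.
    return max(waiting, key=lambda p: (prio[p[0]], -p[1]))

queue = [("forward", 0), ("voice", 2), ("voice", 1)]
print(next_packet(queue, {"voice": 3, "forward": 1}))  # ('voice', 1)
```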

Application Scenarios A packet processor will be used in several, possibly conflicting application scenarios. Such a scenario is described by the properties of the input streams, the allowable end-to-end delay (deadline) for each stream, and the available total memory for all packets (the sum of all individual memories of the processing elements).

Definition 5: The properties of each stream s are described by upper and lower arrival curves α_s^u(Δ) and α_s^l(Δ). To each stream s ∈ S there is associated a maximal total packet memory m(s) > 0 and an end-to-end deadline d(s) > 0, denoting the maximal time by which any packet of the stream has to be processed by all associated tasks V(s) after its arrival.

The upper and lower arrival curves specify upper and lower bounds on the number of packets that arrive at the packet processor. In particular, α_s^u(Δ) and α_s^l(Δ) are the maximum and minimum number of packets arriving in any time interval of length Δ. For details, see, e.g., Thiele et al. 21.

Definition 6: The packet processor is evaluated for a set of scenarios


b ∈ B. The quantities of Definition 5 are defined for each scenario independently.

In addition, whereas the allocation alloc defines a particular hardware architecture, the quantities that are specific to a software application are also specific to each scenario b ∈ B and must be determined independently, for example the binding bind of tasks to processing elements and the stream priorities prio.

Performance Analysis It is not obvious how to determine, for any memory module, the maximum number of packets stored in it waiting to be processed at any point in time. Neither is it clear how to determine the maximum end-to-end delays experienced by the packets, since all packet flows share common resources. As the packets flow from one resource to the next, there may be intermediate bursts and packet jams, making the computation of the packet delays and the memory requirements non-trivial.

Interestingly, there exists a computationally efficient method to derive worst-case estimates on the end-to-end delays of packets and the required memory for each computation and communication. In short, we construct a scheduling network and apply the real-time calculus (based on arrival and service curves) in order to derive the desired bounds. The description of this method is beyond the scope of this chapter but can be found in Thiele et al. 21,18,19.

As we know, for each scenario, the delay and memory in comparison to the allowed values d(b, s) and m(b, s), we can increase the input traffic until the constraints are just about satisfied. In particular, we do not use the arrival curves α^u_{b,s} and α^l_{b,s} directly in the scheduling network, but linearly scaled versions ψ_b · α^u_{b,s} and ψ_b · α^l_{b,s}, where the scaling factor ψ_b is different for each scenario. Binary search is then applied to determine the maximal throughput such that the constraints on delay and memory are just about satisfied.
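The binary search over the scaling factor ψ_b can be sketched as follows. Here feasible stands in for the (hypothetical) scheduling-network check that all delay and memory constraints are satisfied at a given scaling, and it is assumed to be monotone in ψ:

```python
def max_scaling(feasible, lo=0.0, hi=64.0, tol=1e-4):
    """Largest psi in [lo, hi] with feasible(psi), assuming feasibility
    is monotone (feasible at psi implies feasible for all smaller psi)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if feasible(mid):
            lo = mid      # constraints still met: try more traffic
        else:
            hi = mid      # constraints violated: back off
    return lo

# Toy stand-in for the real-time-calculus check: feasible up to psi = 2.5.
psi = max_scaling(lambda p: p <= 2.5)
print(round(psi, 2))  # 2.5
```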

For the following discussion, it is sufficient to state the following fact:

• Given the specification of a packet processing design problem by the set of resources r ∈ R, the cost function for each resource cost(r), the service curves β_r^u and β_r^l, a set of streams s ∈ S, a set of application tasks t ∈ T, the ordered sequence of tasks for each stream V(s), and the computing requirement w(r, t) for task t on resource r;


• given a set of application scenarios b ∈ B with associated arrival curves for each stream α^u_{b,s} and α^l_{b,s}, and a maximum delay and memory for each stream d(b, s) and m(b, s);

• given a specific HW/SW architecture defined by the allocation of hardware resources alloc(r) and, for each scenario b, a specific priority of each stream prio(b, s) and a specific binding bind(b, t) of tasks to resources;

• then we can determine — using the concepts of the scheduling network, real-time calculus, and binary search — the maximal scaling factor ψ_b such that under the input arrival curves ψ_b · α^u_{b,s} and ψ_b · α^l_{b,s} the maximal delay of each packet and the maximal number of stored packets are not larger than d(b, s) and m(b, s), respectively.

As a result, we can define the criteria for the optimization of packetprocessors.

Definition 7: The quality measures for packet processors are the associated cost, cost = Σ_{r∈R} alloc(r) · cost(r), and the throughput ψ_b for each scenario b ∈ B. These quantities can be computed from the specification of a HW/SW architecture, i.e., alloc(r), prio(b, s) and bind(b, t) for all streams s ∈ S and tasks t ∈ T.

Now, the benchmark application is defined formally in terms of an optimization problem. In the following, we will describe the two aspects, representation and variation operators, that are specific to the evolutionary algorithm implementation.

Representation Following Figure 12.2 and Definition 7, a specific HW/SW architecture is defined by alloc(r), prio(b, s) and bind(b, t) for all resources r ∈ R, streams s ∈ S and tasks t ∈ T. For the representation of architectures, we number the available resources from 1 to |R|; the tasks are numbered from 1 to |T|, and each stream is assigned a number between 1 and |S|. The allocation of resources can then be represented as an integer vector A ∈ {0, 1}^|R|, where A[i] = 1 denotes that resource i is allocated. To represent the binding of tasks to resources, we use a two-dimensional vector Z ∈ {1, ..., |R|}^(|B|×|T|), where for each scenario it is stored which task is bound to which resource; Z[i][j] = k means that in scenario i task j is bound to resource k. Priorities of flows are represented as a two-dimensional vector P ∈ {1, ..., |S|}^(|B|×|S|), where we store the streams according to their priorities, e.g., P[i][j] = k means that in scenario i, stream k has priority j, with 1 being the highest priority. Obviously, not all possible encodings A,


Z, P represent feasible architectures. Therefore, a repair method has been developed that converts infeasible solutions into feasible ones.
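The three decision vectors and a simple repair step might be initialized as follows. The sizes, the random initialization, and this particular repair rule are illustrative assumptions; the actual EXPO repair method is more involved:

```python
import random

R, T, S, B = 4, 5, 3, 2   # illustrative counts: resources, tasks, streams, scenarios

def random_individual(rng):
    A = [rng.randint(0, 1) for _ in range(R)]                      # allocation
    Z = [[rng.randint(1, R) for _ in range(T)] for _ in range(B)]  # binding
    P = [rng.sample(range(1, S + 1), S) for _ in range(B)]         # priority permutation
    return A, Z, P

def repair(A, Z):
    """Toy repair: allocate every resource that some binding uses."""
    for b in range(B):
        for t in range(T):
            A[Z[b][t] - 1] = 1
    return A

rng = random.Random(0)
A, Z, P = random_individual(rng)
A = repair(A, Z)
print(all(A[Z[b][t] - 1] == 1 for b in range(B) for t in range(T)))  # True
```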

Recombination The first step in recombining two individuals is creating exact copies of the parent individuals. With probability 1 − P_cross, these copies are returned as offspring and no recombination takes place. Otherwise, crossover is performed on either the allocation, the task binding, or the priority assignment of flows.

With probability P_cross-alloc, a one-point crossover operation is applied to the allocation vectors A_1 and A_2 of the parents: first we randomly choose the position j where to perform the crossover, then we create the allocation vector A_new1 for the first offspring as follows:

A_new1[i] = A_1[i], if 1 ≤ i ≤ j
A_new1[i] = A_2[i], if j < i ≤ |R|

Similarly, A_new2 is created. After this exchange in the allocation of resources, the repair method is called to ensure that for every task there is at least one allocated resource on which the task can be performed.

If the crossover is not done within the allocation vector, it is performed with probability P_cross-bind within the binding of tasks to resources. In detail, a scenario b ∈ B is randomly determined for which the crossover of the binding vectors should happen. Then a one-point crossover of the binding vectors Z_1[b] and Z_2[b] of the parents is performed according to the following procedure, where j is a random value in the interval [1, |T|]:

Z_new1[b][i] = Z_1[b][i], if 1 ≤ i ≤ j
Z_new1[b][i] = Z_2[b][i], if j < i ≤ |T|

The binding Z_new2 can be determined accordingly. Finally, if the crossover is neither in the allocation nor in the binding of tasks to resources, the crossover happens in the priority vector. For a randomly selected scenario b, the priority vectors P_1[b] and P_2[b] are crossed at one point to produce new priority vectors P_new1 and P_new2, following a similar procedure as described above.
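All three crossover variants reduce to the same one-point scheme; a generic sketch, with the 1-based index convention of the text translated to Python slicing:

```python
def one_point_crossover(v1, v2, j):
    """One-point crossover at position j (1 <= j <= len(v1)):
    offspring 1 takes v1[1..j] and v2[j+1..n]; offspring 2 the converse."""
    return v1[:j] + v2[j:], v2[:j] + v1[j:]

# Allocation vectors of two parents, crossover after position 2:
a1, a2 = [1, 1, 0, 0], [0, 0, 1, 1]
print(one_point_crossover(a1, a2, 2))  # ([1, 1, 1, 1], [0, 0, 0, 0])
```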

Mutation First, an exact copy of the individual to be mutated is created. With probability 1 − P_mut, no mutation takes place and the copy is returned. Otherwise, the copy is modified with respect to either the allocation, the task binding, or the priority assignment.

We mutate the allocation vector with probability P_mut-alloc. To this end, we randomly select a resource i and set A_new[i] = 0 with probability


P_mut-alloc-zero; otherwise we set A_new[i] = 1. After this change in the allocation vector, the repair method is called, which changes infeasible bindings such that they all map tasks to allocated resources only.

In case the mutation does not affect the allocation vector, with probability P_mut-bind we mutate the binding vector Z_new[b] for a randomly determined scenario b ∈ B. That is, we randomly select a task and map it to a resource randomly selected from the specification. If that resource is not yet allocated in this solution, we additionally allocate it.

If we mutate neither the allocation nor the binding, we mutate the priority vector for a randomly selected scenario b by exchanging two flows within the priority list P_new[b].
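The priority mutation, for instance, is a plain swap of two entries; a sketch in which the random source and list layout are assumptions:

```python
import random

def mutate_priorities(p, rng):
    """Swap two randomly chosen flows in the priority list of one scenario."""
    i, j = rng.sample(range(len(p)), 2)
    q = list(p)                # copy first, as described in the text
    q[i], q[j] = q[j], q[i]
    return q

rng = random.Random(1)
print(mutate_priorities([1, 2, 3, 4], rng))
```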

12.3. Software Architecture

So far, we have discussed evaluation, representation and variation for the proposed benchmark application. In the following, we will discuss how these components are combined with the fitness assignment and selection components. The overall software architecture is shown in Figure 12.5. It depicts how the application itself is separated from the multiobjective optimizer via a text-based interface. In this context, the two main questions are: (i) which elements of the optimization process should be part of the implementation of the benchmark application, and (ii) how should the communication between the two parts be established?

12.3.1. General Considerations

Essentially, most proposed multiobjective optimizers differ only in their selection operators: how promising individuals are selected for variation and how it is decided which individuals are removed from the population. Accordingly, most studies comparing different optimizers keep the representation of individuals and the variation operators fixed in order to assess the performance of the selection operators, which form the problem-independent part of the optimization process 7,6.

Consistent with this approach, the packet processor benchmark module consists of the individual handling, including their representation and the objective function evaluation, as well as the variation operators (see Figure 12.2 and its extension in Figure 12.5).

The division of the optimization process into two parts raises the problem of communication between the benchmark application and the optimizer. Several options can be considered: restricting oneself to a specific


Fig. 12.5. Overview of the separation between benchmark and optimizer

programming language that is available on many platforms, e.g., C or Java, would allow the modules to be provided as library functions. However, coupling two modules written in different programming languages would then be difficult, and it would in any case be necessary to re-compile or at least re-link the program for each optimizer. Alternatively, special communication mechanisms like UNIX sockets, which are independent of the programming language, could be used. The drawback is, though, that these mechanisms are not supported on all platforms.

We have therefore decided to implement the benchmark and the optimizer as separate programs which communicate through text files. The use of text files guarantees that any optimizer can be coupled to the benchmark, even if the two programs are written in different programming languages and run on different machines with different operating systems, as long as both have access to a common file system. This flexibility certainly does not come for free. There is an additional overhead in running time; however, it


is minimal and can be neglected in the case of real-world applications, as a series of tests have shown 2.

The interface developed for this benchmark application has been proposed and described in detail in Bleuler et al. 2. This concept, named PISA (Platform- and programming-language-independent Interface for Search Algorithms), is applicable in a much wider range of scenarios, since the chosen separation between selection and variation is suitable for most evolutionary multiobjective optimizers as well as for many other stochastic search algorithms. Additionally, the interface definition provides means for extensions to adjust it to specific needs. In the following we describe the main characteristics of the interface and especially its communication protocol.

12.3.2. Interface Description

As mentioned above, the two parts shown in Figure 12.5 are implemented as separate programs. Since the two programs run as independent processes, they need a method for synchronization. The procedure is based on a handshake protocol which can be described using two state machines (see Figure 12.6). In general, only one process is active at a time. When it reaches a new state it writes this state, encoded as a number, to a text file. During that time the other process has been polling this state file and now becomes

Fig. 12.6. Handshake protocol: The two processes can be modeled as finite state machines using the state file to synchronize. The data files are not shown


active, while the first process starts polling the state file. Specifically, it works as follows.

Both programs independently perform some initialization, e.g., reading of parameter files. During this step, the benchmark application generates the initial population and evaluates the individuals. It then writes the IDs of all individuals and their objective values to a file and changes the state number. As soon as the optimizer is done with its initialization, it starts polling the state file (the 'wait' state in Figure 12.6). When the state number is changed by the benchmark application, the optimizer reads the data file and selects promising parents for variation (in 'select'). Their IDs are written to another text file. The optimizer can maintain a pool of IDs it might consider again for selection in future iterations. The list of these archived IDs is also written to a text file. Then the state number is changed back. The benchmark program, which had been polling the state file, now (in 'variate') reads the list of archived IDs and deletes all individuals which are not on this list. Then the benchmark reads the IDs of the parents and produces offspring by variation of the respective individuals. The offspring are evaluated, their IDs and objective values are written to the text file, and the cycle starts again.
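The polling side of such a handshake can be sketched in a few lines. The file name, state numbers, and polling interval below are illustrative, not the PISA specification:

```python
import time

def set_state(state_file, state):
    # The active process writes its new state number to the shared file.
    with open(state_file, "w") as f:
        f.write(str(state))

def wait_for_state(state_file, wanted, poll=0.01):
    # The passive process polls until the file holds the wanted state.
    while True:
        try:
            with open(state_file) as f:
                if int(f.read().strip()) == wanted:
                    return
        except (OSError, ValueError):
            pass  # file missing or caught mid-write; keep polling
        time.sleep(poll)

set_state("pisa_state.txt", 3)       # e.g. benchmark signals 'data written'
wait_for_state("pisa_state.txt", 3)  # the other side returns immediately
print("synchronized")
```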

Since the optimizer operates only in the objective space, it is not necessary to communicate the actual representation of the individuals to the optimizer. The amount of data exchanged is thus small. For the exact specification of the protocol and the format of the text files, see Bleuler et al. 2.

12.4. Test Cases

The packet processor benchmark application has been implemented in Java. The corresponding tool EXPO, which has been tested under Solaris, Linux and Microsoft Windows, provides a graphical user interface that allows the user to control the program execution (cf. Figure 12.7): the optimization run can be halted, the current population can be plotted, and individual processor designs can be inspected graphically.

In the following, we will present three problem instances for the packet processor application and demonstrate how to compare different evolutionary multiobjective optimizers on these instances.

12.4.1. Problem Instances

We consider three problem instances that are based on the packet processing application depicted in Figure 12.3. The set of available resources is the


Fig. 12.7. The user interface of the benchmark application: The main control window in the upper left, a plot of the current population in the upper right, and a graphical representation of a network processor configuration in the lower part

same for all the problem instances. The three problem instances differ in the number of objectives: we have defined a problem with 2 objectives, one with 3, and a scenario including 4 objectives. All instances have one objective in common, the total cost of the allocated resources. The remaining objectives in a problem instance are the performance ψ_b of the solution packet processor under a given load scenario b ∈ B. Table 12.61 shows the different load characteristics for the remaining objectives. Overall, three different loads can be identified: in Load 1, all flows have to be processed; in Load 2, only the three flows real-time voice receive and send and non-real-time (NRT) packet forwarding are present; in Load 3, the packet processor has to forward packets of flow 'NRT forward' and encrypt/decrypt packets of flows 'NRT encryption' and 'NRT decryption', respectively.

The size of the search space for the given instance can be computed


Table 12.61. Loads for the different scenarios for which the architecture should be optimized.

Load scenario          RT send  RT receive  NRT encrypt  NRT decrypt  NRT forward
2 Objectives  Load 1      ✓         ✓           ✓            ✓            ✓
3 Objectives  Load 2      ✓         ✓           -            -            ✓
              Load 3      -         -           ✓            ✓            ✓
4 Objectives  Load 1      ✓         ✓           ✓            ✓            ✓
              Load 2      ✓         ✓           -            -            ✓
              Load 3      -         -           ✓            ✓            ✓

as follows. In the problem setting, there are 4 resource types on which all the tasks can be performed. Therefore, we have more than 4^25 possibilities to map the tasks onto the resources. Furthermore, the solution contains a priority assignment to the different flows; there are 5! possibilities to assign priorities to the flows. So, if we take into account that there are other specialized resources available, the size of the search space is S > 4^25 × 5! > 10^17 already for the problem instance with 2 objectives, and even larger for the instances with 3 or 4 objectives.
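The arithmetic behind this bound is easy to check (25 tasks, as implied by the 4^25 figure in the text):

```python
from math import factorial

mappings = 4 ** 25         # each of the 25 tasks on one of 4 resource types
priorities = factorial(5)  # priority assignments for the 5 flows
print(mappings * priorities > 10 ** 17)  # True
```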

As an example, an approximated Pareto front for the 3-objective instance is shown in Figure 12.8; the front has been generated by the optimizer SPEA2 25. The x-axis shows the objective value corresponding to ψ_Load2 under Load 2 (as defined in Table 12.61), the y-axis shows the objective value corresponding to ψ_Load3, whereas the z-axis shows the normalized total cost of the allocated resources.

The two example architectures shown in Figure 12.8 differ only in the allocation of the resource 'Cipher', which is specialized hardware for encryption and decryption of packets. The performance of the two architectures for the load scenario with real-time flows to be processed is more or less the same. However, the architecture with a cipher unit performs around 30 times better for the encryption/decryption scenario, at increased cost for the cipher unit. So, a designer of a packet processor that should have the capability of encryption/decryption would opt for the solution with a cipher unit (the solution on the left in Figure 12.8), whereas one would decide for the cheaper solution on the right if there is no need for encryption.

12.4.2. Simulation Results

To evaluate the difficulty of the proposed benchmark application, we compared the performance of four evolutionary multiobjective optimizers, namely SPEA2 25, NSGA-II 8, SEMO 15 and FEMO 15, on the three aforementioned problem instances.

Fig. 12.8. Two solution packet processor architectures annotated with loads on resources for the different loads specified in Table 12.61

For each algorithm, 10 runs were performed using the parameter settings

listed in Tables 12.62 and 12.63; these parameters were determined based on extensive preliminary simulations. Furthermore, all objective functions were scaled such that the corresponding values lie within the interval [0, 1]. Note that all objectives are to be minimized, i.e., the performance values are reversed (smaller values correspond to better performance). The different runs were carried out on a Sun Ultra 60. A single run for 3 objectives with a population size of 150 individuals in conjunction with SPEA2 takes about


Table 12.62. Parameters for population size and duration of runs dependent on the number of objectives.

# of objectives   population size   # of generations
2                 100               200
3                 150               300
4                 200               400

Table 12.63. Probabilities for mutation and crossover (cf. Section 12.2)

Mutation                P_mut = 0.8
→ Allocation            P_mut-alloc = 0.3
→ Allocation to zero    P_mut-alloc-zero = 0.5
→ Binding               P_mut-bind = 0.5
Crossover               P_cross = 0.5
→ Allocation            P_cross-alloc = 0.3
→ Binding               P_cross-bind = 0.5

20 minutes to complete. In the following we use two binary performance measures for the

comparison of the EMO techniques: (1) the additive ε-quality measure 28, and (2) the coverage measure 27. The ε-quality measure I_ε+(A, B) returns the minimum value d that must be subtracted from all objective values of the points in the solution set A such that every solution in set B is equaled or dominated by some solution in the shifted set A′ in terms of the objective values. If the value is negative, the solution set A entirely dominates the solution set B. Formally, this measure can be stated as follows:

I_ε+(A, B) = max_{b∈B} { min_{a∈A} { max_{1≤i≤dim} { a_i − b_i } } }

where dim denotes the problem dimension. Figure 12.9 shows a graphical interpretation of the additive ε-quality measure.

The coverage measure C(A, B), which is used as an additional reference here, returns the percentage of solutions in B which are dominated by or equal to at least one solution in A.
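Both measures are straightforward to implement for minimization problems; a sketch, with solution sets represented as lists of objective tuples:

```python
def eps_indicator(A, B):
    """Additive epsilon-quality I_eps+(A, B): max over b in B of
    min over a in A of max over objectives of (a_i - b_i)."""
    return max(min(max(a_i - b_i for a_i, b_i in zip(a, b)) for a in A)
               for b in B)

def coverage(A, B):
    """C(A, B): fraction of solutions in B weakly dominated by some a in A."""
    hits = sum(any(all(a_i <= b_i for a_i, b_i in zip(a, b)) for a in A)
               for b in B)
    return hits / len(B)

A = [(0.0, 0.0)]
B = [(1.0, 2.0), (0.5, 0.5)]
print(eps_indicator(A, B))  # -0.5: negative, so A entirely dominates B
print(coverage(A, B))       # 1.0
```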

Due to space restrictions, not all results can be presented here; instead, we focus on the main ones. Overall, we can state that there exist differences in the performance of different evolutionary algorithms on the packet processor design space exploration benchmark problem. From the results of the benchmark study, we can see that SPEA2 and NSGA-II perform comparably well. For the problem with 2 objectives, NSGA-II performs slightly better than SPEA2, whereas SPEA2 performs better for 4 objectives with respect to the chosen performance measures; this confirms results presented in a study by Zitzler, Laumanns, and Thiele 25.

Fig. 12.9. Illustration of the additive ε-quality measure I_ε+; here I_ε+(A, B) = d where d < 0, as A entirely dominates B

The distributions of the coverage and the additive ε-quality measure values between SPEA2 and NSGA-II are depicted in Figure 12.10. In both cases, for 2 and 4 objectives, NSGA-II achieves a better coverage value over SPEA2, but for more objectives, SPEA2 finds solutions that lead to smaller values of I_ε+ than NSGA-II.

Furthermore, we can see that FEMO, a simple evolutionary optimizer with a fair selection strategy, performs worse than the other algorithms on all the test cases (see Figure 12.11 for details), again with respect to the two performance measures under consideration. Note that because of the implementation of selection in FEMO, it is possible that a solution is selected for reproduction multiple times in sequence. This behavior can decrease the impact of recombination in the search process with FEMO. With respect to both coverage and especially the additive ε-quality measure, SPEA2 is superior to FEMO. SEMO, in contrast, performs similarly to FEMO for the case with 2 objectives; however, SEMO shows improved performance with increasing number of objectives.

12.5. Summary

This chapter presented EXPO, a computer engineering application that addresses the design space exploration of packet processor architectures. The underlying optimization problem is complex and involves allocating


Fig. 12.10. Comparison of SPEA2 and NSGA-II for 2 and 4 objectives

resources, binding tasks to resources, and determining schedules for the usage scenario under consideration. The goal is to minimize the cost of the allocated resources and to maximize the estimated performance of the corresponding packet processor architecture for each distinct usage scenario. Especially the last aspect, performance estimation, is highly involved and makes the use of black-box optimization methods necessary. As shown in the previous section, the application reveals performance differences between


Fig. 12.11. Comparison of SPEA2 and FEMO for 2 and 4 objectives

the four selected multiobjective optimizers on several problem instances. This suggests that EXPO is well suited as a multiobjective benchmark application.

Moreover, the EXPO implementation provides a text-based interface that follows the PISA specification 2. PISA stands for 'platform- and programming-language-independent interface for search algorithms' and makes it possible to implement application-specific parts (representation, variation,


objective function calculation) separately from the actual search strategy (fitness assignment, selection). Therefore, EXPO can be downloaded as a ready-to-use package, i.e., pre-compiled for different platforms; no modifications are necessary to combine it with arbitrary search algorithms. In addition, several multiobjective evolutionary algorithms including SPEA2 25

and NSGA-II 8 as well as other well-known benchmark problems such as the knapsack problem and a set of continuous test functions 24,9 are available for download at the PISA website http://www.tik.ee.ethz.ch/pisa/. All PISA-compliant components, benchmarks and search algorithms, can be arbitrarily combined without further implementation effort, and therefore this interface may be attractive for other researchers who would like to provide their algorithms and applications to the community.

Acknowledgments

This work has been supported by the Swiss Innovation Promotion Agency (KTI/CTI) under project number KTI 5500.2 and the SEP program at ETH Zurich under the polyproject TH-8/02-2.

References

1. A. Agarwal. Performance tradeoffs in multithreaded processors. IEEE Transactions on Parallel and Distributed Systems, 3(5):525-539, September 1992.

2. S. Bleuler, M. Laumanns, L. Thiele, and E. Zitzler. PISA — a platform and programming language independent interface for search algorithms. In C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb, and L. Thiele, editors, Evolutionary Multi-Criterion Optimization (EMO 2003), Lecture Notes in Computer Science, pages 494-508, Berlin, 2003. Springer.

3. T. Blickle, J. Teich, and L. Thiele. System-level synthesis using evolutionary algorithms. Journal on Design Automation for Embedded Systems, 3(8):23-58, 1998.

4. S. Chakraborty, S. Künzli, and L. Thiele. A general framework for analysing system properties in platform-based embedded system designs. In Proc. 6th Design, Automation and Test in Europe (DATE), Munich, Germany, March 2003.

5. S. Chakraborty, S. Künzli, L. Thiele, A. Herkersdorf, and P. Sagmeister. Performance evaluation of network processor architectures: Combining simulation with analytical estimation. Computer Networks, 41(5):641-665, April 2003.

6. C. A. Coello Coello, D. A. Van Veldhuizen, and G. B. Lamont. Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, New York, 2002.

7. K. Deb. Multi-Objective Optimization Using Evolutionary Algorithms. Wiley, Chichester, UK, 2001.


8. K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182-197, April 2002.

9. K. Deb, L. Thiele, M. Laumanns, and E. Zitzler. Scalable multi-objective optimization test problems. In Congress on Evolutionary Computation (CEC), pages 825-830. IEEE Press, 2002.

10. R. P. Dick and N. K. Jha. MOGAC: A multiobjective genetic algorithm for hardware-software co-synthesis of hierarchical heterogeneous distributed embedded systems. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 17(10):920-935, 1998.

11. M. Franklin and T. Wolf. A network processor performance and design model with benchmark parameterization. In P. Crowley, M. Franklin, H. Hadimioglu, and P. Onufryk, editors, Network Processor Design: Issues and Practices, Volume 1, chapter 6, pages 117-140. Morgan Kaufmann Publishers, 2003.

12. D. D. Gajski, F. Vahid, S. Narayan, and J. Gong. Specification and Design of Embedded Systems. Prentice Hall, Englewood Cliffs, N.J., 1994.

13. A. Jaszkiewicz. Do multiple-objective metaheuristics deliver on their promises? A computational experiment on the set-covering problem. IEEE Transactions on Evolutionary Computation, 7(2):133-143, 2003.

14. J. Knowles and D. Corne. Instance generators and test suites for the multiobjective quadratic assignment problem. In C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb, and L. Thiele, editors, Evolutionary Multi-Criterion Optimization (EMO 2003), Lecture Notes in Computer Science, pages 295-310, Berlin, 2003. Springer.

15. M. Laumanns, L. Thiele, and E. Zitzler. Running time analysis of multiobjective evolutionary algorithms on pseudo-boolean functions. IEEE Transactions on Evolutionary Computation, 2004. Accepted for publication.

16. M. Naedele, L. Thiele, and M. Eisenring. Characterising variable task releases and processor capacities. In 14th IFAC World Congress 1999, pages 251-256, Beijing, July 1999.

17. A. Pimentel, P. Lieverse, P. van der Wolf, L. Hertzberger, and E. Deprettere. Exploring embedded-systems architectures with Artemis. IEEE Computer, 34(11):57-63, November 2001.

18. L. Thiele, S. Chakraborty, M. Gries, and S. Künzli. Design space exploration of network processor architectures. In First Workshop on Network Processors at the 8th International Symposium on High-Performance Computer Architecture (HPCA8), pages 30-41, Cambridge, MA, USA, February 2002.

19. L. Thiele, S. Chakraborty, M. Gries, and S. Künzli. A framework for evaluating design tradeoffs in packet processing architectures. In Proc. 39th Design Automation Conference (DAC), pages 880-885, New Orleans, LA, June 2002. ACM Press.

20. L. Thiele, S. Chakraborty, M. Gries, and S. Künzli. Design space exploration of network processor architectures. In Network Processor Design: Issues and Practices, volume 1, chapter 4, pages 55-90. Morgan Kaufmann Publishers, 2003.

Page 323: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

294 S. Kiinzli, S. Bleuler, L. Thiele and E. Zitzler

21. L. Thiele, S. Chakraborty, M. Gries, A. Maxiaguine, and J. Greutert. Em-bedded software in network processors - models and algorithms. In Proc. 1stWorkshop on Embedded Software (EMSOFT), Lecture Notes in ComputerScience 2211, pages 416-434, Lake Tahoe, CA, USA, 2001. Springer Verlag.

22. L. Thiele, S. Chakraborty, and M. Naedele. Real-time calculus for schedulinghard real-time systems. In Proc. IEEE International Symposium on Circuitsand Systems (ISCAS), volume 4, pages 101-104, 2000.

23. D. Thierens. Convergence time analysis for the multi-objective counting onesproblem. In C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb, and L. Thiele,editors, Evolutionary Multi-Criterion Optimization (EMO 2003), LectureNotes in Computer Science, pages 355-364, Berlin, 2003. Springer.

24. E. Zitzler, K. Deb, and L. Thiele. Comparison of multiobjective evolutionaryalgorithms: Empirical results. Evolutionary Computation, 8(2):173-195, 2000.

25. E. Zitzler, M. Laumanns, and L. Thiele. SPEA2: Improving the StrengthPareto Evolutionary Algorithm for Multiobjective Optimization. In K. Gi-annakoglou et al., editors, Evolutionary Methods for Design, Optimisationand Control with Application to Industrial Problems (EUROGEN 2001),pages 95-100. International Center for Numerical Methods in Engineering(CIMNE), 2002.

26. E. Zitzler, J. Teich, and S. S. Bhattacharyya. Multidimensional explorationof software implementations for DSP algorithms. Journal of VLSI SignalProcessing, 24(l):83-98, February 2000.

27. E. Zitzler and L. Thiele. Multiobjective Evolutionary Algorithms: A Com-parative Case Study and the Strength Pareto Approach. IEEE Transactionson Evolutionary Computation, 3(4):257-271, 1999.

28. E. Zitzler, L. Thiele, M. Laumanns, C. M. Foneseca, and V. G. da Fonseca.Performance assessment of multiobjective optimizers: An analysis and review.IEEE Transactions on Evolutionary Computation, 7(2):117-132, 2003.

Page 324: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

CHAPTER 13

MULTIOBJECTIVE AERODYNAMIC DESIGN AND VISUALIZATION OF SUPERSONIC WINGS BY USING ADAPTIVE RANGE MULTIOBJECTIVE GENETIC ALGORITHMS

Shigeru Obayashi and Daisuke Sasaki

Institute of Fluid Science, Tohoku University
2-1-1, Katahira, Sendai, 980-8577 JAPAN

E-mail: [email protected], [email protected]

This paper describes an application of Adaptive Range Multiobjective Genetic Algorithms (ARMOGAs) to aerodynamic wing optimization. ARMOGAs are an extension of MOGAs with range adaptation. The objectives are to minimize the transonic and supersonic drag coefficients, as well as the bending and twisting moments of the wings, for a supersonic airplane. A total of 72 design variables are employed to describe the wing geometry in terms of the wing's planform, thickness distribution, and warp shape. Four-objective optimization successfully produced 766 non-dominated solutions. These solutions are first compared with the non-dominated wings obtained by a three-objective optimization and with a wing designed by the National Aerospace Laboratory (NAL). To analyze the present non-dominated solutions further, Self-Organizing Maps (SOMs) have been used to visualize tradeoffs among objective function values. The design variables are also mapped onto a separate SOM. The resulting SOM generates clusters of design variables, which indicate the roles of the design variables in design improvements and tradeoffs. These processes can be considered as data mining of the engineering design.

13.1. Introduction

Multiobjective Evolutionary Algorithms (MOEAs) are becoming popular in many fields because they provide a unique opportunity to address global tradeoffs between multiple objectives by sampling a number of Pareto solutions. In addition to performing the multiobjective optimization, it is increasingly important to analyze the resulting tradeoffs. To understand tradeoffs, visualization is essential. Although it is trivial to understand tradeoffs between two objectives, tradeoff analysis in more than three dimensions is not trivial, as shown in Fig. 1. To visualize higher dimensions, the Self-Organizing Map (SOM) by Kohonen1,2 is applied herein to the Pareto solutions obtained by the multiobjective design optimization.

In this paper, the design target is the wing for a Supersonic Transport (SST). Many research activities have been performed for SST worldwide.3-14 In Japan, the National Aerospace Laboratory (NAL) conducted the scaled supersonic experimental airplane project.5-9 For a new SST design, there exist many technical difficulties to overcome. The lift-to-drag ratio must be improved, and the sonic boom should be minimized. However, there is a severe tradeoff between reducing the drag and reducing the boom. As a result, a new SST is expected to cruise at a supersonic speed only over the sea and to cruise at a transonic speed over the ground. This means the important design objectives are not only to improve the supersonic cruise performance but also to improve the transonic one. For example, a large sweep angle can reduce the wave drag, but it limits the low-speed aerodynamic performance. Therefore, there are many tradeoffs to be addressed in designing an SST.

The multipoint aerodynamic optimization of a wing shape for an SST at both supersonic and transonic cruise conditions was conducted by using the Adaptive Range Multiobjective Genetic Algorithm (ARMOGA).13 Both aerodynamic drags were to be minimized under lift constraints, and the bending and pitching moments of the wing were also minimized instead of imposing constraints on structure and stability. A high-fidelity Computational Fluid Dynamics (CFD) code, a Navier-Stokes code, was used to evaluate the wing performance at both conditions. In this design optimization, planform shapes, camber, thickness distributions, and twist distributions were parameterized with a total of 72 design variables. To alleviate the required computational time, parallel computing was performed for the function evaluations. The resulting 766 non-dominated solutions are analyzed to reveal tradeoffs in this paper. In addition, SOM is also used to understand the four-dimensional objective functions and the 72-dimensional design variables.

SOM is a neural network model whose algorithm is based on unsupervised, competitive learning. It provides a topology-preserving mapping from the high-dimensional input space to map units. Map units, or neurons, usually form a two-dimensional lattice, and thus SOM is a mapping from the high dimensions onto two dimensions. The topology-preserving mapping means that nearby points in the input space are mapped to nearby units in the SOM. SOM can thus serve as a cluster-analysis tool for high-dimensional data. The cluster analysis of the objective function values will help to identify design tradeoffs. Regarding the four design objectives as a codebook vector, SOM is first applied to visualize the design tradeoffs.

Design is a process to find a point in the design variable space that matches the given point in the objective function space. This is, however, very difficult. For example, the design variable space considered here has 72 dimensions. One way of overcoming the high dimensionality is to group some of the design variables together. To do so, the cluster analysis based on SOM can be applied again.

Extracting a specific design variable from the non-dominated solutions to form a codebook vector, the design variable space can be mapped onto another SOM. The resulting SOM generates clusters of design variables. Design variables in such a cluster behave similarly to each other, and thus a typical design variable in the cluster indicates the behavior or role of the cluster. A designer may extract design information from this cluster analysis. These processes can be considered as data mining for the engineering design.

Fig. 13.1. Visualization of Pareto front.

13.2. Adaptive Range Multiobjective Genetic Algorithms

Genetic Algorithms (GAs) search from multiple points in the design space simultaneously and stochastically, instead of moving from a single point deterministically like gradient-based methods. This feature prevents design candidates from settling in a local optimum. Moreover, GAs do not require computing gradients of the objective function. These characteristics lead to the following advantages of GAs coupled with CFD: (1) GAs have the capability of finding global optimal solutions; (2) GAs can be processed in parallel; (3) high-fidelity CFD codes can easily be adapted to GAs without any modification; (4) GAs are not sensitive to any noise that might be present in the results; (5) GAs are less prone to premature failure.

GAs have been extended to solve multiobjective problems successfully.18,19 GAs use a population to seek optimal solutions in parallel. This feature can be extended to seek Pareto solutions in parallel without specifying weights between the objective functions. The resultant Pareto solutions represent global tradeoffs.

As high-fidelity CFD solvers need a large computational time, an efficient MOEA is required for the aerodynamic optimization. ARMOGA was developed for this purpose.

In traditional binary coding, a large string length is necessary for real-parameter problems, which may result in slow convergence to a global optimum. The Adaptive Range Genetic Algorithm (ARGA), originally proposed by Arakawa and Hagiwara, is a unique approach to solving such problems efficiently.15,16 Oyama developed real-coded ARGAs and applied them to transonic wing optimization.17

ARMOGA has been developed based on ARGA to deal with multiple Pareto solutions in multiobjective optimization. The main difference between ARMOGA and a conventional Multi-Objective Genetic Algorithm (MOGA) is the introduction of range adaptation. The flowchart of ARMOGA is shown in Fig. 2. The population is reinitialized every M generations by the range adaptation so that the population advances toward promising regions.

The basis of ARMOGA is the same as that of ARGA, but a straightforward extension may cause a problem in the diversity of the population. To better preserve the diversity of solution candidates, the normal distribution for encoding is changed. Figure 3 shows the search range with the probability distribution. Plateau regions are defined by the design ranges of the selected solutions. Then a normal distribution is considered at both sides of the plateau.
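The plateau-plus-tails encoding distribution just described can be sketched as follows. This is an illustrative reading, not the published scheme: the plateau spans the design range of the selected solutions, half-normal tails fall off on either side, and the two regions are weighted by their relative probability mass (the function name and the weighting rule are our assumptions).

```python
import math
import random

def sample_range_adapted(lo, hi, sigma, rng):
    """Draw one design variable from a density that is flat on the
    plateau [lo, hi] and decays as a half-normal (std. dev. sigma)
    on either side of it."""
    plateau_mass = hi - lo                         # relative mass of the plateau
    tail_mass = sigma * math.sqrt(2.0 * math.pi)   # both half-normal tails together
    if rng.uniform(0.0, plateau_mass + tail_mass) < plateau_mass:
        return rng.uniform(lo, hi)                 # landed on the plateau
    x = abs(rng.gauss(0.0, sigma))                 # distance into a tail
    return lo - x if rng.random() < 0.5 else hi + x

x = sample_range_adapted(0.0, 1.0, 0.1, random.Random(0))
```

With sigma small relative to the plateau width, most samples stay inside the promising region while the tails keep some diversity outside it.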

The advantages of ARMOGA are as follows: it is possible to obtain Pareto solutions efficiently because of the concentrated search of the probable design space, and it also produces diversified solutions. On the other hand, it may be difficult to avoid local minima if the global solutions are not included in the present search region.

The genetic operators adopted in ARMOGA are based on MOGAs.20


Selection is based on the Pareto ranking method and fitness sharing.21 Each individual is assigned a rank according to the number of individuals that dominate it. A standard fitness sharing function is used to maintain the diversity of the population. To find the Pareto solutions more effectively, the so-called best-N selection21 is also adopted. The blended crossover (BLX-α)22 described below is adopted. This operator generates children on a segment defined by two parents and a user-specified parameter α. A disturbance is added to the new design variables at a mutation rate of 20%. If the mutation occurs, the new design variables are obtained as

Child1 = γ × Parent1 + (1 − γ) × Parent2 + m × (ran2 − 0.5)

Child2 = (1 − γ) × Parent1 + γ × Parent2 + m × (ran3 − 0.5)    (1)

γ = (1 + 2α) × ran1 − α

where α = 0.5, and Child1,2 and Parent1,2 denote the encoded design variables of the children (members of the new population) and parents (a mated pair of the old generation), respectively. The random numbers ran1-3 are uniform random numbers in [0,1], and m is set to 10% of the given range of each design variable.
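The crossover and mutation of Eq. (1) can be written out as a short sketch (the function name and the per-child mutation draw are our assumptions; the text does not specify exactly how the 20% rate is applied to each child):

```python
import random

ALPHA = 0.5      # BLX-alpha parameter (alpha = 0.5 in the text)
MUT_RATE = 0.2   # mutation rate of 20%

def blx_crossover(p1, p2, lo, hi, rng):
    """BLX-alpha recombination of two encoded design variables, Eq. (1)."""
    gamma = (1.0 + 2.0 * ALPHA) * rng.random() - ALPHA   # gamma in [-0.5, 1.5]
    m = 0.1 * (hi - lo)   # disturbance magnitude: 10% of the variable's range
    d1 = m * (rng.random() - 0.5) if rng.random() < MUT_RATE else 0.0
    d2 = m * (rng.random() - 0.5) if rng.random() < MUT_RATE else 0.0
    child1 = gamma * p1 + (1.0 - gamma) * p2 + d1
    child2 = (1.0 - gamma) * p1 + gamma * p2 + d2
    return child1, child2

c1, c2 = blx_crossover(2.0, 6.0, 0.0, 10.0, random.Random(0))
```

With α = 0.5 the children can land on an interval twice as wide as the segment between the parents, which is what lets BLX-α both exploit and explore.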

Fig. 13.2. Flowchart of ARMOGA.


Fig. 13.3. Sketch of range adaptation.

13.3. Multiobjective Aerodynamic Optimization

13.3.1. Formulation of Optimization

The four objective functions used here are
(i) drag coefficient at transonic cruise, CD,t,
(ii) drag coefficient at supersonic cruise, CD,s,
(iii) bending moment at the wing root at the supersonic cruise condition, MB,
(iv) pitching moment at the supersonic cruise condition, MP.
In the present optimization, these objective functions are to be minimized. The transonic drag minimization corresponds to the cruise over land; the supersonic drag minimization corresponds to the cruise over sea. Lower bending moments allow less structural weight to support the wing. Lower pitching moments mean less trim drag.

The present optimization is performed at two design points for the transonic and supersonic cruises. The corresponding flow conditions and target lift coefficients are
(i) transonic cruising Mach number, M∞,t = 0.9,
(ii) supersonic cruising Mach number, M∞,s = 2.0,
(iii) target lift coefficient at the transonic cruising condition, CL,t = 0.15,
(iv) target lift coefficient at the supersonic cruising condition, CL,s = 0.10,
(v) Reynolds number based on the root chord length at both conditions, Re = 1.0 × 10^7.

The Reynolds number is taken from the wind tunnel condition. The flight altitude is assumed to be 10 km for the transonic cruise and 15 km for the supersonic cruise. To satisfy the lift constraints, the angle of attack is computed for each configuration by using the lift-curve slope dCL/dα obtained from the finite difference. Thus, three Navier-Stokes computations per evaluation are required. During the aerodynamic optimization, the wing area is frozen at a constant value.
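One plausible reading of this trimming step (our sketch; the helper names and the single Newton-style correction are assumptions) is: evaluate the lift at α and at α + Δα, form the finite-difference slope dCL/dα, and correct α toward the target lift, which accounts for the three flow solutions per design evaluation.

```python
def trim_alpha(cl_of_alpha, cl_target, alpha0, d_alpha=0.1):
    """One finite-difference Newton step on the angle of attack.
    cl_of_alpha stands in for a full Navier-Stokes lift evaluation."""
    cl0 = cl_of_alpha(alpha0)              # flow solution 1: CL at alpha0
    cl1 = cl_of_alpha(alpha0 + d_alpha)    # flow solution 2: for the slope
    slope = (cl1 - cl0) / d_alpha          # finite-difference dCL/dalpha
    alpha = alpha0 + (cl_target - cl0) / slope
    return alpha, cl_of_alpha(alpha)       # flow solution 3: trimmed state

# Toy linear lift model, just to exercise the sketch:
alpha, cl = trim_alpha(lambda a: 0.05 * a + 0.01, 0.15, 0.0)
```

Because CL is nearly linear in α near cruise, a single correction of this kind typically suffices.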

The design variables are categorized into the planform, the airfoil shapes, and the wing twist. The planform shape is defined by six design variables, allowing one kink in the spanwise direction. The definition is shown in Fig. 4, and the constraints for the planform are summarized in Table 1. The chord length at the wing tip is determined accordingly because of the fixed wing area. The airfoil shapes are composed of a thickness distribution and a camber line. The thickness distribution is represented by a Bezier curve defined by 11 polygons, as shown in Fig. 5. The wing thickness is constrained for structural strength, as summarized in Table 1. The thickness distributions are defined at the wing root, kink, and tip, and then linearly interpolated in the spanwise direction. Two camber surfaces composed of the airfoil camber lines are defined at the inboard and outboard of the wing separately. Each surface is represented by a Bezier surface defined by four polygons in the chordwise direction and three in the spanwise direction. Finally, the wing twist is represented by a B-spline curve with six polygons. In total, 72 design variables are used to define a whole wing shape. A three-dimensional wing with the computational structured grid is shown in Fig. 6. See Ref. 13 for more details on the geometry definition and CFD information.

Fig. 13.4. Wing planform definition and schematic view of moment axes.


Table 1. Summary of constraints.

(a) Constraints for planform shape

Chord length at root: 10 < Croot < 20
Chord length at kink: 3 < Ckink < 15
Inboard span length: 2 < bin < 7
Outboard span length: 2 < bout < 7
Inboard sweep angle (deg): 35 < αroot < 70
Outboard sweep angle (deg): 35 < αkink < 70
Wing area: S = 60
Chord length at tip: 1 < Ctip < 10
Chord length ordering: Ctip < Ckink < Croot
Span length ordering: bout < bin
Sweep angle ordering: αkink < αroot

(b) Constraints for thickness distribution

Maximum thickness: 3 < ZP5 < 4
Maximum thickness location: 15 < XP5 < 70
Continuous first derivative at P5: ZP4 = ZP5 = ZP6
Continuous second derivative at P5: XP5 − XP4 = XP7 − XP6, ZP3 = ZP7
Continuous first derivative at leading edge: XP0 = XP1

13.3.2. CFD Evaluation

To evaluate the designs, a high-fidelity Euler/Navier-Stokes code was used. Taking advantage of the characteristics of GAs, the present optimization was parallelized on the SGI ORIGIN2000 at the Institute of Fluid Science, Tohoku University. The system has 640 Processing Elements (PEs) with a peak performance of 384 GFLOPS and 640 GB of memory.

A simple master-slave strategy was employed: the master PE manages the optimization process, while the slave PEs run the Navier-Stokes code. The parallelization rate became almost 100% because almost all the CPU time was dominated by the CFD computations. The population size used in this study was set to 64, so the process was parallelized with 32-128 PEs depending on the availability of job classes. The present optimization requires about six hours per generation for the supersonic wing case when parallelized on 128 PEs.
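The master-slave farming can be sketched as below (threads and a cheap analytic surrogate stand in for the slave PEs and the Navier-Stokes runs; all names are ours):

```python
from concurrent.futures import ThreadPoolExecutor

def evaluate(design):
    """Stand-in for one Navier-Stokes evaluation of a design vector."""
    return sum(x * x for x in design)

def master_evaluate(population, n_workers=8):
    # The master hands one design to each idle worker ("slave") and
    # collects the objective values back in population order.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(evaluate, population))

fitness = master_evaluate([[i, i + 1.0] for i in range(64)])
```

Because each CFD run takes minutes while the GA bookkeeping takes milliseconds, nearly all wall time is spent in the workers, which is why the reported parallel efficiency approaches 100%.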


Fig. 13.5. Thickness definition.

Fig. 13.6. Computational grid around a wing in C-H topology.

13.3.3. Overview of Non-Dominated Solutions

The evolution was computed for 75 generations. After the computation, all the solutions evolved were sorted again to find the final non-dominated solutions. The non-dominated solutions were obtained in the four-dimensional objective function space. To understand their distribution, all non-dominated solutions are projected into the two-dimensional objective function space between the transonic and supersonic drag coefficients, as shown in Fig. 7. In Fig. 7, Surface I shows the tradeoff between the aerodynamic performances. The wings near Surface I have impractically large aspect ratios. The planform shapes of the extreme non-dominated solutions that minimize the respective objective functions appear physically reasonable, as shown in Fig. 8. A wing with the minimal transonic cruising drag has a smaller leading-edge sweep and a large aspect ratio. On the contrary, a wing with the lowest supersonic drag coefficient has a large leading-edge sweep to remain inside the Mach cone. The pitching moment is reduced by lowering the sweep angle and the wing chord length.

All the present non-dominated solutions in Fig. 7 are labeled by the bending and pitching moments, respectively, as shown in Fig. 9. The wings near the tradeoff surface between the transonic and supersonic drag coefficients (tradeoff surface I in Fig. 7) have impractically large bending moments, as shown in Fig. 9(a). The bending moment is closely related to both the transonic and supersonic drag coefficients. On the other hand, the pitching moment has an influence only on the supersonic drag coefficient in Fig. 9(b).
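The re-sorting step mentioned above, which filters all evaluated designs down to the final non-dominated set with every objective minimized, can be sketched as a naive O(n^2) pass (illustrative only, not the chapter's implementation):

```python
def dominates(a, b):
    """a dominates b: no worse in every objective, better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

front = non_dominated([(0.1, 0.9), (0.5, 0.5), (0.9, 0.1), (0.6, 0.6)])
# (0.6, 0.6) is dominated by (0.5, 0.5) and drops out of the front.
```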

Fig. 13.7. Projection of non-dominated solutions into the two-dimensional plane between transonic and supersonic drag coefficients.


Fig. 13.8. Planform shapes of the extreme non-dominated solutions.

13.4. Data Mining by Self-Organizing Map

13.4.1. Neural Network and SOM

SOM1,2 is a two-dimensional array of neurons:

M = {m1, ..., mp×q}    (2)

Each neuron is a vector called the codebook vector:

mi = [mi1, ..., min]    (3)

which has the same dimension as the input vectors (n-dimensional). The neurons are connected to adjacent neurons by a neighborhood relation. This dictates the topology, or the structure, of the map. Usually, the neurons are connected to each other via a rectangular or hexagonal topology. One can also define a distance between the map units according to their topology relations.

The training consists of drawing sample vectors from the input data set and "teaching" them to the SOM. The teaching consists of choosing a winner unit by means of a similarity measure and updating the values of the codebook vectors in the neighborhood of the winner unit. This process is repeated a number of times.

In one training step, one sample vector is drawn randomly from the input data set. This vector is fed to all units in the network, and a similarity measure is calculated between the input data sample and all the codebook vectors. The best-matching unit is chosen to be the codebook vector with the greatest similarity to the input sample. The similarity is usually defined


(a) Labeled according to bending moment

(b) Labeled according to pitching moment

Fig. 13.9. Projection of non-dominated front onto the supersonic and transonic drag tradeoffs, labeled according to bending and pitching moments.


by means of a distance measure; for example, in the case of Euclidean distance, the best-matching unit is the neuron closest to the sample in the input space.

The best-matching unit, usually denoted mc, is the codebook vector that matches a given input vector x best. It is defined formally as the neuron for which

||x − mc|| = min_i ||x − mi||    (4)

After finding the best-matching unit, the units in the SOM are updated. During the update procedure, the best-matching unit is moved a little closer to the sample vector in the input space. The topological neighbors of the best-matching unit are also similarly updated. This update procedure stretches the best-matching unit and its topological neighbors towards the sample vector. The neighborhood function should be a decreasing function of time. In the following, the SOMs were generated with more advanced techniques by using Viscovery® SOMine 4.0 Plus.23
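A from-scratch sketch of this training loop is given below (ours, for illustration; the chapter's maps were produced with Viscovery SOMine). It finds the best-matching unit per Eq. (4) and pulls it and its lattice neighbors toward the sample, with learning rate and neighborhood radius decreasing over time:

```python
import math
import random

def train_som(data, rows=4, cols=4, iters=500, rng=None):
    """Train a rows x cols SOM on a list of equal-length vectors."""
    rng = rng or random.Random(0)
    dim = len(data[0])
    units = [[rng.random() for _ in range(dim)] for _ in range(rows * cols)]
    pos = [(i // cols, i % cols) for i in range(rows * cols)]
    for t in range(iters):
        x = rng.choice(data)
        frac = t / iters
        lr = 0.5 * (1.0 - frac)                        # decaying learning rate
        radius = 1.0 + 0.5 * max(rows, cols) * (1.0 - frac)
        c = min(range(len(units)),                     # best-matching unit, Eq. (4)
                key=lambda i: sum((a - b) ** 2 for a, b in zip(x, units[i])))
        for i, m in enumerate(units):
            d2 = (pos[i][0] - pos[c][0]) ** 2 + (pos[i][1] - pos[c][1]) ** 2
            h = lr * math.exp(-d2 / (2.0 * radius ** 2))   # neighborhood weight
            for k in range(dim):
                m[k] += h * (x[k] - m[k])              # pull unit toward sample
    return units
```

The topology preservation comes entirely from the neighborhood weight h: units that are close on the lattice are dragged toward the same samples.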

13.4.2. Cluster Analysis

Once the SOM projects the input space onto a low-dimensional regular grid, the map can be utilized to visualize and explore properties of the data. When the number of SOM units is large, similar units need to be grouped, i.e., clustered, to facilitate quantitative analysis of the map and the data. A two-stage procedure — first using the SOM to produce prototypes, which are then clustered in the second stage — was reported to perform well when compared to direct clustering of the data.24

A hierarchical agglomerative algorithm is used for the clustering here. The algorithm starts with a clustering in which each node by itself forms a cluster. In each step of the algorithm, two clusters are merged: those with the minimal distance according to a special distance measure, the SOM-Ward distance.23 This measure takes into account whether two clusters are adjacent in the map, which means that the process of merging clusters is restricted to topologically neighboring clusters. The number of clusters differs according to the hierarchical sequence of the clustering. A relatively small number will be chosen for visualization (Sec. 4.3), while a large number will be used for the generation of codebook vectors for the respective design variables (Sec. 4.4).
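A bare-bones version of this two-stage idea might look like the following (hedged: plain centroid distance stands in for the proprietary SOM-Ward measure, and the map-adjacency restriction is passed in explicitly):

```python
def cluster_prototypes(units, adjacency, k):
    """Agglomeratively merge SOM prototype vectors down to k clusters,
    merging only clusters that are adjacent on the map."""
    clusters = [{i} for i in range(len(units))]

    def centroid(c):
        dim = len(units[0])
        return [sum(units[i][d] for i in c) / len(c) for d in range(dim)]

    def dist2(a, b):
        ca, cb = centroid(a), centroid(b)
        return sum((x - y) ** 2 for x, y in zip(ca, cb))

    def adjacent(a, b):
        return any((i, j) in adjacency or (j, i) in adjacency
                   for i in a for j in b)

    while len(clusters) > k:
        # Candidate merges: every pair of map-adjacent clusters.
        pairs = [(dist2(a, b), ai, bi)
                 for ai, a in enumerate(clusters)
                 for bi, b in enumerate(clusters)
                 if ai < bi and adjacent(a, b)]
        _, ai, bi = min(pairs)        # merge the closest adjacent pair
        clusters[ai] |= clusters[bi]
        del clusters[bi]
    return clusters
```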


13.4.3. Visualization of Design Tradeoffs: SOM of Tradeoffs

As discussed above, a total of 766 non-dominated solutions were obtained after 75 generations, forming a three-dimensional surface in the four-dimensional objective function space, as shown in Figs. 7 and 9. By examining the extreme non-dominated solutions, the archive was found to represent the non-dominated front qualitatively.

The present non-dominated solutions of the supersonic wing designs have four design objectives. First, let us project the resulting non-dominated front onto the two-dimensional map. Figure 10 shows the resulting SOM with seven clusters. For better understanding, the typical planform shapes of the wings are also plotted in the figure. The lower right corner of the map corresponds to highly swept, high-aspect-ratio wings good for supersonic aerodynamics. The lower left corner corresponds to moderate sweep angles good for reducing the pitching moment. The upper right corner corresponds to small aspect ratios good for reducing the bending moment. The upper left corner thus reduces both the pitching and bending moments.

Fig. 13.10. SOM of the objective function values and typical wing planform shapes.

Figure 11 shows the same SOM contoured by the four design objective values. All the objective function values are scaled between 0 and 1. The low supersonic drag region corresponds to the high pitching moment region. This is primarily because of high sweep angles. The low supersonic drag region also corresponds to the high bending moment region because of high aspect ratios. The combination of high sweep angle and high aspect ratio confirms that supersonic wing design is highly constrained.

Fig. 13.11. SOM contoured by each design objective.


13.4.4. Data Mining of Design Space: SOM of Design Variables

The previous SOM provides clusters based on the similarity in the objective function values. The next step is to find similarity in the design variables that corresponds to the previous clusters. To visualize this, the previous SOM is first revised by using 49 clusters, as shown in Fig. 12. Then, all the design variables are averaged in each cluster, respectively. Now each design variable has a codebook vector of 49 cluster-averaged values. This codebook vector may be regarded as representing focal areas in the design variable space. Finally, a new SOM is generated from these codebook vectors, as shown in Fig. 13.

This process can be done for the encoded design variables (genotype) or the decoded design variables (phenotype). The genotype and phenotype generated completely different SOMs. A possible reason is the various scalings that appear in the phenotype. For example, one design variable lies between 0 and 1 and another between 35 and 70. This difference in the order of magnitude of the design variables may lead to different clusters. To avoid such confusion, the genotype is used for the SOM here.
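The construction of the per-variable codebook vectors described above can be sketched as follows (function and argument names are ours):

```python
def design_variable_codebooks(solutions, cluster_of):
    """Build one codebook vector per design variable: its average value
    in each objective-space cluster (49 clusters in the chapter).
    `solutions` is a list of design-variable vectors; `cluster_of[s]`
    is the cluster index of solution s on the first SOM."""
    n_vars = len(solutions[0])
    n_clusters = max(cluster_of) + 1
    sums = [[0.0] * n_clusters for _ in range(n_vars)]
    counts = [0] * n_clusters
    for s, x in enumerate(solutions):
        c = cluster_of[s]
        counts[c] += 1
        for v in range(n_vars):
            sums[v][c] += x[v]
    # Each row is the codebook vector of one design variable; these rows
    # are then fed to a second SOM to cluster the design variables.
    return [[sums[v][c] / counts[c] for c in range(n_clusters)]
            for v in range(n_vars)]
```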

In Fig. 13, the labels indicate the 72 design variables. DVs 00 to 05 correspond to the planform design variables. These variables have a dominant influence on the wing performance. DVs 00 and 01 determine the span lengths of the inboard and outboard wing panels, respectively. DVs 02 and 03 correspond to the leading-edge sweep angles. DVs 04 and 05 are the root-side chord lengths. DVs 06 to 25 define the wing camber. DVs 26 to 32 determine the wing twist. Figure 13 contains seven clusters, and thus seven design variables are chosen, one from each cluster, as indicated. Figure 14 shows the SOMs of Fig. 10 contoured by these design variables.

The sweep angles, DVs 02 and 03, make a cluster in the lower left corner of the map in Fig. 13, and the corresponding plots in Fig. 14 confirm that the wing sweep has a large impact on the aerodynamic performance. DVs 11 and 51 in Fig. 14 do not appear influential to any particular objective. By comparing Figs. 14 and 10, DV 01 has a distribution similar to that of the bending moment MB, indicating that the wing outboard span has an impact on the wing bending moment. On the other hand, DV 00, the wing inboard span, has an impact on the pitching moment. DV 28 is related to the transonic drag. DVs 04 and 05 are in the same cluster. Both of them have an impact on the transonic drag because their reduction means an increase of the aspect ratio. Several features of the wing planform design variables and the corresponding clusters are found in the SOMs, and they are consistent with the existing aerodynamic knowledge.

Fig. 13.12. SOM of objective function values with 49 clusters.

13.5. Conclusions

The multipoint design optimization of a wing for an SST has been performed by using ARMOGA. Four objective functions were used to minimize the supersonic and transonic drags and the bending and pitching moments. The complete wing shape was represented by a total of 72 design variables. A Navier-Stokes solver was used to evaluate the aerodynamic performances. Successful optimization results were obtained. The planforms of the extreme non-dominated solutions appear physically reasonable. Global tradeoffs between the objectives are presented.

In addition, design tradeoffs have been investigated for the design problem of supersonic wings by using visualization and cluster analysis of the non-dominated solutions based on SOMs. SOM is applied to visualize tradeoffs between the design objectives. The three-dimensional non-dominated front in the objective function space has been mapped onto the two-dimensional SOM, where the global tradeoffs are successfully visualized. The resulting SOMs


Fig. 13.13. SOM of cluster-averaged design variables.

are further contoured by each objective, which provides better insight into the design tradeoffs.

Furthermore, based on the codebook vectors of cluster-averaged values for the respective design variables obtained from the SOMs, the design variable space is mapped onto another SOM. Design variables in the same cluster are considered to have similar influences on the design tradeoffs. Therefore, by selecting a member (design variable) from a cluster, the original SOM in the objective function space is contoured by that particular design variable. This reveals the correlation of the cluster of design variables with the objective functions and their relative importance. Because each cluster of design variables can be identified as influential or not to a particular design objective, the optimization problem may be divided into subproblems in which the optimization will more easily lead to better solutions.

These processes may be considered as data mining of the engineering design. The present work demonstrates that MOEAs and SOMs are versatile design tools for engineering design.

Acknowledgments

The present computation was carried out in parallel using an ORIGIN2000 at the Institute of Fluid Science, Tohoku University. The authors would like to thank the National Aerospace Laboratory's SST Design Team for providing


Multiobjective Aerodynamic Design and Visualization of Supersonic Wings 313

Fig. 13.14. SOM contoured by design variables selected from clusters in Fig. 13.

many useful data.

References

1. T. Kohonen, Self-Organizing Maps, Springer, Berlin, Heidelberg (1995).
2. J. Hollmen, Self-Organizing Map, http://www.cis.hut.fi/~jhollnien/dippa/node7.html, last access on October 3 (2002).
3. S. E. Cliff, J. J. Reuter, D. A. Saunders and R. M. Hicks, "Single-Point and Multipoint Aerodynamic Shape Optimization of High-Speed Civil Transport," J. of Aircraft, 38, 6 (2001), pp. 997-1005.
4. J. J. Alonso, I. M. Kroo and A. Jameson, "Advanced Algorithms for Design and Optimization of Quiet Supersonic Platforms," AIAA Paper 2002-0144 (2002).
5. K. Sakata, "Supersonic Experimental Airplane Program in NAL and its CFD-Design Research Demand," Proc. of 2nd SST-CFD Workshop, (2000), pp. 53-56.
6. K. Sakata, "Supersonic Experimental Airplane (NEXST) for Next Generation SST Technology," AIAA Paper 2002-0527 (2002).
7. Y. Shimbo, K. Yoshida, T. Iwamiya, R. Takaki and K. Matsushima, "Aerodynamic Design of Scaled Supersonic Experimental Airplane," Proc. of 1st SST-CFD Workshop, (1998), pp. 62-67.
8. T. Iwamiya, K. Yoshida, Y. Shimbo, Y. Makino and K. Matsushima, "Aerodynamic Design of Supersonic Experimental Airplane," Proc. of 2nd SST-CFD Workshop, (2000), pp. 79-84.
9. Y. Makino and T. Iwamiya, "Aerodynamic Nacelle Shape Optimization for NAL's Experimental Airplane," Proc. of 2nd SST-CFD Workshop, (2000), pp. 115-120.
10. R. Grenon, "Numerical Optimization in Aerodynamic Design with Application to a Supersonic Transport Aircraft," Proc. of 1st SST-CFD Workshop, (1998), pp. 83-104.
11. H.-J. Kim, D. Sasaki, S. Obayashi and K. Nakahashi, "Aerodynamic Optimization of Supersonic Transport Wing Using Unstructured Adjoint Method," AIAA J., 39, 6 (2001), pp. 1011-1020.
12. S. Obayashi, D. Sasaki, Y. Takeguchi and N. Hirose, "Multiobjective Evolutionary Computation for Supersonic Wing-Shape Optimization," IEEE Transactions on Evolutionary Computation, 4, 2 (2000), pp. 182-187.
13. D. Sasaki, S. Obayashi and K. Nakahashi, "Navier-Stokes Optimization of Supersonic Wings with Four Objectives Using Evolutionary Algorithm," J. of Aircraft, 39, 4 (2002), pp. 621-629.
14. D. Sasaki, G. Yang and S. Obayashi, "Automated Aerodynamic Optimization System for SST Wing-Body Configuration," AIAA Paper 2002-5549 (2002).
15. M. Arakawa and I. Hagiwara, "Development of Adaptive Real Range (ARRange) Genetic Algorithms," JSME Int. J., Series C, 41, 4 (1998), pp. 969-977.
16. M. Arakawa and I. Hagiwara, "Nonlinear Integer, Discrete and Continuous Optimization Using Adaptive Range Genetic Algorithms," Proc. of 1997 ASME Design Engineering Technical Conferences (1997).
17. A. Oyama, S. Obayashi and T. Nakamura, "Real-Coded Adaptive Range Genetic Algorithm Applied to Transonic Wing Optimization," Applied Soft Computing, 1, 3 (2001), pp. 179-187.
18. K. Deb, Multi-Objective Optimization using Evolutionary Algorithms, John Wiley & Sons, Ltd., Chichester (2001).
19. C. A. Coello Coello, D. A. Van Veldhuizen and G. B. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems, Kluwer Academic Publishers, New York (2002).
20. C. M. Fonseca and P. J. Fleming, "Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization," Proc. of 5th ICGA, (1993), pp. 416-423.
21. S. Obayashi, S. Takahashi and Y. Takeguchi, "Niching and Elitist Models for MOGAs," Parallel Problem Solving from Nature - PPSN V, Lecture Notes in Computer Science, Springer, Berlin Heidelberg New York, (1998), pp. 260-269.
22. L. J. Eshelman and J. D. Schaffer, "Real-coded genetic algorithms and interval schemata," Foundations of Genetic Algorithms 2, Morgan Kaufmann Publishers, Inc., San Mateo, (1993), pp. 187-202.
23. Eudaptics software gmbh, http://www.eudaptics.com/technology/somine4.html, last access on October 3 (2002).
24. J. Vesanto and E. Alhoniemi, "Clustering of the Self-Organizing Map," IEEE Transactions on Neural Networks, 11, 3 (2000), pp. 586-600.


CHAPTER 14

APPLICATIONS OF A MULTI-OBJECTIVE GENETIC ALGORITHM IN CHEMICAL AND ENVIRONMENTAL ENGINEERING

Madhumita B. Ray
Department of Chemical and Biomolecular Engineering
National University of Singapore
4 Engineering Drive 4
Singapore 117576
E-mail: [email protected]

Multiobjective optimization, involving the simultaneous optimization of more than one objective function, is quite commonly encountered in chemical and environmental engineering processes. With the implementation of stringent regulations on the air and water discharge of particulate pollutants, the development of efficient fluid-solid separation devices integrating the physico-chemical processes with economic parameters is of significant commercial importance. In this chapter, multiobjective optimization using conflicting objectives, such as maximization of the overall collection efficiency and minimization of both the pressure drop and the cost, in commonly used fluid-particulate separation devices, namely the cyclone separator and the venturi scrubber, is illustrated using the Non-dominated Sorting Genetic Algorithm (NSGA).

14.1. Introduction

Genetic algorithms (GAs), a nontraditional search and optimization method introduced by Holland1 in 1975, mimic the principles of genetics and natural selection. This is done by the creation of a population of solutions referred to as chromosomes or strings. Each chromosome is represented in terms of a set of several binary numbers generated randomly, and encodes the values of the different parameters (decision variables) being optimized. These are equivalent to the chromosomes in DNA in biological systems. The chromosomes then go through a process of simulated "evolution". Bit-manipulation operators then implement "reproduction",



"crossover", "mutation" and other biological operators of natural evolu-tion, to improve their "fitness".

GAs have several advantages over conventional optimization techniques: i) objective functions can be multimodal or discontinuous; ii) they require information only on the objective function, and gradient evaluation is not required; iii) a starting solution is not required; iv) the search is carried out using a population of several points simultaneously, rather than a single point; v) they are better suited to handle problems involving several design or operating variables (decision variables).

Simple genetic algorithms (SGAs) are suitable for optimization problems involving single-objective functions. In such problems, an SGA usually reaches the global optimum. However, for problems involving multiple objective functions, unique optimal solutions rarely exist. Rather, a set of several equally desirable trade-off points may exist. These solutions are called non-dominated and constitute the so-called Pareto optimal set. None of these non-dominated solutions is superior to any of the other points, and indeed, any one of them could be selected for design or operation. The choice of a desired solution among the Pareto set of points requires additional knowledge about the problem, information which may be intuitive and hence non-measurable. Statistical techniques using the estimates of several decision-makers are often used to decide the preferred solution. However, the Pareto optimal set assists in narrowing down the choices to be considered by a decision-maker, and thus is of great importance. The Non-dominated Sorting Genetic Algorithm (NSGA)2, an adaptation of the SGA, can be used for multi-objective optimization.
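The notion of non-domination described above can be made concrete with a short sketch (illustrative only; the function names and the two-objective test data below are our own, not from the chapter; both objectives are taken as minimized):

```python
def dominates(f_a, f_b):
    """True if objective vector f_a dominates f_b (all objectives
    minimized): f_a is no worse in every objective and strictly
    better in at least one."""
    return all(a <= b for a, b in zip(f_a, f_b)) and \
           any(a < b for a, b in zip(f_a, f_b))

def pareto_set(points):
    """Return the non-dominated subset of a list of objective vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# Toy trade-off data: (pressure drop, -efficiency), both minimized.
pts = [(2.0, -0.60), (1.5, -0.55), (3.0, -0.70), (2.5, -0.55)]
print(pareto_set(pts))  # (2.5, -0.55) is dominated by (1.5, -0.55)
```

Maximized objectives (such as the collection efficiency later in this chapter) are handled here by negating them, so a single "minimize everything" dominance test suffices.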

Many of the current chemical engineering problems in core areas such as reaction engineering, transport phenomena, separation science, and biological systems require multi-objective optimization. A recent review presents various applications of multi-objective optimization in diverse chemical engineering problems3. Previously, the NSGA has been successfully used for various chemical engineering problems including the optimization of a nylon-6 semi-batch reactor4,5, the optimization of a side-fired steam reformer6, a wiped-film PET reactor7 and the dialysis of beer in hollow-fiber membranes8.

In addition to the major goal of achieving economic efficiency, most of the recent chemical engineering examples include reliability, safety, hazard analysis, control performance, and environmental pollution. With the implementation of stringent regulations on the air and water discharge of particulate pollutants, the development of efficient fluid-solid separation devices integrating the physico-chemical processes with economic parameters is of


significant practical importance. In this chapter, multiobjective optimization involving conflicting objectives, such as maximization of the overall collection efficiency and minimization of the pressure drop, and in some cases the cost, in commonly used fluid-particulate separation devices is dealt with using the NSGA.

14.2. Physical Problem

Control of particulate matter is a major aspect of industrial air pollution engineering. Current federal regulations in developed countries call for "invisible stack emissions" from industries. These regulations necessitate the development of new efficient systems for particulate removal as well as the improvement of existing designs. Particles are separated from the conveying fluids by a combination of several mechanisms, namely: gravitational settling, centrifugal impaction, inertial impaction, direct interception, diffusion and electrostatic attraction.

Industrial fluid-particle separation devices such as cyclones and hydrocyclones are centrifugal or inertial separators that use centrifugal force to remove fine particles from air and water, respectively. Since their inception in the early nineteenth century, cyclone separators have been immensely popular in industry due to their simple and compact design, and low manufacturing and maintenance costs. However, industrial-scale cyclones are not efficient in removing particles smaller than 10 microns, and the operating cost increases considerably with decreasing particle size. Venturi scrubbers, yet another type of predominantly inertial separator, are reasonably efficient for the removal of submicron particles, and are also able to handle wet and corrosive gases. The major particle collection mechanism in venturi scrubbers is the inertial impaction of the particulates with liquid (mainly water) droplets. The large power requirement for their operation due to the high pressure drop is the main drawback of venturi scrubbers, and the pressure drop increases with increasing collection efficiency. Naturally, reduction in the operating pressure drop without compromising the particle collection efficiency is the most desired objective in the design and operation of such separators. In fact, these are the only two most important practical objective functions relevant to fluid-particle separation. In this chapter, we will see the application of the NSGA for the optimal design of a cyclone separator and a venturi scrubber using several examples.


14.3. Genetic Algorithm

The genetic algorithm (GA) imitates biological evolution for optimization problems. It consists of a set of individual elements (the population) and a set of biologically similar operators that are used to change these individuals. In the simple genetic algorithm (SGA), the binary information on the decision variables is first mapped into real values using prescribed bounds, and the fitness (objective) functions of the chromosomes are evaluated using an appropriate model. Using the Darwinian principle of survival of the fittest, a new population (generation) is created by performing reproduction of the chromosomes in the current population. This is done by copying the chromosomes in the earlier generation into a gene pool, with the number of copies made being proportional to their fitness functions. The chromosomes in the gene pool then undergo pair-wise random crossover and mutation operations in order to provide members of the next generation. In the course of several generations, the fitness of the chromosomes improves, and fitter sets of strings emerge1,9-10. The Non-dominated Sorting Genetic Algorithm (NSGA) was developed by Srinivas and Deb2 (1994) to solve optimization problems involving multiple objectives. Principally, the NSGA differs from the SGA in the selection of the population. In NSGA-I, an initial population of chromosomes is generated randomly. As mentioned earlier, a chromosome (or gene) is a string of numbers (often binaries), encoding information about the decision variables. The subsets (substrings) in any chromosome associated with the different decision variables are then mapped into real values lying between the corresponding specified bounds. A model for the process is then used to evaluate the values of the fitness (objective) functions. Thereafter, a set of the good non-dominated chromosomes is identified by testing each of the chromosomes in the population against all others (pair-wise comparison), involving a large number of computational steps.
Two solutions are mutually non-dominated when, on moving from one solution to the other, an improvement in one of the objective functions occurs at the cost of a deterioration in one (or more) of the other objective function(s). A chromosome is checked for dominance once during the computation for that generation. After testing all the chromosomes in this manner, a sub-set of the best non-dominated chromosomes is identified. This is assigned a front number of unity (Front No. = 1). The remaining solutions are again compared as before, and the next set of non-dominated solutions is identified and assigned a Front No. of 2. This procedure is repeated for all the remaining chromosomes. Fronts with lower values of the front number are superior (non-dominated) sets compared to those with a higher front number.

A high fitness value (which is usually the number of chromosomes, Np, but could be any other arbitrarily selected large value instead) is assigned arbitrarily to all the solutions in Front No. 1. The fitness values of the individual chromosomes in this front are then modified based on their "degree of crowding", a sharing procedure obtained by dividing the fitness value by the niche count of the chromosome. The niche count is a parameter proportional to the number of chromosomes/neighbors in its neighborhood (in the decision variable space) within the same front, with distant neighbors contributing less than those nearby.
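The front-classification procedure described above, repeatedly extracting the non-dominated set from the remaining solutions, can be sketched as follows (a minimal illustration with hypothetical function names and toy two-objective data, both objectives minimized):

```python
def dominates(f_a, f_b):
    # f_a dominates f_b if it is no worse in every objective and differs
    # somewhere (for equal-length tuples this implies strictly better).
    return all(a <= b for a, b in zip(f_a, f_b)) and f_a != f_b

def sort_into_fronts(objs):
    """Classify objective vectors into fronts: Front 1 is the
    non-dominated set, Front 2 the non-dominated set of the
    remainder, and so on (NSGA-I style ranking).  Returns lists
    of indices into `objs`."""
    remaining = list(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i])
                            for j in remaining if j != i)]
        fronts.append(front)
        remaining = [i for i in remaining if i not in front]
    return fronts

objs = [(1, 4), (2, 3), (3, 3), (4, 1), (4, 4)]
print(sort_into_fronts(objs))  # → [[0, 1, 3], [2], [4]]
```

As the chapter notes, this pairwise testing is expensive: each front extraction compares every remaining chromosome against all others.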

The niche count is obtained, e.g., for the ith chromosome, by computing its distance, dij, from another, the jth chromosome, in the solution space, and using a sharing function, Sh, as given below:

Sh(dij) = 1 - (dij/σshare)^α,   dij < σshare
Sh(dij) = 0,                    otherwise                (1)

In equation 1, σshare, a computational parameter, is the maximum distance allowed between two chromosomes to qualify as neighbors, and α is the dimensionless exponent of the sharing function. Thus, if dij is larger than σshare, its contribution to Sh is zero, while for dij = 0, its contribution to Sh is 1, and for intermediate distances, Sh(dij) lies between 0 and 1. By summing up Sh(dij) for all values of j in any front comprising non-dominated chromosomes, the degree of crowding of the ith chromosome can be found. This summation is referred to as the niche count of chromosome i.

The shared fitness value of chromosome i, assigned earlier, is the ratio of the common dummy fitness and its niche count. Use of the shared fitness value for reproduction helps to spread out the chromosomes in the front. This procedure is repeated for all the members of the first front. Once this is done, these chromosomes are not considered for the time being, and all the remaining chromosomes are tested for non-dominance. The non-dominated chromosomes in this round are classified into the next front (Front No. = 2). The common fitness value assigned to all members of this front is a bit lower than the lowest shared fitness value of the previous front (Front No. = 1). Thereafter, sharing is performed. This procedure is continued till all the chromosomes in the population have been assigned shared fitness values. This step is followed by reproduction. The chromosomes are copied stochastically (the best chromosomes having a higher probability) into a mating


pool. Non-dominated members of the first front that have fewer neighbors get the highest representation in the mating pool. Dominated members of the later fronts, instead of getting "killed", are assigned some low fitness values in order to maintain the diversity of the gene pool. There are numerous selection techniques for the copying of the chromosomes, e.g., roulette wheel, tournament selection (popular), normalized geometric ranking, expected value and linear normalization.
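The sharing function of equation (1) and the resulting niche count and shared fitness can be written out as a minimal sketch (σshare, α, the dummy fitness of 10.0 and the sample points are arbitrary illustrative values, not from the chapter):

```python
def sharing(d, sigma_share, alpha=2.0):
    """Sharing function of equation (1): equals 1 at d = 0 and decays
    to 0 for d >= sigma_share."""
    return 1.0 - (d / sigma_share) ** alpha if d < sigma_share else 0.0

def niche_count(i, decoded, sigma_share, alpha=2.0):
    """Niche count of chromosome i: sum of Sh(d_ij) over all members j
    of the same front, with d_ij the Euclidean distance in
    decision-variable space (the self term d_ii = 0 contributes 1)."""
    return sum(
        sharing(sum((x - y) ** 2
                    for x, y in zip(decoded[i], decoded[j])) ** 0.5,
                sigma_share, alpha)
        for j in range(len(decoded)))

# Shared fitness = common dummy fitness of the front / niche count:
front = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)]
shared = [10.0 / niche_count(i, front, sigma_share=1.0)
          for i in range(len(front))]
# The two crowded members get reduced shared fitness, while the
# isolated member (5.0, 5.0) keeps the full dummy fitness of 10.0.
```

This is the mechanism by which lightly crowded chromosomes receive a larger share of the mating pool.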

In the next stage, crossover and mutation are performed on these copies to produce daughter chromosomes (and complete a generation). Crossover is a genetic operator used to recombine the genetic material of the population by selecting two chromosomes (randomly) and swapping part of their genetic information to produce new chromosomes. For example, a pair of binary coded chromosomes, 101001 and 010110, after crossover at the third (randomly selected) location, will give two chromosomes, 101110 and 010001. The mutation operator moves the chromosome locally in the solution space to create a fitter chromosome. Each binary number in every single chromosome is changed with a specified mutation probability, using a random number code. The mutation probability is small so as to avoid oscillatory behavior. The above procedure is repeated several times (generations) until a satisfactory set of Pareto optimal solutions is obtained in the gene pool, having a reasonable spread of points. A flowchart of NSGA-I is shown in Figure 14.1.
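The binary crossover example in the text (101001 and 010110 cut after the third bit) and bit-flip mutation can be sketched as follows (function names are illustrative):

```python
import random

def crossover(parent_a, parent_b, cut):
    """Single-point crossover: swap the tails after position `cut`."""
    return (parent_a[:cut] + parent_b[cut:],
            parent_b[:cut] + parent_a[cut:])

def mutate(chrom, p_mut, rng):
    """Flip each bit independently with small probability p_mut."""
    return ''.join(
        ('1' if bit == '0' else '0') if rng.random() < p_mut else bit
        for bit in chrom)

print(crossover('101001', '010110', 3))  # → ('101110', '010001')
print(mutate('101001', p_mut=0.05, rng=random.Random(0)))
```

The printed crossover result reproduces the pair given in the text; in practice the cut position is drawn at random for each mating pair.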

14.4. Problem Formulation

Example 1

The major criteria that are used in the determination of the performance of gas-cleaning devices are the collection efficiency, η0, and the pressure drop, Δp. In the first example, we evaluate both the pressure drop, Δp, and the total annual cost, Co, for an industrial operation treating 165 m3/s of air using N parallel cyclones (multiclones). The design of the train of N cyclones was conducted for a paper mill11. The average size of the particles in the stream is 10 μm with a log-normal size distribution, and the average particle density, ρs, is 1600 kg/m3. The viscosity of the gas is 24.8 × 10-6 Pa·s.

There are several collection efficiency and pressure drop models available for cyclones in the literature. The details of the models adopted in this example are available in Ravi et al. (2001)11. A schematic diagram showing all the


Fig. 14.1. Flow-chart of the NSGA (adapted from Mitra et al., 1998)4.

dimensional parameters of a standard reverse-flow cyclone is presented in Figure 14.2.

The methodology of optimization is illustrated by selecting two objective functions, I1 and I2, for simplicity. However, this is often sufficient for the optimization of fluid-particle separation devices, where the maximization of the overall collection efficiency, η0, is desired along with the minimization of the pressure drop, Δp. Here we will discuss several problems involving all the decision variables and constraints commonly used in cyclone design. Problem 1 can, thus, be described mathematically as:

Problem 1:


Fig. 14.2. Schematic diagram of the test cyclone.

Max I1(u) = I1(N, D, De/D, B/D, H/D, S/D, h/D, a/D, b/D) = η0    (a)
Min I2(u) = I2(N, D, De/D, B/D, H/D, S/D, h/D, a/D, b/D) = Δp    (b)
subject to (s.t.):
15.0 ≤ vi ≤ 30.0 m/s    (c)
ui^L ≤ ui ≤ ui^U; i = 1, 2, ..., 9    (d)
model equations11    (e)        (2)

An alternate, 2-objective optimization problem (Problem 2) will also be discussed. Here, η0 is maximized while the cost, Co, is minimized:


Problem 2:

Max I1(u) = I1(N, D, De/D, B/D, H/D, S/D, h/D, a/D, b/D) = η0    (a)
Min I2(u) = I2(N, D, De/D, B/D, H/D, S/D, h/D, a/D, b/D) = Co    (b)
s.t.:
15.0 ≤ vi ≤ 30.0 m/s    (c)
ui^L ≤ ui ≤ ui^U; i = 1, 2, ..., 9    (d)
model equations11    (e)        (3)

The codes for the NSGA used here work with minimization of the objective functions. Since one of the two objective functions in Problems 1 and 2 (equations 2 and 3) involves maximization, the problems are converted into minimization problems by defining fitness functions, F1 and F2, both of which are to be minimized. A common procedure is to use:

Problems 1, 2:

Min F1 = 1/(1 + I1)    (a)
Min F2 = I2    (b)
s.t.: all earlier constraints (equations 2 or 3, c - e)    (c)        (4)

The procedure used to take care of the constraint on the range of values for vi (equations 2c and 3c) is to penalize chromosomes violating these constraints by adding an arbitrarily large number, Pe, to the two fitness functions, F1 and F2, so that such chromosomes become unfit and die out.
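One way to realize this conversion-plus-penalty scheme is sketched below. This is illustrative only: the chapter does not spell out the exact form of F1 (1/(1 + I1) is one common transform of a maximized objective), the value Pe = 1.0e6 is arbitrary, and `evaluate_model` is a toy stand-in for the efficiency and pressure-drop models of ref. 11:

```python
PE = 1.0e6  # arbitrarily large penalty, Pe

def evaluate_model(u):
    """Toy stand-in for the cyclone models of ref. 11: returns
    (overall efficiency eta0, pressure drop dp, inlet velocity)."""
    eta0, dp = 0.60, 1200.0   # fixed illustrative values
    v_inlet = u[0]            # pretend u[0] is the inlet velocity, m/s
    return eta0, dp, v_inlet

def fitness(u, v_min=15.0, v_max=30.0):
    """Convert (max eta0, min dp) into two minimized fitness values,
    penalizing chromosomes whose inlet velocity violates its bounds."""
    eta0, dp, v_inlet = evaluate_model(u)
    f1 = 1.0 / (1.0 + eta0)   # maximized objective -> minimized fitness
    f2 = dp
    if not (v_min <= v_inlet <= v_max):
        f1 += PE              # penalized chromosomes become unfit
        f2 += PE
    return f1, f2

print(fitness([20.0]))  # feasible inlet velocity
print(fitness([40.0]))  # violates the upper bound -> heavily penalized
```

Because the penalty is added to both fitness functions, an infeasible chromosome is dominated by every feasible one and is quickly eliminated during reproduction.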

Bounds or limits on the variables

Nine decision variables, ui; i = 1, 2, ..., 9, have been used in these problems. These variables are the number, the diameter and the seven geometric ratios (shape) of the cyclones. The bounds used (first-level or a-priori bounds) on the nine decision variables, u, for both Problems 1 and 2 are given in Table 14.64.



Table 14.64. Bounds on Decision Variables, u [Problem 1 (reference case) and Problem 2]

 i   u_i     u_i^L   u_i^U   Stairmand High Efficiency

FIRST-LEVEL (a-priori) BOUNDS:
 1   N       1       2048    -
 2   D, m    0.3     0.7     -
 3   De/D    0.4     0.6     0.5
 4   B/D     0.325   0.425   0.375
 5   H/D     3.5     4.5     4.0
 6   S/D     0.4     0.6     0.5
 7   h/D     1.1     1.3     1.2
 8   a/D     0.4     0.6     0.5
 9   b/D     0.15    0.25    0.2

SECOND-LEVEL (over-riding) BOUNDS:
(i)  0.4 < a/D < S/D in any chromosome
(ii) If 0.5 < De/D < 0.6 in any chromosome, then 0.15 < b/D < (1 - De/D)/2

The bounds have been chosen to encompass the values corresponding to the standard high-efficiency cyclone of Stairmand's12 design, which are also provided for comparison in the table. The number, N, of cyclones is to be taken as an integer. A reasonably large range (common for multiclones) is provided for the first decision variable, N. The constraints on the inlet velocity, vi, are those normally used in industrial practice. The lower bound on vi helps ensure reasonably high values of η0, while the upper bound helps reduce problems of erosion, excessively high values of Δp, and re-entrainment of solids. Similarly, a small range for the cyclone diameter, D, of 0.3 - 0.7 m has been taken. The lower limit helps prevent re-entrainment of the collected solids from the cyclone wall. The upper bound on D, as for the case of N, is somewhat arbitrarily selected, and has to be relaxed, at least to some extent, if the optimal solution lies at the upper bound. However, in multiclone operation, a large diameter of the cyclones is seldom encountered.

In order to avoid violating the physics of gas-solid separation, several additional bounds and constraints need to be added to over-ride the random choice of decision variables. These bounds are chromosome-specific and are referred to as second-level or over-riding constraints. Two over-riding constraints (also shown in Table 14.64) are operative in the given example for the selections made for the a-priori bounds. For example, both


S/D and a/D have been selected to lie from 0.4 to 0.6. However, a well-known practice is to have a < S, since this minimizes the short-circuiting of the feed stream to the outlet before going through the separation space. The presence of these kinds of over-riding bounds necessitates adaptation of the mapping procedures for the binary chromosomes into real numbers in the NSGA procedures currently in use. One must, for example, first map ui; i = 1, 2, ..., 7, using the normal techniques. The value of S/D chosen for any chromosome must then be used to decide the bounds to be used for that chromosome while mapping the binary values of a/D into the real-number domain. Over-riding bounds assist in the numerical convergence of the problem.
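The two-stage mapping just described — decode the independent variables with their fixed a-priori bounds, then let the decoded S/D over-ride the upper bound of a/D — can be sketched as follows (binary-to-real decoding; bound values taken from Table 14.64, substring lengths and bit patterns are arbitrary):

```python
def decode(bits, lo, hi):
    """Map a binary substring onto a real value in [lo, hi]."""
    n = int(bits, 2)
    return lo + (hi - lo) * n / (2 ** len(bits) - 1)

# First map the independent variables with their a-priori bounds,
# e.g. S/D in [0.4, 0.6] (Table 14.64, row 6):
s_over_d = decode('1010011010', 0.4, 0.6)

# The decoded S/D then over-rides the upper bound of a/D
# (second-level constraint: a/D must not exceed S/D):
a_over_d = decode('0101100101', 0.4, s_over_d)
assert 0.4 <= a_over_d <= s_over_d
```

Decoding a/D against a chromosome-specific upper bound guarantees the constraint by construction, so no penalty is needed for it.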

Optimum Cyclone Design

The seven geometric parameters shown in Figure 14.2 are important for the performance of the cyclone. The problems (equations 2 and 3) are solved on a CRAY J916 computer. The average CPU time required for the solution of the system is about 0.24 s.

The several decision variables and constraints may interact in a complex manner, making it difficult to obtain meaningful solutions for the various decision variables (the presence of many decision variables makes interpretation of the results quite difficult). Thus, it is necessary to solve several simpler cases before attempting the solution of the general problems.

The simpler cases for Problem 1 consider one decision variable at a time (the remaining variables are fixed). For example, in Case 1, the only decision variable is the total number, N, of cyclones. All other variables are taken as constants, at values suggested by Stairmand12, while the diameter is fixed at 0.5 m. Cases 2 and 3 involve two decision variables, of which one is N and the other is either D or De/D. Case 4 is the more general problem with all the nine decision variables used and is referred to as the reference (ref) case.

The results of Cases 1-3 are shown in Figure 14.3. In Case 1, only N is allowed to vary. A Pareto optimal set is obtained, as shown in Figure 14.3a. The optimal value of N corresponding to different points on the Pareto set is observed to decrease with increasing values of η0 (Figure 14.3b). The highest value of N (at low η0) is determined by the lower bound on the inlet velocity, vi. Typically, the collection efficiency in cyclones increases with increasing inlet velocity. Thus, higher values of η0 are associated with lower values of N and higher values of vi (Figure 14.3c), as expected. However, the


maximum value of the overall collection efficiency is quite low (about 60%). Case 2 involves both N (= u1) and D (= u2) as the decision variables.

The Pareto optimal set for this case (Figure 14.3a) is shifted to the right as compared to that for Case 1, implying that higher collection efficiencies than in Case 1 are obtained. The highest value of η0 is observed to be limited by the upper bound of 30 m/s on the inlet velocity. It can be seen that the diameter, D, slowly moves to its lower bound of 0.3 m (see Figure 14.3d).

In Case 3, N (= u1) and De/D (= u3) are taken as the decision variables (while keeping D at 0.5 m). The increase in the value of D from 0.3 m in Case 2 to 0.5 m in Case 3 reduces the optimal values of η0 in the Pareto optimal set.

The difference in the qualitative behavior of the decision variables can be observed by comparing Cases 1 and 2 (Figures 14.3b and c). It is observed in this case that N and vi are almost constant till some value of η0 while De/D decreases (Figure 14.3e) to give higher η0. Once De/D reaches its lower limit of 0.4, η0 increases further only by a sudden increase in De/D and vi, and a decrease in N. Thereafter, N and vi again stay constant, and De/D decreases continuously with increasing η0. The points of change in the vi and N curves coincide with the change in the De/D curve. The efficiency increases considerably with decreasing diameter (De) of the outlet (vortex finder), although the trend indicates the presence of an optimum value. An optimum value of De/D lies between 0.33 - 0.5. Between N and De/D, the latter is more important in deciding the Pareto optimal set for this case. The inference that De/D predominates over N in deciding the Pareto optimal set in the simplified Case 3 would not have been so evident if we had started out solving more complex problems from the very beginning. The methodology of first obtaining solutions for simplified cases with only one or two decision variables is thus highly recommended for all real-life and complex multiobjective optimization problems.

For the cases involving N and any one of the other geometrical parameters (viz. B/D, S/D, H/D, h/D, a/D and b/D) as the decision variables, N is found to be the principal decision variable that controls the shape and characteristics of the Pareto optimal set.

The results of the more general reference Problem 1 are shown in Figures 14.3 and 14.4 (filled circles). It is clear from Figure 14.3 that the use of several decision variables simultaneously leads to Pareto optimal sets with much higher values of the overall collection efficiency.

Figure 14.4 gives the solution of Problem 2 (equation (3)). In this problem, the annual cost, Co, is minimized while η0 is maximized. The bounds


Fig. 14.3. Results for the simplified Cases 1-4 for Problem 1: (a) Pareto optimal sets showing Δp vs η0 for Cases 1-4; (b) N vs η0 for Cases 1-4; (c) vi vs η0 for Cases 1-4; (d) D vs η0 for Case 2; (e) De/D vs η0 for Case 3.

of the decision variables are the same as in Table 14.64. The cost Pareto optimal set in Figure 14.4a is found to extend over a lower range of values of η0. The importance of both N and De/D as decision variables controlling the Pareto optimal set for Problem 2 is also observed. Figure 14.4e shows the calculated values of Δp corresponding to the different points on the Co vs. η0 Pareto optimal set. It is interesting to observe (Figure 14.4e) that the reference Pareto optimal set (Case 4, Problem 1) is almost indistinguishable from the computed Δp vs. η0 curve corresponding to the cost Pareto optimal set over the range where their values of η0 are similar.


Similarly, the other decision variables superimpose in this range. The parallelism between these two curves suggests that three-dimensional Pareto optimal sets (maximize ηo, minimize Δp, and minimize Co) will not lead to substantially different results.

Some numerical scatter is observed for the optimal values of the decision variables. Such scatter is common in GAs; it can possibly be reduced somewhat, but not eliminated, by changing the computational parameters, and will be discussed later.

Example 2

The performance of a venturi scrubber depends largely on the manner of liquid injection, drop size, liquid flux distribution and initial liquid momenta. Most of the particle collection occurs in the throat (see Figure 14.5) because of the high degree of turbulence in that region, caused by large relative velocities between the drops and particles. In this example, we will see the application of the NSGA to determine the optimum nozzle distribution in a pilot scale venturi scrubber to improve the droplet flux distribution.

We will see two optimization problems involving the venturi scrubber in this section. In the first problem, a two-dimensional approach was used for the determination of the collection efficiency. Three decision variables, the liquid-gas ratio (L/G), the gas velocity in the throat (Vgth), and the aspect ratio of the throat, Z, were used. Optimal design curves were obtained for the pilot scale scrubber. In the second optimization problem, a three-dimensional model was used to determine the collection efficiency. A three-dimensional approach produces results about the optimum nozzle arrangement in venturi scrubbers, which is a very important design variable dictating the flux distribution in the scrubber. Three decision variables, the liquid-gas ratio (L/G), the gas velocity in the throat (Vgth), and the nozzle configuration, Nc, were used in the second problem. The models for collection efficiency and pressure drop of the venturi scrubber can be found in Ravi et al. (2002).13

Problem Formulation

Problem 3 is, thus, described mathematically by


Fig. 14.4. Results of Problem 2 and Case 4, Problem 1: (a) Pareto optimal sets showing cost vs ηo for both cases; (b) N vs ηo for both cases; (c) Vi vs ηo for both cases; (d) De/D vs ηo for both cases; (e) Pareto optimal sets showing calculated Δp vs ηo for both cases.

Max I1(u) = I1(L/G, Vgth, Z) = ηo   (a)

Min I2(u) = I2(L/G, Vgth, Z) = Δp   (b)

subject to (s.t.):

uiL ≤ ui ≤ uiU; i = 1, 2, 3   (c)

model equations13   (d)   (5)

Dust with a log-normal distribution of sizes (mass median diameter = 5.0 μm, standard deviation (σp) = 1.5) was used for the collection efficiency


calculation in the venturi scrubber. The dimensions of the venturi scrubber are presented in Figure 14.5. As before, the problem was converted into a pure minimization problem by defining fitness functions, F1 and F2 (similar to equation (4)), both of which are to be minimized. The bounds used for the three decision variables in this problem are presented in Table 14.65. Additional constraints can also be incorporated using the penalty function, Pe. For example, constraints can be imposed such that the collection efficiency is never below 75% and the pressure drop never exceeds 5000 Pa. Pe, an arbitrarily large number, is added to both the fitness functions, F1 and F2, for all the chromosomes violating this requirement, which ensures that such chromosomes become unfit and die out almost instantaneously (referred to as instant killing).
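The penalty ("instant killing") scheme described above can be sketched as follows. The 75% efficiency and 5000 Pa limits come from the text; the function names, the simple F1 = 1 - ηo fitness shape, and the magnitude of Pe are illustrative assumptions, not the authors' exact formulation.

```python
# Sketch of the penalty ("instant killing") scheme: both fitness values
# (to be minimized) receive a large additive penalty Pe whenever a
# chromosome violates either constraint, so it dies out almost at once.
PE = 1.0e6  # arbitrarily large penalty (assumed value)

def penalized_fitness(eta_o, delta_p):
    """Return (F1, F2), both to be minimized."""
    f1 = 1.0 - eta_o   # minimizing F1 maximizes collection efficiency
    f2 = delta_p       # minimizing F2 minimizes pressure drop
    if eta_o < 0.75 or delta_p > 5000.0:  # constraint violation
        f1 += PE
        f2 += PE
    return f1, f2
```

A feasible design, e.g. `penalized_fitness(0.9, 3000.0)`, keeps its raw objective values, while any infeasible one is pushed far above every feasible point in both objectives.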

Optimum scrubber design

The results of Problem 3 are shown in Figure 14.6a (filled circles and squares). Optimal solutions for Problem 3 using three decision variables are represented by the filled circles, while the unfilled circles represent the optimal solutions using the NSGA with only a single objective function (maximization of ηo) using three decision variables. The plot of ηo vs. Δp shown in Figure 14.6a has the characteristics of a Pareto optimal set, wherein an improvement (increase) in ηo is accompanied by a worsening (increase) of Δp. Plots of the three decision variables corresponding to the different points on the Pareto optimal set are shown in Figures 14.6b-d. It is observed that the gas velocity at the throat, Vgth, varies along the points on the Pareto optimal set. The values of L/G that provide optimal operating conditions are found to be almost constant, varying in a narrow range of about 0.8×10⁻³ to 1.1×10⁻³ (Figure 14.6b). Similarly, the aspect ratio, Z (Figure 14.6d), needs to be maintained at around 2.5. A new dimensionless number, the venturi number, VN [= L Ro Z/(G do)], which characterizes the non-uniformity in the flux distribution and the overall collection efficiency, is observed to lie in the range 1.0×10⁻³-1.3×10⁻³. Earlier, Ananthanarayanan and Viswanathan (1998),14 who used the 'simulation procedure' and considered only the maximization of ηo, found that this number lies in the range 1.0×10⁻³-1.5×10⁻³ at the optimal point, for an optimization problem involving only one objective function (maximization of the collection efficiency). The venturi numbers for the present problem are shown in Figure 14.6e.

Table 14.65. Bounds on the Decision Variables (Problem 3)

Decision Variable           Lower Bound    Upper Bound
L/G (m³ liquid/m³ gas)      0.5×10⁻³       2.0×10⁻³
Vgth (m/s)                  20.0           110.0
Z                           0.5            2.5

The computed values of the pressure drop for the single-objective function problem are higher than the values on the Pareto optimal set corresponding to two objective functions, and we could possibly select better points (similar ηo but lower Δp) using the Pareto optimal solutions with two objective functions.

A variation of Problem 3, in which a 3-D collection efficiency model is used to consider an important design variable, the nozzle arrangement in the throat of the scrubber (Problem 4), is discussed here. Previously, for Problem 3, a slice consisting of a single nozzle was taken along the axial direction and the process was simulated in that plane, giving rise to a two-dimensional problem. In the three-dimensional problem, the entire scrubber was divided into grids and the simulation was carried out throughout the scrubber for all the nozzles (Ravi et al., 2003).15 As before, two decision variables are selected, namely, the liquid-to-gas flow ratio, L/G, and the gas velocity at the throat, Vgth. In addition, a new variable, the nozzle configuration, Nc, is also selected. The bounds used for the three decision variables are presented in Table 14.66.

From the experience of Problem 3, a range of 0.3×10⁻³ to 1.4×10⁻³ m³ of liquid/m³ of air has been chosen as the bounds for L/G. Bounds for the gas velocity, Vgth (40-120 m/s), were decided based on industrial practice.

Table 14.66. Bounds on the Decision Variables (Problem 4)

Decision Variable           Lower Bound    Upper Bound
L/G (m³ liquid/m³ gas)      0.3×10⁻³       1.4×10⁻³
Vgth (m/s)                  40.0           120.0
Nc                          1              5

The third decision variable, the nozzle configuration, Nc, is the arrangement of nozzles for injection of the droplets. Five different nozzle configurations were considered. The optimum nozzle configuration is the one wherein the nozzles are arranged in a staggered triangular pitch.

If we now compare the results of Problems 3 and 4, we see that the basic characteristics of the Pareto optimal sets remain the same for the two problems, but the values are slightly different because of the difference in the rigor


Fig. 14.5. Schematic diagram of the venturi scrubber used in the optimization (dimen-sions are in cm).

involved in the models used. Typically, higher collection efficiency occurs at a lower pressure drop in the 3-D case than in the 2-D case, which may be due to better liquid distribution in the scrubber resulting from the nozzle arrangement in a staggered triangular pitch. The optimal L/G ratio in the 3-D case varied in a range of 0.4×10⁻³-1.0×10⁻³ depending on the nozzle arrangement, while it was about 1.1×10⁻³ for the 2-D case (Problem 3). This reduction in the optimum L/G value causes considerable saving in scrubbing liquid, subsequently reducing the pressure drop by 43%.

Computational Parameters

The number of generations required for convergence in the NSGA is problem specific. In all the problems discussed above, an essentially random distribution of feasible solutions occurs at the first generation (Ng = 1). However, by the end of the tenth generation (Ng = 10), most of the undesired solutions die out and a Pareto optimal set seems to emerge, although considerable scatter is present at this stage (which dies out quite slowly). By about the 100th generation, the Pareto optimal set has normally been reached. Further generations do not affect the nondominated solutions


Fig. 14.6. Optimal solutions (filled circles and squares) for the reference case (Problem 3) using three decision variables. Unfilled circles represent the optimal solutions using the NSGA with only a single objective function (maximization of ηo), with three decision variables. Δp for the latter are computed values.


much.

The most important numerical parameters involved in the NSGA are the crossover (pc) and mutation (pm) probabilities, and the spreading parameter, σ. Unfortunately, the choice of these parameters is also problem-specific, and hence prior knowledge of these is rather limited. For Problems 1-4, pc did not have much effect on the Pareto optimal set. However, the same cannot be said for the effect of the mutation probability, pm. Higher values of pm result in large gaps in the Pareto optimal set, and some amount of scatter also occurs in the Pareto optimal set. On the other hand, solutions obtained with lower values of pm show scatter, particularly at high values of ηo. The best value of this computational parameter has to be established by trial. The spreading parameter, σ, determines the range covered by the Pareto optimal set, and the best value of this parameter, too, has to be obtained by trial. The numerical parameters used in Problems 1-4 are listed in Table 14.67.
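The fitness-sharing mechanism that the spreading parameter (σ) controls can be sketched as follows. This is a generic Goldberg-style sharing function of the kind used in the NSGA; the sharing exponent of 2 matches the value in Table 14.67, while the function names and the use of distances in normalized decision-variable space are illustrative assumptions.

```python
import math

# Sketch of fitness sharing: a chromosome's niche count sums a
# triangular-shaped sharing function over all population members that
# lie within sigma_share of it in (normalized) decision-variable space.
def niche_count(index, population, sigma_share, alpha=2.0):
    """Niche count of chromosome `index` (includes itself, distance 0)."""
    target = population[index]
    count = 0.0
    for other in population:
        d = math.dist(target, other)
        if d < sigma_share:
            count += 1.0 - (d / sigma_share) ** alpha
    return count

# Shared (dummy) fitness: raw fitness divided by the niche count, so
# chromosomes in crowded regions of the front are de-emphasized and the
# population spreads out along the Pareto optimal set.
def shared_fitness(raw, index, population, sigma_share, alpha=2.0):
    return raw / niche_count(index, population, sigma_share, alpha)
```

With a larger σ, more neighbors fall inside each niche, so crowded solutions are penalized over a wider range; this is why σ governs the spread of the final front.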

Table 14.67. Numerical Parameters Used in the Optimization of Gas-Solid Separation Devices

Computational Parameters                        Problem 1   Problem 3   Problem 4
Maximum number of generations, maxgen           500         80          100
Population size, Np                             100         50          100
Probability of crossover, pc                    0.65        0.65        0.55
Probability of mutation, pm                     0.001       0.001       0.001
Random seed                                     0.87619     0.87619     0.87619
Spreading parameter, σ                          0.015       0.015       0.005
Exponent controlling the sharing effect, α      2           2           2
Computational time (CRAY J916)                  0.24 s      1.24 s      1.35 s

In this work, NSGA-I, which has been used extensively earlier in chemical engineering problems, was used. However, there are some disadvantages in NSGA-I. For example, the sharing function used to evaluate the niche count of any chromosome requires the values of two parameters, which are difficult to assign a priori in NSGA-I. The total complexity of NSGA-I is O(M Np³), where M is the number of objective functions and Np is the number of chromosomes in the population. In addition, NSGA-I does not use any elite-preserving operator, so good parents may get lost in time. Deb et al. (2002) have recently developed an elitist non-dominated sorting genetic algorithm (NSGA-II) to overcome these limitations.16
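The non-dominated sorting step that NSGA-type algorithms perform each generation can be sketched as follows. This naive "peeling" version, whose repeated pairwise domination checks make its cost grow rapidly with population size, assumes all objectives are minimized; the function names are illustrative, not taken from the authors' code.

```python
def dominates(a, b):
    """True if objective vector `a` Pareto-dominates `b` (all objectives
    minimized): no worse in every objective, strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated_sort(objs):
    """Peel the population into successive non-dominated fronts.

    Returns a list of fronts, each a sorted list of indices into `objs`:
    front 1 is the current Pareto optimal set, front 2 the set that is
    non-dominated once front 1 is removed, and so on."""
    remaining = set(range(len(objs)))
    fronts = []
    while remaining:
        front = [i for i in remaining
                 if not any(dominates(objs[j], objs[i])
                            for j in remaining if j != i)]
        fronts.append(sorted(front))
        remaining -= set(front)
    return fronts
```

For example, for the five points (1,5), (2,2), (5,1), (3,3), (4,4) the first three are mutually non-dominated and form front 1, while (3,3) and (4,4) fall into successive inner fronts.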


14.5. Conclusions

The NSGA has been successfully applied to the optimization of gas-solid separation devices used for particulate removal from air. Pareto optimal solutions relating different process variables were obtained. The algorithm is quite robust in generating non-inferior solutions for large-scale complex problems of industrial significance. A better understanding of the values of the computational parameters is required to increase the speed of convergence.

Index

a      height of the cyclone inlet (m)
b      width of the cyclone inlet (m)
B      diameter of the base of the cyclone (m)
Co     total cost ($/yr)
do     orifice diameter of the nozzle in the scrubber (mm)
D      diameter of the cyclone (m)
De     diameter of the exit pipe (m)
Dp     mass-mean diameter of solids (μm)
h      height of the cylindrical portion of the cyclone (m)
H      total height of the cyclone (m)
I      objective function (dimensionless)
L      length of the venturi scrubber (m)
L/G    liquid to gas flow ratio (dimensionless)
N      number of cyclones
Ng     generation number (dimensionless)
pc     probability of crossover (dimensionless)
pm     probability of mutation (dimensionless)
Pe     penalty function (dimensionless)
S      depth of the exit pipe of the cyclone (m)
Ro     half-width of the venturi throat parallel to water injection (m)
u      decision variable (dimensionless)
Vi     inlet velocity in the cyclone (m/s)
Vgth   gas velocity at the throat of the scrubber (m/s)
VN     venturi number (dimensionless)
Wo     width of venturi throat perpendicular to water injection (m)
Z      aspect ratio (dimensionless)


Greek symbols

Δp     pressure drop (Pa)

ηo     overall collection efficiency

σ      spreading parameter

References

1. J. H. Holland, Adaptation in Natural and Artificial Systems (University of Michigan Press, Ann Arbor, MI, 1975).
2. N. Srinivas and K. Deb, "Multiobjective optimization using nondominated sorting in genetic algorithms", Evol. Comp., 2, 221-248 (1994).
3. V. Bhaskar, S. K. Gupta and A. K. Ray, "Applications of multi-objective optimization in chemical engineering", Reviews in Chemical Engineering, 16(1), 1-54 (2000).
4. K. Mitra, K. Deb and S. K. Gupta, "Multiobjective dynamic optimization of an industrial Nylon 6 semibatch reactor using genetic algorithm", J. App. Polym. Sci., 69, 69-87 (1998).
5. R. R. Gupta and S. K. Gupta, "Multiobjective optimization of an industrial Nylon 6 semibatch reactor using genetic algorithm", J. App. Polym. Sci., 73, 729-739 (1999).
6. J. K. Rajesh, S. K. Gupta, G. P. Rangaiah and A. K. Ray, "Multiobjective optimization of steam reformer performance using genetic algorithm", Ind. Eng. Chem. Res., 39, 706-717 (2000).
7. V. Bhaskar, S. K. Gupta and A. K. Ray, "Multiobjective optimization of an industrial wiped film PET reactor", AIChE J., 46, 1046-1058 (2000).
8. C. C. Yuen, Aatmeeyata, S. K. Gupta and A. K. Ray, "Multiobjective optimization of membrane separation modules using genetic algorithm", J. Memb. Sci., 176(2), 177-196 (2000).
9. D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning (Addison-Wesley, Reading, MA, 1989).
10. K. Deb, Optimization for Engineering Design: Algorithms and Examples (Prentice Hall of India, New Delhi, 1995).
11. G. Ravi, S. K. Gupta and M. B. Ray, "Multiobjective optimization of cyclone separators using genetic algorithm", Ind. Eng. Chem. Res., 39, 4272-4286 (2000).
12. C. J. Stairmand, "The design and performance of cyclone separators", Trans. Inst. Chem. Eng., 29, 356-383 (1951).
13. G. Ravi, S. K. Gupta, S. Viswanathan and M. B. Ray, "Optimization of venturi scrubbers using genetic algorithm", Ind. Eng. Chem. Res., 41, 2988-3002 (2002).
14. N. V. Ananthanarayanan and S. Viswanathan, "Estimating maximum removal efficiency in venturi scrubbers", AIChE J., 44, 2549-2560 (1998).
15. G. Ravi, S. K. Gupta, S. Viswanathan and M. B. Ray, "Multiobjective optimization of venturi scrubbers using a three-dimensional model for collection efficiency", Journal of Chemical Technology and Biotechnology, 78(2-3), 308-313 (2003).
16. K. Deb, A. Pratap, S. Agarwal and T. Meyarivan, "A fast and elitist multi-objective genetic algorithm: NSGA-II", IEEE Transactions on Evolutionary Computation, 6(2), 182-197 (2002).


CHAPTER 15

MULTI-OBJECTIVE SPECTROSCOPIC DATA ANALYSIS OF INERTIAL CONFINEMENT FUSION IMPLOSION CORES: PLASMA GRADIENT DETERMINATION

R. C. Mancini (a), S. J. Louis (b), I. E. Golovkin (c), L. A. Welser (a), Y. Ochi (d), K. Fujita (d), H. Nishimura (d), J. A. Koch (e), R. W. Lee (e), J. A. Delettrez (f), F. J. Marshall (f), I. Uschmann (g), E. Foerster (g), L. Klein (h)

(a) Department of Physics, University of Nevada, Reno
(b) Department of Computer Science, University of Nevada, Reno
(c) Prism Computational Sciences, Madison, Wisconsin
(d) Institute of Laser Engineering, Osaka University, Osaka, Japan
(e) Lawrence Livermore National Laboratory, Livermore, California
(f) Laboratory for Laser Energetics, University of Rochester, Rochester, New York
(g) Institute of Optics and Quantum Electronics, Jena University, Jena, Germany
(h) Department of Physics and Astronomy, Howard University, Washington, D.C.

We report on a spectroscopic method for the characterization of the spatial structure of inertial confinement fusion implosion cores based on the self-consistent analysis of simultaneous narrow-band X-ray images and X-ray line spectra. The method performs a search in a multi-dimensional parameter space for the temperature and density gradients that simultaneously yield the best fits to narrow-band spatial emissivity profiles obtained from X-ray images, and spectral line shapes recorded with crystal spectrometers. A multi-objective Niched Pareto Genetic Algorithm (NPGA) was developed to efficiently implement the multi-criteria data analysis. The availability of the NPGA is critical for the practical implementation of this analysis method, since NPGA-driven searches in parameter space typically find suitable solutions in approximately 10⁵ evaluations of the spectral model out of a total of 10¹⁸ possible cases (i.e. the size of the parameter space). Furthermore, analysis of solutions on the Pareto front permits us to address the issue of uniqueness of the solution and the uncertainty of the optimal solution. The performance of the NPGA is illustrated with spectroscopic data recorded in a series of stable and spherically symmetric implosion experiments where argon-doped deuterium-filled plastic shells were driven with the GEKKO XII (Institute of Laser Engineering, Japan) and OMEGA (Laboratory for


Laser Energetics, USA) laser systems. This measurement is relevant for understanding the spectral formation and plasma dynamics associated with the implosion process. In addition, since the results are independent of hydrodynamic simulations, they are important for the verification and benchmarking of detailed hydrodynamic simulations of high-energy-density plasmas.

15.1. Introduction

The determination of plasma core parameters during the implosion of an Inertial Confinement Fusion (ICF) deuterium-filled plastic microballoon can provide critical information about the plasma state in the final stages of the implosion. A wide variety of particle- and radiation-based instruments are currently used to diagnose implosion dynamics and, in particular, to determine core conditions. In this connection, X-ray spectroscopic measurements have proven to be a very powerful diagnostic for these high-energy-density plasmas.1 Over the years, and with the aid of tracer elements, several types of emission and absorption X-ray spectral features have been observed and modeled in detail with the goals of understanding their plasma density and temperature sensitivity, and in turn investigating their potential for plasma spectroscopy diagnostics. Mostly, these X-ray spectroscopic diagnostics have relied only on the analysis of spectra and have been applied to deduce emissivity-averaged or effective temperatures and densities. However, when the line emission extends over the whole plasma source (i.e. the implosion core), the usefulness of these measurements is questionable.

We discuss a method for the determination of spatial gradients in ICF implosion cores based on multi-objective analysis of X-ray spectroscopic data. Previous X-ray spectroscopy studies of implosion cores have failed to address the gradient problem, even though hydrodynamic simulations predicted significant gradients in these plasma conditions. The work that we describe here is a unique attempt to address this problem through the simultaneous analysis of time-resolved X-ray images and X-ray line spectra of the implosion process. As implosion dynamics are an essential component of ICF and important to applications involving X-ray spectroscopy diagnostics, the quantitative measurement of the hydrodynamic behavior of implosions is a significant step in developing a predictive capability. The evolution of the plasma gradients is also of basic relevance to atomic physics studies of level population kinetics, electron thermal conduction, radiation energy coupling and transfer, and spectral line formation.

The idea of the method is based on performing systematic searches in


a multi-dimensional parameter space for the temperature and density spatial gradients that simultaneously yield the best fits to narrow-band spatial emissivity profiles obtained from X-ray images and spatially-integrated spectral line shapes recorded with crystal spectrometers. A multi-objective Niched Pareto Genetic Algorithm (NPGA) was developed to efficiently implement this multi-objective data analysis technique. Indeed, the availability of the NPGA is critical for the practical implementation of this analysis method, since NPGA-driven searches in parameter space typically find suitable solutions in approximately 10⁵ evaluations of the spectral model (and spatial gradient selections) out of a total of 10¹⁸ possible cases.

Although spatially resolved measurements of plasma conditions at these high energy densities have not been carried out in the past, the study of the average properties of ICF implosion cores has progressed from experimental measurements of peak electron densities in the compressed core,2-6 to the determination of spatially averaged, but temporally resolved, electron temperatures, <Te(t)>, and densities, <Ne(t)>.7-11 The temporally resolved data, augmented by a comparison with hydrodynamic simulations, provided significant information on the evolution of the implosion.15 Parallel to these experimental developments, the iterative comparison of the spectral data with a theoretical computation of the spectrum emitted at an effective temperature and density has become a standard tool to extract <Te(t)> and <Ne(t)> for a diagnostic study of the emitting medium.10,11 Nevertheless, although the experimental results cited above were a substantial improvement over earlier work, one finds that the investigation of certain fundamental effects in the spectral profile, such as ion dynamics and plasma line shifts, requires a better characterization of the plasma source because of the possibility that spatial gradients affect the analysis.16

The spatial gradient determination in implosion cores discussed here also provides an excellent source of data for an assessment of large-scale radiation-hydrodynamic simulation codes, since comparison of the measured gradients with computations of the implosion conditions gives an important reference point for modeling the dynamics involved. The analysis can therefore be considered to be the first step in a program to provide a simulation-free benchmark for both spectral syntheses and hydrodynamic simulations.17


15.2. Self-Consistent Analysis of Data from X-ray Images and Line Spectra

The determination of plasma temperature and density gradients in the core of an implosion requires the self-consistent modeling and analysis of time-resolved X-ray line spectra and time-resolved X-ray narrow-band images. This can be accomplished by doping the deuterium-gas fill with small amounts of a suitable tracer element (e.g. argon) that can provide adequate line radiation signals for the analysis without affecting the hydrodynamics. The argon concentration should also be low enough to ensure that the argon line emission is as optically thin as possible. Analysis of spatially-integrated time-resolved line spectra usually results in spatially averaged values of plasma density and temperature, and can provide no information concerning the gradients. As an illustration of this problem, Fig. 15.1 displays three combinations of one-dimensional (1-D) temperature and density gradients that result in almost identical space-integrated spectra of the 1s3p ¹P - 1s² ¹S Heβ line and associated Li-like satellite transitions in argon. Note that although, for illustrative purposes, these calculations were performed for linear gradients, more complicated gradients may in actuality occur. Hence, to determine and characterize core gradients unambiguously, additional information has to be taken into account in the analysis.

Additional information for the characterization of core gradients can be obtained from the analysis of time-resolved X-ray narrow-band images. The two-dimensional narrow-band images provide a spatial map of the emissivity that is dependent on the spatial gradients in temperature and density. In the implosion data discussed here, the narrow-band X-ray images are dominated by line transition emission and have a negligible contribution from continuum emission. Although emissivity maps provide important spatially-resolved information about the plasma source, they do not impose a sufficient constraint to provide spatial information on both temperature and density gradients. On the other hand, using the constraints imposed by self-consistently fitting the spectra (whose overall broadening depends on the density) and the spatially-resolved relative distribution of narrow-band emissivities at a given time provides, for that time, both the temperature and density as functions of the spatial coordinate. Thus, we can extract the electron temperature Te(r,t) and density Ne(r,t) from this analysis.18,19 The conceptual idea of the analysis method and its implementation is schematically illustrated in Fig. 15.2.

Time-resolved argon Heβ, Heγ (1s4p ¹P - 1s² ¹S) and Lyβ (3p ²P - 1s


Fig. 15.1. Argon Heβ line space-integrated spectra (I, bottom right) and spatial emissivity profiles (E, bottom left) for three combinations of electron temperature (Te, top left) and density (Ne, top right) gradients. Note, importantly, that the emission spectra in the lower right would be observably the same for all cases.

Fig. 15.2. Schematic illustration of the spectroscopic method for the determinationof electron temperature and density gradients in the core based on the self-consistentanalysis of data from X-ray images and line spectra.


²S) line spectra, and their associated He- and Li-like satellite transitions, were recorded and used for the analysis. Opacity effects on these lines are smaller than for the Heα (1s2p ¹P - 1s² ¹S) and Lyα (2p ²P - 1s ²S) lines; in addition, using a small concentration of dopant (argon) further reduces possible opacity effects. To ensure that this condition is satisfied, the spectra were analyzed with a detailed argon K-shell spectral model and code that can consider both uniform and non-uniform plasmas, and optically thin and optically thick approximations. Opacity effects in the model were taken into account by self-consistently solving the radiation transport equation and a set of collisional-radiative population kinetics equations.20

For a given set of data (Fig. 15.2), we first perform the analysis of the spectra considering a uniform plasma approximation. This allows us to extract the emissivity-averaged electron temperature and density of the core, under the assumption that the lines are optically thin.20 To check this assumption, the same spectra are also analyzed considering instead a uniform plasma in the optically thick approximation, and the temperature and density extracted in this way are compared with the optically thin results. The difference between the optically thin and thick analysis results can then be used as a measure of the importance of opacity effects in the spectra. This idea was previously employed to systematically study opacity effects in the spectra from argon-doped implosions at the NOVA laser facility of Lawrence Livermore National Laboratory.21

Next, we analyze the spectra using temperature and density gradients which are subject to the constraint of reproducing the correct values for the emissivity-averaged temperature and density obtained with the uniform model analysis. Further, the same gradients are also used to fit the spatially-dependent emissivity extracted from the analysis of the argon Heβ X-ray narrow-band image. In this way, a set of self-consistent electron temperature and density gradients is extracted from the data that simultaneously yields fits to the line spectrum and the narrow-band emissivity spatial profiles, further subject to the emissivity-average constraint. This procedure is schematically illustrated by the self-consistency iteration loop in Fig. 15.2. The search for suitable gradient functions is performed with the aid of a niched Pareto genetic algorithm technique. Emissivity profiles in the plasma source (i.e. implosion core) are obtained from narrow-band X-ray images using the Abel inversion procedure.22 Although usually discussed for cases of cylindrical geometry, the Abel inversion method can also be considered for spherical geometry,23 as well as generalized cylindrical geometry cases. Finally, we note that in argon-doped deuterium plasmas electron and ion


number densities can be considered equal, since most of the electrons come from the ionization of deuterium. Hence, the extracted electron density gradients are the same as the ion number density (and mass density) gradients.
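The Abel inversion step mentioned above, which turns a chord-integrated intensity profile I(y) into a radial emissivity profile e(r), can be sketched numerically with a simple "onion-peeling" discretization. Cylindrical geometry, piecewise-constant emissivity in annular shells, and all function names are illustrative assumptions; they do not reproduce the authors' actual implementation.

```python
import numpy as np

# Onion-peeling Abel inversion: each chord at impact parameter y_i
# accumulates emission along its path through the annular shells it
# crosses, so I = A e with A upper triangular; solving the system
# "peels" the emissivity from the outermost shell inward.
def chord_matrix(radii):
    """A[i, j] = path length of the chord at impact parameter radii[i]
    through the shell between consecutive radii (cylindrical geometry)."""
    edges = np.append(radii, radii[-1] + (radii[-1] - radii[-2]))
    n = len(radii)
    A = np.zeros((n, n))
    for i in range(n):
        y = radii[i]
        for j in range(i, n):  # a chord only crosses shells with r >= y
            A[i, j] = 2.0 * (np.sqrt(edges[j + 1] ** 2 - y ** 2)
                             - np.sqrt(max(edges[j] ** 2 - y ** 2, 0.0)))
    return A

def abel_invert(intensity, radii):
    """Recover shell emissivities from a chord-integrated profile by
    solving the (upper-triangular) linear system A e = I."""
    return np.linalg.solve(chord_matrix(radii), intensity)
```

A quick consistency check is a round trip: forward-project a known radial emissivity through `chord_matrix` and verify that `abel_invert` recovers it.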

15.3. A Niched Pareto Genetic Algorithm for Multi-Objective Spectroscopic Data Analysis

Genetic algorithms are search and optimization algorithms based on the mechanics of natural selection.24 They are capable of finding solutions in poorly understood search spaces while exploring only a small fraction of the space, and can robustly deal with complex non-linear problems. We have shown that in the case of spectroscopic analysis of implosion-core spatially-integrated line spectra (i.e. a single-objective optimization problem), genetic algorithms efficiently search a two-dimensional parameter space to find the electron temperature and density values that yield best fits to the data assuming a uniform plasma approximation.25 However, as is illustrated in Fig. 15.2, the problem of core gradient determination requires multi-objective data analysis, since several pieces of data (i.e. spatially-integrated line spectra and spatially-resolved emissivity profiles) have to be simultaneously and self-consistently approximated with a single selection of electron temperature and density gradients. Furthermore, flexible-enough encoding algorithms for plasma gradients result in large, multi-dimensional search spaces. Thus, an efficient and robust algorithm is required to effectively implement the spectroscopic analysis illustrated in Fig. 15.2.

Our strategy is to use the principle of Pareto optimality in designing a Pareto optimal genetic algorithm²⁶⁻²⁸. At each generation, there is a set of non-dominated solutions in fitness space that form a surface known as the Pareto optimal front (or the Pareto front). The goal of a Pareto optimal genetic algorithm is to find and maintain a representative sampling of the solutions on the Pareto front. If the criteria are not self-contradictory, then there should be a point on the final convex front that satisfies all criteria well (see Fig. 15.3). In our case, this solution will be considered as the solution to the multi-objective spectral analysis problem¹⁸.
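The non-dominated set that forms the Pareto front can be made concrete with a short sketch. The code below is our own illustration (not the chapter's implementation); fitness vectors are maximized, matching the 1/χ² performance measure used in the analysis:

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if fitness vector a Pareto-dominates b (all objectives maximized):
    a is at least as good in every objective and strictly better in one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points: List[Sequence[float]]) -> List[Sequence[float]]:
    """Return the non-dominated subset of a set of fitness vectors."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]
```

For example, `pareto_front([(1, 2), (2, 1), (0.5, 0.5)])` keeps the two trade-off points and discards the dominated one.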

If there is no such point (concave front), an expert decision has to be made about which (if any) solution on the concave Pareto optimal front represents the most acceptable, physically sound solution. The result of the analysis in this case may not be reliable. Tracing the Pareto optimal front also helps to address the issue of solution uniqueness.

The idea of the analysis is to simultaneously minimize for each of the


348 R.C. Mancini, et al.

Fig. 15.3. Illustration of a two-objective search for successful (convex front, a) and unsuccessful (concave front, b) cases.

objectives the difference χ² between the experimental and synthetic data computed with the spectral model and defined by,

χ² = Σ_i σ_i (I_i^exp − I_i^theor)²

where I_i represents either intensity or emissivity and σ_i is a weight factor. A particular choice of the weight factor may have an impact on the performance of the algorithm. It may also be important for the estimation of uncertainty intervals²⁹. Since our primary goal was to develop and study the performance of a niched Pareto genetic algorithm, we set the weight factors to 1 for the spectra, and (1/I^exp)² for the emissivities. This was done to compensate for possible large changes in the range of values of the emissivity profile. Therefore we measure the performance or fitness of each candidate as 1/χ² (the higher the performance, the better the fit).
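For concreteness, this fitness evaluation can be written as follows (an illustrative Python sketch; the function names are ours, and the weight choices mirror the ones quoted in the text):

```python
import numpy as np

def chi2(i_exp, i_theor, weights=None):
    """Weighted sum of squared residuals between measured and synthetic data;
    the weights play the role of sigma_i in the chi^2 definition."""
    i_exp = np.asarray(i_exp, dtype=float)
    i_theor = np.asarray(i_theor, dtype=float)
    w = np.ones_like(i_exp) if weights is None else np.asarray(weights, dtype=float)
    return float(np.sum(w * (i_exp - i_theor) ** 2))

def fitness(i_exp, i_theor, weights=None):
    """Candidate performance: 1/chi^2, so a better fit scores higher."""
    return 1.0 / chi2(i_exp, i_theor, weights)

def emissivity_weights(i_exp):
    """Weight factor (1/I_exp)^2 used for the emissivity objective."""
    return 1.0 / np.asarray(i_exp, dtype=float) ** 2
```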

The crucial difference between a canonical genetic algorithm and the niched Pareto genetic algorithm is in the implementation of selection. We implemented Pareto domination tournament selection where two candidates are picked at random from the population. A comparison set of individuals is also picked randomly from the population. Each of the candidates is then compared against each individual in the comparison set. If one candidate is dominated by the comparison set, and the other is not, the latter is selected for reproduction. If neither or both are dominated by the comparison set, then we must use sharing to choose a winner. The equivalence class sharing implemented in our model defines the winner as an individual that has the smallest number of the other individuals inside its niche. This technique helps to maintain diversity along the Pareto front. Niche size gets adjusted automatically for each generation based on the average area of the front. We also normalize the objective function for each generation so that the objective function for each criterion ranges from 0 to 1.
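A minimal sketch of the Pareto domination tournament with equivalence class sharing might look as follows (our own illustrative code, not the authors' implementation; `radius` stands for the niche size, which in the actual algorithm is adjusted every generation from the average area of the front):

```python
import random

def dominated_by_set(candidate, comparison_set, dominates):
    """True if any member of the comparison set dominates the candidate."""
    return any(dominates(ind, candidate) for ind in comparison_set)

def niche_count(candidate, population, radius, distance):
    """Number of other individuals inside the candidate's niche."""
    return sum(1 for other in population
               if other is not candidate and distance(candidate, other) < radius)

def pareto_tournament(population, dominates, distance, radius, t_dom=5):
    """Pick two candidates and a comparison set; the non-dominated candidate
    wins; ties are broken by sharing (fewer neighbours in the niche wins)."""
    c1, c2 = random.sample(population, 2)
    comp = random.sample(population, t_dom)
    d1 = dominated_by_set(c1, comp, dominates)
    d2 = dominated_by_set(c2, comp, dominates)
    if d1 != d2:
        return c2 if d1 else c1
    # neither or both dominated: equivalence class sharing decides
    n1 = niche_count(c1, population, radius, distance)
    n2 = niche_count(c2, population, radius, distance)
    return c1 if n1 <= n2 else c2
```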

In order to increase selection pressure we use an elitist scheme where:


1) members of the current generation and offspring are combined in a common pool in each generation;

2) the solutions along the Pareto front are selected for the next generation and removed from the pool;

3) the procedure is repeated until the next generation is filled.

We have found empirically that elitism combined with uniform crossover

provides reliable and rapid convergence for our problem of spectroscopic analysis¹⁸. The size of the comparison set controls selection pressure. However, when using an elitist scheme, the algorithm is not very sensitive to this size. In our implementation we compare each candidate against 5 individuals. Probabilities of crossover and mutation are 0.95 and 0.05, respectively. A systematic study was performed by varying the genetic algorithm parameters in order to ensure reliability and optimize the performance of the algorithm¹⁸,³⁰. The implementation discussed above turned out to be the best for our purposes.
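The elitist scheme and the variation operators described above can be sketched as follows (illustrative Python using the quoted probabilities; `pareto_front_of` is any routine returning the non-dominated subset of a pool, and the bit-string representation is our assumption):

```python
import random

P_CROSS, P_MUT = 0.95, 0.05   # crossover and mutation probabilities from the text

def uniform_crossover(a, b):
    """With probability P_CROSS, take each bit from either parent at random."""
    if random.random() < P_CROSS:
        return [ai if random.random() < 0.5 else bi for ai, bi in zip(a, b)]
    return list(a)

def mutate(bits):
    """Flip each bit independently with probability P_MUT."""
    return [1 - b if random.random() < P_MUT else b for b in bits]

def next_generation(parents, offspring, pareto_front_of, size):
    """Elitist scheme: pool parents and offspring, then repeatedly peel
    successive Pareto fronts into the next generation until it is full."""
    pool = parents + offspring
    new_gen = []
    while len(new_gen) < size and pool:
        front = pareto_front_of(pool)
        pool = [p for p in pool if p not in front]
        new_gen.extend(front[: size - len(new_gen)])
    return new_gen
```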

15.4. Test Cases

Before applying the spectroscopic analysis and its implementation to actual data using a niched Pareto genetic algorithm, we tested it out in a number of cases using "synthetic" data where the solution was known. First, we considered the case of parabolic (with missing linear term) temperature and density gradients. As shown in Fig. 15.4, each gradient formula was determined by two coefficients that were computed from the values of two parameters: the values of the gradients at the center (i.e. T(0) and N(0)) and edge (i.e. T(R) and N(R)) of the implosion core. Each of these parameters was allowed to take values in a suitable finite range and was encoded using five bits. Thus, the chromosome length in this case was 20. The population size was 100 and we ran the niched Pareto genetic algorithm (NPGA) code for 150 generations.

Fig. 15.4. Encoding and chromosome of parabolic temperature and density gradients. One electron Volt (eV) of temperature is equivalent to 11,605 K.
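One plausible decoding of such a 20-bit chromosome is sketched below (Python; the bit layout, parameter ordering and value ranges are our assumptions for illustration, since the chapter does not spell them out):

```python
def bits_to_value(bits, lo, hi):
    """Map a 5-bit string to one of 32 uniformly spaced values in [lo, hi]."""
    n = int("".join(map(str, bits)), 2)            # 0..31
    return lo + (hi - lo) * n / (2 ** len(bits) - 1)

def decode_parabolic(chrom, t_range, n_range):
    """20-bit chromosome -> (T(0), T(R), N(0), N(R)) plus parabolic profiles
    T(r) = T(0) + (T(R) - T(0)) (r/R)^2 (linear term missing), same for N."""
    assert len(chrom) == 20
    t0 = bits_to_value(chrom[0:5], *t_range)
    tR = bits_to_value(chrom[5:10], *t_range)
    n0 = bits_to_value(chrom[10:15], *n_range)
    nR = bits_to_value(chrom[15:20], *n_range)
    T = lambda r, R: t0 + (tR - t0) * (r / R) ** 2
    N = lambda r, R: n0 + (nR - n0) * (r / R) ** 2
    return (t0, tR, n0, nR), T, N
```

With four 5-bit parameters the search space is 32⁴ = 1,048,576 gradient combinations, matching the count given below.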


With this gradient encoding and chromosome length and structure, the size of the parameter space (i.e. total number of possible temperature and density gradient combinations) is 32⁴ = 1,048,576. Hence, an exhaustive search of the parameter space would require 32⁴ evaluations of the spectral model. The NPGA code finds the right solution after evaluating the spectral model for 3,000 to 5,000 temperature and density gradients. This represents a small fraction of the total parameter space for this problem. This is also important because spectral model evaluations can be computationally expensive and, in order to investigate the uniqueness of the solution, analysis runs are repeated several times for the same dataset but using a different random initialization of the first generation. To illustrate the performance of the algorithm, Fig. 15.5 shows results for a typical run of this test case using two objectives.

The emissivity distribution and the line spectrum in Fig. 15.5 are those of the argon Heβ spectral feature²⁰. This is one of the spectral signatures that are used in the spectroscopic analysis of X-ray line emission from implosion cores, and the electron temperature and number density conditions are typical of the conditions achieved at the collapse of laser-driven implosion experiments¹⁰,¹³⁻¹⁵. The "synthetic" data are characterized by six spatial zones, and 10% noise has been added to them in order to approximate real data. Good fits to the line spectrum are found first since the spectrum is spatially integrated; however, as the run progresses, good fits to the spatially-resolved emissivity are also found. Eventually, a convex Pareto front begins to develop, leading to gradients that simultaneously and self-consistently fit well both the line spectrum and the emissivity profile. The optimal solution is extracted from the upper right corner of the convex Pareto front, and it coincides with the right solution (see Fig. 15.5). Points along the Pareto front but away from the optimal solution do not satisfy both objectives simultaneously and can clearly be discarded.

It is also important to investigate points in the vicinity of the optimal solution. If the gradients associated with these points are similar to those of the optimal solution, then the solution is unique and the spread in solutions can be used as part of the uncertainty estimation. On the other hand, if there are points that satisfy all objectives well but nevertheless have different gradients, then we have alternative solutions and the analysis is ambiguous. It is expected that the higher the level of noise in the data, the larger the chances of finding alternative solutions. For up to 10% of noise, we have found that the solution is unique. However, starting at a 15% to 20% noise level, alternative solutions are found and the results of the analysis become ambiguous.

Fig. 15.5. Parabolic gradients test case results. Top: early development and propagation of the Pareto front in fitness space. Middle: self-consistent fits to the spatially-resolved emissivity profile (left) and the spatially-integrated line spectrum (right). Bottom: self-consistent density (left) and temperature (right) gradients. Gradients in the vicinity of the optimal solution are also displayed.

Next, we consider a test case based on "synthetic" data computed using

plasma core gradients from a laser-driven implosion numerical simulation performed with the one-dimensional Lagrangian radiation-hydrodynamics code LILAC¹⁴. Fig. 15.6 displays the time-history of core electron temperature and density gradients calculated by LILAC through the collapse of the implosion. In this case, the simulation was done for a plastic target of 937 μm initial exterior diameter, 24 μm wall thickness, filled with 20 atm of deuterium, doped with 0.1% of argon, and irradiated with a square laser pulse of 500 ps duration and 15 kJ of UV (1/3 μm wavelength) laser energy. The main shock wave hits the center of the target at t = 1.9 ns. Gradients are shown every 200 ps for a period of 1 ns. At t = 3.0 ns the core radius reaches a minimum value of about 53 μm, which corresponds to a convergence ratio for the core of about 8.

Fig. 15.6. Time-history of core temperature (top) and density (bottom) gradients through the implosion collapse computed with the one-dimensional radiation-hydrodynamics code LILAC.

The functional dependence of these gradients on radius cannot be well approximated by the simple parabolic gradients considered in the previous test case. Thus, a more general algorithm for encoding implosion core gradients is needed. First, we work with six spatial zones since the implosion cores are about 60 μm in diameter and current X-ray imagers have a spatial resolution of 10 μm. In each spatial zone temperature and density can take values within a suitable range. Using a 5-bit encoding, this results in 32 uniformly-spaced values for both the temperature and the density. Further, maximum relative changes between adjacent spatial zones are bounded (typically by 60%), and to avoid unrealistic changes in these zone-by-zone given gradients a polynomial fit is performed. Thus, the total number of temperature and density gradients described by this algorithm (i.e. the size of the search parameter space) is ((32)⁶)² ≈ 1.2×10¹⁸. Now, the chromosome length is 60 and we work with populations of 200 to 300 members. In our experience, this gradient-generating algorithm has proven to be thorough and flexible enough to accommodate core gradients through the collapse of the implosion. However, it may need to be extended or changed to deal with other application problems.
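A sketch of this zone-based encoding is given below (Python; the value range, the exact form of the admissibility test, and the polynomial degree are illustrative assumptions consistent with the description above; two such 30-bit half-chromosomes, one for temperature and one for density, make up the 60-bit chromosome):

```python
import numpy as np

N_ZONES, BITS = 6, 5
MAX_REL_CHANGE = 0.6   # typical bound on zone-to-zone relative change

def decode_zones(half_chrom, lo, hi):
    """30-bit half-chromosome -> 6 zone values, 32 uniform levels per zone."""
    vals = []
    for z in range(N_ZONES):
        bits = half_chrom[z * BITS:(z + 1) * BITS]
        n = int("".join(map(str, bits)), 2)
        vals.append(lo + (hi - lo) * n / (2 ** BITS - 1))
    return np.array(vals)

def admissible(vals):
    """Reject gradients whose adjacent-zone jumps exceed the relative bound."""
    a, b = vals[:-1], vals[1:]
    return bool(np.all(np.abs(b - a) <= MAX_REL_CHANGE * np.maximum(a, b)))

def smooth(vals, degree=3):
    """Polynomial fit over zone indices to avoid unrealistic zone-by-zone jumps."""
    x = np.arange(N_ZONES)
    return np.polyval(np.polyfit(x, vals, degree), x)
```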

As another test-case illustration of the performance of the spectroscopic analysis driven by the NPGA code, Figs. 15.7 and 15.8 show the comparison between the LILAC simulation gradients (see Fig. 15.6) and the corresponding gradients found by the NPGA. Again, the analysis is based on "synthetic" data associated with the argon Heβ X-ray image and line spectrum. For all times, the NPGA code consistently finds the correct gradients and approximates quite well the spatial-coordinate functional dependence.

Fig. 15.7. Comparison of LILAC simulation and NPGA-found gradients for t = 2.6 ns and t = 2.8 ns.


Fig. 15.8. Comparison of LILAC simulation and NPGA-found gradients for t = 3.0 ns and t = 3.2 ns.

Finally, we note that a parallel version of the NPGA code for spectral analysis production runs was also developed³¹. The evaluation of the spectral model is computationally more expensive than the execution of the NPGA logic, and the evaluations of the spectral model are independent from each other. Hence, the parallelization of the spectral model evaluations in the NPGA code is quite straightforward and relatively easy to implement. It should result in significant speed-up so long as the cost of population member evaluation dominates the communication cost. Indeed, we did a study of the parallel NPGA performance with the number of nodes in a PC cluster and found a linear speed-up with up to 20 nodes.
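Because the model evaluations are mutually independent, the parallelization amounts to mapping the population over a worker pool. The sketch below is our own illustration: a thread pool stands in for the PC-cluster processes of the actual implementation, and `spectral_model` is a trivial placeholder for the expensive synthetic-spectrum computation:

```python
from concurrent.futures import ThreadPoolExecutor

def spectral_model(gradient):
    """Placeholder for the expensive spectral model evaluation."""
    return sum(g * g for g in gradient)   # dummy stand-in cost function

def evaluate_population(population, workers=4):
    """Farm out independent model evaluations to a worker pool; with an
    expensive model the speed-up is close to linear in the worker count."""
    with ThreadPoolExecutor(max_workers=workers) as ex:
        return list(ex.map(spectral_model, population))
```

`ex.map` preserves the population order, so fitness values line up with candidates.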

15.5. Application to Direct-Drive Implosions at GEKKO XII

The laser-driven direct-drive implosion experiments were performed at the Osaka University GEKKO XII laser system. The array of diagnostics instrumentation included a monochromatic X-ray framing camera, essential for the spatial resolution of the plasma. The X-ray monochromatic images were used to obtain, for the first time in an implosion experiment, spatially and temporally resolved data on the collapsing core. In addition, time-resolved but spatially averaged, streak spectrograph spectral data for the usual spatially averaged diagnostic were recorded. The drive consisted of a 12 beam, 2.55 kJ Nd glass laser operating at 526 nm. Random phase plates were used to smooth individual laser beams. The laser pulse was composed of a 0.2 ns pre-pulse followed by a 1.6 ns square pulse with rise time of 0.05 ns. Targets were plastic shells, 500 μm in diameter, with 8 μm wall thickness, filled with 30 atm of deuterium and doped with 0.075 atm of argon (for diagnostic purposes). The implosion was diagnosed by recording both the compressed core image and the argon line spectrum. In particular, time-resolved X-ray monochromatic images and simultaneous spatially integrated X-ray spectra of the Heβ spectral feature were recorded. For the spatial information, a two-dimensional X-ray monochromatic framing camera imaging the central 19 eV of the Heβ line emitted by the argon in the core was employed³². This X-ray imager monitors the implosion symmetry and provides up to 5 frames with Δt = 40 ps duration, 50 ps interframe time, and 10 μm spatial resolution. The X-ray spectrometer consisted of a flat RbAP (100) crystal coupled to an X-ray streak camera with 10 ps time resolution and resolving power, λ/Δλ, of 600. It is important that the entire core of the implosion is in the field of view of the spectrograph so that a spatial average over the imploded core radius is recorded for the determination of both <Te(t)> and <Ne(t)>. As an example of both types of data, the second frame, Δt₂, recorded by the imager in the interval 342-382 ps and the X-ray streak spectrograph, with the same time interval marked, are presented in Fig. 15.9. The image displays a central maximum in the intensity, as expected in these spherical implosions.

The uniform-plasma-model analysis of the time resolved spectral data, integrated over the 40 ps time interval corresponding to the second imager frame, yields the spatially averaged values <Te> = 620 eV and <Ne> = 3×10²³ cm⁻³. The corresponding synthetic spectra fit to the data is shown in Fig. 15.9. The next step is to work with the image data containing the spatial information for the determination of the narrow-band emissivity profile in the implosion core and map it onto a radial (spherical) coordinate system using a spherically symmetric Abel inversion²³.
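For a spherically symmetric source, the chord-integrated image profile is I(y) = 2 ∫_y^R ε(r) r dr / √(r² − y²), and it can be inverted numerically by onion peeling. The sketch below is our own illustrative implementation, not the procedure of Ref. 23: the sphere is discretized into shells, the chord length of each line of sight through each shell is tabulated, and the resulting triangular linear system is solved:

```python
import numpy as np

def abel_invert(profile, radius):
    """Recover shell emissivities eps(r) from a chord-integrated profile I(y)
    sampled at the mid-impact-parameters of n equal-width spherical shells."""
    n = len(profile)
    edges = np.linspace(0.0, radius, n + 1)        # shell boundaries
    y = 0.5 * (edges[:-1] + edges[1:])             # impact parameters
    A = np.zeros((n, n))
    for i in range(n):
        for j in range(i, n):
            # chord length of ray y[i] inside shell [edges[j], edges[j+1]]
            outer = np.sqrt(max(edges[j + 1] ** 2 - y[i] ** 2, 0.0))
            inner = np.sqrt(max(edges[j] ** 2 - y[i] ** 2, 0.0))
            A[i, j] = 2.0 * (outer - inner)
    return np.linalg.solve(A, np.asarray(profile, dtype=float))
```

For a uniform emitter, the profile is exactly 2√(R² − y²) and the inversion recovers a flat emissivity.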

Given the spatial emissivity profile and the spatially-integrated line spectrum, the NPGA code searches parameter space for the temperature and density gradients that yield the best fit to these data. The NPGA code efficiently finds the gradients after evaluating about 10⁵ temperature and density gradients, out of a total number of 10¹⁸ possible gradients (the size of the parameter space). The self-consistent gradients, and the fits they yield to the data, are displayed in Fig. 15.9. Their emissivity-weighted averages are 600 eV for the electron temperature and 3×10²³ cm⁻³ for the electron number density. These values are consistent to within 4% of those obtained with the uniform model analysis. The gradients' uncertainties indicated in Fig. 15.9 account for deviations from (perfect) spherical symmetry and the spread of solutions about the optimal Pareto front solution. Finally, Fig. 15.10 shows the time-history of core gradients extracted by the NPGA code from a sequence of four consecutive framed images and their corresponding line spectra. In all cases we found that temperature and density are counter-correlated. This is consistent with the idea of an isobaric core, and it is also in good agreement with one-dimensional hydrodynamic code simulations¹⁷.

Fig. 15.9. Argon Heβ image (top left) and line spectrum (top right) data for GEKKO XII shot 22091. Fits yielded by the self-consistent gradients found with the NPGA code for the emissivity profile (middle left) and line spectrum (middle right). Also shown is the line spectrum fit based on a uniform model analysis. Bottom: self-consistent gradients found with the NPGA-driven spectral analysis.
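The emissivity-weighted averages used in this consistency check can be computed as below (a Python sketch; the shell-volume weighting is our assumption for a spherically symmetric core described by zone values):

```python
import numpy as np

def emissivity_weighted_average(values, emissivity, edges):
    """Emissivity-weighted spatial average over spherical shells:
    <Q> = sum_i eps_i Q_i dV_i / sum_i eps_i dV_i, where dV_i is the
    volume of shell i bounded by consecutive entries of `edges`."""
    edges = np.asarray(edges, dtype=float)
    dV = 4.0 / 3.0 * np.pi * (edges[1:] ** 3 - edges[:-1] ** 3)
    w = np.asarray(emissivity, dtype=float) * dV
    return float(np.sum(w * np.asarray(values, dtype=float)) / np.sum(w))
```

A gradient candidate is accepted only if such averages reproduce the uniform-model <Te> and <Ne> values.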

Fig. 15.10. Time-history of core gradients for GEKKO XII shot 22091 determined by the NPGA-driven spectral analysis. Each time interval represents a time-integration over 40 ps.

15.6. Application to Indirect-Drive Implosions at OMEGA

The indirect-drive implosion experiments were performed at the OMEGA laser facility of the Laboratory for Laser Energetics at the University of Rochester, with support from the National Laser Users' Facility program and in collaboration with Lawrence Livermore National Laboratory. The target consisted of a gold cylindrical hohlraum with a plastic capsule inside. The gold hohlraum was 2500 μm long, 1600 μm in diameter, and had 1200 μm laser entrance holes (LEH). The plastic capsule had an external diameter of 510 μm with a wall thickness of 35 μm, i.e., the initial core diameter was 440 μm, and it was placed at the center of the hohlraum. The core was filled with 50 atm of deuterium and 0.1 atm of argon. The argon tracer is added to the core fill for diagnostic purposes and resulted in a typical optical depth of the argon Heβ line of 0.3, and less for the Lyβ line. The capsules were designed so that the hohlraum X-ray radiation did not directly penetrate to the fill gas. These hohlraum targets were irradiated with 30 UV OMEGA beams, split into 15 beams per LEH that were arranged in two cones of 5 and 10 beams each. The beam cones were pointed to produce two rings of beams on each end of the hohlraum. The laser energy per beam was 500 J, for a total UV laser energy of 15 kJ, producing a hohlraum radiation temperature of 210 eV. Three diagnostic holes placed on the side of the hohlraum provided lines-of-sight for the MMI-2 and GMXI X-ray imagers, and for a streaked X-ray crystal spectrometer. MMI-2 is a pinhole-array flat multi-layer mirror Bragg reflector instrument that records numerous narrow-band (~75 eV/image) X-ray images in the photon energy range from 3000 eV to 5000 eV with ~10 μm spatial resolution. The pinhole array is comprised of 1280 pinholes, each 5 μm in diameter and separated by 70 μm, and is attached to the hohlraum target 15 mm from the capsule. MMI-2 operates with a magnification of 8. The wavelength dispersion effect is provided by a WBC4 multi-layer mirror, with layer thickness of approximately 15 Å. Data from MMI-2 can be used to construct narrow-band images from several lines as well as continuum images. In addition, a space-integrated X-ray line spectrum can be extracted from MMI-2 data. The line spectrum covers the spectral range of the Heβ and Lyβ lines and their associated Li- and He-like satellite transitions³³⁻³⁶.

We focus on the analysis of data recorded with MMI-2. Fig. 15.11 shows the image data recorded by MMI-2 in OMEGA shot 26787. A characteristic of MMI-2 core image data is that while both horizontal (x) and vertical (y) axes represent spatial resolution, the y-axis is also a spectral resolution axis. Thus, several groups of adjacent core images display narrow bands of bright (line) emission covering different portions of the image. Working with groups of images (see Fig. 15.11), core X-ray images associated with different narrow-band ranges can be extracted from the data³³,³⁵.

In addition, a wide horizontal integration of the image data produces the spatially-integrated line spectrum. Given the narrow-band spatial emissivity profiles obtained from the reconstructed core images and the spatially-integrated line spectrum, we performed the spectroscopic analysis of core gradients using the NPGA code. Each dataset is analyzed at least ten times to check for solution uniqueness, each time starting with a different random initialization of the first generation. We found that the NPGA code also works very well for data from indirect-drive implosions, and finds the solution after approximately 10⁵ model evaluations (i.e. gradient selections).
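The uniqueness check across repeated runs can be automated along these lines (our own sketch; the 90% fitness threshold is an arbitrary illustrative choice for "satisfies all objectives well"):

```python
import numpy as np

def near_optimal(solutions, fitnesses, frac=0.9):
    """Solutions whose fitness on every objective is within frac of the best."""
    f = np.asarray(fitnesses, dtype=float)
    best = f.max(axis=0)
    mask = np.all(f >= frac * best, axis=1)
    return [s for s, m in zip(solutions, mask) if m]

def gradient_spread(gradients):
    """Per-zone standard deviation of near-optimal gradients: a small spread
    suggests a unique solution, a large spread signals ambiguity and can
    feed into the uncertainty estimate."""
    return np.std(np.asarray(gradients, dtype=float), axis=0)
```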


Fig. 15.11. MMI-2 data from OMEGA shot 26787. Top: image data covering the spectral region of Lyα, Heβ and Lyβ line emissions. Middle: extraction of six sub-images that contain Lyβ line emission. Bottom: Lyβ core image constructed from the addition of the six Lyβ sub-images and the subtraction of a nearby continuum image.

The self-consistent gradients, and the fits they yield to the data, are shown in Fig. 15.12. The comparison between emissivity-weighted averages and uniform model analysis results is also good³⁶.

Fig. 15.12. Self-consistent gradients for OMEGA shot 26787 (bottom), and the fits they yield to the emissivity spatial profile (Heβ, top right) and the spatially-integrated line spectrum (top left) including the argon Heβ (3680 eV) and Lyβ (3935 eV) line emissions. Emissivity-weighted averages of the gradients and uniform model analysis results both yield 950 eV for the electron temperature and 1.3×10²⁴ cm⁻³ for the electron number density.

15.7. Conclusions

We have discussed the use of a niched Pareto genetic algorithm for the determination of plasma temperature and density gradients through the collapse of laser-driven implosion cores, based on the self-consistent analysis of simultaneous spatially-resolved X-ray image data and the spatially-integrated X-ray line spectrum. The idea is to explore a parameter space of gradient functions searching for gradients that yield the best fits to X-ray narrow-band emissivity profiles and line spectra. In addition, the emissivity-weighted averages of these gradients are subject to matching the results of the uniform model analysis. Since the analysis involves satisfying (simultaneously) multiple objectives and searching in multi-dimensional parameter spaces, it does require an efficient and robust algorithmic implementation in a computer code. In this connection, we have developed a niched Pareto genetic algorithm that drives the spectroscopic analysis and successfully finds self-consistent temperature and density gradients. The method was first applied to a number of test cases based on "synthetic" data where the gradient solution was known, and subsequently applied to real-world cases with data recorded in direct- and indirect-drive laser-driven implosion cores. In all cases, the NPGA code successfully solved the inversion problem posed by the spectroscopic analysis.

An important component of the analysis method is the gradient encoding algorithm. In an effort to be thorough and flexible, our current gradient encoding algorithm results in large parameter spaces. An exhaustive search of these parameter spaces would involve evaluating the spectral model for about 10¹⁸ different combinations of temperature and density gradients. In this regard, the performance of the NPGA code is impressive since it finds the solution in approximately 10⁵ model evaluations. This is important since spectral model evaluations can be (depending upon the amount of physics included in the model) computationally expensive. We emphasize that the NPGA code we have developed is general and thus it can be applied to other problems of spectroscopic analysis.

Recent new applications of the NPGA technique to plasma spectroscopy include three-objective analysis of several X-ray narrow-band images and line spectrum, and multi-objective analysis of continuum-based X-ray images for diagnosis of cryogenic implosions³⁷. Preliminary results look promising and suggest yet another spectroscopic analysis inversion problem where the NPGA technique can also play a critical role.

Acknowledgments

This work was supported by DOE-NLUF Grants DE-FG03-01SF22225 and DE-FG03-03SF22696, NSF Grant 9624130, ONR contract 00014-03-1-0104, LLNL under the auspices of DOE contract W-7405-ENG-48, DOE-HEDS DE-FG03-98DP00213, and the Japan-Germany and Japan-US collaboration program of JSPS.

References

1. Plasma spectroscopy in inertial confinement fusion and soft x-ray laser research, H. Griem, Phys. Fluids B 4, 2346 (1992).
2. Direct measurement of compression of laser-imploded targets using x-ray spectroscopy, B. Yaakobi, D. Steel, E. Thorsos, A. Hauer and B. Perry, Phys. Rev. Lett. 39, 1526 (1977).
3. Compression measurements of neon-seeded glass microballoons irradiated by CO2 laser light, K.B. Mitchell, D.B. van Husteyn, G.H. McCall, P. Lee and H.R. Griem, Phys. Rev. Lett. 42, 232 (1979).
4. X-ray spectroscopic diagnosis of laser-produced plasmas, with emphasis on line broadening, J.D. Kilkenny, R.W. Lee, M.H. Key and J.G. Lunney, Phys. Rev. A 22, 2746 (1980).
5. Time-resolved spectroscopic measurement of high-density in argon-filled microballoon implosions, C.F. Hooper, Jr., D.P. Kilcrease, R.C. Mancini, L.A. Woltz, D.K. Bradley, P.A. Jaanimagi and M.C. Richardson, Phys. Rev. Lett. 63, 267 (1989).
6. High Z x-ray spectroscopy of laser-imploded capsules, B.A. Hammel, P. Bell, C.J. Keane, R.W. Lee and C.L.S. Lewis, Rev. Sci. Instrum. 61, 2774 (1990).
7. X-ray spectroscopic measurements of high densities and temperatures from indirectly driven inertial confinement fusion capsules, B. Hammel, C.J. Keane, M.D. Cable, D.R. Kania, J.D. Kilkenny, R.W. Lee and R. Pasha, Phys. Rev. Lett. 70, 1263 (1993).
8. Study of indirectly driven implosion by x-ray spectroscopic measurements, H. Nishimura, T. Kiso, H. Shiraga, T. Endo, K. Fujita, A. Sunahara, H. Takabe, Y. Kato and S. Nakai, Phys. Plasmas 2, 2063 (1995).
9. Spectroscopic analysis of hot dense plasmas: a focus on ion dynamics, C.F. Hooper, Jr., D.A. Haynes, Jr., D.T. Garber, R.C. Mancini, Y.T. Lee, D.K. Bradley, J. Delettrez, R. Epstein and P.A. Jaanimagi, Laser Part. Beams 14, 713 (1996).


10. The effects of ion dynamics and opacity on Stark broadened argon line profiles, D.A. Haynes, Jr., D.T. Garber, C.F. Hooper, Jr., R.C. Mancini, Y.T. Lee, D.K. Bradley, J. Delettrez, R. Epstein and P.A. Jaanimagi, Phys. Rev. E 53, 1042 (1996).
11. Spectroscopy of compressed high energy density matter, N. Woolsey, A. Asfaw, B. Hammel, C. Keane, C.A. Back, A. Calisti, C. Mosse, R. Stamm, B. Talin, J.S. Wark, R.W. Lee and L. Klein, Phys. Rev. E 53, 6396 (1996).
12. Evolution of electron temperature and electron density in indirectly driven spherical implosions, N. Woolsey, B.A. Hammel, C.J. Keane, A. Asfaw, C.A. Back, J.C. Moreno, J.K. Nash, A. Calisti, C. Mosse, R. Stamm, B. Talin, L. Klein and R.W. Lee, Phys. Rev. E 56, 2314 (1997).
13. Competing effects of collisional ionization and radiative cooling in inertially confined plasmas, N. Woolsey, B.A. Hammel, C.J. Keane, C.A. Back, J.C. Moreno, J.K. Nash, A. Calisti, C. Mosse, R. Stamm, B. Talin, A. Asfaw, L.S. Klein and R.W. Lee, Phys. Rev. E 57, 4650 (1998).
14. Characterization of direct-drive-implosion core conditions on OMEGA with time-resolved argon K-shell spectroscopy, S.P. Regan, J.A. Delettrez, R. Epstein, P.A. Jaanimagi, B. Yaakobi, V.A. Smalyuk, F.J. Marshall, D.D. Meyerhofer, W. Seka, D.A. Haynes, Jr., I.E. Golovkin and C.F. Hooper, Jr., Phys. Plasmas 9, 1357 (2002).
15. Time- and space-resolved x-ray spectroscopy for observation of the hot compressed core region in a laser driven implosion, Y. Ochi, K. Fujita, I. Niki, H. Nishimura, N. Izumi, A. Sunahara, S. Naruo, T. Kawamura, M. Fukao, H. Shiraga, H. Takabe, K. Mima, S. Nakai, I. Uschmann, R. Butzbach and E. Forster, J. Quant. Spectrosc. Radiat. Transfer 65, 393 (2000).
16. The effects of gradients on the diagnostic use of spectral features from laser compressed plasmas, R.W. Lee, J. Quant. Spectrosc. Radiat. Transfer 2, 87 (1982).
17. Temporal evolution of temperature and density profiles of a laser compressed core, Y. Ochi, I. Golovkin, R. Mancini, I. Uschmann, A. Sunahara, H. Nishimura, K. Fujita, S. Louis, M. Nakai, H. Shiraga, N. Miyanaga, H. Azechi, R. Butzbach, E. Forster, J. Delettrez, J. Koch, R.W. Lee and L. Klein, Rev. Sci. Instrum. 74, 1683 (2003).
18. Spectroscopic modeling and analysis of plasma conditions in implosion cores, I.E. Golovkin, PhD Dissertation, University of Nevada, Reno (2000).
19. Spectroscopic determination of dynamic plasma gradients in implosion cores, I. Golovkin, R. Mancini, S. Louis, Y. Ochi, K. Fujita, H. Nishimura, H. Shiraga, N. Miyanaga, H. Azechi, R. Butzbach, I. Uschmann, E. Forster, J. Delettrez, J. Koch, R.W. Lee and L. Klein, Phys. Rev. Lett. 88, 045002 (2002).
20. High-order satellites and plasma gradients effects on the argon Heβ line opacity and intensity distribution, I.E. Golovkin and R.C. Mancini, J. Quant. Spectrosc. Radiat. Transfer 65, 273 (2000).
21. Opacity analysis of the Heβ line in argon-doped indirect drive implosions at NOVA, I.E. Golovkin, R.C. Mancini, N.C. Woolsey, C.A. Back, R.W. Lee and L. Klein, Proceedings of the First International Conference on Inertial Fusion and Science Applications, page 1123 (Elsevier Sc. Pub., 2000).


22. Transformation of observed radiances into radial distribution of the emission of a plasma, K. Bockasten, Journal of the Optical Society of America 51, 943 (1961).
23. Abel inversion of cryogenic laser target images, B. Yaakobi, F.J. Marshall and J. Delettrez, Optics Comm. 133, 43 (1997).
24. D.E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning (Addison-Wesley Pub., 1989).
25. Analysis of x-ray spectral data with genetic algorithms, I.E. Golovkin, R.C. Mancini, S.J. Louis, R.W. Lee and L. Klein, J. Quant. Spectrosc. Radiat. Transfer 75, 625 (2002).
26. Multi-objective optimization using the niched Pareto genetic algorithm, J. Horn, N. Nafpliotis and D.E. Goldberg, Proceedings of the First IEEE Conference on Evolutionary Computation, p. 82 (1994).
27. C.A. Coello Coello, D.A. Van Veldhuizen and G.B. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems (Kluwer Academic Pub., 2002).
28. K. Deb, Multi-Objective Optimization using Evolutionary Algorithms (J. Wiley Pub., 2001).
29. R.L. Coldwell and G.I. Bamford, Theory and Operation of Spectral Analysis using ROBFIT (American Institute of Physics Pub., 1991).
30. Multi-criteria search and optimization: an application to x-ray plasma spectroscopy, I.E. Golovkin, R.C. Mancini, S.J. Louis, R.W. Lee and L. Klein, Proceedings of the 2000 Congress on Evolutionary Computation, p. 1521 (IEEE Press, 2000).
31. Parallel implementation of niched Pareto genetic algorithm code for x-ray plasma spectroscopy, I.E. Golovkin, R.C. Mancini and S.J. Louis, Proceedings of the 2002 Congress on Evolutionary Computation (IEEE Press, 2002).
32. Time-resolved ten-channel monochromatic imaging of inertial confinement fusion plasmas, I. Uschmann, K. Fujita, I. Niki, R. Butzbach, H. Nishimura, J. Funakura, M. Nakai, E. Forster and K. Mima, Applied Optics 39, 5865 (2000).
33. Processing and analysis of x-ray line spectra and multi-monochromatic x-ray images for implosion core gradient determination, L.A. Welser, MS Thesis, University of Nevada, Reno (2003).
34. Spectroscopic determination of core gradients in inertial confinement fusion implosions, L.A. Welser, R.C. Mancini, I.E. Golovkin, J.A. Koch, H.E. Dalhed, R.W. Lee, F.J. Marshall, J.A. Delettrez and L. Klein, American Institute of Physics Conf. Proc. 635, 61 (2002).
35. Processing of multi-monochromatic x-ray images from indirect drive implosions at OMEGA, L.A. Welser, R.C. Mancini, J.A. Koch, S. Dalhed, R.W. Lee, I.E. Golovkin, F. Marshall, J. Delettrez and L. Klein, Rev. Sci. Instrum. 74, 1951 (2003).
36. Analysis of the spatial structure of inertial confinement fusion implosion cores at OMEGA, L.A. Welser, R.C. Mancini, J.A. Koch, N. Izumi, H.E. Dalhed, H. Scott, T.W. Barbee, Jr., R.W. Lee, I.E. Golovkin, F. Marshall, J. Delettrez and L. Klein, J. Quant. Spectrosc. Radiat. Transfer 81, 487 (2003).

Page 393: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

364 R.C. Mancini, et. al.

37. Multi-spectral imaging of continuum emission for determination of tem-perature and density profiles inside implosion plasmas, J.A. Koch, S. Haanand R.C. Mancini, J. Quant. Spectrosc. Radiat. Transfer (2004, in press).

Page 394: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

CHAPTER 16

APPLICATION OF MULTIOBJECTIVE EVOLUTIONARY OPTIMIZATION ALGORITHMS IN MEDICINE

Michael Lahanas

Department of Medical Physics and Engineering
Klinikum Offenbach, Starkenburgring 66, 63069 Offenbach am Main, Germany

E-mail: [email protected]

We present an overview of the application of multiobjective evolutionary algorithms (MOEAs) in medicine. We describe how MOEAs are used for image processing, computer-aided diagnosis, treatment planning and data mining tasks. The benefits of the use of MOEAs in comparison to conventional methods are discussed.

16.1. Introduction

Multiple objectives have to be considered for many real-world problems, and an increasing number of such problems can be solved by multiobjective evolutionary optimization algorithms (MOEAs).1 Previously, the multiobjective (MO) problem was transformed, by using a weighted sum of the individual objectives, into a specific single-objective (SO) optimization problem solved by SO optimization methods.

A review of the applications of evolutionary algorithms in medicine has been presented by Pena-Reyes et al.2 We consider here the application of MOEAs in medicine. MOEAs are now used for the solution of many medical problems, from the reconstruction of medical images from projections and the analysis of data for the diagnosis of symptoms up to treatment optimization. MOEAs can be used in scheduling optimization, where the hospital resources have to be used optimally under various constraints. For data mining tasks, such as partial classification, MOEAs can be applied for the discovery of rules in medical databases.

MOEAs are used in medicine for the solution of inverse problems. In simplistic terms this means: first, the ideal answer is known; second, any constraints are taken into account and the optimum parameter values that provide this ideal answer are determined mathematically. In other words, we have the result, and the inverse problem is to determine the cause of this result. We have to solve inverse problems in order to determine the internal structure of the human body using measurements performed outside it, that is, non-invasively. This can be done, e.g., by measuring radiation (X-rays, ultrasound, etc.) that passes through the body.

Three strategies can be used for solving MO optimization problems:

• An a priori method. The decision making (DCM) is specified in terms of a scalar function, and an optimization engine is used to obtain the corresponding solution. This approach requires knowledge of the optimal weights (importance factors). Often such knowledge does not exist, and the optimization procedure has to be repeated by trial and error with different sets of weights until a satisfactory solution is found.

• An a posteriori method. An optimization engine exists which finds all solutions. Decision making is applied at the end of the optimization, manually or using a decision engine. This method decouples the optimization from the decision making process. A new decision is possible without having to repeat the optimization.

• A mixture of a priori and a posteriori methods. Information obtained periodically during the optimization may be used to reformulate the goals, as some of them cannot be achieved. Such a method is used by Yan Yu 3 for the solution of radiotherapy treatment planning problems.

We consider only applications using the a posteriori method. The two main tasks of this approach are:

• Obtaining a representative set of non-dominated solutions.
• Using the trade-off information to select a solution from this set, i.e. the decision making process.
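The first of these two tasks, obtaining the non-dominated set, can be sketched as a simple Pareto-dominance filter. The function names and the toy objective vectors below are illustrative assumptions, and all objectives are taken to be minimized:

```python
# Hypothetical sketch of non-dominated filtering for the a posteriori approach.
# All objectives are assumed to be minimized.

def dominates(a, b):
    """True if objective vector a dominates b (minimization sense)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(solutions):
    """Return the subset of solutions not dominated by any other solution."""
    return [s for s in solutions
            if not any(dominates(t, s) for t in solutions if t is not s)]

if __name__ == "__main__":
    candidates = [(1.0, 5.0), (2.0, 2.0), (3.0, 4.0), (5.0, 1.0)]
    # (3.0, 4.0) is dominated by (2.0, 2.0); the other three trade off
    print(non_dominated(candidates))
```

Decision making then reduces to choosing one element of the returned set, e.g. by inspecting the trade-offs it exposes.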

16.2. Medical Image Processing

Three-dimensional imaging modalities such as x-ray computed tomography (CT), 3D ultrasound or magnetic resonance (MR) imaging provide important information to the physician. Image reconstruction from projections is a key problem in medical image analysis. The image to be reconstructed should reproduce as closely as possible the measured projections. The reconstruction can be viewed as an optimization process that has to consider the effects of noise in the measured projections.

An important task in medicine is the segmentation of the images into anatomical structures, which can provide information such as the volume of a tumor. Multiple characteristics, so-called features, are used for the segmentation process, such as moments, fractal dimensions, etc. MOEAs were applied for the reconstruction of 3D objects such as the left ventricle 4 and for the selection of an optimal subset of features yielding optimal segmentation results for MR brain images.5

16.2.1. Medical Image Reconstruction

A MOEA has been proposed by Xiaodong Li et al.6 for the CT image reconstruction problem. Three objectives were considered:

• Minimize the sum of the squared errors between the original projection data and the re-projection data:

f_1(x) = (G - Hx)^T (G - Hx)    (1)

where x is the n-dimensional image vector to be reconstructed, H is an m × n projection matrix and G is an m-dimensional vector of measured projection data.

• Maximize the entropy as a measure of the global image smoothness; this is important if the image is contaminated with noise:

f_2(x) = \sum_{j=1}^{n} x_j \log(x_j)    (2)

(minimizing f_2 maximizes the entropy).

• Optimize the local smoothness in the neighborhood N_j of the j-th pixel of the reconstructed image:

f_3(x) = \sum_{j=1}^{n} v(x_j, N_j), \qquad v(x_j, N_j) = \frac{1}{|N_j|} \sum_{x_i \in N_j} (x_i - x_j)^2    (3)
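With all three objectives expressed in a minimization sense, they might be sketched as follows. The matrix H, the data G and the use of a simple 4-neighborhood for the local smoothness term are illustrative assumptions, not details taken from the cited work:

```python
# Illustrative sketch of the three CT reconstruction objectives (Eqs. 1-3).
import numpy as np

def f_projection_error(x, H, G):
    """Eq. 1: squared error between measured and re-projected data."""
    r = G - H @ x
    return float(r @ r)

def f_neg_entropy(x, eps=1e-12):
    """Eq. 2: sum of x_j log x_j; minimizing it maximizes the entropy."""
    return float(np.sum(x * np.log(x + eps)))

def f_local_roughness(img):
    """Eq. 3 variant: mean squared difference of each pixel to its 4-neighbors."""
    d = 0.0
    for shift in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        d += float(np.sum((np.roll(img, shift, axis=(0, 1)) - img) ** 2))
    return d / (4 * img.size)
```

A perfectly uniform image has zero local roughness and maximal entropy, so the smoothness objectives pull against the data-fidelity term f_1.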

For the optimization the MO genetic local search algorithm MOGLS 7 was used. The selection probability P(x^i) for the i-th solution x^i out of the N solutions is given by

P(x^i) = \frac{f_{max} - f(x^i)}{\sum_{j=1}^{N} (f_{max} - f(x^j))}    (4)

where f_{max} is the largest scalarized fitness value in the population and the weight vector w = (w_1, w_2, w_3) defines the scalarized fitness f(x):

f(x) = \sum_{j=1}^{3} w_j f_j(x)    (5)

Initially a small value of the parameter A is used, which increases towards the end of the optimization. A window crossover is used and a local search is applied. The number of steps and the step size are varied during the optimization.

The parameters to be optimized are the grey values of the reconstructed image. A neighborhood is defined for two images using a parameter δ. A large step size δ is used initially for the local search, with only a few optimization steps. The step size decreases and the number of steps increases during the evolution, in order to improve the convergence of the algorithm and the quality of the obtained solutions.

The algorithm can be described by the following steps:

(1) Initialize N_g solutions with random images.
(2) Find the non-dominated set.
(3) For each individual:
    (a) Produce a direction specified by the weights (w_1, w_2, w_3).
    (b) Select a pair of parent solutions using Eq. 4.
    (c) Apply crossover.
    (d) Select the best child from the crossover and apply a local search in the specified direction.
(4) Select N_e elite solutions using Eq. 4 and apply a local search to each of them.
(5) Combine the N_g and N_e solutions.
(6) If the termination conditions are satisfied, stop; else go to step (2).
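The main loop above can be illustrated with a much-simplified MOGLS-style sketch on a toy two-objective problem. The problem, the population size, the arithmetic crossover and the step counts are all assumptions made for demonstration, not the settings of the cited work:

```python
# Toy MOGLS-style loop: minimize ((x-1)^2, (x+1)^2) over a single real gene x.
import random

def objectives(x):
    return ((x - 1.0) ** 2, (x + 1.0) ** 2)

def scalar(fs, w):
    """Weighted-sum scalarization (Eq. 5)."""
    return sum(wi * fi for wi, fi in zip(w, fs))

def select(pop, w, rng):
    """Roulette selection with P(x_i) proportional to f_max - f(x_i) (Eq. 4)."""
    f = [scalar(objectives(x), w) for x in pop]
    fmax = max(f)
    weights = [fmax - fi + 1e-12 for fi in f]
    return rng.choices(pop, weights=weights, k=2)

def mogls(generations=30, pop_size=20, seed=1):
    rng = random.Random(seed)
    pop = [rng.uniform(-3, 3) for _ in range(pop_size)]
    for _ in range(generations):
        w1 = rng.random()                      # random search direction
        w = (w1, 1.0 - w1)
        p1, p2 = select(pop, w, rng)
        child = 0.5 * (p1 + p2)                # arithmetic crossover
        for _ in range(5):                     # local search along direction w
            cand = child + rng.uniform(-0.1, 0.1)
            if scalar(objectives(cand), w) < scalar(objectives(child), w):
                child = cand
        worst = max(pop, key=lambda x: scalar(objectives(x), w))
        pop[pop.index(worst)] = child          # replace the worst in direction w
    return pop
```

Because the weight vector is redrawn every generation, successive local searches push the population toward different parts of the Pareto set (here the interval [-1, 1]).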

The reconstruction results using the Shepp-Logan phantom were compared with the fast backprojection algorithm of Matlab 5.2, with and without random noise added to the data. The authors suggest that the use of MOGLS provides better control of the reconstructed image, especially for noisy data, than the conventional algorithm. The reconstructed image depends on how the single objectives are weighted. The MO approach provides in principle a set of all possible images, out of which the optimal one can be selected.


16.3. Computer Aided Diagnosis

We can define computer-aided diagnosis (CAD) as the diagnosis a physician makes using the output of a computerized analysis of medical data. Such information could be the malignancy likelihood from a breast mammogram. A data set of features from both normal (without disease) and abnormal (with disease) cases is used for "training" the classifier, i.e. determining the classifier parameter values so that it correctly classifies other data sets of unknown pathology. The training of a classifier can be viewed as an optimization process where the quantity to be optimized is the performance of the classifier on an independent data set.

Binary classifiers 8 consider two objective functions: the sensitivity, describing how well they classify the abnormal cases, and the specificity, describing how well they classify the normal cases. There is a trade-off between these two objectives. Traditional methods of classifier training combine these two objective functions, or analogous class performance measures, into a single scalar objective function optimized by SO optimization techniques. Various combination functions are tried until a suitable objective function is found.9 Most classifiers do not aggregate sensitivity and specificity directly, such as artificial neural networks (ANNs) that use a sum-of-squares error function.10 A binary classifier separates two classes of observations and assigns new observations to one of the two classes, the normal (no disease evident) and the abnormal (indicative of disease) class, denoted by p_n and p_a, respectively.

The set of features corresponding to an observation can be expressed as a vector x = [x_1, x_2, ..., x_p]. The space spanned by the feature vector is denoted by S. An automated classifier uses a parameter vector w to partition S completely into two disjoint sets of observations: the set C_n(w) that belongs to class p_n and the set C_a(w) belonging to class p_a, i.e. C_n(w) ∪ C_a(w) = S and C_n(w) ∩ C_a(w) = ∅.

The vector w can represent, for example, the weights of an ANN or the threshold values in a rule-based classifier. Given a measurement x, the classifier assigns x to class p_n if x ∈ C_n(w) or to class p_a if x ∈ C_a(w).

For MO diagnostic classification the members of the Pareto-optimal set correspond to operating points on an optimal receiver operating characteristic (ROC) curve,11 whose performances describe the limiting sensitivity-specificity trade-offs that the classifier can provide for the given training data set.
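The connection between operating points and the Pareto set can be made concrete with a one-parameter threshold classifier: sweeping the threshold yields (sensitivity, specificity) pairs, and the non-dominated pairs trace the ROC operating points. The score data and the threshold grid below are made up for illustration:

```python
# Sketch: ROC operating points as the non-dominated (sensitivity, specificity)
# pairs of a simple threshold classifier. Scores and thresholds are invented.

def sens_spec(threshold, normals, abnormals):
    """Classify score >= threshold as abnormal; return (sensitivity, specificity)."""
    tp = sum(1 for s in abnormals if s >= threshold)
    tn = sum(1 for s in normals if s < threshold)
    return tp / len(abnormals), tn / len(normals)

def roc_points(normals, abnormals, thresholds):
    """Keep only the non-dominated points (both coordinates maximized)."""
    pts = sorted(set(sens_spec(t, normals, abnormals) for t in thresholds))
    return [p for p in pts
            if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in pts)]
```

Each retained point is an achievable trade-off the treating physician can pick from; dominated thresholds are discarded exactly as dominated classifiers are in the MO formulation.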


16.3.1. Optimization of Diagnostic Classifiers

Kupinski et al.11 applied the niched Pareto genetic algorithm NPGA 12 to optimize the performance of two diagnostic classifiers and to generate ROC curves. A linear classifier and an ANN were trained both by conventional methods and by the NPGA. For the NPGA a binary representation was used for the parameter vector w. The optimization was performed over 100 generations. A training set of 100 normal and 100 abnormal cases was used, and for the evaluation a set of 10000 normal and 10000 abnormal cases, respectively. The population size was 500 members. A single-point crossover and standard mutation were used. The crossover and mutation probabilities were 0.3 and 0.05, respectively. These parameters were found to be suitable for the problems studied, even if the crossover rate seems unusually small. The NPGA convergence depends strongly on the tournament size, and a value of four was used. The sharing parameter was 0.1, or 10% of the range of each objective.

For the linear classifier three parameters were optimized. For the training of the ANN, with two inputs, two hidden units and one output, nine parameters were optimized. A very large population of 3000 solutions was used.

For both cases, in general, the results using the NPGA were superior to the results obtained with conventional training methods. The conventional method for the ANN optimization was much more sensitive to local minima than the NPGA. With the NPGA, the tasks of classifier optimization and ROC curve generation are combined into a single task. It was demonstrated that constructing the ROC curve in this way may result in a better ROC curve than is produced by conventional methods.

The optimization with the NPGA requires more computation time than conventional non-stochastic optimization methods. For the conventional ROC curve generation 20 scalar optimizations were used, producing solutions that were not evenly distributed. The NPGA produced a much larger number of uniformly distributed points on the ROC curve. If more classes have to be considered in the classification, the advantage of the NPGA over the conventional methods in terms of optimization time will be even greater.

16.3.2. Rules-Based Atrial Disease Diagnosis

An example of CAD using MOEAs for the diagnosis of paroxysmal atrial fibrillation (PAF) was presented by F. Toro et al.13 This heart arrhythmia most frequently causes embolic events that can generate cerebrovascular accidents. The diagnosis of patients that suffer from PAF used the analysis of electrocardiogram (ECG) traces with no explicit fibrillation episode. This non-invasive examination can be used to decide whether more specific and complex diagnostic testing is required.

A database for PAF diagnosis applications was used that included registers obtained from 25 healthy individuals and 25 patients diagnosed with PAF. The ECG register was described by 48 parameters (p_1 ... p_48) that characterize each subject. The diagnosis was based on weighted threshold-dependent decision rules determined by a MOEA, applied to improve the ability to automatically discriminate the registers of the two groups with a certain degree of accuracy. For each parameter four different decision rules were used:

(1) If p_i < U_i(Low_1) then C_PAF = C_PAF + W_i1
(2) If p_i < U_i(Low_2) then C_PAF = C_PAF - W_i2
(3) If p_i > U_i(High_1) then C_PAF = C_PAF + W_i3
(4) If p_i > U_i(High_2) then C_PAF = C_PAF - W_i4

where the U_i represent different thresholds and the weights W_ij ∈ [0,1]. For the 48 parameters there are a total of 192 weights and 192 threshold parameters. C_PAF is a level that determines the final diagnosis. After a statistical study, a subset of 32 rules and their associated 32 thresholds was selected that maximizes the discrimination power of the classifier.

If the C_PAF level is within a security interval [-F, F], there is not enough certainty about the diagnosis and the case is left undiagnosed. The diagnosis is positive (a PAF patient) if C_PAF > F and negative if C_PAF < -F. The MO procedure uses two optimization objectives, the classification rate CR and the coverage level CL:

(1) CR = number of correctly diagnosed cases / number of diagnosed cases
(2) CL = number of diagnosed cases / total number of cases
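The scoring scheme and the two objectives can be sketched as follows. The rule encoding (parameter index, comparison, threshold, signed weight), the security parameter F and all numbers are illustrative stand-ins for the 48-parameter system described above:

```python
# Hedged sketch of the weighted threshold-rule scoring and the CR/CL objectives.

def c_paf(params, rules):
    """Sum the rule contributions; each rule is (index, op, threshold, delta).
    A negative delta models the subtracting rules (2) and (4)."""
    score = 0.0
    for i, op, u, delta in rules:
        if (op == "<" and params[i] < u) or (op == ">" and params[i] > u):
            score += delta
    return score

def classify(params, rules, F):
    score = c_paf(params, rules)
    if score > F:
        return "PAF"
    if score < -F:
        return "healthy"
    return None                      # inside [-F, F]: left undiagnosed

def cr_cl(cases, labels, rules, F):
    """Classification rate and coverage level over a set of labeled cases."""
    decisions = [classify(c, rules, F) for c in cases]
    diagnosed = [(d, y) for d, y in zip(decisions, labels) if d is not None]
    if not diagnosed:
        return 0.0, 0.0
    cr = sum(1 for d, y in diagnosed if d == y) / len(diagnosed)
    cl = len(diagnosed) / len(cases)
    return cr, cl
```

Widening the security interval raises CR at the cost of CL, which is exactly the trade-off the MOEA explores.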

Two PAF diagnosis cases have been considered:

• Optimization of the weights of decision rules given by an expert. In this case the chromosome length is 32.

• Optimization of the thresholds and weights of the decision rules given by an expert. In this case the chromosome length is 64.

Three MOEA algorithms were tested: the Strength Pareto Evolutionary Algorithm SPEA,14 the Single Front Genetic Algorithm SFGA 15 and the New Single Front Genetic Algorithm NSFGA.16

Mutation and crossover probabilities of 0.01 and 0.6, respectively, were used, and each algorithm evolved for 1000 generations with a population size of 200.

NSFGA, SFGA and SPEA showed a similar performance. The best results were obtained when both thresholds and weights were optimized. The results are similar to results obtained with classic schemes, but the MO optimization leads to multiple solutions; this can be of interest for patients who suffer from other disorders, for whom certain solutions could be more suitable.

16.4. Treatment Planning

Every year, in the United States alone, more than one million patients will be diagnosed with cancer, and more than 500,000 of these will be treated with radiation therapy.17 Cancer cells have a smaller probability of surviving radiation damage than healthy normal cells. The dose is the amount of energy deposited per unit of mass. The physical and biological characteristics of the patient anatomy and of the source, such as intensity and geometry, are used for the calculation of the dose function, i.e. the absorbed dose at a point in the treatment volume. The dose distribution specifies the corresponding three-dimensional non-negative scalar field.

A dose distribution is realizable if there is a source distribution which is able to generate it. A physician prescribes the so-called desired dose function, i.e. the absorbed dose as a function of the location in the body. The objectives of dose optimization are:

• Deliver a sufficiently high dose in the Planning Target Volume (PTV), which includes, besides the Gross Tumor Volume (GTV), an additional margin accounting for position inaccuracies, patient movements, etc.

• Protect the surrounding normal tissue (NT) and organs at risk (OARs) from excessive radiation. The dose should be smaller than a critical dose D_cr specific for each OAR.

Radiation oncologists use for the evaluation of the dose distribution quality a cumulative dose-volume histogram (DVH) for each structure (PTV, NT or OARs), which displays the fraction of the structure that receives at least a specified dose level. The objectives are called DVH-based objectives if they are expressed in terms of DVH-related values.

The determination of the dose distribution for a given source distribution, the so-called forward problem, is always possible and a unique solution exists. The inverse problem, i.e. the determination of the source distribution for a given dose distribution, is not always solvable, or the solution is not unique. Optimization algorithms are therefore used to minimize the difference between the desired and the obtained dose function.

16.4.1. Brachytherapy

High dose rate (HDR) brachytherapy is a treatment method for cancer where empty catheters are inserted within the tumor volume. A single 192Ir source is moved inside the catheters at discrete positions (source dwell positions, SDPs) using a computer-controlled machine.

The dose optimization problem considers the determination of the n dwell times (or simply weights) for which the source is at rest and delivers radiation at each of the n dwell positions, resulting in a dose distribution which is as close as possible to the desired dose function. The range of n varies from 20 to 300. If the positions and number of catheters and the SDPs are given after the implantation of the catheters, we term the process post-planning. The optimization process to obtain an optimal dose distribution is called dose optimization.

The additional determination of an optimal number of catheters and their positions, so-called inverse planning, is important, as a reduction of the number of catheters simplifies the treatment plan in terms of time and complexity, reduces the possibility of treatment errors and is less invasive for the patient. Dose optimization can be considered as a special type of inverse planning where the positions and number of catheters and the SDPs are fixed.

16.4.1.1. Dose Optimization for High Dose Rate Brachytherapy

MO dose optimization for HDR brachytherapy was first applied by Lahanas et al.18 using NPGA,12 NSGA 19 and NRGA 20 with a real encoding for the SDP weights. A number of 3-5 DVH-derived objectives, depending on the number of OARs, was used. The results were superior to optimization results obtained with a commercial treatment planning system.

More effective was the application of SPEA 21 using dose variance-based objectives, which enables support from deterministic algorithms that provide 10-20 solutions with which the population is initialized. A faster optimization than with SPEA was possible using NSGA-II.22,23 Both SPEA and NSGA-II require the support of a deterministic algorithm, which significantly improves the optimization results and the convergence speed. Pareto global optimal solutions can be obtained with L-BFGS, which allows one to evaluate the performance of MOEAs for the HDR dose optimization problem. The optimization with 100 individuals and 100 generations requires less than one minute, which is the time required to obtain a single solution with simulated annealing.

NSGA-II was used for dose optimization with DVH-based objectives, for which deterministic algorithms cannot be used, as multiple local minima exist.25 The DVH-based objectives provide a larger spectrum of solutions than the dose variance-based objectives.

The archiving method of PAES 26 was included, and the algorithm was supported by L-BFGS solutions using variance-based objectives.23 An SBX crossover 27 and polynomial mutation 28 were used. Best results were obtained for a crossover probability in the range 0.7-1.0 and a mutation probability of 0.001-0.01.
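SBX crossover and polynomial mutation are standard real-coded operators due to Deb and colleagues; a minimal sketch follows, with illustrative distribution indices and bounds (not the settings of the cited work):

```python
# Sketch of simulated binary crossover (SBX) and polynomial mutation
# for a single real-valued gene.
import random

def sbx(p1, p2, eta=15.0, rng=random):
    """Return two children of real-valued parents p1, p2 (mean-preserving)."""
    u = rng.random()
    if u <= 0.5:
        beta = (2.0 * u) ** (1.0 / (eta + 1.0))
    else:
        beta = (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))
    c1 = 0.5 * ((1.0 + beta) * p1 + (1.0 - beta) * p2)
    c2 = 0.5 * ((1.0 - beta) * p1 + (1.0 + beta) * p2)
    return c1, c2

def poly_mutate(x, low, high, eta=20.0, rng=random):
    """Perturb x inside [low, high] with a polynomial distribution."""
    u = rng.random()
    if u < 0.5:
        delta = (2.0 * u) ** (1.0 / (eta + 1.0)) - 1.0
    else:
        delta = 1.0 - (2.0 * (1.0 - u)) ** (1.0 / (eta + 1.0))
    return min(high, max(low, x + delta * (high - low)))
```

A larger distribution index eta concentrates the children (or the mutant) closer to the parents, which is why both operators behave like tunable local perturbations.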

16.4.1.2. Inverse Planning for HDR Brachytherapy

The NSGA-II algorithm was applied to the HDR brachytherapy inverse planning problem,29 where the optimal positions and number of catheters have to be found in addition to the dwell position weights of the selected catheters.

A two-component chromosome is used. The first part, W, contains the dwell weight of each SDP for each catheter in a double-precision floating-point representation. The second part, C, is a binary string which represents which catheters have been selected: the so-called active catheters.

The inverse planning algorithm is described by the following steps:

(1) Determine geometrically the set of all allowed catheters.
(2) Initialize individuals with solutions from a global optimization algorithm.
(3) Perform a selection based on constrained domination ranking.
(4) Perform an SBX crossover for the SDP weights chromosome and a one-point crossover for the catheter chromosome, with rescaled dwell times.
(5) Perform a polynomial mutation for the SDP weights chromosome and a flip mutation for the catheter chromosome, with rescaled dwell times.


(6) Perform a repair mechanism to keep the number of used catheters of each solution within a given range.
(7) Reset the scaling according to the number of active SDPs.
(8) Evaluate the dosimetry of each individual.
(9) If the termination criteria are satisfied, output the set of non-dominated archived solutions; else go to (3).

Inverse planning considers a range of solutions with different numbers of active SDPs. Therefore, the dwell weights of the parents are divided by the number of active SDPs before crossover, to be independent of this number. After mutation, the weights of each offspring are multiplied by the number of SDPs in the active catheters encoded in the C chromosome.
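The two-component chromosome and the rescaling step can be sketched as follows; the chromosome sizes, the helper names and the one-point catheter crossover shown are illustrative assumptions:

```python
# Sketch of the two-component chromosome (W: dwell weights, C: active
# catheters) and the dwell-weight rescaling around crossover/mutation.

def active_sdp_count(catheter_bits, sdps_per_catheter):
    """Number of active SDPs encoded by the binary catheter string C."""
    return sum(catheter_bits) * sdps_per_catheter

def normalize(weights, n_active):
    """Divide dwell weights by the number of active SDPs (before crossover)."""
    return [w / n_active for w in weights]

def rescale(weights, n_active):
    """Multiply dwell weights by the number of active SDPs (after mutation)."""
    return [w * n_active for w in weights]

def one_point_crossover(c1, c2, point):
    """One-point crossover for the binary catheter chromosome."""
    return c1[:point] + c2[point:], c2[:point] + c1[point:]
```

The normalize/rescale pair makes the recombined weights comparable between parents whose catheter chromosomes, and hence active SDP counts, differ.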

For dose optimization and inverse planning, decision making (DCM) tools are necessary to filter a single solution 30,31 from the non-dominated set that best matches the goals of the treatment planner. Dose optimization and inverse planning with MOEAs, together with DCM tools, were implemented in the commercial real-time HDR prostate planning system SWIFT™ (Nucletron B.V., Veenendaal, The Netherlands), and patients are now treated with this system. A display table is provided with the objective values, the DVHs for all OARs, the NT and the PTV of each solution. Other parameters are D_90 (the dose that covers 90% of the PTV), V_150 (the percentage of the PTV that receives more than 150% of the prescription dose) and the extreme dose values. The entire table for every such quantity can be sorted, and solutions can be selected and highlighted by the treatment planner. Constraints can be applied, such as showing only solutions with a PTV coverage, i.e. the percentage of the PTV that receives at least 100% of the prescription dose, larger than a specified value. Solutions that do not satisfy the constraints are removed from the list. This reduces the number of solutions and simplifies the selection of an optimal solution. The DVHs of all selected solutions can be displayed and compared, see Fig. 16.1.

Another decision-making tool is the projection of the Pareto front onto a pair of selected objectives. For M objectives the number of such projections is M(M-1)/2.

The positions of selected solutions can be seen in these projections. This helps to identify their position in the multidimensional Pareto front and to quantify the degree of correlation between the objectives and the possibilities provided by the non-dominated set. The Pareto front provides information such as the range of values for each objective and the trade-off between other DVH-derived quantities, see Fig. 16.2.

Fig. 16.1. Example of DVHs (a) for the PTV and (b) for the urethra of a representative set of non-dominated solutions. A single solution selected by a treatment planner is shown.

Fig. 16.2. Example of a trade-off between the percentage of the PTV that is covered at least with the prescribed dose, DVH(D_ref), and the percentage of volume with a dose higher than a critical dose limit, (a) for the urethra and (b) for the rectum. For the urethra a rapidly increasing fraction receives an overdosage as the coverage of the PTV increases above 80%.

With MOEAs the best possible solution can be obtained, considering the objective functions and the implant geometry, and this increases the probability of treatment success.

16.4.2. External Beam Radiotherapy

In external beam radiotherapy, or teletherapy, high-energy photon beams are emitted from a source on a rotating gantry, with the patient placed so that the tumor is at the center of the rotation axis. Haas et al.32,33 proposed the use of MOEAs for the solution of the two main problems in radiotherapy treatment planning: first, find an optimal number of beams and their orientations, and second, determine the optimum intensity distribution for each beam. Both problems are considered separately. A beam configuration is selected based on experience or using geometric methods. Then the intensity distributions of these beams are optimized. In the last few years mostly SO algorithms have been proposed for the simultaneous solution of both problems.

16.4.2.1. Geometrical Optimization of Beam Orientations

The aim of beam orientation optimization is to find a configuration of beams such that a desired dose distribution can be achieved. A single beam would deposit a very high dose in the NT. Using more beams it is possible to increase the dose in the tumor while keeping the dose in the surrounding healthy tissue at a sufficiently low level, but the treatment complexity increases.

The idea of using geometrical considerations in the cost function was first proposed by Haas et al.,34 using the NPGA to obtain an optimum beam configuration. Simplifications such as a limitation to 2D, using the most representative 2D computed tomography slice in the plan, have been used. The geometric objective functions to be minimized are:

(1) The difference between the area where all M beams overlap and the area of the PTA:

f_{PTA} = area(B_1 \cap B_2 \cdots \cap B_M) - area(PTA)    (6)

(2) The overlap area between each beam and the j-th OAR:

f_{OAR_j} = \sum_{i=1}^{M} \beta_i \frac{area(B_i \cap OAR_j)}{M}    (7)

\beta_i = \begin{cases} \beta(S_{PTA} - S_{OAR}) & \text{if } S_{OAR} < S_{PTA} \\ 1 & \text{if } S_{OAR} \ge S_{PTA} \end{cases}    (8)

where S_{PTA} and S_{OAR} are the distances shown in Fig. 16.3 and β is a parameter that favors beam entry points further away from OARs.

(3) The overlap from pairwise beam intersections, to minimize hot spots:

f_{NT} = \sum_{i=1}^{M-1} \sum_{j=i+1}^{M} area(B_i \cap B_j)    (9)
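One simple way to make these geometric objectives computable is to rasterize the beams, the PTA and the OARs as sets of grid cells, so that areas and overlaps reduce to set operations. This discretization and all names below are illustrative, not the method of the cited work:

```python
# Mask-based sketch of the geometric beam objectives (Eqs. 6-9): structures
# are sets of grid-cell indices, so area(A ∩ B) becomes len(A & B).

def f_pta(beams, pta):
    """Eq. 6: overlap area of all beams minus the PTA area."""
    common = set.intersection(*beams)
    return len(common) - len(pta)

def f_oar(beams, oar, betas):
    """Eq. 7: weighted mean overlap of each beam with one OAR."""
    m = len(beams)
    return sum(b * len(beam & oar) / m for b, beam in zip(betas, beams))

def f_nt(beams):
    """Eq. 9: sum of pairwise beam overlaps (hot spots in normal tissue)."""
    m = len(beams)
    return sum(len(beams[i] & beams[j])
               for i in range(m - 1) for j in range(i + 1, m))
```

On a fine enough grid the cell counts approximate the continuous areas used in the cost function.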


Fig. 16.3. Geometric parameters used for the solution of the beam orientation problem. The gantry angle θ of a field (beam) is shown. The patient body is shown, including the normal tissue (NT), one organ at risk (OAR) and the planning target area (PTA), which includes the tumor.

An example of the geometry of a radiation field and the parameters used by Haas et al. is shown in Fig. 16.3.

An integer representation for the beam gantry angles was used. The length of each chromosome is equal to the number of beams involved in the plan. A particular solution, i.e. a chromosome, is represented as a vector C = (θ_1, ..., θ_M), where θ_i is the i-th individual beam gantry angle.

For the integer representation an intermediate recombination is used,35 such that the parents C_P1 and C_P2 produce the offspring C_O:

C_O = round(C_{P1} + \gamma (C_{P2} - C_{P1}))    (10)

where γ is a random number in the interval [-0.25, 1.25].36 A mutation operator is used to introduce new beam angles into the population by generating integers that lie in the range [0, ..., 359°].

Important was the inclusion of problem-specific operators which attempt to replicate the approach followed by experienced treatment planners. One such operator is used to generate k equispaced beams, as this distribution reduces the area of overlap between the beams. One gantry angle from a particular chromosome is selected randomly, and the k - 1 remaining beams are positioned evenly. A further mutation operator is used to perform a local search by shifting one of the selected beam gantry angles randomly by a small amount (less than 15°).
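The intermediate recombination of Eq. 10 and the equispaced-beams operator might be sketched as follows; the modulo-360 wrap-around and the RNG handling are illustrative choices of this sketch (the cited work generates mutated angles directly in [0, 359°]):

```python
# Sketch of the integer-coded gantry-angle operators.
import random

def recombine(cp1, cp2, rng=random):
    """Intermediate recombination (Eq. 10) with gamma in [-0.25, 1.25]."""
    gamma = rng.uniform(-0.25, 1.25)
    return [round(a + gamma * (b - a)) % 360 for a, b in zip(cp1, cp2)]

def equispace(chromosome, rng=random):
    """Keep one randomly chosen angle and space the k-1 other beams evenly.
    Exact only when k divides 360; otherwise the spacing is approximate."""
    k = len(chromosome)
    start = rng.choice(chromosome)
    return [(start + i * 360 // k) % 360 for i in range(k)]
```

Equispacing imitates a planner's habit of spreading entry points around the patient, which minimizes the pairwise beam overlap penalized by the f_NT objective.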


16.4.2.2. Intensity Modulated Beam Radiotherapy Dose Optimization

In IMRT each beam is divided into a number of small beamlets (bixels), see Fig. 16.4. The intensity of each beamlet can be individually adjusted. A sparse dose matrix is precalculated and contains the dose value at each sampling point from each bixel with a unit radiation intensity. The intensity (weight) of each beamlet has to be determined such that the produced dose distribution is "optimal". The number of parameters can be as large as 10000.

Fig. 16.4. Principle of IMRT dose optimization. The contours of the body, the PTV and one OAR are shown. The problem is to determine the intensities of the tiny subdivisions (bixels) of each beam so that the resulting dose distribution is optimal.

Lahanas et al used the NSGA-IIc 37 algorithm for the optimization of the intensity distribution in IMRT, where the orientation and the number of beams are fixed. 38 The dose variance-based objective functions are: for the PTV the dose variance f_PTV around the prescription dose D_ref, for the NT the sum of the squared dose values f_NT, and for each OAR the variance f_OAR for dose values above a specific critical dose value D_cr^OAR.

f_PTV = (1/N_PTV) Σ_{j=1}^{N_PTV} (d_j^PTV − D_ref)^2   (11)

f_NT = (1/N_NT) Σ_{j=1}^{N_NT} (d_j^NT)^2   (12)


380 M. Lahanas

f_OAR = (1/N_OAR) Σ_{j=1}^{N_OAR} H(d_j^OAR − D_cr^OAR) (d_j^OAR − D_cr^OAR)^2   (13)

H(x) is the Heaviside step function. d_j^PTV, d_j^NT and d_j^OAR are the calculated dose values at the j-th sampling point for the PTV, the NT and each OAR, respectively. N_PTV, N_NT and N_OAR are the corresponding numbers of sampling points. Depending on the number of OARs, we have 3-6 objectives. For this multidimensional problem it was necessary to use supported solutions 39, i.e. solutions initialized by another optimization algorithm. Even if constraints can be used for some of the objectives, a large number of non-dominated solutions is required to obtain a representative set of the multidimensional Pareto front. An archive was used, similar to the PAES algorithm, in which all non-dominated solutions are stored. This keeps the population size in the range 200-500 and the optimization time below one hour. Tests showed that NSGA-IIc and SPEA alone are not able to produce high-quality solutions: only a very small local Pareto-optimal front is found, far away from the very extended global Pareto front that can be obtained by the gradient-based optimization algorithm L-BFGS. 24
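Given arrays of sampled doses, the three objectives of Eqs. (11)-(13) are straightforward to evaluate; a NumPy sketch (array and function names are illustrative, not from the source):

```python
import numpy as np

def imrt_objectives(d_ptv, d_nt, d_oar, D_ref, D_cr_oar):
    """Dose variance-based objectives for the PTV, the NT and one OAR."""
    f_ptv = np.mean((d_ptv - D_ref) ** 2)   # Eq. (11): variance around D_ref
    f_nt = np.mean(d_nt ** 2)               # Eq. (12): squared NT doses
    excess = d_oar - D_cr_oar
    # Eq. (13): only doses above the critical value D_cr contribute
    f_oar = np.mean(np.heaviside(excess, 0.0) * excess ** 2)
    return f_ptv, f_nt, f_oar
```

In the real problem each dose array is the product of the precalculated sparse dose matrix with the bixel weight vector, so these objectives are cheap to re-evaluate for every candidate intensity distribution.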

Strong correlations exist between the optimization parameters, which may explain the efficiency of the L-BFGS algorithm, which uses gradient information not available to the genetic algorithms. Using a fraction of solutions initialized by L-BFGS and an arithmetic crossover, NSGA-IIc is able to produce a representative set of non-dominated solutions in less time than is required by running the L-BFGS algorithm sequentially, each time with a different set of importance factors. Previous methods used in IMRT include simulated annealing 40, which is very slow, iterative approaches and filtered back-projection 41.

The large number of objectives and the non-linear mapping from decision to objective space require a very large number of solutions to obtain a representative non-dominated set with single-objective (SO) optimization algorithms. 44

The benefit of using MOEAs is the information about the trade-offs between the objectives, which is essential for selecting an optimal solution.

E. Schreibmann et al 45 applied NSGA-IIc to IMRT inverse planning. The user specifies a minimum and maximum number of beams, usually 3-9, to be considered. Constraints can be applied by using the constraint domination relation. A two-component chromosome is used, with one part for weights and one for beams, similar to inverse planning in brachytherapy. After mutation, L-BFGS is applied for 30 iterations to optimize the intensity


distributions of each solution. The number of iterations increases during the evolution. Clinically acceptable results can be obtained in one hour. More than 5000 archived solutions are obtained after 200 generations using a population size of 200 solutions. Arithmetic crossover is used with a random mixing parameter α ∈ [0, 1], together with a flip mutation. Mutation and crossover probabilities of 0.01 and 0.9, respectively, are used.

16.4.3. Cancer Chemotherapy

Petrovski et al 46 applied MOEAs to the cancer chemotherapy treatment problem. Anti-cancer drugs are given to a patient in n doses at times t_1, …, t_n. Each dose is a cocktail of d drugs characterized by the concentrations C_ij, i = 1, …, n, j = 1, …, d. The problem is the optimization of the concentrations C_ij.

The response of the tumor to the chemotherapy treatment is modelled analytically by:

dN/dt = N(t) [ λ ln(Θ/N(t)) − Σ_{j=1}^{d} κ_j Σ_{i=1}^{n} C_ij (H(t − t_i) − H(t − t_{i+1})) ]   (14)

where N(t) is the number of tumor cells at time t, λ and Θ are tumor growth parameters, H(t) is the Heaviside step function and κ_j denotes the efficacy of the j-th anticancer drug.
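Equation (14) can be integrated numerically to evaluate a candidate schedule; a rough forward-Euler sketch (the step size, the positivity floor on N and all parameter values used below are my assumptions, not from the source):

```python
import math

def tumour_size(N0, lam, theta, kappa, C, times, dt=0.01):
    """Integrate dN/dt = N [ lam*ln(theta/N) - sum_j kappa_j * C_ij ],
    where dose i is active on [times[i], times[i+1]); times has n+1 entries."""
    n, d = len(C), len(C[0])
    N, t = N0, times[0]
    while t < times[-1]:
        kill = 0.0
        for i in range(n):  # H(t - t_i) - H(t - t_{i+1}) selects the active dose
            if times[i] <= t < times[i + 1]:
                kill = sum(kappa[j] * C[i][j] for j in range(d))
        dN = N * (lam * math.log(theta / N) - kill)
        N = max(N + dN * dt, 1e-9)  # keep N positive for the logarithm
        t += dt
    return N
```

With κ set to zero the model reduces to pure Gompertz growth toward Θ; a sufficiently concentrated drug cocktail makes the bracket negative and the tumour shrinks.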

The objectives of the MO optimization are:

(1) Maximization of tumor eradication:

f_1(c) = ∫_{t_1}^{t_n} ln(Θ/N(t)) dt   (15)

(2) Prolongation of the patient survival time T:

f_2(c, t_1, …, t_n) = T   (16)

The toxic nature of the drugs imposes limits on the single and combined concentrations during the treatment. The concentrations C_ij have to satisfy various constraints:

• Maximum instantaneous dose C_max for each drug.

g_1(c) = {C_max,j − C_ij ≥ 0, ∀i ∈ [1, …, n], ∀j ∈ [1, …, d]}   (17)


• Maximum cumulative dose C_cum for each drug.

g_2(c) = {C_cum,j − Σ_{i=1}^{n} C_ij ≥ 0, ∀j ∈ [1, …, d]}   (18)

• Maximum allowed tumor size.

g_3(c) = {N_max − N(t_i) ≥ 0, ∀i ∈ [1, …, n]}   (19)

• Restriction of the toxic side effects of the chemotherapy.

g_4(c) = {C_s-eff,k − Σ_{j=1}^{d} η_kj C_ij ≥ 0, ∀i ∈ [1, …, n], ∀k ∈ [1, …, m]}   (20)

η_kj represents the risk of damaging the k-th organ or tissue by the j-th drug.

SPEA was used for the MO chemotherapy treatment optimization with a maximum of 10000 generations. The crossover probability was 0.6 and a large mutation probability of 0.1 was used. The population size was N = 50 and the external archive size of SPEA was 5. A binary encoding was used for the decision variables C_ij. Each individual is represented by n d-dimensional concentration vectors, each component encoded with 4 bytes, corresponding to 25 possible concentration units for each drug.

For the optimization the constraints are added to the objective functions as penalties:

Σ_{j} P_j max^2(−g_j(c), 0)   (21)

where the P_j are penalty parameters.

For a breast cancer chemotherapy treatment case, d = 3 drugs were

considered (Taxotere, Adriamycin and Cisplatinum), using n = 10 doses. New treatment scenarios were found by the application of SPEA. The representative set contains a number of treatment strategies, some of which were not found by SO optimization algorithms. This provides the therapists with a larger repertoire of treatment strategies, out of which the most suitable for a given case can be used.
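The four constraint families g1-g4 and their penalty aggregation can be sketched as follows (function names and the "smallest slack per family" summary are illustrative choices; a negative slack means the family is violated):

```python
def constraint_violations(C, N_traj, C_max, C_cum, N_max, C_seff, eta):
    """Smallest slack of each constraint family g1-g4 (Eqs. (17)-(20)).
    C[i][j]: concentration of drug j in dose i; N_traj[i]: tumour size at t_i;
    eta[k][j]: risk of drug j to organ/tissue k."""
    n, d, m = len(C), len(C[0]), len(eta)
    g1 = min(C_max[j] - C[i][j] for i in range(n) for j in range(d))
    g2 = min(C_cum[j] - sum(C[i][j] for i in range(n)) for j in range(d))
    g3 = min(N_max - N_traj[i] for i in range(n))
    g4 = min(C_seff[k] - sum(eta[k][j] * C[i][j] for j in range(d))
             for i in range(n) for k in range(m))
    return [g1, g2, g3, g4]

def penalty(gs, P):
    """Penalty term in the style of Eq. (21): sum of P_j * max(-g_j, 0)^2."""
    return sum(p * max(-g, 0.0) ** 2 for p, g in zip(P, gs))
```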

16.5. Data Mining

Knowledge Discovery can be seen as the process of identifying novel, useful and understandable patterns in large data sets.


The goal of classification 47 is to predict the value (the class) of a user-specified goal attribute based on the values of other attributes, the so-called predicting attributes. Classification rules can be considered a particular kind of prediction rules, where the rule antecedent (the "IF part") contains a combination, typically a conjunction, of conditions on predicting attribute values, and the rule consequent (the "THEN part") contains a predicted value for the goal attribute.

Complete classification may be infeasible when there is a very large number of class attributes. Partial classification, also known as nugget discovery, seeks patterns that represent a "strong" description of a particular class. The consequent is fixed to be a particular named class. Given a record t, antecedent(t) is true if t satisfies the predicate antecedent. Similarly, consequent(t) is true if t satisfies the predicate consequent. The subsets defined by the antecedent or consequent are the sets of records for which the relevant predicate is true. Three sets of records are defined 47:

A = {t ∈ D | antecedent(t)}, i.e. the set of records defined by the antecedent,

B = {t ∈ D | consequent(t)}, i.e. the set of records defined by the consequent,

C = {t ∈ D | antecedent(t) ∧ consequent(t)}.

The cardinalities of these sets are a, b and c respectively. The confidence conf(r) and the coverage cov(r) of a rule r are:

• conf(r) = c/a
• cov(r) = c/b

A strong rule may be defined as one that meets certain confidence and coverage thresholds, normally set by a user.
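Using the notation above, confidence and coverage are easy to compute from the record sets; a sketch (passing the predicates as callables is my choice, not the paper's):

```python
def conf_cov(records, antecedent, consequent):
    """Confidence c/a and coverage c/b of a rule, with
    a = |{t : antecedent(t)}|, b = |{t : consequent(t)}|,
    c = |{t : antecedent(t) and consequent(t)}|."""
    a = sum(1 for t in records if antecedent(t))
    b = sum(1 for t in records if consequent(t))
    c = sum(1 for t in records if antecedent(t) and consequent(t))
    return c / a, c / b

# Hypothetical records: rule "IF age >= 30 THEN cls = 1"
records = [{"age": 30, "cls": 1}, {"age": 50, "cls": 0},
           {"age": 40, "cls": 1}, {"age": 20, "cls": 0}]
conf, cov = conf_cov(records, lambda t: t["age"] >= 30,
                     lambda t: t["cls"] == 1)
# a = 3, b = 2, c = 2 -> conf = 2/3, cov = 1.0
```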

16.5.1. Partial Classification

de la Iglesia et al 47 used NSGA-II for nugget discovery. An alternative algorithm, ARAC, which can deliver the globally Pareto-optimal front of all partial classification rules above a specified confidence/coverage threshold, was used for the analysis of the NSGA-II results. The objectives used are conf(r) and cov(r).

The antecedent comprises a conjunction of Attribute Tests (ATs). A binary-encoded string is used to represent the solution as a conjunctive rule. The first part of the string represents the m numeric attributes. Each numeric attribute is represented by a set of Gray-coded lower and upper limits


using 10 bits. For all attributes, the maximum and minimum values are calculated and stored when the data is loaded.

The second part of the string represents the categorical attributes, with as many bits for each attribute as the number of distinct values the categorical attribute can take. If a bit assigned to a categorical attribute is set to 0, then the corresponding label is included as an inequality in one of the conjuncts.

To evaluate a solution, the bit string is first decoded and the data in the database is scanned. For a database with n attributes, the ATs for nominal attributes can be expressed in various forms, such as:

AT_j = v, where v is a value from the domain of AT_j, for some 1 ≤ j ≤ n. A database record x meets this simple value test if x[AT_j] = v.

AT_j ≠ v, for some 1 ≤ j ≤ n. A record x meets this inequality test if x[AT_j] ≠ v.

A decoded bit string corresponds to the following format:

IF l_1 ≤ AT_1 ≤ u_1 ∧ … ∧ l_m ≤ AT_m ≤ u_m ∧ AT_{m+1} ≠ l_{m+1} ∧ … ∧ AT_n ≠ l_n THEN c

where l_1 is given by the first p bits of the binary string, u_1 is given by the following p bits, etc. If the lower limit for the i-th attribute is set to its lowest possible attribute value, or the upper limit is set to its highest possible value, then there is no need to include that limit in the decoded nugget. If a categorical attribute has a value of 1 for all the bits allocated to its labels, then there is no need to include the attribute. For each record the values of the fields are compared against the nuggets, and the class is also compared. The values c and a are updated accordingly, whereas b and d are evaluated at the data-loading stage. Once all the records have been examined, the coverage and confidence are calculated for each nugget.
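The Gray-coded limit decoding described above might look like this (helper names are mine; the conversion is the standard Gray-to-binary transform):

```python
def gray_to_int(bits):
    """Decode a Gray-coded bit list (MSB first) to an integer:
    b_0 = g_0, b_i = b_{i-1} XOR g_i."""
    value = bits[0]
    out = [value]
    for g in bits[1:]:
        value ^= g
        out.append(value)
    return int("".join(map(str, out)), 2)

def decode_limit(bits, lo, hi):
    """Map p Gray-coded bits onto the attribute range [lo, hi]."""
    scale = (hi - lo) / (2 ** len(bits) - 1)
    return lo + gray_to_int(bits) * scale
```

With p = 10 bits each limit takes one of 1024 evenly spaced values between the attribute's stored minimum and maximum.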

Initialization of the population with random solutions produced many solutions of very poor quality. The performance of NSGA-II improved if all solutions were initialized with the default rule with some of the bits mutated. The default rule is the rule that predicts the class without any pre-conditions.

Good performance for NSGA-II was obtained for a crossover probability in the range 0.6-0.8. The results are sensitive to the mutation probability, and a value of 0.02 was optimal. The optimization requires 80-100 generations using a population size of 120-140 solutions.

The algorithm was applied, among other datasets, to a Contraceptive Method database, a subset of a contraceptive prevalence survey. The samples are married women who were either not pregnant or did not know if


they were at the time of the interview. The problem is to predict the current contraceptive method choice (42% "no use", 23% "long-term methods", 35% "short-term methods") of a woman based on demographic and socio-economic characteristics. The results of NSGA-II were compared with results obtained with ARAC, which is able to find all non-dominated solutions, and thus the globally Pareto-optimal front. For large databases the computational time of ARAC increases rapidly, and its complexity is such that the actual computational time is unknown in advance. The classification time of the MOEA is proportional to the database size, but the number of rules found is limited by the population size. The results showed that NSGA-II reproduced fairly well the non-dominated front obtained by ARAC.

For large databases, a MOEA can be used to find a good approximation of the Pareto-optimal set of rules. ARAC can be used to find an initial set of rules to be used for the initialization of the MOEA; this initial set may be constrained, but the MOEA can then be used to drive the search further without constraints. NSGA-II and ARAC can thus be used in combination for knowledge discovery in large databases for the partial classification task.

16.5.2. Identification of Multiple Gene Subsets

Data mining has proven to be an important tool in DNA microarray data analysis by uncovering patterns and relationships in gene expression data. Microarrays have revolutionized the way in which researchers analyze gene expression patterns. It is possible to screen a large number of genes and to observe their activity under various conditions. This gene-expression profiling is expected to revolutionize cancer diagnosis.

Reddy and Deb 48 applied the NSGA-II algorithm for the identification of gene subsets for the classification of samples.

Microarray data for three types of cancer (leukemia, lymphoma and colon) were analyzed. The data sets contain expression levels for a few thousand genes. The data is divided into a training and a test subset, and the objectives are:

(1) Minimize the gene subset size.
(2) Minimize the number of misclassifications in the training set.
(3) Minimize the number of misclassifications in the test set.

A binary string was used, the length of which corresponds to the number of genes to be considered. Genes with a corresponding bit set to 1 are


included in the gene subset of the individual. The population is initialized with only 10% of the bits set to 1. A single-point crossover and a bit-wise mutation operator were used.
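A minimal sketch of this gene-subset encoding and its variation operators (illustrative names, not the authors' implementation):

```python
import random

def init_individual(n_genes, frac=0.1):
    """Binary gene mask; roughly 10% of the bits are set to 1 initially."""
    return [1 if random.random() < frac else 0 for _ in range(n_genes)]

def one_point_crossover(a, b):
    """Single-point crossover of two gene masks."""
    cut = random.randrange(1, len(a))
    return a[:cut] + b[cut:], b[:cut] + a[cut:]

def bit_mutation(ind, rate=0.001):
    """Bit-wise mutation: flip each bit with a small probability."""
    return [bit ^ 1 if random.random() < rate else bit for bit in ind]

def subset(ind, gene_ids):
    """Genes whose bit is 1 form the gene subset of the individual."""
    return [g for g, bit in zip(gene_ids, ind) if bit]
```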

For the leukemia case 6817 genes are considered, and a population size of 1000 individuals with 2000 generations was used. The lymphoma and colon data sets have 4026 and 2000 genes, respectively.

The results showed that a 100% classification for leukemia and lymphoma can be obtained with only a few genes. A similar result is obtained for the colon data set, with a smaller classification rate.

The NSGA-II algorithm was modified to accumulate solutions that, although having different phenotypes, have identical objective values. This multi-modal NSGA-II version discovered, for example, 630 different three-gene combinations for the leukemia set that achieve a perfect classification.

A smaller number of genes was found to be frequently used in the various combinations; their role has to be examined from a biological point of view. The analysis shows that MOEAs are able to provide subsets of genes that are important for a high level of classification.

16.6. Conclusions

In medicine, for image reconstruction problems, classification and CAD, MOEAs provide a range of solutions out of which an optimal solution can be selected. These selected solutions are in general better than solutions obtained by SO optimization algorithms. The advantage of MOEAs over conventional scalar-objective optimization is more pronounced for complex problems with many objectives, where a very large number of SO optimization runs is necessary to obtain a representative non-dominated set. The mapping from decision to objective space causes the solutions produced by conventional methods to be clustered and not necessarily uniformly distributed over the entire Pareto front. MOEAs can produce more uniformly distributed solutions, and solutions in regions not accessible by conventional convex weighted scalar optimization.

MOEAs are now used increasingly in radiotherapy treatment planning in clinical practice, especially in HDR brachytherapy. The possibility exists to use MOEAs for low dose rate (LDR) brachytherapy treatment, where currently SO genetic algorithms or evolutionary MO methods guided by artificial intelligence are used. 3 MOEAs have been applied to IMRT dose optimization and inverse planning, where the number of parameters is very


large. MOEAs supported by deterministic algorithms can provide a representative set of non-dominated solutions of clinically acceptable quality. Inverse planning with MOEAs determines optimal beam directions and numbers of fields for specific types of cancer.

MO optimization determines a representative set of the entire, sometimes unexpectedly complex, Pareto front. While the optimization aspects have been discussed in detail, the decision-making process for the selection of an optimal solution is not considered in most of the presented MOEA applications. For HDR brachytherapy treatment planning with MOEAs, tools have been developed that allow the planner to determine an optimal solution via visualization methods.

For some high-dimensional problems MOEAs alone fail to produce sufficiently good solutions: the solutions are far from the global Pareto-optimal front, and even with a large number of generations the population converges prematurely. Initialization of the population with solutions provided by other methods, and the inclusion of domain knowledge, significantly improves the performance of MOEAs. Such is the case for the hybrid algorithms applied in IMRT, which requires the optimization of as many as 5000 or more parameters. In cooperation with deterministic gradient-based optimization algorithms, they produce clinically acceptable solutions sufficiently fast.

References

1. C. A. Coello Coello, D. A. Van Veldhuizen and G. B. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems, (Kluwer Academic Publishers, New York, 2002).

2. C. A. Pena-Reyes and M. Sipper, Evolutionary Computation in Medicine: An Overview, Artificial Intelligence in Medicine 19, 1-23 (2000).

3. Y. Yu, Multiobjective decision theory for computational optimization in radiation therapy, Med. Phys. 24, 1445-1454 (1997).

4. J. Aguilar and P. Miranda, Resolution of the left ventricle 3D reconstruction problem using approaches based on genetic algorithms for multiobjective problems, in Proceedings of the 1999 Congress on Evolutionary Computation (eds. P. Angeline, Z. Michalewicz, M. Schoenauer, X. Yao and A. Zalzala), Vol. 2, pp. 913-920, Washington, D.C., July 1999.

5. H. E. Rickard, Feature selection for self-organizing feature map neural networks with applications in medical image segmentation, Master's Thesis, Department of Electrical Engineering, University of Louisville, Dec. 2001.

6. X. Li, T. Jiang and D. J. Evans, Medical image reconstruction using a multi-objective genetic local search algorithm, Intern. J. Computer Math. 74, 301-314 (2000).

7. H. Ishibuchi and T. Murata, A multi-objective genetic local search algorithm


and its application to flowshop scheduling, IEEE Trans. Systems, Man and Cybernetics, Part C: Applications and Reviews 28, 392-403 (1998).

8. L. Devroye, L. Györfi and G. Lugosi, A Probabilistic Theory of Pattern Recognition, (Springer Verlag, New York, 1996).

9. M. A. Anastasio, H. Yoshida, R. Nagel, R. M. Nishikawa and K. Doi, A genetic algorithm-based method for optimizing the performance of a computer-aided diagnosis scheme for detection of clustered microcalcifications in mammograms, Med. Phys. 25, 1613-1620 (1998).

10. C. Bishop, Neural Networks for Pattern Recognition, (Oxford Univ. Press, Oxford, UK, 1995).

11. M. A. Kupinski and M. A. Anastasio, Multiobjective genetic optimization of diagnostic classifiers with implications for generating receiver operating characteristic curves, IEEE Transactions on Medical Imaging 18, 675-685 (1999).

12. J. Horn, N. Nafpliotis and D. E. Goldberg, A Niched Pareto Genetic Algorithm for Multiobjective Optimization, in Proceedings of the First IEEE Conference on Evolutionary Computation, IEEE World Congress on Computational Intelligence, IEEE Service Center, Vol. 1, pp. 82-87, June 1994.

13. F. de Toro, E. Ros, S. Mota and J. Ortega, Non-invasive Atrial Disease Diagnosis Using Decision Rules: A Multi-objective Optimization Approach, in C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb and L. Thiele (eds.), Evolutionary Multi-Criterion Optimization. Second International Conference, EMO 2003, pp. 638-647, Springer, Lecture Notes in Computer Science, Vol. 2632, Faro, Portugal, April 2003.

14. E. Zitzler and L. Thiele, Multiobjective Evolutionary Algorithms: A Comparative Case Study and the Strength Pareto Approach, IEEE Transactions on Evolutionary Computation 3, 257-271 (1999).

15. F. de Toro, J. Ortega, J. Fernandez and A. F. Diaz, PSFGA: A Parallel Genetic Algorithm for Multiobjective Optimization, in 10th Euromicro Workshop on Parallel, Distributed and Network-Based Processing, F. Vajda and N. Podhorszki (eds.), IEEE, pp. 384-391, 2002.

16. F. de Toro, E. Ros, S. Mota and J. Ortega, Multi-objective Optimization Evolutionary Algorithms Applied to Paroxysmal Atrial Fibrillation Diagnosis Based on the k-Nearest Neighbours Classifier, in F. J. Garijo, J. C. R. Santos and M. Toro (eds.), Advances in Artificial Intelligence - IBERAMIA 2002 Proceedings, Lecture Notes in Computer Science 2527, Springer, pp. 313-318, 2002.

17. C. A. Perez and L. W. Brady, Principles and Practice of Radiotherapy, (Lippincott-Raven, Philadelphia, 3rd edition, 1998).

18. M. Lahanas, D. Baltas and N. Zamboglou, Anatomy-based three-dimensional dose optimization in brachytherapy using multiobjective genetic algorithms, Med. Phys. 26, 1904-1918 (1999).

19. N. Srinivas and K. Deb, Multiobjective Optimization Using Nondominated Sorting in Genetic Algorithms, Evolutionary Computation 2, 221-248 (1994).

20. C. M. Fonseca and P. J. Fleming, Multiobjective optimization and multiple


constraint handling with evolutionary algorithms I: A unified formulation, Research Report 564, Dept. Automatic Control and Systems Eng., University of Sheffield, Sheffield, U.K., Jan. 1995.

21. N. Milickovic, M. Lahanas, D. Baltas and N. Zamboglou, Comparison of evolutionary and deterministic multiobjective algorithms for dose optimization in brachytherapy, in Proceedings of the First International Conference, EMO 2001, Zurich, Switzerland, edited by E. Zitzler, K. Deb, L. Thiele, C. A. Coello Coello and D. Corne, Lecture Notes in Computer Science, Vol. 1993, Springer, pp. 167-180, 2001.

22. K. Deb, A. Pratap, S. Agarwal and T. Meyarivan, A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II, IEEE Transactions on Evolutionary Computation 6, 182-197 (2002).

23. M. Lahanas, D. Baltas and N. Zamboglou, A hybrid evolutionary multiobjective algorithm for anatomy-based dose optimisation in HDR brachytherapy, Phys. Med. Biol. 48, 399-415 (2003).

24. D. C. Liu and J. Nocedal, On the limited memory BFGS method for large scale optimization, Mathematical Programming 45, 503-528 (1989).

25. J. O. Deasy, Multiple local minima in radiotherapy optimization problems with dose-volume constraints, Med. Phys. 24, 1157-1161 (1997).

26. J. D. Knowles and D. W. Corne, Approximating the nondominated front using the Pareto Archived Evolution Strategy, Evolutionary Computation 8, 149-172 (2000).

27. K. Deb and R. B. Agrawal, Simulated binary crossover for continuous search space, Complex Systems 9, 115-148 (1995).

28. K. Deb and M. Goyal, A combined genetic adaptive search (GeneAS) for engineering design, Computer Science and Informatics 26, 30-45 (1996).

29. M. Lahanas, K. Karouzakis, S. Giannouli, R. F. Mould and D. Baltas, Inverse planning in brachytherapy: Radium to High Dose Rate 192-Iridium Afterloading, to be published in Nowotwory Journal of Oncology, 2004.

30. C. A. Coello Coello, Handling Preferences in Evolutionary Multiobjective Optimization: A Survey, in Congress on Evolutionary Computation, IEEE Service Center, Vol. 1, pp. 30-37, Piscataway, New Jersey, July 2000.

31. D. Cvetkovic and I. C. Parmee, Preferences and their Application in Evolutionary Multiobjective Optimisation, IEEE Transactions on Evolutionary Computation 6, 42-57 (2002).

32. O. C. L. Haas, K. J. Burnham and J. A. Mills, On improving the selectivity in the treatment of cancer: a systems modelling and optimization approach, Control Engineering Practice 5, 1739-1745 (1997).

33. O. C. L. Haas, Radiotherapy Treatment Planning: New System Approaches, Advances in Industrial Control Monograph, (Springer Verlag, London, 1999).

34. O. C. L. Haas, K. J. Burnham and J. A. Mills, Optimization of beam orientation in radiotherapy using planar geometry, Phys. Med. Biol. 43, 2179-2193 (1998).

35. O. C. L. Haas, Optimisation and control systems modelling in radiotherapy treatment planning, PhD Thesis, Coventry University, 1997.


36. A. Chipperfield, P. Fleming, H. Pohlheim and C. Fonseca, Genetic Algorithm Toolbox User's Guide, Research Report 512, Department of Automatic Control and Systems Engineering, University of Sheffield, 1995.

37. K. Deb and T. Goel, Controlled elitist non-dominated sorting genetic algorithms for better convergence, in Proceedings of the First International Conference, EMO 2001, Zurich, Switzerland, edited by E. Zitzler, K. Deb, L. Thiele, C. A. Coello Coello and D. Corne, Lecture Notes in Computer Science, Vol. 1993, Springer, pp. 67-81, 2001.

38. M. Lahanas, E. Schreibmann, N. Milickovic and D. Baltas, Intensity modulated beam radiation therapy dose optimization with multiobjective evolutionary algorithms, in C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb and L. Thiele (eds.), Evolutionary Multi-Criterion Optimization. Second International Conference, EMO 2003, pp. 648-661, Springer, Lecture Notes in Computer Science, Vol. 2632, Faro, Portugal, April 2003.

39. X. Gandibleux, H. Morita and N. Katoh, The supported solutions used as a genetic information in population heuristics, in Proceedings of the First International Conference, EMO 2001, Zurich, Switzerland, edited by E. Zitzler, K. Deb, L. Thiele, C. A. Coello Coello and D. Corne, Lecture Notes in Computer Science, Vol. 1993, Springer, pp. 429-442, 2001.

40. S. Webb, Optimization of conformal radiotherapy dose distributions by simulated annealing, Phys. Med. Biol. 34, 1349-1370 (1989).

41. T. Bortfeld, J. Bürkelbach, R. Boesecke and W. Schlegel, Methods of image reconstruction from projections applied to conformation therapy, Phys. Med. Biol. 35, 1423-1434 (1990).

42. J. D. Knowles, D. Corne and J. M. Bishop, Evolutionary Training of Artificial Neural Networks for Radiotherapy Treatment of Cancers, in Proceedings of the 1998 IEEE International Conference on Evolutionary Computation, IEEE Neural Networks Council, pp. 398-403, 1998.

43. J. D. Knowles and D. Corne, Evolving neural networks for cancer radiotherapy, in L. Chambers (ed.), Practical Handbook of Genetic Algorithms: Applications, 2nd Edition, Chapman & Hall/CRC Press, pp. 443-448, ISBN 1-58488-240-9, 2000.

44. M. Lahanas, E. Schreibmann and D. Baltas, Multiobjective inverse planning for intensity modulated radiotherapy with constraint-free gradient-based optimization algorithms, Phys. Med. Biol. 48, 2843-2871 (2003).

45. E. Schreibmann, M. Lahanas, L. Xing and D. Baltas, Multiobjective evolutionary optimization of the number of beams, their orientations and weights for IMRT, Phys. Med. Biol. 49, 747-770 (2004).

46. A. Petrovski and J. McCall, Multiobjective optimization of cancer chemotherapy using evolutionary algorithms, in Proceedings of the First International Conference, EMO 2001, Zurich, Switzerland, edited by E. Zitzler, K. Deb, L. Thiele, C. A. Coello Coello and D. Corne, Lecture Notes in Computer Science, Vol. 1993, Springer, pp. 531-545, 2001.

47. B. de la Iglesia, G. Richards, M. S. Philpott and V. J. Rayward-Smith, The application and effectiveness of a multi-objective metaheuristic algorithm with respect to the data mining task of partial classification, submitted for


publication to European Journal of Operational Research, 2003.

48. A. R. Reddy and K. Deb, Identification of multiple gene subsets using multiobjective evolutionary algorithms, in C. M. Fonseca, P. J. Fleming, E. Zitzler, K. Deb and L. Thiele (eds.), Evolutionary Multi-Criterion Optimization. Second International Conference, EMO 2003, pp. 623-637, Springer, Lecture Notes in Computer Science, Vol. 2632, Faro, Portugal, April 2003.


CHAPTER 17

ON MACHINE LEARNING WITH MULTIOBJECTIVE GENETIC OPTIMIZATION

Rajeev Kumar

Department of Computer Science & Engineering
Indian Institute of Technology Kharagpur

Kharagpur, WB 721 302, India
E-mail: [email protected]

We describe a generic framework for solving high-dimensional and complex domains of machine learning. We use multiobjective genetic optimization to act as a pre-processor for partitioning the learning task into simpler domains which can then be solved by traditional learning approaches. We define two main objectives for partitioning: minimization of learning costs and minimization of errors. For (further) improving generalization, we use multiple machine learning algorithms for each partition and label a partition by a vector of learning costs and confidence levels, thus providing multiple views of the solution space by a set of machine learners. This is a multiobjective optimization problem whose solutions are not known a priori. Therefore, we use a multiobjective evolutionary algorithm implementation which produces diverse solutions and monitors convergence without needing a priori knowledge of the partitions emerging out of the optimization criteria.

17.1. Introduction

An essential attribute of an intelligent machine is its ability to learn from examples, produce general hypotheses from training data, and make effective decisions when presented with unseen data. During the process of learning from examples, the machine learner not only approximates the functional relationship of the restricted domain covered by the training set but also attempts to understand the wider, unseen sampling of the parent function. This understanding of unseen samples may result from interpolation and extrapolation. Most real-world applications (RWA) do not give
precise input-output mappings; data-sets may be noisy, containing distorted patterns; they may include partially occluded high-dimensional images; and decision boundaries may be non-linear. Achieving good generalization is a non-trivial task for most machine learning models because they induce varying degrees of freedom while learning such real-life patterns 24.

There are many approaches to machine learning: logical & fuzzy rules, decision trees & Bayes decision theory, supervised & non-supervised clustering, connectionist intelligence & statistical inferences, and genetic search & stochastic methods42,44. At an abstraction level, a machine learner desires that the learning errors in prediction, classification or approximation be minimized for a given finite set of known patterns. Complementary to this, for unseen patterns, machine learning aims at achieving a good performance; such a good performance may not be achieved simply by optimizing a single value of an error-function. A learning model mainly minimizes errors in training data, while generalization is influenced by many factors such as the learning model's architecture and design parameters, and the data-sets used for training, validation and testing.

Many of the machine learning algorithms and models work on the principle of iterative refinement. The generalization of such models and algorithms mainly aims at avoiding underfitting as well as overfitting while approximating functions and demarcating decision boundaries. This is related to the well-known bias-variance dilemma21. There exists another dimension to the problem of generalization which relates to the scaling of learning models for solving arbitrarily complex problems. Scaling models to larger systems is a difficult problem because larger models require increasing amounts of training time and data, and eventually the complexity of the optimization task reaches computationally unmanageable proportions. Simply increasing the complexity of the model is a popular solution but may unjustifiably increase the number of free parameters of the learner's architecture, which can lead to poor generalization10.

In the context of addressing complex learning domains, two basic approaches have emerged as possible solutions to the poor scalability of intelligent models: ensemble-based22 and modular systems54. The family of ensemble-based approaches relies on combining predictions of multiple models, each of which is trained on the same database; in general, the emphasis is on improving the accuracy for a better generalization and not on simplifying the function approximators.

On Machine Learning with Multiobjective Genetic Optimization

Another advantage of decomposing the input space into multiple partitions is that each of the partitions can be learnt through multiple machine learning algorithms, thus yielding multiple views of each of the partitions. Each view can be labeled as a vector of multiple costs: the learning cost, and the learning and validation errors. This representation has the advantage that a user may pick a view from the vector and select a machine learning algorithm based on the resource availability and the learning accuracy needed for a particular application. Hence, we label each partition with multiple costs. Decomposing a pattern-space into multiple partitions using a set of multiple objectives is an NP-hard problem20,27. Therefore, we use randomized search heuristics like evolutionary algorithms (EAs) for the partitioning task.
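As an illustration of labeling partitions with cost vectors and keeping only the mutually competitive ones, the sketch below filters the non-dominated labels under the usual Pareto-dominance relation for minimization. The helper names and the cost values are invented for illustration, not taken from the chapter.

```python
# Each partition label is a tuple: (learning cost, learning error,
# validation error); all objectives are minimized.

def dominates(a, b):
    """a dominates b if a is no worse in every objective and strictly
    better in at least one (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def non_dominated(labels):
    """Return the labels not dominated by any other label."""
    return [a for a in labels
            if not any(dominates(b, a) for b in labels if b is not a)]

partition_labels = [(3.0, 0.10, 0.12), (1.0, 0.25, 0.30), (2.0, 0.10, 0.12)]
front = non_dominated(partition_labels)
# (3.0, 0.10, 0.12) is dominated by (2.0, 0.10, 0.12) and is dropped.
```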

In recent years, EAs have emerged as a powerful black-box optimization tool to solve NP-hard combinatorial optimization problems. In the multiobjective scenario, EAs often effectively find a set of diverse and mutually competitive solutions without applying much problem-specific information. Additionally, achieving proper diversity in the solutions while approaching convergence is a challenge in multiobjective optimization, especially for unknown problems in black-box optimization. There are many implementations of multiobjective EAs, for example, MOGA18, NSGA16,57, PAES32 and SPEA66,65. These implementations achieve diverse and equivalent solutions by some diversity-preserving mechanism, but they do not address convergence. However, some recent studies have been done on combining convergence with diversity5,43 for problems whose optimal Pareto-set is known. Kumar & Rockett41 proposed the use of rank-histograms for monitoring convergence of the Pareto-front while maintaining diversity without any explicit diversity-preserving operator.

The pattern space partitioning problem belongs to the class of problems for which the optimal solution space is not known a priori. Therefore, in this work, we use the Pareto Converging Genetic Algorithm (PCGA)41, which has been demonstrated to work effectively across complex problems and achieves diversity without needing a priori knowledge of the solution space. PCGA excludes any explicit mechanism to preserve diversity and allows a natural selection process to maintain diversity. Thus, multiple, equally good final solutions to the problem are generated.

We use PCGA to partition the task in a generic but near-optimal manner as a pre-processor to the learning domain. We argue that separating the task of decomposition from the regime of modular learning simplifies the overall learning architecture, and this strategy of data-processing before its submission to a classifier considerably reduces the learning complexity. Additionally, only those patterns which lie close to the decision boundaries possibly warrant multiple learning efforts in order to improve the prediction accuracy, and the clusters which contain only one data class are implicitly labeled without ambiguity.

The rest of the chapter is organized as follows. The next section presents an overview of some issues in machine learning (sub-section 17.2.1), generalization (sub-section 17.2.2) and the application of our multiobjective evolutionary algorithm to solve real-world problems (sub-section 17.2.3). Section 17.3 formulates the partitioning problem. Section 17.4 describes the implementation of the multiobjective evolutionary algorithm used for pattern space partitioning. A summary of the results is presented in section 17.5. Finally, we conclude this chapter in section 17.6.

17.2. An Overview

17.2.1. Machine Learning

A learning model aims at capturing the global nature of approximations. In the case of iterative refinement, the model incrementally adapts in the direction of decreasing error-function based on some learning rules. At the same time, it is believed that a global error-surface may have an extensive flat area and significant variations in local minima. Such situations are very common with high-dimensional inputs, which often require very long learning times or result in unsuccessful training. Another major phenomenon contributing to the problem of slow/difficult training is cross-talk, i.e., the presence of conflicting information in the training data that retards learning. Cross-talks are identified as temporal58 and spatial47. In temporal cross-talk, a neural network receives inconsistent training information at different times in the training cycle; receiving inconsistent information at a single instant in time is treated as spatial cross-talk. By analogy, catastrophic interference55 results from sequential training when the disjoint blocks of training data are presented in sequence.

In the context of addressing complex learning domains, the 'divide-and-conquer' algorithm divides a complex task into a number of simpler sub-tasks such that each subspace is learnt by an expert, and then combines the knowledge acquired by the experts to arrive at an overall decision. It is believed that the partitioning of a space into sensible subspaces and the subsequent learning of the subspaces by corresponding simpler experts reduces the overall computational/learning complexity. Analogously, it is a general practice to decompose a k-class classification problem into k two-class problems.
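The k-class to k two-class decomposition mentioned above (one-vs-rest) can be sketched as follows; the relabeling helper and the example labels are purely illustrative.

```python
# One-vs-rest decomposition: a k-class problem becomes k binary
# problems, one per class, each separating that class from the rest.

def one_vs_rest_labels(labels, positive_class):
    """Relabel a k-class label list as one two-class problem."""
    return [1 if y == positive_class else 0 for y in labels]

y = ['a', 'b', 'c', 'a', 'b']                  # 3-class toy labels
binary_tasks = {c: one_vs_rest_labels(y, c) for c in set(y)}
# binary_tasks['a'] is the two-class problem "a vs not-a".
```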

The simplest form of divide-and-conquer algorithm is a tree-structured classifier. Such algorithms have their origin in statistics, e.g., the Classification and Regression Tree (CART) algorithm of Breiman et al.8, and the ID3 & C4.5 induction tree algorithms of Quinlan49. These algorithms fit surfaces to data by explicitly dividing the input space into a nested sequence of regions (a tree) and by fitting surfaces within these regions. In a simple example, a hierarchical partitioning of feature space using hyperplanes parallel to the feature axes results in a binary decision tree; non-leaf nodes are decision nodes and the leaf ones are terminals. With stronger connections between statistics and neural networks, many researchers have combined the tree-structured concept of partitioning an input space with the non-linear, non-parametric functional approximation capabilities of neural networks9. In this approach, a classification tree is grown and simple neural networks are employed at each decision node of the tree. This hybrid approach significantly decreases error-rates compared with those of a decision tree having a constant function in the decision node, but at the expense of increased training time. The tree structure combined with smaller multi-neural nets located at each decision node yields comparable error rates with shorter training time when compared with a single large neural network, though these observations vary with the application and the approach.

In general, modularity is attractive for problem solving in complex domains of machine learning, driven by a desire for:

• computation at each stage is lesser than a single unpartitioned one,
• problem is more constrained and solvable, convergence is faster,
• misclassification errors are fewer, predictions are more accurate and approximations are superior,
• problems of spatial/temporal cross-talks and catastrophic interference are reduced,
• model has more structured sub-models and components, and
• model can be suitably mapped on a multiprocessor system.

Nonetheless, decomposition has its own difficulties: partitions in the absence of a priori knowledge of the pattern space are not unique. There are additional requirements and complexities in distributed learning and their integration into the next hierarchy. The credit assignment problem (i.e., the problem of getting the right training information to the right module so as to improve overall system performance) becomes more complex. This implies that modularity can serve meaningfully only if both inter-module as well as intra-module assignments are beneficial to credit assignment. Modularity assigns a set of function approximators to each sub-problem so that the modules learn to specialize in different tasks and then combine their individual solutions. Jacobs et al.29 inferred that function decomposition is an under-constrained problem and different modular architectures may decompose a function in different ways, which is certainly not a happy situation from the generalization point of view with too many degrees of freedom. They also concluded that the modular architecture could be restricted to a well-suited solution if domain knowledge is incorporated for a desirable decomposition. For those problems where one has some prior knowledge of the pattern space, the decomposition into sub-tasks is explicit and this is a trivial task, e.g., image corner labeling38 and phonemic classification61. In the absence of any prior knowledge of the pattern space, Jordan & Jacobs demonstrated the decomposition-through-competition31 approach, where the decomposition and learning phases are combined. They designed the hierarchical mixtures of experts, where expert networks compete to learn training patterns and a gating network mediates the competition. Their architecture performs task decomposition in the sense that it learns to partition a task into functionally independent subtasks and allocates a distinct network to learn each task. In subsequent years, different researchers have used different ways of incorporating a priori knowledge into the modular architecture, with the functionality of task-decomposition and modular learning integrated together. In this work, we adopt a different approach and separate out the partitioning and learning phases.

17.2.2. Generalization

Another dimension of work in modular systems is to combine the predictions of multiple learners (in the ensemble approach, each partition can be learnt by multiple learners) to improve accuracy10,28,54. The emphasis, in general, is not on partitioning the input data but on improving the accuracy. The ability of a learner is judged on how correctly the learner responds to unseen data, i.e., how well it generalizes. Geman et al.21 have shown that given infinite training data, consistent learners approximate the Bayesian decision boundaries to arbitrary precision, thus providing similar generalization. However, finite and noisy training data sets are the reality, and different learners when trained with such data provide different generalization.

In this context, Hansen and Salamon22 suggested the use of an ensemble where each model is trained on the same database. However, a winner-takes-all strategy may not be an ideal choice, since potentially valuable information may be wasted by discarding the results of less successful models63. This observation motivates the use of combining the outputs of several experts for making a decision. This approach is particularly suited to difficult problems with limited training data and high-dimensional patterns. Another term analogous to a combiner is meta-learning, which is defined as learning from information generated by a set of learners. It can be viewed as the learning of meta-knowledge on the learned information10.

The concept of combining has been studied in recent years in several forms10,28,54. A weighted averaging of the outputs of several learners, voting schemes and arbiters have been suggested as alternatives to selecting the best model. Other developments in the statistics community include stacked regression6, bootstrap aggregation7, and stacked generalization63. Ali & Pazzani1 modeled the degree of error reduction due to the use of multiple models. There are many other references too.
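Two of the simple combiners mentioned above, plurality voting over class labels and a weighted average over regression outputs, can be sketched as follows; the predictions and weights are made up for illustration.

```python
from collections import Counter

def majority_vote(predictions):
    """Plurality vote: return the most frequent class label among the
    ensemble members' predictions (ties resolved by first occurrence)."""
    return Counter(predictions).most_common(1)[0][0]

def weighted_average(outputs, weights):
    """Weighted average of real-valued outputs, e.g. weights set by
    each member's validation accuracy."""
    return sum(o * w for o, w in zip(outputs, weights)) / sum(weights)

vote = majority_vote(['cat', 'dog', 'cat'])          # 'cat'
combined = weighted_average([1.0, 2.0, 4.0], [0.5, 0.25, 0.25])
```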

A rule of thumb for obtaining good generalization is to use the smallest model that fits the data. Unfortunately, it is not obvious which size is the best; a model that is not sufficiently complex is very sensitive to initial conditions and learning parameters. A small neural network learns extremely fast but has a high probability of getting trapped in a local minimum and thus may fail to train, leading to underfitting. On the other hand, larger networks have more functional flexibility than small networks and so are able to better fit the data. A network that is too large may fit the noise and not just the signal, and this leads to overfitting. Overfitting produces excessive variance whereas underfitting produces excessive bias in the outputs; bias and variance are complementary terms, and the best generalization is obtained with the optimum balance between bias and variance, i.e., increase model bias in order to reduce model variance for avoiding overfitting21,52. Drawing an inference that only a large model shows overfitting is not correct; small networks start overfitting before they have learnt all they could.
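The under-/over-fitting behaviour described above can be made concrete with a small polynomial-fitting sketch on synthetic data (entirely invented, with no connection to the chapter's experiments): a degree-0 model underfits, degree 2 matches the generating parabola, and degree 9 interpolates the noise.

```python
import numpy as np

# Ten noisy samples of y = x^2; fit polynomials of increasing degree.
rng = np.random.default_rng(0)
x_train = np.linspace(-1.0, 1.0, 10)
y_train = x_train ** 2 + rng.normal(0.0, 0.05, x_train.size)
x_test = np.linspace(-0.95, 0.95, 50)
y_test = x_test ** 2                      # noise-free ground truth

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

fits = {d: np.polyfit(x_train, y_train, d) for d in (0, 2, 9)}
train = {d: mse(c, x_train, y_train) for d, c in fits.items()}
test = {d: mse(c, x_test, y_test) for d, c in fits.items()}
# Training error falls monotonically with degree (degree 9 interpolates
# the ten points), while the degree-0 model has the worst test error.
```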

Viewed in abstract terms of bias and variance, we can say that for a good generalization we need to control the effective complexity of the learner for an optimum mix of both bias and variance. For example, in the case of a connectionist architecture, the network complexity can be defined simply in terms of the size of the weights, the number of connections, the number of hidden units and the number of layers24. Analogously, several methods have been proposed for controlling the network complexity: there are approaches where one starts with a relatively large network and prunes out the least significant connections or drives them to insignificance. Similarly, one can start with a small network and add units during the learning process with the goal of arriving at an optimal network. There are other dependencies as well, e.g., initial network conditions3, learning rate, cross-validation, stopping criterion and, more importantly, the curse of dimensionality24.

One major contributor to model complexity is the model size, and it is always desirable to minimize the number of free parameters. Many studies have been carried out for selecting a proper size; nonetheless, it remains an unresolved problem. Some theoretical studies have established upper bounds on the number of hidden nodes for connectionist architectures; but a priori knowledge of the upper bounds can neither provide a practical guess on the number of hidden nodes required for mapping a training set involving a large number of samples nor minimize the free parameters. Some researchers also defined theoretical lower bounds based on the Vapnik-Chervonenkis (VC) dimension, assuming that the future test samples are drawn from the distribution of the training samples. Weigend62 avoided overfitting when the net size was guided by the eigenvalue spectra, and others advocated use of the effective number of parameters in a non-linear system for achieving better generalization. But how to decide the effective dimensionality or the number of parameters remains an open issue46,62.

Another promising approach to avoiding under-/over-fitting and increasing the flexibility of learning is to start with a large model and, through regularization or pruning, improve generalization50. In the case of neural networks, weight decay is a subset of regularization methods which adds a penalty term to the objective function. The penalty term penalizes large weights and thus the complexity; large weights can cause excessive variance in the output. Different researchers defined different penalty terms for weight decay/elimination. A fundamental problem with weight decay is that the proper coefficient for this term is not known a priori, and different types of weights in the network require different decay constants for good generalization. Other types of approaches are based on pruning out the least significant connections, either by removing individual weights or by removing complete units, e.g., optimal brain damage/surgeon. Many researchers have also proposed correlation- and heuristics-based pruning/merging methods for model simplification. These approaches are found effective on a few problem sets. Pruning-based generalization demands selection of algorithms, effective parameters and setting of a stopping criterion. Furthermore, it has been shown by many researchers that pruning is not always beneficial and some algorithms may not be effective. Pruning has been applied to many other models too, e.g., decision trees and rule-based systems.
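The weight-decay idea, an error term plus an L2 penalty whose coefficient must be chosen by hand, can be sketched for a one-parameter linear model; the data, decay constant and learning rate below are assumptions for illustration.

```python
# Gradient descent on mean squared error plus an L2 penalty lam * w^2;
# the decay constant `lam` is exactly the coefficient the text notes
# is not known a priori.
def decay_gd_step(w, x, y, lam, lr):
    """One gradient step for a 1-D linear model y ~ w*x with weight decay."""
    grad_err = sum(2.0 * (w * xi - yi) * xi for xi, yi in zip(x, y)) / len(x)
    grad_pen = 2.0 * lam * w                # derivative of lam * w^2
    return w - lr * (grad_err + grad_pen)

x, y = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]     # least-squares slope is 2
w = 0.0
for _ in range(200):
    w = decay_gd_step(w, x, y, lam=0.1, lr=0.05)
# The penalty shrinks the solution slightly below the unpenalized slope.
```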

Early stopping monitors the errors on a validation set and halts learning when the error on the validation set starts increasing. Here, selection of the model is not guided by the training process convergence; rather, the training process is used to perform a search to find a model with superior generalization performance. The objective of this approach is to stop training before the model starts fitting noise. The results of many researchers have provided strong evidence for the efficiency of stopped training. At the same time, it has been shown that for a finite validation set there is a dispersion of stopping points around the best stopping point, and this increases the expected generalization error. Other obvious problems are: there is no guarantee that the validation curve passes through the optimal point, and it may go up and down many times during training. The validation set is again a limited sampling and may not represent the universe. It also requires crucial decisions regarding the selection of training and validation sets, and the number of examples to be divided into these two sets. Selection of what strategy to be followed (leave-one-out, cross-validation, bootstrapping, or bagging) is another issue.

In accordance with the No Free Lunch theorems64, there is no reason, in the absence of prior information about the problem, to prefer one learning algorithm or model to another. This theorem also establishes that for any algorithm, any elevated performance over one class of problems is offset by the performance over another class. On the other hand, given a finite set of feature values, the Ugly Duckling theorem17 states that in the absence of assumptions there is no privileged feature representation. However, the Minimum Description Length principle17 prefers one type over another, specifically, the simpler over the complex ones. It is unarguably accepted that the simpler the model and the algorithm, the better the generalization. Therefore, in this work, we facilitate (i) data partitioning into simpler domains, (ii) smoother decision boundaries by (possibly) excluding variations of noise, and (iii) the use of simpler features for decomposition into sensible partitions to minimize the model and learning complexity, which is expected to yield improved prediction accuracy and offer better generalization.

17.2.3. Multiobjective Evolutionary Algorithms (MOEA) & Real-World Applications (RWA)

In this sub-section, we briefly review the issues related to the use of multiobjective evolutionary algorithms in solving real-world applications. Since the whole book volume is compiled for real-world applications, we mention only those factors which we address and use in solving the partitioning problem.


(For detailed coverage of multiobjective genetic optimization, see Deb15 and Coello et al.13; a current list of references is maintained by Coello12.) We classify multiobjective optimization problems into the following three distinct classes:

(i) Class A - Optimization problems which can be represented by analytical functions.

(ii) Class B - Combinatorial optimization problems in NP which can be verified in polynomial time, e.g., the 0-1 knapsack, Hamiltonian path and k-clique problems. For this class of problems, many (1 + ε)-approximation algorithms exist; ε is usually a small quantity27.

(iii) Class C - Combinatorial optimization NP-hard problems for which polynomial-time good approximation algorithms are not known. It is difficult to approximate the Pareto-front for this class of problems. We consider the partitioning problem in this class.

Problems falling in Class A are the most studied problems, and numerous studies have been done on many functions to examine various aspects of these problems, including multi-modality and deception; see, for example, Deb14. The solution space of such problems is known a priori, or can otherwise be obtained by many off-the-shelf tools. Many such problems have become de facto standards for benchmarking and comparing the performance of newer MOEAs with the other well-known algorithms. In fact, they serve as a fitting between the obtained solution space and the desired one rather than as the solving of a problem. Therefore, such problems have been extensively researched to evaluate the efficacy of (i) the genetic operators used in exploring the search space, (ii) producing/preserving diversity across the Pareto-front, and (iii) assessing convergence by measuring the closeness of the obtained solutions to the real Pareto-front.

Many metrics have been proposed for quantitative evaluation of the quality of solutions23,33,59,66,67. Essentially, these metrics are divided into two classes:

• Diversity Metrics: coverage and sampling of the obtained solutions across the front, and

• Convergence Metrics: distance of the obtained solution-front from the (known) optimal Pareto-front.

Some of these metrics (e.g., generational distance, volume of space covered, error-ratio measures of closeness of the Pareto-front to the true Pareto-front) are only applicable where the solution is known. Other metrics (e.g., ratio of non-dominated individuals, uniform distribution) quantify the Pareto-front and can only be used to assess diversity.

17.2.3.1. Achieving Diversity

Many techniques and operators have been proposed to achieve diversity13,15. The commonly used techniques for preventing genetic drift and promoting diversity are: sharing, mating restrictions, density count (crowding) and pre-selection operators. These approaches can be grouped into two classes: parameter-based and parameter-less. The niching/sharing techniques have been commonly employed to find a diverse set of solutions, although such techniques work best when one has a priori knowledge of the solution. On knowing the number of niches, a sharing function using some user-defined parameters computes the extent of sharing and produces multiple (near-)optimal solutions. Some work has been done on parameter-less MOO too. Most of the work has been done to test the efficacy of the EAs in solving known problems rather than solving the problem per se.
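The sharing function just mentioned can be sketched in its common triangular form: fitness is divided by a niche count computed from pairwise distances and a user-chosen niche radius sigma_share (the values below are illustrative assumptions).

```python
def sharing(d, sigma_share, alpha=1.0):
    """Triangular sharing function: 1 at distance 0, falling to 0 at
    the niche radius sigma_share."""
    return max(0.0, 1.0 - (d / sigma_share) ** alpha)

def shared_fitness(fitness, distances, sigma_share):
    """Degrade fitness by the niche count; `distances` holds the
    distances to every population member, including 0 to itself."""
    niche_count = sum(sharing(d, sigma_share) for d in distances)
    return fitness / niche_count

# A crowded solution (two close neighbours) is penalized more than an
# isolated one of equal raw fitness.
crowded = shared_fitness(1.0, [0.0, 0.1, 0.1], sigma_share=1.0)
isolated = shared_fitness(1.0, [0.0, 0.9, 0.9], sigma_share=1.0)
```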

In summary, most explicit diversity-preserving methods need prior knowledge of many parameters, and the efficacy of such mechanisms depends on successful fine-tuning of these parameters. Interestingly, in a recent study, Purshouse & Fleming48 studied the effect of sharing on a wide range of two-criteria benchmark problems using a range of performance measures and concluded that sharing can be beneficial, but can prove surprisingly ineffective if the parameters are not properly tuned. They statistically observed that parameter-less sharing is more robust than parameter-based equivalents (including those with automatic fine-tuning during program execution).

Nonetheless, most of the MOO algorithms, e.g., MOGA18, NSGA-II16, PAES32 and SPEA265, use some diversity-promoting mechanism in some form or the other, and have been successfully applied to many problems which can be represented by analytical functions. Some recent work includes treating diversity as another objective to be optimized60.

17.2.3.2. Monitoring Convergence

We have classified real-world problems into two groups, Class B and Class C: the first, whose solution is known a priori or can be approximated by some means, and the second, those for which the solution space is unknown. For Class B problems, tolerance limits or achievable percentages of defined goals can give some indication of solutions moving towards goal-convergence, and thus solutions obtained by genetic optimization can be compared. For example, the 0-1 knapsack problem has been attempted by many EA researchers and several approximate Pareto-fronts have been obtained, e.g., Zitzler & Thiele66. Many metrics are available in the literature, e.g., Zitzler & Thiele66 and Tan et al.59, which measure the diversity of the obtained Pareto-front and the distance between the obtained front and the desired one. Thus, the efficacy of the genetic implementation may be measured and the results obtained by the genetic optimization verified. However, for problems where we have neither prior knowledge nor any approximation of the solution space, convergence becomes an important issue.

Such real-world problems cannot be recast in the form of an analytical function, prior visualization of the solution set is not possible, and proper selection of the niche parameters is difficult. Secondly, species formation in high-dimensional domains does not scale well and is a computationally intensive task. Moreover, although the sharing/mating restrictions employed by various authors partly solve the problem of premature convergence, they do not necessarily guarantee overall convergence. Some recent studies have been done on combining convergence with diversity. Laumanns et al.43 proposed an ε-dominance for obtaining an ε-approximate Pareto-front for problems whose optimal Pareto-set is known. Their technique does not work for unknown problems. Similarly, Bosman & Thierens attempted to combine diversity and convergence too5.
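The ε-dominance relation can be sketched in one common additive form (formulations differ across papers; this one is an assumption chosen for simplicity): a point is allowed to cover a neighbourhood of width ε around itself, which bounds the size of the archived approximation.

```python
def eps_dominates(a, b, eps):
    """Additive epsilon-dominance for minimization: a eps-dominates b
    if a_i - eps <= b_i in every objective."""
    return all(ai <= bi + eps for ai, bi in zip(a, b))

# With eps = 0.1, a point may "dominate" a nearby point it would not
# dominate in the strict Pareto sense, thinning out the archive.
```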

In real-world search problems belonging to Class C, the location of the actual Pareto-front is, by definition, unknown, and the identification of the 'best value' of some criterion does not necessarily mean global convergence. In problem domains of low objective dimensionality, the Pareto-front can be examined for genetic diversity but not for convergence; high-dimensional objective spaces cannot generally be visualized for either diversity or convergence. (Some performance metrics, e.g., volume of space covered and distribution (Tan et al.59), can provide information about diversity alone.) Knowledge of the propagation of the solution front through successive generations of the population, however, can serve as a clue for convergence.

Viewed as a statistical sampling problem over the objective space, just because a given solution point dominates all others in the (finite) samples does not imply that it is drawn from the Pareto-optimal set: the given non-dominated point could itself be dominated by another, yet undiscovered, solution which, in turn, need not necessarily be drawn from the Pareto-optimal set. In the past, a simple upper bound on the number of generations/iterations has been used as a stopping point, while others have employed the production of some percentage of non-dominated individuals in the total population as a stopping criterion. The first of these is unsatisfactory since a large amount of processor time could be wasted producing further generations for an optimization which has already converged; alternatively, there is no way of knowing that a particularly stubborn problem is still far from convergence. The second option is ill-conceived since solutions are non-dominated relative to the population sample, not the universe of optimal solutions. In this context, the rank-histograms proposed by Kumar & Rockett39,41 monitor convergence of the Pareto-front for problems of unknown nature; assessing convergence does not need any a priori knowledge for monitoring movement towards the Pareto-front.
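A much-simplified sketch in the spirit of this cross-generation monitoring (the actual rank-histogram of Kumar & Rockett is richer than this): merge the previous and current non-dominated sets and check what fraction of the current set survives domination by the union. A fraction persistently at 1.0 suggests the front has stopped moving.

```python
def dominates(a, b):
    """Standard Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def surviving_fraction(current, previous):
    """Fraction of the current non-dominated set that remains
    non-dominated when merged with the previous generation's set."""
    union = current + previous
    keep = [p for p in current
            if not any(dominates(q, p) for q in union if q is not p)]
    return len(keep) / len(current)

prev = [(1.0, 3.0), (2.0, 2.0)]
curr = [(0.5, 3.0), (2.0, 1.5)]
frac = surviving_fraction(curr, prev)    # both current points survive
```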

17.2.3.3. Avoiding Local Convergence

For solving unknown problems, there is a common concern whether the obtained solution is close to the true Pareto-front or not. Due to the finite size of the population, from any initialization of the population there is a finite set of genetic material to be permuted and combined. Towards the end of the EA run, most of the chromosomes will have become rather similar, and so crossover becomes a weak driver for population advancement; most of the gains will effectively be made by random walk due to mutation. At some stage, the rate at which the population improves slows down and few further gains of significance are achieved. Hopefully, the EA will by then have converged to the Pareto-front, but conceivably it may have got stuck at some sub-optimal point.

This can be the case even with simple analytical problems. While working on a known bimodal problem recast into a two-objective one, Deb [14] concluded that he could not avoid the population getting stuck on a local Pareto-front, in spite of fine-tuning the diversity-preserving operators and continuing the optimization for a very large number of generations. Kumar & Rockett [41] investigated the same problem and obtained identical results without any diversity-promoting mechanism.

For such cases, there is little point in continuing the optimization and it should be terminated. We argue that there is always a certain inheritance of genetic material belonging to one population, and there may not be much appreciable evolutionary gain beyond a certain number of generations. This implies that the genetic precursors available within a finite population may be inherently incapable of evolving to the true Pareto-front. Instead, we suggest that alternative genetic material should be acquired in the form of another population. Each population sample is run to its own convergence; the obtained solutions are then merged and tested across populations. We therefore suggest this strategy of EA optimization through independently initialized populations, as a test of convergence, to be particularly suited to harder problems of unknown nature [37, 41].

17.3. Problem Formulation

If we consider partitioning of a pattern space as a mapping P from an N-dimensional space to j subspaces of dimensionality n_j, then the formulation is an N-dimensional function decomposition into many n_j-dimensional sub-functions subject to certain criteria Obj_i(X). Since n_j represents (hopefully) a less complex domain, a learner can approximate such a subdomain with less effort; one measure of complexity is the local intrinsic dimensionality, which is computed by Principal Component Analysis (PCA) [46]. (In principle, we refer to the intrinsic dimensionality, rather than the true dimensionality, as indicating learning complexity.)
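The intrinsic-dimensionality measure can be made concrete as follows (a sketch of our own, assuming a cumulative total-variance criterion such as the 0.95 proportion used later in this chapter; this is not the author's code):

```python
import numpy as np

def intrinsic_dimensionality(X: np.ndarray, var_fraction: float = 0.95) -> int:
    """Smallest number of principal components whose cumulative variance
    reaches var_fraction of the total variance of the patterns in X
    (rows = patterns, columns = features)."""
    Xc = X - X.mean(axis=0)
    # Eigenvalues of the covariance matrix, largest first.
    eigvals = np.sort(np.linalg.eigvalsh(np.cov(Xc, rowvar=False)))[::-1]
    cum = np.cumsum(eigvals) / eigvals.sum()
    return int(np.searchsorted(cum, var_fraction) + 1)

# A 2-D cloud that actually lies along a line: intrinsically one-dimensional.
X = np.column_stack([np.linspace(0.0, 1.0, 100), np.linspace(0.0, 2.0, 100)])
print(intrinsic_dimensionality(X))  # 1
```
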

Fig. 17.1. Approximation of a 2D function by a set of (approximately) linear functions.

Figure 17.1 illustrates a simple case of decomposing a 2D function into many sub-functions. Here the 2D function is parameterized by a series of intrinsically 1D functions and, although in this illustrative example the reduction in dimensionality is trivial, this will not be the case for higher-dimensional spaces. Linear functions represent the simplest problem domains to be learnt by a machine learning algorithm, and such functions can be learnt by a simple input-output mapping.

Alternatively, as in Figure 17.2, hyperspherical clusters can enclose pattern blobs, and two situations can arise in practice: (i) all the patterns in each hypersphere belong to a single class, i.e. partitioning alone demarcates the decision boundaries and no explicit classification stage is needed, and (ii) the patterns belong to multiple classes, necessitating the use of some post-partitioning classifier. Outliers may still be excluded.

Fig. 17.2. Partitioning of the pattern space into (a) disjoint clusters, and (b) overlapped clusters. Outliers may be excluded.

Partitioning is not unique: many plausible partitioned blobs can exist. For example, we show two exemplar cases of hyperspheres enclosing patterns in Figure 17.3.

Fig. 17.3. Partitioning of pattern space into clusters. Enclosing hyperspheres can be located in many ways. Two situations are depicted.

The basis for such a partitioning is that the clusters are generated on the basis of fitness for purpose, i.e., they are explicitly optimized for the subsequent mapping onto a machine learner for subspace learning. This approach transforms the problem-dependent partitioning task, in a generic manner, into dividing the pattern space into a set of hyperspherical regions under a set of objectives that optimize performance; the data within each sphere are learned by individual learners and then combined.

We perform optimization on a vector space of objectives and explore the search space for a set of equally viable partitions of the pattern space. We identify the following three subsets of objectives for partitioning the pattern space, optimizing learning effort, and improving generalization.


I. Maximize Modularity

i. Minimize learning cost : we consider learning cost as a function of intrinsic dimensionality [46, 62]. For example, for the partitioning case shown in Fig. 17.1, the intrinsic dimensionality is effectively one. We have taken a conservative estimate in determining the intrinsic dimensionality, including the components up to some proportion, say 0.95, of the total variance within a hypersphere as the determining criterion. Thus, our objective is to minimize the average intrinsic dimensionality of the subspaces. Assuming the learning complexity of a machine learner to be a quadratic or higher-order function, this yields a substantial reduction in computation. For example, in a feedforward network, it is of the order of O(N^3) - see Hinton [26].

ii. Minimize number of partitions : we wish to maximize modularity, but this should be balanced against minimizing the overall training effort. Alternatively, this objective can be withdrawn and the number of partitioning hyperspheres specified in advance, based on some prior knowledge of the problem domain.

iii. Minimize overlap of partitions : we do not aim for completely disjoint partitions to emerge from the partitioning. However, we aim to avoid repetition of learning effort on similar sets of patterns in different modules, while allowing some overlap of hyperspheres to prevent the formation of a no-man's-land between the partitions.

II. Maximize Generalization Ability

iv. Maximize data density : this measure attempts to produce compact solutions, and thus minimizes the probability of decision surfaces taking a random walk in no-man's-land. We consider the number of patterns included within a partition, normalized by the surface content of the hypersphere, as the data density measure.

v. Maximize regularity of decision surfaces : this objective aims at increasing learning accuracy by regularizing decision surfaces, which is, however, difficult to quantify. For this, we consider the nearest-neighbor classification error as a separability measure, indicating how well the partitions preserve the overall structure of the pattern space.

vi. Minimize validation errors of multiple learners : this aims at obtaining the validation errors of a few machine learning algorithms and confirming the suitability of the machine learners for the resultant partition.

III. Generic Measures for Search

vii. Maximize fraction of included patterns of each class : we aim to include within the partitions as many training patterns as possible from each class. Outliers within the pattern space can nevertheless be excluded, because the objective does not demand that all patterns be included. The purpose is that the decision surfaces should not be formed simply by omitting the patterns belonging to the minority class(es).

viii. Maximize inclusion of patterns of a single class in a single partition : for such partitions, there is no need for any post-partition learning effort; the patterns can be unambiguously labeled just by an inclusion membership test.

ix. Maximize equitable distribution of patterns : this aims at having a balanced training set for post-partition learning. We wish to penalize partitionings in which partitions have imbalanced training sets (generalization ability is usually poor for an imbalanced training set).

The above elements of the objective vector are distinct, competing and complementary. From the obtained set of solutions, a small subset, chosen by sub-ranking of objectives, is picked for subsequent learning.
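Objective (iv) above is the most readily made concrete. One possible reading follows (our own sketch; the normalization by hypersphere surface content follows the text, but the exact form used by the author is not given):

```python
import math
import numpy as np

def data_density(patterns: np.ndarray, centre: np.ndarray, radius: float) -> float:
    """Objective (iv): number of patterns falling inside the hypersphere,
    normalized by its surface content 2*pi^(N/2) * r^(N-1) / Gamma(N/2)."""
    n_dim = patterns.shape[1]
    inside = np.linalg.norm(patterns - centre, axis=1) <= radius
    surface = 2.0 * math.pi ** (n_dim / 2) * radius ** (n_dim - 1) / math.gamma(n_dim / 2)
    return int(inside.sum()) / surface

# In 2-D the "surface content" reduces to the circumference 2*pi*r.
pts = np.array([[0.0, 0.0], [1.0, 0.0], [3.0, 0.0]])
print(data_density(pts, np.array([0.0, 0.0]), 2.0))  # 2 / (4*pi), about 0.159
```
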

In our technique, clusters are explicitly optimized for their subsequent mapping onto the machine learner, rather than emerging as some implicit property of the clustering algorithm [30]. Most traditional clustering algorithms rely on some similarity measure; they may also converge merely to a local minimum [53]. Some work has been done on obtaining optimal clusters, e.g., by Chang & Lippman [11] and Srikanth et al. [56]; however, we did not aim for optimal trade-offs but rather tried to find good approximations.

In conception, the proposed strategy for feature-space decomposition has strong links to the recursive partitioning algorithms of Henrichon & Fu [25] and Friedman [19] for non-parametric classification using hyperplanes parallel to the feature axes. Similarly, we aim to partition feature spaces into subspaces and map them onto multiple machine learners.


17.4. MOEA for Partitioning

The partitioning problem as formulated in the previous section belongs to the class C category of problems in the classification scheme defined in Section 17.2.3. From an evolutionary algorithm's point of view, the problem is characterized by the following features:

• No a priori knowledge of the solution space is available. For the partitioning problem involving multiple objectives, the solution space is mostly discrete, and there may not exist any well-defined, uniformly distributed spread of solutions across the Pareto-front.

• The bounds of the objective space are not known. Neither the objective values nor any of the optimal points in the objective space are known. There exists no information regarding niches or local/global minima/maxima.

• This is an NP-hard combinatorial optimization problem, and no good polynomial-time approximation algorithm is known.

• The problem formulation given in Section 17.3 is new; no previously obtained results for the partitioning problem are available for comparison or validation.

When solving an optimization problem, most EAs need the following:

• For diversity: some information about the niches or the shape of the Pareto-front, to make the diversity-preserving mechanism effective.

• For convergence: some information about the final solutions, which usually serves as the stopping criterion. In the absence of such information, the algorithm is run for a very large number of iterations, thus wasting CPU time.

• For avoidance of local optima: information about the existence of any local minima/maxima in the Pareto-front; no such information is available.

Therefore, in this work we use an algorithm implementation which needs neither an explicit sharing mechanism nor an approximate Pareto-front to check for convergence. We choose an algorithm whose features are as follows:

• Implicitly achieves diversity by varying the selection pressure, without any knowledge of the problem domain,

• Monitors convergence (without any knowledge of the problem domain) to terminate the further evolution of generations, thus avoiding wastage of computing resources,
• Preserves known, good solutions, and
• Checks that a run has not become stuck at a local optimum, using an independently initialized population approach. This is essentially a test for (near-)global convergence.

To the best of our knowledge, the Pareto Converging Genetic Algorithm (PCGA) [41] is the only multiobjective optimization algorithm which needs no problem-dependent knowledge and monitors convergence for an unknown solution space.

17.4.1. The Algorithm

The Pareto Converging Genetic Algorithm (PCGA) used in this work is a steady-state algorithm and can be seen as an example of a (μ + 2) evolution strategy in terms of its selection mechanism. In this algorithm, individuals are compared against the total population set according to a tied Pareto-ranking scheme [18], and the population is selectively moved towards convergence by discarding the lowest-ranked individuals in each evolution step. In doing so, we require no parameters such as the size of the sub-population in tournament selection or sharing/niching parameters.

Initially, the whole population of size N is ranked and fitness is assigned by interpolating from the best individual (rank = 1) to the worst (rank ≤ N) according to some simple monotonic function. A pair of mates is chosen randomly, biased by the sizes of their roulette-wheel segments, and crossed over and/or mutated to produce offspring. The offspring are inserted into the population set according to their ranks against the whole population, and the two lowest-ranked individuals are eliminated to restore the population size to N. The process is iterated until a convergence criterion based on the rank histogram is achieved. For details of the algorithm, see Kumar & Rockett [39, 41].
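One steady-state iteration in this spirit can be sketched as follows (a simplified reading of the published description, not the authors' exact implementation; `pareto_ranks` uses a rank = 1 + number-of-dominators scheme, and all names are ours):

```python
import random
from typing import Callable, List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_ranks(objs: List[Sequence[float]]) -> List[int]:
    """Tied Pareto-ranking: rank = 1 + number of individuals dominating it."""
    return [1 + sum(dominates(q, p) for q in objs) for p in objs]

def pcga_step(pop: list, evaluate: Callable, crossover: Callable,
              mutate: Callable, rng=random) -> list:
    """One (mu + 2) steady-state step: roulette selection biased by
    rank-interpolated fitness, two offspring inserted, and the two
    lowest-ranked individuals discarded to restore the population size."""
    ranks = pareto_ranks([evaluate(ind) for ind in pop])
    fitness = [max(ranks) - r + 1 for r in ranks]   # best rank -> largest wheel segment
    pa, pb = rng.choices(pop, weights=fitness, k=2)
    child_a, child_b = crossover(pa, pb)
    pop = pop + [mutate(child_a), mutate(child_b)]
    ranks = pareto_ranks([evaluate(ind) for ind in pop])
    keep = sorted(range(len(pop)), key=lambda i: ranks[i])[:len(pop) - 2]
    return [pop[i] for i in keep]
```

With an identity `evaluate`, individuals can be objective vectors themselves, which is convenient for testing the selection logic in isolation.
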

If two individuals have the same objective vector, we lower the rank of one of the pair by one; in this way we are able to remove duplicates from the set of non-dominated solutions without loss of generality. For a meaningful comparison of two real numbers during ranking, we restrict the floating-point precision of the objective values to a few units of precision. The algorithm does not explicitly use any diversity-preserving mechanism; however, lowering the rank of an individual having an identical objective vector (at restricted precision) is in some way analogous to a sharing/niching mechanism (in objective space), which effectively controls the selection pressure and thus partly contributes to diversity (for other factors that contribute to diversity, see PCGA [41]). The algorithm has been tested on many benchmark analytic functions as well as on many complex real-world problems for producing diverse solutions along the Pareto-front; some results are reported in Kumar et al. [37, 40].

17.4.2. Chromosome Representation

To represent subspace partitions, we use variable-length individuals in which each sub-block encodes a hypersphere centre and radius. Each unit of a chromosome is a real number, and (N + 1) such units form a block, where N is the dimensionality of the pattern space. Variable-length individuals are necessitated by the fact that the number of clusters emerging from the search is unknown; the number of clusters in the (near-)optimal solution is therefore also evolved genetically, such that the number of blocks forming a chromosome represents the number of partitions of the pattern space.
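Under this encoding, a chromosome can be held as a list of blocks, each block being N centre coordinates plus a radius (an illustrative sketch; the names and bounds are our own, not the chapter's):

```python
import random
from typing import List

def random_chromosome(n_dim: int, n_blocks: int,
                      lower: List[float], upper: List[float],
                      max_radius: float, rng=random) -> List[List[float]]:
    """Variable-length chromosome: one block of (N + 1) reals per
    hypersphere -- N centre coordinates followed by a radius."""
    return [[rng.uniform(lower[d], upper[d]) for d in range(n_dim)]
            + [rng.uniform(0.0, max_radius)]
            for _ in range(n_blocks)]

chrom = random_chromosome(n_dim=6, n_blocks=3,
                          lower=[0.0] * 6, upper=[1.0] * 6, max_radius=0.5)
# 3 blocks, each of length N + 1 = 7
```
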

17.4.3. Genetic Operators

For the genetic search we employed a single-point crossover operation on the hypersphere limits and a Gaussian mutation. The crossover point is taken on the boundaries between hypersphere description records to prevent the formation of illegal chromosomes. Apart from meaningful recombination, this has the added advantage that good clusters can be retained but shuffled among solutions, which is needed to obtain (near-)optimal partitions. For mutation, we add zero-mean Gaussian noise to the centre coordinates and the radii of the hyperspheres.
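Treating a chromosome as a list of (N + 1)-real blocks, the two operators can be sketched as follows (our illustration, not the authors' code; the cut point falls only on block boundaries, so every offspring is a legal list of whole hypersphere records):

```python
import random
from typing import List, Tuple

Chromosome = List[List[float]]  # list of [centre..., radius] blocks

def block_crossover(a: Chromosome, b: Chromosome,
                    rng=random) -> Tuple[Chromosome, Chromosome]:
    """Single-point crossover restricted to block boundaries; note that the
    offspring lengths may differ from the parents', which is how the number
    of partitions itself evolves."""
    ca = rng.randint(1, max(1, len(a) - 1))
    cb = rng.randint(1, max(1, len(b) - 1))
    return a[:ca] + b[cb:], b[:cb] + a[ca:]

def gaussian_mutation(chrom: Chromosome, sigma: float = 0.05,
                      rate: float = 0.1, rng=random) -> Chromosome:
    """Zero-mean Gaussian perturbation of centre coordinates and radii."""
    return [[g + rng.gauss(0.0, sigma) if rng.random() < rate else g
             for g in block]
            for block in chrom]
```

The total number of blocks is conserved across the pair of offspring, so no genetic material is lost by recombination.
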

17.4.4. Constraints & Heuristics

We draw the constraints on the decision variables from the bounds of the pattern space. We have investigated two approaches to initializing the chromosomes: in one, the cluster centers were randomly initialized; in the other, a hypersphere was centered on a randomly selected data pattern. The second approach to seeding a chromosome proved particularly well suited to sparsely populated pattern spaces and thus significantly reduced the search effort.

The search operations are further constrained using a few heuristics. One heuristic acts on the upper bound of the radius of a hypersphere: the upper bound is divided by the square root of the dimensionality of the space. This prevents the potential inclusion of all the patterns in every partition of the feature space. Complementary to this, another heuristic limits the minimum fraction of the patterns included in a hypersphere: partitions containing less than some fraction of the total patterns are prevented from forming a separate cluster. A further heuristic acts on the maximum number of clusters forming a solution, since preventing the number of partitions from becoming arbitrarily large helps restrict the search in the space of many (partially or wholly overlapped) clusters.

In spite of the constraints and heuristics, exploring such a search space for a set of equally viable partitions of the pattern space is a complex optimization. One simplification is to look for some predetermined number of clusters. This can be useful if one has prior knowledge of the pattern space from, say, viewing the data with standard ordination techniques, or one can tune the computation to some fixed number of partitions after becoming acquainted with the nature of the solutions obtained during initial EA runs.

17.4.5. Convergence

We use intra-island rank histograms to monitor the rate of convergence within a single population, and inter-island rank histograms to combine evidence about the satisfactory convergence of a series of EA runs [41]. Together, the two types of histogram do not guarantee true convergence; however, they do help us to approximate convergence and avoid the wastage of compute resources.

Fig. 17.4. Two sets of intra-island rank histograms. The decreasing tail of (b) indicates the movement of the total population towards convergence.

Intra-island rank histogram entries are generated from the ratio of the number of individuals at a given rank in the current population to the number at that rank in the combined and re-ranked populations of the current and preceding epochs. For convergence, we are interested in the shift of the set of non-dominated solutions between epochs, hence the ratioing of the rank entries. Two typical intra-island rank histograms are shown in Figure 17.4, in which a decreasing histogram tail denotes the movement of the total population towards convergence. The rank-ratio of the bin belonging to rank unity should remain at 0.5 in an ideally converged state, which shows that the whole population in two successive epochs remains non-dominated without any reshuffling, though this alone is not an indicator of convergence (Figure 17.4(b)).
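This bookkeeping can be sketched as follows (our own reading of the description, reusing a strict dominance test; the author's exact implementation may differ):

```python
from collections import Counter
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_ranks(objs: List[Sequence[float]]) -> List[int]:
    return [1 + sum(dominates(q, p) for q in objs) for p in objs]

def intra_island_histogram(prev_objs: List[Sequence[float]],
                           curr_objs: List[Sequence[float]]) -> dict:
    """Ratio of the count of current-epoch individuals at each rank to the
    count at that rank in the merged, re-ranked union of the two epochs."""
    merged = list(prev_objs) + list(curr_objs)
    ranks = pareto_ranks(merged)
    total = Counter(ranks)
    current = Counter(ranks[len(prev_objs):])   # current individuals sit at the tail
    return {r: current.get(r, 0) / total[r] for r in sorted(total)}

# Two identical, wholly non-dominated epochs: the rank-1 bin sits at 0.5.
front = [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
print(intra_island_histogram(front, front))  # {1: 0.5}
```
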

For solving complex multiobjective optimization problems, there is a common concern whether the obtained solution is close to the true Pareto-front or not. We argue that there is always a certain inheritance of genetic material belonging to one independent run, and there may not be much appreciable evolutionary gain beyond a certain number of generations.

We run each population sample to an approximation of (intra-island) convergence; the obtained solutions are then merged across islands and compared through Pareto-ranking. The shift of the Pareto-front is monitored with an inter-island rank histogram. A scenario in which some of the non-dominated solutions of one of the contributing islands are demoted to dominated status is depicted in the inter-island rank histograms of Figure 17.5. The smaller the entry in the bin corresponding to unity rank and the wider the histogram tail, the larger is the shift in the set of best solutions and the greater the reshuffling that has taken place. The desired outcome from merging the non-dominated members of two or more islands is that none of the non-dominated solutions is downgraded to dominated status and all solutions combine to form a similarly or better sampled Pareto-front; the inter-island rank histogram of the combined solutions then shows unity in the bin corresponding to the non-dominated rank.
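The inter-island test can be sketched the same way (our illustration): merge the converged fronts and measure the fraction that survives as non-dominated, where 1.0 corresponds to the desired unity bin.

```python
from typing import List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def surviving_fraction(islands: List[List[Sequence[float]]]) -> float:
    """Fraction of the pooled island solutions that remain non-dominated
    after merging; demotions from non-dominated to dominated status lower
    this value and widen the histogram tail."""
    merged = [p for island in islands for p in island]
    nd = sum(1 for i, p in enumerate(merged)
             if not any(dominates(q, p) for j, q in enumerate(merged) if j != i))
    return nd / len(merged)

# (3, 3) from the first island is demoted by (2, 2) from the second.
print(surviving_fraction([[(1.0, 4.0), (3.0, 3.0)], [(2.0, 2.0), (4.0, 1.0)]]))  # 0.75
```
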

Fig. 17.5. Two samples of inter-island rank histograms. A shift of the Pareto-front is indicated by a tail, and improved diversity by a larger value in the bin corresponding to unity rank.


We use the PCGA, which naturally performs good sampling of the solution space and ensures population advancement towards the Pareto-front. We compute both intra-island and inter-island rank histograms to monitor convergence. From the obtained set of solutions, a small subset based on sub-ranking of objectives was picked for learning. The ANCHOR connectionist architecture [35], which was developed for integrating multiple heterogeneous learners, is particularly suitable for hierarchical learning of subspaces. For this, the Net Definition Language (NDL) [36] can be used to specify the interfaces needed for the connectionist architecture. This framework can easily be extended to include other paradigms of machine learning.

17.5. Results and Discussion

In this section, we include a few representative results as a proof of concept that the partitioning strategy proposed here works well for complex machine learning problems. More importantly, we show that an optimization problem of unknown nature in a high-dimensional objective space can be solved effectively to obtain quality solutions by proper design of the multiobjective evolutionary algorithm, without needing a priori knowledge of the solution space.

From a machine learning perspective, the efficacy of the partitioning strategy coupled with the genetic search should result in partitions that address (a few of) the following:

• Are the partitioned pattern subspaces compact?
• Do the partitions contain patterns of a single class alone?
• Are we able to exclude outliers and maximize the included patterns?
• Is the total training time over multiple partitions less than the time needed for the monolithic data?
• Are the validation errors minimized, resulting in superior generalization?

And, from a genetic optimization perspective, the efficacy of the algorithm implies obtaining diversity along the high-dimensional objective surface while ensuring (near-)optimal convergence.

For the above, we applied the partitioning strategy to a range of known synthetic problems as well as to unknown, real machine learning problems. We generated synthetic data with known structure so that the resulting partitions could be evaluated. We generated two types of data: one in which all blobs of patterns belonging to different classes were separated, and another in which they overlap. We generated many datasets by varying the following parameters: the true dimensionality of the data (six to thirty-six dimensions), the effective dimensionality (three to twelve dimensions), the separation of the blobs (well separated to just separated), the number of classes (two to four), and the degree of overlap among the blobs. For each dataset, we generated two sets of data: a training set used for partitioning, and a validation set used for validating the partitions and the machine learning results.

First, we partitioned synthetic data from two 3-dimensional Gaussian blobs embedded in a 6-dimensional space, and within this we examined two cases: one where the two blobs are just separated and the other where they overlap. For the just-separated data, a large number of equivalent solutions evolved, most of which comprised two clusters of three intrinsic dimensions, each containing data from only one of the classes, although some solutions contained exemplars from both classes. From the point of view of the EA, all non-dominated solutions are equivalent, but some may be more desirable in practice. For the overlapped Gaussian blobs, the EA produced partitions of intrinsic dimensionality 3 or 4 containing a fraction of data from the other class, both of which would be expected for this dataset. Positioning two hyperspheres on the (known) Gaussian centers and carrying out an exhaustive search for the two best hypersphere radii produced partitions comparable to the typical EA results, indicating that the EA was finding close-to-optimal clusters in both cases. This exercise was aimed at proof of principle that the genetic optimization technique works, and that the learning effort can be reduced with near-optimal partitions.

We also considered the partitioning of a four-class synthetic problem in twelve variables; here each Gaussian blob occupied three (mutually exclusive) dimensions and was just separated from the others. This proved somewhat harder than the six-dimensional problem. Most of the family of equivalent solutions produced comprised four clusters of three intrinsic dimensions, with some overlap among them. Nonetheless, we were not aiming at perfect solutions; rather, we are interested in (minimally) overlapped solutions so that, potentially, all the volume of interest can be mapped for subsequent classification. A number of Pareto-equivalent solutions, however, contained seven- or eight-dimensional hyperspheres. We experimented with other datasets as well.

Next, we consider several high-dimensional datasets taken from the UCI repository of machine learning databases [4]. We investigated the behavior of the partitioning algorithm in two modes: (i) a variable number of clusters emerging from the genetic search, and (ii) the number of clusters fixed at two, four, six or eight in the respective clustering runs, depending on a view of the data obtained with standard ordination techniques. In the following paragraphs, we briefly summarize the characteristics of the partitions that emerged from the genetic search. (A detailed description of the results obtained from land-use classification of multispectral image data can be found in Kumar & Rockett [40]; in that work, a seven-element objective vector was designed for feedforward neural learning.)

Typically, the following three types of partitions emerged from the partitioning algorithm (Figure 17.6):

(i) Type-I partitions contained members of one class only,
(ii) Type-II partitions contained roughly an equal split of members of each class, mostly in the range of a 40:60 split for two-class data,
(iii) Type-III partitions contained most members of a single class and a few, say 5%-10%, members of the other classes, and finally
(iv) outliers (non-clustered members) were not included in any partition.

The above is the general spectrum of solutions obtained by genetic optimization across a range of high-dimensional machine learning datasets. (We did not experiment with datasets of lower dimensionality.)

Fig. 17.6. Three types of clusters emerged from the partitioning algorithm. Some patterns may not be included in any hypersphere.

The majority of the partitions contained samples of a single class, so once inclusion within a hypersphere was established, labeling an unknown datum was trivial. This category of clusters requires no post-partitioning learning effort, since to label an unknown point it is sufficient to determine in which cluster it is included. The effectiveness of such labeling was confirmed by the fact that in all the several thousand Pareto-optimal clusters we examined, we did not find a single case where a cluster containing one class of training data was subsequently found to include a member of the other class from the test data. Thus, a pattern belonging to such a cluster is implicitly labeled without ambiguity.

For the second category of partitions, mapping the clustered data onto a certain type of machine learning algorithm, e.g. a connectionist architecture, is fairly straightforward, since the roughly equal numbers of exemplars from each class together with the reduced size of the subset to be learned both simplify training. This was also confirmed by the lower validation errors obtained for most such clusters. For learning with a nearest-neighbor (NN) type of classifier, classification of an unknown point required far fewer nearest-neighbor distance calculations than classification based on the whole training set. Thus, in k-NN classification, our partitioned dataset gave error rates which were not degraded relative to a monolithic classifier, but the time to compute a label was reduced significantly.

For the third category of cluster, some machine learning algorithms are well known to have problems learning such an unbalanced dataset. For example, a feedforward network could be used, but special measures are required to accommodate the unbalanced training set [2]. Alternatively, a k-NN classifier could be employed to decide the final classification within a hypersphere, with less computation than would be required for nearest-neighbor classification on the whole training set, provided at least k members of the minority class are included; otherwise the classification effectively degenerates to the first category. As a further option, a small fraction of the examples within a hypersphere could be ignored, at the cost of a minute increase in error rate, by treating all included patterns as belonging to the grossly dominant class.

The objective was not to include all training data within the clusters, so some data points were excluded from the solutions; these are potential outliers. The presence of outliers in a training set is known to pose problems, and they can degrade performance. In our strategy, isolated outliers may well be discarded, since the relevant objective tries only to maximize the number of patterns utilized. Similarly, clusters of outliers caused by some systematic measurement failure are likely to generate their own hypersphere, which may well be significantly separated from other patterns with the same class label.


The strategy adopted in this work also supports the concept of ensemble-based approaches. In an ensemble approach, we suggest that only those clusters in which more than one class is represented need to be multiply mapped onto suitable classifiers. Thus, the principal advantage of our partitioning approach is that only those patterns which lie near the decision boundaries warrant learning effort. We also propose a way of dealing with multiple instances of patterns (which is possible because one of the objectives directly promotes some overlap): the clusters can be assigned priorities based on the inclusion of patterns of a single class, and if a pattern is included in more than one cluster having different priorities, it can safely be assigned to the class of the higher-priority cluster. These additions may well enhance the accuracy of ensemble-based approaches and the functional simplicity of modular systems.

In general, it is difficult to compare the overall performance of machine learning algorithms in absolute terms of misclassification errors. Most learning algorithms give different error rates with different sets of parameters on the same dataset. Therefore, we do not include a quantitative comparison of error rates. In our work, we observed that, in general, the classification rate was not reduced, which is highly significant. In most cases classification accuracy was in fact improved, though the improvement may not be statistically significant.

Another major advantage of this strategy is that the total computational effort can be divided into off-line and on-line efforts. The genetic search accounts for most of the computation and is performed off-line. The off-line learning of the individual partitions is reduced. Nonetheless, each of the partitions can be multiply learnt by different algorithms and is assigned a confidence label. During recall, which is on-line, this information is used, and an unknown datum is labeled with higher confidence and much reduced computational effort. For example, if the unknown pattern belongs to the first category of cluster, the pattern is unambiguously labeled just by testing its inclusion membership in constant time. Thus, the computational effort for training and recall is significantly reduced, and the generalization ability is improved too.
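A minimal sketch of the on-line recall step, assuming (as in the chapter) that clusters are hyperspheres; the function and variable names below are ours, not the chapter's.

```python
# Illustrative recall procedure: hypersphere partitions are learnt off-line;
# on-line recall first checks inclusion membership, a constant-time distance
# test per sphere. Pure clusters label immediately; mixed clusters delegate
# to the classifier(s) trained on them.
import math

def inside(sphere, x):
    centre, radius = sphere
    return math.dist(centre, x) <= radius

def recall(x, pure_spheres, mixed_spheres):
    # Category 1: pure clusters label the datum by inclusion alone.
    for sphere, label in pure_spheres:
        if inside(sphere, x):
            return label
    # Category 2: mixed clusters delegate to their trained classifier.
    for sphere, clf in mixed_spheres:
        if inside(sphere, x):
            return clf(x)
    return None  # excluded from all clusters: treated as an outlier

pure = [(((0.0, 0.0), 1.0), "A")]
mixed = [(((3.0, 0.0), 1.0), lambda x: "B")]
print(recall((0.2, 0.1), pure, mixed))  # labeled "A" by inclusion alone
```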

17.6. Summary & Future Work

We have discussed the issues involved in solving difficult machine learning problems. High-dimensional data from real-world applications is mostly noisy, containing distorted and overlapping patterns. Scaling of machine


420 Rajeev Kumar

learning algorithms and models, and achieving good generalization of learnt mappings for real-world data, are challenging tasks. As a result, modular systems and ensemble-based approaches have emerged as potential solutions for complex domains of intelligent models. Whereas modularity addresses the drawback of scalability, ensemble-based approaches emphasize improving accuracy for superior generalization.

In this work, we have presented a generic framework for solving complex machine learning problems by addressing both modularity, for simplification of learning complexity, and ensembles, for multiple learning efforts to improve prediction accuracy. We identified a set of objectives and formulated the partitioning problem, without needing any application-specific knowledge, as a multi-criteria optimization problem involving competitive and conflicting objectives. Such a partitioning problem is hard.

We used a MOEA for partitioning and adopted an algorithm which produces diverse solutions and monitors convergence without needing a priori knowledge of the partitions emerging from the optimization criteria. We also employed a distributed version of the algorithm and generated solutions from multiple tribes to ensure convergence. However, we did not aim for optimal trade-offs but rather tried to find a good approximation, i.e., a set of solutions whose objective values are hopefully not too far from the (unknown) optimal objective values. We tested this approach first on synthetic data of known characteristics and varying complexities, to assess the efficacy of our approach and the MOEA implementation. We observed that the implementation used in this work was able to find partitions with the desired trade-offs. Then, we tested our approach on many other datasets taken from real applications.

The partitioning strategy adopted here is a divide-and-conquer strategy for scalability, explicitly optimizing the patterns for subsequent mapping onto multiple learners, and was found to reduce the learning complexity. The other merit of this work is to have multiple views of a partition, which supports the concept of ensemble-based approaches to improve the generalization ability. We have observed, while working on many datasets, that data partitions which contain only one class of data in a single partition do not require any further processing and are explicitly labeled without any ambiguity. In a partitioned ensemble-based approach, we suggest that only those clusters in which more than one class is represented need to be multiply mapped onto suitable classifiers. Thus, the principal advantage of our partitioning approach is that only those patterns which lie near decision boundaries warrant the learning effort, possibly multiple efforts for


enhanced accuracy. Thus, this evolutionary approach, coupled with need-based multiple learning efforts, simplifies the functional mapping, enhances the accuracy and offers better generalization.

There can be many possible extensions of this work. For example, we may use hyperellipsoids instead of hyperspheres. In this work, we have used hyperspheres since their chromosomal representation is quite compact compared to other geometric primitives. Another extension of the work may be to explore the already partitioned pattern space for further partitioning; such recursively partitioned pattern spaces can be directly mapped to hierarchical classifiers.
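The compactness argument can be made concrete with a rough gene count per primitive in d dimensions. The counting below is ours, not the chapter's: a hypersphere needs a centre plus one radius, an axis-aligned hyperellipsoid a centre plus one semi-axis per dimension, and a general hyperellipsoid additionally the entries of a symmetric shape matrix.

```python
# Back-of-the-envelope comparison of chromosome lengths for the geometric
# primitives mentioned above, in d dimensions (illustrative counting only).
def genes_hypersphere(d):      # centre (d genes) + radius (1 gene)
    return d + 1

def genes_axis_ellipsoid(d):   # centre (d) + one semi-axis per dimension (d)
    return 2 * d

def genes_full_ellipsoid(d):   # centre (d) + symmetric d x d shape matrix
    return d + d * (d + 1) // 2

for d in (2, 10, 50):
    print(d, genes_hypersphere(d), genes_axis_ellipsoid(d), genes_full_ellipsoid(d))
```

For high-dimensional pattern spaces the quadratic growth of the general hyperellipsoid encoding is what makes the hypersphere attractive in a genetic search.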

Acknowledgments

The author gratefully acknowledges discussions with Peter Rockett and Partha Chakrabarti during different stages of this work. A part of the work was supported by a project grant from the Ministry of Human Resource Development, Government of India.

References

1. K. Ali and M. Pazzani. Error reduction through learning multiple descriptors. Machine Learning, 24(3): 173-202, 1996.

2. R. Anand, K. G. Mehrotra, C. K. Mohan, and S. Ranka. An improved algorithm for neural network classification of imbalanced training sets. IEEE Trans. Neural Networks, 4(6): 962-969, 1993.

3. A. Atiya and C. Ji. How initial conditions affect generalization performance in large networks. IEEE Trans. Neural Networks, 8(2): 448-451, 1997.

4. C. L. Blake and C. J. Merz. UCI Repository of machine learning databases [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science, 1998.

5. P. A. N. Bosman and D. Thierens. The balance between proximity and diversity in multiobjective evolutionary algorithms. IEEE Trans. Evolutionary Computation, 7(2): 174-188, 2003.

6. L. Breiman. Stacked regressions. Machine Learning, 24(1): 49-64, 1996.

7. L. Breiman. Bagging predictors. Machine Learning, 24(2): 123-140, 1996.

8. L. Breiman, J. H. Friedman, R. A. Olshen, and C. J. Stone. Classification and Regression Trees, 1984. New York, NY: Chapman & Hall.

9. W. Buntine. Learning classification trees. Statistics & Computing, 2: 63-73, 1992.

10. P. Chan, S. Stolfo, and D. Wolpert. Working Notes AAAI Workshop Integrating Multiple Learned Models for Improving and Scaling Machine Learning Algorithms, 1996. Menlo Park, CA: AAAI Press.

11. E. I. Chang and R. P. Lippmann. Using genetic algorithms to improve pattern classification performance. In Advances in Neural Information Processing Systems, R. P. Lippmann, J. E. Moody and D. S. Touretzky, Eds., 3: 797-803, 1991. San Mateo, CA: Morgan Kaufmann.

12. C. A. C. Coello. List of References on Evolutionary Multiobjective Optimization. [http://www.lania.mx/~ccoello/EMOO/EMOObib.html].

13. C. A. C. Coello, D. A. Van Veldhuizen, and G. B. Lamont. Evolutionary Algorithms for Solving Multi-Objective Problems, 2002. Boston, MA: Kluwer.

14. K. Deb. Multi-objective genetic algorithms: problem difficulties and construction of test problems. Evolutionary Computation, 7(3): 205-230, 1999.

15. K. Deb. Multiobjective Optimization Using Evolutionary Algorithms, 2001. Chichester, UK: Wiley.

16. K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evolutionary Computation, 6(2): 182-197, 2002.

17. R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2nd Edition, 2001. New York, NY: Wiley.

18. C. M. Fonseca and P. J. Fleming. Multiobjective optimization and multiple constraint handling with evolutionary algorithms - Part I: a unified formulation. IEEE Transactions on Systems, Man and Cybernetics - Part A: Systems and Humans, 28(1): 26-37, 1998.

19. J. H. Friedman. A recursive partitioning decision rule for nonparametric classification. IEEE Trans. Computers, 26(4): 404-408, 1977.

20. M. R. Garey and D. S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness, 1979. San Francisco, CA: Freeman.

21. S. Geman, E. Bienenstock, and R. Doursat. Neural networks and the bias/variance dilemma. Neural Computation, 4(1): 1-58, 1992.

22. L. Hansen and P. Salamon. Neural network ensembles. IEEE Trans. Pattern Analysis & Machine Intelligence, 12(10): 993-1001, 1990.

23. M. P. Hansen and A. Jaszkiewicz. Evaluating the quality of approximations of the nondominated set. Tech. Rep. IMM-REP-1998-7, Inst. of Mathematical Modeling, Tech. Univ. of Denmark, 1998.

24. S. Haykin. Neural Networks - A Comprehensive Foundation, Second Edition, 1999. Englewood Cliffs, NJ: Prentice Hall.

25. E. G. Henrichon and K. S. Fu. A nonparametric partitioning procedure for pattern classification. IEEE Trans. Computers, 18(7): 614-624, 1969.

26. G. E. Hinton. Connectionist learning procedures. Artificial Intelligence, 40: 185-234, 1989.

27. D. Hochbaum (Ed.). Approximation Algorithms for NP-Hard Problems, 1997. Boston, MA: PWS.

28. R. A. Jacobs. Methods for combining experts' probability assessments. Neural Computation, 7: 450-463, 1995.

29. R. A. Jacobs and M. I. Jordan. Learning piecewise control strategies in a modular neural network architecture. IEEE Trans. Systems, Man & Cybernetics, 23: 337-345, 1993.

30. A. K. Jain and R. C. Dubes. Algorithms for Clustering Data, 1988. Englewood Cliffs, NJ: Prentice-Hall.

31. M. I. Jordan and R. A. Jacobs. Hierarchical mixtures of experts and the EM algorithm. Neural Computation, 6(2): 181-214, 1994.

32. J. D. Knowles and D. W. Corne. Approximating the nondominated front using the Pareto Archived Evolution Strategy. Evolutionary Computation, 8(2): 149-172, 2000.

33. J. D. Knowles and D. W. Corne. On metrics for comparing nondominated sets. In Proc. Congress on Evolutionary Computation (CEC-02), 711-716, 2002. Piscataway, NJ: IEEE Press.

34. R. Kumar. On generalization of machine learning with neural-evolutionary computations. In Proc. 3rd Int. Conf. Computational Intelligence & Multimedia Applications (ICCIMA-99), 112-116, 1999. Los Alamitos, CA: IEEE Computer Society Press.

35. R. Kumar. ANCHOR - A connectionist architecture for partitioning feature spaces and hierarchical nesting of neural nets. Int. Journal Artificial Intelligence Tools, 9(3): 397-416, 2000.

36. R. Kumar. A neural network compiler system for hierarchical organization. ACM SIGPLAN Notices, 36(2): 26-36, 2001.

37. R. Kumar. Multicriteria network design using distributed evolutionary algorithm. In Proc. Int. Conf. High Performance Computing (HiPC), LNCS 2913: 343-352, 2003. Berlin Heidelberg: Springer-Verlag.

38. R. Kumar, W. C. Chen, and P. I. Rockett. Bayesian labeling of image corner features using a grey-level corner model with a bootstrapped modular neural network. In Proc. IEE Int. Conf. Artificial Neural Networks, 440: 82-97, 1997. London: IEE Conference Publication.

39. R. Kumar and P. I. Rockett. Assessing the convergence of rank-based multiobjective genetic algorithms. In Proc. IEE-IEEE 2nd Int. Conf. Genetic Algorithms in Engineering Systems: Innovations & Applications, 446: 19-23, 1997. London: IEE Conference Publication.

40. R. Kumar and P. I. Rockett. Multiobjective genetic algorithm partitioning for hierarchical learning of high-dimensional pattern spaces: a learning-follows-decomposition strategy. IEEE Trans. Neural Networks, 9(5): 822-830, 1998.

41. R. Kumar and P. I. Rockett. Improved sampling of the Pareto-front in multiobjective genetic optimizations by steady-state evolution: a Pareto converging genetic algorithm. Evolutionary Computation, 10(3): 283-314, 2002.

42. P. Langley. Elements of Machine Learning, 1996. Morgan Kaufmann.

43. M. Laumanns, L. Thiele, K. Deb and E. Zitzler. Combining convergence and diversity in evolutionary multiobjective optimization. Evolutionary Computation, 10(3): 263-282, 2002.

44. T. M. Mitchell. Machine Learning, 1997. New York, NY: McGraw-Hill.

45. C. A. Murthy and N. Chowdhury. In search of optimal clusters using genetic algorithms. Pattern Recognition Letters, 17(8): 825-832, 1996.

46. E. Oja. Neural networks, principal components and subspaces. Int. J. Neural Systems, 1(1): 61-68, 1989.

47. D. C. Plaut and G. E. Hinton. Learning sets of filters using back-propagation. Computer Speech & Language, 2: 35-61, 1987.


48. R. C. Purshouse and P. J. Fleming. Elitism, sharing and ranking choices in evolutionary multi-criterion optimization. Research Report No. 815, Dept. Automatic Control & Systems Engineering, University of Sheffield, 2002.

49. J. R. Quinlan. C4.5: Programs for Machine Learning, 1993. San Mateo, CA: Morgan Kaufmann.

50. R. Reed. Pruning algorithms - a survey. IEEE Trans. Neural Networks, 4(5): 740-747, 1993.

51. G. Rudolph and A. Agapie. Convergence properties of some multiobjective evolutionary algorithms. In Proc. Congress of Evolutionary Computation (CEC-00), 1010-1016, 2000. Piscataway, NJ: IEEE Press.

52. C. Schaffer. Overfitting avoidance as bias. Machine Learning, 10(2): 153-178, 1993.

53. S. Z. Selim and M. A. Ismail. K-means-type algorithms: a generalized convergence theorem and characterization of local optimality. IEEE Trans. Pattern Analysis & Machine Intelligence, 6(1): 81-87, 1984.

54. A. J. C. Sharkey. Combining Artificial Neural Nets: Ensemble and Modular Multi-Net Systems. Perspectives in Neural Computing. London: Springer-Verlag, 1999.

55. N. E. Sharkey and A. J. C. Sharkey. An analysis of catastrophic interference. Connection Science, 7(3-4): 301-329, 1995.

56. R. Srikanth et al. A variable-length genetic algorithm for clustering and classification. Pattern Recognition Letters, 16(8): 789-800, 1995.

57. N. Srinivas and K. Deb. Multiobjective optimization using nondominated sorting in genetic algorithms. Evolutionary Computation, 2: 221-248, 1994.

58. R. S. Sutton. Two problems with backpropagation and other steepest-descent learning procedures for networks. In Proc. Annual Conf. Cognitive Science Society, 828-831, 1986. Hillsdale: Lawrence Erlbaum.

59. K. C. Tan, T. H. Lee, and E. F. Khor. Evolutionary algorithms for multi-objective optimization: performance assessment and comparisons. In Proc. Congress on Evolutionary Computation (CEC-01), 979-986, 2001. Piscataway, NJ: IEEE Press.

60. A. Toffolo and E. Benini. Genetic diversity as an objective in multiobjective evolutionary algorithms. Evolutionary Computation, 11(2): 151-167, 2003.

61. A. Waibel, H. Sawai, and K. Shikano. Modularity and scaling in large phonemic neural networks. IEEE Trans. Acoustics, Speech & Signal Processing, 37(12): 1888-1897, 1989.

62. A. S. Weigend. On overfitting and the effective number of hidden units. In Proc. 1993 Connectionist Models Summer School, M. C. Mozer, P. Smolensky, D. S. Touretzky, J. L. Elman, and A. S. Weigend, Eds., 335-342, 1994. Hillsdale, NJ: Lawrence Erlbaum.

63. D. H. Wolpert. Stacked generalization. Neural Networks, 5(2): 241-259, 1992.

64. D. H. Wolpert and W. G. Macready. No free lunch theorems for optimization. IEEE Trans. Evolutionary Computation, 1(1): 67-82, 1997.

65. E. Zitzler, M. Laumanns, and L. Thiele. SPEA2: Improving the Strength Pareto Evolutionary Algorithm. In Proc. Evolutionary Methods for Design, Optimization and Control with Applications to Industrial Problems (EUROGEN), 2001.

66. E. Zitzler and L. Thiele. Multiobjective evolutionary algorithms: a comparative case study and the Strength Pareto approach. IEEE Trans. Evolutionary Computation, 3: 257-271, 1999.

67. E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. Grunert da Fonseca. Performance assessment of multiobjective optimizers: an analysis and review. IEEE Trans. Evolutionary Computation, 7(2): 117-132, 2003.


CHAPTER 18

GENERALIZED ANALYSIS OF PROMOTERS: A METHOD FOR DNA SEQUENCE DESCRIPTION

R. Romero Zaliz(a), I. Zwir(b) and E. Ruspini(c)

(a) Department of Computer Science, Facultad de Ciencias Exactas y Naturales, Universidad de Buenos Aires, Buenos Aires, Argentina. E-mail: rromero@dc.uba.ar

(b) Department of Molecular Microbiology, Washington University School of Medicine, St. Louis, Missouri, U.S.A. Email: [email protected]

(c) Artificial Intelligence Center, SRI International, Menlo Park, California, U.S.A. E-mail: [email protected]

Recent advances in the accessibility of databases containing representations of complex objects, exemplified by repositories of time-series data, information about biological macromolecules, or knowledge about metabolic pathways, have not been matched by the availability of tools that facilitate the retrieval of objects of particular interest while aiding to understand their structure and relations. In applications such as the analysis of DNA sequences, on the other hand, requirements to retrieve objects on the basis of qualitative characteristics are poorly met by descriptions that emphasize precision and detail rather than structural features.

This chapter presents a method for the identification of interesting qualitative features in biological sequences. Our approach relies on a generalized clustering methodology, where the features being sought correspond to the solutions of a multivariable, multiobjective optimization

Corresponding author. Currently at Departamento de Ciencias de la Computación e I.A., E.T.S. Ingeniería Informática, C/ Daniel Saucedo Aranda s/n, 18071 Granada, España. Tel: (34) 958-240469, Email: [email protected].



problem and generally correspond to fuzzy subsets of the object being represented. Foremost among the optimization objectives being considered are measures of the degree by which features resemble prototypical structures deemed to be interesting by database users. Other objectives include feature distance and, in some cases, performance criteria related to domain-specific constraints.

Genetic-algorithm methods are employed to solve the multiobjective optimization problem. These optimization algorithms discover candidate features as subsets of the object being described that lie in the set of all Pareto-optimal solutions of that problem. These candidate features are then inter-related employing domain-specific relations of interest to the end users.

We present results of the application of a method termed Generalized Analysis of Promoters (GAP) to identify one of the most important factors involved in the gene regulation problem in bacteria, which is crucial for detecting regulatory behaviors or genetic pathways as well as gene transcription: the RNA polymerase motif. The RNA polymerase or promoter motif presents vague submotifs linked by different distances, thus making its recognition in DNA sequences difficult. Moreover, multiple promoter motifs can be present in the same regulatory regions, and all of them can be potential candidates until experimental mutagenesis is performed. GAP is available for public use at http://soar-tools.wustl.edu.

18.1. Introduction

One of the big challenges of the post-genomic era is determining when, where and for how long genes are turned on or off [4]. Gene expression is determined by protein-protein interactions among regulatory proteins and with RNA polymerase, and by protein-DNA interactions of these trans-acting factors with cis-acting DNA sequences in the promoters of regulated genes [22,7]. Therefore, identifying these protein-DNA interactions, by means of those DNA motifs that characterize the regulatory factors that operate in the transcription of a gene [12,23], becomes crucial for determining which genes participate in a regulation process, how they behave and how they are connected to build genetic networks. The RNA polymerase or promoter is an enzyme that transcribes a gene or recruits other regulatory factors to interact with it, producing cooperative regulations [22]. Different computational methods have been applied to discover promoter motifs or patterns [9,11,15,2,12]. However, most of them have failed to provide accurate predictions in prokaryotic promoters because of the variability of the pattern, which comprises more than one vague submotif and variable distances between them. Moreover, multiple occurrences of promoters in the same regulatory region of one gene can be found (e.g. different promoters can be used for gene


activation and repression, or can interact with different regulatory factors from the same regulatory pathway [19,17]).

This paper presents a method termed Generalized Analysis of Promoters (GAP), which applies generalized clustering techniques [27,35] to the discovery of qualitative features in complex biological sequences, particularly multiple promoters in bacterial genomes. The motivation for the development of this methodology is provided by requirements to search and interpret databases containing representations of this type of object in terms that are close to the needs and experience of the users of those data-based descriptions. These qualitative features include both interesting substructures and interesting relations between those structures, where the notion of interestingness is provided by domain experts by means of abstract qualitative models or learned from available databases. The GAP method represents promoter features as fuzzy logic expressions with fuzzy predicates, whose membership functions are learned from probabilistic distributions [28,21,36]. The proposed method takes advantage of a newly developed Multi-Objective Scatter Search (MOSS) algorithm to identify multiple promoter occurrences within genomic regulatory regions by optimizing multiple criteria that those features that describe promoters should satisfy. This methodology formalizes previous attempts to produce exhaustive searches of promoters [12], most of which emphasize the processing of detailed system measurements rather than that of qualitative features of direct meaning to users (called perceptions by Zadeh) [32].

Therefore, this chapter is organized as follows: Section 18.2 describes the generalized clustering framework; Section 18.3 explains the problem of discovering and describing bacterial promoters; Section 18.4 applies the GAP method to the promoter discovery problem in the Escherichia coli (E. coli) genome; Section 18.5 shows the results obtained by the proposed method and their evaluation; and Section 18.6 summarizes the concluding remarks.

18.2. Generalized Clustering

The method presented in this paper belongs to a family of techniques for the discovery of interesting structures in datasets by classification of their points into a finite number of fuzzy subsets, or fuzzy clustering. Fuzzy clustering methods were introduced by Ruspini [25] to provide a richer representation scheme, based on a flexible notion of partition, for the summarization of dataset structure, and to take advantage of the ability of


continuous-analysis techniques to express and treat classification problems in a formal manner.

In Ruspini's original formulation, clustering was cast as a continuous-variable optimization problem over the space of fuzzy partitions of the dataset. This formulation of clustering as an optimization problem has been largely retained in various extensions of the approach, which differ primarily in the nature of the functionals being optimized and in the constraints that the partition must satisfy [3].

The original approach proposed by Ruspini, however, focused on the determination of the clustering as a whole, i.e., a family of fuzzy subsets of the dataset providing a disjoint, exhaustive partition of the set into interesting structures. Recent developments, however, have emphasized the determination of individual clusters as fuzzy subsets having certain optimal properties. From this perspective, a fuzzy clustering is a collection of optimal fuzzy clusters (each cluster is optimal in some sense and the partition satisfies certain conditions) rather than an optimal partition (the partition, as a whole, minimizes some predefined functional defining classification quality). Redirecting the focus of the clustering process to the isolation of individual subsets having certain desirable properties also provides a better foundation for the direct characterization of interesting structure, while freeing the clustering process from the requirement that clusters be disjoint and that partitions be exhaustive.

In the context of image-processing applications, for example, features may correspond to certain interesting prototypical shapes. In these applications not every image element may belong to an interesting feature, while some points might belong to more than one cluster (e.g., the intersection of two linear structures). It was, indeed, in the context of image-processing applications that Krishnapuram and Keller [14] reformulated the fuzzy clustering problem so as to permit the sequential isolation of clusters. This methodology, called possibilistic clustering, does not rely, like previous approaches, on prior knowledge of the number of clusters, while permitting full advantage to be taken of clustering methods based on the idea of a prototype.

Prototype-based classification methods [3] are based on the idea that a dataset can be represented, in a compact manner, by a number of prototypical points. The well-known fuzzy c-means method of Bezdek, the earliest fuzzy-clustering approach exploiting this idea, seeks to describe a dataset by a number of prototypical points lying in the same domain as the members of that dataset. Extensions of this basic idea based on a


generalization of the notion of prototypical structure in a variety of ways (e.g., as line or curve segments in some Euclidean space) are the basis for methods that seek to represent datasets in terms of structures that have been predefined as being of particular interest to those seeking to understand the underlying physical systems being studied. Generally speaking, however, these methods require that prototypical structures belong to certain restricted families of objects so as to exploit their structural properties (e.g., the linear structure of line segments or hyperplane patches).

The generalized clustering methodology presented in this paper belongs to this type of approach, extending it by consideration of arbitrary definitions of interesting structures provided by users by means of a family of parameterized models M = {Mα} and a set of relations between them [26,35]. In addition to a variety of geometric structures, these models may also be described by means of structures (e.g., neural networks) learned from significant examples of the features being defined, or in terms of very general constraints that features might satisfy to some degree (soft or fuzzy constraints). As is the case with possibilistic clustering methods, our approach is based on the formulation of the qualitative-feature identification problem in terms of the optimization of a continuous functional Q(F, Mα) that measures the degree of matching between a fuzzy subset F of the dataset and some instantiation Mα of the family of interesting models [27].

Our approach recognizes, however, that simple reliance on optimization of a single performance index Q would typically result in the generation of a large number of features with small extent and poor generalization, as it is usually easier to match smaller subsets of the dataset than significant portions of it. For this reason, it is also necessary to consider, in addition to measures Q of representation quality, additional criteria S gauging the size of the structure being represented. In addition, it may also be necessary to consider application-specific criteria introduced to assure that the resulting features are valid and meaningful (e.g., constraints preventing selective picking of sample points so that they lie, for example, close to a line in sample space).

This multiobjective problem might be treated by aggregating the multiple measures of feature desirability into a global measure of cluster quality. A problem with this type of approach, which is close in spirit to minimum description length methods [24], is the requirement to provide a priori relative weights for each one of the objectives being aggregated. It should be clear that assignment of larger weight to measures Q of representation quality would lead to small features with higher degrees of matching to


models in the prototype families while, conversely, assigning higher weights to measures S of cluster extent would tend to produce larger clusters, albeit with poor modeling ability. Ideally, a family of optimization problems, each similar in character to the others but with different weights assigned to each of the aggregated objectives, should be solved so as to produce a full spectrum of candidate clusters.

Rather than following such a path, involving the solution of multiple problems, our approach relies instead on a reformulation of the generalized clustering problem as a multiobjective optimization problem involving several measures of cluster desirability [27]. In this formulation, subsets of the dataset of potential interest are locally optimal in the Pareto sense, i.e., they are locally nondominated solutions of the optimization problem(h). Locally nondominated solutions of a multiobjective optimization problem are those points in feature space such that none of their neighbors has better objective values for all objectives while being strictly superior in at least one of them (i.e., a better value, for a neighbor, of some objective implies a lower value of another). The set of these solutions is called the local Pareto-optimal or local effective frontier. We employ a multiobjective genetic algorithm (MGA) [27] based on an extension of methods originally proposed by Marti and Laguna [18,8] to solve this problem. This method is a particularly attractive tool for solving such complex optimization problems because of its generality and its capability, stemming from the application of multimodal optimization procedures, to isolate local optima.
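The nondomination test underlying this definition can be stated compactly. The sketch below is illustrative only: it assumes all objectives are to be maximized and ignores the locality (neighborhood) aspect, checking global nondominance over a finite candidate set.

```python
# Minimal Pareto-dominance sketch for maximization problems.
def dominates(a, b):
    """a dominates b if a is >= b in every objective and > in at least one."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def nondominated(points):
    # Keep the points that no other candidate dominates.
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]

candidates = [(0.9, 0.2), (0.5, 0.5), (0.2, 0.9), (0.4, 0.4)]
print(nondominated(candidates))  # (0.4, 0.4) is dominated by (0.5, 0.5)
```

In the generalized clustering setting the objective tuples would be values such as (Q, S), the matching quality and extent of a candidate cluster.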

18.3. Problem: Discovering Promoters in DNA Sequences

Biological sequences, such as DNA or protein sequences, are a good example of the type of complex objects that may be described in terms of meaningful structural patterns. Availability of tools to discover these structures and to annotate the sequences on the basis of those discoveries would greatly improve the usefulness of these repositories, which currently rely on methods developed on the basis of computational efficiency and representation accuracy rather than on structural and functional properties deemed to be important by molecular biologists.

An important example of biological sequences is the prokaryotic promoter data gathered and analyzed by many compilations10,9,16, which reveal the presence of two well conserved sequences or submotifs separated by variable distances, and a less conserved sequence. The variability of the distance between submotifs and their fuzziness, in the sense that they present several mismatches, hinder the existence of a clear model of prokaryotic core promoters. The most representative promoters in E. coli (i.e., those recognized by σ70 subunits) are described by the following conserved patterns:

h The notions of proximity and neighborhood in feature space are application dependent.

Generalized Analysis of Promoters: A Method for DNA Sequence Description 433

(1) TTGACA: This pattern is a hexanucleotide conserved sequence whose middle nucleotide is located approximately 35 base pairs upstream of the transcription start site. The consensus sequence for this pattern is TTGACA and the nucleotides reported in 16 follow the distribution T69T79G61A56C54A54, where, for instance, the first T is the most frequent nucleotide in the first position of the pattern and is present in 69% of the cases. This pattern is often called the -35 region.

(2) TATAAT: This pattern is also a hexanucleotide conserved sequence, whose middle nucleotide is located approximately 10 base pairs upstream of the transcription start site. The consensus sequence is TATAAT and the nucleotide distribution in this pattern is T77A76T60A61A56T82. It is often called the -10 region16.

(3) CAP Signal: In general, the CAP Signal is composed of a pyrimidine (C or T) followed by a purine (A or G). This signal constitutes the transcription start site (TSS) of a gene.

(4) Distance(TTGACA, TATAAT): The distance between the TTGACA and TATAAT consensus submotifs follows a data distribution between 15 and 21 base pairs. This distance is critical in holding the two sites at the appropriate separation for the geometry of RNA polymerase10.
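As an illustration of how these four signals can be combined, the following sketch (our own illustration, not the chapter's GAP method, which uses fuzzy models and MOSS) enumerates candidate (-35, -10) box pairs whose spacer lies in the 15-21 bp range and scores each box by the per-position consensus frequencies quoted above; all function names are assumptions.

```python
# Hypothetical sketch: score candidate promoter box pairs against the
# consensus frequencies quoted in the text (not the GAP method itself).

# Per-position frequency of the consensus base, -35 box (TTGACA).
BOX35 = list(zip("TTGACA", [0.69, 0.79, 0.61, 0.56, 0.54, 0.54]))
# Per-position frequency of the consensus base, -10 box (TATAAT).
BOX10 = list(zip("TATAAT", [0.77, 0.76, 0.60, 0.61, 0.56, 0.82]))

def score_box(hexamer, profile):
    """Sum the consensus frequency at each position where the hexamer
    matches the consensus base (zero contribution on a mismatch)."""
    return sum(f for (base, f), b in zip(profile, hexamer) if b == base)

def candidate_promoters(seq, min_gap=15, max_gap=21):
    """Enumerate (-35 start, -10 start, score) triples whose spacer length
    lies in [min_gap, max_gap], best-scoring first (0-based positions)."""
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        for gap in range(min_gap, max_gap + 1):
            j = i + 6 + gap
            if j + 6 > len(seq):
                break
            hits.append((i, j, score_box(seq[i:i+6], BOX35)
                               + score_box(seq[j:j+6], BOX10)))
    return sorted(hits, key=lambda h: -h[2])
```

On a toy sequence containing an exact TTGACA and TATAAT separated by 17 bp, the pair of true box positions receives the highest score.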

The identification of the aforementioned RNA polymerase sites, or promoters, becomes crucial for detecting gene activation or repression through the way in which such promoters interact with different regulatory proteins (e.g., overlapping sites suggest repression, and distances of approximately 40 base pairs suggest typical activation). Moreover, combining the promoter sites with other regulatory sites37 can reveal different types of regulation: RNA polymerase acting alone, RNA polymerase recruiting another regulatory protein, or cooperative regulation among more than one regulator22. Different methods have been used to identify promoters30,15,2,9, but several failed to perform accurate predictions because of their lack of flexibility, by using crisp instead of fuzzy models for the submotifs (e.g., TATAAT or TTGACA31), or by restricting distances between submotifs to fixed values (e.g., 17 base


434 R. Romero Zaliz et al.

pairs12). The vagueness of the compound promoter motifs and the uncertainty of identifying which of the predicted sites correspond to a functional promoter can be completely resolved only by performing mutagenesis experiments22. Thus, more accurate and interpretable predictions would be useful in order to reduce experimental costs and ease the researchers' work.

18.4. Biological Sequence Description Methods

In this paper we present results of the application of GAP to the discovery of interesting qualitative features in DNA sequences, based on the ideas discussed in Section 18.2. The notion of interesting feature is formally defined by means of a family of parameterized models M = {Ma} specified by domain experts27 who are interested in finding patterns such as epoch descriptors of individual or multiple DNA sequences. These idealized versions of prototypical models are the basis for a characterization of clusters as cohesive sets that is more general than their customary interpretation as "subsets of close points." To address the promoter prediction problem we take advantage of the ability to represent imprecise and incomplete motifs, the flexibility and interpretability of fuzzy set representations, and the ability of multi-objective genetic algorithms to obtain optimal solutions under different criteria.

Our proposed method GAP represents each promoter submotif (i.e., the -10 and -35 regions and the distance that separates them) as fuzzy models, whose membership functions are learned from data distributions13,21. In addition, as a generalized clustering method, GAP considers the quality of matching with each promoter submotif model (Q), as well as the size of the promoter extent (S), measured by the distance between submotifs, as the multiple objectives to be optimized. To do so, we used a Multi-objective Scatter Search (MOSS) optimization algorithm18,8, which obtains a set of multiple, optimal promoter descriptions for each promoter region. Moreover, the matching is also treated by MOSS as a multimodal problem, since there is more than one solution for each region. GAP, by using MOSS, outperforms other methods used for DNA motif discovery, such as Consensus/Patser, based on probabilistic weight matrices (see Section 18.5), and provides the desired trade-off between accurate and interpretable solutions, which is particularly desirable for the end users. The extension of the original Scatter Search (SS) heuristic18 uses the DNA regions where promoters should be detected as inputs and finds all optimal


relationships among promoter submotifs and distance models. In order to extend the original SS algorithm to a multi-objective environment we need to introduce some concepts6,5:

A multi-objective optimization problem is defined as:

Maximize Qm(x, Ma),        m = 1, 2, ..., |M|;
subject to gj(x) >= 0,     j = 1, 2, ..., J;
           hk(x) = 0,      k = 1, 2, ..., K;
           xi(L) <= xi <= xi(U),  i = 1, 2, ..., n.

where Ma is a generalized clustering model, |M| corresponds to the number of models and Qm to the objectives to optimize, J to the number of inequality constraints, K to the number of equality constraints, and finally n is the number of decision variables. The last set of constraints restricts each decision variable xi to take a value within a lower bound xi(L) and an upper bound xi(U). Specifically, we consider the following instantiations:

• |M| = 3. We have three models: M1 and M2 are the models for the TATAAT-box and the TTGACA-box, respectively, and M3 corresponds to the distance between these two boxes (recall Equations 1 and 2, and Figure 18.1).

• |Q| = 3. We have three objectives, consisting of maximizing the degree of matching to the fuzzy models (fuzzy membership): Q1(x, M1), Q2(x, M2) and Q3(x, M3).

• J = 1. We have just one constraint, g1: the distance between boxes cannot be less than 15 nor more than 21 base pairs.

• K = 0. No equality constraints are needed.

• Only valid solutions are kept in each generation.

• The boxes cannot be located outside the searched sequence, that is, a box cannot start at a negative position or beyond the length of the query sequence.

Definition 8: A solution x is said to dominate a solution y (x ≺ y) if both conditions 1 and 2 are true: (1) the solution x is no worse than y in all objectives: fi(x) >= fi(y) for all i = 1, 2, ..., M; (2) the solution x is strictly better than y in at least one objective: fj(x) > fj(y) for at least one j ∈ {1, 2, ..., M}. If x dominates the solution y, it is also customary to say that y is dominated by x.
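Definition 8 translates directly into code. The following sketch (our own illustration, assuming maximization as in the problem statement above) checks dominance between two objective vectors and filters a list down to its nondominated members:

```python
def dominates(x, y):
    """x dominates y (x ≺ y): no worse in every objective and strictly
    better in at least one, for a maximization problem."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))

def nondominated(points):
    """Keep the objective vectors not dominated by any other in the list."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q is not p)]
```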


In order to code the algorithm, three different models were developed. Both submotif models were implemented by using their nucleotide consensus frequencies as discrete fuzzy sets, whose membership functions were learned from data distributions13. The first model, corresponding to the TATAAT-box, was formulated as:

M1 = μtataat(x) = μ1(x1) ∪ ... ∪ μ6(x6)     (1)

where the fuzzy discrete set corresponding to the first nucleotide of the submotif T0.77A0.76T0.60A0.61A0.56T0.82 was defined as μ1(x1) = A/0.08 + T/0.77 + G/0.12 + C/0.05, and the fuzzy sets corresponding to positions 2-6 were calculated in a similar way according to data distributions from 16. The second model, corresponding to the TTGACA-box, was described as:

M2 = μttgaca(x) = μ1(x1) ∪ ... ∪ μ6(x6)     (2)

where the fuzzy discrete set corresponding to the first nucleotide of the submotif T0.69T0.79G0.61A0.56C0.54A0.54 was defined as μ1(x1) = A/0.12 + T/0.69 + G/0.13 + C/0.06, and the other fuzzy sets corresponding to positions 2-6 were calculated in a similar way according to data distributions from 16. The union operation corresponds to fuzzy set operations21,13. The third model, i.e., the distance between the previous submotifs, was built as a fuzzy set whose triangular membership function M3 (see Figure 18.1) was learned from data distributions9, centered at 17, where the best value (one) is achieved. Therefore, the objective functions Qm correspond to the membership in the former fuzzy models Ma.

Fig. 18.1. Graphical representation of M3
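To make the fuzzy machinery concrete, the sketch below evaluates the one membership function the text specifies exactly (μ1 of the TATAAT model) together with a triangular distance model; the 15/17/21 breakpoints are our reading of Figure 18.1, and the code is an illustration rather than the chapter's implementation.

```python
# mu_1 of the TATAAT model, as quoted in the text: a discrete fuzzy set
# assigning a membership grade to each nucleotide at position 1.
MU1_TATAAT = {"A": 0.08, "T": 0.77, "G": 0.12, "C": 0.05}

def triangular(d, left=15.0, peak=17.0, right=21.0):
    """Triangular membership for the inter-box distance model M3:
    1 at the peak, falling linearly to 0 at the boundaries of the range
    (breakpoints assumed from Figure 18.1)."""
    if d <= left or d >= right:
        return 0.0
    if d <= peak:
        return (d - left) / (peak - left)
    return (right - d) / (right - peak)
```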

Combination Operator and Local Search. We used a block representation


to code each individual, where each block corresponds to one of the promoter submotifs (i.e., the TATAAT-box or TTGACA-box). In particular, each block is represented by two integers, where the first number corresponds to the starting point of the submotif and the second one represents the size of the box (see Figure 18.2). The combination process was implemented as

Phenotype: gtttatttaatgtttacccccataaccacataatcgcgttacact, with the TTGACA-box matched at character 6 and the TATAAT-box at character 29.

Genotype: Gen 0 = [(6,6)], Gen 1 = [(29,6)].

Objective values: f1 = 0.578595, f2 = 0.800000, f3 = 1.000000

Fig. 18.2. Example of the representation of an individual

a one-point combination operator, where the point is always located between the two blocks. For example, given chromosomes with two blocks A and B, and parents P = A1B1 and P' = A2B2, the corresponding siblings would be S = A1B2 and S' = A2B1. The local search was implemented as a search for nondominated solutions in a certain neighborhood. For example, a local search performed over the chromosome space involves a specified number of nucleotides located on the left or right sides of the blocks composing the chromosome. The selection process considers that a new mutated chromosome that dominates one of its parents will replace it, but if it is dominated by its ancestors no modification is performed. Otherwise, if the new individual is not dominated by the nondominated population found so far, it replaces its parent only if it is located in a less crowded region (see Figure 18.3).
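Under the block genotype of Figure 18.2, the one-point combination operator described above amounts to swapping the blocks after the fixed cut point. A minimal sketch (function name is our own) for two-block parents:

```python
def combine(parent1, parent2):
    """One-point combination with the cut point between the two blocks:
    parents A1B1 and A2B2 yield siblings A1B2 and A2B1.
    Each block is a (start, size) pair."""
    sibling1 = [parent1[0], parent2[1]]
    sibling2 = [parent2[0], parent1[1]]
    return sibling1, sibling2
```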

Algorithm. We modified the original SS algorithm to allow multiple-objective solutions by adding the nondominance criterion to the solution ranking6. Thus, nondominated solutions were added to the set in any order, but dominated solutions were only added if no more nondominated solutions could be found. In addition to maintaining a good set of nondominated solutions, and to avoid one of the most common problems of multi-objective algorithms, multi-modality6, we also kept track of the diversity of


the available solutions through all generations. Finally, the initial populations were created randomly, and unfeasible solutions, corresponding to out-of-range distances between promoter submotifs (g1), were checked at each generation. Figure 18.4 illustrates the MOSS algorithm proposed in GAP.

1: Randomly select the block g in the representation of the individual c to which local search is applied.
2: Randomly select a number n in [-neighbor, neighbor] and move the block g by n nucleotides. Notice that it can be moved upstream or downstream. The resulting block will be g' and the resulting individual will be called c'.
3: if c' meets the restrictions then
4:   if c' dominates c then
5:     Replace c with c'
6:   end if
7:   if c' does not dominate c and c' is not dominated by c and c' is not dominated by any solution in the Non-Dominated set then
8:     Replace c with c' if crowd(c') < crowd(c).
9:   end if
10: end if

Fig. 18.3. Local search
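The move generation in steps 1-2 of Figure 18.3 can be sketched as follows (acceptance, i.e., steps 3-10, is left to the caller; the names are our own):

```python
import random

def local_move(individual, neighbor, rng=random):
    """Steps 1-2 of the local search: pick a block at random and shift it
    by up to `neighbor` nucleotides upstream or downstream, returning the
    candidate individual c' without deciding acceptance."""
    g = rng.randrange(len(individual))       # step 1: choose block g
    n = rng.randint(-neighbor, neighbor)     # step 2: choose shift n
    start, size = individual[g]
    candidate = list(individual)
    candidate[g] = (start + n, size)
    return candidate
```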

18.5. Experimental Algorithm Evaluation

The GAP method was applied to a set of known promoter sequences reported in 9. In this work, 261 promoter regions and 68 alternative solutions (multiple promoters) defined in 9 for the corresponding sequences (totaling 329 regions) constituted the input of the method.

To evaluate the performance of GAP, we first compare the obtained results with the ones retrieved by a typical DNA sequence analysis method, Consensus/Patser11. Then, we compare the ability of MOSS with that of two other Multiobjective Evolutionary Algorithms (MOEAs), i.e., the Strength Pareto Evolutionary Algorithm (SPEA)33 and the (μ + λ) Multi-Objective Evolutionary Algorithm (MuLambda)20.

All of these MOEA algorithms share the following properties:

• They store the optimal solutions found during the search in an external set.

• They work with the concept of Pareto dominance to assign fitness values to the individuals of the population.


1: Start with P = ∅. Use the generation method to build a solution and the local search method to improve it. If x ∉ P then add x to P; else, reject x. Repeat until P has the user-specified size.
2: Create a reference set RefSet with b/2 nondominated solutions of P and the b/2 solutions of P most diverse from those b/2. If there are not enough nondominated solutions to fill the b/2, complete the set with dominated solutions.
3: NewSolution ← true
4: while there exists a solution not yet explored (NewSolution = true) do
5:   NewSolution ← false
6:   Generate subsets of RefSet such that there is at least one nondominated solution in each one.
7:   Generate an empty set N to store nondominated solutions.
8:   while there are subsets to examine do
9:     Select a subset and mark it as examined.
10:    Apply combination operators to the solutions in the set.
11:    Apply local search to each new solution x found after the combination process, as explained in Figure 18.3, and name it xb.
12:    if xb is nondominated by every x ∈ N and xb ∉ N then
13:      Add xb to N.
14:    end if
15:   end while
16:   Add each solution y ∈ N to P if there is no solution z ∈ P that dominates y.
17:   NewSolution ← true.
18: end while

Fig. 18.4. MOSS algorithm
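Step 2 of Figure 18.4 (building RefSet from b/2 nondominated solutions plus the most diverse remainder) can be sketched as below; measuring diversity as Euclidean distance in objective space is our assumption, since the chapter does not pin the metric down.

```python
import math

def dominates(x, y):
    # Dominance for maximization, as in Definition 8.
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))

def build_refset(P, b):
    """RefSet = up to b/2 nondominated solutions of P, completed with the
    solutions of P most distant (Euclidean, in objective space - an
    assumption) from those already chosen."""
    front = [p for p in P if not any(dominates(q, p) for q in P if q is not p)]
    refset = front[:b // 2]
    rest = [p for p in P if p not in refset]
    while len(refset) < b and rest:
        # most diverse = maximal minimum distance to the current RefSet
        far = max(rest, key=lambda s: min(math.dist(s, r) for r in refset))
        refset.append(far)
        rest.remove(far)
    return refset
```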

In particular, SPEA is a well-known algorithm that has some special features33, including:

• The combination of the above techniques in a single algorithm.

• The determination of the fitness value of an individual by using the solutions stored in the external population, where dominance relative to the current population becomes irrelevant.

• All individuals of the external set participate in the selection procedure.

• A niching method is provided to preserve diversity in the population. This method is based on Pareto optimality and does not require a distance parameter (e.g., the niche radius in a sharing function6).

MuLambda is a relatively new algorithm with a design very different from other Pareto approaches. This algorithm has the following characteristics20:

• It does not use any information from the dominated individuals of the population. Only nondominated individuals are kept from generation to generation.


• The population size is variable.

• It applies clustering to reduce the number of nondominated solutions stored without destroying the features of the optimal Pareto front.

As we explained earlier, the MOSS approach has the following properties:

• The local search is used to improve the solutions found during the execution of the algorithm.

• The diversity of the solutions is kept by including in every generation a set of diverse solutions in the current population.

To compare the results obtained from the three algorithms, we use the same objective functions described in Section 18.4 and execute these algorithms 20 times with different seeds for each input sequence. A promoter is said to be found if it appears in at least one of the execution result sets. The parameters used in the experiments are listed in Table 18.68.

Parameter                        Value
Number of generations            200
RefSet size                      16
Non-dominated population size    300

Table 18.68. Parameters for the algorithms

Our method outperforms Consensus/Patser11 by detecting 93.1% of the available promoters, while the latter method, based on weight matrices, identifies 74%. Moreover, GAP, by using MOSS, also outperforms the other MOEA algorithms, as illustrated in Table 18.69.

           Original  Alternative  %originals  %alternatives  Total  %total
MOSS       243       59           93.10%      86.76%         302    91.79%
SPEA       217       43           83.14%      63.24%         260    79.03%
(μ+λ) GA   223       52           85.44%      76.47%         275    83.59%

Table 18.69. Results with different Multi-Objective Genetic Algorithms for all sequences. The Original column indicates the number of conserved promoter locations reported in the literature. The Alternative column indicates alternative locations also reported in the literature


We should note that there exists more than one possible description for each promoter region, as illustrated in Figure 18.5 for the ada gene reported in the Harley & Reynolds compilation9. These alternative descriptions were also found by MOSS in a higher percentage than by the other methods (86.76%). The complete set of results is shown in the Appendix.

gttggtttttgcgtgatggtgaccgggcagcctaaaggctatcctt

Fig. 18.5. Different solutions for the ada sequence. Three different alternative locations for the conserved sequences were included in the final set of the MOSS method, matching the three alternatives reported in the literature

In addition to the number of promoters detected by the different MOEA algorithms, we use two other functions, C34 and D (see Equations 3 and 4), to gain a better understanding of each algorithm's performance.

Definition 9: Let X', X'' ⊆ X be two sets of decision vectors. The function C maps the ordered pair (X', X'') to the interval [0, 1]:

C(X', X'') = |{a'' ∈ X''; ∃ a' ∈ X' : a' ⪯ a''}| / |X''|     (3)

D(X', X'') = |{a' ∈ X'; ∄ a'' ∈ X'' such that a'' ≺ a', and a' ∉ X''}|     (4)

The value C(X', X'') = 1 in the former definitions means that all solutions in X'' are equal to or dominated by solutions in X'. The opposite value, C(X', X'') = 0, represents the situation where no solution in X'' is covered by any solution in X'. Both C(X', X'') and C(X'', X') must be considered, since C(X', X'') is not necessarily equal to 1 - C(X'', X'). The function D(X', X'') counts the individuals in X' that are not dominated by any solution in X'' and are not found in X''.
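As a concrete reading of Equation 3, the coverage measure can be computed as below (a sketch for maximization; `weakly_dominates` corresponds to the relation ⪯ used in the definition, and the names are our own):

```python
def weakly_dominates(x, y):
    """x covers y: no worse than y in every (maximized) objective."""
    return all(a >= b for a, b in zip(x, y))

def coverage(X1, X2):
    """C(X1, X2): fraction of solutions in X2 that are equal to or
    dominated by at least one solution in X1 (Equation 3)."""
    return sum(1 for b in X2
               if any(weakly_dominates(a, b) for a in X1)) / len(X2)
```

Note that in general coverage(X1, X2) is not 1 - coverage(X2, X1), which is why both directions are reported in Table 18.70.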

We show in Table 18.70 the average results obtained for the comparisons among the MOEA algorithms. The left part of the table reports C(X', X'') and the right part D(X', X''). These numbers were obtained by executing the algorithms 20 times with different seeds and averaging both functions over runs and sequences.

As we previously suggested, function D counts the number of nondominated individuals of an algorithm that were not found by the other two MOEAs. The MOSS algorithm achieves the best value of D in all experiments, while SPEA and MuLambda present lower values. Moreover, those


C(X',X'')   MOSS    SPEA    μ+λ        D(X',X'')   MOSS     SPEA    μ+λ
MOSS        -       0.538   0.360      MOSS        -        14.204  12.977
SPEA        0.013   -       0.054      SPEA        0.170    -       0.876
μ+λ         0.029   0.349   -          μ+λ         1.066    2.284   -

Table 18.70. Sequence results

results obtained by MOSS do not fluctuate much between different sequences. MOSS leads the rankings, followed by MuLambda, with SPEA in the last position of the table. In addition, the diversity of the solutions found by MOSS is considerably better than that of the other two algorithms (approximately seven times better according to the D value). Finally, MOSS is the most robust algorithm, finding, on average, a specific promoter in 16.81 of the 20 runs. In contrast, SPEA obtains a promoter in 6.48 of the 20 runs and MuLambda in 9.33.

18.6. Concluding Remarks

Generalized-clustering algorithms, solving multivariable, multiobjective optimization problems, provide effective tools to identify interesting features that help in understanding complex objects such as DNA sequences. We have proposed GAP, a promoter recognition method that was tested by predicting E. coli promoters. This method combines the advantages of feature representation based on fuzzy sets and the search abilities of multi-objective genetic algorithms to obtain accurate as well as interpretable solutions. These kinds of solutions are the most useful ones for the end users; that is, they allow the detection of multiple occurrences of promoters, shedding light on different putative transcription start sites. The ability to find multiple promoters becomes even more useful when whole intergenic regions are considered, allowing the prediction of distinct regulatory activities, harboring activation or repression. The present approach can be extended to identify other DNA motifs that are also connected by variable distances, such as binding sites of transcriptional regulators (e.g., direct or inverted repeats). Therefore, by combining multiple and heterogeneous DNA motifs (e.g., promoters, binding sites, etc.), we can obtain different descriptions of the cis-acting regions and, thus, different regulatory environments. The present implementation of GAP is available for academic use on the SOAR-TOOLS web site (http://soar-tools.wustl.edu) and will soon be updated with a new dataset from the RegulonDB database29 (in process).


Appendix

Tables 18.71 through 18.74 illustrate the set of solutions found by GAP for the set of promoter examples published in 9. The last column of the tables indicates whether GAP recognized the promoter or not, by the symbols √ and □, respectively. The first column corresponds to the name of the sequence, the second column shows the beginning character position of the TTGACA-box, and the third column shows the character position where the TATAAT-box begins. These positions are the ones recognized by GAP. Only one result for each sequence is shown due to space limitations. The fourth column corresponds to the sequence itself, with each of the boxes clearly depicted.

References

1. T. Back, D. Fogel, and Z. Michalewicz, Eds. Handbook of Evolutionary Com-putation. Institute of Physics Publishing and Oxford University Press, 1997.

2. T. L. Bailey and C. Elkan. The value of prior knowledge in discovering motifs with MEME. In Proc Int Conf Intell Syst Mol Biol, volume 3, pages 21-29, 1995.

3. J. C. Bezdek. Fuzzy clustering. In Handbook of Fuzzy Computation. E. H.Ruspini, P. P. Bonissone, & W. Pedrycz, Eds.: F6.2. Institute of PhysicsPress, 1998.

4. S. Brenner. Genomics: The end of the beginning. Science, 287(5461):2173-2179, 2000.

5. C. Coello Coello, D. Van Veldhuizen, and G. Lamont. Evolutionary Algo-rithms for Solving Multi-Objective Problems. Kluwer Academic Publishers,New York, May 2002.

6. K. Deb. Multi-Objective Optimization using Evolutionary Algorithms. JohnWiley & Sons, 2001.

7. M. Gibson and E. Mjolsness. Computational Modeling of Genetic and Bio-chemical Networks, chapter Modeling the Activity of Single Genes. The MITPress, 2001.

8. D. E. Goldberg. Genetic Algorithms in Search Optimization and MachineLearning. Addison-Wesley, 1989.

9. C. B. Harley and R. P. Reynolds. Analysis of E. coli promoter sequences. Nucleic Acids Research, 15(5):2343-2361, 1987.

10. D. K. Hawley and W. R. McClure. Compilation and analysis of Escherichia coli promoter DNA sequences. Nucleic Acids Research, 11(8):2237-2255, 1983.

11. G. Z. Hertz and G. D. Stormo. Identifying DNA and protein patterns with statistically significant alignments of multiple sequences. Bioinformatics, 15(7/8):563-577, 1999.

12. A. M. Huerta and J. Collado-Vides. Sigma70 promoters in Escherichia coli:


specific transcription in dense regions of overlapping promoter-like signals. J Mol Biol, 333(2):261-278, 2003.

13. G. J. Klir and T. A. Folger. Fuzzy sets, uncertainty, and information. PrenticeHall International, 1988.

14. R. Krishnapuram and J. Keller. A possibilistic approach to clustering. IEEETransactions on Fuzzy Systems, 98-110, 1993.

15. C. E. Lawrence, S. F. Altschul, M. S. Boguski, J. S. Liu, A. F. Neuwald, and J. C. Wootton. Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science, 262(5131):208-214, 1993.

16. S. Lisser and H. Margalit. Compilation of E. coli mRNA promoter sequences. Nucleic Acids Research, 21(7):1507-1516, 1993.

17. J. Collado-Vides, B. Magasanik, and J. D. Gralla. Control site location and transcriptional regulation in Escherichia coli. Microbiol Rev, 55(3):371-394, 1991.

18. M. Laguna and R. Marti. Scatter Search: Methodology and Implementations in C. Kluwer Academic Publishers, Boston, 2003.

19. C. Mouslim, T. Latifi, and E. A. Groisman. Signal-dependent requirement for the co-activator protein RcsA in transcription of the RcsB-regulated ugd gene. J Biol Chem, 278(50):50588-50595, 2003.

20. R. Sarker, K. Liang, and C. Newton. A new multiobjective evolutionary algorithm. European Journal of Operational Research, 140:12-23, 2002.

21. W. Pedrycz, P. P. Bonissone and E. H. Ruspini. Handbook of fuzzy computa-tion. Institute of Physics, 1998.

22. M. Ptashne and A. Gann. Genes and signals. Cold Spring Harbor LaboratoryPress, 2002.

23. M. G. Reese. Application of a time-delay neural network to promoter annotation in the Drosophila melanogaster genome. Computers & Chemistry, 26(1):51-56, 2002.

24. J. Rissanen. Stochastic Complexity in Statistical Inquiry. World Scientific,1989.

25. E. H. Ruspini. A new approach to clustering. Information and Control, 15(1):22-32, 1969.

26. E. H. Ruspini and I. Zwir. Automated qualitative description of measure-ments. In Proc. 16th IEEE Instrumentation and Measurement TechnologyConf, 1999.

27. E. H. Ruspini and I. Zwir. Automated Generation of Qualitative Representations of Complex Objects by Hybrid Soft-computing Methods. In Pattern Recognition: From Classical to Modern Approaches. S. K. Pal & A. Pal, Eds. World Scientific Company, Singapore, 2001.

28. E. H. Ruspini and I. Zwir. Automated generation of qualitative represen-tations of complex object by hybrid soft-computing methods. In S. K. Paland A. Pal, editors, Lecture Notes in Pattern Recognition. World ScientificCompany, 2001.

29. H. Salgado et al. RegulonDB (version 3.2): transcriptional regulation and operon organization in Escherichia coli K-12. Nucleic Acids Research, 29:72-74, 2001.


30. A. Ulyanov and G. Stormo. Multi-alphabet consensus algorithm for identi-fication of low specificity protein-DNA interactions. Nucleic Acids Research,23(8):1434-1440, 1995.

31. J. van Helden, B. Andre, and J. Collado-Vides. A web site for the computational analysis of yeast regulatory sequences. Yeast, 16(2):177-187, 2000.

32. L. A. Zadeh. Outline of a Computational Theory of Perceptions Based onComputing with Words. In Soft Computing and Intelligent Systems: Theoryand Applications. N .K. Sinha, M. M. Gupta & L. A. Zadeh, Eds.: 3-22.Academic Press, San Diego, 2000.

33. E. Zitzler and L. Thiele. Multiobjective evolutionary algorithms: A comparative case study and the Strength Pareto approach. IEEE Transactions on Evolutionary Computation, 3(4):257-271, 1999.

34. E. Zitzler, L. Thiele, and K. Deb. Comparison of multiobjective evolutionaryalgorithms: Empirical results. Evolutionary Computation 8:2, 173-195, 2000.

35. I. Zwir and E. H. Ruspini. Qualitative Object Description: Initial Reports of the Exploration of the Frontier. In Proc. of EUROFUSE-SIC99, Budapest, Hungary, 485-490, 1999.

36. I. Zwir, R. Romero Zaliz, and E. H. Ruspini. Automated biological sequence description by genetic multiobjective generalized clustering. Ann N Y Acad Sci, 980:65-82, 2002.

37. I. Zwir, P. Traverso, and E. A. Groisman. Semantic-oriented analysis of reg-ulation: the phop regulon as a model network. In Proceedings of the 3rdInternational Conference on Systems Biology (ICSB), 2003.


sequence | ttgaca | tataat | promoter | found

ada - - AGCGGCTAAAGGTG TTGACG TGCGAGAA ATGTTTAGC TAAACT TCTCTCATGTG •alaS 15 39 AACGCATACGGTAT TTTACC TTCCCAGTC AAGAAAACT TATCTT ATTCCCACTTTTCAGT •/ampC 15 37 TGCTATCCTGACAG TTGTCA CGCTGATT GGTGTCGT TACAAT CTAACGCATCGCCAATG /"ampC/C16 7 30 GCTATC TTGACA GTTGTCAC GCTGATTGG TATCGT TACAATCTAACGTATCG SaraBAD 15 37 TTAGCGGATCCTAC CTGACG CTTTTTAT CGCAACTC TCTACT GTTTCTCCATACCCGTT SaraC 15 38 GCAAATAATCAATG TGGACT TTTCTGCC GTGATTATA GACACT TTTGTTACGCGTTTTTG •/araE 12 37 CTGTTTCCGAC CTGACA CCTGCGTGA GTTGTTCACG TATTTT TTCACTATGTCTTACTC •/aral(c) 13 35 AGCGGATCCTAC CTGGCG CTTTTTAT CGCAACTC TCTACT GTTTCTCCATACCCGTT •/aralfc)X(c) 13 37 AGCGGATCCTAC CTGGCG CTTTTTATC GCAACTCTC TACTAT TTCTCCATACCCGTTTT SargCBH 15 39 TTTGTTTTTCATTG TTGACA CACCTCTGG TCATGATAG TATCAA TATTCATGCAGTATT • /argCBH-Pl/6- 15 36 TTTGTTTTTCATTG TTGACA CACCTCT GGTCATAA TATTAT CAATATTCATGCAGTAT •/argCBH-Pl/LL 15 36 TTTGTTTTTCATTG TTGACA CACCTCT GGTCATGA TATTAT CAATATTCATGCAGTAT SargE-Pl 15 38 TTACGGCTGGTGGG TTTTAT TACGCTCA ACGTTAGTG TATTTT TATTCATAAATACTGCA SargE-P2 15 38 CCGCATCATTGCTT TGCGCT GAAACAGT CAAAGCGGT TATGTT CATATGCGGATGGCG SargE/LL13 15 38 CCGCATCATTGCTT TGCGCT GAAACAGT CAAAGCGGT TATATT CATATGCGGATGGCG •/argF 15 38 ATTGTGAAATGGGG TTGCAA ATGAATAA TTACACATA TAAAGT GAATTTTAATTCAATAA •/argl 7 30 TTAGAC TTGCAA ATGAATAA TCATCCATA TAAATT GAATTTTAATTCATTGA •/argR 12 35 TCGTCGCCGCG TTGCAG GAGCAAGG CTTTGACAA TATTAA TCAGTCTAAAGTCTCGG •"aroF 15 37 TACGAAAATATGGA TTGAAA ACTTTACT TTATGTGT TATCGT TACGTCATCCTCGCTG •/aroG 15 38 AGTGTAAAACCCCG TTTACA CATTCTGA CGGAAGATA TAGATT GGAAGTATTGCATTCA •/aroH 15 37 GTACTAGAGAACTA GTGCAT TAGCTTAT TTTTTTGT TATCAT GCTAACCACCCGGCGAG •/bioA 15 39 GCCTTCTCCAAAAC GTGTTT TTTGTTGTT AATTCGGTG TAGACT TGTAAACCTAAATCT •/bioB 15 38 TTGTCATAATCGAC TTGTAA ACCAAATT GAAAAGATT TAGGTT TACAAGTCTACACCGAA •/bioP98 15 38 TTGTTAATTCGGTG TAGACT TGTAAACC TAAATCTTT TAAATT TGGTTTACAAGTCGAT •/C62.5-P1 - - CACCTGCTCTCGC TTGAAA TTATTCTC CCTTGTCCC CATCTC TCCCACATCCTGTTTT DcarAB-Pl 15 38 ATCCCGCCATTAAG TTGACT 
TTTAGCGC CCATATCTC CAGAAT GCCGCCGTTTGCCAGA ScarAB-P2 15 39 TAAGCAGATTTGCA TTGATT TACGTCATC ATTGTGAAT TAATAT GCAAATAAAGTGAG •/cat 13 36 ACGTTGATCGGC ACGTAA GAGGTTCC AACTTTCAC CATAAT GAAATAAGATCACTACC •/cit.util-379 - - AAACAGGCGGGG GTCTCA GGCGACTAA CCCGCAAAC TCTTAC CTCTATACATAATTCTG Dcit.util-431 14 38 GACAGGCACAGCA TTGTAC GATCAACTG ATTTGTGCC AATAAT TAAATGAAATCAC •/CloDFcloacin 15 37 TCATATATTGACAC CTGAAA ACTGGAGG AGTAAGGT AATAAT CATACTGTGTATATAT •/CtoDFnal 15 39 ACACGCGGTTGCTC TTGAAG TGTGCGCCA AAGTCCGGC TACACT GGAAGGACAGATTTGG •/colEl-B 15 36 TTATAAAATCCTCT TTGACT TTTAAAA CAATAAGT TAAAAA TAAATACTGTAA •/colEl-C 15 37 TTATAAAATCCTCT TTGACT TTTAAAAC AATAAGTT AAAAAT AAATACTGTACATATAA ScolEl-Pl 15 38 GGAAGTCCACAGTC TTGACA GGGAAAAT GCAGCGGCG TAGCTT TTATGCTGTATATAAAA /"colEl-P2 15 37 TTTTTAACTTATTG TTTTAA AAGTCAAA GAGGATTT TATAAT GGAAACCGCGGTAGCGT SColEllO-13 13 37 GCTACAGAGTTC TTGAAG TAGTGGCCC GACTACGGC TACACT AGAAGGACAGTATTTGG •/colicinElP3 15 37 TTTTTAACTTATTG TTTTAA AAGTCAAA GAGGATTT TATAAT GGAAACCGCGGTAGCGT •/crp 15 38 AAGCGAGACACCAG GAGACA CAAAGCGA AAGCTATGC TAAAAC AGTCAGGATGCTACAG •/cya 15 38 GTAGCGCATCTTTC TTTACG GTCAATCA GCAAGGTGT TAAATT GATCACGTTTTAGACC •/dapD - - AAGTGCATCAGCGG TTGACA GAGGCCCTC AATCCAAAC GATAAA GGGTGATGTGTTTACTG •deo-Pl 14 39 CAGAAACGTTTTA TTCGAA CATCGATCT CGTCTTGTGT TAGAAT TCTAACATACGGTTGC •/deo-P2 10 35 TGATGTGTA TCGAAG TGTGTTGCG GAGTAGATGT TAGAAT ACTAACAAACTCGCAA Sdeo-P3 15 37 ACACCAACTGTCTA TCGCCG TATCAGCG AATAACGG TATACT GATCTGATCATTTAAA •/divE 15 38 AAACAAATTAGGGG TTTACA CGCCGCAT CGGGATGTT TATAGT GCGCGTCATTCCGGAAG •/

Table 4. Results for the training sequences

Page 476: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

Generalized Analysis of Promoters: A Method for DNA Sequence Description 447

sequence    ttgaca    tataat    promoter    found
[Rows garbled beyond recovery in the scanned source: each gives a promoter name, the −35/−10 hexamer positions, the aligned sequence, and a found/not-found flag.]

Table 5. Results for the training sequences


448 R. Romero Zaliz et al.

sequence    ttgaca    tataat    promoter    found
[Rows garbled beyond recovery in the scanned source: each gives a promoter name, the −35/−10 hexamer positions, the aligned sequence, and a found/not-found flag.]

Table 6. Results for the training sequences



sequence    ttgaca    tataat    promoter    found
[Rows garbled beyond recovery in the scanned source: each gives a promoter name, the −35/−10 hexamer positions, the aligned sequence, and a found/not-found flag.]

Table 7. Results for the training sequences


CHAPTER 19

MULTI-OBJECTIVE EVOLUTIONARY ALGORITHMS FOR COMPUTER SCIENCE APPLICATIONS

Gary B. Lamont, Mark P. Kleeman, Richard O. Day
Genetic Computational Techniques Research Group
Department of Electrical and Computer Engineering

Air Force Institute of Technology
Wright-Patterson Air Force Base, Dayton, OH 45433, USA

E-mail: [email protected], [email protected], [email protected]

In this chapter, we apply the multi-objective Messy Genetic Algorithm-II (MOMGA-II) to two NP-Complete multi-objective optimization problems. The MOMGA-II is a multi-objective version of the fast messy Genetic Algorithm (fmGA), which is an explicit building-block method. First, the MOMGA-II is used to determine 'good' formations of unmanned aerial vehicles (UAVs) in order to limit the amount of communication flow. The multi-objective Quadratic Assignment Problem (mQAP) is used to model the problem. Then, the MOMGA-II is applied to the Modified Multi-objective Knapsack Problem (MMOKP), a constrained integer-based decision-variable MOP. The empirical results indicate that the MOMGA-II is an effective algorithm to implement for similar NP-Complete problems of high complexity.

19.1. Introduction

Multi-objective evolutionary algorithms (MOEAs) are stochastic computational tools available to researchers for solving a variety of multi-objective problems (MOPs). Of course, there are many pedagogical polynomial-complexity MOPs that can be solved optimally using deterministic algorithms. However, there are also many real-world MOPs that are too computationally intensive for an optimal answer to be obtained in a reasonable amount of time. These problems are considered NP-complete (NPC) problems¹. The use of MOEAs to find "good" solutions to these high-dimension exponential-complexity problems is of great utility. We address the general category of NPC problems as defined in a MOP structure. Two NPC MOP examples are solved in depth using MOEAs in order to provide specific insight. Some generic comments are presented that address MOEA approaches for NPC combinatoric problems.

19.2. Combinatorial MOP Functions

Multi-objective Optimization Problems (MOPs), a variation of Combinatorial Optimization Problems, are a highly researched area in the computer science and operations research fields. MOPs generally model real-world problems better than their single-objective counterparts, as most real-world problems have competing objectives that need to be optimized. MOPs are used to solve many NP-Complete problems. The detailed notational symbology for these problems, as well as for Pareto optimality, is described in Chapter 1. Table 19.75 lists just a few of these NP-Complete (NPC) types of problems.

In essence, NPC combinatoric MOP problems are constrained minimization problems with the additional constraint on x that it is only able to take on discrete values (e.g., integers). The use of these combinatorial MOPs in any proposed MOEA test suite should also be considered. On one hand, EAs often employ specialized representations and operators when solving these NPC problems, which usually prevents a general comparison between various MOEA implementations. On the other hand, the inherent difficulty of NPC problems should present desired algorithmic challenges and complement other test suite MOPs. Databases such as TSPLIB 29, MP-Testdata 32, and the OR Library 2 exist for these NP-Complete problems. On another note, the fitness landscapes of various NP-Complete problems vary over a wide range. For example, the knapsack problem reflects a somewhat smooth landscape while the TSP exhibits a many-faceted landscape, the latter being more difficult to search for an "optimal" Pareto front. Other NP-Complete problem databases are also available 25,6,34.

As an example, for the multi-objective 0/1 knapsack problem with n

¹By "NP" problem is of course meant non-deterministic Turing machine polynomial execution time. "C" refers to the polynomial mapping of various NP combinatoric problems to each other; i.e., a complete set 8.


Multi-Objective Evolutionary Algorithms for Computer Science Applications 453

Table 19.75. Possible Multi-objective NP-Complete Functions

    NP-Complete Problem                   Example
    Travelling Salesperson                Min energy, time, and/or distance; Max expansion
    Coloring                              Min number of colors, number of each color
    Set/Vertex Covering                   Min total cost, over-covering
    Maximum Independent Set (Clique)      Max set size; Min geometry
    Vehicle Routing                       Min time, energy, and/or geometry
    Scheduling                            Min time, missed deadlines, waiting time, resource use
    Layout                                Min space, overlap, costs
    NP-Complete Problem Combinations      Vehicle scheduling and routing
    0/1 Knapsack - Bin Packing            Max profit; Min weight
    Quadratic Assignment                  Max flow; Min cost

knapsacks and m items, the objective is to maximize

    f(x) = (f_1(x), ..., f_n(x))                      (B.1)

where

    f_i(x) = Σ_{j=1}^{m} p_{ij} · x_j                 (B.2)

and where p_{ij} is the profit of item j in knapsack i, and x_j is 1 if item j is selected. The constraint is

    Σ_{j=1}^{m} w_{ij} · x_j ≤ c_i   for all i        (B.3)

where w_{ij} is the weight of item j in knapsack i and c_i is the capacity of knapsack i.
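To make the formulation concrete, the objective vector (B.2) and the capacity check (B.3) can be evaluated directly. The sketch below uses invented toy profit, weight, and capacity values, and `evaluate` is an illustrative helper name, not part of the chapter's code.

```python
def evaluate(x, profits, weights, capacities):
    """Return the objective vector (f_1, ..., f_n) for selection vector x,
    or None when any knapsack's capacity constraint (B.3) is violated."""
    n = len(profits)            # number of knapsacks (objectives)
    m = len(x)                  # number of items
    for i in range(n):
        if sum(weights[i][j] * x[j] for j in range(m)) > capacities[i]:
            return None         # infeasible selection
    return tuple(sum(profits[i][j] * x[j] for j in range(m)) for i in range(n))

# Toy instance: 2 knapsacks, 3 items.
profits    = [[3, 5, 8], [6, 2, 4]]
weights    = [[2, 3, 4], [1, 5, 2]]
capacities = [7, 6]

print(evaluate([1, 1, 0], profits, weights, capacities))  # (8, 8) -- feasible
print(evaluate([1, 1, 1], profits, weights, capacities))  # None -- knapsack 1 overfull
```

An MOEA would compare such objective vectors under Pareto dominance rather than collapsing them to a single score.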

19.3. MOP NPC Examples

To gain insight into applying multi-objective evolutionary algorithms (MOEAs) to NPC MOP problems, we discuss the multi-objective quadratic assignment problem and a modified MOP knapsack problem. This insight provides a general understanding of MOEA development for NPC MOPs.

19.3.1. Multi-Objective Quadratic Assignment Problem

The standard quadratic assignment problem (QAP) and the multi-objective quadratic assignment problem (mQAP) are NP-complete problems. Such problems arise in real-world applications such as facilities placement, scheduling, data analysis, manufacturing, and resource use. Most QAP examples can be thought of as minimizing the product of two matrices, for example a distance matrix times a flow matrix cost objective. Many approaches to solving large-dimensional QAPs involve hybrid algorithms, including GA integration with local search methods such as Tabu search and simulated annealing. Here we examine the mQAP as mapped to a heterogeneous mix of unmanned aerial vehicles (UAVs) using a MOEA. Our model concentrates on minimizing communication flow and maximizing mission success by positioning UAVs in selected positions within a strict formation. Various experiments are conducted using a MOEA approach. The specific algorithm used was the multi-objective Messy Genetic Algorithm-II (MOMGA-II), an explicit building-block method. Solutions are then compared to deterministic results (where applicable). The symbolic problem description is initially discussed to provide problem domain insight.

Regarding a specific application, consider UAVs flying in large groups. One possible scenario is to have a heterogeneous group of UAVs flying together to meet a specific objective. There could be some in the group that are doing reconnaissance and reporting the information for security purposes. In a large heterogeneous group such as this, one UAV's position with respect to the other UAVs is important. For example, it would be best to place some UAVs around the outside of the group in order to protect the group as a whole. It would also be advantageous to have the reconnaissance UAVs nearer to the ground in order to allow them an unobstructed field of view.

While location in the formation for their particular part of the mission is important, the UAVs also need to be in positions where they can communicate effectively with other UAVs. For example, the reconnaissance UAVs need to communicate coordinates to enable them to find their target. Other UAVs need to communicate with all of the other UAVs when they sense approaching aircraft, so that the group can take evasive action (like fish behavior). All of this communication may saturate one communication channel, so multiple communication channels are used. All of these channels of communication can also dictate where the best location in the group may be for each UAV type.

The UAV communication and mission success problem is a natural extension of the mQAP. The mQAP derives from the quadratic assignment problem (QAP) and was introduced by Knowles and Corne 17. The scalar quadratic assignment problem was introduced in 1957 by Koopmans and Beckmann, who used it to model a plant location problem 3. It is defined as follows.

19.3.1.1. Literary QAP Definition

The QAP definition is based on a fixed number of locations, where each location is a fixed distance apart from the others. In addition to the locations, there are an equal number of facilities. Each facility has a fixed flow to each other facility. A solution consists of placing each facility in one and only one location. The goal is to place all facilities in such a way as to minimize the cost of the solution, where the cost is defined as the summation of each flow multiplied by the corresponding distance.

19.3.1.2. Mathematical QAP Definition

    min_{π ∈ P(n)} C(π) = Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij} · b_{π(i)π(j)}        (C.1)

where n is the number of objects/locations, a_{ij} is the distance between location i and location j, b_{ij} is the flow from object i to object j, and π(i) gives the location of object i in permutation π ∈ P(n), where P(n) is the QAP search space: the set of all permutations of {1, 2, ..., n} 18. This problem is not only NP-hard and NP-hard to approximate, but is almost intractable: it is generally considered impossible to solve optimally any QAP instance of size 20 or more within a reasonable time frame 3,27.
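For illustration, the scalar QAP cost can be evaluated directly from the two matrices. The sketch below uses invented toy matrices and sums, for every pair of objects, the flow between them times the distance between their assigned locations (an equivalent reindexing of the sum above).

```python
def qap_cost(pi, dist, flow):
    """Scalar QAP cost: for every pair of objects (i, j), the flow between
    them times the distance between their assigned locations pi[i], pi[j]."""
    n = len(pi)
    return sum(flow[i][j] * dist[pi[i]][pi[j]]
               for i in range(n) for j in range(n))

dist = [[0, 2, 3],       # toy distances between the 3 locations
        [2, 0, 1],
        [3, 1, 0]]
flow = [[0, 5, 1],       # toy flows between the 3 objects
        [5, 0, 2],
        [1, 2, 0]]

print(qap_cost([0, 1, 2], dist, flow))   # identity placement -> 30
print(qap_cost([2, 1, 0], dist, flow))   # reversed placement -> 24
```

Even on this tiny instance the placement matters; for n of realistic size, enumerating all n! placements is exactly what becomes infeasible.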

19.3.1.3. General mQAP

The mQAP is similar to the scalar QAP, with the exception that there are multiple flow matrices, each of which needs to be minimized. For example, the UAVs may use one communication channel for passing reconnaissance information, another channel for target information, and yet another channel for status messages. The goal is to minimize all the communication flows between the UAVs.

19.3.1.4. Mathematical mQAP

The mQAP is defined in mathematical terms in equations C.2 and C.3:




    minimize C(π) = {C_1(π), C_2(π), ..., C_m(π)}        (C.2)

where

    C_k(π) = Σ_{i=1}^{n} Σ_{j=1}^{n} a_{ij} · b^k_{π(i)π(j)},   k ∈ 1..m        (C.3)

and where n is the number of objects/locations, a_{ij} is the distance between location i and location j, b^k_{ij} is the kth flow from object i to object j, π(i) gives the location of object i in permutation π ∈ P(n), and 'minimize' means to obtain the Pareto front 18.
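Under the same indexing reading, the vector cost and the Pareto-front notion can be sketched as follows. `mqap_costs` and `dominates` are illustrative helper names, and the two flow matrices are invented toy "channels".

```python
from itertools import permutations

def mqap_costs(pi, dist, flows):
    """Objective vector (C_1(pi), ..., C_m(pi)): one scalar QAP cost
    per flow matrix (communication channel)."""
    n = len(pi)
    return tuple(sum(f[i][j] * dist[pi[i]][pi[j]]
                     for i in range(n) for j in range(n))
                 for f in flows)

def dominates(u, v):
    """True when cost vector u Pareto-dominates v (minimization)."""
    return all(a <= b for a, b in zip(u, v)) and any(a < b for a, b in zip(u, v))

dist  = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
flows = [[[0, 3, 0], [3, 0, 1], [0, 1, 0]],      # toy channel 1
         [[0, 0, 4], [0, 0, 2], [4, 2, 0]]]      # toy channel 2

# Brute-force Pareto front over all 3! placements (feasible only for tiny n).
costs = {p: mqap_costs(p, dist, flows) for p in permutations(range(3))}
front = [p for p, c in costs.items()
         if not any(dominates(costs[q], c) for q in costs)]
print(sorted(front))
```

The brute-force enumeration is only to make "obtain the Pareto front" tangible; the MOEA approaches below replace it for realistic sizes.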

Much work has been done with respect to classifying the solutions found in the fitness landscape of QAP instances. Knowles and Corne 18 identified two metrics for use with the mQAP: diameter and entropy. The diameter of the population is defined by Bachelet 1 and is shown in Equation C.4:

    dmm(P) = (1 / (|P| · (|P| − 1))) · Σ_{π ∈ P} Σ_{μ ∈ P, μ ≠ π} dist(π, μ)        (C.4)

where dist(π, μ) is a distance measurement that gives the smallest number of two-swaps that need to be performed in order to transform one solution, π, into another solution, μ. The distance measure has a range of [0, n − 1].
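This distance can be computed without search: the minimum number of two-swaps turning π into μ equals n minus the number of cycles of the permutation that relabels π's entries into μ's. A small sketch:

```python
def two_swap_dist(pi, mu):
    """dist(pi, mu): minimum number of two-swaps transforming pi into mu,
    i.e. n minus the cycle count of the value-relabelling permutation."""
    tau = {a: b for a, b in zip(pi, mu)}   # the value pi[i] must become mu[i]
    seen, cycles = set(), 0
    for start in tau:
        if start in seen:
            continue
        cycles += 1                        # found a new cycle of tau
        v = start
        while v not in seen:
            seen.add(v)
            v = tau[v]
    return len(pi) - cycles

print(two_swap_dist((0, 1, 2, 3), (0, 1, 2, 3)))  # 0 -- identical solutions
print(two_swap_dist((0, 1, 2, 3), (1, 0, 3, 2)))  # 2 -- two disjoint swaps
print(two_swap_dist((0, 1, 2, 3), (3, 0, 1, 2)))  # 3 -- one 4-cycle, the maximum n-1
```

The last case illustrates the stated range: a single n-cycle needs n − 1 swaps, so the distance never exceeds n − 1.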

The entropy metric measures the dispersion of the solutions. It is shown in equation C.5:

    E(P) = − (1 / (n log n)) · Σ_{i=1}^{n} Σ_{j=1}^{n} (n_{ij} / |P|) · log(n_{ij} / |P|)        (C.5)

where n_{ij} is a measure of the number of times object i is assigned to location j in the population.
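A sketch of the entropy computation follows. Normalizing by n·log n so that the result lies in [0, 1] (0 when all individuals are identical, 1 when every assignment is equally frequent) is an assumption made here for illustration.

```python
import math

def entropy(pop):
    """Dispersion of a population of permutations: Shannon entropy of the
    object-to-location assignment counts n_ij, normalized by n*log(n).
    (The normalization to [0, 1] is an assumption of this sketch.)"""
    n, size = len(pop[0]), len(pop)
    counts = [[0] * n for _ in range(n)]   # counts[i][j]: object i at location j
    for perm in pop:
        for obj, loc in enumerate(perm):
            counts[obj][loc] += 1
    h = sum((c / size) * math.log(size / c)
            for row in counts for c in row if c)
    return h / (n * math.log(n))

uniform = [(0, 1, 2), (0, 2, 1), (1, 0, 2), (1, 2, 0), (2, 0, 1), (2, 1, 0)]
print(entropy([(0, 1, 2)] * 4))        # 0.0 -- every individual identical
print(round(entropy(uniform), 6))      # 1.0 -- all assignments equally frequent
```

Between the two extremes, a partially converged population yields an intermediate value, which is what makes the metric useful for tracking diversity.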

Many approaches have been tried to solve the QAP. Researchers interested in finding the optimal solution can usually only do so for problems that are of size 20 or less; moreover, even problem sizes of 15 are considered to be difficult 3. In cases where it is feasible to find the optimal solution (less than size 20), branch and bound methods are typically used 10,28,3. Unfortunately, most real-world problems are larger than size 20 and thus require the employment of other solution methods in order to find a good solution in a reasonable time. For instance, the use of Ant Colonies has been explored and found to be effective when compared to other available heuristics 7,31,21. Evolutionary algorithms have also been applied 23,20,11,26. Good sources where researchers compare the performance of different search methods on the QAP can be found in 33,22.

19.3.1.5. Mapping QAP to MOEA

Table 19.76. Test Suite

    Test Name               Instance Category    # of locations    # of flows
    KC10-2fl-[1,2,3]uni     Uniform              10                2
    KC20-2fl-[1,2,3]uni     Uniform              20                2
    KC30-3fl-[1,2,3]uni     Uniform              30                3
    KC10-2fl-[1,...,5]rl    Real-like            10                2
    KC20-2fl-[1,...,5]rl    Real-like            20                2
    KC30-3fl-[1,2,3]rl      Real-like            30                3

Table 19.77. MOMGA-II settings

    Parameter            Value
    GA-type              fast messy GA
    Representation       Binary
    Eras                 10
    BB Sizes             1-10
    P_cut                2%
    P_splice             100%
    P_mutation           0%
    String length        100, 200, 300
    Total Generations    100
    Thresholding         No
    Tiebreaking          No

The Multi-objective messy Genetic Algorithm-II (MOMGA-II) is based on the concept of the Building Block Hypothesis (BBH). The MOMGA-II is derived from the earlier MOMGA algorithm [38].

The MOMGA implements a deterministic process to generate an enumeration of all possible BBs, of a user-specified size, for the initial population. This process is referred to as Partially Enumerative Initialization (PEI). Thus, the MOMGA explicitly uses these building blocks in combination to attempt to solve for the optimal solutions in multi-objective problems.


458 Gary B. Lamont et al.

The original messy GA consists of three distinct phases: the Initialization Phase, the Primordial Phase, and the Juxtapositional Phase. The MOMGA uses these concepts and extends them where necessary to handle k > 1 objective functions. In the initialization phase, the MOMGA produces all building blocks of a user-specified size.

The primordial phase performs tournament selection on the population and reduces the population size if necessary. The population size is adjusted based on the percentage of "high" fitness BBs that exist. In some cases, the "lower" fitness BBs may be removed from the population to increase this percentage.

In the juxtapositional phase, BBs are combined through the use of a cut-and-splice recombination operator. Cut-and-splice is a recombination (crossover) operator used with variable string length chromosomes. The cut-and-splice operator is used with tournament thresholding selection to generate the next population.

A probabilistic approach is used in initializing the population of the fmGA. The approach is referred to as Probabilistically Complete Initialization (PCI) [9]. PCI initializes the population by creating a controlled number of BBs based on the user-specified BB size and string length. The fmGA's initial population size is smaller than that of the mGA (and the MOMGA by extension) and grows at a smaller rate, as a total enumeration of all BBs of the specified size is not necessary. These BBs are then "filtered", through a Building Block Filtering (BBF) phase, to probabilistically ensure that all of the desired good BBs from the initial population are retained in the population. The BBF approach effectively reduces the computational bottlenecks encountered with PEI by reducing the initial population size required to obtain "good" statistical results. The fmGA concludes by executing a number of juxtapositional phase generations in which the BBs are recombined to create strings of potentially better fitness.

The MOMGA-II mirrors the fast messy Genetic Algorithm (fmGA) and consists of the following phases: Initialization, Building Block Filtering, and Juxtapositional. The MOMGA-II differs from the MOMGA in the Initialization and Primordial phases, the latter of which is replaced by the Building Block Filtering phase. The initialization phase of the MOMGA-II uses PCI instead of the PEI implementation used in the MOMGA and randomly creates the initial population.

The application of an MOEA to a class of MOPs containing few feasible points creates difficulties that an MOEA must surpass in order to generate any feasible points throughout the search process. A random initialization of an MOEA's population may not generate any feasible points in a constrained MOP. Without any feasible solutions in the population, one must question whether or not the MOEA can even conduct a worthwhile search. In problems where the feasible region is greatly restricted, it may be impossible to create a complete initial population of feasible solutions randomly. Without feasible population members, any MOEA is destined to fail. Feasible population members contain the BBs necessary to generate good solutions. It is possible for an infeasible population member to contain a BB that is also present in a feasible solution, just as it is possible for mutation to generate a feasible population member from an infeasible population member. However, feasible population members typically contain BBs that are not present in infeasible population members. Evolutionary operations (EVOPs) applied to feasible members tend to yield better results than EVOPs applied to infeasible population members. Therefore, it is critical to initialize and maintain a population of feasible individuals.

19.3.2. MOEA mQAP Results and Analysis

19.3.2.1. Design of mQAP Experiments and Testing

The goal of the experiments was to compare the MOMGA-II results with other programs that have solved the mQAP. In order to do this, a benchmark data set was needed for comparison purposes. The test suite chosen was created by Knowles [16], and for smaller-sized problems a deterministic search program was used to obtain definitive results. See Table 19.76 for a complete listing of the test suite problems.

Table 19.77 lists the MOMGA-II default parameter settings used during the mQAP experiments. Building block sizes 1 through 10 were used. Each building block size was run in a separate iteration, or era, of the program. Population sizes were created using the Probabilistically Complete Initialization method referred to earlier in the chapter. Specifically, the population for each era was determined using the population formula found in Equation C.6.

PopSize = NumCopies x AlleleCombo x Choose(ProbLen, Order)    (C.6)

where PopSize is the population size to be found, NumCopies is the number of duplicate copies of alleles desired, AlleleCombo is 2^Order, where Order is the building block size, and Choose(ProbLen, Order) is a combination that takes the problem length and the building block size


as its variables.

These settings were chosen based on previous settings used when

MOMGA-II was applied to the multi-objective knapsack problem. Due to the extended length of time it took to generate data, other settings could not be evaluated as well. It is recommended that future experiments be run with different settings in order to determine the best settings for these particular problems.
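The era population sizing of Equation C.6 reduces to a one-line computation. The sketch below is ours (the function name is an assumption, not from the text):

```python
from math import comb

def era_population_size(prob_len, order, num_copies=1):
    """Equation C.6: PopSize = NumCopies * 2^Order * C(ProbLen, Order).

    AlleleCombo = 2^Order enumerates the bit settings of a building
    block of the given order, while Choose(ProbLen, Order) counts the
    position subsets of the string.
    """
    return num_copies * (2 ** order) * comb(prob_len, order)
```

For a 100-bit string this grows quickly with the building block order, which is why PCI caps the enumeration that PEI would otherwise perform in full.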

The MOMGA-II results are taken over 30 data runs. The MOMGA-II was run on a Beowulf PC cluster consisting of 32 dual-processor machines, each with 1 GB of memory and two 1-GHz Pentium III processors (using Red Hat Linux version 7.3 and MPI version 1.2.7.1).

The MOMGA-II code was run in two different manners. One method started with a randomized competitive template and passed the improved competitive template to larger building block sizes. The other method had separate competitive templates for each building block size. The first method allows the algorithm to exploit "good" solutions as the algorithm runs. The second method allows the larger building block sizes to explore the search space more.

The MOMGA-II code was run in order to generate a population with good (low) fitness values for the flows and to find the non-dominated points. After the unique Pareto points for each of the runs were found, the results were combined, one at a time, and pareto.enum was used to pull out the unique Pareto points for each round. A simple MATLAB program was then used to show how the data values improved as more runs were completed.
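The Pareto-filtering step attributed to pareto.enum can be sketched as a plain non-dominated filter. This is our illustration, assuming all flow objectives are minimized:

```python
def nondominated(points):
    """Return the unique non-dominated objective vectors, assuming
    every objective is minimized (lower flow cost is better)."""
    def dominates(a, b):
        # a dominates b: no worse in every objective, and not equal
        return all(x <= y for x, y in zip(a, b)) and a != b
    unique = sorted(set(map(tuple, points)))
    return [p for p in unique
            if not any(dominates(q, p) for q in unique if q != p)]
```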

19.3.2.2. QAP Analysis

Table 19.78 compares our original results (competitive template passed to larger building block sizes) to those found by Knowles and Corne [18] and to the optimal results, when applicable, obtained using a simple program that enumerates all possible permutations. Abbreviations used in the table are as follows: Non-Dominated (ND), Diameter (Dia), Entropy (Ent), and Deterministic (Det). For all of the instances with 10 locations and 10 facilities, Knowles and Corne used a deterministic algorithm. For all the instances with 20 locations and 20 facilities, they used local search measures which employed 1000 local searches from each of 100 different λ vectors. For the instances with 30 locations and 30 facilities, they employed a similar local search measure which used 1000 local searches from each of 105 different λ vectors [17].


Table 19.78. Comparison of QAP Results

                 Knowles Results        Our Results            PF_true Points
Test Name        # ND   Dia   Ent       # ND   Dia   Ent       EA    Det   % Found
KC10-2fl-1uni      13     7   0.71        13     5   0.69       9     13    69
KC10-2fl-2uni       1     6   0.39         1     0   0          1      1   100
KC10-2fl-3uni     130     8   0.78       118     6   0.87      40    130    31
KC20-2fl-1uni      80    15   0.828       24    11   0.82
KC20-2fl-2uni      19    14   0.43       538    15   1.48
KC20-2fl-3uni     178    16   0.90        51    12   0.92
KC30-3fl-1uni     705    24   0.97       126    20   0.50
KC30-3fl-2uni     168    22   0.92        58    22   0.64
KC30-3fl-3uni    1257    24   0.96       155    20   0.56
KC10-2fl-1rl       58     8   0.68        44     5   0.61      21     58    36
KC10-2fl-2rl       15     7   0.49        10     5   0.56       5     15    33
KC10-2fl-3rl       55     8   0.62        36     6   0.71      23     55    42
KC10-2fl-4rl       53     8   0.58        34     4   0.53      24     53    45
KC10-2fl-5rl       49     8   0.63        45     6   0.69      36     49    73
KC20-2fl-1rl      541    15   0.63        17    12   0.73
KC20-2fl-2rl      842    14   0.6         12    11   0.76
KC20-2fl-3rl     1587    15   0.66        29    12   0.91
KC20-2fl-4rl     1217    15   0.51        25    10   0.18
KC30-3fl-1rl     1329    24   0.83       191    24   0.79
KC30-3fl-2rl     1924    24   0.86       183    24   0.77

By comparing the initial MOMGA-II results for the 10 location, 10 facility instances, it can be seen that the results did not equal the Pareto optimal results (found deterministically). It can also be assumed that the MOMGA-II results for problems with 20 and 30 locations and facilities did not find all the true Pareto front members. When compared to Knowles and Corne's test results, the MOMGA-II results might be deficient, depending on whether they have indeed found true Pareto front points. Figures 19.1 and 19.2 illustrate results for 30 locations and 30 facilities.

We then ran the MOMGA-II with the exact same settings, but randomized the competitive template for each building block size. Table 19.79 shows the outcome of those results with respect to our initial run and the optimal results. The old method refers to using the same competitive templates throughout all the building block sizes, and the new method randomizes the competitive template before each building block size. As can be seen, by allowing more exploration of the search space, we were able to find more PF_true points.

Figures 19.3 and 19.4 show some of the results of these runs. The results show that the new method performs much better than the old method on


Fig. 19.1. Pareto front found for the KC30-3fl-2rl test instance

Fig. 19.2. Pareto front found for the KC30-3fl-3uni test instance

all instances except one. The one time the old method performs better is when there is only one data point as a solution. These results show that, with the exception of one instance, the new method is more effective than


Table 19.79. Comparison of MOMGA-II Methods to PF_true

                          True Pareto Front Points
Test Name        Total PF_true   Old Method   Percent Found   New Method   Percent Found
KC10-2fl-1uni          13             9            69             11            85
KC10-2fl-2uni           1             1           100              0             0
KC10-2fl-3uni         130            40            31            122            94
KC10-2fl-1rl           58            21            36             56            97
KC10-2fl-2rl           15             5            33             11            73
KC10-2fl-3rl           55            23            42             50            91
KC10-2fl-4rl           53            24            45             47            89
KC10-2fl-5rl           49            36            73             49           100

Mean                                               53.76                        78.49
Std. Dev.                                          24.59                        32.75
Mean (w/o anomaly)                                 47.16                        89.70
Std. Dev. (w/o anomaly)                            17.28                         8.82

Fig. 19.3. Comparison of MOMGA-II methods to optimal results on KC10-2fl-1rl test instance

the old method. These results suggest that randomizing the competitive template allows the algorithm to explore the objective space more effectively and yield better results. It is believed that the MOMGA-II suffers from "speciation". This can be overcome by adding some competitive templates near the center of the Pareto front. See [13, 14] for more detailed analysis of these results.

Fig. 19.4. Comparison of MOMGA-II methods to optimal results on KC10-2fl-1uni test instance

The results from Table 19.79 support these findings. Whenever there were many points to find, the new method always found more than the old method. The reason the old method performed better than the new method when there was only one point to find is that both competitive templates point at the same location. This directs the search in the same direction as opposed to dividing the search into two directions. Since the new method does not have this directed search passed on to the larger building block sizes, it starts at a disadvantage when trying to find one or two points.

Additional experiments were done to see if building block size played a role in where the points were located along the Pareto front. We found that, on average, about twice as many large building blocks populate the outside of the Pareto front as the smaller building block sizes do. This is due to more bits being set in the genotype domain, which allows for a better solution in the phenotype domain. These results support the results that are discussed in Section 19.4 of this chapter.


More in-depth analysis of the results can be found in [5, 14, 13].

19.3.3. Modified Multi-Objective Knapsack Problem (MMOKP)

The generic multiple knapsack problem (MKP) consists of maximizing the profit (amount of items) for all knapsacks while adhering to the maximum weight (constraint) of each knapsack. The MOMGA-II is applied to the Modified Multi-objective Knapsack Problem (MMOKP), also a constrained, integer-based decision variable MOP. The formulation contains a large number of decision variables. MOEAs are suited to attempting to solve problems of high dimensionality, and hence the MOMGA-II is suited for this application. The MMOKP is formulated in 100, 250, 500, and 750 item formulations with integer-based decision variables and real-valued fitness functions. The MMOKP formulation used does not reflect the true multi-objective formulation of the multiple knapsack problem (MKP), due to the constraint that any item placed into one of the knapsacks must also be placed into all of the knapsacks. However, the MMOKP remains a good test problem due to the large number of decision variables and the difficulty associated with generating solutions on the Pareto front. Many researchers have selected this MOP to test their non-explicit building-block MOEAs [12, 15, 19, 30, 35, 36, 37]. The MMOKP has been selected for testing due to the difficulty of finding good solutions to this problem and to evaluate the performance of an explicit BB-based MOEA approach as applied to this MOP. Since the MMOKP has similar characteristics to other real-world MOPs, it is a good test problem to use.

The specific MOMGA-II settings used are presented in Table 19.80. Results are taken over 30 data runs in order to compare the results of the MOMGA-II to other MOEAs also executed over 30 data runs. The MOMGA-II was run on a Sun Ultra 10 with a single 440-MHz processor, 1024 MB of RAM, and the Solaris 8 operating system.

Table 19.80. MOMGA-II settings

Parameter           Value
Eras                10
BB Sizes            1-10
P_cut               2%
P_splice            100%
String length       100, 250, 500, 750
Total Generations   100


The overall goal is to maximize the profit obtained from each of the knapsacks simultaneously while meeting the weight constraints imposed. The MOP formulation follows for m items and n knapsacks, where

p_ij = profit of item j according to knapsack i,
w_ij = weight of item j according to knapsack i,

c_i = capacity of knapsack i.

For the MMOKP problem with n knapsacks and m items, the objectives are to maximize

f(x) = (f_1(x), ..., f_n(x))    (C.7)

where

f_i(x) = Σ_{j=1}^{m} p_ij x_j    (C.8)

and where x_j = 1 if item j is selected and 0 otherwise [37]. The constraints are:

Σ_{j=1}^{m} w_ij x_j ≤ c_i,   i = 1, ..., n    (C.9)

where x_j = 1 if item j is selected, 0 otherwise. The MMOKP has similar characteristics to those of the ALP problem. Both problems are formulated with a large number of decision variables, the decision variables are integer-based, and the constraints are linear. Since both problems have similar characteristics, one may expect similar issues to arise as compared to the ALP testing. In fact, the application of the MOMGA-II to the MMOKP illustrates that initially similar performance was obtained as compared to the results of the ALP. In the initial application of the MOMGA-II to the MMOKP, a constraint handling approach was not used and the MOMGA-II generated few feasible solutions, entirely of inferior quality when compared to the results of other algorithms. The initial results necessitated the use of a repair mechanism in order to provide the MOMGA-II an increased probability of identifying good BBs in the population.
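A minimal evaluation routine for this formulation might look as follows. This is a sketch only; the profit and weight matrices are assumed to be indexed [knapsack][item], matching Equations C.8 and C.9:

```python
def evaluate_mmokp(x, profits, weights, capacities):
    """Return the objective vector (f_1(x), ..., f_n(x)) and a
    feasibility flag for a 0/1 selection vector x shared by all
    knapsacks, per Equations C.7-C.9."""
    m = len(x)
    fitness = tuple(sum(p[j] * x[j] for j in range(m)) for p in profits)
    feasible = all(sum(w[j] * x[j] for j in range(m)) <= c
                   for w, c in zip(weights, capacities))
    return fitness, feasible
```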

An analysis of the initial results of the MOMGA-II illustrates that without the repair method, the infeasible results generated are far from the feasible solutions in phenotype space. This is a different result than was realized from the application of the MOMGA-II to the ALP. Hence the repair mechanism is also used in all three phases of execution. The population members are repaired following the random initialization of the population, during the BBF phase, and during the juxtapositional phase.


Additionally, the competitive templates are repaired prior to use. The population members are not repaired after each specific cut-and-splice operation of the juxtapositional phase, in order to avoid convergence to a suboptimal solution and to reduce the overhead associated with the numerous recombinations that take place within a single generation. However, at the end of each generation of the juxtapositional phase, all of the population members are repaired. The population is stored to an external archive to ensure that the feasible population members generated by the MOMGA-II are not lost. The next juxtapositional generation begins with feasible population members and the process repeats itself. Once the termination criteria are met, the results are presented to the researcher. At the conclusion of execution of the three phases, all of the population members are feasible and the competitive templates are updated for the next BB execution. The repair process ensures that subsequent BB size executions begin with feasible competitive templates. The initial repair mechanism selected for use is identical to the repair mechanism used in the application to the ALP.

Since other researchers have attempted to solve the MMOKP, one must consider whether or not other repair approaches may have merit. Zitzler and Thiele [37] proposed a greedy approach that is an extension of a single objective repair mechanism used by Michalewicz and Arabas [24]. In this multi-objective greedy repair approach, items are removed from the knapsacks based on their profit-to-weight (ptw) ratio. Remember that each item has an associated profit and weight with respect to the knapsack the item is placed into. If the knapsack capacity is exceeded, then items with the lowest profit-to-weight ratio with respect to the violated knapsack constraint are removed first. The profit-to-weight ratio is defined in Equation (C.10).

ptw_ij = p_ij / w_ij    (C.10)

Zitzler et al. state that this repair method performs well when used in their MOEA, the SPEA. Initial testing of this method applied to the MOMGA-II was not as successful, and hence a different way of repairing the population members became necessary. Since the MOMGA-II's operators and process are substantially different from those of the SPEA, it is accepted that a method that performs well when implemented in one MOEA may not perform well when implemented in other MOEAs. Zitzler's repair mechanism is a simple approach to extending a single objective repair mechanism to a multi-objective repair mechanism. The problem is that Zitzler's repair assumes that all of the constraints are violated, which may not be the case [37].


A potentially better repair mechanism for use in the MOMGA-II removes items from the knapsacks based only on the knapsack constraint that is violated. If only the knapsack 1 constraint is violated, then the ptw ratio of each item based on knapsack 1 is compared. However, in testing this method in the MOMGA-II, the new proposed repair method tends to obtain slightly better results in some cases than Zitzler's method, but overall the performance is similar to that obtained with Zitzler's repair mechanism.

In Jaszkiewicz's repair approach [12], used in the IMMOGLS MOEA, the ptw ratio of an item takes into account the multiple objectives in a better manner than the other mentioned approaches. Only one ptw ratio is calculated per item, but it takes into account all of the knapsacks, unlike the previously mentioned methods that calculate a ptw ratio per item, per knapsack. Jaszkiewicz's method sums the profits over all of the knapsacks and sums the weights over all of the knapsacks for a single item. The ptw ratio is calculated by determining the ratio of the sums and is presented in Equation (C.11).

ptw_j = (Σ_{i=1}^{n} p_ij) / (Σ_{i=1}^{n} w_ij)    (C.11)

Jaszkiewicz's repair method yields the best performance when used in the MOMGA-II. All of the results presented for the MOMGA-II use Jaszkiewicz's repair method. The improved performance of Jaszkiewicz's repair method over other repair methods (applied to the MMOKP using the MOMGA-II) is attributed to a better calculation of the lowest ptw ratio that more accurately takes into account all of the knapsacks simultaneously. The MOMGA-II is applied to the 100 item, 2 knapsack; 250 item, 2 knapsack; 500 item, 2 knapsack; and 750 item, 2 knapsack MMOKP formulations presented in Zitzler [37].
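Jaszkiewicz's repair, as described, can be sketched as follows. This is our illustration of Equation C.11 (matrix indexing as in the formulation above; tie-breaking among equal ratios is arbitrary here):

```python
def jaszkiewicz_repair(x, profits, weights, capacities):
    """Repeatedly drop the selected item with the lowest aggregate
    profit-to-weight ratio (profits and weights summed over all
    knapsacks, Eq. C.11) until every capacity constraint holds."""
    x = list(x)
    m = len(x)
    def violated():
        return any(sum(w[j] * x[j] for j in range(m)) > c
                   for w, c in zip(weights, capacities))
    while violated():
        selected = [j for j in range(m) if x[j]]
        worst = min(selected, key=lambda j: sum(p[j] for p in profits) /
                                            sum(w[j] for w in weights))
        x[worst] = 0   # remove the least profitable item per unit weight
    return x
```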

Any proposed change to an MOEA should be tested in order to determine whether it performs logically correctly and whether the effect of the change is worthwhile or as anticipated. The effect of the selected repair method (Jaszkiewicz's), as applied to the initial population of a single run of the MOMGA-II, is illustrated in Figure 19.5. The black dots (.) represent the results of applying the repair mechanism to the infeasible population members. Feasible population members are not repaired, and hence the black dots represent only the members that are repaired. One can see that the repaired population members have fitness values that are lower than those of the initial population members, since the repair mechanism removes item types from the knapsack if a constraint is violated. Hence the repaired


members move towards the lower left portion of the figure as item types are removed from their knapsacks, and hence the fitness values of these repaired members are decremented. Such a result is what one would expect to occur after repairing the population, and this validates that the repair mechanism performs correctly. The repaired members are all feasible, there are feasible BBs in the population, and the MOMGA-II can proceed to the BBF phase with feasible BBs present in the population. This type of comparison should always be done by researchers attempting to improve the performance of their MOEAs. It is crucial to validate the performance of a repair operator.

Fig. 19.5. Initial Population for 100 Item 2 Knapsack MMOKP, MOMGA-II

Prior to execution of the MMOKP tests, a modification to the MOMGA-II was completed to increase the selection pressure. The MOMGA-II uses an elitist selection mechanism in which the current Pareto front is passed from one generation to the next each time that selection occurs. This mechanism, as used in the testing, performs well. The elitist method maintains PF_current in the population from generation to generation but does not guarantee that a member of PF_known is not destroyed or removed from the population during a particular generation. Instead, the nondominated members are stored to an external archive each generation and, if still nondominated at the conclusion of MOEA execution, appear in the final PF_known set. Maintaining PF_current through the selection mechanism is anticipated to increase the effectiveness of the MOMGA-II, as the good BBs present in PF_current remain in the population. In order to increase the selection pressure and the convergence to a good solution set, this elitist scheme is implemented for the MMOKP. The elitist selection mechanism selects all of the population members that are elements of PF_current to be placed in the next generation. If the population is not full, additional members are selected through the use of an elitist-based tournament selection mechanism that selects two individuals at random to compete against each other. The tournament selection process repeats until the required population size is achieved.
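The elitist scheme just described can be sketched as follows. This is our illustration only (fitness vectors are maximized, matching the MMOKP objectives; names and the tie-breaking rule are assumptions):

```python
import random

def elitist_select(population, fitness, pop_size, rng=None):
    """Copy every member of PF_current into the next generation, then
    fill any remaining slots with binary tournaments (maximization)."""
    rng = rng or random.Random(0)
    def dominates(a, b):
        fa, fb = fitness(a), fitness(b)
        return all(x >= y for x, y in zip(fa, fb)) and fa != fb
    # PF_current: members not dominated by anyone in the population
    front = [p for p in population
             if not any(dominates(q, p) for q in population)]
    next_gen = front[:pop_size]
    while len(next_gen) < pop_size:
        a, b = rng.sample(population, 2)
        next_gen.append(b if dominates(b, a) else a)
    return next_gen
```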

This new elitist routine is expected to increase the effectiveness of the MOMGA-II. The results of this elitist scheme are presented in Figure 19.6 as (*)s, and the results of the normal tournament selection without elitism are represented by (.)s. In the 100 item, 2 knapsack MMOKP, a slight improvement is noted in the results of this new elitist selection scheme. Since an improvement is realized in the limited testing and is anticipated to improve performance, all of the MMOKP tests use this new elitist selection mechanism.

Fig. 19.6. Elitist 100 Item 2 Knapsack MMOKP


19.3.4. MOEA MMOKP Testing and Analysis

In order to compare the MOMGA-II with a different MOEA as applied to the MMOKP, the SPEA is selected. The data results of the SPEA are available and hence allow a comparison to be conducted. A small difference exists between the results of the MOMGA-II as compared to the SPEA. The MOMGA-II obtains a limited number of points that dominate those the SPEA generates and vice versa, but overall both algorithms generate numerous identical points. The SPEA appears to obtain a few points that dominate the MOMGA-II results in the upper left end of the Pareto front. Overall, the results from both MOEAs are similar, with the exception of slightly better performance by the SPEA at the end of the front.

The smallest instantiation of the MMOKP represents a difficult problem, but one in which both MOEAs find good solutions and a good distribution of points across the front. The next instantiation tested is the 250 item, 2 knapsack MMOKP. The MOMGA-II achieves better performance than the SPEA across the entire center section of the front, as well as the lower right end of the front. However, the SPEA obtains points in the upper left end of the front unpopulated by MOMGA-II results. Better performance is realized by the MOMGA-II for most of the Pareto front, but the SPEA continues to obtain better solutions in the upper left end.

Increasing the dimensionality of the MMOKP leads to the next instantiation tested, the 500 item, 2 knapsack formulation. Results were generated by both the MOMGA-II and the SPEA for the 500 item formulation. The MOMGA-II obtains many more points than the SPEA, and its results dominate those of the SPEA. The results illustrate a considerable distance between the Pareto front generated by the SPEA and the better front generated by the MOMGA-II. All of the points generated by the SPEA, with the exception of two, are dominated. The MOMGA-II also obtains a good spread of solutions across the front. While the MOMGA-II does not find solutions on one small area of the front, the solutions found dominate those found by the SPEA. This illustrates much better performance on this more difficult problem. Note that comparing explicit and implicit MOEAs by the number of function evaluations, as utilized here, is not appropriate due to the extensive differences in algorithm structures.

The largest instantiation of the 2 knapsack problem is also tested, the 750 item MMOKP. The MOMGA-II results exhibit much better performance across the entire front as compared to the SPEA. Figure 19.7 shows


the results of the MOMGA-II as compared to the SPEA. It is easily seen that the entire Pareto front generated by the MOMGA-II dominates the entire front generated by the SPEA in this 750 item MMOKP. The MOMGA-II does not obtain the same spread of solutions as the SPEA, but all of the MOMGA-II solutions are of higher quality than (i.e., dominate) those of the SPEA.

Fig. 19.7. 750 Item 2 Knapsack MMOKP

Results of calculating metric values for the 100, 250, 500, and 750 item, 2 fitness function MMOKP instantiations are presented in Table 19.81. The ONVG and spacing metrics are used along with the visualizations of the Pareto fronts presented. The selection of these metrics is discussed in detail in [38]. For each MOEA, the mean and standard deviation results are presented for each metric. The MOMGA-II and SPEA obtain similar performance for the ONVG and spacing metrics as applied to the 100 item knapsack MMOKP. The 100 item knapsack MMOKP is the smallest instantiation of the MMOKP, resulting in similar PF_known sets.

Table 19.81 shows that the SPEA, on average, finds a larger number of points in PF_known and obtains a slightly better spacing value for the 250 item MMOKP. However, the MOMGA-II obtains a much better distribution of points, and points of higher quality over most of the front, with the exception of the upper left section, for the 250 item MMOKP. Overall, the MOMGA-II generates mostly points of equivalent or better quality but the


Table 19.81. 2 Knapsack MMOKP Results

Number of   MOEA       ONVG               Spacing
Items                  Mean      SD       Mean      SD
100         SPEA       49.267    6.291    17.797    3.841
100         MOMGA-II   44.333    8.976    18.331    7.361
250         SPEA       55.567    6.377    26.163    3.956
250         MOMGA-II   41.167   10.986    22.855    9.343
500         SPEA       34.533    5.594    46.798   10.778
500         MOMGA-II   35.733   10.866    24.703   16.146
750         SPEA       34.200    6.408    71.340   20.733
750         MOMGA-II   30.200   12.090    34.096   24.750

SPEA obtains a better spread of points. Since the quality of the results is typically a driving factor for use of an MOEA, one would deem both algorithms as performing well on this MOP instantiation.

The 500 item MMOKP is a more challenging instantiation of the MMOKP, and the formulation specifies a large number of decision variables. Results show that the MOMGA-II solutions dominate all but two of the solutions found by the SPEA. Additionally, Table 19.81 illustrates that the average number of solutions and the spacing values generated by both MOEAs are comparable, though the SPEA metric results tend to be slightly better than those of the MOMGA-II. However, the MOMGA-II obtains a good distribution of points across most of the front, and the MOMGA-II points dominate those of the SPEA. Overall, the MOMGA-II obtains better results on the 500 item MMOKP.

The results presented in Table 19.81 show that the MOMGA-II obtains a similar number of vectors on average as the SPEA but obtains a much better spacing value for the 750 item MMOKP. The results are shown graphically in Figure 19.7, which also illustrates that the results of the MOMGA-II dominate all of the points found by the SPEA. Overall the MOMGA-II obtains better performance than the SPEA on this instantiation of the MMOKP. The results presented illustrate a trend: as the number of decision variables increases, the improvement in performance as compared to the SPEA increases, and the MOMGA-II typically generates solutions of higher quality.

Jaszkiewicz proposes a different metric for comparing the results of different MOEAs. His method involves calculating the average range of fitness values of each Pareto front curve or surface; he then compares the average values among the MOEAs 12. To determine these averages one must find the minimum value generated for each fitness function in PFknown for


474 Gary B. Lamont et al.

each data run. The minimum values for each fitness function are then averaged across the number of data runs conducted. The same process is used to calculate the average maximum values for each fitness function.
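This averaging procedure can be sketched as follows. The function name and the run data below are illustrative, not taken from Jaszkiewicz's implementation:

```python
def average_ranges(runs):
    """Average range metric (after Jaszkiewicz): for each objective,
    average the per-run minimum and per-run maximum fitness values
    over all data runs, giving an [avg_min, avg_max] interval.
    `runs` is a list of fronts; each front is a list of fitness tuples."""
    k = len(runs[0][0])          # number of objectives
    n = len(runs)                # number of data runs
    ranges = []
    for m in range(k):
        avg_min = sum(min(p[m] for p in front) for front in runs) / n
        avg_max = sum(max(p[m] for p in front) for front in runs) / n
        ranges.append((avg_min, avg_max))
    return ranges

# Two hypothetical runs on a 2-objective maximization problem
runs = [
    [(8900.0, 9500.0), (9400.0, 9000.0)],
    [(8800.0, 9600.0), (9300.0, 9100.0)],
]
print(average_ranges(runs))  # [(8850.0, 9350.0), (9050.0, 9550.0)]
```

Each returned pair corresponds to one row of Tables 19.82 and 19.83 for a given algorithm and instance.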

Table 19.82. MMOKP Knapsack Problem Results

Instance | MOMGA-II             | IMMOGLS              | SPEA
2-250    | [8876.47, 9405.57]   | [8520.15, 9537.85]   | [8407.63, 9460.87]
         | [8956.97, 9546.53]   | [8614.15, 9629.90]   | [8809.47, 9747.23]
2-500    | [18387.17, 18906.07] | [17684.70, 19047.50] | [17697.50, 18900.40]
         | [18830.70, 19311.60] | [18114.70, 19459.30] | [18455.20, 19460.70]
2-750    | [27001.57, 27506.87] | [25902.60, 27868.50] | [26374.90, 27924.00]
         | [27110.57, 27580.47] | [25738.10, 27904.30] | [26152.20, 27720.10]
(two rows per instance: one [average minimum, average maximum] range per fitness function)

Table 19.83. MMOKP Knapsack Problem Results (Cont)

Instance | MOMGA-II             | M-PAES               | MOGLS
2-250    | [8876.47, 9405.57]   | [8742.50, 9473.05]   | [7332.55, 9883.90]
         | [8956.97, 9546.53]   | [8866.95, 9593.00]   | [7747.65, 10093.20]
2-500    | [18387.17, 18906.07] | [18198.50, 19174.70] | [16148.60, 20029.20]
         | [18830.70, 19311.60] | [18541.80, 19514.00] | [16766.30, 20444.20]
2-750    | [27001.57, 27506.87] | [27100.30, 28661.80] | [23728.90, 29938.80]
         | [27110.57, 27580.47] | [26460.30, 28255.90] | [23458.20, 29883.10]

The results of the IMMOGLS, SPEA, M-PAES, MOGLS, and MOMGA-II 12 are presented in Tables 19.82 and 19.83. Jaszkiewicz uses this average range metric to state how well the MOEA solutions are spread out over PFknown. However, the data presentation format he used does not accurately reflect the spread or concentration of points. In most attempts to solve an MOP by an MOEA, researchers conduct numerous data runs and then combine the PFknown sets generated for each run. This combination of the data and subsequent analysis yields the overall PFknown solution set generated by an MOEA over a course of data runs. While one run may generate solutions in the lower right portion of the front (consider the characteristic of the MMOKP fronts as an example, Figure 19.7), another may generate solutions exclusively in the upper left portion of the front and the remaining runs may only generate solutions in the center of the front. A researcher solving real-world MOPs is interested in the final overall result and not necessarily in the averages. The averages can be deceiving. In the


previous example, most of the data runs generated solutions in the center of the front. Calculating the maximum and minimum average values using the data runs containing results on the endpoints of the Pareto front and combining the results with the two runs that generated solutions in a different area of the front may yield a value closer to the center than the end of the Pareto front. An analysis would then lead one to believe that the MOEA did not generate a good spread of solutions but instead a cluster around the center of the front. Conversely, an MOEA that consistently generated solutions only at the two ends of the front would appear to produce a good spread of points across the front. Such an MOEA may not have generated any solutions in the center portion of the front, and therefore does not obtain as good a spread of solutions, but the average range table would show otherwise.

While the average range data presentation format may be useful, it must be used in conjunction with other metrics or with a graphical presentation of the Pareto front in order to avoid misinterpreting the results. As stated earlier, in some cases a visual representation may be better than the results of a specific metric, as metrics are lossy evaluations and lose information in mapping a Pareto front of multiple data points to a single value. However, Tables 19.82 and 19.83 can be useful if used in conjunction with the graphical representation of PFknown presented in the figures.

It is important to realize that the MOGLS MOEA is executed on the relaxed formulation of the MMOKP, and not on the formulation of the MMOKP that the MOMGA-II and the SPEA use. Due to this fact, one must question whether Jaszkiewicz's comparison of the MOGLS to IMMOGLS, SPEA, and M-PAES is valid. In Table 19.83 the results of MOGLS are included, but a direct comparison between the MOMGA-II and the SPEA is the main focus. A comparison is made to the SPEA as it achieves the best published performance when applied to the identical formulation of the MMOKP.

The results presented in Tables 19.82 and 19.83 indicate that, for the most part, the MOMGA-II does not generate as effective results as the other MOEAs when comparing the spread of solutions. This interpretation of the results is not necessarily correct. For example, consider the 2 knapsack, 750 item instantiation of the MMOKP. Table 19.82 indicates that the SPEA has a wider range of values and leads to better solutions, but in actuality Figure 19.7 illustrates that the front generated by the MOMGA-II dominates the entire front generated by the SPEA. Therefore this presentation format is not recommended for use when one Pareto front completely


dominates another. The average range metric can mislead a researcher's analysis of the data if one is not careful to see the whole picture.

The results of the MOMGA-II are very good when applied to the real-world NPC application MOPs. The MOMGA-II finds PFtrue for the 60 bit formulation of the ALP and appears to find good solutions for the larger 120 bit formulation. The MOMGA-II also performs favorably when applied to the MMOKP and compared to other MOEAs. In particular, as the problem size increases, so does the effectiveness of the MOMGA-II in terms of dominating the solutions found by other MOEAs. Additionally, detailed descriptions of possible repair mechanisms, and of the best MOMGA-II repair mechanism found for MOPs formulated with integer based decision variables, are presented. The MOMGA-II performs well when using repair mechanisms to attempt to solve these discrete constrained MOPs.

19.4. MOEA BB Conjectures for NPC Problems

Since one seeks to find the best solution in any optimization problem, identification of the good BB(s) is critical to generating good, and hopefully the best, solutions. In the search for the optimal solution, it is possible that the identification of only one good BB is necessary to generate all of the good solutions on the Pareto front or, the more likely case, that there exist multiple BBs that must be identified in order to generate the multiple solutions on the Pareto front. If the identification of more than one good BB is necessary to find the entire front, then the MOEA must find the multiple good BBs.

The size of the BBs that are necessary to find points on the Pareto front has yet to be addressed in the literature. Assuming a worst case situation in which multiple BBs are contained within each point in PFtrue, the possibility also exists for the good BBs necessary to generate the points in PFtrue to be of varying sizes.

In general, the identification of multiple BBs is necessary to generate PFtrue for many MOPs, as multiple solutions exist in PFtrue. Since multiple solutions exist in PFtrue, and multiple BBs are typically necessary to generate these solutions, there is a high probability that multiple BB sizes are also required to generate all of the solutions in PFtrue. The identification of multiple BB sizes by an MOEA results in the generation of multiple fronts of different ranks through the search process.

As good BBs are identified and recombined by the MOMGA-II, solutions on inferior fronts (fronts of rank 1, 2, etc.) are generated as the


population progresses towards PFtrue. Once all of the necessary good BBs are generated, assuming a large enough population size to combat the noise present in the evolutionary process, an MOEA generates all of the points in PFtrue plus portions of the inferior fronts. Some researchers have identified, in explicit statements or through the results they have presented, a difficulty for MOEAs in generating points at the extremes of the front 12, 19, 37. The extremes are referred to as the endpoints of the curve or k-dimensional surface as dictated by the k objective functions. The difficulty of generating the extreme points of the Pareto front is attributed to the necessary identification of multiple BBs of different sizes. Implicit BB-based MOEAs may only generate BBs of a single size or may not be executed with a population size large enough to statistically generate the multiple good BBs of various sizes necessary to generate PFtrue. Various examples can illustrate the effect that different BB sizes may have in finding various points in the ranked fronts and in PFtrue.

Since many MOEA researchers conduct their research efforts with implicit BB-based MOEAs, building block concepts and the effects of the identification of good BBs are not readily noticeable. Through research conducted using the MOMGA-II and the theoretical development of population sizing equations 38 based upon the Building Block Hypothesis, the need for different sized BBs to generate PFtrue becomes apparent.

Many of the existing MOEAs are not effective at finding all of the points on the Pareto front and, more explicitly, points at the endpoints or end sections of the Pareto front when applied to test suite and real-world MOPs 12, 19, 37. While generating any point(s) on the Pareto front may be useful for real-world applications in which potential solutions have not been found, it would be even more useful if a researcher could generate a good distribution of points across the entire front. This has been identified by researchers utilizing MOEAs as an important issue 4, 12, 19.

A question that the MOEA community should answer is: Why do various MOEAs fail to find the endpoints of the Pareto front, or, if they do find some of the points, why does this typically occur with larger population sizes?

When using an explicit BB-based MOEA, the implication of Van Veldhuizen's theorem is that one must use a BB of the same order as the largest order BB required to solve each of the functions in the MOP, leading to various conjectures 38.


19.5. Future Directions

The mQAP and the MMOKP are examples of NPC MOP problems that are difficult to solve deterministically for relatively large problem sizes. Stochastic algorithms, like MOEAs, take a long time to get a "good" answer for a large number of locations simply because the solution space is so large and of exponential complexity. It is imperative to ensure that the proper building block sizes are used in order to populate PFknown with enough members to get as close to PFtrue as possible. Thus, in applying MOEAs to large dimensional NPC MOPs, one should consider possible problem relaxation, analysis of building block structures, use of a variety of MOEAs and operators, parallel computation, and finally an extensive design of experiments with appropriate metric selection, parameter sensitivity analysis, and comparison.

We plan to examine how chromosome sizing affects the mQAP results. By changing the bit representation, we can cut the chromosome size down from 10 bits per location to 4, more than halving the chromosome length and thereby shrinking the genotype space relative to previous experiments. This should produce better results since the search space is reduced. This concept may also improve the efficiency of solving other NP-Complete MOPs.
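The effect of such an encoding change on the size of the search space can be illustrated with a little arithmetic; the 16-location figure below is only an example, not a problem size from the experiments:

```python
# Genotype space size for an n-location encoding with b bits per
# location is 2**(b*n). Cutting b from 10 to 4 shrinks the space
# by a factor of 2**(6*n) -- far more than the reduction in
# chromosome length alone would suggest.
def genotype_space(bits_per_location, n_locations):
    return 2 ** (bits_per_location * n_locations)

n = 16
old = genotype_space(10, n)   # 2**160
new = genotype_space(4, n)    # 2**64
print(old // new)             # reduction factor: 2**96
```

The chromosome length drops by a constant factor, but the genotype space shrinks exponentially in the number of locations, which is why even modest per-gene savings matter for large instances.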

References

1. Vincent Bachelet. Metaheuristiques Paralleles Hybrides: Application au Probleme d'Affectation Quadratique. PhD thesis, Universite des Sciences et Technologies de Lille, December 1999.

2. John E. Beasley. OR-Library. 12 May 2003. http://mscmga.ms.ic.ac.uk/info.html.

3. Eranda Cela. The Quadratic Assignment Problem - Theory and Algorithms. Kluwer Academic Publishers, Boston, MA, 1998.

4. Carlos A. Coello Coello, David A. Van Veldhuizen, and Gary B. Lamont. Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, New York, May 2002.

5. Richard O. Day, Mark P. Kleeman, and Gary B. Lamont. Solving the Multi-objective Quadratic Assignment Problem Using a fast messy Genetic Algorithm. In Congress on Evolutionary Computation (CEC'2003), volume 4, pages 2277-2283, Piscataway, New Jersey, December 2003. IEEE Service Center.

6. Eranda Cela. QAPLIB - a quadratic assignment problem library. 8 June 2004. http://www.opt.math.tu-graz.ac.at/qaplib/.

7. L. M. Gambardella, E. D. Taillard, and M. Dorigo. Ant colonies for the quadratic assignment problem. Journal of the Operational Research Society, 50:167-176, 1999.


8. M. R. Garey and D. S. Johnson. Computers and Intractability - A Guide to the Theory of NP-Completeness. Freeman, San Francisco, 1979.

9. David E. Goldberg, Kalyanmoy Deb, Hillol Kargupta, and Georges Harik. Rapid, accurate optimization of difficult problems using fast messy genetic algorithms. In Stephanie Forrest, editor, Proceedings of the Fifth International Conference on Genetic Algorithms, pages 56-64. Morgan Kaufmann Publishers, 1993.

10. Peter Hahn, Nat Hall, and Thomas Grant. A branch-and-bound algorithm for the quadratic assignment problem based on the Hungarian method. European Journal of Operational Research, August 1998.

11. Jorng-Tzong Horng, Chien-Chin Chen, Baw-Jhiune Liu, and Cheng-Yen Kao. Resolution of quadratic assignment problems using an evolutionary algorithm. In Proceedings of the 2000 Congress on Evolutionary Computation, volume 2, pages 902-909. IEEE, 2000.

12. Andrzej Jaszkiewicz. On the performance of multiple-objective genetic local search on the 0/1 knapsack problem - a comparative experiment. IEEE Transactions on Evolutionary Computation, 6(4):402-412, August 2002.

13. Mark P. Kleeman. Optimization of heterogeneous UAV communications using the multiobjective quadratic assignment problem. Master's thesis, Air Force Institute of Technology, Wright Patterson AFB, OH, March 2004.

14. Mark P. Kleeman, Richard O. Day, and Gary B. Lamont. Multi-objective evolutionary search performance with explicit building-block sizes for NPC problems. In Congress on Evolutionary Computation (CEC2004), volume 4, Piscataway, New Jersey, May 2004. IEEE Service Center.

15. Joshua Knowles and David Corne. M-PAES: A Memetic Algorithm for Multiobjective Optimization. In 2000 Congress on Evolutionary Computation, volume 1, pages 325-332, Piscataway, New Jersey, July 2000. IEEE Service Center.

16. Joshua Knowles and David Corne. Instance generators and test suites for the multiobjective quadratic assignment problem. Technical Report TR/IRIDIA/2002-25, IRIDIA, 2002. (Accepted for presentation/publication at the 2003 Evolutionary Multi-criterion Optimization Conference (EMO-2003), Faro, Portugal.)

17. Joshua Knowles and David Corne. Towards Landscape Analyses to Inform the Design of Hybrid Local Search for the Multiobjective Quadratic Assignment Problem. In A. Abraham, J. Ruiz del Solar, and M. Koppen, editors, Soft Computing Systems: Design, Management and Applications, pages 271-279, Amsterdam, 2002. IOS Press. ISBN 1-58603-297-6.

18. Joshua Knowles and David Corne. Instance generators and test suites for the multiobjective quadratic assignment problem. In Carlos Fonseca, Peter Fleming, Eckart Zitzler, Kalyanmoy Deb, and Lothar Thiele, editors, Evolutionary Multi-Criterion Optimization, Second International Conference, EMO 2003, Faro, Portugal, April 2003, Proceedings, number 2632 in LNCS, pages 295-310. Springer, 2003.

19. Marco Laumanns, Lothar Thiele, Eckart Zitzler, and Kalyanmoy Deb. Archiving with Guaranteed Convergence and Diversity in Multi-Objective


Optimization. In W. B. Langdon et al., editors, Proceedings of the Genetic and Evolutionary Computation Conference (GECCO'2002), pages 439-447, San Francisco, California, July 2002. Morgan Kaufmann Publishers.

20. In Lee, Riyaz Sikora, and Michael J. Shaw. A genetic algorithm-based approach to flexible flow-line scheduling with variable lot sizes. IEEE Transactions on Systems, Man and Cybernetics - Part B, 27:36-54, February 1997.

21. Vittorio Maniezzo and Alberto Colorni. The ant system applied to the quadratic assignment problem. IEEE Transactions on Knowledge and Data Engineering, 11:769-778, 1999.

22. Peter Merz and Bernd Freisleben. A comparison of memetic algorithms, tabu search, and ant colonies for the quadratic assignment problem. In Proceedings of the 1999 Congress on Evolutionary Computation (CEC 99), volume 3, pages 2063-2070. IEEE, 1999.

23. Peter Merz and Bernd Freisleben. Fitness landscape analysis and memetic algorithms for the quadratic assignment problem. IEEE Transactions on Evolutionary Computation, 4:337-352, 2000.

24. Zbigniew Michalewicz. Genetic Algorithms + Data Structures = Evolution Programs. Springer-Verlag, 2nd edition, 1994.

25. Arnold Neumaier. Global optimization test problems. 8 June 2004. http://www.mat.univie.ac.at/~neum/glopt/test.html.

26. Volker Nissen. Solving the quadratic assignment problem with clues from nature. IEEE Transactions on Neural Networks, 5:66-72, 1994.

27. Panos M. Pardalos and Henry Wolkowicz. Quadratic assignment and related problems. In Panos M. Pardalos and Henry Wolkowicz, editors, Proceedings of the DIMACS Workshop on Quadratic Assignment Problems, 1994.

28. K. G. Ramakrishnan, M. G. C. Resende, and P. M. Pardalos. A branch and bound algorithm for the quadratic assignment problem using a lower bound based on linear programming. In C. Floudas and P. M. Pardalos, editors, State of the Art in Global Optimization: Computational Methods and Applications. Kluwer Academic Publishers, 1995.

29. Gerhard Reinelt. TSPLIB. 4 May 2003. http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/.

30. Masatoshi Sakawa, Kosuke Kato, and Toshihiro Shibano. An interactive fuzzy satisficing method for multiobjective multidimensional 0-1 knapsack problems through genetic algorithms. In Proceedings of the 1996 International Conference on Evolutionary Computation (ICEC'96), pages 243-246, 1996.

31. Kwang Mong Sim and Weng Hong Sun. Multiple ant-colony optimization for network routing. In First International Symposium on Cyber Worlds (CW'02), volume 2241, pages 277-281. IEEE, 2002.

32. G. Skorobohatyj. MP-Testdata. 20 May 2003. http://elib.zib.de/pub/Packages/mp-testdata/.

33. Eric D. Taillard. Comparison of iterative searches for the quadratic assignment problem. Location Science, 3:87-105, 1995.

34. Ke Xu. BHOSLIB: Benchmarks with hidden optimum solutions for graph problems. 8 June 2004. http://www.nlsde.buaa.edu.cn/~kexu/benchmarks/graph-benchmarks.htm.


35. Eckart Zitzler. Evolutionary Algorithms for Multiobjective Optimization: Methods and Applications. PhD thesis, Swiss Federal Institute of Technology (ETH), Zurich, Switzerland, November 1999.

36. Eckart Zitzler, Marco Laumanns, and Lothar Thiele. SPEA2: Improving the Strength Pareto Evolutionary Algorithm. Technical Report 103, Computer Engineering and Networks Laboratory (TIK), Swiss Federal Institute of Technology (ETH) Zurich, Gloriastrasse 35, CH-8092 Zurich, Switzerland, May 2001.

37. Eckart Zitzler and Lothar Thiele. Multiobjective Evolutionary Algorithms: A Comparative Case Study and the Strength Pareto Approach. IEEE Transactions on Evolutionary Computation, 3(4):257-271, November 1999.

38. Jesse Zydallis. Explicit Building-Block Multiobjective Genetic Algorithms: Theory, Analysis, and Development. PhD thesis, Air Force Institute of Technology, Wright Patterson AFB, OH, March 2003.


CHAPTER 20

DESIGN OF FLUID POWER SYSTEMS USING A MULTI OBJECTIVE GENETIC ALGORITHM

Johan Andersson

Department of Mechanical Engineering, Linkoping University, SE-581 83 Linkoping, Sweden

E-mail: [email protected]

Within this chapter the multi-objective struggle genetic algorithm is employed to support the design of hydraulic actuation systems. Two concepts, a valve-controlled and a pump-controlled system for hydraulic actuation, are evaluated using Pareto optimization. The actuation systems are analyzed using comprehensive dynamic simulation models to which the optimization algorithm is coupled. The outcome from the Pareto optimization is a set of Pareto optimal solutions, which allows visualization of the trade-off between the objectives. Both systems are optimized, resulting in two Pareto fronts that visualize the trade-off between system performance and system cost. By comparing the two Pareto fronts, it can be seen under which preferences a valve system is to be preferred to a pump system. Thus, optimization is employed in order to support concept selection.

Furthermore, general design problems usually constitute a mixture of determining continuous parameters as well as selecting individual components from catalogs or databases. Therefore the optimization is extended to handle a mixture of continuous parameters and discrete selections from catalogs. The valve-controlled system is again studied, but this time with cylinders and valves arranged in hierarchical catalogs, resulting in a discrete Pareto optimal front.

20.1. Introduction

Design is an iterative feedback process where the performance of the system is compared with the specification, see for example Pahl and Beitz 12 and Rozenburg and Eekels 13. Usually this is a manual process where the designer makes a prototype system, which is tested and modified until satisfactory. With the help of a simulation model, the prototyping could be reduced to a minimum. If the desired behavior of the system can be described as a function of the design parameters and the simulation results, it is possible to introduce optimization as a tool to further help the designer to reach an optimal solution. A design process that comprises simulation and optimization is presented in Andersson 1 and depicted in Figure 20.1 below.

Fig. 20.1. A system design process including simulation and optimization.

The 'problem definition' in Figure 20.1 results in a requirements list which is used in order to generate different solution principles/concepts. Once the concepts have reached a sufficient degree of refinement, modeling and simulation are employed in order to predict the properties of particular system solutions. Each solution is evaluated with the help of an objective function, which acts as a figure of merit. Optimization is then employed in order to automate the evaluation of system solutions and to generate new system proposals. The process continues until the optimization has converged and a set of optimal systems is found. One part of the optimization is the evaluation of design proposals. The second part is the generation of new and hopefully better designs. Thus, optimization consists of both analysis (evaluation) and synthesis (generation of new solutions).

Often the first optimization run does not result in the final design. If the optimization does not converge to a desired system, the concept has to be modified or the problem reformulated, which results in new objectives. In Figure 20.1 this is visualized by the two outer loops back to 'generation of solution principles' and 'problem definition' respectively.

Naturally the activity 'generation of solution principles' produces a number of conceivable concepts, each of which is optimized. Thus each concept is brought to maximum performance; optimization thereby provides a solid basis for concept selection. This will be illustrated later in a study of hydraulic actuation systems.

One essential aspect of using modeling and simulation is to understand the system we are designing. The other aspect is to understand our expectations of the system, and our priorities among the objectives. Both aspects are equally important. It is essential to engineering design to manage the dialog between specification and prototype. Often simulations confirm that what we wish for is unrealistic or ill-conceived. Conversely, they can also reveal that our wishes are not imaginative enough.

However, engineering design problems are often characterized by the presence of several conflicting objectives. When using optimization to support engineering design, these objectives are usually aggregated into one overall objective function, and optimization is then conducted with one optimal design as the result. Another way of handling the problem of multiple objectives is to employ the concept of Pareto optimality. The outcome from a Pareto optimization is a set of Pareto optimal solutions, which visualizes the trade-off between the objectives. In order to choose the final design, the decision-maker then has to trade the competing objectives against each other.

General design problems also consist of a mixture of determining continuous parameters as well as selecting individual components from catalogs or databases. Thus, an optimization strategy suited for engineering design problems has to be able to handle a mixture of continuous parameters as well as discrete selections of components from catalogs.

This chapter continues by describing a nomenclature for the general multi-objective design problem. Thereafter, multi-objective genetic algorithms are discussed and the proposed multi-objective struggle GA is described together with the genetic operators used. The optimization method is then connected to the HOPSAN simulation program and applied to support the design of two concepts for hydraulic actuation. Thus, it is shown how optimization can be employed in order to support concept selection. The simulation model is then extended to include component catalogs for valves and cylinders. The optimization strategy is modified accordingly and the problem is solved as a mixed discrete/continuous optimization problem.

20.2. The Multi-Objective Optimization Problem

A general multi-objective design problem is expressed by equation (B.1), where f1(x), f2(x), ..., fk(x) are the k objective functions, (x1, x2, ..., xn) are the n optimization parameters, and S ⊆ R^n is the solution or parameter space. Obtainable objective vectors, {F(x) | x ∈ S}, are denoted by Y. Y ⊆ R^k is usually referred to as the attribute space, where ∂Y is the boundary of Y. For a general design problem, F is non-linear and multi-modal and S might be defined by non-linear constraints containing both continuous and discrete member variables.

    min F(x) = [f1(x), f2(x), ..., fk(x)]
    s.t.  x ∈ S                                          (B.1)
    where x = (x1, x2, ..., xn)

The Pareto subset of ∂Y is of particular interest to the rational decision-maker. The Pareto set is defined by equation (B.2). Considering a minimization problem and two solution vectors x, y ∈ S, x is said to dominate y, denoted x ≻ y, if:

    ∀i ∈ {1, 2, ..., k} : fi(x) ≤ fi(y)  and  ∃j ∈ {1, 2, ..., k} : fj(x) < fj(y)    (B.2)

If the final solution is selected from the set of Pareto optimal solutions, there would not exist any solutions that are better in all attributes. It is clear that any final design solution should preferably be a member of the Pareto optimal set. If the solution is not in the Pareto optimal set, it could be improved without degeneration in any of the objectives, and thus it is not a rational choice. This is true as long as the selection is done based on the objectives only. The presented nomenclature is visualized in Figure 20.2 below.
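The dominance relation of equation (B.2) translates directly into code. This is a generic minimization-case sketch, not tied to any particular MOEA implementation:

```python
def dominates(x_f, y_f):
    """True if objective vector x_f dominates y_f (minimization),
    i.e. x_f is no worse in every objective and strictly better
    in at least one -- equation (B.2)."""
    return (all(a <= b for a, b in zip(x_f, y_f))
            and any(a < b for a, b in zip(x_f, y_f)))

def pareto_front(points):
    """Filter a set of objective vectors down to its nondominated subset."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

pts = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
print(pareto_front(pts))  # [(1.0, 4.0), (2.0, 2.0), (4.0, 1.0)]
```

Here (3.0, 3.0) is removed because (2.0, 2.0) is better in both objectives; the three remaining points are mutually nondominated.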

20.3. Multi-Objective Genetic Algorithms

Genetic algorithms are modeled after the mechanisms of natural selection. Each optimization parameter (xn) is encoded by a gene using an appropriate representation, such as a real number or a string of bits. The corresponding genes for all parameters x1, ..., xn form a chromosome capable of describing an individual design solution. A set of chromosomes representing several individual design solutions comprises a population, where the most fit are selected to reproduce. Mating is performed using crossover to combine genes from different parents to produce children. The children are inserted into the population and the procedure starts over again, thus creating an artificial Darwinian environment. For a general introduction to genetic algorithms, see the work by Goldberg 8.
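As an illustration of these operators, a real-coded chromosome with uniform crossover and Gaussian mutation might look as follows; the operator choices and parameter values are illustrative only, not those used by any specific algorithm in this chapter:

```python
import random

def crossover(parent_a, parent_b, rng=random.Random(1)):
    """Uniform crossover: each gene is copied from one of the two
    parents with equal probability (one simple choice among many)."""
    return [a if rng.random() < 0.5 else b
            for a, b in zip(parent_a, parent_b)]

def mutate(chrom, sigma=0.1, rate=0.1, rng=random.Random(2)):
    """Gaussian creep mutation on a real-coded chromosome: each gene
    is perturbed with probability `rate`."""
    return [g + rng.gauss(0.0, sigma) if rng.random() < rate else g
            for g in chrom]

child = mutate(crossover([0.1, 0.2, 0.3], [0.9, 0.8, 0.7]))
print(len(child))  # 3 genes, each drawn from one parent (possibly perturbed)
```

With a bit-string representation the same structure applies; only the gene type and the mutation operator (bit flips instead of Gaussian noise) change.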


Fig. 20.2. Solution and attribute space nomenclature for a problem with two design variables and two objectives.

When the population of an ordinary genetic algorithm is evolving, it usually converges to one optimal point. It is, however, tempting to adjust the algorithm so that it spreads the population over the entire Pareto optimal front instead. As this idea is quite natural, there are many different types of multi-objective genetic algorithms. For a review of genetic algorithms applied to multi-objective optimization, readers are referred to the work done by Deb 3. Literature surveys and comparative studies on multi-objective genetic algorithms are also provided by several other authors, see for example Coello 4, Horn 10 and Zitzler and Thiele 15.

20.3.1. The Multi-Objective Struggle GA

In this chapter the multi-objective struggle genetic algorithm (MOSGA) 1, 3 is used for the Pareto optimization. MOSGA combines the struggle crowding genetic algorithm presented by Grueninger and Wallace 9 with Pareto-based ranking as devised by Fonseca and Fleming 7. As there is no single objective function to determine the fitness of the different individuals in a Pareto optimization, the ranking scheme presented by Fonseca and Fleming is employed, and the "degree of dominance" in attribute space is used to rank the population. Each individual is given a rank based on the number of individuals in the population that are preferred to it, i.e. for each individual the algorithm loops through the whole population counting the number of preferred individuals. "Preferred to" is implemented in a strict Pareto sense, according to equation (B.2), but one could also combine Pareto optimality with the satisfaction of objective goal levels, as discussed in ref. 7. The principle of the MOSGA algorithm is outlined below.


488 Johan Andersson

Step 1: Initialize the population.
Step 2: Select parents using uniform selection, i.e. each individual has the same probability of being chosen.
Step 3: Perform crossover and mutation to create a child.
Step 4: Calculate the rank of the new child.
Step 5: Find the individual in the entire population that is most similar to the child. Replace that individual with the new child if the child's ranking is better, or if the child dominates it.
Step 6: Update the ranking of the population if the child has been inserted.
Step 7: Perform steps 2-6 according to the population size.
Step 8: If the stop criterion is not met, go to step 2 and start a new generation.

Step 5 implies that the new child is only inserted into the population if it dominates the most similar individual, or if it has a lower ranking, i.e. a lower "degree of dominance". Since the ranking of the population does not consider the presence of the new child, it is possible for the child to dominate an individual and still have the same ranking. This restricted replacement scheme counteracts genetic drift and is the only mechanism needed in order to preserve population diversity. Furthermore, it does not need any specific parameter tuning. The replacement scheme also constitutes an extreme form of elitism, as the only way of replacing a non-dominated individual is to create a child that dominates it.
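One generation of this loop (steps 2-7) can be sketched compactly, assuming the individuals are represented directly by their objective vectors and the problem-specific pieces (crossover, mutation, distance, dominance) are supplied as functions; this is an illustration of the scheme, not the original implementation:

```python
import random

def mosga_generation(population, crossover, mutate, distance, dominates):
    """One MOSGA generation: population-size struggle replacements."""
    def rank(ind):
        # "degree of dominance", recomputed on the current population
        return sum(dominates(q, ind) for q in population)

    for _ in range(len(population)):                   # step 7
        mother, father = random.sample(population, 2)  # step 2: uniform selection
        child = mutate(crossover(mother, father))      # step 3
        # step 5: restricted replacement of the most similar individual
        nearest = min(population, key=lambda ind: distance(ind, child))
        if rank(child) < rank(nearest) or dominates(child, nearest):
            population[population.index(nearest)] = child
    return population
```

Because `rank` is evaluated against the current population, a child can dominate `nearest` yet share its rank, which is why the dominance test is needed in addition to the rank comparison.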

The similarity of two individuals is measured using a distance function. The method has been tested with distance functions based upon the Euclidean distance in both attribute and parameter space. A mixed distance function combining both the attribute and parameter distance has been evaluated as well. The result presented here was obtained using an attribute based distance function.

An inherent property of the crowding method is the capability to identify and maintain multiple Pareto fronts, i.e. global and local Pareto fronts in multi-modal search spaces, see refs. 1, 2 and 3. In real-world applications, only parts of the true problem can be reflected in the formulation of the optimization problem. Therefore it is valuable to know about the existence of local optima, as they might possess other properties, such as robustness, that are important to the decision maker but not reflected in the objective functions.



In single objective optimization, niching techniques have been introduced in order to facilitate the identification of both global and local optima.

As can be seen from the description of the method, there are no algorithm parameters that have to be set by the user. The only inputs are: population size, number of generations, genome representation, and crossover and mutation methods, as in every genetic algorithm.

20.3.2. Genome Representation

The genome encodes design variables in a form suitable for the GA to operate upon. Design variables may be values of parameters (real or integer) or represent individual components selected from catalogs or databases. Thus, the genome is a hybrid list of real numbers (for continuous parameters), integers and references to catalog selections, see Figure 20.3.

A catalog could be either a straight list of elements, or the elements could be arranged in a hierarchy. Each element of a catalog represents an individual component. The characteristics of catalogs will be discussed further on and exemplified by the design example.

Fig. 20.3. Example of the genome encoding: [ 4.237 | 6.87e-3 | 12 | 37 ]. The first two elements represent real variables and the last two elements catalog selections (e.g. 12 denotes the 12th element of the 1st catalog).

20.3.3. Similarity Measures

Speciating GAs require a measure of likeness between individuals, a so-called similarity measure. The similarity measure is usually based on a distance function that calculates the distance between two genomes. The similarity could be based on the distance in either the attribute space (between the objectives), the phenotype space (between the design parameters) or the genotype space (in the genome encoding). As direct encoding is used (not a conversion to a string of bits), a phenotype and a genotype distance function would yield the same result. It is shown in references 2 and 3 that the choice between an attribute based and a parameter based distance function might have a great influence on the outcome of the optimization. To summarize: an attribute space distance measure gives a fast and precise convergence on the global Pareto optimal front, whereas a parameter based distance function does not converge as fast but has the advantage of identifying and maintaining both global and local Pareto optimal fronts.

20.3.3.1. Attribute Based Distance Function

One way of comparing two individual designs is to calculate their distance in attribute space. As we want the population to spread evenly on the Pareto front (in attribute space), it seems a good idea to use an attribute based distance measure. The distance between two solutions (genomes) in attribute space is calculated using the normalized Euclidean distance, see equation C.1.

\mathrm{Distance}(a, b) = \sqrt{\frac{1}{k} \sum_{i=1}^{k} \left( \frac{f_i^a - f_i^b}{f_i^{max} - f_i^{min}} \right)^2} \qquad (C.1)

Where f_i^a and f_i^b are the objective values of the i:th objective for a and b respectively, f_i^max and f_i^min are the maximum and the minimum of the i:th objective in the current population, and k is the number of objectives. Thus, the distance function will vary between 0, indicating that the individuals are identical, and 1 for the very extremes.
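Equation C.1 translates directly into code; a small sketch, assuming minimization and non-degenerate objective ranges in the current population:

```python
def attribute_distance(fa, fb, f_min, f_max):
    """Normalized Euclidean distance in attribute space (equation C.1).
    fa, fb are the objective vectors of the two designs; f_min and f_max
    hold the per-objective extremes of the current population."""
    k = len(fa)
    s = sum(((a - b) / (hi - lo)) ** 2
            for a, b, lo, hi in zip(fa, fb, f_min, f_max))
    return (s / k) ** 0.5
```

Identical designs give 0, and the two population extremes give 1, matching the bounds stated above.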

20.3.3.2. Phenotype Based Distance Function

Another way of calculating the distance between solutions is to use the distance in parameter (phenotype) space. As the genome might be a hybrid mixture of real numbers and catalog selections, we have to define different distance functions to work on different types of elements. The methods described here build on the framework presented by Senin et al.14. In order to obtain the similarity between two individuals, the distance between each design variable is calculated. The overall similarity is then obtained by summing up the distances for each design variable.

20.3.3.3. Real Number Distance

A natural distance measure between two real numbers is the normalized Euclidean distance, see equation C.2.



\mathrm{Distance}(a, b) = \sqrt{\left( \frac{a - b}{max\ distance} \right)^2} \qquad (C.2)

Where a and b are the values of the two real numbers and max distance is the maximum possible distance between the two values (i.e. given by the search boundaries).

20.3.3.4. Catalog Distance

The distance between two catalog selections could be measured through relative positions in a catalog or a catalog hierarchy. The relative position is only meaningful if the catalog is ordered, see Figure 20.4.

Fig. 20.4. Examples of ordered and unordered catalogs.

The dimensionless distance between two elements within the same catalog is expressed by equation C.3 and exemplified in Figure 20.5.

\mathrm{Distance}(a, b) = \frac{|pos(a) - pos(b)|}{max\ distance} \qquad (C.3)

Fig. 20.5. Distance evaluation for two elements of an ordered catalog.



For catalog hierarchies, equation C.3 has to be generalized as exemplified in Figure 20.6. For elements belonging to the same sub-catalog, the distance is evaluated using the relative position within that sub-catalog. Otherwise, the maximum length of the path connecting the different sub-catalogs is used. This implies that for two given sub-catalogs, an element in one catalog is equally distant from every element in the other catalog. The length of the path is calculated as the maximal distance within the smallest common hierarchy. In both cases, the distance is normalized by dividing by the maximum distance (i.e. the catalog size).

Fig. 20.6. Exemplification of distances between different catalog elements in a hierarchical catalog.
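A minimal sketch of this catalog distance, assuming a single level of ordered sub-catalogs under one common root (so the smallest common hierarchy of two different sub-catalogs is the whole catalog, and the normalized cross-catalog distance is always 1); the (sub-catalog, position) representation is ours:

```python
def catalog_distance(a, b, sizes):
    """Distance between two catalog selections (equation C.3, generalized).
    Elements are (sub_catalog, position) pairs; `sizes` maps a sub-catalog
    name to its number of elements."""
    (cat_a, pos_a), (cat_b, pos_b) = a, b
    if cat_a == cat_b:
        # same ordered sub-catalog: relative position, normalized
        return abs(pos_a - pos_b) / (sizes[cat_a] - 1)
    # different sub-catalogs: every pair of elements is equally distant,
    # and the maximal path within the common hierarchy normalizes to 1
    return 1.0
```

With a deeper hierarchy, the `return 1.0` branch would instead compute the maximal distance within the smallest common sub-hierarchy and normalize by the overall catalog size.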

20.3.3.5. Overall Distance

So far, distance measures for individual design variables have been developed. An overall distance measure for comparing two genomes is obtained by aggregating the distances for the individual design variables, see equation C.4.

\mathrm{Distance}(a, b) = \sum_{i=1}^{n} \frac{\mathrm{Distance}(DV_i)}{n} \qquad (C.4)

Where a and b are the two designs being compared, and n is the number of design variables (DV) encoded by the genome. Thus, the phenotype distance between two individual designs is calculated by summing up the individual distances for each element of the genome.
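Equation C.4 then aggregates the per-variable distances; a sketch, assuming one distance function per genome position (real-number or catalog):

```python
def genome_distance(a, b, variable_distances):
    """Overall phenotype distance (equation C.4): the per-variable
    distances averaged over the n design variables. `variable_distances`
    holds one distance function per genome position."""
    n = len(variable_distances)
    return sum(d(x, y) for d, x, y in zip(variable_distances, a, b)) / n
```

Since each per-variable distance is normalized to [0, 1], dividing by n keeps the overall distance in [0, 1] as well.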



20.3.4. Crossover Operators

As the genome is a hybrid mix of continuous variables and catalog selections, we define different operators to work on different types of elements. Uniform crossover is used, which implies that each element of the father's genome is crossed with the corresponding element from the mother's genome. For real numbers, BLX crossover6 is used, see the exemplification in Figure 20.7. For catalog selections, an analogous crossover scheme is employed, as illustrated in Figure 20.8.

Fig. 20.7. The outcome of a BLX crossover between two real numbers a and b is randomly selected from an interval of width 2d centered on the average M.

Fig. 20.8. An exemplification of the catalog crossover. The outcome of a crossover of individuals within the same catalog (a and b) is randomly selected from the interval between them. For individuals from different sub-catalogs (c and d) the outcome is randomly selected within the smallest common hierarchy.
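The two crossover operators can be sketched as follows. The BLX interval half-width d = (0.5 + alpha)|a - b| is an assumption of ours (the common BLX-alpha convention, where alpha = 0 reproduces the parents' own interval; the chapter specifies the width only via Figure 20.7), and the (sub-catalog, position) representation is reused from the distance examples:

```python
import random

def blx_crossover(a, b, alpha=0.5):
    """BLX crossover for a real gene: the child is drawn uniformly from an
    interval of width 2d centered on the parents' average M (Figure 20.7),
    with d = (0.5 + alpha) * |a - b| assumed here."""
    m = (a + b) / 2.0
    d = (0.5 + alpha) * abs(a - b)
    return random.uniform(m - d, m + d)

def catalog_crossover(a, b, sizes):
    """Analogous crossover for catalog selections (Figure 20.8): within one
    sub-catalog the child position lies between the parents; across
    sub-catalogs it is drawn anywhere in the smallest common hierarchy
    (here, assumed to be the whole catalog)."""
    (cat_a, pos_a), (cat_b, pos_b) = a, b
    if cat_a == cat_b:
        return (cat_a, random.randint(min(pos_a, pos_b), max(pos_a, pos_b)))
    cat = random.choice(list(sizes))
    return (cat, random.randrange(sizes[cat]))
```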



20.4. Fluid Power System Design

The objects of study are two different concepts of hydraulic actuation systems. Both systems consist of a hydraulic cylinder that is connected to a mass of 1000 kilograms. The objective is to follow a pulse in the position command with a small control error and simultaneously obtain low energy consumption. Naturally, these two objectives are in conflict with each other, as a low control error implies large accelerations, which consume more energy. The problem is thus to minimize both the control error and the energy consumption from a Pareto optimal perspective.

Fig. 20.9. The valve concept for hydraulic actuation.

Two different ways of controlling the cylinder are studied. In the first, more conventional system, the cylinder is controlled by a directional valve, which is powered from a constant pressure system. In the second concept, the cylinder is controlled by a servo pump. Thus, the systems have different properties. The valve concept has all that is required for a low control error, as the valve has a small mass and thus a very high bandwidth. On the other hand, the valve system is associated with higher losses, as the valve constantly throttles fluid to the tank. The different concepts have been modeled in the simulation package HOPSAN, see ref. 11. The system models are depicted in Figures 20.9 and 20.10 respectively.



The models of each component of the systems consist of a set of algebraic and differential equations considering effects such as friction, leakage and non-linearities, for example limited stroke distances and stroke speeds. HOPSAN uses a distributed simulation technique where each component contains its own numerical solver. The components are then connected using transmission line elements as described in ref. 11. The distributed simulation technique has the advantage that the components are numerically separated from each other, which promotes stability. Furthermore, the computational time grows linearly with the size of the problem, which is not true for centralized solvers. The HOPSAN simulation software can be freely downloaded from the web.

The valve system consists of the mass and the hydraulic cylinder, the directional valve and a P-controller to control the motion. The directional valve is powered by a constant pressure pump and an accumulator, which keeps the system pressure at a constant level. The optimization parameters are the sizes of the cylinder, the valve and the pump, the pressure level, and the feedback gain. Furthermore, a leakage parameter is added to both systems in order to guarantee sufficient damping. Thus, this problem consists of six optimization parameters and two objectives.

Fig. 20.10. The pump concept of hydraulic actuation.

The pump concept contains fewer components: the cylinder and the mass, the controller and the pump. A second order low-pass filter is added in order to model the dynamics of the pump. The pump system consists of only four optimization parameters. The performance of a relatively fast pump system is depicted in Figure 20.11.

Fig. 20.11. Typical pulse response for a pump system.

20.4.1. Optimization Results

Both systems were optimized in order to simultaneously minimize the control error f_1 and the energy consumption f_2. The control error is obtained by integrating the absolute value of the control error and adding a penalty for overshoots, see equation D.1. The energy consumption is calculated by integrating the hydraulic power, expressed as the pressure times the flow, see equation D.2.

f_1 = \int_0^4 |x_{ref} - x| \, dt + \alpha \left( \int_0^2 (x > x_{ref}) \, dt + \int_2^4 (x < x_{ref}) \, dt \right) \qquad (D.1)

f_2 = \int_0^4 q_{pump} \cdot p_{pump} \, dt \qquad (D.2)
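For a sampled simulation trace, the two objectives can be evaluated by simple rectangle-rule integration. In this sketch the pulse is assumed to step down at t = 2 s (matching the inner integration limits of D.1), and the penalty factor alpha is an arbitrary assumed value; the function names and trace format are ours:

```python
def control_error(t, x, x_ref, t_up=2.0, alpha=10.0):
    """Objective f1 (equation D.1): integrated absolute control error plus
    an overshoot penalty: alpha times the time spent above the command
    while the pulse is high (t < t_up) or below it afterwards."""
    f1 = 0.0
    for i in range(len(t) - 1):
        dt = t[i + 1] - t[i]
        f1 += abs(x_ref[i] - x[i]) * dt
        overshoot = (x[i] > x_ref[i]) if t[i] < t_up else (x[i] < x_ref[i])
        if overshoot:
            f1 += alpha * dt
    return f1

def energy_consumption(t, q_pump, p_pump):
    """Objective f2 (equation D.2): integral of hydraulic power, pump flow
    times pump pressure, over the simulation."""
    return sum((t[i + 1] - t[i]) * q_pump[i] * p_pump[i]
               for i in range(len(t) - 1))
```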

The optimization is conducted with a population size of 30 individuals over 200 generations. The parameters are real encoded, BLX crossover is used to produce new offspring, and the Euclidean distance in attribute space is used as the similarity measure.



As a Pareto optimization searches for all non-dominated individuals, the final population will contain individuals with a very high control error, as they have low energy consumption. It is possible to obtain an energy consumption close to zero if the cylinder does not move at all. However, these solutions are not of interest, as we want the system to follow the pulse. Therefore, a goal level/constraint on the control error is introduced. The optimization strategy is modified so that solutions below the goal level on the control error are always preferred to solutions that are above it, regardless of their energy consumption. In this manner, the population is focused on the relevant part of the Pareto front.
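This modified preference relation can be sketched as follows, with objective vectors ordered as (control error, energy consumption), both minimized; the goal value is an input and the names are illustrative:

```python
def preferred(a, b, goal):
    """True if solution a is preferred to b under the goal-level rule:
    a solution meeting the control-error goal always beats one that does
    not, regardless of energy; otherwise plain Pareto dominance decides."""
    a_ok, b_ok = a[0] <= goal, b[0] <= goal
    if a_ok != b_ok:
        return a_ok
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))
```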

The obtained Pareto optimal fronts for both systems are depicted in Figure 20.12. In order to achieve fast systems, and thereby low control errors, large pumps and valves are chosen by the optimization strategy. A large pump delivers more fluid, which enables a higher speed of the cylinder. However, bigger components consume more energy, which explains the shape of the Pareto fronts.

Fig. 20.12. Pareto fronts showing the trade-off between energy consumption and control error for the two concepts. The graph on the right shows a slow pulse response, whereas the graph on the left shows a fast pulse response.



When the Pareto fronts for different concepts are drawn within the same graph, as in Figure 20.12, an overall Pareto optimal front can be obtained by identifying the non-dominated set from all Pareto optimal solutions obtained. It is then evident that the final design should preferably be on the overall Pareto front, which elucidates when it is rational to switch between concepts. The servo pump system consumes less energy and is preferred if a control error larger than 0.05 ms is acceptable. The servo valve system is fast but consumes more energy. If a control error lower than 0.05 ms is desired, the final design should preferably be a servo valve system. In order to choose the final design, the decision-maker has to select a concept and then study the trade-off between the control error and the energy consumption and select a solution point on the Pareto front.

This application shows how Pareto optimization can be employed to support concept selection, by visualizing the pros and cons of each concept.
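Constructing the overall front from the per-concept fronts is a plain non-dominated filter; a sketch (minimization assumed, point coordinates purely illustrative):

```python
def overall_front(fronts):
    """Merge the Pareto fronts of several concepts (lists of objective
    vectors) and keep only the overall non-dominated set, which shows
    where it is rational to switch between concepts."""
    merged = [pt for front in fronts for pt in front]

    def dominated(p):
        return any(all(x <= y for x, y in zip(q, p)) and q != p
                   for q in merged)

    return [p for p in merged if not dominated(p)]
```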

20.5. Mixed Variable Design Problem

Real design problems usually involve a mixture of determining continuous parameters and selecting existing components from catalogs or databases, see Senin et al.14. Therefore, the multi-objective genetic algorithm has been extended to handle a mixture of continuous variables as well as discrete catalog selections. The object of study for the mixed variable design problem is the valve actuation system depicted in Figure 20.9.

The objective is again to design a system with good controllability, but this time at low cost. When designing the system, cylinders and valves are selected from catalogs of existing components. To achieve good controllability we can choose a fast servo valve, which is more expensive than a slower proportional valve. Therefore, there is a trade-off between cost and controllability. The cost of a particular design is composed of the cost of the individual components as well as the cost induced by the energy consumption.

Other parameters such as the control parameter, the leakage coefficient and the pump size have to be determined as well. Thus the problem is multi-objective with two objectives and five optimization variables, of which two are discrete catalog selections and three are continuous variables. For this optimization the pressure level is not an optimization parameter, as it is determined by the choice of the cylinder.



20.5.1. Component Catalogs

For the catalog selections, catalogs of valves and cylinders have been included in the HOPSAN simulation program. For the directional valve, the choice is between a slow but inexpensive proportional valve or an expensive and fast servo valve. Valves from different suppliers have been arranged in two ordered sub-catalogs as depicted in Figure 20.13. The same structure applies to the cylinders, as they are divided into sub-catalogs based on their maximum pressure level. The pressure in the system has to be controlled so that the maximum pressure for the cylinder is not exceeded. A low-pressure system is cheaper but has inferior performance compared to a high-pressure system. Each catalog element contains a complete description of that particular component, i.e. the parameters that describe the dynamics of the component, which are needed by the simulation model, as well as information on cost and weight etc.

Fig. 20.13. The catalog of directional valves is divided into proportional valves and servo valves. Each sub-catalog is ordered based on the valve size. For each component, a set of parameters describing the component is stored together with information on cost and weight.

20.5.2. Optimization Results

The system has been optimized using a population of 40 individuals and 400 generations. In order to limit the Pareto front, a goal level on the control error was introduced for this problem as well. The result could be divided into three distinct regions depending on valve type and pressure level, see Figure 20.14.

Fig. 20.14. Optimization results. In (a) the obtained Pareto optimal front is shown in the objective space. Different regions have been identified based on valve and cylinder selections, which are shown in the parameter space in (b).

As can be seen from Figure 20.14, there is a trade-off between system performance (control error) and system cost. By accepting a higher cost, better performance can be achieved. The cheapest designs consist of small proportional valves and low-pressure cylinders. By choosing larger proportional valves and high-pressure cylinders, the performance can be increased at the expense of higher cost. If still better performance is desired, a servo valve has to be chosen, which is more expensive but has better dynamics.

The continuous parameters, such as the control parameter, tend to smoothen out the Pareto front. For a given valve and cylinder, different settings of the continuous parameters affect the pulse response. A faster response results in a lower control error, but also a higher energy consumption and thereby a higher cost. Therefore, there is a local trade-off between cost and performance for each catalog selection.

20.6. Discussion and Conclusions

Modelling and simulation are very powerful tools that can support the engineering design process and facilitate a better and deeper understanding of the systems being developed. When connecting an optimization strategy to the simulation model, the knowledge acquisition process can be sped up further, as the optimization searches through the simulation model in an efficient manner. Furthermore, the optimization frequently identifies loopholes and shortcomings of the model, since it is unbiased in its search for an optimal design. As a system designer, or model developer, it is hard to conduct as thorough an inspection of the model as the optimization does. Thus even more information can be gathered from the simulation models if combined with an optimization strategy. As has been shown in this chapter, optimization also elucidates how the preferences among the objectives impact the final design. Thus optimization facilitates the understanding of the system being developed as well as of our expectations on the system and our priorities among the objectives.

In this chapter the multi-objective struggle genetic algorithm is connected to the HOPSAN simulation program in order to support the design of fluid power systems. The method has been applied to two concepts of hydraulic actuation systems: a valve-controlled system and a pump-controlled system, which have been modeled in the HOPSAN simulation environment. Both systems were optimized in order to minimize the control error and the energy consumption. Naturally, these two objectives are in conflict with each other, and thus the resulting Pareto fronts visualize the trade-off between control error and energy consumption for each concept. The existence of the trade-off was known beforehand, but with the support of the Pareto optimization the trade-off could be quantified, and the performance of designs at different regions of the Pareto front could be visualized in order to point out the effects of the trade-off.

When the Pareto optimal fronts for different concepts are drawn in the same graph, the advantages of the concepts are clearly elucidated. An overall Pareto optimal front can be obtained by identifying the non-dominated set from all Pareto optimal fronts. The rational choice is naturally to select the final design from this overall Pareto optimal set. Thus the decision-maker is advised which concept to choose depending on his or her preferences, and hence Pareto optimization could be a valuable support for concept selection. In this application it was recognized that the concepts had different properties, i.e. one concept is faster but consumes more energy, but it was not known under which preferences one concept was better than the other, i.e. where the Pareto fronts intersected. Therefore, Pareto optimization contributed to elucidating the benefits of the different concepts. The conception of an overall Pareto front, and thereby the support for concept selection, is one of the main contributions of this chapter.

Subsequently, the method has been extended to handle the selection of individual components from catalogs, and thus the problem is transformed into a mixed discrete/continuous optimization problem. Component catalogs have therefore been added to the simulation program, where each catalog element contains all data needed by the simulation program as well as properties such as cost and weight. Furthermore, the GA has been extended with genomes with the ability to represent hierarchical catalogs, as well as operators for similarity measures and crossover between catalog elements. The valve-controlled system was again optimized, resulting in a discrete Pareto front that visualizes the trade-off between system cost and system performance based on discrete selections of valves and cylinders.

For future work, the catalogs could be exchanged for databases, where each element could be extended to contain the entire simulation model for a particular component. These models could either be made by the system designer, or be provided by the supplier, in such a form that proprietary information is not jeopardized. In this way, the supplier does not only supply a component for the final system, but also the simulation model describing the component. Furthermore, optimization is transformed from being a system model operator to a system model creator.

References

1. Andersson J., Multiobjective Optimization in Engineering Design - Applications to Fluid Power Systems, Dissertation, Linköping Studies in Science and Technology, Dissertation No. 675, Linköping University, Linköping, Sweden, 2001.
2. Andersson J. and Krus P., "Multiobjective Optimization of Mixed Variable Design Problems", in Proceedings of the 1st International Conference on Evolutionary Multi-Criterion Optimization, Zitzler E. et al. (editors), Springer-Verlag, Lecture Notes in Computer Science No. 1993, pp. 624-638, 2001.
3. Andersson J. and Wallace D., "Pareto optimization using the struggle genetic crowding algorithm", Engineering Optimization, Vol. 34, No. 6, pp. 623-643, 2002.
4. Coello Coello C., An empirical study of evolutionary techniques for multiobjective optimization in engineering design, PhD thesis, Department of Computer Science, Tulane University, 1996.
5. Deb K., Multi-Objective Optimization using Evolutionary Algorithms, Wiley and Sons Ltd, 2001.
6. Eshelman L. J. and Schaffer J. D., "Real-Coded Genetic Algorithms and Interval-Schemata," in Foundations of Genetic Algorithms 2, L. D. Whitley, Ed., San Mateo, CA, Morgan Kaufmann, pp. 187-202, 1993.
7. Fonseca C. M. and Fleming P. J., "Multiobjective optimization and multiple constraint handling with evolutionary algorithms - Part I: a unified formulation," IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans, vol. 28, pp. 26-37, 1998.
8. Goldberg D. E., Genetic Algorithms in Search, Optimization, and Machine Learning, Addison-Wesley, Reading, 1989.
9. Grueninger T. and Wallace D., "Multi-modal optimization using genetic algorithms," Technical Report 96.02, CADlab, Massachusetts Institute of Technology, Cambridge, 1996.
10. Horn J., "Multicriterion decision making," in Handbook of Evolutionary Computation, T. Bäck, D. Fogel, and Z. Michalewicz, Eds., IOP Publishing Ltd and Oxford University Press, pp. F1.9:1 - F1.9:15, 1997.
11. Jansson A. and Krus P., HOPSAN - a Simulation Package, User's Guide, Technical Report LITHIKPR-704, Department of Mechanical Engineering, Linköping University, Sweden, 1991. http://hydra.ikp.liu.se/hopsan.html.
12. Pahl G. and Beitz W., Engineering Design - A Systematic Approach, Springer-Verlag, London, 1996.
13. Roozenburg N. and Eekels J., Product Design: Fundamentals and Methods, John Wiley & Sons Inc, 1995.
14. Senin N., Wallace D. R., and Borland N., "Mixed continuous and discrete catalog-based design modeling and optimization," in Proceedings of the 1999 CIRP International Design Seminar, University of Twente, Enschede, The Netherlands, 1999.
15. Zitzler E. and Thiele L., "Multiobjective Evolutionary Algorithms: A Comparative Case Study and the Strength Pareto Approach," IEEE Transactions on Evolutionary Computation, vol. 3, pp. 257-271, 1999.


CHAPTER 21

ELIMINATION OF EXCEPTIONAL ELEMENTS IN CELLULAR MANUFACTURING SYSTEMS USING MULTI-OBJECTIVE GENETIC ALGORITHMS

S. Afshin Mansouri

Industrial Engineering Department
Amirkabir University of Technology
P.O. Box 15875-4413, Tehran, Iran
E-mail: [email protected]

Cellular manufacturing is an application of group technology that exploits the similarity of parts' processing features to improve productivity. Application of a cellular manufacturing system (CMS) is recommended for mid-volume, mid-variety production environments where traditional job shop and flow shop systems are not technically and/or economically justifiable. In a CMS, collections of similar parts (part families) are processed on dedicated clusters of dissimilar machines or manufacturing processes (cells). A totally independent CMS with no intercellular parts movement can rarely be found due to the existence of exceptional elements (EEs). An EE is either a bottleneck machine allocated to one cell while being required in the other cells at the same time, or a part in a family that requires the processing capabilities of machines in the other cells. Despite the simplicity of production planning and control functions in a totally independent CMS, such independence cannot be achieved without machine duplication and/or part subcontracting, which have their own side effects. These actions deteriorate other performance aspects of the production system regarding cost, utilization and workload balance. In this chapter, tackling the EEs in a CMS is formulated as a multi-objective optimization problem (MOP) to simultaneously take into account the optimization of four conflicting objectives regarding: intercellular movements, cost, utilization, and workload balance. Due to the complexity of the developed MOP, neither exact optimization techniques nor total enumeration are applicable for large problems. For this reason, a multi-objective genetic algorithm (MOGA) solution approach is proposed, which makes use of the non-dominated sorting idea in conjunction with an elitism scheme to provide manufacturing system designers with a set of near Pareto-optimal solutions. Application of the model and the solution approach to a number of test problems shows its suitability for real-world instances.

21.1. Introduction

The majority of manufacturing industries employ three basic designs to organize their production equipment. These include: job shop, flow shop and cellular designs. A job shop process is characterized by the organization of similar equipment by function (such as milling, drilling, turning, forging, and assembly). As jobs flow from work centre to work centre, or department to department, a different type of operation is performed in each centre or department. Orders may flow similar or different paths through the plant, suggesting one or several dominant flows. The layout is intended to support a manufacturing environment in which there can be a great diversity of flow among products. Fig. 21.1 depicts a job shop design.

Fig. 21.1. A job shop design.

The flow shop is sometimes called a product layout because the products always flow through the same sequential steps of production. Fig. 21.2 shows a typical flow shop system.

Cellular Manufacturing Systems Using Multi-Objective Genetic Algorithms 507

Fig. 21.2. A flow shop design.

In a cellular manufacturing system, machines are divided into manufacturing cells, which are in turn dedicated to processing a group of similar parts called a part family. Cellular manufacturing strives to bring the benefits of mass production to high-variety, medium-to-low volume production. It has several benefits, such as reduced material handling, work-in-process inventory, setup time and manufacturing lead time, and simplified planning, routing and scheduling activities. Fig. 21.3 shows a cellular configuration.

Each of the above-mentioned systems has its own rational range of application. Fig. 21.4 illustrates the relative position of these systems in terms of production volume and product variety.

Identification of part families and machine groups in the design of a CMS is commonly referred to as cell design/formation. Many solution approaches for the cell design problem have been proposed over the last three decades. Mansouri et al.1 and Offodile et al.2 provide comprehensive reviews of these approaches.

There are occasions where all of the machines/parts cannot be exclusively assigned to a machine cell/part family. These are known as Exceptional Elements (EEs). The EEs cause a number of problems in the operation of CMSs, e.g. intercellular part movements and unbalanced workload across the cells. Dealing with the EEs has also been a subject of research.


Fig. 21.3. A cellular design.

Fig. 21.4. Relative position of the three manufacturing systems.

For instance, Logendran and Puvanunt3, Moattar-Husseini and Mansouri4, Shafer et al.5, Sule6 and Seifoddini7 develop solution approaches for this problem. Fig. 21.5(a) shows the initial machine-part incidence matrix of a problem with 4 machines and 6 parts. A "1" entry in the matrix indicates that there is a relationship between the associated part and machine, i.e. the


part requires that particular machine in its process route. In Fig. 21.5(b), the sorted matrix is shown along with a decomposition scheme which separates all the machines and parts into two interdependent cells: Cell 1: {(M2, M1), (P1, P3, P6)}, and Cell 2: {(M4, M3), (P2, P4, P5)}, where M and P stand for Machine and Part, respectively. There are two exceptional parts (P1 and P4) and two exceptional (bottleneck) machines (M2 and M4) in the CMS proposed in Fig. 21.5(b).

               Parts                                 Parts
          1  2  3  4  5  6                      1  3  6 | 2  4  5

    M1    1     1        1                M2    1  1  1 |    1
    M2    1     1  1     1                M1    1  1  1 |
    M3       1     1  1                   M4    1       | 1  1  1
    M4    1  1     1  1                   M3            | 1  1  1

    (a) Initial matrix                    (b) A decomposition scheme on the
                                              sorted matrix

Fig. 21.5. Initial and final machine-part incidence matrices.
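The exceptional elements of Fig. 21.5(b) can be recovered mechanically from the incidence data: every "1" entry falling outside a diagonal block names a bottleneck machine and an exceptional part. A minimal sketch (the dictionary encoding and variable names are ours, with machine and part labels taken from the example):

```python
# Machine-part incidence of Fig. 21.5(a): each machine maps to the parts
# whose process routes require it.
incidence = {
    "M1": {"P1", "P3", "P6"},
    "M2": {"P1", "P3", "P4", "P6"},
    "M3": {"P2", "P4", "P5"},
    "M4": {"P1", "P2", "P4", "P5"},
}

# The decomposition of Fig. 21.5(b): cell -> (machines, part family).
cells = {
    1: ({"M2", "M1"}, {"P1", "P3", "P6"}),
    2: ({"M4", "M3"}, {"P2", "P4", "P5"}),
}

bottleneck_machines, exceptional_parts = set(), set()
for machines, family in cells.values():
    for m in machines:
        outside = incidence[m] - family   # demand on m from other cells
        if outside:                       # m is required outside its own cell
            bottleneck_machines.add(m)
            exceptional_parts |= outside

print(sorted(bottleneck_machines))  # ['M2', 'M4']
print(sorted(exceptional_parts))    # ['P1', 'P4']
```

Running this reproduces the four exceptional elements named above: bottleneck machines M2 and M4, and exceptional parts P1 and P4.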

In deciding which parts to subcontract and which machines to duplicate, one should take into account the associated side effects, e.g. cost increase and utilization decrease. Any effort to decrease intercellular part movements (as a performance measure) may degrade other measures of performance. In other words, there are multiple objectives to be considered in tackling the EEs. Hence the problem can be formulated as a multi-objective optimization problem (MOP).

In this chapter, a MOP model is introduced for dealing with the EEs in a CMS, together with a MOGA-based solution approach to find locally non-dominated or near Pareto-optimal solutions. The remaining sections are organized as follows. An overview of multi-objective optimization is given in Section 21.2. Section 21.3 formulates the problem of dealing with the EEs as a MOP model. The developed MOGA-based solution approach for the model is introduced in Section 21.4, and its parameters are set in Section 21.5. Experiments on a number of test problems are conducted in Section 21.6. Finally, concluding remarks are summarized in Section 21.7.


21.2. Multiple Objective Optimization

A MOP can be defined as determining a vector of design variables within a feasible region to minimize a vector of objective functions that usually conflict with each other. Such a problem takes the form:

Minimize {f1(X), f2(X), ..., fm(X)} subject to g(X) ≤ 0    (B.1)

where X is the vector of decision variables, fi(X) is the ith objective function, and g(X) is a constraint vector. Usually there is no single optimal solution to B.1, but rather a set of alternative solutions. These solutions are optimal in the wider sense that no other solutions in the search space are superior to them when all objectives are considered. A decision vector X is said to dominate a decision vector Y (also written as X ≻ Y) iff:

fi(X) ≤ fi(Y) for all i ∈ {1, 2, ..., m}; and    (B.2)

fi(X) < fi(Y) for at least one i ∈ {1, 2, ..., m}    (B.3)
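Conditions B.2 and B.3 translate directly into code. A minimal check for minimization (function name ours):

```python
def dominates(x, y):
    """True iff objective vector x dominates y (minimization), per B.2-B.3:
    x is no worse than y in every objective and strictly better in at
    least one."""
    return (all(xi <= yi for xi, yi in zip(x, y))
            and any(xi < yi for xi, yi in zip(x, y)))

print(dominates((1, 2, 3), (1, 3, 3)))  # True
print(dominates((1, 2, 3), (1, 2, 3)))  # False: no strict improvement
print(dominates((1, 4, 3), (2, 2, 3)))  # False: the vectors are incomparable
```

The third call illustrates why a set of alternatives arises: two vectors can each be better in some objective, in which case neither dominates the other.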

There are various solution approaches for solving a MOP. Among the most widely adopted techniques are: sequential optimization, the ε-constraint method, the weighting method, goal programming, goal attainment, distance-based methods and direction-based methods. For a comprehensive study of these approaches, readers may refer to Szidarovszky et al.8.

Evolutionary algorithms (EAs) seem particularly suitable for solving multi-objective optimization problems because they deal simultaneously with a set of possible solutions (the so-called population), which allows an entire set of Pareto-optimal solutions to be found in a single run of the algorithm, instead of having to perform a series of separate runs as in the case of traditional mathematical programming techniques. Additionally, EAs are less susceptible to the shape or continuity of the Pareto-optimal frontier, whereas these two issues are a real concern for mathematical programming techniques. However, EAs usually contain several parameters that need to be tuned for each particular application, which is in many cases highly time-consuming. In addition, since EAs are stochastic optimizers, different runs tend to produce different results; therefore, multiple runs of the same algorithm on a given problem are needed to statistically describe its performance on that problem. These are the most challenging issues in using EAs for solving MOPs. For a detailed discussion of the application of EAs in multi-objective optimization, see Coello et al.9 and Deb10.


21.3. Development of the Multi-Objective Model for Elimination of EEs

21.3.1. Assumptions

It is assumed that part subcontracting and machine duplication are the two possible alternatives for the elimination of EEs from a CMS, as proposed by Shafer et al.5. It is also assumed that partial subcontracting is not allowed; in other words, the whole demand for a part must be supplied by subcontractors once it has been decided to subcontract it.

21.3.2. The Set of Decision Criteria

The following set of criteria is considered for the development of the MOP model:

• Minimizing intercellular parts movements,
• Minimizing total cost of machine duplication and part subcontracting,
• Minimizing under-utilization of machines in the system, and
• Minimizing deviations among the levels of the cells' utilization.

Among the above-mentioned objectives, minimizing intercellular parts movement is of special importance as it is the key factor in making cells independent. However, any effort to reduce intercellular parts movement by means of machine duplication and part subcontracting increases cost, deteriorates overall utilization of machinery, and imbalances the levels of utilization among the cells. The other objectives are considered to overcome these side effects.

21.3.3. Problem Formulation

21.3.3.1. Notation

Set of indices

i : index for machine types, i = 1,...,m
j : index for part types, j = 1,...,p
k : index for cells, k = 1,...,c

Decision variables


Two binary decision variables are defined to formulate the problem: xj = 1 if part j is subcontracted and xj = 0 otherwise; yi,k = 1 if machine i is duplicated in cell k and yi,k = 0 otherwise.

Set of parameters

Dj : annual demand for part j;
Sj : incremental cost of subcontracting a unit of part j;
ti,j : processing time of a unit of part j on machine i;
PMj,i : number of intercellular transfers required by part j as a result of machine type i not being available within the part's manufacturing cell;
Mi : annual cost of acquiring an additional machine i;
CMi : annual machining capacity of each unit of machine i (minutes);
HFk : set of parts assigned to cell k;
MCk : set of machines assigned to cell k;
GFk : set of parts assigned to cells other than k while requiring some of the machines in cell k;
BMk : set of the bottleneck machines required by the parts in cell k;
EPk : set of exceptional parts in cell k;
EMj : set of bottleneck machines required by the exceptional part j;
CSk : number of machines assigned to cell k;
MCS : maximum cell size;
c : number of cells;
UCk : utilization of cell k; and
OU : overall utilization of the CMS.

21.3.3.2. The Objective Functions

We define the solution vector X = (xj's, yi,k's), which consists of the binary decision variables. The objectives considered for dealing with the EEs are described as follows:

Objective 1: minimizing intercellular parts movement

Intercellular movement of parts is one of the major problems associated with the EEs in a CMS, as it complicates production and inventory


management functions. Minimization of the intercellular parts movement is sought through the following objective function:

f1(X) = Σ_{k=1..c} Σ_{j ∈ EPk} (1 − xj) × [ Σ_{i ∈ EMj} PMj,i × (1 − yi,k) ]    (C.1)

Objective 2: minimizing total cost of machine duplication and part subcontracting

Any reduction in intercellular parts movement by machine duplication and/or part subcontracting will result in a cost increase. Hence minimization of the total part subcontracting and machine duplication cost is included in the model as follows:

f2(X) = Σ_{k=1..c} Σ_{j ∈ EPk} [ (Dj × Sj × xj) + Σ_{i ∈ EMj} (Mi × yi,k) ]    (C.2)

Objective 3: minimizing overall machine under-utilization

Since machine duplication and/or part subcontracting deteriorates machinery utilization, minimization of the overall machine under-utilization, which is equivalent to maximization of overall utilization, is taken into account employing the following objective function:

f3(X) = 1 − OU = 1 − [ Σ_{k=1..c} UCk × (CSk + Σ_{i ∈ BMk} yi,k) ] / [ Σ_{k=1..c} (CSk + Σ_{i ∈ BMk} yi,k) ]    (C.3)

where the UCk's can be calculated as below:

UCk = [ Σ_{i ∈ MCk} ( Σ_{j ∈ HFk} Dj × ti,j − Σ_{j ∈ EPk} Dj × ti,j × xj + Σ_{j ∈ GFk} Dj × ti,j × (1 − xj) ) ] / [ Σ_{i ∈ MCk} CMi + Σ_{i ∈ BMk} (yi,k × CMi) ]    (C.4)

Objective 4: minimizing deviations among utilization of the cells


Significant differences in the cells' utilization may result in major problems in managerial functions, e.g. different overtime payments to the operators as a result of their differing workloads. Hence the following objective function is included to minimize deviations among the cells' levels of utilization:

f4(X) = [ Σ_{k=1..c} (UCk − OU)² ] / (c − 1)    (C.5)

According to Bowker and Lieberman11, in calculating the standard deviation of a small sample of size N, the sum of the squared differences from the sample mean should be divided by N − 1 rather than N. That is why the denominator in equation C.5 is c − 1 instead of c.

Among the above-mentioned objectives, objective 1 is of special importance, as intercellular movements are the main cause of cell interdependencies. However, any effort to reduce intercellular parts movement by means of machine duplication and part subcontracting increases cost, deteriorates overall utilization of machinery, and imbalances the levels of utilization among the cells. Objectives 2, 3 and 4 have been included in the model to overcome these side effects, respectively.

21.3.3.3. The Constraints

The solution space is restricted by the following constraints:

( CSk + Σ_{i ∈ BMk} yi,k ) ≤ MCS,  k = 1,...,c    (C.6)

xj, yi,k ∈ {0, 1}    (C.7)

Constraints C.6 prevent cell sizes from exceeding a pre-determined upper bound. Relations C.7 restrict the decision variables to take a value of either '0' or '1'.

21.3.3.4. The Multi-Objective Optimization Problem (MOP)

The set of objectives and constraints stated above constitutes the MOP as follows:

Minimize {f1(X), f2(X), f3(X), f4(X)},


subject to: (C.6) and (C.7)    (C.8)

21.3.4. A Numerical Example

Consider the problem shown in Fig. 21.5(a) and the CMS proposed in Fig. 21.5(b). There are 4 machines and 6 parts grouped in 2 cells, along with 4 exceptional elements. The values of ti,j, Dj, Sj and Mi are given in Table 21.84, wherein the numbers in the incidence matrix represent the ti,j's. It is also assumed that CMi = 210000 minutes for all machines. The upper bound on the cell sizes is assumed to be 3. The resulting data sets for the two cells of Fig. 21.5(b) are listed below the table.

Table 21.84. The values of ti,j, Dj, Sj and Mi for the example.

                            Parts
             1      2      3      4      5      6       Mi
    M1       8             7                    1    14000
    M2       7             3      2             8     6000
    M3             10             6      5           19000
    M4       9      3             1      9            7000

    Dj   10000   7000   8000   1000   4000   2000
    Sj       3      3      1      4      2      3

Cell 1:
HF1 = {P1, P3, P6}, MC1 = {M2, M1}, GF1 = {P4}, BM1 = {M4}, EP1 = {P1}, EM1 = {M4}, CS1 = 2, PM1,4 = D1 = 10000.

Cell 2:
HF2 = {P2, P4, P5}, MC2 = {M4, M3}, GF2 = {P1}, BM2 = {M2}, EP2 = {P4}, EM4 = {M2}, CS2 = 2, PM4,2 = D4 = 1000.

The decision vector for this problem is X = {x1, x4, y4,1, y2,2}. The objective functions and constraints of the problem are as follows:

Minimize {f1(X), f2(X), f3(X), f4(X)}    (C.9)

where:


f1(X) = 10000 × (1 − x1) × (1 − y4,1) + 1000 × (1 − x4) × (1 − y2,2)    (C.10)

f2(X) = 10000 × 3 × x1 + 7000 × y4,1 + 1000 × 4 × x4 + 6000 × y2,2    (C.11)

f3(X) = 1 − OU    (C.12)

f4(X) = (UC1 − OU)² + (UC2 − OU)²    (C.13)

where:

OU = [ UC1 × (2 + y4,1) + UC2 × (2 + y2,2) ] / [ (2 + y4,1) + (2 + y2,2) ]    (C.14)

UC1 = [ 10000 × (7 + 8) + 8000 × (3 + 7) + 2000 × (8 + 1) − 10000 × (7 + 8) × x1 + 1000 × 2 × (1 − x4) ] / [ 210000 + 210000 + 210000 × y4,1 ]    (C.15)

UC2 = [ 7000 × (10 + 3) + 1000 × (1 + 6) + 4000 × (9 + 5) − 1000 × (1 + 6) × x4 + 10000 × 9 × (1 − x1) ] / [ 210000 + 210000 + 210000 × y2,2 ]    (C.16)

subject to:

2 + y4,1 ≤ 3    (C.17)

2 + y2,2 ≤ 3    (C.18)

x1, x4, y4,1, y2,2 ∈ {0, 1}    (C.19)

In Table 21.85, the set of feasible solutions for the example is summarized along with their objective values. The utilization level of each cell, as well as the overall utilization of the system, which are required


in the calculation of the objective values f3 and f4, are also presented as complementary information.

The results show that X3 dominates X11, or X3 ≻ X11. Moreover, X5 ≻ X2 ≻ X6 and X4 ≻ (X8, X9, X12, X13, X15, X16). The set of non-dominated or Pareto-optimal solutions for the example problem includes: X1, X3, X4, X5, X7, X10 and X14.

Table 21.85. Total solutions of the numerical example.

                               Objective values                  Utilization levels
 Decision vectors         f1      f2      f3        f4         UC1     UC2     OU
 X1   = {0, 0, 0, 0}   11000       0   0.41190   0.00010     0.595   0.581   0.588
 X2*  = {0, 0, 0, 1}   10000    6000   0.52952   0.02248     0.595   0.387   0.470
 X3   = {0, 0, 1, 0}    1000    7000   0.52952   0.01763     0.397   0.581   0.470
 X4   = {0, 0, 1, 1}       0   13000   0.60794   0.00005     0.397   0.387   0.392
 X5   = {0, 1, 0, 0}   10000    4000   0.42262   0.00034     0.590   0.564   0.577
 X6*  = {0, 1, 0, 1}   10000   10000   0.53810   0.02388     0.590   0.376   0.462
 X7   = {0, 1, 1, 0}       0   11000   0.53810   0.01514     0.394   0.564   0.462
 X8*  = {0, 1, 1, 1}       0   17000   0.61508   0.00015     0.394   0.376   0.385
 X9*  = {1, 0, 0, 0}    1000   30000   0.69762   0.00827     0.238   0.367   0.302
 X10  = {1, 0, 0, 1}       0   36000   0.75810   0.00002     0.238   0.244   0.242
 X11* = {1, 0, 1, 0}    1000   37000   0.75810   0.02248     0.159   0.367   0.242
 X12* = {1, 0, 1, 1}       0   43000   0.79841   0.00367     0.159   0.244   0.202
 X13* = {1, 1, 0, 0}       0   34000   0.70833   0.00681     0.233   0.350   0.292
 X14  = {1, 1, 0, 1}       0   40000   0.76667   0.00000     0.233   0.233   0.233
 X15* = {1, 1, 1, 0}       0   41000   0.76667   0.01966     0.156   0.350   0.233
 X16* = {1, 1, 1, 1}       0   47000   0.80556   0.00302     0.156   0.233   0.194

 *: dominated solution

The solution space of the above example, with 4 exceptional elements, consists of 2^4 = 16 solutions. In general, the total number of solutions for a problem having n exceptional elements is equal to 2^n. It is obvious that the size of the solution space increases exponentially as the number of exceptional elements increases. Hence the application of exact optimization techniques is prohibitively expensive (computationally speaking) for large problems. For this reason, a MOGA-based solution approach is developed, which is described in the subsequent section.
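For a problem of this size, the model of Section 21.3.4 can be enumerated directly. The sketch below (variable and function names ours) encodes Eqs. C.10-C.16 with the data of Table 21.84 and filters the sixteen candidate vectors for non-dominance; it reproduces the seven Pareto-optimal solutions of Table 21.85:

```python
from itertools import product

CM = 210_000  # annual capacity per machine, in minutes

def objectives(x1, x4, y41, y22):
    """Evaluate (f1, f2, f3, f4) of Eqs. C.10-C.13 for one decision vector."""
    f1 = 10_000 * (1 - x1) * (1 - y41) + 1_000 * (1 - x4) * (1 - y22)   # C.10
    f2 = 10_000 * 3 * x1 + 7_000 * y41 + 1_000 * 4 * x4 + 6_000 * y22  # C.11
    # Cell utilizations, Eqs. C.15 and C.16
    uc1 = (10_000 * (7 + 8) + 8_000 * (3 + 7) + 2_000 * (8 + 1)
           - 10_000 * (7 + 8) * x1 + 1_000 * 2 * (1 - x4)) / (2 * CM + CM * y41)
    uc2 = (7_000 * (10 + 3) + 1_000 * (1 + 6) + 4_000 * (9 + 5)
           - 1_000 * (1 + 6) * x4 + 10_000 * 9 * (1 - x1)) / (2 * CM + CM * y22)
    ou = (uc1 * (2 + y41) + uc2 * (2 + y22)) / (4 + y41 + y22)          # C.14
    return (f1, f2, 1 - ou, (uc1 - ou) ** 2 + (uc2 - ou) ** 2)          # C.12, C.13

def dominates(a, b):
    """Pareto dominance for minimization (Eqs. B.2-B.3)."""
    return all(u <= v for u, v in zip(a, b)) and a != b

sols = {x: objectives(*x) for x in product((0, 1), repeat=4)}
pareto = sorted(x for x in sols
                if not any(dominates(sols[y], sols[x]) for y in sols if y != x))
# pareto holds the decision vectors of X1, X3, X4, X5, X7, X10 and X14
```

Checking a single entry against Table 21.85: `objectives(0, 0, 0, 0)` yields f1 = 11000 and f3 ≈ 0.41190, the row for X1.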

21.4. The Proposed MOGA

In simple GAs, a candidate solution is represented by a sequence of genes and is known as a chromosome. A chromosome's potential as a solution is


determined by its fitness function, which evaluates a chromosome with respect to the objective function of the optimization problem at hand. A judiciously selected set of chromosomes is called a population, and the population at a given time is a generation. The population size remains constant from generation to generation and has a significant impact on the performance of the GA. The mechanism of GAs generally operates on a generation through three main operators: (1) reproduction (selection of copies of chromosomes according to their fitness value), (2) crossover (an exchange of portions of the chromosomes), and (3) mutation (a random modification of the chromosome). The chromosomes resulting from these three operations form the next generation's population. The process is then iterated a desired number of times, usually up to the point where the system ceases to improve or the population has converged to a few well-performing sequences.

In order to apply genetic algorithms to the developed MOP in a problem with n decision variables, a chromosomal structure consisting of n genes is considered. Each gene in the chromosome can take a value of either '0' or '1', reflecting the value of its corresponding binary decision variable. The objective values are normalized so that they lie in the interval of 0 and 1 by means of the following formula:

Fi = Ci / (Ci + fi),  i = 1,...,4    (D.1)

where Fi is the fitness value, fi is the objective value and Ci is the normalizing factor concerning objective i.
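Equation D.1 maps any non-negative objective value into (0, 1], with larger objective values receiving smaller fitness. A one-line sketch (function name ours):

```python
def normalized_fitness(f, c):
    """Eq. D.1: F = C / (C + f). At f = 0 the fitness is 1; it decays
    toward 0 as the objective value grows past the normalizing factor c."""
    return c / (c + f)

print(normalized_fitness(0, 10_000))       # 1.0
print(normalized_fitness(10_000, 10_000))  # 0.5
```

The normalizing factor c sets the scale: an objective value equal to c is mapped to a fitness of exactly 0.5.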

21.4.1. Pseudocode for the Proposed MOGA

The following pseudocode details the steps of the proposed MOGA:

Initialize Search Parameters
Randomly Generate Initial Solutions
i = 1
do
    for j = 1 to Population Size
        Calculate Dummy Fitness Value for chromosome_j
        if chromosome_j is Infeasible then
            Set Dummy Fitness Value of chromosome_j to 0
        end if
    next j
    do
        Select chromosomes to form Mating Pool using the RSSWR-UE scheme
    while (Members of Mating Pool are less than Population Size)
    Shuffle Mating Pool
    Produce offspring using Reproduction, Crossover and Mutation operators
    for j = 1 to Population Size
        if chromosome_j is Infeasible then
            Set Dummy Fitness Value of chromosome_j to 0
        end if
    next j
    i = i + 1
while ((i < Max Generations) and (Successive Nondominated Frontiers is less than Min. Successive Nondominated Frontiers))
Report Resultant Nondominated Frontier

It should be noted that Successive Nondominated Frontiers refers to the number of successive generations in which all non-dominated solutions of the current generation have remained non-dominated when compared against the non-dominated frontiers of previous generations. Some steps of the algorithm are discussed in more detail in the following sub-sections.

21.4.2. Fitness Calculation

The fitness values are calculated using the non-dominated sorting method of Srinivas and Deb12. The idea behind the non-dominated sorting procedure is that a ranking method is used to emphasize good solutions and a niche method is used to maintain stable subpopulations of good solutions. In this procedure, the population is ranked on the basis of the individuals' non-domination. The non-dominated individuals present in the current population are first identified. All these individuals are assumed to constitute the first non-dominated frontier in the population and are assigned a large Dummy Fitness Value. The same fitness value is assigned to give an equal reproductive potential to all these non-dominated individuals. To maintain diversity in the population, these classified individuals are then shared with their dummy fitness values. Sharing is achieved by performing a selection operation using degraded fitness values, obtained by dividing the original fitness value of an individual by a quantity proportional to the number of individuals around it. This causes multiple Pareto-optimal solutions to co-exist in the population. After sharing, these non-dominated individuals are ignored temporarily to process the rest of


the population in the same way, identifying individuals for the second non-dominated frontier. These non-dominated solutions are then assigned a new dummy fitness value that is kept smaller than the minimum shared dummy fitness of the previous frontier. This process is continued until the entire population is classified into several frontiers.
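Stripped of the sharing and dummy-fitness steps, the frontier-by-frontier classification described above can be sketched as follows (a naive quadratic version operating on objective vectors; names ours):

```python
def dominates(a, b):
    """Pareto dominance for minimization (Eqs. B.2-B.3)."""
    return all(u <= v for u, v in zip(a, b)) and a != b

def nondominated_sort(population):
    """Partition a list of objective vectors into successive frontiers:
    frontier 0 holds the non-dominated vectors, frontier 1 those dominated
    only by frontier 0, and so on."""
    frontiers, remaining = [], list(population)
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        frontiers.append(front)
        remaining = [p for p in remaining if p not in front]
    return frontiers

print(nondominated_sort([(1, 1), (2, 2), (0, 3)]))
# [[(1, 1), (0, 3)], [(2, 2)]]
```

Since dominance is a strict partial order, every pass extracts at least one vector, so the loop always terminates with the whole population classified.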

21.4.3. Selection

For selection, a novel scheme called RSSWR-UE was developed, using Remainder Stochastic Sampling Without Replacement in conjunction with a new Elitism operator. For details of this scheme, readers may refer to Mansouri et al.13.

21.4.4. Recombination

All selected chromosomes in the Mating Pool are shuffled and then mutually recombined, according to the Crossover Rate, via single-point crossover, wherein the two selected parents are cut at a random point along their length into two sections. Section 1 of parent 1(2) attached to section 2 of parent 2(1) forms offspring 1(2). A small portion of the genes in the population are then mutated, according to the Mutation Rate, from "1" into "0" and vice versa through the mutation operator.
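Both variation operators are standard for binary strings; a minimal sketch (function names ours):

```python
import random

def single_point_crossover(p1, p2):
    """Cut both parents at one random point and swap tails."""
    cut = random.randint(1, len(p1) - 1)   # cut strictly inside the string
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def mutate(chromosome, rate):
    """Flip each gene 0 <-> 1 independently with probability `rate`."""
    return [1 - g if random.random() < rate else g for g in chromosome]

o1, o2 = single_point_crossover([0, 0, 0, 0, 0], [1, 1, 1, 1, 1])
# e.g. o1 == [0, 0, 1, 1, 1] and o2 == [1, 1, 0, 0, 0] when cut == 2
```

Each offspring inherits a prefix from one parent and a suffix from the other, so every gene position is still covered exactly once across the pair.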

21.4.5. Updating the Elite Set

The updating mechanism of the Elite Set, and how its size is kept from exceeding an upper limit, are important factors that affect the performance of the MOGA. In the current MOGA, a niche mechanism is employed in updating the elite set so that the diversity of the members of the set is improved.

21.4.6. Stopping Criteria

The algorithm terminates as soon as either it converges to a robust non-dominated frontier or a predetermined number of generations has been completed. To determine whether a robust non-dominated frontier has been achieved, members of the non-dominated frontiers of successive generations are mutually compared against each other. If individuals of the frontiers remain non-dominated for a predetermined number of generations, say Minimum Successive Nondominated Frontiers (Min. SNDF), then it can be asserted that the obtained frontier is robust and hence the algorithm terminates.


21.5. Parameter Setting

In order to find a good set of parameters for the MOGA, two medium-sized problems with n = 10 and n = 15 were selected from the literature. True non-dominated frontiers of these problems, found via total enumeration, were employed as references for evaluation. Three measures for judging the effectiveness of a set of parameters were used, as follows:

• MP1: Quality of non-dominated solutions; the ratio of true non-dominated solutions in the final non-dominated frontier of the algorithm.
• MP2: Diversity of solutions in the final non-dominated frontier, measured by the number of solutions in the frontier.
• MP3: CPU time.

The experiments were conducted in two stages. In the first stage, the parameters were examined individually, i.e. changing the value of a given parameter while keeping the values of the remaining parameters at a constant level. At each level, both problems were solved twenty times. Considering the three performance measures, an appropriate value for the given parameter was selected according to the average of these runs. The examined parameter was given this value and the procedure was repeated for another parameter until all parameters had been assigned an initial value. Figures 21.6 to 21.8 illustrate sample results of this stage for the test problem with n = 15.

Fig. 21.6. The effect of mutation rate on quality.

In order to examine the interactions between parameters, the parameters were examined jointly in the second stage on the problem with n = 15. A pair of parameters was selected at the beginning, with various combinations of values. Twenty runs were conducted using each combination and the best


Fig. 21.7. The effect of mutation rate on diversity.

Fig. 21.8. The effect of mutation rate on CPU time.

value for one of them was selected. The best value for the other parameter was then determined in the same way with a new parameter. The procedure was iterated until all parameters had been examined. The measure for selecting appropriate values in this stage was the product of Quality (MP1) and Diversity (MP2), i.e. MP1×MP2. This measure reflects the number of true non-dominated solutions found by the MOGA. Figures 21.9 and 21.10 depict the effect of crossover rate in two joint tests.

Considering the results of the above experiments, the following parameter set was found to be promising in terms of the three performance measures: Population Size = 150, Crossover Rate = 0.50, Mutation Rate = 0.03, Niching Parameter = 0.60, Min. Successive Nondominated Frontiers (Min. SNDF) = 15, Elitism Prob. = 1.00, Initial Transfer Prob. = 0.10, Epsilon Niche = 0.30, Elite Set Size = 50 and Degrading Factor = 0.80.

21.6. Experimentation

In order to evaluate the MOGA, data sets for 5 cellular manufacturing systems of differing sizes were selected from the literature. The major characteristics of the test problems are presented in Table 21.86.


Fig. 21.9. The joint effect of crossover rate and min. SNDF.

The size of the solution space associated with the test problems ranges from 2^10 = 1024 solutions to 2^43 ≈ 8.796×10^12 solutions. Moreover, the maximum number of mutual comparisons required for total enumeration of these problems ranges from (2^10)!/[2!(2^10 − 2)!] = 523,776 to (2^43)!/[2!(2^43 − 2)!] ≈ 3.869×10^25, where complete enumeration is impossible.

Table 21.86. Main characteristics of the test problems.

                                            Number of:
 Problem                        Decision    Machines   Parts   Cells
                                variables
 Venogupal and Narendran14         10          15        30      3
 Burbidge15                        15          20        35      5
 Askin et al.16                    30          12        19      3
 Seifoddini7                       35          16        43      5
 Boe and Cheng17                   43          20        35      4
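These counts follow directly from n binary decision variables: 2^n candidate vectors and C(2^n, 2) pairwise comparisons. A quick check of the two extremes, using Python's exact binomial coefficient:

```python
from math import comb

assert 2 ** 10 == 1024
assert comb(2 ** 10, 2) == 523_776      # pairwise comparisons for n = 10

print(f"{2 ** 43:.3e}")                 # 8.796e+12 candidate vectors, n = 43
print(f"{comb(2 ** 43, 2):.3e}")        # 3.869e+25 pairwise comparisons
```

Because comb works on arbitrary-precision integers, the n = 43 figure is computed exactly before being displayed in scientific notation.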


Fig. 21.10. The joint effect of crossover rate and elite set size.

The algorithm was coded in C++ and run on a Pentium-II (Celeron) CPU at 333 MHz with 64 MB of RAM under Windows 2000.

To evaluate the quality (MP1) of the non-dominated frontier found by the algorithm, a reference set for every problem was formed. For small to medium-sized problems, i.e. problems with up to 15 decision variables, the reference sets were created through total enumeration. For the large problems, i.e. problems with more than 15 variables, where total enumeration was practically impossible, a refining scheme was devised to establish a set of near non-dominated frontiers. In the refining scheme, successive runs of the MOGA were conducted, each run using a randomly selected set of parameters.

The non-dominated solutions of the first run were adopted as an initial reference set. When adding the non-dominated solutions of the next run into the reference set, a mutual dominance check was performed between the old members of the set and the new entrants. Dominated solutions were removed and the remaining solutions formed the new reference set. This procedure was iterated 50 times for each problem and the final reference set was adopted for


later comparisons with the algorithm. The quality of each run (MP1) of the algorithm was then calculated by comparing the final results against the corresponding reference sets. Diversity was simply measured by the number of non-dominated solutions found, represented by MP2. CPU time (MP3) was also measured and used as the third measure. Each problem was then solved 20 times. The average results of the 20 conducted test runs are presented in Table 21.87.

Table 21.87. The average results for the test problems.

                                Measures of performance
 Problem                        MP1      MP2      MP3
 Venogupal and Narendran14     0.983     30.6     18.9
 Burbidge15                    0.789     75.3     22.8
 Askin et al.16                0.599    101.4     22.5
 Seifoddini7                   0.695    106.5     30.7
 Boe and Cheng17               0.596     97.1     24.8

21.7. Conclusion

In this chapter, a multi-objective optimization model was addressed, along with a solution approach based on genetic algorithms, to decide which parts to subcontract and which machines to duplicate in a CMS wherein some exceptional elements exist. The set of objectives considered includes: (1) minimizing intercellular movements, (2) minimizing total cost of machine duplication and part subcontracting, (3) minimizing overall under-utilization of the cells, and (4) minimizing unbalance of the workloads among the cells. The proposed MOGA seeks non-dominated or Pareto-optimal solutions to the aforementioned model.

Application of the MOGA was tested on a number of problems. It was observed that the MOGA is capable of producing good solutions in terms of a three-fold measure of effectiveness concerning quality, diversity and CPU time. Simplicity of use, besides an acceptable level of effectiveness in a short amount of computation time, are the key advantages of the proposed MOGA compared to exact optimization and total enumeration methods, which are only applicable to small problem instances. This, besides the fact that

Page 555: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

526 S. Ajshin Mansouri

the majority of cellular manufacturing systems are being implemented insmall to medium enterprises (SMEs) where the application of sophisticatedoptimization schemes is impractical even for small problems, justifies morethe use of the MOGA for real world problems.

References

1. S. A. Mansouri, S. M. Moattar-Husseini and S. T. Newman, A review of the modern approaches to multi-criteria cell design, International Journal of Production Research, 38 (2), 1201-1218 (2000).
2. F. Offodile, A. Mehrez and J. Grznar, Cellular manufacturing: a taxonomic review framework, Journal of Manufacturing Systems, 13, 196-220 (1994).
3. R. Logendran and V. Puvanunt, Duplication of machines and subcontracting of parts in the presence of alternative cell locations, Computers and Industrial Engineering, 33 (3-4), 235-238 (1997).
4. S. M. Moattar-Husseini and S. A. Mansouri, A cost based part machine grouping method for group technology, Proceedings of the 5th International Conference on Flexible Automation and Intelligent Manufacturing (FAIM'95), Stuttgart, Germany, 415-423 (1995).
5. S. M. Shafer, G. M. Kern and J. C. Wei, A mathematical programming approach for dealing with exceptional elements in cellular manufacturing, International Journal of Production Research, 30, 1029-1036 (1992).
6. D. R. Sule, Machine capacity planning in group technology, International Journal of Production Research, 29 (6), 1909-1922 (1991).
7. H. Seifoddini, Duplication process in machine cells formation in group technology, IIE Transactions, 21 (1), 382-388 (1989).
8. F. Szidarovszky, M. E. Gershon and L. Duckstein, Techniques for Multiobjective Decision Making in Systems Management, Elsevier, New York (1986).
9. C. A. C. Coello, D. A. Van Veldhuizen and G. B. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems, Kluwer Academic Publishers, New York (2002).
10. K. Deb, Multi-Objective Optimization using Evolutionary Algorithms, John Wiley & Sons, Chichester, UK (2001).
11. A. H. Bowker and G. J. Lieberman, Engineering Statistics (2nd ed.), Prentice-Hall (1972).
12. N. Srinivas and K. Deb, Multiobjective optimization using nondominated sorting in genetic algorithms, Evolutionary Computation, 2 (3), 221-248 (1994).
13. S. A. Mansouri, S. M. Moattar-Husseini and S. H. Zegordi, A genetic algorithm for multiple objective dealing with exceptional elements in cellular manufacturing, Production Planning & Control, 14 (2), 437-446 (2003).
14. V. Venugopal and T. T. Narendran, A genetic algorithm approach to the machine-component grouping problem with multiple objectives, Computers and Industrial Engineering, 22 (1), 469-480 (1992).
15. J. L. Burbidge, An introduction of group technology, Proceedings of the Seminar on GT, Turin, Italy (1969).
16. R. G. Askin, S. H. Cresswell, J. B. Goldberg and A. J. Vakharia, A Hamiltonian path approach to reordering the part-machine matrix for cellular manufacturing, International Journal of Production Research, 29 (3), 1081-1100 (1991).
17. W. J. Boe and C. H. Cheng, A close neighbor algorithm for designing cellular manufacturing systems, International Journal of Production Research, 29 (10), 2097-2116 (1991).


CHAPTER 22

SINGLE-OBJECTIVE AND MULTI-OBJECTIVE EVOLUTIONARY FLOWSHOP SCHEDULING

Hisao Ishibuchi and Youhei Shibata

Department of Industrial Engineering, Osaka Prefecture University
1-1 Gakuen-cho, Sakai, Osaka 599-8531, Japan

E-mail: {hisaoi, shibata}@ie.osakafu-u.ac.jp

This chapter explains how evolutionary algorithms can be applied to single-objective and multi-objective permutation flowshop scheduling problems. In permutation flowshop scheduling, each solution is represented by an order (i.e., permutation) of given jobs, which are processed on given machines in that order. Such a permutation is handled as an individual in genetic algorithms, as in the case of traveling salesman problems. We first examine various genetic operations designed for permutation-type strings (i.e., order-based coding) through computational experiments on single-objective problems. Next we compare genetic algorithms with multi-start local search and genetic local search. It is shown that multi-start local search and genetic local search are more efficient than genetic algorithms for single-objective problems. Then we discuss the application of genetic algorithms to multi-objective problems. It is shown that multi-objective genetic algorithms outperform multiple runs of multi-start local search and single-objective genetic algorithms. This is because a large number of various non-dominated solutions can be simultaneously obtained by a single run of multi-objective genetic algorithms. We also suggest some tricks for improving the performance of multi-objective genetic algorithms.

22.1. Introduction

Permutation flowshop scheduling is one of the most frequently studied scheduling problems in the literature1. Since Johnson's pioneering work2, various criteria have been considered, such as makespan, total flowtime, maximum flowtime, maximum tardiness, and total tardiness3. In general, it is impractical to try to find optimal schedules for large permutation




flowshop scheduling problems. Thus metaheuristic approaches such as simulated annealing4,5, taboo search6,7 and genetic algorithms8-11 as well as heuristic approaches12-15 have been proposed for efficiently finding near-optimal solutions. By simultaneously considering multiple criteria, single-objective permutation flowshop scheduling problems have been extended to multi-objective ones16, where evolutionary algorithms have been frequently used17-25. Evolutionary algorithms, which are population-based search techniques, are suitable for multi-objective optimization because a large number of various non-dominated solutions can be simultaneously obtained by a single run26-29. On the other hand, only a single solution is usually obtained by a single run of other heuristic and metaheuristic algorithms.

Since Schaffer's proposal of the first multi-objective genetic algorithm30, a number of evolutionary multi-objective optimization (EMO) algorithms have been proposed in the literature (e.g., MOGA31, NPGA32, NSGA33, SPEA34, PAES35, MOGLS36, NSGA-II37). Those algorithms have been compared with each other in some comparative studies34,38,39, where function optimization problems and 0/1 knapsack problems have been frequently used as test problems. One of the main characteristic features of the application of evolutionary algorithms to permutation flowshop scheduling is the use of the order-based coding, where an order (i.e., permutation) of given jobs is used as an individual. As a result, standard genetic operations for binary strings are not directly applicable. Thus we first examine various crossover and mutation operations for the order-based coding through computational experiments on single-objective permutation flowshop scheduling problems. Next we examine the performance of single-objective genetic algorithms in comparison with multi-start local search and genetic local search. It is shown that multi-start local search and genetic local search are more efficient than genetic algorithms for single-objective problems. Then we discuss the application to multi-objective problems, where we use a well-known multi-objective genetic algorithm: NSGA-II37 (elitist nondominated sorting genetic algorithm). Of course, we can use other EMO algorithms because they are usually general-purpose algorithms. It is shown that multi-objective genetic algorithms outperform multiple runs of multi-start local search and single-objective genetic algorithms. We also discuss some tricks for improving multi-objective genetic algorithms for permutation flowshop scheduling, such as the hybridization with local search and the introduction of mating restriction. Through computational experiments, it is shown that the search ability of the NSGA-II can be improved by those tricks.



22.2. Permutation Flowshop Scheduling Problems

In this section, we briefly explain permutation flowshop scheduling. For details of various scheduling problems, see Brucker40. Let us assume that we have n jobs {J1, J2, ..., Jn} that are processed on m machines {M1, M2, ..., Mm} in the same order. We also have an n x m matrix whose (i, j) element is the processing time of the i-th job on the j-th machine, and an n-dimensional vector whose i-th element di is the due date of the i-th job. A single-objective permutation flowshop scheduling problem is to obtain an optimal permutation of {J1, J2, ..., Jn} with respect to a single scheduling criterion. In this chapter, we consider the maximum completion time (i.e., makespan) and the maximum tardiness. Let Ci be the completion time of the i-th job at the last machine (i.e., the m-th machine), which is calculated from the n x m matrix as shown in Fig. 1. The makespan is defined as max{Ci | i = 1, 2, ..., n} while the maximum tardiness is defined as max{max{(Ci - di), 0} | i = 1, 2, ..., n}.
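The completion-time calculation just described follows the standard flowshop recursion: a job starts on a machine only when both the machine is free and the job's operation on the previous machine has finished. The following is a minimal Python sketch of this evaluation (our own illustration, not the authors' code; the function name and indexing are ours):

```python
def evaluate_schedule(perm, p, due=None):
    """Evaluate a job permutation on an m-machine flowshop.

    perm: processing order of job indices; p[i][j]: processing time of
    job i on machine j; due: optional list of due dates di.
    Returns (makespan, maximum tardiness or None).
    """
    n, m = len(perm), len(p[0])
    # C[i][j]: completion time of the i-th job in perm on machine j
    C = [[0] * m for _ in range(n)]
    for i, job in enumerate(perm):
        for j in range(m):
            machine_free = C[i - 1][j] if i > 0 else 0  # previous job on this machine
            job_ready = C[i][j - 1] if j > 0 else 0     # this job on previous machine
            C[i][j] = max(machine_free, job_ready) + p[job][j]
    makespan = C[n - 1][m - 1]  # largest completion time on the last machine
    if due is None:
        return makespan, None
    max_tard = max(max(C[i][m - 1] - due[job], 0) for i, job in enumerate(perm))
    return makespan, max_tard
```

Evaluating one permutation in this way costs O(nm) time, which is why metaheuristics can afford to evaluate many thousands of schedules.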


Fig. 22.1. A schedule of four jobs on three machines. In this figure, the four jobs are processed on the three machines in the order of Job 1, Job 2, Job 3 and Job 4. The length of each rectangle shows the processing time of the corresponding job on the corresponding machine. The completion time Ci of the i-th job is the right limit of the corresponding rectangle at the last machine (i.e., the third machine).

When our aim is to minimize the makespan, the optimal schedule for the four jobs on the three machines in Fig. 1 is obtained as shown in Fig. 2. Since all permutations of the given n jobs are feasible schedules, the total number of possible solutions (i.e., the size of the search space) is n!. When n is small (e.g., n < 10), the optimal schedule can be easily obtained by examining all permutations. On the other hand, it is impractical to try to find the optimal schedule of a large problem with many jobs. Thus metaheuristic approaches such as simulated annealing, taboo search and genetic algorithms as well as various heuristic approaches have been proposed for permutation flowshop




Fig. 22.2. The optimal schedule of the four jobs in Fig. 1 with respect to the minimization of the makespan.

scheduling problems in the literature.

As test problems, we use a 20-machine 40-job problem and a 20-machine

80-job problem, which were generated in our former study25. The processing time of each job on each machine was specified as a random integer in the interval [1, 99]. The due date of each job was specified by adding a random integer in the interval [-100, 100] to its actual completion time in a randomly generated schedule. In this chapter, the makespan and the maximum tardiness are separately optimized by single-objective genetic algorithms, while they are simultaneously optimized by multi-objective genetic algorithms. This means that we have four single-objective and two two-objective test problems.
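The instance-generation scheme described above can be sketched as follows (our own illustration with our own function names; the "actual completion time" of each job is computed with the usual flowshop recursion on one randomly chosen schedule):

```python
import random

def flow_completion(perm, p):
    """Completion time of each job (keyed by job index) on the last machine."""
    m = len(p[0])
    prev = [0] * m  # completion times of the previous job on each machine
    finish = {}
    for job in perm:
        t, row = 0, []
        for j in range(m):
            t = max(t, prev[j]) + p[job][j]
            row.append(t)
        finish[job] = row[-1]
        prev = row
    return finish

def make_instance(n_jobs, n_machines, seed=0):
    """Processing times: random integers in [1, 99]. Due date of each job:
    its completion time in one random schedule plus a random integer
    in [-100, 100], as described in the text."""
    rng = random.Random(seed)
    p = [[rng.randint(1, 99) for _ in range(n_machines)] for _ in range(n_jobs)]
    perm = list(range(n_jobs))
    rng.shuffle(perm)
    finish = flow_completion(perm, p)
    due = [finish[i] + rng.randint(-100, 100) for i in range(n_jobs)]
    return p, due
```

Deriving due dates from a feasible random schedule keeps them attainable but nontrivial, so the maximum-tardiness objective stays informative.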

22.3. Single-Objective Genetic Algorithms

In this section, we discuss several issues related to the implementation of genetic algorithms for single-objective permutation flowshop scheduling problems. Multi-objective genetic algorithms are discussed in the next section.

22.3.1. Implementation of Genetic Algorithms

We use the order-based coding, where a permutation of the given n jobs is directly handled as a chromosome (i.e., individual) in genetic algorithms. We examine seven crossover operations: three versions of one-point order crossover, three versions of two-point order crossover, and one version of uniform order crossover. The one-point order crossover is illustrated in Fig. 3, where one parent is divided into two parts by a randomly chosen cutting point. The left-hand side of the parent is inherited to the offspring with no changes in Version 1, as shown in Fig. 3 (a). The remaining jobs are placed into the remaining positions of the offspring in the order of those jobs in



the other parent. On the other hand, the right-hand side of one parent is inherited in Version 2, as shown in Fig. 3 (b). Version 3 uses Version 1 in Fig. 3 (a) and Version 2 in Fig. 3 (b) with the same probability (i.e., the probability of 0.5) when this crossover operation is invoked.
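Versions 1 and 2 of the one-point order crossover can be sketched as follows (our own illustration; the optional `cut` argument fixes the cutting point for reproducibility, and Version 3 would simply pick one of the two variants with probability 0.5):

```python
import random

def one_point_order_crossover(p1, p2, version=1, cut=None, rng=random):
    """One-point order crossover on job permutations.

    Version 1 keeps the left-hand side of parent p1 unchanged; Version 2
    keeps the right-hand side. The remaining jobs fill the free positions
    in the order in which they appear in parent p2.
    """
    n = len(p1)
    if cut is None:
        cut = rng.randrange(1, n)  # cutting point between positions
    if version == 1:
        head = p1[:cut]
        return head + [j for j in p2 if j not in head]
    tail = p1[cut:]
    return [j for j in p2 if j not in tail] + tail
```

With the parents of Fig. 3 and the cutting point after the third position, this reproduces the two offspring shown in the figure.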


(a) Version 1 of the one-point order crossover. (b) Version 2 of the one-point order crossover.

Fig. 22.3. One-point order crossover. The left-hand side is inherited to the offspring in Version 1 in (a), while the right-hand side is inherited in Version 2 in (b). In Version 3, these two crossover operations are used with the same probability.

The two-point order crossover has two cutting points, as shown in Fig. 4. The outer parts of one parent are inherited to the offspring in Version 1 of the two-point order crossover, as shown in Fig. 4 (a). On the other hand, the inner part is inherited in Version 2, as shown in Fig. 4 (b). In Version 3, Version 1 and Version 2 are used with the same probability. By increasing the number of cutting points, we may have the uniform order crossover in Fig. 5, where each job in one parent is inherited to the offspring with the probability of 0.5. The remaining jobs are placed into the remaining positions of the offspring in the order of those jobs in the other parent.
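The two-point and uniform variants follow the same fill-from-the-other-parent pattern; a sketch (our own illustration, with optional deterministic `cuts`/`mask` arguments for reproducibility):

```python
import random

def two_point_order_crossover(p1, p2, keep="outer", cuts=None, rng=random):
    """Two-point order crossover: keep="outer" is Version 1 (outer parts of
    p1 inherited), keep="inner" is Version 2 (inner part inherited)."""
    n = len(p1)
    a, b = cuts if cuts else sorted(rng.sample(range(1, n), 2))
    child = [None] * n
    if keep == "outer":
        child[:a], child[b:] = p1[:a], p1[b:]
    else:
        child[a:b] = p1[a:b]
    kept = {j for j in child if j is not None}
    fill = iter(j for j in p2 if j not in kept)
    return [j if j is not None else next(fill) for j in child]

def uniform_order_crossover(p1, p2, mask=None, rng=random):
    """Each job of p1 is kept in place with probability 0.5; the remaining
    positions are filled with the missing jobs in p2's order."""
    n = len(p1)
    if mask is None:
        mask = [rng.random() < 0.5 for _ in range(n)]
    child = [p1[i] if mask[i] else None for i in range(n)]
    kept = {j for j in child if j is not None}
    fill = iter(j for j in p2 if j not in kept)
    return [j if j is not None else next(fill) for j in child]
```

Both operators always produce a valid permutation, since each job appears exactly once: kept positions come from one parent and the fill order from the other.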


(a) Version 1 of the two-point order crossover, (b) Version 2 of the two-point order crossover.

Fig. 22.4. Two-point order crossover. The outer parts are inherited to the offspring in Version 1 in (a), while the inner part is inherited in Version 2 in (b). Version 3 uses Version 1 and Version 2 with the same probability (i.e., the probability of 0.5).




Fig. 22.5. Uniform order crossover. Each job in one parent is inherited to the same position of the offspring with the probability of 0.5. The remaining jobs are placed into the remaining positions in the order of those jobs in the other parent.

While other crossover operations (e.g., edge recombination41, enhanced edge recombination42, partially matched43, cycle44, precedence preservation45 and one segment46) were examined in comparative studies9,24, it was reported that better results were obtained by the order crossover for permutation flowshop scheduling. Thus we examine the above-mentioned seven versions of the order crossover. It should be noted that the order crossover operations are not suitable for traveling salesman problems, while they work well for permutation flowshop scheduling.

We also examine four mutation operations shown in Fig. 6: adjacent two-job change (i.e., switch), arbitrary two-job change (i.e., swap), arbitrary three-job change, and insertion (i.e., shift). It should be noted that the arbitrary two-job change and the insertion include the adjacent two-job change. The arbitrary three-job change does not include the adjacent or arbitrary two-job change as its special case.
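The two most important of these operators (the swap, whose special case with k = i + 1 is the adjacent change, and the insertion used later in this chapter) can be sketched as follows (our own illustration; positions may be passed explicitly for reproducibility):

```python
import random

def arbitrary_two_job_change(perm, i=None, k=None, rng=random):
    """Swap the jobs at two positions (adjacent change when k = i + 1)."""
    if i is None:
        i, k = rng.sample(range(len(perm)), 2)
    child = list(perm)
    child[i], child[k] = child[k], child[i]
    return child

def insertion_mutation(perm, i=None, k=None, rng=random):
    """Remove the job at position i and reinsert it at position k (shift)."""
    if i is None:
        i, k = rng.sample(range(len(perm)), 2)
    child = list(perm)
    child.insert(k, child.pop(i))
    return child
```

Both return a new list, leaving the parent permutation untouched, which matters when the parent may survive into the next generation.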

For selecting a pair of parents from the current population, we use the standard binary tournament selection. First, two individuals are randomly chosen from the current population with replacement. Next the better one is chosen as a parent. The other parent is also chosen from the current population in the same manner. One of the above-mentioned seven crossover operations is applied to the pair of the selected parents with a pre-specified crossover probability for generating an offspring. When the crossover operation is not applied, one of the two parents is randomly chosen. Then one of the above-mentioned four mutation operations is applied to the newly generated offspring (or to the randomly chosen parent when the crossover operation is not applied) with a pre-specified mutation probability. These genetic operations (i.e., selection, crossover and mutation) are iterated for generating a pre-specified number of offspring.
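The binary tournament step can be written in a few lines (a sketch with our own names; all objectives here are minimized, so "better" means a smaller objective value):

```python
import random

def binary_tournament(population, fitness, rng=random):
    """Draw two strings at random with replacement and return the better
    (smaller-fitness) one, as in the selection scheme described above."""
    a, b = rng.choice(population), rng.choice(population)
    return a if fitness(a) <= fitness(b) else b
```

Calling this twice yields the pair of parents to which crossover and mutation are then applied with their respective probabilities.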

(a) Adjacent two-job change. (b) Arbitrary two-job change.

(c) Arbitrary three-job change. (d) Insertion.

Fig. 22.6. Four mutation operations examined in this chapter.

In our implementation of a single-objective genetic algorithm, we generate Npop offspring, where Npop is the population size (i.e., the number of strings in each population). The next population is constructed by choosing the best Npop strings from the current population with Npop strings and the offspring population with Npop strings. This generation update scheme is similar to that of the elitist nondominated sorting genetic algorithm (NSGA-II37). In this generation update scheme, the number of elite solutions can be viewed as the population size (i.e., all strings in each population can be viewed as elite solutions). We also examine the standard generation update scheme with a single elite solution.
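This truncation-style update (keep the best Npop of the merged 2·Npop strings) is a one-liner; a sketch under the minimization convention used throughout the chapter:

```python
def next_generation(current, offspring, fitness):
    """Generation update with Npop elite solutions: merge the current and
    offspring populations and keep the best Npop strings (minimization)."""
    npop = len(current)
    return sorted(current + offspring, key=fitness)[:npop]
```

The single-elite alternative would instead replace the whole population with the offspring while forcibly retaining only the best string found so far.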

22.3.2. Comparison of Various Genetic Operations

Through computational experiments, we examine the performance of the seven crossover operations and the four mutation operations. We use the following parameter specification in our single-objective genetic algorithm with Npop elite solutions:

Population size (Npop): 100,
Crossover probability: 1.0,
Mutation probability: 1.0,
Stopping condition: evaluation of 100,000 solutions.

The performance of each crossover operation is examined by applying



our genetic algorithm to each of the four single-objective test problems 100 times. In this computational experiment, we use the insertion mutation. The average value of the makespan is shown in Table 1 together with the corresponding standard deviation in parentheses. The best (i.e., smallest) average value in each column is highlighted by boldface in Table 1. From this table, we can see that Version 2 of the one-point order crossover and Version 1 of the two-point order crossover work well for all the four test problems. This observation suggests that the utilization of the right-hand side of a string as a building block (i.e., the inheritance of jobs processed later in a schedule) is important in the implementation of efficient genetic algorithms. This is also supported by the poor performance of Version 1 of the one-point order crossover and Version 2 of the two-point order crossover for the 80-job maximum tardiness minimization problem. We can also see from Table 1 that the uniform order crossover does not work well for the makespan minimization problems.

Table 22.88. Comparison of the seven crossover operations on the four single-objective test problems.

                        Makespan                    Maximum tardiness
Crossover operation     40-job        80-job        40-job       80-job
One-point Version 1     3336 (13.3)   5490 (18.7)   87 (54.9)    291 (67.2)
One-point Version 2     3335 (13.1)   5487 (17.4)   76 (47.0)    207 (85.7)
One-point Version 3     3332 (11.9)   5484 (17.2)   75 (48.0)    231 (76.1)
Two-point Version 1     3333 (12.9)   5484 (15.4)   75 (49.2)    213 (74.3)
Two-point Version 2     3338 (17.2)   5490 (18.2)   78 (52.9)    252 (74.1)
Two-point Version 3     3334 (14.5)   5487 (20.0)   82 (50.9)    222 (75.9)
Uniform                 3347 (17.1)   5515 (23.7)   74 (43.8)    212 (64.8)

In the same manner as Table 1, we examine the performance of each mutation operation. We use Version 1 of the two-point order crossover in this computational experiment. Experimental results are summarized in Table 2. From this table, we can see that the best results are obtained by the insertion mutation for all the four test problems. From the comparison between Table 1 and Table 2, we can see that the choice of a mutation operation has a much larger effect on the performance of our genetic algorithm than the choice of a crossover operation. Based on our experimental results in Table 1 and Table 2, we decide to use Version 1 of the two-point order crossover and the insertion mutation in this chapter.

We also compare the two generation update schemes with each other:



Table 22.89. Comparison of the four mutation operations on the four single-objective test problems.

                        Makespan                    Maximum tardiness
Mutation operation      40-job        80-job        40-job        80-job
Adjacent two-job        3490 (38.1)   5787 (65.4)   723 (143.9)   1962 (263)
Arbitrary two-job       3354 (19.6)   5515 (25.9)   135 (42.8)    336 (82.5)
Arbitrary three-job     3434 (29.2)   5616 (36.5)   206 (48.1)    700 (143.6)
Insertion               3333 (12.9)   5484 (15.4)   75 (49.2)     213 (74.3)

One genetic algorithm used in the above computational experiments has Npop elite solutions while the other has a single elite solution. Since the appropriate specifications of the crossover and mutation probabilities are different between these two algorithms, we examine 10 x 10 combinations of the crossover probability PC and the mutation probability PM: PC = 0.1, 0.2, ..., 1.0 and PM = 0.1, 0.2, ..., 1.0. Using each combination, each genetic algorithm is applied to each test problem 20 times. Experimental results on the 80-job makespan minimization problem are shown in Fig. 7. From this figure, we can see that better results are obtained by the genetic algorithm with Npop elite solutions than that with a single elite solution. When the number of elite solutions is small (e.g., a single elite solution), high mutation probabilities lead to poor search ability, as shown in Fig. 7 (b). On the other hand, the higher the mutation probability is, the higher the search ability is in Fig. 7 (a) with many elite solutions. Similar observations are obtained from computational experiments on the 80-job maximum tardiness minimization problem in Fig. 8.

In Table 3, we summarize the best result by each algorithm for each test problem over the 100 combinations of the crossover and mutation probabilities. Table 3 shows the average result over 20 runs with the best combination of these two parameters. From this table, we can see that much better results are obtained by the single-objective genetic algorithm with Npop elite solutions. In general, a large number of elite solutions have a negative effect on the diversity of solutions while they have a positive effect on the convergence speed of solutions. This negative effect is observed in Fig. 7 and Fig. 8 when the mutation probability PM is small. In Fig. 9, we show the distribution of the values of the makespan of 1,000,000 randomly generated schedules for the 80-job problem and the best (i.e., smallest) value of the makespan obtained by the above computational experiments using the genetic algorithms. From this figure, we can see that the obtained



Fig. 22.7. Comparison between the two generation update schemes on the 80-job test problem with the objective of minimizing the makespan.

Fig. 22.8. Comparison between the two generation update schemes on the 80-job test problem with the objective of minimizing the maximum tardiness.

best solution is far from randomly generated schedules. This means that the convergence speed to the optimal solution is very important. As a result, the positive effect of a large number of elite solutions on the convergence speed overwhelms their negative effect on the diversity of solutions in a wide range of parameter values in Fig. 7 and Fig. 8. In this chapter, we use the generation update scheme with Npop elite solutions.



Table 22.90. Comparison of the two generation update schemes on the four single-objective test problems using the best combination of the crossover and mutation probabilities for each algorithm and each test problem.

                        Makespan                    Maximum tardiness
Algorithm               40-job        80-job        40-job       80-job
Npop elite solutions    3330 (13.1)   5479 (11.3)   49 (36.2)    185 (60.0)
Single elite solution   3354 (16.8)   5503 (22.9)   80 (41.7)    230 (44.0)

Fig. 22.9. Distribution of the values of the makespan of 1,000,000 randomly generated schedules of the 80-job test problem. The best (i.e., smallest) value of the makespan obtained in our computational experiments using the genetic algorithms is also shown.

22.3.3. Performance Evaluation of Genetic Algorithms

In some comparative studies9,10, it was reported that genetic algorithms were outperformed by local search methods (e.g., simulated annealing, taboo search and multi-start local search) in their applications to permutation flowshop scheduling. In this subsection, we compare our single-objective genetic algorithm with a multi-start local search algorithm where local search is repeated from randomly generated initial solutions. In local search, we use the insertion mutation for generating neighboring solutions. Thus the size of the neighborhood structure is (n - 1)^2 for n-job permutation flowshop scheduling problems (i.e., 1521 for the 40-job test problem and 6241 for the 80-job test problem). We use the first move strategy in local search. That is, neighboring solutions are examined in a random order and the current solution is replaced with the first solution that improves the current one. As a stopping condition of local search, we use a parameter Lfails. When Lfails solutions have already been examined in the neighborhood of the current solution (i.e., when local moves have successively failed Lfails times), local search is terminated. In this case, an initial solution is randomly generated for restarting local search.
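The first-move local search with the Lfails stopping rule can be sketched as follows (our own illustration; here neighbors are sampled at random with replacement rather than enumerated without repetition, a small simplification of the scheme in the text):

```python
import random

def local_search(start, cost, neighbor, l_fails, rng=random):
    """Accept the first improving neighbor; terminate after l_fails
    consecutive non-improving neighbors (the Lfails rule)."""
    current, c_cur = list(start), cost(start)
    fails = 0
    while fails < l_fails:
        cand = neighbor(current, rng)
        c = cost(cand)
        if c < c_cur:
            current, c_cur, fails = cand, c, 0
        else:
            fails += 1
    return current, c_cur

def insertion_neighbor(perm, rng):
    """Random insertion (shift) move, the neighborhood used in the text."""
    i, k = rng.sample(range(len(perm)), 2)
    child = list(perm)
    child.insert(k, child.pop(i))
    return child
```

A multi-start wrapper would simply call `local_search` from fresh random permutations until the overall evaluation budget (100,000 solutions in this chapter) is exhausted, keeping the best result found.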

We compare our genetic algorithms with the multi-start local search algorithm under the same computation load (i.e., the examination of 100,000 solutions). We examine the multi-start local search algorithm using various values of Lfails: Lfails = 10, 20, 50, 100, 200, 500, 1000. Using each value of Lfails, the multi-start local search algorithm is applied to each test problem 100 times. Average results over the 100 runs are summarized in Table 4, where the experimental results by our genetic algorithms are cited from Table 3. We can see from Table 4 that our genetic algorithms are outperformed by the multi-start local search algorithm except for the 40-job makespan minimization problem.

Table 22.91. Comparison between the multi-start local search algorithm and our genetic algorithms. The best result over the 100 combinations of the crossover and mutation probabilities is cited from Table 3 as the result of our genetic algorithms for each test problem.

                        Makespan                    Maximum tardiness
Lfails                  40-job        80-job        40-job       80-job
10                      3532 (19.3)   5876 (26.5)   556 (54.0)   1513 (169)
20                      3477 (15.7)   5752 (28.0)   294 (31.9)   787 (97.8)
50                      3419 (13.3)   5633 (23.0)   139 (13.0)   330 (34.1)
100                     3387 (11.9)   5551 (22.7)   88 (19.3)    218 (46.7)
200                     3363 (10.1)   5496 (15.6)   35 (20.1)    176 (60.2)
500                     3345 (11.8)   5475 (10.6)   27 (16.0)    171 (61.8)
1000                    3338 (12.2)   5475 (12.0)   27 (16.0)    171 (61.8)
GA (Npop elites)        3330 (13.1)   5479 (11.3)   49 (36.2)    185 (60.0)
GA (single elite)       3354 (16.8)   5503 (22.9)   80 (41.7)    230 (44.0)

In some comparative studies9,10, it was also reported that very good results were obtained by genetic local search algorithms (i.e., hybrid algorithms of genetic algorithms and local search). Thus we implement a genetic local search algorithm based on our genetic algorithm with Npop elite solutions. We use the same local search procedure as in the above-mentioned multi-start local search algorithm. The local search procedure is applied to each offspring generated by the genetic operations with a pre-specified local search application probability PLS. We examine 10 x 11 combinations of the local search termination parameter Lfails and the local search application probability PLS: Lfails = 1, 2, 5, 10, 20, 50, 100, 200, 500, 1000 and PLS = 0, 0.001, 0.002, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5, 1.0. It should be noted



that the genetic local search algorithm with PLS = 0 is the same as our genetic algorithm. Using each combination of Lfails and PLS, the genetic local search algorithm is applied to each test problem 20 times using the evaluation of 100,000 solutions as the stopping condition. Experimental results are summarized in Fig. 10 (a) for the 80-job makespan minimization problem and Fig. 10 (b) for the 80-job maximum tardiness minimization problem. In this computational experiment, we use the best combination of the crossover and mutation probabilities found in the previous computational experiments for our genetic algorithm on each test problem. That is, these parameters are not tuned for the genetic local search algorithm but tuned for the genetic algorithm (i.e., for the case of PLS = 0 in Fig. 10). From Fig. 10, we can see that the hybridization with local search improves the performance of our genetic algorithm when Lfails and PLS are appropriately specified. The genetic local search algorithm also outperforms the multi-start local search algorithm. For example, the best average result is 171 by the multi-start local search algorithm in Table 4 for the 80-job maximum tardiness minimization problem, while it is 151 by the genetic local search algorithm in Fig. 10. Similar results are obtained for the two 40-job test problems, as shown in Fig. 11. Moreover, the hybridization with local search has a positive effect on the efficiency of genetic algorithms. In Fig. 12, we show the average CPU time for each combination of Lfails and PLS corresponding to Fig. 11. From Fig. 12, we can see that the hybridization with local search decreases the CPU time of genetic algorithms.
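The hybridization step itself is simple; a sketch with our own names, where the hypothetical `improve` argument stands for one run of a local search routine on a single string:

```python
import random

def apply_local_search(offspring, improve, p_ls, rng=random):
    """Genetic local search step: run local search on each offspring with
    probability p_ls (PLS in the text); p_ls = 0 recovers the plain GA."""
    return [improve(s) if rng.random() < p_ls else s for s in offspring]
```

Because each local search run consumes part of the fixed evaluation budget, small values of PLS combined with short local search runs (small Lfails) tend to give the best trade-off observed in Figs. 10 and 11.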

22.4. Multi-Objective Genetic Algorithms

While genetic algorithms are often outperformed by local search in their applications to single-objective permutation flowshop scheduling, they have an inherent advantage in their applications to multi-objective problems. That is, a large number of diverse non-dominated solutions can be obtained simultaneously by a single run, whereas a single run of other search algorithms usually yields only a single solution. In this chapter, we use the elitist non-dominated sorting genetic algorithm (NSGA-II [37]) because its high search ability has been frequently reported in the literature and its implementation is relatively easy. Of course, other EMO (evolutionary multi-objective optimization) algorithms can be applied to multi-objective permutation flowshop scheduling problems.


Fig. 22.10. Performance of the genetic local search algorithm on the two 80-job test problems: (a) minimization of the makespan; (b) minimization of the maximum tardiness. Experimental results with P_LS = 0 show the performance of the non-hybrid genetic algorithm.

Fig. 22.11. Performance of the genetic local search algorithm on the two 40-job test problems: (a) minimization of the makespan; (b) minimization of the maximum tardiness. Experimental results with P_LS = 0 show the performance of the non-hybrid genetic algorithm.

22.4.1. NSGA-II Algorithm

The basic framework of the NSGA-II is the same as that of our single-objective genetic algorithm in the previous section. This is because we used the same framework as the NSGA-II when we implemented our single-objective genetic algorithm for comparison. In the NSGA-II, Npop offspring are generated from the current population of Npop solutions. Then the best Npop solutions are chosen from the current and offspring populations for constructing the next population. The key point is how to evaluate each solution, because multiple objectives are involved.

Fig. 22.12. Average CPU time of the genetic local search algorithm with each combination of P_LS and L_fails for the two 40-job test problems: (a) minimization of the makespan; (b) minimization of the maximum tardiness.

The NSGA-II uses the Pareto dominance relation and the concept of crowding for evaluating each solution. When the next population is constructed by choosing the best Npop solutions, the current and offspring populations are first merged to form a tentative population. Then a rank is assigned to each solution in the tentative population using the concept of Pareto ranking. That is, the first rank is assigned to all the non-dominated solutions in the tentative population. All solutions with the first rank are removed from the tentative population and added to the next population. The second rank is assigned to all the non-dominated solutions in the reduced tentative population. All solutions with the second rank are removed from the reduced tentative population and added to the next population. In this manner, good solutions with respect to multiple objectives are chosen and added to the next population. If the number of solutions in the next population exceeds the pre-specified population size (i.e., Npop), the solutions with the worst rank in the next population are sorted using the concept of crowding. The crowding measure of each solution is calculated as the sum of the distances from its adjacent solutions with the same rank. More specifically, the two adjacent solutions of each solution are identified with respect to each objective. Then the distance between those adjacent solutions is calculated on each objective and summed over all the objectives to obtain the crowding measure. To each extreme solution with the maximum or minimum value of at least one objective among the same-rank solutions, an infinitely large value is assigned as the crowding measure because one of its two adjacent solutions cannot be identified. Solutions with larger values of the crowding measure are viewed as better because they are not located in crowded regions of the objective space. Solutions with the worst rank are removed from the next population in increasing order of the crowding measure until the number of remaining solutions equals the pre-specified population size.
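The crowding calculation described above can be sketched as follows. This is a minimal illustration of the chapter's description, not the NSGA-II reference implementation; in particular, distances are summed without the objective-range normalization used in some NSGA-II variants.

```python
def crowding_distance(front):
    """Crowding measure for a list of objective vectors within one rank.

    For each objective, solutions are sorted and each interior solution
    is credited with the distance between its two neighbors; boundary
    (extreme) solutions receive an infinite value, as described above.
    """
    n = len(front)
    if n <= 2:
        return [float("inf")] * n
    num_obj = len(front[0])
    dist = [0.0] * n
    for m in range(num_obj):
        order = sorted(range(n), key=lambda i: front[i][m])
        dist[order[0]] = dist[order[-1]] = float("inf")  # extreme solutions
        for k in range(1, n - 1):
            i = order[k]
            dist[i] += front[order[k + 1]][m] - front[order[k - 1]][m]
    return dist
```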

When a pair of parent solutions is to be selected from the current population by binary tournament selection, each solution is evaluated in the same manner (i.e., using its rank as the primary criterion and the crowding measure as the secondary criterion).

22.4.2. Performance Evaluation of the NSGA-II Algorithm

We apply the NSGA-II to the two-objective 40-job and 80-job test problems in Section 22.2 using the same parameter specification as in Section 22.3 (i.e., population size: 100, crossover probability: 1.0, mutation probability: 1.0, stopping condition: evaluation of 100,000 solutions). In Fig. 22.13, we show experimental results of a single run of the NSGA-II on each test problem. Each figure shows the initial population, an intermediate population at the 50th generation, and the final population at the 1000th generation. From Fig. 22.13, we can see that the NSGA-II simultaneously minimizes both objectives while maintaining the diversity of solutions.

For evaluating the performance of the NSGA-II, we compare solution sets obtained by the NSGA-II with those obtained by multiple runs of our single-objective genetic algorithm with Npop elite solutions. More specifically, we apply the NSGA-II to each two-objective test problem 10 times, which yields 10 solution sets. Our single-objective genetic algorithm is also applied to each of the corresponding single-objective test problems 10 times, which yields 20 solutions (10 solutions from each single-objective test problem). The obtained solutions are shown in Fig. 22.14. From this figure, we can see that a variety of solutions cannot be obtained by multiple runs of our single-objective genetic algorithm. We can also see that better results are obtained by our single-objective genetic algorithm if we consider only a single objective.

Fig. 22.13. The initial population, an intermediate population at the 50th generation, and the final population at the 1000th generation in a single run of the NSGA-II on the two-objective test problems: (a) 40-job test problem; (b) 80-job test problem.

Fig. 22.14. Comparison between solutions obtained by the NSGA-II (small closed circles) and those obtained by our single-objective genetic algorithm (open circles): (a) 40-job test problem; (b) 80-job test problem.

We also use the following weighted scalar objective function in our single-objective genetic algorithm:

f(x) = w1 * f1(x) + w2 * f2(x),    (1)

where x denotes a solution, f1(x) is the makespan, f2(x) is the maximum tardiness, and w1 and w2 are non-negative weights. For simultaneously minimizing both objectives, we specify the weight vector w = (w1, w2) as w = (0.5, 0.5). Our single-objective genetic algorithm is applied to each two-objective test problem 10 times for minimizing the weighted scalar objective function in (1) with w = (0.5, 0.5). In this computational experiment, we use the same stopping condition as for the NSGA-II: evaluation of 100,000 solutions. Experimental results are summarized in Fig. 22.15. From this figure, we can see that a variety of solutions cannot be obtained by our single-objective genetic algorithm when it minimizes the weighted scalar objective function with a fixed weight vector. We can also see that our single-objective genetic algorithm slightly outperforms the NSGA-II when the objective is to minimize the weighted scalar objective function with the fixed weight vector w = (0.5, 0.5).

Fig. 22.15. Comparison between solutions obtained by the NSGA-II (small closed circles) and those obtained by our single-objective genetic algorithm minimizing the weighted scalar objective function (open circles): (a) 40-job test problem; (b) 80-job test problem.

For finding a variety of solutions with our single-objective genetic algorithm, we use the following five weight vectors in the weighted scalar objective function: w = (1, 0), (0.75, 0.25), (0.5, 0.5), (0.25, 0.75), (0, 1). Our single-objective genetic algorithm is applied to each test problem for minimizing the weighted scalar fitness function with each weight vector. For comparing the NSGA-II with our single-objective genetic algorithm under the same computation load, we use the evaluation of 20,000 solutions (i.e., 1/5 of the 100,000 solutions in the case of the NSGA-II) as the stopping condition of our single-objective genetic algorithm. This is because our single-objective genetic algorithm is applied to each test problem five times, each run corresponding to the minimization of the weighted scalar objective function with one weight vector. The five solutions obtained in this manner are compared with a solution set obtained by a single run of the NSGA-II in Fig. 22.16. From this figure, we can see that the quality of the solutions obtained by our single-objective genetic algorithm is not high, because the computation load available for each single run is 1/5 of that of the NSGA-II in order to allow multiple runs for obtaining multiple solutions. On the other hand, a large number of non-dominated solutions can be obtained by a single run of the NSGA-II.
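The weighted scalarization of Eq. (1) with the five weight vectors can be illustrated with a small sketch. Names such as `best_per_weight` are hypothetical, and the candidate objective vectors in the usage example are made-up toy values, not experimental data.

```python
def weighted_scalar(objectives, weights):
    """Weighted scalar objective f(x) = w1*f1(x) + w2*f2(x), as in Eq. (1)."""
    return sum(w * f for w, f in zip(weights, objectives))

# the five weight vectors used in the experiment described above
WEIGHT_VECTORS = [(1.0, 0.0), (0.75, 0.25), (0.5, 0.5), (0.25, 0.75), (0.0, 1.0)]

def best_per_weight(candidates):
    """For each weight vector, return the candidate minimizing the scalar value.

    `candidates` maps solution labels to (makespan, max_tardiness) pairs.
    """
    return {w: min(candidates, key=lambda s: weighted_scalar(candidates[s], w))
            for w in WEIGHT_VECTORS}
```

Each weight vector singles out one compromise; running one optimization per vector is what requires the 1/5 computation budget per run in the comparison above.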

Fig. 22.16. Comparison between solutions obtained by the NSGA-II (small closed circles) and those obtained by multiple runs of our single-objective genetic algorithm minimizing the weighted scalar objective function with various weight values (open circles): (a) 40-job test problem; (b) 80-job test problem.

We also perform the same computational experiment as in Fig. 22.16 using the multi-start local search algorithm with L_fails = 1000 in Section 22.3. Experimental results are shown in Fig. 22.17. In contrast to Fig. 22.16, we cannot say that the NSGA-II outperforms multiple runs of the multi-start local search algorithm in Fig. 22.17. This is because the multi-start local search algorithm is more efficient than genetic algorithms for single-objective permutation flowshop scheduling. Advantages of multi-objective genetic algorithms over local search, however, become clearer when we apply them to permutation flowshop scheduling problems with many objectives. This is because local search has to be executed many times in order to obtain a variety of non-dominated solutions in high-dimensional objective spaces (i.e., we cannot afford a long CPU time for each single run of local search). On the other hand, a large number of non-dominated solutions can be obtained by a single run of a multi-objective genetic algorithm.

Fig. 22.17. Comparison between solutions obtained by the NSGA-II (small closed circles) and those obtained by multiple runs of the multi-start local search algorithm minimizing the weighted scalar objective function with various weight values (open circles): (a) 40-job test problem; (b) 80-job test problem.

22.4.3. Extensions to Multi-Objective Genetic Algorithms

In the design of evolutionary multi-objective optimization (EMO) algorithms, there exist two conflicting requirements: one is to increase the convergence speed to the Pareto front, and the other is to increase the diversity of solutions. We demonstrate that the choice of parent solutions has positive and negative effects on these two requirements. We also demonstrate the effect of hybridization with local search on the performance of EMO algorithms.

We use a similarity-based mating scheme [47], which is illustrated in Fig. 22.18. First, α candidates are chosen by iterating the binary tournament selection α times. Then the most extreme solution among them is selected as one parent (say, Parent A). This selection is based on the distance in the objective space from each candidate to the average vector of the α candidates. For choosing the other parent (say, Parent B), β candidates are chosen by iterating the binary tournament selection β times. Then the candidate most similar to Parent A is selected as Parent B. This selection is based on the distance in the objective space from Parent A to each candidate.
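The two selection steps can be sketched as follows. This is an illustrative Python reading of the scheme, not the authors' code: `fitness` and `objective_of` are assumed helpers standing, respectively, for the rank/crowding-based tournament criterion and the objective vector of a solution.

```python
import math
import random

def similarity_based_mating(population, fitness, objective_of, alpha, beta):
    """Similarity-based mating scheme (sketch).

    `population` is a list of solutions, `fitness(s)` a scalar used by the
    binary tournament (lower is better), `objective_of(s)` the objective
    vector of solution s.
    """
    def tournament():
        a, b = random.sample(population, 2)
        return a if fitness(a) < fitness(b) else b

    # Parent A: among alpha tournament winners, the one farthest from
    # their average objective vector (the most "extreme" candidate).
    cand_a = [tournament() for _ in range(alpha)]
    avg = [sum(v) / alpha for v in zip(*(objective_of(c) for c in cand_a))]
    parent_a = max(cand_a, key=lambda c: math.dist(objective_of(c), avg))

    # Parent B: among beta tournament winners, the one closest to
    # Parent A in the objective space (the most similar candidate).
    cand_b = [tournament() for _ in range(beta)]
    parent_b = min(cand_b,
                   key=lambda c: math.dist(objective_of(c), objective_of(parent_a)))
    return parent_a, parent_b
```

With (alpha, beta) = (1, 1) each parent is a single tournament winner, i.e., the scheme reduces to standard binary tournament selection, as noted in the text.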

Fig. 22.18. Similarity-based mating scheme.

In the similarity-based mating scheme, the diversity of solutions is increased by using a large value of α, while the convergence speed to the Pareto front is increased by a large value of β. These effects are demonstrated in Fig. 22.19, where we depict a solution set obtained for the 40-job test problem by a single run of the NSGA-II using the similarity-based mating scheme with each combination of α and β. For comparison, experimental results of the original NSGA-II are also depicted in Fig. 22.19. It should be noted that the similarity-based mating scheme with (α, β) = (1, 1) is exactly the same as the standard binary tournament selection. Fig. 22.19 (a) shows two extreme cases (i.e., (α, β) = (10, 1) and (1, 10)) where we can observe the above-mentioned effects of the similarity-based mating scheme. On the other hand, α and β are appropriately specified in Fig. 22.19 (b), where the convergence speed to the Pareto front is improved without degrading the diversity of the obtained solutions.

Fig. 22.19. Solution sets obtained for the 40-job test problem by the NSGA-II with the similarity-based mating scheme: (a) extreme parameter specifications; (b) appropriate parameter specifications.

The efficiency of evolutionary multi-objective optimization (EMO) algorithms can be improved by hybridization with local search [19, 25, 39]. The hybridization, however, is not as straightforward as in the case of single-objective optimization. This is because local search is a single-objective optimization technique. We implement a hybrid EMO algorithm by combining local search with the NSGA-II in the following manner. As we have already explained, an offspring population with Npop solutions is generated from the current (i.e., parent) population with Npop solutions in the NSGA-II. An initial solution for local search is chosen from the offspring population using the binary tournament selection with replacement. This selection is based on the weighted scalar objective function in (1), where the weight vector w = (w1, w2) is randomly specified whenever an initial solution is to be chosen. Then local search is applied to a copy of the selected initial solution for improving the weighted scalar objective function with the current weight vector in the same manner as in the previous subsection. The execution of local search is terminated based on the termination parameter L_fails. When the initial solution is improved by local search, the improved solution is added to the offspring population. The selection of an initial solution and the application of local search to it are iterated Npop x P_LS times, where P_LS is the local search application probability. The next population is constructed from the parent population and the offspring population in the same manner as in the NSGA-II.
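The local search phase of this hybrid EMO algorithm can be sketched as follows. This is illustrative only: `objectives` and `local_search` are assumed helpers supplied by the caller, the latter standing for the L_fails-terminated procedure of the earlier sections.

```python
import random

def random_weight_vector():
    """Random non-negative weight vector w = (w1, w2) with w1 + w2 = 1."""
    w1 = random.random()
    return (w1, 1.0 - w1)

def multiobjective_local_search_step(offspring, objectives, local_search,
                                     n_pop, p_ls):
    """One local-search phase of the hybrid EMO algorithm (sketch).

    `objectives(s)` returns the objective vector of solution s, and
    `local_search(s, scalar_obj)` improves s for a scalar objective.
    """
    improved = list(offspring)
    for _ in range(round(n_pop * p_ls)):
        # draw a fresh random weight vector for each initial solution
        w = random_weight_vector()
        scalar = lambda s: sum(wq * fq for wq, fq in zip(w, objectives(s)))
        # binary tournament with replacement on the scalarized objective
        a, b = random.choice(offspring), random.choice(offspring)
        start = a if scalar(a) <= scalar(b) else b
        result = local_search(list(start), scalar)
        if scalar(result) < scalar(start):   # add only if actually improved
            improved.append(result)
    return improved
```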

In Fig. 22.20, we show solution sets obtained for the 80-job test problem by a single run of our hybrid EMO algorithm with two combinations of P_LS and L_fails: (P_LS, L_fails) = (1, 500) and (0.2, 1). In the case of (P_LS, L_fails) = (1, 500), almost all solutions are examined in the local search part; actually, the number of updated generations in the EMO part is 1 in this case. As a result, the diversity of solutions is degraded by the hybridization with local search, while the convergence speed is not degraded. Since local search can be executed more efficiently than genetic search, the average CPU time decreases from 10.6 seconds for the non-hybrid NSGA-II to 7.6 seconds for our hybrid EMO algorithm with (P_LS, L_fails) = (1, 500). On the other hand, a good balance between local search and genetic search is realized in Fig. 22.20 (b), where (P_LS, L_fails) = (0.2, 1). In this case, the number of updated generations is 806.

Fig. 22.20. Solution sets obtained for the 80-job test problem by our hybrid EMO algorithm and the non-hybrid original NSGA-II: (a) too much local search; (b) appropriate parameter specifications.

22.5. Conclusions

In this chapter, we illustrated how genetic algorithms can be applied to single-objective and multi-objective permutation flowshop scheduling. We showed through computational experiments that the order-based crossover and the insertion mutation work well for permutation flowshop scheduling. We also showed that multi-objective genetic algorithms are superior to multiple runs of single-objective optimization techniques in terms of the diversity of solutions, while single-objective genetic algorithms are inferior to single-objective local search in many cases. While we used the NSGA-II in our computational experiments, other evolutionary multi-objective optimization (EMO) algorithms can be used for permutation flowshop scheduling. For improving the performance of those EMO algorithms, we suggested the use of a similarity-based mating scheme and hybridization with local search.

References

1. R. A. Dudek, S. S. Panwalkar and M. L. Smith, "The lessons of flowshop scheduling research," Operations Research 40 (1992) 7-13.

2. S. M. Johnson, "Optimal two- and three-stage production schedules with setup times included," Naval Research Logistics Quarterly 1 (1954) 61-68.

3. K. R. Baker and G. D. Scudder, "Sequencing with earliness and tardiness penalties: A review," Operations Research 38 (1990) 22-36.

4. I. H. Osman and C. N. Potts, "Simulated annealing for permutation flow-shop scheduling," OMEGA 17 (1989) 551-557.

5. H. Ishibuchi, S. Misaki and H. Tanaka, "Modified simulated annealing algorithms for the flow shop sequencing problem," European Journal of Operational Research 81 (1995) 388-398.

6. E. Taillard, "Some efficient heuristic methods for the flow shop sequencing problem," European Journal of Operational Research 47 (1990) 65-74.

7. M. Ben-Daya and M. Al-Fawzan, "A tabu search approach for the flow shop scheduling problem," European Journal of Operational Research 109 (1998) 88-95.

8. C. R. Reeves, "A genetic algorithm for flowshop sequencing," Computers and Operations Research 22 (1995) 5-13.

9. T. Murata, H. Ishibuchi and H. Tanaka, "Genetic algorithms for flowshop scheduling problems," Computers and Industrial Engineering 30 (1996) 1061-1071.

10. C. A. Glass and C. N. Potts, "A comparison of local search methods for flow shop scheduling," Annals of Operations Research 63 (1996) 489-509.

11. C. Dimopoulos and A. M. S. Zalzala, "Recent developments in evolutionary computation for manufacturing optimization: Problems, solutions, and comparisons," IEEE Trans. on Evolutionary Computation 4 (2000) 93-113.

12. M. Nawaz, E. Enscore and I. Ham, "A heuristic algorithm for the m-machine, n-job flow-shop sequencing problem," OMEGA 11 (1983) 91-95.

13. Y. B. Park, C. D. Pegden and E. E. Enscore, "A survey and evaluation of static flowshop scheduling heuristics," International Journal of Production Research 22 (1984) 127-141.

14. T. C. Lai, "A note on heuristics of flow-shop scheduling," Operations Research 44 (1996) 648-652.

15. B. Chen, C. A. Glass, C. N. Potts and V. A. Strusevich, "A new heuristic for three-machine flow shop scheduling," Operations Research 44 (1996) 891-898.

16. R. L. Daniels and R. J. Chambers, "Multiobjective flowshop scheduling," Naval Research Logistics 37 (1990) 981-995.

17. J. Sridhar and C. Rajendran, "Scheduling in flowshop and cellular manufacturing systems with multiple objectives - A genetic algorithmic approach," Production Planning and Control 7 (1996) 374-382.


18. T. Murata, H. Ishibuchi and H. Tanaka, "Multi-objective genetic algorithm and its applications to flowshop scheduling," Computers and Industrial Engineering 30 (1996) 957-968.

19. H. Ishibuchi and T. Murata, "A multi-objective genetic local search algorithm and its application to flowshop scheduling," IEEE Trans. on Systems, Man, and Cybernetics - Part C: Applications and Reviews 28 (1998) 392-403.

20. T. P. Bagchi, Multiobjective Scheduling by Genetic Algorithms (Kluwer Academic Publishers, Boston, 1999).

21. T. P. Bagchi, "Pareto-optimal solutions for multi-objective production scheduling problems," Lecture Notes in Computer Science 1993 (2001) 458-471.

22. E. Talbi, M. Rahoual, M. H. Mabed and C. Dhaenens, "A hybrid evolutionary approach for multicriteria optimization problems: Application to the flow shop," Lecture Notes in Computer Science 1993 (2001) 416-428.

23. M. Basseur, F. Seynhaeve and E. G. Talbi, "Design of multi-objective evolutionary algorithms: Application to the flow-shop scheduling problem," Proc. of 2002 Congress on Evolutionary Computation (2002) 1151-1156.

24. C. A. Brizuela and R. Aceves, "Experimental genetic operators analysis for the multi-objective permutation flowshop," Lecture Notes in Computer Science 2632 (2003) 578-592.

25. H. Ishibuchi, T. Yoshida and T. Murata, "Balance between genetic search and local search in memetic algorithms for multiobjective permutation flowshop scheduling," IEEE Trans. on Evolutionary Computation 7 (2003) 204-223.

26. C. A. Coello Coello, "A comprehensive survey of evolutionary-based multiobjective optimization techniques," Knowledge and Information Systems 1 (1999) 269-308.

27. D. A. van Veldhuizen and G. B. Lamont, "Multiobjective evolutionary algorithms: Analyzing the state-of-the-art," Evolutionary Computation 8 (2000) 125-147.

28. K. Deb, Multi-Objective Optimization Using Evolutionary Algorithms (John Wiley & Sons, Chichester, 2001).

29. C. A. Coello Coello, D. A. van Veldhuizen and G. B. Lamont, Evolutionary Algorithms for Solving Multi-Objective Problems (Kluwer Academic Publishers, Boston, 2002).

30. J. D. Schaffer, "Multi-objective optimization with vector evaluated genetic algorithms," Proc. of 1st International Conference on Genetic Algorithms and Their Applications (1985) 93-100.

31. C. M. Fonseca and P. J. Fleming, "Genetic algorithms for multiobjective optimization: Formulation, discussion and generalization," Proc. of 5th International Conference on Genetic Algorithms (1993) 416-423.

32. J. Horn, N. Nafpliotis and D. E. Goldberg, "A niched Pareto genetic algorithm for multi-objective optimization," Proc. of 1st IEEE International Conference on Evolutionary Computation (1994) 82-87.

33. N. Srinivas and K. Deb, "Multiobjective optimization using nondominated sorting in genetic algorithms," Evolutionary Computation 2 (1994) 221-248.

34. E. Zitzler and L. Thiele, "Multiobjective evolutionary algorithms: A comparative case study and the strength Pareto approach," IEEE Trans. on Evolutionary Computation 3 (1999) 257-271.

35. J. D. Knowles and D. W. Corne, "Approximating the nondominated front using Pareto archived evolution strategy," Evolutionary Computation 8 (2000) 149-172.

36. A. Jaszkiewicz, "Genetic local search for multi-objective combinatorial optimization," European Journal of Operational Research 137 (2002) 50-71.

37. K. Deb, A. Pratap, S. Agarwal and T. Meyarivan, "A fast and elitist multi-objective genetic algorithm: NSGA-II," IEEE Trans. on Evolutionary Computation 6 (2002) 182-197.

38. E. Zitzler, K. Deb and L. Thiele, "Comparison of multiobjective evolutionary algorithms: Empirical results," Evolutionary Computation 8 (2000) 173-195.

39. A. Jaszkiewicz, "On the performance of multiple-objective genetic local search on the 0/1 knapsack problem - A comparative experiment," IEEE Trans. on Evolutionary Computation 6 (2002) 402-412.

40. P. Brucker, Scheduling Algorithms (Springer, Berlin, 1998).

41. D. Whitley, T. Starkweather and D. Fuquay, "Scheduling problems and traveling salesmen: the genetic edge recombination operator," Proc. of 3rd International Conference on Genetic Algorithms (1989) 133-140.

42. T. Starkweather, S. McDaniel, D. Mathias, D. Whitley and C. Whitley, "A comparison of genetic sequence operators," Proc. of 4th International Conference on Genetic Algorithms (1991) 69-76.

43. D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning (Addison-Wesley, Reading, 1989).

44. I. Oliver, D. Smith and J. Holland, "A study of permutation crossover operators on the travelling salesman problem," Proc. of 2nd International Conference on Genetic Algorithms (1987) 224-230.

45. C. Bierwirth, D. C. Mattfeld and H. Kopfer, "On permutation representations for scheduling problems," Lecture Notes in Computer Science 1141 (1996) 960-970.

46. M. Gen and R. Cheng, Genetic Algorithms and Engineering Design (John Wiley & Sons, New York, 1997).

47. H. Ishibuchi and Y. Shibata, "A similarity-based mating scheme for evolutionary multiobjective optimization," Lecture Notes in Computer Science 2723 (2003) 1065-1076.


CHAPTER 23

EVOLUTIONARY OPERATORS BASED ON ELITE SOLUTIONS

FOR BI-OBJECTIVE COMBINATORIAL OPTIMIZATION

Xavier Gandibleux

LAMIH/ROI - UMR CNRS 8530, Université de Valenciennes, Le Mont Houy, F-59313 Valenciennes cedex 9, France

E-mail: Xavier. [email protected]

Hiroyuki Morita

Faculty of Economics, Osaka Prefecture University, Sakai, Osaka 599-8231, Japan

E-mail: morita@eco.osakafu-u.ac.jp

Naoki Katoh

Graduate School of Engineering, Kyoto University, Kyoto 606-8501, Japan

E-mail: [email protected]

Combinatorial optimization problems with multiple objectives represent an important area of mathematical programming. Even with only two objectives, these problems are difficult to solve, even when, for example, the problems considered are structured according to their constraint matrix. Sometimes subsets of non-dominated solutions can be computed or approximated easily. Such solutions can set up an initial elite solution set that can be used advantageously in an evolutionary algorithm with the appropriate operators. This chapter describes a population-based method, using evolutionary operators based on elite solutions, to approximate the efficient solutions of bi-objective combinatorial optimization problems. The operators used are a crossover, a path-relinking operator, and a local search on elite solutions. The method has been applied to the bi-objective assignment problem and the bi-objective knapsack problem. These two fundamental problems are encountered in practical applications, such as resource assignment and portfolio design, and are subproblems of other more complicated problems, such as transportation problems.

23.1. Introduction

Combinatorial optimization is studied extensively in operational research. Due to its potential for application to real-world problems (vehicle routing, bin-packing, timetabling, etc.), this field of study has prospered over the last few decades. However, real-world decision-making involves dealing with several, usually conflicting, objectives. For example, a decision-maker faced with a portfolio problem has to balance the risks and the returns of investment, taking both objectives into account simultaneously. Obviously, there is generally no single optimal solution, and thus decision-makers have to deal with efficient solutions that best meet their multiple objectives.

For these reasons, the increasing interest of many researchers in the field of multi-objective combinatorial optimization (MOCO) in recent years is hardly surprising. Since 1990, specific methodologies have been developed, and the number of papers in the field has increased considerably [8]. Still, the theoretical complexity of MOCO problems [7] is certainly a major obstacle to the development of exact methods. Even with only two objectives, computing all efficient solutions is generally difficult, even if the single-objective version of the problem is polynomially solvable. Thus, as in the single-objective case, a reasonable alternative to exact methods for solving large-scale instances of MOCO problems is to derive an approximation method.

The challenge for these methods in multi-objective programming is to find "good" solutions that approximate all efficient solutions of the problem. Here, multiple-objective heuristics and multiple-objective metaheuristics are powerful methods aiming to provide a good tradeoff between the quality of the set of elite solutions and the time and memory required to produce them. The approximation is called the set of potential efficient solutions or, in the context of evolutionary algorithms, the set of elite solutions. When a solution is included in this set, no other solution computed by the procedure up to this step dominates that solution.

Approximation methods for multi-objective problems first appeared in 1984. Since then, pioneering methods have been introduced: genetic algorithms (Schaffer 1984 [20]), artificial neural networks (Malakooti 1990 [18]), simulated annealing (Serafini 1992 [21]), and tabu search (Gandibleux 1996 [9]). Two characteristics are common to these pioneering methods. First, they are inspired exclusively by evolutionary algorithms or by neighborhood search algorithms. Second, the first methods were direct derivations of single-objective optimization metaheuristics, adapted to integrate the concept of efficient solutions for optimizing multiple objectives. Recent multi-objective metaheuristics are often hybridized. For example, some methods based on neighborhood search algorithms handle a population of solutions [4, 16].

For some MOCO problems, subsets of efficient solutions can be computed or approximated easily. Such solutions can set up an initial solution set that can be used advantageously in an evolutionary algorithm with the appropriate operators. This is the primary concern of this chapter. The generic principle of a population-based method that identifies the efficient frontier of bi-objective combinatorial optimization problems is described. The operators used are a crossover, a path-relinking operator, and a local search on elite solutions. Numerical experiments underline the method's effectiveness for quickly obtaining an approximation of the exact efficient frontier for two bi-objective combinatorial optimization problems: the assignment problem and the knapsack problem.

23.2. MOCO Problems and Solution Sets

Given a finite set X and Q > 1 objective functions zq : X → R, q = 1, . . . , Q, a multi-objective combinatorial optimization (MOCO) problem is defined as7:

"min" (z1(x), . . . , zQ(x))        (MOCO)

A solution x ∈ X is a feasible decision, where X is called the decision space. A vector z(x) = (z1(x), . . . , zQ(x)), z(x) ∈ Z, is a performance, where Z is called the objective space. Typically, two types of objective functions are considered, namely the sum and the bottleneck objectives. The problem is then to solve (MOCO), where the meaning of "min" has still to be defined. Often the minimization in (MOCO) is understood in the sense of efficiency, also called Pareto optimality. Since we are interested in the bi-objective case (denoted biCO), Q is set to 2 in the continuation. A solution x ∈ X is called efficient if there is no other feasible solution x' ∈ X such that zq(x') ≤ zq(x) for all q = 1, 2, with at least one strict inequality. The corresponding vector z(x) is called a non-dominated point in the objective space Z. The set of all efficient solutions is denoted by E, and the representation of E in Z is called the efficient frontier, or also the Pareto front.
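The dominance test and the extraction of non-dominated points described above can be sketched directly. The following is an illustrative fragment, not taken from the chapter's implementation; the function names are hypothetical.

```python
def dominates(za, zb):
    """True if point za dominates zb (minimization case): za is
    componentwise <= zb with at least one strict inequality."""
    return (all(a <= b for a, b in zip(za, zb))
            and any(a < b for a, b in zip(za, zb)))

def nondominated(points):
    """Return the non-dominated subset of a list of objective vectors,
    preserving the original order."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

points = [(1, 5), (2, 2), (4, 1), (3, 3), (5, 5)]
print(nondominated(points))  # -> [(1, 5), (2, 2), (4, 1)]
```

Here (3, 3) and (5, 5) are both dominated by (2, 2), while the three remaining points are mutually incomparable.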

Page 587: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

558 X. Gandibleux, H. Morita, N. Katoh

For a (biCO), the efficient solution set is generally partitioned into two subsets. The set SE of supported efficient solutions is a subset of E such that each x ∈ SE is an optimal solution of the following parametrized single objective problem for some λ = (λ1, λ2) with λ1, λ2 > 0:

min Σq=1..2 λq zq(x)        (biCOλ)

If the convex hull C = conv{z(x) : x ∈ E} is computed in the objective space, z(x) for any supported efficient solution x ∈ SE belongs to the boundary of this convex hull. SE is composed of SE1 and SE2; SE1 is the set of SE solutions x such that z(x) is a vertex of the convex hull C, and SE2 = SE \ SE1. Computing the SE2 set is generally more difficult than computing SE1, because the former requires the enumeration of all optimal solutions that minimize (biCOλ) for a given λ. A solution x in the set NE = E \ SE of non-supported efficient solutions is one for which z(x) is not on the boundary of the convex hull. In the bi-objective case, NE solutions are located in the triangles drawn on two successive supported efficient solutions in the objective space. There is no theoretical characterization leading to the efficient computation of NE solutions.

Generally, several distinct efficient solutions x1, x2, x3 can correspond to the same non-dominated point z(x1) = z(x2) = z(x3) in the objective space. The solutions x1, x2, x3 are said to be equivalent in the objective space. The number of such equivalent solutions is generally quite large, and so the enumeration of all of them may be intractable. In such a situation, it is impossible to design an efficient algorithm that can compute all efficient solutions. All the introduced sets are then redefined restrictively according to the notion of a minimal complete set17 of efficient solutions. A set of efficient solutions is minimal if and only if no two of its efficient solutions are equivalent. The application of this definition to the introduced sets gives rise to the Em, SEm, SE1m, SE2m, and NEm minimal complete sets. Figure 23.1, which summarizes the inclusion relationship among these sets, illustrates, for example, that SE1m ⊆ SEm ⊆ SE.

Published papers are sometimes unclear about the abilities of the algorithms that they present. Some authors claim that their algorithm can enumerate "all" efficient solutions in terms of the set E. However, as mentioned before, it is generally difficult to compute this set. Thus, it is important to clearly define the class of efficient solutions handled by the algorithm.


Fig. 23.1. Classification of efficient solutions

23.3. An Evolutionary Heuristic for Solving biCO Problems

The principle of our heuristic11,12,13 is based on the intensive use of three operators applied to a population composed uniquely of elite solutions. The following sections present the main features of the heuristic. Its algorithmic framework is shown in algorithm 1.

23.3.1. Overview of the Heuristic

Let us introduce PE, which denotes the set of elite solutions. PE is first initialized with a subset of supported solutions (routine detectPEinit). Three operators are used: a crossover (routine crossoverWithElites), a path-relinking (routine pathRelinkingWithElites), and a local search (routine localSearchOverNewElites). Upper and lower bound sets defined in the objective space (routine buildBoundSets) provide acceptable limits for performing a local search. A genetic map, derived from the elite solutions (routine elaborateGeneticInformation), provides useful information to the crossover operator for fixing certain bits. This genetic information is refreshed periodically. Each new elite solution is noted (routine noteNewSolutions). Three rules, which can be used separately or in combination, define a stopping condition (routine isTheEnd?). Basically, the heuristic can be stopped after a predefined effort (rule 1 with parameter iterationMax) or after an elapsed time (rule 2 with parameter timeMax). Rule 3 concerns the detection of unfruitful iterations. This rule allows the heuristic to be stopped when no new elite solutions are produced after a certain number of iterations (parameters seekChangesFrequency and noChangeMax).

Each iteration of the algorithm performs one crossover operation, which generates one solution, and one path-relinking operation, which generates


Algorithm 1 The entry point

Require: input data which determines the objective functions and constraints; parameter(s) for the stopping condition chosen.
Ensure: PE

-- Compute the initial elite population set PEinit
detectPEinit( data ↓ , peInit ↑ ) ; pe ← peInit
-- Compute the lower and the upper bound sets
buildBoundSets( data ↓ , pe ↓ , lowerB ↑ , upperB ↑ )
-- A first local search on the PEinit solution set
localSearchOverNewElites( pe ↕ )
-- Identify the genetic heritage and elaborate the genetic map
elaborateGeneticInformation( pe ↓ , map ↑ )
-- Initialize the running indicators
iteration ← 1 ; elapsedTime ← 0 ; changes ← 0 ; noMore ← 0
repeat
    -- Elaborate a solution by crossover
    crossoverWithElites( pe ↕ , map ↓ , lowerB ↓ , upperB ↓ )
    -- Elaborate a series of solutions by path-relinking
    pathRelinkingWithElites( pe ↕ , lowerB ↓ , upperB ↓ )
    -- Apply a local search to the new elite solutions in PE
    localSearchOverNewElites( pe ↕ )
    -- Refresh the genetic heritage by integrating genetic information
    -- from the new PE into the existing map
    if (iteration MOD refreshMapFrequency = 0) then
        elaborateGeneticInformation( pe ↓ , map ↑ )
    end if
    -- Identify the producers of new potential solutions and note
    -- a series of iterations without production of new PE
    noteNewSolutions( pe ↓ , changes ↑ )
    if (iteration MOD seekChangesFrequency = 0) then
        noMore ← (changes = 0 ? noMore + 1 : 0) ; changes ← 0
    end if
    -- Check the stopping condition(s)
until isTheEnd?( iteration++ ↓ , elapsedTime ↓ , noMore ↓ )

(↓ denotes an input parameter, ↑ an output parameter, and ↕ an in-out parameter.)


a list of solutions. For each of these generated solutions, a local search is performed (section 23.3.7) if and only if the solution is promising, meaning that it falls into the "admissible area" (section 23.3.3). All potentially efficient solutions in this neighborhood are added to the existing set PE. The iteration ends by once again performing a local search operation for each newly included elite solution in the PE set.

Due to the simplicity of the method, performing one iteration consumes little CPU time, especially when the individual is not promising (in which case, no local search is performed). This allows the heuristic to be very aggressive in implementing a generation process that performs many iterations. In addition, the approximation set contains only elite solutions. The algorithm maintains PE, and iteratively improves it, moving it towards the set of exact efficient solutions. Thus, a poor solution, one that is far from the exact efficient frontier, will never be introduced into the approximation. At any time, the heuristic will produce only good approximations of the efficient frontier.

Unlike other Multi-Objective Evolutionary Algorithms2, our heuristic performs no direction searches to drive the approximation process, and it requires no ranking method (there is no fitness measure). This is important, given that direction searches and ranking are often criticized; the former for its difficulty in guiding the heuristic search along the efficient frontier, and the latter for requiring increased computing effort.

23.3.2. The Initial Population

The initial population is the set SE1m. When a polynomial time algorithm is available for the single objective combinatorial optimization problem, (biCOλ) can be solved efficiently for a fixed λ. Clearly, this initial population set gives a complete description of the efficient frontier (figure 23.2). When such a polynomial time algorithm is not available, a heuristic can be used to obtain an approximate description of the efficient frontier10.

In any case, the exact or approximate set SE1m can be obtained by solving the parametric problem (biCOλ) for all possible λ. By applying a dichotomic scheme in the objective space, solving this problem is possible in time proportional to |SE1m| multiplied by the running time needed to solve (biCOλ) for a fixed λ. In this chapter, we assume that an efficient algorithm exists for computing the exact set SE1m. Clearly, some efficient solutions belonging to SE2m can be obtained using this computation principle as a byproduct. Obviously, these solutions are


integrated into the initial population set, denoted by PEinit. However, no specific algorithm has been developed for computing solutions belonging to SE2m.
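The dichotomic scheme can be sketched as follows. This is an illustrative fragment under stated assumptions, not the chapter's implementation: `solve` is a hypothetical weighted-sum solver supplied by the caller (here a toy scan over an explicit list of performance vectors) that returns an optimal point of min λ1·z1 + λ2·z2, and the recursion follows the classic weight choice derived from two adjacent points.

```python
def dichotomy(solve, z_left, z_right):
    """Recursively enumerate supported non-dominated points strictly
    between z_left and z_right (z1 increasing, z2 decreasing)."""
    l1 = z_left[1] - z_right[1]           # lambda1 > 0
    l2 = z_right[0] - z_left[0]           # lambda2 > 0
    z = solve(l1, l2)
    # A strict improvement over the endpoints means a new supported point.
    if l1 * z[0] + l2 * z[1] < l1 * z_left[0] + l2 * z_left[1]:
        return (dichotomy(solve, z_left, z) + [z]
                + dichotomy(solve, z, z_right))
    return []

# Toy demo: the "solver" scans an explicit list of performance vectors.
feasible = [(1, 9), (2, 7), (4, 4), (7, 2), (9, 1), (5, 5), (8, 8)]
solve = lambda l1, l2: min(feasible, key=lambda z: (l1*z[0] + l2*z[1], z[0]))
z_l = min(feasible)                               # best on z1, then z2
z_r = min(feasible, key=lambda z: (z[1], z[0]))   # best on z2, then z1
supported = [z_l] + dichotomy(solve, z_l, z_r) + [z_r]
print(supported)  # -> [(1, 9), (2, 7), (4, 4), (7, 2), (9, 1)]
```

In this toy run, (5, 5) and (8, 8) are never returned: they lie above the convex hull of the frontier and hence are not supported.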

Fig. 23.2. Example of an initial solution set (squares). Bullets are solutions (Em \ PEinit) that will be approximated by the heuristic

23.3.3. Bound Sets and Admissible Areas

The upper bound set is defined by the set of "local nadir points", where one nadir point is derived from two adjacent supported solutions. Specifically, if x1 and x2 are two adjacent supported solutions in the objective space, the corresponding nadir is a point in Z with the following coordinates:

( max(z1(x1), z1(x2)) , max(z2(x1), z2(x2)) )

Figure 23.3 illustrates the upper bound set. Using these points, an initial area (S1) is derived inside the efficient frontier. A point in the area S1 can correspond to either a feasible or an infeasible solution.

The lower bound set is defined in a symmetrical manner, by computing:

( min(z1(x1), z1(x2)) , min(z2(x1), z2(x2)) )


Fig. 23.3. Efficient solutions (squares), upper bound set (bullets), lower bound set (stars) and the admissible areas where a local search procedure will be applied to new solutions. Filled squares are supported solutions, empty squares are non-supported ones. Grey triangles denote areas where efficient solutions can exist

for all adjacent supported solutions x1 and x2 in the objective space. Obviously, this bound also defines a second admissible area (S2) outside the efficient frontier which is composed only of infeasible solutions. The use of areas S1 and S2 allows the design of an oscillation strategy15 between the feasible and infeasible parts of the search space along the efficient frontier.

These bound sets are used in a heuristic strategy to determine whether a solution is a candidate for an intensive search in its neighborhood. All solutions in both areas are considered promising for finding new elite solutions. A local search is performed, beginning with such promising solutions. Because no effort is wasted on solutions outside of the admissible areas, this heuristic strategy helps to save computing effort.
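The bound sets and the admissibility test can be sketched as follows, assuming minimization and supported points sorted by increasing z1. The names are hypothetical, and the admissibility test is simplified: it merges S1 and S2 into the axis-aligned boxes spanned by each adjacent pair's ideal and nadir points, rather than reproducing the chapter's exact geometry.

```python
def bound_sets(supported):
    """Local nadir (upper bound) and local ideal (lower bound) points,
    one per adjacent pair of supported points."""
    nadirs = [(max(a[0], b[0]), max(a[1], b[1]))
              for a, b in zip(supported, supported[1:])]
    ideals = [(min(a[0], b[0]), min(a[1], b[1]))
              for a, b in zip(supported, supported[1:])]
    return nadirs, ideals

def is_promising(z, nadirs, ideals):
    """A point is promising when it lies inside the box delimited by the
    ideal and nadir points of some adjacent pair (simplified test)."""
    return any(lo[0] <= z[0] <= up[0] and lo[1] <= z[1] <= up[1]
               for lo, up in zip(ideals, nadirs))

sup = [(1, 9), (4, 4), (9, 1)]
nadirs, ideals = bound_sets(sup)
print(nadirs)                                # -> [(4, 9), (9, 4)]
print(ideals)                                # -> [(1, 4), (4, 1)]
print(is_promising((3, 6), nadirs, ideals))  # -> True
print(is_promising((8, 8), nadirs, ideals))  # -> False
```

The point (8, 8) is rejected without any local search, which is exactly the computational saving the strategy aims for.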

23.3.4. The Genetic Map

The mechanism used here is inspired by the principle of pheromones in artificial ant colonies6. Assuming that similarities exist between efficient solution vectors, we compute the occurrence frequency of the values of the


elite solutions for each component (which is often a variable). A roulette wheel is built to store those occurrence frequencies, which provide genetic information. A genetic map, comprised of the roulette wheels of each solution vector component, contains the genetic heritage of our population. This information is used extensively by the crossover operator (section 23.3.5).

The genetic information is always derived from elite solutions. The initial map is thus derived only from exact supported solutions. Periodically, the genetic map is refreshed. Once PE (the current set of elite solutions) has been significantly renewed, it is used to rebuild the roulette wheels. This activity indicates an important evolution in the population. In the current version, refreshment occurs after a predefined number of generations (parameter refreshMapFrequency) has been performed. This parameter value has been experimentally set to 100 000 generations.
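Building the genetic map amounts to counting, per component, how often each value occurs among the elite solutions. A minimal sketch (hypothetical function name, 0-1 solutions assumed):

```python
from collections import Counter

def genetic_map(elites):
    """One roulette wheel per component: the occurrence frequency of each
    value among the elite solutions (all assumed to have equal length)."""
    n = len(elites[0])
    wheels = []
    for j in range(n):
        counts = Counter(sol[j] for sol in elites)
        total = sum(counts.values())
        wheels.append({v: c / total for v, c in counts.items()})
    return wheels

elites = [(0, 1, 1), (0, 1, 0), (0, 0, 1), (0, 1, 1)]
print(genetic_map(elites))
# -> [{0: 1.0}, {1: 0.75, 0: 0.25}, {1: 0.75, 0: 0.25}]
```

Refreshing the map, as described above, simply means calling this construction again on the current PE.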

23.3.5. The Crossover Operator

For each crossover operation, two parent individuals, x1 and x2, are randomly selected from the current elite population, and one offspring x3 is produced. Genes common to the parents are replicated in the child, and the other genes are determined using the genetic map, on the basis of the occurrence frequencies stored in the roulette wheels (figure 23.4).

Fig. 23.4. Crossover operator principle

Suppose the values of a component j differ between the two parents. The value of component j for an offspring can be randomly determined according to the probability values stored in the roulette wheel j. However, the solution so obtained may not be feasible in general. To ensure feasibility, the value of component j is determined randomly from a list of feasible selections. Another option allows infeasible solutions to also be considered as candidates for a local search (depending on whether or not they are located in the admissible area).
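The crossover can be sketched as below. This is an illustrative fragment with hypothetical names; the feasibility repair discussed above (restricting the draw to feasible selections) is omitted, so the child may need a later admissibility check.

```python
import random

def crossover(parent1, parent2, wheels, rng):
    """Copy genes shared by the parents; draw disagreeing genes from the
    corresponding roulette wheel of the genetic map."""
    child = []
    for j, (g1, g2) in enumerate(zip(parent1, parent2)):
        if g1 == g2:
            child.append(g1)                       # common gene: replicate
        else:
            values, probs = zip(*wheels[j].items())
            child.append(rng.choices(values, weights=probs)[0])
    return tuple(child)

rng = random.Random(0)
wheels = [{0: 1.0}, {1: 0.75, 0: 0.25}, {1: 0.75, 0: 0.25}]
child = crossover((0, 1, 1), (0, 0, 1), wheels, rng)
print(child)  # first and third genes are inherited: child[0] == 0, child[2] == 1
```

Only the middle gene is random here, biased 3:1 towards the value most frequent among the elites.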


23.3.6. The Path-Relinking Operator

Path-relinking generates new solutions by exploring the trajectories that connect elite solutions. Starting from one solution, the initiating solution, a path is generated through the neighborhood space that leads to the other solution, the guiding solution15. Because the population contains only elite solutions, the presence of a path-relinking operator in our heuristic is a natural development.

A path-relinking operation starts by randomly selecting IA and IB, two individuals from the current elite population (figure 23.5). Because both individuals are elite, both could potentially be the guiding solution. Let IA be the initiating solution and IB the guiding solution. The path-relinking operation generates a path IA(= I0), I1, . . . , IB, such that the distance between Ii and IB decreases monotonically in i, where the distance is defined as the number of positions to which different values are assigned in Ii and IB.

Fig. 23.5. Path-relinking operator principle

Although many such paths may exist, one path is chosen using random moves based on a swap operator. (Details are provided in Ref. 13.) Such randomness introduces a form of diversity into the solutions generated along the path. For every intermediate solution Ii, a single solution is generated in the neighborhood (figure 23.6).

As with the crossover operation, the bound sets are used to determine whether the solution produced falls into the admissible area. If so, the solution is compared with the current list of elite solutions, and a local search is performed. Otherwise, no improvement strategy is triggered, and the solution is simply ignored.
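The monotone-distance property of the path can be sketched on 0-1 vectors as follows. Note the simplification: the chapter's operator uses random swap moves, whereas this hypothetical variant copies the guide's value into one randomly chosen differing position per step, which likewise decreases the Hamming distance to the guide by one at each step.

```python
import random

def path_relinking(initiating, guiding, rng):
    """Generate the intermediate solutions along one random path from the
    initiating solution to the guiding solution."""
    current = list(initiating)
    path = []
    diff = [j for j in range(len(current)) if current[j] != guiding[j]]
    while diff:
        j = diff.pop(rng.randrange(len(diff)))  # pick a differing position
        current[j] = guiding[j]                 # move one step towards IB
        path.append(tuple(current))
    return path

rng = random.Random(1)
path = path_relinking((1, 0, 0, 1), (0, 1, 0, 0), rng)
print(path)  # three intermediate solutions, the last equal to the guide
```

Each solution in `path` would then be tested against the bound sets, and a local search launched from the promising ones.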


Fig. 23.6. Illustration of a possible path construction. IA and IB are two individuals randomly selected from the current elite population (small bullets). IA is the initiating solution, and IB is the guiding solution. N(IA) is the feasible neighborhood according to the move defined. IA - I1 - I2 - I3 - I4 - IB is the path that is built

23.3.7. The Local Search Operator

A classic neighborhood structure based on a swap move has been adopted for implementing the local search operator. Let us consider two positions j1 and j2 of a solution x, where xj1 and xj2 are the values in positions j1 and j2, respectively. Then, using the swap move (j1, j2), the set of pairwise exchanges with j1 = 1, . . . , n - 1 and j2 = j1 + 1, . . . , n defines an associated neighborhood N(x) of the current solution x.

N(x) may contain infeasible solutions. Such a solution y ∈ N(x) may be considered if y is located in area S2, or if y is located in area S1 and z(y) is not dominated by a solution from PE. Because the computational cost of a local search can be significant (O(n²)), the local search is only triggered on the promising candidate solutions that result from the crossover and the path-relinking operators.
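The O(n²) swap neighborhood can be sketched as a generator (hypothetical function name; swapping two equal values is skipped since it reproduces the same solution):

```python
def swap_neighborhood(x):
    """All solutions obtained from x by exchanging the values in two
    positions j1 < j2: the O(n^2) neighborhood of the local search."""
    n = len(x)
    for j1 in range(n - 1):
        for j2 in range(j1 + 1, n):
            if x[j1] != x[j2]:      # equal values: same solution, skip
                y = list(x)
                y[j1], y[j2] = y[j2], y[j1]
                yield tuple(y)

print(sorted(set(swap_neighborhood((0, 1, 1)))))
# -> [(1, 0, 1), (1, 1, 0)]
```

In the heuristic, each generated neighbor would be kept only if it passes the S1/S2 admissibility test described above.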


23.4. Application to Assignment and Knapsack Problems with Two Objectives

The general principle of our population-based heuristic has been applied to two classic bi-objective (MOCO) problems: the assignment problem (biAP) and the knapsack problem (biKP).

23.4.1. Problem Formulation

The assignment problem with two objectives (biAP) can be formulated as follows, where the cqij are non-negative integers and x = (x11, . . . , xnn):

"min" zq(x) = Σi=1..n Σj=1..n cqij xij        q = 1, 2

Σj=1..n xij = 1        i = 1, . . . , n

Σi=1..n xij = 1        j = 1, . . . , n

xij ∈ {0, 1}

The (single-objective) assignment problem (AP) is a well-known fundamental combinatorial optimization problem. The goal is to find an optimal assignment of n tasks to n positions so that every task is assigned to exactly one position, and no two tasks are assigned to the same position. cqij denotes the cost incurred by assigning the task i to position j for objective q. Efficient specific algorithms exist to solve the single objective assignment problem, such as the Hungarian method or the successive shortest path method1.

The 0-1 knapsack problem with two objectives (biKP) can be formulated as follows, where the coefficients cqi, wi and ω are nonnegative constants and x = (x1, . . . , xn):

"max" zq(x) = Σi=1..n cqi xi        q = 1, 2

Σi=1..n wi xi ≤ ω

xi ∈ {0, 1}

The single objective 0-1 knapsack problem is also a well-known combinatorial optimization problem. Although it is known to be NP-hard, it


can be solved efficiently in a practical sense by a branch and bound method or by using dynamic programming. (See the book by Martello and Toth19 for details about knapsack problems.) In addition, a fully polynomial-time approximation scheme exists.

23.4.2. Experimental Protocol

A library of numerical instances for MOCO problems is available online at www.terry.uga.edu/mcdm/. This library contains data for both the assignment and the knapsack problems. A series of fifteen instances were used for our (biAP) experiments. The objective coefficients cqij were generated randomly in the range [1, 20], with a problem size n ranging from 5 to 100. For the biKP, we used nine randomly generated problem instances, where the problem size n ranges from 100 to 500. The objective coefficients cqi and weights wi were also randomly generated, in the range [1, 100]. The entry ω on the right-hand side of the constraint Σi=1..n wi xi ≤ ω is correlated with the vector w as follows: ω = 0.5 × Σi=1..n wi.
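An instance generator in the style just described can be sketched as follows (hypothetical function name; the seed parameter is an assumption added for reproducibility):

```python
import random

def make_bikp_instance(n, seed=0):
    """Random biKP instance: costs and weights drawn uniformly from
    [1, 100], capacity set to half the total weight."""
    rng = random.Random(seed)
    c1 = [rng.randint(1, 100) for _ in range(n)]
    c2 = [rng.randint(1, 100) for _ in range(n)]
    w = [rng.randint(1, 100) for _ in range(n)]
    omega = 0.5 * sum(w)          # omega = 0.5 * sum of the weights
    return c1, c2, w, omega

c1, c2, w, omega = make_bikp_instance(100)
print(len(c1), omega == 0.5 * sum(w))  # -> 100 True
```

The 0.5 factor keeps the capacity tight enough that roughly half the items can be packed, which is the usual way such benchmarks are made non-trivial.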

For these problems, the minimal complete set of efficient solutions Em was computed, using Cplex on a mainframe5, and broken down into the characteristic subsets, especially SE1m and SE2m.

The computer used for the experiments was a desktop equipped with a Pentium 4 2.6GHz processor and 1 GB of RAM. The operating system was Redhat Linux, version 9, and the algorithms were implemented in the C language. The compilation was done using gcc-2.95.3 with the optimizer option -O3.

The following three stopping rules were used, both separately and in combination.

Rule 1: number of iterations. The heuristic is stopped after a predetermined effort (parameter iterationMax).

Rule 2: timeout. The heuristic is stopped after a predefined elapsed time (parameter timeMax).

Rule 3: unfruitful iterations. After a cycle of a predetermined number of iterations (parameter seekChangesFrequency), the rule checks to see if new elite solutions were added during the cycle. Consecutive cycles without change are counted, and the heuristic is stopped when a predetermined number of cycles has been recorded (parameter noChangeMax).

The default parameter value of iterationMax in the heuristic is 250 000


iterations, and the genetic map is refreshed every 100 000 iterations. The suggested value for rule 3 is 2 cycles of 100 000 iterations.

For each problem size, we repeated the experiments five times, using different random seeds. We used M1 (introduced by Ulungu23) for measuring the ratio of exact efficient solutions contained in the elite solution set PE, i.e., M1 = |PE ∩ Em| / |Em|. Minimal, average, and maximal values of M1 for each input size have been recorded.
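The M1 measure is a straightforward set computation. A minimal sketch, expressed (as in the result tables) as a percentage, with hypothetical toy data:

```python
def m1(pe, em):
    """Proportion of exact efficient solutions found, in percent:
    100 * |PE intersect Em| / |Em|."""
    pe, em = set(pe), set(em)
    return 100.0 * len(pe & em) / len(em)

# Toy data: the heuristic found 3 of the 4 exact non-dominated points.
em = {(1, 9), (4, 4), (9, 1), (6, 3)}
pe = {(1, 9), (4, 4), (9, 1), (7, 7)}
print(m1(pe, em))  # -> 75.0
```

Note that extra PE members outside Em, like (7, 7) above, do not lower M1; the measure only counts coverage of the exact set.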

23.5. Numerical Experiments with the Bi-Objective Assignment Problem

The initial population set PEinit is computed by solving a series of parametric assignment problems according to a dichotomic scheme. Each single objective assignment problem is solved by the Successive Shortest Path algorithm. By using such a dichotomic scheme to generate SE1m, some solutions belonging to SE2m can also be generated. Thus, PEinit contains all SE1m solutions and some SE2m solutions.

Assignments are coded as permutations of n tasks instead of as the n²-dimensional 0-1 vector. For example, the coded solution x = (4, 2, 1, 5, 3) means task i = 4 is assigned to position j = 1, task i = 2 to position j = 2, etc. A neighborhood N(x), associated with the current solution x, is the set of permutations obtained by applying pairwise exchanges (j1, j2) to the current solution, where j1 = 1, . . . , n - 1 and j2 = j1 + 1, . . . , n. Any move resulting from a pairwise exchange preserves the feasibility of the assignment. Consequently, no oscillation is designed for this problem. The genetic map is composed of n roulette wheels corresponding to the n positions, each of which represents the occurrence frequency of the assignment "i in position j" in vector x for the elite solutions. The crossover and path-relinking operators are designed as described in Sections 23.3.5 and 23.3.6.
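The permutation encoding and its exchange neighborhood can be sketched as follows (hypothetical helper names):

```python
def decode(perm):
    """Permutation encoding of an assignment: perm[j] is the task
    assigned to position j+1; returns (task, position) pairs."""
    return [(task, pos + 1) for pos, task in enumerate(perm)]

def pairwise_exchanges(perm):
    """Neighborhood of a permutation: every exchange of two positions
    yields another permutation, i.e. another feasible assignment."""
    n = len(perm)
    for j1 in range(n - 1):
        for j2 in range(j1 + 1, n):
            y = list(perm)
            y[j1], y[j2] = y[j2], y[j1]
            yield tuple(y)

print(decode((4, 2, 1, 5, 3))[:2])                     # -> [(4, 1), (2, 2)]
print(len(list(pairwise_exchanges((4, 2, 1, 5, 3)))))  # -> 10
```

The neighborhood size is n(n-1)/2 (here 10 for n = 5), and every neighbor is feasible by construction, which is why no oscillation strategy is needed for the biAP.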

23.5.1. Minimal Complete Solution Sets and Initial Elite Solution Set

Figure 23.7 shows how Em, SEm, NEm, SE1m, SE2m and PEinit grow as the problem size increases. The CPU time needed for computing PEinit is small compared to the time needed to run the heuristic.

According to figure 23.7, the number of solutions in each subset increases linearly with the input size for these instances. Interestingly, the difference between SE1m and SE2m seems to decrease with input size.

Examining the PEinit column confirms that PEinit contains all SE1m solutions, plus some additional solutions from SE2m.


Table 23.92. Number of solutions in the minimal complete set Em and its distribution into the subsets. Initial set of elite solutions (number and CPUt). PEinit computed using a successive shortest path (SSP) method

instance   Cplex                                    SSP
n      Em   NEm   SEm   SE1m   SE2m      PEinit   CPUt
5       8     5     3      3      0           3    0.0
10     16    10     6      6      0           6    0.0
15     39    27    12     12      0          12    0.0
20     55    42    13     13      0          13    0.0
25     74    49    25     20      5          21    0.0
30     88    61    27     24      3          24    0.0
35     81    54    27     25      2          25    0.0
40    127    73    54     34     20          38    0.0
45    114    71    43     32     11          33    0.0
50    163    96    67     39     28          43    1.0
60    128    84    44     39      5          41    1.0
70    174   114    60     42     18          46    2.0
80    195   126    69     47     22          50    3.0
90    191   108    83     51     32          60    5.0
100   223   122   101     50     51          59    6.0

Fig. 23.7. Minimal complete sets and the initial elite sets (PEinit)


For instances with a large input size, it is not surprising to observe an increasing difference between SE1m and PEinit. Because the number of solutions belonging to SE2m increases with the input size, the probability of generating an SE2m solution using the dichotomic principle is increased.

23.5.2. Our Results Compared with Those Existing in the Literature

The computational results reported in column two of table 23.93 are taken from the results published in Tuyttens et al.22. They were obtained for one run only, using the improved MOSA method with the same set of numerical instances. Figure 23.8 shows that the MOSA method performs poorly in

Table 23.93. MOSA vs the population-based heuristic (rules 1 and 3 enabled)

instance   MOSA    population-based heuristic
n×n          M1        M1                          CPUt    #iter
                       min     avg     max         avg     avg
5×5         87.5     100.0   100.0   100.0         0.0     100 000
10×10       56.2     100.0   100.0   100.0         1.0     100 000
15×15       25.6     100.0    99.5   100.0         1.0     100 000
20×20        3.7      96.4    99.3   100.0         4.0     190 000
25×25        0.0      93.2    94.1    98.7         4.2     250 000
30×30        3.4      94.4    96.9    97.8         7.2     250 000
35×35        0.0      95.1    96.1    96.3         7.4     250 000
40×40        0.0      86.7    90.8    93.0         8.8     250 000
45×45        0.0      84.2    87.7    89.5        10.4     250 000
50×50        0.0      86.7    87.5    88.5        14.2     250 000
60×60       N.A.      71.1    73.6    76.6        14.4     250 000
70×70       N.A.      67.8    69.4    70.1        20.4     250 000
80×80       N.A.      78.5    80.0    84.5        27.4     250 000
90×90       N.A.      66.0    71.7    74.4        31.6     250 000
100×100     N.A.      58.7    61.4    66.4        43.8     250 000

terms of the quality measure M1, especially as the size of the instances increases. Since the computer used to obtain these results is different from the one used in our experiments (the MOSA results were computed on a DEC3000 alpha), a discussion of CPUt is not possible.

The columns on the right-hand side of table 23.93 report the results of our heuristic when rules 1 and 3 were activated. The number of generations was tuned to 250 000 iterations, and the genetic map was refreshed every 100 000 iterations. The CPUt indicated is an average value over five complete



Fig. 23.8. The population-based heuristic with rules 1 and 3 activated, compared with MOSA

runs of the heuristic. The time reported includes the time for computing the initial population of elite solutions, plus the time used for the approximation. The rightmost column of the table gives the average number of iterations performed by the algorithm during one generation. Any value other than 250 000 indicates that rule 3 was triggered before rule 1.

The comparison of the MOSA results with those produced by the population-based heuristic proposed in this chapter underlines clearly that our heuristic outperforms MOSA (figure 23.8). We presume that our heuristic consumes more time than the MOSA method. (CPUt for MOSA was reported to be 5s and 246s respectively for instances 5 × 5 and 50 × 50.) However, our heuristic shows two important features that convince us that our method would outperform the MOSA method in tests run on the same computer. First, the solution detection evolves very quickly during the early iterations of the generation13. Despite the brief time allowed for the generation, the quality of the approximations is already good. Second, our heuristic is able to improve its approximation when more time is allowed for the generation process, which does not seem to be possible for MOSA, whose approximation did not improve, even given more time. Also,


according to the values reported for the quality indicator M2 in Tuyttens et al.22, the MOSA method rapidly has difficulty in detecting good approximations of the efficient frontier. Because our heuristic uses SE1m and manages only elite solutions, use of the M2 quality indicator makes no sense here (M2 = 100% at all times).

23.6. Numerical Experiments with the Bi-Objective Knapsack Problem

In this second set of numerical experiments, the initial population set PEinit is also computed by solving a series of parametric knapsack problems, denoted by (biKPλ). As with the biAP, this is done using the dichotomic scheme. The single objective knapsack problems are solved using the branch and bound procedure19.

A solution is coded as an n-dimensional 0-1 vector. A neighborhood N(x), associated with the current solution x, is the set of solutions obtained by applying pairwise exchanges (j1, j2), with j1 = 1, . . . , n - 1 and j2 = j1 + 1, . . . , n, to the current solution. An infeasible solution x can result from such a pairwise exchange, due to the violation of the knapsack constraint. Such a solution x is not deleted when its performance vector z(x) either belongs to area S2, or belongs to area S1 with z(x) not dominated by a PE solution. (S1 and S2 are defined in section 23.3.3, and are considered here in the maximization case.)

The principle is to admit promising infeasible solutions located near the efficient frontier in the objective space. All accepted infeasible solutions are subject to a local search. Infeasible solutions are used as trials for jumping between the feasible and infeasible domains in the decision space. However, only feasible solutions are considered in the continuation. This use of infeasible solutions defines an oscillation strategy to seek the neighborhoods of promising infeasible solutions. This strategy is also triggered for infeasible solutions resulting from the crossover and path-relinking operations.

The path-relinking operator adopted for the (biKP) follows the principle described in section 23.3.6. However, the path between two solutions I1 and I2 may be long, and the number of solutions elaborated along the path may be quite large, requiring many operations to be performed for each path-relinking built.

To reduce the effort required, more than one swap is performed for each step in the path, which speeds up path construction. Specifically, if δ is the number of different genes between I1 and I2, a value p ∈ [1, δ] is selected


at random, and only p solutions are built along the path. According to this strategy, the path elaborated is a sample of the solutions between I1 and I2.

23.6.1. Minimal Complete Solution Sets and the Initial Elite Solution Set

Figure 23.9 shows how the sizes of Em, SEm, NEm, SE1m, SE2m and PEinit grow as the problem size increases.

Although the knapsack problem is known to be NP-hard, the running time required by the branch-and-bound algorithm is reasonable for these instances. Should the CPUt for computing PEinit become large for other instances, an approximate algorithm (such as a greedy one) could advantageously be used instead of branch-and-bound. (The impact of the initial solution set on the detection ratio is discussed in Gandibleux et al.10.)

Table 23.94. Distribution of the efficient solutions Em, its subsets, and PEinit computed with a Branch & Bound-based method

instance    Benchmark                                   B&B
n           Em      NEm     SEm     SE1m    SE2m        PEinit  CPUt
100         172     149     23      23      0           23      0.0
150         244     216     28      28      0           28      0.0
200         439     400     39      38      1           38      0.0
250         629     579     50      50      0           50      0.0
300         713     658     55      54      1           54      0.0
350         871     805     66      63      3           63      0.0
400         1000    930     70      69      1           69      0.0
450         1450    1369    81      73      8           75      1.0
500         1451    1353    98      96      2           96      1.0

As shown in Figure 23.9, the number of solutions in each subset except SE2m increases linearly with the input size for these instances. Unlike the situation with the biAP, SE2m is very small and remains insignificant even as the input size increases.

As for the assignment problem, PEinit is composed of all solutions belonging to SE1m, plus some additional solutions from SE2m.

Table 23.94 presents all the solutions computed. As shown, |NEm| grows quickly as the instance size increases, while |SEm| stays small. These results contrast sharply with those produced with the biAP.

Evolutionary Operators Based on Elite Solutions for biCO 575

Fig. 23.9. Minimal complete sets and initial elite set (PEinit)

23.6.2. Our Results Compared with Those Existing in the Literature

The results of the proposed population-based method have been compared with those obtained using the MGK algorithm10, a genetic algorithm that uses crossover, mutation and local search operators.

The results of the comparison are presented in Table 23.95 and Figure 23.10. (Both algorithms were run on the same computer, described in section 23.4.2.) This comparison highlights the advantages of our proposed heuristic over MGK. The proposed algorithm produces a better approximation of the efficient solution set E with less CPUt. As with the biAP, the approximation PE improves when more CPU time is allowed, which means that when rule 1 is triggered, the approximation has not yet saturated.

23.7. Conclusion and Perspectives

We have described a population-based heuristic for solving bi-objective combinatorial optimization problems. This heuristic applies three operators - crossover, path-relinking and local search - to a population composed

Table 23.95. M1 (avg) and CPUt (avg, including the computation of the initial elite set). In this experiment, iterationMax of stopping rule 1 is 600,000. The elapsedTime of stopping rule 2 is shown in the rightmost column

instance    EMO01               rule 1              rule 2
n           M1      CPUt        M1      CPUt        M1      elapsedTime
100         98.8    110.6       99.9    42.0        99.8    50
150         98.6    225.0       99.6    148.8       99.6    200
200         96.7    495.2       99.0    274.6       99.2    400
250         95.3    740.2       97.9    379.6       98.4    700
300         94.8    1693.0      97.1    665.6       97.8    1500
350         95.7    2336.0      96.9    860.8       98.0    2000
400         92.8    2949.8      95.1    925.4       97.3    2500
450         91.6    5896.4      94.3    1935.2      96.9    5500
500         91.8    6297.6      92.2    1633.2      96.3    6000

Fig. 23.10. Comparison with EMO01

exclusively of elite solutions. Based on simple principles, our heuristic is easy to implement and involves only two parameters, which require no tuning phase.

In the first step, a set of supported efficient solutions is computed; this set forms the initial population of elite solutions. In the second step, this population undergoes a generation process involving the three operators. Through computational experiments on the bi-objective assignment problem and the bi-objective knapsack problem, we have verified that, in comparison with other heuristics, our heuristic produces an excellent approximation of the efficient frontier even within a short computing time (i.e., a small number of iterations).

Several perspectives for further research exist. The first concerns the elaboration of the genetic information. In order to handle larger instances more efficiently, it would be useful to divide the genetic information into sectors along the efficient frontier, using a region-based principle3. Such a technique would help maintain a locally representative genetic map and avoid a genetic map sterilized by too many diverse solutions. A second promising path would be to design a less random path-relinking operator. Generating several neighbors according to a given characteristic, and selecting the best neighbor from among them, could make path-relinking even more powerful. A third possibility concerns the local search. In the current version, a local search is systematically applied to a solution when it is located in the promising zone. The same solution can thus be visited several times, repeatedly generating the same neighbors. Filtering solutions before starting the local search would help to reduce computation time. Lastly, further experimentation with a broader class of combinatorial optimization problems would help to confirm the performance of our heuristic.

References

1. R. K. Ahuja, T. L. Magnanti, and J. B. Orlin. Network Flows: Theory, Algorithms, and Applications. Prentice-Hall, 1993.

2. C. Coello, D. Van Veldhuizen and G. Lamont. Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, New York, 2002.

3. D.W. Corne, N.R. Jerram, J.D. Knowles, and M.J. Oates. PESA-II: Region-based Selection in Evolutionary Multiobjective Optimization. In Proceedings of the Genetic and Evolutionary Computation Conference (GECCO-2001), pp. 283-290, Morgan Kaufmann Publishers, 2001.

4. P. Czyzak and A. Jaszkiewicz. A multiobjective metaheuristic approach to the localization of a chain of petrol stations by the capital budgeting model. Control and Cybernetics, 25(1):177-187, 1996.

5. F. Degoutin and X. Gandibleux. Un retour d'expériences sur la résolution de problèmes combinatoires bi-objectifs. 5e journée du groupe de travail Programmation Mathématique MultiObjectif (PM2O), Angers, France, May 2002.

6. M. Dorigo and G. Di Caro. The Ant Colony Optimization Meta-Heuristic. In D. Corne, M. Dorigo and F. Glover, editors, New Ideas in Optimization, McGraw-Hill, 11-32, 1999.

7. M. Ehrgott. Multiple Criteria Optimization - Classification and Methodology. Shaker Verlag, Aachen, 1997.

8. M. Ehrgott and X. Gandibleux. Multiobjective Combinatorial Optimization. In Multiple Criteria Optimization: State of the Art Annotated Bibliographic Survey (M. Ehrgott and X. Gandibleux Eds.), pp. 369-444, Kluwer's International Series in Operations Research and Management Science: Volume 52, Kluwer Academic Publishers, Boston, 2002.

9. X. Gandibleux, N. Mezdaoui, and A. Freville. A tabu search procedure to solve multiobjective combinatorial optimization problems. In R. Caballero, F. Ruiz, and R. Steuer, editors, Advances in Multiple Objective and Goal Programming, volume 455 of Lecture Notes in Economics and Mathematical Systems, pages 291-300. Springer Verlag, Berlin, 1997.

10. X. Gandibleux, H. Morita, N. Katoh. The Supported Solutions Used as a Genetic Information in a Population Heuristic. In Evolutionary Multi-Criterion Optimization (E. Zitzler, K. Deb, L. Thiele, C. Coello, D. Corne Eds.). Lecture Notes in Computer Science 1993, pp. 429-442, Springer, 2001.

11. X. Gandibleux, H. Morita, N. Katoh. Use of a genetic heritage for solving the assignment problem with two objectives. In Evolutionary Multi-Criterion Optimization (C. Fonseca, P. Fleming, E. Zitzler, K. Deb, L. Thiele Eds.). Lecture Notes in Computer Science 2632, pp. 43-57, Springer, 2003.

12. X. Gandibleux, H. Morita, N. Katoh. Impact of clusters, path-relinking and mutation operators on the heuristic using a genetic heritage for solving assignment problems with two objectives. MIC2003 Fifth Metaheuristics International Conference, Kyoto, Japan, August 25-28, 2003.

13. X. Gandibleux, H. Morita, and N. Katoh. A population-based metaheuristic for solving assignment problems with two objectives. Technical Report n°7/2003/ROI, LAMIH, Université de Valenciennes, 2003. To appear in Journal of Mathematical Modelling and Algorithms.

14. X. Gandibleux, M. Sevaux, K. Sörensen and V. T'kindt (Eds.). Metaheuristics for Multiobjective Optimisation. Proceedings of the workshop "MOMH: Multiple Objective MetaHeuristics", November 4-5, 2002, Carré des Sciences, Paris. Lecture Notes in Economics and Mathematical Systems 535, 249 pages, Springer, Berlin.

15. F. Glover and M. Laguna. Tabu search. Kluwer Academic Publishers, Boston,1997.

16. M.P. Hansen. Tabu search for multiobjective combinatorial optimization: TAMOCO. Control and Cybernetics, 29(3):799-818, 2000.

17. P. Hansen. Bicriterion path problems. In Multiple Criteria Decision Making Theory and Application (G. Fandel and T. Gal Eds.). Lecture Notes in Economics and Mathematical Systems 177, pp. 109-127. Springer Verlag, Berlin, 1979.

18. B. Malakooti, J. Wang, and E.C. Tandler. A sensor-based accelerated approach for multi-attribute machinability and tool life evaluation. International Journal of Production Research, 28:2373, 1990.

19. S. Martello and P. Toth. Knapsack Problems: Algorithms and Computer Implementations. John Wiley & Sons, Chichester, 1990.

20. J.D. Schaffer. Multiple objective optimization with vector evaluated genetic algorithms. In J.J. Grefenstette, editor, Genetic Algorithms and their Applications: Proceedings of the First International Conference on Genetic Algorithms, 93-100. Lawrence Erlbaum, Pittsburgh, 1985.

21. P. Serafini. Simulated annealing for multiobjective optimization problems. In Proceedings of the 10th International Conference on Multiple Criteria Decision Making, Taipei, Taiwan, volume I, pp. 87-96, 1992.

22. D. Tuyttens, J. Teghem, Ph. Fortemps and K. Van Nieuwenhuyse. Performance of the MOSA method for the bicriteria assignment problem. Journal of Heuristics, 6:295-310, 2000.

23. E.L. Ulungu. Optimisation combinatoire multicritère: Détermination de l'ensemble des solutions efficaces et méthodes interactives. Université de Mons-Hainaut, Faculté des Sciences, 313 pages, 1993.


CHAPTER 24

MULTI-OBJECTIVE RECTANGULAR PACKING PROBLEM

Shinya Watanabe
Department of Human and Computational Intelligence,
Ritsumeikan University,
1-1-1 Nojihigashi, Kusatsu, Shiga 525-8577, Japan.
E-mail: sin@sys.ci.ritsumei.ac.jp

Tomoyuki Hiroyasu
Department of Engineering, Doshisha University,
1-3 Tatara Miyakodani, Kyo-tanabe, Kyoto 610-0321, Japan.
E-mail: [email protected]

This chapter describes an implementation of a Multi-Objective Genetic Algorithm (MOGA) for the Multi-Objective Rectangular Packing Problem (RP). RP is a well-known discrete combinatorial optimization problem arising in many applications, such as floor planning in LSI design, the truck packing problem, etc. Over the last 20 years, Evolutionary Algorithms (EAs), including Genetic Algorithms (GAs), have been applied to RP, as EAs are well suited to pattern generation. At the same time, many instances of RP have become multi-objective optimization problems. For example, floor-planning problems should account for the minimum layout area, the minimum wire length, etc. Therefore, RP is a very important application of MOGA. In this chapter, we describe the application of MOGA to Multi-Objective RP. We treat RP as a two-objective optimization problem in order to obtain several critical layout patterns with different aspect ratios of the packing area. We used the Neighborhood Cultivation GA (NCGA) as the MOGA algorithm. NCGA includes not only the mechanisms of effective algorithms, such as NSGA-II and SPEA2, but also a neighborhood crossover mechanism. The results were compared to those obtained using other methods. Through numerical examples, we found that MOGA is a very effective method for RP. In particular, NCGA provided the best solutions as compared to the other methods.

24.1. Introduction

In this chapter, we describe the implementation of a Multi-Objective Genetic Algorithm (MOGA) for the Multi-Objective Rectangular Packing problem (RP). RP is a well-known discrete combinatorial optimization problem arising in many applications, such as the VLSI layout problem2,4,10,12,15,16, the truck packing problem14, etc. As RP is a well-known NP-hard discrete problem, good heuristic methods, such as Genetic Algorithms (GA) or Simulated Annealing (SA), are generally applied.

The VLSI layout problem is one of the most important RP applications, because many VLSI layout problems, such as chip floor planning, standard cell and macro cell digital placement, and analog placement, share the same goal of optimally packing arbitrarily sized blocks. In addition, layout complexity is becoming an important design consideration, as VLSI device integration is doubling every two to three years. Moreover, a floor-layout problem is essentially a multi-objective optimization problem involving the minimum layout area, the minimum wire length, the minimum overlapping area, etc. Therefore, RP is a very important application of MOGA.

As the variety of packings is infinite, the key to successful optimization is the introduction of a finite solution space that includes an optimal solution. We used the sequence-pair to represent a rectangular packing solution. Sequence-pair schemes can represent not only slicing structures but also non-slicing structures.

In this chapter, we describe the application of MOGA to Multi-Objective RP. We treat RP as a two-objective optimization problem to obtain several critical layout patterns, which have different aspect ratios of the packing area. We used the Neighborhood Cultivation GA (NCGA) as the MOGA algorithm17. NCGA includes not only the mechanisms of effective algorithms, such as NSGA-II and SPEA2, but also a neighborhood crossover mechanism. This model can derive good nondominated solutions on typical multi-objective optimization test problems. The results were compared to those obtained with other methods: SPEA2, NSGA-II, and non-NCGA (NCGA without neighborhood crossover). Through numerical examples, we found that MOGA was a very effective method for RP, because several good solutions were found with few iterations in one trial. NCGA obtained the best solutions, with a small layout area, as compared


to the other methods.

Following this introduction, we present the formulation of RP (Section 24.2), followed by a discussion of the application of GA to RP (Section 24.3). Section 24.4 introduces our proposed NCGA. Finally, Section 24.5 presents the results of the experiments on test data.

24.2. Formulation of Layout Problems

Many layout problems in the real world can be treated as rectangular packing problems (RP). RP involves the placement of a given set of rectangular blocks of arbitrary size, without overlap, on a plane within a rectangle of minimum area, and is a well-known discrete combinatorial optimization problem with many applications, such as VLSI layout problems10,11,12, the truck packing problem14, etc.

Thus, solving RP by computer simulation is a difficult and time-consuming problem: the number of possible placements of rectangles increases exponentially with the number of rectangles10.

24.2.1. Definition of RP

Here, we define RP. Let M be a set of m rectangular blocks (blocks) of fixed orientation, whose heights and widths are given as real numbers. A packing of M is a placement of the blocks with no overlap. This problem is NP-hard, and therefore good heuristics are generally required. A packing example is shown in Fig. 24.1.

As the heights and widths of the blocks are real numbers, RP is not simply a combinatorial problem.

24.2.2. Multi-Objective RP

There have been a number of previous studies of multi-objective RP, for example in the structural synthesis of cell-based VLSI circuits1, the placement of power electronic devices on liquid-cooled heat sinks6, the truck packing problem14, etc.

In this chapter, we treat RP as a bi-objective optimization. This multi-objective RP aims to minimize not the packing area alone, but both the width and the height of the packing area. With this formulation, we can obtain various Pareto solutions that have different aspect ratios by performing a single search. Therefore, a decision maker can select the aspect ratio of the packing area.


Fig. 24.1. Example of placement.

Next, we will describe the formulation of the multi-objective RP adopted in this chapter.

min f1(x) = width of packing area of blocks
min f2(x) = height of packing area of blocks

These two objectives have tradeoff relations with each other.
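For concreteness, the two objectives can be computed from a decoded placement as follows. This is a minimal sketch; the placement format, with each block given as a tuple (x, y, w, h), is our assumption, not the chapter's data structure.

```python
def objectives(placement):
    """Bi-objective evaluation of a decoded packing: f1 is the width and f2
    the height of the bounding box enclosing all placed blocks, each block
    given as (x, y, w, h) with its lower-left corner and dimensions."""
    f1 = max(x + w for x, y, w, h in placement)  # width of the packing area
    f2 = max(y + h for x, y, w, h in placement)  # height of the packing area
    return f1, f2
```

Minimizing f1 tends to stretch layouts vertically and minimizing f2 horizontally, which is why the two objectives trade off against each other.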

24.3. Genetic Layout Optimization

GAs have been applied to various aspects of digital VLSI design. Examples include cell placement (layout)4,10,11,12,15,16, channel routing9, test pattern generation13, etc.

There are two key issues in the use of a GA for RP.

• Representation of individuals
• GA operators

These issues have a strong influence on the search ability of the GA. If these points are not carefully considered, it is not possible to obtain good results in a realistic time.

The following sections describe these considerations in some detail.


Fig. 24.2. Slicing structure and Polish expression.

24.3.1. Representations

There are two distinct spatial representations of placement configurations2. The first is the so-called flat or absolute representation used in earlier studies4. In this method, block positions are defined in terms of absolute coordinates on a gridless plane.

As this method allows blocks to overlap in possibly illegal ways, it uses a weighted penalty cost term associated with infeasible overlaps, and this penalty must be driven to zero in the optimization process. However, the total overlap in the final placement solution is not necessarily zero. In addition, the weighted penalty cost must be chosen carefully: if it is too small, the blocks may tend to collapse, while if it is too large, search ability suffers. Moreover, the packing variety of this method is infinite.

In contrast to the flat representation, in topological representations block positions are specified in a relative manner. The most common representations are based on the slicing model, which assumes the blocks are organized in a set of slices that recursively bisect the layout horizontally and vertically. The direction and nesting of the slices is recorded in a slicing tree or, equivalently, in a normalized Polish expression15. In this method, blocks cannot overlap, which may lead to improved efficiency in the placement optimization. However, if the optimal solution is non-slicing, this representation cannot obtain it, as it is restricted to slicing floorplan topologies. Fig. 24.2 shows an example of a slicing structure.

Recently, the sequence-pair, first suggested by Murata et al.10, and the bounded-sliceline grid (BSG), proposed by Nakatake et al.11, have been proposed as solutions to this problem. The sequence-pair encodes the "left-right" and "up-down" positioning relations between blocks using two sequences of blocks. BSG can define orthogonal relations between blocks without physical dimensions.

These methods are particularly suitable for stochastic algorithms, such as GA and simulated annealing (SA). These encoding schemes can represent not only slicing structures but also non-slicing structures.

In this chapter, we used the sequence-pair as the representation of a solution, as this method can perform more effective searches than BSG: the number of sequence-pair combinations is smaller than the number of BSG configurations.

24.3.1.1. Sequence-Pair

The sequence-pair is used to represent a rectangular packing solution. Each block has a sequence-pair (Γ-, Γ+); an example is shown in Fig. 24.3. To express relative positions, blocks are located on the sequence-pair surface. This surface consists of two axes, Γ- and Γ+, which are not vertical and horizontal but lean at 45 degrees. The relative position of two blocks is defined by comparing their sequence-pairs. Let blocks A and B have the sequence-pairs (x_a-, y_a+) and (x_b-, y_b+), respectively. Then the relationship between the positions of the blocks and their sequence-pairs is as follows:

when x_a- < x_b- and y_a+ < y_b+, A is to the left of B;
when x_a- > x_b- and y_a+ > y_b+, A is to the right of B;
when x_a- < x_b- and y_a+ > y_b+, A is above B;
when x_a- > x_b- and y_a+ < y_b+, A is below B.
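The four rules can be checked mechanically from the ranks of the blocks in the two sequences. The sketch below is our illustration, using list indices as the Γ- and Γ+ coordinates; it applies the rules directly and is not the chapter's code.

```python
def relative_position(gamma_minus, gamma_plus, a, b):
    """Relative position of block a with respect to block b, derived from a
    sequence-pair given as two lists of block names."""
    xa, xb = gamma_minus.index(a), gamma_minus.index(b)
    ya, yb = gamma_plus.index(a), gamma_plus.index(b)
    if xa < xb and ya < yb:
        return "left"    # a precedes b in both sequences
    if xa > xb and ya > yb:
        return "right"   # a follows b in both sequences
    if xa < xb and ya > yb:
        return "above"
    return "below"
```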

In addition to the sequence-pair, each block carries orientation information θ, which indicates the direction in which the block is placed.

24.3.1.2. Encoding System

A gene of the GA consists of three parts: Γ-, Γ+, and θ. Fig. 24.3 shows the encoding for 6 blocks.

The relative position (b) is derived from the encoding information (Fig. 24.3(c)); this position determines the floor plan (a). In this chapter, each block is placed either lengthwise or breadthwise; therefore, θ takes a value of 0 or 1.


Fig. 24.3. Encoding example of sequence-pair.

24.3.2. GA Operators

For an effective search, it is necessary to choose appropriate operators. In particular, crossover, which is the primary search operator, must be chosen carefully.

The traditional genetic crossover operator, one-point crossover, cannot be applied without modification to combinatorial problems such as RP or the TSP. Several crossovers for combinatorial problems have been previously proposed.

Three of the most commonly used crossover methods for combinatorial problems16 can be described as follows (Fig. 24.4 illustrates these crossovers).

Order crossover (OX): Pass the left segment from parent 1. Construct the right segment by taking the remaining blocks from parent 2 in the same order.

Partially mapped crossover (PMX): The right segments of both parents act as a partial mapping of pairwise exchanges to be performed on parent 1 to generate the offspring.

Cycle crossover (CX): Start with the block in location 1 of parent 1 (or any other reference point) and copy it to location 1 of the offspring. The block occupying location 1 of parent 2 is then located in parent 1 and passed on to the offspring from that position. This process continues until the cycle is complete, i.e., until we reach a block that has already been copied.

Fig. 24.4. Crossover operators: (a) order crossover, (b) PMX crossover, (c) cycle crossover.
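As an illustration, order crossover (OX) can be sketched for permutation genomes as follows; this is an illustrative sketch under our own naming, not the chapter's implementation.

```python
import random

def order_crossover(parent1, parent2, cut=None):
    """Order crossover (OX): keep the left segment of parent1 up to the cut
    point, then append the remaining blocks in the order in which they
    appear in parent2. The cut point is chosen at random if omitted."""
    if cut is None:
        cut = random.randint(1, len(parent1) - 1)
    left = parent1[:cut]
    right = [block for block in parent2 if block not in left]
    return left + right
```

The child is always a valid permutation of the parent blocks, which is the property that makes OX usable on sequence-pair genomes.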

However, these crossovers cannot provide an efficient search using sequence-pairs, as they do not take into account the features of sequence-pairs.*

Therefore, an effective crossover must consider the position on the oblique grid (Γ-, Γ+) that is set by the two sequences of blocks.

Nakaya et al. proposed a new crossover for the sequence-pair, known as Placement-based Partially Exchanging Crossover (PPEX)12.

*In this chapter, we do not describe the performance of these crossover operators. In our previous experience using sequence-pairs, however, OX obtained the best solutions as compared to the other methods. On the other hand, CX did not provide good solutions.


24.3.2.1. Placement-Based Partially Exchanging Crossover

Here, we used the Placement-based Partially Exchanging Crossover (PPEX)12. PPEX creates a window-territory located in the neighborhood of blocks chosen at random. This window-territory is a contiguous part of the oblique grid that is defined by the sequence-pair. PPEX performs a crossover that exchanges blocks within this window-territory; therefore, PPEX exchanges blocks among neighboring positions. The PPEX procedure is as follows.

Step 1: Two blocks are chosen at random as parent blocks.
Step 2: The window-territory is created in the neighborhood of the chosen blocks. Let Mc be the set of blocks within the window-territory and Mnc be the rest of the blocks.
Step 3: Each block of Mc is exchanged according to the sequence of its partner parent and is copied to the child.
Step 4: The blocks of Mnc are copied directly to the child.

Fig. 24.5 displays PPEX when the window-territory size is 4. In Parent 2, blocks a and e are chosen for Mc, and the blocks of Mc are exchanged. In this exchange, the relative positions in the other parent are referenced. Then, these blocks are copied to the child: with the location information of Parent 1, blocks a, e and f are moved and then copied to child 2.

24.3.2.2. Mutation Operator

In this chapter, the mutation operator flips the orientation bit θ of a block: if θ is 1, it becomes 0; if θ is 0, it becomes 1.

24.4. Multi-Objective Optimization Problems by Genetic Algorithms and Neighborhood Cultivation GA

24.4.1. Multi-Objective Optimization Problems and Genetic Algorithm

Several objectives are used in multi-objective optimization problems. These objectives usually cannot be minimized or maximized at the same time, due to the tradeoff relationships among them7. Therefore, one of the goals of multi-objective optimization is to find a set of Pareto optimal solutions.


Fig. 24.5. Placement-based Partially Exchanging Crossover (PPEX).

The Genetic Algorithm is an algorithm that simulates the heredity and evolution of living things7. As it is a multi-point search method, an optimum solution can be determined even when the landscape of the objective function is multi-modal. It can also find a Pareto optimum set in one trial in multi-objective optimization. As a result, the GA is a very effective tool for multi-objective optimization problems. There is a great deal of research concerned with multi-objective GAs, and many new evolutionary algorithms for multi-objective optimization have been developed recently3,5,7,8,18.

Multi-objective genetic algorithms can be roughly divided into two categories: algorithms that treat Pareto optimal solutions implicitly and those that treat Pareto optimal solutions explicitly7. Many of the newest methods treat Pareto optimal solutions explicitly.

Typical algorithms that treat Pareto optimal solutions explicitly include NSGA-II5 and SPEA218. These algorithms share the following schemes:

1) A mechanism for retaining nondominated solutions
2) A cut-down (sharing) method for maintaining diversity among the retained nondominated solutions
3) A mechanism for unifying the values of each objective

These mechanisms derive good Pareto optimal solutions. Consequently, a competitive multi-objective genetic algorithm should have these mechanisms.

24.4.2. Neighborhood Cultivation Genetic Algorithm

In this section, we describe the mechanism of a new algorithm called the Neighborhood Cultivation Genetic Algorithm (NCGA). NCGA has a neighborhood crossover mechanism in addition to the mechanisms of the GAs explained in the previous section. In GAs, exploration and exploitation are both very important: by exploitation, optimum solutions can be sought around the elite solutions, while by exploration, an optimum solution can be sought in the global area. In NCGA, the exploitation factor of the crossover is reinforced. In the crossover operation of NCGA, a pair of individuals is not chosen randomly; rather, individuals that are close to each other are chosen. As a result of this operation, the child individuals generated by the crossover are likely to be close to the parent individuals, and therefore precise exploitation can be expected.

Let us denote the search population at generation t by Pt and the archive population at generation t by At. Using these notations, the overall flow of NCGA can be described as follows.

Step 1: Initialization: Generate an initial population P0 of size N. Set t = 0. Calculate the fitness values of the initial individuals in P0. Copy P0 into A0; the archive size is also N.
Step 2: Start a new generation: set t = t + 1.
Step 3: Generate the new search population: Pt = At-1.
Step 4: Sorting: The individuals of Pt are sorted according to the values of the focused objective. The focused objective is changed at every generation. For example, when there are three objectives, the first objective is focused in the first generation and the third objective in the third generation; the first objective is focused again in the fourth generation.

Step 5: Grouping: Pt is divided into groups consisting of two individuals. These two individuals are chosen from the top to the bottom of the sorted individuals.

Fig. 24.6. Neighborhood crossover.

Step 6: Crossover and Mutation: In each group, crossover and mutation operations are performed. From two parent individuals, two child individuals are generated; the parent individuals are then eliminated.

Step 7: Evaluation: All of the objectives of the individuals are derived.
Step 8: Assembling: All the individuals are assembled into one group, and this becomes the new Pt.
Step 9: Renewing archives: Assemble Pt and At-1 together. N individuals are chosen from these 2N individuals. To reduce the number of individuals, the same operation as in SPEA2 (environmental selection) is performed. In NCGA, this environmental selection serves as the selection operation.

Step 10: Termination: Check the termination condition. If it is satisfied, the simulation terminates; if not, it returns to Step 2.
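The ten steps above can be condensed into the following control-flow skeleton. Every helper passed in (evaluate, crossover, mutate, environmental_selection) is an assumed placeholder, so this is a sketch of the flow rather than the authors' implementation; the neighborhood shuffle of the next section is also omitted for brevity.

```python
def ncga(init_pop, evaluate, crossover, mutate, environmental_selection,
         n_objectives, max_generations):
    """Skeleton of the NCGA flow (Steps 1-10)."""
    archive = [(ind, evaluate(ind)) for ind in init_pop]              # Step 1
    for t in range(1, max_generations + 1):                           # Steps 2, 10
        focused = (t - 1) % n_objectives                              # rotate the focused objective
        ranked = sorted(archive, key=lambda pair: pair[1][focused])   # Step 4
        pop = [ind for ind, _ in ranked]                              # Step 3: P_t = A_{t-1}
        children = []
        for i in range(0, len(pop) - 1, 2):                           # Step 5: neighboring pairs
            c1, c2 = crossover(pop[i], pop[i + 1])                    # Step 6
            children.extend([mutate(c1), mutate(c2)])
        evaluated = [(c, evaluate(c)) for c in children]              # Steps 7-8
        archive = environmental_selection(archive + evaluated,        # Step 9
                                          len(init_pop))
    return archive
```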

In NCGA, most of the genetic operations are performed within groups consisting of two individuals.

The neighborhood crossover is performed as the crossover operation on a population that is sorted according to the values of the focused objective. As two adjacent individuals of the sorted population are relatively close to each other in objective space, a "neighborhood crossover" is realized by using two adjacent individuals. The concept of neighborhood crossover is shown in Fig. 24.6.


Multi-Objective Rectangular Packing Problem 593

However, if the focused objective has completely converged, applying crossover to a pair of individuals may cause no changes at the final stages of the search. Therefore, we use the following techniques within our crossover operator:

1) The focused objective is changed at every generation.

2) The sorted population is slightly disturbed by using "neighborhood shuffle".

The focused objective is changed one by one at every generation. For example, when there are three objectives, the first objective is focused in the first generation and the third objective is focused in the third generation. The first objective is focused again in the fourth generation.

The "neighborhood shuffle" is a technique, which randomly shuffles thepopulation within a definite range. The range of neighborhood shuffle isdefined as 10 percent of the population size. For example, when the popu-lation size is 100, the population is randomly shuffled within a range of size10.

By using these techniques, the parents subject to crossover are changed at every generation, even if the population stays unchanged. In addition, the exchange of individuals becomes more active.

The following features of NCGA are the differences from SPEA2 and NSGA-II:

1) NCGA has a neighborhood crossover mechanism.

2) NCGA has only environment selection and does not have mating selection.

24.5. Numerical Examples

In this chapter, we describe the application of NCGA to some numerical experiments. We used four instances of this problem: ami33, ami49, rdm100, and rdm500. The instances ami33 and ami49, whose data are in the MCNC benchmark, consist of 33 and 49 blocks (rectangles). The instances rdm100 and rdm500 were randomly generated and have 100 and 500 rectangles, respectively.

The results were compared with those of SPEA2 18, NSGA-II 5, and non-NCGA. Non-NCGA is the same algorithm as NCGA but without neighborhood crossover.

If there are diverse solutions that have the same design variables, neighborhood crossover may not perform effectively. Therefore, the search population (Pt) is produced by making a copy of the archive population (At).



Table 24.96. GA parameters.

    population size      200
    crossover rate       1.0
    mutation rate        1/(bit length)
    terminal generation  400

Fig. 24.7. Sampling of the Pareto frontier lines of intersection

24.5.1. Parameters of GAs

Table 24.96 displays the GA parameters used. We used the previously described GA operators: PPEX and the bit flip of block orientation. The length of the chromosome is three times the number of blocks.

24.5.2. Evaluation Methods

To compare the results obtained by each algorithm, the following evaluation methods were used.

24.5.2.1. Sampling of the Pareto Frontier Lines of Intersection (ILI)

This comparison method was reported by Knowles and Corne 8. The concept of this method is shown in Fig. 24.7. This figure illustrates two solution sets, X and Y, derived by the different methods.

The comparison procedure consists of the following three steps. Firstly, the attainment surfaces defined by the approximation sets are calculated. Secondly, the uniform sampling lines that cover the Pareto tradeoff area are defined. For each line, the intersections of the line and the attainment surfaces of the derived sets are obtained. These intersections are then compared. Finally, the Indication of Lines of Intersection (ILI) is derived. When the



Fig. 24.8. Example of IMMA.

two approximation sets X and Y are considered, ILI(X, Y) indicates the average number of points of X that are ranked higher than those of Y. Therefore, the most significant outcome would be ILI(X, Y) = 1.0 and ILI(Y, X) = 0.0.

To focus only on the Pareto tradeoff area as defined by the approximation sets and to derive an intuitive evaluation value, the following terms are considered:

• The objective values of the approximation sets are normalized.

• The sampling lines are located in the area where the approximation sets exist.

• Many sampling lines are prepared. In the following experiment, 1000 lines were used.
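A simplified two-objective, two-set sketch of this measure (our own reconstruction, with the normalization step omitted): for minimization, a ray shot from the ideal point in a direction d with positive components crosses a set's attainment surface at min over its points p of max_i (p_i / d_i), so ILI can be estimated by counting on how many rays one surface lies strictly closer to the ideal point. The tie handling is our assumption.

```python
import math

def ray_hit(points, d):
    # distance along direction d (all d_i > 0) at which the ray from the
    # ideal point crosses the attainment surface of `points` (minimization)
    return min(max(p[i] / d[i] for i in range(len(d))) for p in points)

def ili(X, Y, lines=1000):
    # fraction of sampling lines on which X's surface is strictly closer
    # to the ideal point than Y's; ties count for neither set
    wins = 0
    for k in range(lines):
        theta = (k + 0.5) * (math.pi / 2) / lines   # spread rays over the quadrant
        d = (math.cos(theta), math.sin(theta))
        if ray_hit(X, d) < ray_hit(Y, d):
            wins += 1
    return wins / lines

# Y is X shifted away from the ideal point, so X wins on every sampling line:
X = [(1.0, 3.0), (2.0, 1.5), (3.0, 1.0)]
Y = [(2.0, 4.0), (3.0, 2.5), (4.0, 2.0)]
```

With these sets, ili(X, Y) is 1.0 and ili(Y, X) is 0.0, the "most significant outcome" described above.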

24.5.2.2. Maximum, Minimum and Average Values of Each Objective of Derived Solutions (IMMA)

To evaluate the derived solutions, not only the accuracy but also the spread of the solutions is important. To discuss the spread of the solutions, the maximum, minimum, and average values of each objective are considered. Figure 24.8 shows an example of this measurement. In this figure, the maximum and minimum values of the objective function are illustrated, and the average value is shown as a circle.
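A minimal reading of IMMA as per-objective statistics over a derived solution set (the function shape and the toy set are our assumptions):

```python
def imma(solutions):
    # maximum, minimum, and average of each objective over the derived set
    m = len(solutions[0])
    return [{"max": max(s[i] for s in solutions),
             "min": min(s[i] for s in solutions),
             "avg": sum(s[i] for s in solutions) / len(solutions)}
            for i in range(m)]

stats = imma([(1.0, 4.0), (2.0, 2.0), (3.0, 1.0)])   # invented two-objective set
```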

24.5.3. Results

In this chapter, we examined four problem instances: ami33, ami49, rdm100, and rdm500. In this section, we discuss only the instances ami33 and rdm500.

The proposed NCGA, SPEA2, NSGA-II, and non-NCGA (NCGA without



Fig. 24.9. Placement of the blocks (ami33).

neighborhood crossover) were applied to these problems. Thirty trials were performed and all results shown are the averages of the 30 trials.

24.5.3.1. Layout of the Solution

It should be verified whether the solutions derived by the algorithm are appropriate placements of blocks. In this section, we focus on ami33, which consists of 33 blocks. The placement of ami33, as represented by solutions of NCGA, is shown in Fig. 24.9.

Some of the typical solutions are illustrated in Fig. 24.9. As this is a combinatorial problem whose search space contains N! x N! x 2^N candidate placements for N blocks, the real optimum solutions were not derived. In this experiment, 80,000 function calls (200 individuals and 400 generations) were performed. These results may be reasonable, as there were very few blank spaces. We also used the sequence-pair and PPEX to derive good solutions, as these techniques are very suitable



Fig. 24.11. IMMA of ami33.

for GAs and RP.
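The size of the sequence-pair search space quoted above (N! x N! x 2^N placements for N blocks) can be checked directly; even for ami33 it is astronomically large, which is why exhaustive search for the true optima is hopeless:

```python
import math

def sequence_pair_space(n):
    # two permutations of n blocks plus one orientation bit per block
    return math.factorial(n) ** 2 * 2 ** n

size_ami33 = sequence_pair_space(33)    # roughly 6.5e83 candidate placements
```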

24.5.3.2. ami33

The ILI results of ami33 are shown in Fig. 24.10, and those of IMMA are shown in Fig. 24.11. Fig. 24.12 shows the nondominated solutions of each algorithm. In this figure, all nondominated solutions derived from the 30 trials are plotted.

ILI of Fig. 24.10 indicates that the solutions of NCGA are closer to the real Pareto solutions than those obtained by the other methods. This is also confirmed by the plots of the nondominated solutions (Fig. 24.12). It is also clear from IMMA of Fig. 24.11 that NCGA and non-NCGA can find more widely spread nondominated solutions as compared to the other methods.

Non-NCGA can obtain widely spread nondominated solutions. However, as compared to the real Pareto solutions, non-NCGA is not ideal. This result



Fig. 24.12. Nondominated solutions (ami33).

Fig. 24.13. Results of ILI (rdm500).

shows that neighborhood crossover can derive good solutions in RP.

24.5.3.3. rdm500

The results of rdm500 are shown in Fig. 24.13 and Fig. 24.14. Fig. 24.15 illustrates the nondominated solutions of the different algorithms.

The results from this problem showed a similar trend to those of the previous problem. From Fig. 24.13 and Fig. 24.15, it is clear that NCGA



Fig. 24.14. IMMA of rdm500

Fig. 24.15. Nondominated solutions (rdm500).

obtained a better value of ILI; i.e., the solutions of NCGA were much better than those of the other methods. Similarly to the previous problem, the solutions of non-NCGA were far from the real Pareto front. Therefore, the neighborhood crossover was very effective in deriving good solutions in RP, irrespective of the number of blocks.

On the other hand, in this problem, the solutions of SPEA2 and NSGA-II were gathered around the center of the Pareto front. These observations indicate that SPEA2 and NSGA-II tend to concentrate in one part of the Pareto front when the number of blocks is very large. On the other hand,



Fig. 24.14 and Fig. 24.15 indicate that NCGA and non-NCGA maintained high degrees of diversity of their solutions during the search even if the number of blocks was very large.

24.6. Conclusion

In this chapter, we described the implementation of MOGAs for the Multi-Objective Rectangular Packing Problem (RP). We described the formulation of RP, the implementation of a GA for RP, and our experience with RP using GAs.

The main issues associated with the implementation of a GA for RP are the representation of a solution and the appropriate GA operators. In this chapter, we explained the sequence-pair as an effective representation of placement, and PPEX as an effective crossover in cases using the sequence-pair.

In addition, based on our experience using GAs for RP, the Neighborhood Cultivation GA (NCGA), which has not only the important mechanisms of the other methods but also the mechanism of neighborhood crossover, was applied to the multi-objective RP. We confirmed that MOGAs are a very effective approach for RP. In addition, NCGA obtained the best solutions as compared to the other methods. Through numerical examples, the following points were clarified.

1) The RP described in this chapter is a large-scale problem. For this problem, a reasonable solution was derived with a small calculation cost. It is assumed that the sequence-pair and PPEX work well in this problem.

2) In almost all the test instances, the results of NCGA were superior to those of the other methods. From this result, it can be concluded that NCGA is a good method for RP.

3) NCGA was obviously superior to NCGA without neighborhood crossover in all problems. The results emphasize that neighborhood crossover allows the derivation of good solutions in RP.

4) When the number of blocks is very large, the solutions of SPEA2 and NSGA-II tend to concentrate in the center of the Pareto front. However, NCGA and non-NCGA could retain the diversity of the solutions.

References

1. T. Arslan, D. H. Horrocks, and E. Ozdemir. Structural synthesis of cell-based VLSI circuits using a multi-objective genetic algorithm. In IEE Electronics Letters, volume 32, pages 651-652, 1996.



2. F. Balasa and K. Lampaert. Symmetry within the sequence-pair representation in the context of placement for analog design. In IEEE Trans. on Computer-Aided Design of ICs and Systems, volume 19, pages 721-731, 2000.

3. C. A. Coello Coello, D. A. Van Veldhuizen, and G. B. Lamont. Evolutionary Algorithms for Solving Multi-Objective Problems. Kluwer Academic Publishers, New York, May 2002. ISBN 0-3064-6762-3.

4. J. P. Cohoon and W. D. Paris. Genetic placement. In Proceedings of the IEEE International Conference on Computer-Aided Design, pages 422-425, 1986.

5. K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan. A Fast and Elitist Multiobjective Genetic Algorithm: NSGA-II. IEEE Transactions on Evolutionary Computation, 6(2):182-197, April 2002.

6. D. Gopinath, Y. K. Joshi, and S. Azarm. Multi-objective placement optimization of power electronic devices on liquid cooled heat sinks. In the Seventeenth Annual IEEE Symposium on Semiconductor Thermal Measurement and Management, pages 117-119, 2001.

7. K. Deb. Multi-Objective Optimization using Evolutionary Algorithms. Chichester, UK: Wiley, 2001.

8. J. D. Knowles and D. W. Corne. Approximating the Nondominated Front Using the Pareto Archived Evolution Strategy. In Evolutionary Computation, volume 8, pages 149-172, 2000.

9. J. Lienig and K. Thulasiraman. A genetic algorithm for channel routing in VLSI circuits. In Evolutionary Computation, volume 1, pages 293-311, 1994.

10. H. Murata, K. Fujiyoshi, S. Nakatake, and Y. Kajitani. VLSI Module Placement Based on Rectangle-Packing by the Sequence-Pair. In IEEE Transactions on Computer Aided Design, volume 15, pages 1518-1524, 1996.

11. S. Nakatake, H. Murata, K. Fujiyoshi, and Y. Kajitani. Module Placement on BSG-Structure and IC Layout Applications. In Proc. of International Conference on Computer Aided Design '96, pages 484-491, 1996.

12. S. Nakaya, S. Wakabayashi, and T. Koide. An adaptive genetic algorithm for VLSI floorplanning based on sequence-pair. In 2000 IEEE International Symposium on Circuits and Systems (ISCAS 2000), volume 3, pages 65-68, 2000.

13. M. J. O'Dare and T. Arslan. Generating test patterns for VLSI circuits using a genetic algorithm. In IEE Electronics Letters, volume 30, pages 778-779, 1994.

14. P. Grignon, J. Wodziack, and G. M. Fadel. Bi-objective optimization of components packing using a genetic algorithm. In NASA/AIAA/ISSMO Multidisciplinary Design and Optimization Conference, pages 352-362, 1996.

15. V. Schnecke and O. Vornberger. An adaptive parallel genetic algorithm for VLSI-layout optimization. In 4th Conf. Parallel Problem Solving from Nature (PPSN IV), pages 859-868, 1996.

16. K. Shahookar and P. Mazumder. A genetic approach to standard cell placement using meta-genetic parameter optimization. In IEEE Transactions on Computer-Aided Design, volume 9, pages 500-511, 1990.

17. S. Watanabe, T. Hiroyasu, and M. Miki. Neighborhood cultivation genetic



algorithm for multi-objective optimization problems. In Proceedings of the 4th Asia-Pacific Conference on Simulated Evolution And Learning (SEAL-2002), pages 198-202, 2002.

18. E. Zitzler, M. Laumanns, and L. Thiele. SPEA2: Improving the Strength Pareto Evolutionary Algorithm. In K. Giannakoglou, D. Tsahalis, J. Periaux, P. Papailou, and T. Fogarty, editors, EUROGEN 2001. Evolutionary Methods for Design, Optimization and Control with Applications to Industrial Problems, pages 95-100, Athens, Greece, 2002.


CHAPTER 25

MULTI-OBJECTIVE ALGORITHMS FOR ATTRIBUTE SELECTION IN DATA MINING

Gisele L. Pappa and Alex A. Freitas

Computing Laboratory, University of Kent, Canterbury CT2 7NF, UK
E-mail: {glp6,A.A.Freitas}@kent.ac.uk
http://www.cs.kent.ac.uk/people/staff/aaf

Celso A. A. Kaestner

Graduate Program in Applied Computer Science
Pontificia Universidade Catolica do Parana (PUCPR)
Rua Imaculada Conceicao, 1155
80215-901 Curitiba - PR - Brazil
E-mail: [email protected]

Attribute selection is an important preprocessing task for the application of a classification algorithm to a given data set. This task often involves the simultaneous optimization of two or more objectives. In order to solve this problem, this chapter describes two multi-objective methods: a genetic algorithm and a forward sequential feature selection method. Both methods are based on the wrapper approach for attribute selection and were used to find the best subset of attributes that minimizes the classification error rate and the size of the decision tree built by a well-known classification algorithm, namely C4.5.

25.1. Introduction

Attribute selection is one of the most important preprocessing tasks to be performed before the application of data mining techniques. In essence, it consists of selecting a subset of attributes relevant for the target data mining task, out of all original attributes. In this work the target task is classification, where the goal is to predict the class of an example (record) given the values of the attributes describing that example. Attribute selection became essential when researchers discovered it can improve the data mining algorithm's performance (with respect to learning speed, classification rate and/or rule set simplicity) and at the same time remove noise and decrease data dimensionality.

In face of the importance of attribute selection, a variety of methods have been used in order to find a small attribute subset capable of obtaining a better classification rate than that obtained with the entire attribute set. These methods include sequential search1, ranking techniques2 and evolutionary algorithms3.

Independently of the method used to solve the attribute selection problem, solving it often requires the minimization of at least two objectives: the classification error rate and a measure of size — which can be a measure of size of the selected data (typically the number of selected attributes) and/or a measure of size of the classifier (say, a rule set) learned from the selected data. Many attribute selection methods optimize these objectives by setting weights for each one and combining them in a single function.

However, the study of multi-objective optimization has shown that, in some tasks, a weighted combination of the objectives to be optimized in a single function is not the most effective approach to solve the problem. Mainly in tasks that deal with the optimization of conflicting objectives, such as attribute selection, the use of the Pareto dominance concept during optimization can be the best choice.

Optimization based on the Pareto concept4 suggests that, for each of the conflicting objectives to be optimized, there exists an optimal solution. So, the final response of the optimization system is a set of optimal solutions instead of a single solution. This is in contrast with systems that intend to optimize a single objective. Hence, it is left to the user to decide which of the optimal solutions he/she considers the best to solve his/her problem, using his/her background knowledge about the problem.

In this spirit, this work presents two multi-objective attribute selection algorithms based on the Pareto dominance concept. One of them is a multi-objective genetic algorithm, and the other one is a multi-objective version of the well-known forward sequential feature selection method. Both methods use the wrapper approach (see next section) in order to minimize the error rate and the size of the decision tree built by a well-known classifier, namely C4.5.

We report the results of extensive computational experiments with 18 public domain real-world data sets, comparing the performance of these



two methods. The results show that both methods effectively select good attribute subsets — by comparison with the original set of all attributes — and, somewhat surprisingly, the multi-objective forward sequential selection method is competitive with the multi-objective genetic algorithm.

25.2. Attribute Selection

As mentioned earlier, attribute selection is an important step in the knowledge discovery process and aims to select a subset of attributes that are relevant for a target data mining task. In the classification task, which is the task addressed in this work, an attribute is considered relevant if it is useful for discriminating examples belonging to different classes.

Many attribute selection methods can be found in the literature. These methods differ mainly in the search strategy they use to explore the space of candidate attribute subsets and in the way they measure the quality of a candidate attribute subset.

With respect to the search strategy, the methods can be classified as exponential (e.g. exhaustive search), randomized (e.g. genetic algorithms) and sequential. The exponential methods are usually too computationally expensive, and so are not further discussed here.

The sequential methods include the well-known FSS (forward sequential selection) and BSS (backward sequential selection)5. FSS starts with an empty set of attributes (features) and iteratively selects one attribute at a time — the attribute considered most relevant for classification at the current step — until classification accuracy cannot be improved by selecting another attribute. BSS starts with the full set of original attributes and iteratively removes one attribute at a time — the attribute considered least relevant for classification at the current step — as long as classification accuracy is not decreased. We have developed a multi-objective version of the FSS method, which will be described later.
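The single-objective FSS loop can be sketched as follows; the wrapper-style `evaluate` callback and the toy accuracy function (which rewards two "relevant" attributes and penalizes extras) are invented for illustration, not taken from the chapter.

```python
def forward_sequential_selection(n_attrs, evaluate):
    # greedily add the attribute that most improves accuracy; stop when none does
    selected, best = set(), evaluate(set())
    while True:
        gains = [(evaluate(selected | {a}), a)
                 for a in range(n_attrs) if a not in selected]
        gains = [(score, a) for score, a in gains if score > best]
        if not gains:
            return selected                  # no attribute improves accuracy
        best, winner = max(gains)
        selected = selected | {winner}

def toy_accuracy(subset):
    # pretend attributes 0 and 2 are the relevant ones
    return 0.5 + 0.2 * len(subset & {0, 2}) - 0.05 * len(subset - {0, 2})

chosen = forward_sequential_selection(5, toy_accuracy)   # settles on {0, 2}
```

A BSS sketch would be symmetric: start from the full set and drop the least useful attribute while accuracy does not decrease.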

With respect to randomized methods, in this chapter we are particularly interested in genetic algorithms, due to their ability to perform a global search in the solution space. In our case, this means that they tend to cope better with attribute interaction than greedy, local-search methods (such as sequential methods)3. We have also developed a multi-objective genetic algorithm (GA) for attribute selection, which will be described later.

The evaluation of the quality of each candidate attribute subset can be based on two approaches: the filter or the wrapper approach. The main difference between them is that in the wrapper approach the evaluation



function uses the target classification algorithm to evaluate the quality of a candidate attribute subset. This is not the case in the filter approach, where the evaluation function is specified in a generic way, regardless of the classification algorithm. That is, in the wrapper approach the quality of a candidate attribute subset depends on the performance of the classification algorithm trained only with the selected attributes. This performance can be measured with respect to several factors, such as classification accuracy and size of the classifier learned from the selected data. Indeed, these are the two performance measures used in this work, as will be seen later.

Although the wrapper approach tends to be more expensive than the filter approach, the wrapper approach usually obtains better predictive accuracy than the filter approach, since it finds an attribute subset "customized" for the target classification algorithm.

The vast majority of GAs for attribute selection follow the wrapper approach. Table 25.97, adapted from Freitas3, shows the criteria used in the fitness function of a number of GAs for attribute selection following the wrapper approach.

As can be observed in Table 25.97, there are many criteria that can be used in the fitness of a GA for attribute selection, but all the GAs mentioned in the table use classification accuracy, and many GAs use either the number of selected attributes or the size of the classifier learned from the data. Note that only one of the GAs mentioned in Table 25.97 is a multi-objective method — all the other GAs either try to optimize a single objective (predictive accuracy) or use some method (typically a weighted formula) to combine two or more objectives into a single objective to be optimized.

25.3. Multi-Objective Optimization

Real-world problems are usually complex and require the optimization of many objectives to reach a good solution. Unfortunately, many projects that should involve the simultaneous optimization of multiple objectives avoid the complexities of such optimization, and adopt the simpler approach of just weighting and combining the objectives into a single function. This simpler approach is not very effective in many cases, due to at least two reasons. First, the objectives are often conflicting with each other. Second, the objectives often represent different and non-commensurate aspects of a candidate solution's quality, so that mixing them into a single formula is not semantically meaningful. Indeed, both reasons hold in our case, where the



Table 25.97. Main aspects of fitness functions of GAs for attribute selection.

[Bala et al. 1995]6: predictive accuracy, number of selected attributes
[Bala et al. 1996]7: predictive accuracy, information content, number of selected attributes
[Chen et al. 1999]8: based first on predictive accuracy, and then on number of selected attributes
[Guerra-Salcedo & Whitley 1998]9: predictive accuracy
[Guerra-Salcedo et al. 1999]10: predictive accuracy
[Cherkauer & Shavlik 1996]11: predictive accuracy, number of selected attributes, decision-tree size
[Terano & Ishino 1998]12: subjective evaluation, predictive accuracy, rule set size
[Vafaie & DeJong 1998]13: predictive accuracy
[Yang & Honavar 1997, 1998]14,15: predictive accuracy, attribute cost
[Moser & Murty 2000]16: predictive accuracy, number of selected attributes
[Ishibuchi & Nakashima 2000]17: predictive accuracy, number of selected instances, number of selected attributes (attribute and instance selection)
[Emmanouilidis et al. 2000]18: predictive accuracy, number of selected attributes (multi-objective evaluation)
[Rozsypal & Kubat 2003]19: predictive accuracy, number of selected instances, number of selected attributes (attribute and instance selection)
[Llora & Garrell 2003]20: predictive accuracy

two objectives to be minimized, classification error rate and decision-tree size, are to some extent conflicting and entirely non-commensurate.

According to the multi-objective optimization concept, when many objectives are simultaneously optimized, there is no single optimal solution. Rather, there is a set of optimal solutions, each one representing a certain trade-off among the objectives21. In this way, a system developed to solve this kind of problem returns a set of optimal solutions, and it can be left to the user to choose the one that best solves his/her specific problem. This means that the user has the opportunity of choosing the solution that represents the best trade-off among the conflicting objectives after examining several high-quality solutions. Intuitively, this is better than forcing the user to define a single trade-off before the search is performed, which is what happens when the multi-objective problem is transformed into a single-objective one.



Pareto's multi-objective optimization concept is used to find this set of optimal solutions. According to this concept, a solution S1 dominates a solution S2 if and only if4:

• Solution S1 is not worse than solution S2 in any of the objectives;

• Solution S1 is strictly better than solution S2 in at least one of the objectives.
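The two conditions above transcribe directly into code (minimization assumed); the (decision-tree size, error rate) values below are invented points in the spirit of the solutions A, C and D discussed in the text.

```python
def dominates(s1, s2):
    # S1 is not worse in any objective, and strictly better in at least one
    no_worse = all(a <= b for a, b in zip(s1, s2))
    strictly_better = any(a < b for a, b in zip(s1, s2))
    return no_worse and strictly_better

# hypothetical (decision-tree size, error rate) points:
A = (2.0, 9.0)    # small tree, large error
C = (10.0, 3.0)
D = (9.0, 2.0)    # large tree, small error; dominates C
```

As the text argues, neither A nor D dominates the other, while C is dominated by D.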

Figure 25.1 shows an example of possible solutions found for a multi-objective attribute selection problem. The solutions that are not dominated by any other solutions are considered Pareto-optimal solutions, and they are represented by the dotted line in Figure 25.1.

Fig. 25.1. Example of Pareto dominance in a two-objective problem

Note that Solution A has a small decision-tree size but a large error rate. Solution D has a large decision-tree size but a small error rate. Assuming that minimizing both objectives is important, one cannot say that solution A is better than D, nor vice-versa. On the other hand, solution C is clearly not a good solution, since it is dominated, for instance, by D.

25.4. The Proposed Multi-Objective Methods for AttributeSelection

In the last few years, the use of multi-objective optimization has led to improved solutions for many different kinds of problems21. So, in order to evaluate the effectiveness of the multi-objective framework in the attribute selection problem for the classification task, we proposed a multi-objective



genetic algorithm22 (MOGA) that returns a set of non-dominated solutions. We also proposed a multi-objective version of the forward sequential selection (FSS) method23.

The goal of these proposed algorithms is to find a subset of relevant attributes that leads to a reduction in both the classification error rate and the complexity (size) of the decision tree built by a data mining algorithm.

The classification algorithm used in this chapter is C4.525, a well-known decision tree induction algorithm. The proposed methods are based on the wrapper approach, which means they use the target data mining algorithm (C4.5) to evaluate the quality of the candidate attribute subsets. Hence, the methods' evaluation functions are based on the error rate and on the size of the decision tree built by C4.5. These two criteria (objectives) are to be minimized according to the concept of Pareto dominance.

The next subsections present the main aspects of the proposed methods. The reader is referred to Pappa22,23 for further details.

25.4.1. The Multi-Objective Genetic Algorithm (MOGA)

A genetic algorithm (GA) is a search algorithm inspired by the principle of natural selection. It works by evolving a population of individuals, where each individual is a candidate solution to a given problem. Each individual is evaluated by a fitness function, which measures the quality of its corresponding solution. At each generation (iteration) the fittest (the best) individuals of the current population survive and produce offspring resembling them, so that the population gradually contains fitter and fitter individuals — i.e., better and better candidate solutions to the underlying problem. For a comprehensive review of GAs in general, the reader is referred to Michalewicz24. For a comprehensive review of GAs applied to data mining, the reader is referred to Freitas3.

The motivation for developing a multi-objective GA for attribute selection was that: (a) GAs are a robust search method, capable of effectively exploring the large search spaces often associated with attribute selection problems; (b) GAs perform a global search, so that they tend to cope better with attribute interaction than greedy search methods, which is also an important advantage in attribute selection; and (c) GAs already work with a population of candidate solutions, which makes them naturally suitable for multiobjective problem solving4, where the search algorithm is required to consider a set of optimal solutions at each iteration.



25.4.1.1. Individual Encoding

In the proposed GA, each individual represents a candidate subset of selected attributes, out of all original attributes. Each individual consists of M genes, where M is the number of original attributes in the data being mined. Each gene can take on the value 1 or 0, indicating that the corresponding attribute occurs or not (respectively) in the candidate subset of selected attributes.
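For instance, with M = 4 hypothetical attributes (the attribute names below are invented for illustration), a genome and its decoding look like this:

```python
ATTRIBUTES = ["age", "income", "height", "weight"]   # the M original attributes

def decode(genome):
    # 0/1 genome -> the candidate subset of selected attributes
    return [a for a, g in zip(ATTRIBUTES, genome) if g == 1]

def encode(subset):
    # attribute subset -> 0/1 genome of length M
    return [1 if a in subset else 0 for a in ATTRIBUTES]

genome = [1, 0, 0, 1]                                # selects "age" and "weight"
```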

25.4.1.2. Fitness Function

The fitness (evaluation) function measures the quality of a candidate attribute subset represented by an individual. Following the principle of multi-objective optimization, the fitness of an individual consists of two quality measures: (a) the error rate of C4.5; and (b) the size of the decision tree built by C4.5. Both (a) and (b) are computed by running C4.5 with the individual's attribute subset only, and by using a hold-out method to estimate C4.5's error rate, as follows. First, the training data is partitioned into two mutually-exclusive data subsets, the building subset and the validation subset. Then we run C4.5 using as its training set only the examples (records) in the building subset. Once the decision tree has been built, it is used to classify examples in the validation set.
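The hold-out evaluation above can be sketched as follows. C4.5 itself is not reproduced here; the `train_fn` hook is a hypothetical stand-in for building the decision tree on the building subset, returning a classify function and the size of the resulting tree.

```python
import random

def holdout_fitness(individual, data, train_fn, seed=0):
    """Two-objective fitness (error rate, tree size) of the attribute
    subset encoded by `individual` (a 0/1 gene list).

    `data` is a list of (feature_vector, class_label) examples.
    `train_fn(building_set)` stands in for running C4.5: it must return
    (classify_fn, tree_size). This interface is an assumption for the
    sketch, not the chapter's actual C4.5 invocation.
    """
    selected = [i for i, gene in enumerate(individual) if gene == 1]
    # Project every example onto the selected attributes only.
    projected = [([x[i] for i in selected], label) for x, label in data]
    # Hold-out: partition the training data into building and validation sets.
    rng = random.Random(seed)
    shuffled = projected[:]
    rng.shuffle(shuffled)
    cut = (2 * len(shuffled)) // 3
    building, validation = shuffled[:cut], shuffled[cut:]
    classify, tree_size = train_fn(building)
    errors = sum(1 for x, label in validation if classify(x) != label)
    return errors / len(validation), tree_size
```

Both returned values are to be minimized; they form the fitness vector used by the dominance comparisons described below in the chapter.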

25.4.1.3. Selection Methods and Genetic Operators

At each generation (iteration) of the GA, the next population of individuals is formed as follows. First the GA selects all the non-dominated individuals of the current generation, which are then passed unaltered to the next generation by elitism26. Elitism is a common procedure in MOGAs. It prevents non-dominated individuals from disappearing from the population due to the stochastic nature of selection operators. However, a maximum number of elitist individuals has to be fixed, to avoid the next population consisting only of elitist individuals, which would prevent the creation of new individuals and stop the evolutionary process. This maximum number of elitist individuals was set to half the population size. If the number of non-dominated individuals is larger than half the population size, that number of elitist individuals is chosen by the tie-breaking criterion explained later.

Once elitist reproduction has been performed, the remainder of the next generation's population is filled in with new "children" individuals, generated from "parent" individuals of the current generation. The parent


Multi-Objective Algorithms for Attribute Selection in Data Mining 611

individuals are chosen by tournament selection with a tournament size of 2. Then children are generated from parents by applying conventional uniform crossover and bit-flip mutation. The tournament selection procedure is adapted for multi-objective search as follows.

The fitness of an individual is a vector with values for two objectives: the error rate and the decision-tree size associated with the attribute subset represented by the individual. The selection of the best individual is based on the concept of Pareto dominance, taking into account the two objectives to be minimized. Given two individuals I1 and I2 playing a tournament, there are two possible situations. The first one is that one of the individuals dominates the other. In this case the dominating individual is selected as the winner of the tournament.

The second situation is that neither individual dominates the other. In this case, we use the following tie-breaking criterion to determine the fittest individual. For each of the two individuals Ii, i = 1, 2, the GA computes Xi as the number of individuals in the current population that are dominated by Ii, and Yi as the number of individuals in the current population that dominate Ii. Then the GA selects as the best the individual Ii with the largest value of the formula Xi - Yi. Finally, if I1 and I2 have the same value of Xi - Yi (which is rarely the case), the tournament winner is simply chosen at random.
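A minimal sketch of this dominance-based tournament, including the Xi - Yi tie-breaking criterion (both objectives are minimized; names are illustrative):

```python
import random

def dominates(a, b):
    """Pareto dominance for minimization: a is no worse on every
    objective and strictly better on at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def tournament(pop_fitness, i1, i2, rng=random):
    """Binary tournament over fitness vectors (error rate, tree size).
    `pop_fitness` lists the fitness vectors of the whole population;
    `i1`, `i2` index the two competitors."""
    f1, f2 = pop_fitness[i1], pop_fitness[i2]
    if dominates(f1, f2):
        return i1
    if dominates(f2, f1):
        return i2
    # Tie-break: X = individuals dominated by the competitor,
    #            Y = individuals that dominate the competitor.
    def score(f):
        x = sum(dominates(f, g) for g in pop_fitness)
        y = sum(dominates(g, f) for g in pop_fitness)
        return x - y
    s1, s2 = score(f1), score(f2)
    if s1 != s2:
        return i1 if s1 > s2 else i2
    return rng.choice([i1, i2])  # same X - Y value: pick at random
```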

In all our experiments the probabilities of crossover and mutation were set to 80% and 1%, respectively, which are relatively common values in the literature, and the population size was set to 100 individuals, which evolve for 50 generations.

25.4.2. The Multi-Objective Forward Sequential Selection Method (MOFSS)

A single-objective optimization method and a multi-objective optimization method differ mainly in the number of optimal solutions that they return. Hence, the first step to convert the traditional FSS into a multi-objective method is to make it able to return a set of optimal solutions instead of a single solution.

This first point was resolved by creating a list of all non-dominated solutions generated by the MOFSS up to the current iteration of the algorithm. This concept of an external list of non-dominated solutions was inspired by some MOGAs in the literature, such as SPEA27, that maintain all the non-dominated individuals in an external population.


The proposed MOFSS starts as the traditional FSS: a subset of solutions is created and evaluated. The evaluation of each solution considers both the error rate and the decision tree size generated by C4.5 during training. As in the proposed MOGA, the values of these objectives to be minimized are stored and later used to judge whether a solution is better or worse than another.

Each new solution of the current iteration is compared with every other solution of the current iteration, in order to find all non-dominated solutions in the current iteration. Then the non-dominated solution list, L, is updated. This update consists in comparing, through the concept of Pareto dominance, the solutions in the list with the non-dominated solutions of the current iteration. More precisely, for each non-dominated solution S of the current iteration, S will be added to the list L only if S is not dominated by any solution in L. It is also possible that S dominates some solution(s) in L. In this case those dominated solutions in L are, of course, removed from L.
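The update of the list L can be sketched as follows (fitness vectors are minimized; entries pair a solution with its fitness vector):

```python
def dominates(a, b):
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def update_archive(archive, candidates):
    """Merge the current iteration's non-dominated solutions into the
    external list L, keeping only mutually non-dominated entries.
    Each entry is a (solution, fitness) pair."""
    for sol, fit in candidates:
        if any(dominates(f, fit) for _, f in archive):
            continue  # S is dominated by something already in L: discard S
        # S enters L; drop any archived solutions that S now dominates.
        archive = [(s, f) for s, f in archive if not dominates(fit, f)]
        archive.append((sol, fit))
    return archive
```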

The non-dominated solution list is the starting point for generating new candidate solutions. At each iteration, each solution in the current list is extended with each new attribute (different from the ones that occur in the current solution), and the process starts again, until no more updates can be made to the non-dominated solution list.
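The expansion step can be sketched like this, representing each candidate solution as a frozenset of attribute indices (an illustrative representation, not prescribed by the chapter):

```python
def expand(archive, n_attributes):
    """One MOFSS expansion step: extend every solution in the
    non-dominated list with every attribute it does not yet contain."""
    successors = set()
    for solution in archive:
        for attr in range(n_attributes):
            if attr not in solution:
                # Frozensets make duplicate successors collapse automatically.
                successors.add(solution | {attr})
    return successors

# With 3 attributes, extending {0} yields the candidates {0,1} and {0,2}.
successors = expand([frozenset({0})], 3)
```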

25.5. Computational Results

Experiments were executed with 18 public-domain, real-world data sets obtained from the UCI (University of California at Irvine) data set repository28. The number of examples, attributes and classes of these data sets is shown in Table 25.98.

All the experiments were performed with the well-known stratified 10-fold cross-validation procedure. For each iteration of the cross-validation procedure, once the MOGA/MOFSS run is over, we compare the performance of C4.5 using all the original attributes (the "baseline" solution) with the performance of C4.5 using only the attributes selected by the MOGA/MOFSS. Recall that the MOGA/MOFSS can be considered successful to the extent that the attribute subsets selected by it lead to a reduction in the error rate and in the size of the tree built by C4.5, by comparison with the use of all original attributes.

As explained before, the solution for a multi-objective optimization problem consists of all non-dominated solutions (the Pareto front) found. Hence, each run of the MOGA outputs the set of all non-dominated solutions (attribute subsets) present in the last generation's population, and each run of the MOFSS outputs the solutions stored in the non-dominated solution list at the last iteration. In a real-world application, the final choice of the non-dominated solution to be used in practice would be left to the user. However, in our research-oriented work, involving many different public-domain data sets, no user was available. Hence, we needed to evaluate the quality of the non-dominated attribute subsets returned by MOGA/MOFSS in an automatic, data-driven manner. We have done that in two different ways, reflecting two different (but both valid) perspectives, as follows.

Table 25.98. Main characteristics of the data sets used in the experiments

Data Set                   # attributes   # examples   # classes
Arrhythmia                      269           452          16
Balance-Scale                     4           625           3
Bupa                              6           345           2
Car                               6          1717           4
Crx                              15           690           2
Dermatology                      34           366           6
Glass                            10           214           7
Ionosphere                       34           351           2
Iris                              4           150           3
Mushroom                         22          8124           2
Pima                              8           768           2
Promoters                        57           106           2
Sick-euthyroid                   25          3163           2
Tic tac toe                       9           958           2
Vehicle                          18           846           4
Votes                            16           435           2
Wine                             13           178           3
Wisconsin breast-cancer           9           699           2

The first approach to evaluate the set of non-dominated solutions returned by MOGA and MOFSS is called Return All Non-Dominated Solutions. The basic idea is that we return all the non-dominated solutions found by the method, and we compare each of them, one at a time, with the baseline solution, which consists of the set of all original attributes. Then we count the number of solutions returned by the MOGA and MOFSS that dominate or are dominated by the baseline solution, in the Pareto sense, with respect to the objectives of minimizing error rate and decision-tree size, as explained above.
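This counting can be sketched directly from the dominance definition; the function below classifies each returned fitness vector against the baseline and reports the relative frequencies (the F_dominate, F_dominated and F_neutral values used in the result tables):

```python
def dominates(a, b):
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def frequencies_vs_baseline(solutions, baseline):
    """For a list of (error rate, tree size) fitness vectors, return the
    relative frequencies (F_dominate, F_dominated, F_neutral) with
    respect to the baseline fitness vector."""
    n_dom = sum(dominates(f, baseline) for f in solutions)
    n_ed = sum(dominates(baseline, f) for f in solutions)
    n = len(solutions)
    return n_dom / n, n_ed / n, (n - n_dom - n_ed) / n
```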

The second approach, called Return the "Best" Non-Dominated Solution


consists of selecting a single solution to be returned to the user by using the tie-breaking criterion described earlier. From a user's point of view, this is a practical approach, since the user often wants a single solution. Moreover, this decision-making process makes the solution of the multi-objective problem complete, following its three potential stages of development: measurement, search and decision making29.

There are many ways of setting preferences in a decision-making process, as shown in Coello-Coello29, but we did not follow any of those approaches. For both MOGA and MOFSS we return the solution in the non-dominated set of the last generation (or iteration) with the highest value of the tie-breaking criterion, which is a decision-making criterion tailored for our algorithms and underlying application. Note that, since the number of solutions that dominate the solutions in the non-dominated set is zero, the formula of the tie-breaking criterion reduces to Xi. Therefore, instead of explicitly ranking the objectives, we rank the non-dominated solutions according to the number of individuals they dominate in the last generation. The solution chosen through this method was compared with the baseline solution.
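Under these conditions the choice rule is simply "maximize Xi", which can be sketched as:

```python
def dominates(a, b):
    """Pareto dominance for minimization."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def choose_best(front, population):
    """Return the fitness vector in the non-dominated set `front` that
    dominates the largest number of fitness vectors in the final
    population (the X_i part of the tie-breaking criterion; Y_i is zero
    for every non-dominated solution)."""
    def x(fit):
        return sum(dominates(fit, other) for other in population)
    return max(front, key=x)
```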

There is one caveat when using this criterion in MOFSS. For this algorithm, we recalculate the tie-breaking criterion considering all the solutions generated in all the iterations of the method. That is, we calculate the number of solutions that are dominated by each of the solutions in the non-dominated solution list of the last iteration, considering all solutions generated by the method. The tie-breaking criterion was recalculated because, for some data sets, the number of solutions in the non-dominated list at the beginning of the last iteration was small. As a result, few new solutions were generated in the last iteration. It would not be fair to compare the solutions in that list just with those few solutions generated in the last iteration, because the small number of solutions would lead to a low confidence (from a statistical point of view) in the result. In order to solve this problem, the tie-breaking criterion is recalculated using all solutions generated since the algorithm starts. There was no need to apply this procedure to MOGA, because this method has a larger number of solutions in the last generation, providing enough solutions for a reliable computation of the tie-breaking criterion.


25.5.1. Results for the "Return All Non-Dominated Solutions" Approach

As explained earlier, the basic idea of this approach is that MOGA and MOFSS return all non-dominated solutions that they have found, and then we count the number of solutions returned by each of these methods that dominate or are dominated by the baseline solution.

Tables 25.99 and 25.100 show, respectively, the results found by MOGA and MOFSS returning all the non-dominated solutions of the last generation (or iteration). Hereafter these versions of the algorithms are called MOGA-all and MOFSS-all. In Tables 25.99 and 25.100 the second column shows the total number of solutions found by the method; the numbers after the "±" are standard deviations. The next columns show the relative frequency of the found solutions that dominate the baseline solution (column F_dominate), the relative frequency of the found solutions that are dominated by the baseline solution (column F_dominated), and the relative frequency of the found solutions that neither dominate nor are dominated by the baseline solution (column F_neutral).

Table 25.99. Results found with MOGA-all

Data set          Total           F_dominate   F_dominated   F_neutral
Arrhythmia        3.9  ± 0.54        0.21         0.33          0.46
Balance-Scale     1.0  ± 0.0         0.7          0             0.3
Bupa              6.1  ± 0.38        0.31         0             0.69
Car              38.3  ± 0.76        0.002        0             0.998
Crx               4.55 ± 0.67        0.56         0.05          0.39
Dermatology       1.11 ± 0.11        0.8          0             0.2
Glass            46.9  ± 1.03        0            0.06          0.94
Ionosphere        1.14 ± 0.14        0.37         0.12          0.5
Iris              4.4  ± 0.16        0.8          0.02          0.18
Mushroom          1.9  ± 0.18        0.68         0             0.32
Pima             18.3  ± 1.15        0.34         0             0.66
Promoters         1.5  ± 0.16        0.33         0             0.67
Sick-euthyroid   25.4  ± 0.93        0.02         0.02          0.96
Tic tac toe      16.5  ± 1.0         0            0             1
Vehicle           6.1  ± 0.76        0.25         0.18          0.57
Votes            26.6  ± 1.63        0.6          0             0.4
Wine              4.66 ± 1.21        0.48         0.31          0.21
Wisconsin         9.3  ± 0.4         0.5          0.2           0.3

As can be observed in Table 25.99, there are 6 data sets where the value


of F_dominate is greater than 0.5 (shown in bold), which means that more than 50% of MOGA-all's solutions dominated the baseline solution. In 9 out of the 18 data sets, no MOGA-all solution was dominated by the baseline solution. There are only two data sets, namely arrhythmia and glass, where the value of F_dominate is smaller than the value of F_dominated (shown in bold), indicating that the MOGA was not successful in these two data sets. In any case, in these two data sets the difference between F_dominate and F_dominated is relatively small (which is particularly true in the case of glass), and the value of F_neutral is greater than the values of both F_dominate and F_dominated.

In summary, in 14 out of the 18 data sets the value of F_dominate is greater than the value of F_dominated, indicating that overall MOGA-all was successful in the majority of the data sets. MOGA-all was very successful in 6 data sets, where the value of F_dominate was larger than 0.5 and much greater than the value of F_dominated.

Table 25.100. Results found with MOFSS-all

Data set          Total            F_dominate   F_dominated   F_neutral
Arrhythmia       32.2 ± 10.82         0.54          0            0.46
Balance-Scale     1.8 ±  0.2          0.5           0            0.5
Bupa              2.9 ±  0.31         0.65          0            0.35
Car               4.3 ±  0.33         0.07          0            0.93
Crx              84.1 ±  2.05         0.89          0            0.11
Dermatology      76.5 ± 10.3          0             0            1
Glass            94.1 ±  5.24         0.99          0            0.01
Ionosphere       12.9 ±  6.23         0.14          0            0.86
Iris              3.5 ±  0.34         0.86          0            0.14
Mushroom         51.9 ± 11.88         0             0            1
Pima             11.1 ±  1.88         0.95          0            0.05
Promoters        66.6 ± 12.66         0.27          0            0.73
Sick-euthyroid   50.3 ±  6.44         0.1           0            0.9
Tic tac toe       8.1 ±  1.54         0.11          0            0.89
Vehicle           3.6 ±  0.16         0.17          0            0.83
Votes            98.4 ±  0.37         0.1           0            0.9
Wine              8.3 ±  6.1          0.92          0.01         0.07
Wisconsin        10.1 ±  4.76         0.45          0.37         0.18

In Table 25.100, we can see that there are 7 data sets where the value of F_dominate is greater than 0.5 (shown in bold), which means that more than 50% of MOFSS-all's solutions dominated the baseline solution. Remarkably, there are only two data sets, namely wine and Wisconsin breast cancer, where the number of MOFSS-all solutions dominated by the baseline solution was greater than zero, and in the case of wine that number is very close to zero anyway. There are two data sets where all MOFSS-all solutions are neutral, namely dermatology and mushroom. In summary, in 16 out of the 18 data sets the value of F_dominate is greater than the value of F_dominated, indicating that overall MOFSS was successful in the vast majority of the data sets. MOFSS was very successful in 7 data sets, as mentioned above.

25.5.2. Results for the "Return the 'Best' Non-Dominated Solution" Approach

Tables 25.101 and 25.102 show the results obtained by following this approach. These tables show results for error rate and tree size separately, as usual in the machine learning and data mining literature. Later in this section we show results (in Table 25.103) involving Pareto dominance, which consider the simultaneous minimization of error rate and tree size. In Tables 25.101 and 25.102 the column titled C4.5 contains the results for C4.5 run with the baseline solution (all original attributes), whereas the columns titled MOGA-1 and MOFSS-1 contain the results for C4.5 run with the single "best" non-dominated solution found by MOGA and MOFSS, using the criterion for choosing the "best" solution explained earlier. The figures in the tables are the average over the 10 iterations of the cross-validation procedure. The values after the "±" symbol represent the standard deviations, and the figures in bold indicate the smallest error rates/tree sizes obtained among the three methods. In the columns MOGA-1 and MOFSS-1, the symbol "+" ("-") denotes that the result (error rate or tree size) of the corresponding method is significantly better (worse) than the result obtained with the baseline solution. The difference in error rate or tree size between the columns MOGA-1/MOFSS-1 and C4.5 is considered significant if the corresponding error rate or tree size intervals, taking into account the standard deviations, do not overlap. The last two lines of Tables 25.101 and 25.102 summarize the results of these tables, indicating in how many data sets MOGA-1/MOFSS-1 obtained a significant win/loss over the baseline solution using C4.5 with all original attributes.

In Tables 25.101 and 25.102, the results of MOFSS-1 for the Arrhythmia data set are not available due to the large number of attributes in this data set, 269. This leads to a very large number of solutions generated along all iterations of the algorithm, so that recalculating the tie-breaking criterion considering all the generated solutions was impractical with the machine used in the experiments (a dual-processor PC with 1.1 GHz clock rate and 3 Gbytes of memory).

Table 25.101. Error rates (%) obtained with C4.5, MOGA-1 and MOFSS-1

Data set          C4.5            MOGA-1              MOFSS-1
Arrhythmia        32.93 ± 3.11    26.38 ± 1.47 (+)    N/A
Balance-Scale     36.34 ± 1.08    28.32 ± 0.71 (+)    36.47 ± 1.84
Bupa              37.07 ± 2.99    30.14 ± 1.85 (+)    40.85 ± 1.45
Car                7.49 ± 0.70    16.65 ± 0.4  (-)    18.5  ± 0.70 (-)
Crx               15.95 ± 1.43    12.44 ± 1.84        15.04 ± 1.35
Dermatology        6.0  ± 0.98     2.19 ± 0.36 (+)    11.15 ± 1.60 (-)
Glass              1.86 ± 0.76     1.43 ± 0.73         1.86 ± 0.76
Ionosphere        10.2  ± 1.25     5.13 ± 1.27 (+)     7.98 ± 1.37
Iris               6.0  ± 2.32     2.68 ± 1.1  (+)     6.01 ± 2.09
Mushroom           0.0  ± 0.0      0.0  ± 0.0          0.18 ± 0.07 (-)
Pima              26.07 ± 1.03    23.07 ± 1.16        28.16 ± 1.72
Promoters         16.83 ± 2.55    11.33 ± 1.92 (+)    33.5  ± 6.49 (-)
Sick-euthyroid     2.02 ± 0.12     2.22 ± 0.18         2.32 ± 0.23
Tic tac toe       15.75 ± 1.4     22.65 ± 1.19 (-)    31.19 ± 1.69 (-)
Vehicle           26.03 ± 1.78    23.16 ± 1.29        33.74 ± 1.78 (-)
Votes              3.2  ± 0.91     2.97 ± 0.75         4.57 ± 0.89
Wine               6.69 ± 1.82     0.56 ± 0.56 (+)     6.07 ± 1.69
Wisconsin          5.28 ± 0.95     3.84 ± 0.67         7.16 ± 0.77 (-)
Wins over C4.5         -               8                   0
Losses over C4.5       -               2                   7

The results in Table 25.101 show that MOGA-1 obtained significantly better error rates than the baseline solution (column "C4.5") in 8 data sets. In contrast, the baseline solution obtained significantly better results than MOGA-1 in just two data sets. MOFSS-1 did not find solutions with significantly better error rates than the baseline solution in any data set. On the contrary, it found solutions with significantly worse error rates than the baseline solution in 7 data sets.

Table 25.102. Tree sizes (number of nodes) obtained with C4.5, MOGA-1 and MOFSS-1

Data set          C4.5            MOGA-1             MOFSS-1
Arrhythmia         80.2 ± 2.1     65.4 ± 1.15 (+)    N/A
Balance-Scale      41.0 ± 1.29    16.5 ± 3.45 (+)     7.5 ±  1.5  (+)
Bupa               44.2 ± 3.75     7.4 ± 1.36 (+)    11.4 ±  2.78 (+)
Car               165.3 ± 2.79    29.4 ± 5.2  (+)    17.7 ±  1.07 (+)
Crx                29.0 ± 3.65    11.2 ± 3.86 (+)    24.6 ±  8.27
Dermatology        34.0 ± 1.89    25.2 ± 0.96 (+)    23.2 ±  2.84 (+)
Glass              11.0 ± 0.0     11.0 ± 0.0         11.0 ±  0.0
Ionosphere         26.2 ± 1.74    13.0 ± 1.4  (+)    14.2 ±  2.23 (+)
Iris                8.2 ± 0.44     5.8 ± 0.53 (+)     6.0 ±  0.68 (+)
Mushroom           32.7 ± 0.67    30.0 ± 0.89 (+)    27.2 ±  1.76 (+)
Pima               45.0 ± 2.89    11.0 ± 2.6  (+)     9.2 ±  1.85 (+)
Promoters          23.8 ± 1.04    11.4 ± 2.47 (+)     9.0 ±  1.2  (+)
Sick-euthyroid     24.8 ± 0.69    11.2 ± 1.35 (+)     9.6 ±  0.79 (+)
Tic tac toe       130.3 ± 4.25    21.1 ± 4.54 (+)    10.6 ±  1.4  (+)
Vehicle           134.0 ± 6.17    95   ± 3.13 (+)    72.8 ± 10.98 (+)
Votes              10.6 ± 0.26     5.4 ± 0.88 (+)     5.6 ±  1.07 (+)
Wine               10.2 ± 0.68     9.4 ± 0.26         8.6 ±  0.26 (+)
Wisconsin          28.0 ± 2.13    25   ± 3.71        18   ±  1.53 (+)
Wins over C4.5          -             15                 15
Losses over C4.5        -              0                  0

As can be observed in Table 25.102, the tree sizes obtained with the solutions found by MOGA-1 and MOFSS-1 are significantly better than the ones obtained with the baseline solution in 15 out of 18 data sets. In the other three data sets the difference is not significant.

In summary, both MOGA-1 and MOFSS-1 are very successful in finding solutions that lead to a significant reduction in tree size, by comparison with the baseline solution of all attributes. The solutions found by MOGA-1 were also quite successful in reducing error rate, unlike the solutions found by MOFSS-1, which unfortunately led to a significant increase in error rate in a number of data sets. Hence, these results suggest that MOGA-1 has effectively found a good trade-off between the objectives of minimizing error rate and tree size, whereas MOFSS-1 minimized tree size at the expense of increasing error rate in a number of data sets.

Table 25.103. Number of significant Pareto dominance relations

            C4.5   MOGA-1   MOFSS-1
C4.5          X       0        0
MOGA-1       14       X        7
MOFSS-1       8       0        X

Table 25.103 compares the performance of MOGA-1, MOFSS-1 and C4.5 using all attributes, considering both the error rate and the tree size at the same time, according to the concept of significant Pareto dominance. This is a modified version of conventional Pareto dominance tailored for the classification task of data mining, where we want to find solutions that are not only better, but significantly better, taking into account the standard deviations (as explained earlier for Tables 25.101 and 25.102). Hence, each cell of Table 25.103 shows the number of data sets in which the solution


found by the method indicated in the table row significantly dominates the solution found by the method indicated in the table column. A solution S1 significantly dominates a solution S2 if and only if:

• obj1(S1) + sd1(S1) < obj1(S2) - sd1(S2) and not[obj2(S2) + sd2(S2) < obj2(S1) - sd2(S1)]

where obj1(S1) and sd1(S1) denote the average value of objective 1 and the standard deviation of objective 1 associated with solution S1, and similarly for the other terms. Objective 1 and objective 2 can be instantiated with error rate and tree size, or vice-versa. For example, in the bupa data set we can say that the solution found by MOGA-1 significantly dominates the solution found by MOFSS-1 because: (a) in Table 25.101 MOGA-1's error rate plus standard deviation (30.14 + 1.85) is smaller than MOFSS-1's error rate minus standard deviation (40.85 - 1.45); and (b) concerning the tree size (Table 25.102), the condition "not (11.4 + 2.78 < 7.4 - 1.36)" holds. So both conditions for significant dominance are satisfied.
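This check can be written directly from the definition; the bupa figures below are taken from Tables 25.101 and 25.102 (error rate, tree size), each as a (mean, standard deviation) pair:

```python
def sig_better(m1, sd1, m2, sd2):
    """One objective of S1 is significantly better than S2's: the S1
    interval lies entirely below the S2 interval."""
    return m1 + sd1 < m2 - sd2

def significantly_dominates(s1, s2):
    """s1 and s2 are ((mean, sd), (mean, sd)) pairs for the two
    objectives. S1 significantly dominates S2 if it is significantly
    better on one objective and not significantly worse on the other;
    either objective may play the role of objective 1."""
    (a1, a2), (b1, b2) = s1, s2
    cond1 = sig_better(*a1, *b1) and not sig_better(*b2, *a2)
    cond2 = sig_better(*a2, *b2) and not sig_better(*b1, *a1)
    return cond1 or cond2

# bupa: MOGA-1 vs MOFSS-1, (error rate, tree size) from Tables 25.101/25.102.
moga = ((30.14, 1.85), (7.4, 1.36))
mofss = ((40.85, 1.45), (11.4, 2.78))
assert significantly_dominates(moga, mofss)
```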

As shown in Table 25.103, the baseline solution (column "C4.5") did not significantly dominate the solutions found by MOGA-1 and MOFSS-1 in any data set. The best results were obtained by MOGA-1, whose solutions significantly dominated the baseline solution in 14 out of the 18 data sets and significantly dominated MOFSS-1's solutions in 7 data sets. MOFSS-1 obtained a reasonably good result, significantly dominating the baseline solution in 8 data sets, but it did not dominate MOGA-1 in any data set. A more detailed analysis of these results, at the level of individual data sets, can be observed later in Tables 25.104 and 25.105.

25.5.3. On the Effectiveness of the Criterion to Choose the "Best" Solution

Analyzing the results in Tables 25.99, 25.100, 25.101 and 25.102, we can evaluate whether the criterion used to choose a single solution out of all non-dominated ones (i.e., the criterion used to generate the results of Tables 25.101 and 25.102) is really able to choose the "best" solution for each data set. We can do this by analyzing the dominance relationship (involving the error rate and tree size) between the single returned solution and the baseline solution. That is, we can observe whether or not the single solution returned by MOGA-1 and MOFSS-1 dominates, is dominated by, or is neutral with respect to the baseline solution. Once we have this information, we can compare it with the corresponding relative frequencies associated with the


solutions found by MOGA-all/MOFSS-all (columns F_dominate, F_dominated and F_neutral of Tables 25.99 and 25.100). This comparison is performed in Tables 25.104 and 25.105, which refer to MOGA and MOFSS, respectively. In these two tables the first column contains the data set names, the next three columns are copied from the last three columns of Tables 25.99 and 25.100, respectively, and the last three columns are computed from the results in Tables 25.101 and 25.102, by applying the above-explained concept of significant Pareto dominance between the MOGA-1/MOFSS-1 solution and the baseline solution.

Table 25.104. Performance of MOGA-all versus MOGA-1

                  MOGA-all's solutions           MOGA-1's solution
                  wrt baseline solution          wrt baseline solution
Data set          F_dom   F_dom_ed   F_neut      Dom   Dom_ed   Neut
Arrhythmia         0.21     0.33      0.46        X
Balance-Scale      0.7      0         0.3         X
Bupa               0.31     0         0.69        X
Car                0.002    0         0.998                      X
Crx                0.56     0.05      0.39        X
Dermatology        0.8      0         0.2         X
Glass              0        0.06      0.94                       X
Ionosphere         0.37     0.12      0.5         X
Iris               0.8      0.02      0.18        X
Mushroom           0.68     0         0.32        X
Pima               0.34     0         0.66        X
Promoters          0.33     0         0.67        X
Sick-euthyroid     0.02     0.02      0.96        X
Tic tac toe        0        0         1                          X
Vehicle            0.25     0.18      0.57        X
Votes              0.6      0         0.4         X
Wine               0.48     0.31      0.21        X
Wisconsin          0.5      0.2       0.3                        X

As can be observed in Table 25.104, there are only 4 data sets in which the solutions found by MOGA-1 do not dominate the baseline solution: car, glass, tic-tac-toe and Wisconsin. For these 4 data sets the solutions found by MOGA-1 were neutral (last column of Table 25.104), and the value of F_neutral was respectively 0.998, 0.94, 1 and 0.3. Therefore, in the first three of those data sets it was expected that the single solution chosen by MOGA-1 would be neutral, so that the criterion used for choosing a single solution cannot be blamed for returning a neutral solution. Only in the Wisconsin data set did the criterion do badly, because 50% of the found solutions dominated the baseline solution but a neutral solution was chosen.

The criterion was very successful, managing to choose a solution that dominated the baseline, in all the other 14 data sets, even though in 8 of those data sets less than 50% of the solutions found by MOGA-all dominated the baseline. The effectiveness of the criterion can be observed, for instance, in arrhythmia and sick-euthyroid. Although in arrhythmia the value of F_dominate was quite small (0.21), the solution returned by MOGA-1 dominated the baseline solution. In sick-euthyroid, 96% of the solutions found by MOGA-all were neutral, but a solution that dominates the baseline solution was again returned by MOGA-1.

Table 25.105. Performance of MOFSS-all versus MOFSS-1

                  MOFSS-all's solutions          MOFSS-1's solution
                  wrt baseline solution          wrt baseline solution
Data set          F_dom   F_dom_ed   F_neut      Dom   Dom_ed   Neut
Arrhythmia         0.54     0         0.46
Balance-Scale      0.5      0         0.5         X
Bupa               0.65     0         0.35        X
Car                0.07     0         0.93                       X
Crx                0.89     0         0.11                       X
Dermatology        0        0         1                          X
Glass              0.99     0         0.01                       X
Ionosphere         0.14     0         0.86        X
Iris               0.86     0         0.14        X
Mushroom           0        0         1                          X
Pima               0.95     0         0.05        X
Promoters          0.27     0         0.73                       X
Sick-euthyroid     0.1      0         0.9         X
Tic tac toe        0.11     0         0.89                       X
Vehicle            0.17     0         0.83                       X
Votes              0.1      0         0.9         X
Wine               0.92     0.01      0.07        X
Wisconsin          0.45     0.37      0.18                       X

With respect to the effectiveness of the criterion when used by MOFSS-1, unexpected negative results were found in 2 data sets of Table 25.105, namely crx and glass. For both data sets, despite the high values of F_dominate, the solutions chosen by MOFSS-1 were neutral. The opposite happened in ionosphere, sick-euthyroid and votes, where F_neutral had high values, but single solutions better than the baseline solution were chosen by MOFSS-1.

The relatively large number of neutral solutions chosen by MOFSS-1 happened because in many data sets the tree size associated with the solution chosen by MOFSS-1 was smaller than the tree size associated with


the baseline solution, whilst the error rates of the former were larger thanthe error rates of the latter.

Overall, the criterion for choosing a single solution was moderately successful when used by MOFSS-1, and much more successful when used by MOGA-1. A possible explanation for this result is that the procedure used for tailoring the criterion to MOFSS, described earlier, is not working very well. An improvement to that procedure can be tried in future research.

It is important to note that, remarkably, the criterion for choosing a single solution did not choose a solution dominated by the baseline solution in any data set. This result holds for both MOGA-1 and MOFSS-1.

25.6. Conclusions and Future Work

This chapter has discussed two multi-objective algorithms for attribute selection in data mining, namely a multi-objective genetic algorithm (MOGA) and a multi-objective forward sequential selection (MOFSS) method. The effectiveness of both algorithms was extensively evaluated in 18 real-world data sets. Two major sets of experiments were performed, as follows.

The first set of experiments compared each of the non-dominated solutions (attribute subsets) found by MOGA and MOFSS with the baseline solution (consisting of all the original attributes). The comparison aimed at counting how many of the solutions found by MOGA and MOFSS dominated (in the Pareto sense) or were dominated by the baseline solution, in terms of classification error rate and decision tree size. Overall, the results (see Tables 25.99 and 25.100) show that both MOGA and MOFSS are successful in the sense that they return solutions that dominate the baseline solution much more often than vice-versa.

The second set of experiments consisted of selecting a single "best" solution out of all the non-dominated solutions found by each multi-objective attribute selection method (MOGA and MOFSS) and then comparing this solution with the baseline solution. Although this kind of experiment is not often performed in the multi-objective literature, it is important because in practice the user often wants a single solution to be suggested by the system, relieving the user of the cognitive burden and difficult responsibility of choosing one solution out of all non-dominated solutions.

In order to perform this set of experiments, this work proposed a simple way to choose a single solution to be returned from the set of non-dominated solutions generated by MOGA and MOFSS. The effectiveness of the proposed criterion was analyzed by comparing the results of the two different


624 G.L. Pappa, A.A. Freitas and C.A.A. Kaestner

versions of MOGA and MOFSS, one version returning all non-dominated solutions (results of the first set of experiments) and another version returning a single chosen non-dominated solution. Despite its simplicity, the proposed criterion worked well in practice, particularly when used in the MOGA method. It could be improved when used in the MOFSS method, as discussed earlier.

In the future we intend to analyze the characteristics of the data sets where each of the proposed methods obtained its best results, in order to find patterns that describe the data sets where each method can be applied with greater success.



CHAPTER 26

FINANCIAL APPLICATIONS OF MULTI-OBJECTIVE EVOLUTIONARY ALGORITHMS:

RECENT DEVELOPMENTS AND FUTURE RESEARCH DIRECTIONS

Frank Schlottmann¹,² and Detlef Seese²

¹ GILLARDON AG financial software, Research Department, Alte Wilhelmstr. 4, D-75015 Bretten, Germany

E-mail: [email protected]

² Institute AIFB, University Karlsruhe (TH), D-76128 Karlsruhe, Germany

E-mail: [email protected]

The area of finance contains many algorithmic problems of large practical interest whose complexity prevents finding efficient solutions. The range of applications in this area covers e. g. portfolio selection and risk management, and reaches from questions of real-world financial intermediation to sophisticated research problems. Since there is an urgent need to solve these complex problems, heuristic approaches like Evolutionary Algorithms are a potential toolbox. The application of Multi-Objective Evolutionary Algorithm concepts to this area has started more recently compared to the vast majority of other application areas, such as design and engineering. We give a brief survey of promising developments within this field and discuss potential future research directions.

26.1. Introduction

It is one of the goals of computational finance to develop methods and algorithms to support decision making. Unfortunately, many problems of practical or theoretical interest are too complex to be solvable exactly by a deterministic algorithm in reasonable computing time, e. g. using a method that applies a simple closed-form analytical expression. Such problems require approximation procedures which provide sufficiently good solutions while requiring less computational effort compared to an exact algorithm. Heuristic approaches are a class of algorithms which have been developed


to fulfil these requirements in many problem contexts; see e. g. Fogel & Michalewicz1 or Hromkovic2 for the general methodology and Schlottmann & Seese3 for an overview of heuristic algorithm applications to financial problems. Moreover, Chen's book4 contains a selection of mainly single-objective Evolutionary Algorithm applications in the finance area.

In contrast to these more general literature surveys, we concentrate solely on Multi-Objective Evolutionary Algorithm (MOEA) applications in finance in the following text. The main advantage of MOEAs is their ability to investigate many objectives/goals at the same time. Hence, they offer many possibilities to support decision making, particularly in finance, where a majority of naturally multi-criteria problems have been considered only in a simplified single-objective manner for a long time. Since we do not address general concepts and details of standard MOEAs, we refer the reader e. g. to Deb5, Coello et al.6 and Osyczka7 for an introduction as well as a thorough coverage of this methodology.

The rest of this chapter is structured as follows: in the next section, we point out the complexity of financial problems. Afterwards, we give an introduction to portfolio selection problems in the standard Markowitz setting. We discuss some successful MOEA applications from the literature which solve different problems related to portfolio selection. The chapter ends with a conclusion and potential future research directions.

26.2. A Justification for MOEAs in Financial Applications

Many decision problems in the area of finance can be viewed as some of the hardest problems in economics. This is caused by their strong interrelation with many other difficult decision problems in business life, by the huge number of parameters which are usually involved in these problems, and often by the intrinsic complexity of some of these problems themselves. Complexity influences financial decision making in many forms: there is the huge number of parameters often involved in financial decision problems, financial systems are often highly dynamic, and recently there have been proofs that even financial decision problems for simple models have a high algorithmic complexity, preventing the existence of efficient algorithmic solutions.

Often such complexity results give insights into structural reasons for the difficulties that prevent supporting decision making with the help of computers. For instance, Aspnes et al.8 showed that already for a very simple model of a stock market, the complexity of the problem of predicting the market price depends essentially on the number of trading strategies in comparison


to the number of traders. If there is a large number of traders but they employ a relatively small number of strategies, then there is a polynomial-time algorithm for predicting future price movements with high accuracy; if the number of trading strategies is large, market prediction becomes complex.

Of course, such complexity results require a precise definition of complexity. A widely accepted formal definition of complex problems results from comparing the asymptotic computation times of their solution algorithms. Here the computation time, measured in elementary computation steps, is a function defining, for each size of the input (measured e. g. as the number of input variables of the given problem), the number of steps the algorithm needs to compute the result in the worst case. Such computation-time functions can be compared with respect to their asymptotic growth rates, i.e. comparing the growth of the functions while neglecting constant factors and possibly a finite number of input sizes. The observation is that most problems which can be solved efficiently in practice can be solved via algorithms whose asymptotic growth rate is at most polynomial in the input size n, i.e. at most n^k for a constant k, and in most cases k is small. All these problems are gathered in the class P. Unfortunately, for many problems of practical or theoretical importance no polynomial-time algorithms are known; instead, only exponential-time solutions are found, e. g. with an exponential number 2^n of necessary calculation steps for n given input variables of the considered problem. Another observation is that almost all of these problems can be computed in polynomial time by a nondeterministic algorithm. Such algorithms can perform a certain number of computation steps in parallel and choose the shortest computation path at the end. An equivalent way to describe such algorithms is to allow guessing a solution; then the only thing the algorithm has to do in polynomial time is to verify that the guessed solution is correct. All problems computable in polynomial time via a nondeterministic algorithm build the class NP, and almost all problems of practical importance are included in this class. It is an outstanding open problem in computer science to decide whether P = NP holds.

In an attempt to answer this question, the class of NP-complete problems was defined and investigated. A problem is defined to be NP-complete if it is in NP and each problem in NP can be reduced to it via a deterministic polynomial-time algorithm. NP-complete problems are thus the hardest problems in the class NP. If one finds an algorithm which solves one of these problems in polynomial time, then all problems in NP can be solved


in polynomial time, hence P = NP. It is widely conjectured that no such algorithm exists, and this conjecture is one of the most famous problems in computer science, having been open for more than two decades.

The really surprising fact is not the existence of NP-complete problems, but the fact that almost all problems of practical interest belong to this class of problems: until now there are thousands of NP-complete problems in all areas of application, and there is no known algorithm which requires only a polynomial number of computational steps (depending on the input size n) for an arbitrarily chosen problem that belongs to this class. So it is not surprising that it could recently be proved that many problems in finance also belong to the class of NP-complete problems, since they have a combinatorial structure which is equivalent (with respect to polynomial-time reductions) to well-known NP-complete problems; e. g. constrained portfolio selection and related questions of asset allocation (which are considered later in this chapter) are equivalent to the following problem, which is known to be NP-complete.

KNAPSACK: A finite set U is given together with positive integers s(u) (the size of u) and v(u) (the value of u) for each element u ∈ U, a positive integer B as size constraint and a positive integer K as value goal. Find a subset U' ⊆ U such that Σ_{u∈U'} s(u) ≤ B and Σ_{u∈U'} v(u) ≥ K.
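The decision version stated above can be checked directly, if inefficiently, by exhaustive search over all subsets; the following Python sketch (our illustration, not part of the chapter) makes the exponential cost of the naive approach concrete.

```python
from itertools import combinations

def knapsack_decision(sizes, values, B, K):
    """Brute-force check of the KNAPSACK decision problem:
    is there a subset U' with total size <= B and total value >= K?
    Enumerates all 2^n subsets, illustrating why the naive approach
    does not scale."""
    items = range(len(sizes))
    for r in range(len(sizes) + 1):
        for subset in combinations(items, r):
            if (sum(sizes[i] for i in subset) <= B
                    and sum(values[i] for i in subset) >= K):
                return True
    return False
```

For instance, with sizes (2, 3, 4), values (3, 4, 5), B = 5 and K = 7, the subset of the first two items meets both bounds, so the answer is yes.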

More details on knapsack problems and a large collection of further complexity results can be found e. g. in Garey & Johnson9; see also Papadimitriou10 for the formal definitions of computational complexity and Kellerer et al.11 for a contemporary monograph on knapsack problems. An illustrative formulation of knapsack problems in portfolio selection is given in the next section, whereas e. g. Seese & Schlottmann12,13 provide corresponding complexity results.

The main consequence of the above-mentioned complexity results is that we require approximation algorithms that yield sufficiently good solutions for complex finance problems and consume only polynomial computational resources measured by the size of the respective problem instance (e. g. the number of independent variables). For some complex problem settings, and under certain assumptions, particularly linearity or convexity of the target functions in optimization problems, there are analytical approximation algorithms which provide a fast method of finding solutions with a guaranteed quality of lying within an ε-region around the globally best solution(s). If the considered problem instance allows the necessary restrictions for the application of such algorithms, these are the preferred choice; see Ausiello et al.14 for such considerations.

However, some applications in finance require non-linear, non-convex functions (e. g. valuation of financial instruments using non-linear functions), and sometimes we know only the data (parameters) but not the functional dependency between them, so there is nevertheless a need for methods that search for good solutions in difficult problem settings while spending only relatively small computational cost. This is the justification for heuristic approaches like MOEAs which, unlike conventional algorithms, allow imprecision, uncertainty as well as partial truth, and which can handle multiple objectives in a very natural manner. These requirements match many real-world search and optimization problems. MOEAs offer, especially through their evolutionary part, adaptability as one of their characteristic features and thus permit tracking a problem through a changing environment. Moreover, through their multi-objective part they allow flexible management decisions on the basis of the information actually present. Hence they are an interesting tool in a complex and dynamically changing world. The next section contains some examples of such heuristic approaches to complex financial problems.

26.3. Selected Financial Applications of MOEAs

26.3.1. Portfolio Selection Problems

All MOEA approaches which will be discussed later in this subsection focus on portfolio selection problems or related questions. To give a brief introduction to this application context, we concentrate first on standard Markowitz15 portfolio selection problems.

Given is a set of n ∈ ℕ financial assets, e. g. exchange-traded stocks.

At time t_0 ∈ ℝ, each asset i has certain characteristics describing its future payoff: each asset i has an expected rate of return μ_i per monetary unit (e. g. dollars) which is paid at time t_1 ∈ ℝ, t_1 > t_0. This means that if we take a position of y ∈ ℝ units of asset i at time t_0, our expected payoff at t_1 will be μ_i y units. Moreover, the covariances between the rates of return of all assets are given by a symmetric matrix Σ := (σ_ij)_{i,j ∈ {1,...,n}}. In this straightforward notation, σ_ii is the variance of asset i's rate of return and σ_ij is the covariance between asset i's rate of return and asset j's rate of return.

A portfolio is defined by a vector x := (x_1, ..., x_n) ∈ ℝ^n which contains the weight x_i ∈ ℝ of asset i ∈ {1, ..., n} in its i-th component.

In the standard problem formulation, the weights of a portfolio are


normalized as follows:

\sum_{i=1}^{n} x_i = 1    (C.1)

Depending on the specific problem context, there are additional restrictions on the weights, e. g. lower bounds (a common constraint is x_i ≥ 0), upper bounds and/or integrality constraints. This topic will be addressed in more detail later. At this point, it is sufficient to denote the set of all unconstrained portfolios by S ⊆ ℝ^n and the set of feasible portfolios which satisfy the required constraints by F ⊆ S. If the specific portfolio selection problem is unconstrained, one can simply assume F = S.

Usually, at least two conflicting target functions are considered: a return function f_return(x) which is to be maximized and a risk function f_risk(x) which is to be minimized. In the standard Markowitz setting these functions are defined as follows:

f_{return}(x) := \sum_{i=1}^{n} x_i \mu_i    (C.2)

f_{risk}(x) := \sqrt{\sum_{i=1}^{n} \sum_{j=1}^{n} x_i x_j \sigma_{ij}}    (C.3)

The above definition of f_return reflects the fact that the expected rate of return of a portfolio is the weighted sum of the assets' expected rates of return. In the above specification of f_risk, the standard deviation of the portfolio rate of return is chosen as a risk measure which describes the level of uncertainty about the future payoff at time t_1.
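As an illustration (ours, not the chapter's), the two Markowitz objectives can be written in a few lines of numpy; here x is the weight vector, mu the vector of expected rates of return and sigma the covariance matrix Σ.

```python
import numpy as np

def f_return(x, mu):
    # Expected portfolio rate of return, Eq. (C.2):
    # weighted sum of the assets' expected rates of return.
    return float(np.dot(x, mu))

def f_risk(x, sigma):
    # Portfolio standard deviation, Eq. (C.3): sqrt(x^T Sigma x).
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(x @ np.asarray(sigma, dtype=float) @ x))
```

For an equally weighted two-asset portfolio with uncorrelated assets, both objectives follow directly from the definitions above.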

In the context of portfolio management, a feasible portfolio x ∈ F is dominated by a feasible portfolio y ∈ F iff at least one of the following two conditions is met:

f_{return}(x) < f_{return}(y) \wedge f_{risk}(x) \geq f_{risk}(y)    (C.4)

f_{return}(x) \leq f_{return}(y) \wedge f_{risk}(x) > f_{risk}(y)    (C.5)
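In code, this dominance test reduces to a comparison of the two objective values; the small sketch below (our addition, not from the chapter) treats a portfolio as a precomputed (return, risk) pair.

```python
def dominates(y, x):
    """True iff portfolio y dominates portfolio x in the Pareto sense:
    y is at least as good in both objectives (return maximized, risk
    minimized) and strictly better in at least one. Portfolios are
    given as (return, risk) pairs."""
    ret_y, risk_y = y
    ret_x, risk_x = x
    no_worse = ret_y >= ret_x and risk_y <= risk_x
    strictly_better = ret_y > ret_x or risk_y < risk_x
    return no_worse and strictly_better
```

Note that a portfolio never dominates itself, and two portfolios trading return against risk are mutually non-dominated.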

As rational investors prefer non-dominated portfolios over dominated ones, one is usually interested in finding an approximation of the so-called efficient frontier, which is identical to the set of all feasible non-dominated portfolios. In the standard finance literature, this is formulated as a constrained single-objective problem: given a rate of return r*, find a feasible portfolio x* ∈ F satisfying

f_{return}(x^*) = r^* \wedge f_{risk}(x^*) = \min \{ f_{risk}(x) \mid x \in F,\ f_{return}(x) = r^* \}    (C.6)


If there are no integrality constraints or other restrictions which raise the complexity, such problems can be solved using standard Quadratic Programming algorithms (under the assumption that Σ is positive definite). From a computational complexity point of view, this is equivalent to solving a knapsack-like problem using real-valued decision variables; in the knapsack problem formulation in section 26.2 we considered binary decision variables, hence the complexity is different (lower) here although the objective function is not linear.

By considering two objective functions instead of modelling an objective-function constraint, we obtain a quite natural problem formulation for a MOEA approach which allows more flexibility concerning both the objective functions and the constraints on the portfolios. We point out that the question of finding non-dominated portfolios raised above can easily be extended to a multi-period problem where the payoff at each additional future point of time t_2, t_3, ..., t_m, with m ∈ ℕ and t_i ∈ ℝ for all i ∈ {1, ..., m}, is considered separately. This results in 2·m objective functions to be optimized.

In the following subsections we summarize several applications of MOEAs in this context. In particular, we describe the deviation from the above Markowitz problem setting, the genetic modelling, the chosen genetic variation operators and the parameter sets used in empirical tests of the methodology.

26.3.2. Vederajan et al.

The article by Vederajan et al.16 contains different applications of Genetic Algorithm (GA) methodology to portfolio selection problems in the Markowitz context. At first, the authors consider the standard problem of portfolio selection from the previous section and add the constraint

\forall i \in \{1, \ldots, n\}:\ 0 \leq x_i \leq x_{max}    (C.7)

where x_max ∈ ℝ_+ is a constant.

Besides a single-objective GA approach using a weighted sum of the

f_risk and f_return objective functions from section 26.3.1, which we do not consider here, Vederajan et al. also propose a MOEA approach searching for non-dominated feasible individuals with respect to the two objectives. They use the Non-dominated Sorting Genetic Algorithm (NSGA) from Srinivas & Deb17, based on the following genetic representation of the x_i variables: each decision variable is represented by a binary string of fixed length l_const, which represents the weight of the asset in the portfolio. The strings of all decision variables are concatenated such that the resulting genotype of each


individual consists of a binary gene string of length n · l_const. It has to be emphasized here that this genetic modelling restricts the search space to a discrete subset of ℝ^n:

F := \left\{ x \in \mathbb{R}^n \;\middle|\; x_i \in \left\{ 0,\ \frac{c_i}{2^{l_{const}}},\ \ldots,\ \frac{c_i (2^{l_{const}} - 1)}{2^{l_{const}}} \right\},\ \sum_{i=1}^{n} x_i = 1 \right\}    (C.8)

Here the constants c_i > 0 are chosen together with l_const such that x_i ≥ 0 (trivial) and x_i ≤ x_max is assured. To incorporate the summation constraint from equation (C.1) into the algorithm, Vederajan et al. propose a repairing procedure for infeasible individuals derived from Bean18: the x_i values of an infeasible individual are sorted in descending order to obtain a permutation π(i) of the decision variables. Using this permutation, one starts with the highest value given by x_{π(k)} for k := 1 and raises k successively until Σ_{i=1}^{k} x_{π(i)} ≥ 1 for the minimum k. Knowing this value k, one sets

x_{\pi(j)} := \begin{cases} x_{\pi(j)} & \text{if } j < k, \\ 1 - \sum_{i=1}^{k-1} x_{\pi(i)} & \text{if } j = k, \\ 0 & \text{otherwise.} \end{cases}    (C.9)

This repairing operation is applied each time an infeasible individual is generated (e. g. after random initialization of the first population).
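A possible reading of the repair step in Eq. (C.9), sketched in Python (our illustration; the original implementation may differ). It assumes the infeasible weights sum to at least 1:

```python
def repair(x):
    """Bean-style repair (Eq. C.9): visit weights in descending order,
    keep them while the running sum stays below 1, truncate the weight
    where the sum first reaches 1, and zero out the remaining weights.
    Assumes sum(x) >= 1."""
    order = sorted(range(len(x)), key=lambda i: x[i], reverse=True)
    repaired = [0.0] * len(x)
    running = 0.0
    for i in order:
        if running + x[i] < 1.0:
            repaired[i] = x[i]
            running += x[i]
        else:
            repaired[i] = 1.0 - running  # truncated k-th weight
            break
    return repaired
```

For example, repairing (0.5, 0.8, 0.2) keeps the largest weight 0.8 unchanged, truncates the next one to 0.2 and zeroes the rest, so the result sums to 1.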

The selection operator used for reproduction of individuals in the NSGA is standard binary tournament, and the genetic variation operators are one-point crossover with crossover probability p_cross := 0.9 and a standard binary complement mutation operator applied with probability p_mut := 0.01 to each single bit in the gene string.

Diversity preservation in the population is achieved by a niching approach using the sharing function

sh(d_{xy}) := \begin{cases} 1 - \left( \frac{d_{xy}}{s_{const}} \right)^2 & \text{if } d_{xy} < s_{const}, \\ 0 & \text{otherwise.} \end{cases}    (C.10)

Here, d_xy is the Euclidean distance between the fitness function values of a given individual x and a given individual y, and s_const is the maximum accepted value of d_xy for two arbitrary individuals which belong to the same niche.
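The sharing function itself is a one-liner; a sketch (ours) of Eq. (C.10) as reconstructed above:

```python
def sharing(d_xy, s_const):
    # Eq. (C.10): contributes to the niche count of individuals closer
    # than the niche radius s_const; zero contribution outside the niche.
    if d_xy < s_const:
        return 1.0 - (d_xy / s_const) ** 2
    return 0.0
```

Summing this value over all other individuals yields the niche count by which raw fitness is divided, so crowded regions of the front are penalized.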

Vederajan et al. perform several experiments with stock market data, consisting in particular of the historical asset price means and covariances for Boeing, Disney, Exxon, McDonald's and Microsoft stocks from January 1991 to December 1995. Their NSGA application to the Markowitz problem


described above yielded a well-converged approximation of many Pareto-optimal solutions within 100 population steps. Each population contained 1000 individuals.

Concerning their application of a MOEA to a quadratic optimization problem instead of using standard quadratic programming approaches, Vederajan et al. point out an interesting fact which the authors of this chapter also encountered when it came to real-world portfolio selection problems: as has already been mentioned in section 26.3.1, the covariance matrix Σ is required to be positive definite to apply standard quadratic programming algorithms. If there are numerical issues (e. g. numerical imprecision due to rounding and/or floating-point arithmetic), this assumption might be violated. Moreover, a violation is not unlikely for real-world data, particularly when n gets large, since the covariances are estimated from real asset price time series which do not necessarily satisfy a priori given restrictions of the mathematical tool for portfolio analysis. Thus, a MOEA approach is even suitable for such a standard problem setting.

In addition to the above results, Vederajan et al. also consider a variant of their Markowitz problem setting where transaction costs due to changes in a portfolio (rebalancing) are an additional ingredient which causes problems for standard quadratic programming algorithms. Thus, the authors apply their NSGA approach again using a third objective function which is to be minimized:

f_{cost}(x) := \sum_{i=1}^{n} c_i (x_i - \hat{x}_i)^2    (C.11)

where \hat{x}_i ∈ ℝ is the given initial weight of asset i in the portfolio, which potentially is to be changed due to rebalancing transactions, and the constant c_i ∈ ℝ is the transaction cost for asset i.

The above NSGA approach is again applied to the given five-asset problem instance; only the number of individuals per population is raised to 1500. Vederajan et al. illustrate the three-dimensional boundary of the approximated solutions in objective function space and give a reasonable interpretation for the shape of the approximated Pareto front.

In summary, the work by Vederajan et al. contains an early application of the MOEA methodology to portfolio selection problems and even provides a practical justification for the application of this methodology to standard Markowitz problem settings, where quadratic programming approaches are often considered to be mandatory. Furthermore, an interesting application of a MOEA to portfolio selection problems with transaction costs


is illustrated.

26.3.3. Lin et al.

In their study, Lin et al.19 consider the following variation of the standard Markowitz problem from section 26.3.1: each asset can only be held in nonnegative integer units, i. e.

S := \{ x := (x_1, \ldots, x_n) \mid \forall i \in \{1, \ldots, n\}:\ x_i \in \mathbb{N} \cup \{0\} \}    (C.12)

The market price of one unit of asset i which can be bought is p_i. There is an upper limit u_i on the maximum monetary value which is invested into each asset i, i. e.

\forall i \in \{1, \ldots, n\}:\ p_i x_i \leq u_i    (C.13)

Furthermore, there are capital budget thresholds C_0, C_1 ∈ ℝ_+, and the total capital to be invested is required to satisfy the condition

C_0 \leq \sum_{i=1}^{n} p_i x_i \leq C_1    (C.14)

Summarizing the above constraints, we obtain

F := \left\{ (x_1, \ldots, x_n) \in S \;\middle|\; C_0 \leq \sum_{i=1}^{n} p_i x_i \leq C_1 \wedge \forall i \in \{1, \ldots, n\}:\ p_i x_i \leq u_i \right\}    (C.15)

For each asset i ∈ {1, ..., n} there is a variable transaction cost of c_i ∈ ℝ which is proportional per unit of asset i that is to be bought. In addition, there is a fixed transaction fee of F_i which is also due in case asset i is put into the portfolio. These transaction cost considerations yield a different return objective function:

f_{return}(x) := \frac{1}{\sum_{i=1}^{n} p_i x_i} \left( \sum_{i=1}^{n} (\mu_i - c_i)\, p_i x_i - \sum_{i=1}^{n} F_i\, 1_{\{x_i > 0\}} \right)    (C.16)

where the characteristic function is defined as

1_{\{x_i > 0\}} := \begin{cases} 1 & \text{if } x_i > 0, \\ 0 & \text{otherwise.} \end{cases}    (C.17)

The risk function is the variance, as in the standard portfolio selection setting, but written in terms of the integer x_i variables and the prices p_i:

f_{risk}(x) := \frac{1}{\left( \sum_{i=1}^{n} p_i x_i \right)^2} \sum_{i=1}^{n} \sum_{j=1}^{n} p_i x_i\, p_j x_j\, \sigma_{ij}    (C.18)
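Under our reading of Eqs. (C.16) and (C.18), the two objectives can be sketched as follows (an illustration with numpy, not the authors' code); x holds integer unit counts, p prices, mu expected rates of return, c proportional costs, fee the fixed fees F_i and sigma the covariance matrix:

```python
import numpy as np

def f_return_lin(x, mu, p, c, fee):
    # Eq. (C.16): expected return net of proportional costs and fixed
    # fees, normalized by the invested capital sum(p_i * x_i).
    x, mu, p, c, fee = (np.asarray(a, dtype=float) for a in (x, mu, p, c, fee))
    capital = float(p @ x)
    net = float(((mu - c) * p) @ x - fee @ (x > 0))
    return net / capital

def f_risk_lin(x, p, sigma):
    # Eq. (C.18): variance of the monetary positions p_i * x_i,
    # normalized by the squared invested capital.
    w = np.asarray(p, dtype=float) * np.asarray(x, dtype=float)
    return float(w @ np.asarray(sigma, dtype=float) @ w) / float(w.sum()) ** 2
```

Note how a fixed fee F_i can make a small position in asset i unattractive even when its net expected return per unit is positive.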


Lin et al. suggest a hybrid algorithm for the approximation of feasible non-dominated portfolios which combines elements of the MOEA NSGA-II by Deb et al.20 with concepts used in the single-objective EA GENOCOP by Michalewicz & Janikow21. Their hybrid algorithm is structured as follows:

(1) Run NSGA-II to determine a feasible initial population Pop(0) forthe succeeding steps. The NSGA-II run is terminated if all individ-uals are feasible.

(2) Use NSGA-II to determine minx£f{frisk{%)}and rri&xX£F{fretv.rn{x)} and insert the corresponding individualx into Pop(0).

(3) Apply NSGA-II using Pop(0) and stop after pop_max populations have been processed.

The basis is the NSGA-II using integer-valued genes to represent the x_i decision variables. Due to the feasibility constraints, Lin et al. suggest two special preprocessing stages to obtain a completely feasible initial population, particularly including the boundary solution(s) having minimum risk and maximum return objective function value. For clarity, we omit the additional optimization problems which arise in the first two steps and concentrate on the algorithmic elements used by Lin et al. in their NSGA-II implementation.

The standard binary tournament selection implemented in the NSGA-II is performed on each population. To incorporate the level of constraint violation into the standard NSGA-II tournament rule, which allows the comparison between individuals both by domination and by degree of constraint violation, the following natural definition of the degree of constraint violation g(x) is used:

g(x) := \begin{cases} 0 & \text{if } C_0 \le \sum_{i=1}^{n} p_i x_i \le C_1, \\ C_0 - \sum_{i=1}^{n} p_i x_i & \text{if } C_0 > \sum_{i=1}^{n} p_i x_i, \\ \sum_{i=1}^{n} p_i x_i - C_1 & \text{if } \sum_{i=1}^{n} p_i x_i > C_1. \end{cases}   (C.19)
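The piecewise definition (C.19) translates directly into code; the following is a minimal sketch:

```python
def constraint_violation(x, p, C0, C1):
    """Degree of budget-constraint violation g(x), formula (C.19)."""
    invested = sum(pi * xi for pi, xi in zip(p, x))
    if C0 <= invested <= C1:
        return 0.0                # feasible with respect to (C.14)
    if invested < C0:
        return C0 - invested      # too little capital invested
    return invested - C1          # budget exceeded
```

A portfolio investing 30 monetary units against a lower threshold C_0 = 50 thus has violation degree 20, while any portfolio inside the budget window scores 0.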

As a crossover operator, a modified simulated binary crossover (SBX, cf. Deb & Agarwal22) is chosen. It works as follows: Given are two parent individuals x, y \in S which are used to create two offspring individuals a, b \in S. A random scaling factor \beta_i is drawn independently and identically distributed for each gene (for more details, see e.g. Deb5, pp. 109-110), and then the offspring are determined by

a_i := 0.5 \left( (1 + \beta_i) x_i + (1 - \beta_i) y_i \right)   (C.20)


638 F. Schlottmann and D. Seese

b_i := 0.5 \left( (1 - \beta_i) x_i + (1 + \beta_i) y_i \right)   (C.21)

After performing this SBX operation, the offspring individuals do not necessarily consist of integer-valued genes. Thus, Lin et al. use the following strategy for each gene to obtain integer allele values:

a_i := \begin{cases} \lfloor a_i \rfloor \text{ or } \lfloor a_i \rfloor + 1 \text{ chosen randomly} & \text{if } p_i (\lfloor a_i \rfloor + 1) \le U_i, \\ \lfloor a_i \rfloor & \text{otherwise.} \end{cases}   (C.22)

The same applies to b_i after performing the above SBX operator. Using the ideas from GENOCOP, the two offspring are checked for violation of constraint (C.14). If both are feasible, they are accepted; otherwise they are dropped and the whole crossover procedure is repeated using the two given parents x and y.
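The combination of SBX (C.20)/(C.21), integer repair (C.22) and the GENOCOP-style feasibility retry can be sketched as follows. The distribution index eta = 2, the clamping of negative intermediate genes to zero, and the bounded retry count are assumptions for illustration, not specified by Lin et al.

```python
import math
import random

def sbx_beta(eta=2.0):
    """Sample the SBX spread factor (polynomial distribution, assumed index eta)."""
    u = random.random()
    if u <= 0.5:
        return (2.0 * u) ** (1.0 / (eta + 1.0))
    return (1.0 / (2.0 * (1.0 - u))) ** (1.0 / (eta + 1.0))

def repair_gene(a, p_i, U_i):
    """Integer repair as in (C.22): round down, or randomly up if the limit allows."""
    fa = max(0, math.floor(a))          # clamp: negative unit counts are invalid
    if p_i * (fa + 1) <= U_i and random.random() < 0.5:
        return fa + 1
    return fa

def sbx_integer_crossover(x, y, p, U, C0, C1, max_tries=100):
    """(C.20)/(C.21) per gene, integer repair, GENOCOP-style feasibility retry."""
    for _ in range(max_tries):
        a, b = [], []
        for xi, yi, pi, Ui in zip(x, y, p, U):
            beta = sbx_beta()
            a.append(repair_gene(0.5 * ((1 + beta) * xi + (1 - beta) * yi), pi, Ui))
            b.append(repair_gene(0.5 * ((1 - beta) * xi + (1 + beta) * yi), pi, Ui))
        if all(C0 <= sum(pi * zi for pi, zi in zip(p, z)) <= C1 for z in (a, b)):
            return a, b                 # both offspring satisfy (C.14)
    return list(x), list(y)             # fallback: keep the parents
```

The fallback after max_tries unsuccessful attempts is a practical safeguard; Lin et al. simply repeat the crossover until feasible offspring are found.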

Moreover, the adaptation from formula (C.22) is also applied to each gene after performing the mutation operator, for which the standard parameter-based mutation for real-valued genes is used.

Lin et al. point out that they have found empirical evidence that, in the context of their problem, restoring the feasibility of individuals after performing the genetic variation operators is essential for improving the efficiency of the evolutionary search process.

In their empirical test, a sample portfolio consisting of 31 stocks from the Hang Seng index is considered. The constrained portfolio selection problem is solved using the following parameters for the NSGA-II in the algorithmic steps (1) and (2): 200 individuals per population, crossover probability p_cross := 0.95, mutation probability p_mut := 1/(number of genes). In the third run of the NSGA-II, they use p_cross := 0.4, p_mut := 0.2 and pop_max := 3000 populations. Lin et al. compare the final individuals found by a run of the MOEA to the globally optimal solutions for the corresponding unconstrained portfolio selection problem instance without transaction cost and with real-valued variables (i.e. the standard Markowitz problem instance), and the exact solution is approximated well by the MOEA. For the constrained problem, they illustrate the deviation of the approximations found by the MOEA for the Hang Seng instance from the globally optimal solutions of the corresponding unconstrained problem, which is due to the integrality constraints and the transaction cost.

Summarizing the study, Lin et al. have considered a variant of the Markowitz portfolio selection problem where additional constraints raised the complexity such that a MOEA approach seems reasonable. They constructed a hybrid algorithm which combined ideas from the single-objective


GENOCOP algorithm to repair infeasible solutions with the NSGA-II algorithmic framework.

26.3.4. Fieldsend & Singh

The article by Fieldsend & Singh23 transfers the concept of finding a whole Pareto front of non-dominated individuals concerning a risk and a return objective function to the prediction of time series by Artificial Neural Networks (ANNs). As the main focus in this volume is on MOEAs, we refer the reader to Schlottmann & Seese3 for a short introduction to ANNs as well as a recent overview of financial applications of the latter methodology and further references.

For our considerations below, it is sufficient to know that an ANN is used as a non-linear regression tool by Fieldsend & Singh to perform asset price time series prediction: Before asset trading commences at day t - 1 (t \in \mathbb{N}), the following data is considered for a single risky asset: the unknown opening asset price p_open(t - 1) at day t - 1 and the unknown highest daily asset price which will occur at day t, denoted by p_high(t). The goal is to predict

y(t) := \frac{p_{high}(t)}{c \cdot p_{open}(t-1)}   (C.23)

where c := 0.993 is a constant which is derived from a hypothetical trading strategy for the asset incorporating transaction cost. This is described in more detail later. Since the prediction for y(t) is made before knowing the realizations of p_open(t - 1) and p_high(t), the a priori prediction \hat{y}(t) does not necessarily match the true ex post outcome y(t) at day t. This yields a forecast error which is measured by the commonly used Root Mean Squared Error (RMSE) over k \in \mathbb{N} observations:

RMSE := \sqrt{ \frac{1}{k} \sum_{t=1}^{k} \left( \hat{y}(t) - y(t) \right)^2 }   (C.24)

To obtain a prediction \hat{y}(t), the known data from the previous ten trading days before t - 1 is used to calculate y(t-2), y(t-3), ..., y(t-11), and these values together with the five previously made predictions \hat{y}(t-1), \hat{y}(t-2), ..., \hat{y}(t-5) are the input for the ANN which performs a non-linear regression to predict the dependent variable y(t).
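The input construction and the error measure (C.24) can be sketched compactly; the list layout of the two history buffers is an assumed convention for illustration.

```python
import math

def make_features(y_hist, yhat_hist):
    """Input vector for the ANN at day t: the ten past targets
    y(t-2), ..., y(t-11) and the five past predictions yhat(t-1), ..., yhat(t-5).
    y_hist[k] is assumed to hold y(t-2-k), yhat_hist[k] to hold yhat(t-1-k)."""
    return list(y_hist[:10]) + list(yhat_hist[:5])

def rmse(y_true, y_pred):
    """Root Mean Squared Error over k observations, formula (C.24)."""
    k = len(y_true)
    return math.sqrt(sum((yt - yp) ** 2 for yt, yp in zip(y_true, y_pred)) / k)
```

The resulting feature vector has 15 components, so the ANN regresses y(t) on a fixed-size sliding window of targets and its own recent predictions.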

The key idea of Fieldsend & Singh is to consider two separate functions describing the prediction error: The first function f_return measures the return of a trading strategy that buys and sells the asset depending on the



value of \hat{y}(t), the last asset price movements on day t, and the riskless interest rate for a bank deposit which is considered as an investment alternative to the risky asset. The calculation of the f_return function value can be summarized as follows: The risky asset is bought if the forecast of the next day's highest asset price is 1.7% or more above 99.3% of today's opening price and if today's lowest price is \le 99.3% of today's opening price. If \hat{y}(t+1) > 1.017, then the asset is sold as soon as the market price is 1.7% above the price paid for buying the asset, thus realizing a profit. Otherwise, the asset is sold at the end of the same trading day it was bought, and the outcome of the trade can be either profit or loss. If no trade is made, the risk-free interest rate is earned by the investor. The function value of f_return represents the outcome of the trading strategy depending on the prediction \hat{y}(t) and on the ex post realized asset prices.

The second function f_risk covers the prediction risk and is identical to the RMSE from formula (C.24). Based on the two objective functions, Fieldsend & Singh are interested in finding a front of non-dominated ANN models which predict the asset price time series. In their study, they highlight the analogy of their considerations to the Capital Asset Pricing Model (CAPM) proposed independently by Sharpe, Lintner and Mossin (see e.g. Sharpe24), which is an extension of the Markowitz work described in section 26.3.1. The bottom line is that they use a Markowitz-like definition of risk and return objective functions and search for a non-dominated Pareto front of non-linear regression models rather than searching for a capital market equilibrium as stated by the CAPM. Thus, their study fits exactly into the picture from section 26.3.1.

The non-dominated solutions concerning the two objective functions are approximated by a combination of ANN methodology and the Strength Pareto Evolutionary Algorithm (SPEA) which is described in Zitzler & Thiele25. Fieldsend & Singh report the results of an empirical study with the following settings for SPEA: The search population (standard population) contains 80 individuals (ANNs), and there are up to 20 individuals chosen from a secondary unconstrained elite population for performing SPEA's binary tournament selection within each generation. A total of pop_max := 2000 population steps is performed.

The standard one-point crossover is chosen as the first variation operator, applied with probability p_cross := 0.8 to two selected individuals. In addition, the mutation operator is performed by adding the product Z_mut of three independent random variables Z_1 \in [0,1], Z_2 \in [0,1] and Z_3 \in \mathbb{R} to the mutated gene such that Z_mut := Z_1 \cdot Z_2 \cdot Z_3, and Z_1, Z_2 are distributed


uniformly, while Z_3 is normally distributed with zero mean and variance 10%. The random variable Z_mut is symmetric and has zero mean as well as a high degree of kurtosis (\approx 10). The mutation probability is 10%.
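This mutation perturbation is simple to sample; the following sketch mirrors the description above (variance 10% means a standard deviation of sqrt(0.1) for Z_3; the function names are illustrative):

```python
import random

def z_mut(sigma3=0.1 ** 0.5):
    """Mutation perturbation Z_mut = Z1 * Z2 * Z3: Z1, Z2 uniform on [0, 1],
    Z3 normal with zero mean and variance 10% (std = sqrt(0.1))."""
    z1, z2 = random.random(), random.random()
    z3 = random.gauss(0.0, sigma3)
    return z1 * z2 * z3

def mutate(genes, p_mut=0.1):
    """Add an independent Z_mut draw to each gene with probability 10%."""
    return [g + z_mut() if random.random() < p_mut else g for g in genes]
```

Because Z_3 is symmetric around zero while Z_1 and Z_2 merely rescale it, the product inherits the zero mean and symmetry but concentrates mass near zero, which produces the heavy-tailed (high-kurtosis) perturbation described above.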

The data used in the empirical study is the Dow Jones Industrial Average index within 2500 trading days between 28/2/1986 and 3/1/2000. This data is divided into 25 time windows, each of which consists of the first 1000 trading days that are used in the ANN training (i.e. for calibrating parameters in the non-linear regression) and of the following 100 trading days used for an out-of-sample prediction performance test. The objective function values are calculated separately on the training and on the test data for each of the 25 time windows.

Fieldsend & Singh compare the resulting profit of the hybrid ANN-MOEA algorithm to the profit of a naive random walk prediction (\hat{y}(t) = y(t-2)) and to the compounded daily asset return which reflects the outcome of a buy-and-hold strategy for the asset over the chosen time period. They conclude that the respective hybrid algorithm's prediction model outperforms the buy-and-hold strategy in terms of profit on the given data set if one chooses e.g. the model from the approximated Pareto front which has the maximum objective function value of f_return. The same holds for the model from the approximated Pareto front which has the minimum objective function value of f_risk, and this also applies to the 'middle' model, which is the model in the middle of the respective approximated Pareto front. An interesting additional observation is that while both the first and the last mentioned model also outperform the naive random walk prediction on the data set, the prediction risk minimizing model does not. This means that strictly minimizing f_risk does not necessarily lead to excess values of f_return compared to more risky strategies, which is an analogy to the views of neo-classical capital market theory (of course in a different context here, since the prediction's risk and return are considered).

Summarizing their study, Fieldsend & Singh apply a hybrid algorithm to an asset price time series. The goal is to predict the asset price movements by different non-linear ANN regression models which are non-dominated concerning the return resulting from the prediction as well as the prediction risk. As the shape of the set of non-dominated solutions is a priori unknown and depends heavily on the empirical data which changes over time (cf. the extreme movements in stock prices which happened recently), a MOEA-based approach (here: SPEA) is appropriate. A heuristic approach is also reasonable here due to the results of Aspnes et al. which were mentioned in section 26.2: In the real-world stock market there are many traders having


many different strategies, thus stock market prediction is a complex task.

26.3.5. Schlottmann & Seese

In Schlottmann & Seese26,27,28,29 we have developed a hybrid algorithm for solving a portfolio selection problem which is relevant to real-world banking. It is substantially different from the original Markowitz problem: Given is a bank which has a fixed supervisory capital budget C \in \mathbb{R}_+. This is an upper limit for investments into a portfolio consisting of a subset of n given assets (e.g. n loans to be given to different customers of the bank), each of which is subject to the risk of default (credit risk). Besides an expected rate of return \mu_i \in \mathbb{R} similar to the Markowitz problem setting, each asset i also has an a priori expected default probability p_i \in (0,1) and a net exposure e_i \in \mathbb{R}_+ within a fixed risk horizon [0, T]. The expected rate of return \mu_i is not adjusted for the default risk (this will be addressed later).

If asset i defaults before time T, the bank will lose the amount e_i, and this loss event is expected to happen with probability p_i. Otherwise, if asset i does not enter default within the period of time [0, T], the bank's loss from this asset will be equal to 0.

The search space of potential investment decisions for the bank without respecting the capital budget C is given by

S := \{ x := (x_1, \ldots, x_n) \mid \forall i \in \{1, \ldots, n\} : x_i \in \{0, e_i\} \}.   (C.25)

Thus, we consider binary-style decision variables, since the bank has to decide whether the whole net exposure is to be held in the portfolio. Furthermore, if and only if asset i is held in the portfolio, the bank has to allocate a supervisory capital amount of w_i \cdot e_i (w_i \in \mathbb{R}_+ is the supervisory capital weight) from its scarce resource C. This implies the constrained search space

F := \{ x \in S \mid \sum_{i=1}^{n} w_i x_i \le C \}.   (C.26)

In contrast to the standard Markowitz setting, the return objective function has to be adjusted for default risk, and we consider a monetary objective function value rather than a relative (percentage) value due to real-world banking objectives:

f_{return}(x) := \sum_{i=1}^{n} \mu_i x_i - \sum_{i=1}^{n} p_i x_i = \sum_{i=1}^{n} (\mu_i - p_i)\, x_i   (C.27)


The objective function measures the net expected return of the portfolio x, which is adjusted for the whole expected loss from x, i.e. \sum_{i=1}^{n} p_i x_i.

Since the portfolio loss due to defaults has a highly skewed and asymmetric probability distribution, the variance risk measure used in the Markowitz setting is not appropriate here. Instead, in many banks, the following Credit-Value-at-Risk measure is used to quantify the unexpected loss for the bank due to the default risk: For a given portfolio structure x, the Credit-Value-at-Risk (CVaR) at the arbitrary, but fixed, confidence level \alpha \in (0.5, 1) is obtained by calculating

CVaR(x) := \psi^{-1}(\alpha) - \sum_{i=1}^{n} p_i x_i   (C.28)

where \psi^{-1}(\alpha) is the \alpha-percentile (inverse) of the cumulative distribution function of aggregated losses calculated from the portfolio x from given parameters (i.e. e_i, p_i and further parameters not relevant here; see e.g. Schlottmann & Seese29 for more details).

To account for the real-world objective of banks, we set f_risk(x) := CVaR(x). From a computational perspective, this immediately raises problems for standard non-linear optimization methods, since it can be shown easily that CVaR(x) is a non-convex function (cf. e.g. Schlottmann30, p. 127 f.). Moreover, in conjunction with the binary-style decision variables and the knapsack-like constraint (cf. formula (C.26)), one obtains a discrete constrained search space which contains many local optima and two conflicting objective functions.
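The two objectives and the capital constraint can be sketched as follows. Note the loud caveat: the Monte Carlo percentile below assumes independent defaults and is only a stand-in for \psi^{-1}(\alpha); the actual model uses the analytic CreditRisk+ loss distribution, which also captures further parameters. All function names and input data are illustrative.

```python
import numpy as np

def f_return_credit(x, mu, p):
    """Net expected return, formula (C.27): sum of (mu_i - p_i) * x_i, x_i in {0, e_i}."""
    return float(np.dot(mu - p, x))

def is_feasible_credit(x, w, C):
    """Supervisory capital constraint (C.26): sum of w_i * x_i <= C."""
    return float(np.dot(w, x)) <= C

def cvar_mc(x, p, alpha=0.99, n_sims=100_000, rng=None):
    """Monte Carlo stand-in for (C.28): alpha-percentile of the simulated
    aggregate loss minus the expected loss. Independent defaults are assumed
    here; the chapter's model uses the CreditRisk+ distribution instead."""
    rng = rng or np.random.default_rng(0)
    defaults = rng.random((n_sims, len(x))) < p          # Bernoulli default indicators
    losses = defaults.astype(float) @ x                  # aggregate loss per scenario
    return float(np.quantile(losses, alpha) - np.dot(p, x))
```

For a two-loan portfolio with exposures (100, 200), returns (10%, 8%) and default probabilities (2%, 1%), the net expected return (C.27) is 8 + 14 = 22 monetary units.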

Hence, a MOEA-based approach is appropriate here. The decision variables x_i are concatenated to form a gene string consisting of real-valued alleles. Of course, the binary-style decision variables could also be represented by binary digits, but for the sake of simplicity, we assume real-valued genes coding the decision variables in the following text. In the genetic representation, the decision variables are ordered such that the variables for assets which are highly correlated concerning default events are located very close to each other; see Schlottmann & Seese26 for more details and an example. The main goal of this permutation of the decision variables in the gene string is a better performance of the chosen one-point crossover variation operator, which has a lower probability of destroying good partial solutions for the highly correlated decision variables if there are very few cut positions between them.

We combine elements from different MOEAs in our approach with a gradient-based search method. The basic algorithm is similar to the NSGA-II by Deb et al.20 for binary-style decision variables, but we added an external elite population which contains the best non-dominated feasible individuals found so far at each population step, in order to obtain more approximation solutions after termination of the algorithm without the need to raise the normal population size. The individuals are selected from the population for reproduction in a standard binary tournament comparable to NSGA-II. We use the standard one-point crossover with probability p_cross := 0.9, and the standard mutation operator for binary-style variables (i.e. x_i = e_i is mutated to x_i = 0 and vice versa) with a mutation probability of p_mut := 1/n per gene.

In addition to these standard MOEA elements, we apply a third local search variation operator with probability 0 < p_local \le 0.2 (this parameter choice will be discussed later) to each selected individual after finishing the crossover and mutation operation:

If x \notin F Then Direction := -1
Else Choose Direction \in \{1, -1\} with uniform probability 0.5
\forall i \in \{1, \ldots, n\}: \tilde{x}_i := x_i
Step := 0
Do
    \forall i \in \{1, \ldots, n\}: x_i := \tilde{x}_i
    f_return^(old) := f_return(\tilde{x}),  f_risk^(old) := f_risk(\tilde{x})
    For each \tilde{x}_j calculate the partial derivative d_j := \partial/\partial \tilde{x}_j ( f_return(\tilde{x}) / f_risk(\tilde{x}) )
    If Direction = -1 Then
        Choose the minimal gradient component i := argmin { d_j | \tilde{x}_j > 0 }
        \tilde{x}_i := 0
    Else
        Choose the maximal gradient component i := argmax { d_j | \tilde{x}_j = 0 }
        \tilde{x}_i := e_i
    End If
    f_return^(new) := f_return(\tilde{x}),  f_risk^(new) := f_risk(\tilde{x})
    Step := Step + 1
While (Step < Step_max) \wedge (\exists i: \tilde{x}_i > 0) \wedge (\exists j: \tilde{x}_j = 0) \wedge
    (\tilde{x} \notin current population) \wedge (\tilde{x} \notin elite population) \wedge
    [ (Direction = -1 \wedge \tilde{x} \notin F) \vee
      ((Direction = 1 \wedge \tilde{x} \in F) \wedge (f_return^(new) > f_return^(old) \vee f_risk^(new) < f_risk^(old))) ]
\forall i \in \{1, \ldots, n\}: x_i := \tilde{x}_i


At first, the feasibility of the current individual x is checked: In the case of an infeasible individual, the knapsack constraint (cf. formula (C.26)) is violated; thus, the local search procedure has to remove assets from x to move into the direction of F. Otherwise, the Direction of the local search is selected randomly.

Within each iteration of the Do loop, the partial derivatives of the quotient f_return(x)/f_risk(x) are exploited to obtain the decision variable which is to be modified. This quotient is known in finance applications as a Risk-Adjusted Performance Measure, and maximizing this measure implies maximizing f_return and minimizing f_risk. We use a computationally efficient credit portfolio risk measurement model (CreditRisk+ from Credit Suisse Financial Products31) which yields a fast approximation of all partial derivatives d_1, ..., d_n for a given portfolio x within a total computation time of only O(n). This saves time, since the exploration of the neighbourhood of a solution x that can be obtained by evaluating all individuals which differ from x in only one allele value would require O(n^2) computational steps in our setting. Of course, one has to point out that the exploitation of the gradient is a heuristic approach here, since we calculate the change of f_return(x)/f_risk(x) for infinitesimally small changes of the decision variables and actually change the decision variables in a large discrete step afterwards.

The local search operation is iterated at most Step_max \in \mathbb{N} times, and based on empirical tests we recommend a value of 0 < Step_max \le 4 depending on the problem instance (again, this parameter setting will be discussed later). Moreover, the local search iteration is terminated if no decision variable is left to be changed, if an a priori infeasible solution has become feasible, or if an a priori feasible solution would either become infeasible or could not be improved in at least one of the two objectives.

Besides the fact that the gradient-based local search is rather heuristic, the empirical results from using this problem-specific variation operator are striking: In our empirical studies, we applied the hybrid MOEA approach to several portfolios, with the number of assets ranging within 20 \le n \le 386. The portfolios correspond to typical real-world loan portfolios of German banks. To obtain a true judgement of the hybrid algorithm's performance, we compared the approximated feasible non-dominated solutions to the globally optimal solutions provided by a complete enumeration of the search space for small problem instances. In these comparisons, the hybrid algorithm found a well-converged approximation of the globally optimal solutions within seconds or a few minutes, while the enumeration took hours. Moreover, we compared the hybrid MOEA ceteris paribus to


its MOEA counterpart without the additional local search variation operator (p_local = 0) and performed 50 independent runs for the respective problem instance to obtain e.g. average performance measures for each algorithm. A summarizing observation is that the hybridization yielded an average improvement of about 17% to 95% for the set coverage metric criterion (cf. Zitzler25), which measures the dominance between solutions in two approximation sets in relation to their cardinality, and the maximum spread of solutions in the objective function space was raised by about 1% to 6% on average. Moreover, the standard deviation of the performance measures over the 50 respective runs was lower for the hybrid approach.
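For readers unfamiliar with the set coverage metric, a minimal sketch for (return, risk) objective pairs follows; the weak-dominance convention is the usual one for this metric, but the encoding of solutions as tuples is an assumption for illustration.

```python
def weakly_dominates(a, b):
    """a weakly dominates b: return (maximised) no worse and risk (minimised) no worse."""
    (ret_a, risk_a), (ret_b, risk_b) = a, b
    return ret_a >= ret_b and risk_a <= risk_b

def set_coverage(A, B):
    """Coverage C(A, B) (cf. Zitzler): fraction of B weakly dominated by some point of A."""
    return sum(any(weakly_dominates(a, b) for a in A) for b in B) / len(B)
```

Note that C(A, B) and C(B, A) are not complementary in general, so both directions are reported when comparing two approximation sets.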

For setting the parameters p_local and Step_max, we recommend the following guideline: If the problem is very discrete in its nature (i.e. n \approx 30 or lower), p_local should be chosen very low (e.g. p_local = 0.01), since the gradient is a rather imprecise measure for the actual changes in objective function values depending on variations of the decision variables (this is also indicated by empirical results). For higher dimensionality, the problem typically gets smoother in the objective function values even if the decision variables are binary; thus, the gradient-based local search should be applied to more individuals in the population. For such instances, we obtained good results with p_local values up to 0.2. The considerations for Step_max are similar, but in general this parameter should be kept small compared to n, since otherwise the local search operator might run into the same local optima again and again, even if it starts from portfolios which are far away from each other both in the objective function and in the decision variable space.

Our approach can be summarized as a problem-specific MOEA hybridized with gradient-based local search which is applied to a discrete non-convex constrained multi-criteria problem from the context of portfolio credit risk management. It contains elements and ideas from different MOEAs as well as from classical quantitative methods in finance.

26.4. Conclusion and Future Research Directions

In the previous sections we discussed several successful applications of MOEAs to portfolio management problems which are very common in finance, particularly in real-world asset management and trading. All the approaches discussed so far in this chapter incorporate problem-specific knowledge - besides the standard MOEA elements - to obtain a more powerful algorithm compared to a straightforward MOEA application.

From a theoretical point of view this is not surprising: It is a well known


result from complexity theory that there is no uniform problem solving approach which can be successfully applied to all possible algorithmic problems while guaranteeing an efficient solution. Hence, MOEAs cannot be such a tool either. Also from a practitioner's perspective, one cannot expect that a simple straightforward application of the basic ideas underlying MOEAs necessarily leads to success. Instead, it is highly recommended to hybridize MOEAs with other search algorithms and problem-specific methods, since both the multi-objective as well as the evolutionary approach offer high potential for a successful hybridization. The applications summarized in this chapter are good examples of such algorithms. We refer the reader also to Takada et al.32 and Mukerjee et al.33 for further interesting results in the context of portfolio selection.

Of course, reasonable applications of MOEAs to financial problems assume that the respective problem under consideration is actually multi-dimensional in its objectives. At first, this might seem to be a trivial statement, but particularly in economics and finance, this requirement is contrary to the tendency to build models as simple as possible. As a consequence of this tendency, many financial problems are modelled in a single-objective way even if they are naturally multi-objective. The portfolio selection problem defined in section 26.3.1 is a typical example.

Thus, to extend the focus of MOEAs to a broader range of financial applications beyond portfolio selection problems - this is what we consider to be mission-critical for a better acceptance of MOEAs from the finance researchers' point of view - one has to look for problems which are not sufficiently solved by considering simple (i.e. single-objective) models and/or standard algorithms. Potential candidates are particularly those problems which require simultaneous calibration of several input parameters or multiple model outputs, e.g.

• the calibration of a multi-factor model for the term structure of interest rates to empirical data given for all maturities (e.g. 1 year, 2 years, ..., 10 years),

• the calibration of the parameters of a non-linear, non-smooth valuation function for an exotic option contract to given empirical asset price histories,

• building valuation models from basic building blocks for complex structured finance products for which a closed-form solution has not been found yet.

Such problems are really multi-dimensional in the decision variables and the


objective functions. Moreover, they typically incorporate constraints, non-linear and/or non-convex functions as well as other issues which cannot be handled by standard methods. Hence, more flexible solution methods like MOEAs, which do not assume strict mathematical properties, offer interesting potential not only to solve a single given instance of a problem, but also to enhance our understanding of the problem's structure. For instance, if we obtained a whole set of Pareto-efficient solutions to a difficult parameter calibration problem, we might better understand the dependencies and trade-offs between the parameters and the objective functions. And this is very valuable support for the finance model architect to design better models.

Due to the flexibility of MOEAs, one can potentially build more realistic financial models or, respectively, models which are more suitable for practical applications compared to classroom-tailored approaches that are constrained to analytical solution procedures (cf. similar considerations in Goldberg34). For instance, compare our approach from section 26.3.5 to the standard Markowitz problem. While our MOEA-based approach allows the use of the Credit-Value-at-Risk, which is a very common risk measure in current practice, the latter approach based on the portfolio variance is not suitable in the context of the asymmetric asset return probability distributions occurring in credit risk.

Identifying appropriate problems, building corresponding applications and finding the added value beyond solving the respective problem instance in the above sense will be a key challenge both for MOEA researchers and the finance community in the near future.

Moreover, we think that it is mission-critical for the success of MOEAs in the long run that their theoretical foundation can be provably enhanced. Cotta and Moscato35 point out some central questions for general evolutionary computation which of course also apply to MOEAs in finance, so they should be stated in our context, too:

• Identify multi-objective NP-complete finance problems for which MOEAs have proved not to be competitive against the best other heuristic or approximation algorithm known for those problems.

• Identify which financial problems can be approached using a MOEA paradigm and identify the reasons.

• For the problems matching the previously listed two items, it will be important to find links to the theory of computational complexity and to the complexity classes these problems belong to. Of special


interest here are approximability and, in particular, parameterized complexity (see Downey & Fellows36 and Fellows37).

It is very important for real-world applications to know some conditions under which MOEAs can be applied successfully. Especially in the area of finance, where often deep, mathematically founded theories are applied, it is important to know why a specific heuristic method succeeds. Furthermore, it is important to prove performance guarantees. An important question in this context is how the success depends on structural parameters.

In Seese & Schlottmann12,13,38,39 we discussed a structural criterion implying high complexity of algorithmic problems. This criterion, denoted as the ABC-criterion, is also relevant in the area of finance and especially risk management, and we think that it could be of interest to investigate how the structural parameters identified in our studies - size of embeddable grids, homogeneity and flow of information - could influence the performance of MOEAs for different problems.

We hope that the interaction between the MOEA community and finance researchers as well as professionals will be intensified by this survey on selected financial MOEA applications and by the potential future research activities pointed out above. Hopefully, more applications will be developed in a broader range of financial problem contexts compared to the nevertheless promising results obtained so far.

26.5. Acknowledgement

The authors would like to thank GILLARDON AG financial software for partial support of their work. Nevertheless, the views expressed in this chapter reflect the personal opinion of the authors and are neither official statements of GILLARDON AG financial software nor of its partners or its clients.

References

1. David Fogel and Zbigniew Michalewicz. How to solve it: Modern heuristics. Springer, Heidelberg, 2000.

2. J. Hromkovic. Algorithmics for hard problems. Springer, Heidelberg, 2001.

3. Frank Schlottmann and Detlef Seese. Modern heuristics for finance problems: a survey of selected methods and applications. In S. Rachev and C. Marinelli, editors, Handbook on Numerical Methods in Finance. Springer, Berlin, 2004.

4. S. Chen. Evolutionary computation in economics and finance. Springer, Heidelberg, 2002.


650 F. Schlottmann and D. Seese

5. Kalyanmoy Deb. Multi-objective optimisation using evolutionary algorithms. John Wiley & Sons, Chichester, 2001.

6. C. Coello, D. Van Veldhuizen, and G. Lamont. Evolutionary algorithms for solving multi-objective problems. Kluwer, New York, 2002.

7. Andrzej Osyczka. Evolutionary algorithms for single and multicriteria design optimization. Physica, Heidelberg, 2002.

8. J. Aspnes, D. Fischer, M. Fischer, M. Kao, and A. Kumar. Towards understanding the predictability of stock markets. In Proceedings of the 12th Annual ACM-SIAM Symposium on Discrete Algorithms, pages 745-754, 2001.

9. Michael Garey and David Johnson. Computers and intractability. W. H. Freeman & Company, New York, 1979.

10. Christos Papadimitriou. Computational complexity. Addison-Wesley, Reading, 1994.

11. H. Kellerer, U. Pferschy, and D. Pisinger. Knapsack problems. Springer, Heidelberg, 2004.

12. Detlef Seese and Frank Schlottmann. The building blocks of complexity: a unified criterion and selected problems in economics and finance. Sydney Financial Mathematics Workshop, 2002. http://www.qgroup.org.au/SFMW.

13. Detlef Seese and Frank Schlottmann. The building blocks of complexity: a unified criterion and selected applications in risk management. Complexity 2003: Complex behaviour in economics, Aix-en-Provence, 2003. http://zai.ini.unizh.ch/www_complexity2003/doc/Paper_Seese.pdf.

14. Giorgio Ausiello, Pierluigi Crescenzi, Giorgio Gambosi, and Viggo Kann. Complexity and approximation. Springer, Heidelberg, 1999.

15. Harry Markowitz. Portfolio selection. Journal of Finance, 7:77ff, 1952.

16. Ganesh Vedarajan, Louis Chan, and David Goldberg. Investment portfolio optimization using genetic algorithms. In John Koza, editor, Late Breaking Papers of the Genetic Programming 1997 Conference, pages 255-263. Stanford University, California, 1997.

17. N. Srinivas and K. Deb. Multiobjective function optimization using nondominated sorting genetic algorithms. Evolutionary Computation, 2(3):221-248, 1995.

18. J. Bean. Genetic algorithms and random keys for sequencing and optimization. ORSA Journal on Computing, 6:154-160, 1994.

19. Dan Lin, Shouyang Wang, and Hong Yan. A multiobjective genetic algorithm for portfolio selection. Working paper, Institute of Systems Science, Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing, China, 2001.

20. Kalyanmoy Deb, Samir Agrawal, Amrit Pratap, and T. Meyarivan. A fast elitist non-dominated sorting genetic algorithm for multi-objective optimisation: NSGA-II. In M. Schoenauer, K. Deb, G. Rudolph, X. Yao, E. Lutton, J. Merelo, and H. Schwefel, editors, Parallel Problem Solving from Nature, LNCS 1917, pages 849-858. Springer, Berlin, 2000.

21. Z. Michalewicz and C. Janikow. Genocop: a genetic algorithm for numerical optimization problems with linear constraints. Communications of the ACM, (12):118, 1996.


22. K. Deb and R. Agarwal. Simulated binary crossover for continuous search space. Complex Systems, 9(2):115-148, 1995.

23. Jonathan Fieldsend and Sameer Singh. Pareto multi-objective non-linear regression modelling to aid CAPM analogous forecasting. In Proceedings of the IEEE/INNS Joint International Conference on Neural Networks (IJCNN'02), World Congress on Computational Intelligence, Vol. 1, pages 388-393, Honolulu, Hawaii, 2002.

24. William Sharpe. Capital asset prices: a theory of market equilibrium under conditions of risk. Journal of Finance, 19:425-442, 1964.

25. Eckart Zitzler and Lothar Thiele. An evolutionary algorithm for multiobjective optimization: the strength Pareto approach. Technical Report 43, Eidgenössische Technische Hochschule, Zürich, 1998.

26. Frank Schlottmann and Detlef Seese. A hybrid genetic-quantitative method for risk-return optimisation of credit portfolios. In Carl Chiarella and Eckhard Platen, editors, Quantitative Methods in Finance 2001 Conference abstracts, page 55. University of Technology, Sydney, 2001. Full paper: http://www.business.uts.edu.au/finance/resources/qmf2001/Schlottmann_F.pdf.

27. Frank Schlottmann and Detlef Seese. Hybrid multi-objective evolutionary computation of constrained downside risk-return efficient sets for credit portfolios. 8th International Conference of the Society for Computational Economics: Computing in Economics and Finance, Aix-en-Provence, 2002. http://www.cepremap.ens.fr/sce2002/papers/paper78.pdf.

28. Frank Schlottmann and Detlef Seese. Finding constrained downside risk-return efficient credit portfolio structures using hybrid multi-objective evolutionary computation. In G. Bol, G. Nakhaeizadeh, S. Rachev, T. Ridder, and K. Vollmer, editors, Credit Risk, pages 231-266. Physica, Heidelberg, 2003.

29. Frank Schlottmann and Detlef Seese. A hybrid heuristic approach to discrete portfolio optimization. Computational Statistics and Data Analysis, 2004. To appear.

30. Frank Schlottmann. Komplexität und hybride quantitativ-evolutionäre Ansätze im Kreditportfoliorisikomanagement (in German). PhD thesis, University of Karlsruhe, Karlsruhe, 2003.

31. Credit Suisse Financial Products. CreditRisk+(tm), 1997. http://www.csfp.co.uk/creditrisk/assets/creditrisk.pdf.

32. Y. Takada, M. Yamamura, and S. Kobayashi. An approach to portfolio selection problems using multi-objective genetic algorithms. In Proceedings of the 23rd Symposium on Intelligent Systems, pages 103-108, 1996.

33. A. Mukerjee, R. Biswas, K. Deb, and A. Mathur. Multi-objective evolutionary algorithm for the risk return trade-off in bank loan management. Technical Report 2001005, KanGAL, Kanpur, India, 2002.

34. David Goldberg. Genetic and evolutionary algorithms in the real world. Technical Report 99013, Department of General Engineering, University of Illinois, Urbana, March 1999.

35. Carlos Cotta and Pablo Moscato. Evolutionary computation: challenges and duties, pages 1-15, 2002. Preprint of Dept. Lenguajes y Ciencias de la Computación, Universidad de Málaga, and The School of Electrical Engineering and Computer Science, University of Newcastle.

36. Rod Downey and Mike Fellows. Parameterized complexity. Springer, Berlin, 1999.

37. Mike Fellows. Parameterized complexity: the main ideas and connections to practical computing. Electronic Notes in Theoretical Computer Science, 61, 2002. http://www.elsevier.nl/locate/entcs/volume61.html.

38. Detlef Seese and Frank Schlottmann. Large grids and local information flow as a reason for high complexity. In Gerry Frizelle and Huw Richards, editors, Tackling Industrial Complexity: The Ideas That Make a Difference, Proceedings of the 2002 Conference of the Manufacturing Complexity Network, pages 193-207. University of Cambridge, Cambridge, UK, 2002.

39. Detlef Seese and Frank Schlottmann. Structural reasons for high complexity: a survey on results and problems. Unpublished manuscript, pages 1-160, University of Karlsruhe, 2003.


CHAPTER 27

EVOLUTIONARY MULTI-OBJECTIVE OPTIMIZATION APPROACH TO CONSTRUCTING NEURAL NETWORK ENSEMBLES FOR REGRESSION

Yaochu Jin, Tatsuya Okabe and Bernhard Sendhoff

Honda Research Institute Europe, Carl-Legien-Str. 30, 63073 Offenbach, Germany

E-mail: [email protected]

Neural network ensembles have been shown to be very effective in improving the performance of neural networks used for classification or regression. One essential issue in constructing neural network ensembles is to ensure that the ensemble members have sufficient diversity in their behavior. This chapter suggests a multi-objective approach to generating diverse neural network ensemble members. A genetic algorithm is used to evolve both the weights and the structure of the neural networks. In addition, the Rprop learning algorithm is employed for efficient life-time learning of the weights. Different complexity criteria, such as the number of connections, the sum of absolute weights and the sum of squared weights, have been adopted as an additional objective besides the approximation accuracy. Ensembles are constructed using the whole set or a subset of the found non-dominated solutions. Various methods for selecting a subset from the found non-dominated solutions are compared. The proposed multi-objective approach is compared to the random approach on two regression test problems. It is found that using a neural network ensemble can significantly improve the regression accuracy, especially when a single network is not able to predict reliably. In this case, the multi-objective approach is effective in finding diverse neural networks for constructing neural network ensembles.

27.1. Introduction

653

654 Y. Jin et al

It has been shown that neural network ensembles are able to improve the generalization performance both for classification and regression17,3. The benefit of using a neural network ensemble originates from the diversity of the behavior of the ensemble members. Basically, diversity of ensemble members can be enhanced by using various initial random weights, varying the network architecture, employing different training algorithms or supplying different training data20. In some cases, it is also possible to increase network diversity by generating training data from different sources. For example, the geometry of an object can be represented by parametric or non-parametric methods. Thus, different sources of training data can be obtained for describing certain performance of the same object.

Compared to the above-mentioned methods that achieve diversity implicitly, methods for explicitly encouraging diversity among ensemble members have been widely studied in recent years. Measures for increasing diversity include a diversity index16, the degree of decorrelation19, or the degree of negative correlation14,15 between the outputs of the candidate networks.

Individual neural networks in an ensemble can be trained independently, sequentially or simultaneously11. In the first case, neural networks are generated separately and no interaction between the networks is taken into account in training. In the second case, neural networks are generated sequentially; however, the correlation between the current network and the existing ones is also considered to encourage diversity. In the third case, neural networks are trained simultaneously, not only minimizing the approximation error, but also encouraging diversity among individual networks. Obviously, in the latter two approaches, diversity is taken into account explicitly. It is believed that one possible disadvantage of simultaneous training is that the networks in the population could be competitive11.

In training single neural networks, regularization techniques have been widely employed to improve the generalization performance of neural networks3. The general idea is to include an additional term, often known as the regularization term, in the cost function of the learning algorithm to avoid overfitting the training data. Actually, most diversity-based methods for generating ensembles can also be seen as a kind of regularization technique.

Evolutionary Multi-Objective Optimization Approach to Constructing Ensembles 655

From the multi-objective optimization point of view, adding a regularization term to the cost function is equivalent to combining two objectives using a weighted aggregation formulation. Thus, it is straightforward to re-formulate the regularization techniques as multi-objective optimization problems. Such ideas have been reported4. In that work, a variation of the ε-constraint algorithm was adopted to obtain one single Pareto-optimal solution that simultaneously minimizes the training error and the norm of the weights. Similar work has also been reported1, where a multi-objective evolutionary algorithm is used to minimize the approximation error and the number of hidden nodes of the neural network. Again, only the solution with the minimal approximation error has been selected for final use. In addition, multi-objective optimization has been employed to evolve neural network modules in a cooperative co-evolution framework to increase the diversity of the modules7.

This chapter presents a method for generating a set of Pareto-optimal neural networks for constructing neural network ensembles. The genetic algorithm with Lamarckian inheritance for evolving neural networks9 is adapted to the multi-objective optimization purpose. To this end, the elitist non-dominated sorting and the crowded tournament selection suggested in6 are adopted for fitness assignment and selection. The whole obtained non-dominated set or a subset of it is used to construct neural network ensembles. The performance of the ensembles is compared on two test problems. Ensembles whose members are generated using the multi-objective approach are also compared to those whose member networks are generated independently. It is shown that the performance of the ensembles depends to a large degree on the features of the training, validation and test data.

27.2. Multi-Objective Optimization of Neural Networks

27.2.1. Parameter and Structure Representation of the Network

A connection matrix and a weight matrix are employed to describe the structure and the weights of the neural networks. Obviously, the connection matrix specifies the structure of the network, whereas the weight matrix determines the strength of each connection. Assume that a neural network consists of M neurons in total, including the input and output neurons; then the size of the connection matrix is M × (M + 1), where an element in the last column indicates whether a neuron is connected to a bias value. In the matrix, if element c_ij (i = 1, ..., M, j = 1, ..., M) equals 1, there is a connection between the i-th and j-th neuron and the signal flows from neuron j to neuron i. If j = M + 1, it indicates that there is a bias in the i-th neuron. Obviously, for a purely feedforward network, the upper part of the matrix, except the (M + 1)-th column, is always zero. Fig. 27.1 illustrates a connection matrix and the corresponding network structure. It can be seen from the figure that the network has one input neuron, two hidden neurons, and one output neuron. Besides, both hidden neurons have a bias.
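The encoding described above can be sketched as follows. This is an illustrative reconstruction of the Fig. 27.1 example, not the authors' code; the exact wiring and variable names are our assumptions.

```python
import numpy as np

# Illustrative reconstruction of the Fig. 27.1 network: M = 4 neurons,
# neuron 0 = input, neurons 1-2 = hidden, neuron 3 = output.
# c[i, j] = 1 means the signal flows from neuron j to neuron i;
# the extra last column flags a bias on neuron i.
M = 4
c = np.zeros((M, M + 1), dtype=int)
c[1, 0] = c[2, 0] = 1   # input -> both hidden neurons
c[3, 1] = c[3, 2] = 1   # both hidden neurons -> output
c[1, M] = c[2, M] = 1   # biases on the two hidden neurons

# For a purely feedforward network the upper part of the matrix
# (excluding the bias column) must be zero: signals only flow from
# lower-indexed to higher-indexed neurons.
assert not np.triu(c[:, :M]).any()

# The weight matrix mirrors the connection matrix: a weight may be
# non-zero only where the corresponding connection exists.
w = np.random.uniform(-0.2, 0.2, size=(M, M + 1)) * c
assert (w[c == 0] == 0).all()
print(int(c.sum()))  # 6 connections, including the two bias links
```

Masking the weight matrix by the connection matrix keeps the two representations consistent, which matters below when connections are inserted or deleted by mutation.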

The strength (weight) of the connections is defined in the weight matrix.


Fig. 27.1. A connection matrix and the corresponding network structure.

Accordingly, if the c_ij in the connection matrix equals zero, the corresponding element in the weight matrix must be zero too.

27.2.2. Objectives in Network Optimization

The most common objective function (also known as the error function or the cost function) in training or evolving neural networks is the mean squared error (MSE):

E = \frac{1}{N} \sum_{i=1}^{N} \left( y_d(i) - y(i) \right)^2,  (B.1)

where N is the number of training samples, y_d(i) is the desired output of the i-th sample, and y(i) is the network output for the i-th sample. For the sake of clarity, we assume here that the neural network has only one output. Other error functions, such as the Minkowski error or cross-entropy, can also be used3.
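Equation (B.1) transcribes directly into code; a minimal sketch for a single-output network:

```python
def mse(y_desired, y_pred):
    """Mean squared error of eq. (B.1) for a single-output network."""
    n = len(y_desired)
    return sum((yd - y) ** 2 for yd, y in zip(y_desired, y_pred)) / n

print(mse([1.0, 2.0, 3.0], [1.0, 2.5, 2.0]))  # (0 + 0.25 + 1.0) / 3
```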

It has been found that neural networks can often over-fit the training data, which means that the network has a very good approximation accuracy on the training data, but a very poor one on unseen data. Many methods have been developed to improve the generalization performance of neural networks3. A very popular technique is known as regularization, which usually adds a penalty term to the error function:

J = E + \lambda \Omega,  (B.2)

where λ is a coefficient that controls the extent to which the regularization influences the optimal solution, and Ω is known as the regularizer. A simple class of regularizers penalizes the sum of squared weights, also known as the Gaussian regularizer, which favors smooth output of the network:

\Omega = \frac{1}{2} \sum_{k} w_k^2,  (B.3)

where k is an index running over all weights. Alternatively, the sum of absolute weights, also known as the Laplace regularizer, can be used:

\Omega = \sum_{i} |w_i|.  (B.4)

The Gaussian regularizer and the Laplace regularizer are also known as weight decay in neural network training.

In neural network training using regularization techniques, it is often a matter of trial-and-error to determine the coefficient λ, although methods have been developed to optimize the coefficient based on empirical, algebraic or Bayesian estimation of the generalization error on the validation data21. This situation is quite easy to understand from the multi-objective point of view. For each given λ, one single Pareto-optimal solution will be obtained. Obviously, the regularization technique in equation (B.2) can be reformulated as a bi-objective optimization problem:

\min \{ f_1, f_2 \},  (B.5)

f_1 = E,  (B.6)

f_2 = \Omega,  (B.7)

where E is defined in equation (B.1), and Ω is one of the regularization terms defined in equation (B.3) or (B.4).

Basically, the weight decay techniques try to reach a good trade-off between the complexity of the neural network and the approximation accuracy to avoid overfitting the training data. Another straightforward index for measuring the complexity of neural networks is the sum of connections in the network:

\Omega = \sum_{i} \sum_{j} c_{ij}.  (B.8)

Obviously, the smaller the number of connections in a network is, the less complex the network is. Note that this regularizer is well suited for evolutionary optimization, although it is not applicable to gradient-based learning algorithms due to its discrete nature. For this reason, we term it evolutionary regularization.

In the following study, the sum of connections, the sum of absolute weights and the sum of squared weights are employed as the second objective in optimization.
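The three alternative second objectives of equations (B.3), (B.4) and (B.8) can be computed in a few lines; a sketch, assuming the weights are stored in a matrix masked by the connection matrix as in Section 27.2.1:

```python
import numpy as np

def complexity_objectives(w, c):
    """Return the three alternative complexity objectives for a network
    whose weight matrix w is masked by the connection matrix c."""
    gaussian = float(0.5 * np.sum(w ** 2))   # eq. (B.3), Gaussian regularizer
    laplace = float(np.sum(np.abs(w)))       # eq. (B.4), Laplace regularizer
    n_conn = int(np.sum(c))                  # eq. (B.8), discrete, EA-only
    return gaussian, laplace, n_conn

# Tiny example: a single connection with weight -0.5.
c = np.array([[0, 0], [1, 0]])
w = np.array([[0.0, 0.0], [-0.5, 0.0]])
print(complexity_objectives(w, c))  # (0.125, 0.5, 1)
```

Any one of the three returned values can serve as f_2 in the bi-objective formulation (B.5)-(B.7).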


27.2.3. Mutation and Learning

A genetic algorithm with a hybrid of binary and real-valued coding has been used for optimizing the structure and weights of the neural networks. The genetic operators used are quite specific. Four mutation operators are implemented on the chromosome encoding the connection matrix, namely, insertion of a hidden neuron, deletion of a hidden neuron, insertion of a connection and deletion of a connection9. A Gaussian-type mutation is applied to the chromosome encoding the weight matrix. One of the five mutation operators is randomly selected and performed on each individual. No crossover has been employed in this algorithm.

After mutation, an improved version of the Rprop algorithm10 is carried out to train the weights. This can be seen as life-time learning within a generation. After learning, the fitness of each individual with regard to the approximation error (f_1) is updated. In addition, the weights modified during the life-time learning are encoded back into the chromosome, which is known as the Lamarckian type of inheritance2.

In the life-time learning, only the first objective, i.e., the approximation error, is minimized. The Rprop learning algorithm is employed in this work because it is believed to be faster and more robust than other gradient-based learning algorithms.

Let w_{ij} denote the weight connecting neuron j and neuron i; then the change of the weight (Δw_{ij}) in each iteration is as follows:

\Delta w_{ij}^{(t)} = -\mathrm{sign}\left( \frac{\partial E^{(t)}}{\partial w_{ij}} \right) \cdot \Delta_{ij}^{(t)},  (B.9)

where sign(·) is the sign function and Δ_{ij} > 0 is the step-size, which is initialized to Δ_0 for all weights. The step-size for each weight is adjusted as follows:

\Delta_{ij}^{(t)} = \begin{cases} \xi^{+} \cdot \Delta_{ij}^{(t-1)}, & \text{if } \frac{\partial E^{(t-1)}}{\partial w_{ij}} \cdot \frac{\partial E^{(t)}}{\partial w_{ij}} > 0, \\ \xi^{-} \cdot \Delta_{ij}^{(t-1)}, & \text{if } \frac{\partial E^{(t-1)}}{\partial w_{ij}} \cdot \frac{\partial E^{(t)}}{\partial w_{ij}} < 0, \\ \Delta_{ij}^{(t-1)}, & \text{otherwise,} \end{cases}  (B.10)

where 0 < \xi^{-} < 1 < \xi^{+}. To prevent the step-sizes from becoming too large or too small, they are bounded by \Delta_{min} \le \Delta_{ij} \le \Delta_{max}.

One exception must be considered. After the weights are updated, it is necessary to check whether the partial derivative changes sign, which would indicate that the previous step might have been too large and thus a minimum has been missed. In this case, the previous weight change should be retracted:

\Delta w_{ij}^{(t)} = -\Delta w_{ij}^{(t-1)}, \quad \text{if } \frac{\partial E^{(t-1)}}{\partial w_{ij}} \cdot \frac{\partial E^{(t)}}{\partial w_{ij}} < 0.  (B.11)

Recall that if the weight change is retracted in the t-th iteration, \partial E^{(t)} / \partial w_{ij} should be set to 0.

It is argued that the condition for weight retraction in equation (B.11) is not always reasonable10. The weight change should be retracted only if the partial derivative changes sign and the approximation error increases. Thus, the weight retraction condition in equation (B.11) is modified as follows:

\Delta w_{ij}^{(t)} = -\Delta w_{ij}^{(t-1)}, \quad \text{if } \frac{\partial E^{(t-1)}}{\partial w_{ij}} \cdot \frac{\partial E^{(t)}}{\partial w_{ij}} < 0 \text{ and } E^{(t)} > E^{(t-1)}.  (B.12)

It has been shown on several benchmark problems that the modified Rprop (termed Rprop+) exhibits consistently better performance than the original Rprop10.
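The Rprop+ update for a single weight, following equations (B.9)-(B.12), can be sketched as below. This is our illustrative reading of the cited algorithm, not the authors' implementation; in particular, the bookkeeping of the remembered gradient and weight change is an assumption.

```python
import math

def rprop_plus_step(w, grad, grad_prev, step, dw_prev, err, err_prev,
                    xi_minus=0.2, xi_plus=1.2, step_max=50.0):
    """One Rprop+ update for a single weight w (sketch of eqs. B.9-B.12).
    Returns the new weight, step-size, weight change and the gradient
    to remember for the next iteration."""
    if grad_prev * grad > 0:                 # same sign: grow step-size
        step = min(step * xi_plus, step_max)
        dw = -math.copysign(step, grad)      # eq. (B.9)
        w += dw
    elif grad_prev * grad < 0:               # sign change: shrink step-size
        step *= xi_minus
        if err > err_prev:                   # retract only if error grew
            w -= dw_prev                     # eq. (B.12)
        dw = 0.0
        grad = 0.0                           # forces "otherwise" branch next
    else:                                    # eq. (B.10), "otherwise" branch
        dw = -math.copysign(step, grad) if grad != 0 else 0.0
        w += dw
    return w, step, dw, grad
```

For example, two consecutive gradients of the same sign accelerate the step-size from 0.1 to 0.12 and move the weight downhill; a subsequent sign change with an error increase restores the previous weight.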

27.2.4. Elitist Non-Dominated Sorting and Crowded Tournament Selection

After mutation and life-time learning, the offspring and parent populations are combined. Then, a non-domination rank (r_i) and a local crowding distance (d_i) are assigned to each individual in the combined population, as suggested in6. After that, the crowded tournament selection6 is implemented. In the crowded tournament selection, two individuals are randomly picked from the combined population. If individual A has a higher (better) rank than individual B, individual A is selected. If they have the same rank, the one with the better crowding distance (the one located in a less crowded area) is selected. Compared to fitness sharing techniques, the crowded tournament selection guarantees that the one with the better rank is selected. The crowding distance can be calculated either in the parameter or the objective space. In this work, the distance is computed in the objective space.
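The crowded tournament rule can be sketched as follows; a minimal illustration in which the representation of an individual as a (rank, crowding distance, genome) tuple is our assumption:

```python
import random

def crowded_tournament(population, rng=random):
    """Pick one individual: the lower non-domination rank wins; on equal
    ranks, the larger crowding distance (less crowded area) wins.
    Individuals are (rank, crowding_distance, genome) tuples."""
    a, b = rng.sample(population, 2)
    if a[0] != b[0]:
        return a if a[0] < b[0] else b
    return a if a[1] >= b[1] else b

# With only two candidates the better-ranked one always wins:
pop = [(1, 0.3, "net-A"), (2, 9.9, "net-B")]
print(crowded_tournament(pop)[2])  # net-A
```

Because the rank comparison dominates, a rank-1 individual can never lose to a rank-2 one, which is the guarantee the text contrasts with fitness sharing.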

27.3. Selecting Ensemble Members

So far, the size of ensembles has often been determined empirically, with a few exceptions22,23. In one approach22, a genetic algorithm is used to select a subset of the final population as ensemble members. In another work23, genetic programming has been employed to search for an optimal ensemble size.


Selecting a subset from a given number of networks can also be converted into finding the optimal weight for each candidate network based on a certain criterion. Given N neural networks, the final output of the ensemble can be obtained by averaging the weighted outputs of the ensemble members:

\bar{y}^{ens} = \sum_{k=1}^{N} a^{(k)} y^{(k)},  (C.1)

where y^{(k)} and a^{(k)} are the output and the weight of the k-th neural network in the ensemble. Usually, all weights are equally set to 1/N, and the overall output is known as the simple average. If the weights are optimized based on a certain criterion, the overall output is called the weighted average. Given a set of validation data, the expected error of the weighted output of the ensemble can be calculated by:

E_{ens} = \sum_{i=1}^{N} \sum_{j=1}^{N} a^{(i)} a^{(j)} C_{ij},  (C.2)

where C_{ij} is the error correlation matrix between network i and network j in the ensemble:

C_{ij} = E\left[ (y_i - y^d)(y_j - y^d) \right],  (C.3)

where E(·) denotes the mathematical expectation.

It has been shown17 that there exists an optimal set of weights that minimizes the expected prediction error of the ensemble:

a^{(k)} = \frac{\sum_{j=1}^{N} (C^{-1})_{kj}}{\sum_{i=1}^{N} \sum_{j=1}^{N} (C^{-1})_{ij}},  (C.4)

where 1 \le i, j, k \le N.

However, a reliable estimation of the error correlation matrix is not straightforward because the prediction errors of different networks in an ensemble are often strongly correlated. Alternatively, the recursive least-squares method can be employed to search for the optimal weights22. Other methods have also been proposed to solve this problem12,24.

In this investigation, a canonical evolution strategy is employed to find the optimal weights that minimize the expected error in equation (C.2).
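For reference, when the error correlation matrix is invertible, equation (C.4) gives the optimal weights in closed form. The following sketch (our illustration, not the authors' evolution strategy) computes them together with the expected error of equation (C.2):

```python
import numpy as np

def optimal_ensemble_weights(C):
    """Closed-form solution of eq. (C.4):
    a^(k) = sum_j (C^-1)_kj / sum_ij (C^-1)_ij.
    Assumes the error correlation matrix C is invertible."""
    C_inv = np.linalg.inv(C)
    return C_inv.sum(axis=1) / C_inv.sum()

def expected_ensemble_error(a, C):
    """Expected ensemble error of eq. (C.2): sum_ij a_i a_j C_ij."""
    return float(a @ C @ a)

# Two uncorrelated members with equal error variance get equal weights,
# and the ensemble error halves relative to a single member.
C = np.array([[0.2, 0.0],
              [0.0, 0.2]])
a = optimal_ensemble_weights(C)
print(a)  # [0.5 0.5]
print(expected_ensemble_error(a, C))
```

In practice, as the text notes, C is hard to estimate reliably, which is why an evolution strategy (or recursive least squares) is used instead of inverting C directly.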

In the multi-objective optimization approach to generating neural network ensemble members, the easiest way is to select all non-dominated solutions found in the optimization as ensemble members. In the following empirical investigations, we compare three cases. In the first case, all non-dominated solutions found in the final population are used to construct an ensemble. In the second case, a well-distributed subset of the non-dominated solutions is selected by hand. Finally, the criterion in equation (C.2) is minimized using an evolution strategy based on a validation data set.

27.4. Case Studies

27.4.1. Experimental Settings

The population size of the GA used for evolving neural networks is 100 and the optimization is run for 200 generations. In mutating the weights, the standard deviation of the Gaussian noise is set to 0.05. The weights of the network are initialized randomly in the interval [-0.2, 0.2] and the maximal number of hidden neurons is set to 10. In the Rprop+ algorithm, the step-sizes are initialized to 0.0125 and bounded between [0, 50] during adaptation, and ξ⁻ = 0.2, ξ⁺ = 1.2. Note that a number of parameters need to be specified in the Rprop+ algorithm; however, the performance of the algorithm is not very sensitive to these values10. In our work, we use the default values suggested in reference 10, and 50 iterations are implemented in each life-time learning.

A standard (15,100)-ES has been used to optimize the ensemble weights in equation (C.1) based on the expected error on the validation data. The initial step-sizes of the evolution strategy are set to 0.0001 and the weights are initialized randomly between 0.005 and 0.01. The weight optimization has been run for 200 generations.

27.4.2. Results on the Ackley Function

The simulation study has first been conducted on the 3-dimensional Ackley function13. 100 samples are generated randomly in [-3, 3], of which the first 80 samples are used as training data, another 10 are used as validation data, and the remaining 10 samples are used as test data.

In the first case, the approximation error and the number of connections described in equation (B.8) are used as the two objectives in evolving the neural networks. The non-dominated solutions in the 200-th generation are plotted in Fig. 27.2.

Fig. 27.2. Non-dominated solutions when the number of connections is used as the second objective.

The most straightforward approach is to use all obtained non-dominated solutions to construct the ensemble. In the final generation, 40 solutions were found to be non-dominated. The MSE of the best and worst single networks from the 40 solutions, the MSE of the simple average ensemble, and the MSE with the weights optimized using the algorithm presented in Section 27.3 are given in Table 27.106. Notice that in calculating the MSE of the ensemble on the test data, the weights are those optimized on the basis of the validation data.

Table 27.106. MSE of the ensemble consisting of all 40 non-dominated solutions.

             best single   worst single   simple average   weighted average
validation   0.121         2.29           0.409            0.118
test         0.348         2.07           0.179            0.361

It has been suggested that it might be better to use a subset of the available neural networks than to use all of them17,22. For this purpose, different strategies have been tried. For example, we can select a "representative" subset from the non-dominated solutions to construct a neural network ensemble. Another possibility is to select the non-dominated solutions whose MSE on the training data is smaller than a specified value, or to select those whose MSE on the validation data is smaller than a given value. Fig. 27.3 shows the 14 heuristically selected representative solutions (filled circles).

The MSE of the best and worst single networks, and the MSE of the ensemble using the simple average and the weighted average of the 14 representatives on the validation as well as the test data, are shown in Table 27.107.


Fig. 27.3. 14 selected representatives.

Table 27.107. MSE of the ensemble consisting of 14 heuristically selected members.

             best single   worst single   simple average   weighted average
validation   0.160         2.28           0.279            0.074
test         0.468         2.07           0.236            0.449

Some observations can be made from the results. First, the MSE of the ensemble using the simple average of the 14 selected representatives is worse than that using all non-dominated solutions. Second, the ensemble with weights optimized on the basis of the validation data exhibits better performance on the validation data than the one with the simple average. Unfortunately, its MSE on the test data is larger than that of the ensemble using the simple average. This implies that the validation data set and the test data set might not have the same statistical characteristics. In this case, it might not be practical to optimize the weights based on the validation data for predicting unseen data sets.

The results for ensemble members selected according to the MSE on the training and validation data, respectively, are shown in Tables 27.108 and 27.109.

From Tables 27.108 and 27.109, it can be seen that the MSE on the test data of both ensembles is larger than that of the ensemble consisting of the 14 representative networks. Furthermore, good performance on training or validation data does not imply good performance on test data.


Table 27.108. MSE of the ensemble consisting of 14 networks whose MSE on the training data is smaller than 0.01.

             best single   worst single   simple average   weighted average
validation   0.57          1.09           0.84             0.50
test         0.34          0.61           0.43             0.42

Table 27.109. MSE of the ensemble consisting of 12 networks whose MSE on the validation data is smaller than 0.50.

             best single   worst single   simple average   weighted average
validation   0.12          0.57           0.073            0.038
test         0.50          1.40           0.535            0.618

Fig. 27.4. Non-dominated solutions when the sum of absolute weights is used as the second objective. The shaded circles denote those selected as a subset for constructing an ensemble.

Optimization of the weights of the ensemble members does not necessarily reduce the error on the test data.

Next, the sum of absolute weights in equation (B.4) is adopted as the second objective in the evolution. The obtained non-dominated solutions are shown in Fig. 27.4.

Similar to the above simulations, we calculate the best and worst MSE of a single network, and the MSE of the ensemble with simple and weighted averages over all the 32 non-dominated solutions, over a heuristically selected representative subset (the circles filled with a star in Fig. 27.4),


Evolutionary Multi-Objective Optimization Approach to Constructing Ensembles 665

a subset whose MSE on the training data is smaller than 0.1, or a subset whose MSE on the validation data is smaller than 0.5. The results are presented in Tables 27.110, 27.111, 27.112 and 27.113, respectively.

Table 27.110. MSE of the ensemble using all 32 non-dominated solutions.

best single worst single simple average weighted average

validation   0.174   1.52   0.336   0.152
test         0.637   1.91   0.491   0.558

Table 27.111. MSE of the ensemble consisting of 14 heuristically selected members.

best single worst single simple average weighted average

validation   0.410   1.52   0.363   0.152
test         0.637   1.91   0.361   0.453

Table 27.112. MSE of the ensemble consisting of 15 networks whose MSE on the training data is smaller than 0.1.

best single worst single simple average weighted average

validation   0.460   0.62   0.524   0.460
test         0.636   1.72   1.21    1.38

Table 27.113. MSE of the ensemble consisting of 9 networks whose MSE on the validation data is smaller than 0.5.

best single worst single simple average weighted average

validation   0.174   0.46   0.172   0.150
test         0.770   1.46   0.560   0.570

From the above results, it can be seen that the use of an ensemble is a reliable way to reduce the prediction error, although the ensemble quality may not be better than that of its best member. The results also suggest that it is still an open question how to properly select an optimal subset from a set of obtained non-dominated solutions to construct an ensemble. Networks with good performance on either the training or the validation data set are not necessarily good candidates for the test data set. The optimization algorithm presented in Section 27.3 is very effective in minimizing the ensemble prediction error on the validation data. However, this does not



imply that the MSE on the test data will also be reduced when using the optimal weights obtained on the validation data.

Finally, a single-objective optimization has been run 14 times, where the MSE on the training data is used as the fitness function. The individual networks are generated randomly, and no interactions between the networks have been considered. In generating the networks, all parameter settings are the same as in the multi-objective case. These 14 neural networks are then used to construct a neural ensemble, and the results on the validation and test data are presented in Table 27.114. The results seem worse than those from the ensembles consisting of 14 networks generated using multi-objective optimization, as shown in Tables 27.107 and 27.111.

Table 27.114. MSE of the ensemble consisting of 14 networks randomly generated using the single-objective optimization.

best single worst single simple average weighted average

validation   0.270   1.79   0.320   0.220
test         0.655   1.81   0.532   0.595

Finally, a number of non-dominated solutions are obtained using the MSE and the sum of squared weights as two objectives, as shown in Fig. 27.5. Simulations have been conducted to study the different methods for selecting ensemble members, and very similar results are obtained. Thus, these results will not be presented in detail here.

27.4.3. Results on the Mackey-Glass Function

In this subsection, neural network ensembles are used to predict the output of the Mackey-Glass series:

dx(t)/dt = α x(t - τ) / (1 + x^10(t - τ)) - β x(t)    (D.1)

where α = 0.2, β = 0.1 and τ = 17. The task of the neural ensemble is to predict x(t + 85) using x(t), x(t - 6), x(t - 12), and x(t - 18). According to reference 8, 500 samples are generated for training, 250 samples for validation, and another 250 samples for test.
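For concreteness, the series and the input/target patterns described above can be generated as follows. This is a hedged sketch: the integration scheme and initial history used in reference 8 are not stated here, so a first-order Euler step with dt = 1 and a constant initial history x0 = 1.2 are assumed, and all function names are invented.

```python
import numpy as np

def mackey_glass(n, alpha=0.2, beta=0.1, tau=17, x0=1.2, dt=1.0):
    # Euler integration of equation (D.1); history before t = 0 is
    # held at x0 (a common, if arbitrary, choice).
    x = np.empty(n)
    x[0] = x0
    for t in range(n - 1):
        x_tau = x[t - tau] if t >= tau else x0
        x[t + 1] = x[t] + dt * (alpha * x_tau / (1 + x_tau ** 10) - beta * x[t])
    return x

def make_patterns(x, horizon=85, lags=(0, 6, 12, 18)):
    # Inputs x(t), x(t-6), x(t-12), x(t-18); target x(t+85).
    start, stop = max(lags), len(x) - horizon
    X = np.array([[x[t - l] for l in lags] for t in range(start, stop)])
    y = x[start + horizon: stop + horizon]
    return X, y
```

With 1000 consecutive patterns, splitting them 500/250/250 reproduces the training/validation/test partition used in the experiments.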

In the 200th generation, 34 non-dominated solutions have been found, which are illustrated in Fig. 27.6. All the non-dominated solutions are used to construct an ensemble. The results from the best and the worst single networks, and those from the simple average and weighted average of the ensemble members, are provided in Table 27.115. From these results, we notice



Fig. 27.5. Non-dominated solutions when the sum of squared weights is used as the second objective.

first that the performance of the simple average ensemble is better than that of the worst member, but worse than that of the best one. Another important observation is that the ensemble using weighted averaging exhibits better performance than the one with simple averaging, not only on the validation data but also on the test data. This indicates that in this example, the validation data are able to reflect the features of the test data.

Table 27.115. MSE of the ensemble consisting of all 34 non-dominated solutions.

best single worst single simple average weighted average

validation   0.0111   0.0488   0.0134   0.0117
test         0.0097   0.0518   0.0118   0.0104

As done in the previous section, a second ensemble is constructed by selecting 14 representative solutions from the 34 non-dominated solutions, which are the filled circles in Fig. 27.6. The results of this ensemble are presented in Table 27.116.

Next, we construct another two ensembles by selecting the networks having an MSE smaller than 0.012 on the training data and on the validation data, respectively. According to this criterion, 6 and 7 networks have been selected, and the results are given in Tables 27.117 and 27.118.

From these results, it can be seen that no big differences exist between



Fig. 27.6. Non-dominated solutions when the number of connections is used as the second objective. The filled circles are the representatives.

Table 27.116. MSE of the ensemble consisting of 14 representatives.

best single worst single simple average weighted average

validation   0.0112   0.0488   0.0129   0.0112
test         0.0097   0.0518   0.0111   0.0099

Table 27.117. MSE of the ensemble consisting of 6 networks whose MSE on the training data is smaller than 0.012.

best single worst single simple average weighted average

validation   0.0112   0.0116   0.0113   0.0112
test         0.0097   0.0105   0.0102   0.0097

the various methods for selecting ensemble members. Besides, the ensemble with weighted average shows consistently better performance than the one using simple average. However, the performance of the best single network is better than that of the ensemble with simple average, and thus the performance of the ensemble with optimized weighted average is almost the same

Table 27.118. MSE of the ensemble consisting of 7 networks whose MSE on the validation data is smaller than 0.012.

best single worst single simple average weighted average

validation   0.0112   0.0117   0.0114   0.0112
test         0.0097   0.0109   0.0102   0.0097



as that of the single best, which makes sense. Nevertheless, the ensembles with simple average or optimized weighted average show consistently better performance than the single worst network. Furthermore, ensembles consisting of networks selected on the basis of the training or validation error are better than those consisting of all, or a heuristically selected subset of, the non-dominated solutions. This implies that no significant overfitting occurs during the training.

Finally, 14 networks are generated randomly using single-objective optimization. The results of this ensemble are shown in Table 27.119. It can be seen that the performance of the ensemble is better than that of the single worst network but worse than that of the single best. Obviously, diversity does not help to improve the performance of the ensemble if no significant overfitting occurs.

Table 27.119. MSE of the ensemble consisting of 14 randomly generated networks.

best single worst single simple average weighted average

validation   0.01     0.0143   0.0115   0.01
test         0.0095   0.0133   0.0111   0.0095

Simulations have also been conducted in which the sum of absolute weights or the sum of squared weights serves as the second objective on the Mackey-Glass series data. The non-dominated solutions from these optimization runs are plotted in Fig. 27.7 and Fig. 27.8, respectively. Notice that non-dominated solutions whose MSE on the training data is larger than 0.05 are missing from the 200th generation. This does not mean that such solutions do not exist. Rather, it is due to the randomness of the multi-objective optimization algorithm introduced by the crowded tournament selection. As discussed in reference 5, such randomness occurs when the number of non-dominated solutions in the combined population is larger than the population size.

The prediction results of the ensembles constructed from these solutions are omitted here because they are very similar to those presented above when the number of connections is used as the second objective.

27.5. Discussions and Conclusions

Approximation accuracy and complexity have been used as two objectives to generate neural networks for constructing ensembles. In the algorithm, ad hoc mutations such as node/connection addition and deletion are employed



Fig. 27.7. Non-dominated solutions when the sum of absolute weights is used as the second objective.

Fig. 27.8. Non-dominated solutions when the sum of squared weights is used as the second objective.

without crossover. The Rprop learning algorithm is adopted for life-time learning, and a Lamarckian inheritance has been implemented. In selection, the elitist non-dominated sorting and crowded tournament selection techniques have been used. This algorithm has proved to be effective in generating neural networks that trade off between accuracy and complexity



Fig. 27.9. Trade-off between the MSE on the validation data and the complexity of the neural networks.

through two test problems.

Whereas it is possible to improve the performance of an ensemble whose

members have a trade-off between complexity and accuracy if overfitting occurs, no performance improvement can be expected from network ensembles when the networks do not overfit the training data. In fact, it seems that in this case, the network that has the best accuracy on the training data also exhibits the best performance on the test data. Thus, including ensemble members with different degrees of accuracy will degrade the ensemble performance. Note that the proposed method for individual network training belongs to the simultaneous approach. Due to the explicit trade-off between complexity and accuracy, the individuals in a population are competitive, which is harmful to the performance of the ensemble if the test data have the same features as the training data. This can easily be observed by plotting the relationship between the MSE on the validation data and the complexity, as shown in Fig. 27.9. It can be seen from the figure that the higher the complexity of the network, the better its accuracy on the validation data.

As argued in reference 11, it is equally important that the ensemble members cooperate with each other. To this end, the concept of cooperative coevolution 18 could play a significant role in generating ensemble members. This will be our next research direction.



Acknowledgments

The authors would like to thank Edgar Körner and Andreas Richter for their kind support.

References

1. H.A. Abbass. Speeding up back-propagation using multiobjective evolutionary algorithms. Neural Computation, 15(11):2705-2726, 2003.

2. D.H. Ackley and M.L. Littman. A case for Lamarckian evolution. In C.G. Langton, editor, Artificial Life, volume 3, pages 3-10. Addison-Wesley, Reading, Mass., 1994.

3. C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, UK, 1995.

4. R. de A. Teixeira, A.P. Braga, R.H.C. Takahashi, and R.R. Saldanha. Improving generalization of MLPs with multi-objective optimization. Neurocomputing, 35:189-194, 2000.

5. K. Deb. Multi-objective Optimization Using Evolutionary Algorithms. Wiley, Chichester, 2001.

6. K. Deb, S. Agrawal, A. Pratap, and T. Meyarivan. A fast elitist non-dominated sorting genetic algorithm for multi-objective optimization: NSGA-II. In Parallel Problem Solving from Nature, volume VI, pages 849-858, 2000.

7. N. Garcia-Pedrajas, C. Hervas-Martinez, and J. Munoz-Perez. Multi-objective cooperative co-evolution of artificial neural networks (multi-objective cooperative networks). Neural Networks, 15:1259-1278, 2003.

8. E. Hartman and J.D. Keeler. Predicting the future: Advantages of semilocal units. Neural Computation, 3(4):566-578, 1991.

9. M. Hüsken, J.E. Gayko, and B. Sendhoff. Optimization for problem classes - Neural networks that learn to learn. In Xin Yao and David B. Fogel, editors, IEEE Symposium on Combinations of Evolutionary Computation and Neural Networks (ECNN 2000), pages 98-109. IEEE Press, 2000.

10. C. Igel and M. Hüsken. Improving the Rprop learning algorithm. In Proceedings of the 2nd ICSC International Symposium on Neural Computation, pages 115-121, 2000.

11. Md. M. Islam, X. Yao, and K. Murase. A constructive algorithm for training cooperative neural network ensembles. IEEE Transactions on Neural Networks, 14(4):820-834, 2003.

12. D. Jimenez. Dynamically weighted ensemble neural networks for classification. In Proceedings of the International Joint Conference on Neural Networks, pages 753-756, Anchorage, 1998. IEEE Press.

13. Y. Jin, M. Olhofer, and B. Sendhoff. A framework for evolutionary optimization with approximate fitness functions. IEEE Transactions on Evolutionary Computation, 6(5):481-494, 2002.

14. Y. Liu and X. Yao. Negatively correlated neural networks can produce best ensembles. Australian Journal of Intelligent Information Processing Systems, 4(3-4):176-185, 1997.



15. Y. Liu, X. Yao, and T. Higuchi. Evolutionary ensembles with negative correlation learning. IEEE Transactions on Evolutionary Computation, 4(4):380-387, 2000.

16. D.W. Opitz and J.W. Shavlik. Generating accurate and diverse members of a neural network ensemble. In Advances in Neural Information Processing Systems, volume 8, pages 535-541, Cambridge, MA, 1996. MIT Press.

17. M.P. Perrone and L.N. Cooper. When networks disagree: Ensemble methods for hybrid neural networks. In R.J. Mammone, editor, Artificial Neural Networks for Speech and Vision, pages 126-142. Chapman & Hall, London, 1993.

18. M.A. Potter and K.A. De Jong. Cooperative coevolution: An architecture for evolving coadapted subcomponents. Evolutionary Computation, 8(1):1-29, 2000.

19. B.E. Rosen. Ensemble learning using decorrelated neural networks. Connection Science, 8(3-4):373-384, 1996.

20. A.J.C. Sharkey and N.E. Sharkey. Diversity, selection and ensembles of artificial neural nets. In Proceedings of the Third International Conference on Neural Networks and their Applications, pages 205-212, March 1997.

21. S. Sigurdsson, J. Larsen, and L.K. Hansen. On comparison of adaptive regularization methods. In Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, volume 10, pages 221-230, 2000.

22. X. Yao and Y. Liu. Making use of population information in evolutionary artificial neural networks. IEEE Transactions on Systems, Man, and Cybernetics - Part B: Cybernetics, 28(3):417-425, 1998.

23. B.-T. Zhang and J.G. Joung. Building optimal committees of genetic programs. In Parallel Problem Solving from Nature, volume VI, pages 231-240. Springer, 2000.

24. Z.-H. Zhou, J.-X. Wu, Y. Jiang, and S.-F. Chen. Genetic algorithm based selective neural network ensemble. In Proceedings of the 17th International Joint Conference on Artificial Intelligence, pages 797-802, Seattle, 2001. Morgan Kaufmann.


CHAPTER 28

OPTIMIZING FORECAST MODEL COMPLEXITY USING MULTI-OBJECTIVE EVOLUTIONARY ALGORITHMS

Jonathan E. Fieldsend and Sameer Singh

Department of Computer Science, University of Exeter, North Park Road, Exeter, EX4 4QF, UK

E-mail: [email protected]

When inducing a time series forecasting model there has always been the problem of defining a model that is complex enough to describe the process, yet not so complex as to promote data 'overfitting' - the so-called bias/variance trade-off. In the sphere of neural network forecast models this is commonly confronted by weight decay regularization, or by combining a complexity penalty term in the optimizing function. The correct degree of regularization, or penalty value, to implement for any particular problem is, however, difficult, if not impossible, to know a priori. This chapter presents the use of multi-objective optimization techniques, specifically those of an evolutionary nature, as a potential solution to this problem. This is achieved by representing forecast model 'complexity' and 'accuracy' as two separate objectives to be optimized. In doing this one can obtain problem-specific information with regard to the accuracy/complexity trade-off of any particular problem and, given the shape of the front on a set of validation data, ascertain an appropriate operating point. Examples are provided on a forecasting problem with varying levels of noise.

28.1. Introduction

The use of neural networks (NNs), specifically multi-layer perceptrons (MLPs), for classification and regression is widespread, and their continuing popularity is seemingly undiminished. This is not least due to their much vaunted ability to act as a 'universal approximator' - that is, given sufficient network size, any deterministic function mapping can be modelled. This is typically done where the process (function) is unknown, but where example data have been collected, from which the estimated model is induced.



676 J.E. Fieldsend & S. Singh

Seasoned practitioners will however know that the great amenability of NNs is a double-edged sword. It is difficult, if not impossible, to tell a priori how complex the function you wish to emulate is; therefore it is difficult to know how complex your NN design should be. With too complex a model design (too many transformation nodes/weights and/or large synaptic weight values), the NN may overfit its function approximation. It may start modelling the noise on the examples as opposed to generalizing the process, or it may find an overly complex mapping given the data provided. With too few nodes, the NN may only be able to model a subset of the causal processes in the data. Both of these effects can lead a NN to produce substandard results in its future application.

Various approaches to confront this problem have been proposed since NNs have become widely applied, such as weight decay regularization to push the NN weights to smaller values (which keeps them in the linear mapping space),5,27 pruning algorithms to remove nodes,21 complexity loss functions31 and topology selection based on cross validation.29

More recently the field of evolutionary neural networks (ENNs) has also been addressing this problem. As the evolutionary approach to training is not susceptible to the local minima trapping of gradient descent approaches, a large number of studies have investigated this approach to NN training; a review of a substantial number of these can be found in Yao.33 A number of studies enable the evolution of different sized ENNs, with some studies including size penalization22 similar to the complexity loss functions used in gradient descent approaches. However this leads to the problem of how you define the penalization, as it implicitly means making assumptions about the interaction of model complexity and accuracy of the ENN for your problem (the trade-off between the two).

Through using the formulation and methods developed in the evolutionary multi-objective optimization (EMOO) domain,6,8,14,30 the set of solutions that describes the trade-off surface for two or more objectives of a design problem can be discovered. This approach can equally be applied to ENN training in order to discover the set of estimated Pareto optimal ENNs for a function modelling problem, where accuracy of function emulation and complexity of model are the two competing objectives. Previous studies by Abbass2,3 have tackled this by formulating complexity in terms of the number of transfer units in an ENN; however, his model does not easily permit the use of other measures of complexity. As such this chapter will introduce a general and widely applicable methodology for EMOO training of NNs, for discovering the complexity/accuracy trade-off for NN


Optimizing Forecast Model Complexity Using MOEAs 677

modelling problems.

The chapter will proceed as follows: a basic outline of the MLP NN model is provided in Section 28.2 for those unfamiliar with NNs. In Section 28.3 the traditional approaches for coping with the bias/variance trade-off are discussed, along with their perceived drawbacks. Section 28.4 presents the general evolutionary algorithm approach to NN training, along with recent work on trading off network size and accuracy, and a new model to encompass many definitions of complexity. In Section 28.5 a set of experiments to validate this new approach is described, using time series exhibiting differing levels of noise. Results from these experiments are reported in Section 28.6. The chapter concludes with a discussion in Section 28.7.

28.2. Artificial Neural Networks

The original motivation behind artificial NNs was the observation that the human brain computes in a completely different manner than the standard digital computer,16 which enables it to perform tasks such as pattern recognition and motor control far faster and more accurately than standard computation. This ability is derived from the fact that the human brain is complex, nonlinear and parallel, and has the additional ability to adapt to the environment it finds itself in (referred to as plasticity). Artificial NNs developed as a method to mimic these properties, and terms relating to NN design (neurons, synaptic weights) are taken from the biological description of brain function. However, it is generally the case that NNs in popular use by researchers use only the concepts of parallelism, non-linearity and plasticity within a mathematical framework, and do not attempt to copy exactly the functions of the brain (which are still not fully understood).

The most popular NN model has been the multi-layer perceptron (MLP) since the formalization of the backpropagation (BP) learning algorithm in the early 1980s. The basic design of an MLP is shown in Figure 28.1.

The input signal of an MLP (or feature vector) is propagated through the network (neuron by neuron), and transformed during its passage by the combination of the synaptic weights and the mathematical properties of the neurons, until on the final layer an output signal is generated. In the example shown in Figure 28.1 the network is defined as being fully connected, each neuron (or node) being connected to every neuron in the layers directly preceding and following it, and having an I : 3 : 2 : 1 topological design. That is, it has I input nodes, followed by two hidden layers,




Fig. 28.1. Generic multi-layer perceptron, showing the forward flow of the input signal (function signal) and the backward flow of the error signal.

the first containing 3 nodes and the second 2 nodes, with a single output node. The two middle layers are referred to as hidden due to the fact that the user does not commonly observe the inputs or outputs of these nodes (unlike the input layer, where the feature vector is known, and the output layer, where the output is observed). The most common transfer function used in the MLP is the sigmoid function φ(). For the jth hidden node of a network with a vector of inputs z, its logistic form is defined as:

φ(z) = 1 / (1 + exp(-(B_j + Σ_i w_ij z_i)))    (B.1)

where w_ij is the weight of the ith input to node j from the previous layer, z_i is the output of the ith node in the layer preceding node j, and B_j is the weight of the bias input to the jth node. The bias is similar to the intercept term used in linear regression and has a fixed value for all patterns.
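Equation (B.1) for a single hidden node translates directly into code; a minimal sketch with invented function and argument names:

```python
import math

def hidden_node_output(z, w, b):
    # Logistic transfer of hidden node j (equation B.1):
    # output = 1 / (1 + exp(-(B_j + sum_i w_ij * z_i))).
    activation = b + sum(w_i * z_i for w_i, z_i in zip(w, z))
    return 1.0 / (1.0 + math.exp(-activation))
```

With a zero activation the node outputs exactly 0.5, and large positive or negative activations saturate towards 1 or 0, which is what pushes large-weight networks into highly curved, nonlinear mappings.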

The adjustment of the synaptic weight parameter variables within an MLP is most commonly performed in a supervised learning manner using the fast backpropagation algorithm. Sequences of inputs and resultant outputs are collected from an undefined functional process f(a) → b. This set of patterns is then presented to the MLP in order for it to emulate the unknown function. The kth input pattern a(k) is fed through the network



generating an output b̂(k), an approximation of the desired output b (illustrated with the arrows pointing to the right in Figure 28.1). The difference between the desired output b and the actual output b̂(k) is calculated (usually as the Euclidean distance between the vectors), and this error term, E, is then propagated back through the network, proportional to the partial derivative of the error at that node (illustrated with the dashed arrows pointing to the left in Figure 28.1). An in-depth discussion of the history and derivation of the backpropagation algorithm (and associated delta rule), through the calculus chain rule, can be found in Bishop5 and Haykin16. Each pattern in turn is presented to the MLP, with its weights adjusted using the delta rule at each iteration. Only a fraction of the change demanded by the delta rule is usually applied, to avoid rapid weight changes from one pattern to the next; this fraction is known as the learning rate. A momentum term (additionally updating weights with a fraction of their previous update) is also commonly applied. The passing of an entire pattern set through the MLP is called a training epoch. MLPs are usually trained, epoch by epoch, until the observed average error of the function approximation reaches a plateau. The generalization ability of the approximated function is then assessed on another set of collected data on which the NN has not been trained.

In recent years there has been increasing interest in the use of evolutionary computation methods for NN training.33 In these ENNs the adjustable parameters of a NN (weights and also sometimes nodes) are represented as a string of floating point and/or binary numbers, the most popular representation being the direct encoding form.33 Given a maximum size for a three-layer feed-forward MLP ENN of I input units (features), H hidden units in the hidden layer, and O output units, the vector length used to represent this network within an MOEA is of size:

(I + 1)·H + (H + 1)·O + I + H    (B.2)

The first (I + 1)·H + (H + 1)·O genes are floating point and store the weight parameters (plus biases) of the ENN; the next I + H are bit-represented genes, whose value (0 or 1) denotes the presence or otherwise of a unit (in the hidden layer and input layer). These decision vectors are manipulated over time using the tools of evolutionary computation (usually evolution strategies (ESs) or genetic algorithms (GAs)). At each time step (known as a generation) the ENNs represented by the new decision vectors are evaluated on the training data, and selection for parameter adjustment in the next generation is typically based on their relative error on this



data. The popularity of these approaches to NN training stems from the fact that they are not susceptible to the trapping in local minima that gradient descent based learning algorithms are, and, in addition, they can use quite complex problem-specific error functions which may be difficult to propagate using derivatives.
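The genome length given by equation (B.2) is easily computed; a small sketch (the function and argument names are illustrative, not from the chapter):

```python
def genome_length(n_inputs, n_hidden, n_outputs):
    # Direct encoding of a three-layer MLP (equation B.2): real-valued
    # genes for all weights plus biases, then one bit per input/hidden
    # unit flagging whether that unit is present.
    n_real = (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs
    n_bits = n_inputs + n_hidden
    return n_real + n_bits
```

For example, a network with I = 4, H = 10 and O = 1 is encoded by (5)(10) + (11)(1) = 61 real-valued genes plus 14 presence bits, 75 genes in total.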

Because of the high function complexity that NNs can emulate, there is always a risk that the NN will simply map the input and output vectors directly without recourse to creating an internal representation of their generation process. An illustration of this is shown in Figure 28.2.

Fig. 28.2. Overfitting illustration. Explanatory variable a and dependent variable b with noise. Top: generating function. Bottom: overfitted signal.

In the illustration the model approximator is too complex, and therefore fits exactly to the noisy data points instead of modelling the smoother generating process.



28.3. Optimal Model Complexity

Procedures to prevent the over-fitting of NNs can be categorized as falling into two broad camps. The first group of methods takes the approach that the model used may be over-specified (have more complexity than is needed to model the problem), but that by judicious use of more than one data set in the training process the risk of over-fitting can be minimized. The type most frequently used is the so-called 'early stopping' method, where an additional validation data set is used in the training process;25 other more advanced methods based on bootstrapping27 are also in use. The second group of methods tackles over-fitting with conscious attempts to restrict the complexity of the NN during its training process, sometimes in conjunction with early stopping methods.

28.3.1. Early Stopping

There are a number of different approaches to early stopping.25 The traditional approach is to train a network, monitor its generalization error on a validation set, and stop training when the error on this set is seen to rise. The general problem with this approach is that the generalization curve may exhibit a number of local minima, so the early stopping may in fact be too 'early'. In order to overcome this the NN is trained as normal, without stopping, until the training error reaches a plateau; at the same time, however, the generalization error on a validation set is checked, and the network parameters at the point where this error is lowest are recorded and used.
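The 'record the best' variant of early stopping described above can be sketched as follows; this is an illustrative outline, and the callback-based interface and all names are hypothetical:

```python
def train_with_early_stopping(train_step, val_error, max_epochs=1000):
    # Train to a plateau while tracking the validation error each epoch,
    # then return the parameters from the epoch where that error was
    # lowest; this avoids stopping at the first local minimum of the
    # generalization curve.
    #   train_step(epoch) -> params   (runs one epoch, returns parameters)
    #   val_error(params) -> float    (error on the validation set)
    best_err, best_params = float("inf"), None
    for epoch in range(max_epochs):
        params = train_step(epoch)
        err = val_error(params)
        if err < best_err:
            best_err, best_params = err, params
    return best_params, best_err
```

Note that training always runs to `max_epochs`; only the returned parameters are 'stopped early', at the epoch of lowest validation error.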

28.3.2. Weight Decay Regularization and Summed Penalty Terms

One of the most common approaches to preventing over-fitting through complexity minimization is that of weight decay regularization. This approach attempts to inhibit the complexity of a particular model by restricting the size of its weights, as it is known that larger weight values are needed to model functions with a greater degree of curvature (and therefore complexity).5 In its standard form the sum of the squares of the NN weights is used as a penalty term within the error function, such that

Enew = E + βΩ    (C.1)

where E is the default error function (commonly Euclidean error), Ω is the sum of squares of the NN weights, β is a weighting term and Enew is the new error term to be propagated through the NN.
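
As a concrete illustration, the penalized error of Equation C.1 might be computed as below. This is a sketch of the penalty computation only, not a full trainer; the function name and the flat-array weight representation are assumptions.

```python
import numpy as np

def penalized_error(E, weights, beta):
    """Compute Enew = E + beta * Omega (Equation C.1), where Omega is the
    sum of squared NN weights. E and beta are scalars; weights is any
    flat sequence of the network's weight values."""
    omega = float(np.sum(np.asarray(weights, dtype=float) ** 2))
    return E + beta * omega
```

With beta = 0 the penalty vanishes and the default error is recovered; larger beta pushes training toward smaller weights, and hence lower-curvature, lower-complexity models.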


682 J.E. Fieldsend & S. Singh

Other approaches developed by researchers in the ENN field use slightly different summed penalty terms in NN training; for example, Liu and Yao22 include a penalty for the size of the network in their composite error function.

28.3.3. Node and Weight Addition/Deletion

Node pruning/addition techniques ignore the complexity-through-weight-value approach of weight decay regularization and some of the other complexity penalty term approaches, and instead couch the complexity of a NN solely in terms of the number of transformation nodes. The simplest methodology of this approach is exhaustive search: training many different NNs with different numbers of hidden units and comparing their performance against each other. The computation cost of this approach is obviously prohibitive; however, it can be constrained to a certain degree by simply adding an additional node to a previously trained NN, using the weights of the previous network as a starting point. This method is described as a growing algorithm approach,5 cascade correlation being another.

In Kameyama and Kosugi17 the opposite approach is taken, with a large NN initially specified, followed by the selective pruning of NN units. LeCun et al.21 take a different approach to pruning, again citing that the best generalization is obtained by trading off the training error and network complexity; their method, called optimal brain damage (OBD), focuses on removing NN weights. The basic idea is to choose a reasonable network architecture, train the network until a reasonable solution is obtained using gradient descent methods, and compute the second derivative for each parameter (NN weight). The parameters are then sorted by this saliency, and those parameters with low saliency are deleted. Ragg and Gutjahr26 in contrast use mutual information in their routine for topology determination.

28.3.4. Problems with These Methods

Network growing and pruning methods are usually characterized as being slow to converge, with long training times, and, for those that use gradient descent training techniques, as susceptible to becoming stuck in local minima.3 The main criticism directed at weight decay regularization and other penalty term approaches to training is the problem of how to specify the weighting terms needed by these methods. Just as it is difficult to


Optimizing Forecast Model Complexity Using MOEAs 683

ascertain the correct model complexity for a model a priori, so the correct degree of penalization to include in these adjusted error values is difficult to know beforehand. In addition, the weighted sum approach is only able to obtain all points on a Pareto front when it is convex.7 A demonstration of the problem of composite error weight specification is illustrated below in Figure 28.3.

Fig. 28.3. Illustration of the problems inherent in using composite error functions to determine an operating point.

Figure 28.3 shows three different fronts describing the trade-off between accuracy and complexity, each with a line tangential to it at the point where the values are equal (equivalent to β = 1 in Equation C.1). As can be seen in the illustration, depending upon the actual interaction of complexity and accuracy exhibited by the process, as described by the curves, three very different models will be returned by using this composite error weighting: one with high error and low complexity (e1,c3), one with intermediate complexity and error (e2,c2) and a third with low error and high complexity (e3,c1). Again, it must be noted that these results are dependent on the front shape, which is unknown a priori, but which must be implicitly


guessed at if the composite error approach is used. Of course it is feasible to run the composite error algorithm a number of times to discover the shape of the front; however, the algorithm will need to be run as many times as the number of points desired, which is very time consuming, and even then non-convex portions of the front will remain undiscovered.

28.4. Using Evolutionary Algorithms to Discover the Complexity/Accuracy Trade-Off

As discussed in Section 28.3, until recently researchers interested in constraining the complexity of their models had to assign one or more variables whose value was known to greatly affect the end model, but whose selection was difficult, if not impossible, without knowing how model complexity and accuracy interact for the specific problem. Instead of trying to simultaneously optimize these separate objectives by combining complexity and accuracy into a single error value, which has been shown to be problematic, they can be optimized as two separate objectives, through the use of EMOO techniques. By using this methodology a set of ENNs can be produced showing the realized complexity/accuracy trade-off for each problem.

Before discussing this approach further, however, the concept of Pareto optimality needs to be briefly described.

28.4.1. Pareto Optimality

Most recent work in EMOO is formulated in terms of non-dominance and Pareto optimality.

The multi-objective optimization problem seeks to simultaneously extremize D objectives:

y_i = f_i(x),    i = 1, ..., D    (D.1)

where each objective depends upon a vector x of P parameters or decision variables (in the case of this chapter, ENN weights and nodes). The parameters may also be subject to the J constraints:

e_j(x) ≥ 0,    j = 1, ..., J.    (D.2)

Without loss of generality it is assumed that the objectives are to be minimized, so that the multi-objective optimization problem may be expressed as:

minimize y = f(x) = (f_1(x), ..., f_D(x))    (D.3)


subject to e(x) = (e_1(x), ..., e_J(x)) ≥ 0    (D.4)

where x = (x_1, ..., x_P) and y = (y_1, ..., y_D).

When faced with only a single objective, an optimal solution is one which minimizes the objective given the model constraints. However, when there is more than one objective to be minimized, solutions may exist for which performance on one objective cannot be improved without sacrificing performance on at least one other. Such solutions are said to be Pareto optimal30 after the 19th century engineer, economist and sociologist Vilfredo Pareto, whose work on the distribution of wealth led to the development of these trade-off surfaces.24 The set of all Pareto optimal solutions is said to form the true Pareto front.

The notion of dominance may be used to make Pareto optimality clearer. A decision vector u is said to strictly dominate another v (denoted u ≺ v) iff

f_i(u) ≤ f_i(v)  ∀i = 1, ..., D   ∧   ∃i : f_i(u) < f_i(v)    (D.5)

Less stringently, u weakly dominates v (denoted u ⪯ v) iff

f_i(u) ≤ f_i(v)  ∀i = 1, ..., D.    (D.6)

A set of M decision vectors {w_i} is said to be a non-dominated set (an estimated Pareto front) if no member of the set is dominated by any other member:

w_i ⊀ w_j,    ∀i, j = 1, ..., M.    (D.7)
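
These definitions translate directly into code. The following sketch (minimization assumed throughout, as in the chapter; the function names are illustrative) implements strict dominance in the sense of Equation D.5 and the non-dominated filtering of Equation D.7.

```python
def dominates(u, v):
    """True iff u strictly dominates v (u ≺ v): u is no worse than v in
    every objective and strictly better in at least one (Equation D.5).
    u and v are sequences of objective values to be minimized."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))

def nondominated(points):
    """Filter a collection of objective vectors down to its mutually
    non-dominated subset, i.e. an estimated Pareto front (Equation D.7).
    A vector never strictly dominates itself, so no self-check is needed."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

This quadratic-time filter is adequate for illustration; dedicated data structures make archive maintenance far cheaper, a point the chapter returns to when discussing the elite archive.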

28.4.2. Extent, Resolution and Density of Estimated Pareto Set

There are a number of requirements on estimated Pareto fronts that researchers strive for their algorithms to meet. These can be broadly described as high accuracy, representative extent, minimum resolution and equal density.

The first concept, accuracy, is simply that the estimated solutions should be as close as possible to the true Pareto front. As illustrated in Figure 28.4, the estimated front of Algorithm A is clearly more accurate than that of Algorithm B; however, the comparison of A and C is more difficult to quantify, as some members of A dominate members of C, but the reverse is also true.


[Figure legend: x, true Pareto front; •, estimated Pareto optimal individuals, algorithm A; *, estimated Pareto optimal individuals, algorithm B; o, estimated Pareto optimal individuals, algorithm C]

Fig. 28.4. Illustration of the true Pareto front, and three estimates of it; the estimate of algorithm A is clearly more accurate than that of B, but the comparison of A and C is not as easy to quantify.

Ideally the Pareto solutions returned (or estimates of them) should lie across the entire surface of the true Pareto front, and not simply be concerned with a small subsection of it. Minimum resolution is a common requirement, as in many applications the end user may wish the separation between potential solutions to be no bigger than a fixed value (of course, in discontinuous Pareto problems this requirement is not entirely realistic).

Much emphasis has been placed by researchers on the non-dominated solutions returned by the search algorithm being equally distributed (of even density);9 however, it is arguable that this should only be of concern if the generating process itself results in evenly distributed solutions. In an actual application it may well be the case that the generating process produces an unbalanced Pareto front, and this information itself may be very pertinent to the decision maker. Forcing multi-objective evolutionary algorithms (MOEAs) to misrepresent this fact, by penalizing any representation other than equal density, may well have negative repercussions for the final user of the information.

An illustration of this is provided in Figure 28.5. Figure 28.5a shows the


Fig. 28.5. Comparing the density of estimated Pareto fronts. Illustration of an underlying true Pareto front (a), and its approximation using an MOEA that is designed to return equal density along the front (b) and one that does not (c).

true Pareto front, with Figures 28.5b and 28.5c illustrating the returned sets of two MOEAs, one which focuses on equal density and one that does not. Figure 28.5b gives no indication to the end user of the density of solutions to the lower left of the front.

28.4.3. The Use of EMOO

Abbass2,3 and Abbass and Sarker1 have recently applied EMOO techniques to trading off the number of hidden units with respect to the accuracy of the NN, where each point on the Pareto frontier is therefore represented by an ENN with a different number of hidden units to any other set member. A description of their memetic Pareto artificial neural network (MPANN) model can be found in Algorithm 2.


Algorithm 2 The memetic Pareto artificial neural network algorithm.1,2,3

M, Size of initial random population of solutions.
N, Maximum number of EA generations.
E, Maximum number of backpropagation epochs.
1: Generate random NN population, S, of size M, such that each parameter (weight) of the NN, x_i ~ N(0,1), and the binary part of the decision vector is either initialized at one or ~ U(0,1).
2: Initialize generation counter t := 0.
3: while t < N
4:   Find the set of solutions within S which are dominated, S̄; S := S \ S̄.
5:   if |S| < 3
6:     Insert members from S̄ until |S| = 3.
7:   end if
8:   Randomly mark 20% of training data as validation data.
9:   while |S| < M
10:    Select random representatives x, y and z from S
11:    x_new := crossover(x, y, z)
12:    x_new := mutate(x_new)
13:    x_new := backpropagation(x_new, E)
14:    if x_new ≺ x
15:      S := S + x_new
16:    end if
17:  end while
18:  t := t + 1
19: end while
20: end

The algorithm presented by Abbass is sufficient when NN complexity is defined as the number of transfer units, but is insufficient when complexity is defined as the number of weights or the sum of the squared weight values. This is because the algorithm internalises the estimated Pareto front F within the search population, and needs the maximum size of the Pareto front to be less than that of the search population. This can be seen at line 4 of Algorithm 2, where the dominated members of the search population S are removed. If none of the search population members are dominated (it is a mutually non-dominating set) then no further search will be promoted (line 9) and the algorithm will simply


do nothing until the maximum number of generations is reached. As the second objective in MPANN1,2,3 is discrete, with a maximum limit of Hmax and a minimum limit of 1, the maximum size of F equals Hmax. As in its empirical applications1,2,3 the maximum number of hidden units was 10 and the search population size 50, this problem was not encountered; however, Algorithm 2 cannot be easily applied to situations where the second objective is to minimize the number of weights, as the maximum size of F (for a single layer MLP) would be Hmax × I + Hmax + Hmax + 1 and the search population would therefore need to be significantly greater than this. In the case of the sum of squared weights there is essentially no limit on the size of the Pareto set, and therefore no search population in Algorithm 2 would be large enough.

The method of search population update1,2,3 is essentially a variant of the conservative replacement scheme described by Hanne,15 where an individual in the search population is only replaced if it is dominated by a perturbed copy of itself. In this chapter a more generally applicable algorithm will be described for the multi-objective evolution of NNs, with the emphasis placed on ease of encoding, for the trade-off of complexity and accuracy.

28.4.4. A General Model

Perhaps the simplest EA in common use is the evolution strategy (ES), where the parameters are perturbed to adjust their value from one generation to the next. Its popularity is probably derived in part from its ease of encoding and use; however, it has also formed the base of a number of successful algorithms in the MOEA domain, not least the Pareto archived evolution strategy (PAES) of Knowles and Corne.18,19 Due to its simplicity and previous success it is also used as the base of Algorithm 3, which is used here to search for the complexity/accuracy trade-off front.

28.4.4.1. mutate()

In an ES, the weight space of a network is perturbed by a set of values drawn at each generation from a known distribution, as shown in Equation D.8:

x_i' = x_i + γ · θ    (D.8)

where x_i is the ith decision parameter of a vector, θ is a random value drawn from some (pre-determined) distribution and γ is some multiplier. A (μ, λ)-ES process is one in which μ decision vectors are available at the start of a


Algorithm 3 The ES based MO training algorithm for complexity/accuracy trade-off discovery.

M, Size of initial random population of solutions.
N, Maximum number of EA generations.
p_mut, Probability of weight perturbation.
p_w, Probability of weight removal.
p_u, Probability of unit removal.
E, Maximum number of backpropagation epochs.
1: Initialise NN individual z.
2: z := backpropagation(z, E)
3: Generate random NN population, S, of size M, such that each parameter (weight) of the NN, x_i ~ N(z_i, 1), and the binary part of the decision vector is either initialised at one or ~ U(0,1).
4: F := ∅. Update F with the non-dominated solutions from S ∪ z.
5: Initialise generation counter t := 0.
6: while t < N
7:   Create a copy of the search population, S' := S
8:   for i = 1 : M
9:     S'_i := mutate(S'_i, p_mut)
10:    S'_i := weightadjust(S'_i, p_w)
11:    S'_i := unitadjust(S'_i, p_u)
12:  end for
13:  Update F with the non-dominated solutions from S'
14:  for i = 1 : M
15:    if S'_i ≺ S_i
16:      S_i := S'_i
17:    else if S_i ⊀ S'_i
18:      if 0.5 > U(0,1)
19:        S_i := S'_i
20:      end if
21:    end if
22:  end for
23:  S := replace(S, F, M/5)
24:  t := t + 1
25: end while
26: end


generation (called parents), which are then perturbed to generate λ variants of themselves (called children or offspring). This set of λ children is then truncated to provide the μ parents of the following iteration (generation). The process of selecting which children should form the set of parents in the next iteration is usually dependent on their evaluated fitness (the fitter being more probable to 'survive'). A (μ + λ)-ES process denotes one where the parents compete with the children in the selection process for the next generation's parent set, which is the method used in Algorithm 3. Recent work in the field of EAs has shown that the use of heavier tailed distributions can speed up algorithm performance;34 as such, in this chapter θ is drawn from a Laplacian distribution with width parameter 1, and γ = 0.1. In Algorithm 3, mutate(x, p_mut) perturbs the weight parameters of the decision vector x with a probability of p_mut.
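
A sketch of such a mutation operator is given below, assuming a flat weight vector and NumPy's random generator; the chapter does not give an implementation, so all names here are illustrative. The heavier tail of the Laplacian versus a Gaussian permits occasional large jumps in weight space.

```python
import numpy as np

def mutate(x, p_mut, gamma=0.1, rng=None):
    """Perturb each weight of decision vector x with probability p_mut
    by gamma * theta, where theta ~ Laplace(0, 1) (Equation D.8).
    gamma defaults to 0.1 as used in the chapter."""
    rng = rng or np.random.default_rng()
    x = np.asarray(x, dtype=float).copy()
    mask = rng.random(x.shape) < p_mut          # which weights to perturb
    x[mask] += gamma * rng.laplace(0.0, 1.0, size=mask.sum())
    return x
```

With p_mut = 0 the vector is returned unchanged; with p_mut = 1 every weight receives a Laplacian perturbation scaled by gamma.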

28.4.4.2. weightadjust()

In order for partially connected ENNs to lie within the search space of the algorithm, the weightadjust() method is used (line 10 of Algorithm 3). weightadjust(x, p_w) acts upon the weight parameters of x, setting them to 0 with a probability of p_w (effectively removing them).

28.4.4.3. unitadjust()

Topography and input feature selection is implemented within the model by bit mutation of the section of the decision vector representing the network architecture. This is facilitated by first determining a super-set of input features and a maximum hidden layer size. Once this is determined, any individual has a fixed maximum representation capability. Manipulation of structure is stochastic: by randomly bit flipping members of the first I genes of the binary section of the decision vector the set of input features used by the network is adjusted, and flipping the following H genes affects the hidden layer.
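
The two structural operators might be sketched as follows. A flat-array representation and the function names are assumptions made for illustration; the chapter specifies only the operators' behaviour, not their implementation.

```python
import numpy as np

def weightadjust(weights, p_w, rng=None):
    """Set each weight to 0 with probability p_w, effectively removing
    that connection so partially connected networks lie in the search
    space (line 10 of the ES-based algorithm)."""
    rng = rng or np.random.default_rng()
    w = np.asarray(weights, dtype=float).copy()
    w[rng.random(w.shape) < p_w] = 0.0
    return w

def unitadjust(structure_bits, p_u, rng=None):
    """Flip each bit of the binary structure vector with probability p_u.
    The first I genes are assumed to select input features and the
    following H genes to enable hidden units."""
    rng = rng or np.random.default_rng()
    b = np.asarray(structure_bits, dtype=int).copy()
    flip = rng.random(b.shape) < p_u
    b[flip] = 1 - b[flip]
    return b
```

Both operators leave their input untouched when the probability is 0, so small p_w and p_u (0.02 in the chapter's experiments) only occasionally alter connectivity.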

28.4.4.4. The Elite Archive

In addition to the search population S, Algorithm 3 also maintains an elite archive F of the non-dominated solutions (ENNs) found so far in the search process. No truncation is used on this set, as truncation can lead to some negative repercussions; it can cause some members of F to be dominated by members of F from an earlier generation (empirical proof of this can


be found in Fieldsend11 and theoretical justification in Hanne15). It also means that the final front discovered should be distributed in a way more indicative of the underlying process, as discussed in Section 28.4.2. Time concerns can be addressed by the efficient use of data structures;10,11,12,23 however, if growth is significant then some form of truncation may be worth considering.20

28.4.4.5. replace()

In order to promote additional search pressure on S in Algorithm 3, the replace(S, F, M/5) function updates S by randomly replacing a fifth (M/5) of its decision vectors with copies of individuals from F. These copies are selected using partitioned quasi-random selection (PQRS),12 which ensures that a good spread of solutions is selected from the estimated Pareto front.

28.4.5. Implementation and Generalization

A number of recent approaches to training ENNs while simultaneously adjusting topology have done so using a hybrid approach, where training with EA methods and gradient descent techniques has been interleaved.1,2,3,22

Justification for this approach has been made for the very sensible reason of computational efficiency: by using a hybrid learning approach as opposed to a purely EA training methodology the training time is typically reduced. However, this is not to say that hybrid training does not create problems of its own; if the problem at hand demands a 'hand crafted' error function, as many in financial forecasting applications do,4,11,13,28,32 it may be difficult to propagate through gradient descent learning methods. Recent work has highlighted that the most profitable model is not necessarily the one that minimizes forecast Euclidean error.11,13 As such, the method described in this chapter uses traditional gradient descent methods to seed the search process (line 2 of Algorithm 3), but thereafter is exclusively EA driven, meaning it is easily applicable to the widest range of time series forecast problems with minimal modification requirements.

Algorithm 3 deals solely with fitting the ENNs to a set of training data, which then leads to the question of how to minimize generalization error with this information. The approach advocated in this chapter is disarmingly simple. Instead of convoluted training and validation during the training process, validation error/complexity is compared to the Pareto training error/complexity after training, and a suitable operating ENN is chosen using this comparison. An illustration of this is provided in Figure 28.6.


Fig. 28.6. Illustration of the complexity/error trade-off front. Left: training data Pareto front. Right: the same ENNs evaluated on validation data; from point 'p' onwards the ENNs are over-fitted and should not be used.

The curve in Figure 28.6a illustrates the complexity/accuracy trade-off curve discovered on a set of training data, and Figure 28.6b illustrates the same ENNs evaluated on some validation data. This curve can be seen to be non-Pareto, as it curves back on itself at high complexity, showing that those networks have been over-fitted. The practitioner should therefore operate with the ENN at point 'p' if they wish to minimize generalization error, or at a complexity below that if they have constraints on the distributed complexity of their model (if, for instance, they are content with a lower accuracy provided they can reduce the number of transfer units/weights in the network). The actual generalization error can then be assessed on some additional unseen test data to reassure the choice of complexity. This approach has an advantage over the common early stopping method described earlier, in that it doesn't have the potential to be trapped in local minima, and it promotes search in areas which are not confined to the gradient descent weight trajectory.
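
Selecting the operating model at point 'p' then amounts to scanning the front for the minimum validation error. A minimal sketch follows; the (complexity, model) pair representation, the callback and the names are assumptions, not the chapter's code.

```python
def select_operating_model(front, val_error):
    """Given a training-data Pareto front as (complexity, model) pairs,
    return the pair whose model has the lowest validation error: beyond
    that point the front 'folds back' and the more complex networks are
    over-fitted. val_error(model) is assumed to return a scalar error
    evaluated on the validation set."""
    return min(front, key=lambda cm: val_error(cm[1]))
```

A practitioner with a complexity budget would instead restrict the scan to pairs whose complexity does not exceed that budget before taking the minimum.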

28.5. Empirical Validation

The methodology introduced in the previous section will now be validated. Two different measures of complexity will be modelled: the sum of the squared weights, and the number of weights used. Results from choosing the model at point 'p' will be compared to the traditional approach of early stopping on time series problems with various degrees of noise, and example fronts produced to support the general methodology.


28.5.1. Data

The data used will be of a physical process, the oscillation of a laser,* where an underlying function is thought to drive the observations, but where there is also a degree of measurement noise. Additional noise will be added to this process in order to promote lower complexity representations and penalize high complexity representations. A plot of the training data is shown in Figure 28.7; Figure 28.8a shows the scatter plot of the time series versus its first lag, and Figure 28.8b shows the correlation coefficient values for different lags of this data.

Fig. 28.7. Laser oscillation time series.

On inspection of the correlation coefficient values, it was decided to use 40 lags to model the process, resulting in 960 input/output pairings. This data was then randomly partitioned into an ENN training set of 640 samples and a validation set of 220 samples. The unseen test set consisted of 9053 input/output pairings.
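
The construction of lagged input/output pairs described above can be sketched as follows (the function name is an assumption). A series of length L with n_lags lags yields L - n_lags pairings, so the 960 pairings quoted for 40 lags correspond to a 1000-point training series.

```python
import numpy as np

def lag_embed(series, n_lags):
    """Build input/output pairs from a univariate series: each input row
    is a window of n_lags consecutive values and the target is the value
    immediately following that window."""
    s = np.asarray(series, dtype=float)
    X = np.stack([s[i:i + n_lags] for i in range(len(s) - n_lags)])
    y = s[n_lags:]
    return X, y
```

The resulting rows can then be randomly partitioned into training and validation sets as described in the text.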

Ten different variants of the series were subsequently made with different degrees of additional noise, drawn from a Gaussian, to mimic different levels of measurement corruption, making a total of eleven time series, each with a different propensity to over-fitting.

28.5.2. Model Parameters

MOEA training was applied through the process described in Algorithm 3, with the following parameters: Hmax = 10, γ = 0.1, p_mut = 0.2, p_w = 0.02,

*The time series data, with full descriptions, can be found at http://www-psych.stanford.edu/~andreas/Time-Series/SantaFe.html .


Fig. 28.8. Laser oscillation time series. Left: scatter plot of current value against previous value. Right: correlation coefficient values for different lags of the time series.

p_u = 0.02, N = 5000, E = 5000 and |S| = 100. In addition, a NN was trained for each of the time series using the more advanced early stopping method described in subsection 28.3.1, for 20000 epochs. The learning rate for all algorithms using backpropagation was 0.05, with a momentum term of 0.5. In addition, the MOEA with the minimizing sum of squared weights objective updated F during the initial training of the seed neural network; this was found to improve training as it gave a good first estimate of the trade-off front.

28.6. Results

Figure 28.9 is an indicative plot of the realized fronts created by the set of optimal ENNs F evaluated on the training, validation and test sets. Although the set is mutually non-dominating on the training data, the validation and test data sets both exhibit the folding-back predicted in the previous section, indicating the ENN to select if the user is solely concerned with minimizing generalization error. The point at which this fold back occurs is observed to lower as the amount of noise increases (see Table 28.1).

Figure 28.10, on the other hand, shows the evaluation of ENNs trained with the second objective of minimizing the number of weights. The size of F at the end of this process is substantially smaller than that for sum of squared weights minimization (averaging around 100 as opposed to over 10000); however, this form of training can be viewed as more useful to the practitioner who is concerned with the trade-off of accuracy versus actual NN size, as it shows that the NN can be drastically reduced with only a marginal increase in error, if they wish to distribute a far simpler model.

Table 28.1 gives the error and 'complexity' of the different models


Fig. 28.9. Training, validation and test set fronts for the error and Σw² minimization training process, with additional noise N(0,4). The phenomenon of the validation and test set fronts folding back on themselves can be clearly seen.

Fig. 28.10. Training, validation and test set fronts for the error and #w minimization training process, without additional noise. The user now gains the information that the number of active weights (connectivity) of the NN can be drastically reduced with only a marginal increase in error, if they wish to distribute a far simpler model.

selected at 'p' by the MOEAs with the different complexity objectives for the 11 data sets, along with that of a NN trained in the traditional early stopping fashion. The error rates can be seen to be equivalent, with the MOEAs seeming to perform slightly better as the amount of noise increases. The MOEA models minimizing the sum of the squared weights can also be seen


to have much lower weight values compared to the early stopping approachas the noise increases.

Table 28.1. Results of the single ENN model selected at point 'p' on the validation front, and an early stopping backpropagation NN.

                 'p', Σw² min.       'p', #weight min.    Backprop
Added noise      Error     Σw²       Error     #w         Error     Σw²
None              6.9     459.3       6.9      410         6.9     459.3
N(0,1)            7.7     444.6       7.7      410         7.7     444.6
N(0,2)            8.9     450.7       8.9      404         8.9     450.7
N(0,3)           15.8     342.5      16.6      388        15.8     345.5
N(0,4)           23.0     332.2      28.2       78        23.0     346.9
N(0,5)           33.4     297.0      37.9       33        33.4     297.1
N(0,6)           46.2     106.9      46.3       75        46.3     154.9
N(0,7)           58.9      42.7      58.9       52        59.7     145.6
N(0,8)           74.1      19.4      74.6       19        75.3     141.5
N(0,9)           89.7      18.6      90.3       23        89.8     142.5
N(0,10)         107.3      10.39    108.3       24       110.8     138.9

28.7. Discussion

EMOO approaches to NN training have already proved useful in providing trade-off fronts between competing error objectives in financial time series forecasting,11,13 and a methodology already exists for learning the trade-off front between NN accuracy and the number of hidden units.1,2,3 The methodology described in this chapter takes this further and presents a process for encapsulating other definitions of NN complexity within a MOEA training process. These have been shown to be equivalent or better than the popular early stopping approach to NN training on a physical time series process with many different degrees of noise, and therefore over-fitting propensity, for the selection of a single 'best' NN in terms of generalization error. More importantly, however, by using the assessment of a set of estimated Pareto optimal ENNs on validation data, the non-dominated ENNs can give the user a good representation of the complexity/accuracy trade-off of their problem, such that NNs with very low complexity may feasibly be used. In the example series used in this chapter, the cost in terms of realized error of this approach was surprisingly low.

Acknowledgements

Jonathan Fieldsend gratefully acknowledges support from EPSRC grant number GR/R24357/01 during the writing of this chapter.


References

1. H. Abbass and R. Sarker. Simultaneous evolution of architectures and connection weights in ANNs. In Artificial Neural Networks and Expert Systems Conference, pages 16-21, Dunedin, New Zealand, 2001.

2. H.A. Abbass. A Memetic Pareto Evolutionary Approach to Artificial Neural Networks. In The Australian Joint Conference on Artificial Intelligence, pages 1-12. Springer, 2001.

3. H.A. Abbass. An Evolutionary Artificial Neural Networks Approach for Breast Cancer Diagnosis. Artificial Intelligence in Medicine, 25(3):265-281, 2002.

4. J.S. Armstrong and F. Collopy. Error measures for generalizing about forecasting methods: Empirical comparisons. International Journal of Forecasting, 8(1):69-80, 1992.

5. C.M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1998.

6. C.A.C. Coello. A Comprehensive Survey of Evolutionary-Based Multiobjective Optimization Techniques. Knowledge and Information Systems, An International Journal, 1(3):269-308, 1999.

7. I. Das and J. Dennis. A closer look at drawbacks of minimizing weighted sums of objectives for Pareto set generation in multicriteria optimization problems. Structural Optimization, 14(1):63-69, 1997.

8. K. Deb. Multi-objective genetic algorithms: Problem difficulties and construction of test problems. Evolutionary Computation, 7(3):205-230, 1999.

9. K. Deb, L. Thiele, M. Laumanns, and E. Zitzler. Scalable Multi-Objective Optimization Test Problems. In Congress on Evolutionary Computation (CEC2002), volume 1, pages 825-830, Piscataway, New Jersey, May 2002. IEEE Service Center.

10. R.M. Everson, J.E. Fieldsend, and S. Singh. Full Elite Sets for Multi-Objective Optimisation. In I.C. Parmee, editor, Adaptive Computing in Design and Manufacture V, pages 343-354. Springer, 2002.

11. J.E. Fieldsend. Novel Algorithms for Multi-Objective Search and their application in Multi-Objective Evolutionary Neural Network Training. PhD thesis, Department of Computer Science, University of Exeter, June 2003.

12. J.E. Fieldsend, R.M. Everson, and S. Singh. Using Unconstrained Elite Archives for Multi-Objective Optimisation. IEEE Transactions on Evolutionary Computation, 7(3):305-323, 2003.

13. J.E. Fieldsend and S. Singh. Pareto Multi-Objective Non-Linear Regression Modelling to Aid CAPM Analogous Forecasting. In Proceedings of the 2002 IEEE International Joint Conference on Neural Networks, pages 388-393, Hawaii, May 12-17, 2002. IEEE Press.

14. C.M. Fonseca and P.J. Fleming. An Overview of Evolutionary Algorithms in Multiobjective Optimization. Evolutionary Computation, 3(1):1-16, 1995.

15. T. Hanne. On the convergence of multiobjective evolutionary algorithms. European Journal of Operational Research, 117:553-564, 1999.

16. S. Haykin. Neural Networks A Comprehensive Foundation. Prentice Hall, 2

Page 728: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

Optimizing Forecast Model Complexity Using MOEAs 699

edition, 1999.17. K. Kameyama and Y. Kosugi. Automatic fusion and splitting of artificial

neural elements in optimizing the network size. In Proceedings of the In-ternational Conference on Systems, Man and Cybernetics, volume 3, pages1633-1638, 1991.

18. J. Knowles and D. Corne. The pareto archived evolution strategy: A newbaseline algorithm for pareto multiobjective optimisation. In Proceedings ofthe 1999 Congress on Evolutionary Computation, pages 98-105, Piscataway,NJ, 1999. IEEE Service Center.

19. J.D. Knowles and D. Corne. Approximating the Nondominated Front Us-ing the Pareto Archived Evolution Strategy. Evolutionary Computation,8(2):149-172, 2000.

20. J.D. Knowles and D. Corne. Properties of an Adaptive Archiving Algo-rithm for Storing Nondominated Vectors. IEEE Transactions on Evolution-ary Computation, 7(2):100-116, 2003.

21. Y. LeCun, J. Denker, S. Solla, R. E. Howard, and L. D. Jackel. Optimal braindamage. In D. S. Touretzky, editor, Advances in Neural Information Process-ing Systems II, pages 598-605, San Mateo, CA, 1990. Morgan Kauffman.

22. Y. Liu and X. Yao. Towards designing neural network ensembles by evolu-tion. Lecture Notes in Computer Science, 1498:623-632, 1998.

23. S. Mostaghim, J. Teich, and A. Tyagi. Comparison of Data Structures forStoring Pareto-sets in MOEAs. In Congess on Evolutionary Computation(CEC'2002), volume 1, pages 843-848, Piscataway, New Jersey, May 2002.IEEE Press.

24. V. Pareto. Manuel D'Economic Politique. Marcel Giard, Paris, 2nd edition,1927.

25. L. Prechelt. Automatic early stopping using cross validation: quantifying thecriteria. Neural Networks, ll(4):761-767, 1998.

26. T. Ragg and S. Gutjahr. Automatic determination of optimal network topolo-gies based on information theory and evolution. In IEEE Proceedings ofthe 23rd EUROMICRO Conference, pages 549-555. IEEE Compter SocietyPress, 1997.

27. Y. Raviv and N. Intrator. Bootstrapping with noise: An effective regulariza-tion technique. Connection Science, 8:356-372, 1996.

28. E. Saad, D. Prokhorov, and D. Wunsch. Comparative Study of Stock TrendPrediction Using Time Delay, Recurrent and Probabilistic Neural Networks.IEEE Transactions on Neural Networks, 9(6):1456-1470, 1998.

29. J. Utans and J. Moody. Selecting neural network architectures via the predic-tion risk: application to corporate bond rating prediction. In Proc. of the FirstInt. Conf on AI Applications on Wall Street, pages 35-41, Los Alamos,CA,1991. IEEE Computer Society Press.

30. D. Van Veldhuizen and G. Lamont. Multiobjective Evolutionary Algorithms:Analyzing the State-of-the-Art. Evolutionary Computation, 8(2):125-147,2000.

31. D. Wolpert. On bias plus variance. Neural Computation, 9(6):1211-1243,1997.

Page 729: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

700 J.E. Fieldsend & S. Singh

32. J. Yao and C.L. Tan. Time Dependant Directional Profit Model for Fi-nancial Time Series Forecasting. In Proceedings of IJCNN 2000, ComoIEEE/INNS/ENNS, volume 5, pages 291-296, 2000.

33. X. Yao. Evolving Artificial Neural Networks. Proceedings of the IEEE,87(9):1423-1447, 1999.

34. X. Yao, Y. Liu, and G. Lin. Evolutionary Programming Made Faster. IEEETransactions on Evolutionary Computation, 3(2):82-102, 1999.

Page 730: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

CHAPTER 29

EVEN FLOW SCHEDULING PROBLEMS IN FOREST MANAGEMENT

E.I. Ducheyne1, B. De Baets2 and R.R. De Wulf3

1 Dept. of Veterinary Science, Institute of Tropical Medicine
Nationalestraat 155, B-2000 Antwerpen, Belgium
E-mail: [email protected]

2 Dept. of Applied Mathematics, Biometrics and Process Control
Ghent University, Coupure links 653, B-9000 Gent, Belgium

3 Lab. of Forest Management and Spatial Information Techniques
Ghent University, Coupure links 653, B-9000 Gent, Belgium

The performance of the Multiple Objective Genetic Algorithm (MOGA) and the Non-dominated Sorting Genetic Algorithm (NSGA-II) is compared on a forest scheduling problem with a known Pareto-optimal front. The best-performing algorithm is then applied to a harvest scheduling problem with integer decision variables for each of the harvest periods that can be assigned to the forest stands of Kirkhill forest. This Model I harvest schedule attempts to simultaneously maximize the present value derived from timber production and minimize the deviations of timber volume between successive harvesting periods. Both the optimal encoding of the decision variables and the best population size for this type of scheduling problem are determined. The results of the multiple objective approach are compared with the maximum value that can be attained without the even flow constraint and also with the result obtained using a single objective genetic algorithm. Finally, proportional fitness inheritance is applied to this problem. The inheritance technique and the regular genetic algorithm are compared by studying the evolution of various performance indices.

29.1. Benchmark Problem

29.1.1. Introduction

Multiple objective genetic algorithms have not yet been used in forest management. Hence, there is no information available as to which algorithms perform well for that domain. In order to gather this information, a comparative study was conducted on a forest management problem with a known Pareto-optimal front. Two multiple objective genetic algorithms were tested: the Multiple Objective Genetic Algorithm (MOGA) implemented by 9, because it has earned its merit in a land use planning problem 18,17, and the Non-dominated Sorting Genetic Algorithm II (NSGA-II) 5, because of its reported efficiency. These two algorithms are also recommended by 22 as starting points. The outcomes of these algorithms were compared to a random search strategy in order to assess the efficiency of a genetic approach. Because the Pareto-optimal front is unknown for real-world harvesting problems, a forest management problem defined by 11 was chosen as a forest benchmark problem.

In the forest, four different activities need to be scheduled: passive recreation, active recreation, habitat conservation and timber harvesting. There are two management decisions that can be assigned to a forest stand: clear felling the stand on the one hand, and leaving the stand on the other hand. Each stand can receive only one set of management activities over the complete planning horizon. For this benchmark problem, the first objective is to maximize the harvest volume V. Since harvest volume is negatively correlated with the standing volume left in the forest, it is possible to write the first objective as in Eq. A.1:

Maximize f1 = Σ_{i=1}^{N} v(i, a)    (A.1)

where N is the number of forest stands, a a management decision, v(i, a) the harvest volume associated with stand i and management decision a, and v_stand the volume left standing.

The second objective is to maximize the benefit that people obtain from the standing forest, measured by a utility function U. According to the law of diminishing returns, this function can be modelled using the square root of the stand volume 11. The more standing volume left in the forest, the more trees are present for people to enjoy. However, the increase in benefit derived from the trees will decrease when the forested area is larger, as the marginal gain of leaving an extra tree declines. Therefore the second objective can be written as in Eq. A.2:

Maximize f2 = Σ_{i=1}^{N} u(i) = √v_stand    (A.2)

where N is the number of forest stands, u(i) the utility associated with stand i and v_stand the volume left standing.


As follows from Eqs. A.1 and A.2, the two objectives are conflicting. Moreover, the Pareto-front between the two objectives is non-convex (Fig. 29.1). This prohibits the use of weighted sum formulations, leading 11 to revert to a complex dynamic programming formulation in order to solve the problem.

29.1.2. Methodology

The forest activities are mapped using one bit per forest stand. 295 of the 399 stands were retained; the other stands were excluded because they were either unplanted or not yet productive during the initial period. A simple land block assignment procedure as in 17 was applied. In this procedure each gene represents a land block and the genes make up a linear chromosome, even though in reality the forest layout is two-dimensional.
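As an illustration of this encoding, the sketch below evaluates objectives A.1 and A.2 for a bit-string chromosome. It is a minimal sketch with invented toy data, not the authors' code: `evaluate`, `volumes` and `chrom` are illustrative names, and only 5 stands are used instead of the 295 retained stands.

```python
import math

def evaluate(chromosome, harvest_vol, total_vol):
    """Benchmark objectives for a one-bit-per-stand chromosome:
    1 = clear fell the stand, 0 = leave it standing."""
    f1 = sum(v for bit, v in zip(chromosome, harvest_vol) if bit == 1)
    v_stand = total_vol - f1      # standing volume left in the forest
    f2 = math.sqrt(v_stand)       # diminishing-returns utility (Eq. A.2)
    return f1, f2

# Toy forest of 5 stands (hypothetical volumes).
volumes = [10.0, 20.0, 5.0, 15.0, 8.0]
chrom = [1, 0, 1, 0, 0]           # fell stands 0 and 2
f1, f2 = evaluate(chrom, volumes, sum(volumes))
```

Both objectives are then maximized simultaneously by the multi-objective algorithm.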

For the two genetic algorithms, the population size was set to 100, the number of generations to 50, the crossover probability to 0.8 and the mutation probability to 0.01. All algorithms were repeated 30 times and for each run the following performance indices were determined: error ratio, generational distance, spacing, spread and hypervolume. The results of the performance indices were analyzed using both statistical tests (One Way ANOVA or Kruskal-Wallis) and a distribution-free bootstrapping method 20. The significance level for all statistical tests was set to 95%. The 50% attainment surface was derived for visual comparison and this was also used as input for the Mann-Whitney test statistics provided by 15.
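Two of these indices can be stated compactly in code. The sketch below is an illustrative implementation of one common variant of each (definitions vary between authors, so this is not necessarily the exact formulation used in the chapter): the generational distance as the mean Euclidean distance to the nearest Pareto-optimal point, and Schott's spacing as the standard deviation of nearest-neighbour distances.

```python
import math

def generational_distance(front, optimal):
    """Mean Euclidean distance from each obtained point to its
    nearest Pareto-optimal point (one common GD variant)."""
    dists = [min(math.dist(p, q) for q in optimal) for p in front]
    return sum(dists) / len(dists)

def spacing(front):
    """Schott's spacing: standard deviation of nearest-neighbour
    (Manhattan) distances; 0 means perfectly even spacing."""
    d = [min(sum(abs(a - b) for a, b in zip(p, q))
             for q in front if q is not p) for p in front]
    mean = sum(d) / len(d)
    return math.sqrt(sum((x - mean) ** 2 for x in d) / (len(d) - 1))
```

Lower values are better for both indices, which is why the sign of the test measure in the bootstrapping plots can be read directly as a ranking of the two algorithms.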

29.1.3. Results and Discussion

29.1.3.1. Visual Interpretation

In Fig. 29.1 the median attainment surfaces over 30 runs are shown for MOGA and NSGA-II. The median attainment surface of the random simulation is not presented because it is too small; instead, the solutions found by the random simulations over the 30 runs are presented.

Fig. 29.1 shows that both MOGA and NSGA-II perform much better than the random search strategy. NSGA-II approximates the Pareto-optimal front better than MOGA. MOGA, on the other hand, is capable of finding more extreme solutions and this results in a better spread along the Pareto-front. Neither algorithm is capable of finding the extreme solutions. The lack of spread in the forest management problem is not caused by implementation errors but is most likely caused by the discreteness of the problem 6.


Fig. 29.1. Comparison of the median attainment surface between random search, MOGA and NSGA-II versus the non-convex Pareto-optimal front

29.1.3.2. Performance Indices

1. Testing closeness to the Pareto-optimal front. The generational distance (GD) 23 and the error ratio 4, with δ = 0.05, were calculated for both genetic algorithms. The output results were normalized using the respective minimum and maximum values in each objective dimension of the Pareto-optimal front, because both the error ratio and the generational distance are scaling dependent: the difference in magnitude between the objectives disregards the effect of the objective with the lowest magnitude. In Fig. 29.2 the result of the bootstrapping method is shown.

A non-parametric Kruskal-Wallis test in combination with the bootstrapping method was applied for the statistical analysis because neither normality nor homoscedasticity are fulfilled. From both the Kruskal-Wallis test (p = 0.0 < 0.05) and the bootstrapping method (Fig. 29.2) it follows that NSGA-II performs better than MOGA at a confidence level of 95%. In Fig. 29.2 the test measure is positive, indicating that the mean generational distance as well as the mean error ratio is higher for MOGA than for NSGA-II. As lower values are better for these two indices, it follows that NSGA-II performs better in terms of closeness to the Pareto-optimal front. The standard deviation of the generational distance is smaller for NSGA-II than for MOGA, and this can be interpreted as more robust behavior of the NSGA-II algorithm.

2. Testing spread. The spread was measured by the spacing measure and by the spread measure. The spacing for NSGA-II is lower than the spacing for MOGA, indicating that the crowding distance function of NSGA-II spreads the solutions better than the sharing function. For spacing and spread the variances are not equal (p = 0.001 < 0.05) and this is confirmed by the bootstrapping method (Fig. 29.3). The test indices for the spacing are in the 5% tails of the histogram. Once more lower values are better; therefore NSGA-II has more evenly spaced solutions than MOGA.


Fig. 29.2. Bootstrapping results for the generational distance (29.2(a)) and error ratio (29.2(b)). The confidence boundaries are marked in dark grey bullets (α = 95%), the test measure is marked in a light grey bullet. Both test indices are outside the boundaries

If the distance to the most extreme solutions is included, as in the spread measure of 4, MOGA shows a better performance because it can reach the extremes better. From the bootstrapping results (Fig. 29.3) it follows that there is no difference between the two algorithms, but as noted before, the test value is just inside the boundaries and the boundaries differ between two bootstrapping procedures. The Kruskal-Wallis test cannot detect any significant differences at the 95% level (p = 0.095 > 0.05).

3. Combining spread and closeness. The hypervolume measure calculates the size of the dominated space and is a combined measure for both spread and closeness. For this measure the data was not normalized as it is scaling independent. Both the normality (p > 0.05 for all groups) and the homogeneity of variances (p = 0.426 > 0.05) assumptions are fulfilled, and from the statistical analysis (One Way ANOVA) it follows that NSGA-II is significantly better than MOGA at the 95% level, and that both genetic algorithms are significantly better than the random strategy. From the bootstrapping method the same conclusion can be drawn when comparing MOGA and NSGA-II.
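For two objectives the hypervolume can be computed with a simple coordinate sweep. The sketch below is an illustrative minimisation variant (a maximized objective would first be negated); `hypervolume_2d` is an invented name, not the measure's reference implementation.

```python
def hypervolume_2d(front, ref):
    """Area dominated by a bi-objective minimisation front, bounded
    by a reference point; assumes the points are mutually
    non-dominated and all dominate the reference point."""
    pts = sorted(front)      # ascending in f1 => descending in f2
    area = 0.0
    for i, (x, y) in enumerate(pts):
        # each point owns the horizontal strip up to the next point
        next_x = pts[i + 1][0] if i + 1 < len(pts) else ref[0]
        area += (next_x - x) * (ref[1] - y)
    return area
```

Because the measure is scaling independent only in its ranking behaviour for a fixed reference point, the choice of reference point still has to be shared between the algorithms being compared.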


Fig. 29.3. Bootstrapping results for spacing (29.3(a)) and spread (29.3(b)). The confidence boundaries are marked in dark grey bullets (α = 95%), the test measure is marked in a light grey bullet. Both test indices are outside the boundaries

Fig. 29.4. Bootstrapping results for the hypervolume. The confidence boundaries are marked in dark grey bullets (α = 95%), the test measure is marked in a light grey bullet

29.1.3.3. Statistical Approaches

The previous comparisons are based on performance indices and require the transformation of n solutions into a single performance index. One way of avoiding this transformation is through the use of the attainment surfaces proposed by 10, which describe the output of multiple objective genetic algorithms. The median attainment surfaces for MOGA and NSGA-II have already been shown in Fig. 29.1. These surfaces can be used as input for statistical comparison. 15 provided a test measure based on these attainment surfaces showing where algorithm A outperforms algorithm B and vice versa. From this measure it follows that NSGA-II dominates MOGA in 79.4% of the covered search space and that MOGA dominates NSGA-II in 16.4% of the cases. These statistics can be explained as follows: MOGA reaches the extreme solutions better than NSGA-II does and therefore beats NSGA-II in part of the search space. In the largest part of the search space the solutions from MOGA are dominated by NSGA-II because NSGA-II is closer to the Pareto-optimal front.

29.1.3.4. Implications for Forest Management Problems

Both MOGA and NSGA-II have shown a better performance than a random search strategy. They both approximate the Pareto-optimal front well, but suffer from a lack of spread. Especially NSGA-II is not capable of finding the more extreme solutions. This lack of spread, however, is not caused by any implementation errors: both algorithms have a very good spread over the complete Pareto-front for a non-convex test function commonly used as a reference function in the specialized literature.

NSGA-II is capable of approximating the Pareto-optimal front faster than MOGA and has more evenly spaced solutions. If the distance from the extreme solutions in the Pareto-front to the extremes of the Pareto-optimal front is included in the spread measure, MOGA scores better than NSGA-II; however, this difference is not significant. The variance in generational distance between several runs is smaller for NSGA-II than for MOGA, highlighting that NSGA-II is more robust than MOGA in terms of approximation of the Pareto-optimal front. When the algorithms are compared in terms of both spread and closeness, the hypervolume measure indicates that NSGA-II dominates a higher portion of the search space than MOGA does. Using the attainment surfaces similar conclusions can be drawn: the Mann-Whitney test procedure shows that NSGA-II beats MOGA in the larger portion of the search space.

Overall, the NSGA-II algorithm shows a better performance for the forest management problem and therefore this algorithm will be used in the subsequent case studies.


29.2. Applying Single Objective Genetic Algorithms to a Real-World Problem

29.2.1. Introduction

Harvest scheduling has been studied extensively in the past 13,16,21,8. Forest managers need to schedule management treatments over a planning horizon. The two objectives most used in the harvest scheduling problems in the literature are (1) to maximize the net present worth, and (2) to minimize the deviations between the different cutting periods. Using a Model I harvest scheduling formulation this can be written as 14:

Maximize f = Σ_{i=1}^{N} Σ_{j=1}^{M} c_ij x_ij    (B.1)

Minimize g = Σ_{j=1}^{M} |V_j − V̄|    (B.2)

where N is the number of stands, M is the number of time periods, c_ij is the present value obtained when applying the management treatment to stand i in period j, x_ij is the decision variable assigning stand i to cutting period j, V_j is the total volume (m³) summed over all stands cut in period j and V̄ is the average volume over all cutting periods. Eq. B.1 expresses the management objective of maximizing the net present value, while Eq. B.2 expresses the objective of minimizing the deviations in timber volume between the different cutting periods, also referred to as even flow. For N forest stands and k cutting periods, the harvest scheduling problem has k^N possible combinations, because each stand can be scheduled in any of the harvesting periods. Formulating the problem as an integer program is needed because a stand can be scheduled for felling only once, and this requires N × k decision variables. This increases the number of decision variables to such an extent that the problem can only be solved using heuristics. As a consequence the global optimum for the bi-objective problem is unknown. The global optimum without the even flow assumption, on the other hand, can be derived: in that case, all felling activities are postponed to the end of the planning horizon, and the maximum present value that can be obtained under that scenario amounts to €914232. This value can be used as a benchmark to compare the solutions found with the genetic algorithm. As the encoding strategy might influence the results of the genetic algorithm, the effect of three encoding strategies will also be investigated.
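The two objectives can be evaluated directly from an integer-coded schedule. The sketch below is illustrative, not the authors' implementation: it assumes absolute deviations in Eq. B.2, uses invented toy data, and `harvest_objectives`, `pv` and `vol` are hypothetical names.

```python
def harvest_objectives(schedule, pv, vol, periods):
    """Model I objectives: schedule[i] is the (0-based) period in
    which stand i is felled. Returns (net present value, sum of
    absolute even-flow deviations)."""
    f = sum(pv[i][t] for i, t in enumerate(schedule))   # Eq. B.1
    V = [0.0] * periods                  # volume cut per period
    for i, t in enumerate(schedule):
        V[t] += vol[i][t]
    vbar = sum(V) / periods              # average volume per period
    g = sum(abs(vj - vbar) for vj in V)  # Eq. B.2
    return f, g

# Toy instance: 2 stands, 2 periods (hypothetical values).
pv = [[10.0, 8.0], [6.0, 9.0]]
vol = [[5.0, 4.0], [3.0, 6.0]]
f, g = harvest_objectives([0, 1], pv, vol, 2)
```

Because each stand carries exactly one period, the one-felling-per-stand constraint is satisfied by construction in this representation.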


29.2.2. Methodology

29.2.2.1. Input Data

For each of the stands the yield class is known. This information was used as input for the production tables of the Forestry Commission 12, and from these forecast tables the cumulative volume from thinning and felling activities can be derived. To simplify the problem, it was assumed that all timber, both from thinning and felling, was sold at the felling date, even though this is not very realistic. Prices were real prices per diameter class published in 2000 1. The diameter class at each period was derived from 12. The discount rate was 3% and the present value was calculated as in Eq. B.3:

V_0 = V_t / (1 + i)^t    (B.3)

where V_t is the net revenue obtained after t periods, i is the discount rate and V_0 is the revenue today. The difficult task of assigning values to weights can be simplified by working with relative weights. If one is indifferent to either of the objectives, then the objectives should be rescaled between 0 and 1 in order to remove the differences in scale magnitude 2. However, this might lead to numeric imprecision and therefore the weights are usually multiplied by a fixed factor. As the present value is in magnitude 100 times larger than the harvest volumes, the present value was divided by 100. The objective function can thus be written using the notation from Eqs. B.1-B.2:

Maximize f = Σ_{i=1}^{N} Σ_{j=1}^{M} c_ij x_ij − w · Σ_{j=1}^{M} |V_j − V̄|    (B.4)

with w the weight for the second objective.
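Eqs. B.3 and B.4 amount to a few lines of arithmetic. The sketch below is illustrative: the function names are invented, and the subtraction of the weighted deviation term is one plausible reading of the combined maximization objective (present value rescaled by 100 as described above).

```python
def present_value(v_t, rate, t):
    """Eq. B.3: discount a revenue received after t periods
    back to its value today."""
    return v_t / (1.0 + rate) ** t

def scalarised(pv_total, deviation, w, pv_scale=100.0):
    """Weighted-sum objective in the spirit of Eq. B.4: rescaled
    present value minus the weighted even-flow deviation.
    Larger values are better."""
    return pv_total / pv_scale - w * deviation
```

Sweeping `w` over a set of values and re-running the optimizer then traces out one point of the Pareto-front per weight, which is exactly the procedure followed in the next section.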

29.2.2.2. Implementation

A genetic algorithm with binary tournament selection, one-point crossover with a probability of 0.8 and uniform mutation with a probability of 0.01 (> 1/ℓ, with ℓ the chromosome length) was implemented. The population size was 100 and the number of generations was set to 50. No fitness scaling was implemented, as binary tournament selection is insensitive to differences in objective value magnitudes. Because genetic algorithms can lose good solutions during the optimization process, elitism was applied. In this particular form of elitism, the parent and child populations were merged and sorted according to their fitness values, and the best N individuals were used to continue the search process. Binary, gray and integer encodings were initially tested with equal weights. For the binary and gray codes, 3 bits were used for each harvesting period, as there are 8 periods in total over the complete planning horizon. This was repeated 10 times 19.
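The elitist generation step described above can be sketched as follows. This is an illustrative single-objective version on binary chromosomes: `elitist_step` and its defaults are invented names, and bit-flip mutation stands in for the uniform mutation that would be used with integer codes.

```python
import random

def elitist_step(parents, fitness, cx_prob=0.8, mut_prob=0.01):
    """One elitist generation: breed N children via binary
    tournament, one-point crossover and bit-flip mutation, then
    merge parents and children and keep the best N individuals."""
    n = len(parents)
    children = []
    while len(children) < n:
        a = max(random.sample(parents, 2), key=fitness)  # tournament
        b = max(random.sample(parents, 2), key=fitness)
        if random.random() < cx_prob:          # one-point crossover
            cut = random.randrange(1, len(a))
            a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        for child in (a, b):
            mutated = [1 - g if random.random() < mut_prob else g
                       for g in child]
            children.append(mutated)
    merged = parents + children[:n]
    merged.sort(key=fitness, reverse=True)     # best first
    return merged[:n]                          # elitist truncation
```

Because the merged population always contains the parents, the best fitness found so far can never decrease between generations, which is the point of this merge-and-truncate form of elitism.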

After selection of the representation that led to the best solution, the weights were varied. The weight w was initially linearly distributed on the half-open interval ]0,1] in steps of 0.2. If the weight 0 were included in the interval, the optimization problem would be unbounded and all felling activities would be planned at the end of the planning horizon (period 8). Two additional weights (0.01; 0.05) were evaluated in a later phase to get more information on the Pareto-front between the two objectives. For each weight the genetic algorithm was repeated 10 times.

29.2.3. Results and Discussion

1. Encoding strategy. Initially, the influence of the encoding strategy on the solution quality was tested. In Table 29.1 the mean objective function value, the mean present value and the mean sum of deviations in volume for binary, gray and integer coding are presented. As the data is not normally distributed, the Kruskal-Wallis test was used. Table 29.1 shows that gray

Table 29.1. Influence of binary, gray and integer coding on the performance of the genetic algorithm. The experiment was repeated 10 times for each encoding. OV is the combined objective value, PV is the present value (×€100), V_j is the total volume (m³) summed over all stands cut in period j and V̄ is the average volume over all cutting periods.

Encoding type   mean OV   mean PV   Σ_{j=1}^{M} |V_j − V̄|
binary          2265.50   3308.00   1043.10
gray            2684.50   3534.00    850.00
integer         2573.50   3326.68    753.58

codes result in a higher objective value than integer and binary codes. The Kruskal-Wallis test statistic (p = 0.263 > 0.05) indicates, however, that there is no significant difference between the three groups. As 3 recommends using the most natural representation of any given problem, integer coding will be used for this forest management problem.
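For reference, the standard reflected Gray code underlying such an encoding can be sketched in two one-line conversions. This is generic bit-twiddling, not the authors' implementation.

```python
def to_gray(n):
    """Reflected binary Gray code: adjacent integers differ
    in exactly one bit."""
    return n ^ (n >> 1)

def from_gray(g):
    """Decode by cumulatively XOR-ing the shifted codeword."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# With 3 bits, the 8 harvesting periods map to the codewords
# 000, 001, 011, 010, 110, 111, 101, 100.
```

The attraction of Gray coding for genetic algorithms is that a single bit-flip mutation always moves the decoded period by one step somewhere in the code, avoiding the "Hamming cliffs" of plain binary coding.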

2. Changing the weight. In Table 29.2 and Fig. 29.5 the mean values per weight combination for integer encoding are presented. A first observation is that linearly distributing the weights on a small interval does not


result in evenly spaced solutions along the Pareto-front. This has some implications: if a forest manager decides to investigate only weights in the interval ]0,1] in steps of 0.2, a considerable amount of information on the shape of the Pareto-front will be lost. Changing the weight from w = 0.2 to w = 0.05 and then to w = 0.01 changes the slope of the Pareto-front substantially. Beyond w = 0.02 a small increase in present value results in a large increase in the sum of all deviations. The effect of the even flow objective is small even when the present value is deemed 5 times more important, but this changes drastically once the present value is considered 100 times more important than even flow. The present value obtained with a weight of 0.01 amounts to €666811, which is 72.9% of the maximum value that can be obtained if there is no even flow assumption.

Table 29.2. Results for the different weight combinations. The experiment was repeated 10 times. OV is the combined objective value, PV is the present value (×€100), V_j is the total volume (m³) summed over all stands cut in period j and V̄ is the average volume over all cutting periods.

weight   mean OV   mean PV   Σ_{j=1}^{M} |V_j − V̄|
1        2573.00   3326.58     753.58
0.8      2744.60   3396.14     814.43
0.6      3151.30   3659.14     846.40
0.4      3434.80   3752.19     790.98
0.2      4042.70   4211.20    1984.93
0.05     5482.10   5508.46    2636.38
0.01     6469.80   6668.11   19830.60

Fig. 29.5. Results for the different weight combinations. The experiment was repeated 10 times. On the x-axis the present value (×€100), and on the y-axis the sum of all deviations in harvest volume


3. Validity. In Fig. 29.6 the volumes per cutting period are presented for all weights. From Fig. 29.6(a) to Fig. 29.6(c) the even flow constraint is strengthened. If this constraint is strengthened, the average volume obtained over all periods declines when increasing w from 0.01 to 0.2 (Fig. 29.6(d)). For a weight w = 0.01 the volume harvested per year amounts to 6.70 m³/ha/yr. For equal weights this is only 6.30 m³/ha/yr. To illustrate that Kirkhill forest was managed as a productive forest: a typical Flemish forest yields on average 4 m³/ha/yr. The other weights produce similar average volumes. Therefore the even flow constraint not only has an effect on the present value but also has a negative side-effect on the average volume over all periods. The felling ages of the forest stands (Fig. 29.7) under equal weights indicate that the rotation age should be increased in order to obtain a normal age distribution. Up to 1/3 of the stands have a felling age over 80 years. From this figure it also follows that some stands are cut very young in order to obtain an even flow of timber volume.

The age distribution is affected by the harvesting plan. Looking at the effect of the plan where the two objectives were equally important, the age distribution almost resembles the age distribution of a normal forest (Fig. 29.8(a)). This is caused implicitly by the even flow objective: it controls the volume, and the age distribution is implicitly adjusted towards a normal state. This is confirmed in Fig. 29.8(b): if the even flow objective is relaxed, the state of the forest still moves towards a normal forest, but to a lesser extent. Running the genetic algorithm for another planning horizon stabilizes the age distribution even more if the objectives are equally important (Fig. 29.9(a)). Starting from the age distribution with equal weights in the first planning horizon and running the algorithm again for a second planning horizon with a weight of 0.01 affects the age distribution a little: it becomes less stable (Fig. 29.9(b)).

29.2.4. Conclusion

Genetic algorithms are capable of solving a harvest scheduling problem. The encoding strategy did not affect the quality of the solutions; there was no significant difference between the different codes. As there is no difference, integer codes were used. In order to find the Pareto-front, the weights were initially linearly distributed on the interval ]0,1]. It was found that this did not lead to evenly spaced solutions along the Pareto-front. In order to gain more information, two additional weights were chosen: w = 0.01 and w = 0.05. When the present value was 100 times more important, the


(a) w = 0.01 (b) w = 0.05

(c) w = 1 (d) Effect of w on V

Fig. 29.6. The influence of the weight w on the variation in volume between different cutting periods. From 29.6(a) to 29.6(c) the even flow constraint is strengthened; in 29.6(d) the effect on V̄ is shown

Fig. 29.7. Felling age of the forest stands

slope of the Pareto-front changed a lot. This implies that a user without prior knowledge of the problem might lose a lot of information on the Pareto-front if the weights are linearly distributed on a small interval.

Both the age distribution and the average volume are affected by the



Fig. 29.8. Age distribution of the forest before and after the harvest scheduling plan with 29.8(a) two equally important objectives and 29.8(b) the present value 100 times more important than the even flow objective


Fig. 29.9. Age distribution of the forest before and after the harvest scheduling plan after a second planning horizon with 29.9(a) two equally important objectives and 29.9(b) the present value 100 times more important than the even flow objective

even flow objective. Running the genetic algorithm in order to maximize the present value and to minimize the deviations between the periods produces harvesting plans enforcing a balanced age distribution. Relaxing the even flow objective has an effect on the age distribution, but some variation in frequency between the different age classes then remains. The present value obtained with a relaxed even flow constraint amounts to 72.9% of the maximum attainable present value. The even flow objective also influences the harvested volume: this declines as the even flow objective becomes more important.


A practical drawback of using weights is that the procedure is cumbersome. Rerunning the genetic algorithm, or any single objective optimizer, for several weight combinations is a tedious job and requires large amounts of computing time.

29.3. Applying NSGA-II: A Truly Bi-Objective Approach

29.3.1. Introduction

The harvest scheduling problem as defined in the single objective case will now be solved using the multiple objective genetic algorithm NSGA-II. The original objective functions (Eqs. B.1 and B.2) are the direct input for the genetic algorithm and do not have to be combined in any way beforehand. The same data and production forecast tables as in the single objective case are used to allow for comparison of both approaches.

29.3.2. Methodology

As for the single objective case, the effect of the encoding was investigated. In order to compare with the single objective case, the same settings as in the previous experiment were used. Again binary, gray and integer encodings were used to represent the 8 different cutting periods per management unit. The effect of the encoding strategies was inspected visually as well as using the hypervolume measure and the statistical analysis via the attainment surfaces. Other indices for closeness could not be applied because the Pareto-optimal front is not known. The spacing measure was also used. The output is compared to that of the single objective case study.
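Both performance indices mentioned above are standard in the literature. As a minimal sketch (our own illustration, not the authors' code; it assumes minimization of both objectives and a user-chosen reference point), Schott's spacing measure and a two-dimensional hypervolume could be computed as follows:

```python
import math

def spacing(front):
    """Schott's spacing metric: variability of the Manhattan distance
    from each point on the front to its nearest neighbour."""
    d = []
    for i, p in enumerate(front):
        d.append(min(sum(abs(a - b) for a, b in zip(p, q))
                     for j, q in enumerate(front) if j != i))
    mean = sum(d) / len(d)
    return math.sqrt(sum((mean - di) ** 2 for di in d) / (len(d) - 1))

def hypervolume_2d(front, ref):
    """Hypervolume of a non-dominated 2-D minimisation front w.r.t. a
    reference point: sum of the axis-aligned rectangles between the
    front and the reference point."""
    pts = sorted(front)                      # ascending in objective 1
    hv, prev_f2 = 0.0, ref[1]
    for f1, f2 in pts:
        hv += (ref[0] - f1) * (prev_f2 - f2)
        prev_f2 = f2
    return hv
```

A spacing of zero indicates perfectly evenly spaced solutions; a larger hypervolume indicates a front that is closer to the (unknown) Pareto-optimal front and covers more of it.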

Later on, the population size was increased from 500 up to 1000 individuals in steps of 250. For each of these population sizes, the effect on convergence and spread was determined. For each encoding type and population size, the experiment was repeated 10 times.

For all experiments binary tournament selection with the non-dominance selection criterion was used, together with one-point crossover with a crossover probability of 0.8 and uniform mutation with a probability of 0.01. Once more the elitist strategy was applied.
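The operator settings listed above can be sketched in code. The following is our own illustrative reconstruction under stated assumptions (a chromosome is a list of integer cutting periods, objectives are minimized, and the constants mirror the probabilities in the text; everything else is hypothetical):

```python
import random

N_PERIODS = 8          # cutting periods per management unit (from the text)
P_CROSS, P_MUT = 0.8, 0.01

def dominates(a, b):
    """Weak Pareto dominance for minimisation objective vectors."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def binary_tournament(pop, fitness):
    """Pick two random individuals; prefer the dominating one, else random."""
    i, j = random.sample(range(len(pop)), 2)
    if dominates(fitness[i], fitness[j]):
        return pop[i]
    if dominates(fitness[j], fitness[i]):
        return pop[j]
    return pop[random.choice((i, j))]

def one_point_crossover(p1, p2):
    """Swap the tails of two parents at a random cut point with prob. 0.8."""
    if random.random() < P_CROSS:
        cut = random.randrange(1, len(p1))
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
    return p1[:], p2[:]

def uniform_mutation(chrom):
    """Replace each gene by a random cutting period with prob. 0.01."""
    return [random.randrange(1, N_PERIODS + 1) if random.random() < P_MUT else g
            for g in chrom]
```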

29.3.3. Results

29.3.3.1. Effect of Encoding on the Spread and Pareto-Optimality

1. Visual interpretation Integer encoding proves to be the best encoding strategy in terms of approximating the Pareto-optimal front


(Fig. 29.10), but again gray encoding is a close second. The three encodings show a similar spread.

Fig. 29.10. Median attainment surfaces for binary, gray and integer encoding

2. Performance indices The performance of the integer encoding is confirmed after a Kruskal-Wallis test for the spacing measure, and a One Way ANOVA for the hypervolume. The bootstrapping procedure is used for both performance indices.

There is a significant difference between the groups for the spacing measure. Again gray and integer codes score best and both are significantly better than binary codes according to a non-parametric post-hoc test. The bootstrapping procedure confirms this: in both Figs. 29.11(a) and 29.11(e) the test value is outside the confidence interval, indicating a significant difference between integer and binary codes and between gray and binary codes. In Fig. 29.11(c) the test value is within the boundaries of the interval, showing that there is no difference between gray and integer codes. The One Way ANOVA test statistic for the hypervolume measure indicates that there are no significant differences (p = 0.656 > 0.05) and this is also confirmed by the bootstrapping results (Figs. 29.11(b), 29.11(d) and 29.11(f)). Integer codes will be used to solve the harvest scheduling problems because they are the most natural representation for the problem.

29.3.3.2. Comparing the Single and Multiple Objective Genetic Algorithm

Running NSGA-II with integer encoding and a population size of 100 enables comparing the results from the single and multiple objective optimizer. The multiple objective genetic algorithm was also run for the same


Fig. 29.11. Bootstrapping results for the difference in mean spacing (left column: (a) Int-Bin, (c) Int-Gray, (e) Bin-Gray) and mean hypervolume (right column: (b) Int-Bin, (d) Int-Gray, (f) Bin-Gray) for integer, binary and gray encodings

number of generations. An overlay of the median attainment surface from the multiple objective optimization runs with the median attainment surface from the single objective genetic algorithm is depicted in Fig. 29.12. The


Fig. 29.12. Median attainment surfaces for the single and the multiple objective genetic algorithm

two median attainment surfaces are very similar. Only the most extreme solution is missing from the Pareto-front found by NSGA-II. Running the multiple objective genetic algorithm has particular benefits in terms of computational efficiency. For both algorithms the same population size and number of generations was chosen. The product of population size and number of generations yields the number of function evaluations. In the case of the single objective optimizer, this total number should be multiplied by five, as the genetic algorithm has to be run five times to get points along the Pareto-front. For the multiple objective genetic algorithm, with the same population size and number of generations, the number of function evaluations is therefore only one fifth of the total number of evaluations needed for the single objective optimizer.

29.3.3.3. Effect of Population Size on Solution Quality

1. Visual interpretation In a last phase the effect of the population size on solution quality as well as spread was investigated. The population size was increased from 500 to 1000 in steps of 250. This results in the median attainment surfaces in Fig. 29.13. The median attainment surfaces for the three population sizes are very similar. They approximate the Pareto-front in the same way and the solutions are evenly spread along the attainment surface. The fact that they approximate the same front indicates that they are very close to the (unknown) Pareto-optimal front.

2. Performance indices The spacing and hypervolume measures are determined for the different population sizes. The mean values for the hypervolume measure are almost the same for the three population sizes; the spacing along the Pareto-front is also very similar across the different population sizes.

Fig. 29.13. Median attainment surfaces for population sizes 500, 750 and 1000

For the spacing measure the data is normally distributed but not homoscedastic, and therefore the Kruskal-Wallis procedure is applied. From the test value (p = 0.0 < 0.05) it follows that there is a significant difference. These differences are found, according to a non-parametric post-hoc test, between the population size of 500 and the population sizes of 750 and 1000. This is also confirmed by the bootstrapping results (Figs. 29.14(a), 29.14(c), 29.14(e)). The statistical analysis of the hypervolume (One Way ANOVA) shows that there is a significant difference between the different population sizes for the hypervolume measure (p = 0.001 < 0.05). According to Tukey's post-hoc test and the bootstrapping procedure (Figs. 29.14(b), 29.14(d), 29.14(f)) this is between the population size of 500 on the one hand and the population sizes of 750 and 1000 on the other hand. A population size of 750 is thus sufficiently large to solve the harvest scheduling problem.

29.3.3.4. Validity of the Plans

Running NSGA-II with a population size of 750 and for 50 generations yields the Pareto-front in Fig. 29.15. The maximum present value that is attained in 50% of the repetitions amounts to € 670300 (73.3% of the maximum attainable present value) and has a total sum of volume deviations of 398991 m3. For a weight of 0.01 this was € 667435 (73% of the maximum attainable present value) and a total sum of volume deviations of 21535.5 m3. The median values are similar for the single and multiple objective optimizer. Two plans will be investigated more closely as to their validity: the harvest schedule plan with the most strict even flow objective and the plan with the best present value. The objective values are listed in Table 29.123. The volume per period, age distribution and the harvest


Fig. 29.14. Bootstrapping results for the difference in mean spacing (left column: (a) 500-750, (c) 500-1000, (e) 750-1000) and mean hypervolume (right column: (b) 500-750, (d) 500-1000, (f) 750-1000) for population sizes 500, 750 and 1000

pattern in the forest are illustrated in Figs. 29.16, 29.17 and 29.18. From these figures a similar conclusion to that of the single objective case follows: the age distribution is forced by the proposed plans towards a normal age


Fig. 29.15. Overlay of the Pareto-front for a population size of 750 and 50 generations versus the best solutions found by the single objective optimizer

Table 29.123. The present value PV (×€ 100) and the average volume over all cutting periods V (m3) for the best even flow and the best present value plan

mean PV £?=i Vj - V ~5878 6006851 28514

Fig. 29.16. The variation in total deviation in volume (m3) between the different cutting periods. From 29.16(a) to 29.16(b), the even flow constraint is strengthened

distribution. If the even flow objective becomes more important this effect is stronger than when the present value objective is more important. Again the average volume that is attained with the relaxed even flow objective (6.80 m3/ha/yr) is higher than when the objective of even flow becomes


Fig. 29.17. The effect of the even flow objective on the age distribution. From 29.17(a) to 29.17(b), the even flow constraint is strengthened

Fig. 29.18. The effect of the even flow objective on the harvest pattern. From 29.18(a) to 29.18(b), the even flow constraint is strengthened

more important (6.56 m3/ha/yr). From the harvest pattern it follows that, in order to get a better present value, more stands are scheduled for cutting in the later planning periods than when the even flow objective is important. From the detailed Pareto-front it follows that there is a very narrow range where low deviations from the average volume can be obtained. This shows that forest managers need to design their plans very carefully so as to avoid too large deviations.

29.3.4. Conclusion

The encoding strategy is important in terms of approximation of the Pareto-front. The best encoding strategies are gray and integer encoding. As is


suggested in the literature, binary encoding does not perform very well. Using a multiple objective genetic algorithm instead of a single objective genetic algorithm has particular benefits: in order to find solutions linearly distributed along the Pareto-front, a single run suffices.

There is an effect if the population size is increased from 500 to 750: the Pareto-optimal front is approximated more closely. This effect is no longer present when the size increases further to 1000. For both optimizers the effect of the plans on the age structure of the forest is the same: if the even flow objective becomes more and more important, the age structure resembles that of a normal forest due to the volume control, even though this is not explicitly mentioned in the objective functions. If the even flow objective is relaxed, the stands are scheduled in later planning periods than when the even flow objective is very important. Finally, the Pareto-front is very steep, indicating that forest managers have to design their plans carefully to meet their objectives.

29.4. Speeding Up the Optimization Process

29.4.1. Introduction

Finally, fitness inheritance is used to speed up the optimization process for the bi-objective harvest scheduling problem. As this problem is convex, fitness inheritance should be a feasible approach7.

29.4.2. Methodology

The same input data and Forestry Commission production tables are used. The population size is set to 100, the number of generations without fitness inheritance to 200 and with proportional inheritance to 400, so that the same number of function evaluations is maintained. Average inheritance was not tested, as it was shown7 that its behavior is either similar to or worse than that of proportional inheritance. One-point crossover is used with a probability of 0.8 and uniform mutation with a probability of 0.01. Integer encoding is used together with binary tournament selection and the crowding distance operator.
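The idea of fitness inheritance is to skip the expensive objective evaluation for part of the offspring and assign them objective values derived from their parents. As a rough sketch (our own, not the authors' implementation; the distance-weighted averaging is one common way to make the inheritance "proportional", and `hamming` is an illustrative distance for integer chromosomes):

```python
def inherited_fitness(child, parents, parent_fits, distance):
    """Proportional fitness inheritance (sketch): instead of evaluating
    the child, average the parents' objective vectors, weighted inversely
    by the child's distance to each parent in decision space."""
    d = [distance(child, p) for p in parents]
    if 0.0 in d:                       # child identical to a parent
        return parent_fits[d.index(0.0)]
    w = [1.0 / di for di in d]
    s = sum(w)
    k = len(parent_fits[0])            # number of objectives
    return tuple(sum(wi * f[m] for wi, f in zip(w, parent_fits)) / s
                 for m in range(k))

def hamming(a, b):
    """Distance between two integer chromosomes: number of differing genes."""
    return float(sum(x != y for x, y in zip(a, b)))
```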

29.4.3. Results and Discussion

From Fig. 29.19 it follows that after the same number of function evaluations, the attainment surface from the inheritance approach equals that of the non-inheritance approach. This is confirmed by calculating the hypervolume


measure. From a Student t-test it follows that there is no significant difference.

Fig. 29.19. Attainment surfaces for the harvest scheduling problem for the non-inheritance and proportional inheritance approaches

29.4.4. Conclusions

The behavior of the inheritance approach is similar to that of the standard genetic algorithm. However, this result should be put into perspective, because in reality the same number of function evaluations is necessary to obtain the same Pareto-front.

Acknowledgements

This work was funded under the Research Fund of Ghent University. The authors would like to thank Dr. Cameron for the data on Kirkhill Forest. They would also like to thank the anonymous reviewer for the useful comments.

References

1. Anonymous. Gemiddelde Prijzen van Hout op Stam. Houthandel en nijverheid, 5:5, 2000.

2. J. G. Buongiorno and J. K. Gilles. Forest Management and Economics.MacMillan Publishing Company, USA, New York, 1987.

3. L. Davis. Handbook of Genetic Algorithms. Van Nostrand Reinhold, USA,New York, 1991.

4. K. Deb. Multi-Objective Optimization using Evolutionary Algorithms. Wiley and Sons, UK, Chichester, 2001.

5. K. Deb, S. Agrawal, A. Pratab, and T. Meyarivan. A Fast and Elitist Multi-objective Genetic Algorithm: NSGA-II. IEEE T. Evolut. Comput., 6(2):182-197, 2002.


6. E. I. Ducheyne. Multiple objective forest management using GIS and genetic optimisation techniques. PhD thesis, Ghent University, 2003.

7. E. I. Ducheyne, B. De Baets, and R. R. De Wulf. Is Fitness Inheritance Really Useful for Real-World Applications? Lect. Notes Comput. Sc., 2632:31-43, 2003.

8. A. O. Falcao and J. G. Borges. Designing an Evolution Program for Solving Integer Forest Management Scheduling Models: An Application in Portugal. Forest Sci., 47(2):158-168, 2001.

9. C. M. Fonseca and P. J. Fleming. Genetic Algorithms for Multiobjective Optimization: Formulation, Discussion and Generalization. In Proc. of the Fifth Internat. Conference on Genetic Algorithms, pages 416-423, USA, San Mateo, 1993. Morgan Kaufmann Publishers.

10. C. M. Fonseca and P. J. Fleming. On the Performance Assessment and Comparison of Stochastic Multiobjective Optimizers. In Parallel Problem Solving from Nature (PPSN) - IV, pages 584-593, Germany, Berlin, 1996. Springer-Verlag.

11. P. Gong. Multiobjective Dynamic Programming for Forest Resource Management. Forest Ecol. Manag., 48:43-54, 1992.

12. G. J. Hamilton and J. M. Christie. Forest Management Tables. Forestry Commission Booklet No. 34. Her Majesty's Stationery Office, UK, London, 1971.

13. H. M. Hoganson and D. W. Rose. A Simulation Approach for Optimal Harvest Scheduling. Forest Sci., 34(4):526-538, 1994.

14. K. N. Johnson and H. L. Scheurman. Techniques for Prescribing Optimal Timber Harvest and Investments under Different Objectives - Discussion and Synthesis. Forest Sci. Mon., 18:31, 1977.

15. J. D. Knowles and D. W. Corne. Approximating the Nondominated Front using the Pareto Archived Evolution Strategy. Evol. Comput., 8(2):149-172, 2000.

16. C. Lockwood and T. Moore. Harvest Scheduling with Spatial Constraints: A Simulated Annealing Approach. Can. J. Forest Res., 8(2):149-172, 1993.

17. K. B. Matthews, S. Craw, A. R. Sibbald, and I. MacKenzie. Applying Genetic Algorithms to Multi-Objective Land Use Planning. In Proc. of the Genetic and Evolutionary Computation Conference - GECCO 2001, pages 519-526, San Francisco, 2000. Morgan Kaufmann.

18. K. B. Matthews, A. R. Sibbald, and S. Craw. Implementation of a Spatial Decision Support System for Rural Land Use Planning: Integrating Geographic Information System and Environmental Models with Search and Optimisation Models. Comput. Electron. Agr., 23:9-26, 1999.

19. A. Osyczka. Evolutionary Algorithms for Single and Multicriteria Design Optimization. Physica-Verlag, USA, New York, 2002.

20. R. C. Purshouse and P. J. Fleming. Why use elitism and sharing in a multi-objective genetic algorithm? In Proc. of the Genetic and Evolutionary Computation Conference - GECCO 2002, pages 520-527, New York, 2002. Morgan Kaufmann.

21. P. Tarp and F. Helles. Spatial Optimization by Simulated Annealing and Linear Programming. Scand. J. Forest Res., 12:390-402, 1997.


22. D. A. Van Veldhuizen and G. B. Lamont. Multiobjective Evolutionary Algorithms: Analyzing the State-of-the-Art. Evol. Comput., 8(2):125-147, 2000.

23. D. A. Van Veldhuizen and G. B. Lamont. Multiobjective Optimization with Messy Genetic Algorithms. In Proc. of the 2000 ACM Symposium on Applied Computing, pages 470-476, Italy, 2000. ACM.


CHAPTER 30

USING DIVERSITY TO GUIDE THE SEARCH IN MULTI-OBJECTIVE OPTIMIZATION

J.D. Landa Silva and E.K. Burke

Automated Scheduling, Optimisation and Planning Research Group
School of Computer Science and Information Technology
University of Nottingham, UK
E-mail: [email protected], [email protected]

The overall aim in multi-objective optimization is to aid the decision-making process when tackling multi-criteria optimization problems. In an a posteriori approach, the strategy is to produce a set of non-dominated solutions that represent a good approximation to the Pareto optimal front so that the decision-makers can select the most appropriate solution. In this paper we propose the use of diversity measures to guide the search and hence, to enhance the performance of the multi-objective search algorithm. We propose the use of diversity measures to guide the search in two different ways. First, the diversity in the objective space is used as a helper objective when evaluating candidate solutions. Secondly, the diversity in the solution space is used to choose the most promising strategy to approximate the Pareto optimal front. If the diversity is low, the emphasis is on exploration. If the diversity is high, the emphasis is on exploitation. We carry out our experiments on a two-objective optimization problem, namely space allocation in academic institutions. This is a real-world problem in which the decision-makers want to see a set of alternative diverse solutions in order to compare them and select the most appropriate allocation.

30.1. Introduction

This paper is concerned with the application of the class of approaches known as meta-heuristics to tackle multi-objective optimization problems. We assume that the reader is familiar with the fields of multi-criteria decision-making2,39 and multi-objective optimization7,10. Recent surveys on the application of meta-heuristics to multi-objective optimization problems are those provided by Jones et al.19, Tan et al.42 and Van Veldhuizen and Lamont45. Multi-objective optimization is a very active research area that has received increased attention from the scientific community and from practitioners in the last ten years or so. One main reason for this is that many real-world problems are multi-criteria optimization problems. This means that in these problems, the quality of solutions is measured taking into account several criteria that are in partial or total conflict. Therefore, there is no single global optimum solution but a number of solutions that represent a trade-off between the various criteria. It is also commonly the case that more than one decision-maker is involved in the selection of the most appropriate solution to the multi-criteria problem. The overall aim in multi-objective optimization, then, is to aid the decision-makers in tackling this type of problem. One of the strategies for this is to produce a set of solutions that represent a good approximation to the trade-off surface. Then, the decision-makers can decide which of the solutions in this set is the most adequate for the problem at hand. In general terms, a good approximation set should be as close as possible to the optimal front and it should also give a good coverage of the optimal front. The goal of achieving a good coverage of the trade-off surface, i.e. maintaining the diversity and spread of solutions, is of particular interest in multi-objective optimization. A number of techniques to accomplish this goal have been proposed in the literature, e.g. weighted vectors, clustering or niching methods (fitness sharing, cellular structures, adaptive grids, etc.), restricted mating, relaxed forms of dominance, helper objectives, and objective-driven heuristic selection (hyper-heuristics). Most of these techniques are targeted towards maintaining diversity in the objective space.
However, in some scenarios, the decision-makers are also concerned with the diversity of solutions in the solution space. Then, to serve as a useful tool in tackling multi-criteria optimization problems, the multi-objective optimization algorithm should have the mechanisms to find the set of solutions that satisfy the requirements of the decision-makers. That is, solutions that are close to the optimal front and have the desired diversity in the objective space, the solution space or both spaces. One goal in this paper is to present an overview of a number of techniques that have been proposed in the literature to maintain a diverse set of solutions when tackling multi-objective optimization problems. Another goal here is to describe some mechanisms that we implemented to help a multi-objective search algorithm to obtain a diverse set of solutions for a real-world optimization problem with two objectives. These mechanisms consist of using diversity measures, in both the objective space and


the solution space, to guide the search and enhance the performance of the multi-objective search algorithm. We carry out experiments on three test instances of the space allocation problem in academic institutions. In this problem, a set of entities (staff, computer rooms, teaching rooms, etc.) must be allocated into a set of available areas of space or offices and a number of additional constraints should also be satisfied. In the space allocation problem, the decision-makers are interested in the diversity of solutions in both the objective space and the solution space. The results of our experiments show that the proposed mechanisms help the algorithm to produce a set of compromise solutions that better satisfies the requirements of the decision-makers. The rest of this paper is organized as follows. Section 30.2 discusses the issue of diversity in the context of multi-objective optimization. Section 30.3 gives an overview of some of the mechanisms incorporated into modern multi-objective search algorithms to achieve a good coverage of the trade-off surface. A description of the two-objective space allocation problem and the way in which diversity in the objective space and diversity in the solution space are measured in this problem are the subject of Sec. 30.4. The diversity control mechanisms implemented to guide the search, and the algorithm in which these mechanisms were incorporated, are described in Sec. 30.5. The experiments and results are presented and discussed in Sec. 30.6 while Sec. 30.7 gives a summary of this paper.

30.2. Diversity in Multi-Objective Optimization

Given two solutions x and y for a k-criteria optimization problem, x is said to weakly dominate y if x is as good as y in all the k criteria and better in at least one of them. In the case that x is better than y in all the k criteria, x is said to strictly dominate y. In the following, we refer to weak dominance simply as dominance. A solution x is said to be non-dominated with respect to a set of solutions S if there is no solution in S that dominates x. The Pareto optimal front, denoted SP, is the set of all non-dominated solutions with respect to the whole set of feasible solutions SF. Then, the goal of a multi-objective search algorithm is to find a set SND of non-dominated solutions for a given multi-criteria optimization problem. The non-dominated set SND should represent a good approximation to the Pareto optimal front SP. This means that the solutions in SND should be:

• as close as possible to the Pareto optimal front SP,
• widely spread across the entire trade-off surface, and
• uniformly distributed across the entire trade-off surface.
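The dominance definitions above translate directly into code. A minimal sketch for minimization problems (function names are ours, not from the paper; objective vectors are tuples of criterion values):

```python
def weakly_dominates(x, y):
    """x is at least as good as y in every criterion and strictly
    better in at least one (minimisation)."""
    return all(a <= b for a, b in zip(x, y)) and any(a < b for a, b in zip(x, y))

def strictly_dominates(x, y):
    """x is better than y in all k criteria (minimisation)."""
    return all(a < b for a, b in zip(x, y))

def non_dominated(S):
    """The subset of S that is not weakly dominated by any member of S."""
    return [x for x in S if not any(weakly_dominates(y, x) for y in S)]
```

Applied to the whole feasible set, `non_dominated` would yield the Pareto optimal front SP; applied to the solutions visited by a search algorithm, it yields the approximation set SND.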


The closeness of SND to the Pareto optimal front SP gives an indication of how good the convergence towards the optimal front is. The spread and distribution of SND give an indication of how good the coverage of the Pareto optimal front SP is. This is illustrated in Fig. 30.1, where various non-dominated sets are depicted for a two-objective minimization problem. Using the notation in Fig. 30.1, it is clear that an effective multi-objective search algorithm should find an approximation set with the characteristics of S1(c+,s+,d+). Moreover, in some real-world scenarios the decision-makers are interested in a set of alternative solutions like those in S1 but at the same time, they want to see solutions that have a certain diversity with respect to the solution space. This is the case for the problem tackled in this paper, space allocation in academic institutions, as will be explained later. Then, in order to achieve the aim of assisting the decision-making process, a multi-objective search algorithm must also take into account the diversity of SND with respect to the solution space.

Fig. 30.1. The quality of the non-dominated set is given by the closeness to the Pareto optimal front (c+ is close, c- is far), the spread of solutions (s+ is good spread, s- is poor spread) and the distribution of solutions (d+ is good distribution, d- is poor distribution). Then, the quality of the non-dominated sets in this figure can be described as follows: S1(c+,s+,d+), S2(c-,s+,d+), S3(c+,s+,d-), S4(c-,s+,d-), S5(c+,s-,d+), S6(c-,s-,d+), S7(c+,s-,d-), and S8(c-,s-,d-).

30.3. Maintaining Diversity in Multi-ObjectiveOptimization

The majority of meta-heuristics proposed for multi-objective optimization incorporate a specialized mechanism to help achieve a good diversity with


respect to the objective space. As was pointed out by Laumanns et al.28, this is not a straightforward task, because many algorithms that implement specific mechanisms to maintain diversity suffer from deterioration which affects their convergence ability. This section gives an overview of a number of strategies that have been proposed in the literature to maintain diversity in multi-objective optimization. For more references to multi-objective optimization algorithms that incorporate mechanisms for diversification not discussed here, see the survey by Tan et al.42 and also the books by Coello Coello et al.7 and Deb10.

30.3.1. Weighted Vectors

One of the first techniques proposed to achieve a better diversity of solutions in multi-objective optimization is the use of weighted vectors to specify the search direction and hence aim at a better coverage of the trade-off surface. This method consists of setting a vector of k weights W = [w1, w2, ..., wk], where 0 <= wi <= 1, k is the number of objectives in the problem and the sum of all wi equals 1. The fitness of a solution x is calculated as f(x) = w1 f1(x) + w2 f2(x) + ... + wk fk(x), where fi(x) measures the quality of x with respect to the ith criterion. The strategy is to systematically generate a set of vectors in order to approach the trade-off surface from all directions. Weighted vectors are a popular technique that has been used in a number of algorithms, like the multi-objective cellular genetic algorithm of Murata et al.35 and the multiobjective simulated annealing algorithm of Ulungu et al.43. Another approach that uses weighted vectors to encourage diversity is the Pareto simulated annealing algorithm of Czyzak and Jaszkiewicz8. Their strategy is to modify the weights for a solution x so that x is moved away from its closest neighbor xcn, by increasing the weights of those objectives in which x is better than xcn and decreasing the weights for those objectives in which x is worse than xcn. In another approach, implemented by Gandibleux et al.14, the set of supported solutions is first computed. Then, the information obtained from these solutions is used to guide the search and improve the performance of a population heuristic. Ishibuchi et al.15 used weight vectors in a different way to encourage diversity. Instead of generating a weighted vector to specify a search direction for a solution, they choose an appropriate solution for a randomly specified weight vector. The selection of the solution for a given vector is based on the position of the solution in the objective space. That is, they attempt to set an appropriate search direction for each new solution in order to achieve a better approximation set.
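The weighted-sum scalarization described above can be sketched as follows (our illustration, not from any of the cited algorithms; the two-objective sweep in `weight_vectors` is one simple way to "systematically generate" a set of vectors):

```python
def scalarize(x, objectives, weights):
    """Weighted-sum fitness f(x) = w1*f1(x) + ... + wk*fk(x)."""
    assert abs(sum(weights) - 1.0) < 1e-9   # weights must sum to 1
    return sum(w * f(x) for w, f in zip(weights, objectives))

def weight_vectors(n_steps):
    """Evenly spaced weight vectors for two objectives, sweeping the
    search direction across the trade-off surface."""
    return [(i / n_steps, 1 - i / n_steps) for i in range(n_steps + 1)]
```

Optimizing `scalarize` once per weight vector yields one point of the approximation set per run, which is exactly why the single-objective weighted approach needs many runs to cover the front.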

30.3.2. Fitness Sharing

In this mechanism the idea is to decrease the fitness of individuals that are located in crowded regions in order to benefit the proliferation of solutions in sparse regions. Usually, the fitness of an individual is reduced if the distance to its closest neighbor is smaller than a predefined value. Fitness sharing can be implemented in the objective space or in the solution space. However, most of the implementations of fitness sharing reported in the literature are in the objective space. For example, Zhu and Leung47 implemented fitness sharing in their asynchronous self-adjustable island genetic algorithm. Talbi et al.41 implemented fitness sharing mechanisms in both the objective space and the solution space. In their experiments, they observed that fitness sharing in the objective space appears to have a stronger influence on the search than fitness sharing in the solution space, but they also noted that the combination of both fitness sharing mechanisms improved the search.
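A minimal sketch of the classic sharing scheme described above (our own illustration, assuming fitness is to be maximized; `dist` may measure distance in either the objective space or the solution space, and `sigma` is the predefined niche radius):

```python
def sharing(d, sigma, alpha=1.0):
    """Sharing function: penalty contribution of a neighbour at distance d."""
    return 1.0 - (d / sigma) ** alpha if d < sigma else 0.0

def shared_fitness(fits, dist, sigma):
    """Divide each raw fitness by its niche count, so that individuals
    in crowded regions look less attractive to selection."""
    shared = []
    for i, fi in enumerate(fits):
        niche = sum(sharing(dist(i, j), sigma) for j in range(len(fits)))
        shared.append(fi / niche)   # niche >= 1 because sharing(0) = 1
    return shared
```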

30.3.3. Crowding/Clustering Methods

These methods attempt to control the number of solutions in each region of the trade-off surface. The general idea here is to limit the proliferation of solutions in crowded or over-populated areas and, at the same time, to encourage the proliferation of solutions in sparse or under-populated areas. An example of this type of mechanism is the adaptive grid implemented by Knowles and Corne22 in their Pareto archived evolution strategy. They divide the k-objective space into 2^(lk) regions, where l is the number of bisections in each of the k dimensions. Then, based on the crowdedness of the region in which the new solution lies, a heuristic procedure is used to decide whether the new solution is accepted or not. Lu and Yen29,30 used a modified version of the adaptive grid of Knowles and Corne. In their algorithm they modify the fitness of solutions based on the density value of the population. They also associate an age indicator with each solution x in the population in order to control its life span.
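The grid bookkeeping can be sketched as follows. This is a simplified illustration with fixed bounds; the real adaptive grid adjusts its bounds as solutions are found, and the names here are assumptions.

```python
def grid_index(point, lows, highs, l):
    """Map a k-objective point to one of 2**(l*k) regions: bisecting each
    dimension l times gives 2**l slots per dimension."""
    divisions = 2 ** l
    index = 0
    for v, lo, hi in zip(point, lows, highs):
        slot = int((v - lo) / (hi - lo) * divisions)
        slot = min(max(slot, 0), divisions - 1)  # clamp points on the boundary
        index = index * divisions + slot
    return index
```

Crowdedness is then estimated by counting archive members per region index, and a candidate falling in a less crowded region is preferred.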

An agent-based crowding mechanism was proposed by Socha and Kisiel-Dorohinicki40, in which agents interact with each other in order to encourage the elimination of overly similar solutions or agents. Each agent in the population contains an amount of energy, and the crowding mechanism seeks to maintain a uniform agent distribution along the trade-off surface and prevent agent clustering in particular areas by discouraging agents from


Using Diversity to Guide the Search in Multi-Objective Optimization

creating groups of similar solutions. In their mechanism, an agent A communicates with another agent B and then the solutions from both agents, xA and xB respectively, are compared. If the similarity between xA and xB (measured with a distance metric) is smaller than a predefined value, an amount of energy is transferred from agent A to agent B. The amount of energy transferred depends on the degree of similarity between xA and xB. This is similar to fitness sharing but here, one agent receives and the other provides.

30.3.4. Restricted Mating

Restricted mating is a mechanism that prevents the recombination of individuals that do not satisfy a predefined criterion. In most cases, this criterion is that mating individuals should not be too close to each other in the objective space or in the solution space. In this sense, restricted mating can be regarded as a mechanism similar to crowding/clustering. An example of restricted mating is the strategy implemented by Kumar and Rockett21 in their Pareto converging genetic algorithm. That algorithm is an island-based approach in which the genetic operations are restricted to individuals within the same island. There is no migration between islands and no cross-fertilization between individuals in two different islands. However, two islands can be merged into one island in order to test convergence during the search. Their algorithm is a steady-state approach that produces only one offspring in each iteration. Kumar and Rockett argue that the steady-state nature of the algorithm helps maintain diversity because genetic drift, which is inherent in generational genetic algorithms, is less likely to occur. Other examples of restricted mating mechanisms are used in the approaches implemented by Lu and Yen29,30 and the cellular genetic algorithm of Murata et al.35.

30.3.5. Relaxed Forms of Dominance

Another strategy to encourage diversity, which has been explored recently by several researchers, is to use relaxed forms of the dominance relation to assess the fitness of individuals. As described in Sec. 30.2, in the standard dominance relation a solution x is considered better than another solution y only if x is not worse than y in all the objectives and x is better than y in at least one of the objectives. In the relaxed forms of dominance, the basic idea is to consider a solution x as better than a solution y even if x is worse than y in some objective(s). Usually, the condition is that such


deterioration must be compensated by a good improvement in the value of other objective(s). The idea is that by using relaxed forms of dominance, the algorithm will be capable of exploring more solutions and hence of maintaining better diversity. For example, Laumanns et al.28 proposed the use of ε-dominance to implement archiving/selection strategies that permit a better convergence and distribution of the non-dominated approximation set. Burke and Landa Silva3 used a variant of α-dominance, which is also a relaxed form of dominance, to improve the convergence ability of two multi-objective search algorithms. Mostaghim and Teich34 compared the performance of a multi-objective optimization algorithm when using a clustering technique and when using the ε-dominance method. They observed in their experiments that using ε-dominance to update the archive of non-dominated solutions was beneficial because it helped to reduce the computation time and also helped to achieve better convergence and comparable diversity.

Another interesting aspect of using relaxed forms of the dominance relation is that they can help to identify those solutions that are more attractive to the decision-makers out of the set of solutions in the trade-off surface, which can be of considerable size. As was pointed out by Farina and Amato13, the number of solutions that can be considered equal or incomparable (based on standard dominance) to the current solution increases considerably with the number of objectives. They developed the notion of k-dominance, in which they proposed to take into consideration also the number of incomparable or equal objectives in the new solution and the normalized size of the improvement achieved in the other objectives. In k-dominance, v1 k-dominates v2 if and only if:

ne < M  and  nb ≥ (M − ne) / (k + 1),  where 0 ≤ k ≤ 1

In the above, nb is the number of objectives in which v1 is better than v2, ne is the number of objectives in which v1 and v2 are equal, and M is the total number of objectives. Farina and Amato also extended k-dominance by evaluating the numbers nb and ne in a fuzzy way instead of a crisp way, by introducing a tolerance on the ith objective, that is, the interval within which an improvement on objective i is meaningless. Jin and Wong17 also investigated archiving techniques based on their concept of relaxed dominance. The main feature of their archiving mechanism is that it adapts according to the solutions that have been found. It also includes the concept of hyper-rectangles to enclose the search space, even considering unseen solutions. This gives their technique the advantage of not requiring prior knowledge of the objective space (objective values).
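The k-dominance test above translates directly into code. A small sketch, assuming minimization of all objectives; the names are illustrative:

```python
def k_dominates(v1, v2, k):
    """Farina and Amato's k-dominance: v1 k-dominates v2 iff ne < M and
    nb >= (M - ne) / (k + 1), with 0 <= k <= 1. For k = 0 this requires
    nb >= M - ne, i.e. standard Pareto dominance."""
    M = len(v1)
    nb = sum(1 for a, b in zip(v1, v2) if a < b)   # objectives where v1 is better
    ne = sum(1 for a, b in zip(v1, v2) if a == b)  # objectives where they are equal
    return ne < M and nb >= (M - ne) / (k + 1)
```

With k = 1, for instance, v1 = (1, 3) k-dominates v2 = (2, 2) because one improved objective out of two suffices, although neither vector Pareto-dominates the other.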

30.3.6. Helper Objectives

The specification of helper objectives is a strategy that has been used to aid the search not only in multi-objective optimization but also in single-objective optimization. For example, this mechanism can be used to handle constraints by treating each constraint as an additional objective to be optimized. In single-objective optimization, the aim of helper objectives is to help maintain diversity and escape from local optima. For example, Jensen16 and Knowles et al.20 proposed the 'multi-objectivization' of single-objective optimization problems, which consists of decomposing the single-objective problem into subcomponents by considering multiple objectives. In this way, 'multi-objectivization' can help to remove local optima because, for the search process to be stuck, it is required that all objectives are stuck. The helper objectives should be chosen so that they are in conflict with the main objective, at least partially.

30.3.7. Objective Oriented Heuristic Selection

Another idea that has been proposed to help maintain diversity in multi-objective optimization is to adapt the local search strategy according to the current distribution of solutions in the objective and/or the solution space. For example, Knowles and Corne23 proposed to adapt the focus of the search on exploration or exploitation when approximating the Pareto front by selecting the most adequate of three search strategies: 1) use a population-based method that tries to improve on all objectives at once in order to approach the Pareto front from all directions; 2) generate a weight vector which is used to specify a specific search direction; or 3) use a single-solution local search method that tries to move along the Pareto front by perturbing one solution to obtain a nearby point on the front.

The selected strategy depends on the correlation between distance in the solution space and distance in the objective space. This strategy was also investigated by Jin and Sendhoff18 for some continuous test problems. Adapting the local search heuristic according to the value of the objectives in the solutions has also been proposed as a mechanism to maintain diversity while converging to the Pareto front. For example, Burke et al.5

implemented an approach that has been termed a 'hyper-heuristic'. The idea is to use a guiding/learning method that chooses the most promising heuristic in order to push solutions towards the desired area in the objectives of interest. This technique takes into consideration the location of the solution in the objective space and the ability of each local search heuristic to achieve improvements on each objective. The idea is to try to improve poor objectives while maintaining the rich ones. Adapting the local search heuristic is interesting when using hybrid approaches that employ local search in an efficient way. Then, the analysis or pre-sampling of the fitness landscape can be useful to design a good hybrid32.

30.3.8. Using Diversity to Guide the Search

Various evolutionary algorithms for multi-objective optimization use density estimators in the objective space to bias the selection operator in order to maintain diversity in the population. Laumanns et al.27 noted that the accuracy of the density estimator used has a strong effect on the performance of the selection strategy; hence, the density estimator must be good for the diversity maintenance strategy to be effective. Also, Knowles et al.25 proposed a bounded archiving technique that attempts to maximize the hypervolume covered by the approximation set. They compared the performance of their archiving technique against other methods and obtained promising results. However, they pointed out that the computational cost was considerable for more than three objectives.

In single-objective optimization, some researchers have also made efforts towards designing evolutionary algorithms that maintain diversity in an adaptive fashion by using diversity measures to guide the search. For example, Ursem44 proposed a diversity-guided evolutionary algorithm that alternates between phases of exploration and exploitation according to a measure of the diversity in the population given by the distance to the average point. If the diversity falls below a threshold d_low, the algorithm applies mutation in an exploration mode; if the diversity rises above a threshold d_high, the algorithm applies selection and recombination in an exploitation mode. Another approach that uses diversity measures to guide the search is the diversity-control-oriented genetic algorithm of Shimodaira38, in which the probability of an individual surviving depends on the Hamming distance between the individual and the best individual in the population.

30.4. The Two-Objective Space Allocation Problem

The management of physical space in universities is an important and difficult issue, as was discussed by Burke and Varley6. With the continuous increase in the number of students and staff, it must be ensured that the available estate is used as efficiently as possible while simultaneously satisfying a considerable number of constraints. The allocation of office space to staff, postgraduate students, teaching rooms, computer rooms, etc. is carried out manually in most universities. This is a process that takes a considerable amount of time and effort from the space administrators. More importantly, this manual distribution usually results in an inefficient utilization of the available estate.

30.4.1. Problem Description

The space allocation problem can be briefly described as follows. Given a set of n entities (people, teaching rooms, computer rooms, etc.) and a set of m available rooms, the problem is to allocate all the n entities to the m rooms in such a way that the office space is used as efficiently as possible and the additional constraints are satisfied. Each entity requires a certain amount of space^m according to university regulations and each room has a given capacity. It is very unlikely that the capacity of a room matches exactly the amount of space required by the entities allocated to it. Let ci be the capacity of the ith room and let si be the space required by all the entities allocated to the room. Then, if ci > si, space is said to be wasted, while if ci < si, space is said to be overused. It is less desirable to overuse space than to waste it. The overall space utilization efficiency is measured by the amount of space that is being misused, i.e. space wasted plus space overused over all rooms (space misuse is represented by F1). In addition to this, space administrators should ensure that certain constraints are satisfied. Some constraints are hard, i.e. they must be satisfied, while other constraints are soft, i.e. their violation should be minimized. The number of different types of constraints varies considerably between problem instances but, in general, the constraints limit the ways in which the entities can be allocated to rooms. For example: two professors must not share a room; a computer room should be allocated on the ground floor and adjacent to a seminar room; teaching rooms must be away from noisy areas; postgraduate students in the same research group should be grouped together; etc. The penalty applied when a constraint is violated depends on the type of constraint and may also vary from one problem instance to another (soft constraints violation is represented by F2).
A solution or allocation is represented here by a vector Π = [π1, π2, ..., πn], where each πj ∈ {1, 2, ..., m} for j = 1, 2, ..., n

^m Note that here, space is the floor area, usually measured in m².


indicates the room to which the jth entity has been allocated.

In a multi-criteria optimization problem, the criteria can be conflicting, harmonious or independent, and this has an influence on the difficulty of achieving a good approximation to the Pareto front, as was discussed by Purshouse and Fleming37. The existence of conflicting criteria makes it more difficult to achieve good convergence. If the criteria are harmonious, convergence is not affected, but achieving good diversity may be more difficult because it is very probable that solutions will have similar values in the harmonious criteria. If the criteria are independent, it is possible to decompose the problem and then use a divide-and-conquer strategy to solve it. An investigation into the conflicting nature of the criteria in the space allocation problem was carried out by Landa Silva26. In that investigation it was found that, in general, the minimization of space wastage is not in conflict with the minimization of space overuse, and that the satisfaction of different types of soft constraints is not in conflict with each other. However, it was also found that the minimization of space misuse (overuse and wastage) is in strong conflict with the minimization of soft constraints violation. Therefore, we consider two objectives in the space allocation problem:

(1) Minimization of space misuse, i.e. minimization of F1.
(2) Minimization of soft constraints violation, i.e. minimization of F2.
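A sketch of the representation and of objective F1 follows. The factor penalizing overuse more heavily than wastage is an assumption for illustration; the chapter only states that overuse is less desirable.

```python
def space_misuse(allocation, space_required, capacity, overuse_penalty=2.0):
    """F1: wasted plus overused space over all rooms. allocation[j] is the
    room (0-based here) to which entity j is assigned, as in the vector Pi."""
    used = [0.0] * len(capacity)
    for entity, room in enumerate(allocation):
        used[room] += space_required[entity]
    F1 = 0.0
    for c, s in zip(capacity, used):
        if c >= s:
            F1 += c - s                      # wasted space
        else:
            F1 += overuse_penalty * (s - c)  # overused space, weighted higher
    return F1
```

For example, with room capacities (10, 5) and entities of sizes (4, 4, 6) allocated as [0, 0, 1], room 0 wastes 2 units and room 1 overuses 1 unit.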

In this problem, space administrators often know of additional constraints which are not (or, for political reasons, cannot be) explicitly built into the objectives. For example, when two members of staff have a personality clash and cannot be allocated to the same room. Another common example is when people have a preference for a certain type of room. Therefore, in this context, the aim of a multi-objective optimization algorithm is to aid the space administrators by finding a set of alternative high-quality allocations. Space administrators usually want to see a set of allocations which are very similar in certain aspects while being very different in other aspects. For example, administrators may want to see two or more alternative solutions in which the teaching areas are allocated to the same rooms in each of the allocations, but with different ways of distributing offices to people. Another example is when the allocation needs to be re-organized and the space administrators want to explore alternative non-dominated solutions that are very similar to the existing distribution in order to avoid major disruptions. Therefore, in the space allocation problem it is important to take into consideration the diversity of the set of allocations with respect


to the solution space.

Besides its practical interest, the space allocation problem as described here is of scientific importance because it can be formulated as a variant of the multiple knapsack problem, which is an important problem in combinatorial optimization (see Dawande et al.9 and Martello and Toth31).

30.4.2. Measuring Diversity of Non-Dominated Sets

There are various papers in the literature that propose, compare and discuss indicators to assess the performance of multi-objective optimization algorithms. These include those by Knowles and Corne24, Ang et al.1, Farhang-Mehr and Azarm12, Tan et al.42, Okabe et al.36 and others. Assessing the diversity (in the solution space or in the objective space) of a non-dominated set is a difficult task because, as was discussed in Sec. 30.2, diversity should be measured in terms of both the distribution and the spread of solutions in the set. Some of the indicators proposed in the literature seek to evaluate the quality of the spread and the distribution of solutions. For example, the S metric of Zitzler and Thiele48 calculates the hypervolume of the k-dimensional region covered by the approximation set. However, a reference point must be given in order to compute the hypervolume, and the location of this reference point may have an influence on how two or more non-dominated sets compare. Deb et al.10 proposed a spacing metric designed to measure how evenly points are distributed. That metric is based on computing the Euclidean distance between each pair of non-dominated solutions and it also requires the boundary solutions. Another spacing metric, also based on the Euclidean distance between pairs of non-dominated solutions, is the one described by Van Veldhuizen and Lamont46. Other metrics that have been proposed to estimate the diversity of a population of solutions are based on entropy, as proposed by Farhang-Mehr and Azarm11. These metrics require the division of the objective space into a cellular structure. A high entropy value indicates a better distribution of solutions across the trade-off surface because it measures the flatness of the distribution of solutions or points.

In this paper, diversity in the objective space is measured using a population metric proposed by Morrison and De Jong33. We have selected this metric because it does not require reference solutions and it is also related to the Hamming and Euclidean distances between solutions. The metric of Morrison and De Jong is inspired by concepts from mechanical engineering, specifically the moment of inertia, which measures the mass distribution of an object. The centroid of a set of p points in a k-dimensional space has coordinates given by Eq. D.1, where xij is the value of the ith dimension in the jth point. The measure of diversity for the population of p points, based on their moment of inertia, is given by Eq. D.2. The higher the value of I, the higher the diversity of the set of p points.

ci = ( Σ j=1..p  xij ) / p ,  for i = 1, 2, ..., k        (D.1)

I = Σ i=1..k  Σ j=1..p  ( xij − ci )²                     (D.2)
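Eqs. D.1 and D.2 translate directly into code, as in the following straightforward sketch:

```python
def inertia_diversity(points):
    """Moment-of-inertia diversity I of p points in k-dimensional space:
    the centroid c is computed first (Eq. D.1), then I sums the squared
    deviations of every coordinate from it (Eq. D.2)."""
    p = len(points)
    k = len(points[0])
    centroid = [sum(pt[i] for pt in points) / p for i in range(k)]
    return sum((pt[i] - centroid[i]) ** 2
               for pt in points for i in range(k))
```

Two points (0, 0) and (2, 0), for instance, have centroid (1, 0) and I = 2, while coincident points give I = 0.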

To measure diversity in the solution space, the metric used should provide a meaningful way to express the similarity between solutions for the problem at hand. Therefore, we have designed a specific way of measuring diversity in the solution space for the space allocation problem. Equation D.3 gives the percentage of non-similarity or variety used here as a measure of diversity for a set of allocations, where D(j) is the number of different values in the jth position over all the p vectors representing the solutions. Figure 30.2 illustrates how the percentage of variety is calculated for a set of p = 5 allocations.

V = [ Σ j=1..n  ( D(j) − 1 ) / ( p − 1 ) ] / n × 100      (D.3)
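Eq. D.3 can be sketched in code as follows; the worked example of Fig. 30.2 serves as a check:

```python
def variety(allocations):
    """Percentage of variety V of p allocation vectors of length n:
    D(j) is the number of distinct rooms appearing in position j
    across the p vectors (Eq. D.3)."""
    p = len(allocations)
    n = len(allocations[0])
    total = sum((len({a[j] for a in allocations}) - 1) / (p - 1)
                for j in range(n))
    return total / n * 100.0
```

For the five strings of Fig. 30.2 this returns 3.25/7 × 100 ≈ 46.4%.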

30.5. Using Diversity to Guide the Search

In this section we describe the strategies that we implemented in order to obtain approximation sets that better satisfy the requirements of the decision makers in the space allocation problem. The diversity indicators I (Eq. D.2) and V (Eq. D.3) described above are used to guide the search and find sets of non-dominated solutions that are diverse with respect to both the solution space and the objective space.

30.5.1. Diversity as a Helper Objective

We use the diversity in the objective space as a helper objective in order to decide when a candidate solution is considered attractive. Let P be a population of solutions from which a solution x is used to generate a candidate solution x'. Then, I (Eq. D.2) indicates the diversity of the set P


Five strings representing allocations:

    A A A A A A A
    A A B B A B B
    A B B C B C C
    A B B C B D D
    A B B C C D E

    D(j):              1     2     2     3     3     4     5
    (D(j)-1)/(p-1):    0     0.25  0.25  0.50  0.50  0.75  1

    V = (3.25/7) × 100 = 46.42%

Fig. 30.2. Calculation of the percentage of variety V for a set of p = 5 allocations. The number of entities is n = 7 and the number of rooms is m = 5.

while I' indicates the diversity of the set P' in which x is replaced by x'. We use the expression u dominates(c1, c2, ...) v to indicate that the criteria c1, c2, ... are used to determine dominance between vectors u and v. Then, a candidate solution x' is considered attractive if x' dominates(FT, I) x, where FT = F1 + F2. That is, x' is considered better than x if FT(x') < FT(x) and I' ≥ I, or if FT(x') = FT(x) and I' > I. Note that we use the aggregated value FT instead of the individual criteria F1 and F2. This is because in our previous research we observed that the aggregation method was more beneficial than the dominance relation^n for the overall performance of our algorithm over the full set of instances (see Burke and Landa Silva4). Then, a candidate solution is accepted if it has better fitness (FT) without worsening the diversity in the objective space (I), or if it has the same fitness value but improves the diversity in the objective space.
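The acceptance rule above can be written as a small predicate. A sketch, where FT is the aggregated fitness and I the inertia diversity before and after replacing x by x':

```python
def accept_candidate(FT_x, FT_cand, I_before, I_after):
    """x' is attractive if it improves FT without worsening the diversity I,
    or keeps FT equal while strictly improving I."""
    return ((FT_cand < FT_x and I_after >= I_before) or
            (FT_cand == FT_x and I_after > I_before))
```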

30.5.2. Diversity to Control Exploration and Exploitation

We use the diversity measure in the solution space to alternate between the phases of exploration and exploitation in our algorithm. This is similar to the strategy implemented by Ursem44 in single-objective optimization. As discussed above, the measure V (Eq. D.3) is an indication of how diverse a set of allocations is considered by the space administrators. The value of V(PND) is used to control the algorithm's search strategy, where PND is the current set of non-dominated solutions. First, two threshold values are set: Vgood is the diversity that is considered 'good' in the obtained set of non-dominated solutions, and Vmin is the minimum diversity that is accepted in that set. Then, when V(PND) > Vgood the algorithm is in exploitation mode, and when V(PND) < Vmin the algorithm enters exploration mode. In exploitation mode, the algorithm attempts to find better solutions by using local search only. In exploration mode, the algorithm uses local search and a specialized mutation operator in order to increase the diversity V(PND) of the current set of non-dominated solutions. Based on our previous experience26 with the space allocation problem, we set Vgood = 70% and Vmin = 30% in our experiments.

^n We also found that using relaxed forms of dominance (see Sec. 30.3.5) seems to improve the performance of our algorithm, but only in some problem instances.
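The switching logic amounts to a two-threshold hysteresis on V(PND). A sketch using the threshold values from our setting (the function name is illustrative):

```python
def next_mode(mode, V_pnd, V_min=30.0, V_good=70.0):
    """Enter exploration when diversity drops below V_min; return to
    exploitation once diversity exceeds V_good. Between the two
    thresholds the current mode is kept (hysteresis)."""
    if mode == "exploitation" and V_pnd < V_min:
        return "exploration"
    if mode == "exploration" and V_pnd > V_good:
        return "exploitation"
    return mode
```

The hysteresis band between Vmin and Vgood prevents the algorithm from oscillating between modes on small changes in V(PND).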

30.5.3. The Population-Based Hybrid Annealing Algorithm

Our algorithm is a population-based approach in which each individual is evolved by means of local search and a specialized mutation operator. The algorithm is shown in Pseudocode 1 and is a modified version of our previous approach described elsewhere4. The modification consists of adding the mechanisms described above to guide the search based on the diversity measures. The population Pc contains the current solution for each individual. The population PB contains the best solution (in terms of FT) found by each individual so far. The population PND is the external archive of non-dominated solutions. A common annealing schedule is used to control the evolution process of the whole population by means of the global acceptance probability p (steps 6.3 and 6.4). The local search heuristic HLS selects the type of move from relocate, swap and interchange if all the n entities are allocated. If there are unallocated entities (this occurs when the specialized mutation operator is applied, as described below), then HLS employs the allocate move. Relocate moves an entity from one area to another, swap exchanges the assigned areas between two entities, interchange exchanges all the allocated entities between two areas, and allocate finds a suitable area in which to allocate an unallocated entity. The local search heuristic HLS incorporates a cooperation mechanism to encourage information sharing between individuals in the population. This cooperation mechanism maintains two matrices MT and MA of size n × m, in which the cell (j, i) indicates the allocation of the jth entity to the ith area. MT stores pairs (entity, area) that are considered tabu for a number of iterations, while MA stores those that are considered attractive during the search.


Pseudocode 1. The Population-based Hybrid Annealing Algorithm.

1. Generate the initial current population of solutions Pc
2. Copy Pc to the population of best solutions PB
3. Set acceptance probability p ← 0, cooling factor 0 < α < 1,
   decrement step η, re-heating step φ, and re-heating
   counter r ← 0 (η, φ and r are numbers of iterations)
4. For η iterations, apply the local search heuristic HLS
   to each individual in Pc
5. Set p ← 1, mode ← exploitation
6. For each Xc in Pc and its corresponding XB in PB:
   6.1. Generate a candidate solution X'c using HLS
   6.2. If X'c dominates(FT, I) Xc, then Xc ← X'c
        a) If X'c dominates(FT, I) XB, then XB ← X'c
   6.3. If X'c dominates(FT, I) Xc is false, then
        a) if p > 0 and a randomly generated number in [0, 1]
           is smaller than p, then Xc ← X'c
        b) if p ≈ 0 (in our setting, if p < 0.0001), then
           r ← r + 1 and, if r > φ, then p ← 1 and r ← 0
   6.4. If (iterations mod η) = 0, then p ← α · p
   6.5. If no solution in PND dominates(F1, F2) X'c, update PND
7. If mode = exploitation and V(PND) < Vmin,
   then mode ← exploration
8. If mode = exploration and V(PND) > Vgood,
   then mode ← exploitation
9. If mode = exploration, then apply the specialized
   mutation operator to each individual in Pc
10. If the stopping criterion has not been satisfied, go to Step 6

When a move produces a detriment in the fitness of the solution, MT is updated as MT(j, i) = iterations + tenure, which indicates that moves involving that pair are considered tabu for tenure ≈ n iterations. When a move produces an improvement in the fitness of the solution, MA is updated as MA(j, i) = MA(j, i) + 1 to indicate that, the higher the value of the cell, the more attractive moves involving that pair are considered. The purpose of the specialized mutation operator is to disturb solutions in a controlled way in order to promote exploration. This operator unallocates a maximum of n/5 entities from their assigned area of space. The entities to be unallocated are selected in decreasing order of their associated penalty (violation of the soft constraints associated with the entity). The entities that are unallocated by the mutation operator are re-allocated by the heuristic HLS.
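The matrix updates can be sketched as follows. The function names and the tabu-expiry test are assumptions for illustration; the matrices are n × m structures as in the text.

```python
def update_matrices(M_tabu, M_attr, entity, area, improved, iteration, tenure):
    """On a worsening move, mark (entity, area) tabu until
    iteration + tenure; on an improving move, raise its attractiveness."""
    if improved:
        M_attr[entity][area] += 1
    else:
        M_tabu[entity][area] = iteration + tenure

def is_tabu(M_tabu, entity, area, iteration):
    """A pair stays tabu while the stored expiry lies in the future."""
    return iteration < M_tabu[entity][area]
```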


In the algorithm presented in Pseudocode 1, the diversity I is used as a helper objective in steps 6.2 and 6.3, while the diversity V(PND) is used to guide the search in steps 7-9. In our previous approach4, the preference of the candidate solution X'c over Xc and XB in steps 6.2 and 6.3 is based solely on the value of FT. The other difference from our previous implementation is that there the specialized mutation operator (steps 7-9) is applied when no individual in PB has achieved an improvement for η iterations, instead of being controlled by the diversity in the solution space as proposed here.

30.6. Experiments and Results

The purpose of our experiments was to investigate whether the mechanisms described above to guide the search based on the diversity measures I (Eq. D.2) and V (Eq. D.3) help our algorithm to find better sets of non-dominated solutions. Here, we are interested in finding sets of non-dominated allocations that have a good spread and distribution in the objective space but also have high diversity in the solution space. We compared the performance of the algorithm presented in Pseudocode 1 to our previous implementation using the same real-world data sets nott1, nott1b and trent1 described in that paper4 (these test instances are available from http://www.cs.nott.ac.uk/~jds/research/spacedata.html).

30.6.1. Experimental Setting

For each test instance, we executed 10 runs of the algorithm described in Pseudocode 1 and 10 runs of the previous implementation. In each pair of runs, the same initial set of solutions was used for the two algorithms. In each run, the stopping criterion was a maximum number of solution evaluations, set to 100000, 80000 and 50000 for nott1, nott1b and trent1 respectively. The parameters for the algorithm were set as in our previous paper4: |Pc| = |PB| = 20, α = 0.95, η = n and φ = 10·n. We compared the two algorithm implementations with respect to online and offline performance. For the online performance, we directly compare the PND sets obtained by the algorithms in each run. For the offline performance, an overall non-dominated set is obtained for each algorithm by merging all the 10 PND sets produced. A visual comparison of two non-dominated sets found by the two algorithms was not possible because no evident difference was observed in the bi-dimensional graph with axes F1 and F2. Therefore, we used four criteria to compare two sets of non-dominated solutions: the diversity in the objective space I (Eq. D.2), the diversity in the solution


Using Diversity to Guide the Search in Multi-Objective Optimization 745

space V (eq. D.3), the number of non-dominated solutions found |PND| and the C metric of Zitzler et al.49, which is given by eq. F.1. If C(A, B) = 1, all solutions in set B are dominated by at least one solution in set A. If C(A, B) = 0, no solution in set B is dominated by a solution in set A. We used the C metric because it directly compares the quality of two non-dominated sets, it is simple to compute and it does not require knowledge of the Pareto optimal front.

C(A, B) = |{b ∈ B : ∃ a ∈ A, a ⪯ b}| / |B|    (F.1)
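Under the usual reading of eq. F.1 (minimization in every objective, with ⪯ denoting weak Pareto dominance), the metric can be computed directly; a small sketch:

```python
def weakly_dominates(a, b):
    """a is no worse than b in every objective (minimization assumed)."""
    return all(x <= y for x, y in zip(a, b))


def coverage(front_a, front_b):
    """Zitzler et al.'s C metric: the fraction of points in front_b that
    are weakly dominated by at least one point of front_a.  C is not
    symmetric, so C(A, B) and C(B, A) must both be reported."""
    return sum(
        any(weakly_dominates(a, b) for a in front_a) for b in front_b
    ) / len(front_b)
```

For example, with A = [(1, 3), (2, 2), (3, 1)] and B = [(2, 3), (3, 3), (1, 4)], every point of B is weakly dominated by some point of A but not vice versa, so coverage(A, B) is 1.0 while coverage(B, A) is 0.0.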

We carried out our experiments on a PC with a 3.0GHz processor and 768MB of memory, running Windows XP. The algorithms were coded in MS Visual C++ version 6.0.
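The offline comparison described above, merging the 10 PND sets of one algorithm into a single overall non-dominated set, can be sketched as follows (minimization assumed; function names are ours):

```python
def dominates(a, b):
    """a strictly dominates b: no worse in every objective and strictly
    better in at least one (minimization assumed)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def offline_front(run_fronts):
    """Merge the non-dominated sets of several runs into one overall
    non-dominated set: take the union of all runs, drop duplicates, and
    discard any point strictly dominated by another merged point."""
    merged = set()
    for front in run_fronts:
        merged.update(front)
    return sorted(p for p in merged
                  if not any(dominates(q, p) for q in merged))
```

Points that are non-dominated within a single run can still be dominated by points from another run, which is why the filter is applied after the union.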

30.6.2. Discussion of Obtained Results

The results of the experiments described above are shown in tables 30.124 to 30.126. Each table presents the results obtained for one test instance. DGPBAA refers to the implementation described in pseudocode 1 (with the diversity control mechanisms) and PBAA refers to the previous version (without the diversity control mechanisms). The values in the columns I and V are computed for the set PND. For the values in the column C(A, B), A represents the non-dominated set obtained by DGPBAA and B represents the non-dominated set obtained by PBAA.

It can be observed that the use of the diversity control mechanisms helps to improve the performance of the search algorithm. For example, for the nott1 instance we can see in table 30.124 that in each of the 10 runs the non-dominated set obtained by DGPBAA is better than the non-dominated set obtained by PBAA. That is, the approximation sets obtained when the diversity measures are used to guide the search have higher diversity in the objective space (I), higher diversity in the solution space (V), more non-dominated solutions (size) and also compare (slightly) better under the C metric. Similar observations can be made for the test problems nott1b and trent1 in tables 30.125 and 30.126 respectively. It is important to highlight that in each single run, the diversity in the solution space of the non-dominated set obtained when using the diversity control mechanisms is greater than Vgood. On the contrary, when the mechanisms to control diversity are not used, the diversity in the solution space of the


746 J.D. Landa Silva and E.K. Burke

obtained non-dominated set is below Vgood, except for a few runs on the test instance trent1 as shown in table 30.126. When using the C metric, it is not clear whether the DGPBAA implementation finds better non-dominated sets. However, we should emphasize that the main contribution of the implemented mechanisms appears to be that they help the algorithm to maintain diversity in both the objective and the solution space. This is precisely the aim in the space allocation problem tackled here: to provide a set of non-dominated solutions that better satisfies the requirements of the space administrators.

Table 30.124. Results for the test instance nott1.

                    DGPBAA                          PBAA
run       I      V    size  C(A,B)       I      V    size  C(B,A)
  1     4.70   76.3    23    0.71      3.45   61.6    21    0.46
  2     4.95   74.7    28    0.63      3.83   61.4    16    0.37
  3     4.56   79.3    25    0.69      3.39   56.2    18    0.42
  4     4.87   81.6    24    0.57      3.47   49.4    15    0.47
  5     4.91   76.1    27    0.60      3.76   62.1    20    0.47
  6     4.52   75.9    29    0.73      3.51   56.3    18    0.37
  7     4.59   73.6    28    0.64      3.36   59.2    15    0.39
  8     5.03   77.4    22    0.62      3.28   53.7    19    0.41
  9     4.86   80.2    26    0.66      3.52   58.3    17    0.49
 10     4.77   81.6    25    0.64      3.41   52.5    19    0.43

offline 6.43   73.2    37    0.62      5.12   47.4    24    0.37

Table 30.125. Results for the test instance nott1b.

                    DGPBAA                          PBAA
run       I      V    size  C(A,B)       I      V    size  C(B,A)
  1     4.31   72.5    21    0.59      3.13   65.2    20    0.46
  2     4.48   74.2    18    0.61      3.51   61.6    15    0.51
  3     4.87   75.6    19    0.57      3.04   62.7    18    0.48
  4     4.22   71.8    22    0.46      3.76   60.5    16    0.48
  5     4.95   74.3    17    0.62      3.28   58.4    19    0.56
  6     5.04   75.1    24    0.59      3.16   57.3    18    0.49
  7     4.69   73.5    18    0.63      2.94   61.3    17    0.44
  8     4.27   71.6    19    0.71      3.45   55.7    21    0.31
  9     4.63   74.8    22    0.66      3.31   57.3    14    0.48
 10     4.91   73.5    21    0.57      3.34   61.6    20    0.41

offline 5.63   67.2    34    0.72      4.12   41.4    21    0.38


Table 30.126. Results for the test instance trent1.

                    DGPBAA                          PBAA
run       I      V    size  C(A,B)       I      V    size  C(B,A)
  1     5.45   82.6    25    0.64      4.02   61.2    21    0.56
  2     5.51   75.4    23    0.53      4.62   63.6    16    0.48
  3     5.34   80.2    27    0.48      3.56   71.2    18    0.37
  4     5.16   77.5    22    0.51      4.23   64.9    16    0.41
  5     5.46   74.9    25    0.47      4.56   69.5    14    0.39
  6     5.62   79.4    29    0.40      4.18   62.7    22    0.36
  7     5.39   81.0    31    0.59      4.39   64.6    23    0.46
  8     5.26   75.8    24    0.57      4.40   61.1    19    0.51
  9     5.11   79.4    21    0.46      4.04   73.7    16    0.39
 10     5.74   82.6    25    0.49      3.87   64.2    20    0.35

offline 6.76   72.4    43    0.68      5.12   52.4    28    0.37

30.7. Summary

In this paper, we have shown that diversity measures can be used to guide the search in multi-objective optimization in order to achieve sets of non-dominated solutions that better satisfy the requirements of the decision-makers. We carried out experiments for a real-world problem with two objectives, the problem of space allocation in academic institutions. In this problem, the decision-makers are interested in obtaining a good approximation set that is also diverse with respect to the solution space. We used the moment of inertia to measure diversity in the objective space and a problem-specific indicator to measure diversity in the solution space. The algorithm used in our experiments is a population-based approach in which each individual in the population is improved by local search, and a specialized mutation operator is used to disturb a solution in a controlled fashion. Two diversity control mechanisms were incorporated into the algorithm, one based on diversity in the objective space and another based on diversity in the solution space. In the first mechanism, the diversity in the objective space is used as a helper objective in order to determine whether candidate solutions generated by local search are accepted or not. In the second mechanism, the diversity in the solution space is used to alternate between the phases of exploitation and exploration. During exploitation, the algorithm employs local search only. During exploration, the specialized mutation operator is applied in addition to local search. In order to assess the contribution of the diversity control mechanisms, we carried out experiments on three real-world test instances of the space allocation problem in academic institutions. The results obtained in our experiments show that


the algorithm produces better sets of non-dominated solutions when the diversity control mechanisms are used to guide the search. In particular, these non-dominated sets have higher diversity in the solution space, which is a common requirement of space administrators.

References

1. Ang K.H., Chong G., Li Y., Preliminary statement on the current progress of multi-objective evolutionary algorithm performance measurement, Proceedings of the 2002 congress on evolutionary computation (CEC 2002), IEEE press, 1139-1144, (2002).

2. Belton V., Stewart T.J., Multiple criteria decision analysis - an integrated approach, Kluwer academic publishers, (2002).

3. Burke E.K., Landa Silva J.D., Improving the performance of multiobjective optimizers by using relaxed dominance, Proceedings of the 4th asia-pacific conference on simulated evolution and learning (SEAL 2002), 203-207, (2002).

4. Burke E.K., Landa Silva J.D., The effect of the fitness evaluation method on the performance of multiobjective search algorithms, to appear in European Journal of Operational Research, (2004).

5. Burke E.K., Landa Silva J.D., Soubeiga E., Hyperheuristic approaches for multiobjective optimisation, Proceedings of the 5th metaheuristics international conference (MIC 2003), (2003). Extended version available from the authors.

6. Burke E.K., Varley D.B., Space allocation: an analysis of higher education requirements, The practice and theory of automated timetabling II: Selected papers from the 2nd international conference on the practice and theory of automated timetabling (PATAT 97), Lecture notes in computer science, 1408, Springer, 20-33, (1998).

7. Coello Coello C.A., Van Veldhuizen D.A., Lamont G.B., Evolutionary algorithms for solving multi-objective problems, Kluwer academic publishers, (2002).

8. Czyzak P., Jaszkiewicz A., Pareto simulated annealing - a metaheuristic for multiple-objective combinatorial optimization, Journal of multicriteria decision analysis, 7(1), 34-47, (1998).

9. Dawande M., Kalagnanam J., Keskinocak P., Ravi R., Salman F.S., Approximation algorithms for the multiple knapsack problem with assignment restrictions, Journal of combinatorial optimization, 4(2), 171-186, (2000).

10. Deb K., Multi-objective optimization using evolutionary algorithms, Wiley, (2001).

11. Farhang-Mehr A., Azarm S., Diversity assessment of pareto optimal solution sets: an entropy approach, Proceedings of the 2002 congress on evolutionary computation (CEC 2002), IEEE press, 723-728, (2002).

12. Farhang-Mehr A., Azarm S., Minimal sets of quality metrics, Proceedings of the 2nd international conference on evolutionary multi-criterion optimization (EMO 2003), Lecture notes in computer science, 2632, Springer, 405-417, (2003).

13. Farina M., Amato P., Fuzzy optimality and evolutionary multiobjective optimization, Proceedings of the 2nd international conference on evolutionary multi-criterion optimization (EMO 2003), Lecture notes in computer science, 2632, Springer, 58-72, (2003).

14. Gandibleux X., Morita H., Katoh N., The supported solutions used as a genetic information in a population heuristics, Proceedings of the 1st international conference on evolutionary multi-criterion optimization (EMO 2001), Lecture notes in computer science, 1993, Springer, 429-442, (2001).

15. Ishibuchi H., Yoshida T., Murata T., Balance between genetic search and local search in memetic algorithms for multiobjective permutation flowshop scheduling, IEEE transactions on evolutionary computation, 7(2), IEEE press, 204-223, (2003).

16. Jensen M.T., Guiding single-objective optimization using multi-objective methods, Applications of evolutionary computing, Proceedings of the EvoWorkshops 2003, Lecture notes in computer science, 2611, Springer, 268-279, (2003).

17. Jin H., Wong M.L., Adaptive diversity maintenance and convergence guarantee in multiobjective evolutionary algorithms, Proceedings of the 2003 congress on evolutionary computation (CEC 2003), IEEE press, 2498-2505, (2003).

18. Jin Y., Sendhoff B., Connectedness, regularity and the success of local search in evolutionary multi-objective optimization, Proceedings of the 2003 congress on evolutionary computation (CEC 2003), IEEE press, 1910-1917, (2003).

19. Jones D.F., Mirrazavi S.K., Tamiz M., Multiobjective meta-heuristics: an overview of the current state-of-the-art, European journal of operational research, 137(1), 1-9, (2001).

20. Knowles J.D., Watson R.A., Corne D.W., Reducing local optima in single-objective problems by multi-objectivization, Proceedings of the 1st international conference on evolutionary multi-criterion optimization (EMO 2001), Lecture notes in computer science, 1993, Springer, 269-283, (2001).

21. Kumar R., Rockett P., Improved sampling of the pareto-front in multiobjective genetic optimization by steady-state evolution: a pareto converging genetic algorithm, Evolutionary computation, 10(3), 283-314, (2002).

22. Knowles J., Corne D.W., Approximating the nondominated front using the pareto archived evolution strategy, Evolutionary computation, 8(2), MIT press, 149-172, (2000).

23. Knowles J.D., Corne D.W., Towards landscape analyses to inform the design of a hybrid local search for the multiobjective quadratic assignment problem, Soft computing systems: design, management and applications, IOS press, 271-279, (2002).

24. Knowles J., Corne D., On metrics for comparing nondominated sets, Proceedings of the 2002 congress on evolutionary computation (CEC 2002), IEEE press, 711-716, (2002).

25. Knowles J.D., Corne D.W., Fleischer M., Bounded archiving using the lebesgue measure, Proceedings of the 2003 congress on evolutionary computation (CEC 2003), IEEE press, 2490-2497, (2003).

26. Landa Silva J.D., Metaheuristics and multiobjective approaches for space allocation, PhD thesis, School of computer science and information technology, University of Nottingham, UK, (2003).

27. Laumanns M., Zitzler E., Thiele L., On the effects of archiving, elitism, and density based selection in evolutionary multi-objective optimization, Proceedings of the 1st international conference on evolutionary multi-criterion optimization (EMO 2001), Lecture notes in computer science, 1993, Springer, 181-196, (2001).

28. Laumanns M., Thiele L., Deb K., Zitzler E., Combining convergence and diversity in evolutionary multiobjective optimization, Evolutionary computation, 10(3), 263-282, (2002).

29. Lu H., Yen G.G., Dynamic population size in multiobjective evolutionary algorithms, Proceedings of the 2002 congress on evolutionary computation (CEC 2002), IEEE press, 1648-1653, (2002).

30. Lu H., Yen G.G., Rank-density based multiobjective genetic algorithm, Proceedings of the 2002 congress on evolutionary computation (CEC 2002), IEEE press, 944-949, (2002).

31. Martello S., Toth P., Knapsack problems - algorithms and computer implementations, Wiley, (1990).

32. Merz P., Freisleben B., Fitness landscapes and memetic algorithm design, in: Corne D., Dorigo M., Glover F. (eds.), New ideas in optimisation, McGraw Hill, 245-260, (1999).

33. Morrison R.W., De Jong K.A., Measurement of population diversity, Artificial evolution: Selected papers of the 5th international conference on artificial evolution (EA 2001), Lecture notes in computer science, 2310, Springer, 31-41, (2001).

34. Mostaghim S., Teich J., The role of ε-dominance in multi-objective particle swarm optimization methods, Proceedings of the 2003 congress on evolutionary computation (CEC 2003), IEEE press, 1764-1771, (2003).

35. Murata T., Ishibuchi H., Gen M., Specification of genetic search directions in cellular multi-objective genetic algorithms, Proceedings of the 1st international conference on evolutionary multi-criterion optimization (EMO 2001), Lecture notes in computer science, 1993, Springer, 82-95, (2001).

36. Okabe T., Jin Y., Sendhoff B., A critical survey of performance indices for multi-objective optimisation, Proceedings of the 2003 congress on evolutionary computation (CEC 2003), IEEE press, 862-869, (2003).

37. Purshouse R.C., Fleming P.J., Conflict, harmony, and independence: relationships in evolutionary multi-criterion optimisation, Proceedings of the 2nd international conference on evolutionary multi-criterion optimization (EMO 2003), Lecture notes in computer science, 2632, Springer, 16-30, (2003).

38. Shimodaira H., A diversity control oriented genetic algorithm (DCGA): development and experimental results, Proceedings of the 1999 genetic and evolutionary computation conference (GECCO 1999), Morgan Kaufmann, 603-611, (1999).

39. Steuer R.E., Multiple criteria optimization: theory, computation and application, Wiley, (1986).

40. Socha K., Kisiel-Dorohinicki M., Agent-based evolutionary multiobjective optimization, Proceedings of the 2002 congress on evolutionary computation (CEC 2002), IEEE press, 109-114, (2002).

41. Talbi E.G., Rahoual M., Mabed M.H., Dhaenens C., A hybrid evolutionary approach for multicriteria optimization problems: application to the flow shop, Proceedings of the 1st international conference on evolutionary multi-criterion optimization (EMO 2001), Lecture notes in computer science, 1993, Springer, 416-428, (2001).

42. Tan K.C., Lee T.H., Khor E.F., Evolutionary algorithms for multi-objective optimization: performance assessments and comparisons, Artificial intelligence review, 17, 253-290, (2002).

43. Ulungu E.L., Teghem J., Fortemps P.H., Tuyttens D., MOSA method: a tool for solving multiobjective combinatorial optimization problems, Journal of multicriteria decision analysis, 8, 221-236, (1999).

44. Ursem R.K., Diversity-guided evolutionary algorithms, Proceedings of the 7th parallel problem solving from nature (PPSN VII), Lecture notes in computer science, 2439, Springer, 462-471, (2002).

45. Van Veldhuizen D.A., Lamont G.B., Multiobjective evolutionary algorithms: analyzing the state-of-the-art, Evolutionary computation, 8(2), 125-147, (2000).

46. Van Veldhuizen D.A., Lamont G.B., On measuring multiobjective evolutionary algorithm performance, Proceedings of the 2000 congress on evolutionary computation (CEC 2000), IEEE press, 204-211, (2000).

47. Zhu Z.Y., Leung K.S., Asynchronous self-adjustable island genetic algorithm for multi-objective optimization problems, Proceedings of the 2002 congress on evolutionary computation (CEC 2002), IEEE press, 837-842, (2002).

48. Zitzler E., Thiele L., Multiobjective evolutionary algorithms: a comparative case study and the strength pareto approach, IEEE transactions on evolutionary computation, 3(4), 257-271, (1999).

49. Zitzler E., Deb K., Thiele L., Comparison of multiobjective evolutionary algorithms: empirical results, Evolutionary computation, 8(2), 173-195, (2000).

Page 781: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)
Page 782: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

INDEX

-E-dominance, 734 bi-objective assignment problem, 21,a-dominance, 734 567e-constraint method, 254 bi-objective covering tour problem,e-dominance, 5, 17, 87, 734 19, 252fc-dominance, 734 bi-objective harvest scheduling, 715(1+1)-ES, 55 bi-objective knapsack problem, 21,NP- complete problem, 629 567NP-hard, 395, 402, 410 bias-variance dilemma, 394

binary classifier, 369absolute sensors, 126 binding, 273Adaptive Range Multiobjective biological sequences, 20

Genetic Algorithm, 19 black-box optimization, 395additive e-quality measure, 288 blended crossover, 299admissible solutions, 4 BLX, see blended crossoveraerodynamic optimization, 296, 298, bottleneck machine, 505

301 bound sets, 559, 562aggregating functions, 6 brachytherapy, 373allocation, 272 branch-and-cut, 19ANN, see artificial neural networkARAC, 383 C4.5, 610arc routing problem, 248 CACSDARMOGA, see Adaptive Range computer aided control system

Multiobjective Genetic Algorithm design, 155artificial neural network, 639, 677 unified approach, 158attribute based distance function, 490 Capital Asset Pricing Model, 640attribute selection, 603, 605 CAPM, see Capital Asset Pricingautomated parameterization, 81 Modelautonomous vehicle, 125 cartesian genetic programming, 104

catalog, 483, 489, 500backward sequential selection, 605 cell design/formation, 507Bayesian decision, 398 cellular manufacturing system, 20,beam orientation, 377 505benchmark application, 273 Center-of-Gravity Method, 17, 127best-N selection, 299 chemical engineering, 19

753

Page 783: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

754 Index

chemotherapy, 381 Cycle Crossover, 587cis-acting DNA sequences, 428 cyclone separator, 317, 319, 322-330city planning, 18, 227 cyclone separatorsclass NP, 629 design, 19class P, 629classification, 383, 603 data density, 408classifier, 395, 407, 418 data mining, 19, 21, 295, 297, 312,cluster, 396, 406, 412, 418, 419 603cluster analysis, 311 decision boundary, 396, 406, 419CoGM, see Center-of-Gravity Method decision making process, 614combinational logic circuit design, 101 decision tree, 397, 400, 604combinatorial optimization, 395, 402, decomposition, 395, 397, 398

410 decomposition-through-competition,compact'genetic algorithm, 10 398complexity, 628 definition of complexity, 629

in neural networks, 22 delay, 271component catalog, 500 design of a valve actuation system,computational finance, 627 498computational fluid dynamics, 296 design of fluid power systems, 20, 483computational parameters, 334 design process, 485computer engineering application, 270 design space exploration, 272computer science applications, 451 design unification and automation,computer-aided diagnosis, 19, 369 158confidence, 383 dimensionality, 400, 406, 412confidence level, 393 direct-drive implosion, 354connection matrix, 655 distance function, 489-491connectionist architecture, 400, 418 diversification, 731constraint, 412 diversity, 402, 404, 410, 730, 739convergence, 402, 404, 406, 410, 420 estimators, 736, 739, 740cooperation mechanism, 742 guided search, 736, 740, 741coverage, 383 measures, 22coverage measure, 288 divide-and-conquer, 396, 420covering tour problem, 247 dominated portfolio, 632credit assignment, 397 dosecredit risk, 642 distribution, 372Credit-Value-at-Risk, 643 optimization, 372CreditRisk+, 645 volume histogram, 372cross-talk, 396, 397 dynamic population sizing, 86cross-validation, 612crossover operator, 322, 493, 557, 564, efficiency goals, 15

566 efficient frontier, 557, 562, 632crowding, 543 efficient solutions, 4, 557

clustering, 732 eigen-value, 400curse of dimensionality, 400 electrocardiogram, 371CVaR, see Credit-Value-at-Risk electrodynamical effects, 62CX, see Cycle Crossover elite archives, 691

Page 784: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

Index 755

elite solutions, 21, 535 benchmark problem, 702elitism, 489, 520 management problems, 22engineering design, 17, 29 scheduling problems, 22ensemble based, 394, 419, 420 forward sequential selection, 605entropy, 5 fringing field, 64environmental engineering, 79, 317 fuzzy logic, 136, 429equivalent solutions, 558error ratio, 11 GAP, see Generalized Analysis oferror-function, 394, 396 Promoterserror-surface, 396 gating network, 398Espresso, 103 Gaussianevolutionary algorithms, 1, 177, 178, blobs, 416

557 mutation, 412evolutionary multi-objective noise, 412

optimization, 2 regularization, 657evolutionary neural networks, 679 gene expression, 385evolutionary regularization, 657 general multi-objective optimizationevolutionary strategy, 411 problem, 3evolvable hardware, 103 General Multi-Objective Program, 10expected loss, 643 generalization, 393, 394, 398, 400,expert network, 398 407, 420, 421expert systems, 136 generalization error, 679, 692explicit building block, 451 Generalized Analysis of Promoters,EXPO, 284 20, 429extrapolation, 393 generational distance, 12extrinsic evolution, 103 generational nondominated vector

generation, 14fast messy Genetic Algorithm, 458 generic framework, 393, 420feasible solution, 31 geneticfeatures, 367 algorithm, 185, 317, 320, 529feedforward network, 408, 418 drift, 403FEMO, 286 heritage, 564filter approach, 605 layout optimization, 584finance, 628 local search, 529, 540financial applications, 21 map, 563financial problems, 628 networks, 428finite-element solver, 63 optimization, 393, 402, 404fitness inheritance, 22 GENMOP, see Generalfitness sharing, 299, 732 Multi-Objective Programflowshop, 506 GENOCOP, 637flowshop scheduling, 20, 529, 531 genome representation, 489fluid power, 494 global minimum, 3fluid power system design, 494 Global Positioning System, 126fmGA, see fast messy Genetic gradient-based local search, 646

Algorithm gross tumor volume, 372forest groundwater monitoring, 80

Page 785: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

756 Index

group technology, 505 ISPAES, see Inverted and ShrinkablePareto Archived Evolutionary

handshake protocol, 283 Strategyhelper objectives, 735, 740 algorithm, 204heuristic, 412 iterative refinement, 394, 396hierarchical Bayesian network, 10hierarchical classifier, 421 job shop, 506hierarchical partitioning, 397high dose rate brachytherapy, 373 Karnaugh maps, 103high-dimensional, 393, 394, 396, 404, knapsack problems, 630

415, 417 knowledge discovery, 382hillclimbing, 22hybrid algorithm, 637, 641, 642 Lamarckian evolution, 658hybrid annealing algorithm, 742, 743 Laplace regularization, 657hybrid EMO algorithm, 550 LDR, see low dose rate brachytherapyhybrid strategies, 58 learninghyperarea and ratio, 12 architecture, 395hyperplane, 397, 409 complexity, 395, 406, 408, 420hypersphere, 407, 408, 412, 417, 418 cost, 393, 395, 408

error, 395image reconstruction, 366, 367 linear time-invariant, 155imbalanced training set, 409 linear variable differentialimplosion core plasma gradients, 342 transformer, 128indirect-drive implosion, 357 linkage learning algorithm, 10induction heating, 69 local search, 20, 21Inertial Confinement Fusion, 342 local search methods, 605Inertial Measurement Unit, 126 local search operator, 557, 561, 566,infeasible solution, 32 573insertion, 534 local search variation operator, 644intelligent machine, 393 low dose rate brachytherapy, 386intensity modulated beam LTI, see linear time-invariant

radiotherapy, 379inter-island rank histogram, 413-415 machine duplication, 511, 513inter-module, 397 machine learning, 19, 393, 394, 406,intercellular parts movement, 512 415, 417, 419interface benchmark-optimizer, 281 machine under-utilization, 513interpolation, 393 magnetic reactor, 62intra-island rank histogram, 413, 414 magnetic shield, 62intra-module, 397 makespan, 531intrinsic dimensionality, 406, 408, 416 manufacturing cells, 507intrinsic evolution, 103 Markowitz portfolio selection, 631inverse planning, 373 material cost, 64inverse problem, 63, 365 mating restrictions, 5, 20, 403, 404Inverted and Shrinkable Pareto mating scheme, 548

Archived Evolutionary Strategy, 18 maximin fitness function, 18, 229

Page 786: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

Index 757

MCEA, see Multi-objective MOGM, see Multi-ObjectiveContinuous Evolutionary Gradient-based MethodAlgorithm moment of inertia, 739, 740

MDESTRA, see Multi Directional MOMGA, see Multi-Objective MessyEvolution Strategy Algorithm, 61 Genetic Algorithm

mechanical engineering, 483 MOMGA-II, see Multi-Objectivemedical image processing, 19 Messy Genetic Algorithm - IImedicine, 19, 365 MOSA method, 571messy Genetic Algorithm, 9, 458 MOSGA, see multi-objective strugglemeta-knowledge, 399 genetic algorithmmeta-learning, 399 MOSS, see Multi-Objective Scattermetrics Search

for MOEAs, 11 mQAP, see multi-objective QuadraticmGA, .see messy Genetic Algorithm, Assigment Problem, 451

Multi Directional Evolution StrategyMGK algorithm, 575 Algorithm, 17, 61Micro-Genetic Algorithm for multi-objective combinatorial

Multi-Objective Optimization, 10 o p t im iz a t ion, 7, 20, 556, 557microarray, 385 i i . - i . - i . - u- i. • i

, , „„ multi-objective combinatorialminimal complete set, 558, 568, 569 , , . __, , . . , _ . , ' , ' problems, 177Minimal Description Length, 401 m i i . - i . - i . - n J.-

. . . , , . , „ , Multi-objective Continuousminimum description length, 431 . . . . ... __MisII 103 Evolutionary Algorithm, 17, 127

. ! • ii j • .1 nrv Multi-Objective Evolutionarymixed variable design problem, 20, r. .•>**'->** ™E

.„„ Algori thms, 4, 125, 155, 295MMOKP, see Modified Multi-Objective Forward Sequential

Multi-objective Knapsack Problem Selection, 21, 611model complexity, 681, 682, 684 Multi-Objective Genetic Algorithm,Modified Multi-objective Knapsack 8> ll' o u o

Problem 20 451 Multi-Objective Gradient-basedmodular system, 394, 398, 419 Method, 17, 127MOEA 177 186 Multi-Objective HierarchicalMOEA'performance measures, 11 Bayesian Optimization Algorithm,

MOEA toolbox, 17 1 0

control module, 159 Multi-Objective Messy Geneticdecision-making module, 159 Algorithm, 9, 458optimization module, 159 Multi-Objective Messy Geneticparameter settings, 167 Algorithm - II, 20, 451specification template, 159 multi-objective optimization, 30, 156,

MOEAs, .see Multi-Objective 432Evolutionary Algorithms multi-objective optimization problem,

MOEAs in 2design of combinational logic multi-objective particle swarm

circuits, 101 optimization, 107MOGA, see Multi-Objective Genetic multi-objective Quadratic Assigment

Algorithm, 19, 22 Problem, 20, 451

Page 787: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

758 Index

multi-objective rectangular packing Nondominated Sorting Geneticproblem, 21, 581, 583 Algorithm-II, 6, 9

Multi-Objective Scatter Search, 20, nondominated vector addition, 14429 normal tissue, 372

multi-objective spectroscopic data NPGA, see Niched Pareto Geneticanalysis, 341, 347 Algorithm, see Niched-Pareto

multi-objective struggle genetic Genetic Algorithmalgorithm, 20, 483, 487 NSESA, see Non-dominated Sorting

multi-objective truss optimization, Evolutionary Strategy Algorithm,201 55

multi-start local search, 529, 539 NSGA, 17, 19, 20, see Nondominatedmultiple costs, 395 Sorting Genetic Algorithm, 317,Multiple Objective Heuristics, 556 31s, 323, 336, 519, 633Multiple Objective MetaHeuristics, NSGA-II, see Nondominated Sorting

5 5 6 Genetic Algorithm-II, 19-22, 80,multiple views, 393, 395, 420 255, 286, 542, 637, 644, 659mutation, 322 nugget discovery, 383

nadir point, 562 objective oriented search, 735Navier-Stokes, 296, 301, 302, 311 optimization, 29, 178NCGA, see Neighborhood optimization problems, 630

Cultivation Genetic Algorithm Q r d e r C r o s g o v e r ) 5 3 2 ] 5 8 ?

nearest neighbor, 408, 418 order-based coding, 529, 532Neighborhood Cultivation Genetic , . , „__

A 1 -j.1. 01 mi organs at risk, 372Algorithm, 21, 591 „ . .„„ „„„• iv • j 1 1 ..v. rc-y oscillation strategy, 563, 573

neighborhood search algorithms, 5571 4. 1 oi on one one inv outlier, 407, 409, 417, 418

neural network, 21, 22, 296, 396, 397, ' ' ' 'OQQ overall nondominated vector

neural network ensemble, 21, 659 generation ratio, 13Niched Pareto Genetic Algorithm, 9, overfittmg, 394, 399, 400

343 344 347 ®^-> see Order Crossover

niching, 403, 411No Free Lunch Theorem, 401 P a c k e t processor design, 19Non-dominated Sorting Evolutionary P a c k e t processors, 271

Strategy Algorithm, 17, 55 PAES, see Pareto Archived Evolutionnon-inferior solutions, 4 Strategy, 21non-linear regression, 639 P A F > see Paroxysmal Atrialnon-minimum phase, 165 Fibrillationnon-supported efficient solutions, 558, Pareto Archived Evolution Strategy,

563 10nondeterministic algorithm, 629 Pareto Converging Geneticnondominated portfolio, 632 Algorithm, 20, 395, 411nondominated solutions, 30, 44 Pareto dominance, 488, 543nondominated sorting, 519 Pareto dominance selection , 202Nondominated Sorting Genetic Pareto Evolution Strategy Algorithm,

Algorithm, 8, 32 17, 59

Page 788: Applications Of Multi-Objective Evolutionary Algorithms (Advances in Natural Computation)

Index 759

Pareto front, 4, 347, 395, 402, 404, 405, 412
Pareto frontier
  sampling, 594
Pareto Gradient Based Algorithm, 17, 58
Pareto optimal controller, 155
Pareto optimal designs, 29
Pareto optimal set, 4, 318, 329, 331, 335
Pareto optimal solutions, 127
Pareto optimality, 3
Pareto ranking, 8, 59, 411, 414
Pareto ranking method, 299
Pareto-based approaches, 8
Paroxysmal Atrial Fibrillation, 370
part families, 505
part subcontracting, 511, 513
partial classification, 383
Partially Mapped Crossover, 587
particle swarm optimization, 101
  multiobjective, 107
partition, 393, 407, 412, 415, 419
partitioning, 393, 395, 396, 406, 415, 420
path-relinking operator, 21, 557, 559, 565, 566
pattern space, 395, 396, 406, 407
PCA, see Principal Component Analysis
PCGA, see Pareto Converging Genetic Algorithm
performance measures, 11
performance satisfaction, 157
performance specification, 161
  actuator saturation, 163
  disturbance rejection, 162
  minimal controller order, 164
  robust stability, 162
  stability, 161
  step response specifications, 162
permutation, 529, 531
PESTRA, see Pareto Evolution Strategy Algorithm
PGBA, see Pareto Gradient Based Algorithm
phenotype based distance function, 490
physics, 19
PISA, 283
Placement-based Partially Exchanging Crossover, 589
planning target volume, 372
plant uncertainty, 167
PMX, see Partially Mapped Crossover
polymer extrusion, 177, 178, 184
population-based approaches, 7
Population-based Hybrid Annealing Algorithm, 22
portfolio management problems, 21
portfolio selection, 631, 642
potential efficient solutions, 556, 561
PPEX, see Placement-based Partially Exchanging Crossover
pre-processor, 393, 395
predictive accuracy, 606
Principal Component Analysis, 406
progress measure, 13
pruning, 400
QAP, see quadratic assignment problem
quadratic assignment problem, 454
quadratic programming, 633
qualitative features, 427
quality-of-service, 271
quasi-Newton method, 141
Quine-McCluskey method, 103
quota traveling salesman problem, 251
radiotherapy, 376
randomized search, 395
rank-histogram, 395, 405, 411
real-world application, 393, 401, 419
receiver operating characteristic, 369
rectangular packing problems, 583


Reduced Pareto Set Genetic Algorithm with Elitism, 18
regional planning, 18
regression, 21
regularization, 656
relative sensors, 126
relaxed dominance, 733
replacement scheme, 489
restricted mating, 733
return function, 632
risk
  function, 632
  measure, 632
risk-adjusted performance measure, 645
RMSE, see Root Mean Squared Error
RNA polymerase, 433
ROC, see receiver operating characteristic
Root Mean Squared Error, 639
Rprop, 658
Rprop+, 659
RPSGAe, see Reduced Pareto Set Genetic Algorithm with Elitism, 177, 178, 186, 187, 194, 196
SBX, see Simulated Binary Crossover
scalability, 394, 420
scalar formulation, 65
scalar objective function, 545
scheduling, 273
search space reduction, 207
secondary population, 5
seeded starting generation, 234
selective traveling salesman problem, 251
Self-Organizing Map, 19, 296
SEMO, 286
sensitivity, 369
sequence-pair, 585, 586
shape design, 62
sharing, 403, 404, 411
sharing function, 321
significant Pareto dominance, 619
similarity measure, 489, 490
simulated annealing, 22
Simulated Binary Crossover, 637
simulation, 495, 500
single objective harvest scheduling, 708
space allocation, 22, 736
spacing, 13
spatial cross-talk, 396, 397
SPEA, see Strength Pareto Evolutionary Algorithm, 611, 640
SPEA2, see Strength Pareto Evolutionary Algorithm 2, 286
specificity, 369
spectroscopic data analysis, 19
steady-state algorithm, 411
steepest descent method, 141
stock market, 628
stream, 275
Strength Pareto Evolutionary Algorithm, 9
Strength Pareto Evolutionary Algorithm 2, 9
structural criterion of complexity, 649
struggle crowding, 488
stud genetic algorithm, 10
subspace learning, 407
supersonic transport, 296
supported efficient solutions, 558, 562, 564
swap, 534
switch, 534
tabu search, 22
tardiness, 531
task, 275
teletherapy, 376
temporal cross-talk, 396, 397
term structure of interest rates, 647
testing
  important issues, 15
TFH, 73
throughput, 271
time series forecasting, 21, 693
time series prediction, 639
trade-off, 155, 158, 170
trading strategy, 639
training, 369


transaction cost, 635, 636
transportation planning, 227
transverse-flux heating, 62
traveling salesman problem, 18, 177, 187, 188, 248
traveling salesman problems with profit, 248
treatment planning, 19, 372
tree classifier, 397
truss optimization, 18, 201
TSP, see traveling salesman problem
two set coverage, 11
UAVs, see Unmanned Aerial Vehicles
Ugly Duckling Theorem, 401
ULTIC, see unified linear time-invariant control
  evolutionary CACSD paradigm, 159
  evolutionary design application, 165
  optimal design, 158
  system formulation, 160
underfitting, 394, 399, 400
unexpected loss, 643
unified linear time-invariant control, 17, 156
uniform crossover, 493
uniform selection, 488
Unmanned Aerial Vehicles, 451
usage scenarios, 272
validation error, 395, 409, 418
valuation models for financial products, 647
Vector Evaluated Genetic Algorithm, 7, 102
VEGA, see Vector Evaluated Genetic Algorithm
vehicle routing, 248
vehicle routing problem, 248
venturi scrubber, 317, 319, 331-334
venturi scrubbers
  design, 19
weak dominance, 729
weight decay, 400
weight decay regularization, 681
weight vector, 546
weighted vectors, 731
winner-takes-all, 398
wrapper approach, 605
